
Support of AOT compilation (refine #6992) #7581

Open · wants to merge 6 commits into base: master

Conversation
Conversation

zpcore (Collaborator) commented on Jun 26, 2024

This is a follow-up PR to refine #6992.

In this PR, I created PjRtCompilationClient to serve ahead-of-time (AOT) compilation. This way, we no longer need to create CompileOnlyPjRtClient, CompileOnlyPjRtDevice, etc., which makes OpenXLA pin updates easier.

Instructions on how to run AOT compilation have been updated. Two extra flags need to be specified when running on a CPU device, PJRT_DEVICE (the compilation target) and XLA_PERSISTENT_CACHE_PATH (where the compiled program is cached), as follows:
----------------------- ON CPU -----------------------

PJRT_DEVICE=TPU   XLA_PERSISTENT_CACHE_PATH=./  python aot_encode.py

aot_encode.py:

import torch
import torch_xla
import torch_xla.core.xla_model as xm

torch_xla._XLAC._xla_set_virtual_topology("v4:2x2x1")  # <----- define virtual topology here. Must be specified before any device declaration.

a = torch.rand([2,3])
b = torch.rand([2,3])
device = xm.xla_device()
a = a.to(device)
b = b.to(device)
f = torch.hstack([a,b])

torch_xla._XLAC._xla_warm_up_cache([f],[])  # <----- call this to avoid real computation.

This will generate a cache file named with the graph hash, e.g. 229013763457648799216243727807636414712, which can be deserialized by running the same graph code on a TPU:
----------------------- ON TPU v4-8 -----------------------

PJRT_DEVICE=TPU XLA_PERSISTENT_CACHE_PATH=./  python aot_decode.py

aot_decode.py:

import torch
import torch_xla
import torch_xla.core.xla_model as xm

a = torch.rand([2,3])
b = torch.rand([2,3])
device = xm.xla_device()
a = a.to(device)
b = b.to(device)
f = torch.hstack([a,b])

print(f)

zpcore added the AOT ahead of time cross device compilation label on Jun 26, 2024
zpcore marked this pull request as ready for review on Jun 26, 2024
zpcore changed the title from "refining the AOT compilation" to "Support of AOT compilation (refine #6992)" on Jun 26, 2024
torch_xla/csrc/runtime/pjrt_compile_only.h (outdated review thread, resolved)
@@ -12,6 +13,8 @@ namespace runtime {

std::atomic<bool> g_computation_client_initialized(false);

std::string aot_topology = "";
Collaborator commented:

Since this state is only ever used by PjRtCompilationClient, would it make sense to make this a static field on PjRtCompilationClient?

I think it's probably even better to just imperatively initialize this client when set_virtual_topology() happens... but that also would introduce some extra edge cases for initialization. For the POC, a global var is okay, but I think a static field makes more sense to make it clear what uses that state.
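
For illustration, here is a minimal, self-contained sketch of the static-field alternative the comment suggests. The names and signatures are hypothetical, not the PR's actual interface:

// Sketch only: the topology string lives on the client that consumes it,
// rather than being a file-level global in runtime.cc.
#include <string>
#include <utility>

class PjRtCompilationClient {
 public:
  // Record the virtual topology (e.g. "v4:2x2x1"), as set via
  // _xla_set_virtual_topology(); must happen before any device is created.
  static void SetVirtualTopology(std::string topology) {
    virtual_topology_ = std::move(topology);
  }
  static const std::string& virtual_topology() { return virtual_topology_; }

 private:
  static std::string virtual_topology_;
};

// Static data members need exactly one out-of-line definition.
std::string PjRtCompilationClient::virtual_topology_;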

zpcore (Collaborator, Author) replied:

Moving aot_topology to pjrt_compilation_client.h may cause a circular dependency issue, because setting aot_topology needs to check g_computation_client_initialized in the current setup.

Let's use the global var for now, since this makes it clear in runtime.cc that we have three kinds of clients altogether.
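
As a hedged sketch of that point, the selection among the three client kinds could look roughly like this (simplified, with stub types; the real logic in runtime.cc differs in detail):

#include <memory>
#include <string>

// Stub types standing in for the real client classes in torch_xla/csrc/runtime.
struct ComputationClient { virtual ~ComputationClient() = default; };
struct PjRtComputationClient : ComputationClient {};
struct IfrtComputationClient : ComputationClient {};
struct PjRtCompilationClient : ComputationClient {};

std::string aot_topology = "";  // set by _xla_set_virtual_topology()

// With the global var, all three client kinds are visible in one place:
// a non-empty aot_topology selects the compile-only (AOT) client.
std::unique_ptr<ComputationClient> CreateClient(bool use_ifrt) {
  if (!aot_topology.empty()) {
    return std::make_unique<PjRtCompilationClient>();
  }
  if (use_ifrt) {
    return std::make_unique<IfrtComputationClient>();
  }
  return std::make_unique<PjRtComputationClient>();
}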


// Builds a map from the device's global ordinal to its index in the `devices`
// array.
std::unordered_map<int, int> build_index_map(
Collaborator commented:

I see some duplication of helpers here and in the PJRT/IFRT client implementations. It would be cleaner to factor it out rather than have 3 copies, but both AOT and IFRT are just in a concept stage...

I'll let @JackCaoG have the final word on readability so I don't have to decide.
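
For what factoring it out might look like, here is a self-contained sketch of a shared helper (the location is hypothetical, e.g. a common util header used by the PJRT, IFRT, and AOT clients; Device is a stand-in for the real PJRT device type):

#include <unordered_map>
#include <vector>

// Stand-in for xla::PjRtDevice, which exposes a global ordinal.
struct Device {
  int global_ordinal;
};

// Builds a map from a device's global ordinal to its index in `devices`,
// shared by all client implementations instead of being copied three times.
std::unordered_map<int, int> build_index_map(
    const std::vector<Device*>& devices) {
  std::unordered_map<int, int> index;
  index.reserve(devices.size());
  for (int i = 0; i < static_cast<int>(devices.size()); ++i) {
    index[devices[i]->global_ordinal] = i;
  }
  return index;
}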

torch_xla/csrc/runtime/pjrt_compilation_client.cc (outdated review thread, resolved)