
Is it possible to regenerate table64 for a dynablock? #1711

Open
Coekjan opened this issue Jul 31, 2024 · 12 comments

Comments

@Coekjan
Contributor

Coekjan commented Jul 31, 2024

Background

I am implementing an external cache system (my research project, WIP), so that the dynablock generated for program A can be reused by other programs, or the next time program A starts. This idea could help bypass the complex native_passes.

Currently the dynablock can be stored in the cache system correctly, and box64 can look up the external cache correctly. However, I found that the fetched external cache (dynablock) cannot be used directly, as it contains a lot of position-dependent information. For example (if I am wrong, please correct me):

  1. the first void * word of the block: easy to regenerate
  2. the next pointer in the block, and the jmpnext address: easy to regenerate
  3. the table64 in the block: hard to regenerate???
  4. ... (any other information? I am not sure ...)

A Block must have this layout:
0x0000..0x0007 : dynablock_t* : self
0x0008..8+4*n : actual Native instructions, (n is the total number)
A .. A+8*n : Table64: n 64bits values
B .. B+7 : dynablock_t* : self (as part of JmpNext, that simulate another block)
B+8 .. B+15 : 2 Native code for jmpnext (or jmp epilog in case of empty block)
B+16 .. B+23 : jmpnext (or jmp_epilog) address. jumpnext is used when the block needs testing
B+24 .. B+31 : empty (in case an architecture needs more than 2 opcodes)
B+32 .. B+32+sz : instsize (compressed array with each instruction length on x64 and native side)
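For illustration, here is a minimal sketch (my own assumption, not box64 code) of how the table64 region could be located from the layout above; the struct and field names are made up, roughly mirroring the helper fields used later in this issue, and any alignment padding at the table64 boundary is ignored:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified view of a cached block; box64 really carries this
   information in dynablock_t and the dynarec helper. */
typedef struct {
    void*  block;        /* base of the block (the leading dynablock_t* self) */
    size_t native_size;  /* bytes of emitted native instructions              */
    size_t table64size;  /* number of 64-bit entries in the table64           */
} cached_block_t;

/* The table64 starts after the self pointer and the native instructions. */
static uint64_t* table64_start(const cached_block_t* b)
{
    return (uint64_t*)((uintptr_t)b->block + sizeof(void*) + b->native_size);
}

static uint64_t* table64_end(const cached_block_t* b)
{
    return table64_start(b) + b->table64size;
}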

So, I think the cache can be effectively reused if we have a (cheap) way to regenerate the table64.

But, how to?

I tried to introduce a native_pass4 to regenerate the table64. To define this new pass, I basically created a header file like this:

#define INIT
#define FINI
#define EMIT(A)

#define MESSAGE(A, ...)
#define NEW_INST
#define INST_EPILOG
#define INST_NAME(name)

#define TABLE64(A, V)   {int val64offset = Table64(dyn, (V), 4); MESSAGE(LOG_DUMP, "  Table64: 0x%lx\n", (V)); AUIPC(A, SPLIT20(val64offset)); LD(A, A, SPLIT12(val64offset));}
#define FTABLE64(A, V)  {mmx87_regs_t v = {.d = V}; int val64offset = Table64(dyn, v.q, 4); MESSAGE(LOG_DUMP, "  FTable64: %g\n", v.d); AUIPC(x1, SPLIT20(val64offset)); FLD(A, x1, SPLIT12(val64offset));}
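For context, here is a minimal, self-contained sketch (an assumption about what the Table64 helper behind the TABLE64 macro has to compute, not box64's actual implementation): append the 64-bit value to the table and return the PC-relative offset from the instruction being emitted to that slot, so the AUIPC/LD pair (or a literal load on arm64) can reach it.

#include <stdint.h>

/* Hypothetical, trimmed-down state; the field names mirror the helper fields
   set up before pass 3 (block, tablestart, table64, table64size, native_size),
   but this is not the real dynarec helper type. */
typedef struct {
    void*     block;        /* base of the native code block        */
    uintptr_t tablestart;   /* start address of the table64 region  */
    uint64_t* table64;      /* the table64 entries themselves       */
    int       table64size;  /* entries used so far in this pass     */
    int       native_size;  /* bytes of native code emitted so far  */
} table64_ctx_t;

static int table64_offset(table64_ctx_t* dyn, uint64_t val, int emit)
{
    int idx = -1;
    for (int i = 0; i < dyn->table64size; ++i)   /* reuse a slot if the value exists */
        if (dyn->table64[i] == val) { idx = i; break; }
    if (idx < 0) {
        idx = dyn->table64size++;
        if (emit)                                /* only write memory on the emitting pass */
            dyn->table64[idx] = val;
    }
    intptr_t here  = (intptr_t)dyn->block + dyn->native_size;
    intptr_t entry = (intptr_t)dyn->tablestart + (intptr_t)idx * (intptr_t)sizeof(uint64_t);
    return (int)(entry - here);                  /* PC-relative distance for AUIPC/LD */
}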

I also adapted some conditional-compilation directives in the codebase, e.g.:

- #if STEP == 3
+ #if STEP == 3 || STEP == 4
  #define X87_COMBINE(A, B) extcache_st_coherency(dyn, ninst, A, B)
  #else

Before calling native_pass4, I set up the helper (just like before calling native_pass3 in the original code path):

helper.block = block->block;
helper.tablestart = (uintptr_t)tablestart;
helper.jmp_next = (uintptr_t)next + sizeof(void*);
helper.instsize = (instsize_t*)(block->instsize);
helper.table64cap = (next - tablestart) / sizeof(uint64_t);
helper.table64 = (uint64_t*)tablestart;
helper.native_size = 0;
helper.table64size = 0;
helper.insts_size = 0;
native_pass4(&helper, addr, alternate, is32bits);

However, this did not seem to work well: the table64 did not seem to be generated correctly. I suspect something is wrong with the helper, because I did not perform all the setup on it that the original code path does before native_pass3.

So, my question is: is it possible to regenerate the table64? 🤔

I would appreciate it if you could help me understand how FillBlock64 generates the table64, or give some hints on regenerating the table64 for an existing dynablock.

@Coekjan
Contributor Author

Coekjan commented Jul 31, 2024

AHH, what can I say...

I moved my lookup-cache + regenerate-table64 code to the end of pass 0 (before pass 1) and it seems to work perfectly. So maybe at least now I can bypass the original passes 1-3.

@ptitSeb
Owner

ptitSeb commented Jul 31, 2024

This seems like a typical "relocation" problem?

Well, if you fixed your issue, that's good :)
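To make the "relocation" angle concrete, one possible approach (a sketch of an assumption, not something box64 currently provides) would be to store a small relocation record next to the cached block for every position-dependent 64-bit slot, and patch those slots when the block is loaded in a new run:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical relocation kinds for a cached dynablock. */
typedef enum {
    RELOC_SELF_PTR,       /* the dynablock_t* self words at the start and in JmpNext */
    RELOC_JMPNEXT_ADDR,   /* the jmpnext / jmp_epilog address slot                    */
    RELOC_TABLE64_ADDR,   /* a table64 entry holding a per-run address                */
} reloc_kind_t;

typedef struct {
    uint32_t     offset;  /* byte offset of the 64-bit slot inside the block */
    reloc_kind_t kind;
} reloc_t;

/* Patch a freshly loaded block; new_value() recomputes the per-run value
   (self pointer, jmpnext address, relocated address, ...) for each record. */
static void apply_relocs(void* block, const reloc_t* relocs, size_t n,
                         uint64_t (*new_value)(const reloc_t* r, void* block))
{
    for (size_t i = 0; i < n; ++i) {
        uint64_t* slot = (uint64_t*)((uintptr_t)block + relocs[i].offset);
        *slot = new_value(&relocs[i], block);
    }
}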

@Coekjan
Contributor Author

Coekjan commented Aug 1, 2024

I am not sure I have really solved the problem, as I do not fully understand what the requirements are for running native_pass3 (I mean which fields in the helper, or anything else, need to be properly set up).

Roughly placing the "lookup-cache + regenerate-table64" step between pass 0 and pass 1 does work currently, but I have only tested it with the ls program. (More engineering effort is needed to support general programs.)

@Coekjan
Contributor Author

Coekjan commented Aug 1, 2024

Now I have to move the "lookup-cache + regenerate-table64" step to the end of pass 1, as filling the table64 depends on some information provided by pass 1. This makes the ls program run well, but some complex programs (e.g. bash, python) still crash in various ways. I have no idea what is happening.

Hmm, does box64 generate position-dependent native code (e.g. an address in the immediate of an instruction) in a dynablock? If so, it would require more effort to relocate the code.

@ptitSeb
Owner

ptitSeb commented Aug 1, 2024

I'm not sure how you can regenerate the table64 offsets with pass 1, while the arm64 (or rv64/la64) offsets are only known in pass 3.

And yes, there will be some offsets in native code that will need relocation. There is also the effect of elf relocation that might need relocating in the native code, some jumptables that use offsets from the jumptable that is allocated at run time and so have per-run addresses, etc...

@Coekjan
Contributor Author

Coekjan commented Aug 1, 2024

I'm not sure how you can regenerate the table64 offsets with pass 1, while the arm64 (or rv64/la64) offsets are only known in pass 3.

I did not regenerate the table64 in pass 1; I regenerate it in a newly defined pass 4 (the macros are copied from pass 3, but only the table64 is written).

And yes, there will be some offsets in native code that will need relocation. There is also the effect of elf relocation that might need relocating in the native code, some jumptables that use offsets from the jumptable that is allocated at run time and so have per-run addresses, etc...

Thanks for your reply. I would appreciate it if you could address my questions:

  1. Where are the jumptables generated?
  2. Do we have a full list of which information in the dynablock has per-run values/addresses, and where it is generated?

EDIT: I have already learnt that the table64 contains some per-run values and should be relocated (or regenerated).

@ptitSeb
Owner

ptitSeb commented Aug 1, 2024

Thanks for your reply. I would appreciate it if you could address my questions:

  1. Where are the jumptables generated?

In custommem.c, look at JmpTable64 functions.

  2. Do we have a full list of which information in the dynablock has per-run values/addresses, and where it is generated?

Nope, I didn't plan to do disk-saved dynablocks before a few more versions, so there is no infrastructure for that yet.

@Coekjan
Contributor Author

Coekjan commented Aug 1, 2024

Nope, I didn't plan to do disk-saved dynablocks before a few more versions, so there is no infrastructure for that yet.

Do you have a full list in mind of the per-run values/addresses in a dynablock?

@ptitSeb
Owner

ptitSeb commented Aug 1, 2024

Nope, I didn't plan to do disk-saved dynablocks before a few more versions, so there is no infrastructure for that yet.

Do you have a full list in mind of the per-run values/addresses in a dynablock?

Not really. Again, the issue is that many per-run values come from the "relocation process", which can come from the elfloader for a linux process, but also from wine for an exe program...

The other per-run values come from the table64 (which can probably be disabled if needed) and the jumptable (so dynablock inter-linking, basically).
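To illustrate why jumptable entries are inherently per-run, here is a simplified sketch in the spirit of the JmpTable64 code in custommem.c (the number of levels, shifts, and names are made up, not the actual implementation): the tables are allocated lazily at run time, and every entry is either the native entry point of a compiled block or a default pointer to the dynarec epilog, so none of these pointers can be reused across runs as-is.

#include <stdint.h>
#include <stdlib.h>

#define LVL_BITS 16
#define LVL_SIZE (1u << LVL_BITS)
#define LVL_MASK (LVL_SIZE - 1)

typedef void (*native_target_t)(void);

static void native_epilog(void) { /* placeholder: return to the dispatcher */ }

/* Simplified 2-level jump table keyed on a guest (x86_64) address. */
static native_target_t* jmptbl_top[LVL_SIZE];   /* allocated lazily: per-run addresses */

static native_target_t* jmptbl_entry(uintptr_t guest_addr)
{
    uintptr_t hi = (guest_addr >> LVL_BITS) & LVL_MASK;
    uintptr_t lo =  guest_addr              & LVL_MASK;
    if (!jmptbl_top[hi]) {
        jmptbl_top[hi] = malloc(LVL_SIZE * sizeof(native_target_t));
        for (size_t i = 0; i < LVL_SIZE; ++i)
            jmptbl_top[hi][i] = native_epilog;  /* "not compiled yet" goes to the epilog */
    }
    return &jmptbl_top[hi][lo];
}

/* Linking a block then amounts to: *jmptbl_entry(x64_addr) = native_entry_of_block; */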

@Coekjan
Contributor Author

Coekjan commented Aug 3, 2024

These past two days I have been suffering from a memory-corruption issue. I am not sure if it comes from box64 or from my side.

In my design, my external cache system provides a dynamic library (a .so file) for box64, and box64 calls the functions in the provided library. My functions are written in Rust and dynamically allocate memory from its own separate heap.

When debugging, I saw that on the Rust side, the address of an object on the heap was right behind box64's loaded memory:

(( omitted ))
34800000-35d79000 r--p 00000000 103:02 17315407                          box64
35d79000-35e5f000 r--p 01578000 103:02 17315407                          box64
35e5f000-35e63000 rw-p 0165e000 103:02 17315407                          box64
35e63000-37d32000 rw-p 00000000 00:00 0                                  <------- rust on-heap objects here
100000000-100003000 r--p 00000000 103:02 39584943                        the-main-elf
(( omitted ))

And I observed that box64 reserves some memory regions for its own usage. So I am now not sure whether the Rust on-heap objects are placed in the correct space...
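As a small debugging aid (my own helper, unrelated to box64's memory management), one can look up which /proc/self/maps region a given pointer falls into, e.g. right after an allocation on the Rust heap:

#include <stdio.h>
#include <stdint.h>

/* Print the /proc/self/maps line containing a given pointer, to check in
   which mapping an allocation actually landed. */
static void print_mapping_of(const void* p)
{
    FILE* f = fopen("/proc/self/maps", "r");
    if (!f)
        return;
    char line[512];
    unsigned long addr = (unsigned long)(uintptr_t)p;
    unsigned long start, end;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 && addr >= start && addr < end) {
            printf("%p lives in: %s", p, line);
            break;
        }
    }
    fclose(f);
}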

@Coekjan
Contributor Author

Coekjan commented Aug 4, 2024

These past two days I have been suffering from a memory-corruption issue. I am not sure if it comes from box64 or from my side.

UPDATE: I changed my Rust-side malloc backend to mimalloc, and now the allocated objects have higher addresses:

34800000-35d79000 r--p 00000000 103:09 17315407                          box64
35d79000-35e5f000 r--p 01578000 103:09 17315407                          box64
35e5f000-35e63000 rw-p 0165e000 103:09 17315407                          box64
35e63000-37d11000 rw-p 00000000 00:00 0                                  <------- box_{malloc,realloc,free} here
100000000-100003000 r--p 00000000 103:09 39584943                        main-elf
100003000-100006000 rw-p 00000000 00:00 0 
57a94000000-57ad4000000 rw-p 00000000 00:00 0                            <------- rust mimalloc here

So I can now assert that the Rust heap and the box64 heap have no overlap.

However, the memory-corruption issue still exists. As soon as my Rust dylib allocates some objects on its own heap, box64 misbehaves, resulting in a python3.12 failure. Does anyone have any idea about this issue?

@Coekjan
Contributor Author

Coekjan commented Aug 4, 2024

However, the memory-corruption issue still exists. As soon as my Rust dylib allocates some objects on its own heap, box64 misbehaves, resulting in a python3.12 failure. Does anyone have any idea about this issue?

UPDATE: I finally found that the issue comes from the libc function realpath. If my Rust code calls fs::canonicalize (which ultimately boils down to realpath), the memory-corruption issue happens.

So something must be going wrong when realpath is called in my dynamic library...
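One classic pitfall in this area (not confirmed to be the cause here, just worth checking): realpath(path, NULL) returns a buffer allocated with the C library's malloc, which must be released with the matching free. If a different allocator (for example the one backing a Rust dylib, or an interposed malloc) frees or reuses that buffer, the heap gets corrupted.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* resolved is allocated by libc's malloc ... */
    char* resolved = realpath("/usr/bin/../bin", NULL);
    if (!resolved) {
        perror("realpath");
        return 1;
    }
    printf("%s\n", resolved);
    free(resolved);   /* ... and must be freed by the matching libc free */
    return 0;
}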
