relocate_dynamorio relies on globals that fail pre-relocation, especially on AArch64
I observed this with my new test in PR #5462, where I added a blocking call to select() to infloop. Attach on AArch64 then failed because DR's self-relocation crashed. It took a long time to track this down: it didn't reproduce when I manually ran the test commands to launch infloop and attach to it; it only reproduced with infloop launched under run_in_bg with output redirection under ctest. I don't know why that combination triggers the issue: you would think it would affect only I/O, not where the app is loaded or whether DR needs relocation.
The crash showed up as a PTRACE_CONT failure:
bad status 0x8b
That's WCOREDUMP: 0x8b is the core-dump flag (0x80) plus signal 11 (SIGSEGV).
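For anyone decoding a status like this by hand, here is a minimal C sketch. It assumes 0x8b is a raw waitpid() status on Linux; describe_wait_status is a hypothetical helper written for this note and is not part of DR.

```c
#include <stdio.h>
#include <sys/wait.h>

/* Fallback in case the C library hides WCOREDUMP under strict standard modes.
 * Assumes the Linux encoding, where bit 0x80 is the core-dump flag. */
#ifndef WCOREDUMP
#    define WCOREDUMP(s) ((s) & 0x80)
#endif

/* Hypothetical helper (not a DR function): render a raw waitpid() status as text. */
static void
describe_wait_status(int status, char *buf, size_t len)
{
    if (WIFSIGNALED(status)) {
        /* Low 7 bits hold the terminating signal; 0x80 marks a core dump. */
        snprintf(buf, len, "killed by signal %d%s", WTERMSIG(status),
                 WCOREDUMP(status) ? " (core dumped)" : "");
    } else if (WIFSTOPPED(status)) {
        snprintf(buf, len, "stopped by signal %d", WSTOPSIG(status));
    } else if (WIFEXITED(status)) {
        snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
    } else {
        snprintf(buf, len, "unknown status 0x%x", status);
    }
}
```

Passing 0x8b to this helper reports a kill by signal 11 with a core dump, which matches the SIGSEGV that gdb shows in the session that follows.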
I tried op_exec_gdb from a ctest shell (I made a test that runs bash :)) and ran the two attach commands:
derek@dynamorio:~/dr/build$ ctest -V -R shtest
<...>
296: Test command: /usr/bin/bash
296: Test timeout computed to be: 1500
pwd
296: /home/derek/dr/build/suite/tests
/home/derek/dr/build/bin64/run_in_bg -out /home/derek/dr/build/suite/tests/client.attach_blocking-out /home/derek/dr/build/bin64/drrun -no_inject -- /home/derek/dr/build/suite/tests/bin/linux.infloop -v -attach -block
296: 2878210
/home/derek/dr/build/bin64/drrun -v -attach $(pgrep infloop)
296: INFO: app cmdline: "2876797" "2878210"
<...>
296: Program received signal SIGSEGV, Segmentation fault.
296: 0x00000000713b61f8 in privload_mem_is_elf_so_header (mem=0x10114400 <error: Cannot access memory at address 0x10114400>)
296: at /home/derek/dr/src/core/unix/loader.c:1738
296: 1738 if (elf_hdr->e_ident[EI_MAG0] != ELFMAG0 || elf_hdr->e_ident[EI_MAG1] != ELFMAG1 ||
296: (gdb) bt
296: #0 0x00000000713b61f8 in privload_mem_is_elf_so_header (mem=0x10114400 <error: Cannot access memory at address 0x10114400>)
296: at /home/derek/dr/src/core/unix/loader.c:1738
296: #1 0x00000000713b6514 in relocate_dynamorio (dr_map=0x10114400 <error: Cannot access memory at address 0x10114400>, dr_size=0,
296: sp=0xffffef6a33b0 "\377\377\377\377\377\377\377\377") at /home/derek/dr/src/core/unix/loader.c:1827
296: #2 0x000000007137a19c in _start () at /home/derek/dr/src/core/arch/aarchxx/aarchxx.asm:58
Looks like we need relocs to run relocate_dynamorio!
296: (gdb) shell grep libdynamo /proc/2883519/maps
296: 71000000-7148b000 r-xp 00000000 103:03 13245473 /home/derek/dr/build/lib64/debug/libdynamorio.so
296: 7149a000-714bf000 rw-p 0048a000 103:03 13245473 /home/derek/dr/build/lib64/debug/libdynamorio.so
296: (gdb) x/3i relocate_dynamorio
296: 0x713b6468 <relocate_dynamorio>: stp x29, x30, [sp, #-480]!
296: 0x713b646c <relocate_dynamorio+4>: mov x29, sp
296: 0x713b6470 <relocate_dynamorio+8>: str x0, [sp, #40]
296: (gdb) c
296: Continuing.
296:
296: Program received signal SIGSTOP, Stopped (signal).
296: _start () at /home/derek/dr/src/core/arch/aarchxx/aarchxx.asm:50
296: 50 * (xref i#1865).
296: (gdb) c
296: Continuing.
296:
296: Program received signal SIGSTOP, Stopped (signal).
296: _start () at /home/derek/dr/src/core/arch/aarchxx/aarchxx.asm:50
296: 50 * (xref i#1865).
296: (gdb) b relocate_dynamorio
296: Breakpoint 1 at 0x713b647c: file /home/derek/dr/src/core/unix/loader.c, line 1817.
296: (gdb) c
296: Continuing.
296:
296: Breakpoint 1, relocate_dynamorio (dr_map=0x0, dr_size=0, sp=0xffffc2b0b8e0 "\377\377\377\377\377\377\377\377")
296: at /home/derek/dr/src/core/unix/loader.c:1817
296: 1817 ptr_uint_t argc = *(ptr_uint_t *)sp;
296: (gdb) next
296: 1819 const char **env = (const char **)sp + argc + 2;
296: (gdb)
296: 1820 os_privmod_data_t opd = { { 0 } };
296: (gdb)
296: 1822 os_page_size_init(env, true);
296: (gdb)
296: 1824 if (dr_map == NULL) {
296: (gdb)
296: 1826 dr_map = (app_pc)ALIGN_BACKWARD((ptr_uint_t)relocate_dynamorio, PAGE_SIZE);
296: (gdb)
296: 1827 while (dr_map != NULL && !privload_mem_is_elf_so_header(dr_map)) {
296: (gdb) p /x dr_map
296: $1 = 0x310b4060
296: 0x00000000713b64d8 <+112>: adrp x0, 0x713b6000 <map_exe_file_and_brk+72>
296: 0x00000000713b64dc <+116>: add x0, x0, #0x468
296: 0x00000000713b64e0 <+120>: and x0, x1, x0
296: 0x00000000713b64e4 <+124>: str x0, [sp, #40]
296: => 0x00000000713b64e8 <+128>: b 0x713b6500 <relocate_dynamorio+152>
296: (gdb) x/1wx 0x00000000713b64d8
296: 0x713b64d8 <relocate_dynamorio+112>: 0x90000000
$ disasm_a64 90000000
llvm-mc: 90000000 adrp x0, #0
capstone: 90000000 adrp x0, #0
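As a cross-check on how the raw word 0x90000000 relates to the operand gdb prints, here is a small C sketch of the architectural ADRP computation (page of the PC plus a sign-extended 21-bit immediate shifted left by 12). The function names are made up for this note.

```c
#include <assert.h>
#include <stdint.h>

/* ADRP has bit 31 set and bits 28..24 equal to 0b10000. */
static int
is_adrp(uint32_t insn)
{
    return (insn & 0x9f000000u) == 0x90000000u;
}

/* Compute the value ADRP writes to its destination register when executed at pc. */
static uint64_t
adrp_target(uint32_t insn, uint64_t pc)
{
    assert(is_adrp(insn));
    uint64_t immlo = (insn >> 29) & 0x3;     /* bits 30..29 */
    uint64_t immhi = (insn >> 5) & 0x7ffff;  /* bits 23..5 */
    int64_t imm = (int64_t)((immhi << 2) | immlo);
    if (imm & (1LL << 20)) /* sign-extend the 21-bit immediate */
        imm -= (1LL << 21);
    return (pc & ~(uint64_t)0xfff) + (uint64_t)(imm * 4096);
}
```

With the word 0x90000000 at pc 0x713b64d8 this yields 0x713b6000, the operand gdb displays; adding the 0x468 from the following add instruction gives 0x713b6468, the address gdb reports for relocate_dynamorio.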
PAGE_SIZE is similar: its load fails and we get garbage.
To get the PC, I solved this locally with asm that calls (branches with link) to the next instruction and reads the link register. For the page size, I just used 4K as a minimum size: a longer loop, but safer.
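The 4K fallback can be sketched in portable C as follows. This is only an illustration of the idea, not the actual patch: MIN_PAGE_SIZE, looks_like_elf_header, and find_elf_base are names invented here, the starting address is passed in rather than obtained from the link register, and the lower bound parameter is an addition that keeps the sketch from reading unmapped memory. Stepping by 4K is safe because any larger power-of-two page size is a multiple of 4K, so the walk still lands on every real page boundary; it just takes more iterations.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed smallest page size; hard-coded so that no global needs to be read. */
#define MIN_PAGE_SIZE 4096u

/* Check only the four ELF magic bytes (0x7f 'E' 'L' 'F'). */
static int
looks_like_elf_header(const unsigned char *p)
{
    return p[0] == 0x7f && p[1] == 'E' && p[2] == 'L' && p[3] == 'F';
}

/* Walk backward from addr in MIN_PAGE_SIZE steps until the ELF magic is found.
 * 'lowest' is a hypothetical lower bound added for this sketch. */
static const unsigned char *
find_elf_base(const void *addr, const void *lowest)
{
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(MIN_PAGE_SIZE - 1);
    uintptr_t lo = (uintptr_t)lowest;
    while (p >= lo) {
        if (looks_like_elf_header((const unsigned char *)p))
            return (const unsigned char *)p;
        if (p < MIN_PAGE_SIZE) /* avoid wrapping below address zero */
            break;
        p -= MIN_PAGE_SIZE;
    }
    return NULL;
}
```

In the real pre-relocation path the starting address would come from the asm PC trick described above, and the helper itself would have to avoid anything that needs relocation.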