AArch64 stolen reg not restored in non-main-threads on detach
For #4461 (closed) I manually enabled the tool.drcacheoff.burst_threadfilter test. It was disabled for AArch64 because of the #2007 (closed) link failure in the Jenkins toolchain, but it does link with the packet.net tx1 toolchain. With #4461 (closed) fixed locally, I hit crashes:
pre-DR init
pre-DR start
pre-DR detach
pre-DR init
pre-DR start
pre-DR detach
pre-DR init
pre-DR start
pre-DR detach
pre-DR init
pre-DR start
pre-DR detach
Thread 3 "tool.drcacheoff" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffffb73e61e0 (LWP 80911)]
0x0000ffffb7fb32e4 in __pthread_cond_wait (cond=0xfffdb18f1000, mutex=0xaaaaaafb7e30) at pthread_cond_wait.c:192
192 pthread_cond_wait.c: No such file or directory.
(gdb) b
Breakpoint 1 at 0xffffb7fb32e4: file pthread_cond_wait.c, line 192.
(gdb) bt
#0 0x0000ffffb7fb32e4 in __pthread_cond_wait (cond=0xfffdb18f1000, mutex=0xaaaaaafb7e30) at pthread_cond_wait.c:192
#1 0x0000aaaaaaada120 in wait_cond_var (var=0xaaaaaafb7e00) at /home/derek/dr/src/clients/drcachesim/tests/../../../suite/tests/condvar.h:134
#2 0x0000aaaaaaada6a8 in thread_func (arg=0x1) at /home/derek/dr/src/clients/drcachesim/tests/burst_threadfilter.cpp:198
#3 0x0000ffffb7fad0a0 in start_thread (arg=0xaaaaaaada36c <thread_func(void*)>) at pthread_create.c:335
#4 0x0000ffffb7cb2eac in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:77
(gdb) x/8i $pc
=> 0xffffb7fb32e4 <__pthread_cond_wait+340>: ldaxr w2, [x28]
0xffffb7fb32e8 <__pthread_cond_wait+344>: cmp w2, w0
0xffffb7fb32ec <__pthread_cond_wait+348>: b.ne 0xffffb7fb32f8 <__pthread_cond_wait+360>
(gdb) x/4gx $x28
0xfffdb18f1000: Cannot access memory at address 0xfffdb18f1000
With -steal_reg 29 the crash moves to an access through x29 instead:
Thread 2 "tool.drcacheoff" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffffb7be61e0 (LWP 82997)]
__pthread_cond_wait (cond=0xaaaaaafb9470, mutex=0xaaaaaafb94a0) at pthread_cond_wait.c:189
189 pthread_cond_wait.c: No such file or directory.
(gdb) x/8i $pc
=> 0xffffb7fb32d0 <__pthread_cond_wait+320>: ldr w0, [x29,#160]
-steal_reg r25: it now passes!
This seems different from the stolen reg mangling bug #4460 (closed), which happens while under DR control, not post-detach.
This is after the #4457 (closed) fix: fcache_enter_gonative should be restoring the stolen reg for us, but that covers the main thread, and these crashes are in non-main threads. For those threads translate_mcontext should do the job. Hmm: but it only restores the stolen reg for a thread in the fcache, not for a thread at a syscall! OK, that's the bug.
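To make the diagnosis concrete, here is a minimal standalone model of it. This is not DynamoRIO code: thread_model_t, thread_where, detach_translate_buggy and detach_translate_fixed are hypothetical names invented for illustration, and restoring unconditionally is only one plausible shape for a fix. It shows a thread parked at a syscall keeping DR's value in the stolen reg after detach when the restore is limited to threads in the fcache.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model only; these are not DynamoRIO's real types or names. */
#define NUM_GPRS 31 /* x0..x30 on AArch64 */

enum thread_where { WHERE_FCACHE, WHERE_SYSCALL };

typedef struct {
    uint64_t x[NUM_GPRS];    /* register context handed back to the app on detach */
    uint64_t app_stolen_val; /* app's value of the stolen reg, saved elsewhere */
    enum thread_where where; /* where the thread was when detach suspended it */
} thread_model_t;

/* Models the behavior described above: the stolen reg is restored only
 * for a thread that was in the fcache, so a thread waiting at a syscall
 * goes native with DR's value still in the register. */
static void
detach_translate_buggy(thread_model_t *t, int stolen_reg)
{
    assert(stolen_reg >= 0 && stolen_reg < NUM_GPRS);
    if (t->where == WHERE_FCACHE)
        t->x[stolen_reg] = t->app_stolen_val;
}

/* Assumed fix for illustration: restore the app value regardless of
 * where the thread was suspended. */
static void
detach_translate_fixed(thread_model_t *t, int stolen_reg)
{
    assert(stolen_reg >= 0 && stolen_reg < NUM_GPRS);
    t->x[stolen_reg] = t->app_stolen_val;
}
```

With the stolen reg set to 28, as in the first trace, a thread modeled as WHERE_SYSCALL leaves detach_translate_buggy with x[28] unchanged, which matches the bad x28 dereference in __pthread_cond_wait. The register values used to exercise the model are made up.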