on attach, DR signal handler invoked before takeover can result in self-interp problem
I was hitting cases of self-interp when using -prof_pcs and a static DR with drmemtrace.
It's a big app, so a debug build w/ logging isn't practical.
Since DR is statically linked, we can't easily tell whether a target is inside DR or the app, so I put in diagnostics that look for targets near dispatch.
Finally reproduced on build w/o drmemtrace:
#4 0x00007f61af4645cc in build_basic_block_fragment (dcontext=dcontext@entry=0x7f6135fa6380, start=0x7f61af44cf78 <master_signal_handler> "H\211\341\351`+\v",
initial_flags=initial_flags@entry=0, link=link@entry=1 '\001', visible=visible@entry=1 '\001', for_trace=for_trace@entry=0 '\000', unmangled_ilist=0x0)
at core/arch/interp.c:5081
#5 0x00007f61af487a84 in dispatch (dcontext=0x7f6135fa6380) at core/dispatch.c:216
So the first DR target (or at least first target reasonably near dispatch) is master_signal_handler.
(gdb) p dcontext->next_tag
$1 = (app_pc) 0x7f61af44cf78 <master_signal_handler> "H\211\341\351`+\v"
(gdb) p dcontext->last_exit
$2 = (linkstub_t *) 0x7f61b4085014 <linkstub_starting>
(gdb) p dcontext->last_fragment
$3 = (fragment_t *) 0x7f61b4085060 <linkstub_empty_fragment>
Hmm, did we just take over this thread and send it there ourselves?
Can this happen w/o -prof_pcs? See above where the signal was the app's, not our alarm.
This one looks like a SIGSEGV:
(gdb) p /x *dcontext
rsp = 0x7f61a2244138
(gdb) dps 0x7f61a2244138 0x7f61a2244938
0x00007f61a2244138 0x00007f61af44cf09 dynamorio_sigreturn in section .text of myapp
0x00007f61a2244140 0x0000000000000001 No symbol matches (void *)$retaddr.
0x00007f61a2244148 0x0000000000000000 No symbol matches (void *)$retaddr.
0x00007f61a2244150 0x00007f61a2235000 No symbol matches (void *)$retaddr.
0x00007f61a2244158 0x0000000000000001 No symbol matches (void *)$retaddr.
(gdb) p /x *(sigframe_rt_t*)0x00007f61a2244138
uc_mcontext = {
r8 = 0x0,
r9 = 0xffffffffffffffff,
r10 = 0x0,
r11 = 0x246,
r12 = 0x1b,
r13 = 0x0,
r14 = 0x7f61a2244ac0,
r15 = 0x7f61b4456880,
rdi = 0x1b,
rsi = 0x7f61a2244bf0,
rbp = 0x7f61a22446d0,
rbx = 0x7f61ab07bda0,
rdx = 0x7f61a2244ac0,
rax = 0x7f61b58933c0,
rcx = 0x7f61a2244ab8,
rsp = 0x7f61a22446c8,
rip = 0x7f61af44d2e3,
eflags = 0x10246,
info = {
si_signo = 0xb,
si_errno = 0x0,
si_code = 0x1,
_sifields = {
_sigfault = {
si_addr = 0x60,
si_addr_lsb = 0x0
},
(gdb) x/4i 0x7f61af44d2e3
0x7f61af44d2e3 <safe_read_tls_magic>: mov %gs:0x60,%eax
0x7f61af44d2eb <safe_read_tls_magic_recover>: retq
(gdb) dps 0x7f61a22446c8 0x7f61a22446c8+200
0x00007f61a22446c8 0x00007f61af4f36c9 get_thread_private_dcontext + 89 in section .text of myapp
0x00007f61a22446d0 0x00007f61a2244ab0 No symbol matches (void *)$retaddr.
0x00007f61a22446d8 0x00007f61af4ffb16 master_signal_handler_C + 54 in section .text of myapp
(gdb) disas master_signal_handler_C
Dump of assembler code for function master_signal_handler_C:
0x00007f61af4ffae0 <+0>: push %rbp
0x00007f61af4ffae1 <+1>: mov %rsp,%rbp
0x00007f61af4ffae4 <+4>: push %r15
0x00007f61af4ffae6 <+6>: push %r14
0x00007f61af4ffae8 <+8>: push %r13
0x00007f61af4ffaea <+10>: push %r12
0x00007f61af4ffaec <+12>: mov %rdx,%r14
0x00007f61af4ffaef <+15>: push %rbx
0x00007f61af4ffaf0 <+16>: mov %edi,%r12d
0x00007f61af4ffaf3 <+19>: sub $0x3a8,%rsp
(gdb) dps 0x7f61a22446c8+0x3a8 0x7f61a22446c8+0x3a8+200
0x00007f61a2244a70 0x0000000000000000 No symbol matches (void *)$retaddr.
0x00007f61a2244a78 0x0000000000000000 No symbol matches (void *)$retaddr.
0x00007f61a2244a80 0x0000000000000000 No symbol matches (void *)$retaddr.
0x00007f61a2244a88 0x00007f61ab07bda0 No symbol matches (void *)$retaddr.
0x00007f61a2244a90 0x00007f61ab07bdb8 No symbol matches (void *)$retaddr.
0x00007f61a2244a98 0x0000000000000000 No symbol matches (void *)$retaddr.
0x00007f61a2244aa0 0x0000000000000000 No symbol matches (void *)$retaddr.
0x00007f61a2244aa8 0x00007f61b4456880 base::kExclusiveS in section .rodata of myapp
0x00007f61a2244ab0 0x00007f61a2489600 No symbol matches (void *)$retaddr.
0x00007f61a2244ab8 0x00007f61af44cf09 dynamorio_sigreturn in section .text of myapp
0x00007f61a2244ac0 0x0000000000000001 No symbol matches (void *)$retaddr.
(gdb) p /x *(sigframe_rt_t*)0x00007f61a2244ab8
$7 = {
pretcode = 0x7f61af44cf09,
uc = {
uc_flags = 0x1,
uc_link = 0x0,
uc_stack = {
ss_sp = 0x7f61a2235000,
ss_flags = 0x0,
ss_size = 0x10000
},
uc_mcontext = {
r8 = 0x0,
r9 = 0xffffffffffffffff,
r10 = 0x0,
r11 = 0x246,
r12 = 0x7f61ab07bdb8,
r13 = 0x0,
r14 = 0x0,
r15 = 0x7f61b4456880,
rdi = 0x7f61ab07bdb8,
rsi = 0x189,
rbp = 0x7f61a2489600,
rbx = 0x7f61ab07bda0,
rdx = 0x0,
rax = 0xfffffffffffffffc,
rcx = 0xffffffffffffffff,
rsp = 0x7f61a24895b0,
rip = 0x7f61b0e05a3f,
info = {
si_signo = 0x1b, <=== SIGPROF
si_errno = 0x0,
si_code = 0x80,
_sifields = {
_sigfault = {
si_addr = 0x0,
si_addr_lsb = 0x0
},
(gdb) x/4i 0x7f61b0e05a3f
0x7f61b0e05a3f <base::internal::PerThreadSem::Wait(prodkernel::api::base::KernelTimeout)+511>: mov %eax,%r8d
OK, so a SIGPROF came in after DR installed its own signal handler but before it took this thread over. master_signal_handler_C then faults in safe_read_tls_magic while processing the SIGPROF, as expected (the fault address 0x60 matches the %gs:0x60 read). The kernel delivers the resulting SIGSEGV, but the takeover SIGUSR2 shows up and is delivered before a single instruction of master_signal_handler has executed for the SIGSEGV?
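As a cross-check on the two frames above, here is a small sketch that maps the si_signo and si_code values from the dump onto the names in <signal.h>. It assumes Linux x86-64 numbering (signal numbers differ on some other architectures), and the helper names (signo_name, sicode_name, describe_siginfo) are made up for illustration; they are not DR functions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: name the si_signo values that appear in the dump. */
static const char *
signo_name(int signo)
{
    switch (signo) {
    case SIGSEGV: return "SIGSEGV";
    case SIGPROF: return "SIGPROF";
    default: return "other";
    }
}

/* Hypothetical helper: name the si_code values that appear in the dump.
 * SI_KERNEL means the kernel itself generated the signal.
 */
static const char *
sicode_name(int signo, int code)
{
    if (code == SI_KERNEL)
        return "SI_KERNEL";
    if (signo == SIGSEGV && code == SEGV_MAPERR)
        return "SEGV_MAPERR"; /* address not mapped */
    if (signo == SIGSEGV && code == SEGV_ACCERR)
        return "SEGV_ACCERR"; /* mapped but access not permitted */
    return "other";
}

/* Formats "<signal>/<code>" into buf. */
static void
describe_siginfo(int signo, int code, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s/%s", signo_name(signo), sicode_name(signo, code));
}
```

Feeding it the inner frame (si_signo = 0xb, si_code = 0x1) and the outer frame (si_signo = 0x1b, si_code = 0x80) gives a fault on an unmapped address, nested inside a kernel-generated SIGPROF. That is consistent with the reading above.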
If the SIGSEGV safe-read fault is handled, execution should go back to the SIGPROF handling, which should hit this code:
if (sig_is_alarm_signal(sig)) {
/* assuming an alarm during thread exit or init (xref PR 596127,
* i#359): suppressing is fine
*/
This seems unrelated to -prof_pcs: why is it only happening w/ -prof_pcs?
So how do we solve this? One option: have sig_take_over() check whether the interrupted pc is a DR address (it could be in some callee invoked from master_signal_handler_C, though safe_read_tls_magic is pretty early). If it is, find the signal frame, walk back to the prior signal frame, and see if it's for an alarm signal. If so, skip it and just take over at its interruption point (or deliver the alarm, I guess)?
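To make the proposal concrete, here is a rough sketch of the frame-walking idea on a simplified model. Everything in it is hypothetical: frame_sketch_t, region_sketch_t, takeover_choice_t, and choose_takeover_sketch() are illustration-only names, not DR code, and the set of alarm signals (SIGALRM, SIGVTALRM, SIGPROF) is an assumption about what sig_is_alarm_signal() covers. It also assumes the bounds of DR's code are known, which, as noted at the top, is not straightforward for a static DR.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one signal frame on the thread's stack. */
typedef struct frame_sketch_t {
    int signo;                          /* signal this frame was pushed for */
    uint64_t interrupted_pc;            /* uc_mcontext rip saved in the frame */
    const struct frame_sketch_t *older; /* next-older frame, or NULL */
} frame_sketch_t;

/* Hypothetical [start, end) bounds of DR's own code. */
typedef struct {
    uint64_t start;
    uint64_t end;
} region_sketch_t;

typedef struct {
    uint64_t pc;     /* where takeover should begin */
    int pending_sig; /* app signal still to deliver, or 0 */
    bool resolved;   /* false if we never found a non-DR pc */
} takeover_choice_t;

static bool
pc_in_region(const region_sketch_t *r, uint64_t pc)
{
    return pc >= r->start && pc < r->end;
}

/* Assumed alarm set; the real check is sig_is_alarm_signal(). */
static bool
is_alarm_signal_sketch(int sig)
{
    return sig == SIGALRM || sig == SIGVTALRM || sig == SIGPROF;
}

static takeover_choice_t
choose_takeover_sketch(const region_sketch_t *dr, const frame_sketch_t *takeover)
{
    takeover_choice_t out;
    const frame_sketch_t *f = takeover;
    assert(dr != NULL && takeover != NULL);
    /* Walk back through older frames while the interrupted pc is inside DR. */
    while (pc_in_region(dr, f->interrupted_pc) && f->older != NULL)
        f = f->older;
    out.pc = f->interrupted_pc;
    out.resolved = !pc_in_region(dr, out.pc);
    /* An alarm that arrived before takeover is dropped; any other signal
     * found this way is reported so that the caller can deliver it.
     */
    out.pending_sig =
        (f == takeover || is_alarm_signal_sketch(f->signo)) ? 0 : f->signo;
    return out;
}
```

With the three frames from this report (SIGUSR2 interrupting master_signal_handler, SIGSEGV interrupting safe_read_tls_magic, SIGPROF interrupting the app at 0x7f61b0e05a3f), the walk ends at the SIGPROF frame, drops the alarm, and picks the app pc as the takeover point. Whether dropping or delivering the alarm is the better choice is the open question above.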