CRASH: tool pc-rel instru mangled by DR can be mistaken as clean call instru and mis-translated
We are seeing crashes when using drbbdup on x86 on detach. The drbbdup state restore function seems perfectly correct: it is DR that is messing things up. We finally tracked it down to DR thinking that drbbdup's case load instructions are in fact a clean call setup. DR then tries to restore the context from the dstack and gets incorrect values.
This is with drmemtrace where the drbbdup case operand is a pc-relative load. DR has a feature where it will mangle tool-inserted pc-relative references into two-part sequences if they won't reach in their code cache destination. Thus we end up with something like this (this is with no tool instrumentation aside from drbbdup, so we just have 2 copies of this simple cmp;jnz app block) where diagnostics are marking 3 instrs considered "our_mangling":
+0 m4 @0x00007f6b07b75f70 65 48 89 3c 25 20 01 mov %rdi -> %gs:0x00000120[8byte]
00 00
+9 m4MANGLING @0x00007f6b0b183178 48 bf 88 34 35 6d 17 mov $0x000056176d353488 -> %rdi
56 00 00
+19 m4MANGLING @0x00007f6b0b1832c8 48 8b 3f mov (%rdi)[8byte] -> %rdi
+22 m4 @0x00007f6b07b75d90 <label note=0x000000000000004e>
+22 m4 @0x00007f6b0b183260 e9 49 00 00 00 jmp @0x00007f6b0b1830a8[8byte]
+27 m4 @0x00007f6b0b183040 65 48 8b 3c 25 20 01 mov %gs:0x00000120[8byte] -> %rdi
00 00
+36 L3 @0x00007f6b07b75ec8 80 7d bc 00 cmp 0xffffffbc(%rbp)[1byte] $0x00
+40 m4 @0x00007f6b0b183348 <label note=0x0000000000000001>
+40 m4 @0x00007f6b07b75c40 e9 4a 00 00 00 jmp @0x00007f6b07b75cc0[8byte]
+45 m4 @0x00007f6b07b75d28 <label note=0x0000000000000002>
+45 m4 @0x00007f6b0b1830a8 <label note=0x000000000000004e>
+45 m4 @0x00007f6b0b1831e0 65 48 8b 3c 25 20 01 mov %gs:0x00000120[8byte] -> %rdi
00 00
+54 L3 @0x00007f6b0b1833c8 80 7d bc 00 cmp 0xffffffbc(%rbp)[1byte] $0x00
+58 m4 @0x00007f6b07b75e60 <label note=0x0000000000000001>
+58 m4 @0x00007f6b07b75cc0 <label note=0x000000000000004f>
+58 m4 @0x00007f6b07b75df8 <label note=0x0000000000000002>
+58 L4 @0x00007f6b07b75b98 0f 85 62 53 e2 09 jnz $0x000056176cab2268
+64 L4MANGLING @0x00007f6b0b183430 e9 f2 52 e2 09 jmp $0x000056176cab21f7
The first 2 of those are the 2-part case load.
The current translation code has no precise way to identify clean calls and instead it identifies everything else and says that anything in "our_mangling" with no translation must then be a clean call. But DR mangling tool code breaks this assumption.
We do not want to not mark DR-mangled tool code as not our_mangling as there can be spills and restores in there and we want DR to properly restore those.
We need to either precisely identify mangle_rel_addr() sequences, or precisely identify clean call sequences. Both are complex. Since the pc-rel doesn't need any action beyond regular spill tracking, it seems better to identify clean calls. There are plenty of bits in translation_info_t, but not in instr_t: so I'm thinking of using labels for the instrlist case, and flags for the stored-info case.