drmemtrace filtered thread or max-refs tracing can jump over required register restores
drmemtrace's thread filtering and -max_global_trace_refs both insert conditional jumps that skip address gathering and storing on a per-instruction basis. The thread filtering from #2820 (closed) inserts a barrier that restores every scratch register except the one used as the buffer base, so that both sides of the conditional jump perform the same register restores.
However, if the base scratch register switches partway through the block, the barrier restores the prior base register, and that restore sits inside the region the conditional jump skips. When the jump is taken the restore never executes, and the register stays corrupted, leading to crashes and other problems downstream.
Here is an example:
TAG 0x0000aaaabaa26b80
<spill x4 up front>
+0 m4 @0x0000fffd6da41bf8 f900a784 str %x4 -> +0x0148(%x28)[8byte]
<skip if no buffer, b/c of -max_global_trace_refs:>
+4 m4 @0x0000fffd6da481e0 f940af84 ldr +0x0158(%x28)[8byte] -> %x4
+8 m4 @0x0000fffd6da46060 b4000004 cbz @0x0000fffd6da497b8[8byte] %x4
+12 m4 @0x0000fffd6da446b8 f900ab8e str %x14 -> +0x0150(%x28)[8byte]
+16 m4 @0x0000fffd6e191e08 d285700e movz $0x2b80 lsl $0x00 -> %x14
+20 m4 @0x0000fffd6da46a28 f2a0c9ee movk %x14 $0x064f lsl $0x10 -> %x14
+24 m4 @0x0000fffd6da3f4a8 f2e4114e movk %x14 $0x208a lsl $0x30 -> %x14
+28 m4 @0x0000fffd6da48860 f900008e str %x14 -> (%x4)[8byte]
+32 m4 @0x0000fffd6da40330 f900008e <label>
+32 m4 @0x0000fffd6da40af8 d101c3ee sub %sp $0x0000000000000070 lsl $0x0000000000000000 -> %x14
+36 m4 @0x0000fffd6da3e240 f900048e str %x14 -> +0x08(%x4)[8byte]
+40 m4 @0x0000fffd6da456b8 f900048e <label>
+40 m4 @0x0000fffd6da3d3f0 91004084 add %x4 $0x0010 lsl $0x0000000000000000 -> %x4
+44 m4 @0x0000fffd6da3e640 f900af84 str %x4 -> +0x0158(%x28)[8byte]
<barrier restores just the local x14:>
+48 m4 @0x0000fffd6da44170 f940ab8e ldr +0x0150(%x28)[8byte] -> %x14
<target of cbz above skips that restore as it should:>
+52 m4 @0x0000fffd6da497b8 f940ab8e <label>
+52 L3 @0x0000fffd6e1906e0 fc190fe8 str %d8 %sp $0xffffffffffffff90 -> -0x70(%sp)[8byte] %sp
...
<x4 used as buf base for every instruction in between here>
...
+556 L3 @0x0000fffd6e190760 f9008be3 str %x3 -> +0x0110(%sp)[8byte]
<skip if no buffer, b/c of -max_global_trace_refs:>
+560 m4 @0x0000fffd6da401f8 f940af84 ldr +0x0158(%x28)[8byte] -> %x4
+564 m4 @0x0000fffd6da43e98 b4000004 cbz @0x0000fffd6da49e58[8byte] %x4
+568 m4 @0x0000fffd6da3cea8 b4000004 <label>
+568 m4 @0x0000fffd6da44650 910363e3 add %sp $0x00000000000000d8 lsl $0x0000000000000000 -> %x3
+572 m4 @0x0000fffd6da432d0 f9000083 str %x3 -> (%x4)[8byte]
+576 m4 @0x0000fffd63a31a08 f9000083 <label>
+576 m4 @0x0000fffd6e1903c0 91002084 add %x4 $0x0008 lsl $0x0000000000000000 -> %x4
+580 m4 @0x0000fffd63a31a88 f900af84 str %x4 -> +0x0158(%x28)[8byte]
+584 m4 @0x0000fffd6da49e58 f900af84 <label>
+584 L3 @0x0000fffd6da445e8 f9006fe8 str %x8 -> +0xd8(%sp)[8byte]
+588 m4 @0x0000fffd6e192248 f900af84 <label>
+588 m4 @0x0000fffd6da3e400 f940af83 ldr +0x0158(%x28)[8byte] -> %x3
+592 m4 @0x0000fffd6da43a20 b4000003 cbz @0x0000fffd6da437b0[8byte] %x3
===> <seems to be the bug:>
<this skips the restore of x4 below!>
+596 m4 @0x0000fffd6da43bc0 b4000003 <label>
+596 m4 @0x0000fffd6da43748 b4000003 <label>
<barrier restores x4 this time b/c x3 was the base scratch (just due to liveness)!>
+596 m4 @0x0000fffd6da3c7f8 f940a784 ldr +0x0148(%x28)[8byte] -> %x4
+600 m4 @0x0000fffd6da437b0 f940a784 <label>
+600 L3 @0x0000fffd6da3c960 f9405fa8 ldr +0xb8(%x29)[8byte] -> %x8
+604 L3 @0x0000fffd6da46c60 d10183a1 sub %x29 $0x0060 lsl $0x00 -> %x1
+608 L3 @0x0000fffd63a31e48 d10243a2 sub %x29 $0x0090 lsl $0x00 -> %x2
+612 L3 @0x0000fffd6da43950 d10303a3 sub %x29 $0x00c0 lsl $0x00 -> %x3
This barrier and the internal control flow (technically violating drreg's requirements) are rather fragile here.
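To make the failure mode concrete, here is a minimal toy model of the tail of the listing above, written in Python rather than as real instrumentation. Everything in it is an assumption for illustration: `run_fragment`, `APP_X4`, the `tls` dict, and the `restore_before_branch` flag are hypothetical names and do not correspond to drreg or drmemtrace APIs. The flag only sketches one way the restore could execute on both paths. It is not necessarily how this should be fixed.

```python
APP_X4 = 0xA44  # hypothetical application value living in x4


def run_fragment(buf_ptr, restore_before_branch=False):
    """Toy model: returns the x4 value the application sees afterwards.

    buf_ptr == 0 models "no buffer" (filtered thread, or the
    -max_global_trace_refs limit already reached).
    """
    regs = {"x3": 0, "x4": APP_X4}
    tls = {}

    # Up front: spill x4 so it can serve as the buffer base.
    tls["x4_slot"] = regs["x4"]

    # Earlier instrumentation: x4 is the base scratch register.
    regs["x4"] = buf_ptr            # ldr buf_ptr -> x4
    if regs["x4"] != 0:             # cbz x4, <skip>
        regs["x4"] += 8             # store an entry, advance the pointer
        buf_ptr = regs["x4"]        # str x4 -> TLS buffer pointer

    # Later instrumentation: x3 is dead, so it becomes the base instead.
    regs["x3"] = buf_ptr            # ldr buf_ptr -> x3
    if restore_before_branch:
        # Hypothetical layout: the restore runs on both paths.
        regs["x4"] = tls["x4_slot"]
    if regs["x3"] != 0:             # cbz x3, <skip>
        # Barrier restores everything except the current base (x3),
        # which now includes the previous base, x4.
        regs["x4"] = tls["x4_slot"]

    # Linear bookkeeping assumes x4 holds the app value from here on.
    return regs["x4"]
```

With a live buffer, `run_fragment(0x1000)` returns `APP_X4`. With `buf_ptr == 0` the second branch skips the barrier and the function returns the stale zero that was loaded into x4, which mirrors the corruption described above.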