SPIN exclusive load to zero register
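For #1698 we have the -ldstex2cas option (see https://dynamorio.org/page_ldstex.html). It records what the exclusive load returned. Then, just before the exclusive store, it re-loads the location, and if the result does not match the recorded value it forces the store to fail so that the application's loop retries.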
Unfortunately, -ldstex2cas does not handle an exclusive load whose result is discarded into the zero register (xzr):
1:
ldaxp xzr, x1, [x0]
add x2, x1, #0x1
stlxp w3, xzr, x2, [x0]
cbnz w3, 1b
That loops forever, because it is mangled into:
+0 L4 @0x0000fffd6f51f6a0 c87f841f ldaxp (%x0)[16byte] -> %xzr %x1
+4 m4 @0x0000fffd6f51f310 f9000b82 str %x2 -> +0x10(%x28)[8byte]
+8 m4 @0x0000fffd6f51f3d8 a90aff80 stp %x0 %xzr -> +0xa8(%x28)[16byte]
+12 m4 @0x0000fffd6f51f470 d2800102 movz $0x0000000000000008 lsl $0x0000000000000000 -> %x2
+16 m4 @0x0000fffd6f51f5a0 a90b8b81 stp %x1 %x2 -> +0xb8(%x28)[16byte]
+20 m4 @0x0000fffd6f51f620 f9400b82 ldr +0x10(%x28)[8byte] -> %x2
+220 L3 @0x0000fffd6f51fc80 91000422 add %x1 $0x0001 lsl $0x00 -> %x2
+224 m4 @0x0000fffd6f51f720 f9000781 str %x1 -> +0x08(%x28)[8byte]
+228 m4 @0x0000fffd6f51f7a0 f9001384 str %x4 -> +0x20(%x28)[8byte]
+232 m4 @0x0000fffd6f51f820 f9001785 str %x5 -> +0x28(%x28)[8byte]
+236 m4 @0x0000fffd6f51f8e8 f9405781 ldr +0xa8(%x28)[8byte] -> %x1
+240 m4 @0x0000fffd6f51f968 cb216003 sub %x0 %x1 uxtx $0x0000000000000000 -> %x3
+244 m4 @0x0000fffd6f51f9e8 b5000003 cbnz @0x0000fffd6f51ded8[8byte] %x3
+248 m4 @0x0000fffd6f51fa68 f9406381 ldr +0xc0(%x28)[8byte] -> %x1
+252 m4 @0x0000fffd6f51fb30 d1002023 sub %x1 $0x0000000000000008 lsl $0x0000000000000000 -> %x3
+256 m4 @0x0000fffd6f51e0b8 b5000003 cbnz @0x0000fffd6f51ded8[8byte] %x3
+260 m4 @0x0000fffd6f51b7e8 f9405b81 ldr +0xb0(%x28)[8byte] -> %x1 <===== loads the stored xzr
+264 m4 @0x0000fffd6f51ae40 f9405f84 ldr +0xb8(%x28)[8byte] -> %x4
+268 m4 @0x0000fffd6f51b630 c87f9403 ldaxp (%x0)[16byte] -> %x3 %x5 <===== loads an actual maybe non-zero value
+272 m4 @0x0000fffd6f51add8 cb216063 sub %x3 %x1 uxtx $0x0000000000000000 -> %x3 <==== likely to not be the same
+276 m4 @0x0000fffd6f51e188 b5000003 cbnz @0x0000fffd6f51ded8[8byte] %x3
+280 m4 @0x0000fffd6f51e120 cb2460a3 sub %x5 %x4 uxtx $0x0000000000000000 -> %x3
+284 m4 @0x0000fffd6f51e3d8 b5000003 cbnz @0x0000fffd6f51ded8[8byte] %x3
+288 L3 @0x0000fffd6f51dc30 c823881f stlxp %xzr %x2 -> (%x0)[16byte] %w3
+292 m4 @0x0000fffd6f51b930 14000000 b @0x0000fffd6f51bfd0[8byte]
+296 m4 @0x0000fffd6f51ded8 14000000 <label>
+296 m4 @0x0000fffd6f51ac30 d503305f clrex $0x0000000000000000
+300 m3 @0x0000fffd6f51c4a8 c823881f stlxp %xzr %x2 -> (%x0)[16byte] %w3
+304 m4 @0x0000fffd6f51bfd0 d503305f <label>
+304 m4 @0x0000fffd6f51ce30 f9400781 ldr +0x08(%x28)[8byte] -> %x1
+308 m4 @0x0000fffd6f51d040 f9401384 ldr +0x20(%x28)[8byte] -> %x4
+312 m4 @0x0000fffd6f51dae0 f9401785 ldr +0x28(%x28)[8byte] -> %x5
+316 L3 @0x0000fffd6f51d208 35ffffa3 cbnz $0x0000ffffaf4756d8 %w3
+320 L4 @0x0000fffd6f51f290 14000000 b $0x0000ffffaf4756e8
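The stp at +8 records %xzr, which always reads as zero, instead of the value actually loaded. The comparison at +272 therefore fails whenever the first 8 bytes at (%x0) are non-zero, the branch to the clrex path makes the stlxp at +300 fail, and the application loop never exits. We need to add special-casing of xzr as an exclusive load target register.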
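Below is a simplified, single-threaded C model of the mangled sequence, assuming that the recording step reads the destination registers back after the load (as the `stp %x0 %xzr` at +8 does). All names are hypothetical rather than DynamoRIO's, and the address and size checks (+236 to +256) are omitted. With `fix_xzr` false it reproduces the spin. With `fix_xzr` true it records the value really loaded from memory for an xzr destination, which is only one possible form of the special-casing (in generated code it might mean loading into a scratch register instead of xzr).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XZR 31 /* register number 31 models xzr in this toy register file */

/* Toy register file: writes to xzr are dropped and reads of xzr return 0. */
typedef struct {
    uint64_t x[32];
} regs_t;

static void reg_write(regs_t *r, int n, uint64_t v)
{
    if (n != XZR)
        r->x[n] = v;
}

static uint64_t reg_read(const regs_t *r, int n)
{
    return n == XZR ? 0 : r->x[n];
}

/* Values recorded by the mangled exclusive load (the spill slots). */
typedef struct {
    uint64_t val[2];
} ldex_slots_t;

/* Models "ldaxp rt, rt2, [mem]" plus the recording of the loaded pair.
 * Without the fix the recording reads the destinations back, so an xzr
 * destination is recorded as 0. With fix_xzr, an xzr destination records
 * the value that was really in memory. */
static void ldaxp_recorded(regs_t *r, ldex_slots_t *s, const uint64_t mem[2],
                           const int dst[2], bool fix_xzr)
{
    assert(dst[0] != dst[1]); /* the model does not support identical dests */
    for (int i = 0; i < 2; i++) {
        reg_write(r, dst[i], mem[i]);
        s->val[i] = (fix_xzr && dst[i] == XZR) ? mem[i] : reg_read(r, dst[i]);
    }
}

/* Models the check before stlxp: re-load the pair and compare it with the
 * recorded values. Returns the status (0 = stored), like w3 in the dump. */
static int stlxp_checked(const ldex_slots_t *s, uint64_t mem[2],
                         uint64_t v0, uint64_t v1)
{
    if (mem[0] != s->val[0] || mem[1] != s->val[1])
        return 1; /* clrex path: the store is forced to fail */
    mem[0] = v0;
    mem[1] = v1;
    return 0;
}

/* Runs the loop from this issue (ldaxp xzr, x1 / add x2, x1, #1 /
 * stlxp w3, xzr, x2) for at most max_iters attempts and returns the number
 * of failed attempts. */
static int run_app_loop(uint64_t mem[2], int max_iters, bool fix_xzr)
{
    const int dst[2] = {XZR, 1};
    regs_t r = {{0}};
    ldex_slots_t s;
    int failures = 0;
    for (int i = 0; i < max_iters; i++) {
        ldaxp_recorded(&r, &s, mem, dst, fix_xzr);
        reg_write(&r, 2, reg_read(&r, 1) + 1);
        if (stlxp_checked(&s, mem, reg_read(&r, XZR), reg_read(&r, 2)) == 0)
            break;
        failures++;
    }
    return failures;
}
```

Starting from memory holding {5, 7}, `run_app_loop(mem, 100, false)` fails all 100 attempts and leaves memory unchanged, while `run_app_loop(mem, 100, true)` succeeds on the first attempt and leaves {0, 8}. This sketch records the real value rather than skipping the comparison for the xzr element, so in the model a concurrent change to the discarded half is still detected.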