SPIN exclusive load dest matching store result
Xref #5245 (closed), which is another spin caused by -ldstex2cas.
For #1698 we have -ldstex2cas (see https://dynamorio.org/page_ldstex.html). It records the values that were loaded. Just before the exclusive store, it re-loads them, and if the re-loaded values do not match, it loops.
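The record, re-load and compare idea is loosely analogous to a compare-and-swap retry loop. The C11 sketch below is only a conceptual analogue and is not how DynamoRIO mangles code. The function name pair_increment and the "add 1 to both 32-bit halves of an 8-byte pair" operation are illustrative assumptions.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Conceptual C11 analogue of the -ldstex2cas idea; NOT DynamoRIO code.
 * Hypothetical example operation: atomically add 1 to both 32-bit halves
 * of an 8-byte pair. */
static void pair_increment(_Atomic uint64_t *pair)
{
    /* "Exclusive load": read both halves and record what was seen. */
    uint64_t seen = atomic_load_explicit(pair, memory_order_acquire);
    uint64_t desired;
    do {
        uint32_t lo = (uint32_t)seen;
        uint32_t hi = (uint32_t)(seen >> 32);
        desired = ((uint64_t)(hi + 1u) << 32) | (uint32_t)(lo + 1u);
        /* "Exclusive store": succeeds only if memory still holds exactly
         * the recorded value. On a mismatch, 'seen' is overwritten with the
         * current contents and the loop retries, much like the re-load and
         * compare that precedes the real store under -ldstex2cas. */
    } while (!atomic_compare_exchange_weak_explicit(
                 pair, &seen, desired,
                 memory_order_acq_rel, memory_order_acquire));
}
```

This approach compares values instead of relying on the exclusive monitor. As a result, an intervening write that restores the same value (the ABA case) goes unnoticed. That is the usual trade-off when emulating load/store-exclusive with compare-and-swap.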
Unfortunately, it does not handle an exclusive load whose destination register is also the exclusive store's status (result) register when the sequence is mangled with the optimized same-block approach:
1:
ldaxp w2, w1, [x0]
add w4, w1, #0x1
add w5, w2, #0x1
stlxp w1, w4, w5, [x0]
cbnz w1, 1b
That loops forever because it is mangled into:
bb ilist after mangling:
TAG 0x0000ffff86e355b8
+0 L4 @0x0000fffd46ede918 887f8402 ldaxp (%x0)[8byte] -> %w2 %w1
+4 m4 @0x0000fffd46edcac8 887f8402 <label>
+4 L3 @0x0000fffd46edfce8 11000424 add %w1 $0x0001 lsl $0x00 -> %w4
+8 L3 @0x0000fffd46edda78 11000445 add %w2 $0x0001 lsl $0x00 -> %w5
+12 m4 @0x0000fffd46edf4f0 f9000f83 str %x3 -> +0x18(%x28)[8byte]
+16 m4 @0x0000fffd46edd3d0 887f8c01 ldaxp (%x0)[8byte] -> %w1 %w3
+20 m4 @0x0000fffd46edf5a0 cb226021 sub %x1 %x2 uxtx $0x0000000000000000 -> %x1
+24 m4 @0x0000fffd46edf090 b5000001 cbnz @0x0000fffd46edb5b0[8byte] %x1
+28 m4 @0x0000fffd46edea50 cb216061 sub %x3 %x1 uxtx $0x0000000000000000 -> %x1
+32 m4 @0x0000fffd46ede8b0 b5000001 cbnz @0x0000fffd46edb5b0[8byte] %x1
+36 L3 @0x0000fffd46edf7a0 88219404 stlxp %w4 %w5 -> (%x0)[8byte] %w1
+40 m4 @0x0000fffd46eddb48 14000000 b @0x0000fffd46edd288[8byte]
+44 m4 @0x0000fffd46edb5b0 14000000 <label>
+44 m4 @0x0000fffd46ede6a8 d503305f clrex $0x0000000000000000
+48 m3 @0x0000fffd46edbad8 88219404 stlxp %w4 %w5 -> (%x0)[8byte] %w1
+52 m4 @0x0000fffd46edd288 d503305f <label>
+52 m4 @0x0000fffd46edf820 f9400f83 ldr +0x18(%x28)[8byte] -> %x3
+56 L3 @0x0000fffd46ededf0 35ffff81 cbnz $0x0000ffff86e355b8 %w1
+60 L4 @0x0000fffd46edec70 14000000 b $0x0000ffff86e355cc
END 0x0000ffff86e355b8
The synthetic ldaxp picks %w1 to hold the live first value, which it then compares with the originally loaded first value. However, %w1 also holds the originally loaded second value, so the synthetic load clobbers it. The second-value comparison therefore fails and we retry forever.
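The toy C model below shows how the pre-store check in the mangled block above fails. It is a hypothetical sketch, not DynamoRIO code. The array x[] stands in for the AArch64 X registers. The made-up parameters reload_first and reload_second name the registers the synthetic ldaxp writes; the dump uses x1 and x3. The original app load puts the first value in w2 and the second in w1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of the mangled pre-store check; NOT DynamoRIO code.
 * mem_first/mem_second: the two 32-bit values at [x0] and [x0+4].
 * reload_first/reload_second: registers written by the synthetic ldaxp
 * (assumed to be valid register numbers in 0..31).
 * Returns true if the check falls through to the real stlxp. */
static bool mangled_check_passes(uint32_t mem_first, uint32_t mem_second,
                                 int reload_first, int reload_second)
{
    uint64_t x[32] = {0};

    /* App instruction: ldaxp w2, w1, [x0] (W-register writes zero-extend). */
    x[2] = mem_first;
    x[1] = mem_second;

    /* Synthetic reload: ldaxp w<reload_first>, w<reload_second>, [x0]. */
    x[reload_first] = mem_first;
    x[reload_second] = mem_second;

    /* sub xF, xF, x2 ; cbnz xF -> fail: compare the first values. */
    x[reload_first] = x[reload_first] - x[2];
    if (x[reload_first] != 0)
        return false;

    /* sub xF, xS, x1 ; cbnz xF -> fail: compare the second values, where
     * x1 is supposed to still hold the app's originally loaded second value. */
    x[reload_first] = x[reload_second] - x[1];
    return x[reload_first] == 0;
}
```

With reload_first = 1, as in the dump, the first sub leaves x1 at 0. The second sub then computes x3 - 0, so the check fails whenever the second value is nonzero. The failure path is a clrex followed by a stlxp, which fails and sets w1. That sends cbnz back to the top, so the loop never exits. A scratch register that holds neither originally loaded value, such as the hypothetical x6, lets both comparisons pass.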
It picks %w1 because the register is dead: after add w4, w1, #0x1 its next reference is the stlxp status write. That makes it look like a fine scratch register.
This does not happen with the non-optimized approach because its pick_scratch_reg call for the load-value-from-TLS step passes dead_reg_ok==false.
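A toy picker helps show why liveness alone leads to the bad choice. This is an illustrative sketch only. toy_pick_scratch_reg, REG_BIT and the bitmask interface are made up here and do not reflect the signature or behavior of DynamoRIO's pick_scratch_reg. The variant that avoids the exclusive load's destinations is likewise a hypothetical illustration, not the project's fix.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_GPRS 31          /* x0..x30 */
#define REG_BIT(r) (1u << (r))   /* hypothetical bitmask helper */

/* Hypothetical toy picker; NOT DynamoRIO's pick_scratch_reg.
 *   dead_mask:   registers whose application value is dead here
 *   avoid_mask:  registers the mangled sequence must not use as scratch
 *   dead_reg_ok: whether a dead register may be taken without a spill
 * Returns the chosen register number (or -1) and sets *needs_spill. */
static int toy_pick_scratch_reg(uint32_t dead_mask, uint32_t avoid_mask,
                                bool dead_reg_ok, bool *needs_spill)
{
    /* Preferred: a dead register costs nothing to clobber. Liveness alone
     * cannot tell that the mangling itself still needs the old value. */
    if (dead_reg_ok) {
        for (int r = 0; r < TOY_NUM_GPRS; r++) {
            if (((dead_mask >> r) & 1u) && !((avoid_mask >> r) & 1u)) {
                *needs_spill = false;
                return r;
            }
        }
    }
    /* Fallback: any allowed register; it must be saved and restored. */
    for (int r = 0; r < TOY_NUM_GPRS; r++) {
        if (!((avoid_mask >> r) & 1u)) {
            *needs_spill = true;
            return r;
        }
    }
    return -1;
}
```

In the first case, dead_mask marks x1 as dead. avoid_mask covers x0, x2, x4 and x5, which the sequence uses, plus x3, which the dump already takes as the other scratch. A picker that is allowed to take dead registers then returns x1 without a spill, which is exactly the clobbering choice. In the second case, x1 is added to avoid_mask because it is the exclusive load's other destination (x2 is already avoided). That forces the toy to pick x6 instead, at the cost of a spill.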