Adds a new option -ldstex2cas which converts load-exclusive/store-exclusive monitor sequences into compare-and-swap operations. This allows instrumentation to be inserted inside the sequence without disrupting the synchronization operation. The downside is that compare-and-swap is not semantically equivalent: it will not detect an "ABA" change pattern, where the location is modified and then restored to its original value between the load and the store.
For more background information and discussion see the design document: https://github.com/DynamoRIO/dynamorio/wiki/Exclusive-Monitors
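The "ABA" limitation can be illustrated with a minimal, single-threaded C11 sketch. It is not DynamoRIO code: the helper names `cas_store` and `aba_goes_undetected` are hypothetical, and the writes from "another thread" are simulated inline so the outcome is deterministic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a store-exclusive that has been converted to a
 * compare-and-swap: it succeeds iff *addr still holds the loaded value. */
static bool
cas_store(_Atomic uint32_t *addr, uint32_t loaded, uint32_t desired)
{
    return atomic_compare_exchange_strong(addr, &loaded, desired);
}

/* Returns true if the CAS succeeds despite intervening A -> B -> A writes. */
static bool
aba_goes_undetected(void)
{
    _Atomic uint32_t cell = 1;            /* value "A" */
    uint32_t loaded = atomic_load(&cell); /* plays the role of load-exclusive */
    assert(loaded == 1);
    atomic_store(&cell, 2);               /* simulated other thread writes "B" */
    atomic_store(&cell, 1);               /* ... and then restores "A" */
    /* A hardware exclusive monitor would have been cleared by those writes,
     * so a real store-exclusive would fail here. The CAS only compares the
     * value and therefore succeeds. */
    return cas_store(&cell, loaded, 3);
}
```

Calling `aba_goes_undetected()` returns true: the store goes through even though the location was written twice in between.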
The new option -ldstex2cas is on by default for AArch64, but off for ARM due to several missing pieces there:
- Handle ldrexd needing an even, even+1 register pair
- Handle predication
- Handle acquire loads being unsupported on some processors
Everything else is implemented for ARM.
Load-exclusive, store-exclusive, and clear-exclusive instructions are all mangled independently of each other, allowing arbitrary control flow in between. The address, size, and value are stored in new TLS slots for comparison in the compare-and-swap. An optimization avoids some of the checks for same-basic-block sequences.
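The independent handling of the three instruction kinds can be sketched roughly as follows in C11. This is only an illustration of the idea, not the actual mangling: `monitor_t` and the `emulate_*` functions are hypothetical names, the struct stands in for the new TLS slots, only 32-bit accesses are modeled, and the sketch assumes that every store-exclusive ends the monitor whether or not it succeeds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread state standing in for the new TLS slots. */
typedef struct {
    _Atomic uint32_t *addr; /* address of the last load-exclusive, or NULL */
    size_t size;            /* access size of that load-exclusive */
    uint32_t value;         /* value it observed */
} monitor_t;

/* Load-exclusive: do the load and record address, size, and value. */
static uint32_t
emulate_ldex(monitor_t *m, _Atomic uint32_t *addr)
{
    assert(addr != NULL);
    m->addr = addr;
    m->size = sizeof(uint32_t);
    m->value = atomic_load(addr);
    return m->value;
}

/* Store-exclusive: returns 0 on success and 1 on failure, mirroring the
 * status result of a hardware store-exclusive. */
static int
emulate_stex(monitor_t *m, _Atomic uint32_t *addr, size_t size, uint32_t newval)
{
    int status = 1;
    if (m->addr == addr && m->size == size) {
        uint32_t expected = m->value;
        if (atomic_compare_exchange_strong(addr, &expected, newval))
            status = 0;
    }
    m->addr = NULL; /* assumption: any store-exclusive ends the monitor */
    return status;
}

/* Clear-exclusive: forget the recorded load. */
static void
emulate_clrex(monitor_t *m)
{
    m->addr = NULL;
}
```

Because each function consults only the recorded state, nothing requires the load and the store to sit in the same basic block, and an unpaired or cleared store-exclusive simply reports failure.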
When the exclusive monitor instructions use the stolen register, the mangling takes a different strategy from the one used for other instructions that reference the stolen register: the application instructions themselves are changed to use a swap register. This greatly simplifies translation.
The TLS slots are extended to add r4 and r5 on ARM, as multiple scratch registers are needed for this mangling.
Translation code is added to recognize all of the new mangling instructions.
Adds documentation on the new feature.
Adds two new RSTATS to help ascertain how much conversion is occurring in real applications.
The plan is to keep the existing code in our tracing samples and tools that avoids clean calls (but still permits short inlined instrumentation) in exclusive monitor regions, so that multiple layers of solutions remain in place.
Adds a comprehensive test covering racing atomic increments, stolen register usage, multi-block sequences, different memory sizes, unpaired instructions, faults in various locations with and without the stolen register, and clearing the monitor. The test includes both AArch64 and ARM assembly.
Issue: #1698, #3005 (closed)