races in ARM lockless data structure reads
In DR we have some data structures that we read without holding a lock, relying on the hardware's ordering of writes. The design and code were created with x86 in mind, and we did not re-evaluate them thoroughly enough for ARM and AArch64. ARM's memory model is weaker than x86's: stores can become visible to other threads in a different order than they were issued, and loads can also be reordered. We therefore need to add barriers in multiple places to ensure that writes become visible to other threads in the order we require.
Suspect data structures include:
- The indirect branch lookup (IBL) table: we rely on other threads seeing the write of the tag before the write of the start pc.
- The store to flushtime_global in increment_global_flushtime().
- Likely more: the code needs an audit.
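For the IBL table, one way to express the tag-before-start-pc requirement in C is a release store of the start pc paired with an acquire load on the reader side. The sketch below uses a hypothetical, simplified entry type (`ibl_entry_t`) and hypothetical helpers (`ibl_entry_publish()`, `ibl_entry_lookup()`); it is not DR's actual table layout or lookup routine. It assumes 0 marks an empty slot and assumes a reader that loads the start pc before checking the tag. It covers only filling an empty slot: overwriting or removing a live entry while readers may race with it needs additional care.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified entry: NOT DR's real layout. */
typedef struct {
    _Atomic uintptr_t tag;      /* application pc used as the lookup key */
    _Atomic uintptr_t start_pc; /* code cache target; 0 = empty slot */
} ibl_entry_t;

/* Writer (assumed to hold the table lock and to be filling an empty
 * slot): write the tag first, then release-store start_pc, so any
 * thread that observes the new start_pc also observes the tag. */
void
ibl_entry_publish(ibl_entry_t *e, uintptr_t tag, uintptr_t start_pc)
{
    assert(start_pc != 0); /* 0 is reserved for "empty" in this sketch */
    atomic_store_explicit(&e->tag, tag, memory_order_relaxed);
    atomic_store_explicit(&e->start_pc, start_pc, memory_order_release);
}

/* Lockless reader: acquire-load start_pc, then check the tag.
 * Returns the start pc on a hit, or 0 on a miss. */
uintptr_t
ibl_entry_lookup(ibl_entry_t *e, uintptr_t tag)
{
    uintptr_t pc = atomic_load_explicit(&e->start_pc, memory_order_acquire);
    if (pc == 0)
        return 0;
    if (atomic_load_explicit(&e->tag, memory_order_relaxed) != tag)
        return 0;
    return pc;
}
```

On AArch64, compilers typically implement the release store and the acquire load with STLR and LDAR; on x86 both are plain moves. Code that is emitted by hand rather than compiled from C would need the equivalent instructions or explicit barriers at the same points.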