i#1369: Use synch flush callback to enable drcachesim tracing. (!4491) · Merge requests · DynamoRIO / dynamorio

Merged Abhinav Anil Sharma requested to merge i1369-drcachesim-synch-flush-callback into master Oct 20, 2020

DR translates a fault in the code cache to a fault at the corresponding application address. This is done using ilist reconstruction for the fragment where the fault occurred.

But, this does not work as expected when the DR client changes instrumentation during execution; currently, drcachesim does this to enable tracing after -trace_after_instrs. The reconstructed basic block gets the new instrumentation whereas the one in code cache has the old one. This causes issues during fault handling.

In the current drcachesim case, it appears as though a meta-instr has faulted because the reconstructed ilist has a meta-instr at the code cache fault pc. This issue may manifest differently if the basic block with the new instrumentation is smaller than the old one (unlike the drcachesim 'meta-instr faulted' case) and the faulting address lies beyond the end of the new instrumented basic block. We may see an ASSERT_NOT_REACHED due to the ilist walk ending before the faulting code cache pc was found in the reconstructed ilist.

In the existing code, drcachesim attempts to avoid this by flushing old fragments using dr_unlink_flush_region after it switches to the tracing instrumentation. However, due to the flush being asynch, there's a race and the flush does not complete in time.

This PR adds support for a callback in the synchronous dr_flush_region API. The callback is executed after the flush but before the threads are resumed.

Using the dr_flush_region callback to change drcachesim instrumentation ensures that old instrumentation is not applied after the flush and the new one is not applied before.

Issue: #1369