Converts the existing -trace_after_instrs delayed tracing feature to use the drbbdup multi-instrumentation library with two cases: instruction counting and full tracing. It also uses the drwrap support for drbbdup via drwrap's "control inversion".
Removes the #4893 (closed) workaround, under which function tracing via drwrap could not be delayed and the tracer instead discarded the data. Function tracing is now properly delayed along with the rest of the trace.
The drbbdup case encoding is a global std::atomic, written using language features that are lock-free and safe for client use.
This lays the groundwork for the full i#3995 feature of repeatedly swapping between the two cases.
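As a rough illustration of the global case encoding described above, here is a minimal standalone sketch. It is not the tracer's actual code: the names MODE_COUNT, MODE_TRACE, tracing_mode, instr_count, current_mode, and count_instrs are all hypothetical, and no DynamoRIO API is used. It only shows a global std::atomic being flipped from the counting case to the tracing case once a threshold is reached.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical case encodings; the tracer's real values may differ.
static const uintptr_t MODE_COUNT = 0; // Only count instructions.
static const uintptr_t MODE_TRACE = 1; // Full tracing.

// Global case encoding that the dispatch code would read (assumed name).
static std::atomic<uintptr_t> tracing_mode(MODE_COUNT);
// Global count of instructions executed so far (assumed name).
static std::atomic<uint64_t> instr_count(0);

// Returns the currently selected case.
static uintptr_t
current_mode()
{
    return tracing_mode.load(std::memory_order_acquire);
}

// Adds n executed instructions to the global count. Once the total reaches
// threshold, flips the mode from counting to tracing. Returns true only for
// the single caller that performs the flip.
static bool
count_instrs(uint64_t n, uint64_t threshold)
{
    // Sketch-only sanity check that no hidden lock backs the atomic.
    assert(tracing_mode.is_lock_free());
    uint64_t total = instr_count.fetch_add(n, std::memory_order_relaxed) + n;
    if (total < threshold)
        return false;
    uintptr_t expected = MODE_COUNT;
    // Compare-and-swap so that concurrent threads flip the mode exactly once.
    return tracing_mode.compare_exchange_strong(
        expected, MODE_TRACE, std::memory_order_release, std::memory_order_relaxed);
}
```

Because the switch is a single compare-and-swap on one word, repeatedly swapping between the two cases later would presumably only need additional transitions on the same variable.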
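Tested the #4893 (closed) workaround removal on a small app that calls "malloc". Without a delay, function markers are recorded; with -trace_after_instrs 10M the app exits before reaching the threshold, so no instructions and no function markers are recorded: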
$ bin64/drrun -t drcachesim -record_heap -offline -- ~/dr/test/mprot && bin64/drrun -t drcachesim -indir $(ls -1td drmem*.dir | head -1) -simulator_type basic_counts
drmemtrace exiting process 1161557; traced 62158 references.
Total counts:
135566 total (fetched) instructions
...
4 total function id markers
2 total function return address markers
2 total function argument markers
2 total function return value markers
$ bin64/drrun -t drcachesim -trace_after_instrs 10M -record_heap -offline -- ~/dr/test/mprot && bin64/drrun -t drcachesim -indir $(ls -1td drmem*.dir | head -1) -simulator_type basic_counts
drmemtrace exiting process 1161726; traced 1 references.
Basic counts tool results:
Total counts:
0 total (fetched) instructions
...
0 total function id markers
0 total function return address markers
0 total function argument markers
0 total function return value markers
Issue: #3995, #4893 (closed)