drcachesim: support trace files split by thread to analyze in parallel
This is a feature proposal to support trace files stored on disk as a separate file per thread. For very large traces this can help with disk size limits. It also enables parallel trace analysis for tools that perform per-thread computations: not the cache simulator, but most of the other tools, such as reuse_distance (#3227), basic_counts, or opcode_mix.
One complexity is how to handle aggregation: these tools want to report whole-process results. Rather than running a separate instance of each tool per thread, we probably want each parallel-enabled tool to keep its stats split by thread and to support concurrent calls to handle_memref(), with the per-thread stats aggregated in print_results().
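To make the aggregation idea concrete, here is a rough sketch of a tool that keeps one stats shard per thread and sums the shards at the end. Everything in it is hypothetical and simplified for illustration: `record_t`, `counts_t`, `sharded_counter_t`, `init_shard()`, `handle_record()`, `aggregate()`, and `run_parallel()` are made-up names, not the existing drcachesim interfaces, and it assumes each per-thread trace is processed by exactly one worker.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, simplified trace record; not the real drcachesim memref_t.
struct record_t {
    int64_t tid;
    bool is_instr;
};

// Per-thread statistics; one instance ("shard") per traced thread.
struct counts_t {
    uint64_t instrs = 0;
    uint64_t data_refs = 0;
};

// Hypothetical parallel-enabled tool with stats split by thread.
class sharded_counter_t {
public:
    // Called once per traced thread. The lock guards only the shard map,
    // so the per-record path below needs no synchronization.
    counts_t *
    init_shard(int64_t tid)
    {
        std::lock_guard<std::mutex> guard(lock_);
        return &shards_[tid]; // std::map elements keep stable addresses.
    }

    // Safe to call concurrently as long as each shard has a single user.
    void
    handle_record(counts_t *shard, const record_t &rec)
    {
        if (rec.is_instr)
            ++shard->instrs;
        else
            ++shard->data_refs;
    }

    // Whole-process totals, computed after all workers have finished.
    // This is the step print_results() would perform before printing.
    counts_t
    aggregate()
    {
        std::lock_guard<std::mutex> guard(lock_);
        counts_t total;
        for (const auto &entry : shards_) {
            total.instrs += entry.second.instrs;
            total.data_refs += entry.second.data_refs;
        }
        return total;
    }

private:
    std::mutex lock_;
    std::map<int64_t, counts_t> shards_;
};

// Processes each per-thread trace on its own worker thread.
// Assumes every record in one inner vector carries the same tid.
void
run_parallel(sharded_counter_t &tool,
             const std::vector<std::vector<record_t>> &traces)
{
    std::vector<std::thread> workers;
    for (const auto &trace : traces) {
        if (trace.empty())
            continue;
        workers.emplace_back([&tool, &trace]() {
            counts_t *shard = tool.init_shard(trace.front().tid);
            for (const record_t &rec : trace)
                tool.handle_record(shard, rec);
        });
    }
    for (auto &worker : workers)
        worker.join();
}
```

A caller would build one record vector per thread file, call `run_parallel()`, and then report the result of `aggregate()`. Keeping the lock off the per-record path is the main design point: the only shared state touched during analysis is the shard map, and only once per thread.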
For using a split-file trace with something like the cache simulator, we could extend the trace reader to support interleaving the threads into a single timestamp-ordered stream.
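The interleaving could be done as a k-way merge over the per-thread files, keyed on timestamp. The sketch below is only an illustration under stated assumptions: `entry_t` and `interleave_by_timestamp()` are hypothetical names, every entry is assumed to carry its own timestamp (a real trace may attach timestamps to groups of records instead), and each per-thread sequence is assumed to be already sorted by timestamp.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical, simplified trace entry carrying its own timestamp.
struct entry_t {
    uint64_t timestamp;
    int64_t tid;
};

// Merges per-thread sequences (each already sorted by timestamp) into a
// single timestamp-ordered stream. Ties are broken by input index so the
// output order is deterministic.
std::vector<entry_t>
interleave_by_timestamp(const std::vector<std::vector<entry_t>> &per_thread)
{
    // A cursor names the next unread entry of one per-thread input.
    struct cursor_t {
        uint64_t timestamp;
        size_t input;
        size_t index;
    };
    // std::priority_queue is a max-heap, so order by "later" to pop the
    // earliest timestamp first.
    auto later = [](const cursor_t &a, const cursor_t &b) {
        if (a.timestamp != b.timestamp)
            return a.timestamp > b.timestamp;
        return a.input > b.input;
    };
    std::priority_queue<cursor_t, std::vector<cursor_t>, decltype(later)> heap(
        later);

    size_t total = 0;
    for (size_t i = 0; i < per_thread.size(); ++i) {
        total += per_thread[i].size();
        if (!per_thread[i].empty())
            heap.push({ per_thread[i][0].timestamp, i, 0 });
    }

    std::vector<entry_t> merged;
    merged.reserve(total);
    while (!heap.empty()) {
        cursor_t cur = heap.top();
        heap.pop();
        merged.push_back(per_thread[cur.input][cur.index]);
        // Advance this input and re-insert its next entry, if any.
        size_t next = cur.index + 1;
        if (next < per_thread[cur.input].size())
            heap.push({ per_thread[cur.input][next].timestamp, cur.input, next });
    }
    return merged;
}
```

A real reader would stream from the files rather than hold whole vectors in memory, but the heap of one cursor per thread stays the same, so memory use would scale with the number of threads rather than the trace size.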