drcachesim trace cpu markers in wrong order vs delayed branches
I have a "cpustats" tool that collects the load on each CPU from a drcachesim trace, but one of its results looked wrong:
CPU #93 in socket #1:
45602 (fetched) instructions
0 non-fetched instructions
9 unique threads: 244797 244788 244760 244720 244852 244848 245781 244805 242932
46 thread segments
Examining the raw trace, however, shows only 5 threads on that CPU:
Thread 244852 on CPU #93 in socket #1 @13182363546989616
Thread 244852 on CPU #93 in socket #1 @13182363546992165
Thread 245781 on CPU #93 in socket #1 @13182363546994055
Thread 244760 on CPU #93 in socket #1 @13182363547443368
Thread 244760 on CPU #93 in socket #1 @13182363547443412
Thread 242932 on CPU #93 in socket #1 @13182363547443476
Thread 244788 on CPU #93 in socket #1 @13182363547443630
Thread 244788 on CPU #93 in socket #1 @13182363547443664
Thread 244788 on CPU #93 in socket #1 @13182363547443685
Thread 244788 on CPU #93 in socket #1 @13182363547443800
Thread 244788 on CPU #93 in socket #1 @13182363547444007
Thread 244788 on CPU #93 in socket #1 @13182363547444082
Thread 244788 on CPU #93 in socket #1 @13182363547444103
Thread 244788 on CPU #93 in socket #1 @13182363547444124
Thread 244788 on CPU #93 in socket #1 @13182363547444134
Thread 244788 on CPU #93 in socket #1 @13182363547444145
Thread 244788 on CPU #93 in socket #1 @13182363547444159
Thread 244788 on CPU #93 in socket #1 @13182363547444177
Thread 244788 on CPU #93 in socket #1 @13182363547444238
Thread 244788 on CPU #93 in socket #1 @13182363547444345
Thread 244788 on CPU #93 in socket #1 @13182363547444383
Thread 244788 on CPU #93 in socket #1 @13182363547444405
Thread 244788 on CPU #93 in socket #1 @13182363547444419
Thread 244788 on CPU #93 in socket #1 @13182363547444474
So where do threads 244797, 244720, 244848, and 244805 come from? Re-generating the per-thread .raw files shows that there is no cpu #93 marker for 244720, while 244852 has two:
(gdb) p /x 93 | (1<<12)
$1 = 0x105d
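The marker value being searched for can be reproduced outside gdb. This is a minimal sketch that assumes, as the gdb expression suggests, that the socket number starts at bit 12 and the CPU number occupies the low 12 bits. `encode_cpu_marker` and `decode_cpu_marker` are hypothetical helpers written for illustration, not drcachesim APIs:

```python
SOCKET_SHIFT = 12  # assumption: inferred from the "1<<12" in the gdb expression
CPU_MASK = (1 << SOCKET_SHIFT) - 1


def encode_cpu_marker(socket: int, cpu: int) -> int:
    """Pack a socket and CPU number into one marker value (assumed layout)."""
    if not 0 <= cpu <= CPU_MASK:
        raise ValueError(f"cpu {cpu} does not fit in {SOCKET_SHIFT} bits")
    return (socket << SOCKET_SHIFT) | cpu


def decode_cpu_marker(value: int) -> tuple:
    """Split a marker value back into (socket, cpu)."""
    return value >> SOCKET_SHIFT, value & CPU_MASK


value = encode_cpu_marker(1, 93)
print(hex(value))  # 0x105d, matching gdb's $1
print(decode_cpu_marker(value))  # (1, 93)
```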
$ od -t x8 -A x 244720.raw | grep c20300000000105d
$ od -t x8 -A x 244852.raw | grep c20300000000105d
0005e0 802ed54a3084b430 c20300000000105d
000620 802ed54a3084be25 c20300000000105d
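The same search can be done directly on 64-bit words, which also reports the entry immediately before each match. This is a sketch under two assumptions: the .raw file consists of little-endian 8-byte entries, which is how `od -t x8` displays them on x86, and the pattern `0xc20300000000105d` is copied from the grep above rather than derived from the raw-entry layout. `find_word` is a hypothetical helper:

```python
import struct
from typing import List, Optional, Tuple


def find_word(data: bytes, word: int) -> List[Tuple[int, Optional[int]]]:
    """Return (offset, preceding_word) for each 8-byte-aligned match of `word`.

    Entries are read as little-endian unsigned 64-bit values; any trailing
    partial entry is ignored.
    """
    hits = []
    prev = None  # no preceding entry before offset 0
    for offset in range(0, len(data) - len(data) % 8, 8):
        (entry,) = struct.unpack_from("<Q", data, offset)
        if entry == word:
            hits.append((offset, prev))
        prev = entry
    return hits


CPU93_SOCKET1 = 0xC20300000000105D  # pattern copied from the grep above

# Usage (the path is illustrative):
# with open("244852.raw", "rb") as f:
#     for offset, before in find_word(f.read(), CPU93_SOCKET1):
#         print(f"{offset:06x}: preceded by {before:016x}")
```

Because `od -A x` labels each output line with the address of its first word, the two matches above are actually at offsets 0x5e8 and 0x628, each directly after the 802ed54a3084b430 or 802ed54a3084be25 word printed on the same line.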
Adding diagnostics to the cpustats tool reveals the problem:
$ grep -4 '1\.93' NEWCPU2
type=10 tid=244805 instr
type=0 tid=244805 read
type=28 tid=244852
type=28 tid=244852
cpu 1.93 tid 244852
type=10 tid=244852
type=10 tid=244852
type=13 tid=244852
type=11 tid=244852
--
type=10 tid=244852
type=10 tid=244852
type=28 tid=244797 timestamp
type=13 tid=244797
MISSED cpu 1.93 tid 244797 type=13
type=28 tid=244797
cpu 1.39 tid 244797
type=10 tid=244797
The delayed branch is the culprit: it is inserted after the timestamp marker but before the cpu marker. We added this cpu marker, and the trace stream model implies that only the entries after it are on that cpu, so it is reasonable for cpustats to interpret the branch as the next thread running one instruction on the previous cpu (here, #93).
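The misattribution can be reproduced with a toy model. This sketch assumes cpustats charges every instruction to the most recent cpu marker in the serialized stream, whichever thread emitted it, which is what the `MISSED cpu 1.93 tid 244797 type=13` line suggests. The entry kinds are simplified stand-ins for the real record types, cpu numbers are shown without the socket bits, and `attribute` is a hypothetical function:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (tid, kind, value); kinds: "timestamp", "cpu" (marker), "instr", "branch"
Entry = Tuple[int, str, int]


def attribute(stream: List[Entry]) -> Dict[int, Set[int]]:
    """Map each cpu to the threads charged with instructions on it.

    Every instruction is charged to the most recently seen cpu marker in the
    serialized stream, regardless of which thread emitted that marker.
    """
    threads_on_cpu: Dict[int, Set[int]] = defaultdict(set)
    current_cpu = None
    for tid, kind, value in stream:
        if kind == "cpu":
            current_cpu = value
        elif kind in ("instr", "branch") and current_cpu is not None:
            threads_on_cpu[current_cpu].add(tid)
    return dict(threads_on_cpu)


# Order seen in the diagnostics: 244797's delayed branch sits between its
# timestamp and its cpu marker, so it is charged to 244852's cpu.
observed = [
    (244852, "timestamp", 0), (244852, "cpu", 93),
    (244852, "instr", 0), (244852, "instr", 0),
    (244797, "timestamp", 0), (244797, "branch", 0), (244797, "cpu", 39),
    (244797, "instr", 0),
]
print({cpu: sorted(tids) for cpu, tids in attribute(observed).items()})
# {93: [244797, 244852], 39: [244797]}
```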
We should put all markers prior to the delayed branch.
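The proposed ordering rule can be sketched as a post-processing pass over the same kind of toy entries. This does not describe how drcachesim's tracer or raw2trace is structured; `hoist_markers` is hypothetical, and it simplifies by treating every "branch" entry as delayed. A held branch is released only when the next non-marker entry arrives, so any markers seen in between are emitted ahead of it:

```python
from typing import List, Tuple

# (tid, kind, value); kinds are simplified stand-ins for trace record types.
Entry = Tuple[int, str, int]
MARKER_KINDS = {"timestamp", "cpu"}


def hoist_markers(entries: List[Entry]) -> List[Entry]:
    """Reorder one thread's entries so markers precede a delayed branch."""
    out: List[Entry] = []
    pending: List[Entry] = []  # delayed branch awaiting the next non-marker
    for entry in entries:
        kind = entry[1]
        if kind in MARKER_KINDS:
            out.append(entry)  # markers go out ahead of any held branch
            continue
        out.extend(pending)  # a non-marker arrived: release the held branch
        pending.clear()
        if kind == "branch":
            pending.append(entry)
        else:
            out.append(entry)
    out.extend(pending)  # flush a branch left at the end of the segment
    return out


# The 244797 segment from the diagnostics, in its original order.
segment = [
    (244797, "timestamp", 0),
    (244797, "branch", 0),
    (244797, "cpu", 39),
    (244797, "instr", 0),
]
print([kind for _, kind, _ in hoist_markers(segment)])
# ['timestamp', 'cpu', 'branch', 'instr']
```

With this order, a reader that trusts the latest cpu marker sees 244797's branch only after the `cpu 39` marker.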