drcachesim's ifetch results are inflated from rep string expansion
While working on #1729 (closed), I saw that the offline run had fewer instruction fetches than the online run:
# bin64/drrun -t drcachesim -- suite/tests/bin/simple_app
---- <application exited with code 0> ----
Core #0 (1 thread(s))
L1I stats:
Hits: 163,521
Misses: 793
Miss rate: 0.48%
L1D stats:
Hits: 55,281
Misses: 2,098
Miss rate: 3.66%
Core #1 (0 thread(s))
Core #2 (0 thread(s))
Core #3 (0 thread(s))
LL stats:
Hits: 334
Misses: 2,557
Local miss rate: 88.45%
Child hits: 218,802
Total miss rate: 1.15%
# bin64/drrun -t drcachesim -offline -- suite/tests/bin/simple_app
# clients/bin64/drcachesim -indir drmemtrace.simple_app.32731.0000.dir/
Core #0 (1 thread(s))
L1I stats:
Hits: 146,282
Misses: 793
Miss rate: 0.54%
L1D stats:
Hits: 55,282
Misses: 2,095
Miss rate: 3.65%
Core #1 (0 thread(s))
Core #2 (0 thread(s))
Core #3 (0 thread(s))
LL stats:
Hits: 329
Misses: 2,559
Local miss rate: 88.61%
Child hits: 201,564
Total miss rate: 1.25%
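That's a significant difference: online reports 17,239 more L1I hits than offline (163,521 vs. 146,282, roughly 12% more), while the L1I miss counts are identical.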
Online with bundles disabled, the count is still well above offline:
L1I stats:
Hits: 160,887
Misses: 793
Tiny app is the same for both:
# bin64/drrun -t drcachesim -- /work/dr/test/allasm_x64
---- <application exited with code 0> ----
Core #0 (1 thread(s))
L1I stats:
Hits: 39
Misses: 2
Miss rate: 4.88%
# bin64/drrun -t drcachesim -offline -- /work/dr/test/allasm_x64
# bin64/drrun -t drcachesim -indir drmemtrace.allasm_x64.10981.0000.dir/
Core #0 (1 thread(s))
L1I stats:
Hits: 39
Misses: 2
Miss rate: 4.88%
Diffing the simple_app traces, the difference is that online has many more *4bb entries.
offline:
Instr 0x00007f82984464b0
Instr 0x00007f82984464b3
Instr 0x00007f82984464b7
Instr 0x00007f82984464ba
Instr 0x00007f82984464ba
Instr 0x00007f82984464ba
online:
Instr 0x00007fd6ea0fb4b0
Instr 0x00007fd6ea0fb4b3
Instr 0x00007fd6ea0fb4b7
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4ba
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4ba
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4ba
Online has 4 or 5 *4bb before every *4ba.
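Since the absolute addresses differ between the two runs, one way to make this comparison is to tally fetches by the low 12 bits of the pc. The following is only a minimal sketch, assuming a dump with one `Instr 0x...` entry per line as in the excerpts above; `count_by_page_offset` and `diff_counts` are hypothetical helper names, not part of drcachesim.

```python
from collections import Counter

def count_by_page_offset(lines):
    """Tally 'Instr 0x...' lines by the low 12 bits of the pc."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        # Skip anything that is not an instruction-fetch entry.
        if len(parts) == 2 and parts[0] == "Instr":
            counts[int(parts[1], 16) & 0xFFF] += 1
    return counts

def diff_counts(online, offline):
    """Return {offset: (online_count, offline_count)} where the counts differ."""
    keys = set(online) | set(offline)
    return {k: (online[k], offline[k])
            for k in sorted(keys) if online[k] != offline[k]}
```

Run on the two excerpts above, the only offset that differs is 0x4bb, which appears online but never offline.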
It's rep stos:
interp: start_pc = 0x00007fd6ea0fb4ba
0x00007fd6ea0fb4ba f3 aa rep stos %al %rdi %rcx -> %es:(%rdi)[1byte] %rdi %rcx
0x00007fd6ea0fb4bc 48 89 d0 mov %rdx -> %rax
drutil's expansion uses 2 different translations: every expanded instr's xl8 is the original pc + 1, except the stos or movs itself, which keeps the original pc:
* rep movs
* =>
* jecxz zero
* jmp iter
* zero:
* mov $0x00000001 -> %ecx
* jmp pre_loop
* iter:
* movs %ds:(%esi) %esi %edi -> %es:(%edi) %esi %edi
* pre_loop:
* loop
Online is counting all of those expanded instrs as fetches, which inflates the total. I explicitly have code to not count them for offline. I'll address this by trying to do something similar for online.
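To illustrate the idea only (this is not the actual drcachesim code, and the eventual fix may look quite different), here is a sketch of dropping the expansion's helper instrs from a stream of fetched pcs. It assumes the set of rep string instr pcs is known and relies on the observation above that the helper instrs carry a translation of pc + 1, which falls inside the 2-byte `f3 aa` encoding and so should not be a real instruction start. `filter_rep_expansion` is a hypothetical name.

```python
def filter_rep_expansion(pcs, rep_string_pcs):
    """Yield only the pcs that should count as instruction fetches.

    pcs: iterable of fetched pcs as recorded by the tracer.
    rep_string_pcs: set of application pcs of rep string instrs
        (assumed to be known; how they are found is out of scope here).
    """
    # The expansion's jecxz/jmp/mov/loop all translate to the rep pc + 1.
    fake_xl8 = {pc + 1 for pc in rep_string_pcs}
    for pc in pcs:
        if pc not in fake_xl8:
            yield pc
```

Applied to the online excerpt above with the rep stos at 0x00007fd6ea0fb4ba, this leaves one fetch at *4ba per iteration, matching what offline records.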