Adds new options -L0_filter, -L0I_size, and -L0D_size that control a new inline "Level 0" cache filter, intended for trace analysis tools that want to focus on higher-level cache behavior. The inlined caches are per-thread and direct-mapped, and only misses are written out: over the pipe for online traces, or to a file for offline traces.
On ARM, if a predicated memref's condition fails (so the access does not execute), we correctly skip the filter entirely.
Adds online and offline tests to the suite.
Leaves instrs and memrefs that cross cache lines to future work: for now the L0 filter only checks the first cache line.
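For reviewers unfamiliar with the approach, the filtering logic can be modeled in plain C roughly as follows. This is only an illustrative sketch, not the inlined instrumentation added by this change: the names `l0_cache_t`, `l0_init`, and `l0_access`, and the fixed `LINE_SIZE` and `NUM_LINES` values, are hypothetical (the real sizes come from -L0I_size and -L0D_size). One such structure would exist per thread for each of the instruction and data filters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed geometry, for illustration only. */
#define LINE_SIZE 64u
#define NUM_LINES 512u /* 32KB of 64-byte lines */

/* A direct-mapped cache: each index holds at most one line address. */
typedef struct {
    uintptr_t tags[NUM_LINES];
    bool valid[NUM_LINES];
} l0_cache_t;

/* Marks every slot empty. */
static void
l0_init(l0_cache_t *cache)
{
    for (size_t i = 0; i < NUM_LINES; i++) {
        cache->tags[i] = 0;
        cache->valid[i] = false;
    }
}

/* Returns true on a miss (the reference would be written to the trace)
 * and false on a hit (the reference is filtered out).
 * Only the line containing the first byte is checked, so a reference
 * that crosses into the next line is not examined further.
 */
static bool
l0_access(l0_cache_t *cache, uintptr_t addr)
{
    uintptr_t line = addr / LINE_SIZE;
    size_t idx = (size_t)(line % NUM_LINES);
    if (cache->valid[idx] && cache->tags[idx] == line)
        return false; /* Hit: nothing is emitted. */
    /* Miss: install the new line, evicting whatever mapped here before. */
    cache->valid[idx] = true;
    cache->tags[idx] = line;
    return true;
}
```

In this model, two addresses whose line numbers differ by a multiple of `NUM_LINES` evict each other, which is the usual conflict behavior of a direct-mapped cache.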
Fixes #2439