ASSERT fcache.c:1706 unit->full || unit->cur_pc != start_pc + header->size
On AArch64, for #4424 (closed), I made the cache units all larger for a higher content-to-guard-page ratio. But (after addressing the page alignment assert from #4430: https://github.com/DynamoRIO/dynamorio/pull/4430#issuecomment-686058468) on a 64K-page machine, just running "ls" hits this assert at exit:
$ exports-a64/bin64/drrun -debug -loglevel 2 -- ls
<...>
<Stopping application /usr/bin/ls (260076)>
<Application /usr/bin/ls (260076). Internal Error: DynamoRIO debug check failure: src/core/fcache.c:1706 unit->full || unit->cur_pc != start_pc + header->size
#1 0x0000fffff7c4eb30 in d_r_internal_error (file=0xfffff7efa1e0 "src/core/fcache.c", line=1707,
expr=0xfffff7efaf88 "unit->full || unit->cur_pc != start_pc + header->size") at src/core/utils.c:176
#2 0x0000fffff7bfc298 in fcache_free_list_consistency (dcontext=0xffffffffffffffff, cache=0xfffdb3c70050, bucket=8)
at src/core/fcache.c:1707
#3 0x0000fffff7bfcbac in fcache_cache_stats (dcontext=0xffffffffffffffff, cache=0xfffdb3c70050) at src/core/fcache.c:1779
#4 0x0000fffff7bf7b6c in fcache_stats_exit () at src/core/fcache.c:994
#5 0x0000fffff7cd496c in dump_global_stats (raw=false) at src/core/utils.c:3171
#6 0x0000fffff7bbd618 in dynamo_thread_exit_common (dcontext=0xfffdb3b54560, id=13424, other_thread=false) at src/core/dynamo.c:2554
#7 0x0000fffff7bbd9b4 in dynamo_thread_exit () at src/core/dynamo.c:2709
#8 0x0000fffff7bbaf20 in dynamo_shared_exit (toexit=0x0) at src/core/dynamo.c:1107
#9 0x0000fffff7bbb3b4 in dynamo_process_exit_cleanup () at src/core/dynamo.c:1376
#10 0x0000fffff7bbb59c in dynamo_process_exit () at src/core/dynamo.c:1431
(gdb) info local
start_pc = 0xffffb3c7c508 ""
size = 65312
live = 3
charge = 165024
waste = 164508
header = 0xffffb3c7c508
prev_size = 65312
unit = 0xfffdb3c70448
(gdb) info arg
dcontext = 0xffffffffffffffff
cache = 0xfffdb3c70050
bucket = 8
(gdb) p *unit
$1 = {start_pc = 0xffffb3c50000 "", end_pc = 0xffffb3ca0000 "", cur_pc = 0xffffb3c8c428 "\240", reserved_end_pc = 0xffffb3cd0000 "", size = 327680, full = false,
cache = 0xfffdb3c70050, writable = true, guarded = true, pending_free = false, pending_flush = false, flushtime = 0, next_global = 0x0, prev_global = 0xfffdb3c85660,
next_local = 0x0}
(gdb) p/x start_pc
$2 = 0xffffb3c7c508
(gdb) p/x start_pc + header->size
$3 = 0xffffb3c8c428
(gdb) p *header
$4 = {next = 0xffffb3c50000, flags = 2304, size = 65312, prev = 0xffffb3c54068}
(gdb) p cache->units
$7 = (fcache_unit_t *) 0xfffdb3c70448
Entry into F2093(0x0000fffff7a0392c).0x0000ffffb3c7c4a0 (shared)
Exit from F2093(0x0000fffff7a0392c).0x0000ffffb3c7c4c8 (shared)
Entry into F2094(0x0000fffff7a03938).0x0000ffffb3c7c514 (shared)
Exit from F2094(0x0000fffff7a03938).0x0000ffffb3c7c534 (shared)
Seems similar to #4433 (closed)! Disappears at -loglevel 4!
Could it be the larger unit sizes crossing this size limit? header->size (65312) is very close to USHRT_MAX (65535):
#define MAX_FREE_ENTRY_SIZE USHRT_MAX
<...>
add_to_free_list<...>
/* only coalesce if not over size limit */
Maybe it wouldn't coalesce this entry with the freed entry after it (the combined size being over the limit), and cur_pc was then rolled back, leaving the cache in this inconsistent state?
Could we increase free_list_header_t.size to 32 bits while leaving fragment_t unchanged? A code examination is needed to verify that the header can be expanded.