Augment DR's VMM to parcel out sub-page pieces, especially on large-page machines
DR has been tuned for x86 with 4K pages. On an ARM machine with 64K pages, memory usage jumps, especially per-thread usage: the signal queue and TLS now each take 64K (though each really needs less than 4K) due to the VMM block granularity. This wastes a lot of memory. This is a feature request to have DR's VMM parcel out sub-page granularities. The simplest scheme is to add another bitmap so that both committed and parceled memory are tracked, but that takes up a fair amount of memory and can likely be optimized (at the least, shrink the committed bitmap to match the page granularity).
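To make the two-bitmap idea concrete, here is a minimal sketch, not DR's actual VMM code. All names (`subpage_vmm_t`, `vmm_alloc`, `vmm_free`) and sizes (64K pages, 4K parcel units, an 8-page region) are hypothetical, and committing a page is only simulated by a counter rather than a real OS call. It keeps one committed bit per page and one parceled bit per 4K unit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes: 64K OS pages parceled out in 4K units. */
#define VMM_PAGE_SIZE      (64 * 1024)
#define VMM_UNIT_SIZE      (4 * 1024)
#define VMM_NUM_PAGES      8
#define VMM_UNITS_PER_PAGE (VMM_PAGE_SIZE / VMM_UNIT_SIZE)
#define VMM_NUM_UNITS      (VMM_NUM_PAGES * VMM_UNITS_PER_PAGE)

typedef struct {
    uint8_t committed[(VMM_NUM_PAGES + 7) / 8]; /* one bit per 64K page */
    uint8_t parceled[(VMM_NUM_UNITS + 7) / 8];  /* one bit per 4K unit */
    size_t pages_committed; /* stands in for real commit calls */
} subpage_vmm_t;

static bool bit_test(const uint8_t *map, size_t i)
{
    return ((map[i / 8] >> (i % 8)) & 1) != 0;
}

static void bit_set(uint8_t *map, size_t i)
{
    map[i / 8] |= (uint8_t)(1u << (i % 8));
}

static void bit_clear(uint8_t *map, size_t i)
{
    map[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* First-fit search for a run of free units. Returns the byte offset of
 * the parcel within the region, or -1 if no run is large enough. */
static long vmm_alloc(subpage_vmm_t *v, size_t size)
{
    size_t need = (size + VMM_UNIT_SIZE - 1) / VMM_UNIT_SIZE;
    size_t run = 0;
    assert(v != NULL);
    if (need == 0 || need > VMM_NUM_UNITS)
        return -1;
    for (size_t i = 0; i < VMM_NUM_UNITS; i++) {
        run = bit_test(v->parceled, i) ? 0 : run + 1;
        if (run < need)
            continue;
        size_t first = i + 1 - need;
        for (size_t u = first; u <= i; u++) {
            size_t page = u / VMM_UNITS_PER_PAGE;
            bit_set(v->parceled, u);
            /* Commit the enclosing page only the first time it is touched. */
            if (!bit_test(v->committed, page)) {
                bit_set(v->committed, page);
                v->pages_committed++;
            }
        }
        return (long)(first * VMM_UNIT_SIZE);
    }
    return -1;
}

/* Releases the units of a parcel. Pages stay committed in this sketch. */
static void vmm_free(subpage_vmm_t *v, size_t offset, size_t size)
{
    size_t first = offset / VMM_UNIT_SIZE;
    size_t need = (size + VMM_UNIT_SIZE - 1) / VMM_UNIT_SIZE;
    assert(v != NULL);
    for (size_t u = first; u < first + need; u++)
        bit_clear(v->parceled, u);
}
```

With this layout, two sub-4K requests (such as a signal queue and a TLS block) land in the same 64K page instead of consuming one page each. The committed bitmap needs only one bit per page, which is the shrinking mentioned above; the parceled bitmap carries the finer 4K granularity.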