Reduces both the compile-time minimum VMM block size and the default -vmm_block_size from 16KB to 4KB on UNIX, to avoid wasting space on allocations whose sizes are not multiples of 16KB. The savings are non-trivial for applications with many threads, where each thread has multiple small allocations (such as the TLS mmaps). The change also makes further memory reductions via tuning the unit size parameters more fruitful, since a wider range of sizes can be used without rounding overhead. The downside is more memory and overhead spent on memory management itself, but the tradeoff is worthwhile. This is much simpler than trying to share VMM block allocations among separate uses, as we do on Windows for the stack and gencode.
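
To illustrate the rounding waste described above, here is a minimal sketch in C. It is not the actual VMM code: the helper names round_up_to_block and wasted_bytes are hypothetical, and it assumes only that each allocation is rounded up to a multiple of a power-of-two block size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round size up to the next multiple of block_size.
 * block_size must be a non-zero power of two.
 */
static size_t
round_up_to_block(size_t size, size_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    return (size + block_size - 1) & ~(block_size - 1);
}

/* Hypothetical helper: bytes reserved but unused after rounding up. */
static size_t
wasted_bytes(size_t size, size_t block_size)
{
    return round_up_to_block(size, block_size) - size;
}
```

Under these assumptions, a 4KB per-thread allocation occupies a full 16KB block with 16KB blocks (12KB unused) but exactly one block with 4KB blocks. Likewise, a 20KB allocation rounds up to 32KB with 16KB blocks and stays at 20KB with 4KB blocks.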