Adds a new option -per_thread_guard_pages controlling whether per-thread allocations have guard pages. This option is off by default for >4K pages, which removes guard pages from thread-private initial cache units, all thread-private heap units, thread-private signal-pending special heap units, stacks, TLS segments (DR and privlib), and thread-private gencode. This saves a lot of space on many-threaded apps, while still keeping guards spread throughout the VMM from shared units, which are used far more often than private units these days.
Implements the option with a new VMM flag VMM_PER_THREAD, added to the reservation calls for the above list of unit types.
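As a rough illustration of the mechanism described above, here is a minimal sketch of how a per-thread flag on a reservation call could gate guard pages. All names (sketch_options_t, SKETCH_VMM_*, sketch_reservation_size) and flag values are hypothetical and are not DynamoRIO's actual definitions, and the sketch assumes one guard page on each side of a unit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for illustration only. */
enum {
    SKETCH_VMM_HEAP = 0x1,
    SKETCH_VMM_STACK = 0x2,
    SKETCH_VMM_PER_THREAD = 0x100, /* marks a thread-private reservation */
};

/* Hypothetical model of the two relevant runtime options. */
typedef struct {
    bool guard_pages;            /* models -guard_pages */
    bool per_thread_guard_pages; /* models -per_thread_guard_pages */
} sketch_options_t;

/* Returns whether a reservation with the given flags gets guard pages. */
static bool
sketch_has_guard_pages(const sketch_options_t *opts, unsigned which)
{
    assert(opts != NULL);
    if (!opts->guard_pages)
        return false;
    /* Per-thread units lose their guards unless the new option is on. */
    if ((which & SKETCH_VMM_PER_THREAD) != 0 && !opts->per_thread_guard_pages)
        return false;
    return true;
}

/* Total bytes to reserve: usable size plus (assumed) two guard pages. */
static size_t
sketch_reservation_size(const sketch_options_t *opts, unsigned which,
                        size_t usable_bytes, size_t page_size)
{
    size_t guards = sketch_has_guard_pages(opts, which) ? 2 * page_size : 0;
    return usable_bytes + guards;
}
```

Under these assumptions, a stack reserved with SKETCH_VMM_STACK | SKETCH_VMM_PER_THREAD takes only its usable size when per_thread_guard_pages is false, while a shared heap unit still pays for both guards.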
Additionally, for >4K pages, tunes the default unit sizes to take into account the guard changes. Increases the shared unit sizes further for a better ratio of content to guard pages.
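To make the content-to-guard ratio concrete, the following sketch computes the share of a reservation spent on guard pages, assuming one guard page on each side of a unit. The helper name and the unit sizes in the usage note are hypothetical and are not the sizes chosen in this PR.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage (rounded down) of a reservation consumed by guard pages,
 * assuming one guard page before and one after the usable region.
 * Hypothetical helper for illustration only.
 */
static unsigned
guard_overhead_percent(size_t usable_bytes, size_t page_size)
{
    assert(usable_bytes > 0 && page_size > 0);
    size_t guards = 2 * page_size;
    return (unsigned)((100 * guards) / (usable_bytes + guards));
}
```

For example, with 64K pages a 64K unit would spend 66% of its reservation on guards, while a 512K unit would spend 20%. The same 64K unit with 4K pages would spend only 11%, which is why larger pages call for larger units.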
Tested on a large app on a 64K-page machine which previously hit OOM unless run with "-no_guard_pages" (along with other now-fixed bugs: i#4335; i#4418). Here are some stats for a smaller version of the app that is easier to capture statistics for:
-no_guard_pages:
    Peak threads under DynamoRIO control         : 152
    Threads ever created                         : 701
    Peak vmm blocks for unreachable heap         : 914
    Peak vmm blocks for stack                    : 610
    Peak vmm blocks for unreachable special heap : 152
    Peak vmm blocks for unreachable special mmap : 304
    Peak vmm blocks for reachable heap           : 161
    Peak vmm blocks for cache                    : 368
    Peak vmm blocks for reachable special mmap   : 5
    Peak vmm virtual memory in use (bytes)       : 164757504
Here's this PR with -per_thread_guard_pages (i.e., overriding the new default, so matching the HEAD defaults):
    Peak threads under DynamoRIO control         : 157
    Threads ever created                         : 706
    Peak vmm blocks for unreachable heap         : 1690
    Peak vmm blocks for stack                    : 945
    Peak vmm blocks for unreachable special heap : 471
    Peak vmm blocks for unreachable special mmap : 942
    Peak vmm blocks for reachable heap           : 482
    Peak vmm blocks for cache                    : 741
    Peak vmm blocks for reachable special mmap   : 7
    Peak vmm virtual memory in use (bytes)       : 345899008
Vs the new defaults in this PR:
    Peak threads under DynamoRIO control         : 141
    Threads ever created                         : 701
    Peak vmm blocks for unreachable heap         : 884
    Peak vmm blocks for stack                    : 566
    Peak vmm blocks for unreachable special heap : 141
    Peak vmm blocks for unreachable special mmap : 282
    Peak vmm blocks for reachable heap           : 152
    Peak vmm blocks for cache                    : 411
    Peak vmm blocks for reachable special mmap   : 7
    Peak vmm virtual memory in use (bytes)       : 160104448
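As a sanity check on the numbers above, this sketch sums the per-category peak block counts and compares the two guarded configurations. In each run the reported peak byte total equals the summed block counts times 65536, which matches the 64K page size. The helper names are hypothetical. The runs also differ slightly in peak thread count, so the saving is only approximate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_KINDS 7

/* Peak vmm block counts copied from the stats above, in listed order. */
static const unsigned with_per_thread_guards[NUM_KINDS] = { 1690, 945, 471, 942,
                                                            482,  741, 7 };
static const unsigned new_defaults[NUM_KINDS] = { 884, 566, 141, 282,
                                                  152, 411, 7 };

/* Sums the block counts and converts them to bytes. */
static uint64_t
total_bytes(const unsigned blocks[NUM_KINDS], uint64_t block_size)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < NUM_KINDS; i++)
        sum += blocks[i];
    return sum * block_size;
}

/* Percentage reduction from before to after, rounded down. */
static unsigned
percent_saved(uint64_t before, uint64_t after)
{
    assert(before > 0 && after <= before);
    return (unsigned)((before - after) * 100 / before);
}
```

Calling total_bytes() with a 65536-byte block size reproduces 345899008 and 160104448 for the two runs, and percent_saved() on those totals gives 53, i.e. the new defaults use a bit less than half the peak virtual memory of the HEAD-equivalent settings in this measurement.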
Fixes #4424