On 32-bit, when the private loader calls functions in potentially third-party libraries, it needs to align the stack to 16 bytes, to account for the mismatch between DR's 4-byte stack alignment (i#847) and the 16-byte alignment expected under the new gcc ABI.
Tested on libdrmemtrace, which when built with gcc 8.3.0 contains SIMD code that crashes without this alignment.
This is a short-term fix. More issues remain with clean calls, and we should perhaps consider changing our base alignment.
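
For illustration only, the alignment arithmetic described above can be sketched in C as below. This is not the code in this change: the helper names `align_stack_down` and `stack_align_padding` and both constants are hypothetical, and the sketch only shows how a 4-byte-aligned stack pointer is rounded down to a 16-byte boundary. A real call site would also have to account for any arguments pushed before the call, since the alignment must hold at the call instruction itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not DR's actual definitions. */
#define BASE_STACK_ALIGNMENT 4u  /* alignment assumed to hold before the fix-up */
#define ABI_STACK_ALIGNMENT 16u  /* alignment the callee is assumed to expect */

/* Round a downward-growing stack pointer down to a multiple of align.
 * align must be a nonzero power of two.
 */
static uintptr_t
align_stack_down(uintptr_t sp, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & ~(align - 1);
}

/* Number of bytes skipped when sp is rounded down to align. */
static size_t
stack_align_padding(uintptr_t sp, uintptr_t align)
{
    return (size_t)(sp - align_stack_down(sp, align));
}
```

Starting from a 4-byte-aligned stack pointer, the padding computed this way is always 0, 4, 8, or 12 bytes, so the cost of the fix-up is bounded and small.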
Issue: #847 (closed), #3966 (closed)