The computation in signal_thread_init() did not guarantee that the computed size is divisible by the requested alignment; it only guaranteed that the size is large enough to hold the extended fpstate, including xstate. We have observed machines with an extended state that is 8 bytes larger. While it is unclear which machine or kernel feature causes this, this fix aligns the special allocator's block size, so that the requested alignment that is already there is actually respected.
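As a rough illustration of the idea (not the actual patch), rounding a block size up to a power-of-two alignment could look like the sketch below. The helper name `align_forward_size` and the numbers in the usage note are hypothetical and chosen only for illustration. The sketch assumes the allocator hands out blocks back to back from an aligned base, so each block is aligned only if the block size is a multiple of the alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the patch: round `size` up to the
 * next multiple of `alignment`, which must be a nonzero power of two. */
static size_t
align_forward_size(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* Adding alignment-1 and masking off the low bits rounds upward. */
    return (size + alignment - 1) & ~(alignment - 1);
}
```

For example, with an illustrative 64-byte alignment, a size of 2696 bytes (8 bytes past a multiple of 64) would be rounded up to 2752, while a size of 2688 would be left unchanged.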
Fixes #3749 (closed)