Adds support for dynamic AVX-512 context switching to and from the code cache. DynamoRIO now saves and restores the AVX-512 context only after AVX-512 code has been detected in the application; DynamoRIO refers to this as lazy context switching. Mask registers are saved at a width that depends on whether the processor supports AVX-512BW.
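To illustrate what "lazy" means for the amount of state moved per context switch, here is a minimal sketch. It is not DynamoRIO's code: sim_state_t, on_avx512_detected() and simd_bytes_to_save() are hypothetical names, and it assumes that before detection the 32-byte ymm registers are saved (16 on x64, 8 on 32-bit), and that afterwards all zmm registers (32 on x64, 8 on 32-bit) plus all eight opmask registers k0-k7 are saved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch, not DynamoRIO's implementation. */
#define NUM_OPMASK_REGS 8 /* assumption: k0-k7 are all saved */

typedef struct {
    bool avx512_code_in_use; /* flips to true once, on first AVX-512 use */
    bool has_avx512bw;       /* processor feature: 64-bit mask registers */
    bool x64;                /* 64-bit mode: 32 zmm registers instead of 8 */
} sim_state_t;

/* Called when an AVX-512 instruction is first seen in the application. */
static void on_avx512_detected(sim_state_t *s) {
    assert(s != NULL);
    s->avx512_code_in_use = true;
}

/* Bytes of SIMD state a context switch would move under the lazy policy. */
static size_t simd_bytes_to_save(const sim_state_t *s) {
    if (!s->avx512_code_in_use) {
        size_t num_regs = s->x64 ? 16 : 8; /* ymm registers (assumes AVX) */
        return num_regs * 32;              /* 32 bytes per ymm register */
    }
    size_t num_regs = s->x64 ? 32 : 8;     /* zmm registers */
    size_t mask_size = s->has_avx512bw ? 8 : 2;
    return num_regs * 64 + NUM_OPMASK_REGS * mask_size;
}
```

On x64 with AVX-512BW, this model moves 512 bytes per switch until detection and 2112 bytes afterwards, which is the cost the lazy policy avoids for applications that never use AVX-512.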
Adds the internal function proc_set_num_simd_saved(), which changes the number of saved SIMD registers reported by proc_num_simd_saved().
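The getter/setter relationship can be sketched as follows. This is only an illustration: the storage, the initial value of 16 (x64 before AVX-512 detection), and the upper bound of 32 are assumptions, and the real functions may add locking or data-section protection that is not shown here.

```c
#include <assert.h>

/* Hypothetical sketch of the getter/setter pair; not DynamoRIO's code. */
static int num_simd_saved = 16; /* assumed initial value: x64, no AVX-512 */

/* Number of SIMD registers the context switch currently saves. */
int proc_num_simd_saved(void) {
    return num_simd_saved;
}

/* Changes the count, e.g. to 32 on x64 once AVX-512 is detected. */
void proc_set_num_simd_saved(int num) {
    assert(num > 0 && num <= 32); /* assumed bound: zmm0-zmm31 */
    num_simd_saved = num;
}
```

A caller that detects AVX-512 on x64 would call proc_set_num_simd_saved(32), after which every consumer of proc_num_simd_saved() sees the larger count.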
Adds the internal function move_mm_avx512_reg_opcode(). It is intentionally kept separate from move_mm_reg_opcode() because the two are used in different places.
Renames FEATURE_AVX512 to FEATURE_AVX512F and adds FEATURE_AVX512BW. Likewise, renames OPMASK_REG_SIZE to OPMASK_AVX512BW_REG_SIZE and adds OPMASK_AVX512F_REG_SIZE. Both changes reflect that the width of the mask registers depends on whether the processor supports AVX-512BW (16 bits with AVX-512F alone, 64 bits with AVX-512BW). By default, DynamoRIO reserves enough space for the wider variant, but uses kmovw (AVX-512F) or kmovq (AVX-512BW) to read and write the mask registers.
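The split between reserved slot size and transfer size can be sketched like this. The two macro values follow from the ISA (16-bit masks with AVX-512F, 64-bit masks with AVX-512BW); OPMASK_SLOT_SIZE, opmask_transfer_size() and opmask_move_mnemonic() are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mask register sizes in bytes, as implied by the ISA. */
#define OPMASK_AVX512F_REG_SIZE 2  /* 16-bit masks (AVX-512F) */
#define OPMASK_AVX512BW_REG_SIZE 8 /* 64-bit masks (AVX-512BW) */

/* Hypothetical: space is always reserved for the wider variant. */
#define OPMASK_SLOT_SIZE OPMASK_AVX512BW_REG_SIZE

/* Bytes actually transferred per mask register on this processor. */
static size_t opmask_transfer_size(bool has_avx512bw) {
    size_t size = has_avx512bw ? OPMASK_AVX512BW_REG_SIZE : OPMASK_AVX512F_REG_SIZE;
    assert(size <= OPMASK_SLOT_SIZE); /* always fits in the reserved slot */
    return size;
}

/* Instruction used to move a mask register to or from memory. */
static const char *opmask_move_mnemonic(bool has_avx512bw) {
    return has_avx512bw ? "kmovq" : "kmovw";
}
```

Reserving the wider slot unconditionally keeps the saved-context layout identical across processors; only the instruction, and therefore the number of bytes touched, varies.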
On x86, the code-cache context switch now saves the SIMD registers after saving the flags, because saving SIMD state now involves control flow whose comparison would otherwise clobber the application's flags. Correspondingly, the SIMD registers are now restored before the flags.
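The ordering constraint can be modeled with a simplified list of emitted steps. This sketch is hypothetical: step_t, emit_save(), emit_restore() and index_of() are illustrative names, and the general-purpose register handling of the real context switch is omitted. The point it captures is that the compare on the AVX-512 in-use flag must come after the flags are saved and before they are restored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the emitted context-switch steps. */
typedef enum {
    SAVE_FLAGS,        /* application flags are now preserved */
    CHECK_AVX512_USED, /* compare on the in-use flag: clobbers arithmetic flags */
    SAVE_SIMD,         /* takes the AVX-512 or the narrower path */
    RESTORE_SIMD,
    RESTORE_FLAGS,     /* must follow the last flag-clobbering compare */
} step_t;

static size_t emit_save(step_t *out) {
    size_t n = 0;
    out[n++] = SAVE_FLAGS;
    out[n++] = CHECK_AVX512_USED;
    out[n++] = SAVE_SIMD;
    return n;
}

static size_t emit_restore(step_t *out) {
    size_t n = 0;
    out[n++] = CHECK_AVX512_USED;
    out[n++] = RESTORE_SIMD;
    out[n++] = RESTORE_FLAGS;
    return n;
}

/* Position of step in seq, or -1 if it is absent. */
static int index_of(const step_t *seq, size_t n, step_t step) {
    for (size_t i = 0; i < n; i++) {
        if (seq[i] == step)
            return (int)i;
    }
    return -1;
}
```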
Moves the global variable d_r_avx512_code_in_use to the reachable heap so that code in the code cache can access it.
Adds an ATOMIC_1BYTE_WRITE() macro.
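A single-byte atomic store of the kind such a macro provides can be sketched with the GCC/Clang __atomic builtins. This is an assumption-laden illustration: SKETCH_ATOMIC_1BYTE_WRITE, sketch_avx512_code_in_use and sketch_mark_avx512_in_use() are hypothetical, and DynamoRIO's actual ATOMIC_1BYTE_WRITE() may take different parameters and be implemented differently (for example, with inline assembly).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: a 1-byte store that concurrent readers observe
 * either entirely or not at all. Requires GCC or Clang. */
#define SKETCH_ATOMIC_1BYTE_WRITE(target, value)                          \
    do {                                                                  \
        _Static_assert(sizeof(*(target)) == 1, "target must be 1 byte"); \
        __atomic_store_n((target), (value), __ATOMIC_SEQ_CST);            \
    } while (0)

/* Example use: publish that AVX-512 code has been seen. */
static bool sketch_avx512_code_in_use = false;

static void sketch_mark_avx512_in_use(void) {
    SKETCH_ATOMIC_1BYTE_WRITE(&sketch_avx512_code_in_use, true);
}
```

The static assertion rejects targets wider than one byte at compile time, so the macro cannot silently be used for a multi-byte flag.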
Adds a per-thread LOG entry when AVX-512 code is detected.
Adds an AVX-512 test for the above changes. The test also runs in 32-bit mode.
Issue: #1312