Extends insert_push_all_registers() and insert_pop_all_registers() with AVX-512 context. As with code cache context switching, flags are now saved before the SIMD registers are saved and restored after the SIMD registers are restored.
Adds the function proc_num_opmask_registers() and replaces some internal uses of MCXT_NUM_OPMASK_SLOTS with the new function.
Extends the clean call optimization analysis to check for mask registers.
Fixes two latent bugs in the clean call optimization analysis. First, adds ymm register detection when checking for used SIMD registers. Second, fixes a bug in how num_regs_skip is decremented.
Adds zmm register detection to clean call optimization analysis.
Adds the statistics counter cleancall_opmask_skipped.
Switches the default AVX-512 vmov opcodes from vmovdq[au]64 to vmov[au]ps. This is not functionally needed, but makes it clearer that there are no element alignment issues to deal with.
Adds support for returning AVX-512 mask register values to reg_get_value_ex().
Fixes the incorrect value of OPMASK_AVX512F_REG_SIZE.
Extends instr_may_write_to_zmm_register() to also look for opmask registers and renames it to instr_may_write_to_zmm_or_opmask_register().
Adds multiple tests for the above to client.avx512ctx. Refactors the test. Adds an AVX-512 derivative of the client.cleancall-opt-1 test.
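For reviewers, the SIMD detection described above for the clean call analysis can be illustrated with a minimal, self-contained sketch. It is not the code in this patch: the register numbering (SK_REG_*), simd_slot(), and count_skippable_simd() are hypothetical names invented for illustration, and the slot count of 32 is an assumption matching 64-bit AVX-512. The sketch only shows the idea that xmmN, ymmN, and zmmN alias the same save slot, so a reference to any of the three widths must mark the slot as used, and the skip counter should drop at most once per slot.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption for this sketch: 32 SIMD save slots (zmm0-zmm31). */
#define NUM_SIMD_SLOTS 32

/* Hypothetical register numbering, not DynamoRIO's: three contiguous
 * ranges, one per register width.
 */
enum {
    SK_REG_XMM0 = 0,
    SK_REG_YMM0 = SK_REG_XMM0 + NUM_SIMD_SLOTS,
    SK_REG_ZMM0 = SK_REG_YMM0 + NUM_SIMD_SLOTS,
    SK_REG_SIMD_END = SK_REG_ZMM0 + NUM_SIMD_SLOTS
};

/* Maps an xmm, ymm, or zmm register to its shared save slot.
 * Returns -1 for anything that is not a SIMD register.
 */
static int
simd_slot(int reg)
{
    if (reg < SK_REG_XMM0 || reg >= SK_REG_SIMD_END)
        return -1;
    return (reg - SK_REG_XMM0) % NUM_SIMD_SLOTS;
}

/* Marks every slot referenced by the callee as used and returns how many
 * slots the clean call could skip saving. The counter is decremented only
 * the first time a slot is seen, regardless of register width.
 */
static unsigned
count_skippable_simd(const int *refs, size_t num_refs, bool used[NUM_SIMD_SLOTS])
{
    unsigned num_skip = NUM_SIMD_SLOTS;
    for (size_t i = 0; i < NUM_SIMD_SLOTS; i++)
        used[i] = false;
    for (size_t i = 0; i < num_refs; i++) {
        int slot = simd_slot(refs[i]);
        if (slot < 0)
            continue; /* Not a SIMD register: ignore. */
        if (!used[slot]) {
            used[slot] = true;
            num_skip--;
        }
    }
    return num_skip;
}
```

In this sketch, a callee that references xmm1, ymm1, and zmm17 uses two slots, so 30 of the 32 slots could be skipped.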
Issue: #1312