Adds the function get_opmask_caller_saved(), which saves the 16-bit AVX-512 OpMask registers to a buffer. It does not yet support AVX512BW, which requires saving the full 64-bit registers.
Adds a test for get_opmask_caller_saved().
Adds the MCXT_NUM_OPMASK_SLOTS define, which will also be used by a future structure in DynamoRIO's mcontext.
Adds the missing release note for the addition of dr_zmm_t, and adds release notes for this patch.
Issue: #1312