Adds the function dr_mcontext_zmm_fields_valid(), which returns true once AVX-512 has been observed in the main interp loop. The detection is written to keep its overhead low. Post-client instructions are checked with a new function described below.
Adds the internal function instr_may_write_zmm_register() that conservatively returns whether an instruction writes a zmm register.
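As a rough illustration of the two items above, the sketch below shows one way a sticky "AVX-512 observed" flag could be driven by a conservative per-instruction check. It is not the code in this change: sketch_instr_t, its uses_evex_prefix field, and all sketch_* names are hypothetical, and treating every EVEX-encoded instruction as a possible zmm write is only an assumption about what "conservative" might mean here. The sketch also ignores thread safety.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a decoded instruction; not DynamoRIO's instr_t. */
typedef struct {
    bool uses_evex_prefix;
} sketch_instr_t;

/* Sticky flag: set the first time a possible zmm write is seen, never cleared. */
static bool sketch_zmm_observed = false;

/* Conservative check: may return true for instructions that do not actually
 * write a zmm register, but never false for one that does (under the
 * assumption that every zmm write is EVEX-encoded).
 */
static bool
sketch_may_write_zmm(const sketch_instr_t *instr)
{
    return instr->uses_evex_prefix;
}

/* Scans one block of instructions. Once the flag is set the scan is skipped,
 * which keeps the steady-state cost to a single branch.
 */
static void
sketch_scan_block(const sketch_instr_t *instrs, size_t count)
{
    if (sketch_zmm_observed)
        return;
    for (size_t i = 0; i < count; i++) {
        if (sketch_may_write_zmm(&instrs[i])) {
            sketch_zmm_observed = true;
            return;
        }
    }
}

/* Query in the spirit of dr_mcontext_zmm_fields_valid(). */
static bool
sketch_zmm_fields_valid(void)
{
    return sketch_zmm_observed;
}
```

The early return in sketch_scan_block() is the part aimed at keeping overhead low: blocks are only inspected until the first possible zmm write is found.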
Adds proc_avx512_enabled() and support for reading the AVX-512 feature bits, similar to proc_avx_enabled().
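For context, a minimal sketch of the kind of check such a query typically performs, written as a pure function over already-read register values so it can be run anywhere. The bit positions are those documented in the Intel SDM; the function name, parameter names, and macro names are hypothetical and this is not necessarily how proc_avx512_enabled() is implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 27: the OS has enabled XSAVE, so XCR0 can be read. */
#define SKETCH_CPUID_1_ECX_OSXSAVE (1u << 27)
/* CPUID.(EAX=07H,ECX=0):EBX bit 16: AVX-512 Foundation. */
#define SKETCH_CPUID_7_EBX_AVX512F (1u << 16)
/* XCR0 state-component bits. */
#define SKETCH_XCR0_SSE (1u << 1)
#define SKETCH_XCR0_AVX (1u << 2)
#define SKETCH_XCR0_OPMASK (1u << 5)    /* k0-k7 */
#define SKETCH_XCR0_ZMM_HI256 (1u << 6) /* upper 256 bits of zmm0-zmm15 */
#define SKETCH_XCR0_HI16_ZMM (1u << 7)  /* zmm16-zmm31 */

/* Returns true only if the processor reports AVX-512F and the OS has enabled
 * all of the state components that AVX-512 requires.
 */
static bool
sketch_avx512_enabled(uint32_t cpuid_1_ecx, uint32_t cpuid_7_ebx, uint64_t xcr0)
{
    const uint64_t needed = SKETCH_XCR0_SSE | SKETCH_XCR0_AVX | SKETCH_XCR0_OPMASK |
        SKETCH_XCR0_ZMM_HI256 | SKETCH_XCR0_HI16_ZMM;
    if ((cpuid_1_ecx & SKETCH_CPUID_1_ECX_OSXSAVE) == 0)
        return false;
    if ((cpuid_7_ebx & SKETCH_CPUID_7_EBX_AVX512F) == 0)
        return false;
    return (xcr0 & needed) == needed;
}
```

Checking XCR0 in addition to the CPUID feature bit matters because a processor can support AVX-512 while the OS has not enabled saving the extra register state.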
Renames get_implied_mm_vex_opcode_bytes() to get_implied_mm_e_vex_opcode_bytes().
Adds a test for the above that covers the full decode case.
In an unrelated change, pulls performance-critical code into DO_ONCE in the decoder.
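The general do-once pattern can be sketched as follows. This is a simplified, non-thread-safe stand-in and not DynamoRIO's actual DO_ONCE definition; SKETCH_DO_ONCE, sketch_expensive_check(), and sketch_decode_byte() are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Runs `statement` the first time control reaches this site and skips it on
 * every later pass. Each expansion gets its own static guard variable.
 */
#define SKETCH_DO_ONCE(statement)         \
    do {                                  \
        static bool sketch_done_ = false; \
        if (!sketch_done_) {              \
            sketch_done_ = true;          \
            statement;                    \
        }                                 \
    } while (0)

/* Counts how often the costly work actually runs. */
static int sketch_expensive_calls = 0;

static void
sketch_expensive_check(void)
{
    sketch_expensive_calls++;
}

/* Hot-path routine: the costly check runs only on the first call. */
static int
sketch_decode_byte(int byte)
{
    SKETCH_DO_ONCE(sketch_expensive_check());
    return byte & 0xff;
}
```

After the first call, the hot path pays only for one predictable branch on the guard variable.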
Fixes a cmake bug introduced in 6ca8d4a9 that set 0FF instead of OFF.
Issue: #1312