Added support for AVX512 bfloat16 instructions
These are the three bfloat16 instructions.
VCVTNE2PS2BF16—Convert Two Packed Single Data to One Packed BF16 Data
EVEX.128.F2.0F38.W0 72 /r VCVTNE2PS2BF16 xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
EVEX.256.F2.0F38.W0 72 /r VCVTNE2PS2BF16 ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
EVEX.512.F2.0F38.W0 72 /r VCVTNE2PS2BF16 zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
| Op/En | Tuple | Operand 1 | Operand 2 | Operand 3 |
| --- | --- | --- | --- | --- |
| A | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) |
VCVTNEPS2BF16—Convert Packed Single Data to Packed BF16 Data
EVEX.128.F3.0F38.W0 72 /r VCVTNEPS2BF16 xmm1{k1}{z}, xmm2/m128/m32bcst
EVEX.256.F3.0F38.W0 72 /r VCVTNEPS2BF16 xmm1{k1}{z}, ymm2/m256/m32bcst
EVEX.512.F3.0F38.W0 72 /r VCVTNEPS2BF16 ymm1{k1}{z}, zmm2/m512/m32bcst
| Op/En | Tuple | Operand 1 | Operand 2 |
| --- | --- | --- | --- |
| A | Full | ModRM:reg (w) | ModRM:r/m (r) |
VDPBF16PS—Dot Product of BF16 Pairs Accumulated into Packed Single Precision
EVEX.128.F3.0F38.W0 52 /r VDPBF16PS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
EVEX.256.F3.0F38.W0 52 /r VDPBF16PS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
EVEX.512.F3.0F38.W0 52 /r VDPBF16PS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
| Op/En | Tuple | Operand 1 | Operand 2 | Operand 3 |
| --- | --- | --- | --- | --- |
| A | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) |
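For reviewers unfamiliar with the data format, here is a minimal scalar sketch of what one lane of an FP32 to BF16 conversion with round-to-nearest-even might look like. The function name `toy_fp32_to_bf16_rne` is hypothetical and is not part of DynamoRIO. The handling of denormals (flushed to signed zero) and NaNs (quieted) reflects my reading of the ISA pseudocode and should be treated as an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of one FP32 -> BF16 lane (not DynamoRIO code).
 * BF16 keeps the top 16 bits of an FP32 value: sign, 8 exponent bits and
 * 7 fraction bits. */
uint16_t
toy_fp32_to_bf16_rne(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof(x)); /* Reinterpret the bits without aliasing issues. */

    uint32_t exponent = (x >> 23) & 0xff;
    uint32_t fraction = x & 0x7fffff;

    if (exponent == 0) {
        /* Zero or denormal: assumed to produce a zero with the input's sign. */
        return (uint16_t)((x >> 16) & 0x8000);
    }
    if (exponent == 0xff) {
        if (fraction != 0) {
            /* NaN: truncate and set the top fraction bit to keep it a quiet NaN. */
            return (uint16_t)((x >> 16) | 0x0040);
        }
        return (uint16_t)(x >> 16); /* Infinity truncates exactly. */
    }
    /* Normal value: round to nearest, ties to even. Adding 0x7fff plus the
     * lowest kept bit rounds a tie up only when the kept value is odd. */
    uint32_t lsb = (x >> 16) & 1;
    x += 0x7fff + lsb;
    return (uint16_t)(x >> 16);
}
```

Under these assumptions, `toy_fp32_to_bf16_rne(1.0f)` yields `0x3f80`, and a value exactly halfway between two BF16 values rounds to the one whose lowest bit is zero.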
List of places to update
From https://github.com/DynamoRIO/dynamorio/blob/master/core/ir/x86/opcode_api.h#L53
* When adding new instructions, be sure to update all of these places:
* 1) decode_table op_instr array
* 2) decode_table decoding table entries
* 3) OP_ enum (here) via x86opnums.pl
* 4) update OP_LAST at end of enum here
* 5) decode_fast tables if necessary (they are conservative)
* 6) instr_create macros
* 7) suite/tests/api/ir* tests
* 8) add binutils tests in third_party/binutils/test_decenc
Step 1: update op_instr array
Added entries to op_instr. These point directly to evex_Wb_extensions since these instructions only have evex encoding.
Step 2: add decode_table entries
- updated third_byte_38 table to point to prefix_extensions since these instructions have common opcodes and differ in prefix.
  - The instructions VCVTNEPS2BF16 and VCVTNE2PS2BF16 have three-byte opcodes starting with 0f 38, so the decoder looks at third_byte_38[third_byte_38_index[opcode]].
  - Since these instructions have the same opcode (72) and differ only in the prefix (f2/f3), we need to point the third_byte_38 entry to prefix_extensions, which in turn points to the appropriate EVEX_Wb entries.
  - The instruction VDPBF16PS has the same opcode (52) as the VNNI instruction vpdpwssd and they differ only in the prefix (F3/66). We need to update that entry to point to prefix_extensions instead of e_vex_extensions. This causes the e_vex_extensions entry (e_vex ext 151) to be orphaned. Do we remove this entry?
- added entries in prefix_extensions to point to appropriate vex/evex entries
- added leaf entries in evex_Wb_extensions
Updated opcodes for invalid entries in e_vex ext 151 and 152 for consistency.
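To illustrate the idea behind Step 2, the following is a toy model of prefix-based dispatch for opcodes in the 0f 38 map. It is not DynamoRIO's decode table layout: the types `toy_prefix_t` and `toy_prefix_ext_t`, the table `toy_prefix_extensions`, and the function `toy_decode_0f38` are all hypothetical names invented for this sketch. Slots left as NULL are simply not modeled here and may well be valid instructions in the real tables.

```c
#include <stddef.h>

/* Hypothetical prefix selector (not a DynamoRIO type). */
typedef enum { TOY_PFX_NONE, TOY_PFX_66, TOY_PFX_F3, TOY_PFX_F2, TOY_PFX_COUNT } toy_prefix_t;

/* One row per shared opcode; the prefix picks the leaf instruction. */
typedef struct {
    unsigned char opcode;                 /* Third opcode byte after 0f 38. */
    const char *by_prefix[TOY_PFX_COUNT]; /* NULL = not modeled in this sketch. */
} toy_prefix_ext_t;

static const toy_prefix_ext_t toy_prefix_extensions[] = {
    /* opcode   none   66          F3               F2 */
    { 0x52, { NULL, "vpdpwssd", "vdpbf16ps", NULL } },
    { 0x72, { NULL, NULL, "vcvtneps2bf16", "vcvtne2ps2bf16" } },
};

/* Looks up the shared opcode first, then disambiguates by prefix. */
const char *
toy_decode_0f38(unsigned char opcode, toy_prefix_t pfx)
{
    size_t n = sizeof(toy_prefix_extensions) / sizeof(toy_prefix_extensions[0]);
    for (size_t i = 0; i < n; i++) {
        if (toy_prefix_extensions[i].opcode == opcode)
            return toy_prefix_extensions[i].by_prefix[pfx];
    }
    return NULL; /* Opcode not present in this toy table. */
}
```

The point of the sketch is only that a single opcode slot has to fan out on the prefix before reaching a leaf entry, which is why the 52 entry can no longer point straight at a vex/evex selector.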
Step 3: add OP_ enums
Done
Step 4: update OP_LAST
Not needed since OP_LAST already points to the last enum.
Step 5: decode_fast tables if necessary
Not done
Step 6: instr_create macros
Added 1dst_3src macros for VCVTNE2PS2BF16 and VDPBF16PS since they write to operand 1 and read from the mask register, operand 2, and operand 3.
Added 1dst_2src macro for VCVTNEPS2BF16 since it writes to operand 1 and reads from the mask register and operand 2. We are setting the destination size explicitly since this instruction writes to only "half" the destination.
Step 7: suite/tests/api/ir tests
Added tests in ir_x86_3args_avx512_evex_mask.h and ir_x86_4args_avx512_evex_mask_C.h.
The VCVTNEPS2BF16 test is currently commented out because the destination size needs to be set explicitly.
Step 8: binutils tests
Added binutils tests that encode the assembly instructions using the instr_create_.. APIs and match against the opcode bytes, rather than the opposite, because we don't produce disassembly that can match exactly against binutils disassembly.
These currently have two workarounds:
- set dest size explicitly
- set zeroing prefix explicitly
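As a rough aid for checking expected bytes in such tests by hand, here is a sketch that builds the six bytes of an unmasked, register-register EVEX.0F38.W0 encoding. The helper `toy_evex_0f38_rr` and its parameters are hypothetical and do not correspond to any DynamoRIO or binutils API. It assumes no masking, no zeroing, no broadcast, and register numbers in the range 0 to 31.

```c
#include <stdint.h>

/* Hypothetical helper (not DynamoRIO's encoder).
 * len: 0 = 128-bit, 1 = 256-bit, 2 = 512-bit.
 * pp:  0 = no prefix, 1 = 66, 2 = F3, 3 = F2.
 * dst goes in ModRM.reg, src1 in EVEX.vvvv, src2 in ModRM.r/m. */
void
toy_evex_0f38_rr(uint8_t out[6], unsigned len, unsigned pp, uint8_t opcode,
                 unsigned dst, unsigned src1, unsigned src2)
{
    out[0] = 0x62; /* EVEX escape byte. */
    /* P0: ~R ~X ~B ~R' 0 0 m m, with mm = 10b selecting the 0F38 map.
     * For a register r/m operand, X carries bit 4 of the register number. */
    out[1] = (uint8_t)((((~dst >> 3) & 1) << 7) | (((~src2 >> 4) & 1) << 6) |
                       (((~src2 >> 3) & 1) << 5) | (((~dst >> 4) & 1) << 4) | 0x02);
    /* P1: W ~vvvv 1 pp, with W = 0. */
    out[2] = (uint8_t)(((~src1 & 0xf) << 3) | 0x04 | (pp & 3));
    /* P2: z L'L b ~V' aaa, with z = 0, b = 0 and aaa = 000 (no mask). */
    out[3] = (uint8_t)(((len & 3) << 5) | (((~src1 >> 4) & 1) << 3));
    out[4] = opcode;
    /* ModRM with mod = 11b (register-direct). */
    out[5] = (uint8_t)(0xc0 | ((dst & 7) << 3) | (src2 & 7));
}
```

Under these assumptions, calling the helper with `len = 2`, `pp = 3`, `opcode = 0x72`, `dst = 1`, `src1 = 2`, `src2 = 3` (that is, VCVTNE2PS2BF16 zmm1, zmm2, zmm3 in Intel operand order) produces `62 f2 6f 48 72 cb`.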