AArch64 decoder bug on 128-bit SIMD variant of STUR/LDUR instructions
Created by: mwvantol
While working with drcachesim, we discovered that the 128-bit SIMD variants of the STUR/LDUR instructions report an incorrect memory access size of 1 byte instead of the expected 16 bytes. These are quadword loads/stores to the 128-bit FP/SIMD registers.
I noticed that the STUR/LDUR instructions are flagged as 'mem9' in the decoder template in core/arch/aarch64/codec.txt, where 'mem9' is defined as:
??---------xxxxxxxxx--xxxxx----- mem9 # gets size from 31:30
However, according to the Arm ARM, bits [23:22] form an additional opc field that is also used to determine the size (in addition to the size field in [31:30]), though effectively only bit 23 matters here:
8-bit variant
Applies when size == 00 && opc == 00
STUR <Bt>, [<Xn|SP>{, #<simm>}]
16-bit variant
Applies when size == 01 && opc == 00
STUR <Ht>, [<Xn|SP>{, #<simm>}]
32-bit variant
Applies when size == 10 && opc == 00
STUR <St>, [<Xn|SP>{, #<simm>}]
64-bit variant
Applies when size == 11 && opc == 00
STUR <Dt>, [<Xn|SP>{, #<simm>}]
128-bit variant
Applies when size == 00 && opc == 10
STUR <Qt>, [<Xn|SP>{, #<simm>}]
This also shows that if the 'opc' field is ignored, the 128-bit variant maps onto the 8-bit variant, which matches the incorrect 1-byte size I see reported. I don't know whether this can affect applications, but it certainly affects the cache model behavior.
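To make the size computation concrete, here is a minimal Python sketch of how the access size could be derived from both fields. It is not DynamoRIO code: the function names are hypothetical, and the rule used (a scale formed from bit 23 followed by bits [31:30], with access size 1 << scale bytes) is my reading of the Arm ARM decode pseudocode and should be checked against the manual.

```python
def size_from_size_field_only(enc: int) -> int:
    """Access size in bytes using only bits [31:30] (the behavior described for 'mem9')."""
    size = (enc >> 30) & 0x3
    return 1 << size


def simd_unscaled_access_size(enc: int) -> int:
    """Access size in bytes for the SIMD&FP STUR/LDUR encodings.

    Assumption: scale = opc<1>:size, i.e. bit 23 is the high bit on top of
    bits [31:30], and the access size is 1 << scale bytes.
    """
    size = (enc >> 30) & 0x3   # bits [31:30]
    opc = (enc >> 22) & 0x3    # bits [23:22]
    scale = ((opc >> 1) << 2) | size
    if scale > 4:
        raise ValueError("unallocated encoding: scale > 4")
    return 1 << scale


# Hand-assembled encodings with Rn = X0, Rt = 0, imm9 = 0.
STUR_B0 = 0x3C000000  # size == 00, opc == 00
STUR_D0 = 0xFC000000  # size == 11, opc == 00
STUR_Q0 = 0x3C800000  # size == 00, opc == 10
LDUR_Q0 = 0x3CC00000  # size == 00, opc == 11

print(size_from_size_field_only(STUR_Q0))   # 1, the incorrect size
print(simd_unscaled_access_size(STUR_Q0))   # 16, the expected size
```

With only the size field, STUR_Q0 and STUR_B0 are indistinguishable, which is the aliasing described above.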