Support using the original encoding template even after IR modifications
Created by: rlyerly
Decoding an instruction into DynamoRIO's IR format and then directly re-encoding back to machine code may change the machine-level encoding type. This happens even if the instruction is never modified using the IR APIs. Below is an example of dumping a function's instructions using DynamoRIO as a standalone decoder/re-encoder:
byte *start, *end; // Start & end address of memory-mapped function
byte *real; // Real virtual address of function
byte *prev;
... (retrieve real, start & end) ...
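while(start < end) {
  // instr_create() already returns an initialized instr_t, so no instr_init() is needed
  instr_t *instr = instr_create(GLOBAL_DCONTEXT);
  prev = start;
  // Decode the bytes at start as if they were located at real
  start = decode_from_copy(GLOBAL_DCONTEXT, start, real, instr);
  std::cout << "Instruction size: " << instr_length(GLOBAL_DCONTEXT, instr) << std::endl;
  disassemble_with_bytes(GLOBAL_DCONTEXT, real, STDERR);
  // instr_destroy() also frees the instr_t itself; instr_free() would leak it
  instr_destroy(GLOBAL_DCONTEXT, instr);
  real += start - prev;
}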
The while-loop decodes all instructions in a function (mapped into memory from an on-disk binary) and prints their sizes using instr_length(). This forces DynamoRIO to re-encode each instruction to determine its size, since the loop uses decode_from_copy() to decode instructions. This code produces the following output for a given binary:
Instruction size: 10
48 8b 04 25 d0 35 92 mov 0x009235d0[8byte] -> %rax
00
The instruction's original size is 8 bytes (the bytes shown in the disassembly), but DynamoRIO's re-encoding process changes the machine-level encoding so that it is now 10 bytes. According to the discussion here, this is because DynamoRIO walks an encoding template from specialized to general encoding types. In this particular situation, DynamoRIO found a more specialized encoding for the instruction than the one emitted by the compiler. The change in the instruction's size is a side effect of the change in encoding.
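To make the size difference concrete, here is a minimal sketch comparing the original 8 bytes with one candidate for the 10-byte form. The 8-byte array is copied from the output above. The 10-byte array is an assumption: it is the x86-64 `MOV RAX, moffs64` form (REX.W + A1), which is specific to rax and is 10 bytes long, but the issue does not show the bytes DynamoRIO actually emitted. The names `kModrmForm`, `kMoffsForm`, and `read_le` are made up for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Original 8-byte encoding, copied from the disassembly output:
// REX.W + 8B /r with a SIB byte and a 32-bit absolute displacement.
constexpr std::array<std::uint8_t, 8> kModrmForm = {
    0x48,                   // REX.W: 64-bit operand size
    0x8b,                   // MOV r64, r/m64
    0x04,                   // ModRM: mod=00, reg=rax, rm=100 (SIB follows)
    0x25,                   // SIB: no index, no base, disp32 follows
    0xd0, 0x35, 0x92, 0x00  // disp32 = 0x009235d0 (little-endian)
};

// ASSUMED 10-byte re-encoding (not shown in the issue):
// REX.W + A1, i.e. MOV RAX, moffs64 with a full 64-bit absolute address.
constexpr std::array<std::uint8_t, 10> kMoffsForm = {
    0x48,                                           // REX.W
    0xa1,                                           // MOV rAX, moffs
    0xd0, 0x35, 0x92, 0x00, 0x00, 0x00, 0x00, 0x00  // moffs64 = 0x009235d0
};

// Reads an n-byte little-endian value starting at p (n <= 8).
inline std::uint64_t read_le(const std::uint8_t *p, std::size_t n) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i)
        v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    return v;
}
```

Both forms load from the same address: `read_le(kModrmForm.data() + 4, 4)` and `read_le(kMoffsForm.data() + 2, 8)` both yield 0x009235d0. Only the opcode and the width of the address field differ, which accounts for the 2 extra bytes.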
Being able to control the encoding type may provide more flexibility for users, especially in cases where the user needs explicit control. For example, when using DynamoRIO as a standalone decoder/re-encoder, users may not want the code size to change, as that invalidates control flow targets. Exposing encoding controls may lead to finicky APIs, however, especially when encoding- or user-specified restrictions conflict. For example, what if the user requests an encoding type that is not compatible with the instruction's operands?
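The effect on control flow targets can be sketched with a few lines of arithmetic. The helper below (`start_offsets`, a name invented here) computes where each instruction starts given a list of instruction sizes. The sizes in the usage note, other than the 8 and 10 from this issue, are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Returns the start offset of each instruction within a function,
// given the size in bytes of each instruction in order.
std::vector<std::size_t> start_offsets(const std::vector<std::size_t> &sizes) {
    std::vector<std::size_t> offsets;
    offsets.reserve(sizes.size());
    std::size_t off = 0;
    for (std::size_t s : sizes) {
        offsets.push_back(off);  // this instruction starts where the previous one ended
        off += s;
    }
    return offsets;
}
```

With hypothetical sizes `{8, 5, 3}`, the instructions start at offsets 0, 8, and 13. If the first instruction is re-encoded as 10 bytes, the sizes become `{10, 5, 3}` and the starts move to 0, 10, and 15. A branch elsewhere that still targets offset 13 would then land in the middle of the second instruction.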
As a step in that direction, DynamoRIO could expose an API to allow the user to specify that it wants to use the same encoding as the original instruction, e.g., instr_use_orig_encoding(instr_t *instr) or instr_encode(void *drcontext, instr_t *instr, byte *pc, bool orig_encoding). If the user changed the instruction or operands in such a way that the original encoding is invalid, DynamoRIO could return an error code or nullptr indicating the encoding failed.
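The failure case can be modeled outside DynamoRIO with a small standalone sketch. It is not DynamoRIO code and does not reflect how the proposed API would be implemented: `MovTemplate` and `encode_load_rax` are hypothetical names, and the two templates are the 8-byte ModRM/SIB form from the output above and an assumed 10-byte `MOV RAX, moffs64` form. The function encodes a load of rax from an absolute address using the requested template, and reports failure when the operand does not fit that template.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical encoding templates for "load rax from an absolute address".
enum class MovTemplate {
    ModrmAbs32,  // REX.W 8B /r + SIB + disp32 (8 bytes)
    Moffs64      // REX.W A1 + moffs64 (10 bytes)
};

// Encodes into out using exactly the requested template.
// Returns false (and leaves out empty) if addr cannot be represented.
bool encode_load_rax(std::uint64_t addr, MovTemplate tmpl,
                     std::vector<std::uint8_t> &out) {
    out.clear();
    auto append_le = [&out](std::uint64_t v, int nbytes) {
        for (int i = 0; i < nbytes; ++i)
            out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    };
    if (tmpl == MovTemplate::ModrmAbs32) {
        // The disp32 field is sign-extended to 64 bits by the processor,
        // so addr must survive a truncate-and-sign-extend round trip.
        auto disp = static_cast<std::int32_t>(static_cast<std::uint32_t>(addr));
        if (static_cast<std::uint64_t>(static_cast<std::int64_t>(disp)) != addr)
            return false;  // operand incompatible with the requested template
        out = {0x48, 0x8b, 0x04, 0x25};
        append_le(addr, 4);
    } else {
        out = {0x48, 0xa1};
        append_le(addr, 8);
    }
    return true;
}
```

For the address in this issue, `encode_load_rax(0x009235d0, MovTemplate::ModrmAbs32, buf)` succeeds and produces 8 bytes. If the address operand were changed to something like 0x100000000, the same request would return false, which corresponds to the error return suggested above for an original encoding that is no longer valid.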