Tutorial 5: Extending Revizor
In this tutorial, we will switch gears: instead of using Revizor's existing components, we will extend Revizor by adding custom functionality to some of its core modules.
Workflow
The general workflow for extending any part of Revizor is as follows:
- Subclass the exiting module or interface you want to extend. For a list of all interfaces, refer to the Architecture Overview document.
- Implement your custom logic by overwriting the necessary methods.
- Register your new class in the factory so that Revizor can access the new implementation. It will also enable the user to select the new implementation via a config file.
- Add new configuration options if your extension requires additional parameters.
Changing Data Generation Algorithm
As our first example, we will modify the data (input) generation algorithm used by Revizor. By default, Revizor generates random input data for each test case. However, in some scenarios, it may be beneficial to generate inputs that contain extreme values (e.g., minimum or maximum integers) to test edge cases in the microarchitecture. We will implement this feature.
The data generation logic is defined by the DataGenerator interface, with its default implementation located in rvzr/data_generator.py. We will create a new subclass of DataGenerator that generates minimum or maximum integer values with a configurable probability.
Implement the new generation algorithm by overwriting the generation logic in the default DataGenerator class.
# rvzr/data_generator.py
class MinMaxIntGenerator(DataGenerator):
"""
A variant of DataGenerator that generates minimum or maximum integer
values with a configurable probability.
"""
int_sizes: Final[List[int]] = [8, 16, 32, 64]
def __init__(self, seed: int):
super().__init__(seed)
self._probability_of_max = CONF.input_gen_probability_of_minmax
def _generate_one(self, state: int, n_actors: int) -> Tuple[InputData, int]:
input_ = InputData(n_actors)
input_.seed = state
per_actor_data_size = input_.itemsize // 8
rng = np.random.default_rng(seed=state)
for i in range(n_actors):
# generate random data
data = rng.integers(
self.max_input_value, size=per_actor_data_size, dtype=np.uint64) # type: ignore
# if the probability of max is 0, we're done
if self._probability_of_max == 0:
input_.set_actor_data(i, data)
continue
# otherwise, with a given probability, set some values to min or max int
for val_id in range(per_actor_data_size):
roll = rng.random()
if roll > self._probability_of_max:
continue
int_size = random.choice(self.int_sizes)
int_sign = random.choice([True, False])
value = (2 ** (int_size - 1))
if not int_sign:
value = -value
data[val_id] = np.uint64(value & 0xFFFFFFFFFFFFFFFF)
input_.set_actor_data(i, data)
return input_, state + 1
We now need to let Revizor know about the existence of this new class. This is achieved by via the factory module rvzr/factory.py:
# rvzr/factory.py
_DATA_GENERATORS: Dict[str, Type[data_generator.DataGenerator]] = {
'random': data_generator.DataGenerator,
'minmax': data_generator.MinMaxIntGenerator, # <<<<<<<<<<<<<<<< ADDED LINE
}
Finally, our implementation used a new config option (input_gen_probability_of_minmax) to control the probability of generating extreme values. We need to register this new option in the configuration module rvzr/config.py:
# rvzr/config.py
class Config:
...
input_gen_probability_of_minmax: float = 0.5 # <<<<<<<<<<<<<<<< ADDED LINE
That's it. That's all it takes to change the data generation algorithm in Revizor.
Now, let's test the implementation:
Run Revizor with the new configuration:
See that the new generator was applied:
$ hexdump -C ./tc0/input0.bin| head -10
00000000 80 ff ff ff ff ff ff ff 00 00 00 00 00 00 00 80 |................|
00000010 00 80 ff ff ff ff ff ff 2a 35 00 00 00 00 00 00 |........*5......|
00000020 80 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00 |................|
00000030 c4 83 00 00 00 00 00 00 37 26 00 00 00 00 00 00 |........7&......|
00000040 36 d5 00 00 00 00 00 00 00 80 ff ff ff ff ff ff |6...............|
00000050 41 27 00 00 00 00 00 00 00 80 00 00 00 00 00 00 |A'..............|
00000060 32 69 00 00 00 00 00 00 64 b0 00 00 00 00 00 00 |2i......d.......|
00000070 00 80 ff ff ff ff ff ff 7c d7 00 00 00 00 00 00 |........|.......|
00000080 00 00 00 00 00 00 00 80 00 80 00 00 00 00 00 00 |................|
00000090 31 86 00 00 00 00 00 00 f9 f4 00 00 00 00 00 00 |1...............|
Success! We can see large and small integer values in the generated input data (ff ff ff ...),
meaning that our new data generator is working as expected.
Adding a Code Generation Pass
We will now explore the other part of the test case generation pipeline - generation of test case programs (code). In this example, we will add a new code generation pass that replaces all registers in the test case with a fixed register (RAX).
Note
Frankly, it is not a very useful generation pass, but it serves the purpose of demonstration. The same principles apply to more complex generation passes.
We will follow the same steps as before. The code pass interface is located in rvzr/code_generator.py as the Pass class. We will create a new subclass of it, and, since we are creating an ISA-specific pass, we will place it into rvzr/arch/x86/generator.py.
# rvzr/arch/x86/generator.py
class _X86RaxPass(Pass):
"""
Demonstration-only pass that replaces all register operands with RAX.
"""
def run_on_test_case(self, test_case: TestCaseProgram) -> None:
for bb in test_case.iter_basic_blocks():
for node in bb.iter_nodes():
inst = node.instruction
for op in inst.operands:
if isinstance(op, RegisterOp):
op.value = "rax"
Register the new class with the generator:
# rvzr/arch/x86/generator.py
class X86Generator(CodeGenerator):
...
self._passes = [
_X86PatchUndefinedFlagsPass(self._instruction_set, self),
_X86SandboxPass(self._target_desc, self._faults),
_X86PatchUndefinedResultPass(),
_X86RaxPass(), # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ADDED LINE
]
That's it. Now, let's test our new code generation pass by running Revizor again:
Check the generated program:
$ cat tc0/program.asm | head -10
.intel_syntax noprefix
.section .data.main
.function_0:
.bb_0.0:
.macro.measurement_start: nop qword ptr [rax + 0xff]
and rax, 0b1111111111000 # instrumentation
lock add byte ptr [r14 + rdi], rax
cmp rax, 106
or rax, 0b1000000000000000000000000000000 # instrumentation
bsr rax, rax
As we can see, all register operands have been replaced with RAX, confirming that our new code generation pass is functioning correctly.