DynamoRIO-based Model Backend
This document describes the DynamoRIO-based model. Like any other model, this backend is responsible for collecting contract traces for generated test cases.
Design Overview
This backend is composed of several parts:
- The Python adapter (rvzr/model_dynamorio/model.py) is responsible for receiving a test case from Revizor, transforming it into a format that can be executed by the backend, triggering the backend to execute the test case, and returning the collected contract traces to Revizor.
- The Test Case Loader (rvzr/model_dynamorio/adapter.c) is a C program that loads a test case program and a batch of inputs into its memory, and executes the test case program with each input in a sequence.
- The DynamoRIO components (rvzr/model_dynamorio/backend) are executed together with the test case loader, and they instrument the loader binary to collect contract traces.
These components can be roughly divided into the instrumentation-time components that are responsible for modifying the binary, and execution-time components that implement the model logic (i.e., the contract).
Python Adapter
Revizor communicates with the backend through a Python adapter (rvzr/model_dynamorio/model.py:DynamoRIOModel).
At the beginning of the fuzzing process, Revizor configures the backend by calling the configure_clauses method.
This configuration is later passed down to the backend when the test case is executed.
During the fuzzing process, Revizor sends test cases to the backend by calling the load_test_case method, and then triggers the backend to execute the test case by calling the trace_test_case method.
Internally, trace_test_case will call the backend to execute the test case and collect the contract traces.
The adapter will then parse the traces and return them back to the caller.
The trace_test_case method implements the following algorithm:
- Convert test case program and inputs into RCBF and RDBF files, respectively
- For each input, call the test case loader with the RCBF and RDBF files. Attach the DynamoRIO backend to the call so that the binary instrumentation is performed:
  ~/.local/dynamorio/drrun -c ~/.local/dynamorio/libdr_model.so --tracer <observation-clause> -- ~/.local/dynamorio/adapter <rcbf> <rdbf>
- Parse contract traces from the backend and convert them into CTrace objects
- Return the list of collected CTrace objects to the caller (usually, fuzzer.py)
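The per-input invocation in the second step can be sketched in Python as follows. This is a minimal sketch and not the code in model.py: the helper name build_drrun_command and its parameters are hypothetical, and only the file names under ~/.local/dynamorio and the --tracer flag are taken from the command line shown above.

```python
import os
from typing import List

# Install location used in the command above; adjust if DynamoRIO lives elsewhere.
DR_DIR = os.path.join("~", ".local", "dynamorio")


def build_drrun_command(tracer: str, rcbf: str, rdbf: str,
                        dr_dir: str = DR_DIR) -> List[str]:
    """Hypothetical helper: assemble the argv list for one loader run under DynamoRIO."""
    dr_dir = os.path.expanduser(dr_dir)
    return [
        os.path.join(dr_dir, "drrun"),
        "-c", os.path.join(dr_dir, "libdr_model.so"),
        "--tracer", tracer,
        "--",  # everything after this is the instrumented program and its arguments
        os.path.join(dr_dir, "adapter"),
        rcbf,
        rdbf,
    ]
```

The resulting list can be handed to subprocess.run(cmd, check=True). Passing the arguments as a list rather than a single shell string avoids quoting problems with file paths.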
Test Case Loader
Since the test cases produced by Revizor are raw binaries, they cannot be directly executed (e.g., they don't have libc linked).
The test case loader (rvzr/model_dynamorio/adapter.c) is a simple C program that fixes this issue by providing a wrapper around the test case binary.
The loader implements the following algorithm:
- Receive the test case binary and an input from the Python adapter via CLI arguments
 - Load the test case binary and the input into dedicated memory regions
 - Print the addresses of the test case and input memory regions (for trace normalization)
 - Initialize registers based on the input
 - Jump to the test case binary entry point
 - Return
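The loader prints the region addresses so that traces can be normalized. The document does not specify how the normalization is done. One plausible approach, shown here purely as an assumption, is to rewrite each absolute address as an offset from the base of the region that contains it. The function name, the region labels, and the region sizes below are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical region table: name -> (base address, size in bytes).
Regions = Dict[str, Tuple[int, int]]


def normalize_address(addr: int, regions: Regions) -> Tuple[str, int]:
    """Map an absolute address to (region name, offset within that region).

    Addresses outside every known region are returned unchanged under
    the label "unknown".
    """
    for name, (base, size) in regions.items():
        if base <= addr < base + size:
            return name, addr - base
    return "unknown", addr
```

With this scheme, two runs that map the test case at different addresses would still yield identical normalized traces, provided the accesses fall at the same offsets.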
 
DynamoRIO Tool
The DynamoRIO tool (rvzr/model_dynamorio/backend) is responsible for instrumenting the test case loader binary and collecting contract traces.
Implementation Overview
All instrumentation logic is implemented as a DynamoRIO client. In particular,
model.cpp contains the event callbacks that are executed at instrumentation time,
while dispatcher.cpp contains the body of the callbacks that are inserted
by the DR client and are executed before every instruction at runtime. Finally, the Dispatcher object holds the state that is
shared between instrumentation-time callbacks and execution-time callbacks.
The following figure provides an overview of the implementation.
- dr_client_main() is responsible for installing the initial instrumentation callbacks to hook all relevant DR events (module_load, bb_translation, exceptions and the exit event)
- dr_client_main() also sets the name of the function to instrument (passed by cli.cpp)
- on module_load, the instrumentation checks for the presence of the target function in the loaded module. If found, the callback adds a drwrap callback (event_instrumentation_start) which will be executed at the start of the target function
- once a call to the target function is found, event_instrumentation_start will save the return address in a global object (instrumented_func) and call start() on the dispatcher
- from that moment on, every translated basic block is instrumented by our client, in particular:
  - a dispatch_callback() is inserted before every instruction
  - at the function exit point (i.e. the previously saved return address) an exit_callback is inserted
- these callbacks are executed at runtime with the following effects:
  - the dispatch_callback() implements the observation and execution clauses (see next section)
  - the exit_callback() checks the current speculation state before exiting:
    - speculative exits cause a rollback
    - architectural exit causes the instrumentation to stop
Finally, exceptions and the exit event are also forwarded to the Dispatcher:
- Speculative exceptions will cause a rollback, while architectural ones are forwarded to the target program
 - The exit event stops instrumentation and flushes all logs (in case the exit callback has not been executed architecturally)
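The decisions described above for function exits and exceptions can be summarized in a small Python model. This is only an illustration of the decision logic; the actual client is written in C++, and the names Action, on_exit, and on_exception are hypothetical.

```python
from enum import Enum


class Action(Enum):
    ROLLBACK = "rollback"            # undo the simulated speculation
    STOP = "stop_instrumentation"    # architectural exit from the target function
    FORWARD = "forward_to_program"   # architectural exception goes to the target


def on_exit(in_speculation: bool) -> Action:
    """Decision taken when the exit callback runs."""
    return Action.ROLLBACK if in_speculation else Action.STOP


def on_exception(in_speculation: bool) -> Action:
    """Decision taken when an exception is reported to the dispatcher."""
    return Action.ROLLBACK if in_speculation else Action.FORWARD
```

In both cases the speculative path ends in a rollback, so nothing that happens during simulated speculation can terminate the target program or end the instrumentation.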
 
Instrumentation Components
The instrumentation components modify the binary of the test case loader by adding a call to the function dispatch_callback before every instruction in the binary (or more specifically, every instruction in the test_case_entry function of the loader).
The tool interacts with DynamoRIO through the model.cpp module.
This module registers an event for entering the test_case_entry, which triggers the flush of the internal DynamoRIO code fragment cache and the start of instrumentation.
The module also registers an event for every instruction in the test_case_entry, and the event in turn calls the Dispatch::instrument_instruction(). Finally, exceptions are hooked and passed to the dispatcher through Dispatch::handle_exception(), which can decide to either handle the signal (e.g. on speculative paths) or forward it to the test case (e.g. architectural exceptions).
The Dispatch class implements the actual instrumentation logic.
When the instrument_instruction() method is called, it inserts a clean call to the dispatch_callback function before the instruction.
The call receives the PC and opcode of the instruction as arguments.
DynamoRIO also automatically saves the complete register state before the call, thus making it available to dispatch_callback.
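As a rough mental model of this instrumentation pass, the following Python sketch places a callback marker before every instruction of a block. It does not use the DynamoRIO API; the tuple layout and the function name instrument are hypothetical, chosen only to show that each call carries the PC and opcode of the instruction that follows it.

```python
from typing import List, Tuple

Instr = Tuple[int, str]  # (pc, opcode mnemonic)


def instrument(instructions: List[Instr]) -> List[tuple]:
    """Return the instruction stream with a dispatch_callback call before each instruction."""
    out: List[tuple] = []
    for pc, opcode in instructions:
        # The call receives the PC and opcode of the instruction it precedes.
        out.append(("call", "dispatch_callback", pc, opcode))
        out.append(("instr", pc, opcode))
    return out
```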
Execution-Time Components
The execution-time components are responsible for implementing the contract logic, and are triggered by the dispatch_callback function.
Currently, the dispatch callback invokes only two classes, Tracer and Speculator, which implement the observation and execution clauses, respectively. Optionally, each component can log additional events, e.g. speculation rollbacks or the current register state, through a shared Logger component.
Subclasses of TracerABC record contract-relevant information via observe_instruction and observe_mem_access methods.
E.g., TracerCT implements the CT observation clause by recording the PC of instructions upon observe_instruction and the address of memory accesses upon observe_mem_access. Currently, observe_exception simply adds a special entry to the trace to indicate that the program ended due to an (architectural) exception.
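The behaviour described for TracerCT can be illustrated with a toy Python class. The real tracer is part of the C++ backend and writes a binary trace file; here the class name ToyTracerCT, the method signatures, and the in-memory list of (kind, value) entries are assumptions made for the example.

```python
from typing import List, Tuple


class ToyTracerCT:
    """Toy model of a CT tracer: records PCs and memory-access addresses."""

    def __init__(self) -> None:
        self.trace: List[Tuple[str, int]] = []

    def observe_instruction(self, pc: int) -> None:
        # CT exposes the control flow, so every executed PC is recorded.
        self.trace.append(("pc", pc))

    def observe_mem_access(self, address: int) -> None:
        # CT exposes the addresses of memory accesses, not the values.
        self.trace.append(("mem", address))

    def observe_exception(self) -> None:
        # Special marker entry; the payload value 0 is a placeholder.
        self.trace.append(("exception", 0))
```

Note that the loaded or stored values never enter the trace: under this clause, two inputs are distinguishable only if they lead to different PCs or different access addresses.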
Subclasses of SpeculatorABC implement the contract speculation logic.
E.g., SpeculatorCond implements speculate_instruction.
When this method is called with a branch instruction, the class takes a checkpoint of the process state, flips the branch condition (i.e., modifies the FLAGS register), and continues the execution.
During the simulated speculation, each call to speculate_instruction counts the number of executed instructions, and when the number reaches the limit (e.g., 256), the class restores the checkpoint and continues the execution from the original state. (Actually, the algorithm is more complex, but this is the general idea.)
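The checkpoint, flip, count, and rollback cycle can be modelled with a toy Python class. This is a deliberately simplified sketch and not the backend's algorithm: it has no nested speculation, the process state is a plain dictionary, a single boolean "zf" stands in for the FLAGS register, and the class name, the is_cond_branch parameter, and the just_rolled_back flag are hypothetical.

```python
import copy
from typing import Optional


class ToySpeculatorCond:
    """Toy model of conditional-branch misprediction with a bounded window."""

    def __init__(self, max_instructions: int = 256) -> None:
        self.max_instructions = max_instructions
        self.checkpoint: Optional[dict] = None
        self.executed = 0
        self.just_rolled_back = False

    def speculate_instruction(self, state: dict, is_cond_branch: bool) -> dict:
        """Called before each instruction; returns the state to continue with."""
        if self.checkpoint is not None:
            # Inside simulated speculation: count, and roll back at the limit.
            self.executed += 1
            if self.executed >= self.max_instructions:
                restored = self.checkpoint
                self.checkpoint, self.executed = None, 0
                self.just_rolled_back = True
                return restored
            return state
        if is_cond_branch and not self.just_rolled_back:
            # Save the architectural state, then continue on the flipped condition.
            self.checkpoint = copy.deepcopy(state)
            flipped = copy.deepcopy(state)
            flipped["zf"] = not flipped["zf"]
            return flipped
        # After a rollback the same branch executes once with its original flags.
        self.just_rolled_back = False
        return state
```

The just_rolled_back flag exists because the restored state points back at the branch that started the speculation. Without it, the toy would mispredict the same branch again and never make architectural progress.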
When the instrumentation ends (according to model.cpp), the tracer's tracing_finalized method is called, during which any remaining traces are flushed into the trace file, together with an "End Of Trace" entry.
The Python adapter will then read the trace file, decode it, and return the corresponding CTrace to Revizor.
Standalone Usage
The DR tool can be used as a standalone tool to collect the runtime trace of any program, independently from the rest of Revizor's infrastructure.
A typical invocation looks like this:
~/.local/dynamorio/drrun -c ~/.local/dynamorio/libdr_model.so --tracer <observation-clause> --speculator <speculation-clause> -- ls /dev/null
By default, this will instrument ls starting from __libc_start_main until the end of the program, run it with /dev/null as an argument, and generate a binary file called rvzr_trace.dat that contains the collected trace. Other flags can be printed using ~/.local/dynamorio/drrun -c ~/.local/dynamorio/libdr_model.so -h
The trace file location can be changed by adding --trace-output <PATH>. Additionally, the tool can also dump the trace in human-readable format to STDOUT using the --print-trace flag.
To decode and analyze the trace file, downstream tools should always use the TraceDecoder class provided by trace_decoder.py. For internal usage, this module also provides a simple entrypoint for trace printing:
python3 trace_decoder.py rvzr_trace.dat
Debugging
Attaching a debugger like GDB to the DR tool might not always be the best debugging option, as the program has three separate states:
- the state of the program being instrumented (e.g. ls)
- the state of the DR client (libdr_model.so) instrumentation
- the state of DynamoRIO itself (drrun)
More information about debugging DR clients can be found in the DynamoRIO documentation.
For our instrumentation, other (possibly simpler) options are available:
- Inspecting Debug Traces: the DR tool can optionally log extra information, e.g. the complete state of the register file before each instruction, each value being read and written to memory, and speculation events like checkpoints and rollbacks, in a separate debug trace:
  - This option can be enabled using --log-level <N>
  - By default, the tool will dump debug entries to rzvr_dbg_trace.dat in binary format; to change the path of the debug trace file use --debug-output <PATH>
  - --print-debug-trace can be used to pretty-print debug entries to STDOUT during execution
  - trace_decoder.py also provides a decoder for debug entries
  - WARNING: debug traces can become very big, especially for nested speculation
- Running DynamoRIO with logging: DynamoRIO can also produce logs (see DR documentation):
~/.local/dynamorio/drrun -debug -loglevel 3 -c ~/.local/dynamorio/libdr_model.so --tracer <observation-clause> --speculator <speculation-clause> -- ls /dev/null

