DynamoRIO-based Model Backend
This document describes the DynamoRIO-based model backend. Like any other model, this backend is responsible for collecting contract traces for generated test cases.
Design Overview
This backend is composed of several parts:
- The Python adapter (rvzr/model_dynamorio/model.py) is responsible for receiving a test case from Revizor, transforming it into a format that can be executed by the backend, triggering the backend to execute the test case, and returning the collected contract traces to Revizor.
- The Test Case Loader (rvzr/model_dynamorio/adapter.c) is a C program that loads a test case program and a batch of inputs into its memory, and executes the test case program with each input in sequence.
- The DynamoRIO components (rvzr/model_dynamorio/backend) are executed together with the test case loader, and they instrument the loader binary to collect contract traces.
These components can be roughly divided into the instrumentation-time components that are responsible for modifying the binary, and execution-time components that implement the model logic (i.e., the contract).
Python Adapter
Revizor communicates with the backend through a Python adapter (rvzr/model_dynamorio/model.py:DynamoRIOModel).
At the beginning of the fuzzing process, Revizor configures the backend by calling the configure_clauses method.
This configuration is later passed down to the backend when the test case is executed.
During the fuzzing process, Revizor sends test cases to the backend by calling the load_test_case method, and then triggers the backend to execute the test case by calling the trace_test_case method.
Internally, trace_test_case calls the backend to execute the test case and collect the contract traces.
The adapter then parses the traces and returns them to the caller.
The trace_test_case method implements the following algorithm:
- Convert the test case program and inputs into RCBF and RDBF files, respectively
- For each input, call the test case loader with the RCBF and RDBF files. Attach the DynamoRIO backend to the call so that the binary instrumentation is performed:
  ~/.local/dynamorio/drrun -c ~/.local/dynamorio/libdr_model.so --tracer <observation-clause> -- ~/.local/dynamorio/adapter <rcbf> <rdbf>
- Parse contract traces from the backend and convert them into CTrace objects
- Return the list of collected CTrace objects to the caller (usually, fuzzer.py)
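The per-input invocation in the second step can be sketched in Python as follows. This is a minimal illustration and not the adapter's actual code: the helper names (build_drrun_command, run_backend), the tracer value "ct", and the file names are assumptions made for the example. Only the install paths and the --tracer flag are taken from the command shown above.

```python
import os
import subprocess
from typing import List

# Install prefix used in the command shown above
DR_DIR = os.path.expanduser("~/.local/dynamorio")


def build_drrun_command(tracer: str, rcbf: str, rdbf: str) -> List[str]:
    """Assemble the drrun command line for one loader run (hypothetical helper)."""
    return [
        os.path.join(DR_DIR, "drrun"),
        "-c", os.path.join(DR_DIR, "libdr_model.so"),
        "--tracer", tracer,
        "--",  # everything after this is the instrumented program and its arguments
        os.path.join(DR_DIR, "adapter"), rcbf, rdbf,
    ]


def run_backend(tracer: str, rcbf: str, rdbf: str) -> subprocess.CompletedProcess:
    """Run the instrumented loader; raises CalledProcessError on a non-zero exit."""
    cmd = build_drrun_command(tracer, rcbf, rdbf)
    return subprocess.run(cmd, check=True, capture_output=True)


# "ct" and the file names are placeholders, not values confirmed by this document
cmd = build_drrun_command("ct", "test_case.rcbf", "inputs.rdbf")
print(" ".join(cmd))
```

Passing the command as a list rather than a shell string avoids quoting problems when the RCBF or RDBF paths contain spaces.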
Test Case Loader
Since the test cases produced by Revizor are raw binaries, they cannot be directly executed (e.g., they don't have libc linked).
The test case loader (rvzr/model_dynamorio/adapter.c) is a simple C program that fixes this issue by providing a wrapper around the test case binary.
The loader implements the following algorithm:
- Receive the test case binary and an input from the Python adapter via CLI arguments
- Load the test case binary and the input into dedicated memory regions
- Print the addresses of the test case and input memory regions (for trace normalization)
- Initialize registers based on the input
- Jump to the test case binary entry point
- Return
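The addresses printed in the third step allow absolute addresses in a trace to be made independent of where the regions happened to be mapped. This document does not describe how the normalization is actually performed, so the following is only a sketch of one plausible approach: the function name, the region labels, and the example addresses are all assumptions.

```python
from typing import Tuple


def normalize_address(addr: int,
                      code_base: int, code_size: int,
                      data_base: int, data_size: int) -> Tuple[str, int]:
    """Map an absolute address to (region, offset within region).

    Hypothetical helper: addresses outside both regions are returned
    unchanged with the label "other".
    """
    if code_base <= addr < code_base + code_size:
        return ("code", addr - code_base)
    if data_base <= addr < data_base + data_size:
        return ("data", addr - data_base)
    return ("other", addr)


# Made-up region layout, standing in for the addresses printed by the loader
CODE_BASE, CODE_SIZE = 0x7F0000001000, 0x1000
DATA_BASE, DATA_SIZE = 0x7F0000100000, 0x4000

print(normalize_address(0x7F0000001010, CODE_BASE, CODE_SIZE, DATA_BASE, DATA_SIZE))
```

With offsets instead of absolute addresses, two runs of the same test case can be compared even if the loader's memory regions are mapped at different locations.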
DynamoRIO Tool
The DynamoRIO tool (rvzr/model_dynamorio/backend) is responsible for instrumenting the test case loader binary and collecting contract traces.
Implementation Overview
All instrumentation logic is implemented as a DynamoRIO client. In particular, model.cpp contains the event callbacks that are executed at instrumentation time, while dispatcher.cpp contains the body of the callbacks that are inserted by the DR client and are executed before every instruction at runtime. Finally, the Dispatcher object holds the state that is shared between instrumentation-time callbacks and execution-time callbacks.
The following figure provides an overview of the implementation.
- dr_client_main() is responsible for installing the initial instrumentation callbacks to hook all relevant DR events (module_load, bb_translation, exceptions and the exit event)
- dr_client_main() also sets the name of the function to instrument (passed by cli.cpp)
- on module_load, the instrumentation checks for the presence of the target function in the loaded module. If found, the callback adds a drwrap callback (event_instrumentation_start) which will be executed at the start of the target function
- once a call to the target function is found, event_instrumentation_start saves the return address in a global object (instrumented_func) and calls start() on the dispatcher
- from that moment on, every translated basic block is instrumented by our client, in particular:
  - a dispatch_callback() is inserted before every instruction
  - at the function exit point (i.e. the previously saved return address) an exit_callback() is inserted
- these callbacks are executed at runtime with the following effects:
  - the dispatch_callback() implements the observation and execution clauses (see next section)
  - the exit_callback() checks the current speculation state before exiting:
    - speculative exits cause a rollback
    - architectural exits cause the instrumentation to stop
Finally, exceptions and the exit event are also forwarded to the Dispatcher:
- Speculative exceptions will cause a rollback, while architectural ones are forwarded to the target program
- The exit event stops instrumentation and flushes all logs (in case the exit callback has not been executed architecturally)
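The handling of function exits and exceptions described above depends only on whether execution is currently on a speculative path. The following Python sketch restates that decision logic. It is an illustration only: the real logic lives in the C++ client, and the Action enum and the function names on_exit and on_exception are hypothetical.

```python
from enum import Enum


class Action(Enum):
    ROLLBACK = "rollback"                  # restore the last checkpoint
    STOP_INSTRUMENTATION = "stop"          # leave the instrumented function for real
    FORWARD_TO_PROGRAM = "forward"         # let the target program see the exception


def on_exit(in_speculation: bool) -> Action:
    """Decision taken when the saved return address is reached."""
    return Action.ROLLBACK if in_speculation else Action.STOP_INSTRUMENTATION


def on_exception(in_speculation: bool) -> Action:
    """Decision taken when an exception is hooked."""
    return Action.ROLLBACK if in_speculation else Action.FORWARD_TO_PROGRAM


for speculative in (True, False):
    print(speculative, on_exit(speculative).value, on_exception(speculative).value)
```

In both cases a speculative event never becomes visible to the target program, which mirrors the fact that mispredicted paths have no architectural effect.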
Instrumentation Components
The instrumentation components modify the binary of the test case loader by adding a call to the function dispatch_callback before every instruction in the binary (or more specifically, every instruction in the test_case_entry function of the loader).
The tool interacts with DynamoRIO through the model.cpp module.
This module registers an event for entering test_case_entry, which triggers a flush of the internal DynamoRIO code fragment cache and the start of instrumentation.
The module also registers an event for every instruction in test_case_entry, and the event in turn calls Dispatch::instrument_instruction(). Finally, exceptions are hooked and passed to the dispatcher through Dispatch::handle_exception(), which can decide to either handle the signal (e.g. on speculative paths) or forward it to the test case (e.g. architectural exceptions).
The Dispatch class implements the actual instrumentation logic.
When the instrument_instruction() method is called, it inserts a clean call to the dispatch_callback function before the instruction.
The call receives the PC and opcode of the instruction as arguments.
DynamoRIO also automatically saves the complete register state before the call, thus making it available to dispatch_callback.
Execution-Time Components
The execution-time components are responsible for implementing the contract logic, and are triggered by the dispatch_callback function.
Currently, the dispatch callback invokes only two classes, Tracer and Speculator, which implement the observation and execution clauses, respectively. Optionally, each component can log additional events, e.g. speculation rollbacks or the current register state, through a shared Logger component.
Subclasses of TracerABC record contract-relevant information via the observe_instruction and observe_mem_access methods.
E.g., TracerCT implements the CT observation clause by recording the PC of instructions upon observe_instruction and the address of memory accesses upon observe_mem_access. Currently, observe_exception simply adds a special entry to the trace to indicate that the program ended due to an (architectural) exception.
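The tracer interface can be illustrated with a small Python model. The real classes are part of the C++ backend, so everything below is a sketch under assumptions: the method signatures, the trace entry layout (a list of (kind, value) tuples), and the entry labels are hypothetical. Only the class and method names come from the description above.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

TraceEntry = Tuple[str, int]  # (kind, value); layout assumed for this sketch


class TracerABC(ABC):
    def __init__(self) -> None:
        self.trace: List[TraceEntry] = []

    @abstractmethod
    def observe_instruction(self, pc: int) -> None:
        """Called before every instrumented instruction."""

    @abstractmethod
    def observe_mem_access(self, address: int) -> None:
        """Called for every memory access of an instruction."""

    def observe_exception(self) -> None:
        # Special entry marking that the program ended with an exception
        self.trace.append(("exception", 0))


class TracerCT(TracerABC):
    """CT observation clause: expose instruction PCs and memory addresses."""

    def observe_instruction(self, pc: int) -> None:
        self.trace.append(("pc", pc))

    def observe_mem_access(self, address: int) -> None:
        self.trace.append(("mem", address))


tracer = TracerCT()
tracer.observe_instruction(0x10)
tracer.observe_mem_access(0x2000)
tracer.observe_instruction(0x14)
print(tracer.trace)
```

A different observation clause would be another subclass that records different information in the same two hooks, for example only the PCs.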
Subclasses of SpeculatorABC implement the contract speculation logic.
E.g., SpeculatorCond implements speculate_instruction.
When this method is called with a branch instruction, the class takes a checkpoint of the process state, flips the branch condition (i.e., modifies the FLAGS register), and continues the execution.
During the simulated speculation, each call to speculate_instruction counts the number of executed instructions, and when the number reaches the limit (e.g., 256), the class restores the checkpoint and continues the execution from the original state. (The actual algorithm is more complex, but this is the general idea.)
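The checkpoint, flip, and rollback cycle can be sketched in Python as below. This is a deliberately simplified model and not the backend's algorithm: the process state is a plain dictionary, the FLAGS modification is reduced to inverting a single "taken" bit, nested speculation is not handled, and the class name SpeculatorCondSketch and its signature are hypothetical.

```python
from typing import Dict, Optional

State = Dict[str, int]  # toy stand-in for registers and memory


class SpeculatorCondSketch:
    def __init__(self, max_spec_instructions: int = 256) -> None:
        self.limit = max_spec_instructions
        self.checkpoint: Optional[State] = None
        self.executed = 0

    @property
    def in_speculation(self) -> bool:
        return self.checkpoint is not None

    def speculate_instruction(self, state: State, is_cond_branch: bool) -> State:
        """Return the state from which execution should continue."""
        if self.in_speculation:
            # Count instructions on the speculative path
            self.executed += 1
            if self.executed >= self.limit:
                # Speculation window exhausted: roll back to the checkpoint
                restored = self.checkpoint
                self.checkpoint = None
                self.executed = 0
                return restored
            return state
        if is_cond_branch:
            # Save the architectural state, then follow the mispredicted direction
            self.checkpoint = dict(state)
            self.executed = 0
            flipped = dict(state)
            flipped["taken"] = 1 - state["taken"]
            return flipped
        return state


spec = SpeculatorCondSketch(max_spec_instructions=2)
s1 = spec.speculate_instruction({"pc": 0, "taken": 1}, is_cond_branch=True)
s2 = spec.speculate_instruction({"pc": 1, "taken": 0}, is_cond_branch=False)
s3 = spec.speculate_instruction({"pc": 2, "taken": 0}, is_cond_branch=False)
print(s1, s2, s3)
```

In the example, the window is set to two instructions so that the rollback is visible after two speculative steps; the last returned state is the checkpoint taken at the branch.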
When the instrumentation ends (according to model.cpp), the tracer's tracing_finalized method is called, during which any remaining traces are flushed into the trace file, together with an "End Of Trace" entry.
The Python adapter will then read the trace file, decode it, and return the corresponding CTrace to Revizor.
Standalone Usage
The DR tool can be used as a standalone tool to collect the runtime trace of any program, independently from the rest of Revizor's infrastructure.
A typical invocation is:
~/.local/dynamorio/drrun -c ~/.local/dynamorio/libdr_model.so --tracer <observation-clause> --speculator <speculation-clause> -- ls /dev/null
By default, this will instrument ls starting from __libc_start_main until the end of the program, run it with /dev/null as an argument, and generate a binary file called rvzr_trace.dat that contains the collected trace. Other flags can be printed using ~/.local/dynamorio/drrun -c ~/.local/dynamorio/libdr_model.so -h.
The trace file location can be changed by adding --trace-output <PATH>. Additionally, the tool can also dump the trace in human-readable format to STDOUT using the --print-trace flag.
To decode and analyze the trace file, downstream tools should always use the TraceDecoder class provided by trace_decoder.py. For internal usage, this module also provides a simple entrypoint for trace printing:
python3 trace_decoder.py rvzr_trace.dat
Debugging
Attaching a debugger like GDB to the DR tool might not always be the best debugging option, as the program has three separate states:
- the state of the program being instrumented (e.g. ls)
- the state of the DR client (libdr_model.so) instrumentation
- the state of DynamoRIO itself (drrun)
More information about debugging DR clients can be found in the DynamoRIO documentation.
For our instrumentation, other (possibly simpler) options are available:
- Inspecting Debug Traces: the DR tool can optionally log extra information, e.g. the complete state of the register file before each instruction, each value being read and written to memory, and speculation events like checkpoints and rollbacks, in a separate debug trace:
  - This option can be enabled using --log-level <N>
  - By default, the tool will dump debug entries to rvzr_dbg_trace.dat in binary format; to change the path of the debug trace file use --debug-output <PATH>
  - --print-debug-trace can be used to pretty-print debug entries to STDOUT during execution
  - trace_decoder.py also provides a decoder for debug entries
  - WARNING: debug traces can become very big, especially for nested speculation
- Running DynamoRIO with logging: DynamoRIO can also produce logs (see DR documentation):
~/.local/dynamorio/drrun -debug -loglevel 3 -c ~/.local/dynamorio/libdr_model.so --tracer <observation-clause> --speculator <speculation-clause> -- ls /dev/null