DynamoRIO-based Model Backend

This document describes the DynamoRIO-based model. As any other model, this backend is responsible for collecting contract traces for generated test cases.

Design Overview

This backend is composed of several parts:

The Python adapter (src/model_dynamorio/model.py) is responsible for receiving a test case from Revizor, transforming it into a format that can be executed by the backend, triggering the backend to execute the test case, and returning the collected contract traces to Revizor.
The Test Case Loader (src/model_dynamorio/adapter.c) is a C program that loads a test case program and a batch of inputs into its memory, and executes the test case program with each input in a sequence.
The DynamoRIO components (src/model_dynamorio/backend) are executed together with the test case loader, and they instrument the loader binary to collect contract traces.

These components can be roughly divided into the instrumentation-time components that are responsible for modifying the binary, and execution-time components that implement the model logic (i.e., the contract).

Python Adapter

Revizor communicates with the backend through a Python adapter (src/model_dynamorio/model.py:DynamoRIOModel).

At the beginning of the fuzzing process, Revizor configures the backend by calling configure_clauses method. This configuration will be later passed down to the backend when the test case is executed.

During the fuzzing process, Revizor sends test cases to the backend by calling load_test_case method, and then triggers the backend to execute the test case by calling trace_test_case method. Internally, trace_test_case will call the backend to execute the test case and collect the contract traces. The adapter will then parse the traces and return them back to the caller.

The trace_test_case method implements the following algorithm:

Convert test case program and inputs into RCBF and RDBF files, respectively
For each input, call the test case loader with the RCBF and RDBF files. Attach the DynamoRIO backend to the call so that the binary instrumentation is performed:
```
~/.local/dynamorio/drrun -c ~/.local/dynamorio/libdr_model.so --tracer <observation-clause> -- ~/.local/dynamorio/adapter <rcbf> <rdbf>
```
Receive contract traces from the backend and convert them into CTrace objects
Return the list of collected CTrace objects to the caller (usually, fuzzer.py)

Test Case Loader

Since the test cases produced by Revizor are raw binaries, they cannot be directly executed (e.g., they don't have libc linked). The test case loader (src/model_dynamorio/adapter.c) is a simple C program that fixes this issue by providing a wrapper around the test case binary.

The loader implements the following algorithm:

Receive the test case binary and an input from the Python adapter via CLI arguments
Load the test case binary and the input into dedicated memory regions
Print the addresses of the test case and input memory regions (for trace normalization)
Initialize registers based on the input
Jump to the test case binary entry point
Return

DynamoRIO Tool

The DynamoRIO tool (src/model_dynamorio/backend) is responsible for instrumenting the test case loader binary and collecting contract traces.

Instrumentation Components

The instrumentation components modify the binary of the test case loader by adding a call to the function dispatch_callback before every instruction in the binary (or more specifically, every instruction in the test_case_entry function of the loader).

The tool interacts with DynamoRIO through the model.cpp module. This module registers an event for entering the test_case_entry, which triggers the start of instrumentation. The module also registers an event for every instruction in the test_case_entry, and the event in turn calls the Dispatch::instrument_instruction().

The Dispatch class implements the actual instrumentation logic. When the instrument_instruction() method is called, it inserts a clean call to the dispatch_callback function before the instruction. The call receives the PC and opcode of the instruction as arguments. DynamoRIO also automatically saves the complete register state before the call, thus making it available to dispatch_callback.

Execution-Time Components

The execution-time components are responsible for implementing the contract logic, and are triggered by the dispatch_callback function. At the current state of the backend, the dispatch callback invokes only two classes, Tracer and Speculator, that implement the observation and execution clauses, respectively.

Subclasses of TracerABC record contract-relevant information via observe_instruction and observe_mem_access methods. E.g., TracerCT implements CT observation clause by recording the PC of instructions upon observe_instruction and the address of memory accesses upon observe_mem_access.

Subclasses of SpeculatorABC implement the contract speculation logic. E.g., SpeculatorCond implements speculate_instruction. When this method is called with a branch instruction, the class takes a checkpoint of the process state, flips the branch condition (i.e., modified FLAGS register), and continues the execution. During the simulated speculation, each call to speculate_instruction counts the number of executed instructions, and when the number reaches the limit (e.g., 256), the class restores the checkpoint and continues the execution from the original state. (Actually, the algorithm is more complex, but this is the general idea.)

When the instrumentation ends (according to model.cpp), the tracer's tracing_finalized method is called, and the tracer prints the collected traces to stdout in a compressed binary format. The Python adapter picks up this output, decodes, and returns it to Revizor.