TCP Internals

Overview

In CCF, the TCP host layer is implemented using libuv. It allows us to listen for connections from other nodes and requests from clients, as well as to connect to other nodes.

Both RPC and Node-to-Node connections use TCP to communicate with external resources and then pass the packets through the ring buffer to communicate with the enclave.

CCF uses an HTTP REST interface to call programs inside the enclave, so the process is usually: read the request, call the enclave function and receive the response (via a ring buffer message), then send the response to the client.

However, the TCP implementation in CCF is generic and could be adapted to other common communication patterns, although that would likely require changing how its users (RPC, node-to-node) use it.

Overall structure

The TCPImpl class (in src/host/tcp.h) implements all TCP logic (using libuv's asynchronous API) and is used by both RPCConnections and NodeConnections.

Because TCPImpl does not have access to the ring buffer, it uses behaviour classes to let users register callbacks on actions (e.g. on_read, on_accept).

Most of the callbacks are for logging purposes, but the two important ones are:

- on_accept on servers, which creates a new socket to communicate with the particular connecting client
- on_read, which takes the data that was read and writes it to the ring buffer
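
As an illustration only (these are not the actual CCF class definitions, and the callback signatures are simplified), the behaviour mechanism boils down to a small virtual interface that TCPImpl invokes, with concrete behaviours deciding what to do with the data:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical base class standing in for TCPBehaviour: TCPImpl only
    // invokes these hooks and never touches the ring buffer itself.
    struct ExampleTCPBehaviour
    {
      virtual ~ExampleTCPBehaviour() = default;
      virtual void on_accept() {}
      virtual void on_read(size_t, uint8_t*) {}
      virtual void on_disconnect() {}
    };

    // Hypothetical concrete behaviour: forward everything read on the socket
    // to the enclave (write_to_ringbuffer is a stand-in for the real ring
    // buffer writer; here it just buffers the bytes).
    struct ExampleForwardingBehaviour : public ExampleTCPBehaviour
    {
      void on_read(size_t len, uint8_t* data) override
      {
        write_to_ringbuffer(data, len);
      }

      void write_to_ringbuffer(const uint8_t* data, size_t len)
      {
        buffered.insert(buffered.end(), data, data + len);
      }

      std::vector<uint8_t> buffered;
    };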

For node-to-node connections, the behaviours are:

- NodeServerBehaviour, used by the main listening socket; on_accept, it creates a new socket to communicate with a particular connecting client
- NodeIncomingBehaviour, used by the socket created above, which waits for input and passes it to the enclave
- NodeOutgoingBehaviour, used by a socket created by the enclave (via ring buffer messages into the host) to connect to external nodes

For RPC connections, the behaviours are:

- RPCServerBehaviour, the same as the NodeServerBehaviour above
- RPCClientBehaviour, a misnomer, used for both the incoming and outgoing behaviours above

Here’s a diagram with the types of behaviours and their relationships:

        graph BT
    subgraph TCP
        TCPBehaviour
        TCPServerBehaviour
    end

    subgraph RPCConnections
        RPCClientBehaviour
        RPCServerBehaviour
    end

    subgraph NodeConnections
        NodeConnectionBehaviour
        NodeIncomingBehaviour
        NodeOutgoingBehaviour
        NodeServerBehaviour
    end

        RPCClientBehaviour --> TCPBehaviour
        NodeConnectionBehaviour --> TCPBehaviour
        NodeIncomingBehaviour --> NodeConnectionBehaviour
        NodeOutgoingBehaviour --> NodeConnectionBehaviour
        NodeServerBehaviour --> TCPServerBehaviour
        RPCServerBehaviour --> TCPServerBehaviour
    

State machine

TCPImpl has an internal state machine in which state changes are driven by callbacks from libuv.

Since it implements both server (listen, peer, read) and client (connect, write) logic, the state lets shared functions know where to continue on completion.

The complete state machine diagram, excluding failure states, is:

        stateDiagram-v2
    %% Server side
    FRESH --> LISTENING_RESOLVING : server
    LISTENING_RESOLVING --> LISTENING : uv_listen

    %% Client side
    state client_host <<choice>>
    FRESH --> client_host : client
    client_host --> BINDING : client_host != null

    BINDING --> CONNECTING_RESOLVING : client_host resolved

    client_host --> CONNECTING_RESOLVING : client_host == null
    CONNECTING_RESOLVING --> CONNECTING : host resolved

    CONNECTING --> CONNECTING_RESOLVING : retry
    CONNECTING --> CONNECTED : uv_tcp_connect

    %% Peer side
    FRESH --> CONNECTED : peer

    %% Disconnect / reconnect
    CONNECTED --> DISCONNECTED : error<br>close
    DISCONNECTED --> RECONNECTING : retry
    RECONNECTING --> FRESH : init
    

Some failure states transition to retries/reconnects; others are terminal and close the connection.
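
As a rough sketch (the enum and function below are assumptions for illustration, not the actual TCPImpl members), the states in the diagram and a shared resolution callback that dispatches on them could look like this:

    // Assumed state names, matching the diagram above.
    enum class TCPState
    {
      FRESH,
      LISTENING_RESOLVING,
      LISTENING,
      BINDING,
      CONNECTING_RESOLVING,
      CONNECTING,
      CONNECTED,
      DISCONNECTED,
      RECONNECTING
    };

    // A shared DNS-resolution callback can use the state to decide whether to
    // continue to uv_listen (server) or uv_tcp_connect (client).
    void on_resolved_example(TCPState& state)
    {
      switch (state)
      {
        case TCPState::LISTENING_RESOLVING:
          state = TCPState::LISTENING; // proceed to uv_listen
          break;
        case TCPState::CONNECTING_RESOLVING:
          state = TCPState::CONNECTING; // proceed to uv_tcp_connect
          break;
        default:
          break; // other states do not expect a resolution callback
      }
    }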

Server logic

The main cycle of a server is the following:

- create a main socket and listen for connections
- on accepting a new connection, create a new (peer) socket to communicate with that client
- read the request, communicate with the enclave, and get the response back
- send the response to the client
- close the socket

Several peer sockets can be open at the same time, communicating with different clients; it's up to libuv to handle the asynchronous tasks.
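
Stripped of the behaviour and ring buffer plumbing, that cycle maps onto a handful of libuv calls. The following standalone sketch (not CCF code; the address, port and buffer handling are placeholders) shows the listen/accept/read part:

    #include <uv.h>

    #include <cstdlib>

    static void on_alloc(uv_handle_t*, size_t suggested, uv_buf_t* buf)
    {
      buf->base = static_cast<char*>(malloc(suggested));
      buf->len = suggested;
    }

    static void on_close(uv_handle_t* handle)
    {
      delete reinterpret_cast<uv_tcp_t*>(handle);
    }

    static void on_read(uv_stream_t* peer, ssize_t nread, const uv_buf_t* buf)
    {
      if (nread > 0)
      {
        // In CCF, this is where the behaviour's on_read would pass the bytes
        // to the enclave via the ring buffer; the response is written back
        // later with uv_write.
      }
      free(buf->base);
      if (nread < 0)
      {
        // EOF or error: close the peer socket.
        uv_close(reinterpret_cast<uv_handle_t*>(peer), on_close);
      }
    }

    static void on_accept(uv_stream_t* server, int status)
    {
      if (status < 0)
        return;

      // One new (peer) socket per connecting client.
      auto* peer = new uv_tcp_t;
      uv_tcp_init(server->loop, peer);
      if (uv_accept(server, reinterpret_cast<uv_stream_t*>(peer)) == 0)
        uv_read_start(reinterpret_cast<uv_stream_t*>(peer), on_alloc, on_read);
      else
        uv_close(reinterpret_cast<uv_handle_t*>(peer), on_close);
    }

    int main()
    {
      uv_loop_t* loop = uv_default_loop();

      uv_tcp_t server;
      uv_tcp_init(loop, &server);

      sockaddr_in addr;
      uv_ip4_addr("127.0.0.1", 8080, &addr); // placeholder address and port
      uv_tcp_bind(&server, reinterpret_cast<const sockaddr*>(&addr), 0);

      // libuv multiplexes all peer sockets on this single loop.
      uv_listen(reinterpret_cast<uv_stream_t*>(&server), 128, on_accept);
      return uv_run(loop, UV_RUN_DEFAULT);
    }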

Here’s a diagram of the control flow for a server connection:

        graph TD
    subgraph RPCConnections
        rl(listen)
        subgraph RPCServerBehaviour
            rsboa(on_accept)
        end
    end

    subgraph TCPImpl
        tl(listen)
        tr(resolve)
        tor(on_resolved)
        tlr(listen_resolved)
        toa(on_accept)
        tp[TCP peer]
    end

    subgraph NodeConnections
        nctor(NodeConnections)
        subgraph NodeServerBehaviour
            nsboa(on_accept)
        end
    end

    %% Entry Points
    rl --> tl
    nctor --> tl

    %% Listen path
    tl -- LISTENING_RESOLVING --> tr
    tr -. via: DNS::resolve .-> tor
    tor --> tlr
    tlr -. LISTENING<br>via: uv_listen .-> toa
    toa --> rsboa
    toa --> nsboa
    toa ==> tp
    

The control flow of the peer connection is similar to the client's (below), but the order is reversed.

The client first writes the request and then waits for the response, while the peer first waits for the request and then writes the response back.

Client logic

Clients don't have a listening cycle: they connect to an existing server, send the request, wait for the response, and disconnect.

Clients are used from the enclave side (Node-to-node and RPC), via a ring buffer message.

Node-to-node clients are used for pings across nodes, electing a new leader, etc.

RPC clients are used for REST service callbacks from other services, e.g. metrics.
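
At the libuv level the client flow reduces to connect, write the request, then read the response. Here is a condensed standalone sketch (not CCF code; the address, port and payload are placeholders) of that sequence:

    #include <uv.h>

    #include <cstdlib>

    static void on_alloc(uv_handle_t*, size_t suggested, uv_buf_t* buf)
    {
      buf->base = static_cast<char*>(malloc(suggested));
      buf->len = suggested;
    }

    static void on_read(uv_stream_t* stream, ssize_t nread, const uv_buf_t* buf)
    {
      free(buf->base);
      if (nread < 0)
      {
        // Response finished (or error): disconnect.
        uv_close(reinterpret_cast<uv_handle_t*>(stream), nullptr);
      }
    }

    static void on_write(uv_write_t* req, int)
    {
      delete req; // equivalent of free_write in the diagram below
    }

    static void on_connect(uv_connect_t* req, int status)
    {
      if (status < 0)
        return; // CCF would retry here (CONNECTING -> CONNECTING_RESOLVING)

      // Flush the request now that the connection is up...
      static char request[] = "ping"; // placeholder payload
      uv_buf_t buf = uv_buf_init(request, sizeof(request) - 1);
      auto* write_req = new uv_write_t;
      uv_write(write_req, req->handle, &buf, 1, on_write);

      // ...then wait for the response.
      uv_read_start(req->handle, on_alloc, on_read);
    }

    int main()
    {
      uv_loop_t* loop = uv_default_loop();

      uv_tcp_t client;
      uv_tcp_init(loop, &client);

      sockaddr_in addr;
      uv_ip4_addr("127.0.0.1", 8080, &addr); // placeholder server address

      uv_connect_t connect_req;
      uv_tcp_connect(
        &connect_req, &client, reinterpret_cast<const sockaddr*>(&addr), on_connect);
      return uv_run(loop, UV_RUN_DEFAULT);
    }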

Here’s the diagram of the client control flow:

        graph TD
    subgraph RPCConnections
        rc(connect)
        rw(write)
        subgraph RPCClientBehaviour
            rsbor(on_read)
        end
    end

    subgraph TCPImpl
        tc(connect)
        tocr(on_client_resolved)
        tcb(client_bind)
        tr(resolve)
        tor(on_resolved)
        tcr(connect_resolved)
        toc(on_connect<br>CONNECTED)

        trs(read_start)
        toa(on_alloc)
        tore(on_read)
        tof(on_free)

        tw(write)
        tow(on_write)
        tfw(free_write)
        tsw(send_write)
    end

    subgraph NodeConnections
        ncc(create_connection)
        nw(ccf::node_outbound)
        subgraph NodeConnectionBehaviour
            nsbor(on_read)
        end
    end

    %% Entry Points
    rc --> tc
    ncc --> tc
    rw --> tw
    nw --> tw

    %% Connect path
    tc -- CONNECTING_RESOLVING --> tr
    tc -. BINDING<br>via: DNS::resolve .-> tocr
    tocr --> tcb
    tcb -- uv_tcp_bind<br>CONNECTING_RESOLVING --> tr
    tr -. via: DNS::resolve .-> tor
    tor --> tcr
    tcr -. CONNECTING<br>via: uv_tcp_connect .-> toc
    toc -- retry<br>CONNECTING_RESOLVING --> tcr
    toc -- pending writes --> tw
    toc --> trs

    %% Read path
    trs -. via: uv_read_start .-> toa
    trs -. via: uv_read_start .-> tore
    tore -- DISCONNECTED<br>uv_read_stop --> tof
    tore --> rsbor
    tore --> nsbor

    %% Write path
    tw -- CONNECTED --> tsw
    tw -- DISCONNECTED<br>no data --> tfw
    tsw -. via: uv_write .-> tow
    tow --> tfw
    

Note that some clients have a client_host parameter, separate from host, which is used for testing and uses the BINDING state.

The client_host is resolved separately and bound to the client handle (via uv_tcp_bind), but the call to uv_tcp_connect is made on the host address.

This allows us to bind a separate address to the client side while connecting to the host, so that external packet filters (like iptables) can restrict traffic.
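
As a sketch (the addresses below are arbitrary placeholders, not CCF defaults), the mechanism is simply a uv_tcp_bind on the resolved client_host address followed by a uv_tcp_connect to the resolved host address:

    #include <uv.h>

    static void on_connect(uv_connect_t*, int)
    {
      // Connection result handling omitted.
    }

    int main()
    {
      uv_loop_t* loop = uv_default_loop();

      uv_tcp_t client;
      uv_tcp_init(loop, &client);

      // client_host, resolved separately (BINDING state); port 0 = ephemeral.
      sockaddr_in local{};
      uv_ip4_addr("127.0.0.2", 0, &local);
      uv_tcp_bind(&client, reinterpret_cast<const sockaddr*>(&local), 0);

      // host, the address uv_tcp_connect is actually called on.
      sockaddr_in remote{};
      uv_ip4_addr("127.0.0.1", 8080, &remote);

      uv_connect_t req;
      uv_tcp_connect(
        &req, &client, reinterpret_cast<const sockaddr*>(&remote), on_connect);
      return uv_run(loop, UV_RUN_DEFAULT);
    }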