Graph Generator

Source Code on GitHub #

Graph generator is a useful tool for generating synthetic graphs. We demonstrate how to write a simple yet efficient graph generator for generating RMAT graphs using GE.

Graph Modeling #

Before we start working on generating a graph, we need to first define the schema of the graph nodes. A possible graph node schema specification can be:

cell struct GraphNode
{
  string Label;
  List<CellId> Edges;
}

This works. However, the graph generation process usually poses a great pressure on GC. In this demo application, we demonstrate how to use a simple space reservation mechanism to improve performance by reducing the GC-pressure. The trick is straightforward: instead of appending CellIds one by one to Edges, we reserve some edge slots in the Edges container before actually generating graph edges. At the same time, we use an EdgeCount to indicate the actual number of edges inserted to the current graph node. The refined graph node schema is as follows:

cell struct GraphNode
{
  int EdgeCount;
  string Label;
  List<CellId> Edges;
}

Graph Generation #

To work with the edge reservation mechanism described above, we split the graph generation process into three steps:

  • Generate graph nodes and reserve edge slots;
  • Insert edges to the existing graph nodes;
  • Clean up the graph nodes to remove the unused edge slots.

Graph Node Generation #

In the newly created graph node cell, we reserve some empty edge slots. The code sketch is:

...
List<long> reservationList = new List<long>((int)capacity);
...
GraphNode node = new GraphNode(nodeId, 0, label, reservationList);
Global.LocalStorage.SaveGraphNode(node);

Graph Edge Generation #

We append a new edge to the Edges container of a graph node if there are one or more free reserved edge slots. Otherwise, we need to allocate a larger Edges container before we can append a new edge. The code sketch is:

// y1 is the cell id of a new edge
// y2 is the cell id of the current graph node
using (var node = Global.LocalStorage.UseGraphNode(y2))
{
    if (!node.Edges.Contains(y1))
    {
        if (node.EdgeCount < node.Edges.Count)
        {
            node.Edges[node.EdgeCount++] = y1;
        }
        else
        {
            ...
            List<long> reserved_list = new List<long>();
            ...
            node.Edges.AddRange(reserved_list);
            node.Edges[node.EdgeCount++] = y1;
        }
    }
}

Cleaning Up #

At the end, we need to clean up the unused edge slots to save space. This is the code sketch:

using (var node = Global.LocalStorage.UseGraphNode(i))
{
    if (node.EdgeCount < node.Edges.Count)
    {
        node.Edges.RemoveRange(node.EdgeCount, node.Edges.Count - node.EdgeCount);
    }
}