Signal Subgraph Estimators

This tutorial will introduce the following signal-subgraph estimators:

  • Incoherent subgraph estimator

  • Coherent subgraph estimator

[1]:
import graspologic.subgraph as sg
import matplotlib.pyplot as plt
import numpy as np
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
/home/runner/work/graspologic/graspologic/graspologic/models/edge_swaps.py:215: NumbaDeprecationWarning: The keyword argument 'nopython=False' was supplied. From Numba 0.59.0 the default is being changed to True and use of 'nopython=False' will raise a warning as the argument will have no effect. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  _edge_swap_numba = nb.jit(_edge_swap, nopython=False)

Preliminaries

The general graph model is characterized by \(M_V(m,s;\pi,p,q)\), where:

\(V\) - the number of vertices in the graph

\(n\) - the number of graph samples

\(s\) - the number of edges that must be present in the subgraph

\(m\) - the number of vertices that each edge in the subgraph must be incident to

\(\pi\) - the probability of a graph sample being of class 1

\(p\) - the probability of edges in the signal-subgraph, conditioned on Class 0

\(q\) - the probability of edges in the signal-subgraph, conditioned on Class 1

The signal-subgraph is a subset of edges with distinct class-conditional likelihood parameters. The signal-subgraph estimator evaluates a test statistic for each edge in the graph and selects \(s\) edges with the lowest test statistic, where \(s\) is the desired size of the signal-subgraph.

The estimator that is used to find the signal-subgraph determines certain properties of the resulting subgraph. Both estimators use \(s\) to determine the size of the resulting subgraph. \(m\) is only used for the coherent estimator, which constrains the subgraph to \(m\) vertices.

Incoherent Signal-Subgraph Estimator

For this example we will randomly select 20 edges from a graph with 70 vertices. These edges will have distinct class-conditional edge probabilities, and the graphs will be sampled from the model \(M_{70}(20; 0.5, 0.8, 0.1)\), with \(n = 100\).

[2]:
from graspologic.plot import heatmap

verts = 70
sedges = 20
pi = 0.5
p = 0.8
q = 0.1
nsamples = 100

np.random.seed(8888)
classlabels = np.zeros(nsamples, dtype=int)
classlabels[1::2] = 1

sigsubindex = np.random.choice(verts ** 2, sedges, replace=False)
vect = p * np.ones(verts ** 2)
vect[sigsubindex] = q
vect = np.reshape(vect, (verts, verts))
expected = np.where(vect == q, 1, 0)

blank = vect[:, :, None] + np.zeros(int(nsamples / 2))
A = p * np.ones((verts, verts, nsamples))
A[:, :, 1::2] = blank
A = np.random.binomial(1, A)

sigsub = sg.SignalSubgraph()
sigsub.fit_transform(graphs=A, labels=classlabels, constraints=sedges)

estimatesigsub = np.zeros((verts, verts))
estimatesigsub[sigsub.sigsub_] = 1

fig, ax = plt.subplots(ncols=2, figsize=(10, 5), constrained_layout=True)
heatmap(expected, ax=ax[0], cbar=False, title="Expected Signal-Subgraph")
_ = heatmap(estimatesigsub, ax=ax[1], cbar=False, title="Estimated Signal-Subgraph")
../../_images/tutorials_subgraph_subgraph_5_0.png

Note that because \(p\) and \(q\) are sufficiently distinct and \(n\) is sufficiently large, the Expected and Estimated signal-subgraphs should match exactly.

Incoherent Signal-Subgraph Estimator

Once again, we will randomly select 20 edges from a graph with 70 vertices. These edges will have distinct class-conditional edge probabilities, but the graphs will be sampled from the model \(M_{70}(1, 20; 0.5, 0.8, 0.1)\), with \(n = 100\).

The estimated signal-subgraph will have 20 edges, constrained so that each edge must be incident to the same vertex. First, we will use the same expected signal-subgraph as the previous example.

[3]:
np.random.seed(8888)
classlabels = np.zeros(nsamples, dtype=int)
classlabels[1::2] = 1

sigsubindex = np.random.choice(verts ** 2, sedges, replace=False)
vect = p * np.ones(verts ** 2)
vect[sigsubindex] = q
vect = np.reshape(vect, (verts, verts))
expected = np.where(vect == q, 1, 0)

blank = vect[:, :, None] + np.zeros(int(nsamples / 2))
A = p * np.ones((verts, verts, nsamples))
A[:, :, 1::2] = blank
A = np.random.binomial(1, A)

sigsub = sg.SignalSubgraph()
sigsub.fit_transform(A, classlabels, [20, 1])

estimatesigsub = np.zeros((verts, verts))
estimatesigsub[sigsub.sigsub_] = 1

fig, ax = plt.subplots(ncols=2, figsize=(10, 5), constrained_layout=True)
heatmap(expected, ax=ax[0], cbar=False, title="Expected Signal-Subgraph")
_ = heatmap(estimatesigsub, ax=ax[1], cbar=False, title="Estimated Signal-Subgraph")
../../_images/tutorials_subgraph_subgraph_8_0.png

Notice how the coherent estimator constrains the estimated signal-subgraph to 20 edges that are incident to 1 vertex with the best total significance values. Now, we will try an expected signal-subgraph that is also limited to one vertex.

[4]:
mverts = 1

np.random.seed(7777)
classlabels = np.zeros(nsamples, dtype=int)
classlabels[1::2] = 1

m = np.random.choice(verts, mverts)
vect = p * np.ones(2 * verts * mverts - (mverts ** 2))
vect[np.random.choice(len(vect), sedges, replace=False)] = q

blank = p * np.ones((verts, verts))
blank[m, :] = np.nan
blank[:, m] = np.nan
blank[np.isnan(blank)] = vect
expected = np.where(blank == q, 1, 0)

blank = blank[:, :, None] + np.zeros(int(nsamples / 2))
A = p * np.ones((verts, verts, nsamples))
A[:, :, 1::2] = blank
A = np.random.binomial(1, A)

sigsub = sg.SignalSubgraph()
sigsub.fit_transform(graphs=A, labels=classlabels, constraints=sedges)

estimatesigsub = np.zeros((verts, verts))
estimatesigsub[sigsub.sigsub_] = 1

fig, ax = plt.subplots(ncols=2, figsize=(10, 5), constrained_layout=True)
heatmap(expected, ax=ax[0], cbar=False, title="Expected Signal-Subgraph")
_ = heatmap(estimatesigsub, ax=ax[1], cbar=False, title="Estimated Signal-Subgraph")
../../_images/tutorials_subgraph_subgraph_10_0.png

Now that the expected signal-subgraph is constrained to the coherent signal-subgraph model, the Expected and Estimated signal-subgraphs are exactly equal.