# Correlated Random Dot Product Graph (RDPG) Graph Pair


from graspologic.simulations.rdpg_corr import rdpg_corr
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set_context('talk')


RDPG is a latent position generative model. An explanation of the uncorrelated model is in the tutorial.

Here, we want to generate a pair of graphs with the same latent positions but with correlation between edges.

The function takes several parameters: $$X$$ and $$Y$$ are the input matrices of latent positions, which are used to generate the edge probability matrix; $$r$$ is the correlation between corresponding edges of the graph pair and must lie between -1 and 1. Note that not every value of $$r$$ is attainable for a given set of latent positions.
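To make the role of $$r$$ concrete, here is a minimal NumPy sketch of one standard way to draw a pair of edge-correlated graphs from a probability matrix. It is an illustration under stated assumptions, not necessarily what `rdpg_corr` does internally; the helper `sample_corr_pair` and the `*_demo` names are hypothetical. The first graph is sampled from $$P$$, and the second is sampled conditionally on the first so that each edge keeps marginal probability $$P_{ij}$$ and has correlation $$r$$ with its counterpart. The range check shows why some values of $$r$$ are unattainable.

```python
import numpy as np

def sample_corr_pair(P, r, rng):
    """Hypothetical helper: sample two undirected, loopless graphs (n >= 2)
    with edge probabilities P and edge-wise correlation r."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)  # entries with i < j

    # Conditional probabilities of an edge in graph 2 given graph 1.
    p_if_edge = P + r * (1 - P)   # graph 1 has the edge
    p_if_no_edge = P * (1 - r)    # graph 1 does not have the edge
    lowest = min(p_if_edge[upper].min(), p_if_no_edge[upper].min())
    highest = max(p_if_edge[upper].max(), p_if_no_edge[upper].max())
    if lowest < 0 or highest > 1:
        raise ValueError("r is not attainable for these edge probabilities")

    G1 = np.zeros((n, n))
    G2 = np.zeros((n, n))
    G1[upper] = rng.random(upper.sum()) < P[upper]
    p2 = np.where(G1 == 1, p_if_edge, p_if_no_edge)
    G2[upper] = rng.random(upper.sum()) < p2[upper]
    # Mirror the upper triangle so both graphs are symmetric.
    return G1 + G1.T, G2 + G2.T

rng = np.random.default_rng(0)
X_demo = np.array([[0.5, 0.2, 0.2]] * 50 + [[0.1, 0.1, 0.1]] * 50)
G1_demo, G2_demo = sample_corr_pair(X_demo @ X_demo.T, 0.3, rng)
print("Demo difference rate:", np.abs(G1_demo - G2_demo).sum() / G1_demo.size)
```

With these conditional probabilities, a strongly negative $$r$$ combined with a small $$P_{ij}$$ would require a negative probability, which is why the sketch raises an error in that case.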

Below, we sample an RDPG graph pair (undirected, with no self-loops), G1 and G2, with the following parameters: \begin{align*} n &= [50, 50]\\ r &= 0.3 \end{align*}


np.random.seed(1234)
X = np.array([[0.5, 0.2, 0.2]] * 50 + [[0.1, 0.1, 0.1]] * 50)
Y = None
r = 0.3

G1, G2 = rdpg_corr(X, Y, r, rescale=False, directed=False, loops=False)


X @ X.T

array([[0.33, 0.33, 0.33, ..., 0.09, 0.09, 0.09],
       [0.33, 0.33, 0.33, ..., 0.09, 0.09, 0.09],
       [0.33, 0.33, 0.33, ..., 0.09, 0.09, 0.09],
       ...,
       [0.09, 0.09, 0.09, ..., 0.03, 0.03, 0.03],
       [0.09, 0.09, 0.09, ..., 0.03, 0.03, 0.03],
       [0.09, 0.09, 0.09, ..., 0.03, 0.03, 0.03]])


## Visualize the graphs using heatmap

Here, we define the difference rate as the fraction of all potential edges (roughly $$n^2$$ of them) on which the two graphs disagree, that is, where an edge is present in one graph but absent in the other.

from graspologic.plot import heatmap

# One panel per graph, plus one for their difference
fig, axs = plt.subplots(1, 3, figsize=(10, 5))
heatmap(G1, ax=axs[0], cbar=False, title='Corr. RDPG 1')
heatmap(G2, ax=axs[1], cbar=False, title='Corr. RDPG 2')
heatmap(G1 - G2, ax=axs[2], cbar=False, title='diff(G1-G2)')
ndim = G1.shape[0]  # number of vertices
print("Difference rate is ", np.sum(abs(G1 - G2)) / (ndim * ndim))

Difference rate is  0.1438

## Compare it to the correlated SBM graph pair

Below, we sample a two-block SBM graph pair (undirected and no self-loops) G1 and G2 with the following parameters:

\begin{align*} n &= [50, 50]\\ p &= \begin{bmatrix} 0.33 & 0.09\\ 0.09 & 0.03 \end{bmatrix}\\ r &= 0.3 \end{align*}

This happens to be the SBM formulation of the same model framed as an RDPG above. Let’s see the difference between the correlated RDPG and correlated SBM graph pairs.
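The equivalence between the two formulations can be checked numerically. The following sketch (with illustrative names `X_demo`, `block_probs_demo`, and `labels` that are not part of the tutorial code) expands the block probability matrix to a full $$n \times n$$ matrix and compares it with $$XX^T$$ for the latent positions used earlier.

```python
import numpy as np

# Latent positions from the RDPG example: two groups of 50 vertices
X_demo = np.array([[0.5, 0.2, 0.2]] * 50 + [[0.1, 0.1, 0.1]] * 50)
# Block probabilities from the SBM example
block_probs_demo = np.array([[0.33, 0.09], [0.09, 0.03]])
# Block membership of each vertex: first 50 in block 0, last 50 in block 1
labels = np.repeat([0, 1], 50)

P_rdpg = X_demo @ X_demo.T
# Look up the block probability for every pair of vertices
P_sbm = block_probs_demo[np.ix_(labels, labels)]

print(np.allclose(P_rdpg, P_sbm))  # True
```

Because neither model allows self-loops here, the diagonal entries play no role in sampling; they agree anyway in this comparison.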

from graspologic.simulations import sbm_corr

np.random.seed(123)
directed = False
loops = False
n_per_block = 50
n_blocks = 2
block_members = np.array(n_blocks * [n_per_block])
n_verts = block_members.sum()
rho = .3
block_probs = np.array([[0.33, 0.09], [0.09, 0.03]])

A1, A2 = sbm_corr(block_members, block_probs, rho, directed=directed, loops=loops)
# One panel per graph, plus one for their difference
fig, axs = plt.subplots(1, 3, figsize=(10, 5))
heatmap(A1, ax=axs[0], cbar=False, title="Corr. SBM 1")
heatmap(A2, ax=axs[1], cbar=False, title="Corr. SBM 2")
heatmap(A1 - A2, ax=axs[2], cbar=False, title="Diff (G1 - G2)")

ndim = A1.shape[0]  # number of vertices
print("Difference rate with sbm_corr function is ", np.sum(abs(A1 - A2)) / (ndim * ndim))

Difference rate with sbm_corr function is  0.145

We can see that the difference rates between the two graphs are similar for both functions.

## Varying the correlation

We now vary the correlation between the graph pair from -0.5 to 0.9 and observe how the difference between graph 1 and graph 2 changes:

X = np.random.dirichlet([10, 10], size=100)
Y = None

np.random.seed(12345)
r = -0.5

G1, G2 = rdpg_corr(X, Y, r, rescale=False, directed=False, loops=False)

fig, axs = plt.subplots(1, 3, figsize=(10, 5))
heatmap(G1, ax=axs[0], cbar=False, title="Corr. RDPG 1")
heatmap(G2, ax=axs[1], cbar=False, title="Corr. RDPG 2")
heatmap(G1 - G2, ax=axs[2], cbar=False, title="Diff (G1 - G2)")
ndim = G1.shape[0]  # number of vertices
print("Difference rate when correlation = -0.5 is ", np.sum(abs(G1 - G2)) / (ndim * ndim))

Difference rate when correlation = -0.5 is  0.7332

np.random.seed(12345)
r = 0.3

G1, G2 = rdpg_corr(X, Y, r, rescale=False, directed=False, loops=False)

fig, axs = plt.subplots(1, 3, figsize=(10, 5))
heatmap(G1, ax=axs[0], cbar=False, title="Corr. RDPG 1")
heatmap(G2, ax=axs[1], cbar=False, title="Corr. RDPG 2")
heatmap(G1 - G2, ax=axs[2], cbar=False, title="Diff (G1 - G2)")
ndim = G1.shape[0]  # number of vertices
print("Difference rate when correlation =0.3 is ", np.sum(abs(G1 - G2)) / (ndim * ndim))

Difference rate when correlation =0.3 is  0.3544

np.random.seed(12345)
r = 0.9

G1, G2 = rdpg_corr(X, Y, r, rescale=False, directed=False, loops=False)

fig, axs = plt.subplots(1, 3, figsize=(10, 5))
heatmap(G1, ax=axs[0], cbar=False, title="Corr. RDPG 1")
heatmap(G2, ax=axs[1], cbar=False, title="Corr. RDPG 2")
heatmap(G1 - G2, ax=axs[2], cbar=False, title="Diff (G1 - G2)")
ndim = G1.shape[0]  # number of vertices
print("Difference rate when correlation =0.9 is ", np.sum(abs(G1 - G2)) / (ndim * ndim))

Difference rate when correlation =0.9 is  0.0508

We calculate the difference rate between graph 1 and graph 2 for correlations ranging from -0.5 to 0.9 and plot the results:

np.random.seed(12345)
X = np.random.dirichlet([10, 10], size=100)
Y = None
rlist = []
# Sweep the correlation from -0.5 to 0.9 in steps of 0.1
for i in range(-5, 10):
    g1, g2 = rdpg_corr(X, Y, i / 10, rescale=False, directed=False, loops=False)
    ndim = g1.shape[0]  # number of vertices
    rate = np.sum(abs(g1 - g2)) / (ndim * ndim)
    rlist.append(rate)

x_list = np.linspace(-0.5,0.9,15)
plt.plot(x_list,rlist,'o-')
plt.xlabel("Correlation")
_ = plt.ylabel('Difference rate')

We can see that the difference rate goes down as the correlation grows.
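A short derivation suggests why this trend should be close to linear. Assume (this is a modelling assumption, not something taken from the function's documentation) that two corresponding edges $$a_1$$ and $$a_2$$ are each Bernoulli($$p$$) with correlation $$r$$. Their covariance is then $$r\,p(1-p)$$, so

\begin{align*} \Pr(a_1 = 1, a_2 = 1) &= p^2 + r\,p(1-p)\\ \Pr(a_1 \neq a_2) &= 2\left[p - p^2 - r\,p(1-p)\right] = 2p(1-p)(1-r) \end{align*}

The probability of disagreement is therefore linear and decreasing in $$r$$. For the Dirichlet latent positions used here, the off-diagonal entries of $$XX^T$$ average 0.5, so the difference rate should be roughly $$0.5(1-r)$$: about 0.75, 0.35, and 0.05 for $$r = -0.5$$, $$0.3$$, and $$0.9$$, which is close to the printed values 0.7332, 0.3544, and 0.0508.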