{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Correlated Random Dot Product Graph (RDPG) Graph Pair" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from graspologic.simulations.rdpg_corr import rdpg_corr\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "%matplotlib inline\n", "sns.set_context('talk')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The random dot product graph (RDPG) is a latent position generative model. The uncorrelated model is explained in the [tutorial](https://graspy.neurodata.io/tutorials/simulations/rdpg.html).\n", "\n", "Here, we generate a pair of graphs that share the same latent positions and whose corresponding edges are correlated.\n", "\n", "The main parameters of `rdpg_corr` are: $X$ and $Y$, the input matrices of latent positions, which are used to generate the probability matrix $P = XY^T$ (when $Y$ is `None`, $P = XX^T$); and $r$, the correlation between corresponding edges of the graph pair, which should lie in $[-1, 1]$ (note that not all values of $r$ may be attainable for a given set of latent positions).\n", "\n", "Below, we sample an RDPG graph pair (undirected, with no self-loops), G1 and G2, in which two groups of vertices each share one latent position, with the following parameters:\n", "\\begin{align*}\n", "n &= [50, 50]\\\\\n", "r &= 0.3\n", "\\end{align*}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(1234)\n", "# Two groups of 50 vertices; all vertices in a group share one latent position\n", "X = np.array([[0.5, 0.2, 0.2]] * 50 + [[0.1, 0.1, 0.1]] * 50)\n", "Y = None\n", "r = 0.3\n", "\n", "G1, G2 = rdpg_corr(X, Y, r, rescale=False, directed=False, loops=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Edge probability matrix implied by the latent positions\n", "X @ X.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualize the graphs using heatmaps\n", "Here, we define the *difference rate* as the fraction of vertex pairs whose edge status differs between the two graphs (an edge present in one graph but absent in the other), out of all potential edges (roughly the square of the number of vertices)." ] }, { "cell_type": "code",
"execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from graspologic.plot import heatmap\n", "\n", "fig, axs = plt.subplots(1, 3, figsize=(10, 5))\n", "heatmap(G1, ax=axs[0], cbar=False, title=\"Corr. RDPG 1\")\n", "heatmap(G2, ax=axs[1], cbar=False, title=\"Corr. RDPG 2\")\n", "heatmap(G1 - G2, ax=axs[2], cbar=False, title=\"Diff (G1 - G2)\")\n", "ndim = G1.shape[0]\n", "print(\"Difference rate is\", np.sum(np.abs(G1 - G2)) / (ndim * ndim))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compare it to the correlated SBM graph pair" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below, we sample a two-block SBM graph pair (undirected, with no self-loops), A1 and A2, with the following parameters:\n", "\n", "\\begin{align*}\n", "n &= [50, 50]\\\\\n", "p &= \\begin{bmatrix} \n", "0.33 & 0.09\\\\\n", "0.09 & 0.03\n", "\\end{bmatrix}\\\\\n", "r &= 0.3\n", "\\end{align*}\n", "\n", "This is the SBM formulation of the RDPG model above: each block probability is the dot product of the corresponding latent positions (for example, $0.5^2 + 0.2^2 + 0.2^2 = 0.33$). Let's see the difference between the correlated RDPG and correlated SBM graph pairs." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from graspologic.simulations import sbm_corr\n", "\n", "np.random.seed(123)\n", "directed = False\n", "loops = False\n", "n_per_block = 50\n", "n_blocks = 2\n", "block_members = np.array(n_blocks * [n_per_block])\n", "n_verts = block_members.sum()\n", "rho = 0.3\n", "block_probs = np.array([[0.33, 0.09], [0.09, 0.03]])\n", "\n", "A1, A2 = sbm_corr(block_members, block_probs, rho, directed=directed, loops=loops)\n", "fig, axs = plt.subplots(1, 3, figsize=(10, 5))\n", "heatmap(A1, ax=axs[0], cbar=False, title=\"Corr. SBM 1\")\n",
"heatmap(A2, ax=axs[1], cbar=False, title=\"Corr. SBM 2\")\n", "heatmap(A1 - A2, ax=axs[2], cbar=False, title=\"Diff (A1 - A2)\")\n", "\n", "ndim = A1.shape[0]\n", "print(\"Difference rate with sbm_corr function is\", np.sum(np.abs(A1 - A2)) / (ndim * ndim))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the difference rates produced by the two functions are similar." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Varying the correlation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We set the correlation of the graph pair to -0.5, 0.3, and 0.9 in turn and examine the difference between graph 1 and graph 2:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Seed before sampling the latent positions so that X is reproducible\n", "np.random.seed(12345)\n", "X = np.random.dirichlet([10, 10], size=100)\n", "Y = None\n", "r = -0.5\n", "\n", "G1, G2 = rdpg_corr(X, Y, r, rescale=False, directed=False, loops=False)\n", "\n", "\n", "fig, axs = plt.subplots(1, 3, figsize=(10, 5))\n", "heatmap(G1, ax=axs[0], cbar=False, title=\"Corr. RDPG 1\")\n", "heatmap(G2, ax=axs[1], cbar=False, title=\"Corr. RDPG 2\")\n", "heatmap(G1 - G2, ax=axs[2], cbar=False, title=\"Diff (G1 - G2)\")\n", "ndim = G1.shape[0]\n", "print(\"Difference rate when correlation = -0.5 is\", np.sum(np.abs(G1 - G2)) / (ndim * ndim))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "np.random.seed(12345)\n", "r = 0.3\n", "\n", "G1, G2 = rdpg_corr(X, Y, r, rescale=False, directed=False, loops=False)\n", "\n", "\n", "fig, axs = plt.subplots(1, 3, figsize=(10, 5))\n", "heatmap(G1, ax=axs[0], cbar=False, title=\"Corr. RDPG 1\")\n",
"heatmap(G2, ax=axs[1], cbar=False, title=\"Corr. RDPG 2\")\n", "heatmap(G1 - G2, ax=axs[2], cbar=False, title=\"Diff (G1 - G2)\")\n", "ndim = G1.shape[0]\n", "print(\"Difference rate when correlation = 0.3 is\", np.sum(np.abs(G1 - G2)) / (ndim * ndim))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "np.random.seed(12345)\n", "r = 0.9\n", "\n", "G1, G2 = rdpg_corr(X, Y, r, rescale=False, directed=False, loops=False)\n", "\n", "\n", "fig, axs = plt.subplots(1, 3, figsize=(10, 5))\n", "heatmap(G1, ax=axs[0], cbar=False, title=\"Corr. RDPG 1\")\n", "heatmap(G2, ax=axs[1], cbar=False, title=\"Corr. RDPG 2\")\n", "heatmap(G1 - G2, ax=axs[2], cbar=False, title=\"Diff (G1 - G2)\")\n", "ndim = G1.shape[0]\n", "print(\"Difference rate when correlation = 0.9 is\", np.sum(np.abs(G1 - G2)) / (ndim * ndim))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We calculate the difference rate between graph 1 and graph 2 for correlations ranging from -0.5 to 0.9 in steps of 0.1, and show the results in a line plot:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(12345)\n", "X = np.random.dirichlet([10, 10], size=100)\n", "Y = None\n", "rlist = []\n", "# i / 10 takes the values -0.5, -0.4, ..., 0.9\n", "for i in range(-5, 10):\n", "    g1, g2 = rdpg_corr(X, Y, i / 10, rescale=False, directed=False, loops=False)\n", "    ndim = g1.shape[0]\n", "    rate = np.sum(np.abs(g1 - g2)) / (ndim * ndim)\n", "    rlist.append(rate)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x_list = np.linspace(-0.5, 0.9, 15)\n", "plt.plot(x_list, rlist, 'o-')\n", "plt.xlabel(\"Correlation\")\n", "_ = plt.ylabel(\"Difference rate\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the difference rate decreases as the correlation increases."
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 4 }