{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import graspologic\n", "\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Padded Graph Matching\n", "Consider the scenario where one would like to match graphs $A$ and $B$ with $n_1$ and $n_2$ nodes, respectively, where \n", "$n_1 < n_2$. The most straightforward fashion to 'pad' $A$, such that $A$ and $B$ have the same shape, is to add $n_2 - n_1$ isolated nodes to $A$ (represented as empty row/columns in the adjacency matrix). This padding scheme is known as $\\textit{naive padding}$, substituting $A \\oplus 0_{(n_2-n_1)x(n_2-n_1)}$ and $B$ in place of $A$ and $B$, respectively.\n", "\n", "The effect of this is that one matches $A$ to the best subgraph of $B$. That is, the isolated vertices added to $A$ through padding have an affinity to the low-density subgraphs of $B$, in effect giving the isolates a false signal. \n", "\n", "Instead, we may desire to match $A$ to the best fitting induced subgraph of $B$. This padding scheme is known as $\\textit{adopted padding}$, and is achieved by substituting $\\tilde{A} \\oplus 0_{(n_2-n_1)x(n_2-n_1)}$ and $\\tilde{B}$ in place of $A$ and $B$, respectively, where $\\tilde{A} = 2A - 1_{n_1}1_{n_1}^T$ and $\\tilde{B} = 2B - 1_{n_2}1_{n_2}^T$.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To demonstrate the difference between the two padding schemes, we sample two graph's $G_1'$ and $G_2$, each having 400 vertices, from a $0.5 \\sim SBM(4,b,\\Lambda)$, where b assigns 100 vertices to each of the k = 4 blocks, and\n", "\n", "\\begin{align*}\n", "\\Lambda &= \\begin{bmatrix} \n", "0.9 & 0.4 & 0.3 & 0.2\\\\\n", "0.4 & 0.9 & 0.4 & 0.3\\\\\n", "0.3 & 0.4 & 0.9 & 0.4\\\\\n", "0.2 & 0.3 & 0.4 & 0.7\n", "\\end{bmatrix}\\\\\n", "\\end{align*}\n", "\n", "We realize $G_1$ from $G_1'$ by removing 25 nodes from each block of $G_1'$, yielding a 300 node graph (example adapted from section 2.5 of [1]).\n", "\n", "The goal of the matching in this case is to recover $G_1$ by matching the right most figure below and $G_2$. That is, we seek to recover the shared community structure common between two graphs of differing shapes.\n", "\n", "[1] \n", "D. Fishkind, S. Adali, H. Patsolic, L. Meng, D. Singh, V. Lyzinski, C. Priebe,\n", " \"Seeded graph matching\", Pattern Recognit. 87 (2019) 203–215" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SBM correlated graph pairs" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# simulating G1', G2, deleting 25 vertices\n", "from graspologic.match import GraphMatch as GMP\n", "from graspologic.simulations import sbm_corr\n", "from graspologic.plot import heatmap\n", "np.random.seed(1)\n", "\n", "directed = False\n", "loops = False\n", "block_probs = [[0.9,0.4,0.3,0.2],\n", " [0.4,0.9,0.4,0.3],\n", " [0.3,0.4,0.9,0.4],\n", " [0.2,0.3,0.4,0.7]]\n", "n =100\n", "n_blocks = 4\n", "rho = 0.5\n", "block_members = np.array(n_blocks * [n])\n", "n_verts = block_members.sum()\n", "G1p, G2 = sbm_corr(block_members,block_probs, rho, directed, loops)\n", "G1 = np.zeros((300,300))\n", "c = np.copy(G1p)\n", "\n", "step1 = np.arange(4) * 100 + 75\n", "step2 = np.arange(5) * 75\n", "step3 = np.arange(4) * 100\n", "for i in range(len(step1)):\n", " block1 = np.arange(step1[i], step1[i]+25)\n", " c[block1,:] = -1\n", " c[:, block1] = -1\n", " for j in range(len(step3)):\n", " G1[step2[i]:step2[i+1], step2[j]:step2[j+1]] = G1p[step3[i]: step1[i], step3[j]:step1[j]]\n", " \n", "topleft_G1 = np.zeros((400,400))\n", "topleft_G1[:300,:300] = G1\n", "fig, axs = plt.subplots(1, 4, figsize=(20, 10))\n", "heatmap(G1p, ax=axs[0], cbar=False, title=\"G1'\")\n", "heatmap(G2, ax=axs[1], cbar=False, title=\"G2\")\n", "heatmap(c, ax=axs[2], cbar=False, title=\"G1\")\n", "_ = heatmap(topleft_G1, ax=axs[3], cbar=False, title=\"G1 (to top left corner)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Naive vs Adopted Padding" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(1)\n", "\n", "gmp_naive = GMP(padding='naive')\n", "seed1 = np.random.choice(np.arange(300),8)\n", "seed2 = [int(x/75)*25 + x for x in seed1]\n", "gmp_naive = gmp_naive.fit(G2, G1, seed2, seed1)\n", "G1_naive = topleft_G1[gmp_naive.perm_inds_][:, gmp_naive.perm_inds_]\n", "\n", "gmp_adopted = GMP(padding='adopted')\n", "gmp_adopted = gmp_adopted.fit(G2, G1, seed2, seed1)\n", "G1_adopted = topleft_G1[gmp_adopted.perm_inds_][:, gmp_adopted.perm_inds_]\n", "\n", "fig, axs = plt.subplots(1, 2, figsize=(14, 7))\n", "heatmap(G1_naive, ax=axs[0], cbar=False, title=\"Naive Padding\")\n", "heatmap(G1_adopted, ax=axs[1], cbar=False, title=\"Adopted Padding\")\n", "\n", "naive_matching = np.concatenate([gmp_naive.perm_inds_[x * 100 : (x * 100) + 75] for x in range(n_blocks)])\n", "adopted_matching = np.concatenate([gmp_adopted.perm_inds_[x * 100 : (x * 100) + 75] for x in range(n_blocks)])\n", "\n", "print(f'Match ratio of nodes remaining in G1, with naive padding: {sum(naive_matching == np.arange(300))/300}')\n", "print(f'Match ratio of nodes remaining in G1, with adopted padding: {sum(adopted_matching == np.arange(300))/300}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We observe that the two padding schemes perform as expected. The naive scheme permutes $G_1$ such that it matches a subgraph of $G_2$, specifically the subgraph of the first three blocks. Additionally, (almost) all isolated vertices of $G_1$ are permuted to the fourth block of $G_2$.\n", "\n", "On the other hand, we see that adopted padding preserves the common block structure between $G_1$ and $G_2$." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 4 }