{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Random Graph Models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### This tutorial will introduce the following random graph models: \n",
    "- Erdos-Reyni (ER)\n",
    "- Degree-corrected Erdos-Reyni (DCER)\n",
    "- Stochastic block model (SBM)\n",
    "- Degree-corrected stochastic block model (DCSBM)\n",
    "- Random dot product graph (RDPG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load some data from _graspologic_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this example we will use the _Drosophila melanogaster_ larva right mushroom body connectome from [Eichler et al. 2017](https://www.ncbi.nlm.nih.gov/pubmed/28796202). Here we will consider a binarized and directed version of the graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np \n",
    "from graspologic.datasets import load_drosophila_right\n",
    "from graspologic.plot import heatmap\n",
    "from graspologic.utils import binarize, symmetrize\n",
    "%matplotlib inline\n",
    "\n",
    "adj, labels = load_drosophila_right(return_labels=True)\n",
    "adj = binarize(adj)\n",
    "_ = heatmap(adj,\n",
    "        inner_hier_labels=labels,\n",
    "        title='Drosophila right MB',\n",
    "        font_scale=1.5,\n",
    "        sort_nodes=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preliminaries\n",
    "\n",
    "$n$ - the number of nodes in the graph\n",
    "\n",
    "$A$ - $n \\times n$ adjacency matrix\n",
    "\n",
    "$P$ - $n \\times n$ matrix of probabilities\n",
    "\n",
    "\n",
    "For the class of models we will consider here, a graph (adjacency matrix) $A$ is sampled as follows:\n",
    "\n",
    "$$A \\sim Bernoulli(P)$$\n",
    "\n",
    "While each model we will discuss follows this formulation, they differ in how the matrix $P$ is constructed. So, for each model, we will consider how to model $P_{ij}$, or the probability of connection between any node $i$ and $j$, with $i \\neq j$ in this case (i.e. no \"loops\" are allowed for the sake of this tutorial).\n",
    "\n",
    "\n",
    "------\n",
    "\n",
    "\n",
    "### For each graph model we will show: \n",
    "- how the model is formulated \n",
    "- how to fit the model using graspologic\n",
    "- the P matrix that was fit by the model \n",
    "- a single sample from the fit model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Erdos-Reyni (ER)\n",
    "The Erdos-Reyni (ER) model is the simplest random graph model. We are interested in modeling the probability of an edge existing between any two nodes, $i$ and $j$. We denote this probability $P_{ij}$. For the ER model:\n",
    "\n",
    "$$P_{ij} = p$$\n",
    "\n",
    "for any combination of $i$ and $j$.  \n",
    "This means that the one parameter $p$ is the overall probability of connection for any two nodes. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from graspologic.models import EREstimator\n",
    "er = EREstimator(directed=True,loops=False)\n",
    "er.fit(adj)\n",
    "print(f\"ER \\\"p\\\" parameter: {er.p_}\")\n",
    "heatmap(er.p_mat_,\n",
    "        inner_hier_labels=labels,\n",
    "        font_scale=1.5,\n",
    "        title=\"ER probability matrix\",\n",
    "        vmin=0, vmax=1, \n",
    "        sort_nodes=True)\n",
    "_ = heatmap(er.sample()[0],\n",
    "        inner_hier_labels=labels,\n",
    "        font_scale=1.5,\n",
    "        title=\"ER sample\",\n",
    "        sort_nodes=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Degree-corrected Erdos-Reyni (DCER)\n",
    "\n",
    "A slightly more complicated variant of the ER model is the degree-corrected Erdos-Reyni model (DCER). Here, there is still a global parameter $p$ to specify relative connection probability between all edges. However, we add a promiscuity parameter $\\theta_i$ for each node $i$ which specifies its expected degree relative to other nodes:\n",
    "\n",
    "$$P_{ij} = \\theta_i \\theta_j p$$\n",
    "\n",
    "so the probability of an edge from $i$ to $j$ is a function of the two nodes' degree-correction parameters, and the overall probability of an edge in the graph. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from graspologic.models import DCEREstimator\n",
    "dcer = DCEREstimator(directed=True,loops=False)\n",
    "dcer.fit(adj)\n",
    "print(f\"DCER \\\"p\\\" parameter: {dcer.p_}\")\n",
    "heatmap(dcer.p_mat_,\n",
    "        inner_hier_labels=labels,\n",
    "        vmin=0,\n",
    "        vmax=1,\n",
    "        font_scale=1.5,\n",
    "        title=\"DCER probability matrix\", \n",
    "        sort_nodes=True);\n",
    "_ = heatmap(dcer.sample()[0],\n",
    "        inner_hier_labels=labels,\n",
    "        font_scale=1.5,\n",
    "        title=\"DCER sample\", \n",
    "        sort_nodes=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Stochastic block model (SBM)\n",
    "Under the stochastic block model (SBM), each node is modeled as belonging to a block (sometimes called a community or group). The probability of node $i$ connecting to node $j$ is simply a function of the block membership of the two nodes. Let $n$ be the number of nodes in the graph, then $\\tau$ is a length $n$ vector which indicates the block membership of each node in the graph. Let $K$ be the number of blocks, then $B$ is a $K \\times K$ matrix of block-block connection probabilities. \n",
    "\n",
    "$$P_{ij} = B_{\\tau_i \\tau_j}$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from graspologic.models import SBMEstimator\n",
    "sbme = SBMEstimator(directed=True,loops=False)\n",
    "sbme.fit(adj, y=labels)\n",
    "print(\"SBM \\\"B\\\" matrix:\")\n",
    "print(sbme.block_p_)\n",
    "heatmap(sbme.p_mat_,\n",
    "        inner_hier_labels=labels,\n",
    "        vmin=0,\n",
    "        vmax=1,\n",
    "        font_scale=1.5,\n",
    "        title=\"SBM probability matrix\", \n",
    "        sort_nodes=True)\n",
    "_ = heatmap(sbme.sample()[0],\n",
    "        inner_hier_labels=labels,\n",
    "        font_scale=1.5,\n",
    "        title=\"SBM sample\", \n",
    "        sort_nodes=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Degree-corrected stochastic block model (DCSBM)\n",
    "Just as we could add a degree-correction term to the ER model, so too can we modify the stochastic block model to allow for heterogeneous expected degrees. Again, we let $\\theta$ be a length $n$ vector of degree correction parameters, and all other parameters remain as they were defined above for the SBM: \n",
    "\n",
    "$$P_{ij} = \\theta_i \\theta_j B_{\\tau_i, \\tau_j}$$\n",
    "\n",
    "Note that the matrix $B$ may no longer represent true probabilities, because the addition of the $\\theta$ vectors introduces a multiplicative constant that can be absorbed into the elements of $\\theta$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from graspologic.models import DCSBMEstimator\n",
    "dcsbme = DCSBMEstimator(directed=True,loops=False)\n",
    "dcsbme.fit(adj, y=labels)\n",
    "print(\"DCSBM \\\"B\\\" matrix:\")\n",
    "print(dcsbme.block_p_)\n",
    "heatmap(dcsbme.p_mat_,\n",
    "        inner_hier_labels=labels,\n",
    "        font_scale=1.5, \n",
    "        title=\"DCSBM probability matrix\",\n",
    "        vmin=0,\n",
    "        vmax=1,\n",
    "        sort_nodes=True)\n",
    "_ = heatmap(dcsbme.sample()[0],\n",
    "        inner_hier_labels=labels,\n",
    "        title=\"DCSBM sample\",\n",
    "        font_scale=1.5,\n",
    "        sort_nodes=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Random dot product graph (RDPG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Under the random dot product graph model, each node is assumed to have a \"latent position\" in some $d$-dimensional Euclidian space. This vector dictates that node's probability of connection to other nodes. For a given pair of nodes $i$ and $j$, the probability of connection is the dot product between their latent positions. If $x_i$ and $x_j$ are the latent positions of nodes $i$ and $j$, respectively:\n",
    "\n",
    "$$P_{ij} = \\langle x_i, x_j \\rangle$$\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from graspologic.models import RDPGEstimator\n",
    "rdpge = RDPGEstimator(loops=False)\n",
    "rdpge.fit(adj, y=labels)\n",
    "heatmap(rdpge.p_mat_,\n",
    "        inner_hier_labels=labels,\n",
    "        vmin=0,\n",
    "        vmax=1,\n",
    "        font_scale=1.5,\n",
    "        title=\"RDPG probability matrix\",\n",
    "        sort_nodes=True\n",
    "       )\n",
    "_ = heatmap(rdpge.sample()[0],\n",
    "        inner_hier_labels=labels,\n",
    "        font_scale=1.5,\n",
    "        title=\"RDPG sample\",\n",
    "        sort_nodes=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a figure combining all of the models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib as mpl\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from matplotlib import cm\n",
    "\n",
    "fig, axs = plt.subplots(2, 3, figsize=(20, 16))\n",
    "\n",
    "# colormapping\n",
    "cmap = cm.get_cmap(\"RdBu_r\")\n",
    "\n",
    "center = 0\n",
    "vmin = 0\n",
    "vmax = 1\n",
    "norm = mpl.colors.Normalize(0, 1)\n",
    "cc = np.linspace(0.5, 1, 256) \n",
    "cmap = mpl.colors.ListedColormap(cmap(cc))\n",
    "\n",
    "# heatmapping\n",
    "heatmap_kws = dict(\n",
    "    inner_hier_labels=labels,\n",
    "    vmin=0,\n",
    "    vmax=1,\n",
    "    cbar=False,\n",
    "    cmap=cmap,\n",
    "    center=None,\n",
    "    hier_label_fontsize=20,\n",
    "    title_pad=45,\n",
    "    font_scale=1.6,\n",
    "    sort_nodes=True\n",
    ")\n",
    "\n",
    "models = [rdpge, dcsbme, dcer, sbme, er]\n",
    "model_names = [\"RDPG\", \"DCSBM\",\"DCER\", \"SBM\", \"ER\"]\n",
    "\n",
    "heatmap(adj, ax=axs[0][0], title=\"IER\", **heatmap_kws)\n",
    "heatmap(models[0].p_mat_, ax=axs[0][1], title=model_names[0], **heatmap_kws)\n",
    "heatmap(models[1].p_mat_, ax=axs[0][2], title=model_names[1], **heatmap_kws)\n",
    "heatmap(models[2].p_mat_, ax=axs[1][0], title=model_names[2], **heatmap_kws)\n",
    "heatmap(models[3].p_mat_, ax=axs[1][1], title=model_names[3], **heatmap_kws)\n",
    "heatmap(models[4].p_mat_, ax=axs[1][2], title=model_names[4], **heatmap_kws)\n",
    "\n",
    "\n",
    "# add colorbar\n",
    "sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)\n",
    "sm.set_array(dcsbme.p_mat_)\n",
    "_ = fig.colorbar(sm,\n",
    "             ax=axs,\n",
    "             orientation=\"horizontal\",\n",
    "             pad=0.04,\n",
    "             shrink=0.8,\n",
    "             fraction=0.08,\n",
    "             drawedges=False)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}