Simulations

graspologic.simulations.er_np(n, p, directed=False, loops=False, wt=1, wtargs=None, dc=None, dc_kws={})[source]

Samples a Erdos Renyi (n, p) graph with specified edge probability.

Erdos Renyi (n, p) graph is a simple graph with n vertices and a probability p of edges being connected.

Read more in the Erdos-Renyi (ER) Model Tutorial

Parameters:
n: int

Number of vertices

p: float

Probability of an edge existing between two vertices, between 0 and 1.

directed: boolean, optional (default=False)

If False, output adjacency matrix will be symmetric. Otherwise, output adjacency matrix will be asymmetric.

loops: boolean, optional (default=False)

If False, no edges will be sampled in the diagonal. Otherwise, edges are sampled in the diagonal.

wt: object, optional (default=1)

Weight function for each of the edges, taking only a size argument. This weight function will be randomly assigned for selected edges. If 1, graph produced is binary.

wtargs: dictionary, optional (default=None)

Optional arguments for parameters that can be passed to weight function wt.

dc: function or array-like, shape (n_vertices)

dc is used to generate a degree-corrected Erdos Renyi Model in which each node in the graph has a parameter to specify its expected degree relative to other nodes.

  • function:

    should generate a non-negative number to be used as a degree correction to create a heterogenous degree distribution. A weight will be generated for each vertex, normalized so that the sum of weights is 1.

  • array-like of scalars, shape (n_vertices):

    The weights should sum to 1; otherwise, they will be normalized and a warning will be thrown. The scalar associated with each vertex is the node's relative expected degree.

dc_kws: dictionary

Ignored if dc is none or array of scalar. If dc is a function, dc_kws corresponds to its named arguments. If not specified, in either case all functions will assume their default parameters.

Returns:
Andarray, shape (n, n)

Sampled adjacency matrix

Parameters:
Return type:

ndarray

Examples

>>> np.random.seed(1)
>>> n = 4
>>> p = 0.25

To sample a binary Erdos Renyi (n, p) graph:

>>> er_np(n, p)
array([[0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [1., 1., 0., 0.],
       [0., 0., 0., 0.]])

To sample a weighted Erdos Renyi (n, p) graph with Uniform(0, 1) distribution:

>>> wt = np.random.uniform
>>> wtargs = dict(low=0, high=1)
>>> er_np(n, p, wt=wt, wtargs=wtargs)
array([[0.        , 0.        , 0.95788953, 0.53316528],
       [0.        , 0.        , 0.        , 0.        ],
       [0.95788953, 0.        , 0.        , 0.31551563],
       [0.53316528, 0.        , 0.31551563, 0.        ]])
graspologic.simulations.er_nm(n, m, directed=False, loops=False, wt=1, wtargs=None)[source]

Samples an Erdos Renyi (n, m) graph with specified number of edges.

Erdos Renyi (n, m) graph is a simple graph with n vertices and exactly m number of total edges.

Read more in the Erdos-Renyi (ER) Model Tutorial

Parameters:
n: int

Number of vertices

m: int

Number of edges, a value between 1 and \(n^2\).

directed: boolean, optional (default=False)

If False, output adjacency matrix will be symmetric. Otherwise, output adjacency matrix will be asymmetric.

loops: boolean, optional (default=False)

If False, no edges will be sampled in the diagonal. Otherwise, edges are sampled in the diagonal.

wt: object, optional (default=1)

Weight function for each of the edges, taking only a size argument. This weight function will be randomly assigned for selected edges. If 1, graph produced is binary.

wtargs: dictionary, optional (default=None)

Optional arguments for parameters that can be passed to weight function wt.

Returns:
A: ndarray, shape (n, n)

Sampled adjacency matrix

Parameters:
Return type:

ndarray

Examples

>>> np.random.seed(1)
>>> n = 4
>>> m = 4

To sample a binary Erdos Renyi (n, m) graph:

>>> er_nm(n, m)
array([[0., 1., 1., 1.],
       [1., 0., 0., 1.],
       [1., 0., 0., 0.],
       [1., 1., 0., 0.]])

To sample a weighted Erdos Renyi (n, m) graph with Uniform(0, 1) distribution:

>>> wt = np.random.uniform
>>> wtargs = dict(low=0, high=1)
>>> er_nm(n, m, wt=wt, wtargs=wtargs)
array([[0.        , 0.66974604, 0.        , 0.38791074],
       [0.66974604, 0.        , 0.        , 0.39658073],
       [0.        , 0.        , 0.        , 0.93553907],
       [0.38791074, 0.39658073, 0.93553907, 0.        ]])
graspologic.simulations.sbm(n, p, directed=False, loops=False, wt=1, wtargs=None, dc=None, dc_kws={}, return_labels=False)[source]

Samples a graph from the stochastic block model (SBM).

SBM produces a graph with specified communities, in which each community can have different sizes and edge probabilities.

Read more in the Stochastic Block Model (SBM) Tutorial

Parameters:
n: list of int, shape (n_communities)

Number of vertices in each community. Communities are assigned n[0], n[1], ...

p: array-like, shape (n_communities, n_communities)

Probability of an edge between each of the communities, where p[i, j] indicates the probability of a connection between edges in communities [i, j]. 0 < p[i, j] < 1 for all i, j.

directed: boolean, optional (default=False)

If False, output adjacency matrix will be symmetric. Otherwise, output adjacency matrix will be asymmetric.

loops: boolean, optional (default=False)

If False, no edges will be sampled in the diagonal. Otherwise, edges are sampled in the diagonal.

wt: object or array-like, shape (n_communities, n_communities)

if wt is an object, a weight function to use globally over the sbm for assigning weights. 1 indicates to produce a binary graph. If wt is an array-like, a weight function for each of the edge communities. wt[i, j] corresponds to the weight function between communities i and j. If the entry is a function, should accept an argument for size. An entry of wt[i, j] = 1 will produce a binary subgraph over the i, j community.

wtargs: dictionary or array-like, shape (n_communities, n_communities)

if wt is an object, wtargs corresponds to the trailing arguments to pass to the weight function. If Wt is an array-like, wtargs[i, j] corresponds to trailing arguments to pass to wt[i, j].

dc: function or array-like, shape (n_vertices) or (n_communities), optional

dc is used to generate a degree-corrected stochastic block model [1] in which each node in the graph has a parameter to specify its expected degree relative to other nodes within its community.

  • function:

    should generate a non-negative number to be used as a degree correction to create a heterogenous degree distribution. A weight will be generated for each vertex, normalized so that the sum of weights in each block is 1.

  • array-like of functions, shape (n_communities):

    Each function will generate the degree distribution for its respective community.

  • array-like of scalars, shape (n_vertices):

    The weights in each block should sum to 1; otherwise, they will be normalized and a warning will be thrown. The scalar associated with each vertex is the node's relative expected degree within its community.

dc_kws: dictionary or array-like, shape (n_communities), optional

Ignored if dc is none or array of scalar. If dc is a function, dc_kws corresponds to its named arguments. If dc is an array-like of functions, dc_kws should be an array-like, shape (n_communities), of dictionary. Each dictionary is the named arguments for the corresponding function for that community. If not specified, in either case all functions will assume their default parameters.

return_labels: boolean, optional (default=False)

If False, only output is adjacency matrix. Otherwise, an additional output will be an array with length equal to the number of vertices in the graph, where each entry in the array labels which block a vertex in the graph is in.

Returns:
A: ndarray, shape (sum(n), sum(n))

Sampled adjacency matrix

labels: ndarray, shape (sum(n))

Label vector

Parameters:
Return type:

ndarray | Tuple[ndarray, ndarray]

References

[1]

Tai Qin and Karl Rohe. "Regularized spectral clustering under the Degree-Corrected Stochastic Blockmodel," Advances in Neural Information Processing Systems 26, 2013

Examples

>>> np.random.seed(1)
>>> n = [3, 3]
>>> p = [[0.5, 0.1], [0.1, 0.5]]

To sample a binary 2-block SBM graph:

>>> sbm(n, p)
array([[0., 0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0., 1.],
       [1., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0., 0.]])

To sample a weighted 2-block SBM graph with Poisson(2) distribution:

>>> wt = np.random.poisson
>>> wtargs = dict(lam=2)
>>> sbm(n, p, wt=wt, wtargs=wtargs)
array([[0., 4., 0., 1., 0., 0.],
       [4., 0., 0., 0., 0., 2.],
       [0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 2., 0., 0., 0., 0.]])
graspologic.simulations.rdpg(X, Y=None, rescale=False, directed=False, loops=False, wt=1, wtargs=None)[source]

Samples a random graph based on the latent positions in X (and optionally in Y)

If only X \(\in\mathbb{R}^{n\times d}\) is given, the P matrix is calculated as \(P = XX^T\). If X, Y \(\in\mathbb{R}^{n\times d}\) is given, then \(P = XY^T\). These operations correspond to the dot products between a set of latent positions, so each row in X or Y represents the latent positions in \(\mathbb{R}^{d}\) for a single vertex in the random graph Note that this function may also rescale or clip the resulting P matrix to get probabilities between 0 and 1, or remove loops. A binary random graph is then sampled from the P matrix described by X (and possibly Y).

Read more in the Random Dot Product Graph (RDPG) Model Tutorial

Parameters:
X: np.ndarray, shape (n_vertices, n_dimensions)

latent position from which to generate a P matrix if Y is given, interpreted as the left latent position

Y: np.ndarray, shape (n_vertices, n_dimensions) or None, optional

right latent position from which to generate a P matrix

rescale: boolean, optional (default=False)

when rescale is True, will subtract the minimum value in P (if it is below 0) and divide by the maximum (if it is above 1) to ensure that P has entries between 0 and 1. If False, elements of P outside of [0, 1] will be clipped

directed: boolean, optional (default=False)

If False, output adjacency matrix will be symmetric. Otherwise, output adjacency matrix will be asymmetric.

loops: boolean, optional (default=False)

If False, no edges will be sampled in the diagonal. Diagonal elements in P matrix are removed prior to rescaling (see above) which may affect behavior. Otherwise, edges are sampled in the diagonal.

wt: object, optional (default=1)

Weight function for each of the edges, taking only a size argument. This weight function will be randomly assigned for selected edges. If 1, graph produced is binary.

wtargs: dictionary, optional (default=None)

Optional arguments for parameters that can be passed to weight function wt.

Returns:
A: ndarray (n_vertices, n_vertices)

A matrix representing the probabilities of connections between vertices in a random graph based on their latent positions

Parameters:
Return type:

ndarray

References

[1]

Sussman, D.L., Tang, M., Fishkind, D.E., Priebe, C.E. "A Consistent Adjacency Spectral Embedding for Stochastic Blockmodel Graphs," Journal of the American Statistical Association, Vol. 107(499), 2012

Examples

>>> np.random.seed(1)

Generate random latent positions using 2-dimensional Dirichlet distribution.

>>> X = np.random.dirichlet([1, 1], size=5)

Sample a binary RDPG using sampled latent positions.

>>> rdpg(X, loops=False)
array([[0., 1., 0., 0., 1.],
       [1., 0., 0., 1., 1.],
       [0., 0., 0., 1., 1.],
       [0., 1., 1., 0., 0.],
       [1., 1., 1., 0., 0.]])

Sample a weighted RDPG with Poisson(2) weight distribution

>>> wt = np.random.poisson
>>> wtargs = dict(lam=2)
>>> rdpg(X, loops=False, wt=wt, wtargs=wtargs)
array([[0., 4., 0., 2., 0.],
       [1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 2.],
       [1., 0., 0., 0., 1.],
       [0., 2., 2., 0., 0.]])
graspologic.simulations.er_corr(n, p, r, directed=False, loops=False)[source]

Generate a pair of correlated graphs with specified edge probability Both G1 and G2 are binary matrices.

Parameters:
n: int

Number of vertices

p: float

Probability of an edge existing between two vertices, between 0 and 1.

r: float

The value of the correlation between the same vertices in two graphs.

directed: boolean, optional (default=False)

If False, output adjacency matrix will be symmetric. Otherwise, output adjacency matrix will be asymmetric.

loops: boolean, optional (default=False)

If False, no edges will be sampled in the diagonal. Otherwise, edges are sampled in the diagonal.

Returns:
G1: ndarray (n_vertices, n_vertices)

Adjacency matrix the same size as P representing a random graph.

G2: ndarray (n_vertices, n_vertices)

Adjacency matrix the same size as P representing a random graph.

Parameters:
Return type:

Tuple[ndarray, ndarray]

Examples

>>> np.random.seed(2)
>>> p = 0.5
>>> r = 0.3
>>> n = 5

To sample a correlated ER graph pair based on n, p and r:

>>> er_corr(n, p, r, directed=False, loops=False)
(array([[0., 0., 1., 0., 0.],
    [0., 0., 0., 1., 0.],
    [1., 0., 0., 1., 1.],
    [0., 1., 1., 0., 1.],
    [0., 0., 1., 1., 0.]]), array([[0., 1., 1., 1., 0.],
    [1., 0., 0., 1., 0.],
    [1., 0., 0., 1., 1.],
    [1., 1., 1., 0., 1.],
    [0., 0., 1., 1., 0.]]))
graspologic.simulations.sbm_corr(n, p, r, directed=False, loops=False)[source]

Generate a pair of correlated graphs with specified edge probability Both G1 and G2 are binary matrices.

Parameters:
n: list of int, shape (n_communities)

Number of vertices in each community. Communities are assigned n[0], n[1], ...

p: array-like, shape (n_communities, n_communities)

Probability of an edge between each of the communities, where p[i, j] indicates the probability of a connection between edges in communities [i, j]. 0 < p[i, j] < 1 for all i, j.

r: float

Probability of the correlation between the same vertices in two graphs.

directed: boolean, optional (default=False)

If False, output adjacency matrix will be symmetric. Otherwise, output adjacency matrix will be asymmetric.

loops: boolean, optional (default=False)

If False, no edges will be sampled in the diagonal. Otherwise, edges are sampled in the diagonal.

Returns:
G1: ndarray (n_vertices, n_vertices)

Adjacency matrix the same size as P representing a random graph.

G2: ndarray (n_vertices, n_vertices)

Adjacency matrix the same size as P representing a random graph.

Parameters:
Return type:

Tuple[ndarray, ndarray]

Examples

>>> np.random.seed(3)
>>> n = [3, 3]
>>> p = [[0.5, 0.1], [0.1, 0.5]]
>>> r = 0.3

To sample a correlated SBM graph pair based on n, p and r:

>>> sbm_corr(n, p, r, directed=False, loops=False)
(array([[0., 1., 0., 0., 0., 0.],
    [1., 0., 0., 0., 0., 0.],
    [0., 0., 0., 0., 0., 0.],
    [0., 0., 0., 0., 0., 1.],
    [0., 0., 0., 0., 0., 0.],
    [0., 0., 0., 1., 0., 0.]]), array([[0., 1., 0., 0., 0., 0.],
    [1., 0., 0., 1., 1., 0.],
    [0., 0., 0., 0., 0., 0.],
    [0., 1., 0., 0., 0., 1.],
    [0., 1., 0., 0., 0., 0.],
    [0., 0., 0., 1., 0., 0.]]))
graspologic.simulations.rdpg_corr(X, Y, r, rescale=False, directed=False, loops=False)[source]

Samples a random graph pair based on the latent positions in X (and optionally in Y) If only X \(\in\mathbb{R}^{n\times d}\) is given, the P matrix is calculated as \(P = XX^T\). If X, Y \(\in\mathbb{R}^{n\times d}\) is given, then \(P = XY^T\). These operations correspond to the dot products between a set of latent positions, so each row in X or Y represents the latent positions in \(\mathbb{R}^{d}\) for a single vertex in the random graph. Note that this function may also rescale or clip the resulting P matrix to get probabilities between 0 and 1, or remove loops. A binary random graph is then sampled from the P matrix described by X (and possibly Y).

Read more in the Correlated Random Dot Product Graph (RDPG) Graph Pair Tutorial

Parameters:
X: np.ndarray, shape (n_vertices, n_dimensions)

latent position from which to generate a P matrix if Y is given, interpreted as the left latent position

Y: np.ndarray, shape (n_vertices, n_dimensions) or None, optional

right latent position from which to generate a P matrix

r: float

The value of the correlation between the same vertices in two graphs.

rescale: boolean, optional (default=True)

when rescale is True, will subtract the minimum value in P (if it is below 0) and divide by the maximum (if it is above 1) to ensure that P has entries between 0 and 1. If False, elements of P outside of [0, 1] will be clipped.

directed: boolean, optional (default=False)

If False, output adjacency matrix will be symmetric. Otherwise, output adjacency matrix will be asymmetric.

loops: boolean, optional (default=True)

If False, no edges will be sampled in the diagonal. Diagonal elements in P matrix are removed prior to rescaling (see above) which may affect behavior. Otherwise, edges are sampled in the diagonal.

Returns:
G1: ndarray (n_vertices, n_vertices)

A matrix representing the probabilities of connections between vertices in a random graph based on their latent positions

G2: ndarray (n_vertices, n_vertices)

A matrix representing the probabilities of connections between vertices in a random graph based on their latent positions

Parameters:
  • X (ndarray) --

  • Y (ndarray) --

  • r (float) --

  • rescale (bool) --

  • directed (bool) --

  • loops (bool) --

Return type:

Tuple[ndarray, ndarray]

References

[1]

Vince Lyzinski, Donniell E Fishkind profile imageDonniell E. Fishkind, Carey E Priebe. "Seeded graph matching for correlated Erdös-Rényi graphs". The Journal of Machine Learning Research, January 2014

Examples

>>> np.random.seed(1234)
>>> X = np.random.dirichlet([1, 1], size=5)
>>> Y = None

Generate random latent positions using 2-dimensional Dirichlet distribution. Then sample a correlated RDPG graph pair:

>>> rdpg_corr(X, Y, 0.3, rescale=False, directed=False, loops=False)
(array([[0., 1., 0., 1., 0.],
       [1., 0., 0., 1., 1.],
       [0., 0., 0., 0., 0.],
       [1., 1., 0., 0., 0.],
       [0., 1., 0., 0., 0.]]), array([[0., 1., 0., 1., 0.],
       [1., 0., 0., 0., 1.],
       [0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.]]))
graspologic.simulations.mmsbm(n, p, alpha=None, rng=None, directed=False, loops=False, return_labels=False)[source]

Samples a graph from Mixed Membership Stochastic Block Model (MMSBM).

MMSBM produces a graph given the specified block connectivity matrix B, which indicates the probability of connection between nodes based upon their community membership. Each node is assigned a membership vector drawn from Dirichlet distribution with parameter \(\vec{\alpha}\). The entries of this vector indicate the probabilities for that node of pertaining to each community when interacting with another node. Each node's membership is determined according to those probabilities. Finally, interactions are sampled according to the assigned memberships.

Read more in the Mixed Membership Stochastic Blockmodel (MMSBM) Tutorial

Parameters:
n: int

Number of vertices of the graph.

p: array-like, shape (n_communities, n_communities)

Probability of an edge between each of the communities, where p[i, j] indicates the probability of a connection between edges in communities \((i, j)\). 0 < p[i, j] < 1 for all \(i, j\).

alpha: array-like, shape (n_communities,)

Parameter alpha of the Dirichlet distribution used to sample the mixed-membership vectors for each node. alpha[i] > 0 for all \(i\).

rng: numpy.random.Generator, optional (default = None)

numpy.random.Generator object to sample from distributions. If None, the random number generator is the Generator object constructed by np.random.default_rng().

directed: boolean, optional (default=False)

If False, output adjacency matrix will be symmetric. Otherwise, output adjacency matrix will be asymmetric.

loops: boolean, optional (default=False)

If False, no edges will be sampled in the diagonal. Otherwise, edges are sampled in the diagonal.

return_labels: boolean, optional (default=False)

If False, the only output is the adjacency matrix. If True, output is a tuple. The first element of the tuple is the adjacency matrix. The second element is a matrix in which the \((i^{th}, j^{th})\) entry indicates the membership assigned to node i when interacting with node j. Community 1 is labeled with a 0, community 2 with 1, etc. -1 indicates that no community was assigned for that interaction.

Returns:
A: ndarray, shape (n, n)

Sampled adjacency matrix

labels: ndarray, shape (n, n), optional

Array containing the membership assigned to each node when interacting with another node.

Parameters:
  • n (int) --

  • p (ndarray) --

  • alpha (ndarray | None) --

  • rng (Generator | None) --

  • directed (bool) --

  • loops (bool) --

  • return_labels (bool) --

Return type:

ndarray | Tuple[ndarray, ndarray]

References

[1]

Airoldi, Edoardo, et al. “Mixed Membership Stochastic Blockmodels.” Journal of Machine Learning Research, vol. 9, 2008, pp. 1981–2014.

Examples

>>> rng = np.random.default_rng(1)
>>> np.random.seed(1)
>>> n = 6
>>> p = [[0.5, 0], [0, 1]]

To sample a binary MMSBM in which very likely all nodes pertain to community two:

>>> alpha = [0.05, 1000]
>>> mmsbm(n, p, alpha, rng = rng)
array([[0., 1., 1., 1., 1., 1.],
       [1., 0., 1., 1., 1., 1.],
       [1., 1., 0., 1., 1., 1.],
       [1., 1., 1., 0., 1., 1.],
       [1., 1., 1., 1., 0., 1.],
       [1., 1., 1., 1., 1., 0.]])

To sample a binary MMSBM similar to 2-block SBM with connectivity matrix B:

>>> rng = np.random.default_rng(1)
>>> np.random.seed(1)
>>> alpha = [0.05, 0.05]
>>> mmsbm(n, p, alpha, rng = rng)
array([[0., 1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 1.],
       [0., 0., 0., 1., 0., 1.],
       [0., 0., 0., 1., 1., 0.]])