Models¶

Erdos-Renyi models¶

class graspologic.models.EREstimator[source]¶

Erdos-Renyi Model

The Erdos-Renyi (ER) model is a simple random graph model in which every potential edge exists with the same probability $p$, regardless of which two nodes $i$ and $j$ it connects.

$P_{ij} = p$ for all $i, j$

Read more in the Erdos-Renyi (ER) Model Tutorial
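As a rough illustration of the formula above, the sketch below estimates $p$ as the fraction of potential edges that are present and fills a constant probability matrix. It uses plain NumPy rather than graspologic; the helper name `er_probability_matrix` is hypothetical, and zeroing the diagonal when `loops=False` is an assumption about how a loop-free model is represented.

```python
import numpy as np


def er_probability_matrix(adjacency, loops=False):
    """Hypothetical helper: estimate p and build the constant matrix P."""
    adjacency = np.asarray(adjacency, dtype=float)
    n_verts = adjacency.shape[0]

    # Potential edges: every (i, j) pair, minus the diagonal if loops are off.
    potential = np.ones((n_verts, n_verts), dtype=bool)
    if not loops:
        np.fill_diagonal(potential, False)

    # Fraction of potential edges that are present.
    p = adjacency[potential].mean()

    # P_ij = p for every potential edge (assumed 0 on the diagonal otherwise).
    p_mat = np.where(potential, p, 0.0)
    return p, p_mat


adjacency = np.array([[0, 1, 1],
                      [1, 0, 0],
                      [1, 0, 0]])
p, p_mat = er_probability_matrix(adjacency)
```

Here four of the six off-diagonal entries are edges, so the estimate is $p = 4/6$.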

Parameters:

directed : boolean, optional (default=True): Whether to treat the input graph as directed. Even if a directed graph is input, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.
loops : boolean, optional (default=False): Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.

Attributes:

p_ : float: Value between 0 and 1 (inclusive) representing the probability of any edge in the ER graph model
p_mat_ : np.ndarray, shape (n_verts, n_verts): Probability matrix $P$ for the fit model, from which graphs could be sampled.

See also

graspologic.models.DCEREstimator
graspologic.models.SBMEstimator
graspologic.simulations.er_np

References

[1]

https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model

__init__(directed=True, loops=False)[source]¶

Parameters:

directed (bool) --
loops (bool) --

fit(graph, y=None)[source]¶

Fit the SBM to a graph, optionally with known block labels

If y is None, the block assignments for each vertex will first be estimated.

Parameters:

graph : array_like or networkx.Graph: Input graph to fit
y : array_like, length graph.shape[0], optional: Categorical labels for the block assignments of the graph

Parameters:

graph (ndarray | csr_array | Graph) --
y (Any | None) --

Return type:

EREstimator

bic(graph)¶

Bayesian information criterion for the current model on the input graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graph : np.ndarray: Input graph

Returns:

bic : float: The lower the better
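For orientation, the textbook definition of the criterion is $\mathrm{BIC} = k \ln m - 2 \ln \hat{L}$, where $k$ is the number of free parameters, $m$ the number of observations, and $\hat{L}$ the maximized likelihood. The sketch below applies that definition; the function name and the way parameters and observations are counted are assumptions for illustration, and the library's exact penalty term may differ.

```python
import math


def bic_from_log_likelihood(log_likelihood, n_params, n_observations):
    """Textbook BIC: k * ln(m) - 2 * ln(L). Lower values are better."""
    return n_params * math.log(n_observations) - 2.0 * log_likelihood


# Two hypothetical models scored on the same 90 potential edges.
simple = bic_from_log_likelihood(log_likelihood=-60.0, n_params=1, n_observations=90)
complex_ = bic_from_log_likelihood(log_likelihood=-58.0, n_params=4, n_observations=90)

# The richer model fits slightly better but pays a larger penalty.
preferred = "simple" if simple < complex_ else "complex"
```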

Parameters:

graph (ndarray) --

Return type:

float

get_metadata_routing()¶

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)¶

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.

mse(graph)¶

Compute mean square error for the current model on the input graph

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graph : np.ndarray: Input graph

Returns:

mse : float: Mean square error for the model's fit P matrix

Parameters:

graph (ndarray) --

Return type:

float

sample(n_samples=1)¶

Sample graphs (realizations) from the fitted model

Can only be called after the model has been fit

Parameters:

n_samples : int (default 1), optional: The number of graphs to sample

Returns:

graphs : np.array (n_samples, n_verts, n_verts)

Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.

Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.
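A minimal NumPy sketch of what sampling from a probability matrix can look like: each potential edge is an independent Bernoulli draw with success probability $P_{ij}$. This is not the library's implementation; `sample_graphs` and its `directed` and `seed` arguments are hypothetical names used only for illustration.

```python
import numpy as np


def sample_graphs(p_mat, n_samples=1, directed=True, seed=None):
    """Hypothetical helper: draw Bernoulli graphs from a probability matrix."""
    rng = np.random.default_rng(seed)
    p_mat = np.asarray(p_mat, dtype=float)
    n_verts = p_mat.shape[0]

    # One uniform draw per potential edge per sample; edge if draw < P_ij.
    draws = rng.random((n_samples, n_verts, n_verts))
    graphs = (draws < p_mat).astype(int)

    if not directed:
        # Keep the upper triangle and mirror it so each sample is symmetric.
        upper = np.triu(graphs, k=1)
        diagonal = graphs * np.eye(n_verts, dtype=int)
        graphs = upper + upper.transpose(0, 2, 1) + diagonal
    return graphs


p_mat = np.full((4, 4), 0.3)
np.fill_diagonal(p_mat, 0.0)  # no loops
graphs = sample_graphs(p_mat, n_samples=1, directed=False, seed=0)
```

With `n_samples=1` the result keeps its leading axis, matching the `(1, n_verts, n_verts)` shape described above.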

Parameters:

n_samples (int) --

Return type:

ndarray

score(graph)¶

Compute the average log-likelihood over each potential edge of the given graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graph : np.ndarray: Input graph. Must be same shape as model's p_mat_ attribute

Returns:

score : float: Sum of log-likelihoods for each potential edge in the input graph

Parameters:

graph (ndarray) --

Return type:

float

score_samples(graph, clip=None)¶

Compute the weighted log probabilities for each potential edge.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graph : np.ndarray: Input graph. Must be same shape as model's p_mat_ attribute
clip : scalar or None, optional (default=None): Value c at which to clip the probability matrix; entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.

Returns:

sample_scores : np.ndarray (size of graph): log-likelihood per potential edge in the graph
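The effect of `clip` is easiest to see on a small case. The sketch below computes the Bernoulli log-likelihood $A_{ij} \ln P_{ij} + (1 - A_{ij}) \ln(1 - P_{ij})$ for every entry with NumPy. It is an illustration under that assumed likelihood, not the library's code, and `edge_log_likelihoods` is a hypothetical name.

```python
import numpy as np


def edge_log_likelihoods(graph, p_mat, clip=None):
    """Hypothetical helper: Bernoulli log-likelihood of each entry."""
    graph = np.asarray(graph)
    p = np.asarray(p_mat, dtype=float)
    if clip is not None:
        # Entries below clip or above 1 - clip are pulled to those bounds.
        p = np.clip(p, clip, 1.0 - clip)
    # Suppress the log(0) warning; -inf is the intended unclipped result.
    with np.errstate(divide="ignore"):
        return np.where(graph == 1, np.log(p), np.log(1.0 - p))


graph = np.array([[0, 1], [1, 0]])
p_mat = np.array([[0.0, 0.0], [0.5, 0.0]])  # edge (0, 1) is "impossible"

unclipped = edge_log_likelihoods(graph, p_mat)
clipped = edge_log_likelihoods(graph, p_mat, clip=0.01)
```

Without clipping, the observed edge with $P_{ij} = 0$ contributes a log-likelihood of negative infinity; with `clip=0.01` it contributes $\ln 0.01$ instead, so every entry stays finite.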

Parameters:

graph (ndarray) --
clip (float | None) --

Return type:

ndarray

set_fit_request(*, graph='$UNCHANGED$')¶

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

graph : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for graph parameter in fit.

Returns:

self : object: The updated object.

Parameters:

self (EREstimator) --
graph (bool | None | str) --

Return type:

EREstimator

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.

set_score_request(*, graph='$UNCHANGED$')¶

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

graph : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for graph parameter in score.

Returns:

self : object: The updated object.

Parameters:

self (EREstimator) --
graph (bool | None | str) --

Return type:

EREstimator

block_p_: ndarray¶

vertex_assignments_: ndarray¶

class graspologic.models.DCEREstimator[source]¶

Degree-corrected Erdos-Renyi Model

The Degree-corrected Erdos-Renyi (DCER) model is an extension of the ER model in which each node has an additional "promiscuity" parameter $\theta_i$ that determines its expected degree in the graph.

$P_{ij} = \theta_i \theta_j p$

Read more in the Erdos-Renyi (ER) Model Tutorial
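To make the DCER formula concrete, the following sketch builds $P$ from a scalar $p$ and a vector $\theta$ with an outer product. The numbers are made up, and both the zeroed diagonal and the final clip to $[0, 1]$ are assumptions added here so that the entries remain valid loop-free probabilities.

```python
import numpy as np

p = 0.2
theta = np.array([0.5, 1.0, 1.5, 1.0])  # hypothetical degree corrections

# P_ij = theta_i * theta_j * p
p_mat = p * np.outer(theta, theta)

# Assumption: no loops, and clip so every entry is a valid probability.
np.fill_diagonal(p_mat, 0.0)
p_mat = np.clip(p_mat, 0.0, 1.0)

# Expected degree of each node is its row sum; it grows with theta_i.
expected_degree = p_mat.sum(axis=1)
```

Nodes with a larger $\theta_i$ end up with a larger expected degree, which is the heterogeneity the plain ER model cannot express.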

Parameters:

directed : boolean, optional (default=True): Whether to treat the input graph as directed. Even if a directed graph is input, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.
loops : boolean, optional (default=False): Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.
degree_directed : boolean: Whether to allow separate degree correction parameters for the in and out degree of each node. Ignored if directed is False.

Attributes:

p_ : float: The $p$ parameter as described in the above model, which weights the overall probability of connections between any two nodes.
p_mat_ : np.ndarray, shape (n_verts, n_verts): Probability matrix $P$ for the fit model, from which graphs could be sampled.
degree_corrections_ : np.ndarray, shape (n_verts, 1) or (n_verts, 2): Degree correction vector(s) $\theta$. If degree_directed parameter was False, then will be of shape (n_verts, 1) and element $i$ represents the degree correction for node $i$. Otherwise, the first column contains out degree corrections and the second column contains in degree corrections.

See also

graspologic.models.DCSBMEstimator
graspologic.models.EREstimator
graspologic.simulations.er_np

Notes

The DCER model is rarely mentioned in literature, though it is simply a special case of the DCSBM where there is only one community.

References

[1]

https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model

[2]

Karrer, B., & Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical review E, 83(1), 016107.

__init__(directed=True, loops=False, degree_directed=False)[source]¶

Parameters:

directed (bool) --
loops (bool) --
degree_directed (bool) --

fit(graph, y=None)[source]¶

Fit the DCSBM to a graph, optionally with known block labels

If y is None, the block assignments for each vertex will first be estimated.

Parameters:

graph : array_like or networkx.Graph: Input graph to fit
y : array_like, length graph.shape[0], optional: Categorical labels for the block assignments of the graph

Returns:

self : DCEREstimator object: Fitted instance of self

Parameters:

graph (ndarray | csr_array | Graph) --
y (Any | None) --

Return type:

DCEREstimator

bic(graph)¶

Bayesian information criterion for the current model on the input graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graph : np.ndarray: Input graph

Returns:

bic : float: The lower the better

Parameters:

graph (ndarray) --

Return type:

float

get_metadata_routing()¶

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)¶

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.

mse(graph)¶

Compute mean square error for the current model on the input graph

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graph : np.ndarray: Input graph

Returns:

mse : float: Mean square error for the model's fit P matrix

Parameters:

graph (ndarray) --

Return type:

float

sample(n_samples=1)¶

Sample graphs (realizations) from the fitted model

Can only be called after the model has been fit

Parameters:

n_samples : int (default 1), optional: The number of graphs to sample

Returns:

graphs : np.array (n_samples, n_verts, n_verts)

Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.

Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.

Parameters:

n_samples (int) --

Return type:

ndarray

score(graph)¶

Compute the average log-likelihood over each potential edge of the given graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graph : np.ndarray: Input graph. Must be same shape as model's p_mat_ attribute

Returns:

score : float: Sum of log-likelihoods for each potential edge in the input graph

Parameters:

graph (ndarray) --

Return type:

float

score_samples(graph, clip=None)¶

Compute the weighted log probabilities for each potential edge.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graph : np.ndarray: Input graph. Must be same shape as model's p_mat_ attribute
clip : scalar or None, optional (default=None): Value c at which to clip the probability matrix; entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.

Returns:

sample_scores : np.ndarray (size of graph): log-likelihood per potential edge in the graph

Parameters:

graph (ndarray) --
clip (float | None) --

Return type:

ndarray

set_fit_request(*, graph='$UNCHANGED$')¶

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

graph : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for graph parameter in fit.

Returns:

self : object: The updated object.

Parameters:

self (DCEREstimator) --
graph (bool | None | str) --

Return type:

DCEREstimator

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.

set_score_request(*, graph='$UNCHANGED$')¶

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

graph : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for graph parameter in score.

Returns:

self : object: The updated object.

Parameters:

self (DCEREstimator) --
graph (bool | None | str) --

Return type:

DCEREstimator

Stochastic block models¶

class graspologic.models.SBMEstimator[source]¶

Stochastic Block Model

The stochastic block model (SBM) represents each node as belonging to a block (or community). For a given potential edge between node $i$ and $j$, the probability of an edge existing is specified by the block that nodes $i$ and $j$ belong to:

$P_{ij} = B_{\tau_i \tau_j}$

where $B \in [0, 1]^{K \times K}$ and $\tau$ is an n_nodes length vector specifying which block each node belongs to.

Read more in the Stochastic Block Model (SBM) Tutorial
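The indexing $B_{\tau_i \tau_j}$ in the SBM definition maps directly onto NumPy fancy indexing. A small sketch with made-up values for $B$ and $\tau$:

```python
import numpy as np

# Hypothetical 2-block model: dense within blocks, sparse between them.
B = np.array([[0.8, 0.1],
              [0.1, 0.6]])
tau = np.array([0, 0, 1, 1, 1])  # block label of each node

# P_ij = B[tau_i, tau_j] for every pair of nodes.
P = B[np.ix_(tau, tau)]
```

The result is a 5 x 5 matrix whose entries depend only on the blocks of the two endpoints, for example `P[0, 1]` equals `B[0, 0]` and `P[0, 2]` equals `B[0, 1]`.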

Parameters:

directed : boolean, optional (default=True): Whether to treat the input graph as directed. Even if a directed graph is input, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.
loops : boolean, optional (default=False): Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.
n_components : int, optional (default=None): Desired dimensionality of embedding for clustering to find communities. n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by select_dimension().
min_comm : int, optional (default=1): The minimum number of communities (blocks) to consider.
max_comm : int, optional (default=10): The maximum number of communities (blocks) to consider (inclusive).
cluster_kws : dict, optional (default={}): Additional kwargs passed down to GaussianCluster
embed_kws : dict, optional (default={}): Additional kwargs passed down to AdjacencySpectralEmbed

Attributes:

block_p_ : np.ndarray, shape (n_blocks, n_blocks): The block probability matrix $B$, where the element $B_{i, j}$ represents the probability of an edge between block $i$ and block $j$.
p_mat_ : np.ndarray, shape (n_verts, n_verts): Probability matrix $P$ for the fit model, from which graphs could be sampled.
vertex_assignments_ : np.ndarray, shape (n_verts): A vector of integer labels corresponding to the predicted block that each node belongs to if y was not passed during the call to fit().
block_weights_ : np.ndarray, shape (n_blocks): Contains the proportion of nodes that belong to each block in the fit model.
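When block labels are known, the attributes above have simple empirical counterparts: $B_{ab}$ is the fraction of potential edges from block $a$ to block $b$ that are present, and the block weights are label frequencies. The sketch below computes both with NumPy. It is an illustration under those assumptions (`estimate_blocks` is a hypothetical name), not the library's fitting routine.

```python
import numpy as np


def estimate_blocks(adjacency, labels, loops=False):
    """Hypothetical helper: empirical block matrix and block weights."""
    adjacency = np.asarray(adjacency, dtype=float)
    blocks, tau = np.unique(np.asarray(labels), return_inverse=True)
    n_verts, n_blocks = adjacency.shape[0], len(blocks)

    # Entries that count as potential edges.
    potential = np.ones((n_verts, n_verts), dtype=bool)
    if not loops:
        np.fill_diagonal(potential, False)

    block_p = np.zeros((n_blocks, n_blocks))
    for a in range(n_blocks):
        for b in range(n_blocks):
            # Potential edges running from block a to block b.
            pairs = np.outer(tau == a, tau == b) & potential
            if pairs.any():
                block_p[a, b] = adjacency[pairs].mean()

    block_weights = np.bincount(tau, minlength=n_blocks) / n_verts
    return block_p, block_weights


adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 0, 0],
                      [0, 0, 0, 1],
                      [0, 0, 1, 0]])
block_p, block_weights = estimate_blocks(adjacency, labels=[0, 0, 1, 1])
```

In this toy graph every within-block pair is connected and no between-block pair is, so the estimated block matrix is the identity and each block holds half of the nodes.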

See also

graspologic.models.DCSBMEstimator
graspologic.simulations.sbm

References

[1]

Holland, P. W., Laskey, K. B., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social networks, 5(2), 109-137.

block_p_: ndarray¶

vertex_assignments_: ndarray¶

__init__(directed=True, loops=False, n_components=None, min_comm=1, max_comm=10, cluster_kws={}, embed_kws={})[source]¶

Parameters:

directed (bool) --
loops (bool) --
n_components (int | None) --
min_comm (int) --
max_comm (int) --
cluster_kws (Dict[str, Any]) --
embed_kws (Dict[str, Any]) --

fit(graph, y=None)[source]¶

Fit the SBM to a graph, optionally with known block labels

If y is None, the block assignments for each vertex will first be estimated.

Parameters:

graph : array_like or networkx.Graph: Input graph to fit
y : array_like, length graph.shape[0], optional: Categorical labels for the block assignments of the graph

Parameters:

graph (ndarray | csr_array | Graph) --
y (Any | None) --

Return type:

SBMEstimator

bic(graph)¶

Bayesian information criterion for the current model on the input graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graph : np.ndarray: Input graph

Returns:

bic : float: The lower the better

Parameters:

graph (ndarray) --

Return type:

float

get_metadata_routing()¶

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)¶

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.

mse(graph)¶

Compute mean square error for the current model on the input graph

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graph : np.ndarray: Input graph

Returns:

mse : float: Mean square error for the model's fit P matrix

Parameters:

graph (ndarray) --

Return type:

float

sample(n_samples=1)¶

Sample graphs (realizations) from the fitted model

Can only be called after the model has been fit

Parameters:

n_samples : int (default 1), optional: The number of graphs to sample

Returns:

graphs : np.array (n_samples, n_verts, n_verts)

Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.

Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.

Parameters:

n_samples (int) --

Return type:

ndarray

score(graph)¶

Compute the average log-likelihood over each potential edge of the given graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graph : np.ndarray: Input graph. Must be same shape as model's p_mat_ attribute

Returns:

score : float: Sum of log-likelihoods for each potential edge in the input graph
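Following the return description above, a score of this kind can be sketched as a sum of Bernoulli log-likelihoods over the potential edges. The example assumes a loop-free directed graph, so the diagonal is skipped; `total_log_likelihood` is a hypothetical helper, not part of the library.

```python
import numpy as np


def total_log_likelihood(graph, p_mat):
    """Hypothetical helper: summed Bernoulli log-likelihood, no loops."""
    graph = np.asarray(graph)
    p_mat = np.asarray(p_mat, dtype=float)
    off_diagonal = ~np.eye(graph.shape[0], dtype=bool)

    a = graph[off_diagonal]
    p = p_mat[off_diagonal]
    # Probability the model assigns to what was observed at each entry.
    observed = np.where(a == 1, p, 1.0 - p)
    return float(np.log(observed).sum())


graph = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])
p_mat = np.full((3, 3), 0.5)
score = total_log_likelihood(graph, p_mat)
```

With every probability equal to 0.5, each of the six off-diagonal entries contributes $\ln 0.5$, so the score is $6 \ln 0.5$.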

Parameters:

graph (ndarray) --

Return type:

float

score_samples(graph, clip=None)¶

Compute the weighted log probabilities for each potential edge.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graph : np.ndarray: Input graph. Must be same shape as model's p_mat_ attribute
clip : scalar or None, optional (default=None): Value c at which to clip the probability matrix; entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.

Returns:

sample_scores : np.ndarray (size of graph): log-likelihood per potential edge in the graph

Parameters:

graph (ndarray) --
clip (float | None) --

Return type:

ndarray

set_fit_request(*, graph='$UNCHANGED$')¶

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

graph : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for graph parameter in fit.

Returns:

self : object: The updated object.

Parameters:

self (SBMEstimator) --
graph (bool | None | str) --

Return type:

SBMEstimator

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.

set_score_request(*, graph='$UNCHANGED$')¶

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

graph : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for graph parameter in score.

Returns:

self : object: The updated object.

Parameters:

self (SBMEstimator) --
graph (bool | None | str) --

Return type:

SBMEstimator

class graspologic.models.DCSBMEstimator[source]¶

Degree-corrected Stochastic Block Model

The degree-corrected stochastic block model (DCSBM) represents each node as belonging to a block (or community). For a given potential edge between node $i$ and $j$, the probability of an edge existing is specified by the block that nodes $i$ and $j$ belong to as in the SBM. However, an additional "promiscuity" parameter $\theta$ is added for each node, allowing the vertices within a block to have heterogeneous expected degree distributions:

$P_{ij} = \theta_i \theta_j B_{\tau_i \tau_j}$

where $B \in [0, 1]^{K \times K}$, $\tau$ is an n_nodes length vector specifying which block each node belongs to, and $\theta$ is an n_nodes length vector specifying the degree correction for each node.

The degree_directed parameter of this model allows the degree correction parameter to be different for the in and out degree of each node:

$P_{ij} = \theta_i \eta_j B_{\tau_i \tau_j}$

where $\theta$ and $\eta$ need not be the same.

Read more in the Stochastic Block Model (SBM) Tutorial
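Combining block structure with separate out- and in-degree corrections, the degree-directed formula $P_{ij} = \theta_i \eta_j B_{\tau_i \tau_j}$ can be sketched as follows. All values are made up for illustration.

```python
import numpy as np

B = np.array([[0.9, 0.2],
              [0.2, 0.7]])
tau = np.array([0, 0, 1, 1])            # block label of each node
theta = np.array([1.0, 0.5, 1.0, 0.8])  # out-degree corrections
eta = np.array([0.6, 1.0, 0.9, 1.0])    # in-degree corrections

# P_ij = theta_i * eta_j * B[tau_i, tau_j]
P = np.outer(theta, eta) * B[np.ix_(tau, tau)]
```

Because $\theta$ and $\eta$ differ, `P` is not symmetric even though `B` is. Setting `eta = theta` recovers the symmetric form $P_{ij} = \theta_i \theta_j B_{\tau_i \tau_j}$.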

Parameters:

directed : boolean, optional (default=True): Whether to treat the input graph as directed. Even if a directed graph is input, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.
degree_directed : boolean, optional (default=False): Whether to fit an "in" and "out" degree correction for each node. In the degree_directed case, the fit model can have a different expected in and out degree for each node.
loops : boolean, optional (default=False): Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.
n_components : int, optional (default=None): Desired dimensionality of embedding for clustering to find communities. n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by select_dimension().
min_comm : int, optional (default=1): The minimum number of communities (blocks) to consider.
max_comm : int, optional (default=10): The maximum number of communities (blocks) to consider (inclusive).
cluster_kws : dict, optional (default={}): Additional kwargs passed down to GaussianCluster
embed_kws : dict, optional (default={}): Additional kwargs passed down to LaplacianSpectralEmbed

Attributes:

block_p_ : np.ndarray, shape (n_blocks, n_blocks): The block probability matrix $B$, where the element $B_{i, j}$ represents the expected number of edges between block $i$ and block $j$.
p_mat_ : np.ndarray, shape (n_verts, n_verts): Probability matrix $P$ for the fit model, from which graphs could be sampled.
degree_corrections_ : np.ndarray, shape (n_verts, 1) or (n_verts, 2): Degree correction vector(s) $\theta$. If degree_directed parameter was False, then will be of shape (n_verts, 1) and element $i$ represents the degree correction for node $i$. Otherwise, the first column contains out degree corrections and the second column contains in degree corrections.
vertex_assignments_ : np.ndarray, shape (n_verts): A vector of integer labels corresponding to the predicted block that each node belongs to if y was not passed during the call to fit().
block_weights_ : np.ndarray, shape (n_blocks): Contains the proportion of nodes that belong to each block in the fit model.
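One common convention, following Karrer and Newman, is to normalize the degree corrections so that they sum to 1 within each block, in which case $B$ holds expected edge counts rather than probabilities. Whether this estimator uses exactly that normalization is an assumption here; the sketch below only illustrates the convention for an undirected graph, using the hypothetical helper `normalized_degree_corrections`.

```python
import numpy as np


def normalized_degree_corrections(adjacency, labels):
    """Hypothetical helper: degree share of each node within its block."""
    adjacency = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    degrees = adjacency.sum(axis=1)

    theta = np.zeros_like(degrees)
    for block in np.unique(labels):
        members = labels == block
        block_total = degrees[members].sum()
        if block_total > 0:  # leave isolated blocks at zero
            theta[members] = degrees[members] / block_total
    return theta.reshape(-1, 1)  # shape (n_verts, 1)


# Undirected path 0 - 1 - 2, all in one block: degrees are 1, 2, 1.
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])
theta = normalized_degree_corrections(adjacency, labels=[0, 0, 0])
```

The middle node carries half of the block's total degree, so its correction is 0.5 and the two end nodes each get 0.25.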

See also

graspologic.models.SBMEstimator
graspologic.simulations.sbm

Notes

Note that many examples in the literature describe the DCSBM as being sampled with a Poisson distribution. Here, we implement this model with a Bernoulli. When individual edge probabilities are relatively low these two distributions will yield similar results.
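The similarity between the two distributions can be checked numerically. For a Poisson edge count with rate $\lambda$, the probability of seeing two or more edges is $1 - e^{-\lambda}(1 + \lambda) \approx \lambda^2 / 2$, which is negligible when $\lambda$ is small. A short check using only the standard library:

```python
import math


def poisson_multi_edge_probability(rate):
    """P(count >= 2) for a Poisson variable with the given rate."""
    return 1.0 - math.exp(-rate) * (1.0 + rate)


sparse = poisson_multi_edge_probability(0.01)  # close to 0.01**2 / 2
dense = poisson_multi_edge_probability(0.5)    # no longer negligible
```

At a rate of 0.01 a multi-edge is a roughly one-in-twenty-thousand event, so a Poisson draw is almost always 0 or 1 and behaves like a Bernoulli draw; at a rate of 0.5 the two models diverge noticeably.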

References

[1]

Karrer, B., & Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical review E, 83(1), 016107.

__init__(degree_directed=False, directed=True, loops=False, n_components=None, min_comm=1, max_comm=10, cluster_kws={}, embed_kws={})[source]¶

Parameters:

degree_directed (bool) --
directed (bool) --
loops (bool) --
n_components (int | None) --
min_comm (int) --
max_comm (int) --
cluster_kws (Dict[str, Any]) --
embed_kws (Dict[str, Any]) --

fit(graph, y=None)[source]¶

Fit the DCSBM to a graph, optionally with known block labels

If y is None, the block assignments for each vertex will first be estimated.

Parameters:

graph : array_like or networkx.Graph: Input graph to fit
y : array_like, length graph.shape[0], optional: Categorical labels for the block assignments of the graph

Returns:

self : DCSBMEstimator object: Fitted instance of self

Parameters:

graph (ndarray | csr_array | Graph) --
y (Any | None) --

Return type:

DCSBMEstimator

bic(graph)¶

Bayesian information criterion for the current model on the input graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graph : np.ndarray: Input graph

Returns:

bic : float: The lower the better

Parameters:

graph (ndarray) --

Return type:

float

get_metadata_routing()¶

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)¶

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.

mse(graph)¶

Compute mean square error for the current model on the input graph

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graph : np.ndarray: Input graph

Returns:

mse : float: Mean square error for the model's fit P matrix

Parameters:

graph (ndarray) --

Return type:

float

sample(n_samples=1)¶

Sample graphs (realizations) from the fitted model

Can only be called after the model has been fit

Parameters:

n_samples : int (default 1), optional: The number of graphs to sample

Returns:

graphs : np.array (n_samples, n_verts, n_verts)

Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.

Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.

Parameters:

n_samples (int) --

Return type:

ndarray

score(graph)¶

Compute the average log-likelihood over each potential edge of the given graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graphnp.ndarray: Input graph. Must be same shape as model's p_mat_ attribute

Returns:

scorefloat: sum of log-likelihoods for each potential edge in input graph

Parameters:

graph (ndarray) --

Return type:

float

score_samples(graph, clip=None)¶

Compute the weighted log probabilities for each potential edge.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graphnp.ndarray: Input graph. Must be same shape as model's p_mat_ attribute
clipscalar or None, optional (default=None): Value at which to clip the probability matrix: entries less than clip or greater than 1 - clip are set to clip or 1 - clip, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.

Returns:

sample_scoresnp.ndarray (size of graph): log-likelihood per potential edge in the graph

Parameters:

graph (ndarray) --
clip (float | None) --

Return type:

ndarray
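
To show what clipping protects against, here is a minimal NumPy sketch of a per-edge Bernoulli log-likelihood for a binary graph. It is written for illustration under that assumption and is not taken from the library; `edge_log_likelihoods` is a hypothetical helper name.

```python
import numpy as np


def edge_log_likelihoods(graph, p_mat, clip=None):
    """Log-probability of each observed edge/non-edge under p_mat (sketch)."""
    graph = np.asarray(graph, dtype=float)
    p = np.asarray(p_mat, dtype=float)
    if clip is not None:
        # Entries below clip or above 1 - clip are pulled back into range.
        p = np.clip(p, clip, 1 - clip)
    # Probability of what was observed: p for an edge, 1 - p for a non-edge.
    observed = np.where(graph == 1, p, 1 - p)
    with np.errstate(divide="ignore"):
        return np.log(observed)


graph = np.array([[0, 1], [1, 0]])
# The model assigns probability 0 to an edge that is actually present.
p_mat = np.array([[0.0, 0.0], [0.5, 0.0]])
unclipped = edge_log_likelihoods(graph, p_mat)
clipped = edge_log_likelihoods(graph, p_mat, clip=0.01)
```

Without clipping, the observed edge with probability 0 contributes negative infinity, which dominates any sum over edges; with clipping, the same entry contributes a large but finite penalty.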

set_fit_request(*, graph='$UNCHANGED$')¶

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

graphstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for graph parameter in fit.

Returns:

selfobject: The updated object.

Parameters:

self (DCSBMEstimator) --
graph (bool | None | str) --

Return type:

DCSBMEstimator

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

set_score_request(*, graph='$UNCHANGED$')¶

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

graphstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for graph parameter in score.

Returns:

selfobject: The updated object.

Parameters:

self (DCSBMEstimator) --
graph (bool | None | str) --

Return type:

DCSBMEstimator

Latent position models¶

class graspologic.models.RDPGEstimator[source]¶

Random Dot Product Graph

Under the random dot product graph model, each node is assumed to have a "latent position" in some $d$-dimensional Euclidean space. This vector dictates that node's probability of connection to other nodes. For a given pair of nodes $i$ and $j$, the probability of connection is the dot product between their latent positions:

$P_{ij} = \langle x_i, y_j \rangle$

where $x_i$ is the left latent position of node $i$, and $y_j$ is the right latent position of node $j$. If the graph being modeled is undirected, then $x_i = y_i$. Latent positions can be estimated via AdjacencySpectralEmbed.
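
The relationship between latent positions and the probability matrix can be written out directly in NumPy. The latent positions below are made-up values chosen only so that every dot product falls in [0, 1]; they do not come from any fitted model.

```python
import numpy as np

# Hypothetical left latent positions: 4 nodes in a 2-dimensional latent space.
X = np.array([[0.8, 0.1], [0.7, 0.2], [0.1, 0.8], [0.2, 0.7]])

# Undirected case: x_i = y_i, so P = X X^T and P is symmetric.
P_undirected = X @ X.T

# Directed case: separate right latent positions Y, so P = X Y^T.
Y = np.array([[0.5, 0.1], [0.1, 0.5], [0.4, 0.4], [0.2, 0.6]])
P_directed = X @ Y.T
```

Entry (i, j) of each matrix is the dot product of row i of X with row j of the right-hand positions, which is the connection probability defined above.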

Parameters:

loopsboolean, optional (default=False): Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.
n_componentsint, optional (default=None): The dimensionality of the latent space used to model the graph. If None, the method of Zhu and Ghodsi will be used to select an embedding dimension.
ase_kwsdict, optional (default={}): Dictionary of keyword arguments passed down to AdjacencySpectralEmbed, which is used to fit the model.
diag_aug_weightint or float, optional (default=1): Weighting used for diagonal augmentation, which is a form of regularization for fitting the RDPG model.
plus_c_weightint or float, optional (default=1): Weighting used for a constant scalar added to the adjacency matrix before embedding as a form of regularization.
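
Diagonal augmentation can be pictured as filling the (otherwise empty) diagonal with a scaled degree before embedding. The sketch below assumes the rule weight * degree / (n_verts - 1), with in- and out-degree averaged; this is an assumption made for illustration, and graspologic.utils.augment_diagonal should be consulted for the library's actual behavior.

```python
import numpy as np


def augment_diagonal_sketch(adjacency, weight=1.0):
    """Replace the diagonal with weight * degree / (n_verts - 1) (assumed rule)."""
    adjacency = np.asarray(adjacency, dtype=float)
    n_verts = adjacency.shape[0]
    # Ignore any existing diagonal entries when computing degrees.
    off_diag = adjacency - np.diag(np.diag(adjacency))
    in_degree = off_diag.sum(axis=0)
    out_degree = off_diag.sum(axis=1)
    degree = (in_degree + out_degree) / 2
    augmented = off_diag.copy()
    np.fill_diagonal(augmented, weight * degree / (n_verts - 1))
    return augmented


# A 3-node path graph: degrees are 1, 2, 1.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
augmented = augment_diagonal_sketch(path)
```

Off-diagonal entries are left untouched; only the diagonal changes, and a larger weight scales it proportionally.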

Attributes:

latent_tuple, length 2, or np.ndarray, shape (n_verts, n_components): The fit latent positions for the RDPG model. If a tuple, then the graph that was input to fit was directed, and the first and second elements of the tuple are the left and right latent positions, respectively. The left and right latent positions will both be of shape (n_verts, n_components). If latent_ is an array, then the graph that was input to fit was undirected and the left and right latent positions are the same.
p_mat_np.ndarray, shape (n_verts, n_verts): Probability matrix $P$ for the fit model, from which graphs could be sampled.

See also

graspologic.simulations.rdpg
graspologic.embed.AdjacencySpectralEmbed
graspologic.utils.augment_diagonal

References

[1]

Athreya, A., Fishkind, D. E., Tang, M., Priebe, C. E., Park, Y., Vogelstein, J. T., ... & Sussman, D. L. (2018). Statistical inference on random dot product graphs: a survey. Journal of Machine Learning Research, 18(226), 1-92.

[2]

Zhu, M. and Ghodsi, A. (2006). Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2), pp.918-930.

__init__(loops=False, n_components=None, ase_kws={}, diag_aug_weight=1, plus_c_weight=1)[source]¶

Parameters:

loops (bool) --
n_components (int | None) --
ase_kws (Dict[str, Any]) --
diag_aug_weight (float) --
plus_c_weight (float) --

fit(graph, y=None)[source]¶

Calculate the parameters for the given graph model

Parameters:

graph (ndarray | csr_array | Graph) --
y (Any | None) --

Return type:

RDPGEstimator

bic(graph)¶

Bayesian information criterion for the current model on the input graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graphnp.ndarray: Input graph

Returns:

bicfloat: The lower the better

Parameters:

graph (ndarray) --

Return type:

float

get_metadata_routing()¶

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)¶

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

mse(graph)¶

Compute mean square error for the current model on the input graph

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graphnp.ndarray: Input graph

Returns:

msefloat: Mean square error for the model's fit P matrix

Parameters:

graph (ndarray) --

Return type:

float

sample(n_samples=1)¶

Sample graphs (realizations) from the fitted model

Can only be called after the model has been fit

Parameters:

n_samplesint (default 1), optional: The number of graphs to sample

Returns:

graphsnp.array (n_samples, n_verts, n_verts)

Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.

Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.

Parameters:

n_samples (int) --

Return type:

ndarray

score(graph)¶

Compute the average log-likelihood over each potential edge of the given graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graphnp.ndarray: Input graph. Must be same shape as model's p_mat_ attribute

Returns:

scorefloat: sum of log-likelihoods for each potential edge in input graph

Parameters:

graph (ndarray) --

Return type:

float

score_samples(graph, clip=None)¶

Compute the weighted log probabilities for each potential edge.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:

graphnp.ndarray: Input graph. Must be same shape as model's p_mat_ attribute
clipscalar or None, optional (default=None): Value at which to clip the probability matrix: entries less than clip or greater than 1 - clip are set to clip or 1 - clip, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.

Returns:

sample_scoresnp.ndarray (size of graph): log-likelihood per potential edge in the graph

Parameters:

graph (ndarray) --
clip (float | None) --

Return type:

ndarray

set_fit_request(*, graph='$UNCHANGED$')¶

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

graphstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for graph parameter in fit.

Returns:

selfobject: The updated object.

Parameters:

self (RDPGEstimator) --
graph (bool | None | str) --

Return type:

RDPGEstimator

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

set_score_request(*, graph='$UNCHANGED$')¶

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

graphstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for graph parameter in score.

Returns:

selfobject: The updated object.

Parameters:

self (RDPGEstimator) --
graph (bool | None | str) --

Return type:

RDPGEstimator

Edge swapping (configuration models)¶

class graspologic.models.EdgeSwapper[source]¶

Degree Preserving Edge Swaps

This class allows for performing degree preserving edge swaps to generate new networks with the same degree sequence as the input network.

Attributes:

adjacencynp.ndarray OR csr_array, shape (n_verts, n_verts): The initial adjacency matrix to perform edge swaps on. Must be unweighted and undirected.
edge_listnp.ndarray, shape (n_edges, 2): The corresponding edge list for the input network, with one row per edge
seed: int, optional: Random seed to make outputs reproducible, must be positive

References

[1]

Fosdick, B. K., Larremore, D. B., Nishimura, J., & Ugander, J. (2018). Configuring random graph models with fixed degree sequences. SIAM Review, 60(2), 315-355.

[2]

Carstens, C. J., & Horadam, K. J. (2017). Switching edges to randomize networks: what goes wrong and how to fix it. Journal of Complex Networks, 5(3), 337-351.

[3]

https://github.com/joelnish/double-edge-swap-mcmc/blob/master/dbl_edge_mcmc.py

__init__(adjacency, seed=None)[source]¶

Parameters:

adjacency (ndarray | csr_array) --
seed (int | None) --

swap_edges(n_swaps=1)[source]¶

Performs a number of edge swaps on the graph

Parameters:

n_swapsint (default 1), optional: The number of edge swaps to be performed

Returns:

adjacencynp.ndarray OR csr_array, shape (n_verts, n_verts): The adjacency matrix after a number of edge swaps are performed on the graph
edge_listnp.ndarray (n_edges, 2): The edge list after a number of edge swaps are performed on the graph

Parameters:

n_swaps (int) --

Return type:

Tuple[ndarray | csr_array, ndarray]
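
The core idea of a degree-preserving swap can be sketched independently of the class above. The function below is a hypothetical, simplified version for dense, unweighted, undirected graphs without loops; it is not EdgeSwapper's implementation, and the `max_tries` cap is an assumption added so the loop always terminates.

```python
import numpy as np


def double_edge_swap(adjacency, n_swaps=1, seed=None, max_tries=1000):
    """Rewire (a, b), (c, d) into (a, d), (c, b) while keeping all degrees."""
    rng = np.random.default_rng(seed)
    adj = np.array(adjacency, dtype=int)  # work on a copy of the input
    # One row per undirected edge, read from the upper triangle.
    edges = np.argwhere(np.triu(adj, k=1) == 1)
    done, tries = 0, 0
    while len(edges) >= 2 and done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.choice(len(edges), size=2, replace=False)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings at random
        if a == d or c == b:
            continue  # proposal would create a self-loop
        if adj[a, d] == 1 or adj[c, b] == 1:
            continue  # proposal would duplicate an existing edge
        adj[a, b] = adj[b, a] = 0
        adj[c, d] = adj[d, c] = 0
        adj[a, d] = adj[d, a] = 1
        adj[c, b] = adj[b, c] = 1
        edges[i] = (a, d)
        edges[j] = (c, b)
        done += 1
    return adj, edges


# A 6-node ring: every node has degree 2 before and after swapping.
ring = np.zeros((6, 6), dtype=int)
for v in range(6):
    ring[v, (v + 1) % 6] = ring[(v + 1) % 6, v] = 1
new_adj, new_edges = double_edge_swap(ring, n_swaps=5, seed=0)
```

Each accepted swap removes one edge at every endpoint and adds one back, so the degree sequence is unchanged; rejected proposals leave the graph as it was.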