Models
Erdos-Renyi models
- class graspologic.models.EREstimator
Erdos-Renyi Model
The Erdos-Renyi (ER) model is a simple random graph model in which every potential edge between two nodes \(i\) and \(j\) exists with the same probability.
\(P_{ij} = p\) for all \(i, j\)
Read more in the Erdos-Renyi (ER) Model Tutorial
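As a rough illustration of the model above, the following NumPy sketch builds a constant probability matrix, draws one graph from it, and recovers \(p\) as the observed edge density. This is not graspologic's implementation; the variable names and the choice to zero the diagonal (no loops) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_verts, p = 200, 0.3

# P_ij = p for every ordered pair (i, j); loops are disallowed in this
# sketch, so the diagonal is set to zero (an assumption for the example).
p_mat = np.full((n_verts, n_verts), p)
np.fill_diagonal(p_mat, 0.0)

# Draw one directed graph: each potential edge is an independent Bernoulli(p).
graph = (rng.random((n_verts, n_verts)) < p_mat).astype(float)

# Estimate p as observed edges divided by the number of potential edges.
n_possible = n_verts * (n_verts - 1)
p_hat = graph.sum() / n_possible
```

With 200 nodes there are 39,800 potential edges, so the estimate should land close to the true value of 0.3.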
- Parameters:
- directed : boolean, optional (default=True)
Whether to treat the input graph as directed. Even if a directed graph is input, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.
- loops : boolean, optional (default=False)
Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.
- Attributes:
- p_ : float
Value between 0 and 1 (inclusive) representing the probability of any edge in the ER graph model
- p_mat_ : np.ndarray, shape (n_verts, n_verts)
Probability matrix \(P\) for the fit model, from which graphs could be sampled.
- fit(graph, y=None)
Fit the SBM to a graph, optionally with known block labels.
If y is None, the block assignments for each vertex will first be estimated.
- Parameters:
- graph : array_like or networkx.Graph
Input graph to fit
- y : array_like, length graph.shape[0], optional
Categorical labels for the block assignments of the graph
- bic(graph)
Bayesian information criterion for the current model on the input graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph
- Returns:
- bic : float
The lower the better
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- mse(graph)
Compute mean square error for the current model on the input graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph
- Returns:
- mse : float
Mean square error for the model's fit P matrix
- sample(n_samples=1)
Sample graphs (realizations) from the fitted model.
Can only be called after the model has been fit.
- Parameters:
- n_samples : int (default 1), optional
The number of graphs to sample
- Returns:
- graphs : np.ndarray, shape (n_samples, n_verts, n_verts)
Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.
Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.
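The return shape described above can be sketched with plain NumPy. This is a minimal sketch assuming independent Bernoulli draws per potential edge in a directed graph; the helper name sample_graphs and its seed argument are hypothetical and not part of graspologic.

```python
import numpy as np

def sample_graphs(p_mat, n_samples=1, seed=None):
    """Draw n_samples directed Bernoulli graphs from a probability matrix."""
    rng = np.random.default_rng(seed)
    n_verts = p_mat.shape[0]
    # One uniform draw per potential edge per sample; an edge exists where
    # the draw falls below the corresponding probability.
    draws = rng.random((n_samples, n_verts, n_verts))
    return (draws < p_mat).astype(float)

p_mat = np.full((5, 5), 0.5)
graphs = sample_graphs(p_mat, n_samples=1, seed=0)

# Even for a single sample the first axis is kept, so index it explicitly
# to obtain one (n_verts, n_verts) adjacency matrix.
single = graphs[0]
```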
- score(graph)
Compute the average log-likelihood over each potential edge of the given graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph. Must be same shape as model's p_mat_ attribute
- Returns:
- score : float
Sum of log-likelihoods for each potential edge in input graph
- score_samples(graph, clip=None)
Compute the weighted log probabilities for each potential edge.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph. Must be same shape as model's p_mat_ attribute
- clip : scalar or None, optional (default=None)
Values for which to clip probability matrix, entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.
- Returns:
- sample_scores : np.ndarray (size of graph)
Log-likelihood per potential edge in the graph
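To see why clipping matters, here is a minimal NumPy sketch of a per-edge Bernoulli log-likelihood with optional clipping. It assumes an unweighted graph and scores every entry, including the diagonal; the function name edge_log_likelihoods is hypothetical, and the library's own handling of loops and undirected graphs may differ.

```python
import numpy as np

def edge_log_likelihoods(graph, p_mat, clip=None):
    """Bernoulli log-likelihood of each potential edge under p_mat."""
    graph = np.asarray(graph)
    p = np.asarray(p_mat, dtype=float).copy()
    if clip is not None:
        # Entries below clip or above 1 - clip are pulled back into range.
        p = np.clip(p, clip, 1 - clip)
    # An observed edge contributes log(p); a missing edge contributes log(1 - p).
    likelihood = np.where(graph == 1, p, 1 - p)
    with np.errstate(divide="ignore"):
        return np.log(likelihood)

graph = np.array([[0, 1], [1, 0]])
# This model assigns probability 0 to an edge that is actually present.
p_mat = np.array([[0.0, 0.0], [0.5, 0.0]])

unclipped = edge_log_likelihoods(graph, p_mat)
clipped = edge_log_likelihoods(graph, p_mat, clip=1e-3)
total = clipped.sum()
```

Without clipping, the impossible-but-observed edge yields a log-likelihood of negative infinity, which then dominates any sum; with clipping every entry stays finite.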
- set_fit_request(*, graph='$UNCHANGED$')
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- graph : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for graph parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
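The <component>__<parameter> naming can be illustrated with a small, library-free sketch. The helper name split_param_key and the step name "sbm" are hypothetical, and this is only an illustration of the naming scheme, not scikit-learn's implementation.

```python
def split_param_key(key):
    """Split a nested parameter name into (component, parameter).

    Returns (None, key) when the name has no "__" separator, i.e. when it
    refers to a parameter of the estimator itself.
    """
    component, sep, parameter = key.partition("__")
    if not sep:
        return None, key
    return component, parameter

# "sbm" is a hypothetical step name inside a nested object such as a Pipeline.
nested = split_param_key("sbm__max_comm")
plain = split_param_key("max_comm")
```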
- set_score_request(*, graph='$UNCHANGED$')
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- graph : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for graph parameter in score.
- Returns:
- self : object
The updated object.
- block_p_: ndarray
- vertex_assignments_: ndarray
- class graspologic.models.DCEREstimator
Degree-corrected Erdos-Renyi Model
The Degree-corrected Erdos-Renyi (DCER) model is an extension of the ER model in which each node has an additional "promiscuity" parameter \(\theta_i\) that determines its expected degree in the graph.
\(P_{ij} = \theta_i \theta_j p\)
Read more in the Erdos-Renyi (ER) Model Tutorial
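The formula above amounts to an outer product of the degree-correction vector with itself, scaled by \(p\). The sketch below uses made-up values for \(\theta\) and keeps the diagonal for simplicity; how the fitted estimator normalizes \(\theta\) is not shown here and may differ.

```python
import numpy as np

p = 0.2
theta = np.array([0.5, 1.0, 1.5, 1.0])  # hypothetical degree corrections

# P_ij = theta_i * theta_j * p, computed for all pairs with an outer product.
p_mat = p * np.outer(theta, theta)

# The expected degree of node i is the i-th row sum, which is proportional
# to theta_i: here sum(theta) = 4, so each row sums to 0.8 * theta_i.
expected_degree = p_mat.sum(axis=1)
```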
- Parameters:
- directed : boolean, optional (default=True)
Whether to treat the input graph as directed. Even if a directed graph is input, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.
- loops : boolean, optional (default=False)
Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.
- degree_directed : boolean
Whether to allow separate degree correction parameters for the in and out degree of each node. Ignored if directed is False.
- Attributes:
- p_ : float
The \(p\) parameter as described in the above model, which weights the overall probability of connections between any two nodes.
- p_mat_ : np.ndarray, shape (n_verts, n_verts)
Probability matrix \(P\) for the fit model, from which graphs could be sampled.
- degree_corrections_ : np.ndarray, shape (n_verts, 1) or (n_verts, 2)
Degree correction vector(s) \(\theta\). If the degree_directed parameter was False, then this will be of shape (n_verts, 1) and element \(i\) represents the degree correction for node \(i\). Otherwise, the first column contains out degree corrections and the second column contains in degree corrections.
Notes
The DCER model is rarely mentioned in the literature, though it is simply a special case of the DCSBM where there is only one community.
References
[2] Karrer, B., & Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical Review E, 83(1), 016107.
- fit(graph, y=None)
Fit the DCSBM to a graph, optionally with known block labels.
If y is None, the block assignments for each vertex will first be estimated.
- Parameters:
- graph : array_like or networkx.Graph
Input graph to fit
- y : array_like, length graph.shape[0], optional
Categorical labels for the block assignments of the graph
- Returns:
- self : DCSBMEstimator object
Fitted instance of self
- bic(graph)
Bayesian information criterion for the current model on the input graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph
- Returns:
- bic : float
The lower the better
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- mse(graph)
Compute mean square error for the current model on the input graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph
- Returns:
- mse : float
Mean square error for the model's fit P matrix
- sample(n_samples=1)
Sample graphs (realizations) from the fitted model.
Can only be called after the model has been fit.
- Parameters:
- n_samples : int (default 1), optional
The number of graphs to sample
- Returns:
- graphs : np.ndarray, shape (n_samples, n_verts, n_verts)
Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.
Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.
- score(graph)
Compute the average log-likelihood over each potential edge of the given graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph. Must be same shape as model's p_mat_ attribute
- Returns:
- score : float
Sum of log-likelihoods for each potential edge in input graph
- score_samples(graph, clip=None)
Compute the weighted log probabilities for each potential edge.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph. Must be same shape as model's p_mat_ attribute
- clip : scalar or None, optional (default=None)
Values for which to clip probability matrix, entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.
- Returns:
- sample_scores : np.ndarray (size of graph)
Log-likelihood per potential edge in the graph
- set_fit_request(*, graph='$UNCHANGED$')
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- graph : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for graph parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- set_score_request(*, graph='$UNCHANGED$')
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- graph : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for graph parameter in score.
- Returns:
- self : object
The updated object.
Stochastic block models
- class graspologic.models.SBMEstimator
Stochastic Block Model
The stochastic block model (SBM) represents each node as belonging to a block (or community). For a given potential edge between node \(i\) and \(j\), the probability of an edge existing is specified by the blocks that nodes \(i\) and \(j\) belong to:
\(P_{ij} = B_{\tau_i \tau_j}\)
where \(B \in [0, 1]^{K \times K}\) and \(\tau\) is an n_nodes length vector specifying which block each node belongs to.
Read more in the Stochastic Block Model (SBM) Tutorial
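The lookup \(P_{ij} = B_{\tau_i \tau_j}\) maps directly onto NumPy index arrays. The block matrix and labels below are invented for illustration, and the sketch is not the estimator's own code.

```python
import numpy as np

# Hypothetical 2-block model: dense within blocks, sparse between them.
B = np.array([[0.8, 0.1],
              [0.1, 0.6]])
tau = np.array([0, 0, 1, 1, 1])  # block label of each of the 5 nodes

# P_ij = B[tau_i, tau_j] for every pair of nodes; np.ix_ builds the
# row/column index grid so the result has shape (n_nodes, n_nodes).
p_mat = B[np.ix_(tau, tau)]
```

Nodes 0 and 1 share block 0, so their entry is 0.8; nodes 0 and 2 sit in different blocks, so theirs is 0.1.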
- Parameters:
- directed : boolean, optional (default=True)
Whether to treat the input graph as directed. Even if a directed graph is input, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.
- loops : boolean, optional (default=False)
Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.
- n_components : int, optional (default=None)
Desired dimensionality of embedding for clustering to find communities. n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by select_dimension().
- min_comm : int, optional (default=1)
The minimum number of communities (blocks) to consider.
- max_comm : int, optional (default=10)
The maximum number of communities (blocks) to consider (inclusive).
- cluster_kws : dict, optional (default={})
Additional kwargs passed down to GaussianCluster
- embed_kws : dict, optional (default={})
Additional kwargs passed down to AdjacencySpectralEmbed
- Attributes:
- block_p_ : np.ndarray, shape (n_blocks, n_blocks)
The block probability matrix \(B\), where the element \(B_{i, j}\) represents the probability of an edge between block \(i\) and block \(j\).
- p_mat_ : np.ndarray, shape (n_verts, n_verts)
Probability matrix \(P\) for the fit model, from which graphs could be sampled.
- vertex_assignments_ : np.ndarray, shape (n_verts)
A vector of integer labels corresponding to the predicted block that each node belongs to if y was not passed during the call to fit().
- block_weights_ : np.ndarray, shape (n_blocks)
Contains the proportion of nodes that belong to each block in the fit model.
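As a small worked example of what block proportions mean, the following sketch derives them from a vector of integer block labels. The labels are invented, and the variable names only mirror the attribute names for readability.

```python
import numpy as np

vertex_assignments = np.array([0, 0, 1, 1, 1, 2])  # hypothetical block labels

# Count the nodes in each block, then normalize so the weights sum to 1.
counts = np.bincount(vertex_assignments)
block_weights = counts / counts.sum()
```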
References
[1] Holland, P. W., Laskey, K. B., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5(2), 109-137.
- block_p_: ndarray
- vertex_assignments_: ndarray
- __init__(directed=True, loops=False, n_components=None, min_comm=1, max_comm=10, cluster_kws={}, embed_kws={})
- fit(graph, y=None)
Fit the SBM to a graph, optionally with known block labels.
If y is None, the block assignments for each vertex will first be estimated.
- Parameters:
- graph : array_like or networkx.Graph
Input graph to fit
- y : array_like, length graph.shape[0], optional
Categorical labels for the block assignments of the graph
- bic(graph)
Bayesian information criterion for the current model on the input graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph
- Returns:
- bic : float
The lower the better
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- mse(graph)
Compute mean square error for the current model on the input graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph
- Returns:
- mse : float
Mean square error for the model's fit P matrix
- sample(n_samples=1)
Sample graphs (realizations) from the fitted model.
Can only be called after the model has been fit.
- Parameters:
- n_samples : int (default 1), optional
The number of graphs to sample
- Returns:
- graphs : np.ndarray, shape (n_samples, n_verts, n_verts)
Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.
Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.
- score(graph)
Compute the average log-likelihood over each potential edge of the given graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph. Must be same shape as model's p_mat_ attribute
- Returns:
- score : float
Sum of log-likelihoods for each potential edge in input graph
- score_samples(graph, clip=None)
Compute the weighted log probabilities for each potential edge.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph. Must be same shape as model's p_mat_ attribute
- clip : scalar or None, optional (default=None)
Values for which to clip probability matrix, entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.
- Returns:
- sample_scores : np.ndarray (size of graph)
Log-likelihood per potential edge in the graph
- set_fit_request(*, graph='$UNCHANGED$')
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- graph : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for graph parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- set_score_request(*, graph='$UNCHANGED$')
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- graph : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for graph parameter in score.
- Returns:
- self : object
The updated object.
- class graspologic.models.DCSBMEstimator
Degree-corrected Stochastic Block Model
The degree-corrected stochastic block model (DCSBM) represents each node as belonging to a block (or community). For a given potential edge between node \(i\) and \(j\), the probability of an edge existing is specified by the blocks that nodes \(i\) and \(j\) belong to, as in the SBM. However, an additional "promiscuity" parameter \(\theta\) is added for each node, allowing the vertices within a block to have heterogeneous expected degree distributions:
\(P_{ij} = \theta_i \theta_j B_{\tau_i \tau_j}\)
where \(B \in [0, 1]^{K \times K}\), \(\tau\) is an n_nodes length vector specifying which block each node belongs to, and \(\theta\) is an n_nodes length vector specifying the degree correction for each node.
The degree_directed parameter of this model allows the degree correction parameter to be different for the in and out degree of each node:
\(P_{ij} = \theta_i \eta_j B_{\tau_i \tau_j}\)
where \(\theta\) and \(\eta\) need not be the same.
Read more in the Stochastic Block Model (SBM) Tutorial
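Both forms of the degree-corrected probability matrix can be sketched in a few lines of NumPy. All values below are invented for illustration, and the sketch does not reproduce how the estimator fits or normalizes \(\theta\) and \(\eta\).

```python
import numpy as np

B = np.array([[0.8, 0.2],
              [0.2, 0.6]])
tau = np.array([0, 0, 1, 1])             # block label of each node
theta = np.array([1.0, 0.5, 1.0, 0.5])   # out-degree corrections (hypothetical)
eta = np.array([0.5, 1.0, 0.5, 1.0])     # in-degree corrections (hypothetical)

# Block part shared by both variants: B[tau_i, tau_j] for every pair.
block_part = B[np.ix_(tau, tau)]

# Single degree correction: P_ij = theta_i * theta_j * B[tau_i, tau_j].
p_sym = np.outer(theta, theta) * block_part

# degree_directed case: P_ij = theta_i * eta_j * B[tau_i, tau_j].
p_dir = np.outer(theta, eta) * block_part
```

With a symmetric \(B\), the single-correction matrix is symmetric, while separate in and out corrections generally break that symmetry.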
- Parameters:
- directed : boolean, optional (default=True)
Whether to treat the input graph as directed. Even if a directed graph is input, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.
- degree_directed : boolean, optional (default=False)
Whether to fit an "in" and "out" degree correction for each node. In the degree_directed case, the fit model can have a different expected in and out degree for each node.
- loops : boolean, optional (default=False)
Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.
- n_components : int, optional (default=None)
Desired dimensionality of embedding for clustering to find communities. n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by select_dimension().
- min_comm : int, optional (default=1)
The minimum number of communities (blocks) to consider.
- max_comm : int, optional (default=10)
The maximum number of communities (blocks) to consider (inclusive).
- cluster_kws : dict, optional (default={})
Additional kwargs passed down to GaussianCluster
- embed_kws : dict, optional (default={})
Additional kwargs passed down to LaplacianSpectralEmbed
- Attributes:
- block_p_ : np.ndarray, shape (n_blocks, n_blocks)
The block probability matrix \(B\), where the element \(B_{i, j}\) represents the expected number of edges between block \(i\) and block \(j\).
- p_mat_ : np.ndarray, shape (n_verts, n_verts)
Probability matrix \(P\) for the fit model, from which graphs could be sampled.
- degree_corrections_ : np.ndarray, shape (n_verts, 1) or (n_verts, 2)
Degree correction vector(s) \(\theta\). If the degree_directed parameter was False, then this will be of shape (n_verts, 1) and element \(i\) represents the degree correction for node \(i\). Otherwise, the first column contains out degree corrections and the second column contains in degree corrections.
- vertex_assignments_ : np.ndarray, shape (n_verts)
A vector of integer labels corresponding to the predicted block that each node belongs to if y was not passed during the call to fit().
- block_weights_ : np.ndarray, shape (n_blocks)
Contains the proportion of nodes that belong to each block in the fit model.
Notes
Note that many examples in the literature describe the DCSBM as being sampled with a Poisson distribution. Here, we implement this model with a Bernoulli. When individual edge probabilities are relatively low these two distributions will yield similar results.
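The closeness of the two distributions at low edge probabilities can be checked numerically. The sketch below uses only the standard library and compares the probability mass at 0 and 1 edges for a Bernoulli and a Poisson with the same mean; the two probabilities tried are arbitrary illustrative values.

```python
import math

def bernoulli_pmf(k, p):
    """Probability of k in {0, 1} edges under Bernoulli(p)."""
    return p if k == 1 else 1 - p

def poisson_pmf(k, lam):
    """Probability of k edges under Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Largest disagreement between the two pmfs over k in {0, 1}.
gaps = {}
for p in (0.01, 0.5):
    gaps[p] = max(abs(bernoulli_pmf(k, p) - poisson_pmf(k, p)) for k in (0, 1))
```

For an edge probability of 0.01 the two pmfs differ by roughly one part in ten thousand, whereas at 0.5 the gap is close to 0.2, which is why the approximation is only reasonable for sparse graphs.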
References
[1] Karrer, B., & Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical Review E, 83(1), 016107.
- __init__(degree_directed=False, directed=True, loops=False, n_components=None, min_comm=1, max_comm=10, cluster_kws={}, embed_kws={})
- fit(graph, y=None)
Fit the DCSBM to a graph, optionally with known block labels.
If y is None, the block assignments for each vertex will first be estimated.
- Parameters:
- graph : array_like or networkx.Graph
Input graph to fit
- y : array_like, length graph.shape[0], optional
Categorical labels for the block assignments of the graph
- Returns:
- self : DCSBMEstimator object
Fitted instance of self
- bic(graph)
Bayesian information criterion for the current model on the input graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph
- Returns:
- bic : float
The lower the better
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- mse(graph)
Compute mean square error for the current model on the input graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph
- Returns:
- mse : float
Mean square error for the model's fit P matrix
- sample(n_samples=1)
Sample graphs (realizations) from the fitted model.
Can only be called after the model has been fit.
- Parameters:
- n_samples : int (default 1), optional
The number of graphs to sample
- Returns:
- graphs : np.ndarray, shape (n_samples, n_verts, n_verts)
Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.
Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.
- score(graph)
Compute the average log-likelihood over each potential edge of the given graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph. Must be same shape as model's p_mat_ attribute
- Returns:
- score : float
Sum of log-likelihoods for each potential edge in input graph
- score_samples(graph, clip=None)
Compute the weighted log probabilities for each potential edge.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graph : np.ndarray
Input graph. Must be same shape as model's p_mat_ attribute
- clip : scalar or None, optional (default=None)
Values for which to clip probability matrix, entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.
- Returns:
- sample_scores : np.ndarray (size of graph)
Log-likelihood per potential edge in the graph
- set_fit_request(*, graph='$UNCHANGED$')
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- graph : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for graph parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- set_score_request(*, graph='$UNCHANGED$')¶
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- graphstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for graph parameter in score.
- Returns:
- selfobject
The updated object.
- Parameters:
self (DCSBMEstimator) --
- Return type:
Latent position models¶
- class graspologic.models.RDPGEstimator[source]¶
Random Dot Product Graph
Under the random dot product graph model, each node is assumed to have a "latent position" in some \(d\)-dimensional Euclidean space. This vector dictates that node's probability of connection to other nodes. For a given pair of nodes \(i\) and \(j\), the probability of connection is the dot product between their latent positions:
\(P_{ij} = \langle x_i, y_j \rangle\)
where \(x_i\) is the left latent position of node \(i\), and \(y_j\) is the right latent position of node \(j\). If the graph being modeled is undirected, then \(x_i = y_i\). Latent positions can be estimated via AdjacencySpectralEmbed.
Read more in the Random Dot Product Graph (RDPG) Model Tutorial
- Parameters:
- loopsboolean, optional (default=False)
Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.
- n_componentsint, optional (default=None)
The dimensionality of the latent space used to model the graph. If None, the method of Zhu and Ghodsi will be used to select an embedding dimension.
- ase_kwsdict, optional (default={})
Dictionary of keyword arguments passed down to AdjacencySpectralEmbed, which is used to fit the model.
- diag_aug_weightint or float, optional (default=1)
Weighting used for diagonal augmentation, which is a form of regularization for fitting the RDPG model.
- plus_c_weightint or float, optional (default=1)
Weighting used for a constant scalar added to the adjacency matrix before embedding as a form of regularization.
- Attributes:
- latent_tuple, length 2, or np.ndarray, shape (n_verts, n_components)
The fit latent positions for the RDPG model. If a tuple, then the graph that was input to fit was directed, and the first and second elements of the tuple are the left and right latent positions, respectively. The left and right latent positions will both be of shape (n_verts, n_components). If latent_ is an array, then the graph that was input to fit was undirected and the left and right latent positions are the same.
- p_mat_np.ndarray, shape (n_verts, n_verts)
Probability matrix \(P\) for the fit model, from which graphs could be sampled.
See also
References
[1]Athreya, A., Fishkind, D. E., Tang, M., Priebe, C. E., Park, Y., Vogelstein, J. T., ... & Sussman, D. L. (2018). Statistical inference on random dot product graphs: a survey. Journal of Machine Learning Research, 18(226), 1-92.
[2]Zhu, M. and Ghodsi, A. (2006). Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2), pp.918-930.
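The relationship \(P_{ij} = \langle x_i, y_j \rangle\) between latent positions and the probability matrix can be sketched in plain Python. The helper name rdpg_p_mat and the example latent positions are hypothetical, chosen only for illustration; this is not how the estimator computes p_mat_ internally.

```python
def rdpg_p_mat(left, right=None):
    """Return P with P[i][j] = <x_i, y_j>.

    `left` and `right` are lists of latent-position vectors, one per node.
    If `right` is omitted, the undirected case x_i = y_i is assumed.
    """
    if right is None:
        right = left
    return [[sum(a * b for a, b in zip(x, y)) for y in right] for x in left]


# Three nodes embedded in a 2-dimensional latent space (made-up values).
latent = [[0.5, 0.5], [0.5, 0.5], [0.8, 0.0]]
p_mat = rdpg_p_mat(latent)
```

Because a single set of positions is used for both sides here, the resulting matrix is symmetric, matching the undirected case described above. For a directed graph, separate left and right positions would be passed.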
- fit(graph, y=None)[source]¶
Calculate the parameters for the given graph model
- Parameters:
- Return type:
- bic(graph)¶
Bayesian information criterion for the current model on the input graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graphnp.ndarray
Input graph
- Returns:
- bicfloat
The lower the better
- Parameters:
graph (ndarray) --
- Return type:
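As a reminder of what "the lower the better" means, the textbook form of the criterion is \(k \ln n - 2 \ln \hat{L}\). The sketch below uses that generic form; how this estimator counts parameters (k) and observations (n) is not stated here, so all three arguments are assumptions supplied by the caller.

```python
import math


def bic(log_likelihood, n_params, n_observations):
    """Textbook BIC: k * ln(n) - 2 * ln(L). Lower values are preferred."""
    return n_params * math.log(n_observations) - 2.0 * log_likelihood


# Two hypothetical models with the same fit quality on 90 potential edges:
# the one with fewer parameters receives the lower (better) BIC.
simple = bic(log_likelihood=-100.0, n_params=1, n_observations=90)
complex_ = bic(log_likelihood=-100.0, n_params=5, n_observations=90)
```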
- get_metadata_routing()¶
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- mse(graph)¶
Compute mean square error for the current model on the input graph
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graphnp.ndarray
Input graph
- Returns:
- msefloat
Mean square error for the model's fit P matrix
- Parameters:
graph (ndarray) --
- Return type:
- sample(n_samples=1)¶
Sample graphs (realizations) from the fitted model
Can only be called after the model has been fit.
- Parameters:
- n_samplesint (default 1), optional
The number of graphs to sample
- Returns:
- graphsnp.array (n_samples, n_verts, n_verts)
Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.
Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.
- Parameters:
n_samples (int) --
- Return type:
ndarray
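Sampling from a probability matrix amounts to one independent Bernoulli draw per potential edge. The following dependency-free sketch shows the idea and the (n_samples, n_verts, n_verts) output layout; the function name and its directed, loops and seed arguments are hypothetical, and the library's own sampler may differ in detail.

```python
import random


def sample_graphs(p_mat, n_samples=1, directed=False, loops=False, seed=None):
    """Draw `n_samples` adjacency matrices with independent Bernoulli edges."""
    rng = random.Random(seed)
    n = len(p_mat)
    graphs = []
    for _ in range(n_samples):
        adj = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j and not loops:
                    continue  # keep the diagonal empty
                if not directed and j < i:
                    adj[i][j] = adj[j][i]  # mirror the upper triangle
                    continue
                adj[i][j] = 1 if rng.random() < p_mat[i][j] else 0
        graphs.append(adj)
    return graphs


p_mat = [[0.0, 1.0, 0.0],
         [1.0, 0.0, 0.5],
         [0.0, 0.5, 0.0]]
graphs = sample_graphs(p_mat, n_samples=1, seed=0)
# Even for a single sample the result is a list of length 1 holding one
# 3 x 3 matrix, mirroring the (1, n_verts, n_verts) shape described above.
```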
- score(graph)¶
Compute the average log-likelihood over each potential edge of the given graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graphnp.ndarray
Input graph. Must be the same shape as the model's p_mat_ attribute.
- Returns:
- scorefloat
Sum of log-likelihoods for each potential edge in the input graph
- Parameters:
graph (ndarray) --
- Return type:
- score_samples(graph, clip=None)¶
Compute the weighted log probabilities for each potential edge.
Note that this implicitly assumes the input graph is indexed like the fit model.
- Parameters:
- graphnp.ndarray
Input graph. Must be the same shape as the model's p_mat_ attribute.
- clipscalar or None, optional (default=None)
Value c at which to clip the probability matrix: entries less than c or greater than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.
- Returns:
- sample_scoresnp.ndarray (size of graph)
Log-likelihood per potential edge in the graph
- Parameters:
graph (ndarray) --
clip (float | None) --
- Return type:
ndarray
- set_fit_request(*, graph='$UNCHANGED$')¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- graphstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for graph parameter in fit.
- Returns:
- selfobject
The updated object.
- Parameters:
self (RDPGEstimator) --
- Return type:
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- set_score_request(*, graph='$UNCHANGED$')¶
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- graphstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for graph parameter in score.
- Returns:
- selfobject
The updated object.
- Parameters:
self (RDPGEstimator) --
- Return type:
Edge swapping (configuration models)¶
- class graspologic.models.EdgeSwapper[source]¶
Degree Preserving Edge Swaps
This class allows for performing degree preserving edge swaps to generate new networks with the same degree sequence as the input network.
- Attributes:
- adjacencynp.ndarray OR csr_array, shape (n_verts, n_verts)
The initial adjacency matrix to perform edge swaps on. Must be unweighted and undirected.
- edge_listnp.ndarray, shape (n_verts, 2)
The corresponding edgelist for the input network
- seed: int, optional
Random seed to make outputs reproducible, must be positive
References
[1]Fosdick, B. K., Larremore, D. B., Nishimura, J., & Ugander, J. (2018). Configuring random graph models with fixed degree sequences. Siam Review, 60(2), 315-355.
[2]Carstens, C. J., & Horadam, K. J. (2017). Switching edges to randomize networks: what goes wrong and how to fix it. Journal of Complex Networks, 5(3), 337-351.
- __init__(adjacency, seed=None)[source]¶
- Parameters:
adjacency (ndarray | csr_array) --
seed (int | None) --
- swap_edges(n_swaps=1)[source]¶
Performs a number of edge swaps on the graph
- Parameters:
- n_swapsint (default 1), optional
The number of edge swaps to be performed
- Returns:
- adjacencynp.ndarray OR csr_array, shape (n_verts, n_verts)
The adjacency matrix after the edge swaps have been performed on the graph
- edge_listnp.ndarray (n_verts, 2)
The edge list after the edge swaps have been performed on the graph
- Parameters:
n_swaps (int) --
- Return type:
Tuple[ndarray | csr_array, ndarray]
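A degree preserving edge swap can be illustrated with a short, dependency-free sketch that works on an edge list of an unweighted, undirected graph without self-loops. It rewires two edges (a, b) and (c, d) into (a, d) and (c, b), rejecting proposals that would create a self-loop or a duplicate edge. The function below and its max_tries argument are hypothetical; EdgeSwapper's actual proposal and rejection rules are not described here and may differ.

```python
import random


def swap_edges(edge_list, n_swaps=1, seed=None, max_tries=1000):
    """Return a new edge list after up to `n_swaps` accepted double-edge swaps."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edge_list]
    present = {frozenset(e) for e in edges}  # undirected edge lookup
    done = 0
    tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Proposed rewiring: (a, b), (c, d) -> (a, d), (c, b).
        if a == d or c == b:
            continue  # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def degrees(edges):
    """Map each node to its degree."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]  # a 6-cycle
swapped = swap_edges(original, n_swaps=3, seed=0)
```

Each accepted swap removes one edge endpoint and adds one edge endpoint at every node involved, so degrees(swapped) equals degrees(original) regardless of how many proposals were accepted.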