Embedding

Decomposition

graspologic.embed.select_dimension(X, n_components=None, n_elbows=2, threshold=None, return_likelihoods=False)

Generates profile likelihoods from an array based on the Zhu and Ghodsi method. The elbows correspond to the optimal embedding dimensions.

Parameters:

X1d or 2d array-like: Input array to generate profile likelihoods for. If 1d-array, it should be sorted in decreasing order. If 2d-array, shape should be (n_samples, n_features).
n_componentsint, optional, default: None.: Number of components to embed. If None, n_components = floor(log2(min(n_samples, n_features))). Ignored if X is 1d-array.
n_elbowsint, optional, default: 2.: Number of likelihood elbows to return. Must be > 1.
thresholdfloat, int, optional, default: None: If given, only consider the singular values that are > threshold. Must be >= 0.
return_likelihoodsbool, optional, default: False: If True, returns all likelihoods associated with each elbow.

Returns:

elbowslist: Elbows indicate subsequent optimal embedding dimensions. Number of elbows may be less than n_elbows if there are not enough singular values.
sing_valslist: The singular values associated with each elbow.
likelihoodslist of array-like: Array of likelihoods corresponding to each elbow. Only returned if return_likelihoods is True.

Parameters:

X (ndarray | csr_array) --
n_components (int | None) --
n_elbows (int) --
threshold (float | None) --
return_likelihoods (bool) --

Return type:

Tuple[List[int], List[float]] | Tuple[List[int], List[float], List[ndarray]]

References

[1]

Zhu, M. and Ghodsi, A. (2006). Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2), pp.918-930.
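The profile-likelihood idea behind elbow selection can be sketched in plain Python. This is a minimal illustration of the technique from [1], not the library's implementation: the function name profile_likelihood_elbow, the pooled-variance divisor p - 2, and the small variance floor are all assumptions made for this example.

```python
import math


def profile_likelihood_elbow(values):
    """Return the 1-based split q that maximizes the profile log-likelihood."""
    d = sorted(values, reverse=True)  # scree values in decreasing order
    p = len(d)
    if p < 3:
        raise ValueError("need at least three values to locate an elbow")

    best_q, best_ll = 1, -math.inf
    for q in range(1, p):  # candidate split: d[:q] versus d[q:]
        left, right = d[:q], d[q:]
        mu_left = sum(left) / len(left)
        mu_right = sum(right) / len(right)
        # Pooled sum of squares around the two group means.
        ss = sum((x - mu_left) ** 2 for x in left)
        ss += sum((x - mu_right) ** 2 for x in right)
        # Shared variance estimate (assumed divisor p - 2); the floor avoids log(0).
        var = max(ss / (p - 2), 1e-12)
        log_lik = -0.5 * p * math.log(2 * math.pi * var) - ss / (2 * var)
        if log_lik > best_ll:
            best_q, best_ll = q, log_lik
    return best_q


scree = [10.0, 9.5, 9.0, 1.2, 1.0, 0.8]
elbow = profile_likelihood_elbow(scree)
print(elbow)  # 3
```

A second elbow could be found by applying the same function to the values after the first elbow, which is how successive elbows are usually obtained with this method.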

graspologic.embed.select_svd(X, n_components=None, n_elbows=2, algorithm='randomized', n_iter=5, svd_seed=None)

Dimensionality reduction using SVD.

Performs linear dimensionality reduction by using either full singular value decomposition (SVD) or truncated SVD. Full SVD is performed using SciPy's wrapper for LAPACK, while truncated SVD is performed using either SciPy's wrapper for ARPACK or scikit-learn's implementation of randomized SVD.

It also performs optimal dimensionality selection using the Zhu & Ghodsi algorithm if the number of target dimensions is not specified.

Parameters:

Xarray-like, shape (n_samples, n_features)

The data to perform svd on.

n_componentsint or None, default = None

Desired dimensionality of output data. If algorithm is "full", n_components must be <= min(X.shape). Otherwise, n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by select_dimension() using the n_elbows argument.

n_elbowsint, optional, default: 2

If n_components is None, then compute the optimal embedding dimension using select_dimension(). Otherwise, ignored.

algorithm{'randomized' (default), 'full', 'truncated', 'eigsh'}, optional

SVD solver to use:

'randomized'
Computes randomized svd using sklearn.utils.extmath.randomized_svd()
'full'
Computes full svd using scipy.linalg.svd(). Does not support graph input of type scipy.sparse.csr_array.
'truncated'
Computes truncated svd using scipy.sparse.linalg.svds()
'eigsh'
Computes svd of a real, symmetric square matrix using scipy.sparse.linalg.eigsh(). Extremely fast for these types of matrices.

n_iterint, optional (default = 5)

Number of iterations for randomized SVD solver. Not used by 'full' or 'truncated'. The default is larger than the default in randomized_svd to handle sparse matrices that may have a large, slowly decaying spectrum.

svd_seedint or None (default None)

Only applicable for algorithm="randomized"; allows you to seed the randomized svd solver for deterministic, albeit pseudo-randomized behavior.

Returns:

Uarray-like, shape (n_samples, n_components): Left singular vectors corresponding to singular values.
Darray-like, shape (n_components): Singular values in decreasing order, as a 1d array.
Varray-like, shape (n_components, n_samples): Right singular vectors corresponding to singular values.

Parameters:

X (ndarray | csr_array) --
n_components (int | None) --
n_elbows (int | None) --
algorithm (typing_extensions.Literal[full, truncated, randomized, eigsh]) --
n_iter (int) --
svd_seed (int | None) --

Return type:

Tuple[ndarray, ndarray, ndarray]

References

[1]

Zhu, M. and Ghodsi, A. (2006). Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2), pp.918-930.
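The bound on n_components differs by solver: "full" allows equality with min(X.shape), while the other solvers require a strict inequality. The sketch below restates that rule as a small validator. The helper name check_n_components and the lower bound of 1 are assumptions for illustration and are not part of the documented API.

```python
def check_n_components(n_components, shape, algorithm="randomized"):
    """Validate n_components against the bound documented for each solver."""
    limit = min(shape)
    if algorithm == "full":
        ok = 1 <= n_components <= limit  # full SVD may keep every component
    else:
        ok = 1 <= n_components < limit   # other solvers need a strict bound
    if not ok:
        raise ValueError(
            f"n_components={n_components} is invalid for algorithm="
            f"{algorithm!r} with min(X.shape)={limit}"
        )
    return n_components


print(check_n_components(5, (5, 8), algorithm="full"))  # 5
```

Calling the same helper with algorithm="truncated" and n_components=5 on a (5, 8) input raises ValueError under this rule.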

Single graph embedding

class graspologic.embed.AdjacencySpectralEmbed

Class for computing the adjacency spectral embedding of a graph.

The adjacency spectral embedding (ASE) is a k-dimensional Euclidean representation of the graph based on its adjacency matrix. It relies on an SVD to reduce the dimensionality to the specified k, or if k is unspecified, can find a number of dimensions automatically (see select_svd).

Read more in the Adjacency Spectral Embedding Tutorial

Parameters:

n_componentsint or None, default = None

Desired dimensionality of output data. If algorithm is "full", n_components must be <= min(X.shape). Otherwise, n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by select_dimension() using the n_elbows argument.

n_elbowsint, optional, default: 2

If n_components is None, then compute the optimal embedding dimension using select_dimension(). Otherwise, ignored.

algorithm{'randomized' (default), 'full', 'truncated'}, optional

SVD solver to use:

'randomized'
Computes randomized svd using sklearn.utils.extmath.randomized_svd()
'full'
Computes full svd using scipy.linalg.svd(). Does not support graph input of type scipy.sparse.csr_array.
'truncated'
Computes truncated svd using scipy.sparse.linalg.svds()

n_iterint, optional (default = 5)

Number of iterations for randomized SVD solver. Not used by 'full' or 'truncated'. The default is larger than the default in randomized_svd to handle sparse matrices that may have a large, slowly decaying spectrum.

check_lccbool , optional (default = True)

Whether to check if the input graph is connected. Embedding a graph that is not connected may give suboptimal results. If True and the input is not connected, a UserWarning is raised. Skipping the connectedness check may result in faster computation.

diag_augbool, optional (default = True)

Whether to replace the main diagonal of the adjacency matrix with a vector corresponding to the degree (or sum of edge weights for a weighted network) before embedding. Empirically, this produces latent position estimates closer to the ground truth.

concatbool, optional (default False)

If graph is directed, whether to concatenate left and right (out and in) latent positions along axis 1.

svd_seedint or None (default None)

Only applicable for algorithm="randomized"; allows you to seed the randomized svd solver for deterministic, albeit pseudo-randomized behavior.
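Diagonal augmentation, as described for diag_aug, can be illustrated on a small undirected graph stored as nested lists. This is a sketch under stated assumptions: the helper name augment_diagonal_sketch is hypothetical, and scaling each degree by 1 / (n - 1) is an assumed convention; the parameter description only says the diagonal is replaced by a vector corresponding to the degree.

```python
def augment_diagonal_sketch(adjacency):
    """Return a copy whose diagonal holds degree / (n - 1) for each vertex."""
    n = len(adjacency)
    if n < 2:
        raise ValueError("need at least two vertices")
    augmented = [list(row) for row in adjacency]
    for i in range(n):
        # Degree (or summed edge weight) of vertex i, ignoring self-loops.
        degree = sum(adjacency[i][j] for j in range(n) if j != i)
        augmented[i][i] = degree / (n - 1)  # assumed scaling
    return augmented


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
augmented = augment_diagonal_sketch(path)
print([augmented[i][i] for i in range(3)])  # [0.5, 1.0, 0.5]
```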

Attributes:

n_features_in_: int: Number of features passed to the fit() method.
latent_left_array, shape (n_samples, n_components): Estimated left latent positions of the graph.
latent_right_array, shape (n_samples, n_components), or None: Only computed when the graph is directed, or adjacency matrix is asymmetric. Estimated right latent positions of the graph. Otherwise, None.
singular_values_array, shape (n_components): Singular values associated with the latent position matrices.

See also

graspologic.embed.select_svd
graspologic.embed.select_dimension

Notes

The singular value decomposition:

\[A = U \Sigma V^T\]

is used to find an orthonormal basis for a matrix, which in our case is the adjacency matrix of the graph. These basis vectors (in the matrices U or V) are ordered according to the amount of variance they explain in the original matrix. By selecting a subset of these basis vectors (through our choice of dimensionality reduction) we can find a lower dimensional space in which to represent the graph.
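To make the idea concrete, the sketch below computes a one-dimensional embedding of a small symmetric adjacency matrix without any external library. It uses power iteration in place of a full SVD, which is a swapped-in technique that only works when the dominant eigenvalue is unique in magnitude, and it scales the leading vector by the square root of the leading eigenvalue's magnitude, a common convention for spectral embeddings that is assumed here. The function names are hypothetical.

```python
import math


def top_eigenpair(matrix, n_iter=200):
    """Approximate the dominant eigenpair of a symmetric matrix."""
    n = len(matrix)
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-zero start
    for _ in range(n_iter):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
    av = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    value = sum(a * v for a, v in zip(av, vec))
    return value, vec


def one_dim_embedding(adjacency):
    """Scale the leading eigenvector by sqrt(|eigenvalue|) (assumed convention)."""
    value, vec = top_eigenpair(adjacency)
    scale = math.sqrt(abs(value))
    return [scale * v for v in vec]


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
positions = one_dim_embedding(triangle)
print([round(x, 4) for x in positions])  # [0.8165, 0.8165, 0.8165]
```

For the triangle graph every vertex is equivalent, so all three latent positions coincide at sqrt(2/3).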

References

[1]

Sussman, D.L., Tang, M., Fishkind, D.E., Priebe, C.E. "A Consistent Adjacency Spectral Embedding for Stochastic Blockmodel Graphs," Journal of the American Statistical Association, Vol. 107(499), 2012

[2]

Levin, K., Roosta-Khorasani, F., Mahoney, M. W., & Priebe, C. E. (2018). Out-of-sample extension of graph adjacency spectral embedding. PMLR: Proceedings of Machine Learning Research, 80, 2975-2984.

__init__(n_components=None, n_elbows=2, algorithm='randomized', n_iter=5, check_lcc=True, diag_aug=True, concat=False, svd_seed=None)

Parameters:

n_components (int | None) --
n_elbows (int | None) --
algorithm (typing_extensions.Literal[full, truncated, randomized, eigsh]) --
n_iter (int) --
check_lcc (bool) --
diag_aug (bool) --
concat (bool) --
svd_seed (int | None) --

Return type:

None

fit(graph, y=None, *args, **kwargs)

Fit ASE model to input graph

Parameters:

grapharray-like, scipy.sparse.csr_array, or networkx.Graph: Input graph to embed.
y: Ignored

Returns:

selfobject: Returns an instance of self.

Parameters:

graph (ndarray | csr_array | Graph) --
y (Any | None) --
args (Any) --
kwargs (Any) --

Return type:

AdjacencySpectralEmbed

fit_transform(graph, y=None, *args, **kwargs)

Fit the model with graphs and apply the transformation.

The number of dimensions (n_components) is either automatically determined or based on user input.

Parameters:

graph: np.ndarray or networkx.Graph: Input graph to embed.

Returns:

outnp.ndarray OR length 2 tuple of np.ndarray.: If undirected, returns a single np.ndarray of latent positions with shape (n_vertices, n_components). If directed and concat is True, the left and right latent matrices are concatenated along axis 1, giving shape (n_vertices, 2*n_components). If directed and concat is False, returns a tuple of the two latent matrices, each of shape (n_vertices, n_components).

Parameters:

graph (ndarray | csr_array | Graph) --
y (Any | None) --
args (Any) --
kwargs (Any) --

Return type:

ndarray | Tuple[ndarray, ndarray]
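The three return shapes described for fit_transform can be summarized with a small helper operating on nested lists. The name assemble_output is hypothetical and the code only mirrors the documented rules; it is not taken from the library.

```python
def assemble_output(latent_left, latent_right=None, concat=False):
    """Combine left and right latent positions following the documented rules."""
    if latent_right is None:
        # Undirected graph: a single matrix of latent positions.
        return latent_left
    if concat:
        # Directed graph, concat=True: join rows along axis 1.
        return [left_row + right_row
                for left_row, right_row in zip(latent_left, latent_right)]
    # Directed graph, concat=False: keep the two matrices separate.
    return latent_left, latent_right


left = [[0.1, 0.2], [0.3, 0.4]]
right = [[0.5, 0.6], [0.7, 0.8]]
combined = assemble_output(left, right, concat=True)
print(combined)  # [[0.1, 0.2, 0.5, 0.6], [0.3, 0.4, 0.7, 0.8]]
```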

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

set_fit_request(*, graph='$UNCHANGED$')

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

graphstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for graph parameter in fit.

Returns:

selfobject: The updated object.

Parameters:

self (AdjacencySpectralEmbed) --
graph (bool | None | str) --

Return type:

AdjacencySpectralEmbed

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

transform(X)

Obtain latent positions from an adjacency matrix or matrix of out-of-sample vertices. For more details on transforming out-of-sample vertices, see Out-of-Sample (OOS) Embedding

For mathematical background, see [2].

Parameters:

Xarray-like or tuple, original shape or (n_oos_vertices, n_vertices).

The original fitted matrix ("graph" in fit) or new out-of-sample data. If X is the original fitted matrix, returns a matrix close to self.fit_transform(X).

If X is an out-of-sample matrix, n_oos_vertices is the number of new vertices, and n_vertices is the number of vertices in the original graph. If tuple, graph is directed and X[0] contains edges from out-of-sample vertices to in-sample vertices.

Returns:

outnp.ndarray OR length 2 tuple of np.ndarray

Array of latent positions, shape (n_oos_vertices, n_components) or (n_vertices, n_components). Transforms the fitted matrix if it was passed in.

If X is an array or tuple containing adjacency vectors corresponding to new nodes, returns the estimated latent positions for the new out-of-sample adjacency vectors. If undirected, returns array. If directed, returns (X_out, X_in), where X_out contains latent positions corresponding to nodes with edges from out-of-sample vertices to in-sample vertices.

Notes

If the matrix was diagonally augmented (e.g., self.diag_aug was True), fit followed by transform will produce a slightly different matrix than fit_transform.

To get the original embedding, using fit_transform is recommended. In the directed case, if A is the original in-sample adjacency matrix, the tuple (A.T, A) will need to be passed to transform if you do not wish to use fit_transform.
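For intuition about out-of-sample embedding, the sketch below places one new vertex into an existing one-dimensional embedding by least squares, in the spirit of the extension discussed in [2]. It is an illustrative assumption that a plain least-squares fit is used; the helper name oos_position_1d and the restriction to one dimension and an undirected graph are simplifications made for this example.

```python
import math


def oos_position_1d(latent, adjacency_row):
    """Least-squares latent position of one new vertex in a 1-d embedding.

    Chooses w to minimize sum_i (adjacency_row[i] - latent[i] * w) ** 2.
    """
    denominator = sum(x * x for x in latent)
    if denominator == 0:
        raise ValueError("in-sample latent positions are all zero")
    numerator = sum(x * a for x, a in zip(latent, adjacency_row))
    return numerator / denominator


# In-sample positions of a triangle graph (each equal to sqrt(2/3)).
in_sample = [math.sqrt(2 / 3)] * 3
# The new vertex is connected to all three in-sample vertices.
new_position = oos_position_1d(in_sample, [1, 1, 1])
print(round(new_position, 4))  # 1.2247
```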

References

[1]

Sussman, D.L., Tang, M., Fishkind, D.E., Priebe, C.E. "A Consistent Adjacency Spectral Embedding for Stochastic Blockmodel Graphs," Journal of the American Statistical Association, Vol. 107(499), 2012

[2]

Levin, K., Roosta-Khorasani, F., Mahoney, M. W., & Priebe, C. E. (2018). Out-of-sample extension of graph adjacency spectral embedding. PMLR: Proceedings of Machine Learning Research, 80, 2975-2984

class graspologic.embed.LaplacianSpectralEmbed

Class for computing the Laplacian spectral embedding of a graph.

The Laplacian spectral embedding (LSE) is a k-dimensional Euclidean representation of the graph based on its Laplacian matrix. It relies on an SVD to reduce the dimensionality to the specified n_components, or if n_components is unspecified, can find a number of dimensions automatically.

Parameters:

form{'DAD' (default), 'I-DAD', 'R-DAD'}, optional

Specifies the type of Laplacian normalization to use. See to_laplacian() for more details regarding form.

n_componentsint or None, default = None

Desired dimensionality of output data. If algorithm is "full", n_components must be <= min(X.shape). Otherwise, n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by select_dimension() using the n_elbows argument.

n_elbowsint, optional, default: 2

If n_components is None, then compute the optimal embedding dimension using select_dimension(). Otherwise, ignored.

algorithm{'randomized' (default), 'full', 'truncated'}, optional

SVD solver to use:

'randomized'
Computes randomized svd using sklearn.utils.extmath.randomized_svd()
'full'
Computes full svd using scipy.linalg.svd()
'truncated'
Computes truncated svd using scipy.sparse.linalg.svds()

n_iterint, optional (default = 5)

Number of iterations for randomized SVD solver. Not used by 'full' or 'truncated'. The default is larger than the default in randomized_svd to handle sparse matrices that may have a large, slowly decaying spectrum.

check_lccbool , optional (default = True)

Whether to check if the input graph is connected. Embedding a graph that is not connected may give suboptimal results. If True and the input is not connected, a UserWarning is raised. Skipping the connectedness check may result in faster computation.

regularizer: int, float or None, optional (default=None)

Constant to be added to the diagonal of degree matrix. If None, average node degree is added. If int or float, must be >= 0. Only used when form is 'R-DAD'.

concatbool, optional (default False)

If graph is directed, whether to concatenate left and right (out and in) latent positions along axis 1.

Attributes:

n_features_in_: int: Number of features passed to the fit() method.
latent_left_array, shape (n_samples, n_components): Estimated left latent positions of the graph.
latent_right_array, shape (n_samples, n_components), or None: Only computed when the graph is directed, or adjacency matrix is asymmetric. Estimated right latent positions of the graph. Otherwise, None.
singular_values_array, shape (n_components): Singular values associated with the latent position matrices.
svd_seedint or None (default None): Only applicable for algorithm="randomized"; allows you to seed the randomized svd solver for deterministic, albeit pseudo-randomized behavior.

See also

graspologic.embed.select_svd
graspologic.embed.select_dimension
graspologic.utils.to_laplacian

Notes

The singular value decomposition:

\[A = U \Sigma V^T\]

is used to find an orthonormal basis for a matrix, which in our case is the Laplacian matrix of the graph. These basis vectors (in the matrices U or V) are ordered according to the amount of variance they explain in the original matrix. By selecting a subset of these basis vectors (through our choice of dimensionality reduction) we can find a lower dimensional space in which to represent the graph.

References

[1]

Sussman, D.L., Tang, M., Fishkind, D.E., Priebe, C.E. "A Consistent Adjacency Spectral Embedding for Stochastic Blockmodel Graphs," Journal of the American Statistical Association, Vol. 107(499), 2012.

[2]

Von Luxburg, Ulrike. "A tutorial on spectral clustering," Statistics and computing, Vol. 17(4), pp. 395-416, 2007.

[3]

Rohe, Karl, Sourav Chatterjee, and Bin Yu. "Spectral clustering and the high-dimensional stochastic blockmodel," The Annals of Statistics, Vol. 39(4), pp. 1878-1915, 2011.

__init__(form='DAD', n_components=None, n_elbows=2, algorithm='randomized', n_iter=5, check_lcc=True, regularizer=None, concat=False, svd_seed=None)

Parameters:

form (typing_extensions.Literal[I-DAD, DAD, R-DAD]) --
n_components (int | None) --
n_elbows (int | None) --
algorithm (typing_extensions.Literal[full, truncated, randomized, eigsh]) --
n_iter (int) --
check_lcc (bool) --
regularizer (float | None) --
concat (bool) --
svd_seed (int | None) --

fit(graph, y=None, *args, **kwargs)

Fit LSE model to input graph

By default, uses the Laplacian normalization of the form:

\[L = D^{-1/2} A D^{-1/2}\]
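The normalization above can be written out directly for a small undirected graph. The sketch also covers the regularized variant, assuming that 'R-DAD' replaces D with D + tau*I, where tau is the regularizer (the average node degree when None). That form is an assumption based on the regularizer description; the function name to_laplacian_sketch is hypothetical, and the 'I-DAD' form is not covered.

```python
import math


def to_laplacian_sketch(adjacency, form="DAD", regularizer=None):
    """Return D^{-1/2} A D^{-1/2}, using D + tau*I in place of D for 'R-DAD'."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]  # undirected: row sums
    if form == "DAD":
        tau = 0.0
    elif form == "R-DAD":
        # None falls back to the average node degree.
        tau = sum(degrees) / n if regularizer is None else regularizer
        if tau < 0:
            raise ValueError("regularizer must be >= 0")
    else:
        raise ValueError("this sketch covers only 'DAD' and 'R-DAD'")
    # Vertices with zero (regularized) degree get a zero scale factor.
    scale = [1.0 / math.sqrt(d + tau) if d + tau > 0 else 0.0 for d in degrees]
    return [[scale[i] * adjacency[i][j] * scale[j] for j in range(n)]
            for i in range(n)]


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
dad = to_laplacian_sketch(path)
rdad = to_laplacian_sketch(path, form="R-DAD")
print(round(dad[0][1], 4), round(rdad[0][1], 4))  # 0.7071 0.3586
```

On the three-vertex path the degrees are 1, 2, 1, so the DAD entry between the first two vertices is 1/sqrt(2); with the average degree 4/3 added to each degree, the same entry shrinks to 3/sqrt(70).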

Parameters:

grapharray-like, scipy.sparse.csr_array, or networkx.Graph: Input graph to embed. See graspologic.utils.import_graph.

Returns:

selfobject: Returns an instance of self.

Parameters:

graph (ndarray | csr_array | Graph) --
y (Any | None) --
args (Any) --
kwargs (Any) --

Return type:

LaplacianSpectralEmbed

fit_transform(graph, y=None, *args, **kwargs)

Fit the model with graphs and apply the transformation.

The number of dimensions (n_components) is either automatically determined or based on user input.

Parameters:

graph: np.ndarray or networkx.Graph: Input graph to embed.

Returns:

outnp.ndarray OR length 2 tuple of np.ndarray.: If undirected, returns a single np.ndarray of latent positions with shape (n_vertices, n_components). If directed and concat is True, the left and right latent matrices are concatenated along axis 1, giving shape (n_vertices, 2*n_components). If directed and concat is False, returns a tuple of the two latent matrices, each of shape (n_vertices, n_components).

Parameters:

graph (ndarray | csr_array | Graph) --
y (Any | None) --
args (Any) --
kwargs (Any) --

Return type:

ndarray | Tuple[ndarray, ndarray]

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

set_fit_request(*, graph='$UNCHANGED$')

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

graphstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for graph parameter in fit.

Returns:

selfobject: The updated object.

Parameters:

self (LaplacianSpectralEmbed) --
graph (bool | None | str) --

Return type:

LaplacianSpectralEmbed

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

transform(X)

Obtain latent positions from an adjacency matrix or matrix of out-of-sample vertices. For more details on transforming out-of-sample vertices, see Out-of-Sample (OOS) Embedding

For mathematical background, see [2].

Parameters:

Xarray-like or tuple, original shape or (n_oos_vertices, n_vertices).

The original fitted matrix ("graph" in fit) or new out-of-sample data. If X is the original fitted matrix, returns a matrix close to self.fit_transform(X).

If X is an out-of-sample matrix, n_oos_vertices is the number of new vertices, and n_vertices is the number of vertices in the original graph. If tuple, graph is directed and X[0] contains edges from out-of-sample vertices to in-sample vertices.

Returns:

outnp.ndarray OR length 2 tuple of np.ndarray

Array of latent positions, shape (n_oos_vertices, n_components) or (n_vertices, n_components). Transforms the fitted matrix if it was passed in.

If X is an array or tuple containing adjacency vectors corresponding to new nodes, returns the estimated latent positions for the new out-of-sample adjacency vectors. If undirected, returns array. If directed, returns (X_out, X_in), where X_out contains latent positions corresponding to nodes with edges from out-of-sample vertices to in-sample vertices.

Notes

If the matrix was diagonally augmented (e.g., self.diag_aug was True), fit followed by transform will produce a slightly different matrix than fit_transform.

To get the original embedding, using fit_transform is recommended. In the directed case, if A is the original in-sample adjacency matrix, the tuple (A.T, A) will need to be passed to transform if you do not wish to use fit_transform.

References

[1]

Sussman, D.L., Tang, M., Fishkind, D.E., Priebe, C.E. "A Consistent Adjacency Spectral Embedding for Stochastic Blockmodel Graphs," Journal of the American Statistical Association, Vol. 107(499), 2012

[2]

Levin, K., Roosta-Khorasani, F., Mahoney, M. W., & Priebe, C. E. (2018). Out-of-sample extension of graph adjacency spectral embedding. PMLR: Proceedings of Machine Learning Research, 80, 2975-2984

graspologic.embed.node2vec_embed(graph, num_walks=10, walk_length=40, return_hyperparameter=1.0, inout_hyperparameter=1.0, dimensions=128, window_size=2, workers=8, iterations=3, interpolate_walk_lengths_by_node_degree=True, random_seed=None)

Generates a node2vec embedding from a given graph. Will follow the word2vec algorithm to create the embedding.

Parameters:

graph: Union[nx.Graph, nx.DiGraph]: A networkx graph or digraph. A multigraph should be turned into a non-multigraph so that the calling user properly handles the multi-edges (i.e. aggregate weights or take last edge weight). If the graph is unweighted, the weight of each edge will default to 1.
num_walksint: Number of walks per source. Default is 10.
walk_length: int: Length of walk per source. Default is 40.
return_hyperparameterfloat: Return hyperparameter (p). Default is 1.0
inout_hyperparameterfloat: Inout hyperparameter (q). Default is 1.0
dimensionsint: Dimensionality of the word vectors. Default is 128.
window_sizeint: Maximum distance between the current and predicted word within a sentence. Default is 2.
workersint: Use these many worker threads to train the model. Default is 8.
iterationsint: Number of epochs in stochastic gradient descent (SGD). Default is 3.
interpolate_walk_lengths_by_node_degreebool: Use a dynamic walk length that corresponds to each node's degree. If the node is in the bottom 20 percentile, default to a walk length of 1. If it is in the top 10 percentile, use walk_length. If it is in the 20-80 percentiles, linearly interpolate between 1 and walk_length. This keeps lower-degree nodes from biasing the resulting embedding. If a low-degree node has the same number of walks as a high-degree node (which it will if this setting is off), the low-degree nodes cover a smaller breadth of random walks than the high-degree nodes, so walks from low-degree nodes end up dominating those from high-degree nodes.
random_seedint: Seed to be used for reproducible results. Default is None and will produce a random output. Note that for a fully deterministically-reproducible run, you must also limit to a single worker thread (workers=1), to eliminate ordering jitter from OS thread scheduling. In addition the environment variable PYTHONHASHSEED must be set to control hash randomization.
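The roles of the return hyperparameter (p) and the inout hyperparameter (q) can be seen in the second-order transition rule of node2vec as described in [1]. The sketch below computes the unnormalized weights for one step of a biased walk on a graph stored as a dict of dicts. The function name transition_weights and the graph representation are assumptions for illustration and do not reflect how this library samples walks internally.

```python
def transition_weights(graph, previous, current, p=1.0, q=1.0):
    """Unnormalized node2vec weights for stepping from `current` to each neighbor."""
    weights = {}
    for neighbor, edge_weight in graph[current].items():
        if neighbor == previous:
            bias = 1.0 / p  # step straight back to the previous node
        elif neighbor in graph[previous]:
            bias = 1.0      # stay within distance 1 of the previous node
        else:
            bias = 1.0 / q  # move outward, away from the previous node
        weights[neighbor] = bias * edge_weight
    return weights


graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0, "d": 1.0},
    "c": {"a": 1.0, "b": 1.0},
    "d": {"b": 1.0},
}
weights = transition_weights(graph, previous="a", current="b", p=2.0, q=0.5)
print(weights)  # {'a': 0.5, 'c': 1.0, 'd': 2.0}
```

With p = 2.0 the walk is discouraged from returning to "a", and with q = 0.5 it is encouraged to move outward to "d". Dividing each weight by their sum gives the transition probabilities.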

Returns:

Tuple[np.array, List[Any]]: A tuple containing a matrix in which each row is the embedding of one node, and a vector containing the corresponding vertex label for each row of the matrix. The matrix and vector are positionally correlated.

Parameters:

graph (Graph | DiGraph) --
num_walks (int) --
walk_length (int) --
return_hyperparameter (float) --
inout_hyperparameter (float) --
dimensions (int) --
window_size (int) --
workers (int) --
iterations (int) --
interpolate_walk_lengths_by_node_degree (bool) --
random_seed (int | None) --

Return type:

Tuple[ndarray, List[Any]]

Notes

The original reference implementation of node2vec comes from Aditya Grover from: https://github.com/aditya-grover/node2vec/.
Further details on the Alias Method used in this functionality can be found at: https://lips.cs.princeton.edu/the-alias-method-efficient-sampling-with-many-discrete-outcomes/

References

[1]

Aditya Grover and Jure Leskovec "node2vec: Scalable Feature Learning for Networks." Knowledge Discovery and Data Mining, 2016.

Multiple graph embedding

class graspologic.embed.OmnibusEmbed

Omnibus embedding of arbitrary number of input graphs with matched vertex sets.

Given $A_1, A_2, ..., A_m$, a collection of (possibly weighted) adjacency matrices of $m$ undirected graphs with matched vertices, the $(mn \times mn)$ omnibus matrix $M$ is the block matrix whose $(i, j)$-th $n \times n$ block is $M_{ij} = \frac{1}{2}(A_i + A_j)$. The omnibus matrix is then embedded using adjacency spectral embedding.
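The block structure of the omnibus matrix can be built directly from the definition. The sketch below uses nested lists and a hypothetical helper named omnibus_matrix; it illustrates the formula only and makes no claim about how the library assembles the matrix.

```python
def omnibus_matrix(graphs):
    """Build the (m*n x m*n) matrix whose (i, j) block is (A_i + A_j) / 2."""
    m = len(graphs)
    n = len(graphs[0])
    omni = [[0.0] * (m * n) for _ in range(m * n)]
    for i in range(m):
        for j in range(m):
            for r in range(n):
                for c in range(n):
                    # Entry (r, c) of block (i, j) averages the two graphs.
                    omni[i * n + r][j * n + c] = 0.5 * (
                        graphs[i][r][c] + graphs[j][r][c]
                    )
    return omni


a1 = [[0, 1], [1, 0]]  # a single edge
a2 = [[0, 0], [0, 0]]  # no edges
omni = omnibus_matrix([a1, a2])
for row in omni:
    print(row)
# [0.0, 1.0, 0.0, 0.5]
# [1.0, 0.0, 0.5, 0.0]
# [0.0, 0.5, 0.0, 0.0]
# [0.5, 0.0, 0.0, 0.0]
```

The diagonal blocks reproduce each input graph, and the off-diagonal blocks hold the pairwise averages, so the result is symmetric whenever the inputs are.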

Parameters:

n_componentsint or None, default = None

Desired dimensionality of output data. If algorithm is "full", n_components must be <= min(X.shape). Otherwise, n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by select_dimension() using the n_elbows argument.

n_elbowsint, optional, default: 2

If n_components is None, then compute the optimal embedding dimension using select_dimension(). Otherwise, ignored.

algorithm{'randomized' (default), 'full', 'truncated'}, optional

SVD solver to use:

'randomized'
Computes randomized svd using sklearn.utils.extmath.randomized_svd()
'full'
Computes full svd using scipy.linalg.svd()
'truncated'
Computes truncated svd using scipy.sparse.linalg.svds()

n_iterint, optional (default = 5)

Number of iterations for randomized SVD solver. Not used by 'full' or 'truncated'. The default is larger than the default in randomized_svd to handle sparse matrices that may have a large, slowly decaying spectrum.

check_lccbool , optional (default = True)

Whether to check if the average of all input graphs is connected. Embedding may give suboptimal results if the average graph is not connected. If True and the average graph is not connected, a UserWarning is raised.

diag_augbool, optional (default = True)

Whether to replace the main diagonal of each adjacency matrix with a vector corresponding to the degree (or sum of edge weights for a weighted network) before embedding.

concatbool, optional (default = False)

If graph(s) are directed, whether to concatenate each graph's left and right (out and in) latent positions along axis 1.

svd_seedint or None (default = None)

Only applicable for algorithm="randomized"; allows you to seed the randomized svd solver for deterministic, albeit pseudo-randomized behavior.

lsebool, optional (default = False)

Whether to construct the Omni matrix using the Laplacian matrices of the graphs and embed the Omni matrix with LSE.

Attributes:

n_graphs_int: Number of graphs
n_vertices_int: Number of vertices in each graph
latent_left_array, shape (n_graphs, n_vertices, n_components): Estimated left latent positions of the graph.
latent_right_array, shape (n_graphs, n_vertices, n_components), or None: Only computed when the graph is directed, or adjacency matrix is asymmetric. Estimated right latent positions of the graph. Otherwise, None.
singular_values_array, shape (n_components): Singular values associated with the latent position matrices.

See also

graspologic.embed.select_svd
graspologic.embed.select_dimension

References

[1]

Levin, K., Athreya, A., Tang, M., Lyzinski, V., & Priebe, C. E. (2017, November). A central limit theorem for an omnibus embedding of multiple random dot product graphs. In Data Mining Workshops (ICDMW), 2017 IEEE International Conference on (pp. 964-967). IEEE.

__init__(n_components=None, n_elbows=2, algorithm='randomized', n_iter=5, check_lcc=True, diag_aug=True, concat=False, svd_seed=None, lse=False)

Parameters:

n_components (int | None) --
n_elbows (int | None) --
algorithm (typing_extensions.Literal[full, truncated, randomized, eigsh]) --
n_iter (int) --
check_lcc (bool) --
diag_aug (bool) --
concat (bool) --
svd_seed (int | None) --
lse (bool) --

fit(graphs, y=None)

Fit the model with graphs.

Parameters:

graphslist of nx.Graph or ndarray, or csr_array: If list of nx.Graph, each Graph must contain same number of nodes. If list of ndarray, each array must have shape (n_vertices, n_vertices). If ndarray, then array must have shape (n_graphs, n_vertices, n_vertices).

Returns:

selfobject: Returns an instance of self.

fit_transform(graphs, y=None)

Fit the model with graphs and apply the embedding on graphs. n_components is either automatically determined or based on user input.

Parameters:

graphslist of nx.Graph or ndarray, or ndarray: If list of nx.Graph, each Graph must contain same number of nodes. If list of ndarray, each array must have shape (n_vertices, n_vertices). If ndarray, then array must have shape (n_graphs, n_vertices, n_vertices).

Returns:

outnp.ndarray or length 2 tuple of np.ndarray.: If input graphs were symmetric, ndarray of shape (n_graphs, n_vertices, n_components). If graphs were directed and concat is False, returns tuple of two arrays (same shape as above). The first corresponds to the left latent positions, and the second to the right latent positions. If graphs were directed and concat is True, left and right (out and in) latent positions are concatenated. In this case one tensor of shape (n_graphs, n_vertices, 2*n_components) is returned.
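
As a rough illustration of the directed, concat=True case, the sketch below uses placeholder arrays (not real embeddings) to show how joining left and right latent positions along the last axis gives the (n_graphs, n_vertices, 2*n_components) shape. All variable names here are assumptions made for the example.

```python
import numpy as np

n_graphs, n_vertices, n_components = 3, 5, 2

# Placeholder values standing in for the left (out) and right (in) latent positions.
left = np.zeros((n_graphs, n_vertices, n_components))
right = np.ones((n_graphs, n_vertices, n_components))

# Concatenate the two sets of positions per vertex, left block first.
concatenated = np.concatenate((left, right), axis=-1)
```

With concat False, the two arrays would instead be returned as a tuple (left, right).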

get_metadata_routing()¶

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)¶

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

set_fit_request(*, graphs='$UNCHANGED$')¶

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

graphsstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for graphs parameter in fit.

Returns:

selfobject: The updated object.

Parameters:

self (OmnibusEmbed) --
graphs (bool | None | str) --

Return type:

OmnibusEmbed

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

transform(X)¶

Obtain latent positions from an adjacency matrix or matrix of out-of-sample vertices. For more details on transforming out-of-sample vertices, see Out-of-Sample (OOS) Embedding.

For mathematical background, see [2].

Parameters:

Xarray-like or tuple, original shape or (n_oos_vertices, n_vertices).

The original fitted matrix ("graph" in fit) or new out-of-sample data. If X is the original fitted matrix, returns a matrix close to self.fit_transform(X).

If X is an out-of-sample matrix, n_oos_vertices is the number of new vertices, and n_vertices is the number of vertices in the original graph. If tuple, graph is directed and X[0] contains edges from out-of-sample vertices to in-sample vertices.

Returns:

outnp.ndarray OR length 2 tuple of np.ndarray

Array of latent positions, shape (n_oos_vertices, n_components) or (n_vertices, n_components). Transforms the fitted matrix if it was passed in.

If X is an array or tuple containing adjacency vectors corresponding to new nodes, returns the estimated latent positions for the new out-of-sample adjacency vectors. If undirected, returns array. If directed, returns (X_out, X_in), where X_out contains latent positions corresponding to nodes with edges from out-of-sample vertices to in-sample vertices.

Notes

If the matrix was diagonally augmented (i.e., self.diag_aug was True), fit followed by transform will produce a slightly different matrix than fit_transform.

To get the original embedding, using fit_transform is recommended. In the directed case, if A is the original in-sample adjacency matrix, the tuple (A.T, A) will need to be passed to transform if you do not wish to use fit_transform.
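
The out-of-sample idea can be sketched for a single undirected graph as a least-squares problem, in the spirit of reference [2]. This is a minimal NumPy sketch and not the library's implementation: the helper names ase and oos_embed are hypothetical, and diagonal augmentation is left out. Each out-of-sample adjacency row a is mapped to the vector w that best solves latent @ w = a.

```python
import numpy as np

def ase(adjacency, n_components):
    """Spectral embedding of a symmetric matrix: top singular vectors scaled by sqrt(singular values)."""
    U, s, _ = np.linalg.svd(adjacency)
    return U[:, :n_components] * np.sqrt(s[:n_components])

def oos_embed(oos_rows, latent):
    """Least-squares latent positions for rows of shape (n_oos_vertices, n_vertices)."""
    rows = np.asarray(oos_rows, dtype=float)
    # Solve latent @ w = a for every row a at once; the solution has shape (n_components, n_oos).
    solution, *_ = np.linalg.lstsq(latent, rows.T, rcond=None)
    return solution.T

# Noise-free example: a rank-2 matrix P = X X^T built from hypothetical latent positions.
rng = np.random.default_rng(0)
X = rng.uniform(0.2, 0.8, size=(6, 2))
P = X @ X.T

latent = ase(P, n_components=2)
# Treat the first two rows of P as if they were out-of-sample adjacency vectors.
oos_positions = oos_embed(P[:2], latent)
```

In this noise-free setting the rows fed back in recover their own in-sample positions. With a real adjacency matrix, and especially with diagonal augmentation, the match is only approximate, which is what the note above describes.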

References

[1]

Sussman, D.L., Tang, M., Fishkind, D.E., Priebe, C.E. "A Consistent Adjacency Spectral Embedding for Stochastic Blockmodel Graphs," Journal of the American Statistical Association, Vol. 107(499), 2012

[2]

Levin, K., Roosta-Khorasani, F., Mahoney, M. W., & Priebe, C. E. (2018). Out-of-sample extension of graph adjacency spectral embedding. PMLR: Proceedings of Machine Learning Research, 80, 2975-2984

class graspologic.embed.MultipleASE[source]¶

Multiple Adjacency Spectral Embedding (MASE) embeds an arbitrary number of input graphs with matched vertex sets.

For a population of undirected graphs, MASE assumes that the population of graphs is sampled from $VR^{(i)}V^T$ where $V \in \mathbb{R}^{n\times d}$ and $R^{(i)} \in \mathbb{R}^{d\times d}$. Score matrices, $R^{(i)}$, are allowed to vary for each graph, but are symmetric. All graphs share a common latent position matrix $V$.

For a population of directed graphs, MASE assumes that the population is sampled from $UR^{(i)}V^T$ where $U \in \mathbb{R}^{n\times d_1}$, $V \in \mathbb{R}^{n\times d_2}$, and $R^{(i)} \in \mathbb{R}^{d_1\times d_2}$. In this case, score matrices $R^{(i)}$ can be asymmetric and non-square, but all graphs still share common latent position matrices $U$ and $V$.
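
The undirected model above can be sketched in NumPy as a two-stage SVD. This is a minimal sketch under stated assumptions, not the library's code: the function name mase_undirected is hypothetical, it uses a full SVD throughout, and it skips diagonal augmentation and automatic dimension selection. Each graph is embedded separately, the embeddings are concatenated column-wise, a second SVD estimates the shared $V$, and each score matrix is estimated as $\hat{V}^T A^{(i)} \hat{V}$.

```python
import numpy as np

def mase_undirected(graphs, n_components, scaled=True):
    """Estimate a shared basis V_hat and per-graph score matrices."""
    graphs = np.asarray(graphs, dtype=float)
    embeddings = []
    for adjacency in graphs:
        # First stage: embed each graph on its own.
        U, s, _ = np.linalg.svd(adjacency)
        U, s = U[:, :n_components], s[:n_components]
        embeddings.append(U * np.sqrt(s) if scaled else U)
    # Second stage: SVD of the column-wise concatenation gives the common subspace.
    stacked = np.hstack(embeddings)
    V_hat = np.linalg.svd(stacked, full_matrices=False)[0][:, :n_components]
    # Project each graph onto the common subspace to get its score matrix.
    scores = np.stack([V_hat.T @ A @ V_hat for A in graphs])
    return V_hat, scores

# Noise-free population: A_i = V R_i V^T with orthonormal V and symmetric positive definite R_i.
rng = np.random.default_rng(0)
n, d, m = 8, 2, 3
V, _ = np.linalg.qr(rng.standard_normal((n, d)))
R = []
for _ in range(m):
    M = rng.standard_normal((d, d))
    R.append(M @ M.T + np.eye(d))
graphs = np.stack([V @ R_i @ V.T for R_i in R])

V_hat, scores = mase_undirected(graphs, n_components=d)
```

V_hat is only identified up to an orthogonal transformation, so it is more meaningful to compare the projection V_hat @ V_hat.T with V @ V.T than to compare the two matrices entry by entry.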

Parameters:

n_componentsint or None, default = None

Desired dimensionality of output data. If algorithm is "full", n_components must be <= min(X.shape). Otherwise, n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by select_dimension() using the n_elbows argument.

n_elbowsint, optional, default: 2

If n_components is None, then compute the optimal embedding dimension using select_dimension(). Otherwise, ignored.

algorithm{'randomized' (default), 'full', 'truncated'}, optional

SVD solver to use:

'randomized'
Computes randomized svd using sklearn.utils.extmath.randomized_svd()
'full'
Computes full svd using scipy.linalg.svd()
'truncated'
Computes truncated svd using scipy.sparse.linalg.svds()

n_iterint, optional (default = 5)

Number of iterations for randomized SVD solver. Not used by 'full' or 'truncated'. The default is larger than the default in randomized_svd to handle sparse matrices that may have a large, slowly decaying spectrum.

scaledbool, optional (default=True)

Whether to scale individual eigenvectors with eigenvalues in first embedding stage.

diag_augbool, optional (default = True)

Whether to replace the main diagonal of each adjacency matrix with a vector corresponding to the degree (or sum of edge weights for a weighted network) before embedding.

concatbool, optional (default False)

If graph(s) are directed, whether to concatenate each graph's left and right (out and in) latent positions along axis 1.

svd_seedint or None (default None)

Only applicable for algorithm="randomized"; allows you to seed the randomized svd solver for deterministic, albeit pseudo-randomized behavior.

Attributes:

n_graphs_int: Number of graphs
n_vertices_int: Number of vertices in each graph
latent_left_array, shape (n_vertices, n_components): Estimated left latent positions of the graph.
latent_right_array, shape (n_vertices, n_components), or None: Estimated right latent positions of the graph. Only computed when an input graph is directed, or its adjacency matrix is asymmetric. Otherwise, None.
scores_array, shape (n_graphs, n_components, n_components): Estimated $\hat{R}$ matrices for each input graph.
singular_values_array, shape (n_components) OR length 2 tuple of arrays: If input graph is undirected, equal to the singular values of the concatenated adjacency spectral embeddings. If input graph is directed, singular_values_ is a tuple of length 2, where singular_values_[0] corresponds to the singular values of the concatenated left adjacency spectral embeddings, and singular_values_[1] corresponds to the singular values of the concatenated right adjacency spectral embeddings.

Notes

When an input graph is directed, n_components of latent_left_ may not be equal to n_components of latent_right_.

__init__(n_components=None, n_elbows=2, algorithm='randomized', n_iter=5, scaled=True, diag_aug=True, concat=False, svd_seed=None)[source]¶

Parameters:

n_components (int | None) --
n_elbows (int | None) --
algorithm (typing_extensions.Literal[full, truncated, randomized, eigsh]) --
n_iter (int) --
scaled (bool) --
diag_aug (bool) --
concat (bool) --
svd_seed (int | None) --

fit(graphs, y=None)[source]¶

Fit the model with graphs.

Parameters:

graphslist of nx.Graph, ndarray or scipy.sparse.csr_array: If list of nx.Graph, each Graph must contain same number of nodes. If list of ndarray or csr_array, each array must have shape (n_vertices, n_vertices). If ndarray, then array must have shape (n_graphs, n_vertices, n_vertices).

Returns:

selfobject: Returns an instance of self.

fit_transform(graphs, y=None)[source]¶

Fit the model with graphs and apply the embedding on graphs. n_components is either automatically determined or based on user input.

Parameters:

graphslist of nx.Graph, ndarray or scipy.sparse.csr_array: If list of nx.Graph, each Graph must contain same number of nodes. If list of ndarray or csr_array, each array must have shape (n_vertices, n_vertices). If ndarray, then array must have shape (n_graphs, n_vertices, n_vertices).

Returns:

outnp.ndarray or length 2 tuple of np.ndarray.: If input graphs were symmetric, ndarray of shape (n_vertices, n_components). If graphs were directed and concat is False, returns tuple of two arrays (same shape as above). The first corresponds to the left latent positions, and the second to the right latent positions. When concat is True, left and right (out and in) latent positions are concatenated along axis 1.

get_metadata_routing()¶

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)¶

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

set_fit_request(*, graphs='$UNCHANGED$')¶

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

graphsstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for graphs parameter in fit.

Returns:

selfobject: The updated object.

Parameters:

self (MultipleASE) --
graphs (bool | None | str) --

Return type:

MultipleASE

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

transform(X)¶

Obtain latent positions from an adjacency matrix or matrix of out-of-sample vertices. For more details on transforming out-of-sample vertices, see Out-of-Sample (OOS) Embedding.

For mathematical background, see [2].

Parameters:

Xarray-like or tuple, original shape or (n_oos_vertices, n_vertices).

The original fitted matrix ("graph" in fit) or new out-of-sample data. If X is the original fitted matrix, returns a matrix close to self.fit_transform(X).

If X is an out-of-sample matrix, n_oos_vertices is the number of new vertices, and n_vertices is the number of vertices in the original graph. If tuple, graph is directed and X[0] contains edges from out-of-sample vertices to in-sample vertices.

Returns:

outnp.ndarray OR length 2 tuple of np.ndarray

Array of latent positions, shape (n_oos_vertices, n_components) or (n_vertices, n_components). Transforms the fitted matrix if it was passed in.

If X is an array or tuple containing adjacency vectors corresponding to new nodes, returns the estimated latent positions for the new out-of-sample adjacency vectors. If undirected, returns array. If directed, returns (X_out, X_in), where X_out contains latent positions corresponding to nodes with edges from out-of-sample vertices to in-sample vertices.

Notes

If the matrix was diagonally augmented (i.e., self.diag_aug was True), fit followed by transform will produce a slightly different matrix than fit_transform.

To get the original embedding, using fit_transform is recommended. In the directed case, if A is the original in-sample adjacency matrix, the tuple (A.T, A) will need to be passed to transform if you do not wish to use fit_transform.

References

[1]

Sussman, D.L., Tang, M., Fishkind, D.E., Priebe, C.E. "A Consistent Adjacency Spectral Embedding for Stochastic Blockmodel Graphs," Journal of the American Statistical Association, Vol. 107(499), 2012

[2]

Levin, K., Roosta-Khorasani, F., Mahoney, M. W., & Priebe, C. E. (2018). Out-of-sample extension of graph adjacency spectral embedding. PMLR: Proceedings of Machine Learning Research, 80, 2975-2984

class graspologic.embed.mug2vec[source]¶

Multigraphs-2-vectors (mug2vec).

mug2vec is a sequence of three algorithms that learns a feature vector for each input graph.

Steps:

1. Pass to ranks - ranks all edge weights from smallest to largest, then normalizes the ranks by a constant.

2. Omnibus embedding - jointly learns a low-dimensional matrix representation for all graphs under the random dot product graph (RDPG) model.

3. Classical MDS (cMDS) - learns a feature vector for each graph by computing the Euclidean distance between each pair of graph embeddings from the omnibus embedding, followed by an eigendecomposition.
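
The distance computation in step 3 can be sketched as follows. This is an illustrative NumPy sketch, not the library's code: the function name pairwise_frobenius is hypothetical, and the input is assumed to be a stack of per-graph embeddings of shape (n_graphs, n_vertices, n_components). Each embedding is flattened, so the Euclidean distance between two graphs is the Frobenius norm of the difference of their embedding matrices.

```python
import numpy as np

def pairwise_frobenius(embeddings):
    """Return the (n_graphs, n_graphs) matrix of distances between graph embeddings."""
    arr = np.asarray(embeddings, dtype=float)
    # Flatten every per-graph embedding matrix into one long vector.
    flat = arr.reshape(arr.shape[0], -1)
    diffs = flat[:, None, :] - flat[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

# Three toy "embeddings" of a 2-vertex graph in 2 dimensions.
toy = np.zeros((3, 2, 2))
toy[1] = 1.0        # differs from graph 0 by 1 in all four entries
toy[2, 0, 0] = 3.0  # differs from graph 0 by 3 in a single entry

distance_matrix = pairwise_frobenius(toy)
```

The resulting matrix is symmetric with a zero diagonal and is the kind of dissimilarity matrix that the final eigendecomposition step operates on.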

Parameters:

pass_to_ranks: {'simple-nonzero' (default), 'simple-all', 'zero-boost'} string, or None

'simple-nonzero'
assigns ranks to all non-zero edges, settling ties using the average. Ranks are then scaled by $\frac{rank(\text{non-zero edges})}{\text{total non-zero edges} + 1}$
'simple-all'
assigns ranks to all non-zero edges, settling ties using the average. Ranks are then scaled by $\frac{rank(\text{non-zero edges})}{n^2 + 1}$ where n is the number of nodes
'zero-boost'
preserves the edge weight for all 0s, but ranks the other edges as if the ranks of all 0 edges have been assigned. If there are 10 0-valued edges, the lowest non-zero edge gets weight 11 / (number of possible edges). Ties settled by the average of the weight that those edges would have received. Number of possible edges is determined by the type of graph (loopless or looped, directed or undirected).
None
No pass to ranks applied.

omnibus_components, cmds_componentsint or None, default = None

Desired dimensionality of output data. If "full", n_components must be <= min(X.shape). Otherwise, n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by select_dimension() using n_elbows argument.

omnibus_n_elbows, cmds_n_elbows: int, optional, default: 2

If n_components is None, then compute the optimal embedding dimension using select_dimension(). Otherwise, ignored.

svd_seedint or None (default None)

Allows you to seed the randomized svd solver used in the Omnibus embedding for deterministic, albeit pseudo-randomized behavior.
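
The 'simple-nonzero' option of pass_to_ranks described above can be sketched as follows. This is a minimal NumPy sketch of the stated rule and not the library's implementation: the function name simple_nonzero_ranks is hypothetical, and it ranks every non-zero entry of the matrix as given, so both triangles of a symmetric matrix are counted.

```python
import numpy as np

def simple_nonzero_ranks(adjacency):
    """Rank non-zero weights (ties averaged), then divide by (number of non-zero edges + 1)."""
    adjacency = np.asarray(adjacency, dtype=float)
    ranked = np.zeros_like(adjacency)
    mask = adjacency != 0
    weights = adjacency[mask]
    if weights.size == 0:
        return ranked
    # For each distinct weight, find how many entries share it.
    _, inverse, counts = np.unique(weights, return_inverse=True, return_counts=True)
    # 1-based rank of the first entry of each tie group, then the average rank within the group.
    first_rank = np.cumsum(counts) - counts + 1
    average_rank = first_rank + (counts - 1) / 2.0
    ranked[mask] = average_rank[inverse] / (weights.size + 1)
    return ranked

weighted = np.array([[0, 2, 5],
                     [2, 0, 5],
                     [5, 5, 0]])
ranked = simple_nonzero_ranks(weighted)
```

Here the two entries with weight 2 share the average rank 1.5 and the four entries with weight 5 share 4.5; both are divided by 7, so every rescaled weight lies strictly between 0 and 1 while zero entries stay zero.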

Attributes:

omnibus_n_components_int: Equals the parameter omnibus_components. If omnibus_components was None, then equals the optimal embedding dimension.
cmds_n_components_int: Equals the parameter cmds_components. If cmds_components was None, then equals the optimal embedding dimension.
embeddings_array, shape (n_components, n_features): Embeddings from the pipeline. Each graph is a point in n_features dimensions.

See also

graspologic.utils.pass_to_ranks
graspologic.embed.OmnibusEmbed
graspologic.embed.ClassicalMDS
graspologic.embed.select_dimension

__init__(pass_to_ranks='simple-nonzero', omnibus_components=None, omnibus_n_elbows=2, cmds_components=None, cmds_n_elbows=2, svd_seed=None)[source]¶

Parameters:

pass_to_ranks (typing_extensions.Literal[simple-nonzero, simple-all, zero-boost]) --
omnibus_components (int | None) --
omnibus_n_elbows (int) --
cmds_components (int | None) --
cmds_n_elbows (int) --
svd_seed (int | None) --

Return type:

None

fit(graphs, y=None)[source]¶

Computes a vector for each graph.

Parameters:

graphslist of nx.Graph or ndarray, or ndarray: If list of nx.Graph, each Graph must contain same number of nodes. If list of ndarray, each array must have shape (n_vertices, n_vertices). If ndarray, then array must have shape (n_graphs, n_vertices, n_vertices).
yIgnored

Returns:

selfreturns an instance of self.

Parameters:

graphs (List[ndarray | csr_array | Graph]) --
y (Any | None) --

Return type:

mug2vec

fit_transform(graphs, y=None)[source]¶

Computes a vector for each graph.

Parameters:

graphslist of nx.Graph or ndarray, or ndarray: If list of nx.Graph, each Graph must contain same number of nodes. If list of ndarray, each array must have shape (n_vertices, n_vertices). If ndarray, then array must have shape (n_graphs, n_vertices, n_vertices).
yIgnored

Returns:

embeddingsembeddings generated by fit.

Parameters:

graphs (List[ndarray | csr_array | Graph]) --
y (Any | None) --

Return type:

ndarray

get_metadata_routing()¶

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)¶

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

set_fit_request(*, graphs='$UNCHANGED$')¶

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

graphsstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for graphs parameter in fit.

Returns:

selfobject: The updated object.

Parameters:

self (mug2vec) --
graphs (bool | None | str) --

Return type:

mug2vec

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

Dissimilarity graph embedding¶

class graspologic.embed.ClassicalMDS[source]¶

Classical multidimensional scaling (cMDS).

cMDS seeks a low-dimensional representation of the data in which pairwise distances closely match the distances in the original high-dimensional space.

Parameters:

n_componentsint, or None (default=None)

Number of components to keep. If None, then it will run select_dimension() to find the optimal embedding dimension.

n_elbowsint, or None (default=2)

If n_components is None, then compute the optimal embedding dimension using select_dimension(). Otherwise, ignored.

dissimilarity'euclidean' | 'precomputed', optional, default: 'euclidean'

Dissimilarity measure to use:

'euclidean'
Pairwise Euclidean distances between points in the dataset.
'precomputed'
Pre-computed dissimilarities are passed directly to fit() and fit_transform().
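
For the 'precomputed' case, the textbook cMDS procedure can be sketched as double-centering the squared dissimilarities and taking the leading eigenvectors. This is a minimal NumPy sketch, not the library's implementation: the function name classical_mds is hypothetical, the number of components is fixed by the caller rather than chosen by select_dimension(), and a full eigendecomposition is used.

```python
import numpy as np

def classical_mds(dissimilarity, n_components):
    """Embed points from an (n_samples, n_samples) dissimilarity matrix."""
    D = np.asarray(dissimilarity, dtype=float)
    n = D.shape[0]
    # Double-center the squared dissimilarities to obtain a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    # Keep the largest eigenvalues; clip tiny negatives caused by round-off.
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Corners of a 3-by-4 rectangle; their pairwise distances serve as the input.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

embedded = classical_mds(distances, n_components=2)
recovered = np.linalg.norm(embedded[:, None, :] - embedded[None, :, :], axis=-1)
```

Because the input distances come from points that truly live in two dimensions, the two-dimensional embedding reproduces them, although the coordinates themselves may be rotated or reflected relative to the original points.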

Attributes:

n_components_int: Equals the parameter n_components. If input n_components was None, then equals the optimal embedding dimension.
n_features_in_: int: Number of features passed to the fit() method.
components_array, shape (n_components, n_features): Principal axes in feature space.
singular_values_array, shape (n_components,): The singular values corresponding to each of the selected components.
dissimilarity_matrix_array, shape (n_features, n_features): Dissimilarity matrix
svd_seedint or None (default None): Only applicable for n_components!=1; allows you to seed the randomized svd solver for deterministic, albeit pseudo-randomized behavior.

See also

graspologic.embed.select_dimension

References

Wickelmaier, Florian. "An introduction to MDS." Sound Quality Research Unit, Aalborg University, Denmark 46.5 (2003).

__init__(n_components=None, n_elbows=2, dissimilarity='euclidean', svd_seed=None)[source]¶

Parameters:

n_components (int | None) --
n_elbows (int) --
dissimilarity (typing_extensions.Literal[euclidean, precomputed]) --
svd_seed (int | None) --

Return type:

None

fit(X, y=None)[source]¶

Fit the model with X.

Parameters:

Xarray_like: If dissimilarity=='precomputed', the input should be the dissimilarity matrix with shape (n_samples, n_samples). If dissimilarity=='euclidean', then the input should be 2d-array with shape (n_samples, n_features) or a 3d-array with shape (n_samples, n_features_1, n_features_2).

Returns:

selfobject: Returns an instance of self.

Parameters:

X (ndarray) --
y (Any | None) --

Return type:

ClassicalMDS

fit_transform(X, y=None)[source]¶

Fits the model to the data in X and returns the embedded coordinates.

Parameters:

Xnd-array: If dissimilarity=='precomputed', the input should be the dissimilarity matrix with shape (n_samples, n_samples). If dissimilarity=='euclidean', then the input should be array with shape (n_samples, n_features) or a nd-array with shape (n_samples, n_features_1, n_features_2, ..., n_features_d). First axis of nd-array must be n_samples.

Returns:

X_newarray-like, shape (n_samples, n_components): Embedded input.

Parameters:

X (ndarray) --
y (Any | None) --

Return type:

ndarray

get_metadata_routing()¶

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)¶

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.