Nomination¶

Spectral Vertex Nomination¶

class graspologic.nominate.SpectralVertexNomination[source]¶

Class for spectral vertex nomination on a single graph.

Given a graph $G=(V,E)$ and a subset of $V$ called $S$ (the "seed"), Single Graph Vertex Nomination is the problem of ranking all $V$ in order of relation to members of $S$. Spectral Vertex Nomination solves this problem by embedding $G$ into a low dimensional euclidean space (see: Adjacency Spectral Embed Tutorial ), and then generating a nomination list by some distance based algorithm. In the simple unattributed case, for each seed vertex $u$, the other vertices are ranked in order of euclidean distance from $u$.

Parameters:

input_graph: bool, default = True

Flag whether to expect two full graphs, or the embeddings.

True
.fit and .fit_predict() expect graphs as adjacency matrix, provided as ndarray of shape (n, n). They will be embedded using the specified embedder.
False
.fit() and .fit_predict() expect an embedding of the graph, i.e. a ndarray of size (n, d).

embedder: str or BaseSpectralEmbed, default = 'ASE'

May provide either a embed object or a string indicating which embedding method to use, which may be either: "ASE" for AdjacencySpectralEmbed or "LSE" for LaplacianSpectralEmbed.

n_neighbors: int, default=None

The number of vertices to nominate for each seed.

metricstr, default = 'euclidean'

Distance metric to use when finding the nearest neighbors, all sklearn metrics available.

metric_paramsdict, default = None

Arguments for the sklearn DistanceMetric specified via metric parameter.

Attributes:

nearest_neighbors_sklearn.neighbors.NearestNeighbors: A fit sklearn NearestNeighbors classifier used to find closest vertices to each seed.

References

[1]

Fishkind, D. E.; Lyzinski, V.; Pao, H.; Chen, L.; Priebe, C. E. Vertex nomination schemes for membership prediction. Ann. Appl. Stat. 9 2015. https://projecteuclid.org/euclid.aoas/1446488749

[2]

Jordan Yoder, Li Chen, Henry Pao, Eric Bridgeford, Keith Levin, Donniell E. Fishkind, Carey Priebe, Vince Lyzinski, Vertex nomination: The canonical sampling and the extended spectral nomination schemes, Computational Statistics & Data Analysis, Volume 145, 2020. http://www.sciencedirect.com/science/article/pii/S0167947320300074

__init__(input_graph=True, embedder='ase', n_neighbors=None, metric='euclidean', metric_params=None)[source]¶

Parameters:

input_graph (bool) --
embedder (typing_extensions.Literal[ase, ASE, lse, LSE] | BaseSpectralEmbed) --
n_neighbors (int | None) --
metric (str) --
metric_params (Dict[str, Any] | None) --

fit(X, y=None)[source]¶

Constructs the embedding if not provided, then calculates the pairwise distance from each seed to each vertex in graph.

Parameters:

Xnp.ndarray

If input_graph is True
Expects a graph as an adjacency matrix, i.e. an ndarray of shape (n, n). Will be embedded using the specified embedder.
If input_graph is False
Expects an embedding of the graph, i.e. a ndarray of size (n, d).

yNoneType

Included by sklearn convention.

Returns:

selfobject: Returns an instance of self.

Parameters:

X (ndarray) --
y (Any | None) --

Return type:

SpectralVertexNomination

predict(y)[source]¶

Nominates vertices for each seed vertex. Methodology is distance based ranking.

Parameters:

ynp.ndarray: The indices of the seed vertices. Should be a dim 1 array with length less than $|V|$.

Returns:

Nomination Listnp.ndarray: Shape is (number_vertices, number_vertices_in_seed) . Each column is a seed vertex, and the rows of each column are a list of vertex indexes from the original adjacency matrix in order degree of match.
Distance Matrixnp.ndarray: The matrix of distances associated with each element of the nomination list.

Parameters:

y (List | ndarray) --

Return type:

Tuple[ndarray, ndarray]

fit_predict(X, y)[source]¶

Calls this class' fit and then predict methods.

Parameters:

Xnp.ndarray

If input_graph is True
Expects a graph as an adjacency matrix, i.e. an ndarray of shape (n, n). Will be embedded using the specified embedder.
If input_graph is False
Expects an embedding of the graph, i.e. a ndarray of size (n, d).

ynp.ndarray.

List of unattributed seed vertex indices.

Returns:

Nomination Listnp.ndarray: Shape is (number_vertices, number_vertices_in_seed) . Each column is a seed vertex, and the rows of each column are a list of vertex indexes from the original adjacency matrix in order degree of match.
Distance Matrixnp.ndarray: The matrix of distances associated with each element of the nomination list.

Parameters:

X (ndarray) --
y (ndarray) --

Return type:

Tuple[ndarray, ndarray]

get_metadata_routing()¶

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)¶

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

Vertex Nomination via SGM¶

class graspologic.nominate.VNviaSGM[source]¶

This class implements Vertex Nomination via Seeded Graph Matching (VNviaSGM) with the algorithm described in [1].

Rather than providing a 1-1 matching for the vertices of two graphs, as in GraphMatch, VNviaSGM ranks the potential matches for a vertex of interst (VOI) in one to graph to the vertices in another graph, based on probability of matching.

Parameters:

order_voi_subgraph: int, positive (default = 1): Order used to create induced subgraph on A about VOI where the max distance between VOI and other nodes is order_voi_subgraph. This induced subgraph will be used to determine what seeds are used when the SGM algorithm is called. If no seeds are in this subgraph about VOI, then a UserWarning is thrown, and nomination_list_ is None.
order_seeds_subgraph: int, positive (default = 1): Order used to create induced subgraphs on A and B. These subgraphs are centered about the seeds that were determined by the subgraph generated by order_voi_subgraph. These two subgraphs will be passed into the SGM algorithm.
n_init: int, positive (default = 100): Number of random initializations of the seeded graph matching algorithm (SGM). Increasing the number of restarts will make the probabilities returned more precise.
max_nominations: int (default = None): Max number of nominations to include in the nomination list. If None is passed, then all nominations computed will be returned.
graph_match_kwsdict (default = {}): Gives users the option to pass custom arguments to the graph matching algorithm. Format should be {'arg_name': arg_value, ...}. See GraphMatch

Attributes:

n_seeds_: int: Number of seeds passed in seedsA that occured in the induced subgraph about VOI
nomination_list_: 2d-array: An array containing vertex nominations in the form nomination list = [[j, p_val],...] where p_val is the probability that the VOI matches to node j in graph B (sorted by descending probability)

Notes

VNviaSGM generates an initial induced subgraph about the VOI to determine which seeds are close enough to be used. If no seeds are close enough, then a warning is thrown and nomination_list_ is set to None.

All the seeds that are close enough are then used to generate subgraphs in both A and B. These subgraphs are matched using several random initializations of the seeded graph matching algorithm (SGM), and a nomination list is returned. See GraphMatch for SGM docs

References

[1]

Patsolic, HG, Park, Y, Lyzinski, V, Priebe, CE. Vertex nomination via seeded graph matching. Stat Anal Data Min: The ASA Data Sci Journal. 2020; 13: 229– 244. https://doi.org/10.1002/sam.11454

nomination_list_: ndarray | None¶

__init__(order_voi_subgraph=1, order_seeds_subgraph=1, n_init=100, max_nominations=None, graph_match_kws={})[source]¶

Parameters:

order_voi_subgraph (int) --
order_seeds_subgraph (int) --
n_init (int) --
max_nominations (int | None) --
graph_match_kws (Dict[str, Any]) --

fit(A, B, voi, seeds)[source]¶

Fits the model to two graphs.

Parameters:

A: 2d-array, square: Adjacency matrix, the graph where voi is known
B: 2d-array, square: Adjacency matrix, the graph where voi is not known
voi: int: Vertex of interest
seeds: list, 2d-array: List of length two, of form [seedsA, seedsB]. The elements of seedsA and seedsB are vertex indices from A and B, respectively, which are known to be matched; that is, vertex seedsA[i] is matched to vertex seedsB[i]. Note: len(seedsA)==len(seedsB).

Returns:

self: An instance of self

Parameters:

A (ndarray) --
B (ndarray) --
voi (int) --
seeds (ndarray | List[List[int]]) --

Return type:

VNviaSGM

fit_predict(A, B, voi, seeds)[source]¶

Fits model to two adjacency matrices and returns nomination list

Parameters:

A: 2d-array, square: Adjacency matrix, the graph where voi is known
B: 2d-array, square: Adjacency matrix, the graph where voi is not known
voi: int: Vertex of interest
seeds: list, 2d-array: List of length two, of form [seedsA, seedsB]. The elements of seedsA and seedsB are vertex indices from A and B, respectively, which are known to be matched; that is, vertex seedsA[i] is matched to vertex seedsB[i]. Note: len(seedsA)==len(seedsB).

Returns:

nomination_list_2d-array: The nomination list.

Parameters:

A (ndarray) --
B (ndarray) --
voi (int) --
seeds (ndarray | List[List[int]]) --

Return type:

ndarray | None

get_metadata_routing()¶

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)¶

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

set_fit_request(*, A='$UNCHANGED$', B='$UNCHANGED$', seeds='$UNCHANGED$', voi='$UNCHANGED$')¶

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

Astr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for A parameter in fit.
Bstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for B parameter in fit.
seedsstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for seeds parameter in fit.
voistr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for voi parameter in fit.

Returns:

selfobject: The updated object.

Parameters:

self (VNviaSGM) --
A (bool | None | str) --
B (bool | None | str) --
seeds (bool | None | str) --
voi (bool | None | str) --

Return type:

VNviaSGM

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.