Nomination¶
Spectral Vertex Nomination¶
- class graspologic.nominate.SpectralVertexNomination[source]¶
Class for spectral vertex nomination on a single graph.
Given a graph \(G=(V,E)\) and a subset of \(V\) called \(S\) (the "seed"), Single Graph Vertex Nomination is the problem of ranking all \(V\) in order of relation to members of \(S\). Spectral Vertex Nomination solves this problem by embedding \(G\) into a low dimensional euclidean space (see: Adjacency Spectral Embed Tutorial ), and then generating a nomination list by some distance based algorithm. In the simple unattributed case, for each seed vertex \(u\), the other vertices are ranked in order of euclidean distance from \(u\).
- Parameters:
- input_graph: bool, default = True
Flag whether to expect two full graphs, or the embeddings.
- True
.fit and .fit_predict() expect graphs as adjacency matrix, provided as ndarray of shape (n, n). They will be embedded using the specified
embedder
.
- False
.fit() and .fit_predict() expect an embedding of the graph, i.e. a ndarray of size (n, d).
- embedder: str or BaseSpectralEmbed, default = 'ASE'
May provide either a embed object or a string indicating which embedding method to use, which may be either: "ASE" for
AdjacencySpectralEmbed
or "LSE" forLaplacianSpectralEmbed
.- n_neighbors: int, default=None
The number of vertices to nominate for each seed.
- metricstr, default = 'euclidean'
Distance metric to use when finding the nearest neighbors, all sklearn metrics available.
- metric_paramsdict, default = None
Arguments for the sklearn DistanceMetric specified via metric parameter.
- Attributes:
- nearest_neighbors_sklearn.neighbors.NearestNeighbors
A fit sklearn NearestNeighbors classifier used to find closest vertices to each seed.
References
[1]Fishkind, D. E.; Lyzinski, V.; Pao, H.; Chen, L.; Priebe, C. E. Vertex nomination schemes for membership prediction. Ann. Appl. Stat. 9 2015. https://projecteuclid.org/euclid.aoas/1446488749
[2]Jordan Yoder, Li Chen, Henry Pao, Eric Bridgeford, Keith Levin, Donniell E. Fishkind, Carey Priebe, Vince Lyzinski, Vertex nomination: The canonical sampling and the extended spectral nomination schemes, Computational Statistics & Data Analysis, Volume 145, 2020. http://www.sciencedirect.com/science/article/pii/S0167947320300074
- __init__(input_graph=True, embedder='ase', n_neighbors=None, metric='euclidean', metric_params=None)[source]¶
- fit(X, y=None)[source]¶
Constructs the embedding if not provided, then calculates the pairwise distance from each seed to each vertex in graph.
- Parameters:
- Xnp.ndarray
- If input_graph is True
Expects a graph as an adjacency matrix, i.e. an ndarray of shape (n, n). Will be embedded using the specified embedder.
- If input_graph is False
Expects an embedding of the graph, i.e. a ndarray of size (n, d).
- yNoneType
Included by sklearn convention.
- Returns:
- selfobject
Returns an instance of self.
- Parameters:
X (ndarray) --
y (Any | None) --
- Return type:
- predict(y)[source]¶
Nominates vertices for each seed vertex. Methodology is distance based ranking.
- Parameters:
- ynp.ndarray
The indices of the seed vertices. Should be a dim 1 array with length less than \(|V|\).
- Returns:
- Nomination Listnp.ndarray
Shape is
(number_vertices, number_vertices_in_seed)
. Each column is a seed vertex, and the rows of each column are a list of vertex indexes from the original adjacency matrix in order degree of match.- Distance Matrixnp.ndarray
The matrix of distances associated with each element of the nomination list.
- Parameters:
y (List | ndarray) --
- Return type:
Tuple[ndarray, ndarray]
- fit_predict(X, y)[source]¶
Calls this class' fit and then predict methods.
- Parameters:
- Xnp.ndarray
- If input_graph is True
Expects a graph as an adjacency matrix, i.e. an ndarray of shape (n, n). Will be embedded using the specified embedder.
- If input_graph is False
Expects an embedding of the graph, i.e. a ndarray of size (n, d).
- ynp.ndarray.
List of unattributed seed vertex indices.
- Returns:
- Nomination Listnp.ndarray
Shape is
(number_vertices, number_vertices_in_seed)
. Each column is a seed vertex, and the rows of each column are a list of vertex indexes from the original adjacency matrix in order degree of match.- Distance Matrixnp.ndarray
The matrix of distances associated with each element of the nomination list.
- Parameters:
X (ndarray) --
y (ndarray) --
- Return type:
Tuple[ndarray, ndarray]
- get_metadata_routing()¶
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A
MetadataRequest
encapsulating routing information.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline
). The latter have parameters of the form<component>__<parameter>
so that it's possible to update each component of a nested object.- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
Vertex Nomination via SGM¶
- class graspologic.nominate.VNviaSGM[source]¶
This class implements Vertex Nomination via Seeded Graph Matching (VNviaSGM) with the algorithm described in [1].
Rather than providing a 1-1 matching for the vertices of two graphs, as in
GraphMatch
, VNviaSGM ranks the potential matches for a vertex of interst (VOI) in one to graph to the vertices in another graph, based on probability of matching.- Parameters:
- order_voi_subgraph: int, positive (default = 1)
Order used to create induced subgraph on
A
about VOI where the max distance between VOI and other nodes isorder_voi_subgraph
. This induced subgraph will be used to determine what seeds are used when the SGM algorithm is called. If no seeds are in this subgraph about VOI, then a UserWarning is thrown, andnomination_list_
is None.- order_seeds_subgraph: int, positive (default = 1)
Order used to create induced subgraphs on
A
andB
. These subgraphs are centered about the seeds that were determined by the subgraph generated byorder_voi_subgraph
. These two subgraphs will be passed into the SGM algorithm.- n_init: int, positive (default = 100)
Number of random initializations of the seeded graph matching algorithm (SGM). Increasing the number of restarts will make the probabilities returned more precise.
- max_nominations: int (default = None)
Max number of nominations to include in the nomination list. If None is passed, then all nominations computed will be returned.
- graph_match_kwsdict (default = {})
Gives users the option to pass custom arguments to the graph matching algorithm. Format should be {'arg_name': arg_value, ...}. See
GraphMatch
- Attributes:
- n_seeds_: int
Number of seeds passed in seedsA that occured in the induced subgraph about VOI
- nomination_list_: 2d-array
An array containing vertex nominations in the form nomination list = [[j, p_val],...] where p_val is the probability that the VOI matches to node j in graph B (sorted by descending probability)
Notes
VNviaSGM generates an initial induced subgraph about the VOI to determine which seeds are close enough to be used. If no seeds are close enough, then a warning is thrown and
nomination_list_
is set to None.All the seeds that are close enough are then used to generate subgraphs in both
A
andB
. These subgraphs are matched using several random initializations of the seeded graph matching algorithm (SGM), and a nomination list is returned. SeeGraphMatch
for SGM docsReferences
[1]Patsolic, HG, Park, Y, Lyzinski, V, Priebe, CE. Vertex nomination via seeded graph matching. Stat Anal Data Min: The ASA Data Sci Journal. 2020; 13: 229– 244. https://doi.org/10.1002/sam.11454
- __init__(order_voi_subgraph=1, order_seeds_subgraph=1, n_init=100, max_nominations=None, graph_match_kws={})[source]¶
- fit(A, B, voi, seeds)[source]¶
Fits the model to two graphs.
- Parameters:
- A: 2d-array, square
Adjacency matrix, the graph where
voi
is known- B: 2d-array, square
Adjacency matrix, the graph where
voi
is not known- voi: int
Vertex of interest
- seeds: list, 2d-array
List of length two, of form [seedsA, seedsB]. The elements of seedsA and seedsB are vertex indices from
A
andB
, respectively, which are known to be matched; that is, vertex seedsA[i] is matched to vertex seedsB[i]. Note: len(seedsA)==len(seedsB).
- Returns:
- self: An instance of self
- Parameters:
- Return type:
- fit_predict(A, B, voi, seeds)[source]¶
Fits model to two adjacency matrices and returns nomination list
- Parameters:
- A: 2d-array, square
Adjacency matrix, the graph where
voi
is known- B: 2d-array, square
Adjacency matrix, the graph where
voi
is not known- voi: int
Vertex of interest
- seeds: list, 2d-array
List of length two, of form [seedsA, seedsB]. The elements of seedsA and seedsB are vertex indices from
A
andB
, respectively, which are known to be matched; that is, vertex seedsA[i] is matched to vertex seedsB[i]. Note: len(seedsA)==len(seedsB).
- Returns:
- nomination_list_2d-array
The nomination list.
- Parameters:
- Return type:
ndarray | None
- get_metadata_routing()¶
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A
MetadataRequest
encapsulating routing information.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- set_fit_request(*, A='$UNCHANGED$', B='$UNCHANGED$', seeds='$UNCHANGED$', voi='$UNCHANGED$')¶
Request metadata passed to the
fit
method.Note that this method is only relevant if
enable_metadata_routing=True
(seesklearn.set_config()
). Please see User Guide on how the routing mechanism works.The options for each parameter are:
True
: metadata is requested, and passed tofit
if provided. The request is ignored if metadata is not provided.False
: metadata is not requested and the meta-estimator will not pass it tofit
.None
: metadata is not requested, and the meta-estimator will raise an error if the user provides it.str
: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (
sklearn.utils.metadata_routing.UNCHANGED
) retains the existing request. This allows you to change the request for some parameters and not others.New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a
Pipeline
. Otherwise it has no effect.- Parameters:
- Astr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
A
parameter infit
.- Bstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
B
parameter infit
.- seedsstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
seeds
parameter infit
.- voistr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
voi
parameter infit
.
- Returns:
- selfobject
The updated object.
- Parameters:
- Return type:
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline
). The latter have parameters of the form<component>__<parameter>
so that it's possible to update each component of a nested object.- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.