Matching¶
Graph Matching¶
- graspologic.match.graph_match(A, B, AB=None, BA=None, S=None, partial_match=None, init=None, init_perturbation=0.0, n_init=1, shuffle_input=True, maximize=True, padding='naive', n_jobs=None, max_iter=30, tol=0.01, verbose=0, rng=None, transport=False, transport_regularizer=100, transport_tol=0.05, transport_max_iter=1000, fast=True)[source]¶
Attempts to solve the Graph Matching Problem or the Quadratic Assignment Problem (QAP) through an implementation of the Fast Approximate QAP (FAQ) Algorithm [1].
This algorithm can be thought of as finding an alignment of the vertices of two graphs which minimizes the number of induced edge disagreements, or, in the case of weighted graphs, the sum of squared differences of edge weight disagreements. Various extensions to the original FAQ algorithm are also included in this function ([2-5]).
- Parameters:
- A{ndarray, csr_array, csr_array} of shape (n, n), or a list thereof
The first (potentially multilayer) adjacency matrix to be matched. Multiplex networks (e.g. a network with multiple edge types) can be used by inputting a list of the adjacency matrices for each edge type.
- B{ndarray, csr_array, csr_array} of shape (m, m), or a list thereof
The second (potentially multilayer) adjacency matrix to be matched. Must have the same number of layers as
A
, but need not have the same size (seepadding
).- AB{ndarray, csr_array, csr_array} of shape (n, m), or a list thereof, default=None
A (potentially multilayer) matrix representing connections from the objects indexed in
A
to those inB
, used for bisected graph matching (see [2]).- BA{ndarray, csr_array, csr_array} of shape (m, n), or a list thereof, default=None
A (potentially multilayer) matrix representing connections from the objects indexed in
B
to those inA
, used for bisected graph matching (see [2]).- S{ndarray, csr_array, csr_array} of shape (n, m), default=None
A matrix representing the similarity of objects indexed in
A
to each object indexed inB
. Note that the scale (i.e. the norm) of this matrix will affect how strongly the similarity (linear) term is weighted relative to the adjacency (quadratic) terms.- partial_matchndarray of shape (n_matches, 2), dtype=int, or tuple of two array-likes of shape (n_matches,), default=None
Indices specifying known matches to include in the optimization. The first column represents indices of the objects in
A
, and the second column represents their corresponding matches inB
.- initndarray of shape (n_unseed, n_unseed), default=None
Initialization for the algorithm. Setting to None specifies the "barycenter", which is the most commonly used initialization and represents an uninformative (flat) initialization. If a ndarray, then this matrix must be square and have size equal to the number of unseeded (not already matched in
partial_match
) nodes.- init_perturbationfloat, default=0.0
Weight of the random perturbation from
init
that the initialization will undergo. Must be between 0 and 1.- n_initint, default=1
Number of initializations/runs of the algorithm to repeat. The solution with the best objective function value over all initializations is kept. Increasing
n_init
can improve performance but will take longer.- shuffle_inputbool, default=True
Whether to shuffle the order of the inputs internally during optimization. This option is recommended to be kept to True besides for testing purposes; it alleviates a dependence of the solution on the (arbitrary) ordering of the input rows/columns.
- maximizebool, default=True
Whether to maximize the objective function (graph matching problem) or minimize it (quadratic assignment problem).
maximize=True
corresponds to trying to find a permutation wherein the input matrices are as similar as possible - for adjacency matrices, this corresponds to maximizing the overlap of the edges of the two networks. Conversely,maximize=False
would attempt to make this overlap as small as possible.- padding{"naive", "adopted"}, default="naive"
Specification of a padding scheme if
A
andB
are not of equal size. See the padded graph matching tutorial or [3] for more explanation. Adopted padding has not been tested for weighted networks; use with caution.- n_jobsint, default=None
The number of jobs to run in parallel. Parallelization is over the initializations, so only relevant when
n_init > 1
. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. Seejoblib.Parallel
for more details.- max_iterint, default=30
Must be 1 or greater, specifying the max number of iterations for the algorithm. Setting this value higher may provide more precise solutions at the cost of longer computation time.
- tolfloat, default=0.01
Stopping tolerance for the FAQ algorithm. Setting this value smaller may provide more precise solutions at the cost of longer computation time.
- verboseint, default=0
A positive number specifying the level of verbosity for status updates in the algorithm's progress. If
n_jobs
> 1, then this parameter behaves as theverbose
parameter forjoblib.Parallel
. Otherwise, will print increasing levels of information about the algorithm's progress for each initialization.- rngint or np.random.Generator, default=None
Allows the specification of a random seed (positive integer) or a
np.random.Generator
object to ensure reproducibility.- transportbool, default=False
Whether to enable use of regularized optimal transport for determining the step direction as described in [4]. May improve accuracy/speed, especially for large inputs and data where the correlation between edges is not close to 1.
- transport_regularizerint or float, default=100
Strength of the entropic regularization in the optimal transport solver.
- transport_tolint or float, default=0.05,
Must be positive. Stopping tolerance for the optimal transport solver. Setting this value smaller may provide more precise solutions at the cost of longer computation time.
- transport_max_iterint, default=1000
Must be positive. Maximum number of iterations for the optimal transport solver. Setting this value higher may provide more precise solutions at the cost of longer computation time.
- fast: bool, default=True
Whether to use numerical shortcuts to speed up the computation. Typically will be faster for most applications, although requires storing intermediate computations in memory which may be undesirable for very large inputs or when memory is a bottleneck.
- Returns:
- res: MatchResult
MatchResult
containing the following fields.- indices_Andarray
Sorted indices in
A
which were matched.- indices_Bndarray
Indices in
B
which were matched. Elementindices_B[i]
was matched to elementindices_A[i]
.indices_B
can also be thought of as a permutation of the nodes ofB
with respect toA
.- scorefloat
Objective function value at the end of optimization.
- misclist of dict
List of length
n_init
containing information about each run. Fields for each run arescore
,n_iter
,convex_solution
, andconverged
.
- Parameters:
A (List[ndarray | csr_array] | ndarray | csr_array) --
B (List[ndarray | csr_array] | ndarray | csr_array) --
AB (List[ndarray | csr_array] | ndarray | csr_array | None) --
BA (List[ndarray | csr_array] | ndarray | csr_array | None) --
S (ndarray | csr_array | None) --
partial_match (ndarray | Tuple | None) --
init (ndarray | None) --
n_init (int | integer) --
shuffle_input (bool) --
maximize (bool) --
padding (typing_extensions.Literal[adopted, naive]) --
n_jobs (int | integer | None) --
max_iter (int | integer) --
verbose (int | integer) --
rng (int | integer | Generator | None) --
transport (bool) --
transport_max_iter (int | integer) --
fast (bool) --
- Return type:
MatchResult
Notes
Many extensions [2-5] to the original FAQ algorithm are included in this function. The full objective function which this function aims to solve can be written as
\[f(P) = - \sum_{k=1}^K \|A^{(k)} - PB^{(k)}P^T\|_F^2 - \sum_{k=1}^K \|(AB)^{(k)}P^T - P(BA)^{(k)}\|_F^2 + trace(SP^T)\]where \(P\) is a permutation matrix we are trying to learn, \(A^{(k)}\) is the adjacency matrix in network \(A\) for the \(k\)-th edge type (and likewise for B), \((AB)^{(k)}\) (with a slight abuse of notation, but for consistency with the code) is an adjacency matrix representing a subgraph of any connections which go from objects in \(A\) to those in \(B\) (and defined likewise for \((BA)\)), and \(S\) is a similarity matrix indexing the similarity of each object in \(A\) to each object in \(B\).
If
partial_match
is used, then the above will be maximized/minimized over the set of permutations which respect this partial matching of the two networks.If
maximize
, this function will attempt to maximize \(f(P)\) (solve the graph matching problem); otherwise, it will be minimized.References
[1]J.T. Vogelstein, J.M. Conroy, V. Lyzinski, L.J. Podrazik, S.G. Kratzer, E.T. Harley, D.E. Fishkind, R.J. Vogelstein, and C.E. Priebe, “Fast approximate quadratic programming for graph matching,” PLOS one, vol. 10, no. 4, p. e0121002, 2015.
[2]B.D. Pedigo, M. Winding, C.E. Priebe, J.T. Vogelstein, "Bisected graph matching improves automated pairing of bilaterally homologous neurons from connectomes," bioRxiv 2022.05.19.492713 (2022)
[3]D. Fishkind, S. Adali, H. Patsolic, L. Meng, D. Singh, V. Lyzinski, C. Priebe, "Seeded graph matching," Pattern Recognit. 87 (2019) 203–215
[4]A. Saad-Eldin, B.D. Pedigo, C.E. Priebe, J.T. Vogelstein "Graph Matching via Optimal Transport," arXiv 2111.05366 (2021)
[5]K. Pantazis, D.L. Sussman, Y. Park, Z. Li, C.E. Priebe, V. Lyzinski, "Multiplex graph matching matched filters," Applied Network Science (2022)