Matching

Graph Matching

graspologic.match.graph_match(A, B, AB=None, BA=None, S=None, partial_match=None, init=None, init_perturbation=0.0, n_init=1, shuffle_input=True, maximize=True, padding='naive', n_jobs=None, max_iter=30, tol=0.01, verbose=0, rng=None, transport=False, transport_regularizer=100, transport_tol=0.05, transport_max_iter=1000, fast=True)[source]

Attempts to solve the Graph Matching Problem or the Quadratic Assignment Problem (QAP) through an implementation of the Fast Approximate QAP (FAQ) Algorithm [1].

This algorithm can be thought of as finding an alignment of the vertices of two graphs which minimizes the number of induced edge disagreements, or, in the case of weighted graphs, the sum of squared differences of edge weight disagreements. Various extensions to the original FAQ algorithm are also included in this function ([2-5]).

Parameters:
A{ndarray, csr_array, csr_array} of shape (n, n), or a list thereof

The first (potentially multilayer) adjacency matrix to be matched. Multiplex networks (e.g. a network with multiple edge types) can be used by inputting a list of the adjacency matrices for each edge type.

B{ndarray, csr_array, csr_array} of shape (m, m), or a list thereof

The second (potentially multilayer) adjacency matrix to be matched. Must have the same number of layers as A, but need not have the same size (see padding).

AB{ndarray, csr_array, csr_array} of shape (n, m), or a list thereof, default=None

A (potentially multilayer) matrix representing connections from the objects indexed in A to those in B, used for bisected graph matching (see [2]).

BA{ndarray, csr_array, csr_array} of shape (m, n), or a list thereof, default=None

A (potentially multilayer) matrix representing connections from the objects indexed in B to those in A, used for bisected graph matching (see [2]).

S{ndarray, csr_array, csr_array} of shape (n, m), default=None

A matrix representing the similarity of objects indexed in A to each object indexed in B. Note that the scale (i.e. the norm) of this matrix will affect how strongly the similarity (linear) term is weighted relative to the adjacency (quadratic) terms.

partial_matchndarray of shape (n_matches, 2), dtype=int, or tuple of two array-likes of shape (n_matches,), default=None

Indices specifying known matches to include in the optimization. The first column represents indices of the objects in A, and the second column represents their corresponding matches in B.

initndarray of shape (n_unseed, n_unseed), default=None

Initialization for the algorithm. Setting to None specifies the "barycenter", which is the most commonly used initialization and represents an uninformative (flat) initialization. If a ndarray, then this matrix must be square and have size equal to the number of unseeded (not already matched in partial_match) nodes.

init_perturbationfloat, default=0.0

Weight of the random perturbation from init that the initialization will undergo. Must be between 0 and 1.

n_initint, default=1

Number of initializations/runs of the algorithm to repeat. The solution with the best objective function value over all initializations is kept. Increasing n_init can improve performance but will take longer.

shuffle_inputbool, default=True

Whether to shuffle the order of the inputs internally during optimization. This option is recommended to be kept to True besides for testing purposes; it alleviates a dependence of the solution on the (arbitrary) ordering of the input rows/columns.

maximizebool, default=True

Whether to maximize the objective function (graph matching problem) or minimize it (quadratic assignment problem). maximize=True corresponds to trying to find a permutation wherein the input matrices are as similar as possible - for adjacency matrices, this corresponds to maximizing the overlap of the edges of the two networks. Conversely, maximize=False would attempt to make this overlap as small as possible.

padding{"naive", "adopted"}, default="naive"

Specification of a padding scheme if A and B are not of equal size. See the padded graph matching tutorial or [3] for more explanation. Adopted padding has not been tested for weighted networks; use with caution.

n_jobsint, default=None

The number of jobs to run in parallel. Parallelization is over the initializations, so only relevant when n_init > 1. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See joblib.Parallel for more details.

max_iterint, default=30

Must be 1 or greater, specifying the max number of iterations for the algorithm. Setting this value higher may provide more precise solutions at the cost of longer computation time.

tolfloat, default=0.01

Stopping tolerance for the FAQ algorithm. Setting this value smaller may provide more precise solutions at the cost of longer computation time.

verboseint, default=0

A positive number specifying the level of verbosity for status updates in the algorithm's progress. If n_jobs > 1, then this parameter behaves as the verbose parameter for joblib.Parallel. Otherwise, will print increasing levels of information about the algorithm's progress for each initialization.

rngint or np.random.Generator, default=None

Allows the specification of a random seed (positive integer) or a np.random.Generator object to ensure reproducibility.

transportbool, default=False

Whether to enable use of regularized optimal transport for determining the step direction as described in [4]. May improve accuracy/speed, especially for large inputs and data where the correlation between edges is not close to 1.

transport_regularizerint or float, default=100

Strength of the entropic regularization in the optimal transport solver.

transport_tolint or float, default=0.05,

Must be positive. Stopping tolerance for the optimal transport solver. Setting this value smaller may provide more precise solutions at the cost of longer computation time.

transport_max_iterint, default=1000

Must be positive. Maximum number of iterations for the optimal transport solver. Setting this value higher may provide more precise solutions at the cost of longer computation time.

fast: bool, default=True

Whether to use numerical shortcuts to speed up the computation. Typically will be faster for most applications, although requires storing intermediate computations in memory which may be undesirable for very large inputs or when memory is a bottleneck.

Returns:
res: MatchResult

MatchResult containing the following fields.

indices_Andarray

Sorted indices in A which were matched.

indices_Bndarray

Indices in B which were matched. Element indices_B[i] was matched to element indices_A[i]. indices_B can also be thought of as a permutation of the nodes of B with respect to A.

scorefloat

Objective function value at the end of optimization.

misclist of dict

List of length n_init containing information about each run. Fields for each run are score, n_iter, convex_solution, and converged.

Parameters:
  • A (List[ndarray | csr_array] | ndarray | csr_array) --

  • B (List[ndarray | csr_array] | ndarray | csr_array) --

  • AB (List[ndarray | csr_array] | ndarray | csr_array | None) --

  • BA (List[ndarray | csr_array] | ndarray | csr_array | None) --

  • S (ndarray | csr_array | None) --

  • partial_match (ndarray | Tuple | None) --

  • init (ndarray | None) --

  • init_perturbation (int | float | integer) --

  • n_init (int | integer) --

  • shuffle_input (bool) --

  • maximize (bool) --

  • padding (typing_extensions.Literal[adopted, naive]) --

  • n_jobs (int | integer | None) --

  • max_iter (int | integer) --

  • tol (int | float | integer) --

  • verbose (int | integer) --

  • rng (int | integer | Generator | None) --

  • transport (bool) --

  • transport_regularizer (int | float | integer) --

  • transport_tol (int | float | integer) --

  • transport_max_iter (int | integer) --

  • fast (bool) --

Return type:

MatchResult

Notes

Many extensions [2-5] to the original FAQ algorithm are included in this function. The full objective function which this function aims to solve can be written as

\[f(P) = - \sum_{k=1}^K \|A^{(k)} - PB^{(k)}P^T\|_F^2 - \sum_{k=1}^K \|(AB)^{(k)}P^T - P(BA)^{(k)}\|_F^2 + trace(SP^T)\]

where \(P\) is a permutation matrix we are trying to learn, \(A^{(k)}\) is the adjacency matrix in network \(A\) for the \(k\)-th edge type (and likewise for B), \((AB)^{(k)}\) (with a slight abuse of notation, but for consistency with the code) is an adjacency matrix representing a subgraph of any connections which go from objects in \(A\) to those in \(B\) (and defined likewise for \((BA)\)), and \(S\) is a similarity matrix indexing the similarity of each object in \(A\) to each object in \(B\).

If partial_match is used, then the above will be maximized/minimized over the set of permutations which respect this partial matching of the two networks.

If maximize, this function will attempt to maximize \(f(P)\) (solve the graph matching problem); otherwise, it will be minimized.

References

[1]

J.T. Vogelstein, J.M. Conroy, V. Lyzinski, L.J. Podrazik, S.G. Kratzer, E.T. Harley, D.E. Fishkind, R.J. Vogelstein, and C.E. Priebe, “Fast approximate quadratic programming for graph matching,” PLOS one, vol. 10, no. 4, p. e0121002, 2015.

[2]

B.D. Pedigo, M. Winding, C.E. Priebe, J.T. Vogelstein, "Bisected graph matching improves automated pairing of bilaterally homologous neurons from connectomes," bioRxiv 2022.05.19.492713 (2022)

[3]

D. Fishkind, S. Adali, H. Patsolic, L. Meng, D. Singh, V. Lyzinski, C. Priebe, "Seeded graph matching," Pattern Recognit. 87 (2019) 203–215

[4]

A. Saad-Eldin, B.D. Pedigo, C.E. Priebe, J.T. Vogelstein "Graph Matching via Optimal Transport," arXiv 2111.05366 (2021)

[5]

K. Pantazis, D.L. Sussman, Y. Park, Z. Li, C.E. Priebe, V. Lyzinski, "Multiplex graph matching matched filters," Applied Network Science (2022)