Utility

Transformations

graspologic.utils.pass_to_ranks(graph, method='simple-nonzero')[source]

Rescales edge weights of an adjacency matrix based on their relative rank in the graph.

Parameters:
graph: array_like or networkx.Graph

Adjacency matrix

method: {'simple-nonzero' (default), 'simple-all', 'zero-boost'} string, optional
  • 'simple-nonzero'

    assigns ranks to all non-zero edges, settling ties using the average. Ranks are then scaled by \(\frac{rank(\text{non-zero edges})}{\text{total non-zero edges} + 1}\)

  • 'simple-all'

    assigns ranks to all non-zero edges, settling ties using the average. Ranks are then scaled by \(\frac{rank(\text{non-zero edges})}{n^2 + 1}\) where n is the number of nodes

  • 'zero-boost'

    preserves the edge weight for all 0s, but ranks the other edges as if the ranks of all 0 edges has been assigned. If there are 10 0-valued edges, the lowest non-zero edge gets weight 11 / (number of possible edges). Ties settled by the average of the weight that those edges would have received. Number of possible edges is determined by the type of graph (loopless or looped, directed or undirected).

Returns:
graph: numpy.ndarray, shape(n_vertices, n_vertices)

Adjacency matrix of graph after being passed to ranks

Parameters:
  • graph (ndarray | csr_array | Graph) --

  • method (str) --

Return type:

ndarray | csr_array | Graph

graspologic.utils.to_laplacian(graph, form='DAD', regularizer=None)[source]

A function to convert graph adjacency matrix to graph Laplacian.

Currently supports I-DAD, DAD, and R-DAD Laplacians, where D is the diagonal matrix of degrees of each node, I is the identity matrix, and A is the adjacency matrix.

R-DAD is regularized Laplacian: where \(D_t = D + regularizer \times I\).

Parameters:
graph: object

Either array-like, (n_vertices, n_vertices) numpy array, scipy.sparse.csr_array, or an object of type networkx.Graph.

form: {'I-DAD', 'DAD' (default), 'R-DAD'}, string, optional
  • 'I-DAD'

    Computes \(L = I - D_i^{-1/2} A D_i^{-1/2}\)

  • 'DAD'

    Computes \(L = D_o^{-1/2} A D_i^{-1/2}\)

  • 'R-DAD'

    Computes \(L = D_o(r)^{-1/2} A D_i(r)^{-1/2}\) where \(D_o(r)^{-1/2} = D_o^{-1/2} + regularizer \times I\) and likewise for \(D_i(r)^{-1/2}\)

regularizer: int, float or None, optional (default=None)

Constant to add to the degree vector(s). If None, average node degree is added. If int or float, must be >= 0. Only used when form is 'R-DAD'.

Returns:
Lnumpy.ndarray

2D (n_vertices, n_vertices) array representing graph Laplacian of specified form

Parameters:
  • graph (ndarray | csr_array | Graph) --

  • form (typing_extensions.Literal[I-DAD, DAD, R-DAD]) --

  • regularizer (float | None) --

Return type:

ndarray

References

[1]

Qin, Tai, and Karl Rohe. "Regularized spectral clustering under the degree-corrected stochastic blockmodel." In Advances in Neural Information Processing Systems, pp. 3120-3128. 2013

[2]

Rohe, Karl, Tai Qin, and Bin Yu. "Co-clustering directed graphs to discover asymmetries and directional communities." Proceedings of the National Academy of Sciences 113.45 (2016): 12679-12684.

Examples

>>> a = np.array([
...    [0, 1, 1],
...    [1, 0, 0],
...    [1, 0, 0]])
>>> to_laplacian(a, "DAD")
array([[0.        , 0.70710678, 0.70710678],
       [0.70710678, 0.        , 0.        ],
       [0.70710678, 0.        , 0.        ]])
graspologic.utils.augment_diagonal(graph, weight=1)[source]

Replaces the diagonal of an adjacency matrix with \(\frac{d}{nverts - 1}\) where \(d\) is the degree vector for an unweighted graph and the sum of magnitude of edge weights for each node for a weighted graph. For a directed graph the in/out \(d\) is averaged.

Parameters:
graph: nx.Graph, nx.DiGraph, nx.MultiDiGraph, nx.MultiGraph, np.ndarray,

scipy.scr_matrix. Input graph in any of the above specified formats. If np.ndarray, interpreted as an \(n \times n\) adjacency matrix

weight: float/int

scalar value to multiply the new diagonal vector by

Returns:
graphnp.array or csr_array

Adjacency matrix with average degrees added to the diagonal.

Parameters:
  • graph (ndarray | csr_array | Graph) --

  • weight (float) --

Return type:

ndarray | csr_array

Examples

>>> a = np.array([
...    [0, 1, 1],
...    [1, 0, 0],
...    [1, 0, 0]])
>>> augment_diagonal(a)
array([[1. , 1. , 1. ],
       [1. , 0.5, 0. ],
       [1. , 0. , 0.5]])
graspologic.utils.symmetrize(graph, method='avg')[source]

A function for forcing symmetry upon a graph.

Parameters:
graph: object

Either array-like, (n_vertices, n_vertices) numpy matrix or csr_array

method: {'avg' (default), 'triu', 'tril',}, optional

An option indicating which half of the edges to retain when symmetrizing.

  • 'avg'

    Retain the average weight between the upper and lower right triangle, of the adjacency matrix.

  • 'triu'

    Retain the upper right triangle.

  • 'tril'

    Retain the lower left triangle.

Returns:
graph: array-like, shape (n_vertices, n_vertices)

Graph with asymmetries removed.

Parameters:
  • graph (ndarray | csr_array) --

  • method (typing_extensions.Literal[avg, triu, tril]) --

Return type:

ndarray | csr_array

Examples

>>> a = np.array([
...    [0, 1, 1],
...    [0, 0, 1],
...    [0, 0, 1]])
>>> symmetrize(a, method="triu")
array([[0, 1, 1],
       [1, 0, 1],
       [1, 1, 1]])
graspologic.utils.remove_loops(graph)[source]

A function to remove loops from a graph.

Parameters:
graph: object

Either array-like, (n_vertices, n_vertices) numpy matrix, or an object of type networkx.Graph.

Returns:
graph: array-like, shape(n_vertices, n_vertices)

the graph with self-loops (edges between the same node) removed.

Parameters:

graph (ndarray | csr_array | Graph) --

Return type:

ndarray | csr_array

Connected Components

graspologic.utils.is_fully_connected(graph)[source]

Checks whether the input graph is fully connected in the undirected case or weakly connected in the directed case.

Connected means one can get from any vertex \(u\) to vertex \(v\) by traversing the graph. For a directed graph, weakly connected means that the graph is connected after it is converted to an unweighted graph (ignore the direction of each edge)

Parameters:
graph: nx.Graph, nx.DiGraph, nx.MultiDiGraph, nx.MultiGraph,

scipy.sparse.csr_array, np.ndarray Input graph in any of the above specified formats. If np.ndarray, interpreted as an \(n \times n\) adjacency matrix

Returns:
boolean: True if the entire input graph is connected
Parameters:

graph (ndarray | csr_array | Graph) --

Return type:

bool

References

http://mathworld.wolfram.com/ConnectedGraph.html http://mathworld.wolfram.com/WeaklyConnectedDigraph.html

Examples

>>> a = np.array([
...    [0, 1, 0],
...    [1, 0, 0],
...    [0, 0, 0]])
>>> is_fully_connected(a)
False
graspologic.utils.largest_connected_component(graph, return_inds=False)[source]

Finds the largest connected component for the input graph.

The largest connected component is the fully connected subgraph which has the most nodes.

Parameters:
graph: nx.Graph, nx.DiGraph, nx.MultiDiGraph, nx.MultiGraph, np.ndarray, scipy.sparse.csr_array

Input graph in any of the above specified formats. If np.ndarray or scipy.sparse.csr_array interpreted as an \(n \times n\) adjacency matrix.

return_inds: boolean, default: False

Whether to return a np.ndarray containing the indices/nodes in the original adjacency matrix that were kept and are now in the returned graph.

Returns:
graph: nx.Graph, nx.DiGraph, nx.MultiDiGraph, nx.MultiGraph, np.ndarray, scipy.sparse.csr_array

New graph of the largest connected component, returned in the input format.

inds: (optional)

Indices/nodes from the original adjacency matrix that were kept after taking the largest connected component.

Parameters:
Return type:

Graph | DiGraph | MultiGraph | MultiDiGraph | ndarray | csr_array

Notes

For networks input in scipy.sparse.csr_array format, explicit zeros are removed prior to finding the largest connected component, thus they are not treated as edges. This differs from the convention in scipy.sparse.csgraph.connected_components().

graspologic.utils.multigraph_lcc_union(graphs, return_inds=False)[source]

Finds the union of all multiple graphs, then compute the largest connected component.

Parameters:
graphs: list or np.ndarray

List of array-like, (n_vertices, n_vertices), or list of np.ndarray nx.Graph, nx.DiGraph, nx.MultiDiGraph, nx.MultiGraph.

return_inds: boolean, default: False

Whether to return a np.ndarray containing the indices in the original adjacency matrix that were kept and are now in the returned graph. Ignored when input is networkx object

Returns:
outlist or np.ndarray

If input was a list

Parameters:
  • graphs (ndarray | List[ndarray | csr_array | Graph]) --

  • return_inds (bool) --

Return type:

ndarray | List[ndarray | csr_array | Graph] | Tuple[ndarray | List[ndarray | csr_array | Graph], ndarray]

graspologic.utils.multigraph_lcc_intersection(graphs, return_inds=False)[source]

Finds the intersection of multiple graphs's largest connected components.

Computes the largest connected component for each graph that was input, and takes the intersection over all of these resulting graphs. Note that this does not guarantee finding the largest graph where every node is shared among all of the input graphs.

Parameters:
graphs: list or np.ndarray

if list, each element must be an \(n \times n\) np.ndarray adjacency matrix

return_inds: boolean, default: False

Whether to return a np.ndarray containing the indices in the original adjacency matrix that were kept and are now in the returned graph. Ignored when input is networkx object

Returns:
graph: list of(nx.Graph, nx.DiGraph, nx.MultiDiGraph, nx.MultiGraph) or np.ndarray

New graph of the largest connected component of the input parameter.

inds: (optional)

Indices from the original adjacency matrix that were kept after taking the largest connected component

Parameters:
  • graphs (List[ndarray | csr_array | Graph]) --

  • return_inds (bool) --

Return type:

List[ndarray | csr_array | Graph] | ndarray | Tuple[ndarray | List[ndarray | csr_array | Graph], ndarray]

IO

graspologic.utils.import_graph(graph, copy=True)[source]

A function for reading a graph and returning a shared data type.

Parameters:
graph: GraphRepresentation

Either array-like, shape (n_vertices, n_vertices) numpy array, a scipy.sparse.csr_array, or an object of type networkx.Graph.

copy: bool, (default=True)

Whether to return a copied version of array. If False and input is np.array, the output returns the original input.

Returns:
out: array-like, shape (n_vertices, n_vertices)

A graph.

Parameters:
  • graph (ndarray | csr_array | Graph) --

  • copy (bool) --

Return type:

ndarray | csr_array

See also

networkx.Graph, numpy.array
graspologic.utils.import_edgelist(path, extension='edgelist', delimiter=None, nodetype=<class 'int'>, return_vertices=False)[source]

Function for reading a single or multiple edgelists. When importing multiple edgelists, the union of vertices from all graphs is computed so that each output graph have matched vertex set. The order of nodes are sorted by node values.

Parameters:
pathstr, Path object, or iterable

If path is a directory, then the importing order will be sorted in alphabetical order.

extensionstr, optional

If path is a directory, then the function will convert all files with matching extension.

delimiterstr or None, default=None, optional

Delimiter of edgelist. If None, the delimiter is whitespace.

nodetypeint (default), float, str, Python type, optional

Convert node data from strings to specified type.

return_verticesbool, default=False, optional

Returns the union of all individual edgelists.

Returns:
outlist of array-like, or array-like, shape (n_vertices, n_vertices)

If path is a directory, a list of arrays is returned. If path is a file, an array is returned.

verticesarray-like, shape (n_vertices, )

If return_vertices` is True, then returns an array of all vertices that were included in the output graphs.

Parameters:
Return type:

ndarray | List[ndarray] | Tuple[ndarray | List[ndarray], ndarray]

Other

graspologic.utils.remap_labels(y_true, y_pred, return_map=False)[source]

Remaps a categorical labeling (such as one predicted by a clustering algorithm) to match the labels used by another similar labeling.

Given two \(n\)-length vectors describing a categorical labeling of \(n\) samples, this method reorders the labels of the second vector (y_pred) so that as many samples as possible from the two label vectors are in the same category.

Parameters:
y_truearray-like of shape (n_samples,)

Ground truth labels, or, labels to map to.

y_predarray-like of shape (n_samples,)

Labels to remap to match the categorical labeling of y_true. The categorical labeling of y_pred will be preserved exactly, but the labels used to denote the categories will be changed to best match the categories used in y_true.

return_mapbool, optional

Whether to return a dictionary where the keys are the original category labels from y_pred and the values are the new category labels that they were mapped to.

Returns:
remapped_y_prednp.ndarray of shape (n_samples,)

Same categorical labeling as that of y_pred, but with the category labels permuted to best match those of y_true.

label_mapdict

Mapping from the original labels of y_pred to the new labels which best resemble those of y_true. Only returned if return_map was True.

Parameters:
Return type:

ndarray | Tuple[ndarray, Dict]

Notes

This method will work well when the label vectors describe a somewhat similar categorization of the data (as measured by metrics such as sklearn.metrics.adjusted_rand_score(), for example). When the categorizations are not similar, the remapping may not make sense (as such a remapping does not exist).

For example, consider when one category in y_true is exactly split in half into two categories in y_pred. If this is the case, it is impossible to say which of the categories in y_pred match that original category from y_true.

Examples

>>> y_true = np.array([0,0,1,1,2,2])
>>> y_pred = np.array([2,2,1,1,0,0])
>>> remap_labels(y_true, y_pred)
array([0, 0, 1, 1, 2, 2])