Partition¶

Modularity and Component Modularity¶

graspologic.partition.modularity(graph, partitions, weight_attribute='weight', resolution=1.0)[source]¶

Given an undirected graph and a dictionary of vertices to community ids, calculate the modularity.

Parameters:

graphnx.Graph: An undirected graph
partitionsDict[Any, int]: A dictionary representing a community partitioning scheme with the keys being the vertex and the value being a community id.
weight_attributestr: The edge data attribute on the graph that contains a float weight for the edge.
resolutionfloat: The resolution to use when calculating the modularity.

Returns:

float: The sum of the modularity of each of the communities.

Raises:

TypeError: If graph is not a networkx Graph or If partitions is not a dictionary or If resolution is not a float
ValueError: If graph is unweighted If graph is directed If graph is a multigraph

Parameters:

graph (Graph) --
partitions (Dict[Any, int]) --
weight_attribute (str) --
resolution (float) --

Return type:

float

References

[1]

https://en.wikipedia.org/wiki/Modularity_(networks)

graspologic.partition.modularity_components(graph, partitions, weight_attribute='weight', resolution=1.0)[source]¶

Given an undirected, weighted graph and a community partition dictionary, calculates a modularity quantum for each community ID. The sum of these quanta is the modularity of the graph and partitions provided.

Parameters:

graphnx.Graph: An undirected graph
partitionsDict[Any, int]: A dictionary representing a community partitioning scheme with the keys being the vertex and the value being a community id.
weight_attributestr: The edge data attribute on the graph that contains a float weight for the edge.
resolutionfloat: The resolution to use when calculating the modularity.

Returns:

Dict[int, float]: A dictionary of the community id to the modularity component of that community

Raises:

TypeError: If graph is not a networkx Graph or If partitions is not a dictionary or If resolution is not a float
ValueError: If graph is unweighted If graph is directed If graph is a multigraph

Parameters:

graph (Graph) --
partitions (Dict[Any, int]) --
weight_attribute (str) --
resolution (float) --

Return type:

Dict[int, float]

Leiden and Hierarchical Leiden¶

graspologic.partition.leiden(graph, starting_communities=None, extra_forced_iterations=0, resolution=1.0, randomness=0.001, use_modularity=True, random_seed=None, weight_attribute='weight', is_weighted=None, weight_default=1.0, check_directed=True, trials=1)[source]¶

Leiden is a global network partitioning algorithm. Given a graph, it will iterate through the network node by node, and test for an improvement in our quality maximization function by speculatively joining partitions of each neighboring node.

This process continues until no moves are made that increases the partitioning quality.

Parameters:

graphUnion[List[Tuple[Any, Any, Union[int, float]]], GraphRepresentation]

A graph representation, whether a weighted edge list referencing an undirected graph, an undirected networkx graph, or an undirected adjacency matrix in either numpy.ndarray or scipy.sparse.csr_array form. Please see the Notes section regarding node ids used.

starting_communitiesOptional[Dict[Any, int]]

Default is None. An optional community mapping dictionary that contains a node id mapping to the community it belongs to. Please see the Notes section regarding node ids used.

If no community map is provided, the default behavior is to create a node community identity map, where every node is in their own community.

extra_forced_iterationsint

Default is 0. Leiden will run until a maximum quality score has been found for the node clustering and no nodes are moved to a new cluster in another iteration. As there is an element of randomness to the Leiden algorithm, it is sometimes useful to set extra_forced_iterations to a number larger than 0 where the process is forced to attempt further refinement.

resolutionUnion[int, float]

Default is 1.0. Higher resolution values lead to more communities and lower resolution values leads to fewer communities. Must be greater than 0.

randomnessUnion[int, float]

Default is 0.001. The larger the randomness value, the more exploration of the partition space is possible. This is a major difference from the Louvain algorithm, which is purely greedy in the partition exploration.

use_modularitybool

Default is True. If False, will use a Constant Potts Model (CPM).

random_seedOptional[int]

Default is None. Can provide an optional seed to the PRNG used in Leiden for deterministic output.

weight_attributestr

Default is weight. Only used when creating a weighed edge list of tuples when the source graph is a networkx graph. This attribute corresponds to the edge data dict key.

is_weightedOptional[bool]

Default is None. Only used when creating a weighted edge list of tuples when the source graph is an adjacency matrix. The graspologic.utils.is_unweighted() function will scan these matrices and attempt to determine whether it is weighted or not. This flag can short circuit this test and the values in the adjacency matrix will be treated as weights.

weight_defaultUnion[int, float]

Default is 1.0. If the graph is a networkx graph and the graph does not have a fully weighted sequence of edges, this default will be used. If the adjacency matrix is found or specified to be unweighted, this weight_default will be used for every edge.

check_directedbool

Default is True. If the graph is an adjacency matrix, we will attempt to ascertain whether it is directed or undirected. As our leiden implementation is only known to work with an undirected graph, this function will raise an error if it is found to be a directed graph. If you know it is undirected and wish to avoid this scan, you can set this value to False and only the lower triangle of the adjacency matrix will be used to generate the weighted edge list.

trialsint

Default is 1. Runs leiden trials times, keeping the best partitioning as judged by the quality maximization function (default: modularity, see use_modularity parameter for details). This differs from extra_forced_iterations by starting over from scratch each for each trial, while extra_forced_iterations attempts to make microscopic adjustments from the "final" state.

Returns:

Dict[Any, int]: The results of running leiden over the provided graph, a dictionary containing mappings of node -> community id. Isolate nodes in the input graph are not returned in the result.

Raises:

ValueError
TypeError
BeartypeCallHintParamViolation

Parameters:

graph (List[Tuple[Any, Any, int | float]] | ndarray | csr_array | Graph) --
starting_communities (Dict[Any, int] | None) --
extra_forced_iterations (int) --
resolution (int | float) --
randomness (int | float) --
use_modularity (bool) --
random_seed (int | None) --
weight_attribute (str) --
is_weighted (bool | None) --
weight_default (int | float) --
check_directed (bool) --
trials (int) --

Return type:

Dict[Any, int]

See also

graspologic.utils.is_unweighted

Notes

No two different nodes are allowed to encode to the same str representation, e.g. node_a id of "1" and node_b id of 1 are different object types but str(node_a) == str(node_b). This collision will result in a ValueError

This function is implemented in the graspologic-native Python module, a module written in Rust for Python.

References

[1]

Traag, V.A.; Waltman, L.; Van, Eck N.J. "From Louvain to Leiden: guaranteeing well-connected communities", Scientific Reports, Vol. 9, 2019

[2]

https://github.com/microsoft/graspologic-native

class graspologic.partition.HierarchicalCluster[source]¶

Create new instance of HierarchicalCluster(node, cluster, parent_cluster, level, is_final_cluster)

node: Any¶: Node id

cluster: int¶: Leiden cluster id

parent_cluster: int | None¶: Only used when level != 0, but will indicate the previous cluster id that this node was in

level: int¶: Each time a community has a higher population than we would like, we create a subnetwork of that community and process it again to break it into smaller chunks. Each time we detect this, the level increases by 1

is_final_cluster: bool¶: Whether this is the terminal cluster in the hierarchical leiden process or not

static __new__(_cls, node, cluster, parent_cluster, level, is_final_cluster)¶

Create new instance of HierarchicalCluster(node, cluster, parent_cluster, level, is_final_cluster)

Parameters:

node (Any) --
cluster (int) --
parent_cluster (int | None) --
level (int) --
is_final_cluster (bool) --

count(value, /)¶: Return number of occurrences of value.

index(value, start=0, stop=sys.maxsize, /)¶

Return first index of value.

Raises ValueError if the value is not present.

class graspologic.partition.HierarchicalClusters[source]¶

HierarchicalClusters is a subclass of Python's list class with two helper methods for retrieving dictionary views of the first and final level of hierarchical clustering in dictionary form. The rest of the HierarchicalCluster entries in this list can be seen as a transition state log of our graspologic.partition.hierarchical_leiden() process as it continuously tries to break down communities over a certain size, with the two helper methods on this list providing you the starting point community map and ending point community map.

__init__(*args, **kwargs)¶

__new__(*args, **kwargs)¶

append(object, /)¶: Append object to the end of the list.

clear(/)¶: Remove all items from list.

copy(/)¶: Return a shallow copy of the list.

count(value, /)¶: Return number of occurrences of value.

extend(iterable, /)¶: Extend list by appending elements from the iterable.

index(value, start=0, stop=sys.maxsize, /)¶

Return first index of value.

Raises ValueError if the value is not present.

insert(index, object, /)¶: Insert object before index.

pop(index=-1, /)¶

Remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.

remove(value, /)¶

Remove first occurrence of value.

Raises ValueError if the value is not present.

reverse(/)¶: Reverse IN PLACE.

sort(*, key=None, reverse=False)¶

Sort the list in ascending order and return None.

The sort is in-place (i.e. the list itself is modified) and stable (i.e. the order of two equal elements is maintained).

If a key function is given, apply it once to each list item and sort them, ascending or descending, according to their function values.

The reverse flag can be set to sort in descending order.

first_level_hierarchical_clustering()[source]¶

Returns:

Dict[Any, int]: The initial leiden algorithm clustering results as a dictionary of node id to community id.

Return type:

Dict[Any, int]

final_level_hierarchical_clustering()[source]¶

Returns:

Dict[Any, int]: The last leiden algorithm clustering results as a dictionary of node id to community id.

Return type:

Dict[Any, int]

graspologic.partition.hierarchical_leiden(graph, max_cluster_size=1000, starting_communities=None, extra_forced_iterations=0, resolution=1.0, randomness=0.001, use_modularity=True, random_seed=None, weight_attribute='weight', is_weighted=None, weight_default=1.0, check_directed=True)[source]¶

Leiden is a global network partitioning algorithm. Given a graph, it will iterate through the network node by node, and test for an improvement in our quality maximization function by speculatively joining partitions of each neighboring node.

This process continues until no moves are made that increases the partitioning quality.

Unlike the function graspologic.partition.leiden(), this function does not stop after maximization has been achieved. On some large graphs, it's useful to identify particularly large communities whose membership counts exceed max_cluster_size and induce a subnetwork solely out of that community. This subnetwork is then treated as a wholly separate entity, leiden is run over it, and the new, smaller communities are then mapped into the original community map space.

The results also differ substantially; the returned List[HierarchicalCluster] is more of a log of state at each level. All HierarchicalClusters at level 0 should be considered to be the results of running graspologic.partition.leiden(). Every community whose membership is greater than max_cluster_size will then also have entries where level == 1, and so on until no communities are greater in population than max_cluster_size OR we are unable to break them down any further.

Once a node's membership registration in a community cannot be changed any further, it is marked with the flag graspologic.partition.HierarchicalCluster.is_final_cluster = True.

Parameters:

graphUnion[List[Tuple[Any, Any, Union[int, float]]], GraphRepresentation]

A graph representation, whether a weighted edge list referencing an undirected graph, an undirected networkx graph, or an undirected adjacency matrix in either numpy.ndarray or scipy.sparse.csr_array form. Please see the Notes section regarding node ids used.

max_cluster_sizeint

Default is 1000. Any partition or cluster with membership >= max_cluster_size will be isolated into a subnetwork. This subnetwork will be used for a new leiden global partition mapping, which will then be remapped back into the global space after completion. Once all clusters with membership >= max_cluster_size have been completed, the level increases and the partition scheme is scanned again for any new clusters with membership >= max_cluster_size and the process continues until every cluster's membership is < max_cluster_size or if they cannot be broken into more than one new community.

starting_communitiesOptional[Dict[Any, int]]

Default is None. An optional community mapping dictionary that contains a node id mapping to the community it belongs to. Please see the Notes section regarding node ids used.

If no community map is provided, the default behavior is to create a node community identity map, where every node is in their own community.

extra_forced_iterationsint

Default is 0. Leiden will run until a maximum quality score has been found for the node clustering and no nodes are moved to a new cluster in another iteration. As there is an element of randomness to the Leiden algorithm, it is sometimes useful to set extra_forced_iterations to a number larger than 0 where the entire process is forced to attempt further refinement.

resolutionUnion[int, float]

Default is 1.0. Higher resolution values lead to more communities and lower resolution values leads to fewer communities. Must be greater than 0.

randomnessUnion[int, float]

Default is 0.001. The larger the randomness value, the more exploration of the partition space is possible. This is a major difference from the Louvain algorithm, which is purely greedy in the partition exploration.

use_modularitybool

Default is True. If False, will use a Constant Potts Model (CPM).

random_seedOptional[int]

Default is None. Can provide an optional seed to the PRNG used in Leiden for deterministic output.

weight_attributestr

Default is weight. Only used when creating a weighed edge list of tuples when the source graph is a networkx graph. This attribute corresponds to the edge data dict key.

is_weightedOptional[bool]

Default is None. Only used when creating a weighted edge list of tuples when the source graph is an adjacency matrix. The graspologic.utils.is_unweighted() function will scan these matrices and attempt to determine whether it is weighted or not. This flag can short circuit this test and the values in the adjacency matrix will be treated as weights.

weight_defaultUnion[int, float]

Default is 1.0. If the graph is a networkx graph and the graph does not have a fully weighted sequence of edges, this default will be used. If the adjacency matrix is found or specified to be unweighted, this weight_default will be used for every edge.

check_directedbool

Default is True. If the graph is an adjacency matrix, we will attempt to ascertain whether it is directed or undirected. As our leiden implementation is only known to work with an undirected graph, this function will raise an error if it is found to be a directed graph. If you know it is undirected and wish to avoid this scan, you can set this value to False and only the lower triangle of the adjacency matrix will be used to generate the weighted edge list.

Returns:

HierarchicalClusters: The results of running hierarchical leiden over the provided graph, a list of HierarchicalClusters identifying the state of every node and cluster at each level. Isolate nodes in the input graph are not returned in the result.

Raises:

ValueError
TypeError
BeartypeCallHintParamViolation

Parameters:

graph (List[Tuple[Any, Any, int | float]] | Graph | ndarray | csr_array) --
max_cluster_size (int) --
starting_communities (Dict[str, int] | None) --
extra_forced_iterations (int) --
resolution (int | float) --
randomness (int | float) --
use_modularity (bool) --
random_seed (int | None) --
weight_attribute (str) --
is_weighted (bool | None) --
weight_default (int | float) --
check_directed (bool) --

Return type:

HierarchicalClusters