Leiden algorithm

The Leiden algorithm is a community detection algorithm developed by Traag et al ^[1] at Leiden University. It was developed as a modification of the Louvain method. Like the Louvain method, the Leiden algorithm attempts to optimize modularity in extracting communities from networks; however, it addresses key issues present in the Louvain method, namely poorly connected communities and the resolution limit of modularity.

Broadly, the Leiden algorithm uses the same two primary phases as the Louvain algorithm, a local node moving step (though, the method by which nodes are considered in Leiden is more efficient^[1]) and an graph aggregation step. However, to address the the issues with poorly connected communities and the merging of smaller communities into larger communities (the resolution limit of modularity), the Leiden algorithm employs an intermediate refinement phase in which communities may be split to guarantee that all communities are well-connected.

Graph components

Before defining the Leiden algorithm, it will be helpful to define some of the components of a graph.

Vertices and edges

A graph is composed of vertices (nodes) and edges. Each edge is connected to two vertices, and each vertex may be connected to zero or more edges. Edges are typically represented by straight lines, while nodes are represented by circles or points. In set notation, let $V$ be the set of vertices, and $E$ be the set of edges:

${\begin{aligned}V&:=\{v_{1},v_{2},\dots ,v_{n}\}\\E&:=\{e_{ij},e_{ik},\dots ,e_{kl}\}\end{aligned}}$

where $e_{ij}$ is the directed edge from vertex $v_{i}$ to vertex $v_{j}$ . We can also write this as an ordered pair:

${\begin{aligned}e_{ij}&:=(v_{i},v_{j})\end{aligned}}$

Community

A community is a unique set of nodes:

${\begin{aligned}C_{i}&\subseteq V\\C_{i}&\bigcap C_{j}=\emptyset ~\forall ~i\neq j\end{aligned}}$

and the union of all communities must be the total set of vertices:

${\begin{aligned}V&=\bigcup _{i=1}C_{i}\end{aligned}}$

Partition

A partition is the set of all communities:

${\begin{aligned}{\mathcal {P}}&=\{C_{1},C_{2},\dots ,C_{n}\}\end{aligned}}$

Partition Quality

How communities are partitioned is an integral part on the Leiden algorithm. How partitions are decided can depend on how their quality is measured. Additionally, many of these metrics contain parameters of their own that can change the outcome of their communities.

Modularity

Modularity is a highly used quality metric for assessing how well a set of communities partition a graph. The equation for this metric is defined for a adjacency matrix, A, as:^[2]

$Q={\frac {1}{2m}}\sum _{ij}(A_{ij}-{\frac {k_{i}k_{j}}{2m}})\delta (c_{i},c_{j})$

Where, k, is a node's weighted degree, c, is a node's community, and function δ which is 1 if both inputs share the same community and 0 otherwise.

Reichardt Bornholdt Potts Model (RB)

One of the most well used metrics for the Leiden algorithm is the Reichardt Bornholdt Potts Model (RB).^[3] This model is used by default in most mainstream Leiden algorithm libraries under the name RBConfigurationVertexPartition.^[4]^[5] This model introduces a resolution parameter γ and is highly similar to the equation for modularity. This model is defined by the following quality function for an adjacency matrix, A, as:^[4]

$Q=\sum _{ij}(A_{ij}-\gamma {\frac {k_{i}k_{j}}{2m}})\delta (c_{i},c_{j})$

Constant Potts Model (CPM)

Another metric similar to RB, is the Constant Potts Model (CPM). This metric also relies on a resolution parameter^[6] The quality function is defined as:

$H=-\sum _{ij}(A_{ij}w_{ij}-\gamma )\delta (c_{i},c_{j})$

Where, H, is the quality function, A is the adjacency matrix (0 for an edge not being present, 1 otherwise) and, w, referring to the edge weight.

Algorithm

The Leiden algorithm is similar to that of the Louvain method, with some important modifications.

Step 1: First, each node in the network is assigned to its own community.

Step 2: Next, we decide which communities to move the nodes into and update the partition ${\mathcal {P}}$ .

Step 3: Assign each node in the graph to its own community in a new partition called ${\mathcal {P}}_{\text{refined}}$ .

Step 4: The goal of this step is to separate poorly-connected communities:

Step 5: Use the refined partition ${\mathcal {P}}_{\text{refined}}$ to aggregate the graph. Each community in ${\mathcal {P}}_{\text{refined}}$ becomes a node in the new graph $G_{\text{agg}}$ .

Example: Suppose that we have:

${\begin{aligned}V&=\{v_{1},v_{2},v_{3},v_{4},v_{5},v_{6},v_{7}\}\\C_{1}&=\{v_{1},v_{2},v_{3},v_{4}\}\\C_{2}&=\{v_{5},v_{6},v_{7}\}\\{\mathcal {P}}&=\{C_{1},C_{2}\}\\{\mathcal {P}}_{\text{refined}}&=\{C_{1a},C_{1b},C_{2}\}\end{aligned}}$

Then our new set of nodes will be:

${\begin{aligned}V_{agg}&=\{C_{1a}\mapsto w_{1a},C_{1b}\mapsto w_{1b},C_{2}\mapsto w_{2}\}\end{aligned}}$

Step 6: Update the partition ${\mathcal {P}}$ using the aggregated graph. We keep the communities from partition ${\mathcal {P}}$ , but the communities can be separated into multiple nodes from the refined partition ${\mathcal {P}}_{\text{refined}}$ :

${\begin{aligned}{\mathcal {P}}&=\{\{v~|~v\subseteq C,v\in V(G_{\text{agg}})\}~|~C\in {\mathcal {P}}\}\end{aligned}}$

Example: Suppose that $C$ is a poorly-connected community from the partition ${\mathcal {P}}$ :

${\begin{aligned}C&=\{v_{1},v_{2},v_{3},v_{4},v_{5}\}\\{\mathcal {P}}&=\{C\}\end{aligned}}$

Then suppose during the refinement step, it was separated into two communities, $C_{1}$ and $C_{2}$ :

${\begin{aligned}C_{1}&=\{v_{1},v_{2},v_{3}\}\\C_{2}&=\{v_{4},v_{5}\}\\{\mathcal {P}}_{\text{refined}}&=\{C_{1},C_{2}\}\end{aligned}}$

When we aggregate the graph, the new nodes will be:

${\begin{aligned}V(G_{\text{agg}})&=\{C_{1},C_{2}\}\end{aligned}}$

but we will keep the old partition:

${\begin{aligned}{\mathcal {P}}&=\{\{C_{1},C_{2}\}\}\end{aligned}}$

Step 7: Repeat Steps 2 - 6 until each community consists of only one node.

function Leiden_community_detection(Graph G, Partition P)
    do
        P = fast_louvain_move_nodes(G, P) 
           /* Call the function from "A Simple Acceleration Method for the Louvain Algorithm" ^[7] paper to move the nodes to communities.(more details in function below) */
        done = (|P| == |V(G)|)
            /* If the number of partitions in P equals the number of nodes in G, then set done flag to True to end do-while loop, as this will mean that each node has been aggregated into its own community */
        if not done
            P_refined = get_p_refined(G, P)
                /* This is a crucial part of what separates Leiden from Louvain, as this refinement of the partition enforces that only nodes that are well connected within their community are considered to be moved out of the community. (more detail in function refine_partition_subset below) */
            G = aggregate_graph(G, P_refined)
                /* Aggregates communities into single nodes for next iteration (details in function below)*/
            P = {{v | v ⊆ C, v ∈ V (G)} | C ∈ P} 
                /*  This line essentially takes nodes from the communities in P and breaks them down so that each node is treated as its own singleton community (community made up of one node) */
        end if
    while not done
    return flattened(P) /* Return final partition where all nodes of G are listed in one community each */
end function

function fast_louvain_move_nodes(Graph G, Partition P)
    Q = queue(V(G))  /* Place all of the nodes of G into a queue to ensure that they are all visited */
    while Q not empty
        v = Q.pop_front() /* Select the first node from the queue to visit */
        C_prime = arg maxC∈P∪∅ ∆HP(v → C) 
            /* Set C_prime to be the community in P or the empty set (no community) that provides the maximum increase in the Quality function H when node v is moved into that community */
        if ∆HP(v → C_prime) > 0  /* Only look at moving nodes that will result in a positive change in the quality function */
            v → C_prime          /* Move node v to community C_prime. */
            N = {u | (u, v) ∈ E(G), u /∈ C_prime} /* Create a set N of nodes that are direct neighbors of v but are not in the community C_prime */
            Q.add(N - Q) /* Add all of the nodes from N to the queue, unless they are already in Q */
        end if
    return P /* Return the updated partition */
end function

function get_p_refined(Graph G, Partition P)
    P_refined = get_singleton_partition(G)  /* Assign each node in G to a singleton community (a community by itself). */
    for C ∈ P 
        P_refined = refine_partition_subset(G, P_refined, C) 
             /* Refine partition for each of the communities in P_refined */
    end for
    return P_refined ## return newly refined partition

function refine_partition_subset(Graph G, Partition P, Subset S)
    R = {v | v ∈ S, E(v, S − v) ≥  γ * degree(v) * (degree(S) − degree(v))}
        /* For node v, which is a member of subset S, check if E(v, S-v) (the edges of v connected to other members of the community S, excluding v itself) are above a certain scaling factor. degree(v) is the degree of node v and degree(S) is the total degree of the nodes in the subset S. This statement essentially requires that if v is removed from the subset, the community will remain in tact. */
    for v ∈ R 
        if v in singleton_community  /* If node v is in a singleton community, meaning it is the only node */
            T = {C | C ∈ P, C ⊆ S, E(C, S − C) ≥ γ * degree(C) · (degree(S) − degree(C)} 
                /* Create a set T of communities where E(C, S - C) (the edges between community C and subset S, excluding edges between community C and itself) is greater than the threshold. The threshold here is γ * degree(C) · (degree(S) − degree(C). */
            Pr(C_prime = C) ∼ exp(1/θ ∆HP(v → C) if ∆HP(v → C) ≥ 0
            0 otherwise                                             for C ∈ T
                /* If moving the node v to C_prime changes the quality function in the positive direction, set the probability that the community of v to exp(1/θ * ∆HP(v → C)) else set it to 0 for all of the communities in T */
            v → C_prime ## Move node v into a random C_prime community with a positive probability
        end if
    end for    
    return P  /* return refined partition. */
end function

function aggregate_graph(Graph G, Partition P)
     V = P   /* Set communities of P as individual nodes of the graph */
     E = {(C, D) | (u, v) ∈ E(G), u ∈ C ∈ P, v ∈ D ∈ P}  /* If u is a member of subset C of P, and v is a member subset D of P and u and v share an edge in E(G), then we add a connection between C and D in the new graph. */
    return Graph(V, E) /* Return the new graph's nodes and edges
end function. */

function get_singleton_partition(Graph G)
     return {{v} | v ∈ V (G)}  /* This is the function where we assign each node in G to a singleton community (a community by itself). */
end function

References

^ ^a ^b Traag, Vincent A; Waltman, Ludo; van Eck, Nees Jan (26 March 2019). "From Louvain to Leiden: guaranteeing well-connected communities". Scientific Reports. 9 (1): 5233. arXiv:1810.08473. Bibcode:2019NatSR...9.5233T. doi:10.1038/s41598-019-41695-z. PMC 6435756. PMID 30914743.
^ Clauset, Aaron and Newman, M. E. J. and Moore, Cristopher (2004). "Finding community structure in very large networks". Phys. Rev. E. 70 (6): 066111. arXiv:cond-mat/0408187. Bibcode:2004PhRvE..70f6111C. doi:10.1103/PhysRevE.70.066111. PMID 15697438. S2CID 8977721.{{cite journal}}: CS1 maint: multiple names: authors list (link)
^ https://arxiv.org/pdf/cond-mat/0402349
^ ^a ^b https://leidenalg.readthedocs.io/en/stable/reference.html
^ https://cran.r-project.org/web/packages/leiden/leiden.pdf
^ Traag, Vincent A; Van Dooren, Paul; Nesterov, Yurii (29 July 2011). "Narrow scope for resolution-limit-free community detection". Physical Review E. 84 (1): 016114. arXiv:1104.3083. Bibcode:2011PhRvE..84a6114T. doi:10.1103/PhysRevE.84.016114. PMID 21867264.
^ Ozaki, N., Tezuka, H., & Inaba, M. (2016). A Simple Acceleration Method for the Louvain Algorithm. International Journal of Computer and Electrical Engineering, 8(3), 207–218. doi:10.17706/ijcee.2016.8.3.207-218

[Leiden-1] Traag, Vincent A; Waltman, Ludo; van Eck, Nees Jan (26 March 2019). "From Louvain to Leiden: guaranteeing well-connected communities". Scientific Reports. 9 (1): 5233. arXiv:1810.08473. Bibcode:2019NatSR...9.5233T. doi:10.1038/s41598-019-41695-z. PMC 6435756. PMID 30914743.

[community-2] Clauset, Aaron and Newman, M. E. J. and Moore, Cristopher (2004). "Finding community structure in very large networks". Phys. Rev. E. 70 (6): 066111. arXiv:cond-mat/0408187. Bibcode:2004PhRvE..70f6111C. doi:10.1103/PhysRevE.70.066111. PMID 15697438. S2CID 8977721.{{cite journal}}: CS1 maint: multiple names: authors list (link)

[RB-3] ttps://arxiv.org/pdf/cond-mat/0402349

[leidenalg-4] ttps://leidenalg.readthedocs.io/en/stable/reference.html

[leidenR-5] ttps://cran.r-project.org/web/packages/leiden/leiden.pdf

[CPM-6] Traag, Vincent A; Van Dooren, Paul; Nesterov, Yurii (29 July 2011). "Narrow scope for resolution-limit-free community detection". Physical Review E. 84 (1): 016114. arXiv:1104.3083. Bibcode:2011PhRvE..84a6114T. doi:10.1103/PhysRevE.84.016114. PMID 21867264.

[7] Ozaki, N., Tezuka, H., & Inaba, M. (2016). A Simple Acceleration Method for the Louvain Algorithm. International Journal of Computer and Electrical Engineering, 8(3), 207–218. doi:10.17706/ijcee.2016.8.3.207-218

[1]

[2]

[3]

[4]

[5]

[6]

[7]