Extensive Benchmarking of Community Detection Algorithms
Extensive Benchmarking of Community Detection Algorithms
R, S.; Karthik, H.; Raman, K.
AbstractThe detection of clusters or community in networks is an important problem in network science. We systematically evaluate many widely used community detection algorithms and their variants to identify clusters in complex networks. As the ground truth for assessing accuracy, we use artificial networks modeled on power-law distributions and real-world social networks. In addition, we also performed gene enrichment analysis on human and yeast protein protein interaction networks to evaluate algorithms on their ability to uncover enriched communities. We implement and adapt an extensive suite of classical algorithms and their modern variants, classified into five types: stochastic, kernel-based, modularity-based, hierarchical, and local search-based. The algorithms are benchmarked primarily using the Normalized Mutual Information metric, with additional analyses focused on granularity by examining cluster ratio and computational time complexity. We find that decreasing the modularity of networks leads to a consistent decline in performance that follows a sigmoidal trajectory as communities become less defined. Algorithms with greater granularity remain stable when community structures are less distinct, while computation time remains independent of network modularity. Additionally, algorithms tend to perform poorly on smaller networks, and higher accuracy often requires a time complexity trade off for specific high performing methods. However, as the analysis expands to more extensive networks, this trade off becomes more pronounced, highlighting the need for efficient scalability. Based on our benchmark and gene enrichment analysis results, we also present recommendations to practitioners. Our robust Python package, complete with a user friendly command-line interface, empowers users to easily apply these algorithms to their datasets.