Datasets

The community structure analysis framework can be applied to any type of network, but the tools available on this website are intended for undirected, unweighted networks. We considered networks with up to several hundred thousand nodes; the primary constraint on network size is the scalability of the community detection methods under study.

Because we were interested in analyzing the structure of real communities, we studied networks that contained external annotation that allowed us to identify such communities. These networks are listed below.

Amazon product co-purchasing network, where product categories reflect real communities
LiveJournal social network, where users explicitly join communities
Genetic interaction networks for S. cerevisiae, H. sapiens, and D. melanogaster, where gene functions identify communities. In our work, we used the PPI network files to identify networks, and the protein ID/gene ontology (GO) ID files to identify communities. Note that the available files may change as new interactions and gene functions are discovered; in fact, these files have been modified at least once since our work.
DBLP co-authorship network, where conferences and journals represent communities. The data that we used is available here (names have been replaced with integers; in the community file, each line contains one community). Note, however, that the DBLP website is constantly being updated, so our data is out of date.
Additionally, Facebook data for students at a university was obtained with permission from Dr. Alan Mislove.
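
As a rough illustration of how annotated data of this kind might be read, the sketch below parses an edge list into an undirected, unweighted adjacency map and reads a community file with one community per line. The exact file layout is an assumption: we assume whitespace-separated integer node IDs, one edge per line in the network file, and `#` for comment lines. The function names `parse_edges` and `parse_communities` are hypothetical and are not part of the tools on this website.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def parse_edges(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Build an undirected, unweighted adjacency map from 'u v' lines.

    Assumed format (not confirmed by the source): two integer node IDs
    per line, separated by whitespace; lines starting with '#' are comments.
    """
    adj: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue  # skip blank, malformed, and comment lines
        u, v = int(parts[0]), int(parts[1])
        if u == v:
            continue  # ignore self-loops
        # Storing both directions makes the graph undirected; using sets
        # collapses duplicate edges, which keeps it unweighted.
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def parse_communities(lines: Iterable[str]) -> List[Set[int]]:
    """Read one community per line, each a list of integer node IDs."""
    communities: List[Set[int]] = []
    for line in lines:
        members = {int(tok) for tok in line.split()}
        if members:  # skip empty lines
            communities.append(members)
    return communities


if __name__ == "__main__":
    # Tiny inline example instead of real files.
    graph = parse_edges(["1 2", "2 3", "2 1"])
    groups = parse_communities(["1 2", "2 3"])
    print(len(graph), "nodes,", len(groups), "communities")
```

With real files, the same functions could be called on open file handles, for example `parse_edges(open(path))`, where `path` is whatever location the downloaded file was saved to.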

Community Structure Analysis Framework

Bruno Abrahao, Sucheta Soundarajan, Robert Kleinberg, John Hopcroft
Cornell University
