Statistically validated networks in bipartite complex systems.


RESEARCH PRODUCT

Michele Tumminello, Jyrki Piilo, Fabrizio Lillo, Salvatore Miccichè, Rosario N. Mantegna

subject

Theoretical computer science; Computer science; lcsh:Medicine; Network theory; Social and Behavioral Sciences; Bioinformatics; Quantitative Biology - Quantitative Methods; Sociology; Protein Interaction Mapping; lcsh:Science; Quantitative Methods (q-bio.QM); Multidisciplinary; Systems Biology; Applied Mathematics; Physics; Statistics; Complex Systems; Genomics; Link (geometry); Social Networks; Specialization (logic); Interdisciplinary Physics; Bipartite graph; Probability distribution; Research Article; Network analysis; Physics - Physics and Society; Complex system; FOS: Physical sciences; Physics and Society (physics.soc-ph); Type (model theory); Biology; Models, Biological; Statistical Physics; Statistical Mechanics; Set (abstract data type); Statistical Methods; Structure (mathematical logic); lcsh:R; Computational Biology; Models, Theoretical; Comparative Genomics; Settore FIS/07 - Fisica Applicata(Beni Culturali Ambientali Biol.e Medicin); FOS: Biological sciences; lcsh:Q; Null hypothesis; Mathematics

description

Many complex systems are intrinsically bipartite and are often described and modeled as networks [1-5]. Examples include movies and actors [1, 2, 4], authors and scientific papers [6-9], email accounts and emails [10], and plants and the animals that pollinate them [11, 12]. Bipartite networks are often very heterogeneous in the number of relationships that the elements of one set establish with the elements of the other set. When one constructs a projected network with nodes from only one set, this heterogeneity makes it very difficult to identify preferential links between the elements. Here we introduce an unsupervised method to statistically validate each link of the projected network against a null hypothesis that takes the heterogeneity of the system into account. We apply our method to three different systems: the set of clusters of orthologous genes (COG) in completely sequenced genomes [13, 14], a set of daily returns of 500 US financial stocks, and the set of world movies in the IMDb database [15]. In all these systems, which differ in both size and level of heterogeneity, we find that our method detects network structures that are informative about the system and are not simply an expression of its heterogeneity. Specifically, our method (i) identifies the preferential relationships between the elements, (ii) naturally highlights the clustered structure of the investigated systems, and (iii) allows links to be classified according to the type of statistically validated relationship between the connected nodes.
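
The description above does not spell out the null hypothesis or the multiple-testing correction, so the following is only a minimal sketch under stated assumptions: it tests the co-occurrence count of each pair of nodes against a hypergeometric null (which accounts for how many partners each node has) and applies a Bonferroni-corrected threshold. The function names `hypergeom_sf` and `validated_links`, the `alpha` default, and the toy data are hypothetical and chosen for illustration; the sketch also ignores heterogeneity among the elements of the second set.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(n_total, n_a, n_b, n_ab):
    """P(X >= n_ab) for the overlap X of two random subsets of sizes
    n_a and n_b drawn from n_total elements (hypergeometric null)."""
    denom = comb(n_total, n_b)
    upper = min(n_a, n_b)
    numer = sum(
        comb(n_a, x) * comb(n_total - n_a, n_b - x)
        for x in range(n_ab, upper + 1)
    )
    return numer / denom


def validated_links(neighbors, alpha=0.01):
    """Return {(i, j): p_value} for pairs whose overlap is significant.

    neighbors maps each node of the projected set to the set of
    elements of the other set it is linked to (assumed input format).
    """
    n_total = len(set().union(*neighbors.values()))
    pairs = list(combinations(sorted(neighbors), 2))
    threshold = alpha / len(pairs)  # Bonferroni correction (assumption)
    result = {}
    for i, j in pairs:
        n_ab = len(neighbors[i] & neighbors[j])
        if n_ab == 0:
            continue  # no link in the projected network to validate
        p = hypergeom_sf(n_total, len(neighbors[i]), len(neighbors[j]), n_ab)
        if p < threshold:
            result[(i, j)] = p
    return result


# Toy bipartite system: three nodes linked to elements 0..19.
toy = {
    "a": set(range(0, 8)),
    "b": set(range(0, 8)),
    "c": set(range(6, 20)),
}
links = validated_links(toy)
```

In this toy example only the pair `("a", "b")` survives: the two nodes share all eight of their partners, whereas the two-element overlap of `"c"` with either of them is the smallest overlap possible given the set sizes, so it carries no evidence of a preferential relationship.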

year	journal
2011-01-01	PLoS ONE

10.1371/journal.pone.0017994 http://europepmc.org/articles/PMC3069038?pdf=render