
RESEARCH PRODUCT

World Influence of Infectious Diseases from Wikipedia Network Analysis

Dima L. Shepelyansky, Guillaume Rollin, José Lages

subject

CheiRank; Computer science; Human immunodeficiency virus (HIV); medicine.disease_cause; 01 natural sciences; [INFO.INFO-SI]Computer Science [cs]/Social and Information Networks [cs.SI]; law.invention; 03 medical and health sciences; PageRank; law; 0103 physical sciences; Global network; medicine; 010306 general physics; 030304 developmental biology; 0303 health sciences; Information retrieval; Google matrix; Markov processes; [PHYS.PHYS.PHYS-SOC-PH]Physics [physics]/Physics [physics]/Physics and Society [physics.soc-ph]; complex networks; data mining; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; ranking (statistics); 3. Good health; Infectious diseases; lcsh:Electrical engineering. Electronics. Nuclear engineering; lcsh:TK1-9971; Network analysis; Wikipedia

description

Abstract: We consider the network of 5 416 537 articles of the English Wikipedia extracted in 2017. Using the recent reduced Google matrix (REGOMAX) method, we construct a reduced network of 230 articles (nodes) on infectious diseases and 195 articles on world countries. This method generates the reduced directed network between all 425 nodes, taking into account all direct and indirect links, including pathways through the huge global network. PageRank and CheiRank algorithms are used to determine the most influential diseases; the top PageRank diseases are Tuberculosis, HIV/AIDS and Malaria. From the reduced Google matrix we determine the sensitivity of world countries to specific diseases, integrating their influence over their entire history, including the times of ancient Egyptian mummies. The results are compared with World Health Organization (WHO) data, demonstrating that the Wikipedia network analysis provides reliable results, with up to about 80 percent overlap between the WHO and REGOMAX analyses.
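To make the two ranking algorithms named in the abstract concrete, here is a minimal sketch of PageRank and CheiRank on a toy directed graph. It is not the REGOMAX procedure and not the authors' code: the four-node link list, the function names, the damping factor of 0.85, and the convergence tolerance are all assumptions chosen for illustration. CheiRank is computed here in the usual way, as PageRank on the same network with every link direction inverted.

```python
def pagerank(links, n, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on a directed graph.

    links: list of (source, target) pairs with nodes labelled 0..n-1.
    alpha: damping factor (0.85 is a common choice, assumed here).
    """
    out = [[] for _ in range(n)]
    for s, t in links:
        out[s].append(t)

    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Probability sitting on dangling nodes (no out-links) is spread uniformly.
        dangling = sum(p[i] for i in range(n) if not out[i])
        new = [(1.0 - alpha) / n + alpha * dangling / n] * n
        for i in range(n):
            if out[i]:
                share = alpha * p[i] / len(out[i])
                for t in out[i]:
                    new[t] += share
        diff = sum(abs(a - b) for a, b in zip(new, p))
        p = new
        if diff < tol:
            break
    return p


def cheirank(links, n, **kwargs):
    """CheiRank: PageRank of the network with all link directions inverted."""
    return pagerank([(t, s) for s, t in links], n, **kwargs)


# Hypothetical toy network: node 2 receives links from nodes 0, 1 and 3.
toy_links = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
pr = pagerank(toy_links, 4)
cr = cheirank(toy_links, 4)

# Rank nodes by decreasing probability, as one would rank articles.
pagerank_order = sorted(range(4), key=lambda i: -pr[i])
cheirank_order = sorted(range(4), key=lambda i: -cr[i])
print("PageRank order:", pagerank_order)
print("CheiRank order:", cheirank_order)
```

PageRank favors nodes with many incoming links, whereas CheiRank favors nodes with many outgoing links, which is why the two are used together to characterize an article's influence and communicativeness.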

10.1109/access.2019.2899339
https://hal.archives-ouvertes.fr/hal-01880718