RESEARCH PRODUCT

Least-squares community extraction in feature-rich networks using similarity data

Soroosh Shalileh, Boris Mirkin

subject

Computer science; Economics; Kernel Functions; Social Sciences; 02 engineering and technology; Least squares; Infographics; Translocation Genetic; Geographical Locations; Medical Conditions; 0202 electrical engineering electronic engineering information engineering; Medicine and Health Sciences; Psychology; Cluster Analysis; Operator Theory; Data Management; Multidisciplinary; Applied Mathematics; Simulation and Modeling; Q; R; Experimental Psychology; Europe; Feature (computer vision); Research Design; Physical Sciences; Medicine; 020201 artificial intelligence & image processing; Graphs; Algorithms; Network Analysis; Research Article; Computer and Information Sciences; Science; Feature vector; Scale (descriptive set theory); Research and Analysis Methods; Column (database); Similarity (network science); 020204 information systems; Parasitic Diseases; Least-Squares Analysis; Feature data; business.industry; Data Visualization; Biology and Life Sciences; Pattern recognition; Tropical Diseases; Economic Analysis; Malaria; People and Places; Artificial intelligence; business; Mathematics

description

We explore a doubly-greedy approach to the issue of community detection in feature-rich networks. According to this approach, both the network and feature data are straightforwardly recovered from the underlying unknown non-overlapping communities, each supplied with a center in the feature space and intensity weight(s) over the network. Our least-squares additive criterion allows us to search for communities one by one and to find each community by adding entities one by one. A focus of this paper is the conversion of the feature-space data into a similarity matrix format. The similarity/link values can be used in either of two modes: (a) summability mode, in which all values are measured on the same scale, so that one can meaningfully compare and sum similarity values across the entire similarity matrix, and (b) nonsummability mode, in which similarity values in one column should not be compared with values in other columns. The two input matrices and two modes lead us to develop four different Iterative Community Extraction from Similarity data (ICESi) algorithms, which determine the number of communities automatically. Our experiments on real-world and synthetic datasets show that these algorithms are valid and competitive.
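
The doubly-greedy idea described above (communities found one by one, each grown by adding entities one by one) can be illustrated with a minimal Python sketch. This is not the authors' ICESi implementation: it works on a single similarity matrix only, and the function names (`extract_community`, `extract_all`), the seeding rule, and the global mean shift are assumptions made for illustration. The score used is the standard least-squares contribution of a community S with intensity λ, namely λ²|S|², where λ is the mean within-community similarity.

```python
import numpy as np

def extract_community(A, free):
    """Greedily grow one community among the entity indices in `free`.

    A is assumed symmetric. The score of a community S is
    (sum of A over S x S)^2 / |S|^2, counted only when that sum is positive.
    """
    free = sorted(free)
    # Seeding rule (an assumption): the entity most similar to the other free ones.
    row_sums = A[np.ix_(free, free)].sum(axis=1)
    seed = free[int(np.argmax(row_sums))]
    members = [seed]
    total = A[seed, seed]
    score = max(total, 0.0) ** 2
    candidates = set(free) - {seed}
    while candidates:
        best_j, best_score, best_total = None, score, total
        for j in sorted(candidates):
            # Within-community sum after adding j (uses the symmetry of A).
            new_total = total + 2.0 * A[j, members].sum() + A[j, j]
            n = len(members) + 1
            new_score = max(new_total, 0.0) ** 2 / n ** 2
            if new_score > best_score:
                best_j, best_score, best_total = j, new_score, new_total
        if best_j is None:
            break  # no single addition improves the score
        members.append(best_j)
        candidates.remove(best_j)
        total, score = best_total, best_score
    intensity = total / len(members) ** 2  # least-squares optimal lambda
    return sorted(members), intensity

def extract_all(similarity):
    """Extract non-overlapping communities one by one until no entity is left."""
    A = np.asarray(similarity, dtype=float)
    A = (A + A.T) / 2.0   # enforce symmetry
    A = A - A.mean()      # global mean shift (an assumption, summability-style)
    free = set(range(A.shape[0]))
    communities = []
    while free:
        members, intensity = extract_community(A, free)
        communities.append((members, intensity))
        free -= set(members)
    return communities

# Toy example: two blocks of three mutually similar entities.
S = np.kron(np.eye(2), np.ones((3, 3)))
print(extract_all(S))
```

Because each extracted community is removed before the next search starts, the number of communities falls out of the procedure rather than being set in advance, which mirrors the property claimed for the ICESi algorithms. Subtracting one global mean treats all similarity values as comparable; a nonsummability-style variant would need a column-wise preprocessing step instead, which this sketch does not attempt.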

10.1371/journal.pone.0254377
http://europepmc.org/articles/PMC8282089