
RESEARCH PRODUCT

Research literature clustering using diffusion maps

Ilkka Pölönen, Paavo Nieminen, Tuomo Sipola

subject

ta113, literature review, clustering, Computer science, Process (engineering), Dimensionality reduction, diffusion map, ta111, Keyword extraction, literature mapping, Knowledge discovery process, Library and Information Sciences, Data science, Field (geography), Computer Science Applications, Knowledge extraction, data mining, Cluster analysis, Web scraping

description

We apply the knowledge discovery process to map current topics in a particular field of science. We are interested in how articles form clusters and what the found clusters contain. We present a framework that involves web scraping, keyword extraction, dimensionality reduction and clustering using the diffusion map algorithm. We use publicly available information about articles in high-impact journals. The method should be useful to practitioners or scientists who want an overview of recent research in a field of science. As a case study, we map the topics in the data mining literature of 2011.

peerReviewed
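The description names the pipeline stages but not their details. The Python sketch below illustrates the last three stages (keyword weighting, diffusion map embedding, clustering) on keyword lists that are assumed to have already been scraped and extracted. It is not the authors' implementation: TF-IDF weighting, a Gaussian kernel whose width is the median pairwise squared distance, and k-means on the diffusion coordinates are assumptions chosen for illustration, and the function names and toy keywords are hypothetical.

```python
import numpy as np


def tfidf_matrix(docs):
    """Hypothetical helper: TF-IDF article-by-keyword matrix, rows scaled to unit length."""
    vocab = sorted({kw for doc in docs for kw in doc})
    col = {kw: j for j, kw in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for kw in doc:
            tf[i, col[kw]] += 1.0
    df = np.count_nonzero(tf, axis=0)      # number of articles containing each keyword
    x = tf * np.log(len(docs) / df)        # keywords found in every article get weight 0
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms), vocab


def diffusion_map(x, n_components=2, t=1, epsilon=None):
    """Diffusion map embedding of the rows of x (one row per article)."""
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    if epsilon is None:
        # Assumption: kernel width = median pairwise squared distance.
        off = sq[np.triu_indices(len(x), k=1)]
        epsilon = float(np.median(off)) or 1.0
    w = np.exp(-sq / epsilon)              # Gaussian affinity matrix W
    d = w.sum(axis=1)                      # degrees; P = D^-1 W is the Markov matrix
    a = w / np.sqrt(np.outer(d, d))        # symmetric conjugate D^-1/2 W D^-1/2
    vals, vecs = np.linalg.eigh(a)         # same eigenvalues as P, ascending order
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]       # right eigenvectors of P
    # Drop the trivial constant eigenvector (eigenvalue 1) and scale by lambda^t.
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t


def kmeans(y, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((y[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def cluster_articles(docs, n_clusters, n_components=2):
    """Hypothetical end-to-end helper: keywords -> TF-IDF -> diffusion map -> k-means."""
    x, _ = tfidf_matrix(docs)
    return kmeans(diffusion_map(x, n_components), n_clusters)


# Hypothetical toy input: one keyword list per article.
docs = [
    ["clustering", "k-means", "graph"],
    ["clustering", "graph", "spectral"],
    ["clustering", "k-means", "spectral"],
    ["classification", "svm", "kernel"],
    ["classification", "kernel", "boosting"],
    ["classification", "svm", "boosting"],
]
print(cluster_articles(docs, n_clusters=2))  # → [0 0 0 1 1 1]
```

The eigenproblem is solved on the symmetric matrix D^-1/2 W D^-1/2 rather than on the Markov matrix P directly because `numpy.linalg.eigh` is stable for symmetric input and its eigenvectors convert to those of P by a diagonal rescaling. On the toy input, the two groups of articles, which share no keywords, end up in separate clusters.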

http://urn.fi/URN:NBN:fi:jyu-201309202329