6533b832fe1ef96bd129adf9

RESEARCH PRODUCT

Feature Ranking of Large, Robust, and Weighted Clustering Result

Tommi KärkkäinenJoonas HämäläinenMirka Saarela

subject

Kruskal-Wallis testComputer scienceCorrelation clusteringPopulation02 engineering and technologycomputer.software_genreMachine learning01 natural sciencesRanking (information retrieval)010104 statistics & probabilityKnowledge extractionCURE data clustering algorithmpopulation analysisRanking SVM0202 electrical engineering electronic engineering information engineeringTest statistic0101 mathematicseducational knowledge discoveryeducationCluster analysiseducation.field_of_studybusiness.industryRanking020201 artificial intelligence & image processingData miningArtificial intelligencerobust clusteringbusinesscomputer

description

A clustering result needs to be interpreted and evaluated for knowledge discovery. When clustered data represents a sample from a population with known sample-to-population alignment weights, both the clustering and the evaluation techniques need to take this into account. The purpose of this article is to advance the automatic knowledge discovery from a robust clustering result on the population level. For this purpose, we derive a novel ranking method by generalizing the computation of the Kruskal-Wallis H test statistic from sample to population level with two different approaches. Application of these enlargements to both the input variables used in clustering and to metadata provides automatic determination of variable ranking that can be used to explain and distinguish the groups of population. The ranking method is illustrated with an open data and then, applied to advance the educational knowledge discovery from large scale international student assessment data, whose robust clustering into disjoint groups on three different levels of abstraction was performed in [19]. peerReviewed

https://doi.org/10.1007/978-3-319-57454-7_8