RESEARCH PRODUCT

Robust Principal Component Analysis of Data with Missing Values

Mirka Saarela, Tommi Kärkkäinen

subject

Ground truth; PCA; Computer science; Robust statistics; Missing data; Set (abstract data type); Multiple correspondence analysis; Principal component analysis; Data mining; Robust principal component analysis

description

Principal component analysis is one of the most popular machine learning and data mining techniques. With its origins in statistics, it is used in numerous applications. However, there appears to be little systematic testing and assessment of principal component analysis on erroneous and incomplete data. The purpose of this article is to propose multiple robust approaches for carrying out principal component analysis and, in particular, for estimating the relative importance of the principal components in explaining the data variability. The computational experiments first focus on carefully designed simulated tests in which the ground truth is known and can be used to assess the accuracy of the different methods. In addition, a practical application and evaluation of the methods on an educational data set is given.

Peer reviewed

http://urn.fi/URN:NBN:fi:jyu-201509042808