RESEARCH PRODUCT
Principal components analysis: theory and application to gene expression data analysis
Hristo Todorov, Susanne Gerber, David Fournier
subject
0301 basic medicine; Computer science; business.industry; Association (object-oriented programming); Big data; Genomics; Machine learning; computer.software_genre; Field (computer science); 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Software; Workflow; Principal component analysis; Data analysis; Artificial intelligence; business; computer; 030217 neurology & neurosurgery
description
Advances in computational power have enabled research to generate significant amounts of data related to complex biological problems. Consequently, applying appropriate data analysis techniques has become paramount to tackle this complexity. However, theoretical understanding of statistical methods is necessary to ensure that the correct method is used and that sound inferences are made based on the analysis. In this article, we elaborate on the theory behind principal components analysis (PCA), which has become a favoured multivariate statistical tool in the field of omics-data analysis. We discuss the necessary prerequisites and steps to produce statistically valid results and provide guidelines for interpreting the output. Using PCA on gene expression data from a mouse experiment, we demonstrate that the main distinctive pattern in the data is associated with the transgenic mouse line and is not related to the mouse gender. A weaker association of the pattern with the genotype was also identified.
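The PCA workflow summarised in the description can be illustrated with a minimal sketch. This is not the authors' code: the function name `pca`, the use of NumPy's SVD, and the synthetic "expression" matrix (12 samples, 50 genes, two groups) are all assumptions made for illustration, and no real data from the article is used.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of a samples-by-features matrix via SVD of the column-centered data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature (gene)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = s**2 / (X.shape[0] - 1)  # eigenvalues of the covariance matrix
    ratio = explained_variance / explained_variance.sum()
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates on the PCs
    loadings = Vt[:n_components]                  # one row of gene weights per component
    return scores, loadings, ratio[:n_components]

# Synthetic stand-in for an expression matrix (hypothetical data):
# two groups of six samples, with the first ten genes shifted in group 1.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 6)
expr = rng.normal(size=(12, 50))
expr[:, :10] += 3.0 * group[:, None]

scores, loadings, ratio = pca(expr, n_components=2)
print("explained variance ratio:", np.round(ratio, 3))
```

In a sketch like this, one would plot `scores[:, 0]` against `scores[:, 1]` and colour the points by a candidate factor (for example line, sex, or genotype) to see which factor the dominant pattern follows. Whether to also scale each gene to unit variance before the decomposition depends on the data and is left out here.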
year | journal | country | edition | language |
---|---|---|---|---|
2018-01-30 | Genomics and Computational Biology | | | |