RESEARCH PRODUCT
Automated quality control of next generation sequencing data using machine learning
Jean-Fred Fontaine, Miguel A. Andrade-Navarro, Steffen Albrecht

subject

Computer science; Deep learning; Machine learning; DNA sequencing; Statistical classification; Tree (data structure); Task (computing); Software; Resource (project management); Data file; Quality (business); Artificial intelligence

description

Abstract: Controlling the quality of next generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterized common NGS quality features and developed a novel quality control procedure involving tree-based and deep learning classification algorithms. The predictive models, validated on internal data and external disease diagnostic datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at the following URL: https://github.com/salbrec/seqQscorer.
year | journal | country | edition | language |
---|---|---|---|---|
2019-09-14 |