
RESEARCH PRODUCT

Graphical Workflow System for Modification Calling by Machine Learning of Reverse Transcription Signatures

Lukas Schmidt, Stephan Werner, Thomas Kemmer, Stefan Niebler, Marco Kristen, Lilia Ayadi, Patrick Johe, Virginie Marchand, Tanja Schirmeister, Yuri Motorin, Andreas Hildebrandt, Bertil Schmidt, Mark Helm

subject

0301 basic medicine; lcsh:QH426-470; Downstream (software development); Computer science; RT signature; Machine learning; computer.software_genre; [SDV.BBM.BM] Life Sciences [q-bio]/Biochemistry Molecular Biology/Molecular biology; Field (computer science); m1A; 03 medical and health sciences; RNA modifications; 0302 clinical medicine; Epitranscriptomics; [SDV.BBM.GTP] Life Sciences [q-bio]/Biochemistry Molecular Biology/Genomics [q-bio.GN]; Genetics; Technology and Code; Galaxy platform; Genetics (clinical); ComputingMilieux_MISCELLANEOUS; business.industry; Principal (computer security); Automation; Watson–Crick face; Visualization; lcsh:Genetics; ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; Workflow; 030220 oncology & carcinogenesis; Molecular Medicine; Trimming; Artificial intelligence

description

Modification mapping from cDNA data has become an important approach in epitranscriptomics. So-called reverse transcription (RT) signatures in cDNA contain information on the position and nature of the RNA modifications that cause them. Data mining of high-throughput sequencing data, e.g., Illumina-based data, is therefore rapidly growing in importance, yet the field still lacks effective tools. Here we present a versatile, user-friendly graphical workflow system for modification calling based on machine learning. The workflow begins with a principal module for trimming, mapping, and postprocessing. The postprocessing step quantifies mismatch and arrest rates at single-nucleotide resolution across the mapped transcriptome. Downstream modules provide tools for visualization, machine learning, and modification calling. The machine-learning module reports quality assessment parameters that gauge the suitability of the initial dataset for effective machine learning and modification calling. This output helps to improve the experimental parameters for library preparation and sequencing. In summary, automating the bioinformatics workflow allows a faster turnaround of the optimization cycles in modification calling.
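
As a rough illustration of the per-position quantification mentioned in the description, the following Python sketch derives a mismatch rate and an arrest rate from simple count data. It is not the workflow's actual code: the `PositionCounts` record, the `rt_signature` function, and the exact rate definitions are assumptions made for this example (here, the arrest rate at position i is taken as the fraction of reads covering the neighbouring position i + 1 that terminate there).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PositionCounts:
    """Hypothetical per-position counts taken from a read pileup."""
    coverage: int    # reads covering this position
    mismatches: int  # covering reads whose base differs from the reference
    read_ends: int   # covering reads whose cDNA terminates at this position


def rt_signature(counts: List[PositionCounts]) -> List[Dict[str, float]]:
    """Compute assumed mismatch and arrest rates for every position.

    Assumption: the arrest rate at index i is the share of reads covering
    index i + 1 that end there, i.e. that did not extend across index i.
    """
    features = []
    for i, current in enumerate(counts):
        # Mismatch rate: fraction of covering reads with a non-reference base.
        mismatch = current.mismatches / current.coverage if current.coverage else 0.0

        # Arrest rate: look at the neighbouring position, if there is one.
        arrest = 0.0
        if i + 1 < len(counts) and counts[i + 1].coverage:
            neighbour = counts[i + 1]
            arrest = neighbour.read_ends / neighbour.coverage

        features.append(
            {"position": float(i), "mismatch_rate": mismatch, "arrest_rate": arrest}
        )
    return features


# Toy input: three positions with made-up counts.
example = [
    PositionCounts(coverage=100, mismatches=5, read_ends=0),
    PositionCounts(coverage=80, mismatches=40, read_ends=20),
    PositionCounts(coverage=50, mismatches=0, read_ends=0),
]
signature = rt_signature(example)
```

Feature vectors of this kind, one per transcript position, are the sort of input a downstream machine-learning module could be trained on; which features the published workflow actually uses is not stated in this record.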

10.3389/fgene.2019.00876
https://hal.univ-lorraine.fr/hal-02317684