RESEARCH PRODUCT

Machine learning of reverse transcription signatures of variegated polymerases allows mapping and discrimination of methylated purines in limited transcriptomes

Thomas Kemmer, Christoph Falschlunger, Guillaume Bec, Virginie Marchand, Ronald Micura, Claudia Höbartner, Yuri Motorin, Mark Helm, Lukas Schmidt, Maksim V. Sednev, Stephan Werner, Eric Ennifar, Andreas Hildebrandt

subject

Adenosine; AcademicSubjects/SCI00010; Machine learning; computer.software_genre; [SDV.BBM.BM] Life Sciences [q-bio]/Biochemistry Molecular Biology/Molecular biology; Methylation; 03 medical and health sciences; 0302 clinical medicine; Complementary DNA; [SDV.BBM.GTP] Life Sciences [q-bio]/Biochemistry Molecular Biology/Genomics [q-bio.GN]; Genetics; Molecular Biology; Polymerase; 030304 developmental biology; chemistry.chemical_classification; 0303 health sciences; Oligoribonucleotides; Guanosine; biology; business.industry; RNA-Directed DNA Polymerase; RNA; Reverse Transcription; Reverse transcriptase; Enzyme; chemistry; Transfer RNA; biology.protein; Artificial intelligence; Transcriptome; business; computer; 030217 neurology & neurosurgery

description

Abstract: Reverse transcription (RT) of RNA templates containing RNA modifications leads to synthesis of cDNA containing information on the modification in the form of misincorporation, arrest, or nucleotide skipping events. A compilation of such events from multiple cDNAs represents an RT-signature that is typical for a given modification, but, as we show here, depends also on the reverse transcriptase enzyme. A comparison of 13 different enzymes revealed a range of RT-signatures, with individual enzymes exhibiting average arrest rates between 20 and 75%, as well as average misincorporation rates between 30 and 75% in the read-through cDNA. Using RT-signatures from individual enzymes to train a random forest model as a machine learning regimen for prediction of modifications, we found strongly variegated success rates for the prediction of methylated purines, as exemplified with N1-methyladenosine (m¹A). Among the 13 enzymes, a correlation was found between read length, misincorporation, and prediction success. Conversely, low average read length was correlated with high arrest rate and lower prediction success. The three most successful polymerases were then applied to the characterization of RT-signatures of other methylated purines. Guanosines featuring methyl groups on the Watson-Crick face were identified with high confidence, but discrimination between m¹G and m²₂G was only partially successful. In summary, the results suggest that, given sufficient coverage and a set of specifically optimized reaction conditions for reverse transcription, all RNA modifications that impede Watson-Crick bonds can be distinguished by their RT-signature.

10.1093/nar/gkaa113
https://hal.univ-lorraine.fr/hal-02492494