AUTHOR

Miguel A Andrade-Navarro

SuppFile1.fasta.txt – Supplemental material for MAGA: A Supervised Method to Detect Motifs From Annotated Groups in Alignments

Supplemental material, SuppFile1.fasta.txt for MAGA: A Supervised Method to Detect Motifs From Annotated Groups in Alignments by Pablo Mier and Miguel A Andrade-Navarro in Evolutionary Bioinformatics

research product

Glutamine Codon Usage and polyQ Evolution in Primates Depend on the Q Stretch Length

Abstract Amino acid usage in a proteome depends mostly on its taxonomy, as does codon usage in transcriptomes. Here, we explore the level of variation in the codon usage of a specific amino acid, glutamine, in relation to the number of consecutive glutamine residues. We show that CAG triplets are consistently more abundant in short glutamine homorepeats (polyQ, four to eight residues) than in shorter glutamine stretches (one to three residues), leading to the evolutionary growth of the repeat region in a CAG-dependent manner. The length of orthologous polyQ regions is mostly stable in primates, particularly the short ones. Interestingly, given a short polyQ the CAG usage is higher in…
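The comparison the abstract describes, CAG usage as a function of glutamine stretch length, can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes in-frame coding sequences given as plain DNA strings, treats CAA and CAG as the only glutamine codons (standard genetic code), and uses hypothetical helper names (`q_runs`, `cag_fraction_by_length`) chosen for illustration.

```python
from collections import defaultdict

# Assumption: standard genetic code, where glutamine (Q) is encoded by CAA and CAG.
GLN_CODONS = {"CAA", "CAG"}


def q_runs(cds):
    """Yield each maximal run of consecutive glutamine codons as a list of codons.

    `cds` is assumed to be an in-frame coding sequence (hypothetical input format).
    Trailing bases that do not fill a whole codon are ignored.
    """
    cds = cds.upper()
    run = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in GLN_CODONS:
            run.append(codon)
        elif run:
            yield run
            run = []
    if run:
        yield run


def cag_fraction_by_length(cds_list):
    """Map Q-stretch length to the fraction of its glutamine codons that are CAG."""
    counts = defaultdict(lambda: [0, 0])  # length -> [CAG count, total Q codons]
    for cds in cds_list:
        for run in q_runs(cds):
            counts[len(run)][0] += run.count("CAG")
            counts[len(run)][1] += len(run)
    return {n: cag / total for n, (cag, total) in sorted(counts.items())}


if __name__ == "__main__":
    # Toy sequence (invented for illustration): one 5-residue Q stretch, one single Q.
    toy = "ATGCAACAGCAGCAGCAGGCTCAATAA"
    print(cag_fraction_by_length([toy]))
```

Grouping the resulting lengths into the bins named in the abstract (one to three residues versus four to eight) would then allow a direct comparison of CAG fractions between the two classes.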

research product

SuppFile2.fasta.txt – Supplemental material for MAGA: A Supervised Method to Detect Motifs From Annotated Groups in Alignments

Supplemental material, SuppFile2.fasta.txt for MAGA: A Supervised Method to Detect Motifs From Annotated Groups in Alignments by Pablo Mier and Miguel A Andrade-Navarro in Evolutionary Bioinformatics

research product

Quality control guidelines and machine learning predictions for next generation sequencing data

Abstract Controlling the quality of next generation sequencing (NGS) data files is usually not fully automated because of its complexity, and it involves strong assumptions and arbitrary choices. We have statistically characterized common NGS quality features of a large set of files and optimized the complex quality control procedure using a machine learning approach including tree-based algorithms and deep learning. Predictive models were validated using internal and external data, including applications to disease diagnosis datasets. Models are unbiased, accurate and to some extent generalizable to unseen data types and species. Given enough labelled data for training, this approach could p…

research product