
RESEARCH PRODUCT

The evolution of splicing: transcriptome complexity and transcript distances implemented in TranD

Adalena Nanni, Zihao Liu, Lauren M. McIntyre, Oleksandr Moskalenko, James Titus-McQuillan, Ana Conesa, Rebekah L. Rogers, Francisco Jose Pardo-Palacios

subject

Transcriptome, Annotation, Exon, Alternative splicing, RNA splicing, Intron, Computational biology, Biology, Gene, Exon skipping

description

Abstract

Alternative splicing contributes to organismal complexity. Comparing transcripts between and within species is an important first step toward understanding how transcript structure evolves between species and contributes to sub-functionalization. These questions are confounded by issues of data quality and availability. The recent explosion of affordable long-read sequencing of mRNA has considerably widened the ability to study transcriptional variation in non-model species. In this work, we develop a computational framework that uses nucleotide-resolution distance metrics to compare transcript models for structural phenotypes: total transcript length, intron retention, donor/acceptor site variation, alternative exon cassettes, and alternative 5’/3’ UTRs are each scored qualitatively and quantitatively in terms of number of nucleotides. For a single annotation file, all differences among transcripts within a gene are summarized, and transcriptome-level complexity metrics are calculated: number of variable nucleotides, unique exons per gene, exons per transcript, and transcripts per gene. To compare two transcriptomes on the same coordinates, a weighted total distance between pairs of transcripts for the same gene is calculated. The proposed weight function has larger penalties for intron retention and exon skipping than for alternative donor/acceptor sites. Minimum distances can be used to identify both transcript pairs and transcripts missing structural elements in either of the two annotations. This enables a broad range of functionality, from comparing sister species to comparing different methods of building and summarizing transcriptomes. Importantly, the philosophy here is to output metrics, enabling others to explore the nucleotide-level distance metrics.
Single transcriptome annotation summaries and pairwise comparisons are implemented in a new tool, TranD, distributed as a PyPI package and on the open-source, web-based Galaxy (www.galaxyproject.org) platform.
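
The idea of a weighted, nucleotide-level distance between two transcript models can be illustrated with a minimal Python sketch. This is not TranD's code or API: the function names, the `alt_end` category, the classification rules, and the weight values below are all assumptions made for illustration. The abstract only states that intron retention and exon skipping are penalized more heavily than alternative donor/acceptor sites. Transcripts are represented as sorted lists of half-open exon intervals on shared coordinates.

```python
# Illustrative sketch only; not TranD's implementation.
# Hypothetical weights: larger penalties for intron retention and exon
# skipping than for alternative donor/acceptor sites, as the abstract describes.
WEIGHTS = {
    "intron_retention": 3.0,
    "exon_skipping": 3.0,
    "alt_donor_acceptor": 1.0,
    "alt_end": 1.0,  # assumed category for 5'/3' end variation
}


def introns(exons):
    """Gaps between consecutive exons (sorted, half-open [start, end))."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def one_sided_counts(a, b):
    """Count nucleotides exonic in transcript a but not in b, by category."""
    counts = {key: 0 for key in WEIGHTS}
    b_nt = {p for s, e in b for p in range(s, e)}
    b_introns = introns(b)
    for a_start, a_end in a:
        overlaps_b_exon = any(a_start < e and s < a_end for s, e in b)
        for p in range(a_start, a_end):
            if p in b_nt:
                continue  # shared nucleotide, no difference
            intron = next(((s, e) for s, e in b_introns if s <= p < e), None)
            if intron is None:
                counts["alt_end"] += 1  # outside b's span
            elif a_start <= intron[0] and intron[1] <= a_end:
                counts["intron_retention"] += 1  # a's exon spans b's whole intron
            elif overlaps_b_exon:
                counts["alt_donor_acceptor"] += 1  # a's exon extends a b exon
            else:
                counts["exon_skipping"] += 1  # a's exon lies inside b's intron
    return counts


def weighted_distance(a, b, weights=WEIGHTS):
    """Symmetric weighted distance plus the per-category nucleotide counts."""
    ab, ba = one_sided_counts(a, b), one_sided_counts(b, a)
    counts = {key: ab[key] + ba[key] for key in ab}
    return sum(weights[key] * n for key, n in counts.items()), counts


def closest(query, candidates):
    """Name of the candidate transcript with the minimum distance to query."""
    return min(candidates, key=lambda name: weighted_distance(query, candidates[name])[0])


# Toy transcripts for one gene on shared coordinates.
reference = [(100, 200), (300, 400), (500, 600)]
candidates = {
    "skips_exon_2": [(100, 200), (500, 600)],
    "retains_intron_1": [(100, 400), (500, 600)],
    "alt_donor_10nt": [(100, 210), (300, 400), (500, 600)],
}

for name, exons in candidates.items():
    print(name, weighted_distance(reference, exons))
print("closest:", closest(reference, candidates))
```

With these assumed weights, a 10-nucleotide donor shift scores far lower than a skipped or retained 100-nucleotide segment, so the minimum-distance pairing selects the transcript with the alternative donor site. Reporting the per-category counts alongside the total mirrors the stated philosophy of outputting metrics for others to explore.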

https://doi.org/10.1101/2021.09.28.462251