RESEARCH PRODUCT
RNACache: Fast Mapping of RNA-Seq Reads to Transcriptomes Using MinHashing
Bertil Schmidt, André Müller, Julian Cascitti, Stefan Niebler
subject
Workstation, Computer science, business.industry, Hash function, Big data, RNA-Seq, Computational biology, Pipeline (software), Fast mapping, law.invention, Transcriptome, law, Scalability, business
description
The alignment of reads to a transcriptome is an important initial step in a variety of bioinformatics RNA-seq pipelines. As traditional alignment-based tools suffer from high runtimes, alternative, alignment-free methods have recently gained increasing importance. We present a novel approach to the detection of local similarities between transcriptomes and RNA-seq reads based on context-aware minhashing. We introduce RNACache, a three-step processing pipeline consisting of minhashing of k-mers, match-based (online) filtering, and coverage-based filtering in order to identify truly expressed transcript isoforms. Our performance evaluation shows that RNACache produces transcriptomic mappings of high accuracy that include significantly fewer erroneous matches compared to the state-of-the-art tools RapMap, Salmon, and Kallisto. Furthermore, it offers scalable and highly competitive runtime performance at low memory consumption on common multi-core workstations. RNACache is publicly available at: https://github.com/jcasc/rnacache.
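To make the first two steps of the pipeline more concrete, the following is a minimal Python sketch of minhashing k-mers and counting sketch matches between a read and a set of transcripts. It is not RNACache's implementation: the function names, the window-based sketching, the parameter values (`k`, `s`, `window`), and the choice of BLAKE2b as the hash function are all assumptions made for illustration. Coverage-based filtering is not shown.

```python
import hashlib

def kmers(seq, k):
    """Yield all overlapping k-mers of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def sketch(seq, k=8, s=4):
    """Bottom-s minhash: keep the s smallest distinct k-mer hashes."""
    hashes = {hash_kmer(km) for km in kmers(seq, k)}
    return set(sorted(hashes)[:s])

def windows(seq, k, window):
    """Split a sequence into windows that overlap by k - 1 characters."""
    stride = window - k + 1
    for start in range(0, max(len(seq) - k + 1, 1), stride):
        yield seq[start:start + window]

def build_index(transcripts, k=8, s=4, window=16):
    """Hypothetical index: sketch hash -> names of transcripts containing it."""
    index = {}
    for name, seq in transcripts.items():
        for win in windows(seq, k, window):
            for h in sketch(win, k, s):
                index.setdefault(h, set()).add(name)
    return index

def query(read, index, k=8, s=4, window=16, min_hits=1):
    """Count shared sketch hashes per transcript and drop weak candidates."""
    hits = {}
    for win in windows(read, k, window):
        for h in sketch(win, k, s):
            for name in index.get(h, ()):
                hits[name] = hits.get(name, 0) + 1
    # Simple match-based filter: keep transcripts with enough shared hashes.
    return {name: n for name, n in hits.items() if n >= min_hits}
```

Calling `build_index` on a dictionary of transcript sequences and then `query` on a read returns a hit count per candidate transcript. Raising `min_hits` discards candidates supported by only a few shared hashes, which is a simplified stand-in for the match-based filtering named in the abstract.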
year | journal | country | edition | language |
---|---|---|---|---|
2021-01-01 | | | | |