AUTHOR

Lilian Janin

Adaptive reference-free compression of sequence quality scores

2014

Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and hence are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full reso…

Statistics and Probability; FOS: Computer and information sciences; Computer science; Reference-free; Biochemistry; DNA sequencing; Set (abstract data type); Redundancy (information theory); BWT; Computer Science - Data Structures and Algorithms; Code (cryptography); Animals; Humans; Quality (business); Data Structures and Algorithms (cs.DS); Quantitative Biology - Genomics; Caenorhabditis elegans; Molecular Biology; Genomics (q-bio.GN); Sequence; Genome; Settore INF/01 - Informatica; reference-free compression; High-Throughput Nucleotide Sequencing; Genomics; Sequence Analysis, DNA; Data Compression; compression; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; FOS: Biological sciences; Data mining; quality score; Metagenomics; Algorithms; Reference genome