Search results for "Reference-free"

showing 3 items of 3 documents

Detecting mutations by eBWT

2018

In this paper we develop a theory describing how the extended Burrows-Wheeler Transform (eBWT) of a collection of DNA fragments tends to cluster together the copies of nucleotides sequenced from a genome G. Our theory accurately predicts how many copies of any nucleotide are expected inside each such cluster, and how an elegant and precise LCP array based procedure can locate these clusters in the eBWT. Our findings are very general and can be applied to a wide range of different problems. In this paper, we consider the case of alignment-free and reference-free SNPs discovery in multiple collections of reads. We note that, in accordance with our theoretical results, SNPs are clustered in th…

0301 basic medicineFOS: Computer and information sciences000 Computer science knowledge general worksBWT LCP Array SNPs Reference-free Assembly-freeLCP ArraySettore INF/01 - Informatica[SDV]Life Sciences [q-bio]Reference-freeAssembly-freeSNP03 medical and health sciences030104 developmental biologyBWTBWT; LCP Array; SNPs; Reference-free; Assembly-freeComputer ScienceComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)[INFO]Computer Science [cs]SoftwareSNPs
researchProduct

Adaptive reference-free compression of sequence quality scores

2014

Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and hence are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full reso…

Statistics and ProbabilityFOS: Computer and information sciencesComputer sciencemedia_common.quotation_subjectReference-freecomputer.software_genreBiochemistryDNA sequencingSet (abstract data type)Redundancy (information theory)BWTComputer Science - Data Structures and AlgorithmsCode (cryptography)AnimalsHumansQuality (business)Data Structures and Algorithms (cs.DS)Quantitative Biology - GenomicsCaenorhabditis elegansMolecular Biologymedia_commonGenomics (q-bio.GN)SequenceGenomeSettore INF/01 - Informaticareference-free compressionHigh-Throughput Nucleotide SequencingGenomicsSequence Analysis DNAData CompressioncompressionComputer Science ApplicationsComputational MathematicsComputational Theory and MathematicsFOS: Biological sciencesData miningquality scoreMetagenomicscomputerBWT; compression; quality score; reference-free compressionAlgorithmsReference genome
researchProduct

SNPs detection by eBWT positional clustering

2019

Sequencing technologies keep on turning cheaper and faster, thus putting a growing pressure for data structures designed to efficiently store raw data, and possibly perform analysis therein. In this view, there is a growing interest in alignment-free and reference-free variants calling methods that only make use of (suitably indexed) raw reads data. We develop the positional clustering theory that (i) describes how the extended Burrows–Wheeler Transform (eBWT) of a collection of reads tends to cluster together bases that cover the same genome position (ii) predicts the size of such clusters, and (iii) exhibits an elegant and precise LCP array based procedure to locate such clusters in the e…

lcsh:QH426-470Computer scienceLCP arrayReference-free[SDV]Life Sciences [q-bio]0206 medical engineeringSequencing dataSNPAssembly-free02 engineering and technologyBWT LCP array SNPs Reference-free Assembly-freecomputer.software_genreSoftwareBWTStructural BiologyComputational Theory and MathematicCluster (physics)Cluster analysislcsh:QH301-705.5Molecular BiologyComputingMilieux_MISCELLANEOUSSettore INF/01 - Informaticabusiness.industryResearchApplied MathematicsLCP arrayData structurePipeline (software)lcsh:GeneticsComputational Theory and Mathematicslcsh:Biology (General)Data miningBWT; LCP array; SNPs; Reference-free; Assembly-free[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]businessRaw datacomputer020602 bioinformaticsSNPs
researchProduct