Search results for "genomics"

showing 10 items of 1255 documents

Adaptive reference-free compression of sequence quality scores

2014

Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and hence are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full reso…

Statistics and Probability; FOS: Computer and information sciences; Computer science; media_common.quotation_subject; Reference-free; computer.software_genre; Biochemistry; DNA sequencing; Set (abstract data type); Redundancy (information theory); BWT; Computer Science - Data Structures and Algorithms; Code (cryptography); Animals; Humans; Quality (business); Data Structures and Algorithms (cs.DS); Quantitative Biology - Genomics; Caenorhabditis elegans; Molecular Biology; media_common; Genomics (q-bio.GN); Sequence; Genome; Settore INF/01 - Informatica; reference-free compression; High-Throughput Nucleotide Sequencing; Genomics; Sequence Analysis DNA; Data Compression; compression; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; FOS: Biological sciences; Data mining; quality score; Metagenomics; computer; BWT; compression; quality score; reference-free compression; Algorithms; Reference genome

researchProduct

Metagenomics reveals our incomplete knowledge of global diversity

2008

Metagenomic sequencing obtains huge amounts of sequences from environmental and clinical samples, thus providing a glimpse of the global prokaryotic diversity of both species and genes in these sources. The current trend in metagenomic analysis follows the so-called gene-centric approach, focused on describing the environments by the study of the functional roles of the proteins encoded in the sequenced genes. In this way, it is clear that metagenomic analysis relies heavily on the accurate knowledge of the universe of proteins stored in the databases. Nevertheless, it is known that some biases exist in the composition of databases (which are rich in sequences from common, cultivable and ea…

Statistics and Probability; Genetics; Phylogenetic tree; biology; Phylum; Genetic Variation; Genomics; Biodiversity; Genomics; Genome Analysis; biology.organism_classification; Biochemistry; Computer Science Applications; Computational Mathematics; Taxon; Computational Theory and Mathematics; Evolutionary biology; Metagenomics; GenBank; CIENCIAS DE LA COMPUTACION E INTELIGENCIA ARTIFICIAL; Taxonomic rank; Letter to the Editor; Molecular Biology; Ecosystem; Acidobacteria

researchProduct

A parallel and sensitive software tool for methylation analysis on multicore platforms.

2015

Motivation: DNA methylation analysis suffers from very long processing times, as the advent of Next-Generation Sequencers has shifted the bottleneck of genomic studies from the sequencers that obtain the DNA samples to the software that performs the analysis of these samples. The existing software for methylation analysis does not seem to scale efficiently with either the size of the dataset or the length of the reads to be analyzed. As it is expected that the sequencers will provide longer and longer reads in the near future, efficient and scalable methylation software should be developed. Results: We present a new software tool, called HPG-Methyl, which efficiently maps bis…

Statistics and Probability; Mutation rate; Time Factors; Computer science; Real-time computing; Bisulfite sequencing; Molecular Sequence Data; Genomics; Parallel computing; computer.software_genre; medicine.disease_cause; Biochemistry; Genome; Bottleneck; chemistry.chemical_compound; Software; Mutation Rate; Databases Genetic; medicine; Humans; Sulfites; Molecular Biology; Mutation; Multi-core processor; Genome; Base Sequence; business.industry; High-Throughput Nucleotide Sequencing; Methylation; Genomics; DNA Methylation; Original Papers; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; chemistry; DNA methylation; Scalability; Mutation; Compiler; business; computer; Sequence Analysis; DNA; Algorithms; Software; Bioinformatics (Oxford, England)

researchProduct

A web application for the unspecific detection of differentially expressed DNA regions in strand-specific expression data

2015

Genomic technologies allow laboratories to produce large-scale data sets, either through the use of next-generation sequencing or microarray platforms. To explore these data sets and obtain maximum value from the data, researchers view their results alongside all the known features of a given reference genome. To study transcriptional changes that occur under a given condition, researchers search for regions of the genome that are differentially expressed between different experimental conditions. To identify these regions, several algorithms have been developed over the years, along with some bioinformatic platforms that enable their use. However, currently available appli…

Statistics and Probability; Sequence analysis; ADN; Genomics; Computational biology; Biology; computer.software_genre; Biochemistry; Genome; Computer Graphics; Expressió genètica; Web application; Humans; Molecular Biology; Gene; Internet; Microarray analysis techniques; business.industry; Genome Human; Gene Expression Profiling; Computational Biology; High-Throughput Nucleotide Sequencing; DNA; Genomics; Sequence Analysis DNA; Computer Science Applications; Gene expression profiling; Computational Mathematics; Genòmica; ComputingMethodologies_PATTERNRECOGNITION; Computational Theory and Mathematics; Data mining; business; computer; Algorithms; Genètica; Reference genome

researchProduct

Long read alignment based on maximal exact match seeds

2012

Motivation: The explosive growth of next-generation sequencing datasets poses a challenge to the mapping of reads to reference genomes in terms of alignment quality and execution speed. With the continuing progress of high-throughput sequencing technologies, read length is constantly increasing and many existing aligners are becoming inefficient as generated reads grow larger. Results: We present CUSHAW2, a parallelized, accurate, and memory-efficient long read aligner. Our aligner is based on the seed-and-extend approach and uses maximal exact matches as seeds to find gapped alignments. We have evaluated and compared CUSHAW2 to the three other long read aligners BWA-SW, Bowtie2 an…

Statistics and Probability; Sequencing and Sequence Analysis; Theoretical computer science; Genomics; Biology; Biochemistry; Software; Humans; Molecular Biology; Alignment-free sequence analysis; Exact match; Supplementary data; Genome Human; business.industry; Chromosome Mapping; High-Throughput Nucleotide Sequencing; Genomics; Sequence Analysis DNA; Original Papers; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Computer engineering; Scalability; business; Sequence Alignment; Algorithms; Software; Bioinformatics

researchProduct

ArtiFuse—computational validation of fusion gene detection tools without relying on simulated reads

2019

Motivation: Gene fusions are an important class of transcriptional variants that can influence cancer development and can be predicted from RNA sequencing (RNA-seq) data by multiple existing tools. However, the real-world performance of these tools is unclear due to the lack of known positive and negative events, especially with regard to fusion genes in individual samples. Often simulated reads are used, but these cannot account for all technical biases in RNA-seq data generated from real samples. Results: Here, we present ArtiFuse, a novel approach that simulates fusion genes by sequence modification to the genomic reference and can therefore be applied to any RNA-seq dataset wit…

Statistics and Probability; Source code; Sequence analysis; Computer science; media_common.quotation_subject; Value (computer science); Genomics; computer.software_genre; Biochemistry; Fusion gene; 03 medical and health sciences; 0302 clinical medicine; Software; Molecular Biology; Gene; 030304 developmental biology; media_common; 0303 health sciences; Sequence Analysis RNA; business.industry; High-Throughput Nucleotide Sequencing; RNA; Genomics; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; 030220 oncology & carcinogenesis; Benchmark (computing); RNA; Data mining; Gene Fusion; business; computer; Software; Bioinformatics

researchProduct

Structure Learning in Nested Effects Models

2007

Nested Effects Models (NEMs) are a class of graphical models introduced to analyze the results of gene perturbation screens. NEMs explore noisy subset relations between the high-dimensional outputs of phenotyping studies, e.g., the effects showing in gene expression profiles or as morphological features of the perturbed cell. In this paper we expand the statistical basis of NEMs in four directions. First, we derive a new formula for the likelihood function of a NEM, which generalizes previous results for binary data. Second, we prove model identifiability under mild assumptions. Third, we show that the new formulation of the likelihood allows efficiency in traversing model space. Fourth, we…

Statistics and Probability; Traverse; Computer science; Molecular Networks (q-bio.MN); Genes MHC Class II; Perturbation (astronomy); Genes Insect; Feature selection; Quantitative Biology - Quantitative Methods; 03 medical and health sciences; 0302 clinical medicine; Genetics; Animals; heterocyclic compounds; Quantitative Biology - Molecular Networks; Graphical model; Molecular Biology; Quantitative Methods (q-bio.QM); Oligonucleotide Array Sequence Analysis; 030304 developmental biology; Likelihood Functions; 0303 health sciences; Nanoelectromechanical systems; Models Statistical; Models Genetic; Gene Expression Profiling; Genomics; Computational Mathematics; Drosophila melanogaster; Phenotype; FOS: Biological sciences; Binary data; Identifiability; RNA Interference; Likelihood function; Algorithm; Algorithms; 030217 neurology & neurosurgery

researchProduct

Towards next-generation diagnostics for tuberculosis: identification of novel molecular targets by large-scale comparative genomics.

2020

5 pages, 2 figures. AVAILABILITY AND IMPLEMENTATION: The database of non-tuberculous mycobacteria assemblies can be accessed at: 10.5281/zenodo.3374377. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online: http://dx.doi.org/10.1093/bioinformatics/btz729

Statistics and Probability; Tuberculosis; Genomics; Computational biology; Biology; Biochemistry; Mycobacterium tuberculosis; 03 medical and health sciences; medicine; Humans; Tuberculosis; Discovery Notes; Molecular Biology; 030304 developmental biology; Comparative genomics; 0303 health sciences; 030306 microbiology; Scale (chemistry); Genomics; Mycobacterium tuberculosis; medicine.disease; biology.organism_classification; Genome Analysis; 3. Good health; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Mycobacterium tuberculosis complex; Molecular targets; Identification (biology); Biomarkers; Bioinformatics (Oxford, England)

researchProduct

RabbitMash: accelerating hash-based genome analysis on modern multi-core architectures

2020

Motivation: Mash is a popular hash-based genome analysis toolkit with applications to important downstream analysis tasks such as clustering and assembly. However, Mash is currently not able to fully exploit the capabilities of modern multi-core architectures, which in turn leads to high runtimes for large-scale genomic datasets. Results: We present RabbitMash, an efficient, highly optimized implementation of Mash which can take full advantage of modern hardware including multi-threading, vectorization and fast I/O. We show that our approach achieves speedups of at least 1.3, 9.8, 8.5 and 4.4 compared to Mash for the operations sketch, dist, triangle and screen, respectively. Furtherm…

Statistics and Probability; Workstation; Exploit; Computer science; Hash function; Parallel computing; Biochemistry; law.invention; 03 medical and health sciences; Software; law; Cluster analysis; Molecular Biology; 030304 developmental biology; 0303 health sciences; Multi-core processor; Genome; Computers; business.industry; 030302 biochemistry & molecular biology; Genomics; Sketch; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; business; Algorithms; Software; Bioinformatics

researchProduct

Two hundred and fifty-four metagenome-assembled bacterial genomes from the bank vole gut microbiota.

2020

Vertebrate gut microbiota provide many essential services to their host. To better understand the diversity of such services provided by gut microbiota in wild rodents, we assembled metagenome shotgun sequence data from a small mammal, the bank vole Myodes glareolus (Rodentia, Cricetidae). We were able to identify 254 metagenome-assembled genomes (MAGs) that were at least 50% (n = 133 MAGs), 80% (n = 77 MAGs) or 95% (n = 44 MAGs) complete. As is typical for a rodent gut microbiota, these MAGs are dominated by taxa assigned to the phyla Bacteroidetes (n = 132 MAGs) and Firmicutes (n = 80), with some Spirochaetes (n = 15) and Proteobacteria (n = 11). Based on coverage over…

Statistics and Probability; metagenomics; bacterial genomics; Genome; Bacteria; metsämyyrä; Arvicolinae; suolistomikrobisto; Bacterial; sequencing; genomiikka; Library and Information Sciences; microbial ecology; bakteerit; Computer Science Applications; Education; Gastrointestinal Microbiome; mikrobiekologia; Animals; lcsh:Q; Statistics Probability and Uncertainty; lcsh:Science; Information Systems

researchProduct