Search results for "GENOME"

showing 10 items of 1913 documents

Genome-wide association analysis identifies six new loci associated with forced vital capacity

2014

Forced vital capacity (FVC), a spirometric measure of pulmonary function, reflects lung volume and is used to diagnose and monitor lung diseases. We performed a genome-wide association study meta-analysis of FVC in 52,253 individuals from 26 studies and followed up the top associations in 32,917 additional individuals of European ancestry. We found six new regions associated at genome-wide significance (P < 5 × 10^-8) with FVC in or near EFEMP1, BMP6, MIR129-2-HSD17B12, PRDM11, WWOX and KCNJ2. Two loci previously associated with spirometric measures (GSTCD and PTCH1) were related to FVC. Newly implicated regions were followed up in samples from African-American, Korean, Chinese and Hispani…

Spirometry; Lung Diseases; Vital capacity; Quantitative Trait Loci; Vital Capacity; Genome-wide association study; Biology; Polymorphism Single Nucleotide; Article; DISEASE; Pulmonary function testing; Cohort Studies; FEV1/FVC ratio; Idiopathic pulmonary fibrosis; SDG 3 - Good Health and Well-being; Meta-Analysis as Topic; Forced Expiratory Volume; Databases Genetic; Genetics; medicine; Humans; Restrictive lung disease; Lung volumes; Genetic Predisposition to Disease; lung; spirometry; SNP; gene; GENE-EXPRESSION; Genetics; medicine.diagnostic_test; Genome Human; HERITABILITY; HEALTHY TWIN; MORTALITY; ta3141; respiratory system; medicine.disease; Prognosis; 3. Good health; Respiratory Function Tests; respiratory tract diseases; FAMILY; LUNG-FUNCTION; Genetic Loci; Spirometry; Immunology; CELLS; IDIOPATHIC PULMONARY-FIBROSIS; TRAITS; Follow-Up Studies; Genome-Wide Association Study; Nature Genetics

researchProduct

CROSSMAPPER: estimating cross-mapping rates and optimizing experimental design in multi-species sequencing studies

2020

Motivation: Numerous sequencing studies, including transcriptomics of host-pathogen systems, sequencing of hybrid genomes, xenografts, mixed species systems, metagenomics and meta-transcriptomics, involve samples containing genetic material from divergent organisms. A crucial step in these studies is identifying from which organism each sequencing read originated, and the experimental design should be directed to minimize biases caused by cross-mapping of reads to incorrect source genomes. Additionally, pooling of sufficiently different genetic material into a single sequencing library could significantly reduce experimental costs but requires careful planning and assessment of the impact of…

Statistics and Probability; :Informàtica::Aplicacions de la informàtica::Bioinformàtica [Àrees temàtiques de la UPC]; Computer science; computer.software_genre; Biochemistry; Genome; Transcriptome; 03 medical and health sciences; Resource (project management); Genomes; Transcriptomics; Molecular Biology; Organism; Genòmica -- Informàtica; 030304 developmental biology; 0303 health sciences; 030306 microbiology; High-Throughput Nucleotide Sequencing; Genomics; Sequence Analysis DNA; DNA; Genome analysis; Genome Analysis; Anàlisis de seqüències; Computer Science Applications; Applications Note; Computational Mathematics; Computational Theory and Mathematics; Cross-mapping; Research Design; Metagenomics; RNA; Data mining; Line (text file); computer; Software; Genètica; parametres

researchProduct

Cluster-Localized Sparse Logistic Regression for SNP Data

2012

The task of analyzing high-dimensional single nucleotide polymorphism (SNP) data in a case-control design using multivariable techniques has only recently been tackled. While many available approaches investigate only main effects in a high-dimensional setting, we propose a more flexible technique, cluster-localized regression (CLR), based on localized logistic regression models, that allows different SNPs to have an effect for different groups of individuals. Separate multivariable regression models are fitted for the different groups of individuals by incorporating weights into componentwise boosting, which provides simultaneous variable selection, hence sparse fits. For model fitting, th…

Statistics and Probability; Boosting (machine learning); Computer science; Multivariable calculus; Computational Biology; High-Throughput Nucleotide Sequencing; Feature selection; Regression analysis; Models Theoretical; Logistic regression; computer.software_genre; Polymorphism Single Nucleotide; Regression; Computational Mathematics; Logistic Models; Data Interpretation Statistical; Genetics; Cluster Analysis; Humans; Data mining; Cluster analysis; Molecular Biology; Unit-weighted regression; computer; Genome-Wide Association Study; Statistical Applications in Genetics and Molecular Biology

researchProduct

DySC: software for greedy clustering of 16S rRNA reads.

2012

Summary: Pyrosequencing technologies are frequently used for sequencing the 16S ribosomal RNA marker gene for profiling microbial communities. Clustering of the produced reads is an important but time-consuming task. We present Dynamic Seed-based Clustering (DySC), a new tool based on the greedy clustering approach that uses a dynamic seeding strategy. Evaluations based on the normalized mutual information (NMI) criterion show that DySC produces higher quality clusters than UCLUST and CD-HIT at a comparable runtime. Availability and implementation: DySC, implemented in C, is available at http://code.google.com/p/dysc/ under the GNU GPL license. Contact: bertil.schmidt@uni-mainz.de Sup…

Statistics and Probability; Computer science; business.industry; Sequence Analysis RNA; 16S ribosomal RNA; computer.software_genre; Biochemistry; Computer Science Applications; Computational Mathematics; Software; Computational Theory and Mathematics; RNA Ribosomal 16S; Cluster Analysis; Metagenome; Data mining; Cluster analysis; business; Molecular Biology; computer; Software; Bioinformatics (Oxford, England)

researchProduct
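
The greedy clustering approach named in the DySC abstract can be illustrated with a minimal sketch. This is not DySC's dynamic seeding strategy: the k-mer Jaccard similarity, the threshold value, and the function names `kmers`, `jaccard` and `greedy_cluster` are all assumptions chosen for illustration. Each read joins the first existing seed it is similar enough to, and otherwise becomes a new seed.

```python
def kmers(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(reads, threshold=0.5, k=4):
    """Assign each read to the first seed whose k-mer set is similar enough.

    Hypothetical illustration: the similarity measure and threshold are
    assumptions, not the criteria used by DySC, UCLUST or CD-HIT.
    """
    clusters = []  # list of (seed_kmers, member_reads)
    for read in reads:
        read_kmers = kmers(read, k)
        for seed_kmers, members in clusters:
            if jaccard(read_kmers, seed_kmers) >= threshold:
                members.append(read)
                break
        else:
            # No seed was close enough, so this read starts a new cluster.
            clusters.append((read_kmers, [read]))
    return [members for _, members in clusters]
```

Because reads are processed in input order and only compared against seeds, the result depends on read order; that trade-off is what makes the greedy approach fast.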

Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data

2012

Motivation: The imperfect sequence data produced by next-generation sequencing technologies have motivated the development of a number of short-read error correctors in recent years. The majority of methods focus on the correction of substitution errors, which are the dominant error source in data produced by Illumina sequencing technology. Existing tools score high in either recall or precision, but not consistently high in both measures. Results: In this article, we present Musket, an efficient multistage k-mer-based corrector for Illumina short-read data. We use the k-mer spectrum approach and introduce three correction techniques in a multistage workflow: two-s…

Statistics and Probability; Computer science; business.industry; Sequence assembly; Sequence Analysis DNA; Musket; Biochemistry; Computer Science Applications; Computational Mathematics; CUDA; Software; Computational Theory and Mathematics; k-mer; Escherichia coli; Chromosomes Human; Humans; business; Focus (optics); Molecular Biology; Algorithm; Algorithms; Genome Bacterial; Software; Illumina dye sequencing; Bioinformatics

researchProduct
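
The k-mer spectrum idea mentioned in the Musket abstract can be sketched in a few lines. This is a generic illustration, not Musket's multistage workflow: the function names `kmer_spectrum` and `untrusted_kmers` and the `min_count` cutoff are assumptions. The sketch counts every k-mer across the reads and flags those seen fewer than `min_count` times as likely to contain a sequencing error.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count how often each length-k substring occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def untrusted_kmers(counts, min_count=2):
    """Return k-mers whose count falls below min_count.

    The cutoff is a hypothetical parameter; real correctors typically
    derive it from the coverage distribution of the data set.
    """
    return {kmer for kmer, count in counts.items() if count < min_count}
```

For example, with the reads `ACGTA`, `ACGTA` and `ACGTT` and k = 4, the k-mer `CGTT` occurs once while `ACGT` and `CGTA` occur three and two times, so only `CGTT` is flagged.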

Adaptive reference-free compression of sequence quality scores

2014

Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and hence are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full reso…

Statistics and Probability; FOS: Computer and information sciences; Computer science; media_common.quotation_subject; Reference-free; computer.software_genre; Biochemistry; DNA sequencing; Set (abstract data type); Redundancy (information theory); BWT; Computer Science - Data Structures and Algorithms; Code (cryptography); Animals; Humans; Quality (business); Data Structures and Algorithms (cs.DS); Quantitative Biology - Genomics; Caenorhabditis elegans; Molecular Biology; media_common; Genomics (q-bio.GN); Sequence; Genome; Settore INF/01 - Informatica; reference-free compression; High-Throughput Nucleotide Sequencing; Genomics; Sequence Analysis DNA; Data Compression; compression; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; FOS: Biological sciences; Data mining; quality score; Metagenomics; computer; BWT; compression; quality score; reference-free compression; Algorithms; Reference genome

researchProduct

Metagenomics reveals our incomplete knowledge of global diversity

2008

Metagenomic sequencing obtains huge amounts of sequences from environmental and clinical samples, thus providing a glimpse of the global prokaryotic diversity of both species and genes in these sources. The current trend in metagenomic analysis follows the so-called gene-centric approach, focused on describing the environments by the study of the functional roles of the proteins encoded in the sequenced genes. In this way, it is clear that metagenomic analysis relies heavily on the accurate knowledge of the universe of proteins stored in the databases. Nevertheless, it is known that some biases exist in the composition of databases (which are rich in sequences from common, cultivable and ea…

Statistics and Probability; Genetics; Phylogenetic tree; biology; Phylum; Genetic Variation; Genomics; Biodiversity; Genomics; Genome Analysis; biology.organism_classification; Biochemistry; Computer Science Applications; Computational Mathematics; Taxon; Computational Theory and Mathematics; Evolutionary biology; Metagenomics; GenBank; CIENCIAS DE LA COMPUTACION E INTELIGENCIA ARTIFICIAL; Taxonomic rank; Letter to the Editor; Molecular Biology; Ecosystem; Acidobacteria

researchProduct

Prokaryotic symbiotic consortia and the origin of nucleated cells: A critical review of Lynn Margulis hypothesis.

2021

The publication in the late 1960s of Lynn Margulis's endosymbiotic proposal is a scientific milestone that brought to the fore of evolutionary discussions the issue of the origin of nucleated cells. Although it is true that the times were ripe, the timely publication of Lynn Margulis's original paper was the product of an intellectually bold 29-year-old scientist who, based on a critical analysis of the available scientific information, produced an all-encompassing, sophisticated narrative scheme on the origin of eukaryotic cells as a result of the evolution of prokaryotic consortia and, in a bold intellectual stroke, put it all in the context of planetary evolution. A critical historical reas…

Statistics and Probability; History; Centromere; Genome Plastid; Microbial Consortia; Gene transfer; Context (language use); General Biochemistry Genetics and Molecular Biology; 03 medical and health sciences; 0302 clinical medicine; Cell Movement; Symbiosis; Gene transfer; Non-mendelian inheritance; 030304 developmental biology; Organelles; 0303 health sciences; Endosymbiosis; Endosymbiosis; Applied Mathematics; Narrative history; General Medicine; Biological Evolution; Genealogy; Basal Bodies; Structural heredity; Eukaryotic Cells; Asgard archaea; Prokaryotic Cells; Microbial consortia; Flagella; Modeling and Simulation; Genome Mitochondrial; Planetary Evolution; 030217 neurology & neurosurgery; Bio Systems

researchProduct

SeqEditor: an application for primer design and sequence analysis with or without GTF/GFF files

2021

Motivation: Sequence analyses oriented to investigate specific features, patterns and functions of protein and DNA/RNA sequences usually require tools based on graphic interfaces whose main characteristic is their intuitiveness and interactivity with the user’s expertise, especially when curation or primer design tasks are required. However, interface-based tools usually pose certain computational limitations when managing large sequences or complex datasets, such as genome and transcriptome assemblies. With these requirements in mind, we have developed SeqEditor, an interactive software tool for nucleotide and protein sequence analysis.

Statistics and Probability; Interface (Java); Sequence analysis; Computer science; Pcr assay; Biochemistry; Genome; Transcriptome; 03 medical and health sciences; Sequence Analysis Protein; Multiplex polymerase chain reaction; Humans; Nucleotide; Amino Acid Sequence; Molecular Biology; 030304 developmental biology; chemistry.chemical_classification; 0303 health sciences; Genome; Information retrieval; Contig; 030302 biochemistry & molecular biology; Chromosome; Computer Science Applications; Computational Mathematics; ComputingMethodologies_PATTERNRECOGNITION; Computational Theory and Mathematics; chemistry; Line (text file); Primer (molecular biology); Sequence Analysis; Software; Reference genome

researchProduct

CARE: context-aware sequencing read error correction.

2020

Motivation: Error correction is a fundamental pre-processing step in many Next-Generation Sequencing (NGS) pipelines, in particular for de novo genome assembly. However, existing error correction methods either suffer from high false-positive rates since they break reads into independent k-mers or do not scale efficiently to large amounts of sequencing reads and complex genomes. Results: We present CARE, an alignment-based scalable error correction algorithm for Illumina data using the concept of minhashing. Minhashing allows for efficient similarity search within large sequencing read collections which enables fast computation of high-quality multiple alignments. Sequencing errors ar…

Statistics and Probability; Multiple sequence alignment; Computer science; Sequence assembly; High-Throughput Nucleotide Sequencing; Context (language use); Sequence Analysis DNA; computer.software_genre; Biochemistry; Genome; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Humans; Human genome; Data mining; Error detection and correction; Molecular Biology; computer; Sequence Alignment; Algorithms; Software; Bioinformatics (Oxford, England)

researchProduct
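
The minhashing concept referred to in the CARE abstract can be sketched as follows. This is a generic illustration of MinHash over k-mer sets, not CARE's implementation: the names `minhash_signature` and `signature_similarity`, the seeded BLAKE2b hashing, and the default parameters are all assumptions. Two reads with similar k-mer content tend to agree on many signature positions, which allows candidate similar reads to be found without comparing full sequences.

```python
import hashlib


def _seeded_hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer under a given seed (assumed scheme)."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(seq, k=5, num_hashes=16):
    """Return one minimum hash value per seed over all k-mers of seq.

    Assumes len(seq) >= k; shorter sequences have no k-mers to hash.
    """
    kmer_set = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return [min(_seeded_hash(kmer, seed) for kmer in kmer_set)
            for seed in range(num_hashes)]


def signature_similarity(sig_a, sig_b):
    """Fraction of positions where two signatures agree.

    This fraction is an estimate of the Jaccard similarity of the
    underlying k-mer sets.
    """
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)
```

Using more hash functions gives a tighter similarity estimate at the cost of longer signatures; the value 16 here is arbitrary.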