Search results for "Sequence analysis"

showing 10 items of 1349 documents

Genetic Characterization of Legionella pneumophila Isolated from a Common Watershed in Comunidad Valenciana, Spain

2013

Legionella pneumophila infects humans to produce legionellosis and Pontiac fever only from environmental sources. In order to establish control measures and study the sources of outbreaks it is essential to know the extent and distribution of strain variants of this bacterium in the environment. Sporadic and outbreak-related cases of legionellosis have been historically frequent in the Comunidad Valenciana region (CV, Spain), with a high prevalence in its Southeastern-most part (BV). Environmental investigations for the detection of Legionella pneumophila are performed in this area routinely. We present a population genetics study of 87 L. pneumophila strains isolated in 13 different localities…

Evolutionary Genetics; Bacterial Diseases; Population genetics; lcsh:Medicine; Locus (genetics); Legionella pneumophila; Microbiology; Microbial Ecology; Legionella pneumophila; Intergenic region; Genetic variation; medicine; Natural Selection; Genetics; Gram Negative; lcsh:Science; Biology; Microbial Pathogens; Genetics; Recombination Genetic; Genetic diversity; Evolutionary Biology; Multidisciplinary; Legionellosis; biology; Ecology; Ecology; Pontiac fever; lcsh:R; Outbreak; Genetic Variation; biology.organism_classification; medicine.disease; Bacterial Pathogens; Infectious Diseases; Spain; Microbial Evolution; Genetic Polymorphism; Medicine; lcsh:Q; Water Microbiology; Sequence Analysis; Population Genetics; Research Article; PLoS ONE

researchProduct

On the power and the systematic biases of the detection of chromosomal inversions by paired-end genome sequencing

2013

One of the most used techniques to study structural variation at a genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base-qualities, to quantify and track down the origin of false positives and negatives along sequencing, mapping, and downstream analysis. We show that PEM is very appropriate to detect a wide range of inversions, even with …

Evolutionary Genetics; Chromosome Structure and Function; lcsh:Medicine; Computational biology; Biology; Genome; DNA sequencing; Structural variation; 03 medical and health sciences; 0302 clinical medicine; Genetic Mutation; Genetics; False positive paradox; Humans; Computer Simulation; False Positive Reactions; Genomic library; Genome Sequencing; lcsh:Science; Biology; Genome Evolution; False Negative Reactions; 030304 developmental biology; Chromosomal inversion; Segmental duplication; Genetics; Evolutionary Biology; 0303 health sciences; Multidisciplinary; Chromosome Biology; lcsh:R; Breakpoint; Mutation Types; Computational Biology; Chromosome Mapping; Genomic Evolution; Genomics; Sequence Analysis DNA; Comparative Genomics; Chromosomes Human Pair 1; Chromosome Inversion; lcsh:Q; Structural Genomics; Sequence Analysis; Algorithms; 030217 neurology & neurosurgery; Research Article

researchProduct

On the complexity of the Saccharomyces bayanus taxon: Hybridization and potential hybrid speciation

2014

Although the genus Saccharomyces has been thoroughly studied, some species in the genus have not yet been accurately resolved; an example is S. bayanus, a taxon that includes genetically diverse lineages of pure and hybrid strains. This diversity makes the assignment and classification of strains belonging to this species unclear and controversial. They have been subdivided by some authors into two varieties (bayanus and uvarum), which have been raised to the species level by others. In this work, we evaluate the complexity of 46 different strains included in the S. bayanus taxon by means of PCR-RFLP analysis and by sequencing of 34 gene regions and one mitochondrial gene. Using the sequenc…

Evolutionary Genetics; Saccharomyces bayanus; DIVERSITY; Sequence Homology; lcsh:Medicine; Saccharomyces; Polymerase Chain Reaction; https://purl.org/becyt/ford/1; Genética y Herencia; PCR-RFLP analysis; Fungal Evolution; Cluster Analysis; lcsh:Science; Genome Evolution; Phylogeny; Genetics; Multidisciplinary; SACCHAROMYCES EUBAYANUS; Phylogenetic analysis; biology; Strain (biology); Systems Biology; Genomics; S. bayanus; Polymorphism Restriction Fragment Length; CIENCIAS NATURALES Y EXACTAS; Research Article; Evolutionary Processes; Genetic Speciation; Molecular Sequence Data; Introgression; Mycology; Genome Complexity; Microbiology; Genètica molecular; Ciencias Biológicas; Saccharomyces; Species Specificity; Phylogenetics; Genetic variation; Genetics; YEAST; https://purl.org/becyt/ford/1.6; Hybridization; Alleles; Hybrid; Evolutionary Biology; Base Sequence; lcsh:R; Organisms; Fungi; Biology and Life Sciences; Computational Biology; Genetic Variation; SACCHAROMYCES PASTORIANUS; Sequence Analysis DNA; Comparative Genomics; biology.organism_classification; Yeast; Genetics Population; Haplotypes; Fungal Classification; Hybridization Genetic; Hybrid speciation; lcsh:Q

researchProduct

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases

2019

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotatio…

FOS: Computer and information sciences; Bioinformatics; [SDV]Life Sciences [q-bio]; Sequence assembly; Genomics; [SDV.BC]Life Sciences [q-bio]/Cellular Biology; Computational biology; Biology; Genome; 03 medical and health sciences; Annotation; 0302 clinical medicine; Tandem repeat; Genetics; Animals; Survey and Summary; Databases Protein; Gene; ComputingMilieux_MISCELLANEOUS; 030304 developmental biology; 0303 health sciences; End user; 572: Biochemie; DNA; Sequence Analysis DNA; Genomics; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; Workflow; ComputingMethodologies_PATTERNRECOGNITION; Gadus morhua; Tandem Repeat Sequences; Scientific Experimental Error; [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]; Databases Nucleic Acid; 030217 neurology & neurosurgery

researchProduct

Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform

2012

Motivation: The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being widely applied to the large sets of sequences often encountered as the outcome of DNA sequencing experiments. In previous work, we presented a novel algorithm that allows the BWT of human genome scale data to be computed on very moderate hardware, thus enabling us to investigate the BWT as a tool for the compression of such datasets. Results: We first used simulated reads to explore the relationship between the level of compression and the error rate, the leng…

FOS: Computer and information sciences; Statistics and Probability; Burrows–Wheeler transform; Computer science; Data_CODINGANDINFORMATIONTHEORY; Burrows-Wheeler transform; computer.software_genre; Biochemistry; Burrows-Wheeler transform; Data Compression; Next-generation sequencing; Computer Science - Data Structures and Algorithms; Escherichia coli; Code (cryptography); Humans; Overhead (computing); Data Structures and Algorithms (cs.DS); Computer Simulation; Quantitative Biology - Genomics; Molecular Biology; Genomics (q-bio.GN); Genome Human; String (computer science); Search engine indexing; Sorting; Genomics; Sequence Analysis DNA; Construct (python library); Data Compression; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; FOS: Biological sciences; Next-generation sequencing; Data mining; Databases Nucleic Acid; computer; Algorithms; Data compression

researchProduct

Mixture Hidden Markov Models for Sequence Data: The seqHMM Package in R

2019

Sequence analysis is being more and more widely used for the analysis of social sequences and other multivariate categorical time series data. However, it is often complex to describe, visualize, and compare large sequence data, especially when there are multiple parallel sequences per subject. Hidden (latent) Markov models (HMMs) are able to detect underlying latent structures and they can be used in various longitudinal settings: to account for measurement error, to detect unobservable states, or to compress information across several types of observations. Extending to mixture hidden Markov models (MHMMs) allows clustering data into homogeneous subsets, with or without external covariate…

FOS: Computer and information sciences; Statistics and Probability; Multivariate statistics; sequence analysis; aikasarjat; Computer science; r; Markov model; Statistics - Computation; Statistics - Applications; 01 natural sciences; Unobservable; categorical time series; R-kieli; 010104 statistics & probability; multi-channel sequences; categorical time series; visualizing sequence data; visualizing models; latent Markov models; latent class models; R; Covariate; Applications (stat.AP); Sannolikhetsteori och statistik; Computer software; 0101 mathematics; Time series; Probability Theory and Statistics; Hidden Markov model; Cluster analysis; lcsh:Statistics; lcsh:HA1-4737; Categorical variable; Computation (stat.CO); ta112; business.industry; visualizing sequence data; R (programming languages); Pattern recognition; multi-channel sequences; visualizing models; latent class models; sekvenssianalyysi; Artificial intelligence; latent markov models; time series; Statistics Probability and Uncertainty; business; Software; Journal of Statistical Software

researchProduct

Alignment-free Genomic Analysis via a Big Data Spark Platform

2021

Motivation: Alignment-free distance and similarity functions (AF functions, for short) are a well-established alternative to pairwise and multiple sequence alignments for many genomic, metagenomic and epigenomic tasks. Due to data-intensive applications, the computation of AF functions is a Big Data problem, with the recent literature indicating that the development of fast and scalable algorithms computing AF functions is a high-priority task. Somewhat surprisingly, despite the increasing popularity of Big Data technologies in computational biology, the development of a Big Data platform for those tasks has not been pursued, possibly due to its complexity. Results: We fill this impo…

FOS: Computer and information sciences; Statistics and Probability; sequence analysis; Computer science; 0206 medical engineering; Big data; 02 engineering and technology; Machine learning; computer.software_genre; Biochemistry; 03 medical and health sciences; Spark (mathematics); MapReduce; Molecular Biology; 030304 developmental biology; 0303 health sciences; Settore INF/01 - Informatica; business.industry; Bioinformatics High Performance Computing Compressed Data Structures; MapReduce; hadoop; sequence analysis; Computer Science Applications; Computational Mathematics; Task (computing); Computer Science - Distributed Parallel and Cluster Computing; Computational Theory and Mathematics; Distributed Parallel and Cluster Computing (cs.DC); Artificial intelligence; hadoop; business; computer; 020602 bioinformatics; Bioinformatics

researchProduct

Configurable low-cost plotter device for fabrication of multi-color sub-cellular scale microarrays.

2014

We report on the construction and operation of a low-cost plotter for fabrication of microarrays for multiplexed single-cell analyses. The printing head consists of polymeric pyramidal pens mounted on a rotation stage installed on an aluminium frame. This construction enables printing of microarrays onto glass substrates mounted on a tilt stage, controlled by a Lab-View operated user interface. The plotter can be assembled by typical academic workshops from components costing less than 15 000 Euro. The functionality of the instrument is demonstrated by printing DNA microarrays over an area of 0.5 square centimeters using up to three different oligonucleotides. Typical feature sizes are 5 μm diam…

Fabrication; Materials science; Scale (ratio); Nanotechnology; Multiplexing; Biomaterials; User-Computer Interface; Plotter; Humans; General Materials Science; Biochip; Oligonucleotide Array Sequence Analysis; EGF Receptors; Epidermal Growth Factor; Oligonucleotide; DNA-directed protein immobilization EGF receptors device automation multiplexed patterns polymer pen lithography; General Chemistry; Microfluidic Analytical Techniques; ErbB Receptors; Tissue Array Analysis; Costs and Cost Analysis; MCF-7 Cells; Printing; DNA microarray; Single-Cell Analysis; Biotechnology; Small (Weinheim an der Bergstrasse, Germany)

researchProduct

Confidence-based Somatic Mutation Evaluation and Prioritization

2012

Next generation sequencing (NGS) has enabled high throughput discovery of somatic mutations. Detection depends on experimental design, lab platforms, parameters and analysis algorithms. However, NGS-based somatic mutation detection is prone to erroneous calls, with reported validation rates near 54% and congruence between algorithms less than 50%. Here, we developed an algorithm to assign a single statistic, a false discovery rate (FDR), to each somatic mutation identified by NGS. This FDR confidence value accurately discriminates true mutations from erroneous calls. Using sequencing data generated from triplicate exome profiling of C57BL/6 mice and B16-F10 melanoma cells, we used the exist…

False discovery rate; Sequence analysis; Somatic cell; QH301-705.5; Low Confidence; DNA Mutational Analysis; Biology; Sensitivity and Specificity; DNA sequencing; 03 medical and health sciences; Cellular and Molecular Neuroscience; Mice; 0302 clinical medicine; Germline mutation; Genetic Mutation; Genetics; Animals; Exome; False Positive Reactions; Genome Sequencing; Biology (General); Molecular Biology; Exome; Biology; Melanoma; Ecology Evolution Behavior and Systematics; Health aging / healthy living Cardiovascular diseases [IGMD 5]; 030304 developmental biology; Genetics; 0303 health sciences; Ecology; Receiver operating characteristic; Computational Biology; Reproducibility of Results; Genomics; DNA Neoplasm; Sequence Analysis DNA; Mice Inbred C57BL; Computational Theory and Mathematics; 030220 oncology & carcinogenesis; Modeling and Simulation; Mutation; Artifacts; Research Article

researchProduct

Microarray mRNA expression analysis of Fanconi anemia fibroblasts.

2007

Fanconi anemia (FA) cells are generally hypersensitive to DNA cross-linking agents, implying that mutations in the different FANC genes cause a similar DNA repair defect(s). By using a customized cDNA microarray chip for DNA repair- and cell cycle-associated genes, we identified three genes, cathepsin B (CTSB), glutaredoxin (GLRX), and polo-like kinase 2 (PLK2), that were misregulated in untreated primary fibroblasts from three unrelated FA-D2 patients, compared to six controls. Quantitative real-time RT PCR was used to validate these results and to study possible molecular links between FA-D2 and other FA subtypes.…

Fanconi anemia complementation group C; Microarray; DNA Repair; DNA repair; Mrna expression; Biology; Protein Serine-Threonine Kinases; Cathepsin B; chemistry.chemical_compound; Cytogenetics; Fanconi anemia; hemic and lymphatic diseases; Genetics; medicine; Humans; RNA Messenger; Molecular Biology; Gene; Genetics (clinical); Glutaredoxins; Oligonucleotide Array Sequence Analysis; Genetics; Reverse Transcriptase Polymerase Chain Reaction; Gene Expression Profiling; Cell Cycle; Fibroblasts; medicine.disease; Molecular biology; Fanconi Anemia; chemistry; Case-Control Studies; DNA; Cytogenetic and genome research

researchProduct