Search results for "Sequence analysis"
showing 10 items of 1349 documents
Genetic Characterization of Legionella pneumophila Isolated from a Common Watershed in Comunidad Valenciana, Spain
2013
Legionella pneumophila infects humans to produce legionellosis and Pontiac fever only from environmental sources. In order to establish control measures and study the sources of outbreaks it is essential to know extent and distribution of strain variants of this bacterium in the environment. Sporadic and outbreak-related cases of legionellosis have been historically frequent in the Comunidad Valenciana region (CV, Spain), with a high prevalence in its Southeastern-most part (BV). Environmental investigations for the detection of Legionella pneumophila are performed in this area routinely. We present a population genetics study of 87 L. pneumophila strains isolated in 13 different localities…
On the power and the systematic biases of the detection of chromosomal inversions by paired-end genome sequencing
2013
One of the most used techniques to study structural variation at a genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base-qualities, to quantify and track down the origin of false positives and negatives along sequencing, mapping, and downstream analysis. We show that PEM is very appropriate to detect a wide range of inversions, even with …
On the complexity of the Saccharomyces bayanus taxon: Hybridization and potential hybrid speciation
2014
Although the genus Saccharomyces has been thoroughly studied, some species in the genus has not yet been accurately resolved; an example is S. bayanus, a taxon that includes genetically diverse lineages of pure and hybrid strains. This diversity makes the assignation and classification of strains belonging to this species unclear and controversial. They have been subdivided by some authors into two varieties (bayanus and uvarum), which have been raised to the species level by others. In this work, we evaluate the complexity of 46 different strains included in the S. bayanus taxon by means of PCR-RFLP analysis and by sequencing of 34 gene regions and one mitochondrial gene. Using the sequenc…
Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases
2019
AbstractThe widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotatio…
Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform
2012
Motivation The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being widely applied to the large sets of sequences often encountered as the outcome of DNA sequencing experiments. In previous work, we presented a novel algorithm that allows the BWT of human genome scale data to be computed on very moderate hardware, thus enabling us to investigate the BWT as a tool for the compression of such datasets. Results We first used simulated reads to explore the relationship between the level of compression and the error rate, the leng…
Mixture Hidden Markov Models for Sequence Data: The seqHMM Package in R
2019
Sequence analysis is being more and more widely used for the analysis of social sequences and other multivariate categorical time series data. However, it is often complex to describe, visualize, and compare large sequence data, especially when there are multiple parallel sequences per subject. Hidden (latent) Markov models (HMMs) are able to detect underlying latent structures and they can be used in various longitudinal settings: to account for measurement error, to detect unobservable states, or to compress information across several types of observations. Extending to mixture hidden Markov models (MHMMs) allows clustering data into homogeneous subsets, with or without external covariate…
Alignment-free Genomic Analysis via a Big Data Spark Platform
2021
Abstract Motivation Alignment-free distance and similarity functions (AF functions, for short) are a well-established alternative to pairwise and multiple sequence alignments for many genomic, metagenomic and epigenomic tasks. Due to data-intensive applications, the computation of AF functions is a Big Data problem, with the recent literature indicating that the development of fast and scalable algorithms computing AF functions is a high-priority task. Somewhat surprisingly, despite the increasing popularity of Big Data technologies in computational biology, the development of a Big Data platform for those tasks has not been pursued, possibly due to its complexity. Results We fill this impo…
Configurable low-cost plotter device for fabrication of multi-color sub-cellular scale microarrays.
2014
We report on the construction and operation of a low-cost plotter for fabrication of microarrays for multiplexed single-cell analyses. The printing head consists of polymeric pyramidal pens mounted on a rotation stage installed on an aluminium frame. This construction enables printing of microarrays onto glass substrates mounted on a tilt stage, controlled by a Lab-View operated user interface. The plotter can be assembled by typical academic workshops from components of less than 15 000 Euro. The functionality of the instrument is demonstrated by printing DNA microarrays on the area of 0.5 squared centimeters using up to three different oligonucleotides. Typical feature sizes are 5 μm diam…
Confidence-based Somatic Mutation Evaluation and Prioritization
2012
Next generation sequencing (NGS) has enabled high throughput discovery of somatic mutations. Detection depends on experimental design, lab platforms, parameters and analysis algorithms. However, NGS-based somatic mutation detection is prone to erroneous calls, with reported validation rates near 54% and congruence between algorithms less than 50%. Here, we developed an algorithm to assign a single statistic, a false discovery rate (FDR), to each somatic mutation identified by NGS. This FDR confidence value accurately discriminates true mutations from erroneous calls. Using sequencing data generated from triplicate exome profiling of C57BL/6 mice and B16-F10 melanoma cells, we used the exist…
Microarray mRNA expression analysis of Fanconi anemia fibroblasts.
2007
Fanconi anemia (FA) cells are generally hypersensitive to DNA cross-linking agents, implying that mutations in the different <i>FANC</i> genes cause a similar DNA repair defect(s). By using a customized cDNA microarray chip for DNA repair- and cell cycle-associated genes, we identified three genes, cathepsin B (<i>CTSB</i>), glutaredoxin (<i>GLRX</i>), and polo-like kinase 2 (<i>PLK2</i>), that were misregulated in untreated primary fibroblasts from three unrelated FA-D2 patients, compared to six controls. Quantitative real-time RT PCR was used to validate these results and to study possible molecular links between FA-D2 and other FA subtypes.…