Search results for "GENOMICS"
showing 10 items of 390 documents
Adaptive reference-free compression of sequence quality scores
2014
Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and hence are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full reso…
Towards next-generation diagnostics for tuberculosis: identification of novel molecular targets by large-scale comparative genomics.
2020
5 pages, 2 figures. AVAILABILITY AND IMPLEMENTATION: The database of non-tuberculous mycobacteria assemblies can be accessed at: 10.5281/zenodo.3374377. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online: http://dx.doi.org/10.1093/bioinformatics/btz729
Two hundred and fifty-four metagenome-assembled bacterial genomes from the bank vole gut microbiota.
2020
Vertebrate gut microbiota provide many essential services to their host. To better understand the diversity of such services provided by gut microbiota in wild rodents, we assembled metagenome shotgun sequence data from a small mammal, the bank vole Myodes glareolus (Rodentia, Cricetidae). We were able to identify 254 metagenome-assembled genomes (MAGs) that were at least 50% (n = 133 MAGs), 80% (n = 77 MAGs) or 95% (n = 44 MAGs) complete. As is typical for rodent gut microbiota, these MAGs are dominated by taxa assigned to the phyla Bacteroidetes (n = 132 MAGs) and Firmicutes (n = 80), with some Spirochaetes (n = 15) and Proteobacteria (n = 11). Based on coverage over…
One is not enough: On the effects of reference genome for the mapping and subsequent analyses of short-reads.
2020
Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of a reference may represent a source of errors that may affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species…
Whole genome sequencing of the black grouse (Tetrao tetrix): reference guided assembly suggests faster-Z and MHC evolution
2014
Background: The different regions of a genome do not evolve at the same rate. For example, comparative genomic studies have suggested that the sex chromosomes and the regions harbouring the immune defence genes in the Major Histocompatibility Complex (MHC) may evolve faster than other genomic regions. The advent of next-generation sequencing technologies has made it possible to study which genomic regions are evolutionarily liable to change and which are static, and has enabled an increasing number of genome studies of non-model species. However, de novo sequencing of the whole genome of an organism remains non-trivial. In this study, we present the draft genome of the black grouse, wh…
Evaluation of GPU-based Seed Generation for Computational Genomics Using Burrows-Wheeler Transform
2012
The unprecedented production of short reads by new high-throughput sequencers has made it challenging to align short reads to reference genomes with both high sensitivity and high speed. Many CPU-based short-read aligners have been developed to address this challenge. Among them, one popular approach is the seed-and-extend heuristic. For this heuristic, the first and foremost step is to generate seeds between the input reads and the reference genome, where hash tables are the most frequently used data structure. However, hash tables are memory-consuming, making them poorly suited to memory-constrained many-core architectures such as GPUs, even though they usually have a nearly constant query time com…
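As an illustration of the hash-table seed generation step described in this abstract, the following is a minimal sketch and not the paper's implementation. It assumes seeds are exact k-mer matches; the function names `build_kmer_index` and `find_seeds`, the toy sequences, and the choice of k = 4 are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def find_seeds(read: str, index: Dict[str, List[int]], k: int) -> List[Tuple[int, int]]:
    """Return (read_offset, reference_position) pairs for exact k-mer matches."""
    seeds: List[Tuple[int, int]] = []
    for offset in range(len(read) - k + 1):
        # A hash lookup per k-mer: fast on average, but the index itself
        # stores one entry per reference position, which costs memory.
        for ref_pos in index.get(read[offset:offset + k], []):
            seeds.append((offset, ref_pos))
    return seeds


# Toy data, chosen only for illustration.
reference = "ACGTACGTGACC"
index = build_kmer_index(reference, 4)
seeds = find_seeds("CGTGAC", index, 4)
print(seeds)
```

Each seed would then be passed to an extension step (not shown) that aligns the rest of the read around the matched position. The index holds one position per reference k-mer, which is the memory cost the abstract refers to.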
Statistically validated networks in bipartite complex systems.
2011
Many complex systems present an intrinsic bipartite nature and are often described and modeled in terms of networks [1-5]. Examples include movies and actors [1, 2, 4], authors and scientific papers [6-9], email accounts and emails [10], plants and animals that pollinate them [11, 12]. Bipartite networks are often very heterogeneous in the number of relationships that the elements of one set establish with the elements of the other set. When one constructs a projected network with nodes from only one set, the system heterogeneity makes it very difficult to identify preferential links between the elements. Here we introduce an unsupervised method to statistically validate each link of the pr…
High-throughput sequencing of RNA silencing-associated small RNAs in olive (Olea europaea L.).
2011
14 pages, 5 figures, 3 tables, S4 figures, S2 tables
A complete set of nascent transcription rates for yeast genes
2010
The amount of mRNA in a cell is the result of two opposing reactions: transcription and mRNA degradation. These reactions are governed by kinetic laws, and the most regulated step for many genes is the transcription rate. The transcription rate, which is assumed to be exercised mainly at the RNA polymerase recruitment level, can be calculated using the RNA polymerase densities determined either by run-on or immunoprecipitation using specific antibodies. The yeast Saccharomyces cerevisiae is the ideal model organism to generate a complete set of nascent transcription rates that will prove useful for many gene regulation studies. By combining genomic data from both the GRO (Genomic Run-on) a…
Annotation of microsporidian genomes using transcriptional signals
2012
High-quality annotation of microsporidian genomes is essential for understanding the biological processes that govern the development of these parasites. Here we present an improved structural annotation method using transcriptional DNA signals. We apply this method to re-annotate four previously annotated genomes, which allows us to detect annotation errors and identify a significant number of unpredicted genes. We then annotate the newly sequenced genome of Anncaliia algerae. A comparative genomic analysis of A. algerae permits the identification of not only microsporidian core genes, but also potentially highly expressed genes encoding membrane-asso…