Search results for "Computer Science Application"
showing 10 items of 3998 documents
Algorithms and tools for protein-protein interaction networks clustering, with a special focus on population-based stochastic methods
2014
Abstract Motivation: Protein–protein interaction (PPI) networks are powerful models for representing the pairwise protein interactions of an organism. Clustering PPI networks can help isolate groups of interacting proteins that participate in the same biological processes or together perform specific biological functions. Evolutionary orthologies can be inferred this way, as can the functions and properties of as-yet uncharacterized proteins. Results: We present an overview of the main state-of-the-art clustering methods that have been applied to PPI networks over the past decade. We distinguish five specific categories of approaches, describe and compare their main features and …
mRNAStab—a web application for mRNA stability analysis
2013
Abstract Eukaryotic gene expression is regulated at both the transcription and the mRNA degradation levels. The implementation of functional genomics methods that allow the simultaneous measurement of transcription (TR) and degradation (DR) rates for thousands of mRNAs is a major advance in this field. One of the best-established methods for determining mRNA stability is genomic run-on (GRO). It allows the measurement of DR, TR and mRNA levels during dynamic cell responses. Here, we offer a software package that provides improved algorithms for determining mRNA stability during dynamic GRO experiments. Availability and implementation: The program mRNAStab is freely accessible at h…
Acceleration of short and long DNA read mapping without loss of accuracy using suffix array
2014
HPG Aligner applies suffix arrays to DNA read mapping. This implementation produces highly sensitive and extremely fast mapping of DNA reads that scales almost linearly with read length. The approach presented here is faster (over 20× for long reads) and more sensitive (over 98% across a wide range of read lengths) than current state-of-the-art mappers. HPG Aligner is not only an optimal alternative for current sequencers but also the only solution available to cope with the longer reads and growing throughputs produced by forthcoming sequencing technologies.
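The core lookup behind suffix-array read mapping can be sketched as follows. This is a minimal illustration, assuming exact matching of a short read (or seed) against a small reference string; it is not HPG Aligner's implementation, which must also tolerate mismatches and indels and uses far more efficient index construction. The function names `build_suffix_array` and `find_occurrences` are hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive construction: sort suffix start positions by the suffix each begins.
    # O(n^2 log n) in the worst case; production aligners use much faster builders.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions where `pattern` occurs exactly in `text`."""
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # Every suffix in sa[start:lo] begins with the pattern.
    return sorted(sa[start:lo])


genome = "ACGTACGTTACG"
sa = build_suffix_array(genome)
print(find_occurrences(genome, sa, "ACG"))
```

Because sorting suffixes also keeps their length-m prefixes in sorted order, all occurrences of a pattern occupy one contiguous range of the array, which two binary searches locate with O(m log n) character comparisons.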
DySC: software for greedy clustering of 16S rRNA reads.
2012
Abstract Summary: Pyrosequencing technologies are frequently used to sequence the 16S ribosomal RNA marker gene when profiling microbial communities. Clustering of the produced reads is an important but time-consuming task. We present Dynamic Seed-based Clustering (DySC), a new tool based on the greedy clustering approach that uses a dynamic seeding strategy. Evaluations based on the normalized mutual information (NMI) criterion show that DySC produces higher-quality clusters than UCLUST and CD-HIT at a comparable runtime. Availability and implementation: DySC, implemented in C, is available at http://code.google.com/p/dysc/ under the GNU GPL license. Contact: bertil.schmidt@uni-mainz.de Sup…
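For readers unfamiliar with greedy clustering, the baseline greedy incremental approach used by tools such as CD-HIT can be sketched as below. This is an illustration under stated assumptions, not DySC: it uses static seeds (the first read of each cluster stays its representative) rather than DySC's dynamic seeding, which the abstract does not detail, and it substitutes a k-mer Jaccard similarity for alignment-based sequence identity. `greedy_cluster`, `threshold`, and `k` are hypothetical names.

```python
def kmers(seq: str, k: int) -> set[str]:
    # Set of all length-k substrings: an alignment-free stand-in for identity.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    # Fraction of shared k-mers; two empty sets are treated as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(reads: list[str], threshold: float, k: int = 4) -> list[list[int]]:
    """Static-seed greedy clustering: each read joins the first seed it matches."""
    # Process longest reads first, as CD-HIT does; the sort is stable for ties.
    order = sorted(range(len(reads)), key=lambda i: len(reads[i]), reverse=True)
    seeds: list[tuple[int, set[str]]] = []  # (cluster index, seed k-mer set)
    clusters: list[list[int]] = []
    for i in order:
        profile = kmers(reads[i], k)
        for c, seed_profile in seeds:
            if jaccard(profile, seed_profile) >= threshold:
                clusters[c].append(i)
                break
        else:
            # No seed is similar enough: this read founds a new cluster as its seed.
            seeds.append((len(clusters), profile))
            clusters.append([i])
    return clusters


reads = ["ACGTACGTAC", "ACGTACGTAA", "TTGGCCAATT", "TTGGCCAATA"]
print(greedy_cluster(reads, threshold=0.5))
```

Each read is compared only against cluster seeds, never against all other reads, which is what makes greedy clustering fast; the outcome, however, depends on processing order and on which reads become seeds.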
Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data
2012
Abstract Motivation: The imperfect sequence data produced by next-generation sequencing technologies have motivated the development of a number of short-read error correctors in recent years. Most methods focus on correcting substitution errors, which are the dominant error source in data produced by Illumina sequencing technology. Existing tools score highly on either recall or precision, but not consistently on both. Results: In this article, we present Musket, an efficient multistage k-mer-based corrector for Illumina short-read data. We use the k-mer spectrum approach and introduce three correction techniques in a multistage workflow: two-s…
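The general k-mer spectrum idea can be sketched as follows, under simplifying assumptions: a k-mer is "trusted" if it occurs at least `min_count` times across all reads, and a read is corrected only when exactly one single-base substitution makes every one of its k-mers trusted. This is a generic illustration of the spectrum approach, not Musket's multistage workflow; all function names and thresholds are hypothetical.

```python
from collections import Counter


def count_kmers(reads: list[str], k: int) -> Counter:
    # Build the k-mer spectrum: occurrences of every length-k substring.
    counts: Counter = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            counts[r[i:i + k]] += 1
    return counts


def all_trusted(read: str, counts: Counter, k: int, min_count: int) -> bool:
    # Missing keys in a Counter read as 0, so unseen k-mers are untrusted.
    return all(counts[read[i:i + k]] >= min_count for i in range(len(read) - k + 1))


def correct_read(read: str, counts: Counter, k: int, min_count: int,
                 alphabet: str = "ACGT") -> str:
    """Return the read with a unique single-base fix applied, else unchanged."""
    if all_trusted(read, counts, k, min_count):
        return read
    fixes = []
    for pos in range(len(read)):
        for base in alphabet:
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            if all_trusted(candidate, counts, k, min_count):
                fixes.append(candidate)
    # Accept only an unambiguous fix, as a conservative choice.
    return fixes[0] if len(fixes) == 1 else read


reads = ["ACGTTGCA"] * 5 + ["ACGTAGCA"]  # the last read carries one substitution
spectrum = count_kmers(reads, k=4)
print(correct_read("ACGTAGCA", spectrum, k=4, min_count=2))
```

Error-free k-mers are sampled many times across overlapping reads while k-mers containing a sequencing error are rare, so the count threshold separates the two; requiring a unique fix is one simple way to limit miscorrections.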
MCRL: using a reference library to compress a metagenome into a non-redundant list of sequences, considering viruses as a case study
2019
Abstract Motivation: Metagenomes offer a glimpse into the total genomic diversity contained within a sample. Currently, however, there is no straightforward way to obtain a non-redundant list of all putative homologs of a set of reference sequences present in a metagenome. Results: To address this problem, we developed a novel clustering approach called ‘metagenomic clustering by reference library’ (MCRL), where a reference library containing a set of reference genes is clustered with respect to an assembled metagenome. According to our proposed approach, reference genes homologous to similar sets of metagenomic sequences, termed ‘signatures’, are iteratively clustered in a greedy fashion, re…
WOODIV, a database of occurrences, functional traits, and phylogenetic data for all Euro-Mediterranean trees
2021
Trees play a key role in the structure and function of many ecosystems worldwide. In the Mediterranean Basin, forests cover approximately 22% of the total land area, hosting a large number of endemics (46 species). Despite their particularities and vulnerability, the biodiversity of Mediterranean trees is not well known at the taxonomic, spatial, functional, and genetic levels required for conservation applications. The WOODIV database fills this gap by providing reliable occurrences, four functional traits (plant height, seed mass, wood density, and specific leaf area), and sequences from three DNA regions (rbcL, matK, and trnH-psbA), together with modelled occurrences and a phylogeny for all…
Spanish electoral archive. SEA database
2021
This paper introduces the SEA database (acronym for Spanish Electoral Archive). SEA is the most complete public repository available to date on Spanish election outcomes. SEA holds all the results recorded from the General (1979–2019), Regional (1989–2021), Local (1979–2019) and European Parliamentary (1987–2019) elections held in Spain since the restoration of democracy in the late 1970s, in addition to other data sets with electoral content. The data are offered free of charge and are presented in a homogeneous, user-friendly format. Most of the databases are available for download with data from various electoral levels, down to the ballot box level. This…
A database for the monitoring of thermal anomalies over the Amazon forest and adjacent intertropical oceans
2015
Abstract Advances in information technologies and accessibility to climate and satellite data in recent years have favored the development of web-based tools with user-friendly interfaces that facilitate the dissemination of geo/biophysical products. These products are useful for analyzing the impact of global warming on different biomes. In particular, the study of Amazon forest responses to drought has recently received attention from the scientific community due to the occurrence of two extreme droughts and sustained warming over the last decade. Thermal Amazoni@ is a web-based platform for the visualization and download of surface thermal anomalies products over the Ama…
Galaxy LIMS for next-generation sequencing.
2013
Abstract Summary: We have developed a laboratory information management system (LIMS) for a next-generation sequencing (NGS) laboratory within the existing Galaxy platform. The system provides lab technicians with standard and customizable sample information forms, barcoded submission forms, tracking of input sample quality, multiplex-capable automatic flow cell design and automatically generated sample sheets to aid physical flow cell preparation. In addition, the platform provides the researcher with a user-friendly interface to create a request, submit accompanying samples, upload sample quality measurements and access the sequencing results. As the LIMS is within the Galaxy platform, the …