Search results for " Applications"

showing 10 items of 4541 documents

DySC: software for greedy clustering of 16S rRNA reads.

2012

Abstract Summary: Pyrosequencing technologies are frequently used for sequencing the 16S ribosomal RNA marker gene for profiling microbial communities. Clustering of the produced reads is an important but time-consuming task. We present Dynamic Seed-based Clustering (DySC), a new tool based on the greedy clustering approach that uses a dynamic seeding strategy. Evaluations based on the normalized mutual information (NMI) criterion show that DySC produces higher quality clusters than UCLUST and CD-HIT at a comparable runtime. Availability and implementation: DySC, implemented in C, is available at http://code.google.com/p/dysc/ under GNU GPL license. Contact: bertil.schmidt@uni-mainz.de Sup…

Statistics and ProbabilityComputer sciencebusiness.industrySequence Analysis RNA16S ribosomal RNAcomputer.software_genreBiochemistryComputer Science ApplicationsComputational MathematicsSoftwareComputational Theory and MathematicsRNA Ribosomal 16SCluster AnalysisMetagenomeData miningCluster analysisbusinessMolecular BiologycomputerSoftwareBioinformatics (Oxford, England)

researchProduct

Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data

2012

Abstract Motivation: The imperfect sequence data produced by next-generation sequencing technologies have motivated the development of a number of short-read error correctors in recent years. The majority of methods focus on the correction of substitution errors, which are the dominant error source in data produced by Illumina sequencing technology. Existing tools either score high in terms of recall or precision but not consistently high in terms of both measures. Results: In this article, we present Musket, an efficient multistage k-mer-based corrector for Illumina short-read data. We use the k-mer spectrum approach and introduce three correction techniques in a multistage workflow: two-s…

Statistics and ProbabilityComputer sciencebusiness.industrySequence assemblySequence Analysis DNAMusketBiochemistryComputer Science ApplicationsComputational MathematicsCUDASoftwareComputational Theory and Mathematicsk-merEscherichia coliChromosomes HumanHumansbusinessFocus (optics)Molecular BiologyAlgorithmAlgorithmsGenome BacterialSoftwareIllumina dye sequencingBioinformatics

researchProduct

MCRL: using a reference library to compress a metagenome into a non-redundant list of sequences, considering viruses as a case study

2019

Abstract Motivation Metagenomes offer a glimpse into the total genomic diversity contained within a sample. Currently, however, there is no straightforward way to obtain a non-redundant list of all putative homologs of a set of reference sequences present in a metagenome. Results To address this problem, we developed a novel clustering approach called ‘metagenomic clustering by reference library’ (MCRL), where a reference library containing a set of reference genes is clustered with respect to an assembled metagenome. According to our proposed approach, reference genes homologous to similar sets of metagenomic sequences, termed ‘signatures’, are iteratively clustered in a greedy fashion, re…

Statistics and ProbabilityContigComputer scienceRobustness (evolution)Computational biologyOriginal PapersBiochemistryComputer Science ApplicationsSet (abstract data type)Computational MathematicsComputational Theory and MathematicsMetagenomicsReference genesGene familyHuman viromeCluster analysisMolecular BiologyBioinformatics

researchProduct

WOODIV, a database of occurrences, functional traits, and phylogenetic data for all Euro-Mediterranean trees

2021

Trees play a key role in the structure and function of many ecosystems worldwide. In the Mediterranean Basin, forests cover approximately 22% of the total land area hosting a large number of endemics (46 species). Despite its particularities and vulnerability, the biodiversity of Mediterranean trees is not well known at the taxonomic, spatial, functional, and genetic levels required for conservation applications. The WOODIV database fills this gap by providing reliable occurrences, four functional traits (plant height, seed mass, wood density, and specific leaf area), and sequences from three DNA-regions (rbcL, matK, and trnH-psbA), together with modelled occurrences and a phylogeny for all…

Statistics and ProbabilityData DescriptorDatabases FactualMediterranean RegionConservation biologySettore BIO/02 - Botanica SistematicaScienceQBiodiversityLibrary and Information SciencesTreesComputer Science ApplicationsEducationBiogeographySettore BIO/03 - Botanica Ambientale E Applicata[SDE]Environmental SciencesForestCommunity ecologyStatistics Probability and UncertaintyForest ecologyEcosystemPhylogenyInformation Systems

researchProduct

Spanish electoral archive. SEA database

2021

This paper introduces the SEA database (acronym for Spanish Electoral Archive). SEA brings together the most complete public repository available to date on Spanish election outcomes. SEA holds all the results recorded from the electoral processes of General (1979–2019), Regional (1989–2021), Local (1979–2019) and European Parliamentary (1987–2019) elections held in Spain since the restoration of democracy in the late 70 s, in addition to other data sets with electoral content. The data are offered for free and is presented in a homogeneous and friendly format. Most of the databases are available for download with data from various electoral levels, including from the ballot box level. This…

Statistics and ProbabilityData DescriptorHistoryDownloadSciencemedia_common.quotation_subject0211 other engineering and technologiesInference02 engineering and technologyLibrary and Information Sciencescomputer.software_genre01 natural sciencesEducation010104 statistics & probabilitySociologyVotingPolitical scienceAcronymSociety0101 mathematicsmedia_commonDatabaseQPolitics021107 urban & regional planningTurnoutDemocracyComputer Science ApplicationsMetadataBallotGovernmentEconomia Mètodes estadísticsStatistics Probability and UncertaintycomputerInformation SystemsScientific Data

researchProduct

A database for the monitoring of thermal anomalies over the Amazon forest and adjacent intertropical oceans

2015

AbstractAdvances in information technologies and accessibility to climate and satellite data in recent years have favored the development of web-based tools with user-friendly interfaces in order to facilitate the dissemination of geo/biophysical products. These products are useful for the analysis of the impact of global warming over different biomes. In particular, the study of the Amazon forest responses to drought have recently received attention by the scientific community due to the occurrence of two extreme droughts and sustained warming over the last decade. Thermal Amazoni@ is a web-based platform for the visualization and download of surface thermal anomalies products over the Ama…

Statistics and ProbabilityData DescriptorRainforestDatabases FactualDownloadOceans and SeasBiomeRainforestLibrary and Information SciencesGlobal WarmingEducationEffects of global warmingServerBaseline (configuration management)Global warmingTropical ecologyComputer Science ApplicationsOceanographyClimatologyEnvironmental scienceSatelliteForest ecologyStatistics Probability and UncertaintyClimate-change impactsSoftwareInformation SystemsScientific Data

researchProduct

Galaxy LIMS for next-generation sequencing.

2013

Abstract Summary: We have developed a laboratory information management system (LIMS) for a next-generation sequencing (NGS) laboratory within the existing Galaxy platform. The system provides lab technicians standard and customizable sample information forms, barcoded submission forms, tracking of input sample quality, multiplex-capable automatic flow cell design and automatically generated sample sheets to aid physical flow cell preparation. In addition, the platform provides the researcher with a user-friendly interface to create a request, submit accompanying samples, upload sample quality measurements and access to the sequencing results. As the LIMS is within the Galaxy platform, the …

Statistics and ProbabilityDatabasebusiness.industryComputer scienceSample (material)Interface (computing)High-Throughput Nucleotide Sequencingcomputer.software_genreBiochemistryDNA sequencingComputer Science ApplicationsWorkflowWorld Wide WebComputational MathematicsUser-Computer InterfaceSoftwareComputational Theory and MathematicsbusinessMolecular BiologycomputerSoftwareInformation SystemsBioinformatics (Oxford, England)

researchProduct

Textual data compression in computational biology: a synopsis.

2009

Abstract Motivation: Textual data compression, and the associated techniques coming from information theory, are often perceived as being of interest for data communication and storage. However, they are also deeply related to classification and data mining and analysis. In recent years, a substantial effort has been made for the application of textual data compression techniques to various computational biology tasks, ranging from storage and indexing of large datasets to comparison and reverse engineering of biological networks. Results: The main focus of this review is on a systematic presentation of the key areas of bioinformatics and computational biology where compression has been use…

Statistics and ProbabilityDatabases Factualbusiness.industryComputer sciencemedia_common.quotation_subjectSearch engine indexingcompression dataComputational BiologyInformation Storage and RetrievalComputational biologyBiochemistryData scienceComputer Science ApplicationsComputational MathematicsPresentationSoftwareComputational Theory and MathematicsBenchmark (computing)businessMolecular BiologyBiological networkSoftwareData compressionmedia_commonBioinformatics (Oxford, England)

researchProduct

Weighted distance-based trees for ranking data

2017

Within the framework of preference rankings, the interest can lie in finding which predictors and which interactions are able to explain the observed preference structures, because preference decisions will usually depend on the characteristics of both the judges and the objects being judged. This work proposes the use of a univariate decision tree for ranking data based on the weighted distances for complete and incomplete rankings, and considers the area under the ROC curve both for pruning and model assessment. Two real and well-known datasets, the SUSHI preference data and the University ranking data, are used to display the performance of the methodology.

Statistics and ProbabilityDecision tree03 medical and health sciences0302 clinical medicine0504 sociology030225 pediatricsPreference dataStatisticsDecision treePruning (decision trees)University ranking dataDistance-based methodMathematicsWeighted distanceApplied Mathematics05 social sciencesUnivariate050401 social sciences methodsSUSHI dataComputer Science Applications1707 Computer Vision and Pattern RecognitionPreferenceComputer Science ApplicationsRankingRanking dataKemeny distanceSettore SECS-S/01 - StatisticaArea under the roc curve

researchProduct

Stochastic Learning for SAT- Encoded Graph Coloring Problems

2010

The graph coloring problem (GCP) is a widely studied combinatorial optimization problem due to its numerous applications in many areas, including time tabling, frequency assignment, and register allocation. The need for more efficient algorithms has led to the development of several GC solvers. In this paper, the authors introduce a team of Finite Learning Automata, combined with the random walk algorithm, using Boolean satisfiability encoding for the GCP. The authors present an experimental analysis of the new algorithm’s performance compared to the random walk technique, using a benchmark set containing SAT-encoding graph coloring test sets.

Statistics and ProbabilityDiscrete mathematicsControl and OptimizationTheoretical computer scienceComparability graphComputer Science ApplicationsGreedy coloringComputational MathematicsEdge coloringComputational Theory and MathematicsModeling and SimulationGraph (abstract data type)Decision Sciences (miscellaneous)Graph coloringFractional coloringGraph factorizationList coloringMathematicsInternational Journal of Applied Metaheuristic Computing

researchProduct