Search results for "Mining"


DySC: software for greedy clustering of 16S rRNA reads.

2012

Summary: Pyrosequencing technologies are frequently used to sequence the 16S ribosomal RNA marker gene for profiling microbial communities. Clustering the resulting reads is an important but time-consuming task. We present Dynamic Seed-based Clustering (DySC), a new tool based on the greedy clustering approach that uses a dynamic seeding strategy. Evaluations based on the normalized mutual information (NMI) criterion show that DySC produces higher-quality clusters than UCLUST and CD-HIT at a comparable runtime. Availability and implementation: DySC, implemented in C, is available at http://code.google.com/p/dysc/ under the GNU GPL license. Contact: bertil.schmidt@uni-mainz.de Sup…

Keywords: Statistics and Probability; Computer science; Sequence Analysis, RNA; 16S ribosomal RNA; Biochemistry; Computer Science Applications; Computational Mathematics; Software; Computational Theory and Mathematics; RNA, Ribosomal, 16S; Cluster Analysis; Metagenome; Data mining; Molecular Biology
Journal: Bioinformatics (Oxford, England)
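The greedy strategy that DySC builds on can be sketched as follows. This is a minimal sketch of plain greedy clustering with fixed seeds, the general scheme behind tools such as UCLUST and CD-HIT; DySC's dynamic seeding is not reproduced. The function names, the 0.97 default threshold, and the use of `difflib.SequenceMatcher.ratio()` as a stand-in for alignment-based sequence identity are assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude identity proxy in [0, 1]; a real tool would use an alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def greedy_cluster(reads, threshold=0.97):
    """Fixed-seed greedy clustering: each read joins the first seed it matches."""
    # Longest reads first; Python's sort is stable, so ties keep input order.
    order = sorted(reads, key=len, reverse=True)
    seeds = []
    clusters = []
    for read in order:
        for i, seed in enumerate(seeds):
            if similarity(read, seed) >= threshold:
                clusters[i].append(read)
                break
        else:
            # No existing seed is similar enough: the read becomes a new seed.
            seeds.append(read)
            clusters.append([read])
    return clusters


clusters = greedy_cluster(["ACGTACGTAA", "ACGTACGTAT", "TTTTGGGGCC"], threshold=0.85)
```

Because each read is compared only against existing seeds, the cost grows with the number of clusters rather than with all read pairs, which is what makes greedy clustering fast; the price is that the result depends on read order and on which reads happen to become seeds.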


Recurrence Plots in Nonlinear Time Series Analysis: Free Software

2002

Recurrence plots are graphical devices especially suited to detecting hidden dynamical patterns and nonlinearities in data. However, few programs are available for applying this methodology. This paper reviews one of the best free programs for nonlinear time series analysis: Visual Recurrence Analysis (VRA). The program is targeted at recurrence analysis and the so-called Recurrence Quantification Analysis (RQA, the quantitative counterpart of recurrence plots), although it also includes many other procedures in a friendly visual environment. Comparisons with alternative programs are also presented.

Keywords: Statistics and Probability; Computer science; Nonlinear time series analysis; Software; Quantitative analysis (finance); Statistics; Data mining; Statistics, Probability and Uncertainty
Journal: Journal of Statistical Software
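A recurrence plot marks every pair of time points whose states are close to each other. The sketch below is not VRA's implementation, and its function names are hypothetical: it embeds a scalar series with time delays, builds the binary recurrence matrix under the maximum norm with an assumed threshold `eps`, and computes the recurrence rate, the simplest RQA measure.

```python
def embed(x, m=1, tau=1):
    """Time-delay embedding: state t is (x[t], x[t + tau], ..., x[t + (m - 1) * tau])."""
    n = len(x) - (m - 1) * tau
    return [tuple(x[t + k * tau] for k in range(m)) for t in range(max(n, 0))]


def recurrence_matrix(states, eps):
    """R[i][j] = 1 when states i and j lie within eps of each other (maximum norm)."""
    return [
        [1 if max(abs(a - b) for a, b in zip(p, q)) <= eps else 0 for q in states]
        for p in states
    ]


def recurrence_rate(r):
    """Share of recurrent points, main diagonal included."""
    n = len(r)
    return sum(map(sum, r)) / (n * n) if n else 0.0


r = recurrence_matrix(embed([0, 1, 0, 1]), eps=0.5)
```

Plotting `r` as a black-and-white image gives the recurrence plot itself. Conventions differ on whether the main diagonal (the line of identity) is counted in RQA measures; this sketch includes it.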


Visualizing the flow of evidence in network meta-analysis and characterizing mixed treatment comparisons

2013

Network meta-analysis techniques allow evidence to be pooled from different studies with only partially overlapping designs, giving a broader basis for decision support. The results are network-based effect estimates that take indirect evidence into account for all pairs of treatments. These results depend critically on homogeneity and consistency assumptions, which are sometimes difficult to investigate. To support such evaluation, we propose a display of the flow of evidence and introduce new measures that characterize the structure of a mixed treatment comparison. Specifically, a linear fixed effects model for network meta-analysis is considered, where the network estimates for two trea…

Keywords: Statistics and Probability; Decision support system; Epidemiology; Computer science; Homogeneity (statistics); Pooling; Linear model; Fixed effects model; Directed acyclic graph; Path length; Data mining; Linear combination
Journal: Statistics in Medicine
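To make the combination of direct and indirect evidence concrete, here is a minimal fixed-effect sketch for a single triangle of treatments A, B and C, assuming independent trials and consistency. It forms the adjusted indirect estimate of C versus B through the common comparator A and pools it with the direct estimate by inverse-variance weighting; the returned weight shares give a crude picture of how much each path contributes. This is not the paper's flow-of-evidence display or its proposed measures, and the function names are hypothetical.

```python
def indirect_estimate(d_ab, v_ab, d_ac, v_ac):
    """Adjusted indirect estimate of C vs B via common comparator A.

    d_ab is the effect of B relative to A, d_ac the effect of C relative to A;
    with independent trials the variances add.
    """
    return d_ac - d_ab, v_ab + v_ac


def pool(estimates):
    """Fixed-effect inverse-variance pooling of (estimate, variance) pairs.

    Returns the pooled estimate, its variance, and each input's weight share.
    """
    weights = [1.0 / v for _, v in estimates]
    total = sum(weights)
    est = sum(w * d for w, (d, _) in zip(weights, estimates)) / total
    return est, 1.0 / total, [w / total for w in weights]


direct_bc = (1.0, 1.0)                          # (estimate, variance) from B-vs-C trials
indirect_bc = indirect_estimate(0.5, 0.5, 2.0, 0.5)
network_bc, network_var, shares = pool([direct_bc, indirect_bc])
```

Here the indirect path has the same variance as the direct one, so each contributes half of the network estimate; pooling is only meaningful if the direct and indirect estimates are consistent.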


Mixed Non-Parametric and Parametric Estimation Techniques in R Package etasFLP for Earthquakes’ Description

2017

etasFLP is an R package that fits an epidemic type aftershock sequence (ETAS) model to an earthquake catalog. Non-parametric background seismicity can be estimated through a forward predictive likelihood approach, while the parametric components of triggered seismicity are estimated by maximum likelihood; the estimation steps are alternated until convergence, and for each event the probability of being a background event is estimated. The package includes options that make it widely applicable. Methods for plot, summary and profile are defined for the main output class object. The paper provides examples of the package's use together with a description of the underlying R and Fortran routines.

Keywords: etasFLP; R; Fortran; point process; ETAS; earthquakes; Statistics and Probability; Earthquake; Computer science; Induced seismicity; 010502 geochemistry & geophysics; 01 natural sciences; Plot (graphics); Physics::Geophysics; 010104 statistics & probability; 0101 mathematics; Aftershock; 0105 earth and related environmental sciences; Event (probability theory); Parametric statistics; Nonparametric statistics; Data mining; Statistics, Probability and Uncertainty; Sector SECS-S/01 - Statistics; Algorithm; Software
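The ETAS model can be illustrated with its purely temporal conditional intensity, lambda(t) = mu + sum over t_i < t of K * exp(alpha * (m_i - m0)) / (t - t_i + c)^p, where each past event of magnitude m_i adds an Omori-type aftershock term. The sketch below is not the etasFLP API: it assumes a constant background rate mu instead of the package's non-parametric spatial background, drops the spatial component, and takes the probability that event j is a background event as mu / lambda(t_j). All names and parameter values are illustrative.

```python
from math import exp


def etas_intensity(t, events, mu, k, alpha, c, p, m0):
    """Temporal ETAS conditional intensity at time t.

    events is a list of (time, magnitude) pairs; only events strictly before t
    contribute an Omori-type aftershock term scaled by exp(alpha * (m - m0)).
    """
    triggered = sum(
        k * exp(alpha * (m - m0)) / (t - ti + c) ** p
        for ti, m in events
        if ti < t
    )
    return mu + triggered


def background_probabilities(events, mu, **params):
    """Probability that each event is a background event: mu / lambda(t_j)."""
    return [mu / etas_intensity(tj, events, mu, **params) for tj, _ in events]


catalog = [(0.0, 3.0), (1.0, 3.0)]  # hypothetical (time, magnitude) pairs
params = dict(k=1.0, alpha=1.0, c=1.0, p=1.0, m0=3.0)
probs = background_probabilities(catalog, 0.5, **params)
```

The first event has no predecessors, so it is attributed entirely to the background; the second is split evenly between background and triggering by the first.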


A heuristic method for estimating attribute importance by measuring choice time in a ranking task

2012

The evaluation of a product or service in terms of its attributes has been broadly studied in marketing, management and decision sciences. However, methods for finding important attributes have theoretical and practical limitations. The former are related to the selection of the most appropriate model; the latter are due to the large number of variables that affect the specific experimental context. This study presents a new methodology that captures a respondent's attribute preferences; in particular, by using the choice time in a ranking task, it obtains the importance weights of several tested attributes indirectly, through a simple, fast and inexpensive procedure. More…

Keywords: Statistics and Probability; Economics and Econometrics; Service (systems architecture); Heuristic; Computer science; Sector SECS-S/02 - Statistics for Experimental and Technological Research; Variable and attribute; Context (language use); Task (project management); Ranking; Respondent; Data mining; Statistics, Probability and Uncertainty; Sector SECS-S/01 - Statistics; Finance; Selection (genetic algorithm); choice time; response time; response latency; attribute rating; choice models


Tailoring sparse multivariable regression techniques for prognostic single-nucleotide polymorphism signatures.

2011

When seeking prognostic information for patients, modern technologies provide a huge amount of genomic measurements as a starting point. For single-nucleotide polymorphisms (SNPs), there may be more than one million covariates that need to be simultaneously considered with respect to a clinical endpoint. Although the underlying biological problem cannot be solved on the basis of clinical cohorts of only modest size, some important SNPs might still be identified. Sparse multivariable regression techniques have recently become available for automatically identifying prognostic molecular signatures that comprise relatively few covariates and provide reasonable prediction performance. For illus…

Keywords: Statistics and Probability; Epidemiology; Computer science; Feature selection; Biostatistics; Polymorphism, Single Nucleotide; Lasso (statistics); Gene Frequency; Resampling; Covariate; Humans; Likelihood Functions; Models, Statistical; Multivariable calculus; Regression analysis; Genomics; Prognosis; Regression; Minor allele frequency; Leukemia, Myeloid, Acute; Multivariate Analysis; Data mining; Algorithms
Journal: Statistics in Medicine
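Sparse regression techniques such as the lasso shrink most coefficients exactly to zero, leaving a small signature of selected covariates. As a minimal sketch for a continuous outcome (not the survival-type endpoints or the specific techniques tailored in the paper), the lasso objective (1/2n) * ||y - Xb||^2 + lam * ||b||_1 can be minimized by cyclic coordinate descent with soft-thresholding. The function names and the fixed iteration count are assumptions; in practice covariates are usually standardized first.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every feature's contribution except j's.
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j) for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return b


X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [2, 0, 2, 0]
coef = lasso_cd(X, y, lam=0.25)
```

With these orthogonal columns the unpenalized fit is b = [2, 0]; the penalty shrinks the first coefficient to 1.5 and keeps the irrelevant one at exactly zero, and a large enough `lam` zeroes both.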


An autoregressive approach to spatio-temporal disease mapping

2007

Disease mapping has been a very active research field in recent years. Nevertheless, most of these studies have ignored time trends in risk, even though such trends can provide information of high epidemiological value. Several spatio-temporal models have recently been proposed, based either on a parametric description of time trends, on independent risk estimates for every period, or on a joint covariance matrix for all periods defined as a Kronecker product of matrices. This paper proposes an autoregressive approach to spatio-temporal disease mapping by fusing ideas from autoregressive time series in order to link information in time and by spatial modelling t…

Keywords: Statistics and Probability; Epidemiology; Computer science; Bayesian statistics; Spatial statistics; Bayes' theorem; Markov random fields; Econometrics; Disease; Spatial analysis; Parametric statistics; Demography; Kronecker product; Covariance matrix; Field (geography); Epidemiologic Studies; Autoregressive model; Spain; Regression Analysis; Data mining
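The temporal link can be sketched as an AR(1) recursion applied to region-level random effects: phi_t = rho * phi_(t-1) + s_t, where each innovation vector s_t would carry the spatial structure (for example, a draw from a Markov random field prior). The hypothetical function below only performs this deterministic recursion for given innovations. The Bayesian estimation, the spatial prior, and the first-period scaling by 1 / sqrt(1 - rho^2), which keeps the marginal variance constant over time, are assumptions of this sketch rather than details taken from the abstract.

```python
from math import sqrt


def ar1_spatial_effects(innovations, rho):
    """Link per-region effects over time: phi[t] = rho * phi[t - 1] + innovations[t].

    innovations[t] holds one value per region for period t (in a full model these
    would be spatially structured). The first period is scaled by 1/sqrt(1 - rho^2)
    so that, for unit-variance innovations, every period has the same variance.
    """
    if not -1 < rho < 1:
        raise ValueError("|rho| must be < 1 for a stationary AR(1)")
    phi = [[v / sqrt(1 - rho ** 2) for v in innovations[0]]]
    for s in innovations[1:]:
        phi.append([rho * prev + v for prev, v in zip(phi[-1], s)])
    return phi


phi = ar1_spatial_effects([[0.8, 1.6], [1.0, 0.0]], rho=0.6)
```

With rho = 0 the periods are independent (each period is just its own innovation); as rho approaches 1, each region's risk pattern is carried over more strongly from one period to the next.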


Prospective analysis of infectious disease surveillance data using syndromic information.

2014

In this paper, we describe a Bayesian hierarchical Poisson model for the prospective analysis of data for infectious diseases. The proposed model consists of two components. The first component describes the behavior of disease during nonepidemic periods and the second component represents the increase in disease counts due to the presence of an epidemic. A novelty of our model formulation is that the parameters describing the spread of epidemics are allowed to vary in both space and time. We also show how syndromic information can be incorporated into the model to provide a better description of the data and more accurate one-step-ahead forecasts. These real-time forecasts can be used to …

Keywords: Statistics and Probability; Epidemiology; South Carolina; Bayesian probability; Disease; Communicable Diseases; Prospective analysis; Health Information Management; Medicine; Humans; Poisson regression; Prospective Studies; Bronchitis; Novelty; Outbreak; Bayes Theorem; Models, Theoretical; Infectious disease (medical specialty); Population Surveillance; Targeted surveillance; Data mining
Journal: Statistical Methods in Medical Research
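A much-simplified, non-Bayesian reading of the two-component structure: the expected count is an endemic rate plus an epidemic increase that is switched on only during epidemic periods, and an observed count can be compared with a Poisson forecast through its upper-tail probability. Here the parameters are treated as known numbers, whereas the paper lets the epidemic parameters vary in space and time within a hierarchical model; the function names are hypothetical.

```python
from math import exp


def expected_count(endemic_rate, epidemic_increase, epidemic_on):
    """Mean count: endemic component plus an epidemic component when one is active."""
    return endemic_rate + (epidemic_increase if epidemic_on else 0.0)


def poisson_upper_tail(y, mean):
    """P(Y >= y) for Y ~ Poisson(mean), computed as 1 - P(Y <= y - 1)."""
    term = exp(-mean)  # P(Y = 0)
    cdf = 0.0
    for k in range(y):
        cdf += term             # add P(Y = k)
        term *= mean / (k + 1)  # advance to P(Y = k + 1)
    return max(0.0, 1.0 - cdf)


endemic_mean = expected_count(2.0, 3.0, epidemic_on=False)
epidemic_mean = expected_count(2.0, 3.0, epidemic_on=True)
```

A small value of `poisson_upper_tail(observed, endemic_mean)` means the observed count is unusually high for a non-epidemic period, which is one simple way a one-step-ahead forecast could feed an alert rule.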


Visualizing parameters from loglinear models

2004

This paper presents a graphical display for the parameters resulting from loglinear models. Loglinear models provide a method for analyzing associations between two or more categorical variables and have become widely accepted as a tool for researchers during the last two decades. An important part of the output of any computer program focused on loglinear models is the part devoted to parameter estimation. Traditionally, this output has been presented using tables that indicate the values of the coefficients, the associated standard errors and other related information. Evaluation of these tables can be rather tedious because of the number of values shown as well as their r…

Keywords: Statistics and Probability; Estimation; Structure (mathematical logic); Computer program; Computer science; Graphical display; Computational Mathematics; Standard error; Log-linear model; Data mining; Statistics, Probability and Uncertainty; Statistical graphics; Categorical variable
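The parameters such a display summarizes can be computed directly for a saturated two-way loglinear model, log n_ij = lambda + lambda^A_i + lambda^B_j + lambda^AB_ij, under sum-to-zero (effect) coding. This minimal sketch assumes all cell counts are positive and does not compute standard errors; the function name is hypothetical and the paper's graphical display is not reproduced.

```python
from math import log


def saturated_loglinear(table):
    """Effect-coded parameters of log n_ij = lam + a_i + b_j + ab_ij.

    Each set of main effects sums to zero, and the interaction terms sum to
    zero across every row and every column. All counts must be positive.
    """
    logs = [[log(v) for v in row] for row in table]
    n_rows, n_cols = len(logs), len(logs[0])
    lam = sum(map(sum, logs)) / (n_rows * n_cols)                      # grand mean
    a = [sum(row) / n_cols - lam for row in logs]                      # row effects
    b = [sum(logs[i][j] for i in range(n_rows)) / n_rows - lam for j in range(n_cols)]
    ab = [[logs[i][j] - lam - a[i] - b[j] for j in range(n_cols)] for i in range(n_rows)]
    return lam, a, b, ab


lam, a, b, ab = saturated_loglinear([[2, 4], [4, 8]])
```

For an independent table such as [[2, 4], [4, 8]], every interaction term is zero up to rounding, so a plot of the interaction parameters would be flat while the row and column effects remain visible.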


Adaptive reference-free compression of sequence quality scores

2014

Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and hence are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full reso…

Keywords: BWT; compression; quality score; reference-free compression; Statistics and Probability; FOS: Computer and information sciences; Computer science; Reference-free; Biochemistry; DNA sequencing; Set (abstract data type); Redundancy (information theory); Computer Science - Data Structures and Algorithms; Code (cryptography); Animals; Humans; Quality (business); Data Structures and Algorithms (cs.DS); Quantitative Biology - Genomics; Caenorhabditis elegans; Molecular Biology; Genomics (q-bio.GN); Sequence; Genome; Sector INF/01 - Computer Science; High-Throughput Nucleotide Sequencing; Genomics; Sequence Analysis, DNA; Data Compression; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; FOS: Biological sciences; Data mining; Metagenomics; Algorithms; Reference genome
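The idea of keeping full-resolution quality scores only for bases that are hard to predict can be sketched with a simple k-mer context table standing in for the BWT-based compressed index the abstract describes. In this hypothetical helper, a base counts as predictable when its preceding k-mer occurs at least `min_support` times across the read set and the base is that context's most frequent successor; its quality character is then replaced by a constant, which compresses well. The values of `k`, `min_support` and the replacement character are assumptions.

```python
from collections import Counter, defaultdict


def smooth_qualities(reads, quals, k=3, min_support=2, replacement="#"):
    """Replace quality characters of bases predictable from their preceding k-mer.

    reads and quals are parallel lists of equal-length strings. The first k
    bases of each read have no full context and always keep their qualities.
    """
    # Count which base follows each k-mer context across the whole read set.
    successors = defaultdict(Counter)
    for read in reads:
        for i in range(k, len(read)):
            successors[read[i - k:i]][read[i]] += 1

    smoothed = []
    for read, qual in zip(reads, quals):
        q = list(qual)
        for i in range(k, len(read)):
            counts = successors[read[i - k:i]]
            best_base, _ = counts.most_common(1)[0]
            if sum(counts.values()) >= min_support and best_base == read[i]:
                q[i] = replacement
        smoothed.append("".join(q))
    return smoothed


reads = ["ACGTA", "ACGTA", "ACGTC"]
out = smooth_qualities(reads, ["IIIII"] * 3)
```

`Counter.most_common` orders tied counts by first occurrence, so for contexts with tied successors the prediction depends on read order; the base that disagrees with the majority (the final `C` of the third read) keeps its original quality.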
