Search results for "Mining"
DySC: software for greedy clustering of 16S rRNA reads.
2012
Summary: Pyrosequencing technologies are frequently used for sequencing the 16S ribosomal RNA marker gene for profiling microbial communities. Clustering of the produced reads is an important but time-consuming task. We present Dynamic Seed-based Clustering (DySC), a new tool based on the greedy clustering approach that uses a dynamic seeding strategy. Evaluations based on the normalized mutual information (NMI) criterion show that DySC produces higher quality clusters than UCLUST and CD-HIT at a comparable runtime. Availability and implementation: DySC, implemented in C, is available at http://code.google.com/p/dysc/ under the GNU GPL license. Contact: bertil.schmidt@uni-mainz.de Sup…
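The greedy clustering approach mentioned in this abstract can be illustrated with a minimal Python sketch. It shows only the generic idea (each read joins the first seed it is similar enough to, otherwise it becomes a new seed) and is not DySC's dynamic seeding strategy. The function names, the position-wise identity measure and the threshold are assumptions made for illustration; real tools use alignment-based or k-mer-based similarity.

```python
def identity(a, b):
    """Toy similarity: fraction of matching positions over the longer length.
    Assumption for illustration only; no alignment is performed."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(reads, threshold=0.97):
    """Assign each read to the first seed it matches, else make it a new seed."""
    # Process longer reads first so that they tend to become seeds.
    order = sorted(range(len(reads)), key=lambda i: -len(reads[i]))
    seeds = []      # indices of seed reads, in creation order
    clusters = {}   # seed index -> list of member read indices
    for i in order:
        for s in seeds:
            if identity(reads[i], reads[s]) >= threshold:
                clusters[s].append(i)
                break
        else:
            # No existing seed was close enough: this read starts a new cluster.
            seeds.append(i)
            clusters[i] = [i]
    return clusters


reads = ["ACGTACGT", "ACGTACGA", "TTTTCCCC"]
print(greedy_cluster(reads, threshold=0.8))
```

Because each read is compared only against seeds, the result depends on the processing order, which is one reason seeding strategies matter for cluster quality.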
Recurrence Plots in Nonlinear Time Series Analysis: Free Software
2002
Recurrence plots are graphical devices specially suited to detect hidden dynamical patterns and nonlinearities in data. However, there are few programs available to apply such a methodology. This paper reviews one of the best free programs to apply nonlinear time series analysis: Visual Recurrence Analysis (VRA). This program is targeted to recurrence analysis and the so-called Recurrence Quantitative Analysis (RQA, the quantitative counterpart of recurrence plots), although it includes many procedures in a friendly visual environment. Comparisons with alternative programs are performed.
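As a rough illustration of what a recurrence plot encodes, the sketch below builds a binary recurrence matrix for a scalar series and computes the recurrence rate, one of the simplest RQA measures. It is not taken from VRA. It assumes no time-delay embedding (embedding dimension 1), and the function names and the threshold `eps` are hypothetical.

```python
def recurrence_matrix(series, eps):
    """R[i][j] = 1 when the states at times i and j are within eps of each other."""
    n = len(series)
    return [
        [1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


def recurrence_rate(matrix):
    """Fraction of recurrent points in the matrix (main diagonal included)."""
    n = len(matrix)
    return sum(sum(row) for row in matrix) / (n * n)


series = [0.0, 1.0, 0.0, 1.0]
R = recurrence_matrix(series, eps=0.1)
for row in R:
    print("".join("#" if v else "." for v in row))
print(recurrence_rate(R))
```

For this strictly periodic toy series the plot shows the diagonal lines typical of periodic dynamics. Some definitions of the recurrence rate exclude the main diagonal, which would change the value.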
Visualizing the flow of evidence in network meta-analysis and characterizing mixed treatment comparisons
2013
Network meta-analysis techniques allow for pooling evidence from different studies with only partially overlapping designs for getting a broader basis for decision support. The results are network-based effect estimates that take indirect evidence into account for all pairs of treatments. The results critically depend on homogeneity and consistency assumptions, which are sometimes difficult to investigate. To support such evaluation, we propose a display of the flow of evidence and introduce new measures that characterize the structure of a mixed treatment comparison. Specifically, a linear fixed effects model for network meta-analysis is considered, where the network estimates for two trea…
Mixed Non-Parametric and Parametric Estimation Techniques in R Package etasFLP for Earthquakes’ Description
2017
etasFLP is an R package which fits an epidemic type aftershock sequence (ETAS) model to an earthquake catalog; non-parametric background seismicity can be estimated through a forward predictive likelihood approach, while parametric components of triggered seismicity are estimated through maximum likelihood; estimation steps are alternated until convergence is obtained and for each event the probability of being a background event is estimated. The package includes options which allow its wide use. Methods for plot, summary and profile are defined for the main output class object. The paper provides examples of the package's use with description of the underlying R and Fortran routines.
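To make the "probability of being a background event" concrete, here is a simplified Python sketch. It uses a purely temporal ETAS intensity with a constant background rate, whereas the package described above fits a space-time model with a non-parametric background. None of this is the etasFLP API; the function names and the parameters `mu`, `K`, `alpha`, `c`, `p` and `m0` are assumptions following common ETAS notation.

```python
import math


def etas_intensity(t, times, mags, mu, K, alpha, c, p, m0):
    """Temporal ETAS conditional intensity at time t:
    background rate mu plus the triggered contribution of every earlier event."""
    triggered = sum(
        K * math.exp(alpha * (m_j - m0)) * (t - t_j + c) ** (-p)
        for t_j, m_j in zip(times, mags)
        if t_j < t
    )
    return mu + triggered


def background_probabilities(times, mags, mu, K, alpha, c, p, m0):
    """For each event, the share of the intensity at its time due to the background."""
    return [
        mu / etas_intensity(t_i, times, mags, mu, K, alpha, c, p, m0)
        for t_i in times
    ]


# Two events of threshold magnitude, one time unit apart (made-up values).
probs = background_probabilities(
    times=[0.0, 1.0], mags=[3.0, 3.0],
    mu=1.0, K=1.0, alpha=0.0, c=1.0, p=1.0, m0=3.0,
)
print(probs)
```

The first event has no predecessors, so all of its intensity comes from the background. The second event's probability drops because the first event contributes triggered intensity at its time.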
A heuristic method for estimating attribute importance by measuring choice time in a ranking task
2012
The evaluation of a product or service in terms of its attributes has been broadly studied in marketing, management and decision sciences. However, methods for finding important attributes have theoretical and practical limitations. The former are related to the selection of the most appropriate model; the latter are due to the large number of variables that affect the specific experimental context. This study aims to present a new methodology that captures attribute preferences from a respondent and, in particular, by using the choice time in a ranking task, allows the importance weights for several tested attributes to be obtained indirectly through a simple, fast and inexpensive procedure. More…
Tailoring sparse multivariable regression techniques for prognostic single-nucleotide polymorphism signatures.
2011
When seeking prognostic information for patients, modern technologies provide a huge amount of genomic measurements as a starting point. For single-nucleotide polymorphisms (SNPs), there may be more than one million covariates that need to be simultaneously considered with respect to a clinical endpoint. Although the underlying biological problem cannot be solved on the basis of clinical cohorts of only modest size, some important SNPs might still be identified. Sparse multivariable regression techniques have recently become available for automatically identifying prognostic molecular signatures that comprise relatively few covariates and provide reasonable prediction performance. For illus…
An autoregressive approach to spatio-temporal disease mapping
2007
Disease mapping has been a very active research field during recent years. Nevertheless, time trends in risks have been ignored in most of these studies, yet they can provide information with a very high epidemiological value. Lately, several spatio-temporal models have been proposed, either based on a parametric description of time trends, on independent risk estimates for every period, or on the definition of the joint covariance matrix for all the periods as a Kronecker product of matrices. The following paper offers an autoregressive approach to spatio-temporal disease mapping by fusing ideas from autoregressive time series in order to link information in time and by spatial modelling t…
Prospective analysis of infectious disease surveillance data using syndromic information.
2014
In this paper, we describe a Bayesian hierarchical Poisson model for the prospective analysis of data for infectious diseases. The proposed model consists of two components. The first component describes the behavior of disease during nonepidemic periods and the second component represents the increase in disease counts due to the presence of an epidemic. A novelty of our model formulation is that the parameters describing the spread of epidemics are allowed to vary in both space and time. We also show how syndromic information can be incorporated into the model to provide a better description of the data and more accurate one-step-ahead forecasts. These real-time forecasts can be used to …
Visualizing parameters from loglinear models
2004
This paper presents a graphical display for the parameters resulting from loglinear models. Loglinear models provide a method for analyzing associations between two or several categorical variables and have become widely accepted as a tool for researchers during the last two decades. An important part of the output of any computer program focused on loglinear models is that devoted to estimation of parameters in the model. Traditionally, this output has been presented using tables that indicate the values of the coefficients, the associated standard errors and other related information. Evaluation of these tables can be rather tedious because of the number of values shown as well as their r…
Adaptive reference-free compression of sequence quality scores
2014
Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and hence are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full reso…