Search results for "Data mining"
Showing 10 of 907 documents
Mixed Non-Parametric and Parametric Estimation Techniques in R Package etasFLP for Earthquakes’ Description
2017
etasFLP is an R package that fits an epidemic-type aftershock sequence (ETAS) model to an earthquake catalog. Non-parametric background seismicity is estimated through a forward predictive likelihood approach, while parametric components of triggered seismicity are estimated through maximum likelihood; the two estimation steps are alternated until convergence, and for each event the probability of being a background event is estimated. The package includes options that allow for a wide range of uses. Methods for plot, summary, and profile are defined for the main output class. The paper provides examples of the package's use together with a description of the underlying R and Fortran routines.
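A toy sketch of the alternating scheme this abstract describes, assuming a flat background rate and given per-event triggered intensities (hypothetical code, not the package's R/Fortran routines): each event's background probability is its background intensity divided by its total intensity, and those probabilities feed back into the background estimate.

```python
def background_probabilities(mu, triggered):
    """Per-event probability of being background: mu / (mu + triggered)."""
    return [m / (m + g) for m, g in zip(mu, triggered)]

def alternate(mu0, triggered, n_iter=50):
    """Alternate between weighting events and re-estimating a flat
    background rate from those weights (a toy stand-in for the
    package's alternated estimation steps)."""
    mu = list(mu0)
    for _ in range(n_iter):
        probs = background_probabilities(mu, triggered)
        mu = [sum(probs) / len(probs)] * len(probs)
    return mu, background_probabilities(mu, triggered)
```

With triggered intensities [0.0, 2.0, 0.5], the event with no triggered contribution is classified as background with probability 1, and events with larger triggered intensities get smaller background probabilities.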
A heuristic method for estimating attribute importance by measuring choice time in a ranking task
2012
The evaluation of a product or service in terms of its attributes has been broadly studied in marketing, management, and decision sciences. However, methods for finding important attributes have theoretical and practical limitations. The former are related to the selection of the most appropriate model; the latter are due to the large number of variables that affect the specific experimental context. This study presents a new methodology that captures attribute preferences from a respondent; in particular, by using the choice time in a ranking task, it makes it possible to obtain the importance weights for several tested attributes indirectly, through a simple, fast, and inexpensive procedure.
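The heuristic can be caricatured in a few lines, under the assumed rule that faster choices signal more important attributes; the normalized inverse choice time used here is an illustrative proxy, not the paper's exact estimator.

```python
def importance_from_choice_times(times):
    """Hypothetical heuristic: attributes chosen faster in a ranking
    task are assumed to be more important. Importance is taken as the
    normalized inverse choice time (an illustrative assumption)."""
    inv = [1.0 / t for t in times]
    total = sum(inv)
    return [v / total for v in inv]
```

For choice times of 1, 2, and 4 seconds, the weights sum to one and decrease with the time taken.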
Tailoring sparse multivariable regression techniques for prognostic single-nucleotide polymorphism signatures.
2011
When seeking prognostic information for patients, modern technologies provide a huge amount of genomic measurements as a starting point. For single-nucleotide polymorphisms (SNPs), there may be more than one million covariates that need to be simultaneously considered with respect to a clinical endpoint. Although the underlying biological problem cannot be solved on the basis of clinical cohorts of only modest size, some important SNPs might still be identified. Sparse multivariable regression techniques have recently become available for automatically identifying prognostic molecular signatures that comprise relatively few covariates and provide reasonable prediction performance. For illus…
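A generic stand-in for the sparse techniques discussed here is the lasso fitted by cyclic coordinate descent, which zeroes out uninformative covariates via soft thresholding (a textbook sketch, not the specific methods compared in the paper).

```python
def soft_threshold(z, g):
    """Soft-thresholding operator used in lasso coordinate updates."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent on raw (unstandardized)
    columns: each coefficient is updated against the partial residual
    that excludes its own feature."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            r = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            norm = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, lam) / norm
    return beta
```

On a toy design where the outcome depends on the first covariate only, the second coefficient is set exactly to zero, which is the "automatic identification" property the abstract refers to.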
An autoregressive approach to spatio-temporal disease mapping
2007
Disease mapping has been a very active research field in recent years. Nevertheless, time trends in risks have been ignored in most of these studies, even though they can provide information of very high epidemiological value. Lately, several spatio-temporal models have been proposed, based either on a parametric description of time trends, on independent risk estimates for every period, or on the definition of the joint covariance matrix for all the periods as a Kronecker product of matrices. This paper offers an autoregressive approach to spatio-temporal disease mapping by fusing ideas from autoregressive time series in order to link information in time and by spatial modelling t…
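The autoregressive idea can be mimicked in a toy simulation, assuming log-risks follow theta[t][i] = rho * theta[t-1][i] + eps[t][i], with innovations averaged over neighboring areas to induce spatial dependence (a crude stand-in for the spatially structured priors such models typically use).

```python
import random

def simulate_ar_st(n_areas, n_periods, rho, neighbors, sd=0.1, seed=1):
    """Simulate area-by-period log-risks with an AR(1) link across
    periods and neighbor-averaged innovations across space (toy model)."""
    rng = random.Random(seed)
    theta = [[rng.gauss(0.0, sd) for _ in range(n_areas)]]
    for _ in range(1, n_periods):
        raw = [rng.gauss(0.0, sd) for _ in range(n_areas)]
        eps = [(raw[i] + sum(raw[j] for j in neighbors[i]))
               / (1 + len(neighbors[i])) for i in range(n_areas)]
        theta.append([rho * theta[-1][i] + eps[i] for i in range(n_areas)])
    return theta
```

The result is one list of log-risks per period; with a fixed seed the simulation is reproducible.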
Prospective analysis of infectious disease surveillance data using syndromic information.
2014
In this paper, we describe a Bayesian hierarchical Poisson model for the prospective analysis of data for infectious diseases. The proposed model consists of two components. The first component describes the behavior of disease during nonepidemic periods and the second component represents the increase in disease counts due to the presence of an epidemic. A novelty of our model formulation is that the parameters describing the spread of epidemics are allowed to vary in both space and time. We also show how syndromic information can be incorporated into the model to provide a better description of the data and more accurate one-step-ahead forecasts. These real-time forecasts can be used to …
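A drastically simplified, non-Bayesian analogue of the two-component idea: model nonepidemic counts as Poisson with a known endemic mean, and flag periods whose counts exceed the corresponding upper quantile (the fixed baseline and the threshold rule are assumptions for illustration, not the paper's hierarchical model).

```python
import math

def poisson_upper_limit(mu, alpha=0.01):
    """Smallest k with P(X <= k) >= 1 - alpha for X ~ Poisson(mu),
    computed by accumulating the pmf."""
    k, pmf = 0, math.exp(-mu)
    cum = pmf
    while cum < 1 - alpha:
        k += 1
        pmf *= mu / k
        cum += pmf
    return k

def flag_epidemic(counts, baseline_mu, alpha=0.01):
    """Flag periods whose count exceeds the endemic Poisson limit."""
    limit = poisson_upper_limit(baseline_mu, alpha)
    return [c > limit for c in counts]
```

With an endemic mean of 5 cases per period, only a clearly elevated count trips the flag.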
Visualizing parameters from loglinear models
2004
This paper presents a graphical display for the parameters resulting from loglinear models. Loglinear models provide a method for analyzing associations between two or several categorical variables and have become widely accepted as a tool for researchers during the last two decades. An important part of the output of any computer program focused on loglinear models is the part devoted to the estimation of the parameters in the model. Traditionally, this output has been presented using tables that indicate the values of the coefficients, the associated standard errors and other related information. Evaluation of these tables can be rather tedious because of the number of values shown as well as their r…
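For the independence loglinear model log m_ij = mu + a_i + b_j, the parameters such tables report can be recovered from the fitted counts under zero-sum constraints; a minimal sketch (a generic textbook computation, not the paper's program):

```python
import math

def independence_params(table):
    """mu, row effects a_i, and column effects b_j of the independence
    loglinear model log m_ij = mu + a_i + b_j (zero-sum constraints)."""
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(table[i][j] for i in range(len(table)))
            for j in range(len(table[0]))]
    # fitted counts under independence, then their logs
    logm = [[math.log(ri * cj / n) for cj in cols] for ri in rows]
    mu = sum(sum(row) for row in logm) / (len(rows) * len(cols))
    a = [sum(row) / len(cols) - mu for row in logm]
    b = [sum(logm[i][j] for i in range(len(rows))) / len(rows) - mu
         for j in range(len(cols))]
    return mu, a, b
```

The effects sum to zero within each margin, and mu + a_i + b_j reproduces the fitted log counts exactly.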
Adaptive reference-free compression of sequence quality scores
2014
Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and hence are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full reso…
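The adaptive idea can be illustrated with a toy k-mer context model: bases that the context predicts correctly get coarsely binned quality scores, while surprising bases keep full resolution (the bin width and prediction rule here are assumptions, far simpler than the paper's compressed-index approach).

```python
from collections import Counter, defaultdict

def build_context_model(reads, k=3):
    """Count which base follows each k-mer across the reads."""
    model = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k):
            model[read[i:i + k]][read[i + k]] += 1
    return model

def compress_qualities(read, quals, model, k=3, bin_size=10):
    """Keep full resolution for surprising bases; coarsely bin the
    quality scores of bases the context model predicts correctly."""
    out = list(quals[:k])                      # no context for the first k bases
    for i in range(k, len(read)):
        ctx = model.get(read[i - k:i])
        predicted = ctx and ctx.most_common(1)[0][0] == read[i]
        out.append((quals[i] // bin_size) * bin_size if predicted else quals[i])
    return out
```

Bases inside a repetitive, predictable read are binned down, while unpredicted bases retain their original scores.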
Using Statistical and Computer Models to Quantify Volcanic Hazards
2009
Risk assessment of rare natural hazards, such as large volcanic block and ash or pyroclastic flows, is addressed. Assessment is approached through a combination of computer modeling, statistical modeling, and extreme-event probability computation. A computer model of the natural hazard is used to provide the needed extrapolation to unseen parts of the hazard space. Statistical modeling of the available data is needed to determine the initializing distribution for exercising the computer model. In dealing with rare events, direct simulations involving the computer model are prohibitively expensive. The solution instead requires a combination of adaptive design of computer model approximation…
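A miniature version of the surrogate idea: build a cheap emulator (here an exact quadratic through three design points, standing in for the statistical approximations the abstract mentions) from a handful of expensive runs, then sample it heavily to estimate an exceedance probability. The model, design points, and threshold below are all invented for illustration.

```python
import random

def expensive_model(x):
    # stand-in for a costly hazard simulation
    return x * x

def quadratic_through(pts):
    """Exact quadratic surrogate through three design points
    (Lagrange interpolation)."""
    (x0, y0), (x1, y1), (x2, y2) = pts
    def surrogate(x):
        return (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
                + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
                + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))
    return surrogate

design = [0.0, 1.0, 2.0]
runs = [(x, expensive_model(x)) for x in design]   # only 3 expensive calls
emu = quadratic_through(runs)

rng = random.Random(0)
samples = [rng.uniform(0.0, 2.0) for _ in range(100_000)]
p_exceed = sum(emu(x) > 1.0 for x in samples) / len(samples)
```

Because x^2 > 1 on exactly half of the uniform(0, 2) input range, the Monte Carlo exceedance estimate lands near 0.5, at a tiny fraction of the cost of running the simulator at every sample.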
Probabilistic Quantification of Hazards: A Methodology Using Small Ensembles of Physics-Based Simulations and Statistical Surrogates
2015
This paper presents a novel approach to assessing the hazard threat to a locale due to a large volcanic avalanche. The methodology combines: (i) mathematical modeling of volcanic mass flows; (ii) field data of avalanche frequency, volume, and runout; (iii) large-scale numerical simulations of flow events; (iv) use of statistical methods to minimize computational costs, and to capture unlikely events; (v) calculation of the probability of a catastrophic flow event over the next T years at a location of interest; and (vi) innovative computational methodology to implement these methods. This unified presentation collects elements that have been separately developed, and incorporates new contri…
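Step (v) can be illustrated with a standard thinned-Poisson calculation, assuming flows arrive as a Poisson process and each is independently catastrophic with some exceedance probability (an illustrative formula, not necessarily the paper's exact computation).

```python
import math

def prob_catastrophe(rate_per_year, p_exceed, years):
    """P(at least one catastrophic flow in `years` years), assuming
    flows arrive as a Poisson process with rate `rate_per_year` and
    each is catastrophic with probability `p_exceed`; the thinned
    process then has rate rate_per_year * p_exceed."""
    return 1.0 - math.exp(-rate_per_year * p_exceed * years)
```

For example, with 0.2 flows per year and a 10% chance that a given flow reaches the location of interest, the 50-year probability is 1 - exp(-1), about 0.63, and it grows monotonically with the horizon.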
Visualizing categorical data in ViSta
2003
The modules in the statistical package ViSta related to categorical data analysis are presented. These modules are: visualization of frequency data with mosaic and bar plots, correspondence analysis, multiple correspondence analysis, and loglinear analysis. All these methods are implemented in ViSta with a strong emphasis on plots and graphical representations of data, as well as on interactivity between the user and the system. Together they provide a system that has proven to be easy, useful, and powerful for both novice and experienced users.