Search results for "Data mining"
Showing 10 of 907 documents
Mixed Non-Parametric and Parametric Estimation Techniques in R Package etasFLP for Earthquakes’ Description
2017
etasFLP is an R package that fits an epidemic-type aftershock sequence (ETAS) model to an earthquake catalog. Non-parametric background seismicity is estimated through a forward predictive likelihood approach, while parametric components of triggered seismicity are estimated through maximum likelihood; the two estimation steps are alternated until convergence, and for each event the probability of being a background event is estimated. The package includes options that allow for a wide range of uses. Methods for plot, summary, and profile are defined for the main output class. The paper provides examples of the package's use together with a description of the underlying R and Fortran routines.
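A toy sketch of the alternating scheme this abstract describes, assuming a flat background rate and given per-event triggered intensities (hypothetical code, not the package's R/Fortran routines): each event's background probability is its background intensity divided by its total intensity, and those probabilities feed back into the background estimate.

```python
def background_probabilities(mu, triggered):
    """Per-event probability of being background: mu / (mu + triggered)."""
    return [m / (m + g) for m, g in zip(mu, triggered)]

def alternate(mu0, triggered, n_iter=50):
    """Alternate between weighting events and re-estimating a flat
    background rate from those weights (a toy stand-in for the
    package's alternated estimation steps)."""
    mu = list(mu0)
    for _ in range(n_iter):
        probs = background_probabilities(mu, triggered)
        mu = [sum(probs) / len(probs)] * len(probs)
    return mu, background_probabilities(mu, triggered)
```

With triggered intensities [0.0, 2.0, 0.5], the event with no triggered contribution is classified as background with probability 1, and events with larger triggered intensities get smaller background probabilities.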
A heuristic method for estimating attribute importance by measuring choice time in a ranking task
2012
The evaluation of a product or service in terms of its attributes has been broadly studied in marketing, management, and decision sciences. However, methods for finding important attributes have theoretical and practical limitations. The former are related to the selection of the most appropriate model; the latter are due to the large number of variables that affect the specific experimental context. This study presents a new methodology that captures attribute preferences from a respondent; in particular, by using the choice time in a ranking task, it makes it possible to obtain the importance weights for several tested attributes indirectly, through a simple, fast, and inexpensive procedure.
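The heuristic can be caricatured in a few lines, under the assumed rule that faster choices signal more important attributes; the normalized inverse choice time used here is an illustrative proxy, not the paper's exact estimator.

```python
def importance_from_choice_times(times):
    """Hypothetical heuristic: attributes chosen faster in a ranking
    task are assumed to be more important. Importance is taken as the
    normalized inverse choice time (an illustrative assumption)."""
    inv = [1.0 / t for t in times]
    total = sum(inv)
    return [v / total for v in inv]
```

For choice times of 1, 2, and 4 seconds, the weights sum to one and decrease with the time taken.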
Tailoring sparse multivariable regression techniques for prognostic single-nucleotide polymorphism signatures.
2011
When seeking prognostic information for patients, modern technologies provide a huge amount of genomic measurements as a starting point. For single-nucleotide polymorphisms (SNPs), there may be more than one million covariates that need to be simultaneously considered with respect to a clinical endpoint. Although the underlying biological problem cannot be solved on the basis of clinical cohorts of only modest size, some important SNPs might still be identified. Sparse multivariable regression techniques have recently become available for automatically identifying prognostic molecular signatures that comprise relatively few covariates and provide reasonable prediction performance. For illus…
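A generic stand-in for the sparse techniques discussed here is the lasso fitted by cyclic coordinate descent, which zeroes out uninformative covariates via soft thresholding (a textbook sketch, not the specific methods compared in the paper).

```python
def soft_threshold(z, g):
    """Soft-thresholding operator used in lasso coordinate updates."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent on raw (unstandardized)
    columns: each coefficient is updated against the partial residual
    that excludes its own feature."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            r = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            norm = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, lam) / norm
    return beta
```

On a toy design where the outcome depends on the first covariate only, the second coefficient is set exactly to zero, which is the "automatic identification" property the abstract refers to.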
An autoregressive approach to spatio-temporal disease mapping
2007
Disease mapping has been a very active research field in recent years. Nevertheless, time trends in risks have been ignored in most of these studies, even though they can provide information of very high epidemiological value. Lately, several spatio-temporal models have been proposed, based either on a parametric description of time trends, on independent risk estimates for every period, or on the definition of the joint covariance matrix for all the periods as a Kronecker product of matrices. This paper offers an autoregressive approach to spatio-temporal disease mapping by fusing ideas from autoregressive time series in order to link information in time and by spatial modelling t…
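The autoregressive idea can be mimicked in a toy simulation, assuming log-risks follow theta[t][i] = rho * theta[t-1][i] + eps[t][i], with innovations averaged over neighboring areas to induce spatial dependence (a crude stand-in for the spatially structured priors such models typically use).

```python
import random

def simulate_ar_st(n_areas, n_periods, rho, neighbors, sd=0.1, seed=1):
    """Simulate area-by-period log-risks with an AR(1) link across
    periods and neighbor-averaged innovations across space (toy model)."""
    rng = random.Random(seed)
    theta = [[rng.gauss(0.0, sd) for _ in range(n_areas)]]
    for _ in range(1, n_periods):
        raw = [rng.gauss(0.0, sd) for _ in range(n_areas)]
        eps = [(raw[i] + sum(raw[j] for j in neighbors[i]))
               / (1 + len(neighbors[i])) for i in range(n_areas)]
        theta.append([rho * theta[-1][i] + eps[i] for i in range(n_areas)])
    return theta
```

The result is one list of log-risks per period; with a fixed seed the simulation is reproducible.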
Prospective analysis of infectious disease surveillance data using syndromic information.
2014
In this paper, we describe a Bayesian hierarchical Poisson model for the prospective analysis of data for infectious diseases. The proposed model consists of two components. The first component describes the behavior of disease during nonepidemic periods and the second component represents the increase in disease counts due to the presence of an epidemic. A novelty of our model formulation is that the parameters describing the spread of epidemics are allowed to vary in both space and time. We also show how syndromic information can be incorporated into the model to provide a better description of the data and more accurate one-step-ahead forecasts. These real-time forecasts can be used to …
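A drastically simplified, non-Bayesian analogue of the two-component idea: model nonepidemic counts as Poisson with a known endemic mean, and flag periods whose counts exceed the corresponding upper quantile (the fixed baseline and the threshold rule are assumptions for illustration, not the paper's hierarchical model).

```python
import math

def poisson_upper_limit(mu, alpha=0.01):
    """Smallest k with P(X <= k) >= 1 - alpha for X ~ Poisson(mu),
    computed by accumulating the pmf."""
    k, pmf = 0, math.exp(-mu)
    cum = pmf
    while cum < 1 - alpha:
        k += 1
        pmf *= mu / k
        cum += pmf
    return k

def flag_epidemic(counts, baseline_mu, alpha=0.01):
    """Flag periods whose count exceeds the endemic Poisson limit."""
    limit = poisson_upper_limit(baseline_mu, alpha)
    return [c > limit for c in counts]
```

With an endemic mean of 5 cases per period, only a clearly elevated count trips the flag.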
Visualizing parameters from loglinear models
2004
This paper presents a graphical display for the parameters resulting from loglinear models. Loglinear models provide a method for analyzing associations between two or several categorical variables and have become widely accepted as a tool for researchers during the last two decades. An important part of the output of any computer program focused on loglinear models is the part devoted to the estimation of the parameters in the model. Traditionally, this output has been presented using tables that indicate the values of the coefficients, the associated standard errors and other related information. Evaluation of these tables can be rather tedious because of the number of values shown as well as their r…
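For the independence loglinear model log m_ij = mu + a_i + b_j, the parameters such tables report can be recovered from the fitted counts under zero-sum constraints; a minimal sketch (a generic textbook computation, not the paper's program):

```python
import math

def independence_params(table):
    """mu, row effects a_i, and column effects b_j of the independence
    loglinear model log m_ij = mu + a_i + b_j (zero-sum constraints)."""
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(table[i][j] for i in range(len(table)))
            for j in range(len(table[0]))]
    # fitted counts under independence, then their logs
    logm = [[math.log(ri * cj / n) for cj in cols] for ri in rows]
    mu = sum(sum(row) for row in logm) / (len(rows) * len(cols))
    a = [sum(row) / len(cols) - mu for row in logm]
    b = [sum(logm[i][j] for i in range(len(rows))) / len(rows) - mu
         for j in range(len(cols))]
    return mu, a, b
```

The effects sum to zero within each margin, and mu + a_i + b_j reproduces the fitted log counts exactly.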
Adaptive reference-free compression of sequence quality scores
2014
Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and hence are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full reso…
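The adaptive idea can be illustrated with a toy k-mer context model: bases that the context predicts correctly get coarsely binned quality scores, while surprising bases keep full resolution (the bin width and prediction rule here are assumptions, far simpler than the paper's compressed-index approach).

```python
from collections import Counter, defaultdict

def build_context_model(reads, k=3):
    """Count which base follows each k-mer across the reads."""
    model = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k):
            model[read[i:i + k]][read[i + k]] += 1
    return model

def compress_qualities(read, quals, model, k=3, bin_size=10):
    """Keep full resolution for surprising bases; coarsely bin the
    quality scores of bases the context model predicts correctly."""
    out = list(quals[:k])                      # no context for the first k bases
    for i in range(k, len(read)):
        ctx = model.get(read[i - k:i])
        predicted = ctx and ctx.most_common(1)[0][0] == read[i]
        out.append((quals[i] // bin_size) * bin_size if predicted else quals[i])
    return out
```

Bases inside a repetitive, predictable read are binned down, while unpredicted bases retain their original scores.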
Using Statistical and Computer Models to Quantify Volcanic Hazards
2009
Risk assessment of rare natural hazards, such as large volcanic block and ash or pyroclastic flows, is addressed. Assessment is approached through a combination of computer modeling, statistical modeling, and extreme-event probability computation. A computer model of the natural hazard is used to provide the needed extrapolation to unseen parts of the hazard space. Statistical modeling of the available data is needed to determine the initializing distribution for exercising the computer model. In dealing with rare events, direct simulations involving the computer model are prohibitively expensive. The solution instead requires a combination of adaptive design of computer model approximation…
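A miniature version of the surrogate idea: build a cheap emulator (here an exact quadratic through three design points, standing in for the statistical approximations the abstract mentions) from a handful of expensive runs, then sample it heavily to estimate an exceedance probability. The model, design points, and threshold below are all invented for illustration.

```python
import random

def expensive_model(x):
    # stand-in for a costly hazard simulation
    return x * x

def quadratic_through(pts):
    """Exact quadratic surrogate through three design points
    (Lagrange interpolation)."""
    (x0, y0), (x1, y1), (x2, y2) = pts
    def surrogate(x):
        return (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
                + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
                + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))
    return surrogate

design = [0.0, 1.0, 2.0]
runs = [(x, expensive_model(x)) for x in design]   # only 3 expensive calls
emu = quadratic_through(runs)

rng = random.Random(0)
samples = [rng.uniform(0.0, 2.0) for _ in range(100_000)]
p_exceed = sum(emu(x) > 1.0 for x in samples) / len(samples)
```

Because x^2 > 1 on exactly half of the uniform(0, 2) input range, the Monte Carlo exceedance estimate lands near 0.5, at a tiny fraction of the cost of running the simulator at every sample.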
Probabilistic Quantification of Hazards: A Methodology Using Small Ensembles of Physics-Based Simulations and Statistical Surrogates
2015
This paper presents a novel approach to assessing the hazard threat to a locale due to a large volcanic avalanche. The methodology combines: (i) mathematical modeling of volcanic mass flows; (ii) field data of avalanche frequency, volume, and runout; (iii) large-scale numerical simulations of flow events; (iv) use of statistical methods to minimize computational costs, and to capture unlikely events; (v) calculation of the probability of a catastrophic flow event over the next T years at a location of interest; and (vi) innovative computational methodology to implement these methods. This unified presentation collects elements that have been separately developed, and incorporates new contri…
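Step (v) can be illustrated with a standard thinned-Poisson calculation, assuming flows arrive as a Poisson process and each is independently catastrophic with some exceedance probability (an illustrative formula, not necessarily the paper's exact computation).

```python
import math

def prob_catastrophe(rate_per_year, p_exceed, years):
    """P(at least one catastrophic flow in `years` years), assuming
    flows arrive as a Poisson process with rate `rate_per_year` and
    each is catastrophic with probability `p_exceed`; the thinned
    process then has rate rate_per_year * p_exceed."""
    return 1.0 - math.exp(-rate_per_year * p_exceed * years)
```

For example, with 0.2 flows per year and a 10% chance that a given flow reaches the location of interest, the 50-year probability is 1 - exp(-1), about 0.63, and it grows monotonically with the horizon.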
Visualizing categorical data in ViSta
2003
The modules in the statistical package ViSta related to categorical data analysis are presented. These modules are: visualization of frequency data with mosaic and bar plots, correspondence analysis, multiple correspondence analysis, and loglinear analysis. All these methods are implemented in ViSta with a strong emphasis on plots and graphical representations of data, as well as on interactivity between the user and the system. Together they provide a system that has proven to be easy, useful, and powerful for both novice and experienced users.