Search results for "Mining"
Showing 10 of 1,730 documents
Using Statistical and Computer Models to Quantify Volcanic Hazards
2009
Risk assessment of rare natural hazards, such as large volcanic block-and-ash or pyroclastic flows, is addressed. Assessment is approached through a combination of computer modeling, statistical modeling, and extreme-event probability computation. A computer model of the natural hazard is used to provide the needed extrapolation to unseen parts of the hazard space. Statistical modeling of the available data is needed to determine the initializing distribution for exercising the computer model. In dealing with rare events, direct simulations involving the computer model are prohibitively expensive. The solution instead requires a combination of adaptive design of computer model approximation…
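The abstract stops short of algorithmic detail; purely as a hypothetical illustration of the surrogate idea (a cheap emulator standing in for the expensive hazard simulator, then Monte Carlo on the emulator), a minimal Python sketch with scikit-learn might look like the following. The stand-in simulator, input ranges, and exceedance threshold are all invented for illustration.

```python
# Illustrative sketch (not the authors' implementation): emulate an expensive
# hazard simulator with a Gaussian-process surrogate, then use the cheap
# surrogate to estimate the probability that a flow reaches a site.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)

def simulator(volume, direction):
    """Stand-in for the expensive flow simulator: returns runout distance (km)."""
    return 2.0 + 1.5 * np.log10(volume) + 0.3 * np.cos(direction)

# Small designed set of simulator runs (the expensive part).
volumes = 10 ** rng.uniform(5, 9, size=30)          # m^3, hypothetical range
directions = rng.uniform(0, 2 * np.pi, size=30)     # radians
X_design = np.column_stack([np.log10(volumes), directions])
y_design = simulator(volumes, directions)

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(length_scale=[1.0, 1.0]),
                              normalize_y=True)
gp.fit(X_design, y_design)

# Monte Carlo over an assumed initializing distribution, evaluated on the surrogate.
n = 100_000
vol_mc = 10 ** rng.uniform(5, 9, size=n)
dir_mc = rng.uniform(0, 2 * np.pi, size=n)
runout = gp.predict(np.column_stack([np.log10(vol_mc), dir_mc]))
p_reach = np.mean(runout > 8.0)                      # P(runout exceeds 8 km)
print(f"Estimated exceedance probability: {p_reach:.4f}")
```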
Probabilistic Quantification of Hazards: A Methodology Using Small Ensembles of Physics-Based Simulations and Statistical Surrogates
2015
This paper presents a novel approach to assessing the hazard threat to a locale due to a large volcanic avalanche. The methodology combines: (i) mathematical modeling of volcanic mass flows; (ii) field data of avalanche frequency, volume, and runout; (iii) large-scale numerical simulations of flow events; (iv) use of statistical methods to minimize computational costs, and to capture unlikely events; (v) calculation of the probability of a catastrophic flow event over the next T years at a location of interest; and (vi) innovative computational methodology to implement these methods. This unified presentation collects elements that have been separately developed, and incorporates new contri…
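A minimal sketch of item (v) under common simplifying assumptions, not necessarily the paper's exact formulation: if events occur as a Poisson process with annual rate lam and each event independently reaches the location with probability p (e.g. estimated from a surrogate as above), then the probability of at least one catastrophic flow in T years is 1 - exp(-lam * p * T). The numbers below are hypothetical.

```python
# Illustrative sketch of step (v), not the paper's exact formulation: under a
# Poisson model with annual event rate lam and per-event probability p that a
# flow reaches the location of interest,
# P(at least one catastrophic flow in T years) = 1 - exp(-lam * p * T).
import math

lam = 0.2      # assumed events per year (hypothetical)
p = 0.05       # assumed per-event probability of reaching the site (hypothetical)
T = 50         # horizon in years

p_catastrophe = 1.0 - math.exp(-lam * p * T)
print(f"P(catastrophic flow within {T} years) = {p_catastrophe:.3f}")
```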
Visualizing categorical data in ViSta
2003
The modules in the statistical package ViSta related to categorical data analysis are presented. These modules are: visualization of frequency data with mosaic and bar plots, correspondence analysis, multiple correspondence analysis, and loglinear analysis. All these methods are implemented in ViSta with a strong emphasis on plots and graphical representations of the data, as well as on user interactivity with the system. Together they provide a system that has proven to be easy, useful, and powerful for both novice and experienced users.
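ViSta itself is a Lisp-based package; purely as an illustration of one of the listed methods, simple correspondence analysis can be sketched in a few lines of NumPy (the contingency table below is made up).

```python
# Minimal sketch of simple correspondence analysis with NumPy (not ViSta's
# implementation): SVD of the standardized residuals of a contingency table.
import numpy as np

N = np.array([[20, 30, 10],     # hypothetical contingency table
              [15, 25, 40],
              [35,  5, 20]], dtype=float)

P = N / N.sum()                      # correspondence matrix
r = P.sum(axis=1)                    # row masses
c = P.sum(axis=0)                    # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

row_coords = (U * sv) / np.sqrt(r)[:, None]          # principal row coordinates
col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]       # principal column coordinates
print("Row coordinates (first 2 dims):\n", row_coords[:, :2])
print("Column coordinates (first 2 dims):\n", col_coords[:, :2])
```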
Sparse kernel methods for high-dimensional survival data
2008
Sparse kernel methods like support vector machines (SVMs) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques, however, are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model depends on the covariates only through inner products, it can be ‘kernelized’. The kernelized proportional hazards model, however, yields a dense solution, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, dependin…
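To make the 'kernelized' remark concrete, here is a minimal sketch (not the paper's sparse SVM formulation) of a kernel Cox model in which the linear predictor enters the partial likelihood only through a kernel matrix. As the abstract notes, this plain kernelization yields a dense solution; the data, kernel, and penalty below are assumptions for illustration.

```python
# Minimal sketch of a kernelized Cox model: the linear predictor
# f_i = sum_j alpha_j * K(x_i, x_j) enters the partial likelihood only via K.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 100, 50
X = rng.normal(size=(n, p))                          # hypothetical covariates
times = rng.exponential(scale=np.exp(-X[:, 0]))      # survival times
events = rng.random(n) < 0.7                         # True = event, False = censored

def rbf_kernel(A, B, gamma=0.02):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

K = rbf_kernel(X, X)

def neg_log_partial_likelihood(alpha, lam=1.0):
    f = K @ alpha                                    # kernelized linear predictor
    order = np.argsort(-times)                       # descending time
    f_ord, e_ord = f[order], events[order]
    # log of the risk-set sum, accumulated in descending time order
    log_risk = np.logaddexp.accumulate(f_ord)
    nll = -np.sum((f_ord - log_risk)[e_ord])
    return nll + lam * (alpha @ K @ alpha)           # ridge penalty in the RKHS norm

res = minimize(neg_log_partial_likelihood, x0=np.zeros(n), method="L-BFGS-B")
print("converged:", res.success, "negative log partial likelihood:", res.fun)
```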
Coupled variable selection for regression modeling of complex treatment patterns in a clinical cancer registry.
2013
For determining a manageable set of covariates potentially influential with respect to a time-to-event endpoint, Cox proportional hazards models can be combined with variable selection techniques, such as stepwise forward selection or backward elimination based on p-values, or regularized regression techniques such as component-wise boosting. Cox regression models have also been adapted for dealing with more complex event patterns, for example, for competing risks settings with separate, cause-specific hazard models for each event type, or for determining the prognostic effect pattern of a variable over different landmark times, with one conditional survival model for each landmark. Motivat…
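As a toy illustration of the first technique mentioned (p-value based stepwise forward selection for a Cox proportional hazards model, not the registry analysis itself), a sketch using the lifelines package might look as follows; the data are simulated and the 0.05 threshold is an arbitrary choice.

```python
# Minimal sketch of p-value based forward selection for a Cox model (lifelines).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame(rng.normal(size=(n, 5)), columns=[f"x{i}" for i in range(5)])
df["time"] = rng.exponential(scale=np.exp(-df["x0"] - 0.5 * df["x2"]))
df["event"] = (rng.random(n) < 0.8).astype(int)

selected, candidates, alpha = [], list(df.columns[:5]), 0.05
while candidates:
    pvals = {}
    for var in candidates:
        cph = CoxPHFitter()
        cph.fit(df[selected + [var, "time", "event"]],
                duration_col="time", event_col="event")
        pvals[var] = cph.summary.loc[var, "p"]       # p-value of the candidate
    best = min(pvals, key=pvals.get)
    if pvals[best] >= alpha:
        break                                        # no remaining covariate qualifies
    selected.append(best)
    candidates.remove(best)

print("selected covariates:", selected)
```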
STATIS and DISTATIS: optimum multitable principal component analysis and three way metric multidimensional scaling
2012
STATIS is an extension of principal component analysis (PCA) tailored to handle multiple data tables that measure sets of variables collected on the same observations or, alternatively, as in a variant called dual-STATIS, multiple data tables where the same variables are measured on different sets of observations. STATIS proceeds in two steps: first, it analyzes the between-table similarity structure and derives from this analysis an optimal set of weights that are used to compute a linear combination of the data tables, called the compromise, that best represents the information common to the different data tables; second, the PCA of this compromise gives an optimal map of the observation…
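A rough NumPy sketch of those two steps on made-up tables (weights derived from the between-table similarity matrix, then the eigen-decomposition of the compromise); the exact normalizations used in STATIS are simplified here.

```python
# Illustrative sketch of the two STATIS steps, not a full implementation.
import numpy as np

rng = np.random.default_rng(3)
tables = [rng.normal(size=(20, 6)), rng.normal(size=(20, 4)), rng.normal(size=(20, 8))]

# Cross-product matrices of the centered, norm-scaled tables.
S = []
for X in tables:
    Xc = X - X.mean(axis=0)
    W = Xc @ Xc.T
    S.append(W / np.linalg.norm(W))

# Step 1: between-table similarity (RV-like inner products) -> optimal weights.
K = len(S)
C = np.array([[np.sum(S[k] * S[l]) for l in range(K)] for k in range(K)])
eigvals, eigvecs = np.linalg.eigh(C)
weights = np.abs(eigvecs[:, -1])                 # leading eigenvector
weights /= weights.sum()

# Step 2: compromise and its eigen-decomposition -> map of the observations.
compromise = sum(w * Sk for w, Sk in zip(weights, S))
vals, vecs = np.linalg.eigh(compromise)
factor_scores = vecs[:, ::-1] * np.sqrt(np.maximum(vals[::-1], 0))
print("table weights:", np.round(weights, 3))
print("first two factor scores:\n", factor_scores[:, :2])
```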
Comprehensive estimation of input signals and dynamics in biochemical reaction networks
2012
Motivation: Cellular information processing can be described mathematically using differential equations. Often, external stimulation of cells by compounds such as drugs or hormones, leading to activation, has to be considered. Mathematically, the stimulus is represented by a time-dependent input function. Parameters such as rate constants of the molecular interactions are often unknown and need to be estimated from experimental data, e.g. by maximum likelihood estimation. For this purpose, the input function has to be defined for all times in the integration interval. This is usually achieved by approximating the input by interpolation or smoothing of the measured data. This procedu…
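The standard interpolation workaround the abstract alludes to can be sketched as follows for a hypothetical one-state model (this is the conventional approach, not the method the paper proposes): the measured stimulus is interpolated so the input is defined at every integration time, and a rate constant is then fitted by least squares.

```python
# Illustrative sketch: interpolated input function feeding an ODE, with the
# rate constant estimated from simulated observations. Model and data are made up.
import numpy as np
from scipy.interpolate import interp1d
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

t_meas = np.linspace(0, 10, 11)
u_meas = np.exp(-0.5 * t_meas) * np.sin(t_meas) + 1.0    # measured stimulus values
u = interp1d(t_meas, u_meas, kind="cubic", fill_value="extrapolate")

def rhs(t, x, k):
    return -k * x + u(t)                                 # dx/dt = -k*x + input(t)

t_obs = np.linspace(0, 10, 21)
true_k = 0.8
x_obs = solve_ivp(rhs, (0, 10), [0.0], args=(true_k,), t_eval=t_obs).y[0]
x_obs = x_obs + np.random.default_rng(4).normal(scale=0.02, size=x_obs.size)

def residuals(theta):
    sol = solve_ivp(rhs, (0, 10), [0.0], args=(theta[0],), t_eval=t_obs)
    return sol.y[0] - x_obs

fit = least_squares(residuals, x0=[0.3])
print("estimated rate constant:", fit.x[0])
```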
CARE: context-aware sequencing read error correction.
2020
Motivation: Error correction is a fundamental pre-processing step in many Next-Generation Sequencing (NGS) pipelines, in particular for de novo genome assembly. However, existing error correction methods either suffer from high false-positive rates, since they break reads into independent k-mers, or do not scale efficiently to large amounts of sequencing reads and complex genomes. Results: We present CARE, an alignment-based, scalable error correction algorithm for Illumina data using the concept of minhashing. Minhashing allows for efficient similarity search within large sequencing read collections, which enables fast computation of high-quality multiple alignments. Sequencing errors ar…
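A toy Python sketch of the minhashing idea (CARE itself is a far more sophisticated tool; none of its actual code or parameters are reproduced here): reads that share a minimum k-mer hash land in the same bucket and become candidates for alignment.

```python
# Toy sketch of min-hash based candidate lookup for sequencing reads.
from collections import defaultdict

K = 16            # k-mer length (arbitrary choice for the sketch)
NUM_HASHES = 4    # number of min-hash values per read

def kmers(read, k=K):
    return {read[i:i + k] for i in range(len(read) - k + 1)}

def minhash_signature(read, num_hashes=NUM_HASHES):
    # Salted Python hashes stand in for a proper family of hash functions.
    return tuple(min(hash((salt, km)) for km in kmers(read))
                 for salt in range(num_hashes))

reads = [
    "ACGTACGTTTGACCAGGTTACGATCCGGAT",
    "ACGTACGTTTGACCAGCTTACGATCCGGAT",   # one substitution vs. read 0
    "TTTTGGGGCCCCAAAATTTTGGGGCCCCAA",
]

# Bucket reads by each of their min-hash values.
buckets = defaultdict(set)
for idx, read in enumerate(reads):
    for h in minhash_signature(read):
        buckets[h].add(idx)

# Candidates for read 0 = reads sharing at least one min-hash with it.
candidates = set().union(*(buckets[h] for h in minhash_signature(reads[0]))) - {0}
print("candidate similar reads for read 0:", candidates)
```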
Hybrid recommendation methods in complex networks
2015
We propose here two new recommendation methods, based on an appropriate normalization of existing similarity measures and on a convex combination of the recommendation scores derived from similarity between users and between objects. We validate the proposed measures on three relevant data sets and compare their performance with several recommendation systems recently proposed in the literature. We show that the proposed similarity measures achieve a performance improvement of up to 20% over existing non-parametric methods, and that the accuracy of a recommendation can vary widely from one specific bipartite network to another, which suggests that a …
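A small sketch of the general scheme, with cosine similarity and a min-max rescaling standing in for the paper's specific normalization (both are assumptions for illustration): user-based and object-based scores are blended by a convex combination.

```python
# Illustrative hybrid recommendation sketch on a made-up bipartite network.
import numpy as np

rng = np.random.default_rng(5)
R = (rng.random((8, 12)) < 0.3).astype(float)     # user-object adjacency matrix

def cosine_sim(M):
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Mn = M / norms
    return Mn @ Mn.T

user_sim = cosine_sim(R)                          # user-user similarity
item_sim = cosine_sim(R.T)                        # object-object similarity

score_user = user_sim @ R                         # user-based scores
score_item = R @ item_sim                         # object-based scores

def rescale(S):
    # Simple min-max normalization before combining the two score matrices.
    return (S - S.min()) / (S.max() - S.min() + 1e-12)

lam = 0.6                                         # convex-combination weight
scores = lam * rescale(score_user) + (1 - lam) * rescale(score_item)
scores[R > 0] = -np.inf                           # do not re-recommend known links
top = np.argsort(-scores, axis=1)[:, :3]
print("top-3 recommended objects per user:\n", top)
```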
Functional Principal Component Analysis for the explorative analysis of multisite-multivariate air pollution time series with long gaps
2013
Knowledge of urban air quality is the first step in facing air pollution issues. Over the last decades, many cities have been able to rely on a network of monitoring stations recording concentration values for the main pollutants. This paper focuses on functional principal component analysis (FPCA) to investigate multiple-pollutant datasets measured over time at multiple sites within a given urban area. Our purpose is to extend what has been proposed in the literature to data that are multisite and multivariate at the same time. The approach proves effective in highlighting some relevant statistical features of the time series, giving the opportunity to identify significant pollutants and…
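A minimal FPCA sketch on simulated curves observed on a common grid; it deliberately ignores the multisite-multivariate structure and the long gaps that the paper actually addresses.

```python
# Illustrative FPCA sketch: eigen-decomposition of the discretized covariance
# of centered curves on a shared time grid. Data are simulated, not pollution data.
import numpy as np

rng = np.random.default_rng(6)
t = np.linspace(0, 1, 100)                       # common time grid
n_curves = 40
curves = (np.sin(2 * np.pi * t)[None, :] * rng.normal(1, 0.3, (n_curves, 1))
          + np.cos(4 * np.pi * t)[None, :] * rng.normal(0, 0.2, (n_curves, 1))
          + rng.normal(0, 0.05, (n_curves, t.size)))

mean_curve = curves.mean(axis=0)
centered = curves - mean_curve
cov = centered.T @ centered / n_curves           # discretized covariance operator
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

explained = eigvals / eigvals.sum()
scores = centered @ eigvecs[:, :2]               # scores on the first two components
print("variance explained by first two FPCs:", np.round(explained[:2], 3))
print("first curve's scores:", np.round(scores[0], 3))
```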