Search results for "Mining"

showing 10 items of 1730 documents

Comparing data mining and deterministic pedology to assess the frequency of WRB reference soil groups in the legend of small scale maps

2015

The assessment of class frequency in soil map legends is affected by uncertainty, especially at small scales where generalization is greater. The aim of this study was to test the hypothesis that data mining techniques provide better estimation of class frequency than traditional deterministic pedology in a national soil map. In the 1:5,000,000 map of Italian soil regions, the soil classes are the WRB reference soil groups (RSGs). Different data mining techniques, namely neural networks, random forests, boosted trees, classification and regression trees, and support vector machine (SVM), were tested and the last one gave the best RSG predictions using selected auxiliary variables a…

Soil map; Geomatic; Bayesian probability; Soil Science; Soil classification; Learning machine; computer.software_genre; Soil type; Random forest; Support vector machine; Italy; Settore AGR/14 - Pedologia; Statistics; Pedology; Data mining; Bayesian predictivity; Scale (map); computer; Mathematics
researchProduct

Small solar system bodies as granular systems

2017

Asteroids and other Small Solar System Bodies (SSSBs) are currently of great scientific and even industrial interest. Asteroids exist as the permanent record of the formation of the Solar System and therefore hold many clues to its understanding as a whole, as well as insights into the formation of planetary bodies. Additionally, SSSBs are being investigated in the context of impact risks for the Earth, space situational awareness and their possible industrial exploitation (asteroid mining). In all these aspects, knowledge of the geophysical characteristics of SSSB surface and internal structure is of great importance. Given their size, constitution, and the evidence that many SSSBs ar…

Solar System; Situation awareness; [PHYS.ASTR.EP]Physics [physics]/Astrophysics [astro-ph]/Earth and Planetary Astrophysics [astro-ph.EP]; Computer science; Physics; QC1-999; Small solar system bodies; Context (language use); Granular systems; 01 natural sciences; Celestial mechanics; Astrobiology; Theoretical physics; 13. Climate action; Asteroid; Física Aplicada; 0103 physical sciences; Formation and evolution of the Solar System; [PHYS.ASTR]Physics [physics]/Astrophysics [astro-ph]; 010306 general physics; 010303 astronomy & astrophysics; ComputingMilieux_MISCELLANEOUS; Soil mechanics; Asteroid mining; EPJ Web of Conferences
researchProduct

Improved SOM Learning using Simulated Annealing

2007

The Self-Organizing Map (SOM) algorithm has been extensively used for analysis and classification problems. For this kind of problem, datasets keep growing larger, and it is necessary to speed up SOM learning. In this paper we present an application of the Simulated Annealing (SA) procedure to the SOM learning algorithm. The goal of the algorithm is to obtain fast learning and better performance in terms of matching of input data and regularity of the obtained map. An advantage of the proposed technique is that it preserves the simplicity of the basic algorithm. Several tests, carried out on different large datasets, demonstrate the effectiveness of the proposed algorithm in comparis…

Speedup; Matching (graph theory); Wake-sleep algorithm; Computer science; business.industry; Pattern recognition; computer.software_genre; Adaptive simulated annealing; Generalization error; ComputingMethodologies_PATTERNRECOGNITION; Simulated annealing; SOM simulated Annealing Training; Data mining; Artificial intelligence; business; computer
researchProduct

Taxonomy of stock market indices

2000

We investigate sets of financial non-redundant and nonsynchronously recorded time series. The sets are composed of a number of stock market indices located all over the world on five continents. By properly selecting the time horizon of returns and by using a reference currency we find a meaningful taxonomy. The detection of such a taxonomy proves that interpretable information can be stored in a set of nonsynchronously recorded time series.

Statistical Finance (q-fin.ST); Statistical Mechanics (cond-mat.stat-mech); Series (mathematics); Computer science; Quantitative Finance - Statistical Finance; FOS: Physical sciences; Time horizon; computer.software_genre; Stock market index; FOS: Economics and business; Set (abstract data type); Currency; Taxonomy (general); Econometrics; Data mining; Time series; computer; Condensed Matter - Statistical Mechanics; Physical Review E
researchProduct

Ranking coherence in topic models using statistically validated networks

2023

Probabilistic topic models have become one of the most widespread machine learning techniques in textual analysis. Topic discovery is an unsupervised process that does not guarantee the interpretability of its output. Hence, the automatic evaluation of topic coherence has attracted the interest of many researchers over the last decade, and it is an open research area. This article offers a new quality evaluation method based on statistically validated networks (SVNs). The proposed probabilistic approach consists of representing each topic as a weighted network of its most probable words. The presence of a link between each pair of words is assessed by statistically validating their co-oc…

Statistically Validated Networks; Topic coherence; Text Mining; Probabilistic Topic model; Library and Information Sciences; Information Systems; Journal of Information Science
researchProduct

The Psychological Science Accelerator’s COVID-19 rapid-response dataset

2023

Funder: Amazon Web Services (AWS) Imagine Grant

In response to the COVID-19 pandemic, the Psychological Science Accelerator coordinated three large-scale psychological studies to examine the effects of loss-gain framing, cognitive reappraisals, and autonomy framing manipulations on behavioral intentions and affective measures. The data collected (April to October 2020) included specific measures for each experimental study, a general questionnaire examining health prevention behaviors and COVID-19 experience, geographical and cultural context characterization, and demographic information for each participant. Each participant started the study with the same general questions and then was randomized to complete either one longer experiment or two shorter experiments. Data were provided by 73,223 participants with varying completion rates. Participants completed the survey from 111 geopolitical regions in 44 unique languages/dialects. The anonymized dataset described here is provided in both raw and processed formats to facilitate re-use and further analyses. The dataset offers secondary analytic opportunities to explore coping, framing, and self-determination across a diverse, global sample obtained at the onset of the COVID-19 pandemic, which can be merged with other time-sampled or geographic data.

Statistics and Probability; BF Psychology; 230 Affective Neuroscience; Health Behavior; Message framing; Diseases; Library and Information Sciences; :Ciências Sociais::Psicologia [Domínio/Área Científica]; HV Social pathology. Social and public welfare. Criminology; pandemiat; Education; ddc:150; SDG 3 - Good Health and Well-being; RA0421 Public health. Hygiene. Preventive Medicine; Surveys and Questionnaires; Adaptation Psychological; yleiskartoitukset; Humans; Pendiente; Health behaviors; Pandemics; Behaviour Change and Well-being; Emotion regulation; Self-determination messaging; COVID-19; kansainvälinen vertailu; Research data; Computer Science Applications; terveyskäyttäytyminen; /dk/atira/pure/sustainabledevelopmentgoals/good_health_and_well_being; Statistics Probability and Uncertainty; People’s health; tutkimusaineisto; survey-tutkimus; Dataset; Information Systems
researchProduct

CROSSMAPPER: estimating cross-mapping rates and optimizing experimental design in multi-species sequencing studies

2020

Motivation: Numerous sequencing studies, including transcriptomics of host-pathogen systems, sequencing of hybrid genomes, xenografts, mixed species systems, metagenomics and meta-transcriptomics, involve samples containing genetic material from divergent organisms. A crucial step in these studies is identifying from which organism each sequencing read originated, and the experimental design should be directed to minimize biases caused by cross-mapping of reads to incorrect source genomes. Additionally, pooling of sufficiently different genetic material into a single sequencing library could significantly reduce experimental costs but requires careful planning and assessment of the impact of…

Statistics and Probability; :Informàtica::Aplicacions de la informàtica::Bioinformàtica [Àrees temàtiques de la UPC]; Computer science; computer.software_genre; Biochemistry; Genome; Transcriptome; 03 medical and health sciences; Resource (project management); Genomes; Transcriptomics; Molecular Biology; Organism; Genòmica -- Informàtica; 030304 developmental biology; 0303 health sciences; 030306 microbiology; High-Throughput Nucleotide Sequencing; Genomics; Sequence Analysis DNA; DNA; Genome analysis; Genome Analysis; Anàlisis de seqüències; Computer Science Applications; Applications Note; Computational Mathematics; Computational Theory and Mathematics; Cross-mapping; Research Design; Metagenomics; RNA; Data mining; Line (text file); computer; Software; Genètica; parametres
researchProduct

A Bayesian Sequential Look at u-Control Charts

2005

We extend the usual implementation of u-control charts (uCCs) in two ways. First, we overcome the restrictive (and often inadequate) assumptions of the Poisson model; next, we eliminate the need for the questionable base period by using a sequential procedure. We use empirical Bayes (EB) and Bayes methods and compare them with the traditional frequentist implementation. EB methods are somewhat easy to implement, and they deal nicely with extra-Poisson variability (and, at the same time, informally check the adequacy of the Poisson assumption). However, they still need the base period. The sequential, full Bayes approach, on the other hand, also avoids this drawback of traditional u-charts. T…

Statistics and Probability; Applied Mathematics; Bayesian probability; Poisson distribution; computer.software_genre; Statistical process control; symbols.namesake; Bayes' theorem; Overdispersion; Frequentist inference; Modeling and Simulation; Prior probability; symbols; Control chart; Data mining; computer; Mathematics; Technometrics
researchProduct

Using mathematical morphology for unsupervised classification of functional data

2011

This paper is concerned with the unsupervised classification of functional data by using mathematical morphology. Different morphological operators are used to extract relevant structures of the functions (considered as sets through their subgraph representations). These operators can be considered as preprocessing tools whose outputs are also functional data. We explore some dissimilarity measures and clustering methods for the classification of the transformed data. Our approach is illustrated through a detailed analysis of two data sets. These techniques, which have mainly been used in image processing, provide a flexible and robust toolbox for improving the results in unsupervised funct…

Statistics and Probability; Applied Mathematics; Data classification; Image processing; Mathematical morphology; computer.software_genre; Toolbox; ComputingMethodologies_PATTERNRECOGNITION; Modeling and Simulation; Preprocessor; Data mining; Statistics Probability and Uncertainty; Cluster analysis; Morphological operators; computer; Mathematics; Journal of Statistical Computation and Simulation
researchProduct

An introduction to Bayesian reference analysis: inference on the ratio of multinomial parameters

1998

This paper offers an introduction to Bayesian reference analysis, often described as the more successful method to produce non-subjective, model-based, posterior distributions. The ideas are illustrated in detail with an interesting problem, the ratio of multinomial parameters, for which no model-based Bayesian analysis has been proposed. Signposts are provided to the huge related literature.

Statistics and Probability; Bayesian probability; Posterior probability; Inference; Bayesian inference; computer.software_genre; Statistics::Computation; Bayesian statistics; ComputingMethodologies_PATTERNRECOGNITION; Prior probability; Econometrics; Data mining; Bayesian linear regression; Bayesian average; computer; Mathematics; Journal of the Royal Statistical Society: Series D (The Statistician)
researchProduct