Search results for "Data mining"

showing 10 items of 907 documents

Integrated satellite data fusion and mining for monitoring lake water quality status of the Albufera de Valencia in Spain

2015

Lake eutrophication is a critical issue in the interplay of water supply, environmental management, and ecosystem conservation. Integrated sensing, monitoring, and modeling for a holistic lake water quality assessment with respect to multiple constituents are acutely needed. The aim of this paper is to develop an integrated algorithm for data fusion and mining of satellite remote sensing images to generate daily estimates of selected water quality parameters, such as chlorophyll a concentrations and water transparency, for the assessment of the hypertrophic Albufera de Valencia. The Albufera de Valencia is the largest freshwater lake in Spain, which can often pr…

Environmental Engineering; Management Monitoring Policy and Law; Remote Sensing; Machine Learning; Water Supply; Water Quality; Data Mining; Spacecraft; Waste Management and Disposal; Image resolution; Ecosystem; Remote sensing; Ground truth; Secchi disk; Lake management; General Medicine; Data Fusion; Lakes; Spain; Thematic Mapper; Temporal resolution; Environmental science; Spatial variability; Water quality; Moderate-resolution imaging spectroradiometer; Environmental Monitoring
researchProduct

Performance assessment of individual and ensemble data-mining techniques for gully erosion modeling

2017

Gully erosion is identified as an important sediment source in a range of environments and plays a decisive role in the redistribution of eroded soils on a slope. Hence, addressing the spatial occurrence pattern of this phenomenon is very important. Different ensemble models and their single counterparts, mostly data mining methods, have been used for gully erosion susceptibility mapping; however, their calibration and validation procedures need to be thoroughly addressed. The current study presents a series of individual and ensemble data mining methods including artificial neural network (ANN), support vector machine (SVM), maximum entropy (ME), ANN-SVM, ANN-ME, and SVM-ME to map gully erosion …

Environmental Engineering; Sòls Erosió; 010504 meteorology & atmospheric sciences; Ensemble forecasting; Principle of maximum entropy; 010501 environmental sciences; computer.software_genre; 01 natural sciences; Pollution; Stability (probability); Support vector machine; Goodness of fit; Robustness (computer science); Statistics; Range (statistics); Environmental Chemistry; Data mining; Waste Management and Disposal; computer; 0105 earth and related environmental sciences; Mathematics; Statistical hypothesis testing; Science of The Total Environment
researchProduct

Estimation of National Colorectal-Cancer Incidence Using Claims Databases

2012

Background. The aim of the study was to assess the accuracy of the colorectal-cancer incidence estimated from administrative data. Methods. We selected potential incident colorectal-cancer cases in 2004-2005 French administrative data, using two alternative algorithms. The first was based only on diagnostic and procedure codes, whereas the second considered the past history of the patient. Results of both methods were assessed against two corresponding local cancer registries, acting as “gold standards.” We then constructed a multivariable regression model to estimate the corrected total number of incident colorectal-cancer cases from the whole national administrative database. Results. The firs…

Estimation; Article Subject; Epidemiology; business.industry; Colorectal cancer; Incidence (epidemiology); lcsh:R; Public Health Environmental and Occupational Health; MEDLINE; lcsh:Medicine; Regression analysis; computer.software_genre; medicine.disease; Cancer registry; Administrative database; Statistics; Genetics; Medicine; Data mining; Claims database; business; computer; Research Article; Journal of Cancer Epidemiology
researchProduct

Dealing with spatial data pooled over time in statistical models

2012

Recent developments in spatial econometrics have been devoted to spatio-temporal data and how spatial panel data structure should be modeled. Little effort has been devoted to the way one must deal with spatial data pooled over time. This paper presents the characteristics of spatial data pooled over time and proposes a simple way to take into account unidirectional temporal effect as well as multidirectional spatial effect in the estimation process. An empirical example, using data on 25,357 single family homes sold in Lucas County, OH (USA), between 1993 and 1998 (available in the MatLab library), is used to illustrate the potential of the approach proposed.

Estimation; Structure (mathematical logic); Economics and Econometrics; Computer science; Process (engineering); Geography Planning and Development; Statistical model; statistical models; computer.software_genre; [SHS.ECO]Humanities and Social Sciences/Economics and Finance; Urban Studies; spatial data; Econometrics; [ SHS.ECO ] Humanities and Social Sciences/Economies and finances; Spatial econometrics; Data mining; MATLAB; [SHS.ECO] Humanities and Social Sciences/Economics and Finance; Spatial analysis; computer; ComputingMilieux_MISCELLANEOUS; Demography; computer.programming_language; Panel data
researchProduct

Missing Value Estimation for Microarray Data by Bayesian Principal Component Analysis and Iterative Local Least Squares

2013

Published version of an article from the journal: Mathematical Problems in Engineering. Also available from Hindawi: http://dx.doi.org/10.1155/2013/162938 Missing values are prevalent in microarray data. They negatively influence downstream microarray analyses and should therefore be estimated from known values. We propose a BPCA-iLLS method, which integrates two commonly used missing value estimation methods: Bayesian principal component analysis (BPCA) and local least squares (LLS). The inferior row-average procedure in LLS is replaced with BPCA, and the least squares method is put into an iterative framework. Comparative results show that the proposed method has obtaine…

Estimation; VDP::Mathematics and natural science: 400::Mathematics: 410::Applied mathematics: 413; Article Subject; Computer science; lcsh:Mathematics; General Mathematics; General Engineering; Value (computer science); lcsh:QA1-939; Non-linear iterative partial least squares; computer.software_genre; Least squares; Bayesian principal component analysis; lcsh:TA1-2040; Data mining; lcsh:Engineering (General). Civil engineering (General); computer; Mathematical Problems in Engineering
researchProduct

smatr 3 - an R package for estimation and inference about allometric lines

2011

Summary 1. The Standardised Major Axis Tests and Routines (SMATR) software provides tools for estimation and inference about allometric lines, currently widely used in ecology and evolution. 2. This paper describes some significant improvements to the functionality of the package, now available on R in smatr version 3. 3. New inclusions in the package include sma and ma functions that accept formula input and perform the key inference tasks; multiple comparisons; graphical methods for visualising data and checking (S)MA assumptions; robust (S)MA estimation and inference tools.

Estimation; business.industry; Computer science; Ecological Modeling; Inference; computer.software_genre; R package; Software; Multiple comparisons problem; Principal component analysis; Key (cryptography); Data mining; Allometry; business; computer; Ecology Evolution Behavior and Systematics; Methods in Ecology and Evolution
researchProduct
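
As an illustration of the kind of line the smatr entry above refers to, the following is a minimal Python sketch of a standardised major axis (SMA) fit, using the textbook formula slope = sign(covariance) × sd(y) / sd(x). It is not the smatr package or its R interface; the function name `sma_fit` is hypothetical and the sketch covers point estimation only, not the inference tools described in the abstract.

```python
import math


def sma_fit(x, y):
    """Return (slope, intercept) of a standardised major axis line.

    Hypothetical helper for illustration; not part of the smatr R package.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    # The SMA slope takes its sign from the covariance and its
    # magnitude from the ratio of standard deviations.
    sign = 1.0 if sxy >= 0 else -1.0
    slope = sign * math.sqrt(syy / sxx)
    # The line passes through the centroid (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Unlike ordinary least squares, this estimate treats x and y symmetrically, which is why it is commonly used when neither variable is a clear predictor of the other.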

Criminal networks analysis in missing data scenarios through graph distances

2021

Data collected in criminal investigations may suffer from issues like: (i) incompleteness, due to the covert nature of criminal organizations; (ii) incorrectness, caused by either unintentional data collection errors or intentional deception by criminals; (iii) inconsistency, when the same information is collected into law enforcement databases multiple times, or in different formats. In this paper we analyze nine real criminal networks of different nature (i.e., Mafia networks, criminal street gangs and terrorist organizations) in order to quantify the impact of incomplete data, and to determine which network type is most affected by it. The networks are firstly pruned using two specific m…

Euclidean distance; Data collection; Computer science; Node (networking); Law enforcement; Graph (abstract data type); Adjacency list; Data mining; Missing data; computer.software_genre; Criminal investigation; computer; CrimRxiv
researchProduct

GEM

2014

The widespread use of digital sensor systems causes a tremendous demand for high-quality time series analysis tools. In this domain the majority of data mining algorithms relies on established distance measures like Dynamic Time Warping (DTW) or Euclidean distance (ED). However, the notion of similarity induced by ED and DTW may lead to unsatisfactory clusterings. In order to address this shortcoming we introduce the Gliding Elastic Match (GEM) algorithm. It determines an optimal local similarity measure of a query time series Q and a subject time series S. The measure is invariant under both local deformation on the measurement-axis and scaling in the time domain. GEM is compared to ED and…

Euclidean distance; Dynamic time warping; Similarity (network science); Computer science; Data mining; Invariant (mathematics); Similarity measure; computer.software_genre; Measure (mathematics); Algorithm; computer; Distance measures; Proceedings of the 29th Annual ACM Symposium on Applied Computing
researchProduct
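
To make the two baseline measures named in the GEM entry above concrete, here is a minimal Python sketch of Euclidean distance (ED) and classic Dynamic Time Warping (DTW) on plain lists of numbers. It does not implement GEM itself; the function names are hypothetical, and the DTW variant shown (absolute difference as local cost, no warping window) is one common textbook form chosen for illustration.

```python
import math


def euclidean_distance(q, s):
    """Point-by-point distance; both series must have the same length."""
    if len(q) != len(s):
        raise ValueError("ED requires series of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))


def dtw_distance(q, s):
    """Classic DTW cost with absolute difference as the local cost."""
    n, m = len(q), len(s)
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of q[:i] with s[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(q[i - 1] - s[j - 1])
            # Extend the best of: step in q only, step in s only, or both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]
```

For example, the series `[0, 0, 1]` and `[0, 1, 1]` differ only by a one-step shift of the jump: ED penalises the shift, while DTW can align the two jumps and ignore it.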

Data Mining Algorithms for Knowledge Extraction

2020

In this paper, we study the methods, techniques, and algorithms used in data mining. Among the studied algorithms, we focus on clustering algorithms, and more precisely on the K-means algorithm. This algorithm was first studied using the Euclidean distance, and then modified to measure the distance between the clusters with the Mahalanobis and Canberra distances. After implementing the algorithms in C/C++, we compared the clustering of the three algorithms, after which we modified them and studied the distance between the clusters.

Euclidean distance; Mahalanobis distance; Matrix (mathematics); ComputingMethodologies_PATTERNRECOGNITION; Knowledge extraction; Computer science; business.industry; Value (computer science); Pattern recognition; Artificial intelligence; Cluster analysis; business; Data mining algorithm
researchProduct
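
The three distances named in the K-means entry above can be sketched as follows, together with the assignment step of K-means in which each point is attached to its nearest centroid. This is an illustrative Python sketch, not the authors' C/C++ implementation. All function names are hypothetical, the Mahalanobis version assumes the caller supplies an already inverted covariance matrix, and the Canberra version treats a 0/0 term as 0, which is a common convention.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canberra(a, b):
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom != 0:  # assumed convention: a 0/0 term contributes 0
            total += abs(x - y) / denom
    return total


def mahalanobis(a, b, inv_cov):
    """sqrt(d^T * inv_cov * d), with inv_cov given as a nested list."""
    d = [x - y for x, y in zip(a, b)]
    # Row-by-row product inv_cov * d, then the dot product with d.
    md = [sum(row[j] * d[j] for j in range(len(d))) for row in inv_cov]
    return math.sqrt(sum(di * mi for di, mi in zip(d, md)))


def assign_to_nearest(points, centroids, dist):
    """K-means assignment step: index of the closest centroid per point."""
    return [
        min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
        for p in points
    ]
```

Passing a different `dist` function to `assign_to_nearest` is enough to swap the metric, for example `lambda p, c: mahalanobis(p, c, inv_cov)` for a chosen `inv_cov`. With the identity matrix as `inv_cov`, the Mahalanobis distance reduces to the Euclidean distance.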

Literature, social media and questionnaire surveys identify relevant conservation areas for Carcharhinus species in the Mediterranean Sea

2023

Sharks support ecosystems’ health, but their populations are facing severe declines worldwide. Knowledge gaps on shark distribution and the negative human perception of them still represent a barrier to the implementation of effective conservation measures. Here we carried out a regional-scale analysis in the Mediterranean Sea using data on requiem shark catches and sightings available in the scientific literature and on social media platforms to: 1) depict the distribution of Carcharhinus species across the basin, 2) identify potentially relevant areas for their conservation, and 3) evaluate people’s attitude toward shark protection. In addition, we administered 112 questionnaires in one o…

Extinction; Social media data mining; Conservation hotspot; Public perception; Ecotourism; Coastal sharks; Requiem sharks; Ecology Evolution Behavior and Systematics; Nature and Landscape Conservation
researchProduct