Search results for "Data mining"
showing 10 items of 907 documents
Integrated satellite data fusion and mining for monitoring lake water quality status of the Albufera de Valencia in Spain
2015
Abstract: Lake eutrophication is a critical issue in the interplay of water supply, environmental management, and ecosystem conservation. Integrated sensing, monitoring, and modeling for a holistic lake water quality assessment with respect to multiple constituents is acutely needed. The aim of this paper is to develop an integrated algorithm for data fusion and mining of satellite remote sensing images to generate daily estimates of some water quality parameters of interest, such as chlorophyll a concentrations and water transparency, to be applied for the assessment of the hypertrophic Albufera de Valencia. The Albufera de Valencia is the largest freshwater lake in Spain, which can often pr…
Performance assessment of individual and ensemble data-mining techniques for gully erosion modeling
2017
Gully erosion is identified as an important sediment source in a range of environments and plays a decisive role in the redistribution of eroded soils on a slope. Hence, addressing the spatial occurrence pattern of this phenomenon is very important. Different ensemble models and their single counterparts, mostly data mining methods, have been used for gully erosion susceptibility mapping; however, their calibration and validation procedures need to be thoroughly addressed. The current study presents a series of individual and ensemble data mining methods including artificial neural network (ANN), support vector machine (SVM), maximum entropy (ME), ANN-SVM, ANN-ME, and SVM-ME to map gully erosion …
Estimation of National Colorectal-Cancer Incidence Using Claims Databases
2012
Background. The aim of the study was to assess the accuracy of the colorectal-cancer incidence estimated from administrative data. Methods. We selected potential incident colorectal-cancer cases in 2004-2005 French administrative data, using two alternative algorithms. The first was based only on diagnostic and procedure codes, whereas the second considered the past history of the patient. Results of both methods were assessed against two corresponding local cancer registries, acting as “gold standards.” We then constructed a multivariable regression model to estimate the corrected total number of incident colorectal-cancer cases from the whole national administrative database. Results. The firs…
Dealing with spatial data pooled over time in statistical models
2012
Recent developments in spatial econometrics have been devoted to spatio-temporal data and how spatial panel data structure should be modeled. Little effort has been devoted to the way one must deal with spatial data pooled over time. This paper presents the characteristics of spatial data pooled over time and proposes a simple way to take into account unidirectional temporal effect as well as multidirectional spatial effect in the estimation process. An empirical example, using data on 25,357 single family homes sold in Lucas County, OH (USA), between 1993 and 1998 (available in the MatLab library), is used to illustrate the potential of the approach proposed.
Missing Value Estimation for Microarray Data by Bayesian Principal Component Analysis and Iterative Local Least Squares
2013
Published version of an article from the journal: Mathematical Problems in Engineering. Also available from Hindawi: http://dx.doi.org/10.1155/2013/162938 Missing values are prevalent in microarray data; they negatively affect downstream microarray analyses and should therefore be estimated from known values. We propose a BPCA-iLLS method, which integrates two commonly used missing value estimation methods: Bayesian principal component analysis (BPCA) and local least squares (LLS). The inferior row-average procedure in LLS is replaced with BPCA, and the least squares method is put into an iterative framework. Comparative results show that the proposed method has obtaine…
smatr 3 - an R package for estimation and inference about allometric lines
2011
Summary 1. The Standardised Major Axis Tests and Routines (SMATR) software provides tools for estimation and inference about allometric lines, currently widely used in ecology and evolution. 2. This paper describes some significant improvements to the functionality of the package, now available on R in smatr version 3. 3. New inclusions in the package include sma and ma functions that accept formula input and perform the key inference tasks; multiple comparisons; graphical methods for visualising data and checking (S)MA assumptions; robust (S)MA estimation and inference tools.
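As a rough illustration of what a standardised major axis (SMA) line is, the sketch below fits one in plain Python. It is not the smatr API: the function name `sma_fit` is hypothetical, and the code only uses the textbook SMA estimate, where the slope is the ratio of the standard deviations of y and x, signed by their covariance.

```python
import math

def sma_fit(x, y):
    """Fit a standardised major axis line y = intercept + slope * x.

    Hypothetical helper for illustration only; it does not reproduce
    the inference or robust methods of the smatr package.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x has zero variance")
    # The SMA slope takes the sign of the covariance.
    sign = 1.0 if sxy >= 0 else -1.0
    slope = sign * math.sqrt(syy / sxx)
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = sma_fit([1, 2, 3, 4], [2, 4, 6, 8])
print(slope, intercept)
```

Unlike ordinary least squares, this estimate treats x and y symmetrically, which is why it is commonly used for allometric relationships.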
Criminal networks analysis in missing data scenarios through graph distances
2021
Data collected in criminal investigations may suffer from issues like: (i) incompleteness, due to the covert nature of criminal organizations; (ii) incorrectness, caused by either unintentional data collection errors or intentional deception by criminals; (iii) inconsistency, when the same information is collected into law enforcement databases multiple times, or in different formats. In this paper we analyze nine real criminal networks of different nature (i.e., Mafia networks, criminal street gangs and terrorist organizations) in order to quantify the impact of incomplete data, and to determine which network type is most affected by it. The networks are firstly pruned using two specific m…
GEM
2014
The widespread use of digital sensor systems causes a tremendous demand for high-quality time series analysis tools. In this domain the majority of data mining algorithms relies on established distance measures like Dynamic Time Warping (DTW) or Euclidean distance (ED). However, the notion of similarity induced by ED and DTW may lead to unsatisfactory clusterings. In order to address this shortcoming we introduce the Gliding Elastic Match (GEM) algorithm. It determines an optimal local similarity measure of a query time series Q and a subject time series S. The measure is invariant under both local deformation on the measurement-axis and scaling in the time domain. GEM is compared to ED and…
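For readers unfamiliar with the two baseline measures named in this abstract, here is a minimal sketch of Euclidean distance (ED) and classic Dynamic Time Warping (DTW) for one-dimensional series. It does not implement GEM itself; the function names and the absolute-difference local cost are assumptions made for illustration.

```python
import math

def euclidean_distance(q, s):
    """Point-by-point distance; requires series of equal length."""
    if len(q) != len(s):
        raise ValueError("ED needs series of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))

def dtw_distance(q, s):
    """Classic DTW via dynamic programming, O(len(q) * len(s)).

    Uses |a - b| as the local cost (an assumption; squared
    differences are another common choice).
    """
    n, m = len(q), len(s)
    inf = float("inf")
    # cost[i][j] = best alignment cost of q[:i] against s[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(q[i - 1] - s[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat s[j-1]
                cost[i][j - 1],      # repeat q[i-1]
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]

# A series and a time-stretched copy align perfectly under DTW.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

DTW tolerates local stretching along the time axis, whereas ED compares samples at identical positions only.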
Data Mining Algorithms for Knowledge Extraction
2020
In this paper, we study the methods, techniques, and algorithms used in data mining, with an emphasis on clustering algorithms and, more precisely, on the K-means algorithm. This algorithm was first studied using the Euclidean distance, then with the distance between the clusters modified to use the Mahalanobis and Canberra distances. After implementing the algorithms in C/C++, we compared the clusterings produced by the three variants, after which we modified them and studied the distance between the clusters.
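The three distances named in this abstract can be sketched as follows, together with the K-means assignment step that would use them. This is an illustrative Python sketch, not the authors' C/C++ code; all function names are hypothetical, and the Mahalanobis version assumes the caller supplies an already inverted covariance matrix.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canberra(a, b):
    """Sum of |x - y| / (|x| + |y|); terms with a zero denominator are skipped."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total

def mahalanobis(a, b, inv_cov):
    """sqrt(d^T * inv_cov * d), with inv_cov given as a list of rows."""
    d = [x - y for x, y in zip(a, b)]
    quad = sum(
        d[i] * inv_cov[i][j] * d[j]
        for i in range(len(d))
        for j in range(len(d))
    )
    return math.sqrt(quad)

def assign_to_centroids(points, centroids, dist):
    """K-means assignment step: index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
        for p in points
    ]

points = [(0.0, 0.0), (10.0, 10.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
print(assign_to_centroids(points, centroids, euclidean))
print(assign_to_centroids(points, centroids, canberra))
```

Passing the distance as a parameter keeps the assignment step unchanged while the metric varies. With an identity matrix as `inv_cov`, `mahalanobis` reduces to `euclidean`.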
Literature, social media and questionnaire surveys identify relevant conservation areas for Carcharhinus species in the Mediterranean Sea
2023
Sharks support ecosystems’ health, but their populations are facing severe declines worldwide. Knowledge gaps on shark distribution and the negative human perception of them still represent a barrier to the implementation of effective conservation measures. Here we carried out a regional-scale analysis in the Mediterranean Sea using data on requiem shark catches and sightings available in the scientific literature and on social media platforms to: 1) depict the distribution of Carcharhinus species across the basin, 2) identify potentially relevant areas for their conservation, and 3) evaluate people’s attitude toward shark protection. In addition, we administered 112 questionnaires in one o…