Search results for " mining"

showing 10 items of 1548 documents

Estimation of National Colorectal-Cancer Incidence Using Claims Databases

2012

Background. The aim of the study was to assess the accuracy of the colorectal-cancer incidence estimated from administrative data. Methods. We selected potential incident colorectal-cancer cases in 2004-2005 French administrative data, using two alternative algorithms. The first was based only on diagnostic and procedure codes, whereas the second considered the past history of the patient. Results of both methods were assessed against two corresponding local cancer registries, acting as “gold standards.” We then constructed a multivariable regression model to estimate the corrected total number of incident colorectal-cancer cases from the whole national administrative database. Results. The firs…

Estimation; Article Subject; Epidemiology; business.industry; Colorectal cancer; Incidence (epidemiology); lcsh:R; Public Health Environmental and Occupational Health; MEDLINE; lcsh:Medicine; Regression analysis; computer.software_genre; medicine.disease; Cancer registry; Administrative database; Statistics; Genetics; Medicine; Data mining; Claims database; business; computer; Research Article; Journal of Cancer Epidemiology
researchProduct

Dealing with spatial data pooled over time in statistical models

2012

Recent developments in spatial econometrics have been devoted to spatio-temporal data and how spatial panel data structure should be modeled. Little effort has been devoted to the way one must deal with spatial data pooled over time. This paper presents the characteristics of spatial data pooled over time and proposes a simple way to take into account unidirectional temporal effect as well as multidirectional spatial effect in the estimation process. An empirical example, using data on 25,357 single family homes sold in Lucas County, OH (USA), between 1993 and 1998 (available in the MatLab library), is used to illustrate the potential of the approach proposed.

Estimation; Structure (mathematical logic); Economics and Econometrics; Computer science; Process (engineering); Geography Planning and Development; Statistical model; statistical models; computer.software_genre; [SHS.ECO]Humanities and Social Sciences/Economics and Finance; Urban Studies; spatial data; Econometrics; [ SHS.ECO ] Humanities and Social Sciences/Economies and finances; Spatial econometrics; Data mining; MATLAB; [SHS.ECO] Humanities and Social Sciences/Economics and Finance; Spatial analysis; computer; ComputingMilieux_MISCELLANEOUS; Demography; computer.programming_language; Panel data
researchProduct

Missing Value Estimation for Microarray Data by Bayesian Principal Component Analysis and Iterative Local Least Squares

2013

Published version of an article from the journal: Mathematical Problems in Engineering. Also available from Hindawi: http://dx.doi.org/10.1155/2013/162938. Missing values are prevalent in microarray data; they have a negative influence on downstream microarray analyses and should therefore be estimated from known values. We propose a BPCA-iLLS method, which integrates two commonly used missing value estimation methods: Bayesian principal component analysis (BPCA) and local least squares (LLS). The inferior row-average procedure in LLS is replaced with BPCA, and the least squares method is put into an iterative framework. Comparative result shows that the proposed method has obtaine…

Estimation; VDP::Mathematics and natural science: 400::Mathematics: 410::Applied mathematics: 413; Article Subject; Computer science; lcsh:Mathematics; General Mathematics; General Engineering; Value (computer science); lcsh:QA1-939; Non-linear iterative partial least squares; computer.software_genre; Least squares; Bayesian principal component analysis; lcsh:TA1-2040; Data mining; lcsh:Engineering (General). Civil engineering (General); computer; Mathematical Problems in Engineering
researchProduct

smatr 3 - an R package for estimation and inference about allometric lines

2011

Summary 1. The Standardised Major Axis Tests and Routines (SMATR) software provides tools for estimation and inference about allometric lines, currently widely used in ecology and evolution. 2. This paper describes some significant improvements to the functionality of the package, now available in R as smatr version 3. 3. New additions to the package include sma and ma functions that accept formula input and perform the key inference tasks; multiple comparisons; graphical methods for visualising data and checking (S)MA assumptions; and robust (S)MA estimation and inference tools.
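As a rough illustration of what a standardised major axis (SMA) line fit computes, the sketch below estimates the SMA slope as the ratio of the standard deviations of y and x, signed by their covariance, and passes the line through the point of means. It is written in Python rather than R, the name `sma_fit` is hypothetical, and it does not reproduce the smatr API or its inference, multiple-comparison, or robust-estimation tools.

```python
import math


def sma_fit(x, y):
    """Return (slope, intercept) of a standardised major axis line.

    Hypothetical helper for illustration only; not part of smatr.
    Slope = sign(cov(x, y)) * sd(y) / sd(x); the line passes through
    the point of means (mean(x), mean(y)).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Centred sums of squares and cross-products.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")

    # The (n - 1) factors cancel in the ratio of the two variances.
    sign = 1.0 if sxy >= 0 else -1.0
    slope = sign * math.sqrt(syy / sxx)
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Unlike ordinary least squares, this slope treats x and y symmetrically: swapping the two variables yields the reciprocal slope.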

Estimation; business.industry; Computer science; Ecological Modeling; Inference; computer.software_genre; R package; Software; Multiple comparisons problem; Principal component analysis; Key (cryptography); Data mining; Allometry; business; computer; Ecology Evolution Behavior and Systematics; Methods in Ecology and Evolution
researchProduct

Hunting for valuables from landfills and assessing their market opportunities: A case study with Kudjape landfill in Estonia

2017

Landfill mining is an alternative technology that merges the ideas of material recycling and sustainable waste management. This paper reports a case study to estimate the value of landfilled materials and their respective market opportunities, based on a full-scale landfill mining project in Estonia. During the project, a dump site (Kudjape, Estonia) was excavated with the main objective of extracting soil-like final cover material with the function of methane degradation. In total, about 57,777 m³ of waste was processed, particularly the uppermost 10-year layer of waste. Manual sorting was performed in four test pits to determine the detailed composition of wastes. 11,610 kg of waste was…

Estonia; Environmental Engineering; Waste management; 020209 energy; Sorting (sediment); Extraction (chemistry); Environmental engineering; Fraction (chemistry); 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Pollution; Mining; Waste Disposal Facilities; Waste Management; 0202 electrical engineering electronic engineering information engineering; Environmental science; Recycling; Landfill mining; Chemical composition; Refuse-derived fuel; Final cover; 0105 earth and related environmental sciences; Alternative technology; Waste Management & Research: The Journal for a Sustainable Circular Economy
researchProduct

The upgraded HADES trigger and data acquisition system

2011

The HADES experiment is a High Acceptance Di-Electron Spectrometer located at GSI in Darmstadt, Germany. Recently, its trigger and data acquisition system was upgraded. The main goal was to substantially increase the event rate capability by a factor of up to 20 to reach 100 kHz in light and 20 kHz in heavy ion reaction systems. The total data rate written to storage is about 400 MByte/s at peak. In this context, the complete read-out system was replaced with FPGA-based platforms using optical communication. For data transport a general-purpose real-time network protocol was developed to meet the strong requirements of the system. In particular, trigger information has to reach all front-end …

Ethernet; Event (computing); business.industry; Data stream mining; Computer science; Context (language use); Data acquisition; Server farm; Virtual address space; business; Communications protocol; Instrumentation; Mathematical Physics; Computer hardware; Journal of Instrumentation
researchProduct

CUDA-Accelerated Alignment of Subsequences in Streamed Time Series Data

2014

Euclidean Distance (ED) and Dynamic Time Warping (DTW) are cornerstones in the field of time series data mining. Many high-level algorithms like kNN-classification, clustering or anomaly detection make excessive use of these distance measures as subroutines. Furthermore, the vast growth of recorded data produced by automated monitoring systems or integrated sensors establishes the need for efficient implementations. In this paper, we introduce linear memory parallelization schemes for the alignment of a given query Q in a stream of time series data S for both ED and DTW using CUDA-enabled accelerators. The ED parallelization features a log-linear calculation scheme in contrast to the naive …
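For orientation, the two distance measures named in this abstract can be sketched in plain Python. This is a sequential baseline under stated assumptions, not the CUDA parallelization schemes of the paper: the function names `sliding_euclidean` and `dtw_distance` are hypothetical, the Euclidean search simply tries every window position, and the squared-difference local cost in the DTW recurrence is one common convention assumed here.

```python
import math


def sliding_euclidean(query, stream):
    """Slide `query` over `stream`; return (best_start, best_distance).

    Plain window-by-window search, costing O(len(query)) per position.
    """
    m = len(query)
    if m == 0 or m > len(stream):
        raise ValueError("query must be non-empty and no longer than stream")

    best_start, best_dist = -1, math.inf
    for start in range(len(stream) - m + 1):
        dist = math.sqrt(
            sum((q - stream[start + i]) ** 2 for i, q in enumerate(query))
        )
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two full sequences.

    Only two rows of the cost matrix are kept, so memory is linear
    in len(b). Local cost: squared difference (an assumed convention).
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")

    m = len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefixes align at zero cost
    for i in range(1, len(a) + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of: insertion, deletion, match.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return math.sqrt(prev[m])
```

The Euclidean measure requires equal-length windows, whereas DTW tolerates local stretching: `[1, 2, 3]` and `[1, 2, 2, 3]` are at DTW distance zero.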

Euclidean distance; CUDA; Dynamic time warping; Data stream mining; Computer science; Anomaly detection; Parallel computing; Cluster analysis; Time complexity; Distance measures; 2014 43rd International Conference on Parallel Processing
researchProduct

Criminal networks analysis in missing data scenarios through graph distances

2021

Data collected in criminal investigations may suffer from issues such as: (i) incompleteness, due to the covert nature of criminal organizations; (ii) incorrectness, caused by either unintentional data collection errors or intentional deception by criminals; (iii) inconsistency, when the same information is collected into law enforcement databases multiple times, or in different formats. In this paper we analyze nine real criminal networks of different nature (i.e., Mafia networks, criminal street gangs and terrorist organizations) in order to quantify the impact of incomplete data, and to determine which network type is most affected by it. The networks are first pruned using two specific m…

Euclidean distance; Data collection; Computer science; Node (networking); Law enforcement; Graph (abstract data type); Adjacency list; Data mining; Missing data; computer.software_genre; Criminal investigation; computer; CrimRxiv
researchProduct

GEM

2014

The widespread use of digital sensor systems causes a tremendous demand for high-quality time series analysis tools. In this domain the majority of data mining algorithms relies on established distance measures like Dynamic Time Warping (DTW) or Euclidean distance (ED). However, the notion of similarity induced by ED and DTW may lead to unsatisfactory clusterings. In order to address this shortcoming we introduce the Gliding Elastic Match (GEM) algorithm. It determines an optimal local similarity measure of a query time series Q and a subject time series S. The measure is invariant under both local deformation on the measurement-axis and scaling in the time domain. GEM is compared to ED and…

Euclidean distance; Dynamic time warping; Similarity (network science); Computer science; Data mining; Invariant (mathematics); Similarity measure; computer.software_genre; Measure (mathematics); Algorithm; computer; Distance measures; Proceedings of the 29th Annual ACM Symposium on Applied Computing
researchProduct

Data Mining Algorithms for Knowledge Extraction

2020

In this paper, we study the methods, techniques, and algorithms used in data mining. Among the studied algorithms, we focus on clustering, and more precisely on the K-means algorithm. This algorithm was first studied using the Euclidean distance, and then with the distance between clusters changed to the Mahalanobis and Canberra distances. After implementing the algorithms in C/C++, we compared the clusterings produced by the three variants, after which we modified them and studied the distance between the clusters.
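The three distance measures named in this abstract, and the way a K-means assignment step can swap between them, can be sketched as follows. This is a minimal Python illustration, not the authors' C/C++ implementation. All function names are hypothetical, the inverse covariance matrix for the Mahalanobis distance is assumed to be supplied by the caller, and zero-over-zero terms in the Canberra sum are assumed to count as zero.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def canberra(x, y):
    """Sum of |a - b| / (|a| + |b|); 0/0 terms are skipped (assumption)."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def mahalanobis(x, y, inv_cov):
    """sqrt((x - y)^T * inv_cov * (x - y)) for a given inverse covariance."""
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product inv_cov @ diff, written out explicitly.
    weighted = [sum(row[j] * diff[j] for j in range(len(diff))) for row in inv_cov]
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))


def assign_clusters(points, centroids, distance):
    """K-means assignment step: index of the nearest centroid per point."""
    return [
        min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
        for p in points
    ]
```

For example, `assign_clusters(points, centroids, canberra)` reuses the same assignment step with a different metric, and a Mahalanobis variant can be passed as `lambda p, c: mahalanobis(p, c, inv_cov)`. With the identity matrix as `inv_cov`, the Mahalanobis distance reduces to the Euclidean one.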

Euclidean distance; Mahalanobis distance; Matrix (mathematics); ComputingMethodologies_PATTERNRECOGNITION; Knowledge extraction; Computer science; business.industry; Value (computer science); Pattern recognition; Artificial intelligence; Cluster analysis; business; Data mining algorithm
researchProduct