Search results for "DATA MINING"
showing 10 items of 907 documents
Enabling Searches on Wavelengths in a Hyperspectral Indices Database
2018
Abstract. Spectral indices derived from hyperspectral reflectance measurements are powerful tools for estimating physical parameters in a non-destructive and precise way in several fields of application, including vegetation health analysis, coastal and deep water constituents, geology, and atmosphere composition. In recent years, several micro-hyperspectral sensors have appeared, with both full-frame and push-broom acquisition technologies, while several hyperspectral spaceborne missions are planned for launch in the near future. This is fostering the use of hyperspectral data in basic and applied research, causing a large number of spectral indices to be defined and used in various…
Interpolative mapping of mean precipitation in the Baltic countries by using landscape characteristics
2011
Maps of the long-term mean precipitation involving local landscape variables were generated for the Baltic countries, and the effectiveness of seven modelling methods was compared. The precipitation data were recorded in 245 meteorological stations in 1966–2005, and 51 location-related explanatory variables were used. The similarity-based reasoning in the Constud software system outperformed other methods according to the validation fit, except for spring. The multivariate adaptive regression splines (MARS) was another effective method on average. The inclusion of landscape variables, compared to inverse distance-weighted interpolation, highlights the effect of uplands, larger water bodies …
High Performance 3D PET Reconstruction Using Spherical Basis Functions on a Polar Grid
2011
Statistical iterative methods are widely used for image reconstruction in emission tomography. Traditionally, the image space is modelled as a combination of cubic voxels as a matter of simplicity. After reconstruction, images are routinely filtered to reduce statistical noise at the cost of spatial resolution degradation. An alternative that produces lower noise during reconstruction is to model the image space with spherical basis functions. These basis functions overlap in space, producing a significantly large number of non-zero elements in the system response matrix (SRM) to store, which additionally leads to long reconstruction times. These two problems are partly overcome by expl…
The factorization method for electrical impedance tomography data from a new planar device.
2006
We present numerical results for two reconstruction methods for a new planar electrical impedance tomography device. This prototype allows noninvasive medical imaging techniques if only one side of a patient is accessible for electric measurements. The two reconstruction methods have different properties: one is a linearization-type method that allows quantitative reconstructions; the other one, that is, the factorization method, is a qualitative one, and is designed to detect anomalies within the body.
SNPs detection by eBWT positional clustering
2019
Sequencing technologies keep getting cheaper and faster, creating growing demand for data structures designed to efficiently store raw data and possibly perform analysis on it. In this view, there is growing interest in alignment-free and reference-free variant calling methods that use only (suitably indexed) raw read data. We develop the positional clustering theory that (i) describes how the extended Burrows–Wheeler Transform (eBWT) of a collection of reads tends to cluster together bases that cover the same genome position, (ii) predicts the size of such clusters, and (iii) exhibits an elegant and precise LCP array based procedure to locate such clusters in the e…
Automatic knowledge discovery from sparse and large-scale educational data : case Finland
2017
The Finnish educational system has received a lot of attention during the 21st century. Especially, the outstanding results in the first three cycles of the Programme for International Student Assessment (PISA) have made Finland’s education system internationally famous, and its unique characteristics have been under active research by various, predominantly educational, scholars since then. However, despite the availability of real but often sparse big data sets that would allow more evidence-based decision making, existing research to date has mostly concentrated on using classical qualitative and (univariate) quantitative methods. This thesis discusses, in general terms, knowledge discove…
Comparing Different approaches - Data mining, Geostatistic, and Deterministic pedology - to assess the Frequency of WRB reference soil groups in the …
2014
Estimating the frequency of soil classes in a map unit is always affected by some degree of uncertainty, especially at small scales, with greater generalization. The aim of this study was to compare different possible approaches - data mining, geostatistics, deterministic pedology - to assess the frequency of WRB Reference Soil Groups (RSG) in the major Italian soil regions. In the soil map of Italy (Costantini et al., 2012), a list of the first five RSGs was reported for each of the 10 major soil regions. The soil map was produced using the national soil geodatabase, which stored 22,015 analyzed and classified pedons, 1,413 soil typological units (STUs) and a set of auxiliary variables (lithology, land-use…
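O pewnej możliwości ewaluacji frazeologii na przykładzie danych z portalu Грамота.ру i z Narodowego Korpusu Języka Rosyjskiego [On a certain possibility of evaluating phraseology, using the example of data from the Грамота.ру portal and the Russian National Corpus]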
2020
The author of the article has run an experiment based on extracting a portion of phraseology from an online Russian language dictionary for further corpus-driven study. On the basis of a list of the 100 most common Russian nouns, the author has constructed queries to the Грамота.ру web portal that led to extracting over 600 idioms. These were subsequently used to perform another search in the Russian National Corpus. The main goal of this article is to construct a small dictionary of phraseological units extracted from Грамота.ру, as well as to discuss the problem of evaluating phraseology with the use of corpus-extracted data. The author argues that this kind of approach can provide a consider…
Semantics of Voids within Data: Ignorance-Aware Machine Learning
2021
Operating with ignorance is an important concern of geographical information science when the objective is to discover knowledge from imperfect spatial data. Data mining (driven by knowledge discovery tools) is about processing available (observed, known, and understood) samples of data, aiming to build a model (e.g., a classifier) to handle data samples that are not yet observed, known, or understood. These tools traditionally take semantically labeled samples of the available data (known facts) as an input for learning. We want to challenge the indispensability of this approach, and we suggest considering things the other way around. What if the task were as follows: how to buil…
Atlas: analysis tools for low-depth and ancient samples
2017
Post-mortem damage (PMD) obstructs the proper analysis of ancient DNA samples and can currently only be addressed by removing or down-weighting potentially damaged data. Here we present ATLAS, a suite of methods to accurately genotype and estimate genetic diversity from ancient samples, while accounting for PMD. It works directly from raw BAM files and enables the building of complete and customized pipelines for the analysis of ancient and other low-depth samples in a very user-friendly way. Based on simulations we show that, in the presence of PMD, a dedicated pipeline of ATLAS calls genotypes more accurately than the state-of-the-art pipeline of GATK combined with mapDamag…