Search results for "Data mining"
showing 10 items of 907 documents
Feature extraction from remote sensing data using Kernel Orthonormalized PLS
2007
This paper presents the study of a sparse kernel-based method for non-linear feature extraction in the context of remote sensing classification and regression problems. The so-called kernel orthonormalized PLS algorithm with reduced complexity (rKOPLS) has two core parts: (i) a kernel version of OPLS (called KOPLS), and (ii) a sparse (reduced) approximation for large scale data sets, which ultimately leads to rKOPLS. The method demonstrates good capabilities in terms of expressive power of the extracted features and scalability.
Local dimensionality reduction within natural clusters for medical data analysis
2005
Inductive learning systems have been successfully applied in a number of medical domains. Nevertheless, the effective use of these systems requires data preprocessing before applying a learning algorithm. This is especially important for multidimensional heterogeneous data, represented by a large number of features of different types. Dimensionality reduction is one commonly applied approach. The goal of this paper is to study the impact of natural clustering on dimensionality reduction for classification. We compare several data mining strategies that apply dimensionality reduction by means of feature extraction or feature selection for subsequent classification. We show experimentally on micr…
Extracting information from support vector machines for pattern-based classification
2014
Statistical machine learning algorithms building on patterns found by pattern mining algorithms have to cope with large solution sets and thus the high dimensionality of the feature space. Conversely, pattern mining algorithms are frequently applied to irrelevant instances, thus causing noise in the output. Solution sets of pattern mining algorithms also typically grow with increasing input datasets. The paper proposes an approach to overcome these limitations. The approach extracts information from trained support vector machines, in particular their support vectors and their relevance according to their coefficients. It uses the support vectors along with their coefficients as input to pa…
Improving distance based image retrieval using non-dominated sorting genetic algorithm
2015
Image retrieval is formulated as a multiobjective optimization problem. A multiobjective genetic algorithm is hybridized with distance based search. A parameter balances exploration (genetic search) or exploitation (nearest neighbors). Extensive comparative experimentation illustrates and assesses the proposed methodology. Relevance feedback has been adopted as a standard in Content Based Image Retrieval (CBIR). One major difficulty that algorithms have to face is to achieve an adequate balance between the exploitation of already known areas of interest and the exploration of the feature space to find other relevant areas. In this paper, we evaluate different ways to combine two existing relevan…
Cognitive intelligent sensory system for vision-based quality control
2003
This paper presents an original approach for a vision-based quality control system, built around a cognitive intelligent sensory system. The principle of the approach relies on two steps. First, a so-called initialization phase leads to structural knowledge on image acquisition conditions, type of illumination sources, etc. Second, the image is iteratively evaluated using this knowledge and complementary information (e.g., CAD models, and tolerance information). Finally, the information describing the quality of the piece under evaluation is extracted. A further aim of the approach is to enable building strategies that determine for instance the “next best view” required for completing the …
An Agents and Artifacts Approach to Distributed Data Mining
2013
This paper proposes a novel Distributed Data Mining (DDM) approach based on the Agents and Artifacts paradigm, as implemented in CArtAgO [9], where artifacts encapsulate data mining tools, inherited from Weka, that agents can use while engaged in collaborative, distributed learning processes. Target hypotheses are currently constrained to decision trees built with J48, but the approach is flexible enough to allow different kinds of learning models. The twofold contribution of this work includes: i) JaCA-DDM: an extensible tool implemented in the agent oriented programming language Jason [2] and CArtAgO [10,9] to experiment with agent-based DDM approaches on different, well-known training sets. A…
Using recursive Bayesian estimation for matching GPS measurements to imperfect road network data
2010
Map-matching refers to the process of projecting positioning measurements to a location on a digital road network map. It is an important element of intelligent transportation systems (ITS) focusing on driver assistance applications, on emergency and incident management, arterial and freeway management, and other applications. This paper addresses the problem of map-matching in applications characterized by imperfect map quality and restricted computational resources, e.g. in the context of community-based ITS applications. Whereas a number of map-matching methods are available, these methods often rely on topological analysis, thereby making them sensitive to map inaccuracies. In …
Validation of Semantic Analyses of Unstructured Medical Data for Research Purposes
2019
BACKGROUND: In secondary data there are often unstructured free texts. The aim of this study was to validate a text mining system to extract unstructured medical data for research purposes. METHODS: From a radiological department, 1,000 out of 7,102 CT findings were randomly selected. These were manually divided into defined groups by 2 physicians. For automated tagging and reporting, the text analysis software Averbis Extraction Platform (AEP) was used. Special features of the system are a morphological analysis for the decomposition of compound words as well as the recognition of noun phrases, abbreviations and negated statements. Based on the extracted standardized keywords, findings rep…
Spectro-temporal reflectance surfaces: a new conceptual framework for the integration of remote-sensing data from multiple different sensors
2012
The conflict between spatial and temporal resolution of satellite systems, as well as the frequent presence of clouds in the images, has been a traditional limitation of remote sensing in the optical domain. Nevertheless, most of the conceptual tools and algorithms developed classically in remote sensing are based on the input of a series of cloud-free images from identical sensors. In this study, we propose a conceptual framework that is able to ingest data from several different sensors, make them homogeneous, eliminate clouds virtually, and make them usable in a flexible, efficient, and transparent way. The methodology is based on previous developments such as spatial ‘downscaling’, temp…
Special issue on pattern recognition techniques in data mining
2017
Peer Reviewed