Search results for "Data mining"
showing 10 items of 907 documents
Combining one class fuzzy KNN’s
2007
This paper introduces a parallel combination of N > 2 one class fuzzy KNN (FKNN) classifiers. The classifier combination consists of a new optimization procedure based on a genetic algorithm applied to FKNN classifiers that differ in the kind of similarity used. We tested the integration techniques in the case of N = 5 similarities that have recently been introduced to deal with categorical data sets. The assessment of the method has been carried out on two public data sets, the Masquerading User Data (www.schonlau.net) and the badges database in the UCI Machine Learning Repository (http://www.ics.uci.edu/~mlearn/). Preliminary results show the better performance obtained by the fuzzy integration …
A Combined Fuzzy and Probabilistic Data Descriptor for Distributed CBIR
2009
With the wide diffusion of digital image acquisition devices, the cost of managing hundreds of digital images is quickly increasing. Currently, the main way to search digital image libraries is by keywords given by the user. However, users usually add ambiguous keywords for large sets of images. A content-based system intended to automatically find a query image, or similar images, within the whole collection is needed. In our work we address the scenario where medical image collections, which nowadays are rapidly expanding in quantity and heterogeneity, are shared in a distributed system to support diagnostic and preventive medicine. Our goal is to produce an efficient content-based descript…
Unsupervised tissue classification of brain MR images for voxel-based morphometry analysis
2016
In this article, a fully unsupervised method for brain tissue segmentation of T1-weighted MRI 3D volumes is proposed. The method uses the Fuzzy C-Means (FCM) clustering algorithm and a Fully Connected Cascade Neural Network (FCCNN) classifier. Traditional manual segmentation methods require neuro-radiological expertise and significant time, while semiautomatic methods depend on parameter setup and trial-and-error methodologies that may lead to high intraoperator/interoperator variability. The proposed method selects the most useful MRI data according to FCM fuzziness values and trains the FCCNN to learn to classify brain tissues into White Matter, Gray Matter, and Cerebro-Spinal Fluid in …
Distance-constrained data clustering by combined k-means algorithms and opinion dynamics filters
2014
Data clustering algorithms represent mechanisms for partitioning huge arrays of multidimensional data into groups with small in-group and large out-group distances. Most of the existing algorithms fail when a lower bound for the distance among cluster centroids is specified, while this type of constraint can be of help in obtaining a better clustering. Traditional approaches require that the desired number of clusters be specified a priori, which requires either a subjective decision or global meta-information knowledge that is not easily obtainable. In this paper, an extension of the standard data clustering problem is addressed, including additional constraints on the cluster centroid di…
Minimum message length clustering: an explication and some applications to vegetation data
2001
In this paper, we examine the application of a particular approach to induction, the minimum message length (MML) principle, and illustrate some of the problems that can be addressed through its use. The MML principle seeks to identify an optimal model within some specified parameterised class of models, and for this paper we have chosen to concentrate on a single model class, that of mixture separation or fuzzy clustering. The first section presents, in outline, an MML methodology for fuzzy clustering. We then present some applications, including the nature of the within-cluster model, examination of the univocality of results for different groups of species and the effectiveness of presence data …
Scalable Clustering by Iterative Partitioning and Point Attractor Representation
2016
Clustering very large datasets while preserving cluster quality remains a challenging data-mining task to date. In this paper, we propose an effective scalable clustering algorithm for large datasets that builds upon the concept of synchronization. Inherited from the powerful concept of synchronization, the proposed algorithm, CIPA (Clustering by Iterative Partitioning and Point Attractor Representations), is capable of handling very large datasets by iteratively partitioning them into thousands of subsets and clustering each subset separately. Using dynamic clustering by synchronization, each subset is then represented by a set of point attractors and outliers. Finally, CIPA identifies the…
A Novel Clustering Algorithm based on a Non-parametric "Anti-Bayesian" Paradigm
2015
The problem of clustering, or unsupervised classification, has been solved by a myriad of techniques, all of which depend, either directly or implicitly, on the Bayesian principle of optimal classification. To be more specific, within a Bayesian paradigm, if one is to compare the testing sample with only a single point in the feature space from each class, the optimal Bayesian strategy would be to achieve this based on the distance from the corresponding means or central points in the respective distributions. When this principle is applied in clustering, one would assign an unassigned sample into the cluster whose mean is the closest, and this can be done in either a bottom-up or a top-dow…
Comparison of Internal Clustering Validation Indices for Prototype-Based Clustering
2017
Clustering is an unsupervised machine learning and pattern recognition method. In general, in addition to revealing hidden groups of similar observations and clusters, their number needs to be determined. Internal clustering validation indices estimate this number without any external information. The purpose of this article is to evaluate, empirically, characteristics of a representative set of internal clustering validation indices with many datasets. The prototype-based clustering framework includes multiple, classical and robust, statistical estimates of cluster location so that the overall setting of the paper is novel. General observations on the quality of validation indices and on t…
Decision Support System for Manufacturing Processes Reengineering based upon Fuzzy Logic Techniques
2012
This work presents a method for deciding whether to reengineer a production system, based upon fuzzy techniques. The main advantage of this method, in the authors' opinion, is the ease of its implementation together with the reduced time for gathering data and processing it. Multi-variable decision systems are usually based upon complicated mathematical methods and involve a large amount of data to be processed. The fuzzy approach presented here is based only on five input variables and one output variable. The data for the model are gathered by simple queries and quizzes. Human perception, the main point of fuzzy logic, is widely used here for gathering input data for the…
Distributed medical images analysis on a Grid infrastructure
2007
In this paper medical applications on a Grid infrastructure, the MAGIC-5 Project, are presented and discussed. MAGIC-5 aims at developing Computer Aided Detection (CADe) software for the analysis of medical images on distributed databases by means of GRID Services. The use of automated systems for analyzing medical images improves radiologists’ performance; in addition, it could be of paramount importance in screening programs, due to the huge amount of data to check and the cost of related manpower. The need for acquiring and analyzing data stored in different locations requires the use of Grid Services for the management of distributed computing resources and data. Grid technologies allow…