Search results for "mining"
showing 10 items of 1730 documents
Minimum message length clustering: an explication and some applications to vegetation data
2001
In this paper, we examine the application of a particular approach to induction, the minimum message length principle and illustrate some of the problems that can be addressed through its use. The MML principle seeks to identify an optimal model within some specified parameterised class of models and for this paper we have chosen to concentrate on a single model class, that of mixture separation or fuzzy clustering. The first section presents, in outline, an MML methodology for fuzzy clustering. We then present some applications, including the nature of the within-cluster model, examination of the univocality of results for different groups of species and the effectiveness of presence data …
Scalable Clustering by Iterative Partitioning and Point Attractor Representation
2016
Clustering very large datasets while preserving cluster quality remains a challenging data-mining task to date. In this paper, we propose an effective scalable clustering algorithm for large datasets that builds upon the concept of synchronization. Inherited from the powerful concept of synchronization, the proposed algorithm, CIPA (Clustering by Iterative Partitioning and Point Attractor Representations), is capable of handling very large datasets by iteratively partitioning them into thousands of subsets and clustering each subset separately. Using dynamic clustering by synchronization, each subset is then represented by a set of point attractors and outliers. Finally, CIPA identifies the…
A Novel Clustering Algorithm based on a Non-parametric "Anti-Bayesian" Paradigm
2015
The problem of clustering, or unsupervised classification, has been solved by a myriad of techniques, all of which depend, either directly or implicitly, on the Bayesian principle of optimal classification. To be more specific, within a Bayesian paradigm, if one is to compare the testing sample with only a single point in the feature space from each class, the optimal Bayesian strategy would be to achieve this based on the distance from the corresponding means or central points in the respective distributions. When this principle is applied in clustering, one would assign an unassigned sample into the cluster whose mean is the closest, and this can be done in either a bottom-up or a top-dow…
Comparison of Internal Clustering Validation Indices for Prototype-Based Clustering
2017
Clustering is an unsupervised machine learning and pattern recognition method. In general, in addition to revealing hidden groups of similar observations and clusters, their number needs to be determined. Internal clustering validation indices estimate this number without any external information. The purpose of this article is to evaluate, empirically, characteristics of a representative set of internal clustering validation indices with many datasets. The prototype-based clustering framework includes multiple, classical and robust, statistical estimates of cluster location so that the overall setting of the paper is novel. General observations on the quality of validation indices and on t…
Decision Suport System for Manufacturing Processes Reengineering based upon Fuzzy Logic Techniques
2012
This work presents a method for deciding whether to reengineer a production system, based upon fuzzy techniques. The main advantage of this method, in the authors' opinion, is the ease of its implementation together with the reduced time for gathering and processing data. Multi-variable decision systems are usually based upon complicated mathematical methods and involve a large amount of data to be processed. The fuzzy approach presented here is based only on five input variables and one output variable. The data for the model are gathered by simple queries and quizzes. Human perception, the main point of fuzzy logic, is widely used here for gathering input data for the…
Distributed medical images analysis on a Grid infrastructure
2007
In this paper medical applications on a Grid infrastructure, the MAGIC-5 Project, are presented and discussed. MAGIC-5 aims at developing Computer Aided Detection (CADe) software for the analysis of medical images on distributed databases by means of GRID Services. The use of automated systems for analyzing medical images improves radiologists’ performance; in addition, it could be of paramount importance in screening programs, due to the huge amount of data to check and the cost of related manpower. The need for acquiring and analyzing data stored in different locations requires the use of Grid Services for the management of distributed computing resources and data. Grid technologies allow…
Area at Risk and Viability after Myocardial Ischemia and Reperfusion Can Be Determined by Contrast-Enhanced Cardiac Magnetic Resonance Imaging
2008
Background/Aims: Clinical differentiation between infarcted and viable myocardium in the ischemic area at risk is controversial. We investigated the potential of contrast-enhanced cardiac magnetic resonance imaging (ceCMRI) in determining the area at risk 24 h after ischemia. Methods: Myocardial ischemia was induced by percutaneous coronary intervention of the left anterior descending coronary artery in pigs. Coronary occlusion time was 30 min in group A, which caused little myocardial infarction and 45 min in group B, which led to irreversible damage. 24 h after reperfusion ceCMRI was performed at 2 and 15 min after administration of gadolinium-diethyl…
No-Reference 3D Mesh Quality Assessment Based on Dihedral Angles Model and Support Vector Regression
2016
3D meshes are subject to various visual distortions during their transmission and geometrical processing. Several works have tried to evaluate the visual quality using either full reference or reduced reference approaches. However, these approaches require the presence of the reference mesh which is not available in such practical situations. In this paper, the main contribution lies in the design of a computational method to automatically predict the perceived mesh quality without reference and without knowing beforehand the distortion type. Following the no-reference (NR) quality assessment principle, the proposed method focuses only on the distorted mesh. Specific…
Identification and visualisation of differential isoform expression in RNA-seq time series
2017
As sequencing technologies improve their capacity to detect distinct transcripts of the same gene and to address complex experimental designs such as longitudinal studies, there is a need to develop statistical methods for the analysis of isoform expression changes in time series data. Iso-maSigPro is a new functionality of the R package maSigPro for transcriptomics time series data analysis. Iso-maSigPro identifies genes with a differential isoform usage across time. The package also includes new clustering and visualization functions that allow grouping of genes with similar expression patterns at the isoform level, as well as those genes with a shift in major expressed isoform. T…
Non-muscle myosin II as a predictive factor in head and neck squamous cell carcinoma
2018
Background: The present study attempted to provide information regarding non-muscle myosin II (MII) isoforms immunoreactivity in patients with head and neck squamous cell carcinoma (HNSCC) and analysis of the patients’ clinical status after 5 years of monitoring. Material and Methods: A semiquantitative analysis of the immunoreactivity of the MII isoforms was performed in 54 surgical specimens and its correlation with clinical and pathological variables and prognosis was verified. Data were analyzed using chi-square, Mann-Whitney and Kruskal-Wallis tests. To evaluate the survival over the total monitoring time and any connection with the proteins studied, the Kaplan-Meier analysis was used. P…