Search results for "K-Means"
showing 10 items of 43 documents
Automatic detection of cervical cells in Pap-smear images using polar transform and k-means segmentation
2016
We introduce a novel method of cell detection and segmentation based on a polar transformation. The method assumes that the seed point of each candidate is placed inside the nucleus. The polar representation, built around the seed, is segmented using k-means clustering into one candidate-nucleus cluster, one candidate-cytoplasm cluster and up to three miscellaneous clusters, representing background or surrounding objects that are not part of the candidate cell. For assessing the natural number of clusters, the silhouette method is used. In the segmented polar representation, a number of parameters can be conveniently observed and evaluated as fuzzy memberships to the non-cell class, out of …
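The abstract above says the silhouette method is used to assess the number of clusters. As a rough illustration only (not the authors' implementation), the mean silhouette score of a labelled point set can be sketched in plain Python. The function name `silhouette_score` and the toy data in the usage note are assumptions made for this example.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points (Euclidean distance).

    For each point: a = mean distance to the other members of its own
    cluster, b = smallest mean distance to any other cluster, and
    s = (b - a) / max(a, b). Points in singleton clusters score 0.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster contributes s = 0
        # The point itself adds distance 0, so divide by len(own) - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)
```

To choose the number of clusters, one would typically run k-means for several candidate values of k and keep the k with the highest score. For example, with `points = [(0, 0), (0, 1), (10, 0), (10, 1)]`, the labelling `[0, 0, 1, 1]` scores much higher than `[0, 1, 0, 1]`.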
Beyond Tandem Analysis: Joint Dimension Reduction and Clustering in R
2019
We present the R package clustrd which implements a class of methods that combine dimension reduction and clustering of continuous or categorical data. In particular, for continuous data, the package contains implementations of factorial K-means and reduced K-means; both methods combine principal component analysis with K-means clustering. For categorical data, the package provides MCA K-means, i-FCB and cluster correspondence analysis, which combine multiple correspondence analysis with K-means. Two examples on real data sets are provided to illustrate the usage of the main functions.
Pattern Classification from Multi-beam Acoustic Data Acquired in Kongsfjorden
2021
Climate change is causing a structural change in Arctic ecosystems, decreasing the effectiveness that the polar regions have in cooling water masses, with inevitable repercussions on the climate and with an impact on marine biodiversity. The Svalbard islands under study are an area greatly influenced by Atlantic waters. This area is undergoing changes that are modifying the composition and distribution of the species present. The aim of this work is to provide a method for the classification of acoustic patterns acquired in the Kongsfjorden, Svalbard, Arctic Circle, using multibeam technology. Therefore, the general objective is the implementation of a methodology useful for identifying the a…
Exploring the differences of Finnish students in PISA 2003 and 2012 using educational data mining
2016
Finland has always gotten good scores in PISA studies, but in 2012 the results dropped significantly. A modified data mining algorithm called k-means++ was applied to the 2003 and 2012 data sets. This produced five clusters for each data set, which were compared with one another. The lower-performing clusters had remained similar, but the higher-performing cluster had split. In addition, the cluster of average performers had grown considerably. Mathematics results had declined in all clusters since 2003.
Radio frequency fingerprinting for outdoor user equipment localization
2017
Recent advancements in cellular mobile technology and smartphone usage have opened opportunities for researchers and commercial companies to develop ubiquitous low-cost localization systems. Radio frequency (RF) fingerprinting is a popular positioning technique which uses radio signal strength (RSS) values from already existing infrastructure to provide satisfactory user positioning accuracy in indoor and densely built outdoor urban areas where Global Navigation Satellite System (GNSS) signals are poor and hard to reach. However, a major requirement for RF fingerprinting to maintain good localization accuracy is the collection and updating of a large training database. The Minimization…
Clustering Incomplete Spectral Data with Robust Methods
2018
Missing value imputation is a common approach for preprocessing incomplete data sets. In the case of data clustering, imputation methods may cause unexpected bias because they may change the underlying structure of the data. To avoid prior imputation of missing values, the computational operations must be projected on the available data values. In this paper, we apply a robust nan-K-spatmed algorithm to the clustering problem on hyperspectral image data. Robust statistics, such as multivariate medians, are less sensitive to outliers than classical statistics relying on Gaussian assumptions. They are, however, computationally more intractable due to the lack of closed-for…
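The abstract above describes projecting computations onto the available data values instead of imputing. One simple way to picture this, offered as a hedged sketch and not as the nan-K-spatmed algorithm itself, is a distance that skips coordinates where the observation is missing (NaN). The helper names `available_sq_dist` and `assign` are hypothetical, and cluster centers are assumed to be fully observed.

```python
import math

def available_sq_dist(x, center):
    """Squared Euclidean distance over the coordinates where x is observed.

    Coordinates of x that are NaN are skipped rather than imputed.
    Assumption: `center` has no missing values.
    """
    return sum(
        (xi - ci) ** 2
        for xi, ci in zip(x, center)
        if not math.isnan(xi)
    )

def assign(x, centers):
    """Index of the nearest center, judged on the observed coordinates only."""
    return min(range(len(centers)), key=lambda j: available_sq_dist(x, centers[j]))
```

For instance, `assign((float("nan"), 9.0), [(0.0, 0.0), (100.0, 10.0)])` compares only the second coordinate and picks the second center.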
Simulated skiing as a measurement tool for performance in cross-country sit-skiing
2019
The International Paralympic Committee mandates the development of an evidence-based classification system, which requires a measure of performance. Performance in cross-country sit-skiing depends mainly on the force generated during the poling phase and is enhanced by trunk flexion–extension movements. Since all sit-skiers have neuromuscular impairment but differ in their ability to control the trunk, this study aimed to verify whether simulated poling on an adapted ergometer, together with a cluster analysis, could be used for grouping participants with different impairments according to their performance. On the ergometer, eight male and five female participants performed seven poling c…
Improving Scalable K-Means++
2021
Two new initialization methods for K-means clustering are proposed. Both proposals apply a divide-and-conquer approach to the K-means‖ type of initialization strategy. The second proposal also uses multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared to the K-means++ and K-means‖ methods using an extensive set of reference and synthetic large-scale datasets. Concerning the latter, a novel high-dimensional clustering data generation …
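For context on the baseline named in the abstract above, the standard K-means++ seeding rule can be sketched as follows. This is the classic sequential procedure only; it is not the K-means‖ variant or either of the paper's proposed methods. The function names are chosen for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(points, k, rng=None):
    """Choose k initial centers by K-means++ (D^2-weighted) sampling."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    # First center: drawn uniformly at random.
    centers = [points[rng.randrange(len(points))]]
    # d2[i] = squared distance from point i to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center: fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Draw index i with probability proportional to d2[i].
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Update nearest-center distances with the new center.
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centers
```

Because already chosen points have weight zero, distinct input points are never selected twice. A call such as `kmeanspp_init(points, 2, random.Random(0))` returns two of the input points, usually far apart.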
Improvements and applications of the elements of prototype-based clustering
2018
Clustering or cluster analysis is an essential part of data mining, machine learning, and pattern recognition. The most widely applied clustering methods are partitioning-based or prototype-based methods. Prototype-based clustering methods are usually easy to implement and scale well. These methods, such as K-means clustering, have been used for different applications in various fields. On the other hand, prototype-based clustering methods are typically sensitive to initialization, and the selection of the number of clusters for knowledge discovery purposes is not straightforward. In the era of big data, in high-velocity, ever-growing datasets, which can also be erroneous, outl…
Paysage et risque sanitaire - Le cas de l'echinococcose alvéolaire. Approche multiscalaire
2005
Echinococcus multilocularis is a parasite of public health importance causing the fatal zoonotic disease alveolar echinococcosis. The parasite's eggs are dispersed in the environment through fox faeces. Epidemiological issues associated with the disease led to the monitoring of the endemic status in foxes in France and in Europe. Fox faeces collected in the field were tested for the presence of the parasite and assembled in a georeferenced database. GIS-assisted analysis investigated relationships between landscape characteristics and potential risk. Three scale levels were successively explored. In the French Doubs département located in a high endemicity area, binary logistic regressi…