Search results for "Clustering"
showing 10 items of 446 documents
Probabilistic quantum clustering
2020
Quantum Clustering is a powerful method to detect clusters with complex shapes. However, it is very sensitive to a length parameter that controls the shape of the Gaussian kernel associated with a wave function, which is employed in the Schrödinger equation in the role of a density estimator. In addition, linking data points into clusters requires local estimates of covariance, which in turn require further parameters. This paper proposes a Bayesian framework that provides an objective measure of goodness-of-fit to the data, to optimise the adjustable parameters. This also quantifies the probabilities of cluster membership, thus partitioning the data into a specific number of clusters, w…
ExtMiner: Combining multiple ranking and clustering algorithms for structured document retrieval
2006
This paper introduces ExtMiner, a platform and potential tool for information management in SMEs (small and medium-sized enterprises) or in organizational workgroups. ExtMiner supports interactive and iterative clustering of documents. It provides users with a visual cluster view and a list view at the same time, supporting an iterative search process. ExtMiner may also be applied as a platform for research on retrieval fusion, since it combines search, clustering and visualization algorithms. ExtMiner was evaluated with three document collections. Although the findings were encouraging, the user interface and the performance with large document repositories need further development.
Rings for privacy: An architecture for privacy-preserving user profiling
2014
Document Word Clouds: Visualising Web Documents as Tag Clouds to Aid Users in Relevance Decisions
2009
Contains the full text. Information Retrieval systems spend a great effort on determining the significant terms in a document. When, instead, a user is looking at a document, he cannot benefit from such information. He has to read the text to understand which words are important. In this paper we take a look at the idea of enhancing the perception of web documents with visualisation techniques borrowed from the tag clouds of Web 2.0. Highlighting the important words in a document by using a larger font size allows the reader to get a quick impression of the relevant concepts in a text. As this process does not depend on a user query, it can also be used for explorative search. A user study showed, th…
Data mining-based statistical analysis of biological data uncovers hidden significance: clustering Hashimoto’s thyroiditis patients based on the resp…
2014
The pathogenesis of Hashimoto's thyroiditis includes autoimmunity involving thyroid antigens, autoantibodies, and possibly cytokines. It is unclear what role Hsp60 plays, but our recent data indicate that it may contribute to pathogenesis as an autoantigen. Its role in the induction of cytokine production, pro- or anti-inflammatory, was not elucidated, except that we found that peripheral blood mononucleated cells (PBMC) from patients or from healthy controls did not respond with cytokine production upon stimulation by Hsp60 in vitro with patterns that would differentiate patients from controls with statistical significance. This "negative" outcome appeared when the data were pooled and ana…
PGAC: A Parallel Genetic Algorithm for Data Clustering
2005
Cluster analysis is a valuable tool for exploratory pattern analysis, especially when very little a priori knowledge about the data is available. Distributed systems, based on high-speed intranet connections, provide new tools for designing new and faster clustering algorithms. Here, a parallel genetic algorithm for clustering called PGAC is described. The parallelization strategy used is the island model paradigm, in which different populations of chromosomes (called demes) evolve locally on each processor and from time to time some individuals are moved from one deme to another. Experiments have been performed for testing the benefits of the parallelisation paradigm in terms of comput…
Gamma Knife treatment planning: MR brain tumor segmentation and volume measurement based on unsupervised Fuzzy C-Means clustering
2015
Nowadays, radiation treatment is beginning to use MRI intensively thanks to its greater ability to discriminate between healthy and diseased soft tissues. Leksell Gamma Knife® is a radio-surgical device used to treat different brain lesions, such as benign or malignant tumors, which are often inaccessible for conventional surgery. Currently, the target to be treated with radiation therapy is contoured with slice-by-slice manual segmentation on MR datasets. This approach makes the segmentation procedure time-consuming and operator-dependent. The repeatability of the tumor boundary delineation may be ensured only by using automatic or semiautomatic methods, supporting clinicians in the treatment pla…
Knowledge Discovery from the Programme for International Student Assessment
2017
The Programme for International Student Assessment (PISA) is a worldwide study that assesses the proficiencies of 15-year-old students in reading, mathematics, and science every three years. Despite the high quality and open availability of the PISA data sets, which call for big data learning analytics, academic research using this rich and carefully collected data is surprisingly sparse. Our research contributes to reducing this deficit by discovering novel knowledge from the PISA through the development and use of appropriate methods. Since Finland has been the country of most international interest in the PISA assessment, a relevant review of the Finnish educational system is provided. T…
Comparison of genomic sequences clustering using Normalized Compression Distance and Evolutionary Distance
2008
Genomic sequences are usually compared using evolutionary distance, a procedure that implies the alignment of the sequences. Alignment of long sequences is a time-consuming procedure, and the resulting dissimilarity is not a metric. Recently the normalized compression distance was introduced as a method to calculate the distance between two generic digital objects, and it seems a suitable way to compare genomic strings. In this paper the clustering and the mapping, obtained using a SOM, with the traditional evolutionary distance and the compression distance are compared in order to understand whether the two sets of distances are similar. The first results indicate that the two distances catch differen…
Feature Ranking of Large, Robust, and Weighted Clustering Result
2017
A clustering result needs to be interpreted and evaluated for knowledge discovery. When clustered data represents a sample from a population with known sample-to-population alignment weights, both the clustering and the evaluation techniques need to take this into account. The purpose of this article is to advance the automatic knowledge discovery from a robust clustering result on the population level. For this purpose, we derive a novel ranking method by generalizing the computation of the Kruskal-Wallis H test statistic from sample to population level with two different approaches. Application of these enlargements to both the input variables used in clustering and to metadata provides a…