Search results for "clustering"
showing 10 items of 446 documents
A Coclustering Approach for Mining Large Protein-Protein Interaction Networks
2012
Several approaches have been presented in the literature to cluster Protein-Protein Interaction (PPI) networks. They can be grouped into two main categories: those allowing a protein to participate in different clusters and those generating only nonoverlapping clusters. In both cases, a challenging task is to find a suitable compromise between the biological relevance of the results and a comprehensive coverage of the analyzed networks. Indeed, methods returning highly accurate results are often able to cover only small parts of the input PPI network, especially when low-characterized networks are considered. We present a coclustering-based technique able to generate both overlapping and nonove…
Evolution of Cooperation Patterns in Psoriasis Research: Co-Authorship Network Analysis of Papers in Medline (1942–2013)
2015
Background: Although researchers have worked in collaboration since the origins of modern science and the publication of the first scientific journals in the eighteenth century, this phenomenon has acquired exceptional importance in the last several decades. Since the mid-twentieth century, new knowledge has been generated from within an ever-growing network of investigators, working cooperatively in research groups across countries and institutions. Cooperation is a crucial determinant of academic success. Objective: The aim of the present paper is to analyze the evolution of scientific collaboration at the micro level, with regard to the scientific production generated on psoriasis research. Me…
Improving clustering of Web bot and human sessions by applying Principal Component Analysis
2019
The paper addresses the problem of modeling Web sessions of bots and legitimate users (humans) as feature vectors for their use at the input of classification models. So far many different features to discriminate bots’ and humans’ navigational patterns have been considered in session models but very few studies were devoted to feature selection and dimensionality reduction in the context of bot detection. We propose applying Principal Component Analysis (PCA) to develop improved session models based on predictor variables being efficient discriminants of Web bots. The proposed models are used in session clustering, whose performance is evaluated in terms of the purity …
Fast dendrogram-based OTU clustering using sequence embedding
2014
Biodiversity assessment is an important step in a metagenomic processing pipeline. The biodiversity of a microbial metagenome is often estimated by grouping its 16S rRNA reads into operational taxonomic units or OTUs. These metagenomic datasets are typically large and hence require effective yet accurate computational methods for processing. In this paper, we introduce a new hierarchical clustering method called CRiSPy-Embed which aims to produce high-quality clustering results at a low computational cost. We tackle two computational issues of the current OTU hierarchical clustering approach: (1) the compute-intensive sequence alignment operation for building the distance matrix and (2) the …
A Fuzzy Logic C-Means Clustering Algorithm to Enhance Microcalcifications Clusters in Digital Mammograms
2011
The detection of microcalcifications is a hard task, since they are quite small and often poorly contrasted against the background of images. Computer Aided Detection (CAD) systems could be very useful for breast cancer control. In this paper, we report a method to enhance microcalcification clusters in digital mammograms. A Fuzzy Logic clustering algorithm with a set of features is used for clustering microcalcifications. The method described was tested on simulated clusters of microcalcifications, so that the location of the cluster within the breast and the exact number of microcalcifications are known.
Parallelized Clustering of Protein Structures on CUDA-Enabled GPUs
2014
Estimation of the pose in which two given molecules might bind together to form a potential complex is a crucial task in structural biology. To solve this so-called "docking problem", most algorithms initially generate large numbers of candidate poses (or decoys) which are then clustered to allow for subsequent computationally expensive evaluations of reasonable representatives. Since the number of such candidates ranges from thousands to millions, performing the clustering on standard CPUs is highly time consuming. In this paper we analyze and evaluate different approaches to parallelize the nearest neighbor chain algorithm to perform hierarchical Ward clustering of protein structures usin…
Cellular automata and urban development simulation : a transition rules creation process based on statistical analysis
2015
The study of land use evolution has become a major concern in urban planning. The main focus is to understand how land use evolves over time and which processes take place. This understanding would allow urban developments to be planned on the basis of knowledge that is as complete as possible and covers as many fields as possible (e.g. urban planning, politics, sociology, etc.). Simulation tools can be used to merge and display the different points of view and stakes of different stakeholders (Parrott & Meyer, 2012).
Efficient unsupervised clustering for spatial bird population analysis along the Loire river
2015
This paper focuses on the application and comparison of Non Linear Dimensionality Reduction (NLDR) methods on a natural high-dimensional dataset of bird communities along the Loire River (France). In this context, biologists usually use the well-known PCA in order to explain the upstream-downstream gradient. Unfortunately, this method was unsuccessful on this kind of nonlinear dataset. The goal of this paper is to compare recent NLDR methods coupled with different data transformations in order to find the best approach. Results show that Multiscale Jensen-Shannon Embedding (Ms JSE) outperforms all other methods in this context.
SMART: Unique splitting-while-merging framework for gene clustering
2014
© 2014 Fa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named "splitting merging awareness tactics" (SMART), which does not require any a priori knowledge of either the number …
The on-line curvilinear component analysis (onCCA) for real-time data reduction
2015
Real time pattern recognition applications often deal with high dimensional data, which require a data reduction step that is only performed offline. However, this loses the possibility of adaptation to a changing environment. This is also true for applications other than pattern recognition, like data visualization for input inspection. Only linear projections, like the principal component analysis, can work in real time by using iterative algorithms while all known nonlinear techniques cannot be implemented in such a way and actually always work on the whole database at each epoch. Among these nonlinear tools, the Curvilinear Component Analysis (CCA), which is a non-convex techni…