Search results for "Clustering"
showing 10 items of 446 documents
The Three Steps of Clustering In The Post-Genomic Era
2013
This chapter describes the basic algorithmic components that are involved in clustering, with particular attention to classification of microarray data.
A Feature Set Decomposition Method for the Construction of Multi-classifier Systems Trained with High-Dimensional Data
2013
Data mining for the discovery of novel, useful patterns encounters obstacles when dealing with high-dimensional datasets, which have been documented as the "curse" of dimensionality. A strategy to deal with this issue is the decomposition of the input feature set to build a multi-classifier system. Standalone decomposition methods are rare and generally based on random selection. We propose a decomposition method which uses information theory tools to arrange input features into uncorrelated and relevant subsets. Experimental results show that this approach significantly outperforms three baseline decomposition methods in terms of classification accuracy.
Regularized Regression Incorporating Network Information: Simultaneous Estimation of Covariate Coefficients and Connection Signs
2014
We develop an algorithm that incorporates network information into regression settings. It simultaneously estimates the covariate coefficients and the signs of the network connections (i.e. whether the connections are of an activating or of a repressing type). For the coefficient estimation steps an additional penalty is set on top of the lasso penalty, similarly to Li and Li (2008). We develop a fast implementation for the new method based on coordinate descent. Furthermore, we show how the new methods can be applied to time-to-event data. The new method yields good results in simulation studies concerning sensitivity and specificity of non-zero covariate coefficients, estimation of networ…
Incrementally Assessing Cluster Tendencies with a Maximum Variance Cluster Algorithm
2003
A straightforward and efficient way to discover clustering tendencies in data using a recently proposed Maximum Variance Clustering algorithm is proposed. The approach shares the benefits of the plain clustering algorithm with regard to other approaches for clustering. Experiments using both synthetic and real data have been performed in order to evaluate the differences between the proposed methodology and the plain use of the Maximum Variance algorithm. According to the results obtained, the proposal constitutes an efficient and accurate alternative.
Advanced Indexing Schema for Imaging Applications: Three-Case Studies
2007
Bayesian versus data driven model selection for microarray data
2014
Clustering is one of the most well known activities in scientific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. In this beautiful area, one of the most difficult challenges is a particular instance of the model selection problem, i.e., the identification of the correct number of clusters in a dataset. In what follows, for ease of reference, we refer to that instance still as model selection. It is an important part of any statistical analysis. The techniques used for solving it are mainly either Bayesian or data-driven, and are both based on internal knowledge. That is, they use information obtained by processing the input data. A…
Project Management Information Systems (PMISs): A Statistical-Based Analysis for the Evaluation of Software Packages Features
2021
Project Managers (PMs) working in competitive markets are finding Project Management Information Systems (PMISs) useful for planning, organizing and controlling projects of varying complexity. A wide variety of PMIS software is available, suitable for projects differing in scope and user needs. This paper identifies the most useful features found in PMISs. An extensive literature review and analysis of commercial software is made to identify the main features of PMISs. Afterwards, the list is reduced by a panel of project management experts, and a statistical analysis is performed on data acquired by means of two different surveys. The relative importance of listed features is properly comp…
A Greedy Algorithm for Hierarchical Complete Linkage Clustering
2014
We are interested in the greedy method to compute a hierarchical complete linkage clustering. There are two known methods for this problem, one having a running time of \({\mathcal O}(n^3)\) with a space requirement of \({\mathcal O}(n)\) and one having a running time of \({\mathcal O}(n^2 \log n)\) with a space requirement of \(\Theta(n^2)\), where \(n\) is the number of points to be clustered. Neither method is capable of handling large point sets. In this paper, we give an algorithm with a space requirement of \({\mathcal O}(n)\) which is able to cluster one million points in a day on current commodity hardware.
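For orientation, the greedy complete linkage rule mentioned in this abstract can be sketched as follows. This is a deliberately naive illustration, not the paper's algorithm and not either of the two known methods it cites: it recomputes every inter-cluster distance on each round, so it is slower than both. The function name `complete_linkage`, the Euclidean metric, and the stopping parameter `k` are assumptions made for the example.

```python
import math
from itertools import combinations


def complete_linkage(points, k):
    """Greedily merge clusters until k remain (naive illustrative sketch).

    points: list of equal-length coordinate tuples (assumed input format).
    k: number of clusters to stop at (hypothetical stopping rule).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Start with every point in its own cluster.
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(p, q) for p in a for q in b)

    while len(clusters) > k:
        # Greedy step: pick the pair of clusters with the smallest linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        # i < j always holds here, so deleting index j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Calling `complete_linkage([(0, 0), (0, 1), (10, 0), (10, 1)], 2)` groups the two points near the origin and the two points near x = 10.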
Balanced Words Having Simple Burrows-Wheeler Transform
2009
The investigation of the "clustering effect" of the Burrows-Wheeler transform (BWT) leads to the study of words having simple BWT, i.e. words $w$ over an ordered alphabet $A=\{a_1,a_2,\ldots,a_k\}$, with $a_1 < a_2 < \ldots < a_k$, such that $bwt(w)$ is of the form $a_k^{n_k} a_{k-1}^{n_{k-1}} \cdots a_1^{n_1}$, for some non-negative integers $n_1, n_2, \ldots, n_k$. We remark that, in the case of binary alphabets, there is an equivalence between words having simple BWT, the family of (circular) balanced words and the conjugates of standard words. In the case of alphabets of size greater than two, these notions are no longer equivalent. As a main result of this paper we prove that, u…
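The definition of a simple BWT given in this abstract can be checked mechanically. The sketch below assumes the variant of the transform without an end-of-string marker, where $bwt(w)$ is the last column of the sorted list of all rotations (conjugates) of $w$, and it assumes the alphabet order is Python's default character order. The names `bwt` and `has_simple_bwt` are made up for the illustration.

```python
def bwt(w: str) -> str:
    """Last column of the sorted list of all rotations of w (no end marker)."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(r[-1] for r in rotations)


def has_simple_bwt(w: str) -> bool:
    """True if bwt(w) has the form a_k^{n_k} ... a_1^{n_1}.

    Equivalently, the letters of bwt(w) appear in non-increasing order.
    """
    t = bwt(w)
    return all(t[i] >= t[i + 1] for i in range(len(t) - 1))


print(bwt("banana"), has_simple_bwt("banana"))  # nnbaaa True
print(bwt("aabb"), has_simple_bwt("aabb"))      # baba False
```

The word `aabb` fails the check and is also not balanced, which agrees with the binary-alphabet equivalence stated in the abstract.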
Internacionalización de empresas a través de clústeres: análisis bibliométrico de palabras clave de 152 publicaciones destacadas en el período 2009-2…
2022
[EN] While countless studies on the role of clusters in regional economic developments and business performance have been done, some disadvantages and limitations also have been identified. Limitations such as small local markets, limited resources, isolation, and over-independence which lead companies to a lock-in state regarding knowledge and innovation can be solved by means of internationalization or foreign market expansion. Therefore, the internationalization of clusters still needs more attention. Furthermore, by conducting a bibliometric study based on the keywords from previous research, this investigation intends to identify the principal and most influential items, their relation…