Search results for "clustering"

showing 10 items of 446 documents

The Three Steps of Clustering In The Post-Genomic Era

2013

This chapter describes the basic algorithmic components involved in clustering, with particular attention to the classification of microarray data.

Clustering high-dimensional data; Settore INF/01 - Informatica; business.industry; Correlation clustering; Pattern recognition; computer.software_genre; Biclustering; CURE data clustering algorithm; Clustering; Classification; Biological Data Mining; Consensus clustering; Artificial intelligence; Data mining; business; Cluster analysis; computer; Mathematics

researchProduct

A Feature Set Decomposition Method for the Construction of Multi-classifier Systems Trained with High-Dimensional Data

2013

Data mining for the discovery of novel, useful patterns encounters obstacles when dealing with high-dimensional datasets, a problem documented as the "curse of dimensionality". One strategy for dealing with this issue is to decompose the input feature set in order to build a multi-classifier system. Standalone decomposition methods are rare and generally based on random selection. We propose a decomposition method that uses information theory tools to arrange input features into uncorrelated and relevant subsets. Experimental results show that this approach significantly outperforms three baseline decomposition methods in terms of classification accuracy.

Clustering high-dimensional data; business.industry; Computer science; Pattern recognition; Information theory; computer.software_genre; Uncorrelated; Decomposition method (queueing theory); Data mining; Artificial intelligence; business; Feature set; computer; Classifier (UML); Curse of dimensionality

researchProduct

Regularized Regression Incorporating Network Information: Simultaneous Estimation of Covariate Coefficients and Connection Signs

2014

We develop an algorithm that incorporates network information into regression settings. It simultaneously estimates the covariate coefficients and the signs of the network connections (i.e. whether the connections are of an activating or of a repressing type). For the coefficient estimation steps an additional penalty is set on top of the lasso penalty, similarly to Li and Li (2008). We develop a fast implementation for the new method based on coordinate descent. Furthermore, we show how the new methods can be applied to time-to-event data. The new method yields good results in simulation studies concerning sensitivity and specificity of non-zero covariate coefficients, estimation of networ…

Clustering high-dimensional data; business.industry; jel:C41; jel:C13; Machine learning; computer.software_genre; Regression; high-dimensional data; gene expression data; pathway information; penalized regression; Connection (mathematics); Set (abstract data type); Lasso (statistics); Covariate; Artificial intelligence; Sensitivity (control systems); business; Coordinate descent; Algorithm; computer; Mathematics

researchProduct
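
The coordinate descent idea mentioned in the abstract above can be illustrated with a minimal sketch for the plain lasso only. This is not the authors' method: the additional network penalty and the estimation of connection signs are omitted, and the names `soft_threshold` and `lasso_cd`, the objective scaling (1/(2n) times the squared error plus `lam` times the L1 norm), and the fixed iteration count are all assumptions made for illustration.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the lasso proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Plain lasso via cyclic coordinate descent (illustrative sketch).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    X is a list of rows, y a list of responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            if z == 0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that ignores feature j.
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new
    return beta


X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [2.0, 0.1, 2.0, 0.1]
beta = lasso_cd(X, y, lam=0.1)
print(beta)  # the weak second coefficient is shrunk exactly to zero
```

Updating one coefficient at a time while maintaining the residual keeps each update linear in the number of observations, which is the usual reason coordinate descent is fast for lasso-type penalties.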

Incrementally Assessing Cluster Tendencies with a Maximum Variance Cluster Algorithm

2003

A straightforward and efficient way to discover clustering tendencies in data is proposed, based on a recently introduced Maximum Variance Clustering algorithm. The approach shares the benefits of the plain clustering algorithm with regard to other approaches for clustering. Experiments using both synthetic and real data have been performed to evaluate the differences between the proposed methodology and the plain use of the Maximum Variance algorithm. According to the results obtained, the proposal constitutes an efficient and accurate alternative.

Clustering high-dimensional data; k-medoids; Computer science; CURE data clustering algorithm; Single-linkage clustering; Canopy clustering algorithm; Variance (accounting); Data mining; Cluster analysis; computer.software_genre; computer; k-medians clustering

researchProduct

Advanced Indexing Schema for Imaging Applications: Three-Case Studies

2007

Clustering; List of Clusters; Antipole Clustering; TSVQ; AESA; Range Search; K-nearest-neighbor Search; Texture Synthesis; Image Colorization; Super-Resolution

researchProduct

Bayesian versus data driven model selection for microarray data

2014

Clustering is one of the best-known activities in scientific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. In this beautiful area, one of the most difficult challenges is a particular instance of the model selection problem, i.e., the identification of the correct number of clusters in a dataset. In what follows, for ease of reference, we still refer to that instance as model selection. It is an important part of any statistical analysis. The techniques used for solving it are mainly either Bayesian or data-driven, and both are based on internal knowledge. That is, they use information obtained by processing the input data. A…

Clustering; Model selection; Bayesian information criterion; Akaike information criterion; Minimum message length; Bioinformatics; Settore INF/01 - Informatica; Computer science; business.industry; Bayesian probability; computer.software_genre; Machine learning; Computer Science Applications; Data-driven; Determining the number of clusters in a data set; Identification (information); Data mining; Artificial intelligence; Cluster analysis; business; computer

researchProduct

Project Management Information Systems (PMISs): A Statistical-Based Analysis for the Evaluation of Software Packages Features

2021

Project Managers (PMs) working in competitive markets are finding Project Management Information Systems (PMISs) useful for planning, organizing and controlling projects of varying complexity. A wide variety of PMIS software is available, suitable for projects differing in scope and user needs. This paper identifies the most useful features found in PMISs. An extensive literature review and analysis of commercial software is made to identify the main features of PMISs. Afterwards, the list is reduced by a panel of project management experts, and a statistical analysis is performed on data acquired by means of two different surveys. The relative importance of listed features is properly comp…

Clustering; Conjoint analysis; Design of Experiment (DoE); Project Management Information System (PMIS); Ranking method; Survey; Technology; Computer science; QH301-705.5; QC1-999; Software; Settore ING-IND/17 - Impianti Industriali Meccanici; General Materials Science; Project management; Biology (General); Cluster analysis; Instrumentation; QD1-999; Fluid Flow and Transfer Processes; Commercial software; Scope (project management); business.industry; Process Chemistry and Technology; T; Physics; General Engineering; Engineering (General). Civil engineering (General); Data science; Computer Science Applications; Variety (cybernetics); Chemistry; Respondent; TA1-2040; business

Applied Sciences; Volume 11; Issue 23; Pages: 11233

researchProduct

A Greedy Algorithm for Hierarchical Complete Linkage Clustering

2014

We are interested in the greedy method to compute a hierarchical complete linkage clustering. There are two known methods for this problem, one having a running time of ${\mathcal O}(n^3)$ with a space requirement of ${\mathcal O}(n)$ and one having a running time of ${\mathcal O}(n^2 \log n)$ with a space requirement of $\Theta(n^2)$, where $n$ is the number of points to be clustered. Neither method can handle large point sets. In this paper, we give an algorithm with a space requirement of ${\mathcal O}(n)$ which is able to cluster one million points in a day on current commodity hardware.

Combinatorics; CURE data clustering algorithm; SUBCLU; Nearest-neighbor chain algorithm; Correlation clustering; Single-linkage clustering; Hierarchical clustering of networks; Greedy algorithm; Complete-linkage clustering; Mathematics

researchProduct
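
For readers unfamiliar with the greedy method named in the abstract above, the following is a minimal sketch of naive complete linkage clustering. It is not the algorithm contributed by the paper: it recomputes all linkage distances at every merge, so it runs in roughly cubic time while storing only the clusters themselves. The function name `complete_linkage`, the Euclidean metric, and the stopping rule (a target number of clusters) are assumptions for illustration.

```python
from itertools import combinations
from math import dist


def complete_linkage(points, num_clusters):
    """Greedy agglomerative clustering with complete linkage (naive sketch).

    Repeatedly merges the two clusters whose farthest pair of points
    is closest, until num_clusters clusters remain.
    """
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(dist(p, q) for p in a for q in b)

    while len(clusters) > num_clusters:
        # combinations yields i < j, so deleting index j leaves index i valid.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(complete_linkage(points, 3))
```

Recording the sequence of merges instead of stopping at a fixed cluster count would give the full hierarchy; that variant is left out to keep the sketch short.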

Balanced Words Having Simple Burrows-Wheeler Transform

2009

The investigation of the "clustering effect" of the Burrows-Wheeler transform (BWT) leads to the study of words having simple BWT, i.e., words $w$ over an ordered alphabet $A=\{a_1,a_2,\ldots,a_k\}$, with $a_1 < a_2 < \ldots < a_k$, such that $bwt(w)$ is of the form $a_k^{n_k} a_{k-1}^{n_{k-1}} \cdots a_1^{n_1}$, for some non-negative integers $n_1, n_2, \ldots, n_k$. We remark that, in the case of binary alphabets, there is an equivalence between words having simple BWT, the family of (circular) balanced words and the conjugates of standard words. In the case of alphabets of size greater than two, these notions are no longer equivalent. As a main result of this paper we prove that, u…

Combinatorics; Conjugacy class; Clustering effect; Burrows–Wheeler transform; Settore INF/01 - Informatica; Combinatorics on Words; Balanced sequences; epistandard rich words; words having simple BWT; Binary number; Alphabet; Binary alphabet; Mathematics

researchProduct
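
The definition of a simple BWT given in the abstract above can be checked directly on small words. The sketch below assumes the BWT is taken as the last column of the sorted cyclic rotations of the word, with no end-of-string marker, and that the alphabet order is the ordinary character order; the helper names `bwt` and `has_simple_bwt` are hypothetical.

```python
def bwt(w: str) -> str:
    """Last column of the lexicographically sorted cyclic rotations of w."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(r[-1] for r in rotations)


def has_simple_bwt(w: str) -> bool:
    """True if bwt(w) has the form a_k^{n_k} ... a_1^{n_1}.

    Equivalently, the letters of bwt(w) appear in non-increasing order.
    """
    t = bwt(w)
    return all(t[i] >= t[i + 1] for i in range(len(t) - 1))


print(bwt("banana"), has_simple_bwt("banana"))  # nnbaaa True
print(bwt("aabb"), has_simple_bwt("aabb"))      # baba False
```

The binary word `aabb` is not balanced (its factors `aa` and `bb` differ by two in their number of `a`s), which is consistent with the equivalence stated for binary alphabets.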

Internationalization of companies through clusters: bibliometric keyword analysis of 152 leading publications in the period 2009-2…

2022

[EN] While countless studies have examined the role of clusters in regional economic development and business performance, some disadvantages and limitations have also been identified. Limitations such as small local markets, limited resources, isolation, and over-independence, which lead companies to a lock-in state regarding knowledge and innovation, can be solved by means of internationalization or foreign market expansion. Therefore, the internationalization of clusters still needs more attention. Furthermore, by conducting a bibliometric study based on the keywords from previous research, this investigation intends to identify the principal and most influential items, their relation…

Commerce; bibliometric mapping; F06; UNESCO::CIENCIAS ECONÓMICAS; SciMAT; R01; HF1-6182; internationalization; keyword analysis; SJR; clustering

researchProduct