Search results for "CLUSTER ANALYSIS"

Showing 10 items of 848 documents

SparseHC: A Memory-efficient Online Hierarchical Clustering Algorithm

2014

Computing a hierarchical clustering of objects from a pairwise distance matrix is an important algorithmic kernel in computational science. Since the storage of this matrix requires quadratic space with respect to the number of objects, the design of memory-efficient approaches is of high importance to this research area. In this paper, we address this problem by presenting a memory-efficient online hierarchical clustering algorithm called SparseHC. SparseHC scans a sorted and possibly sparse distance matrix chunk-by-chunk. Meanwhile, a dendrogram is built by merging cluster pairs as and when the distance between them is determined to be the smallest among all remaining cluster pairs. The k…

Keywords: sparse matrix; Clustering high-dimensional data; Theoretical computer science; online algorithms; Computer science; Single-linkage clustering; Complete-linkage clustering; Nearest-neighbor chain algorithm; Consensus clustering; memory-efficient clustering; Cluster analysis; k-medians clustering; General Environmental Science; Engineering::Computer science and engineering [DRNTU]; k-medoids; Dendrogram; Constrained clustering; Hierarchical clustering; Distance matrix; Canopy clustering algorithm; General Earth and Planetary Sciences; FLAME clustering; Hierarchical clustering of networks; Algorithm
Venue: Procedia Computer Science
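The abstract describes merging clusters while scanning a sorted, possibly sparse distance matrix in chunks. SparseHC itself supports several linkage criteria. The sketch below covers only the simplest case, single linkage. There, the first stored pair that connects two clusters already fixes their merge distance, so a union-find structure over the sorted stream is enough. The function name `single_linkage_from_chunks` and the `(distance, i, j)` triple format are assumptions made for illustration; they are not SparseHC's interface.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[float, int, int]  # (distance, object i, object j): assumed input format


class DisjointSet:
    """Union-find over object indices; each root represents one current cluster."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same cluster
        self.parent[rb] = ra
        return True


def single_linkage_from_chunks(
    n: int, chunks: Iterable[List[Triple]]
) -> List[Tuple[int, int, float]]:
    """Hypothetical helper: build single-linkage merges from triples arriving in
    globally non-decreasing distance order, one chunk at a time.

    Only the current chunk and the union-find array are held in memory. Pairs
    missing from a sparse matrix are simply never used for a merge."""
    ds = DisjointSet(n)
    merges = []
    for chunk in chunks:
        for dist, i, j in chunk:
            # The first pair joining two clusters gives their single-linkage distance.
            if ds.union(i, j):
                merges.append((i, j, dist))
    return merges
```

Consider four objects fed as the chunks `[[(0.5, 0, 1), (1.0, 2, 3)], [(1.5, 1, 2), (2.0, 0, 3)]]`. The function records three merges. The last pair is skipped because objects 0 and 3 already belong to the same cluster.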

An Examination of Tourist Arrivals Dynamics Using Short-Term Time Series Data: A Space—Time Cluster Approach

2013

The purpose of this study is to examine the development of Italian tourist areas (circoscrizioni turistiche) through a cluster analysis of short time series. The technique is an adaptation of the functional data analysis approach developed by Abraham et al. (2003), which combines spline interpolation with k-means clustering. The findings indicate two patterns (increasing and stable) that, on average, characterize groups of territories. Moreover, tests of spatial contiguity suggest the presence of ‘space–time clusters’; that is, areas in the same ‘time cluster’ are also spatially contiguous. These findings appear to be more robust, in particular for those series characterized by an…

Keywords: cluster analysis; short time series; spline interpolation; K-means; join count test; Italian tourist areas; Series (mathematics); Computer science; Space time; Geography, Planning and Development; k-means clustering; Functional data analysis; JEL: C21, C22, C38, C14, L83; Contiguity (probability theory); Tourism, Leisure and Hospitality Management; Econometrics; Cluster (physics); Sector SECS-S/05 - Social Statistics
Venue: Tourism Economics
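The abstract combines spline interpolation with k-means to group short series. A simplified sketch of that pipeline follows. It interpolates each series with a natural cubic spline, samples every curve on a common finer grid, and clusters the sampled curves with Lloyd's k-means. Several choices here are assumptions, not the study's procedure. Abraham et al. cluster spline coefficients rather than resampled values. The grid size `samples` and the "first k series" centroid seeding are arbitrary. The function names are hypothetical.

```python
from typing import Callable, List, Sequence


def natural_cubic_spline(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Interpolating natural cubic spline (second derivative zero at both ends)."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives at the knots; m[0] = m[n-1] = 0
    if n > 2:
        # Tridiagonal system for the interior second derivatives m[1..n-2].
        sub = [h[i - 1] for i in range(1, n - 1)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        sup = [h[i] for i in range(1, n - 1)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n - 1)]
        # Thomas algorithm: forward elimination, then back substitution.
        for k in range(1, len(diag)):
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * len(diag)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(len(diag) - 2, -1, -1):
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n - 1] = sol

    def evaluate(x: float) -> float:
        i = 0
        while i < n - 2 and x > xs[i + 1]:  # locate the knot interval containing x
            i += 1
        hi, t0, t1 = h[i], x - xs[i], xs[i + 1] - x
        return (m[i] * t1 ** 3 / (6 * hi) + m[i + 1] * t0 ** 3 / (6 * hi)
                + (ys[i] / hi - m[i] * hi / 6) * t1
                + (ys[i + 1] / hi - m[i + 1] * hi / 6) * t0)

    return evaluate


def kmeans(points: List[List[float]], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means; seeding with the first k points is an assumption."""
    centers = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_short_series(times: Sequence[float], series: List[Sequence[float]],
                         k: int, samples: int = 10) -> List[int]:
    """Spline-interpolate each short series on a common grid, then k-means the curves."""
    start, end = times[0], times[-1]
    grid = [start + (end - start) * s / (samples - 1) for s in range(samples)]
    curves = [[natural_cubic_spline(times, ys)(t) for t in grid] for ys in series]
    return kmeans(curves, k)
```

Given two increasing and two roughly flat four-point series, `cluster_short_series` places each pair in its own group, mirroring the "increasing" and "stable" patterns the abstract mentions.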

Measuring galaxy segregation with the mark connection function

2010

(abridged) The clustering properties of galaxies belonging to different luminosity ranges or having different morphological types are different. These characteristics, or 'marks', allow the galaxy catalogs that carry this information to be understood as realizations of marked point processes. Many attempts have been made to quantify the dependence of the clustering of galaxies on their inner properties. The present paper summarizes methods on spatial marked statistics used in cosmology to disentangle luminosity, colour or morphological segregation and introduces a new one in this context, the mark connection function. The methods used here are the partial correlation functions, includi…

Keywords: Large-scale structure of Universe; Methods: data analysis; Methods: statistical; Spatial correlation; Cosmology and Nongalactic Astrophysics (astro-ph.CO); Population; FOS: Physical sciences; Context (language use); Astrophysics; Astrophysics::Cosmology and Extragalactic Astrophysics; Correlation function (astronomy); UNESCO::Astronomy and Astrophysics; UNESCO::Astronomy and Astrophysics::Other astronomical specialties; Cluster analysis; Partial correlation; Physics; Astronomy and Astrophysics; Function (mathematics); Galaxy; Space and Planetary Science; Astrophysics - Cosmology and Nongalactic Astrophysics
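The mark connection function p_ij(r) can be read as a conditional probability. Given two galaxies separated by a distance r, it is the probability that the first carries mark i and the second carries mark j. A deliberately naive estimator is sketched below. It counts ordered pairs whose separation falls within a window of half-width `h` around `r` and applies no edge correction. The window, the lack of edge correction, and the function name are assumptions. The estimators used in the paper are likely more careful.

```python
import math
from typing import Hashable, Sequence


def mark_connection(points: Sequence[Sequence[float]], marks: Sequence[Hashable],
                    i: Hashable, j: Hashable, r: float, h: float) -> float:
    """Naive p_ij(r): among ordered pairs (a, b) with separation in [r - h, r + h],
    the fraction where point a has mark i and point b has mark j.
    No edge correction is applied (simplifying assumption)."""
    num = den = 0
    n = len(points)
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            if abs(math.dist(points[a], points[b]) - r) <= h:
                den += 1
                if marks[a] == i and marks[b] == j:
                    num += 1
    return num / den if den else float("nan")
```

By construction, the estimates at a fixed r sum to one over all ordered mark pairs (i, j). A value of p_ii(r) above the overall fraction of type-i pairs therefore signals that type i is over-represented among close pairs at that separation.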

Adaptive framework for network traffic classification using dimensionality reduction and clustering

2012

Information security has become a very important topic, especially in recent years. Web services are becoming more complex and dynamic, which offers new possibilities for attackers to exploit vulnerabilities by inputting malicious queries or code. However, these attack attempts are often recorded in server logs. Analyzing these logs could be a way to detect intrusions either periodically or in real time. We propose a framework that preprocesses and analyzes these log files. HTTP queries are transformed to numerical matrices using n-gram analysis. The dimensionality of these matrices is reduced using principal component analysis and diffusion map methodology. Abnormal log lines can then …

Keywords: Computer science; Network security; Dimensionality reduction; intrusion detection; k-means; diffusion map; Server log; anomaly detection; Traffic classification; machine learning; Web log analysis software; Data mining; Web service; Cluster analysis; n-grams
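The first step of that framework turns HTTP queries into a numerical matrix through n-gram analysis. A minimal sketch of such a transformation follows. Each query becomes a row of character n-gram counts over a shared, sorted vocabulary. Using character-level n-grams, defaulting to n = 2, and using raw counts are assumptions; the abstract does not specify these details. The PCA and diffusion map steps that follow in the framework are not shown.

```python
from collections import Counter
from typing import List, Tuple


def char_ngrams(text: str, n: int) -> List[str]:
    """All overlapping character n-grams of a string."""
    return [text[k:k + n] for k in range(len(text) - n + 1)]


def ngram_matrix(queries: List[str], n: int = 2) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical helper: one row per query, one column per n-gram seen in any query."""
    counts = [Counter(char_ngrams(q, n)) for q in queries]
    vocab = sorted(set().union(*counts))  # shared column order
    matrix = [[c[g] for g in vocab] for c in counts]  # Counter returns 0 for unseen n-grams
    return vocab, matrix
```

For the queries `"/a=1"` and `"/a=2"`, the vocabulary is `["/a", "=1", "=2", "a="]`. The rows are `[1, 1, 0, 1]` and `[1, 0, 1, 1]`. Such rows could then be passed to a dimensionality-reduction step.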

A novel heuristic memetic clustering algorithm

2013

In this paper we introduce a novel clustering algorithm based on the Memetic Algorithm meta-heuristic wherein clusters are iteratively evolved using a novel single operator employing a combination of heuristics. Several heuristics are described and employed for the three types of selections used in the operator. The algorithm was exhaustively tested on three benchmark problems and compared to a classical clustering algorithm (k-Medoids) using the same performance metrics. The results show that our clustering algorithm consistently provides better clustering solutions with less computational effort.

Keywords: Determining the number of clusters in a data set; Biclustering; Clustering high-dimensional data; DBSCAN; ComputingMethodologies_PATTERNRECOGNITION; Theoretical computer science; CURE data clustering algorithm; Correlation clustering; Canopy clustering algorithm; Cluster analysis; Algorithm; Mathematics
Venue: 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)

Gear classification and fault detection using a diffusion map framework

2015

This article proposes a system health monitoring approach that detects abnormal behavior of machines. A diffusion map is used to reduce the dimensionality of training data, which facilitates the classification of newly arriving measurements. The new measurements are handled with the Nyström extension. The method is trained and tested with real gear monitoring data from several windmill parks. A machine health index is proposed, showing that data recordings can be classified as working or failing using dimensionality reduction and warning levels in the low-dimensional space. The proposed approach can be used with any system that produces high-dimensional measurement data.

Keywords: Diffusion (acoustics); Training set; Computer science; Dimensionality reduction; diffusion map; Extension (predicate logic); Fault detection and isolation; fault detection; system health monitoring; Artificial Intelligence; Signal Processing; Computer Vision and Pattern Recognition; Data mining; Cluster analysis; Software; Curse of dimensionality; clustering
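This abstract relies on two ingredients, a diffusion map and the Nyström extension. Their standard textbook formulation is written out below. The Gaussian kernel of width ε, the diffusion time t, and the embedding dimension d are common choices assumed here, not parameters taken from the article.

```latex
% Kernel between measurements, with width \varepsilon (assumed Gaussian)
w(x, y) = \exp\!\left(-\frac{\lVert x - y \rVert^2}{\varepsilon}\right)

% Training set x_1, ..., x_N: affinity matrix, degrees, and Markov matrix
W_{ij} = w(x_i, x_j), \qquad D_{ii} = \sum_{j=1}^{N} W_{ij}, \qquad P = D^{-1} W

% Right eigenvectors of P, ordered so that 1 = \lambda_0 > \lambda_1 \ge \lambda_2 \ge \cdots \ge 0
P \psi_\ell = \lambda_\ell \psi_\ell

% d-dimensional diffusion map of a training point at diffusion time t
\Psi_t(x_i) = \bigl(\lambda_1^t \psi_1(x_i), \ldots, \lambda_d^t \psi_d(x_i)\bigr)

% Nystrom extension: eigenvector values for a new measurement y,
% reusing the training eigenpairs instead of re-solving the eigenproblem
\psi_\ell(y) = \frac{1}{\lambda_\ell} \sum_{j=1}^{N}
  \frac{w(y, x_j)}{\sum_{k=1}^{N} w(y, x_k)} \, \psi_\ell(x_j)
```

Substituting the extended values ψ_ℓ(y) into the embedding formula places a new measurement in the same low-dimensional space as the training data. In that space, warning levels of the kind the abstract describes can be applied.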

Twister Tries

2015

Many data-mining techniques commonly used across research fields perform poorly on large data sets. Sequential agglomerative hierarchical non-overlapping clustering is one technique whose algorithms’ scaling properties prohibit clustering a large number of items. Besides the unfavorable time complexity of O(n^2), these algorithms have a space complexity of O(n^2), which can be reduced to O(n) if the time complexity is allowed to rise to O(n^2 log^2 n). In this paper, we propose the use of locality-sensitive hashing combined with a novel data structure called twister tries to provide an approximate clustering for average linkage. Our approach requires only lin…

Keywords: Hierarchical agglomerative clustering; Fuzzy clustering; Brown clustering; Computer science; Single-linkage clustering; Hierarchical clustering; Locality-sensitive hashing; Data set; CURE data clustering algorithm; average linkage; Data mining; Hierarchical clustering of networks; linear complexity; Cluster analysis; Algorithm; Time complexity
Venue: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
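Locality-sensitive hashing lets such methods consider only candidate pairs that land in the same hash bucket instead of all O(n^2) pairs. The sketch below illustrates that ingredient alone, using random-hyperplane hashing for cosine similarity. It does not implement twister tries, and the paper's actual hash family and bucketing scheme may differ. The function names, the number of hyperplanes, and the fixed seed are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def make_hyperplanes(dim: int, bits: int, seed: int = 0) -> List[List[float]]:
    """Random Gaussian normal vectors, one per signature bit (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def signature(vec: Sequence[float], planes: List[List[float]]) -> Tuple[int, ...]:
    """One bit per hyperplane: the side of the plane the vector falls on."""
    return tuple(1 if sum(v * p for v, p in zip(vec, plane)) >= 0 else 0 for plane in planes)


def bucket(vectors: List[Sequence[float]], planes: List[List[float]]) -> Dict[Tuple[int, ...], List[int]]:
    """Group item indices by signature; items sharing a bucket are candidate neighbors."""
    buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for idx, v in enumerate(vectors):
        buckets[signature(v, planes)].append(idx)
    return dict(buckets)
```

Vectors pointing in the same direction always share a signature. Two vectors with cosine similarity s agree on each bit with probability 1 - arccos(s)/π, so similar items tend to collide while dissimilar ones rarely do.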

A Hybrid Multigroup Coclustering Recommendation Framework Based on Information Fusion

2015

Collaborative Filtering (CF) is one of the most successful algorithms in recommender systems. However, it suffers from data sparsity and scalability problems. Although many clustering techniques have been incorporated to alleviate these two problems, most of them fail to achieve further significant improvement in recommendation accuracy. First of all, most of them assume each user or item belongs to a single cluster. Since usually users can hold multiple interests and items may belong to multiple categories, it is more reasonable to assume that users and items can join multiple clusters (groups), where each cluster is a subset of like-minded users and items they prefer. Furthermore, most of…

Keywords: Information retrieval; Computer science; data mining; Recommender system; Theoretical Computer Science; Information fusion; Knowledge base; Artificial Intelligence; Collaborative Filtering; Scalability; Cluster (physics); Learning to rank; recommender systems; Cluster analysis
Venue: ACM Transactions on Intelligent Systems and Technology

Cluster-Based RF Fingerprint Positioning Using LTE and WLAN Outdoor Signals

2015

In this paper we evaluate the user-equipment (UE) positioning performance of three cluster-based RF fingerprinting methods using LTE and WLAN signals. Real-life LTE and WLAN data were collected for the evaluation using a consumer cellular handset running the ‘Nemo Handy’ drive-test software tool. Test results of the cluster-based methods were compared to conventional grid-based RF fingerprinting. Unlike the grid-based method, the cluster-based methods do not require a grid-cell layout or training signature formation. They use an LTE cell-ID search technique to reduce the search space for the clustering operation. Thus, UE position estimation is done in a short time with less comput…

Keywords: Percentile; K-nearest neighbor; Computer science; cell-ID; Fingerprint (computing); Real-time computing; Fingerprint recognition; Grid; Handset; minimization of drive tests; Euclidean distance; LTE; Embedded system; grid-based RF fingerprinting; Radio frequency; Cluster analysis; fuzzy C-means; hierarchical clustering
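This abstract says the cell-ID search reduces the search space before fingerprint matching. A minimal sketch of that idea follows. Only fingerprints recorded under the serving cell-ID are kept, and the position is estimated as the unweighted mean of the k nearest fingerprints by Euclidean distance in signal-strength space. This is a plain k-nearest-neighbor lookup (k-NN and Euclidean distance appear among the keywords), not one of the paper's three cluster-based methods. The fingerprint tuple layout, the unweighted averaging, and the function name are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical fingerprint layout: (cell_id, signal-strength vector in dBm, (x, y) position)
Fingerprint = Tuple[str, Sequence[float], Tuple[float, float]]


def knn_position(database: List[Fingerprint], serving_cell: str,
                 rss: Sequence[float], k: int = 3) -> Tuple[float, float]:
    """Estimate a UE position from fingerprints sharing the serving cell-ID."""
    # Cell-ID filter: shrink the search space before any distance computation.
    candidates = [(vec, pos) for cell, vec, pos in database if cell == serving_cell]
    if not candidates:
        raise ValueError(f"no fingerprints stored for cell-ID {serving_cell!r}")
    # Rank the remaining fingerprints by Euclidean distance in signal space.
    candidates.sort(key=lambda c: math.dist(c[0], rss))
    nearest = candidates[:k]
    x = sum(pos[0] for _, pos in nearest) / len(nearest)
    y = sum(pos[1] for _, pos in nearest) / len(nearest)
    return (x, y)
```

The filter means distances are computed only against fingerprints from one cell rather than the whole database. That reduction is the kind of saving the abstract attributes to the cell-ID search.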

BioImageXD: an open, general-purpose and high-throughput image-processing platform

2012

BioImageXD puts open-source computer science tools for three-dimensional visualization and analysis into the hands of all researchers, through a user-friendly graphical interface tuned to the needs of biologists. BioImageXD has no restrictive licenses or undisclosed algorithms and enables publication of precise, reproducible and modifiable workflows. It allows simple construction of processing pipelines and should enable biologists to perform challenging analyses of complex processes. We demonstrate its performance in a study of integrin clustering in response to selected inhibitors.

Keywords: SIMPLE (military communications protocol); Computer science; Computational Biology; Image processing; Cell Biology; Bioinformatics; Biochemistry; Visualization; High-Throughput Screening Assays; User-Computer Interface; Software; Workflow; Imaging, Three-Dimensional; Human–computer interaction; Cluster analysis; Molecular Biology; Throughput (business); Algorithms; Biotechnology; Graphical user interface
Venue: Nature Methods