Search results for "clustering"

showing 10 items of 446 documents

DBSCAN Algorithm for Document Clustering

2019

Abstract: Document clustering is the problem of automatically grouping similar documents into categories based on some similarity metric. Almost all available data, usually on the web, are unclassified, so we need powerful clustering algorithms that work with these types of data. All common search engines return a list of pages relevant to the user query. This list needs to be generated quickly and as accurately as possible. For this type of problem, because the web pages are unclassified, we need powerful clustering algorithms. In this paper we present a clustering algorithm called DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and its limitations on documents (or web pages)…

DBSCAN, Information retrieval, Similarity (network science), Computer science, Web page, Feature selection, Document clustering, Cluster analysis, Data type, Word (computer architecture)
International Journal of Advanced Statistics and IT&C for Economics and Life Sciences
researchProduct
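
As a rough illustration of the density-based idea named in the DBSCAN abstract above, here is a minimal sketch in plain Python. It is not the paper's implementation: the toy 2-D points stand in for document feature vectors, and the names `region_query`, `dbscan`, `eps`, and `min_pts` are assumptions chosen for this example.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Return the indices of all points within eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label every point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


# Toy data (hypothetical): two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

For real documents the Euclidean `dist` would typically be replaced by a text-appropriate measure such as cosine distance over term vectors; that substitution is left out here to keep the sketch short.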

Dimensionality Reduction Techniques: An Operational Comparison On Multispectral Satellite Images Using Unsupervised Clustering

2006

Multispectral satellite imagery provides us with useful but redundant datasets. Using Dimensionality Reduction (DR) algorithms, these datasets can be made easier to explore and to use. We present in this study an objective comparison of five DR methods, by evaluating their capacity to provide a usable input to the K-means clustering algorithm. We also suggest a method to automatically find a suitable number of classes K, using objective "cluster validity indexes" over a range of values for K. Ten Landsat images have been processed, yielding a classification rate in the 70-80% range. Our results also show that classical linear methods, though slightly outperformed by more recent nonlinear al…

Data processing, Contextual image classification, Pixel, business.industry, Computer science, Dimensionality reduction, Multispectral image, k-means clustering, Unsupervised learning, Pattern recognition, Artificial intelligence, business, Cluster analysis
Proceedings of the 7th Nordic Signal Processing Symposium - NORSIG 2006
researchProduct
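
The abstract above mentions choosing the number of classes K by scoring a range of K values with cluster validity indexes. The sketch below shows that general pattern in plain Python. The abstract does not say which indexes the authors used, so the silhouette score here is an assumed stand-in, and the toy points, the deterministic initialisation, and the candidate range 2 to 4 are all hypothetical.

```python
from math import dist


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with a simple deterministic init (evenly spaced input points)."""
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)  # cohesion
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)  # separation
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (hypothetical): three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 0), (10, 1), (11, 0),
          (5, 9), (5, 10), (6, 9)]
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
print(best_k)  # 3
```

Any other validity index could be swapped in for `silhouette` without changing the selection loop, which is the part the abstract describes.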

A New Approach to Investigate Students’ Behavior by Using Cluster Analysis as an Unsupervised Methodology in the Field of Education

2016

The problem of taking a set of data and separating it into subgroups where the elements of each subgroup are more similar to each other than they are to elements not in the subgroup has been extensively studied through the statistical method of cluster analysis. In this paper we want to discuss the application of this method to the field of education: particularly, we want to present the use of cluster analysis to separate students into groups that can be recognized and characterized by common traits in their answers to a questionnaire, without any prior knowledge of what form those groups would take (unsupervised classification). We start from a detailed study of the data processing need…

Data processing, Point (typography), business.industry, Settore FIS/08 - Didattica E Storia Della Fisica, 020208 electrical & electronic engineering, 05 social sciences, 050301 education, Sample (statistics), 02 engineering and technology, General Medicine, computer.software_genre, Disease cluster, Field (computer science), Hierarchical clustering, Set (abstract data type), Quantitative analysis (finance), Education Unsupervised Methods Hierarchical Clustering Not-Hierarchical Clustering Quantitative Analysis, 0202 electrical engineering electronic engineering information engineering, Artificial intelligence, Data mining, business, 0503 education, computer, Natural language processing, Mathematics
researchProduct

Environmental Data Processing by Clustering Methods for Energy Forecast and Planning

2011

This paper presents a statistical approach based on the k-means clustering technique to manage sampled environmental data in order to evaluate and forecast the energy deliverable by different renewable sources at a given site. In particular, wind speed and solar irradiance sampled data are studied in association with the energy capability of a wind generator and a photovoltaic (PV) plant, respectively. The proposed method allows the subsets of useful data, describing the energy capability of a site, to be extracted from a set of experimental observations belonging to the considered site. The data collection is performed in Sicily, in the south of Italy, as a case study. As far as the wind generation…

Data processing, Wind power, Renewable Energy Sustainability and the Environment, business.industry, Computer science, Photovoltaic system, computer.software_genre, Wind speed, Renewable energy, Wind energy; Photovoltaic energy; Distributed generation; Statistical methods; Data processing; Clustering, Distributed generation, Data mining, Cluster analysis, business, Telecommunications, computer, Energy (signal processing)
researchProduct

Hierarchically nested factor model from multivariate data

2005

We show how to achieve a statistical description of the hierarchical structure of a multivariate data set. Specifically we show that the similarity matrix resulting from a hierarchical clustering procedure is the correlation matrix of a factor model, the hierarchically nested factor model. In this model, factors are mutually independent and hierarchically organized. Finally, we use a bootstrap based procedure to reduce the number of factors in the model with the aim of retaining only those factors significantly robust with respect to the statistical uncertainty due to the finite length of data records.

Data records, Structure (mathematical logic), Multivariate statistics, Covariance matrix, Finance commerce hierarchical structure, General Physics and Astronomy, Similarity matrix, FOS: Physical sciences, Disordered Systems and Neural Networks (cond-mat.dis-nn), Condensed Matter - Disordered Systems and Neural Networks, computer.software_genre, Hierarchical clustering, Condensed Matter - Other Condensed Matter, Set (abstract data type), Factor (programming language), Data mining, computer, Mathematics, computer.programming_language, Other Condensed Matter (cond-mat.other)
researchProduct
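
The abstract above refers to the similarity matrix that results from a hierarchical clustering procedure. One common way to build such a matrix, sketched below in plain Python, is to run agglomerative clustering on a correlation matrix and assign every pair of elements the similarity of the node at which they first merge. Average linkage, the function name `ultrametric_similarity`, and the 4x4 toy matrix are assumptions for this illustration; the sketch does not construct the factor model itself.

```python
def ultrametric_similarity(corr):
    """Average-linkage clustering on a correlation matrix.

    Returns a matrix whose (i, j) entry is the average correlation of the
    two clusters that were merged when i and j first joined the same cluster.
    """
    n = len(corr)
    clusters = [[i] for i in range(n)]
    out = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    def avg(a, b):
        # Mean correlation between members of cluster a and cluster b.
        return sum(corr[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average correlation.
        rho, x, y = max(
            (avg(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        for i in clusters[x]:
            for j in clusters[y]:
                out[i][j] = out[j][i] = rho  # similarity at the merge node
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return out


# Toy correlation matrix (hypothetical): elements 0,1 and 2,3 form two groups.
corr = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.6],
    [0.1, 0.2, 0.6, 1.0],
]
filtered = ultrametric_similarity(corr)
for row in filtered:
    print([round(v, 3) for v in row])
```

In the output, the within-group entries keep their original values (0.8 and 0.6), while all four cross-group entries collapse to the single between-group average, which is what makes the matrix hierarchical.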

Prototype-based learning on concept-drifting data streams

2014

Data stream mining has gained growing attention due to its wide range of emerging applications, such as target marketing, email filtering and network intrusion detection. In this paper, we propose a prototype-based classification model for evolving data streams, called SyncStream, which dynamically models time-changing concepts and makes predictions in a local fashion. Instead of learning a single model on a sliding window or ensemble learning, SyncStream captures evolving concepts by dynamically maintaining a set of prototypes in a new data structure called the P-tree. The prototypes are obtained by error-driven representativeness learning and synchronization-inspired constrained clustering. To ide…

Data stream, Concept drift, business.industry, Computer science, Data stream mining, Constrained clustering, computer.software_genre, Data structure, Machine learning, Ensemble learning, Synchronization (computer science), Data mining, Artificial intelligence, business, computer
Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
researchProduct

Fuzzy technique for microcalcifications clustering in digital mammograms

2012

Abstract. Background: Mammography has established itself as the most efficient technique for identifying pathological breast lesions. Among the various types of lesions, microcalcifications are the most difficult to identify, since they are quite small (0.1-1.0 mm) and often poorly contrasted against an image's background. Within this context, Computer Aided Detection (CAD) systems could turn out to be very useful in breast cancer control. Methods: In this paper we present a potentially powerful microcalcification cluster enhancement method applicable to digital mammograms. The segmentation phase employs a form filter, obtained from a LoG filter, to overcome the dependence on …

Databases Factual, Microcalcifications, Breast Neoplasms, Context (language use), CAD, computer.software_genre, Sensitivity and Specificity, Fuzzy logic, Clustering, Breast cancer, Segmentation, Breast cancer, C-means, Image Processing Computer-Assisted, medicine, Cluster Analysis, Humans, Mammography, Radiology Nuclear Medicine and imaging, Segmentation, Cluster analysis, Spatial filters, medicine.diagnostic_test, Multimedia, business.industry, Calcinosis, Pattern recognition, medicine.disease, Settore FIS/07 - Fisica Applicata(Beni Culturali Ambientali Biol.e Medicin), Computer aided detection, Fuzzy logic, Radiology Nuclear Medicine and imaging, Female, Artificial intelligence, business, computer, Algorithms, Mammography, Research Article, Breast cancer Microcalcifications Spatial filters Clustering Fuzzy logic C-means Mammography Segmentation
BMC Medical Imaging
researchProduct

Multispectral imaging and its use for face recognition : sensory data enhancement

2015

In this thesis, we focus on multispectral images for face recognition. In such an application, the quality of the image is an important factor that affects the accuracy of the recognition. However, the sensory data are in general corrupted by noise. Thus, we propose several denoising algorithms that ensure a good tradeoff between noise removal and detail preservation. Furthermore, characterizing regions and details of the face can improve recognition. This thesis also focuses on multispectral image segmentation, particularly clustering techniques and cluster analysis. The effectiveness of the proposed algorithms is illustrated by comparing them with state-of-the-art methods using both…

Denoising, Cluster analysis, Segmentation, Analyse de clustering débruitage, [INFO.INFO-TI] Computer Science [cs]/Image Processing [eess.IV], Amélioration des données sensorielles, Sensory data enhancement, Multispectral image, Image multispectrale
researchProduct

Statistical analysis techniques for Partial Discharges measurement under DC voltage

2022

The partial discharge (PD) phenomenon in HVAC (High-Voltage-Alternating-Current) systems has already been widely studied in the literature, and from this it has been possible to produce a regulatory document that provides indications for the description and identification of the different types of discharges. However, the growing diffusion of HVDC (High-Voltage-Direct-Current) systems makes it necessary to develop analysis techniques for the DC case as well. In this paper, data obtained from PD measurements under DC voltage were analyzed through the time-frequency map combined with a density-based clustering algorithm. The results show that, with this approach, it's possible to perform a noise rejection a…

Density-Based Clustering, Settore ING-IND/31 - Elettrotecnica, HVDC, Partial Discharge (PD), Time-Frequency map
2022 IEEE 4th International Conference on Dielectrics (ICD)
researchProduct

Space-Time FPCA Clustering of Multidimensional Curves.

2018

In this paper we focus on finding clusters of multidimensional curves with spatio-temporal structure, applying a variant of a k-means algorithm based on the principal component rotation of data. The main advantage of this approach is that it combines the clustering functional analysis of the multidimensional data with smoothing methods based on generalized additive models, which cope with both the spatial and the temporal variability, and with functional principal components, which take into account the dependency between the curves.

Dependency (UML), Computer science, business.industry, Clustering of multidimensional curves GAM Spatio-temporal pattern, Space time, Generalized additive model, Pattern recognition, 010502 geochemistry & geophysics, 01 natural sciences, 010104 statistics & probability, Principal component analysis, Artificial intelligence, 0101 mathematics, Cluster analysis, business, Focus (optics), Settore SECS-S/01 - Statistica, Rotation (mathematics), Smoothing, 0105 earth and related environmental sciences
researchProduct