Search results for "K-means"
showing 10 items of 43 documents
Improving clustering of Web bot and human sessions by applying Principal Component Analysis
2019
The paper addresses the problem of modeling Web sessions of bots and legitimate users (humans) as feature vectors for their use at the input of classification models. So far many different features to discriminate bots’ and humans’ navigational patterns have been considered in session models but very few studies were devoted to feature selection and dimensionality reduction in the context of bot detection. We propose applying Principal Component Analysis (PCA) to develop improved session models based on predictor variables being efficient discriminants of Web bots. The proposed models are used in session clustering, whose performance is evaluated in terms of the purity …
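The abstract above mentions purity as the evaluation measure for session clustering. As a rough illustration only, the standard definition of purity (the fraction of items that belong to the majority class of their cluster) can be sketched as follows. The function name `purity` and the toy "bot"/"human" labels are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Fraction of items that carry the majority label of their cluster."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster_id, label in zip(cluster_ids, true_labels):
        label_counts[cluster_id][label] += 1

    # Sum the size of the majority class in every cluster.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six sessions assigned to two clusters.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["bot", "bot", "human", "human", "human", "bot"]
score = purity(clusters, labels)
print(score)
```

In the toy data each cluster holds two sessions of its majority class and one of the other, so four of six sessions count as correctly grouped.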
The Hydrothermal System of Solfatara Crater (Campi Flegrei, Italy) Inferred From Machine Learning Algorithms
2019
Two machine learning algorithms were applied to three multivariate datasets acquired at Solfatara volcano. Our aim was to find an unbiased and coherent synthesis among the large amount of data acquired within the crater and along two orthogonal vertical NNE- and WNW-trending cross-sections. The first algorithm includes a new approach for a soft K-means clustering based on the use of the silhouette index to control the color palette of the clusters. The second algorithm, which uses self-organizing maps, incorporates an alternative method for choosing the number of nodes of the neural network, which aims to avoid the need for downstream clustering of the results of the classification. Both m…
Combined Elephant Herding Optimization Algorithm with K-means for Data Clustering
2018
Clustering is an important task in machine learning and data mining. Because so many applications rely on clustering, numerous clustering methods have been proposed. One well-known, simple, and widely used clustering algorithm is k-means. The main problem of this algorithm is its tendency to get trapped in local minima because it does not have any kind of global search. Clustering is a hard optimization problem, and swarm intelligence stochastic optimization algorithms have proved successful for such tasks. In this paper, we propose the recent swarm intelligence elephant herding optimization algorithm for data clustering. Local search of the elephant herding optimization algorithm was im…
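The local-minimum problem described in the abstract above can be seen on a very small example. The sketch below is plain k-means (Lloyd's algorithm), not the elephant herding hybrid the paper proposes. The four data points and both sets of starting centroids are hypothetical values chosen so that one start converges to a poor solution.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm from the given starting centroids.

    Returns the final centroids and the sum of squared errors (SSE).
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Four corners of a 4 x 1 rectangle; the natural split is left vs. right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

_, sse_good = kmeans(points, [(0, 0), (4, 1)])  # converges to left/right
_, sse_bad = kmeans(points, [(2, 0), (2, 1)])   # stuck at bottom/top
print(sse_good, sse_bad)
```

Both runs stop because the centroids no longer move, yet the second start leaves a much larger error than the first. A local search alone cannot escape that configuration, which is the motivation for pairing k-means with a global search.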
Forms and Functions of the Real Estate Market of Palermo (Italy). Science and Knowledge in the Cluster Analysis Approach
2016
The analysis of the housing market of a city requires suitable approaches and tools, such as data mining models, to represent its complexity, which derives from many elements, e.g. the type of capital asset (a house is both a common good and an investment good), the heterogeneity of the urban areas (each of which has its own historical and representative values and different urban functions), and the variability of building quality. The housing market of the most densely populated area of Palermo (Italy), corresponding to ten districts, is analyzed to verify the degree of its inner homogeneity and the relations between the quality of the characteristics and the price of the properties. Five hundred set…
A Clustering approach for profiling LoRaWAN IoT devices
2019
Internet of Things (IoT) devices are starting to play a predominant role in our everyday life. Application systems like Amazon Echo and Google Home allow IoT devices to answer human requests, or trigger some alarms and perform suitable actions. In this scenario, any data related to device and human interaction are stored in databases and can be used for future analysis and to improve system functionality. Also, IoT information related to the network level (wireless or wired) may be stored in databases and can be processed to improve the technology operation and to detect network anomalies. Acquired data can also be used for profiling operations, in order to group devices according…
Classification of cat ganglion retinal cells and implications for shape-function relationship
2002
This article presents a quantitative approach to ganglion cell classification by considering combinations of several geometrical features including fractal dimension, symmetry, diameter, eccentricity and convex hull. Special attention is given to moment and symmetry-based features. Several combinations of such features are fed to two clustering methods (Ward's hierarchical scheme and K-Means) and the respectively obtained classifications are compared. The results indicate the superiority of some features, also suggesting possible biological implications.
Generic heuristics on GPU to superpixel segmentation and application to optical flow estimation
2020
Finding clusters in point clouds and matching graphs to graphs are recurrent tasks in computer science, data analysis, and image processing that are most often modeled as NP-hard optimization problems. With the development and accessibility of cheap multiprocessors, acceleration of the heuristic procedures for these tasks becomes possible and necessary. We propose a parallel implementation on a GPU (graphics processing unit) system of some generic algorithms, applied here to image superpixel segmentation and the image optical flow problem. The aim is to provide generic algorithms based on standard decentralized data structures that are easy to improve and customize for many optimization problems and…
Balance Perturbations as a Measurement Tool for Trunk Impairment in Cross-Country Sit Skiing
2018
In cross-country sit-skiing, the trunk plays a crucial role in propulsion generation and balance maintenance. Trunk stability is evaluated by automatic responses to unpredictable perturbations; however, electromyography is challenging. The aim of this study was to identify a measure to group sit-skiers according to their ability to control the trunk. Seated in their competitive sit-ski, 10 male and 5 female Paralympic sit-skiers received 6 forward and 6 backward unpredictable perturbations in random order. k-means clustered trunk position at rest, delay to invert the trunk motion, and trunk range of motion significantly into 2 groups. In conclusion, unpredictable perturbations might quantif…
A pattern recognition approach to identify biological clusters acquired by acoustic multi-beam in Kongsfjorden
2022
Svalbard is one of the most intensively studied marine regions in the Arctic; here the composition and distribution of marine assemblages are changing under the effect of global change, and marine communities are monitored in order to understand the long-term effects on marine biodiversity. In the present work, acoustic data collected in the Kongsfjorden using multi-beam technology were analyzed to develop a methodology for identifying and classifying 3D acoustic patterns related to fish aggregations. In particular, morphological, energetic and depth features were taken into account to develop a multivariate classification procedure that allows fish species to be discriminated. The results obta…
Adaptive framework for network traffic classification using dimensionality reduction and clustering
2012
Information security has become a very important topic, especially in recent years. Web services are becoming more complex and dynamic. This offers new possibilities for attackers to exploit vulnerabilities by inputting malicious queries or code. However, these attack attempts are often recorded in server logs. Analyzing these logs could be a way to detect intrusions either periodically or in real time. We propose a framework that preprocesses and analyzes these log files. HTTP queries are transformed to numerical matrices using n-gram analysis. The dimensionality of these matrices is reduced using principal component analysis and diffusion map methodology. Abnormal log lines can then …
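The step in which HTTP queries become numerical matrices through n-gram analysis might look roughly like the sketch below. It is an assumption-laden illustration: the abstract does not state the n-gram size, whether n-grams are taken over characters or tokens, or how the values are weighted. Character bigrams with raw counts are used here, and the helper names `char_ngrams` and `ngram_matrix` as well as the sample queries are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """All overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_matrix(queries, n=2):
    """Turn query strings into a count matrix.

    Rows correspond to queries, columns to the sorted n-gram vocabulary.
    """
    counts = [Counter(char_ngrams(q, n)) for q in queries]
    vocab = sorted(set().union(*counts))
    matrix = [[c[gram] for gram in vocab] for c in counts]
    return vocab, matrix


# Hypothetical query strings standing in for parsed log lines.
vocab, matrix = ngram_matrix(["a=1", "a=2"])
print(vocab)
print(matrix)
```

Each row of the resulting matrix is a fixed-length feature vector, which is the form that dimensionality reduction methods such as principal component analysis expect as input.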