Bot recognition in a Web store: An approach based on unsupervised learning

RESEARCH PRODUCT

Francesco Masulli, Grażyna Suchacka, Stefano Rovetta

subject

Unsupervised classification; Web bot detection; Computer Networks and Communications; Computer science; Internet robot; 02 engineering and technology; Machine learning; Web traffic; Web server; 0202 electrical engineering, electronic engineering, information engineering; Artificial neural network; Supervised learning; 020206 networking & telecommunications; Perceptron; Web application security; Web bot; Computer Science Applications; Support vector machine; Generative model; ComputingMethodologies_PATTERNRECOGNITION; Hardware and Architecture; Supervised classification; Unsupervised learning; 020201 artificial intelligence & image processing; Artificial intelligence

description

Abstract: Web traffic on e-business sites is increasingly dominated by artificial agents (Web bots), which pose a threat to website security, privacy, and performance. To develop efficient bot detection methods and discover reliable e-customer behavioural patterns, traffic generated by legitimate users must be accurately separated from traffic generated by Web bots. This paper proposes a machine learning solution to the problem of bot and human session classification, with a specific application to e-commerce. The approach explores the use of unsupervised learning (k-means and Graded Possibilistic c-Means) followed by supervised labelling of clusters, a generative learning strategy that decouples modelling the data from labelling them. Its efficiency is evaluated through experiments on real e-commerce data, in realistic conditions, and compared to that of supervised learning classifiers (a multi-layer perceptron neural network and a support vector machine). Results demonstrate that classification based on unsupervised learning is very efficient, achieving a performance level similar to that of fully supervised classification. This is an experimental indication that the bot recognition problem can be successfully dealt with using methods that are less sensitive to mislabelled data or missing labels. A very small fraction of sessions remains misclassified in both cases, so an in-depth analysis of the misclassified samples was also performed. This analysis exposed the superiority of the proposed approach, which correctly recognized more bots and identified more camouflaged agents that had been erroneously labelled as humans.
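The two-stage idea in the abstract (cluster sessions without labels, then attach labels to whole clusters) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: plain k-means stands in for both clustering methods named in the abstract, clusters are labelled by a majority vote over the few labelled sessions, and the two session features (requests per minute, share of image requests) and all data values are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: labels are never used at this stage."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def label_clusters(points, labels, centroids, default="human"):
    """Assumed labelling rule: majority vote of the labelled sessions in each cluster."""
    votes = [Counter() for _ in centroids]
    for p, lab in zip(points, labels):
        if lab is not None:  # unlabelled sessions do not vote
            votes[nearest(p, centroids)][lab] += 1
    return [v.most_common(1)[0][0] if v else default for v in votes]


def classify(point, centroids, cluster_labels):
    """A new session takes the label of its nearest cluster."""
    return cluster_labels[nearest(point, centroids)]


# Hypothetical sessions: (requests per minute, share of image requests).
sessions = [
    (2.0, 0.80), (3.0, 0.70), (2.5, 0.75), (1.5, 0.85),      # human-like
    (40.0, 0.05), (55.0, 0.00), (48.0, 0.10), (60.0, 0.02),  # bot-like
]
# Labels are partly missing (None); one bot-like session is mislabelled "human".
labels = ["human", None, "human", None, "bot", "bot", "human", None]

centroids = kmeans(sessions, k=2)
cluster_labels = label_clusters(sessions, labels, centroids)
print(classify((50.0, 0.03), centroids, cluster_labels))
print(classify((2.2, 0.90), centroids, cluster_labels))
```

Because labels enter only through the per-cluster vote, the single mislabelled session and the unlabelled ones do not change the cluster structure in this toy setup, which mirrors the robustness argument made in the abstract.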

year

2020-05-01

DOI: 10.1016/j.jnca.2020.102577
https://hdl.handle.net/11567/999600