Author: Francesco Masulli

0000000000281660

AUTHOR

Francesco Masulli

Feature selection: A multi-objective stochastic optimization approach

The feature subset task can be cast as a multiobjective discrete optimization problem. In this work, we study the search algorithm component of a feature subset selection method. We propose an algorithm based on the threshold accepting method, extended to the multi-objective framework by an appropriate definition of the acceptance rule. The method is used in the task of identifying relevant subsets of features in a Web bot recognition problem, where automated software agents on the Web are identified by analyzing the stream of HTTP requests to a Web server.

research product

Bot or not? a case study on bot recognition from web session logs

This work reports on a study of web usage logs to verify whether it is possible to achieve good recognition rates in the task of distinguishing between human users and automated bots using computational intelligence techniques. Two problem statements are given, offline (for completed sessions) and on-line (for sequences of individual HTTP requests). The former is solved with several standard computational intelligence tools. For the second, a learning version of Wald’s sequential probability ratio test is used.

research product

Word sense disamibiguation combining conceptual distance, frequency and gloss

Word sense disambiguation (WSD) is the process of assigning a meaning to a word based on the context in which it occurs. The absence of sense tagged training data is a real problem for the word sense disambiguation task. We present a method for the resolution of lexical ambiguity which relies on the use of the wide-coverage noun taxonomy of WordNet and the notion of conceptual distance among concepts, captured by a conceptual density formula developed for this purpose. The formula we propose, is a generalised form of the Agirre-Rigau conceptual density measure in which many (parameterised) refinements were introduced and an exhaustive evaluation of all meaningful combinations was performed.…

research product

Online Web Bot Detection Using a Sequential Classification Approach

A significant problem nowadays is detection of Web traffic generated by automatic software agents (Web bots). Some studies have dealt with this task by proposing various approaches to Web traffic classification in order to distinguish the traffic stemming from human users' visits from that generated by bots. Most of previous works addressed the problem of offline bot recognition, based on available information on user sessions completed on a Web server. Very few approaches, however, have been proposed to recognize bots online, before the session completes. This paper proposes a novel approach to binary classification of a multivariate data stream incoming on a Web server, in order to recogn…

research product

A Quantum-Inspired Classifier for Early Web Bot Detection

This paper introduces a novel approach, inspired by the principles of Quantum Computing, to address web bot detection in terms of real-time classification of an incoming data stream of HTTP request headers, in order to ensure the shortest decision time with the highest accuracy. The proposed approach exploits the analogy between the intrinsic correlation of two or more particles and the dependence of each HTTP request on the preceding ones. Starting from the a-posteriori probability of each request to belong to a particular class, it is possible to assign a Qubit state representing a combination of the aforementioned probabilities for all available observations of the time series. By levera…

research product

Bot recognition in a Web store: An approach based on unsupervised learning

Abstract Web traffic on e-business sites is increasingly dominated by artificial agents (Web bots) which pose a threat to the website security, privacy, and performance. To develop efficient bot detection methods and discover reliable e-customer behavioural patterns, the accurate separation of traffic generated by legitimate users and Web bots is necessary. This paper proposes a machine learning solution to the problem of bot and human session classification, with a specific application to e-commerce. The approach studied in this work explores the use of unsupervised learning (k-means and Graded Possibilistic c-Means), followed by supervised labelling of clusters, a generative learning stra…

research product

Efficient on-the-fly Web bot detection

Abstract A large fraction of traffic on present-day Web servers is generated by bots — intelligent agents able to traverse the Web and execute various advanced tasks. Since bots’ activity may raise concerns about server security and performance, many studies have investigated traffic features discriminating bots from human visitors and developed methods for automated traffic classification. Very few previous works, however, aim at identifying bots on-the-fly, trying to classify active sessions as early as possible. This paper proposes a novel method for binary classification of streams of Web server requests in order to label each active session as “bot” or “human”. A machine learning appro…

research product