Application of the Information Bottleneck method to discover user profiles in a Web store
The paper deals with the problem of discovering groups of Web users with similar behavioral patterns on an e-commerce site. We introduce a novel approach to the unsupervised classification of user sessions, based on session attributes related to user click-stream behavior, to gain insight into the characteristics of various user profiles. The approach uses the agglomerative Information Bottleneck (IB) algorithm. Using log data for a real online store, the efficiency of the approach was validated in terms of its ability to differentiate between buying and non-buying sessions, indicating some possible practical applications of our method. Experiments performed for a number of session sampl…
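The agglomerative IB algorithm greedily merges the pair of clusters whose fusion loses the least mutual information about the relevance variable (here, e.g., buying vs. non-buying). As a rough, generic illustration (not the paper's implementation; all names are illustrative), the merge cost can be sketched as the prior-weighted Jensen-Shannon divergence of the clusters' class-conditional distributions:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def merge_cost(w1, p_y_c1, w2, p_y_c2):
    """Information loss incurred by merging two clusters in agglomerative IB:
    (w1 + w2) times the Jensen-Shannon divergence of the clusters'
    class-conditional distributions, weighted by their relative priors."""
    w = w1 + w2
    pi1, pi2 = w1 / w, w2 / w
    p_bar = [pi1 * a + pi2 * b for a, b in zip(p_y_c1, p_y_c2)]
    js = pi1 * kl(p_y_c1, p_bar) + pi2 * kl(p_y_c2, p_bar)
    return w * js
```

A greedy pass would repeatedly evaluate `merge_cost` over all cluster pairs and fuse the cheapest pair until the desired number of user profiles remains; merging two clusters with identical class-conditional distributions costs nothing.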
Computer Networks
This book constitutes the thoroughly refereed proceedings of the 25th International Conference on Computer Networks, CN 2018, held in Gliwice, Poland, in June 2018. The 34 full papers presented were carefully reviewed and selected from 86 submissions. They are organized in topical sections on computer networks; teleinformatics and telecommunications; queueing theory; cybersecurity and quality service.
Cost-Oriented Recommendation Model for E-Commerce
Contemporary Web stores offer a wide range of products to e-customers. However, online sales are strongly dominated by a limited number of bestsellers whereas other, less popular or niche products are stored in inventory for a long time. Thus, they contribute to the problem of frozen capital and high inventory costs. To cope with this problem, we propose using information on product cost in a recommender system for a Web store. We discuss the proposed recommendation model, in which two criteria have been included: a predicted degree of meeting customer’s needs by a product and the product cost.
Investigating Long-Range Dependence in E-Commerce Web Traffic
This paper addresses the problem of investigating long-range dependence (LRD) and self-similarity in Web traffic. Popular techniques for estimating the intensity of LRD via the Hurst parameter are presented. Using a set of traces of a popular e-commerce site, the presence and the nature of LRD in Web traffic are examined. Our results confirm the self-similar nature of traffic at a Web server input; however, the resulting estimates of the Hurst parameter vary depending on the trace and the technique used.
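One of the popular Hurst-estimation techniques is rescaled-range (R/S) analysis: the slope of log(R/S) against log(n) over a range of window sizes n estimates H. A minimal, generic sketch (not necessarily the estimator used in the paper; the window sizes are placeholders):

```python
import math

def rs_hurst(series, window_sizes=(8, 16, 32, 64, 128)):
    """Estimate the Hurst parameter with classic rescaled-range (R/S)
    analysis: fit the slope of log(R/S) versus log(n)."""
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_values = []
        for start in range(0, len(series) - n + 1, n):
            block = series[start:start + n]
            mean = sum(block) / n
            # cumulative deviations from the block mean
            dev, cum = 0.0, []
            for x in block:
                dev += x - mean
                cum.append(dev)
            r = max(cum) - min(cum)                                 # range
            s = math.sqrt(sum((x - mean) ** 2 for x in block) / n)  # std dev
            if s > 0:
                rs_values.append(r / s)
        if rs_values:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(rs_values) / len(rs_values)))
    # least-squares slope of log(R/S) on log(n) is the Hurst estimate
    k = len(log_n)
    mx, my = sum(log_n) / k, sum(log_rs) / k
    num = sum((x - mx) * (y - my) for x, y in zip(log_n, log_rs))
    den = sum((x - mx) ** 2 for x in log_n)
    return num / den
```

As a sanity check, a pure linear trend yields an estimate near 1, a strictly alternating series near 0, and i.i.d. noise around 0.5; values above 0.5 indicate long-range dependence.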
Time series clustering with different distance measures to tell Web bots and humans apart
The paper deals with the problem of differentiating Web sessions of bots and human users by observing some characteristics of their traffic at the Web server input. We propose an approach to cluster bots’ and humans’ sessions represented as time series. First, sessions are expressed as sequences of HTTP requests coming to the server at specific timestamps; then, they are pre-processed to form time series of limited length. Time series are clustered and the clustering performance is evaluated in terms of the ability to partition bots and humans into separate clusters. The proposed approach is applied to real server log data and validated with the use of different time series distance meas…
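A common elastic distance measure for comparing such session time series is dynamic time warping (DTW). As a generic illustration (the paper's particular distance measures are not reproduced here):

```python
def dtw(a, b):
    """Dynamic time warping distance between two numeric sequences,
    computed with the standard O(len(a) * len(b)) dynamic program."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest alignment: insertion, deletion, or match
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Unlike the Euclidean distance, DTW tolerates sequences of different lengths and local time shifts, which is why elastic measures are often compared against lock-step ones in time series clustering studies.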
Feature selection: A multi-objective stochastic optimization approach
The feature subset selection task can be cast as a multi-objective discrete optimization problem. In this work, we study the search algorithm component of a feature subset selection method. We propose an algorithm based on the threshold accepting method, extended to the multi-objective framework by an appropriate definition of the acceptance rule. The method is used in the task of identifying relevant subsets of features in a Web bot recognition problem, in which automated software agents on the Web are identified by analyzing the stream of HTTP requests to a Web server.
Simulation-Based Performance Study of e-Commerce Web Server System – Results for FIFO Scheduling
The chapter concerns the issue of overloaded Web server performance evaluation using a simulation-based approach. We focus on a Business-to-Consumer (B2C) environment and consider server performance both from the perspective of computer system efficiency and e-business profitability. Results of simulation experiments for the Web server system under First-In-First-Out (FIFO) scheduling are discussed. Much attention has been paid to the analysis of the impact of a limited server system capacity on business-related performance metrics.
Bot or not? A case study on bot recognition from web session logs
This work reports on a study of web usage logs to verify whether it is possible to achieve good recognition rates in the task of distinguishing between human users and automated bots using computational intelligence techniques. Two problem statements are given: offline (for completed sessions) and online (for sequences of individual HTTP requests). The former is solved with several standard computational intelligence tools. For the latter, a learning version of Wald’s sequential probability ratio test is used.
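Wald's sequential probability ratio test accumulates a log-likelihood ratio observation by observation and stops as soon as it crosses one of two thresholds derived from the target error rates. A minimal sketch, assuming a stream of binary per-request features with known Bernoulli parameters under each hypothesis (all parameter values below are illustrative, not taken from the paper):

```python
import math

def sprt(observations, p_bot, p_human, alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test over a stream of binary
    features. p_bot / p_human are the assumed probabilities of observing
    a 1 under each hypothesis; alpha / beta are the target error rates.
    Returns ('bot' | 'human' | 'undecided', observations consumed)."""
    upper = math.log((1 - beta) / alpha)   # accept H1 ("bot")
    lower = math.log(beta / (1 - alpha))   # accept H0 ("human")
    llr = 0.0
    for t, x in enumerate(observations, 1):
        if x:
            llr += math.log(p_bot / p_human)
        else:
            llr += math.log((1 - p_bot) / (1 - p_human))
        if llr >= upper:
            return "bot", t
        if llr <= lower:
            return "human", t
    return "undecided", len(observations)
```

The appeal for online recognition is that the test decides as early as the evidence allows: highly informative requests end the session's classification after only a handful of observations.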
Web Server Support for e-Customer Loyalty through QoS Differentiation
The paper deals with the problem of offering predictive service in e-commerce Web server systems under overload. Due to the unpredictability of Web accesses, such systems often fail to effectively handle peak traffic, which results in long delays and incomplete transactions. As a consequence, online retailers miss an opportunity to attract new customers, retain the loyalty of regular customers, and increase profits. We propose a method for priority-based admission control and scheduling of requests at the Web server system in order to differentiate Quality of Service (QoS) with regard to user-perceived delays, i.e., Web page response times provided by the system (as opposed to HTTP request resp…
Detection of Internet robots using a Bayesian approach
A large part of Web traffic on e-commerce sites is generated not by human users but by Internet robots: search engine crawlers, shopping bots, hacking bots, etc. In practice, not all robots, especially the malicious ones, disclose their identities to a Web server, and thus there is a need to develop methods for their detection and identification. This paper proposes the application of a Bayesian approach to robot detection based on characteristics of user sessions. The method is applied to the Web traffic from a real e-commerce site. Results show that the classification model based on cluster analysis with Ward's method and the weighted Euclidean metric is very effective in robot det…
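As a rough illustration of Bayesian classification on numeric session characteristics, the sketch below uses a plain Gaussian naive Bayes model (not the paper's cluster-based approach; the feature values in the usage example are invented):

```python
import math

def train_nb(rows, labels):
    """Fit a Gaussian naive Bayes model on numeric session features
    (e.g., number of requests, session duration)."""
    model = {}
    for cls in set(labels):
        cols = list(zip(*[r for r, y in zip(rows, labels) if y == cls]))
        stats = []
        for col in cols:
            mu = sum(col) / len(col)
            # small floor on the variance avoids division by zero
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mu, var))
        prior = labels.count(cls) / len(labels)
        model[cls] = (math.log(prior), stats)
    return model

def classify_nb(model, row):
    """Pick the class with the highest posterior log-probability."""
    def log_gauss(x, mu, var):
        return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return max(model, key=lambda cls: model[cls][0] + sum(
        log_gauss(x, mu, var) for x, (mu, var) in zip(row, model[cls][1])))
```

With sessions described as (requests, duration) tuples, e.g. bots issuing many requests in short sessions and humans the opposite, the model separates the two groups from a handful of labeled examples.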
Practical Aspects of Log File Analysis for E-Commerce
The paper concerns Web server log file analysis to discover knowledge useful for online retailers. Data for one month of an online bookstore's operation was analyzed with respect to the probability of e-customers making a purchase. Key states and characteristics of user sessions were distinguished, and their relations to the session state connected with purchase confirmation were analyzed. Results allow identification of factors increasing the probability of making a purchase in a given Web store and thus determination of user sessions which are more valuable in terms of e-business profitability. Such results may then be applied in practice, e.g. in a method for personalized or prioritize…
HTTP-level e-commerce data based on server access logs for an online store
Web server logs have been extensively used as a source of data on the characteristics of Web traffic and users’ navigational patterns. In particular, Web bot detection and online purchase prediction using methods from artificial intelligence (AI) are currently key areas of research. However, in reality, it is hard to obtain logs from actual online stores and there is no common dataset that can be used across different studies. Moreover, there is a lack of studies exploring Web traffic over a longer period of time, due to the unavailability of long-term data from server logs. The need to develop reliable models of Web traffic, Web user navigation, and e-customer behaviour calls for …
An Experiment with Facebook as an Advertising Channel for Books and Audiobooks
The paper addresses the problem of using social media to promote innovative products available in online stores. Motivated by the fast development of the audiobook market on the one hand, and the efficiency of social media marketing on the other, we conducted an experiment with a marketing campaign for books and audiobooks on the most popular social networking site, Facebook, and discuss it in this paper. The goal of the experiment was to explore possible differences in FB users’ reactions to FB advertisements of traditional books and audiobooks. The experiment was implemented using a real Facebook fanpage of a Polish publishing house that has its own online bookstore. Results show so…
Modeling a non-stationary bots’ arrival process at an e-commerce Web site
The paper concerns the issue of modeling and generating a representative Web workload for Web server performance evaluation through simulation experiments. Web traffic analysis has been carried out for two decades, usually based on Web server log data. However, while the character of the overall Web traffic has been extensively studied and modeled, relatively few studies have been devoted to the analysis of Web traffic generated by Internet robots (Web bots). Moreover, the overwhelming majority of studies concern traffic on non-e-commerce websites. In this paper we address the problem of modeling a realistic arrival process of bots’ requests on an e-commerce Web server. Based on real…
Analysis of Aggregated Bot and Human Traffic on E-Commerce Site
A significant volume of Web traffic nowadays can be attributed to robots. Although some of them, e.g., search-engine crawlers, perform useful tasks on a website, others may be malicious and should be banned. Consequently, there is a growing need to identify bots and to characterize their behavior. This paper investigates the share of bot-generated traffic on an e-commerce site and studies differences in bots' and humans' session-based traffic by analyzing data recorded in Web server log files. Results show that both kinds of sessions reveal different characteristics, including the session duration, the number of pages visited in a session, the number of requests, the volume of data transferre…
Online Web Bot Detection Using a Sequential Classification Approach
A significant problem nowadays is the detection of Web traffic generated by automatic software agents (Web bots). Some studies have dealt with this task by proposing various approaches to Web traffic classification in order to distinguish the traffic stemming from human users' visits from that generated by bots. Most previous works addressed the problem of offline bot recognition, based on available information on user sessions completed on a Web server. Very few approaches, however, have been proposed to recognize bots online, before the session completes. This paper proposes a novel approach to binary classification of a multivariate data stream incoming on a Web server, in order to recogn…
Sensitivity Analysis of Key Customers and Revenue-Oriented Admission Control and Scheduling Algorithm
Received: 15 January 2013. Accepted: 2 February 2013. The paper deals with the problem of Quality of Web Service (QoWS) in e-commerce Web servers, i.e. in retail Web stores. It concerns the admission control and scheduling algorithm for a Web server system, which aims at preventing the system from overload to provide a high QoWS level and ultimately, to increase the Web site’s conversion rate, i.e. to turn more visitors into customers. The sensitivity of the algorithm to changes in its basic parameter values was analyzed by using a simulation-based approach. Special attention was paid to evaluation of the parameter impact on conventional and business-related system performance metrics.
Improving the quality of e-commerce web service: what is important for the request scheduling algorithm?
The paper concerns a new research area, Quality of Web Service (QoWS). The need for QoWS is motivated by the still growing number of Internet users, by the steady development and diversification of Web services, and especially by the popularization of e-commerce applications. The goal of the paper is a critical analysis of the literature concerning scheduling algorithms for e-commerce Web servers. The paper characterizes factors affecting the load of Web servers and discusses ways of improving their efficiency. Crucial QoWS requirements of the business Web server are identified: serving requests before their individual deadlines, supporting user session integrity, supporting different cl…
Verification of Web traffic burstiness and self-similarity for multiple online stores
Developing realistic Web traffic models is essential for reliable Web server performance evaluation. Among the most significant Web traffic properties identified so far are burstiness and self-similarity. However, few relevant studies have been devoted to e-commerce traffic. In this paper, we investigate burstiness and self-similarity factors for seven different online stores using their access log data. Our findings show that both features are present in all the analyzed e-commerce datasets. Furthermore, a strong correlation of the Hurst parameter with the average request arrival rate was discovered (0.94). Estimates of the Hurst parameter for the Web traffic in the online …
A Quantum-Inspired Classifier for Early Web Bot Detection
This paper introduces a novel approach, inspired by the principles of Quantum Computing, to address web bot detection in terms of real-time classification of an incoming data stream of HTTP request headers, in order to ensure the shortest decision time with the highest accuracy. The proposed approach exploits the analogy between the intrinsic correlation of two or more particles and the dependence of each HTTP request on the preceding ones. Starting from the a-posteriori probability of each request belonging to a particular class, it is possible to assign a Qubit state representing a combination of the aforementioned probabilities for all available observations of the time series. By levera…
Application of neural network to predict purchases in online store
A key ability of competitive online stores is effective prediction of customers’ purchase intentions, as it makes it possible to apply personalized service strategies to convert visitors into buyers and increase sales conversion rates. Data mining and artificial intelligence techniques have proven successful in classification and prediction tasks in complex real-time systems, like e-commerce sites. In this paper we propose a back-propagation neural network model aimed at predicting purchases in active user sessions in a Web store. The neural network training and evaluation were performed using a set of user sessions reconstructed from server log data. The proposed neural network was abl…
Using association rules to assess purchase probability in online stores
The paper addresses the problem of e-customer behavior characterization based on Web server log data. We describe user sessions with a number of session features and aim to identify the features indicating a high probability of making a purchase for two customer groups: traditional customers and innovative customers. We discuss our approach to assessing the purchase probability in a user session depending on the categories of viewed products and session features. We apply association rule mining to real online bookstore data. The results show differences in factors indicating a high purchase probability in a session for both customer types. The discovered association rules allow us to formu…
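Association rule mining of this kind extracts rules A -> B whose support and confidence exceed chosen thresholds. A minimal single-antecedent sketch (the item names and threshold values are illustrative, not taken from the bookstore data):

```python
from itertools import combinations

def association_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Mine single-antecedent rules A -> B from session 'transactions'
    (sets of items, e.g. viewed product categories or session features),
    using plain support/confidence thresholds. Returns tuples of
    (antecedent, consequent, support, confidence)."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = set().union(*transactions)
    rules = []
    for a, b in combinations(sorted(items), 2):
        for ante, cons in ((a, b), (b, a)):
            sup = support({ante, cons})
            sup_a = support({ante})
            if sup >= min_support and sup_a > 0 and sup / sup_a >= min_confidence:
                rules.append((ante, cons, round(sup, 3), round(sup / sup_a, 3)))
    return rules
```

Real mining tools use the Apriori or FP-growth algorithms to avoid enumerating all item pairs, but the support/confidence semantics are the same.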
Web Traffic Modeling for E-Commerce Web Server System
The paper concerns the problem of e-commerce Web server system performance evaluation through simulation experiments, in particular the problem of modeling a representative stream of user requests at the input of such a system. Motivated by the need for a benchmarking tool for the Business-to-Consumer (B2C) environment, we discuss a workload model typical of such Web sites as well as a model of a multi-tiered e-commerce Web server system. A simulation tool in which the proposed models have been implemented is briefly discussed, and some experimental results on the Web system performance in terms of traditional and business performance measures are presented.
Bot recognition in a Web store: An approach based on unsupervised learning
Web traffic on e-business sites is increasingly dominated by artificial agents (Web bots) which pose a threat to the website security, privacy, and performance. To develop efficient bot detection methods and discover reliable e-customer behavioural patterns, the accurate separation of traffic generated by legitimate users and Web bots is necessary. This paper proposes a machine learning solution to the problem of bot and human session classification, with a specific application to e-commerce. The approach studied in this work explores the use of unsupervised learning (k-means and Graded Possibilistic c-Means), followed by supervised labelling of clusters, a generative learning stra…
Identifying legitimate Web users and bots with different traffic profiles — an Information Bottleneck approach
Recent studies reported that about half of Web users nowadays are intelligent agents (Web bots). Many bots are impersonators operating at a very high sophistication level, trying to emulate navigational behaviors of legitimate users (humans). Moreover, bot technology continues to evolve which makes bot detection even harder. To deal with this problem, many advanced methods for differentiating bots from humans have been proposed, a large part of which relies on supervised machine learning techniques. In this paper, we propose a novel approach to identify various profiles of bots and humans which combines feature selection and unsupervised learning of HTTP-level traffic patterns to d…
Characterizing Web sessions of e-customers interested in traditional and innovative products
Web traffic characterization and modelling is currently an active research area. Low-level analysis of HTTP traffic on the server allows one to build adequate traffic models to be used in server benchmarking. High-level analysis of Web user behavior allows one to optimize website structure and develop personalized service strategies. In this paper, analysis of customer sessions in an online store is performed using Web server log data. The goal is to explore possible differences between sessions of customers viewing and purchasing innovative products and customers only interested in traditional products.
Efficient on-the-fly Web bot detection
A large fraction of traffic on present-day Web servers is generated by bots — intelligent agents able to traverse the Web and execute various advanced tasks. Since bots’ activity may raise concerns about server security and performance, many studies have investigated traffic features discriminating bots from human visitors and developed methods for automated traffic classification. Very few previous works, however, aim at identifying bots on-the-fly, trying to classify active sessions as early as possible. This paper proposes a novel method for binary classification of streams of Web server requests in order to label each active session as “bot” or “human”. A machine learning appro…