Search results for "Mining"
showing 10 items of 1730 documents
Detection of Internet robots using a Bayesian approach
2015
A large part of Web traffic on e-commerce sites is generated not by human users but by Internet robots: search engine crawlers, shopping bots, hacking bots, etc. In practice, not all robots, especially the malicious ones, disclose their identities to a Web server and thus there is a need to develop methods for their detection and identification. This paper proposes the application of a Bayesian approach to robot detection based on characteristics of user sessions. The method is applied to the Web traffic from a real e-commerce site. Results show that the classification model based on the cluster analysis with the Ward's method and the weighted Euclidean metric is very effective in robot det…
Anomaly Detection from Network Logs Using Diffusion Maps
2011
The goal of this study is to detect anomalous queries from network logs using a dimensionality reduction framework. The frequencies of 2-grams in queries are extracted to a feature matrix. Dimensionality reduction is done by applying diffusion maps. The method is adaptive and thus does not need training before analysis. We tested the method with data that includes normal and intrusive traffic to a web server. This approach finds all intrusions in the dataset.
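The two steps named in this abstract (2-gram frequency features, then a diffusion map) can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the function names `bigram_matrix` and `diffusion_map`, the use of character-level 2-grams, the Gaussian kernel, and the `epsilon` bandwidth parameter are all assumptions made for the example.

```python
import numpy as np


def bigram_matrix(queries):
    """Count character 2-grams per query (rows: queries, columns: 2-grams)."""
    vocab = sorted({q[i:i + 2] for q in queries for i in range(len(q) - 1)})
    index = {gram: j for j, gram in enumerate(vocab)}
    X = np.zeros((len(queries), len(vocab)))
    for row, q in enumerate(queries):
        for i in range(len(q) - 1):
            X[row, index[q[i:i + 2]]] += 1
    return X, vocab


def diffusion_map(X, epsilon, n_components=1):
    """Basic diffusion map (assumed variant): Gaussian kernel, Markov
    normalisation, leading non-trivial eigenvectors scaled by eigenvalues."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dists / epsilon)
    d = K.sum(axis=1)
    # Symmetric conjugate of the Markov matrix P = D^-1 K, so eigh applies.
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]  # right eigenvectors of P
    # Skip the first (constant) eigenvector, which carries no information.
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1]


if __name__ == "__main__":
    # Hypothetical queries, invented for illustration.
    queries = ["id=1", "id=2", "id=3", "' OR 1=1 --"]
    X, _ = bigram_matrix(queries)
    print(diffusion_map(X, epsilon=10.0))
```

Queries whose low-dimensional coordinates lie far from the bulk of the data would then be candidates for inspection; how the threshold is chosen is not stated in the abstract.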
Modeling a non-stationary bots’ arrival process at an e-commerce Web site
2017
The paper concerns the issue of modeling and generating a representative Web workload for Web server performance evaluation through simulation experiments. Web traffic analysis has been carried out for two decades, usually based on Web server log data. However, while the character of the overall Web traffic has been extensively studied and modeled, relatively few studies have been devoted to the analysis of Web traffic generated by Internet robots (Web bots). Moreover, the overwhelming majority of studies concern the traffic on non-e-commerce websites. In this paper we address the problem of modeling a realistic arrival process of bots’ requests on an e-commerce Web server. Based on real…
Feature selection: A multi-objective stochastic optimization approach
2020
The feature subset selection task can be cast as a multi-objective discrete optimization problem. In this work, we study the search algorithm component of a feature subset selection method. We propose an algorithm based on the threshold accepting method, extended to the multi-objective framework by an appropriate definition of the acceptance rule. The method is used in the task of identifying relevant subsets of features in a Web bot recognition problem, where automated software agents on the Web are identified by analyzing the stream of HTTP requests to a Web server.
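A threshold accepting search over feature subsets with more than one objective might look roughly like the sketch below. The abstract does not give the paper's acceptance rule, so the rule used here (accept a neighbour when no objective worsens by more than the current threshold) is an assumption, as are the names `dominates`, `accept`, and `threshold_accepting`, the bit-flip neighbourhood, and the archive of non-dominated solutions. All objectives are assumed to be minimised.

```python
import random


def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def accept(current, candidate, threshold):
    """Assumed rule: no objective may worsen by more than the threshold."""
    return all(c - p <= threshold for c, p in zip(candidate, current))


def threshold_accepting(n_features, evaluate, thresholds, steps_per_level=50, seed=0):
    """Search feature masks; return an archive of non-dominated (mask, score)."""
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    score = evaluate(mask)
    archive = [(tuple(mask), score)]
    for t in thresholds:  # thresholds are expected to decrease towards 0
        for _ in range(steps_per_level):
            cand = mask[:]
            i = rng.randrange(n_features)
            cand[i] = not cand[i]  # neighbour: flip one feature in or out
            cand_score = evaluate(cand)
            if accept(score, cand_score, t):
                mask, score = cand, cand_score
                if not any(dominates(s, score) or s == score for _, s in archive):
                    archive = [(m, s) for m, s in archive if not dominates(score, s)]
                    archive.append((tuple(mask), score))
    return archive


if __name__ == "__main__":
    # Toy objectives, invented for illustration: features 0 and 1 are
    # "relevant"; minimise (missing relevant features, subset size).
    def toy(mask):
        return (sum(1 for i in (0, 1) if not mask[i]), sum(mask))

    for m, s in threshold_accepting(5, toy, thresholds=[1, 0.5, 0]):
        print(m, s)
```

In a bot recognition setting, `evaluate` would presumably return something like a classification error and a subset size for the given mask, but the actual objectives are not stated in the abstract.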
Application of neural network to predict purchases in online store
2016
A key ability of competitive online stores is effective prediction of customers’ purchase intentions, as it makes it possible to apply a personalized service strategy to convert visitors into buyers and increase sales conversion rates. Data mining and artificial intelligence techniques have proven to be successful in classification and prediction tasks in complex real-time systems, like e-commerce sites. In this paper we propose a back-propagation neural network model aiming at predicting purchases in active user sessions in a Web store. The neural network training and evaluation were performed using a set of user sessions reconstructed from server log data. The proposed neural network was abl…
Using association rules to assess purchase probability in online stores
2016
The paper addresses the problem of e-customer behavior characterization based on Web server log data. We describe user sessions with a number of session features and aim to identify the features indicating a high probability of making a purchase for two customer groups: traditional customers and innovative customers. We discuss our approach aimed at assessing a purchase probability in a user session depending on categories of viewed products and session features. We apply association rule mining to real online bookstore data. The results show differences in factors indicating a high purchase probability in a session for both customer types. The discovered association rules allow us to formu…
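To make the association rule idea concrete, the sketch below computes support and confidence for rules of the form "session features → purchase" by brute-force enumeration. It is a minimal illustration, not the paper's procedure: the function names `support` and `rules_for_target`, the item labels, and the thresholds are hypothetical, and a real study would more likely use an Apriori-style miner.

```python
from itertools import combinations


def support(itemset, sessions):
    """Fraction of sessions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= s for s in sessions) / len(sessions)


def rules_for_target(sessions, target, min_support, min_confidence, max_len=2):
    """Enumerate rules antecedent -> target that meet both thresholds."""
    items = sorted({i for s in sessions for i in s} - {target})
    found = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            sup_a = support(antecedent, sessions)
            sup_rule = support(set(antecedent) | {target}, sessions)
            if sup_a == 0 or sup_rule < min_support:
                continue
            confidence = sup_rule / sup_a  # estimate of P(target | antecedent)
            if confidence >= min_confidence:
                found.append((antecedent, sup_rule, confidence))
    return found


if __name__ == "__main__":
    # Hypothetical sessions, invented for illustration.
    sessions = [
        {"category=fiction", "long_session", "purchase"},
        {"category=fiction", "long_session", "purchase"},
        {"category=fiction"},
        {"category=science", "long_session"},
    ]
    for rule in rules_for_target(sessions, "purchase", 0.5, 0.9):
        print(rule)
```

The confidence of a rule is the empirical purchase probability among sessions matching the antecedent, which is one way to read "assessing a purchase probability" from session features.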
High precision mass measurements for wine metabolomics
2014
An overview of the critical steps for the non-targeted Ultra-High Performance Liquid Chromatography coupled with Quadrupole Time-of-Flight Mass Spectrometry (UPLC-Q-ToF-MS) analysis of wine chemistry is given, ranging from the study design, data preprocessing and statistical analyses, to marker identification. UPLC-Q-ToF-MS data were enhanced by the alignment of exact mass data from FTICR-MS, and marker peaks were identified using UPLC-Q-ToF-MS(2). In combination with multivariate statistical tools and the annotation of peaks with metabolites from relevant databases, this analytical process provides a fine description of the chemical complexity of wines, as exemplified in the case of red (P…
Keynote Paper: Data Mining Researcher, Who is Your Customer? Some Issues Inspired by the Information Systems Field
2006
Data mining as an applied research field still raises great expectations among organizations that want to increase the utility they get from their huge databases and data warehouses. There are too few success stories about organizations having managed to satisfy even some of those expectations. This situation is very similar to the one inside the information systems (IS) field, especially earlier but even currently. The recent lively debate about the identity of the IS discipline also included an analysis of the customers of IS research. Inspired by IS researchers' insights related to the topic, we ask the question "who is our customer?" as data mining researchers. With…
Connections Between Topology and Macroscopic Mechanical Properties of Three-Dimensional Open-Pore Materials
2018
This work addresses a number of fundamental questions regarding the topological description of materials characterized by a highly porous three-dimensional structure with bending as the major deformation mechanism. Highly efficient finite-element beam models were used for generating data on the mechanical behavior of structures with different topologies, ranging from highly coordinated bcc to Gibson–Ashby structures. Random cutting enabled a continuous modification of average coordination numbers ranging from the maximum connectivity to the percolation-cluster transition of the 3D network. The computed macroscopic mechanical properties–Young's modulus, yield strength, and Poisson's ratio–co…