Search results for "Data mining"
showing 10 items of 907 documents
Modeling a non-stationary bots’ arrival process at an e-commerce Web site
2017
The paper concerns the issue of modeling and generating a representative Web workload for Web server performance evaluation through simulation experiments. Web traffic analysis has been carried out for two decades, usually based on Web server log data. However, while the character of the overall Web traffic has been extensively studied and modeled, relatively few studies have been devoted to the analysis of Web traffic generated by Internet robots (Web bots). Moreover, the overwhelming majority of studies concern traffic on non-e-commerce websites. In this paper we address the problem of modeling a realistic arrival process of bots’ requests on an e-commerce Web server. Based on real…
Feature selection: A multi-objective stochastic optimization approach
2020
The feature subset task can be cast as a multiobjective discrete optimization problem. In this work, we study the search algorithm component of a feature subset selection method. We propose an algorithm based on the threshold accepting method, extended to the multi-objective framework by an appropriate definition of the acceptance rule. The method is used in the task of identifying relevant subsets of features in a Web bot recognition problem, where automated software agents on the Web are identified by analyzing the stream of HTTP requests to a Web server.
Using association rules to assess purchase probability in online stores
2016
The paper addresses the problem of e-customer behavior characterization based on Web server log data. We describe user sessions with a number of session features and aim to identify the features indicating a high probability of making a purchase for two customer groups: traditional customers and innovative customers. We discuss our approach to assessing the purchase probability in a user session depending on the categories of viewed products and on session features. We apply association rule mining to real online bookstore data. The results show differences between the two customer types in the factors indicating a high purchase probability in a session. The discovered association rules allow us to formu…
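As an illustration of the association rule mining mentioned in the entry above, the following minimal sketch computes the support and confidence of a single rule over a handful of sessions. It is not the authors' method or data: the helper name `rule_stats`, the item labels (`viewed:fiction`, `long_session`, `purchase`) and the four toy sessions are all hypothetical, chosen only to show how a rule's confidence can be read as an estimated purchase probability.

```python
def rule_stats(sessions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    sessions: list of sets of item labels (hypothetical session features).
    support = share of all sessions containing antecedent and consequent.
    confidence = share of antecedent sessions that also contain the consequent.
    """
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    n_antecedent = sum(1 for s in sessions if antecedent <= s)
    n_both = sum(1 for s in sessions if both <= s)
    support = n_both / len(sessions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Toy data, invented for illustration only.
sessions = [
    frozenset({"viewed:fiction", "long_session", "purchase"}),
    frozenset({"viewed:fiction", "long_session"}),
    frozenset({"viewed:fiction", "purchase"}),
    frozenset({"viewed:science"}),
]

support, confidence = rule_stats(
    sessions, {"viewed:fiction", "long_session"}, {"purchase"}
)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Here the confidence of the rule is the fraction of sessions matching the antecedent that ended in a purchase, which is one simple way to interpret a rule as a purchase-probability estimate.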
High precision mass measurements for wine metabolomics
2014
An overview of the critical steps for the non-targeted Ultra-High Performance Liquid Chromatography coupled with Quadrupole Time-of-Flight Mass Spectrometry (UPLC-Q-ToF-MS) analysis of wine chemistry is given, ranging from the study design, data preprocessing and statistical analyses, to marker identification. UPLC-Q-ToF-MS data were enhanced by the alignment of exact mass data from FTICR-MS, and marker peaks were identified using UPLC-Q-ToF-MS(2). In combination with multivariate statistical tools and the annotation of peaks with metabolites from relevant databases, this analytical process provides a fine description of the chemical complexity of wines, as exemplified in the case of red (P…
Keynote Paper: Data Mining Researcher, Who is Your Customer? Some Issues Inspired by the Information Systems Field
2006
Data mining as an applied research field still raises great expectations among organizations that want to increase the utility they get from their huge databases and data warehouses. There are too few success stories about organizations that have managed to satisfy even some of those expectations. This situation is very similar to the one in the information systems (IS) field, especially in the past but even today. The recent lively debate about the identity of the IS discipline also included an analysis of the customers of IS research. Inspired by IS researchers' insights related to the topic, we ask the question "who is our customer?" as data mining researchers. With…
Connections Between Topology and Macroscopic Mechanical Properties of Three-Dimensional Open-Pore Materials
2018
This work addresses a number of fundamental questions regarding the topological description of materials characterized by a highly porous three-dimensional structure with bending as the major deformation mechanism. Highly efficient finite-element beam models were used for generating data on the mechanical behavior of structures with different topologies, ranging from highly coordinated bcc to Gibson–Ashby structures. Random cutting enabled a continuous modification of average coordination numbers ranging from the maximum connectivity to the percolation-cluster transition of the 3D network. The computed macroscopic mechanical properties–Young's modulus, yield strength, and Poisson's ratio–co…
Tracing Potential School Shooters in the Digital Sphere
2010
There are over 300 known school shooting cases in the world and over ten known cases where the perpetrator(s) have been prevented from carrying out the attack at the last moment or earlier. Interesting from our point of view is that in many cases the perpetrators have expressed their views in social media or on their web page well in advance, and often also left suicide messages in blogs and other forums before their attack, along with the planned date and place. This has become more common towards the end of this decade. In some cases this has made it possible to prevent the attack. In this paper we will look at the possibilities to find commonalities of the perpetrators, beyond the fact that they…
Overlapping community detection versus ground-truth in AMAZON co-purchasing network
2015
Objective evaluation of community detection algorithms is a strategic issue. Indeed, we need to verify that the communities identified are actually the correct ones. Moreover, it is necessary to compare results between two distinct algorithms to determine which is most effective. Classically, validations rely on clustering comparison measures or on quality metrics. Although various traditional performance measures are used extensively, it appears very clearly that they cannot distinguish community structures with different topological properties. It is therefore necessary to propose an alternative methodology more sensitive to the community structure variations in orde…
CLEARMiner: a new algorithm for mining association patterns on heterogeneous time series from climate data
2010
Recently, improvements in sensor technology have contributed to an increase in spatial data acquisition. The use of remote sensing in many countries and states, where agricultural business is a large part of their gross income, can provide a valuable source to improve their economy. The combination of climate and remote sensing data can reveal useful information, which can help researchers to monitor and estimate the production of agricultural crops. Data mining techniques are the main tools to analyze and extract relationships and patterns. In this context, this paper presents a new algorithm for mining association patterns in geo-referenced databases of climate and satel…
User profile matching in social networks
2010
Inter-social-network operations and functionalities are required in several scenarios (data integration, data enrichment, information retrieval, etc.). To achieve this, matching user profiles is required. Current methods are too restrictive and do not consider all the related problems. In particular, they assume that two profiles describe the same physical person only if the values of their Inverse Functional Property or IFP (e.g. the email address, homepage, etc.) are the same. However, the observed trend in social networks is not fully compatible with this assumption since users tend to create more than one social network account (for personal use, for work, etc.) w…
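The IFP-based assumption criticized in the entry above can be sketched in a few lines. This is only an illustration of the baseline rule (two profiles match if an IFP value is shared), not the method proposed in the paper. The function name `same_person_by_ifp`, the profile dictionaries, the example addresses and the choice to compare values case-insensitively are all assumptions made for the example.

```python
def same_person_by_ifp(profile_a, profile_b, ifp_keys=("email", "homepage")):
    """Baseline rule: profiles match only if some IFP value is identical.

    Profiles are plain dicts (hypothetical representation). Values are
    compared after trimming whitespace and lowercasing, which is a
    simplifying assumption of this sketch.
    """
    for key in ifp_keys:
        value_a = profile_a.get(key)
        value_b = profile_b.get(key)
        if value_a and value_b and value_a.strip().lower() == value_b.strip().lower():
            return True
    return False


# Invented profiles: the same person with a personal and a work account.
personal = {"name": "A. Smith", "email": "asmith@example.org"}
work = {"name": "Alice Smith", "email": "alice.smith@example.com"}
duplicate = {"name": "A Smith", "email": "ASmith@example.org"}

matches_duplicate = same_person_by_ifp(personal, duplicate)
matches_work = same_person_by_ifp(personal, work)
print(matches_duplicate, matches_work)
```

The second comparison fails even though both accounts are meant to belong to one person, which is the kind of case the abstract says the strict IFP rule cannot handle.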