RESEARCH PRODUCT

HTTP-level e-commerce data based on server access logs for an online store

Yash Chawla, Grzegorz Chodak, Grażyna Suchacka

subject

Web server; Database; access log; Computer Networks and Communications; Computer science; business.industry; 020206 networking & telecommunications; electronic commerce; 02 engineering and technology; E-commerce; Web traffic; computer.software_genre; online store; 0202 electrical engineering electronic engineering information engineering; Key (cryptography); 020201 artificial intelligence & image processing; HTTP traffic; Unavailability; business; computer; Data Article

description

Abstract: Web server logs have been extensively used as a source of data on the characteristics of Web traffic and users’ navigational patterns. In particular, Web bot detection and online purchase prediction using methods from artificial intelligence (AI) are currently key areas of research. In practice, however, logs from actual online stores are hard to obtain, and there is no common dataset that can be used across different studies. Moreover, few studies explore Web traffic over longer periods, because long-term server log data are unavailable. The need to develop reliable models of Web traffic, Web user navigation, and e-customer behaviour calls for an up-to-date, large-volume e-commerce dataset on Web traffic. Similarly, AI problems require a sufficient amount of solid, real-life data to train and validate new models and methods. Thus, to meet the demand for a publicly available long-term e-commerce dataset, we collected access log data describing the operation of an online store over a six-month period. The data were aggregated, transformed, and anonymized using a program written in C#. As a result, we release the EClog dataset in CSV format, which covers 183 days of HTTP-level e-commerce traffic. The data will be beneficial for research in many areas, including computer science, data science, management, and sociology.

10.1016/j.comnet.2020.107589
http://europepmc.org/articles/PMC7540248