
RESEARCH PRODUCT

Improving clustering of Web bot and human sessions by applying Principal Component Analysis

Grazyna Suchacka

subject

Bot detection; Principal Component Analysis; PCA; Log analysis; Computer science; k-means; Internet robot; Classification; Web bot; Dimensionality reduction; Clustering; Web server; Feature selection; Data mining; Cluster analysis

description

The paper addresses the problem of modeling Web sessions of bots and legitimate users (humans) as feature vectors that serve as input to classification models. Many different features have been considered in session models to discriminate between the navigational patterns of bots and humans, but very few studies have been devoted to feature selection and dimensionality reduction in the context of bot detection. We propose applying Principal Component Analysis (PCA) to develop improved session models based on predictor variables that are efficient discriminants of Web bots. The proposed models are used in session clustering, whose performance is evaluated in terms of the purity of the generated clusters. The efficiency of the proposed approach is verified experimentally using real server log data. The results show that PCA may be very efficient in dimensionality reduction and feature selection for session classification aimed at distinguishing Web robots.
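The pipeline outlined in the description (PCA-based reduction of session feature vectors, clustering, and evaluation by cluster purity) can be illustrated with a minimal sketch. This is not the author's implementation: the data below are synthetic, the six features and the two-component reduction are assumptions chosen for illustration, and the helper names `pca_project`, `kmeans`, and `purity` are hypothetical. Purity is computed here in its common form, as the fraction of sessions that belong to the majority class of their cluster.

```python
import numpy as np

def pca_project(X, n_components):
    """Standardize features and project them onto the top principal components."""
    Xc = X - X.mean(axis=0)
    std = Xc.std(axis=0)
    std[std == 0] = 1.0              # avoid division by zero for constant features
    Z = Xc / std
    # SVD of the standardized data: rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)   # explained variance ratio per component
    return Z @ Vt[:n_components].T, explained[:n_components]

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for every row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def purity(cluster_labels, true_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    total = 0
    for c in np.unique(cluster_labels):
        members = true_labels[cluster_labels == c]
        total += np.bincount(members).max()
    return total / len(true_labels)

# Synthetic stand-in for session feature vectors (NOT real server log data).
rng = np.random.default_rng(42)
n = 200
humans = rng.normal(size=(n, 6))
bots = rng.normal(size=(n, 6))
bots[:, :2] += 6.0                   # assumption: bots differ in two features
X = np.vstack([humans, bots])
y = np.array([0] * n + [1] * n)      # 0 = human, 1 = bot

X_reduced, explained = pca_project(X, n_components=2)
clusters = kmeans(X_reduced, k=2)
score = purity(clusters, y)
print(f"explained variance ratios: {explained}")
print(f"cluster purity: {score:.3f}")
```

Standardizing the features before PCA is a design choice of this sketch, made so that features measured on different scales contribute comparably; the source does not state which preprocessing was used.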

https://doi.org/10.7148/2019-0434