An Improved Decision System for URL Accesses Based on a Rough Feature Selection Technique

RESEARCH PRODUCT

A. I. Esparcia-Alcázar, P. De Las Cuevas, Zeineb Chelly, Antonio M. Mora, Juan J. Merelo

subject

Information retrieval; Internal security; Computer science; Decision system; Feature (computer vision); String (computer science); Computational intelligence; Feature selection; Rough set; Corporate security

description

Corporate security is usually one of the areas in which companies invest the most resources, since the loss of information translates directly into monetary losses. Security issues may originate in external attacks or internal security failures, but a significant share of security breaches is related to employees' lack of awareness regarding the use of the Web. In this work we focus on the latter problem, describing the improvements to a system able to detect anomalous and potentially insecure situations that could be dangerous for a company. This system was initially conceived as a better alternative to what are known as black/white lists. These lists contain URLs to which access is banned or dangerous (black list) or URLs to which access is permitted (white list). In this chapter, we propose a system that first learns from existing black/white lists and then classifies a new, unknown URL request as either “should be allowed” or “should be denied”. We describe the system and its results, together with the improvements obtained from an initial data pre-processing step that applies Rough Set Theory for feature selection. We show that high accuracies can be obtained even without the pre-processing step, with between 96 and 97 % of patterns correctly classified. Furthermore, we show that using Computational Intelligence techniques to pre-process the data improves system performance in terms of running time, while accuracies remain close to 97 %. Finally, among the obtained results, we demonstrate that it is possible to obtain interesting rules, not based only on the URL string feature, for classifying new, unknown URL access requests as allowed or denied.
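To make the idea of rough-set feature selection concrete, the following is a minimal sketch, not the chapter's implementation. The description does not say which rough-set algorithm the authors use, so this example assumes a standard greedy reduct search (QuickReduct) driven by the rough-set dependency degree. The feature names (`tld`, `content_type`, `time`), the decision column `action`, and all rows are hypothetical toy data invented for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, attrs, decision):
    """Fraction of rows lying in classes that agree on a single decision value."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows, attrs, decision):
    """Greedily add attributes until dependency matches the full attribute set."""
    target = dependency(rows, attrs, decision)
    chosen = []
    while dependency(rows, chosen, decision) < target:
        remaining = [a for a in attrs if a not in chosen]
        best = max(remaining,
                   key=lambda a: dependency(rows, chosen + [a], decision))
        chosen.append(best)
    return chosen


# Hypothetical toy log of URL requests labelled from a black/white list.
rows = [
    {"tld": "com", "content_type": "text/html", "time": "work", "action": "allow"},
    {"tld": "com", "content_type": "video/mp4", "time": "work", "action": "deny"},
    {"tld": "org", "content_type": "text/html", "time": "off", "action": "allow"},
    {"tld": "net", "content_type": "video/mp4", "time": "off", "action": "deny"},
    {"tld": "net", "content_type": "text/html", "time": "work", "action": "allow"},
]
attrs = ["tld", "content_type", "time"]

print(quick_reduct(rows, attrs, "action"))  # ['content_type']
```

In this toy table a single non-URL-string attribute is enough to separate allowed from denied requests, so the selected subset is much smaller than the full feature set. That is the kind of reduction that would shorten classifier running time while leaving accuracy largely unchanged, under the assumptions stated above.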

year: 2015-12-20

https://doi.org/10.1007/978-3-319-26450-9_6