0000000000186021
AUTHOR
Antti Juvonen
Growing Hierarchical Self-organizing Maps and Statistical Distribution Models for Online Detection of Web Attacks
In modern networks, HTTP clients communicate with web servers using request messages. By manipulating these messages attackers can collect confidential information from servers or even corrupt them. In this study, the approach based on anomaly detection is considered to find such attacks. For HTTP queries, feature matrices are obtained by applying an n-gram model, and, by learning on the basis of these matrices, growing hierarchical self-organizing maps are constructed. For HTTP headers, we employ statistical distribution models based on the lengths of header values and relative frequency of symbols. New requests received by the web-server are classified by using the maps and models obtaine…
Intrusion detection applications using knowledge discovery and data mining
Online anomaly detection using dimensionality reduction techniques for HTTP log analysis
Modern web services face an increasing number of new threats. Logs are collected from almost all web servers, and for this reason analyzing them is beneficial when trying to prevent intrusions. Intrusive behavior often differs from the normal web traffic. This paper proposes a framework to find abnormal behavior from these logs. We compare random projection, principal component analysis and diffusion map for anomaly detection. In addition, the framework has online capabilities. The first two methods have intuitive extensions while diffusion map uses the Nyström extension. This fast out-of-sample extension enables real-time analysis of web server traffic. The framework is demonstrated using …
An Efficient Network Log Anomaly Detection System Using Random Projection Dimensionality Reduction
Network traffic is increasing all the time and network services are becoming more complex and vulnerable. To protect these networks, intrusion detection systems are used. Signature-based intrusion detection cannot find previously unknown attacks, which is why anomaly detection is needed. However, many new systems are slow and complicated. We propose a log anomaly detection framework which aims to facilitate quick anomaly detection and also provide visualizations of the network traffic structure. The system preprocesses network logs into a numerical data matrix, reduces the dimensionality of this matrix using random projection and uses Mahalanobis distance to find outliers and calculate an a…
Combining conjunctive rule extraction with diffusion maps for network intrusion detection
Network security and intrusion detection are important in the modern world where communication happens via information networks. Traditional signature-based intrusion detection methods cannot find previously unknown attacks. On the other hand, algorithms used for anomaly detection often have black box qualities that are difficult to understand for people who are not algorithm experts. Rule extraction methods create interpretable rule sets that act as classifiers. They have mostly been combined with already labeled data sets. This paper aims to combine unsupervised anomaly detection with rule extraction techniques to create an online anomaly detection framework. Unsupervised anomaly detectio…
Anomaly Detection Framework Using Rule Extraction for Efficient Intrusion Detection
Huge datasets in cyber security, such as network traffic logs, can be analyzed using machine learning and data mining methods. However, the amount of collected data is increasing, which makes analysis more difficult. Many machine learning methods have not been designed for big datasets, and consequently are slow and difficult to understand. We address the issue of efficient network traffic classification by creating an intrusion detection framework that applies dimensionality reduction and conjunctive rule extraction. The system can perform unsupervised anomaly detection and use this information to create conjunctive rules that classify huge amounts of traffic in real time. We test the impl…
Poikkeavuuksien havaitseminen WWW-palvelinlokidatasta
Nykyajan web-palvelut ovat dynaamisia ja avoimia. Tämä antaa suurelle joukolle käyttäjiä mahdollisuuden päästä käsiksi palveluun ja sen sisältämään tietoon. Samalla avautuu uusia mahdollisuuksia toteuttaa hyökkäys. Tietoturvan pitäminen riittävällä tasolla on kilpailua aikaa vastaan. Poikkeavuuksien havaitsemisjärjestelmillä pystytään kuitenkin havaitsemaan ennestään tuntemattomat hyökkäykset ja muu epänormaali toiminta ja siten pitämään tietoturva hyvällä tasolla. Tutkimuksessa sovellettiin n-grammianalyysia, tukivektorikonetta ja diffuusiokarttoja esikäsitellyn verkkodatan analysointiin. Kaikilla menetelmillä saatiin lupaavia tuloksia, mutta reaaliaikainen järjestelmä vaatii vielä jatkoke…
Adaptive framework for network traffic classification using dimensionality reduction and clustering
Information security has become a very important topic especially during the last years. Web services are becoming more complex and dynamic. This offers new possibilities for attackers to exploit vulnerabilities by inputting malicious queries or code. However, these attack attempts are often recorded in server logs. Analyzing these logs could be a way to detect intrusions either periodically or in real time. We propose a framework that preprocesses and analyzes these log files. HTTP queries are transformed to numerical matrices using n-gram analysis. The dimensionality of these matrices is reduced using principal component analysis and diffusion map methodology. Abnormal log lines can then …
Anomaly Detection from Network Logs Using Diffusion Maps
The goal of this study is to detect anomalous queries from network logs using a dimensionality reduction framework. The fequencies of 2-grams in queries are extracted to a feature matrix. Dimensionality reduction is done by applying diffusion maps. The method is adaptive and thus does not need training before analysis. We tested the method with data that includes normal and intrusive traffic to a web server. This approach finds all intrusions in the dataset. peerReviewed