Search results for "n-gram"
showing 9 items of 9 documents
Combining conjunctive rule extraction with diffusion maps for network intrusion detection
2013
Network security and intrusion detection are important in the modern world where communication happens via information networks. Traditional signature-based intrusion detection methods cannot find previously unknown attacks. On the other hand, algorithms used for anomaly detection often have black box qualities that are difficult to understand for people who are not algorithm experts. Rule extraction methods create interpretable rule sets that act as classifiers. They have mostly been combined with already labeled data sets. This paper aims to combine unsupervised anomaly detection with rule extraction techniques to create an online anomaly detection framework. Unsupervised anomaly detectio…
Bene : Adverb or noun?
2013
When Italian bene ‘good / well’ occurs with fare ‘do / make’, several constructs with remarkably different argument frames are involved. This paper deals with three of them: (a) Il latte fa bene ai bambini ‘Milk is good for children’; (b) Fa bene il suo lavoro ‘She does her job well’, and (c) Faresti bene a non dire niente ‘You would do well to say nothing about it’. We discuss dictionary discrepancies concerning the lexical category of 'bene' in (a), which we take to be a noun predicate, and draw a distinction between the adverbial uses in (b) and (c).
The effect of learning context on learner pragmatics: degree modifiers as lexical teddy bears (Oppimiskontekstin vaikutus oppijanpragmatiikkaan : astemääritteet leksikaalisina nallekarhuina)
2015
The article examines the effect of the learning environment on the use of degree modifiers. Degree modifiers are adverbs that carry pragmatic meaning and express a high, moderate, or low degree of some property (e.g. melko ‘fairly’, hyvin ‘very’, tosi ‘really’). The effect of the learning environment is examined using corpora. Although the currently available learner Finnish datasets are not fully comparable with one another, for example with respect to the task prompts of the texts, they can be used to make preliminary observations about the effect of the learning context and, further, to form hypotheses for future studies. In this study, learning context refers to the environment in which the language is…
Detection of Anomalous HTTP Requests Based on Advanced N-gram Model and Clustering Techniques
2013
Nowadays HTTP servers and applications are some of the most popular targets for network attacks. In this research, we consider an algorithm for HTTP intrusion detection based on simple clustering algorithms and advanced processing of HTTP requests, which allows the analysis of all queries at once and does not separate them by resource. The proposed method allows detection of HTTP intrusions in the case of continuously updated web applications and does not require a set of attack-free HTTP requests to build the normal user behaviour model. The algorithm is tested using logs acquired from a large real-life web service and, as a result, all attacks from these logs are detected, while the numbe…
Anomaly Detection from Network Logs Using Diffusion Maps
2011
The goal of this study is to detect anomalous queries from network logs using a dimensionality reduction framework. The frequencies of 2-grams in queries are extracted to a feature matrix. Dimensionality reduction is done by applying diffusion maps. The method is adaptive and thus does not need training before analysis. We tested the method with data that includes normal and intrusive traffic to a web server. This approach finds all intrusions in the dataset.
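As a rough illustration of the 2-gram frequency feature matrix mentioned in the abstract above, the following sketch counts overlapping character 2-grams per query and arranges the counts as one row per query. The function names and the sample strings are hypothetical, and the paper's actual preprocessing (tokenization, normalization, which parts of the query are used) is not given in the abstract, so this is only one plausible reading.

```python
from collections import Counter


def bigram_counts(query):
    """Count overlapping character 2-grams in one query string."""
    return Counter(query[i:i + 2] for i in range(len(query) - 1))


def feature_matrix(queries):
    """Return (vocab, matrix): one row per query, one column per 2-gram."""
    counts = [bigram_counts(q) for q in queries]
    # Column order: all 2-grams seen in any query, sorted for reproducibility.
    vocab = sorted(set().union(*counts))
    # A Counter returns 0 for 2-grams that do not occur in a query.
    matrix = [[c[gram] for gram in vocab] for c in counts]
    return vocab, matrix


# Hypothetical toy input, not taken from the paper's data.
vocab, matrix = feature_matrix(["abab", "abc"])
print(vocab)   # ['ab', 'ba', 'bc']
print(matrix)  # [[2, 1, 0], [1, 0, 1]]
```

A matrix of this shape is what a dimensionality reduction step such as diffusion maps would take as input; that step is not sketched here.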
Anomaly detection from WWW server log data (Poikkeavuuksien havaitseminen WWW-palvelinlokidatasta)
2011
Modern web services are dynamic and open. This gives a large number of users the opportunity to access the service and the information it contains. At the same time, it opens up new opportunities to carry out attacks. Keeping information security at a sufficient level is a race against time. Anomaly detection systems, however, can detect previously unknown attacks and other abnormal activity and thereby keep information security at a good level. The study applied n-gram analysis, a support vector machine, and diffusion maps to the analysis of preprocessed network data. All methods produced promising results, but a real-time system still requires further…
Dimensionality reduction framework for detecting anomalies from network logs
2012
Dynamic web services are vulnerable to a multitude of intrusions that could be previously unknown. Server logs contain vast amounts of information about network traffic, and finding attacks from these logs improves the security of the services. In this research, features are extracted from HTTP query parameters using 2-grams. We propose a framework that uses dimensionality reduction and clustering to identify anomalous behavior. The framework detects intrusions from log data gathered from a real network service. This approach is adaptive, works on the application layer and reduces the number of log lines that need to be inspected. Furthermore, the traffic can be visualized.
Distinctive Lexical Patterns in Russian Patient Information Leaflets: A Corpus-Driven Study
2019
This methodologically oriented corpus-driven study focuses on distinctive patterns of language use in a specialized text type, namely Russian patient information leaflets. The study’s main goal is to identify keywords and recurrent sequences of words that account for the leaflets’ formulaicity and, as a secondary goal, to describe their discoursal functions. The keywords were identified using three methods (G2, Hedges’ g and Neozeta) and the overlap between the three metrics was explored. The overlapping keywords were qualitatively analyzed in terms of discoursal functions. As for the distinctive multi-word patterns, we focused on recurrent n-grams with the largest coverage in the corpus…
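For orientation, here is a minimal sketch of a keyness score of the kind the abstract above calls G2, assuming that G2 refers to the log-likelihood ratio statistic in its common two-term corpus-comparison form. The function name, parameter names, and example counts are hypothetical; the abstract does not state which variant of the statistic the study used.

```python
import math


def g2_keyness(freq_study, size_study, freq_ref, size_ref):
    """Two-term log-likelihood (G2) for one word in a study vs. reference corpus.

    freq_* are the word's counts; size_* are the total token counts.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected counts if the word were equally frequent in both corpora.
    expected = (
        size_study * total_freq / total_size,
        size_ref * total_freq / total_size,
    )
    g2 = 0.0
    for observed, exp in zip((freq_study, freq_ref), expected):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / exp)
    return 2.0 * g2


# Hypothetical counts: same relative frequency in both corpora gives 0.
print(g2_keyness(10, 1000, 20, 2000))  # 0.0
# Hypothetical counts: word twice as frequent in the study corpus.
print(round(g2_keyness(20, 1000, 10, 1000), 3))  # 3.398
```

Higher values indicate a larger frequency difference between the two corpora; the statistic on its own does not show which corpus the word is more typical of, so the relative frequencies have to be compared as well.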
Adaptive framework for network traffic classification using dimensionality reduction and clustering
2012
Information security has become a very important topic, especially in recent years. Web services are becoming more complex and dynamic. This offers new possibilities for attackers to exploit vulnerabilities by inputting malicious queries or code. However, these attack attempts are often recorded in server logs. Analyzing these logs could be a way to detect intrusions either periodically or in real time. We propose a framework that preprocesses and analyzes these log files. HTTP queries are transformed to numerical matrices using n-gram analysis. The dimensionality of these matrices is reduced using principal component analysis and diffusion map methodology. Abnormal log lines can then …