Search results for "N-grams"
Showing 5 of 5 documents
Dimensionality reduction framework for detecting anomalies from network logs
2012
Dynamic web services are vulnerable to a multitude of intrusions, some of which may be previously unknown. Server logs contain vast amounts of information about network traffic, and finding attacks in these logs improves the security of the services. In this research, features are extracted from HTTP query parameters using 2-grams. We propose a framework that uses dimensionality reduction and clustering to identify anomalous behavior. The framework detects intrusions in log data gathered from a real network service. The approach is adaptive, works on the application layer, and reduces the number of log lines that need to be inspected. Furthermore, the traffic can be visualized. Peer reviewed.
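A minimal sketch of the kind of character 2-gram feature extraction this abstract describes; the function name and the sample query string are illustrative assumptions, not the authors' code:

```python
from collections import Counter

def char_2grams(text):
    """Count overlapping character 2-grams in a string."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

# Hypothetical HTTP query parameter string
features = char_2grams("id=123&name=foo")
```

Each log line's query can be reduced to such a count vector, which then serves as input to dimensionality reduction and clustering.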
Oppimiskontekstin vaikutus oppijanpragmatiikkaan : astemääritteet leksikaalisina nallekarhuina [The effect of the learning context on learner pragmatics: degree modifiers as lexical teddy bears]
2015
The article examines the effect of the learning environment on the use of degree modifiers. Degree modifiers are adverbs that carry pragmatic meaning and express a high, moderate, or low degree of some property (e.g. melko 'fairly', hyvin 'very', tosi 'really'). The effect of the learning environment is examined with the help of corpora. Although the learner-Finnish corpora currently in use are not fully comparable with one another, for example with respect to the writing prompts of the texts, they can be used to make preliminary observations about the effect of the learning context and, further, to form hypotheses for future studies. In this study, the learning context refers to the environment in which the language is learned…
Adaptive framework for network traffic classification using dimensionality reduction and clustering
2012
Information security has become an increasingly important topic in recent years. Web services are becoming more complex and dynamic, which offers new opportunities for attackers to exploit vulnerabilities by submitting malicious queries or code. These attack attempts, however, are often recorded in server logs. Analyzing these logs can be a way to detect intrusions, either periodically or in real time. We propose a framework that preprocesses and analyzes such log files. HTTP queries are transformed into numerical matrices using n-gram analysis. The dimensionality of these matrices is reduced using principal component analysis and diffusion map methodology. Abnormal log lines can then …
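The principal component analysis step mentioned here can be sketched with a plain SVD; the matrix values and function name below are illustrative assumptions, standing in for a real n-gram count matrix built from log lines:

```python
import numpy as np

def pca_project(X, k=2):
    """Project rows of a feature matrix onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                       # center each feature column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # coordinates in PC space

# Hypothetical 2-gram count matrix: rows = log lines, columns = 2-grams
X = np.array([[3., 0., 1.],
              [2., 1., 0.],
              [0., 4., 5.]])
Z = pca_project(X, k=2)
```

Log lines whose projected coordinates fall far from the bulk of the data are candidates for abnormal traffic.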
Distinctive Lexical Patterns in Russian Patient Information Leaflets: A Corpus-Driven Study
2019
This methodologically oriented corpus-driven study focuses on distinctive patterns of language use in a specialized text type, namely Russian patient information leaflets. The study's main goal is to identify keywords and recurrent sequences of words that account for the leaflets' formulaicity, and, as a secondary goal, to describe their discoursal functions. The keywords were identified using three methods (G2, Hedges' g, and Neozeta), and the overlap between the three metrics was explored. The overlapping keywords were qualitatively analyzed in terms of discoursal functions. As for the distinctive multi-word patterns, we focused on recurrent n-grams with the largest coverage in the corpus…
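Of the three keyword metrics named, G2 is Dunning's log-likelihood statistic, which can be sketched as follows; the function name and toy counts are illustrative assumptions, not taken from the study:

```python
import math

def g2_keyness(a, b, c, d):
    """Dunning's log-likelihood (G2) keyness statistic.

    a, b: frequency of the word in the target / reference corpus
    c, d: total token counts of the target / reference corpus
    """
    e1 = c * (a + b) / (c + d)   # expected frequency in target corpus
    e2 = d * (a + b) / (c + d)   # expected frequency in reference corpus
    g2 = 0.0
    for obs, exp in ((a, e1), (b, e2)):
        if obs > 0:
            g2 += obs * math.log(obs / exp)
    return 2 * g2
```

A word used with the same relative frequency in both corpora scores 0; the more skewed its distribution toward one corpus, the larger the statistic.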
Anomaly Detection from Network Logs Using Diffusion Maps
2011
The goal of this study is to detect anomalous queries from network logs using a dimensionality reduction framework. The frequencies of 2-grams in queries are extracted into a feature matrix. Dimensionality reduction is done by applying diffusion maps. The method is adaptive and thus requires no training before analysis. We tested the method with data that includes both normal and intrusive traffic to a web server. The approach finds all intrusions in the dataset. Peer reviewed.
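A compact sketch of the diffusion map embedding the abstract refers to, under simplifying assumptions (one diffusion step, a fixed Gaussian kernel width, and synthetic data standing in for the 2-gram feature matrix):

```python
import numpy as np

def diffusion_map(X, eps=1.0, k=2):
    """Embed rows of X into k diffusion coordinates (one-step diffusion)."""
    # Pairwise squared Euclidean distances between rows
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / eps)                     # Gaussian affinity kernel
    P = K / K.sum(axis=1, keepdims=True)      # row-stochastic Markov matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    keep = order[1:k + 1]                     # skip the trivial eigenvalue 1
    return vecs[:, keep].real * vals[keep].real

rng = np.random.default_rng(0)
# Hypothetical feature rows: two tight clusters plus one outlier
X = np.vstack([rng.normal(0, 0.1, (5, 3)),
               rng.normal(5, 0.1, (5, 3)),
               [[10., 10., 10.]]])
coords = diffusion_map(X, eps=2.0)
```

In the embedded coordinates, points far from the main clusters (here the outlier row) correspond to anomalous log lines.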