Search results for "Data mining"
showing 10 items of 907 documents
Growing Hierarchical Self-organizing Maps and Statistical Distribution Models for Online Detection of Web Attacks
2013
In modern networks, HTTP clients communicate with web servers using request messages. By manipulating these messages, attackers can collect confidential information from servers or even corrupt them. This study considers an anomaly-detection approach to finding such attacks. For HTTP queries, feature matrices are obtained by applying an n-gram model, and growing hierarchical self-organizing maps are constructed by learning from these matrices. For HTTP headers, we employ statistical distribution models based on the lengths of header values and the relative frequency of symbols. New requests received by the web server are classified by using the maps and models obtaine…
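As a rough illustration of the two kinds of features this abstract mentions, the following minimal sketch counts character n-grams in a query string and computes the relative frequency of symbols in a header value. The function names, the choice of character-level bigrams, and the sample strings are assumptions made for illustration; the abstract does not specify the authors' exact feature construction, and the self-organizing map training is not shown.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams in text (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def relative_symbol_frequency(value: str) -> dict:
    """Relative frequency of each symbol in a header value (hypothetical helper)."""
    if not value:
        return {}
    counts = Counter(value)
    total = len(value)
    return {symbol: count / total for symbol, count in counts.items()}


# Made-up inputs, for illustration only.
query = "id=1&id=2"
bigrams = char_ngrams(query, 2)
freqs = relative_symbol_frequency("aab")
header_length = len("aab")  # header-value length, the other statistic mentioned
```

Stacking such n-gram counts for many requests would give one row per request, which is one plausible reading of "feature matrices" here.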
Semantic annotation and big data techniques for patent information processing
2017
This thesis analyzes approaches to generate semantic annotations on patent records, as well as on other structured data, by relying on the structure and semantic representation of documents. Information in patent records reflects how real-world technologies evolve, and the approximately 3 million annual new patent applications capture the global inventive frontier. The volume of this information is too large to be analyzed effectively by human effort alone, necessitating big data approaches that analyze it with computer-aided tools and techniques. Big data is a term that describes a massive volume of structured, semi-structured and unstructured data that is so large that it is d…
Healthcare trajectory mining by combining multidimensional component and itemsets
2012
Sequential pattern mining aims to extract correlations among temporal data. Many different methods have been proposed to enumerate either sequences of set-valued data (i.e., itemsets) or sequences containing multidimensional items. However, in real-world scenarios, data sequences are described as events comprising both multidimensional items and set-valued information. These rich heterogeneous descriptions cannot be exploited by traditional approaches. For example, in the healthcare domain, hospitalizations are defined as sequences of multidimensional attributes (e.g. Hospital or Diagnosis) associated with two sets: a set of medical procedures (e.g. {Radiography, Appendectomy}) and…
Occlusion-based estimation of independent multinomial random variables using occurrence and sequential information
2017
This paper deals with the relatively new field of sequence-based estimation, in which the goal is to estimate the parameters of a distribution by utilizing both the information in the observations and in their sequence of appearance. Traditionally, the Maximum Likelihood (ML) and Bayesian estimation paradigms work within the model that the data, from which the parameters are to be estimated, is known, and that it is treated as a set rather than as a sequence. The position that we take is that these methods ignore, and thus discard, valuable sequence-based information, and our intention is to obtain ML estimates by “extracting” the information contained in the observations when perc…
A weighted logistic regression for conjoint analysis and Kansei engineering
2007
Customer needs for emotional satisfaction are increasingly being considered by product and service designers. While several existing methods such as conjoint analysis (CA), Kano model and quality function deployment support the translation of customer requirements into technical specifications, researchers are now working to develop methods aimed at integrating affective aspects into product design. Kansei engineering (KE) is a design philosophy that considers customer perceptions and emotions by adopting a multi-disciplinary approach. CA is a useful tool within a KE project. This article presents a methodology for conducting a KE project in early development phases. This methodology is bas…
An Approach to Cadastre Map Quality Evaluation
2008
An approach to data quality evaluation, developed and implemented by the State Land Service of the Republic of Latvia, is proposed. The approach is based on the opinions of Land Service experts about cadastre map quality, which depends on how the map is used. Quality parameters of cadastre map objects, as identified by experts, and their limit values are used for evaluation. An assessment matrix is used to define cadastre map quality as a function of usage purpose. The matrix is used to determine what quality a cadastre map must have in order to be used for a chosen purpose. The approach is flexible: it makes it possible to change sets of quality parameters and their limit…
GPCALMA: A Grid-based tool for mammographic screening
2005
The next generation of High Energy Physics (HEP) experiments requires a GRID approach to a distributed computing system and the associated data management: the key concept is the Virtual Organisation (VO), a group of distributed users with a common goal and the will to share their resources. A similar approach is being applied to a group of Hospitals which joined the GPCALMA project (Grid Platform for Computer Assisted Library for MAmmography), which will allow common screening programs for early diagnosis of breast and, in the future, lung cancer. HEP techniques come into play in writing the application code, which makes use of neural networks for the image analysis and proved to be useful…
An Approach to Cadastral Map Quality Evaluation in the Republic of Latvia
2009
An approach to cadastral map quality evaluation, developed and implemented by the State Land Service of the Republic of Latvia, is proposed. The approach is based on the opinions of Land Service experts about cadastral map quality, which depends on how the map is used. Quality parameters of cadastral map objects, as identified by experts, and their limit values are used for evaluation. An assessment matrix is used to define cadastral map quality as a function of usage purpose. The matrix is used to determine what quality a cadastral map must have in order to be used for a chosen purpose. The approach is flexible: it makes it possible to change sets of quality parameters an…
Datamining: Pemanfaatan Algoritma Apriori dalam Menganalisa Pola-Pola Transaksi yang Terjadi [Data Mining: Using the Apriori Algorithm to Analyze Patterns in Transactions]
2012
This paper describes the implementation and analysis of the well-known Apriori algorithm, which is used for Market Basket Analysis (MBA) in data mining. This algorithm is widely used to find relations among market-basket items in large databases. This algorithm is based on the concept of a prefix tree. There are several ways to organize the nodes of such a tree, to encode the items, and to organize the transactions, which may be used in order to minimize the time needed to find the frequent itemsets as well as to reduce the amount of memory needed to store the counters. The rules produced will be used by supermarket management to organize item sets to increase the…
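To make the frequent-itemset step concrete, here is a minimal sketch of level-wise Apriori candidate generation and support counting. It deliberately uses plain dictionaries and sets rather than the prefix tree described in the abstract, so it illustrates the general technique and not the paper's implementation. The function name `apriori`, the absolute `min_support` count, and the sample baskets are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count support by scanning the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Made-up baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
frequent_itemsets = apriori(baskets, min_support=2)
```

With these baskets and a minimum support of 2, every single item and every pair is frequent, while the triple appears in only one basket and is dropped. A prefix-tree implementation would change how candidates and counters are stored, not which itemsets are found.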
Condition Assessment of Norwegian Bridge Elements Using Existing Damage Records
2020
Since the 1990s, the Norwegian Public Roads Administration (NPRA) has recorded bridge element damage in a database for all the bridges it manages. This paper presents a comparison of three methods to establish element condition based on damage records. The methods consist of a non-parametric procedure based on the worst damage registered in the element, linear regression that also considers bridge and road characteristics data, and classification through an artificial neural network. The methods are assessed using a set of 159 bridges inspected in 2016. The results show that diagnostics of bridge element condition can reach high accuracy by using an artificial neural network classifier and taki…