Search results for "Data mining"
Showing 10 of 907 documents
Protein secondary structure prediction: how to improve accuracy by integration
2006
This paper proposes a technique to improve protein secondary structure prediction. The approach is based on the idea of combining the results of a set of prediction tools, choosing the most accurate parts of each prediction. The correctness of the resulting prediction is measured with the accuracy parameters used in several editions of CASP. Experimental evaluations validating the proposed approach are also reported.
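The combination idea can be illustrated with a per-residue majority vote over several predictors; this is a deliberate simplification of the paper's segment-selection approach, and the predictor outputs below are hypothetical:

```python
from collections import Counter

def consensus_prediction(predictions):
    """Combine equal-length secondary-structure strings (H/E/C per residue)
    by per-residue majority vote -- a simplified stand-in for the paper's
    strategy of picking the most accurate parts of each prediction."""
    length = len(predictions[0])
    assert all(len(p) == length for p in predictions)
    return "".join(
        Counter(p[i] for p in predictions).most_common(1)[0][0]
        for i in range(length)
    )

# Three hypothetical predictor outputs for the same sequence
preds = ["HHHCCEEE", "HHHCCCEE", "HHCCCEEE"]
print(consensus_prediction(preds))  # "HHHCCEEE"
```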
Robust refinement of initial prototypes for partitioning-based clustering algorithms
2007
Non-uniqueness of solutions and sensitivity to erroneous data are common problems in large-scale data clustering tasks. In order to avoid poor-quality solutions with partitioning-based clustering methods, robust estimates (which are highly insensitive to erroneous data values) are needed, and initial cluster prototypes should be determined properly. In this paper, a robust density-estimation initialization method that exploits the spatial median estimate in the prototype update is presented. Besides being insensitive to noise and outliers, the new method is computationally comparable with other traditional methods. The methods are compared by numerical experiments on a set of syntheti…
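The spatial (geometric) median used here is a standard robust location estimate; a minimal sketch using Weiszfeld's iteration (not the paper's exact initialization procedure, and the sample points are hypothetical):

```python
import numpy as np

def spatial_median(points, iters=100, eps=1e-9):
    """Weiszfeld's algorithm for the spatial (geometric) median,
    a location estimate far less sensitive to outliers than the mean."""
    y = points.mean(axis=0)  # start from the (non-robust) mean
    for _ in range(iters):
        d = np.linalg.norm(points - y, axis=1)
        d = np.maximum(d, eps)  # avoid division by zero at a data point
        w = 1.0 / d
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]])
print(spatial_median(pts))  # stays near the three clustered points,
                            # unlike the mean, which is pulled to ~[25, 25]
```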
2020
Discriminant validity was originally presented as a set of empirical criteria that can be assessed from multitrait-multimethod (MTMM) matrices. Because datasets used by applied researchers rarely lend themselves to MTMM analysis, the need to assess discriminant validity in empirical research has led to the introduction of numerous techniques, some of which have been introduced in an ad hoc manner and without rigorous methodological support. We review various definitions of and techniques for assessing discriminant validity and provide a generalized definition of discriminant validity based on the correlation between two measures after measurement error has been considered. We then review t…
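The notion of "correlation between two measures after measurement error has been considered" is commonly operationalized with Spearman's correction for attenuation; a sketch of that classic estimator (not necessarily the authors' exact generalized definition, and the numbers are illustrative):

```python
import math

def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: the estimated correlation
    between two constructs once measurement error (imperfect reliability
    of each measure) is taken into account."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Observed correlation .70 with scale reliabilities .80 and .85:
r = disattenuated_correlation(0.70, 0.80, 0.85)
print(round(r, 3))  # 0.849 -- well below 1, supporting discriminant validity
```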
A hierarchical clustering strategy and its application to proteomic interaction data
2003
We describe a novel hierarchical clustering strategy, particularly useful for analyzing proteomic interaction data. The logic behind this method is to use the information on all interactions among the elements of a set to evaluate the strength of the interaction of each pair of elements. Our procedure allows the characterization of protein complexes starting from partial data and the detection of "promiscuous" proteins that bias the results, generating false positives. We demonstrate the usefulness of our strategy by analyzing a real case involving 137 Saccharomyces cerevisiae proteins. Because most functional studies require the evaluation of similar data sets, our method…
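The general idea, pairwise interaction strengths fed into agglomerative clustering, can be sketched as follows; the matrix is hypothetical and this is not the authors' exact procedure:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical symmetric interaction-strength matrix for five proteins
# (higher value = stronger evidence of interaction).
strength = np.array([
    [0.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.1],
    [0.8, 0.7, 0.0, 0.2, 0.0],
    [0.1, 0.0, 0.2, 0.0, 0.9],
    [0.0, 0.1, 0.0, 0.9, 0.0],
])
# Convert strengths to distances and cluster hierarchically.
distance = 1.0 - strength
np.fill_diagonal(distance, 0.0)
Z = linkage(squareform(distance), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # proteins 0-2 and 3-4 fall into separate clusters
```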
On handling exceptions
1995
The information systems literature has dealt extensively with all kinds of exceptions. Several studies define the concept of an exception and even provide classifications. However, no studies provide a method for verifying the rules used to handle exceptions and to achieve the goals set by an organization's rules. In this paper, a model employing a set of unique input/output (UIO) sequences is presented for verifying such rules. The model, originally presented for Finite State Machines (FSMs), has been modified to include concepts of exception handling and will be used to build a tool for verifying exception handling rules in OISs.
PRIvacy LEakage Methodology (PRILE) for IDS Rules
2010
This paper introduces a methodology for evaluating PRIvacy LEakage in signature-based Network Intrusion Detection System (IDS) rules. IDS rules that expose more data than a given percentage of all data sessions are defined as privacy leaking. Furthermore, it analyses the attack-specific pattern size an IDS rule requires in order to keep the privacy leakage below a given threshold, presuming that the occurrence frequencies of the attack pattern in normal text are known. We have applied the methodology to the rule set of the network intrusion detection system Snort. The evaluation confirms that Snort in its default configuration aims at not being excessively privacy invasive. However, we have identified s…
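The leakage definition above can be read as a simple fraction-of-sessions check; a sketch of that reading (the rule predicate and session payloads are hypothetical, and real signatures match packet content, not plain strings):

```python
def is_privacy_leaking(rule_matches, sessions, threshold):
    """Flag an IDS rule as privacy leaking when it exposes (matches)
    more than `threshold` of all data sessions -- an illustrative
    reading of the paper's definition, not its exact metric."""
    exposed = sum(1 for s in sessions if rule_matches(s))
    return exposed / len(sessions) > threshold

sessions = ["GET /index.html", "GET /admin", "POST /login", "GET /img.png"]
rule = lambda s: "admin" in s  # hypothetical signature pattern
print(is_privacy_leaking(rule, sessions, threshold=0.1))  # True (1/4 > 10%)
```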
New tool useful for drug discovery validated through benchmark datasets
2018
Atomic Weighted Vectors (AWVs) are vectors that encode the information of molecular structures, to which a set of Aggregation Operators (AOs) can be applied to calculate total and local molecular descriptors (MDs). This article presents an exploratory study of a new tool useful for drug discovery using different datasets, such as the DRAGON and Sutherland datasets, as well as a comparison with other well-known approaches. In order to evaluate the performance of the tool, several statistical and QSAR/QSPR experiments were performed. Variability analyses are used to quantify the information content of the AWVs obtained from the tool, by way of an information theory-based algorithm. …
A Logical Explication of the Concepts of Incomplete and Uncertain Information
1994
Discovery of elementary knowledge and its constituents, i.e., the information contained in objects of reality, is realized by asking questions about certain aspects, called attributes in this paper. We describe a fragment of a discovered reality as an information system (cf. Pawlak [1,3,4]), which consists of the universum U of all the objects of this reality we are concerned with, and of a set A of attributes, understood as functions each of which assigns to every object of U either 1) a value of the given attribute belonging to A, or 2) an interval of approximate values of this attribute, i.e., an established set of possible values of this attribute. From the point of view of the cognitive agent and …
Semantic traffic applications based on DatexII
2009
In this work we demonstrate a particular use of ontologies based on the European DATEXII specifications. These specifications are designed and developed as a traffic and travel data exchange mechanism by a European task force to set up and standardise the interface between traffic control and information centres, and they are the reference for applications developed and implemented in Europe. This language describes concepts and structures of traffic-related data, but the description is only syntactic, not semantic. Therefore the objective of this part of the research has been to develop a semantic description in order to carry out applications such as syndication and a …
Deriving and comparing deduplication techniques using a model-based classification
2015
Data deduplication has been a hot research topic and a large number of systems have been developed. These systems are usually seen as an inherently linked set of characteristics. However, a detailed analysis reveals independent concepts that can be reused in other systems. In this work, we perform this analysis on the main representatives of deduplication systems. We embed the results in a model, which reveals two as-yet-unexplored combinations of characteristics. In addition, the model enables a comprehensive evaluation of the representatives and the two new systems. We perform this evaluation on real-world data sets.
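One basic characteristic shared by many of the systems surveyed is hash-keyed chunk storage; a minimal whole-chunk sketch (real systems differ in chunking strategy, indexing, and storage layout, and the sample chunks are hypothetical):

```python
import hashlib

def deduplicate(chunks):
    """Whole-chunk, hash-based deduplication: store each unique chunk
    once, keyed by its SHA-256 digest, and keep a 'recipe' of digests
    from which the original stream can be rebuilt."""
    store = {}
    recipe = []  # ordered digests needed to reconstruct the input
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe

chunks = [b"alpha", b"beta", b"alpha", b"alpha"]
store, recipe = deduplicate(chunks)
print(len(store), len(recipe))  # 2 4 -- four chunks, only two stored
```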