Search results for "Data mining"

showing 10 items of 907 documents

PROTEIN SECONDARY STRUCTURE PREDICTION: HOW TO IMPROVE ACCURACY BY INTEGRATION

2006

In this paper, a technique to improve protein secondary structure prediction is proposed. The approach combines the results of a set of prediction tools, choosing the most correct parts of each prediction. The correctness of the resulting prediction is measured using accuracy parameters employed in several editions of CASP. Experimental evaluations validating the proposed approach are also reported.

Set (abstract data type); Bioinformatics Protein Prediction; Correctness; Computer science; business.industry; Artificial intelligence; Data mining; Machine learning; computer.software_genre; Protein secondary structure prediction; business; CASP; computer; Applied Artificial Intelligence
researchProduct

Robust refinement of initial prototypes for partitioning-based clustering algorithms

2007

Non-uniqueness of solutions and sensitivity to erroneous data are common problems in large-scale data clustering tasks. To avoid poor-quality solutions with partitioning-based clustering methods, robust estimates (that are highly insensitive to erroneous data values) are needed, and initial cluster prototypes should be determined properly. In this paper, a robust density estimation initialization method that uses the spatial median estimate for the prototype update is presented. Besides being insensitive to noise and outliers, the new method is also computationally comparable with other traditional methods. The methods are compared by numerical experiments on a set of syntheti…

Set (abstract data type); Computer science; Correlation clustering; Outlier; Initialization; Sensitivity (control systems); Density estimation; Noise (video); Data mining; Cluster analysis; computer.software_genre; computer; Recent Advances in Stochastic Modeling and Data Analysis
researchProduct
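
The spatial median mentioned in the abstract above is the point that minimizes the sum of Euclidean distances to the data. As a minimal sketch only, and not the authors' initialization method, it can be approximated with Weiszfeld's iteration. The function name `spatial_median`, its parameters, and the choice to skip points that coincide with the current estimate are assumptions made for illustration.

```python
import math


def spatial_median(points, tol=1e-9, max_iter=1000):
    """Approximate the spatial (geometric) median with Weiszfeld's iteration.

    Hypothetical helper for illustration; `points` is a non-empty list of
    equal-length coordinate tuples.
    """
    dim = len(points[0])
    # Start from the coordinate-wise mean, which outliers can pull far away.
    est = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(max_iter):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            dist = math.dist(p, est)
            if dist < 1e-12:
                # Simplification (assumption): ignore a point that coincides
                # with the current estimate to avoid division by zero.
                continue
            weight = 1.0 / dist
            den += weight
            for d in range(dim):
                num[d] += weight * p[d]
        if den == 0.0:
            return est
        new_est = [n / den for n in num]
        if math.dist(new_est, est) < tol:
            return new_est
        est = new_est
    return est


# Four corners of a unit square plus one gross outlier.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
median = spatial_median(data)
mean = [sum(p[d] for p in data) / len(data) for d in range(2)]
print("spatial median:", median)
print("mean:", mean)
```

In this toy data set the mean is dragged to (20.4, 20.4) by the single outlier, while the spatial median stays inside the unit square, which is the kind of insensitivity to outliers the abstract refers to.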

2020

Discriminant validity was originally presented as a set of empirical criteria that can be assessed from multitrait-multimethod (MTMM) matrices. Because datasets used by applied researchers rarely lend themselves to MTMM analysis, the need to assess discriminant validity in empirical research has led to the introduction of numerous techniques, some of which have been introduced in an ad hoc manner and without rigorous methodological support. We review various definitions of and techniques for assessing discriminant validity and provide a generalized definition of discriminant validity based on the correlation between two measures after measurement error has been considered. We then review t…

Set (abstract data type); Computer science; Management of Technology and Innovation; Strategy and Management; Monte Carlo method; Discriminant validity; General Decision Sciences; Guideline; Data mining; computer.software_genre; computer; Confirmatory factor analysis; Organizational Research Methods
researchProduct

A hierarchical clustering strategy and its application to proteomic interaction data

2003

We describe a novel strategy of hierarchical clustering analysis, particularly useful to analyze proteomic interaction data. The logic behind this method is to use the information for all interactions among the elements of a set to evaluate the strength of the interaction of each pair of elements. Our procedure allows the characterization of protein complexes starting with partial data and the detection of "promiscuous" proteins that bias the results, generating false positive data. We demonstrate the usefulness of our strategy by analyzing a real case that involves 137 Saccharomyces cerevisiae proteins. Because most functional studies require the evaluation of similar data sets, our method…

Set (abstract data type); Data set; Range (mathematics); Computer science; Benchmark (computing); Data mining; computer.software_genre; computer; Hierarchical clustering
researchProduct

On handling exceptions

1995

The current literature on information systems has dealt extensively with all kinds of exceptions. There are several studies defining the concept of exception and even providing classifications. However, no studies provide a method for verifying the rules in order to handle exceptions and to achieve the goals set by an organization's rules. In this paper, a model employing a set of unique input/output (UIO) sequences is presented for verifying such rules. The model, originally presented for Finite State Machines (FSM), has been modified to include concepts of exception handling and will be used to form a tool usable for verifying exception handling rules in OISs.

Set (abstract data type); Finite-state machine; Programming language; Computer science; Exception handling; Information system; Data mining; USable; computer.software_genre; computer; Proceedings of conference on Organizational computing systems - COCS '95
researchProduct

PRIvacy LEakage Methodology (PRILE) for IDS Rules

2010

This paper introduces a methodology for evaluating PRIvacy LEakage in signature-based Network Intrusion Detection System (IDS) rules. IDS rules that expose more data than a given percentage of all data sessions are defined as privacy leaking. Furthermore, it analyses the attack-specific pattern size an IDS rule requires in order to keep the privacy leakage below a given threshold, presuming that occurrence frequencies of the attack pattern in normal text are known. We have applied the methodology to the rule set of the network intrusion detection system Snort. The evaluation confirms that Snort in its default configuration aims at not being excessively privacy invasive. However, we have identified s…

Set (abstract data type); Pattern size; Engineering; business.industry; Privacy software; Data mining; Network intrusion detection; Leakage (economics); computer.software_genre; Computer security; business; computer; Signature (logic)
researchProduct
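
The threshold definition in the abstract above (a rule is privacy leaking when it exposes more than a given percentage of all data sessions) can be illustrated with a small sketch. This is not the PRILE methodology itself and does not use Snort's rule syntax: it assumes, purely for illustration, that a rule is a plain byte pattern and that a session counts as exposed when the pattern occurs in it. The name `leaking_rules` and all sample data are hypothetical.

```python
def leaking_rules(rules, sessions, threshold):
    """Return the rules that match more than `threshold` (a fraction) of sessions.

    Hypothetical helper: `rules` maps a rule name to a byte pattern, and
    `sessions` is a list of byte strings standing in for data sessions.
    """
    if not sessions:
        return {}
    flagged = {}
    for name, pattern in rules.items():
        # Count the sessions in which the pattern occurs at least once.
        hits = sum(1 for session in sessions if pattern in session)
        fraction = hits / len(sessions)
        if fraction > threshold:
            flagged[name] = fraction
    return flagged


# Made-up sessions and rules, for illustration only.
sessions = [b"GET /index.html", b"GET /login", b"POST /login", b"exploit\x90\x90"]
rules = {"broad": b"GET", "narrow": b"\x90\x90"}
print(leaking_rules(rules, sessions, threshold=0.3))
```

With a 30% threshold, the short pattern `GET` matches half of the sample sessions and is flagged, while the more specific pattern matches only one session in four and is not. This mirrors the abstract's point that a sufficiently large attack-specific pattern keeps leakage below the threshold.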

New tool useful for drug discovery validated through benchmark datasets

2018

Atomic Weighted Vectors (AWVs) are vectors that contain the codified information of molecular structures, to which a set of Aggregation Operators (AOs) can be applied to calculate total and local molecular descriptors (MDs). This article presents an exploratory study of a new tool useful for drug discovery using different datasets, such as DRAGON and Sutherland’s datasets, as well as a comparison with other well-known approaches. In order to evaluate the performance of the tool, several statistics and QSAR/QSPR experiments were performed. Variability analyses are used to quantify the information content of the AWVs obtained from the tool, by way of an information theory-based algorithm. …

Set (abstract data type); Quantitative structure–activity relationship; Orthogonality; Computer science; Molecular descriptor; Principal component analysis; Genetic algorithm; Benchmark (computing); Data mining; Information theory; computer.software_genre; computer; Proceedings of MOL2NET 2018, International Conference on Multidisciplinary Sciences, 4th edition
researchProduct

A Logical Explication of the Concepts of Incomplete and Uncertain Information

1994

Discovery of elementary knowledge and its constituents, i.e. information contained in objects of reality, is realized through asking questions including certain aspects called attributes in this paper. We describe a fragment of a discovered reality as an information system (cf. Pawlak [1,3,4]), which consists of the universum U of all the objects of this reality we are concerned with, and of a set A of attributes understood as functions, each of which assigns to every object of U 1) a value of a given attribute belonging to A or 2) an interval of approximate values of this attribute, i.e. an established set of possible values of this attribute. From the point of view of the cognitive agent and …

Set (abstract data type); Theoretical computer science; Explication; Fragment (logic); Computer science; Information system; Point (geometry); Interval (mathematics); Data mining; computer.software_genre; Object (computer science); computer; Value (mathematics)
researchProduct

Semantic traffic applications based on DatexII

2009

In this work we demonstrate a particular use of ontologies based on the European specifications DATEXII. These specifications are designed and developed as a traffic and travel data exchange mechanism by a European task force to set up and standardise the interface between traffic control and information centres. It is the reference for applications that are developed and implemented in Europe. This language describes concepts and structures of data related to traffic, but the description is just syntactic, not semantic. Therefore the objective to be reached in this part of the research has been to develop a semantic description in order to carry out some applications like syndication and a …

Set (abstract data type); Web syndication; Semantic grid; Information retrieval; Interface (Java); Computer science; Data exchange; Semantic computing; Semantic analytics; Semantic Web Stack; Data mining; computer.software_genre; computer; Proceedings of the 2009 Euro American Conference on Telematics and Information Systems: New Opportunities to increase Digital Citizenship
researchProduct

Deriving and comparing deduplication techniques using a model-based classification

2015

Data deduplication has been a hot research topic and a large number of systems have been developed. These systems are usually seen as an inherently linked set of characteristics. However, a detailed analysis shows independent concepts that can be used in other systems. In this work, we perform this analysis on the main representatives of deduplication systems. We embed the results in a model, which shows two yet unexplored combinations of characteristics. In addition, the model enables a comprehensive evaluation of the representatives and the two new systems. We perform this evaluation based on real world data sets.

Set (abstract data type); Work (electrical); Computer science; Data deduplication; Data mining; computer.software_genre; computer; Real world data; Proceedings of the Tenth European Conference on Computer Systems
researchProduct