Search results for "DATA MINING"
showing 10 items of 907 documents
New Trends in Graph Mining
2010
Searching for repeated features characterizing biological data is fundamental in computational biology. When biological networks are under analysis, the presence of repeated modules across the same network (or several distinct ones) is shown to be very relevant. Indeed, several studies prove that biological networks can often be understood in terms of coalitions of basic repeated building blocks, often referred to as network motifs. This work provides a review of the main techniques proposed for motif extraction from biological networks. In particular, the main intrinsic difficulties related to the problem are pointed out, along with solutions proposed in the literature to overcome them. Open ch…
A Coclustering Approach for Mining Large Protein-Protein Interaction Networks
2012
Several approaches have been presented in the literature to cluster Protein-Protein Interaction (PPI) networks. They can be grouped into two main categories: those allowing a protein to participate in different clusters and those generating only nonoverlapping clusters. In both cases, a challenging task is to find a suitable compromise between the biological relevance of the results and a comprehensive coverage of the analyzed networks. Indeed, methods returning highly accurate results are often able to cover only small parts of the input PPI network, especially when low-characterized networks are considered. We present a coclustering-based technique able to generate both overlapping and nonove…
A new method for morphometric analysis of opal phytoliths from plants.
2014
Micro-morphometry has substantially gained ground in the field of phytolith analysis, but the comparability of results is limited due to the use of different methods. This paper presents a new, user-friendly method based on open-source software (FIJI) that is proposed as a step towards the introduction of a standard method. After obtaining a mask of a phytolith by making a digital drawing, 27 commonly used variables of size and shape are measured automatically. This method is not only useful for phytolith analysis, but may also be used for other fields of morphometric research. Users can furthermore customize the software tool when additional variables are required.
Blended Learning als Spielfeld für Learning Analytics und Educational Data Mining
2020
The use of digital learning formats in blended learning therefore offers opportunities in at least two areas. First, digital learning formats can directly and favorably influence students' learning processes, improve their performance, and also produce positive effects on many other levels, such as motivation or self-concept. Second, digital learning formats generate a wealth of data in diverse forms. When working with digital tools, students produce usage data such as dwell times and activity profiles, they produce performance data from digital tasks, and they leave text contributions in forums and chats. All of these data can be used in order to …
Boosting Design Space Explorations with Existing or Automatically Learned Knowledge
2012
During development, processor architectures can be tuned and configured by many different parameters. For benchmarking, automatic design space explorations (DSEs) with heuristic algorithms are a helpful approach to find the best settings for these parameters according to multiple objectives, e.g. performance, energy consumption, or real-time constraints. But if the setup is slightly changed and a new DSE has to be performed, it will start from scratch, resulting in very long evaluation times. To reduce the evaluation times we extend the NSGA-II algorithm in this article, such that automatic DSEs can be supported with a set of transformation rules defined in a highly readable format, the fuz…
Evaluation of Record Linkage Methods for Iterative Insertions
2009
Objectives: There have been many developments and applications of mathematical methods in the context of record linkage as one area of interdisciplinary research efforts. However, comparative evaluations of record linkage methods are still underrepresented. In this paper improvements of the Fellegi-Sunter model are compared with other elaborated classification methods in order to direct further research endeavors to the most promising methodologies. Methods: The task of linking records can be viewed as a special form of object identification. We consider several non-stochastic methods and procedures for the record linkage task in addition to the Fellegi-Sunter model and perform an e…
Improving clustering of Web bot and human sessions by applying Principal Component Analysis
2019
The paper addresses the problem of modeling Web sessions of bots and legitimate users (humans) as feature vectors for their use at the input of classification models. So far many different features to discriminate bots’ and humans’ navigational patterns have been considered in session models but very few studies were devoted to feature selection and dimensionality reduction in the context of bot detection. We propose applying Principal Component Analysis (PCA) to develop improved session models based on predictor variables being efficient discriminants of Web bots. The proposed models are used in session clustering, whose performance is evaluated in terms of the purity …
Functional connectivity inference from fMRI data using multivariate information measures
2022
Shannon’s entropy or an extension of Shannon’s entropy can be used to quantify information transmission between or among variables. Mutual information is the pair-wise information that captures nonlinear relationships between variables. It is more robust than linear correlation methods. Beyond mutual information, two generalizations are defined for multivariate distributions: interaction information or co-information and total correlation or multi-mutual information. In comparison to mutual information, interaction information and total correlation are underutilized and poorly studied in applied neuroscience research. Quantifying information flow between brain regions is not explic…
Fast dendrogram-based OTU clustering using sequence embedding
2014
Biodiversity assessment is an important step in a metagenomic processing pipeline. The biodiversity of a microbial metagenome is often estimated by grouping its 16S rRNA reads into operational taxonomic units or OTUs. These metagenomic datasets are typically large and hence require effective yet accurate computational methods for processing. In this paper, we introduce a new hierarchical clustering method called CRiSPy-Embed which aims to produce high-quality clustering results at a low computational cost. We tackle two computational issues of the current OTU hierarchical clustering approach: (1) the compute-intensive sequence alignment operation for building the distance matrix and (2) the …
Domain-Specific Characteristics of Data Quality
2017
The research discusses the issue of how to describe data quality and what should be taken into account when developing a universal data quality management solution. The proposed approach is to create quality specifications for each kind of data object and to make them executable. The specification can be executed step-by-step according to business process descriptions, ensuring the gradual accumulation of data in the database and data quality checking according to the specific use case. The described approach can be applied to check the completeness, accuracy, timeliness and consistency of accumulated data.