Search results for "DATA MINING"

showing 10 items of 907 documents

New Trends in Graph Mining

2010

Searching for repeated features characterizing biological data is fundamental in computational biology. When biological networks are under analysis, the presence of repeated modules across the same network (or several distinct ones) is shown to be very relevant. Indeed, several studies prove that biological networks can often be understood in terms of coalitions of basic repeated building blocks, often referred to as network motifs. This work provides a review of the main techniques proposed for motif extraction from biological networks. In particular, the main intrinsic difficulties related to the problem are pointed out, along with solutions proposed in the literature to overcome them. Open ch…

Bioinformatics network analysis; Network motif; Biological data; Colored; Computer science; Graph (abstract data type); Network science; Data mining; Motif (music); computer.software_genre; computer; Biological network

International Journal of Knowledge Discovery in Bioinformatics

researchProduct

A Coclustering Approach for Mining Large Protein-Protein Interaction Networks

2012

Several approaches have been presented in the literature to cluster Protein-Protein Interaction (PPI) networks. They can be grouped into two main categories: those allowing a protein to participate in different clusters and those generating only nonoverlapping clusters. In both cases, a challenging task is to find a suitable compromise between the biological relevance of the results and a comprehensive coverage of the analyzed networks. Indeed, methods returning highly accurate results are often able to cover only small parts of the input PPI network, especially when low-characterized networks are considered. We present a coclustering-based technique able to generate both overlapping and nonove…

Biology; computer.software_genre; Bioinformatics network analysis co-clustering; Task (project management); Set (abstract data type); Protein Interaction Mapping; Genetics; Cluster (physics); Cluster Analysis; Humans; Relevance (information retrieval); Protein Interaction Maps; Cluster analysis; Structure (mathematical logic); Applied Mathematics; Proteins; protein-protein interaction networks; biological networks; ComputingMethodologies_PATTERNRECOGNITION; Cover (topology); Co-clustering; Data mining; computer; Algorithms; Biological network; Biotechnology

IEEE/ACM Transactions on Computational Biology and Bioinformatics

researchProduct

A new method for morphometric analysis of opal phytoliths from plants.

2014

Micro-morphometry has substantially gained ground in the field of phytolith analysis, but the comparability of results is limited due to the use of different methods. This paper presents a new, user-friendly method based on open-source software (FIJI) that is proposed as a step towards the introduction of a standard method. After obtaining a mask of a phytolith by making a digital drawing, 27 commonly used variables of size and shape are measured automatically. This method is not only useful for phytolith analysis, but may also be used for other fields of morphometric research. Users can furthermore customize the software tool when additional variables are required.

Biometry; Computer science; business.industry; Software tool; Phytoliths; Comparability; Computational Biology; Image processing; Plants; computer.software_genre; Field (computer science); Image analysis; Software; Morphometric analysis; Micro-morphometry; Phytolith; Plant Cells; Image Processing Computer-Assisted; FIJI; Data mining; business; Instrumentation; computer; Archeobotany; Software

Microscopy and microanalysis: the official journal of Microscopy Society of America, Microbeam Analysis Society, Microscopical Society of Canada

researchProduct

Blended Learning als Spielfeld für Learning Analytics und Educational Data Mining (Blended Learning as a Playing Field for Learning Analytics and Educational Data Mining)

2020

The use of digital learning formats in blended learning accordingly offers opportunities in at least two areas. First, digital learning formats can directly and favorably influence students' learning processes, improve their performance, and also produce positive effects on many other levels, such as motivation or self-concept. Second, digital learning formats generate a wealth of data in diverse forms. When working with digital tools, students produce usage data such as dwell times and activity profiles, they produce performance data from digital tasks, and they leave text contributions in forums and chats. All of these data can be used to …

Blended learning; Political science; Learning analytics; Library science; Educational data mining

researchProduct

Boosting Design Space Explorations with Existing or Automatically Learned Knowledge

2012

During development, processor architectures can be tuned and configured by many different parameters. For benchmarking, automatic design space explorations (DSEs) with heuristic algorithms are a helpful approach to find the best settings for these parameters according to multiple objectives, e.g. performance, energy consumption, or real-time constraints. But if the setup is slightly changed and a new DSE has to be performed, it will start from scratch, resulting in very long evaluation times. To reduce the evaluation times we extend the NSGA-II algorithm in this article, such that automatic DSEs can be supported with a set of transformation rules defined in a highly readable format, the fuz…

Boosting (machine learning); Fuzzy rule; Fuzzy Control Language; Computer science; Decision tree; Benchmarking; Data mining; Energy consumption; Grid; computer.software_genre; Multi-objective optimization; computer; computer.programming_language

researchProduct

Evaluation of Record Linkage Methods for Iterative Insertions

2009

Objectives: There have been many developments and applications of mathematical methods in the context of record linkage as one area of interdisciplinary research efforts. However, comparative evaluations of record linkage methods are still underrepresented. In this paper improvements of the Fellegi-Sunter model are compared with other elaborated classification methods in order to direct further research endeavors to the most promising methodologies. Methods: The task of linking records can be viewed as a special form of object identification. We consider several non-stochastic methods and procedures for the record linkage task in addition to the Fellegi-Sunter model and perform an e…
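For readers unfamiliar with the Fellegi-Sunter model named in this abstract, the following is a minimal sketch of its basic scoring idea, not the evaluated implementation. Each compared field contributes a log-likelihood-ratio weight based on m (probability of agreement among true matches) and u (probability of agreement among non-matches). The field names, the m and u values, and the two thresholds are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical per-field probabilities (not taken from the paper):
# m = P(field agrees | records match), u = P(field agrees | records do not match)
FIELD_PARAMS = {
    "surname": (0.9, 0.1),
    "birth_year": (0.9, 0.1),
}


def field_weight(agrees, m, u):
    """Log2 likelihood ratio for one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(record_a, record_b):
    """Sum the field weights for a candidate record pair."""
    return sum(
        field_weight(record_a[name] == record_b[name], m, u)
        for name, (m, u) in FIELD_PARAMS.items()
    )


def classify(score, lower=-2.0, upper=2.0):
    """Three-way decision using two illustrative (assumed) thresholds."""
    if score >= upper:
        return "link"
    if score <= lower:
        return "non-link"
    return "possible link"


a = {"surname": "Meyer", "birth_year": 1970}
b = {"surname": "Meyer", "birth_year": 1970}
c = {"surname": "Meyer", "birth_year": 1981}
same_person = classify(match_score(a, b))
uncertain = classify(match_score(a, c))
```

In practice m and u are usually estimated from data, for example with the expectation-maximization algorithm listed among this entry's keywords, rather than fixed by hand.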

Boosting (machine learning); Medical Records Systems Computerized; Computer science; Decision tree; Health Informatics; computer.software_genre; Machine learning; Fuzzy Logic; Health Information Management; Germany; Expectation–maximization algorithm; Humans; Registries; Advanced and Specialized Nursing; Electronic Data Processing; Models Statistical; business.industry; Data Collection; Decision Trees; Support vector machine; Classification methods; Medical Record Linkage; Data mining; Artificial intelligence; business; computer; Algorithms; Software; Record linkage

Methods of Information in Medicine

researchProduct

Improving clustering of Web bot and human sessions by applying Principal Component Analysis

2019

The paper addresses the problem of modeling Web sessions of bots and legitimate users (humans) as feature vectors for their use at the input of classification models. So far many different features to discriminate bots’ and humans’ navigational patterns have been considered in session models but very few studies were devoted to feature selection and dimensionality reduction in the context of bot detection. We propose applying Principal Component Analysis (PCA) to develop improved session models based on predictor variables being efficient discriminants of Web bots. The proposed models are used in session clustering, whose performance is evaluated in terms of the purity …
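The purity measure mentioned at the end of this abstract can be sketched as follows. This is the standard textbook definition (the fraction of items that belong to the majority class of their cluster), not code from the paper; the cluster assignments and the bot/human labels below are made-up illustration data.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Fraction of items whose true label is the majority label of their cluster."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)
    # For each cluster, count how many items carry its most common label.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(true_labels)


# Hypothetical clustering of six sessions into two clusters.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["bot", "bot", "human", "human", "human", "bot"]
score = purity(clusters, labels)  # 4 of 6 sessions match their cluster's majority
```

A purity of 1.0 means every cluster contains sessions of a single class; note that purity tends to rise as the number of clusters grows, so it is usually compared at a fixed cluster count.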

Bot detection; Principal Component Analysis; PCA; Log analysis; Computer science; k-means; Internet robot; computer.software_genre; Classification; Web bot; Dimensionality reduction; Clustering; Web server; Principal component analysis; Feature selection; Data mining; Cluster analysis; computer

Communications of the ECMS

researchProduct

Functional connectivity inference from fMRI data using multivariate information measures

2022

Shannon’s entropy or an extension of Shannon’s entropy can be used to quantify information transmission between or among variables. Mutual information is the pair-wise information that captures nonlinear relationships between variables. It is more robust than linear correlation methods. Beyond mutual information, two generalizations are defined for multivariate distributions: interaction information or co-information and total correlation or multi-mutual information. In comparison to mutual information, interaction information and total correlation are underutilized and poorly studied in applied neuroscience research. Quantifying information flow between brain regions is not explic…
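The three measures named in this abstract can be illustrated with a small sketch for discrete samples, using plug-in (frequency-based) entropy estimates. This is not the paper's estimator for fMRI data. The three-variable quantity is computed under the co-information sign convention; some authors define interaction information with the opposite sign. The XOR data are a made-up example.

```python
import math
from collections import Counter
from itertools import product


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def joint(*columns):
    """Combine aligned columns into a sequence of joint symbols."""
    return list(zip(*columns))


def mutual_information(x, y):
    return entropy(x) + entropy(y) - entropy(joint(x, y))


def total_correlation(*columns):
    return sum(entropy(c) for c in columns) - entropy(joint(*columns))


def co_information(x, y, z):
    """Three-way co-information; negative values indicate synergy."""
    return (
        entropy(x) + entropy(y) + entropy(z)
        - entropy(joint(x, y)) - entropy(joint(x, z)) - entropy(joint(y, z))
        + entropy(joint(x, y, z))
    )


# Illustration: X and Y are independent fair bits and Z = X XOR Y.
x, y = zip(*product([0, 1], repeat=2))
z = tuple(a ^ b for a, b in zip(x, y))

mi_xy = mutual_information(x, y)     # pairwise measure sees no dependence
tc_xyz = total_correlation(x, y, z)  # multivariate measure detects it
ci_xyz = co_information(x, y, z)     # negative: the dependence is synergistic
```

The XOR case shows why the multivariate generalizations matter: every pair of variables is independent, yet the three variables together are fully dependent.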

Brain Mapping; Computer science; Entropy; Cognitive Neuroscience; Conditional mutual information; Brain; Multivariate normal distribution; Mutual information; computer.software_genre; Magnetic Resonance Imaging; Interaction information; Redundancy (information theory); Artificial Intelligence; Entropy (information theory); Computer Simulation; Total correlation; Information flow (information theory); Data mining; computer

Neural Networks

researchProduct

Fast dendrogram-based OTU clustering using sequence embedding

2014

Biodiversity assessment is an important step in a metagenomic processing pipeline. The biodiversity of a microbial metagenome is often estimated by grouping its 16S rRNA reads into operational taxonomic units or OTUs. These metagenomic datasets are typically large and hence require effective yet accurate computational methods for processing. In this paper, we introduce a new hierarchical clustering method called CRiSPy-Embed which aims to produce high-quality clustering results at a low computational cost. We tackle two computational issues of the current OTU hierarchical clustering approach: (1) the compute-intensive sequence alignment operation for building the distance matrix and (2) the …

Brown clustering; CURE data clustering algorithm; Single-linkage clustering; Correlation clustering; Canopy clustering algorithm; Data mining; Biology; Hierarchical clustering of networks; Cluster analysis; computer.software_genre; computer; Hierarchical clustering

Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

researchProduct

Domain-Specific Characteristics of Data Quality

2017

The research discusses how to describe data quality and what should be taken into account when developing a universal data quality management solution. The proposed approach is to create quality specifications for each kind of data object and to make them executable. The specification can be executed step by step according to business process descriptions, ensuring the gradual accumulation of data in the database and data quality checking according to the specific use case. The described approach can be applied to check the completeness, accuracy, timeliness and consistency of accumulated data.
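The idea of an executable quality specification per kind of data object might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' solution: the `customer` object kind, its fields, the individual rules, and the one-year timeliness window are all hypothetical.

```python
from datetime import date


# Each check takes a record and a reference date and returns True when the rule holds.
def is_complete(record, today):
    return all(record.get(f) not in (None, "") for f in ("id", "email", "updated"))


def is_accurate(record, today):
    return "@" in (record.get("email") or "")


def is_timely(record, today):
    updated = record.get("updated")
    return updated is not None and (today - updated).days <= 365


def is_consistent(record, today):
    created, updated = record.get("created"), record.get("updated")
    return created is None or updated is None or created <= updated


# Hypothetical specification: an ordered list of named checks per kind of data object.
QUALITY_SPEC = {
    "customer": [
        ("completeness", is_complete),
        ("accuracy", is_accurate),
        ("timeliness", is_timely),
        ("consistency", is_consistent),
    ],
}


def run_spec(kind, record, today):
    """Execute the checks for one object kind in order; return names of failed checks."""
    return [name for name, check in QUALITY_SPEC[kind] if not check(record, today)]


today = date(2017, 9, 1)
good = {"id": 1, "email": "a@example.org",
        "created": date(2017, 1, 1), "updated": date(2017, 8, 1)}
bad = {"id": 2, "email": "",
       "created": date(2015, 1, 1), "updated": date(2015, 6, 1)}
good_failures = run_spec("customer", good, today)
bad_failures = run_spec("customer", bad, today)
```

Keeping the checks as an ordered list mirrors the step-by-step execution described in the abstract: a subset of the list can be run at each stage of a business process as more data accumulates.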

Business process; Computer science; computer.file_format; computer.software_genre; Electronic mail; Data modeling; Unified Modeling Language; Data quality; Data mining; Executable; Completeness (statistics); Data objects; computer; computer.programming_language

Proceedings of the 2017 Federated Conference on Computer Science and Information Systems

researchProduct