Search results for " mining"
showing 10 items of 1548 documents
Detecting informative higher-order interactions in statistically validated hypergraphs
2021
Recent empirical evidence has shown that in many real-world systems, successfully represented as networks, interactions are not limited to dyads, but often involve three or more agents at a time. These data are better described by hypergraphs, where hyperlinks encode higher-order interactions among a group of nodes. In spite of the large number of works on networks, highlighting informative hyperlinks in hypergraphs obtained from real world data is still an open problem. Here we propose an analytic approach to filter hypergraphs by identifying those hyperlinks that are over-expressed with respect to a random null hypothesis, and represent the most relevant higher-order connections. We apply…
Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform
2012
Motivation: The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being widely applied to the large sets of sequences often encountered as the outcome of DNA sequencing experiments. In previous work, we presented a novel algorithm that allows the BWT of human genome scale data to be computed on very moderate hardware, thus enabling us to investigate the BWT as a tool for the compression of such datasets. Results: We first used simulated reads to explore the relationship between the level of compression and the error rate, the leng…
Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-Based Approach
2021
Causal effect identification considers whether an interventional probability distribution can be uniquely determined without parametric assumptions from measured source distributions and structural knowledge on the generating system. While complete graphical criteria and procedures exist for many identification problems, there are still challenging but important extensions that have not been considered in the literature. To tackle these new settings, we present a search algorithm directly over the rules of do-calculus. Due to the generality of do-calculus, the search is capable of taking more advanced data-generating mechanisms into account along with an arbitrary type of both observational and…
Centrality measures for networks with community structure
2016
Understanding network structure and finding the influential nodes is a challenging problem in large networks. Identifying the most influential nodes in a network can be useful in many applications, such as immunization of nodes in the case of epidemic spreading or intentional attacks on complex networks. Much research has gone into devising centrality measures that can efficiently identify the most influential nodes in a network. There are two major approaches to the problem: on the one hand, deterministic strategies that exploit knowledge about the overall network topology in order to find the influential nodes, while on the other hand, random strategies are completely agnostic ab…
A multi-scale area-interaction model for spatio-temporal point patterns
2018
Models for fitting spatio-temporal point processes should incorporate spatio-temporal inhomogeneity and allow for different types of interaction between points (clustering or regularity). This paper proposes an extension of the spatial multi-scale area-interaction model to a spatio-temporal framework. This model allows for interaction between points at different spatio-temporal scales and the inclusion of covariates. We fit the proposed model to varicella cases registered during 2013 in Valencia, Spain. The fitted model indicates small scale clustering and regularity for higher spatio-temporal scales.
Semantic Computing of Moods Based on Tags in Social Media of Music
2014
Social tags inherent in online music services such as Last.fm provide a rich source of information on musical moods. The abundance of social tags makes this data highly beneficial for developing techniques to manage and retrieve mood information, and enables study of the relationships between music content and mood representations with data substantially larger than that available for conventional emotion research. However, no systematic assessment has been done on the accuracy of social tags and derived semantic models at capturing mood information in music. We propose a novel technique called Affective Circumplex Transformation (ACT) for representing the moods of music tracks in an interp…
Synergetic and redundant information flow detected by unnormalized Granger causality: application to resting state fMRI
2015
Objectives: We develop a framework for the analysis of synergy and redundancy in the pattern of information flow between subsystems of a complex network. Methods: The presence of redundancy and/or synergy in multivariate time series data makes it difficult to estimate the net flow of information from each driver variable to a given target. We show that, by adopting an unnormalized definition of Granger causality, one may reveal redundant multiplets of variables influencing the target by maximizing the total Granger causality to a given target over all the possible partitions of the set of driving variables. Consequently, we introduce a pairwise index of synergy which is zero when two in…
Plaid model for microarray data: an enhancement of the pruning step
2010
Microarrays have become a standard tool for studying gene functions. For example, we can investigate whether a subset of genes shows a coherent expression pattern under different conditions. The plaid model, a model-based biclustering method, can be used to incorporate the additive structure used for the microarray experiment. In this paper we describe an enhancement for the plaid model algorithm based on the theory of the false discovery rate.
Epistemic uncertainty in fault tree analysis approached by the evidence theory
2012
Process plants may be subjected to dangerous events. Different methodologies are nowadays employed to identify failure events that can lead to severe accidents and to assess their probability of occurrence. Because reliability data for rare events are generally poor, leading to a partial or incomplete knowledge of the process, the classical probabilistic approach cannot be successfully used. Such uncertainty, called epistemic uncertainty, can be treated by means of different methodologies that are alternatives to the probabilistic one. In this work, the Evidence Theory or Dempster–Shafer theory (DST) is proposed to deal with this kind of uncertainty. In particular, the classical Fau…
TREEZZY2, a Fuzzy Logic Computer Code for Fault Tree and Event Tree Analyses
2004
In the conventional approach to reliability analysis using logical tree methodologies, uncertainties in the failure probabilities of system components or basic events are handled by assuming probability distribution functions. However, data are often insufficient for statistical estimation, and it is therefore necessary to resort to approximate estimates. Moreover, complicated calculations are needed to propagate uncertainties up to the final results. In our work, in order to take account of the uncertainties in system failure probabilities, a methodology based on fuzzy set theory is used in both fault tree and event tree analyses. This paper presents our work on this issue, which resulted…