Search results for "mining"

showing 10 items of 1548 documents

Detecting informative higher-order interactions in statistically validated hypergraphs

2021

Recent empirical evidence has shown that in many real-world systems, successfully represented as networks, interactions are not limited to dyads, but often involve three or more agents at a time. These data are better described by hypergraphs, where hyperlinks encode higher-order interactions among a group of nodes. In spite of the large number of works on networks, highlighting informative hyperlinks in hypergraphs obtained from real world data is still an open problem. Here we propose an analytic approach to filter hypergraphs by identifying those hyperlinks that are over-expressed with respect to a random null hypothesis, and represent the most relevant higher-order connections. We apply…

FOS: Computer and information sciences; Physics - Physics and Society; Computer science; QC1-999; Open problem; FOS: Physical sciences; General Physics and Astronomy; Physics and Society (physics.soc-ph); Astrophysics; computer.software_genre; ENCODE; Methodology (stat.ME); Statistics - Methodology; Social and Information Networks (cs.SI); Physics; Computer Science - Social and Information Networks; Filter (signal processing); Hyperlink; Class (biology); Settore FIS/07 - Fisica Applicata(Beni Culturali Ambientali Biol.e Medicin); QB460-466; Pairwise comparison; Data mining; Noise (video); Null hypothesis; computer; higher order interactions statistical validation complex networks; Communications Physics

researchProduct

Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform

2012

Motivation: The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being widely applied to the large sets of sequences often encountered as the outcome of DNA sequencing experiments. In previous work, we presented a novel algorithm that allows the BWT of human genome scale data to be computed on very moderate hardware, thus enabling us to investigate the BWT as a tool for the compression of such datasets. Results: We first used simulated reads to explore the relationship between the level of compression and the error rate, the leng…

FOS: Computer and information sciences; Statistics and Probability; Burrows–Wheeler transform; Computer science; Data_CODINGANDINFORMATIONTHEORY; Burrows-Wheeler transform; computer.software_genre; Biochemistry; Burrows-Wheeler transform; Data Compression; Next-generation sequencing; Computer Science - Data Structures and Algorithms; Escherichia coli; Code (cryptography); Humans; Overhead (computing); Data Structures and Algorithms (cs.DS); Computer Simulation; Quantitative Biology - Genomics; Molecular Biology; Genomics (q-bio.GN); Genome Human; String (computer science); Search engine indexing; Sorting; Genomics; Sequence Analysis DNA; Construct (python library); Data Compression; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; FOS: Biological sciences; Next-generation sequencing; Data mining; Databases Nucleic Acid; computer; Algorithms; Data compression

researchProduct

Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-Based Approach

2021

Causal effect identification considers whether an interventional probability distribution can be uniquely determined without parametric assumptions from measured source distributions and structural knowledge on the generating system. While complete graphical criteria and procedures exist for many identification problems, there are still challenging but important extensions that have not been considered in the literature. To tackle these new settings, we present a search algorithm directly over the rules of do-calculus. Due to the generality of do-calculus, the search is capable of taking more advanced data-generating mechanisms into account along with an arbitrary type of both observational and…

FOS: Computer and information sciences; Statistics and Probability; Computer Science - Machine Learning; causality; Computer Science - Artificial Intelligence; Heuristic (computer science); Computer science; education; Machine Learning (stat.ML); transportability; computer.software_genre; 01 natural sciences; Machine Learning (cs.LG); R-kieli; missing data; QA76.75-76.765; QA273-280; 010104 statistics & probability; do-calculus; causality; do-calculus; selection bias; transportability; missing data; case-control design; meta-analysis; Statistics - Machine Learning; Search algorithm; selection bias; 0101 mathematics; Parametric statistics; päättely; meta-analyysi; case-control design; hakualgoritmit; 113 Computer and information sciences; Missing data; meta-analysis; Identification (information); Artificial Intelligence (cs.AI); Causal inference; kausaliteetti; Identifiability; Probability distribution; Data mining; Statistics Probability and Uncertainty; computer; Software; Journal of Statistical Software

researchProduct

Centrality measures for networks with community structure

2016

Understanding network structure and identifying influential nodes are challenging problems in large networks. Identifying the most influential nodes can be useful in many applications, such as immunization of nodes during epidemic spreading or during intentional attacks on complex networks. Much research has been devoted to devising centrality measures that can efficiently identify the most influential nodes in the network. There are two major approaches to the problem: on the one hand, deterministic strategies exploit knowledge about the overall network topology in order to find the influential nodes, while on the other hand, random strategies are completely agnostic ab…

FOS: Computer and information sciences; Statistics and Probability; Physics - Physics and Society; Exploit; Complex networks; FOS: Physical sciences; Network science; Physics and Society (physics.soc-ph); Network theory; Machine learning; computer.software_genre; Network topology; Immunization strategies; 01 natural sciences; 010305 fluids & plasmas; 0103 physical sciences; 010306 general physics; Mathematics; Social and Information Networks (cs.SI); Structure (mathematical logic); [PHYS.PHYS]Physics [physics]/Physics [physics]; business.industry; Community structure; Computer Science - Social and Information Networks; Complex network; Epidemic dynamics; Condensed Matter Physics; [ PHYS.PHYS ] Physics [physics]/Physics [physics]; Community structure; Artificial intelligence; Data mining; business; Centrality; computer

researchProduct

A multi-scale area-interaction model for spatio-temporal point patterns

2018

Models for fitting spatio-temporal point processes should incorporate spatio-temporal inhomogeneity and allow for different types of interaction between points (clustering or regularity). This paper proposes an extension of the spatial multi-scale area-interaction model to a spatio-temporal framework. This model allows for interaction between points at different spatio-temporal scales and the inclusion of covariates. We fit the proposed model to varicella cases registered during 2013 in Valencia, Spain. The fitted model indicates small scale clustering and regularity for higher spatio-temporal scales.

FOS: Computer and information sciences; Statistics and Probability; Scale (ratio); Computer science; Management Monitoring Policy and Law; Multi-scale area-interaction model; computer.software_genre; Varicella; 01 natural sciences; Point process; Methodology (stat.ME); 010104 statistics & probability; 0502 economics and business; Statistics; Covariate; 60D05 60G55 62M30; Point (geometry); 0101 mathematics; Computers in Earth Sciences; Cluster analysis; Statistics - Methodology; 050205 econometrics; 05 social sciences; Interaction model; Extension (predicate logic); Gibbs point processes; ComputingMethodologies_PATTERNRECOGNITION; Spatio-temporal point processes; Data mining; computer

researchProduct

Semantic Computing of Moods Based on Tags in Social Media of Music

2014

Social tags inherent in online music services such as Last.fm provide a rich source of information on musical moods. The abundance of social tags makes this data highly beneficial for developing techniques to manage and retrieve mood information, and enables study of the relationships between music content and mood representations with data substantially larger than that available for conventional emotion research. However, no systematic assessment has been done on the accuracy of social tags and derived semantic models at capturing mood information in music. We propose a novel technique called Affective Circumplex Transformation (ACT) for representing the moods of music tracks in an interp…

FOS: Computer and information sciences; Vocabulary; Computer science; Music information retrieval; media_common.quotation_subject; Semantic analysis (machine learning); Moods; computer.software_genre; Affect (psychology); Semantics; Computer Science - Information Retrieval; Semantic computing; Music information retrieval; Affective computing; media_common; Social and Information Networks (cs.SI); ta113; Probabilistic latent semantic analysis; Social tags; business.industry; Computer Science - Social and Information Networks; Multimedia (cs.MM); Semantic analysis; Computer Science Applications; Mood; Computational Theory and Mathematics; Web mining; ta6131; Vector space model; Artificial intelligence; Genres; business; computer; Computer Science - Multimedia; Information Retrieval (cs.IR); Music; Natural language processing; Prediction.; Information Systems; IEEE Transactions on Knowledge and Data Engineering

researchProduct

Synergetic and redundant information flow detected by unnormalized Granger causality: application to resting state fMRI

2015

Objectives: We develop a framework for the analysis of synergy and redundancy in the pattern of information flow between subsystems of a complex network. Methods: The presence of redundancy and/or synergy in multivariate time series data makes it difficult to estimate the net flow of information from each driver variable to a given target. We show that, by adopting an unnormalized definition of Granger causality, one may reveal redundant multiplets of variables influencing the target by maximizing the total Granger causality to a given target over all possible partitions of the set of driving variables. Consequently, we introduce a pairwise index of synergy which is zero when two in…

FOS: Computer and information sciences; granger causality (GC); Multivariate statistics; Computer science; Rest; Computer Science - Information Theory; Biomedical Engineering; synergy; FOS: Physical sciences; computer.software_genre; 01 natural sciences; 03 medical and health sciences; 0302 clinical medicine; Granger causality; 0103 physical sciences; Connectome; Redundancy (engineering); Humans; Brain connectivity; Time series; 010306 general physics; Models Statistical; Human Connectome Project; Resting state fMRI; redundancy; business.industry; Information Theory (cs.IT); functional magnetic resonance imaging (fMRI); Brain; Pattern recognition; Complex network; Magnetic Resonance Imaging; Variable (computer science); Physics - Data Analysis Statistics and Probability; Quantitative Biology - Neurons and Cognition; FOS: Biological sciences; Settore ING-INF/06 - Bioingegneria Elettronica E Informatica; Pairwise comparison; Neurons and Cognition (q-bio.NC); Artificial intelligence; Data mining; Nerve Net; business; computer; 030217 neurology & neurosurgery; Data Analysis Statistics and Probability (physics.data-an)

researchProduct

Plaid model for microarray data: an enhancement of the pruning step

2010

Microarrays have become a standard tool for studying gene functions. For example, we can investigate whether a subset of genes shows a coherent expression pattern under different conditions. The plaid model, a model-based biclustering method, can be used to incorporate the additive structure used for the microarray experiment. In this paper we describe an enhancement of the plaid model algorithm based on the theory of the false discovery rate.

False discovery rate; Structure (mathematical logic); Microarray; Microarray Plaid model pruning step.; Microarray analysis techniques; Computer science; food and beverages; computer.software_genre; Biclustering; DNA microarray experiment; Pruning (decision trees); Data mining; DNA microarray; Settore SECS-S/01 - Statistica; computer

researchProduct

Epistemic uncertainty in fault tree analysis approached by the evidence theory

2012

Process plants may be subjected to dangerous events. Different methodologies are nowadays employed to identify failure events that can lead to severe accidents and to assess their relative probability of occurrence. Because reliability data for rare events are generally poor, leading to partial or incomplete knowledge of the process, the classical probabilistic approach cannot be used successfully. Such uncertainty, called epistemic uncertainty, can be treated by means of different methodologies that are alternatives to the probabilistic one. In this work, the Evidence Theory or Dempster–Shafer theory (DST) is proposed to deal with this kind of uncertainty. In particular, the classical Fau…

Fault tree analysis; Epistemic uncertainty; General Chemical Engineering; Probabilistic logic; Energy Engineering and Power Technology; Interval (mathematics); Management Science and Operations Research; computer.software_genre; Industrial and Manufacturing Engineering; FTA; Risk analysi; Evidence theory; Control and Systems Engineering; Settore ING-IND/17 - Impianti Industriali Meccanici; Rare events; Sensitivity analysis; Data mining; Uncertainty quantification; Safety Risk Reliability and Quality; computer; Uncertainty analysis; Food Science; Event (probability theory); Mathematics

researchProduct

TREEZZY2, a Fuzzy Logic Computer Code for Fault Tree and Event Tree Analyses

2004

In the conventional approach to reliability analysis using logical tree methodologies, uncertainties in the failure probabilities of system components or basic events are handled by assuming probability distribution functions. However, data are often insufficient for statistical estimation, so it is necessary to resort to approximate estimations. Moreover, complicated calculations are needed to propagate uncertainties up to the final results. In our work, in order to take account of the uncertainties in system failure probabilities, a methodology based on fuzzy set theory is used in both fault tree and event tree analyses. This paper presents our work on this issue, which resulted…

Fault tree analysis; Event tree; Incremental decision tree; Tree (data structure); Computer science; Event tree analysis; Fuzzy set; Probability distribution; Data mining; computer.software_genre; Fuzzy logic; computer; Algorithm

researchProduct