Search results for "Data mining"

Showing 10 of 907 documents

Multi-scale analysis of the European airspace using network community detection

2014

We show that the European airspace can be represented as a multi-scale traffic network whose nodes are airports, sectors, or navigation points and whose links are defined and weighted according to the flight traffic between the nodes. Using a unique database of air traffic in the European airspace, we investigate the architecture of these networks with special emphasis on their community structure. We propose that unsupervised network community detection algorithms can be used to monitor the current use of airspaces and to improve it by guiding the design of new ones. Specifically, we compare the performance of three community detection algorithms, also by using a null model which t…

FOS: Computer and information sciences; Databases, Factual; Distributed computing; Social Sciences; Poison control; lcsh:Medicine; Sociology; community detection; Data Mining; lcsh:Science; Physics; Multidisciplinary; Mathematical model; Applied Mathematics; Community structure; Computer Science - Social and Information Networks; Air traffic control; Air Travel; Social Networks; Physical Sciences; Interdisciplinary Physics; Social Systems; Engineering and Technology; Free flight; Information Technology; Network Analysis; Algorithms; Research Article; Physics - Physics and Society; Computer and Information Sciences; Control (management); FOS: Physical sciences; ComputerApplications_COMPUTERSINOTHERSYSTEMS; Physics and Society (physics.soc-ph); Statistical Mechanics; Databases; complex network; Humans; Architecture; Networks; network communities; socio-technical system; complex systems; Air Traffic Management; Social and Information Networks (cs.SI); Null model; lcsh:R; Models, Theoretical; Settore FIS/07 - Applied Physics (Cultural Heritage, Environment, Biology and Medicine); Computational Sociology; Signal Processing; Air traffic; lcsh:Q; Mathematics
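
A partition produced by any of the compared algorithms is usually scored with Newman's modularity Q, which measures how much more link weight falls inside communities than a degree-preserving random null model would predict. The following is a minimal sketch in standard-library Python, assuming an undirected weighted network with each link listed once; the function name and the toy network are hypothetical, and this is not the authors' pipeline or their specific null model.

```python
from collections import defaultdict


def weighted_modularity(edges, community):
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, w) tuples; each undirected link listed once.
    community: dict mapping node -> community label.
    Q = sum over communities c of W_in(c) / m - (d(c) / (2 m)) ** 2, where m is
    the total link weight, W_in(c) the weight of links inside c and d(c) the
    summed weighted degree of the nodes in c.
    """
    m = 0.0
    internal = defaultdict(float)    # W_in(c)
    degree_sum = defaultdict(float)  # d(c)
    for u, v, w in edges:
        m += w
        degree_sum[community[u]] += w
        degree_sum[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy network: two triangles of sectors joined by one link.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {node: "A" for node in range(6)}
print(weighted_modularity(edges, split), weighted_modularity(edges, merged))
```

On the toy network the two-triangle split scores 5/14 (about 0.357), while putting every node in one community scores 0, the expected contrast between a meaningful and a trivial partition.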

Learning Structures in Earth Observation Data with Gaussian Processes

2020

Gaussian processes (GPs) have experienced tremendous success in recent years in geoscience generally and in bio-geophysical parameter retrieval in particular. GPs constitute a solid Bayesian framework for formulating many function approximation problems consistently. This paper reviews the main theoretical GP developments in the field. We review new algorithms that respect the signal and noise characteristics, that provide feature rankings automatically, and that make the associated uncertainty intervals applicable when transporting GP models in space and time. All these developments are illustrated in geoscience and remote sensing at local and global scales through a set of illustrative exa…

FOS: Computer and information sciences; Earth observation; 010504 meteorology & atmospheric sciences; Computer science; 0211 other engineering and technologies; FOS: Physical sciences; Machine Learning (stat.ML); 02 engineering and technology; Applied Physics (physics.app-ph); 01 natural sciences; Field (computer science); Physics::Geophysics; Set (abstract data type); Physics - Geophysics; Statistics - Machine Learning; Feature (machine learning); Gaussian process; 021101 geological & geomatics engineering; 0105 earth and related environmental sciences; Physics - Applied Physics; Geophysics (physics.geo-ph); Function approximation; Global Positioning System; Noise (video); Data mining
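
The core of GP regression is that predictions follow in closed form from the kernel matrix: the posterior mean is k_*^T (K + sigma^2 I)^{-1} y and the posterior variance is k(x_*, x_*) - k_*^T (K + sigma^2 I)^{-1} k_*. The sketch below assumes a zero-mean GP on scalar inputs with a squared-exponential kernel and fixed, hand-picked hyperparameters (no marginal-likelihood fitting); `noise` plays the role of sigma^2. It is a generic illustration with hypothetical names and synthetic data, not one of the reviewed algorithms.

```python
import math


def rbf(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def cho_solve(low, b):
    """Solve (L L^T) x = b: forward substitution, then back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(xs, ys, x_star, noise=1e-2, **kern):
    """Posterior mean and variance of a zero-mean GP at x_star."""
    n = len(xs)
    k_mat = [[rbf(xs[i], xs[j], **kern) + (noise if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    low = cholesky(k_mat)
    k_star = [rbf(xi, x_star, **kern) for xi in xs]
    alpha = cho_solve(low, ys)      # (K + noise I)^-1 y
    v = cho_solve(low, k_star)      # (K + noise I)^-1 k_*
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = rbf(x_star, x_star, **kern) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var


xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.sin(x) for x in xs]   # synthetic observations
print(gp_predict(xs, ys, 1.5))
```

The variance shrinks towards the noise level near training inputs and returns to the prior variance far from the data, which is the kind of uncertainty information the abstract refers to.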

Randomized kernels for large scale Earth observation applications

2020

Current remote sensing applications of bio-geophysical parameter estimation and image classification have to deal with an unprecedented amount of heterogeneous and complex data sources. New satellite sensors with improved temporal, spatial and spectral resolutions give rise to challenging computational problems. Standard physical inversion techniques cannot cope efficiently with this new scenario. Land cover classification of the new image sources has also turned out to be a complex problem requiring large amounts of memory and processing time. To cope with these problems, statistical learning has greatly helped in recent years to develop st…

FOS: Computer and information sciences; Earth observation; Computer Science - Machine Learning; 010504 meteorology & atmospheric sciences; Computer science; Remote sensing application; 0211 other engineering and technologies; Soil Science; 02 engineering and technology; 01 natural sciences; Machine Learning (cs.LG); Computers in Earth Sciences; 021101 geological & geomatics engineering; 0105 earth and related environmental sciences; Remote sensing; Contextual image classification; Estimation theory; Hyperspectral imaging; Geology; 15. Life on land; Kernel method; Kernel regression; Data mining; Computational problem; Remote Sensing of Environment
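
The abstract does not say which randomized kernel approximations are used. As an assumption, the sketch below shows random Fourier features (Rahimi and Recht), one widely used randomized approximation of the squared-exponential kernel: inner products of D random cosine features converge to the exact kernel value as D grows. Function names, the length scale and the sample points are hypothetical.

```python
import math
import random


def make_rff(dim, n_features, length_scale=1.0, seed=0):
    """Random Fourier feature map z with z(x) . z(y) ~= exp(-|x - y|^2 / (2 l^2)).

    Frequencies w ~ N(0, I / l^2), phases b ~ U[0, 2 pi),
    z(x) = sqrt(2 / D) * cos(w . x + b) for each of the D features.
    """
    rng = random.Random(seed)
    ws = [[rng.gauss(0.0, 1.0 / length_scale) for _ in range(dim)]
          for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def z(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return z


def rbf(x, y, length_scale=1.0):
    """Exact squared-exponential kernel, for comparison."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq / (2.0 * length_scale ** 2))


z = make_rff(dim=3, n_features=5000, seed=42)
x, y = [0.2, -0.1, 0.5], [0.4, 0.3, 0.1]
approx = sum(a * b for a, b in zip(z(x), z(y)))
print(approx, rbf(x, y))
```

With D features, a linear model on z(x) costs roughly O(nD^2 + D^3) to fit, instead of the O(n^3) of exact kernel ridge regression, which is the usual motivation for randomized kernels on large datasets.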

Gaussianizing the Earth: Multidimensional Information Measures for Earth Data Analysis

2021

Information theory is an excellent framework for analyzing Earth system data because it allows us to characterize uncertainty and redundancy, and is universally interpretable. However, accurately estimating information content is challenging because spatio-temporal data is high-dimensional, heterogeneous and has non-linear characteristics. In this paper, we apply multivariate Gaussianization for probability density estimation which is robust to dimensionality, comes with statistical guarantees, and is easy to apply. In addition, this methodology allows us to estimate information-theoretic measures to characterize multivariate densities: information, entropy, total correlation, and mutual in…

FOS: Computer and information sciences; Multivariate statistics; General Computer Science; Computer science; Machine Learning (stat.ML); Mutual information; Information theory; Statistics - Applications; Earth system science; Redundancy (information theory); 13. Climate action; Statistics - Machine Learning; General Earth and Planetary Sciences; Entropy (information theory); Applications (stat.AP); Total correlation; Data mining; Electrical and Electronic Engineering; Instrumentation; Curse of dimensionality
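
Gaussianization maps data to a standard normal distribution so that information measures become easy to evaluate. The paper's multivariate method is more general; the sketch below performs only a single marginal (rank-based) Gaussianization step and then, as a simplifying assumption, treats the transformed pair as jointly Gaussian (a Gaussian copula), for which total correlation has the closed form -0.5 log(1 - rho^2) nats. For two variables, total correlation equals mutual information. The data are synthetic and the function names hypothetical.

```python
import math
import random
from statistics import NormalDist


def marginal_gaussianize(values):
    """Map a 1-D sample to standard-normal scores through its empirical CDF (ranks)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = std_normal.inv_cdf(rank / (n + 1))
    return out


def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def gaussian_total_correlation_2d(x, y):
    """Total correlation in nats, assuming a Gaussian copula: -0.5 log(1 - rho^2)."""
    rho = pearson(marginal_gaussianize(x), marginal_gaussianize(y))
    return -0.5 * math.log(1.0 - rho ** 2)


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y_dep = [math.exp(v + 0.5 * rng.gauss(0.0, 1.0)) for v in x]  # non-linear dependence
y_ind = [rng.gauss(0.0, 1.0) for _ in range(2000)]             # independent of x
tc_dep = gaussian_total_correlation_2d(x, y_dep)
tc_ind = gaussian_total_correlation_2d(x, y_ind)
print(tc_dep, tc_ind)
```

Because the rank transform ignores monotone marginal distortions, the exponentiated dependent variable still yields a clearly positive estimate, while the independent pair gives a value close to zero.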

Statistically validated mobile communication networks: the evolution of motifs in European and Chinese data

2014

Big data open up unprecedented opportunities to investigate complex systems, including society. In particular, communication data serve as major sources for computational social sciences, but they have to be cleaned and filtered because they may contain spurious information due to recording errors as well as interactions, such as commercial and marketing activities, not directly related to the social network. The network constructed from communication data can only be considered a proxy for the network of social relationships. Here we apply a systematic method, based on multiple hypothesis testing, to statistically validate the links and then construct the corresponding Bonferroni network, gen…

FOS: Computer and information sciences; Physics - Physics and Society; Big data; FOS: Physical sciences; General Physics and Astronomy; Physics and Society (physics.soc-ph); 01 natural sciences; 010305 fluids & plasmas; 0103 physical sciences; 010306 general physics; Proxy (statistics); Social and Information Networks (cs.SI); Physics; Social network; Computer Science - Social and Information Networks; Complex network; complex networks; social systems; statistically validated networks; mobile call records; 3-motifs; Settore FIS/07 - Applied Physics (Cultural Heritage, Environment, Biology and Medicine); Bonferroni correction; Mobile phone; Mobile telephony; Data mining; Raw data
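
The abstract validates each link with a multiple-hypothesis test and a Bonferroni correction. A common way to do this, assumed here rather than taken from the paper, is the statistically validated network test: if users i and j appear in N_i and N_j of N records, the number of records they share by chance follows a hypergeometric distribution, and a link is kept when P(X >= N_ij) is below alpha divided by the number of tests. In this sketch the number of tests is simply the number of observed pairs, and all counts and names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, n_total, n_i, n_j):
    """P(X >= k), X ~ Hypergeometric: n_j draws from n_total records, n_i marked."""
    denom = comb(n_total, n_j)
    return sum(comb(n_i, x) * comb(n_total - n_i, n_j - x)
               for x in range(max(k, 0), min(n_i, n_j) + 1)) / denom


def validated_links(cooccurrences, activity, n_total, alpha=0.01):
    """Return {pair: p_value} for pairs that pass a Bonferroni-corrected test.

    cooccurrences: dict (i, j) -> N_ij, records involving both i and j
    activity: dict node -> N_i, records involving the node
    Assumption: the number of tests is the number of observed pairs.
    """
    threshold = alpha / len(cooccurrences)
    kept = {}
    for (i, j), n_ij in cooccurrences.items():
        p = hypergeom_sf(n_ij, n_total, activity[i], activity[j])
        if p < threshold:
            kept[(i, j)] = p
    return kept


activity = {"a": 10, "b": 10, "c": 10}       # hypothetical per-user record counts
pairs = {("a", "b"): 10, ("a", "c"): 1}      # hypothetical shared-record counts
print(validated_links(pairs, activity, n_total=100))
```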

Detecting informative higher-order interactions in statistically validated hypergraphs

2021

Recent empirical evidence has shown that in many real-world systems, successfully represented as networks, interactions are not limited to dyads but often involve three or more agents at a time. These data are better described by hypergraphs, where hyperlinks encode higher-order interactions among a group of nodes. Despite the large body of work on networks, highlighting informative hyperlinks in hypergraphs obtained from real-world data is still an open problem. Here we propose an analytic approach to filter hypergraphs by identifying those hyperlinks that are over-expressed with respect to a random null hypothesis and that represent the most relevant higher-order connections. We apply…

FOS: Computer and information sciences; Physics - Physics and Society; Computer science; QC1-999; Open problem; FOS: Physical sciences; General Physics and Astronomy; Physics and Society (physics.soc-ph); Astrophysics; ENCODE; Methodology (stat.ME); Statistics - Methodology; Social and Information Networks (cs.SI); Physics; Computer Science - Social and Information Networks; Filter (signal processing); Hyperlink; Class (biology); Settore FIS/07 - Applied Physics (Cultural Heritage, Environment, Biology and Medicine); QB460-466; Pairwise comparison; Data mining; Noise (video); Null hypothesis; higher order interactions; statistical validation; complex networks; Communications Physics
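
Over-expression of a hyperlink can be tested in the same spirit as pairwise validation. The sketch below assumes a null model in which each node's set of events is a uniformly random subset of fixed size, independent of the other nodes; under that assumption the overlap of three nodes follows from two nested hypergeometric distributions. It is limited to three-node hyperlinks, it is not claimed to be exactly the null hypothesis used in the paper, and the counts and function names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, n_total, n_marked, n_draws):
    """P(X = k) for X ~ Hypergeometric(n_total, n_marked, n_draws)."""
    if k < 0 or k > n_marked or n_draws - k < 0 or n_draws - k > n_total - n_marked:
        return 0.0
    return comb(n_marked, k) * comb(n_total - n_marked, n_draws - k) / comb(n_total, n_draws)


def triple_overlap_pmf(n_total, n_i, n_j, n_k):
    """Distribution of |S_i & S_j & S_k| for independent uniformly random subsets.

    The pairwise overlap t = |S_i & S_j| is hypergeometric; given t, the overlap
    of that fixed set with the random subset S_k is hypergeometric again.
    """
    pmf = [0.0] * (min(n_i, n_j, n_k) + 1)
    for t in range(min(n_i, n_j) + 1):
        p_t = hypergeom_pmf(t, n_total, n_i, n_j)
        if p_t == 0.0:
            continue
        for s in range(min(t, n_k) + 1):
            pmf[s] += p_t * hypergeom_pmf(s, n_total, t, n_k)
    return pmf


def triple_pvalue(observed, n_total, n_i, n_j, n_k):
    """P(overlap >= observed) under the random-subset null."""
    return sum(triple_overlap_pmf(n_total, n_i, n_j, n_k)[observed:])


print(triple_pvalue(3, 10, 3, 3, 3))   # hypothetical counts
```

Each resulting p-value would then be compared with a multiple-testing threshold, as in the pairwise case.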

Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform

2012

Motivation: The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being widely applied to the large sets of sequences often encountered as the outcome of DNA sequencing experiments. In previous work, we presented a novel algorithm that allows the BWT of human-genome-scale data to be computed on very moderate hardware, thus enabling us to investigate the BWT as a tool for the compression of such datasets. Results: We first used simulated reads to explore the relationship between the level of compression and the error rate, the leng…

FOS: Computer and information sciences; Statistics and Probability; Burrows–Wheeler transform; Computer science; Data_CODINGANDINFORMATIONTHEORY; Biochemistry; Data Compression; Next-generation sequencing; Computer Science - Data Structures and Algorithms; Escherichia coli; Code (cryptography); Humans; Overhead (computing); Data Structures and Algorithms (cs.DS); Computer Simulation; Quantitative Biology - Genomics; Molecular Biology; Genomics (q-bio.GN); Genome, Human; String (computer science); Search engine indexing; Sorting; Genomics; Sequence Analysis, DNA; Construct (python library); Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; FOS: Biological sciences; Data mining; Databases, Nucleic Acid; Algorithms
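
To make the transform concrete, the sketch below computes the BWT naively by sorting all rotations and inverts it with the last-to-first (LF) mapping. This quadratic construction only works for toy inputs; it is not the paper's algorithm for large read collections. It assumes a unique sentinel character '$' that sorts before every other symbol, and the function names are hypothetical.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform by sorting all rotations (toy sizes only)."""
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last):
    """Invert the BWT via the LF mapping; assumes the sentinel sorts first."""
    counts, ranks = {}, []
    for c in last:
        ranks.append(counts.get(c, 0))   # occurrences of c before this position
        counts[c] = counts.get(c, 0) + 1
    first, total = {}, 0
    for c in sorted(counts):             # start row of each symbol in column F
        first[c] = total
        total += counts[c]
    out, row = [], 0                     # row 0 is the rotation starting with the sentinel
    for _ in range(len(last) - 1):
        c = last[row]
        out.append(c)                    # c precedes the current rotation in the text
        row = first[c] + ranks[row]      # LF mapping: row of the rotation starting with c
    return "".join(reversed(out))


def run_length(s):
    """Number of runs of identical symbols."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


genome_like = "ACGT" * 6                 # hypothetical repetitive sequence
transformed = bwt(genome_like)
print(transformed, run_length(genome_like), run_length(transformed))
```

Repeated contexts in the input become runs of identical symbols in the output, which is what run-length and similar compressors exploit.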

Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-Based Approach

2021

Causal effect identification considers whether an interventional probability distribution can be uniquely determined, without parametric assumptions, from measured source distributions and structural knowledge of the generating system. While complete graphical criteria and procedures exist for many identification problems, there are still challenging but important extensions that have not been considered in the literature. To tackle these new settings, we present a search algorithm that operates directly over the rules of do-calculus. Owing to the generality of do-calculus, the search can take more advanced data-generating mechanisms into account along with an arbitrary type of both observational and…

FOS: Computer and information sciences; Statistics and Probability; Computer Science - Machine Learning; causality; Computer Science - Artificial Intelligence; Heuristic (computer science); Computer science; education; Machine Learning (stat.ML); transportability; 01 natural sciences; Machine Learning (cs.LG); R (programming language); missing data; QA76.75-76.765; QA273-280; 010104 statistics & probability; do-calculus; selection bias; case-control design; meta-analysis; Statistics - Machine Learning; Search algorithm; 0101 mathematics; Parametric statistics; inference; search algorithms; 113 Computer and information sciences; Identification (information); Artificial Intelligence (cs.AI); Causal inference; Identifiability; Probability distribution; Data mining; Statistics, Probability and Uncertainty; Software; Journal of Statistical Software
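
The paper's search explores the rules of do-calculus in general settings. As a much smaller illustration, the sketch below evaluates the back-door adjustment formula P(y | do(x)) = sum_z P(y | x, z) P(z), one identification result that do-calculus yields, on a hypothetical binary model in which Z confounds X and Y (Z -> X, Z -> Y, X -> Y). It is not the paper's search algorithm or its software, and all names and probabilities are hypothetical.

```python
from collections import defaultdict


def backdoor_adjust(joint, x, y):
    """P(Y=y | do(X=x)) = sum_z P(Y=y | X=x, Z=z) P(Z=z).

    joint: dict (z, x, y) -> probability. Valid only if Z satisfies the
    back-door criterion for (X, Y) in the assumed causal graph.
    """
    p_z, p_zx, p_zxy = defaultdict(float), defaultdict(float), defaultdict(float)
    for (zv, xv, yv), p in joint.items():
        p_z[zv] += p
        p_zx[(zv, xv)] += p
        p_zxy[(zv, xv, yv)] += p
    effect = 0.0
    for zv, pz in p_z.items():
        if p_zx[(zv, x)] == 0.0:
            raise ValueError("positivity violated: P(X=x, Z=z) = 0")
        effect += pz * p_zxy[(zv, x, y)] / p_zx[(zv, x)]
    return effect


def conditional(joint, x, y):
    """Observational P(Y=y | X=x), for comparison."""
    num = sum(p for (_, xv, yv), p in joint.items() if xv == x and yv == y)
    den = sum(p for (_, xv, _), p in joint.items() if xv == x)
    return num / den


def build_joint():
    """Hypothetical model: Z -> X, Z -> Y, X -> Y, all binary."""
    p_z = {0: 0.5, 1: 0.5}
    p_x1_given_z = {0: 0.1, 1: 0.9}
    p_y1_given_xz = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 0.8}  # keys (x, z)
    joint = {}
    for z in (0, 1):
        for xv in (0, 1):
            px = p_x1_given_z[z] if xv == 1 else 1 - p_x1_given_z[z]
            for yv in (0, 1):
                py1 = p_y1_given_xz[(xv, z)]
                joint[(z, xv, yv)] = p_z[z] * px * (py1 if yv == 1 else 1 - py1)
    return joint


joint = build_joint()
print(conditional(joint, 1, 1), backdoor_adjust(joint, 1, 1))
```

Here the observational P(Y=1 | X=1) is 0.74, while the identified interventional probability is 0.5: conditioning on X alone mixes in the confounding effect of Z.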

Centrality measures for networks with community structure

2016

Understanding network structure and finding the influential nodes are challenging issues in large networks. Identifying the most influential nodes in a network is useful in many applications, such as immunizing nodes during epidemic spreading or during intentional attacks on complex networks. Much research has been devoted to devising centrality measures that efficiently identify the most influential nodes in a network. There are two major approaches to the problem: on one hand, deterministic strategies exploit knowledge of the overall network topology to find the influential nodes; on the other hand, random strategies are completely agnostic ab…

FOS: Computer and information sciences; Statistics and Probability; Physics - Physics and Society; Exploit; Complex networks; FOS: Physical sciences; Network science; Physics and Society (physics.soc-ph); Network theory; Machine learning; Network topology; Immunization strategies; 01 natural sciences; 010305 fluids & plasmas; 0103 physical sciences; 010306 general physics; Mathematics; Social and Information Networks (cs.SI); Structure (mathematical logic); [PHYS.PHYS]Physics [physics]/Physics [physics]; Community structure; Computer Science - Social and Information Networks; Complex network; Epidemic dynamics; Condensed Matter Physics; Artificial intelligence; Data mining; Centrality
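
The distinction between deterministic and random strategies can be made concrete for immunization. The sketch below compares immunizing the highest-degree nodes (the simplest topology-aware strategy) with immunizing randomly chosen nodes, measuring the largest connected component that remains. The community-aware centrality proposed in the paper is not reproduced, since the abstract is cut off before defining it; the graph and function names are hypothetical.

```python
import random
from collections import deque


def degree_ranking(adj):
    """Deterministic, topology-aware strategy: nodes by degree, highest first."""
    return sorted(adj, key=lambda n: len(adj[n]), reverse=True)


def random_ranking(adj, seed=0):
    """Random strategy: a shuffled node order, agnostic to topology."""
    nodes = list(adj)
    random.Random(seed).shuffle(nodes)
    return nodes


def largest_component_after_removal(adj, removed):
    """Size of the largest connected component once `removed` nodes are immunized."""
    removed = set(removed)
    seen, best = set(), 0
    for start in adj:
        if start in removed or start in seen:
            continue
        size, queue = 0, deque([start])
        seen.add(start)
        while queue:                      # breadth-first search over surviving nodes
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in removed and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


# Hypothetical network: a hub (node 0) with five leaves, one of which has a further neighbour.
adj = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0, 6], 6: [5]}
k = 1
print("degree:", largest_component_after_removal(adj, degree_ranking(adj)[:k]))
print("random:", largest_component_after_removal(adj, random_ranking(adj)[:k]))
```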

A multi-scale area-interaction model for spatio-temporal point patterns

2018

Models for fitting spatio-temporal point processes should incorporate spatio-temporal inhomogeneity and allow for different types of interaction between points (clustering or regularity). This paper proposes an extension of the spatial multi-scale area-interaction model to a spatio-temporal framework. The model allows for interaction between points at different spatio-temporal scales and for the inclusion of covariates. We fit the proposed model to varicella cases registered in Valencia, Spain, during 2013. The fitted model indicates small-scale clustering and regularity at larger spatio-temporal scales.

FOS: Computer and information sciences; Statistics and Probability; Scale (ratio); Computer science; Management, Monitoring, Policy and Law; Multi-scale area-interaction model; Varicella; 01 natural sciences; Point process; Methodology (stat.ME); 010104 statistics & probability; 0502 economics and business; Statistics; Covariate; 60D05; 60G55; 62M30; Point (geometry); 0101 mathematics; Computers in Earth Sciences; Cluster analysis; Statistics - Methodology; 050205 econometrics; 05 social sciences; Interaction model; Extension (predicate logic); Gibbs point processes; ComputingMethodologies_PATTERNRECOGNITION; Spatio-temporal point processes; Data mining
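
In the classical area-interaction model of Baddeley and van Lieshout, the unnormalized density is beta^n(x) * gamma^(-A_r(x)), where A_r(x) is the area of the union of discs of radius r centred on the points; gamma > 1 rewards clustering and gamma < 1 rewards regularity, and a multi-scale version multiplies one such term per radius. The sketch below evaluates that log-density for a purely spatial pattern in the unit square, approximating areas on a grid. It assumes a spatial-only setting (the paper's spatio-temporal version with covariates is not reproduced), and the parameter values and function names are hypothetical.

```python
import math


def union_area(points, radius, grid=200):
    """Grid approximation of the area of the union of discs, clipped to the unit square."""
    inside = 0
    r2 = radius * radius
    for i in range(grid):
        cx = (i + 0.5) / grid
        for j in range(grid):
            cy = (j + 0.5) / grid
            if any((cx - px) ** 2 + (cy - py) ** 2 <= r2 for px, py in points):
                inside += 1
    return inside / (grid * grid)


def log_unnormalized_density(points, log_beta, scales):
    """log f(x) = n(x) log(beta) - sum_m log(gamma_m) * A_{r_m}(x).

    scales: list of (r_m, gamma_m); gamma_m > 1 favours clustering at scale r_m,
    gamma_m < 1 favours regularity.
    """
    value = len(points) * log_beta
    for r, gamma in scales:
        value -= math.log(gamma) * union_area(points, r)
    return value


pattern = [(0.20, 0.30), (0.22, 0.31), (0.70, 0.75), (0.72, 0.73), (0.45, 0.50)]
scales = [(0.02, 5.0), (0.10, 0.5)]   # hypothetical: attraction short range, inhibition longer
print(log_unnormalized_density(pattern, math.log(50.0), scales))
```

A small radius with gamma > 1 combined with a larger radius with gamma < 1 encodes clustering at short range and regularity at longer range, the kind of pattern the fitted model reports.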