Search results for " mining"

showing 10 items of 1548 documents

Mislabel Detection of Finnish Publication Ranks

2019

The paper analyzes a data set of Finnish rankings of academic publication channels with an Extreme Learning Machine (ELM). The purpose is to introduce and test a recently proposed ELM-based mislabel detection approach with a rich set of features characterizing each publication channel. We compare the architecture, the accuracy, and especially the set of detected mislabels of the ELM-based approach with the corresponding results reported in the reference paper.

FOS: Computer and information sciences; Computer Science - Machine Learning; Computer science; ranking lists; Machine Learning (stat.ML); Machine Learning (cs.LG); Set (abstract data type); Statistics - Machine Learning; Digital Libraries (cs.DL); error analysis; mislabel detection; Extreme learning machine; Extreme Learning Machine (ELM); publication channels; Computer Science - Digital Libraries; Data set; machine learning; data; Data mining; rankings; evaluation; scholarly publishing; Communication channel

Integrating Domain Knowledge in Data-Driven Earth Observation With Process Convolutions

2022

The modelling of Earth observation data is a challenging problem, typically approached by either purely mechanistic or purely data-driven methods. Mechanistic models encode the domain knowledge and physical rules governing the system. Such models, however, require the correct specification of all interactions between the variables in the problem, and finding an appropriate parameterization is a challenge in itself. Machine learning approaches, on the other hand, are flexible data-driven tools able to approximate arbitrarily complex functions, but they lack interpretability and struggle when data are scarce or in extrapolation regimes. In this paper, we argue that hybrid learning schemes that combine both approa…

FOS: Computer and information sciences; Computer Science - Machine Learning; Earth observation; Advanced microwave scanning radiometer-2 (AMSR-2); moderate resolution imaging spectroradiometer (MODIS); Computer science; leaf area index (LAI); 0211 other engineering and technologies; Extrapolation; Machine Learning (stat.ML); 02 engineering and technology; Machine Learning (cs.LG); Data-driven; Convolution; advanced scatterometer (ASCAT); Statistics - Machine Learning; ordinary differential equation (ODE); Electrical and Electronic Engineering; Gaussian process; soil moisture and ocean salinity (SMOS); 021101 geological & geomatics engineering; Interpretability; Forcing (recursion theory); machine learning (ML); soil moisture (SM); time series analysis; gaussian process (GP); General Earth and Planetary Sciences; Domain knowledge; Data mining; gap filling; physics; fraction of absorbed photosynthetically active radiation (faPAR); IEEE Transactions on Geoscience and Remote Sensing

A perspective on Gaussian processes for Earth observation

2019

Earth observation (EO) by airborne and satellite remote sensing and by in-situ observations plays a fundamental role in monitoring our planet. In the last decade, machine learning, and Gaussian processes (GPs) in particular, have achieved outstanding results in estimating bio-geo-physical variables from the acquired images at local and global scales in a time-resolved manner. GPs provide not only accurate estimates but also principled uncertainty estimates for the predictions, can easily accommodate multimodal data coming from different sensors and from multitemporal acquisitions, allow the introduction of physical knowledge, and a formal treatment of uncertainty quantification and error pr…

FOS: Computer and information sciences; Computer Science - Machine Learning; Earth observation; Computer science; Data management and analysis; Machine Learning (stat.ML); 02 engineering and technology; 010402 general chemistry; Statistics - Applications; 01 natural sciences; Machine Learning (cs.LG); Statistics - Machine Learning; Applications (stat.AP); Uncertainty quantification; Gaussian process; Physical law; Propagation of uncertainty; Multidisciplinary; Perspective (graphical); gaussian processes; 021001 nanoscience & nanotechnology; 0104 chemical sciences; 13. Climate action; Causal inference; Computer Science; Global Positioning System; Data mining; 0210 nano-technology; Perspectives; National Science Review
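The abstract's point that GPs return an uncertainty estimate alongside each prediction can be illustrated with a minimal sketch of exact GP regression in pure Python. It assumes a squared-exponential (RBF) kernel, a fixed noise variance, and hand-picked hyperparameters (`length=1.0`, `var=1.0`, `noise=1e-2`); it is a generic textbook formulation, not any of the specific models discussed in the paper.

```python
import math


def rbf(a: float, b: float, length: float = 1.0, var: float = 1.0) -> float:
    """Squared-exponential (RBF) covariance between two scalar inputs (assumed kernel)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Return lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(L)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(xs, ys, x_star, noise=1e-2, length=1.0, var=1.0):
    """Posterior mean and variance of the latent function at x_star."""
    # Training covariance plus noise variance on the diagonal.
    K = [[rbf(a, b, length, var) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward(L, forward(L, ys))  # alpha = K^{-1} y
    k_star = [rbf(x_star, a, length, var) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward(L, k_star)  # v = L^{-1} k_*
    variance = rbf(x_star, x_star, length, var) - sum(vi * vi for vi in v)
    return mean, variance
```

With training inputs `[0, 1, 2, 3]` and targets `sin(x)`, the prediction at `x_star=1.0` is close to `sin(1)` with a small variance, while at a distant input such as `x_star=50.0` the mean falls back to zero and the variance returns to the prior value `var`.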

Using the Tsetlin Machine to Learn Human-Interpretable Rules for High-Accuracy Text Categorization With Medical Applications

2019

Medical applications challenge today's text categorization techniques by demanding both high accuracy and ease of interpretation. Although deep learning has provided a leap ahead in accuracy, this leap comes at the expense of interpretability. To address this accuracy-interpretability challenge, we introduce, for the first time, a text categorization approach that leverages the recently introduced Tsetlin Machine. Briefly, we represent the terms of a text as propositional variables. From these, we capture categories using simple propositional formulae, such as: if "rash" and "reaction" and "penicillin" then Allergy. The Tsetlin Machine learns these formulae from a labelled tex…

FOS: Computer and information sciences; Computer Science - Machine Learning; General Computer Science; Computer science; text categorization; Natural language understanding; Decision tree; Machine Learning (stat.ML); 02 engineering and technology; VDP::Technology: 500::Information and communication technology: 550::Other information technology: 559; Machine learning; supervised learning; Machine Learning (cs.LG); Naive Bayes classifier; Text mining; Statistics - Machine Learning; 0202 electrical engineering electronic engineering information engineering; General Materials Science; Tsetlin machine; health informatics; Interpretability; Propositional variable; Classification algorithms; Artificial neural network; Deep learning; 020208 electrical & electronic engineering; General Engineering; Random forest; Support vector machine; machine learning; Categorization; 020201 artificial intelligence & image processing; Artificial intelligence; lcsh:Electrical engineering. Electronics. Nuclear engineering; Precision and recall; lcsh:TK1-9971
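The propositional representation described in the abstract can be sketched as follows: a text becomes the set of terms it contains (each term a propositional variable that is true when present), and a category is predicted when the conjunctive clauses written for it are satisfied. This is a simplified, hypothetical illustration: the `Allergy` clause is the abstract's own example, the `Other` clause and the plain vote count are assumptions, and the snippet only evaluates hand-written clauses rather than implementing the Tsetlin Machine's learning procedure.

```python
from __future__ import annotations

import re

# Hypothetical clauses: (terms that must be present, terms that must be absent).
CLAUSES = {
    "Allergy": [({"rash", "reaction", "penicillin"}, set())],
    "Other": [(set(), {"rash"})],
}


def tokenize(text: str) -> set[str]:
    """Represent a text as the set of lowercase terms it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))


def clause_holds(terms: set[str], clause) -> bool:
    """A conjunction holds when all positive literals are present and all negated ones absent."""
    present, absent = clause
    return present <= terms and not (absent & terms)


def classify(text: str) -> str | None:
    """Return the category with the most satisfied clauses, or None if no clause fires."""
    terms = tokenize(text)
    votes = {cat: sum(clause_holds(terms, c) for c in clauses)
             for cat, clauses in CLAUSES.items()}
    best = max(votes, key=votes.get)
    return best if votes[best] > 0 else None
```

For example, `classify("Developed a rash after a reaction to penicillin.")` returns `"Allergy"`, while a text mentioning only a rash satisfies neither clause and yields `None`. Because each decision reduces to a readable conjunction of terms, the reason for a prediction can be read directly off the satisfied clause.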

Multi-scale analysis of the European airspace using network community detection

2014

We show that the European airspace can be represented as a multi-scale traffic network whose nodes are airports, sectors, or navigation points and whose links are defined and weighted according to the flight traffic between the nodes. Using a unique database of air traffic in the European airspace, we investigate the architecture of these networks with special emphasis on their community structure. We propose that unsupervised network community detection algorithms can be used to monitor the current use of airspaces and to improve it by guiding the design of new ones. Specifically, we compare the performance of three community detection algorithms, also by using a null model which t…

FOS: Computer and information sciences; Databases Factual; Distributed computing; Social Sciences; Poison control; lcsh:Medicine; Sociology; community detection; Data Mining; lcsh:Science; Physics; Multidisciplinary; Mathematical model; Applied Mathematics; Community structure; Computer Science - Social and Information Networks; Air traffic control; Air Travel; Social Networks; Physical Sciences; Interdisciplinary Physics; Social Systems; Engineering and Technology; Free flight; Information Technology; Network Analysis; Algorithms; Research Article; Physics - Physics and Society; Computer and Information Sciences; Control (management); FOS: Physical sciences; ComputerApplications_COMPUTERSINOTHERSYSTEMS; Physics and Society (physics.soc-ph); Statistical Mechanics; Databases; complex network; Humans; Architecture; Networks network communities socio-technical system complex systems Air Traffic Management; Social and Information Networks (cs.SI); Null model; lcsh:R; Models Theoretical; Sector FIS/07 - Applied Physics (Cultural Heritage, Environment, Biology and Medicine); Computational Sociology; Signal Processing; Air traffic; lcsh:Q; Mathematics
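To make the network construction concrete, the sketch below represents a small traffic network as a list of weighted, undirected links (weight = number of flights, as in the abstract) and scores a candidate partition with Newman's weighted modularity Q, a quality function that many community detection algorithms optimize. The node names and weights are hypothetical, and this excerpt does not say whether the three algorithms compared in the paper are modularity-based.

```python
from collections import defaultdict


def modularity(edges, community):
    """Weighted modularity Q of a partition of an undirected graph without self-loops.

    edges: iterable of (u, v, w), where w is e.g. the number of flights between u and v.
    community: dict mapping each node to a community label.
    Q = sum over communities c of [W_in(c) / m - (S(c) / 2m)^2], where W_in(c) is the
    total link weight inside c, S(c) the total node strength in c, m the total weight.
    """
    m = 0.0
    inside = defaultdict(float)    # W_in(c)
    strength = defaultdict(float)  # S(c)
    for u, v, w in edges:
        m += w
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)
```

For two disconnected triangles with unit weights, splitting them into two communities gives Q = 0.5, whereas placing all six nodes in one community gives Q = 0; a community detection algorithm searches for the partition that makes such a score large.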

Learning Structures in Earth Observation Data with Gaussian Processes

2020

Gaussian processes (GPs) have experienced tremendous success in geoscience in general, and in bio-geophysical parameter retrieval in particular, in recent years. GPs constitute a solid Bayesian framework for consistently formulating many function approximation problems. This paper reviews the main theoretical GP developments in the field. We review new algorithms that respect the signal and noise characteristics, that provide feature rankings automatically, and that use the associated uncertainty intervals to transport GP models in space and time. All these developments are illustrated in the field of geoscience and remote sensing at local and global scales through a set of illustrative exa…

FOS: Computer and information sciences; Earth observation; 010504 meteorology & atmospheric sciences; Computer science; 0211 other engineering and technologies; FOS: Physical sciences; Machine Learning (stat.ML); 02 engineering and technology; Applied Physics (physics.app-ph); 01 natural sciences; Field (computer science); Physics::Geophysics; Set (abstract data type); Physics - Geophysics; Statistics - Machine Learning; Feature (machine learning); Gaussian process; 021101 geological & geomatics engineering; 0105 earth and related environmental sciences; Physics - Applied Physics; Geophysics (physics.geo-ph); Function approximation; Global Positioning System; Noise (video); Data mining

Randomized kernels for large scale Earth observation applications

2020

Current remote sensing applications of bio-geophysical parameter estimation and image classification have to deal with an unprecedented amount of heterogeneous and complex data. New satellite sensors with improved temporal, spatial, and spectral (wavelength) resolutions give rise to challenging computational problems. Standard physical inversion techniques cannot cope efficiently with this new scenario. Land cover classification of the new image sources has also turned out to be a complex problem requiring large amounts of memory and processing time. To cope with these problems, statistical learning has greatly helped in recent years to develop st…

FOS: Computer and information sciences; Earth observation; Computer Science - Machine Learning; 010504 meteorology & atmospheric sciences; Computer science; Remote sensing application; 0211 other engineering and technologies; Soil Science; 02 engineering and technology; 01 natural sciences; Machine Learning (cs.LG); Computers in Earth Sciences; 021101 geological & geomatics engineering; 0105 earth and related environmental sciences; Remote sensing; Contextual image classification; Estimation theory; Hyperspectral imaging; Geology; 15. Life on land; Kernel method; Kernel regression; Data mining; Computational problem; Remote Sensing of Environment

Machine learning information fusion in Earth observation: A comprehensive review of methods, applications and data sources

2020

This paper reviews the most important data-driven information fusion algorithms based on Machine Learning (ML) techniques for problems in Earth observation. Nowadays we observe and model the Earth with a wealth of observations from a plethora of different sensors, measuring states, fluxes, processes, and variables at unprecedented spatial and temporal resolutions. Earth observation is well equipped with remote sensing systems mounted on satellites and airborne platforms, but it also involves in-situ observations, numerical models, and social media data streams, among other data sources. Data-driven approaches, and ML techniques in particular, are the natural choice to extract significant i…

FOS: Computer and information sciences; Earth observation; Computer Science - Machine Learning; Computer science; Computer Vision and Pattern Recognition (cs.CV); Computer Science - Computer Vision and Pattern Recognition; 02 engineering and technology; Machine learning; Field (computer science); Machine Learning (cs.LG); Set (abstract data type); 0202 electrical engineering electronic engineering information engineering; Data stream mining; 020206 networking & telecommunications; Numerical models; Sensor fusion; Information fusion; Hardware and Architecture; Signal Processing; 020201 artificial intelligence & image processing; Artificial intelligence; Software; Information Systems; Information Fusion

Gaussianizing the Earth: Multidimensional Information Measures for Earth Data Analysis

2021

Information theory is an excellent framework for analyzing Earth system data because it allows us to characterize uncertainty and redundancy, and it is universally interpretable. However, accurately estimating information content is challenging because spatio-temporal data are high-dimensional, heterogeneous, and non-linear. In this paper, we apply multivariate Gaussianization for probability density estimation, which is robust to dimensionality, comes with statistical guarantees, and is easy to apply. In addition, this methodology allows us to estimate information-theoretic measures to characterize multivariate densities: information, entropy, total correlation, and mutual in…

FOS: Computer and information sciences; Multivariate statistics; General Computer Science; Computer science; Machine Learning (stat.ML); Mutual information; Information theory; Statistics - Applications; Earth system science; Redundancy (information theory); 13. Climate action; Statistics - Machine Learning; General Earth and Planetary Sciences; Entropy (information theory); Applications (stat.AP); Total correlation; Data mining; Electrical and Electronic Engineering; Instrumentation; Curse of dimensionality
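The basic building block of Gaussianization can be sketched as a marginal step that maps each variable through its empirical CDF and then through the inverse standard normal CDF. As a deliberate simplification, the snippet applies only this single marginal step and then uses the closed-form mutual information of a bivariate Gaussian, -½ log(1 - ρ²) in nats; the multivariate method in the paper is iterative and more general, so this illustrates the idea rather than reproducing its estimator. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def marginal_gaussianize(xs):
    """Map samples to standard-normal scores via ranks: x -> Phi^{-1}(rank / (n + 1)).

    Ties are broken by order of appearance (a simplifying assumption).
    """
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    out = [0.0] * n
    std_normal = NormalDist()
    for rank, i in enumerate(order, start=1):
        out[i] = std_normal.inv_cdf(rank / (n + 1))
    return out


def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def gaussian_mutual_information(x, y):
    """Mutual information (nats) after marginal Gaussianization, treating the pair as jointly Gaussian."""
    rho = pearson(marginal_gaussianize(x), marginal_gaussianize(y))
    return -0.5 * math.log(1.0 - rho ** 2)
```

Because the marginal step depends only on ranks, the estimate is unchanged by any strictly increasing transformation of either variable, which is one reason Gaussianizing the marginals first is attractive for heterogeneous Earth data.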

Statistically validated mobile communication networks: the evolution of motifs in European and Chinese data

2014

Big data open up unprecedented opportunities to investigate complex systems, including society. In particular, communication data serve as major sources for computational social science, but they have to be cleaned and filtered because they may contain spurious information due to recording errors, as well as interactions, such as commercial and marketing activities, that are not directly related to the social network. The network constructed from communication data can only be considered a proxy for the network of social relationships. Here we apply a systematic method, based on multiple hypothesis testing, to statistically validate the links and then construct the corresponding Bonferroni network, gen…

FOS: Computer and information sciences; Physics - Physics and Society; Big data; FOS: Physical sciences; General Physics and Astronomy; Physics and Society (physics.soc-ph); 01 natural sciences; 010305 fluids & plasmas; 0103 physical sciences; 010306 general physics; Proxy (statistics); Social and Information Networks (cs.SI); Physics; Social network; Computer Science - Social and Information Networks; Complex network; complex networks social systems statistically validated networks mobile call records 3-motifs; Sector FIS/07 - Applied Physics (Cultural Heritage, Environment, Biology and Medicine); Bonferroni correction; Mobile phone; Mobile telephony; Data mining; Raw data
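The link-validation step in the abstract can be sketched as follows: each candidate link receives a p-value under a null model of random interaction, and it is kept only if that p-value is below α/m, the Bonferroni-corrected threshold for m simultaneous tests. The hypergeometric null used here (the chance that two nodes with K_i and K_j events among N slots share at least n_ij of them at random) is an assumption for illustration, since this excerpt does not state the paper's exact null model; node names and counts are hypothetical.

```python
from math import comb


def hypergeom_pvalue(n_ij: int, N: int, K_i: int, K_j: int) -> float:
    """P(X >= n_ij) for X ~ Hypergeometric(N, K_i, K_j).

    Interpretation (assumed null): K_i of N slots belong to node i; drawing K_j slots
    at random for node j, X counts how many coincide with node i's slots.
    """
    top = sum(comb(K_i, k) * comb(N - K_i, K_j - k)
              for k in range(n_ij, min(K_i, K_j) + 1))
    return top / comb(N, K_j)


def bonferroni_network(links, N: int, alpha: float = 0.01):
    """Keep the links whose p-value is below alpha / m, with m the number of tested links.

    links: dict mapping (node_i, node_j) -> (n_ij, K_i, K_j).
    """
    m = len(links)
    threshold = alpha / m
    return {pair for pair, (n_ij, K_i, K_j) in links.items()
            if hypergeom_pvalue(n_ij, N, K_i, K_j) < threshold}
```

With N = 100, a pair that shares all 10 of its events is retained, while a pair with the same activity that shares only one event (p ≈ 0.67) is discarded as compatible with chance. Dividing α by the number of tests controls the family-wise error rate, which makes the surviving network conservative.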