Search results for "machine learning."

showing 10 items of 1455 documents

Pattern Recovery in Penalized and Thresholded Estimation and its Geometry

2023

We consider the framework of penalized estimation where the penalty term is given by a real-valued polyhedral gauge, which encompasses methods such as LASSO (and many variants thereof such as the generalized LASSO), SLOPE, OSCAR, PACS and others. Each of these estimators can uncover a different structure or ``pattern'' of the unknown parameter vector. We define a general notion of patterns based on subdifferentials and formalize an approach to measure their complexity. For pattern recovery, we provide a minimal condition for a particular pattern to be detected by the procedure with positive probability, the so-called accessibility condition. Using our approach, we also introduce the stronge…

FOS: Computer and information sciencesStatistics - Machine LearningFOS: MathematicsMathematics - Statistics TheoryMachine Learning (stat.ML)[MATH] Mathematics [math]Statistics Theory (math.ST)

researchProduct

Fair Kernel Learning

2017

New social and economic activities massively exploit big data and machine learning algorithms to do inference on people's lives. Applications include automatic curricula evaluation, wage determination, and risk assessment for credits and loans. Recently, many governments and institutions have raised concerns about the lack of fairness, equity and ethics in machine learning to treat these problems. It has been shown that not including sensitive features that bias fairness, such as gender or race, is not enough to mitigate the discrimination when other related features are included. Instead, including fairness in the objective function has been shown to be more efficient. We present novel fai…

FOS: Computer and information sciencesStatistics - Machine LearningMachine Learning (stat.ML)

researchProduct

Sensitivity Maps of the Hilbert-Schmidt Independence Criterion

2016

Kernel dependence measures yield accurate estimates of nonlinear relations between random variables, and they are also endorsed with solid theoretical properties and convergence rates. Besides, the empirical estimates are easy to compute in closed form just involving linear algebra operations. However, they are hampered by two important problems: the high computational cost involved, as two kernel matrices of the sample size have to be computed and stored, and the interpretability of the measure, which remains hidden behind the implicit feature map. We here address these two issues. We introduce the Sensitivity Maps (SMs) for the Hilbert-Schmidt independence criterion (HSIC). Sensitivity ma…

FOS: Computer and information sciencesStatistics - Machine LearningMachine Learning (stat.ML)

researchProduct

The FLUXCOM ensemble of global land-atmosphere energy fluxes

2019

Although a key driver of Earth’s climate system, global land-atmosphere energy fluxes are poorly constrained. Here we use machine learning to merge energy flux measurements from FLUXNET eddy covariance towers with remote sensing and meteorological data to estimate global gridded net radiation, latent and sensible heat and their uncertainties. The resulting FLUXCOM database comprises 147 products in two setups: (1) 0.0833° resolution using MODIS remote sensing data (RS) and (2) 0.5° resolution using remote sensing and meteorological data (RS + METEO). Within each setup we use a full factorial design across machine learning methods, forcing datasets and energy balance closure corrections. For…

FOS: Computer and information sciencesStatistics and ProbabilityComputer Science - Machine LearningData Descriptor010504 meteorology & atmospheric sciencesMeteorology0208 environmental biotechnologyEnergy balanceEddy covarianceFOS: Physical sciencesEnergy fluxMachine Learning (stat.ML)02 engineering and technologySensible heatLibrary and Information Sciences01 natural sciences7. Clean energyMachine Learning (cs.LG)EducationFluxNetStatistics - Machine LearningEvapotranspirationLatent heatlcsh:Science0105 earth and related environmental sciences020801 environmental engineeringComputer Science ApplicationsMetadataEnvironmental sciencesPhysics - Atmospheric and Oceanic Physics13. Climate actionAtmospheric and Oceanic Physics (physics.ao-ph)Environmental sciencelcsh:QStatistics Probability and UncertaintyHydrologyClimate sciencesInformation SystemsScientific Data

researchProduct

Sparse and Smooth: improved guarantees for Spectral Clustering in the Dynamic Stochastic Block Model

2020

In this paper, we analyse classical variants of the Spectral Clustering (SC) algorithm in the Dynamic Stochastic Block Model (DSBM). Existing results show that, in the relatively sparse case where the expected degree grows logarithmically with the number of nodes, guarantees in the static case can be extended to the dynamic case and yield improved error bounds when the DSBM is sufficiently smooth in time, that is, the communities do not change too much between two time steps. We improve over these results by drawing a new link between the sparsity and the smoothness of the DSBM: the more regular the DSBM is, the more sparse it can be, while still guaranteeing consistent recovery. In particu…

FOS: Computer and information sciencesStatistics and ProbabilityComputer Science - Machine Learning[STAT.ML]Statistics [stat]/Machine Learning [stat.ML]Statistics - Machine LearningFOS: MathematicsMachine Learning (stat.ML)Mathematics - Statistics TheoryStatistics Theory (math.ST)Statistics Probability and Uncertainty[STAT.ML] Statistics [stat]/Machine Learning [stat.ML]Machine Learning (cs.LG)

researchProduct

Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-Based Approach

2021

Causal effect identification considers whether an interventional probability distribution can be uniquely determined without parametric assumptions from measured source distributions and structural knowledge on the generating system. While complete graphical criteria and procedures exist for many identification problems, there are still challenging but important extensions that have not been considered in the literature. To tackle these new settings, we present a search algorithm directly over the rules of do-calculus. Due to generality of do-calculus, the search is capable of taking more advanced data-generating mechanisms into account along with an arbitrary type of both observational and…

FOS: Computer and information sciencesStatistics and ProbabilityComputer Science - Machine LearningcausalityComputer Science - Artificial IntelligenceHeuristic (computer science)Computer scienceeducationMachine Learning (stat.ML)transportabilitycomputer.software_genre01 natural sciencesMachine Learning (cs.LG)R-kielimissing dataQA76.75-76.765; QA273-280010104 statistics & probabilitydo-calculuscausality; do-calculus; selection bias; transportability; missing data; case-control design; meta-analysisStatistics - Machine LearningSearch algorithmselection bias0101 mathematicsParametric statisticspäättelymeta-analyysicase-control designhakualgoritmit113 Computer and information sciencesMissing datameta-analysisIdentification (information)Artificial Intelligence (cs.AI)Causal inferencekausaliteettiIdentifiabilityProbability distributionData miningStatistics Probability and UncertaintycomputerSoftwareJournal of Statistical Software

researchProduct

Bayesian Checking of the Second Levels of Hierarchical Models

2007

Hierarchical models are increasingly used in many applications. Along with this increased use comes a desire to investigate whether the model is compatible with the observed data. Bayesian methods are well suited to eliminate the many (nuisance) parameters in these complicated models; in this paper we investigate Bayesian methods for model checking. Since we contemplate model checking as a preliminary, exploratory analysis, we concentrate on objective Bayesian methods in which careful specification of an informative prior distribution is avoided. Numerous examples are given and different proposals are investigated and critically compared.

FOS: Computer and information sciencesStatistics and ProbabilityModel checkingModel checkingComputer scienceconflictGeneral MathematicsBayesian probabilityMachine learningcomputer.software_genreMethodology (stat.ME)partial posterior predictivePrior probabilityStatistics - Methodologybusiness.industrymodel criticismProbability and statisticsExploratory analysisobjective Bayesian methodsempirical-Bayesposterior predictivep-valuesArtificial intelligenceStatistics Probability and Uncertaintybusinesscomputer

researchProduct

Centrality measures for networks with community structure

2016

Understanding the network structure, and finding out the influential nodes is a challenging issue in the large networks. Identifying the most influential nodes in the network can be useful in many applications like immunization of nodes in case of epidemic spreading, during intentional attacks on complex networks. A lot of research is done to devise centrality measures which could efficiently identify the most influential nodes in the network. There are two major approaches to the problem: On one hand, deterministic strategies that exploit knowledge about the overall network topology in order to find the influential nodes, while on the other end, random strategies are completely agnostic ab…

FOS: Computer and information sciencesStatistics and ProbabilityPhysics - Physics and SocietyExploitComplex networksFOS: Physical sciencesNetwork sciencePhysics and Society (physics.soc-ph)Network theoryMachine learningcomputer.software_genreNetwork topologyImmunization strategies01 natural sciences010305 fluids & plasmas0103 physical sciences010306 general physicsMathematicsSocial and Information Networks (cs.SI)Structure (mathematical logic)[PHYS.PHYS]Physics [physics]/Physics [physics]business.industryCommunity structureComputer Science - Social and Information NetworksComplex networkEpidemic dynamicsCondensed Matter Physics[ PHYS.PHYS ] Physics [physics]/Physics [physics]Community structureArtificial intelligenceData miningbusinessCentralitycomputer

researchProduct

Imputation Procedures in Surveys Using Nonparametric and Machine Learning Methods: An Empirical Comparison

2020

Abstract Nonparametric and machine learning methods are flexible methods for obtaining accurate predictions. Nowadays, data sets with a large number of predictors and complex structures are fairly common. In the presence of item nonresponse, nonparametric and machine learning procedures may thus provide a useful alternative to traditional imputation procedures for deriving a set of imputed values used next for the estimation of study parameters defined as solution of population estimating equation. In this paper, we conduct an extensive empirical investigation that compares a number of imputation procedures in terms of bias and efficiency in a wide variety of settings, including high-dimens…

FOS: Computer and information sciencesStatistics and ProbabilityStatistics::ApplicationsEmpirical comparisonbusiness.industryComputer scienceApplied MathematicsNonparametric statisticsMachine learningcomputer.software_genreStatistics - ComputationVariety (cybernetics)Methodology (stat.ME)Set (abstract data type)Statistics::MethodologyImputation (statistics)Artificial intelligenceStatistics Probability and UncertaintybusinesscomputerStatistics - MethodologyComputation (stat.CO)Social Sciences (miscellaneous)Journal of Survey Statistics and Methodology

researchProduct

Alignment-free Genomic Analysis via a Big Data Spark Platform

2021

Abstract Motivation Alignment-free distance and similarity functions (AF functions, for short) are a well-established alternative to pairwise and multiple sequence alignments for many genomic, metagenomic and epigenomic tasks. Due to data-intensive applications, the computation of AF functions is a Big Data problem, with the recent literature indicating that the development of fast and scalable algorithms computing AF functions is a high-priority task. Somewhat surprisingly, despite the increasing popularity of Big Data technologies in computational biology, the development of a Big Data platform for those tasks has not been pursued, possibly due to its complexity. Results We fill this impo…

FOS: Computer and information sciencesStatistics and Probabilitysequence analysisComputer science0206 medical engineeringBig data02 engineering and technologyMachine learningcomputer.software_genreBiochemistry03 medical and health sciencesSpark (mathematics)MapReduceMolecular Biology030304 developmental biology0303 health sciencesSettore INF/01 - Informaticabusiness.industryBioinformatics High Performance Computing Compressed Data StructuresMapReduce; hadoop; sequence analysisComputer Science ApplicationsComputational MathematicsTask (computing)Computer Science - Distributed Parallel and Cluster ComputingComputational Theory and MathematicsDistributed Parallel and Cluster Computing (cs.DC)Artificial intelligencehadoopbusinesscomputer020602 bioinformaticsBioinformatics

researchProduct