Search results for "Machine learning"
showing 10 items of 1464 documents
Bayesian Checking of the Second Levels of Hierarchical Models
2007
Hierarchical models are increasingly used in many applications. Along with this increased use comes a desire to investigate whether the model is compatible with the observed data. Bayesian methods are well suited to eliminate the many (nuisance) parameters in these complicated models; in this paper we investigate Bayesian methods for model checking. Since we contemplate model checking as a preliminary, exploratory analysis, we concentrate on objective Bayesian methods in which careful specification of an informative prior distribution is avoided. Numerous examples are given and different proposals are investigated and critically compared.
Centrality measures for networks with community structure
2016
Understanding the network structure, and finding out the influential nodes is a challenging issue in the large networks. Identifying the most influential nodes in the network can be useful in many applications like immunization of nodes in case of epidemic spreading, during intentional attacks on complex networks. A lot of research is done to devise centrality measures which could efficiently identify the most influential nodes in the network. There are two major approaches to the problem: On one hand, deterministic strategies that exploit knowledge about the overall network topology in order to find the influential nodes, while on the other end, random strategies are completely agnostic ab…
Imputation Procedures in Surveys Using Nonparametric and Machine Learning Methods: An Empirical Comparison
2020
Abstract Nonparametric and machine learning methods are flexible methods for obtaining accurate predictions. Nowadays, data sets with a large number of predictors and complex structures are fairly common. In the presence of item nonresponse, nonparametric and machine learning procedures may thus provide a useful alternative to traditional imputation procedures for deriving a set of imputed values used next for the estimation of study parameters defined as solution of population estimating equation. In this paper, we conduct an extensive empirical investigation that compares a number of imputation procedures in terms of bias and efficiency in a wide variety of settings, including high-dimens…
Alignment-free Genomic Analysis via a Big Data Spark Platform
2021
Abstract Motivation Alignment-free distance and similarity functions (AF functions, for short) are a well-established alternative to pairwise and multiple sequence alignments for many genomic, metagenomic and epigenomic tasks. Due to data-intensive applications, the computation of AF functions is a Big Data problem, with the recent literature indicating that the development of fast and scalable algorithms computing AF functions is a high-priority task. Somewhat surprisingly, despite the increasing popularity of Big Data technologies in computational biology, the development of a Big Data platform for those tasks has not been pursued, possibly due to its complexity. Results We fill this impo…
Semantic Computing of Moods Based on Tags in Social Media of Music
2014
Social tags inherent in online music services such as Last.fm provide a rich source of information on musical moods. The abundance of social tags makes this data highly beneficial for developing techniques to manage and retrieve mood information, and enables study of the relationships between music content and mood representations with data substantially larger than that available for conventional emotion research. However, no systematic assessment has been done on the accuracy of social tags and derived semantic models at capturing mood information in music. We propose a novel technique called Affective Circumplex Transformation (ACT) for representing the moods of music tracks in an interp…
Characterizing the maximum parameter of the total-variation denoising through the pseudo-inverse of the divergence
2017
International audience; We focus on the maximum regularization parameter for anisotropic total-variation denoising. It corresponds to the minimum value of the regularization parameter above which the solution remains constant. While this value is well know for the Lasso, such a critical value has not been investigated in details for the total-variation. Though, it is of importance when tuning the regularization parameter as it allows fixing an upper-bound on the grid for which the optimal parameter is sought. We establish a closed form expression for the one-dimensional case, as well as an upper-bound for the two-dimensional case, that appears reasonably tight in practice. This problem is d…
Semantic HMC for Big Data Analysis
2014
International audience; Analyzing Big Data can help corporations to im-prove their efficiency. In this work we present a new vision to derive Value from Big Data using a Semantic Hierarchical Multi-label Classification called Semantic HMC based in a non-supervised Ontology learning process. We also proposea Semantic HMC process, using scalable Machine-Learning techniques and Rule-based reasoning.
Implicit differentiation for fast hyperparameter selection in non-smooth convex learning
2022
International audience; Finding the optimal hyperparameters of a model can be cast as a bilevel optimization problem, typically solved using zero-order techniques. In this work we study first-order methods when the inner optimization problem is convex but non-smooth. We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian. Using implicit differentiation, we show it is possible to leverage the non-smoothness of the inner problem to speed up the computation. Finally, we provide a bound on the error made on the hypergradient when the inner optimization problem is solved approxim…
Dimensionality Reduction via Regression in Hyperspectral Imagery
2015
This paper introduces a new unsupervised method for dimensionality reduction via regression (DRR). The algorithm belongs to the family of invertible transforms that generalize Principal Component Analysis (PCA) by using curvilinear instead of linear features. DRR identifies the nonlinear features through multivariate regression to ensure the reduction in redundancy between he PCA coefficients, the reduction of the variance of the scores, and the reduction in the reconstruction error. More importantly, unlike other nonlinear dimensionality reduction methods, the invertibility, volume-preservation, and straightforward out-of-sample extension, makes DRR interpretable and easy to apply. The pro…
A Unified SVM Framework for Signal Estimation
2013
This paper presents a unified framework to tackle estimation problems in Digital Signal Processing (DSP) using Support Vector Machines (SVMs). The use of SVMs in estimation problems has been traditionally limited to its mere use as a black-box model. Noting such limitations in the literature, we take advantage of several properties of Mercer's kernels and functional analysis to develop a family of SVM methods for estimation in DSP. Three types of signal model equations are analyzed. First, when a specific time-signal structure is assumed to model the underlying system that generated the data, the linear signal model (so called Primal Signal Model formulation) is first stated and analyzed. T…