Search results for "sparse"
showing 10 items of 75 documents
Do Country Stereotypes Exist in PISA? A Clustering Approach for Large, Sparse, and Weighted Data
2015
Certain stereotypes can be associated with people from different countries. For example, the Italians are expected to be emotional, the Germans functional, and the Chinese hard-working. In this study, we cluster all 15-year-old students representing the 68 different nations and territories that participated in the latest Programme for International Student Assessment (PISA 2012). The hypothesis is that the students will start to form their own country groups when clustered according to the scale indices that summarize many of the students’ characteristics. In order to meet PISA data analysis requirements, we use a novel combination of our previously published algorithmic components to reali…
Analysing Student Performance using Sparse Data of Core Bachelor Courses
2015
Curricula for Computer Science (CS) degrees are characterized by the strong occupational orientation of the discipline. In the BSc degree structure, with clearly separate CS core studies, the learning skills for these and other required courses may vary a lot, which is shown in students' overall performance. To analyze this situation, we apply nonstandard educational data mining techniques on a preprocessed log file of the passed courses. The joint variation in the course grades is studied through correlation analysis while intrinsic groups of students are created and analyzed using a robust clustering technique. Since not all students attended all courses, there is a nonstructured sparsity…
CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations
2013
We propose the Sliced Coordinate Format (SCOO) for Sparse Matrix-Vector Multiplication on GPUs.An associated CUDA implementation which takes advantage of atomic operations is presented.We propose partitioning methods to transform a given sparse matrix into SCOO format.An efficient Dual-GPU implementation which overlaps computation and communication is described.Extensive performance comparisons of SCOO compared to other formats on GPUs and CPUs are provided. Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU are outperforming their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA i…
Extending graphical models for applications: on covariates, missingness and normality
2021
The authors of the paper “Bayesian Graphical Models for Modern Biological Applications” have put forward an important framework for making graphical models more useful in applied settings. In this discussion paper, we give a number of suggestions for making this framework even more suitable for practical scenarios. Firstly, we show that an alternative and simplified definition of covariate might make the framework more manageable in high-dimensional settings. Secondly, we point out that the inclusion of missing variables is important for practical data analysis. Finally, we comment on the effect that the Gaussianity assumption has in identifying the underlying conditional independence graph…
dglars: An R Package to Estimate Sparse Generalized Linear Models
2014
dglars is a publicly available R package that implements the method proposed in Augugliaro, Mineo, and Wit (2013), developed to study the sparse structure of a generalized linear model. This method, called dgLARS, is based on a differential geometrical extension of the least angle regression method proposed in Efron, Hastie, Johnstone, and Tibshirani (2004). The core of the dglars package consists of two algorithms implemented in Fortran 90 to efficiently compute the solution curve: a predictor-corrector algorithm, proposed in Augugliaro et al. (2013), and a cyclic coordinate descent algorithm, proposed in Augugliaro, Mineo, and Wit (2012). The latter algorithm, as shown here, is significan…
Differential geometric least angle regression: a differential geometric approach to sparse generalized linear models
2013
Summary Sparsity is an essential feature of many contemporary data problems. Remote sensing, various forms of automated screening and other high throughput measurement devices collect a large amount of information, typically about few independent statistical subjects or units. In certain cases it is reasonable to assume that the underlying process generating the data is itself sparse, in the sense that only a few of the measured variables are involved in the process. We propose an explicit method of monotonically decreasing sparsity for outcomes that can be modelled by an exponential family. In our approach we generalize the equiangular condition in a generalized linear model. Although the …
The smallest singular value of a shifted $d$-regular random square matrix
2017
We derive a lower bound on the smallest singular value of a random d-regular matrix, that is, the adjacency matrix of a random d-regular directed graph. Specifically, let $$C_1<d< c n/\log ^2 n$$ and let $$\mathcal {M}_{n,d}$$ be the set of all $$n\times n$$ square matrices with 0 / 1 entries, such that each row and each column of every matrix in $$\mathcal {M}_{n,d}$$ has exactly d ones. Let M be a random matrix uniformly distributed on $$\mathcal {M}_{n,d}$$ . Then the smallest singular value $$s_{n} (M)$$ of M is greater than $$n^{-6}$$ with probability at least $$1-C_2\log ^2 d/\sqrt{d}$$ , where c, $$C_1$$ , and $$C_2$$ are absolute positive constants independent of any other parameter…
The Induced Smoothed lasso: A practical framework for hypothesis testing in high dimensional regression.
2020
This paper focuses on hypothesis testing in lasso regression, when one is interested in judging statistical significance for the regression coefficients in the regression equation involving a lot of covariates. To get reliable p-values, we propose a new lasso-type estimator relying on the idea of induced smoothing which allows to obtain appropriate covariance matrix and Wald statistic relatively easily. Some simulation experiments reveal that our approach exhibits good performance when contrasted with the recent inferential tools in the lasso framework. Two real data analyses are presented to illustrate the proposed framework in practice.
Adaptive sparse representation of continuous input for tsetlin machines based on stochastic searching on the line
2021
This paper introduces a novel approach to representing continuous inputs in Tsetlin Machines (TMs). Instead of using one Tsetlin Automaton (TA) for every unique threshold found when Booleanizing continuous input, we employ two Stochastic Searching on the Line (SSL) automata to learn discriminative lower and upper bounds. The two resulting Boolean features are adapted to the rest of the clause by equipping each clause with its own team of SSLs, which update the bounds during the learning process. Two standard TAs finally decide whether to include the resulting features as part of the clause. In this way, only four automata altogether represent one continuous feature (instead of potentially h…
High performance algorithms based on a new wawelet expansion for time dependent acoustics obstale scattering
2007
This paper presents a highly parallelizable numerical method to solve time dependent acoustic obstacle scattering problems. The method proposed is a generalization of the ``operator expansion method" developed by Recchioni and Zirilli [SIAM J.~Sci.~Comput., 25 (2003), 1158-1186]. The numerical method proposed reduces, via a perturbative approach, the solution of the scattering problem to the solution of a sequence of systems of first kind integral equations. The numerical solution of these systems of integral equations is challenging when scattering problems involving realistic obstacles and small wavelengths are solved. A computational method has been developed to solve these challenging p…