Search results for "Contingency table"
showing 8 items of 28 documents
On Rao Score and Pearson X² Statistics in Generalized Linear Models
2005
The identity of the Rao score and Pearson X² statistics is well known in the areas where the latter was first introduced: goodness-of-fit in contingency tables and binary responses. We show in this paper that the same identity holds when the two statistics are used for testing goodness-of-fit of Generalized Linear Models. We also highlight the connections that exist between the two statistics when they are used for the comparison of nested models. Finally, we discuss some merits of these unifying results.
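As an illustration of the identity mentioned in this abstract, the following minimal sketch covers only the simplest binary-response case: a binomial count tested against a fully specified null probability. The function names and the numbers (37 successes in 100 trials, null probability 0.5) are hypothetical and are not taken from the paper.

```python
import math


def pearson_x2_binomial(successes, n, p0):
    """Pearson X^2 for a binomial count against a fixed null probability p0."""
    observed = [successes, n - successes]
    expected = [n * p0, n * (1 - p0)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def rao_score_binomial(successes, n, p0):
    """Rao score statistic U(p0)^2 / I(p0) for the same hypothesis."""
    # Derivative of the binomial log-likelihood, evaluated at p0.
    score = successes / p0 - (n - successes) / (1 - p0)
    # Fisher information of n Bernoulli trials, evaluated at p0.
    fisher_info = n / (p0 * (1 - p0))
    return score ** 2 / fisher_info


# Hypothetical data: 37 successes in 100 trials, null probability 0.5.
x2 = pearson_x2_binomial(37, 100, 0.5)
rs = rao_score_binomial(37, 100, 0.5)
print(x2, rs)  # both are 6.76 up to floating-point rounding
```

Both functions reduce algebraically to (x - n·p0)² / (n·p0·(1 - p0)), which is why the two values coincide in this special case.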
Sequentially Rejective Test Procedures for Detecting Outlying Cells in One- and Two-Sample Multinomial Experiments
1985
For multiple testing of multinomial models in the case of one or two samples, we propose using test procedures based on the principle described by MARCUS, PERITZ and GABRIEL (1976). At each step of the sequentially rejective strategy, these methods rely on tests that exhaust the full α level (i.e., that are not conservative). The tests can be performed in a finite or asymptotic version.
Assessing uncertainty of voter transitions estimated from aggregated data. Application to the 2017 French presidential election
2020
Inferring electoral individual behaviour from aggregated data is a very active research area, with ramifications in sociology and political science. A new approach based on linear programming is proposed to estimate voter transitions among parties (or candidates) between two elections. Compared to other linear and quadratic programming models previously published, our approach presents two important innovations. Firstly, it explicitly deals with new entries and exits in the election census without assuming unrealistic hypotheses, enabling a reasonable estimation of vote behaviour of young electors voting for the first time. Secondly, by exploiting the information contained in the model…
Comparison of MeSH terms and KeyWords Plus terms for more accurate classification in medical research fields. A case study in cannabis research
2021
KeyWords Plus and Medical Subject Headings (MeSH) are widely used in bibliometric studies for topic mapping. The objective of this study is to compare the two description systems in documents about cannabis research to find the concordance between systems and establish whether there is neutrality in topic mapping. A total of 25,593 articles from 1970 to 2019 were drawn from Web of Science's Core Collection and Medline and analyzed. The tidytext library, Zipf's law, topic modeling tools, the contingency coefficient, Cramer's V, and Cohen's kappa were used. The results included 10,107 MeSH terms and 28,870 KeyWords Plus terms. The Zipf distribution of the terms was different for each…
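Two of the association measures named in this abstract, the contingency coefficient and Cramér's V, are both derived from the chi-square statistic of a contingency table. The sketch below uses their standard textbook definitions. The 2×2 table is made-up illustrative data, not data from the study, and the function names are hypothetical.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for independence in an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (k - 1)))


def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    n = sum(sum(row) for row in table)
    chi2 = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical 2 x 2 table of co-occurrence counts.
example = [[30, 10], [10, 30]]
print(chi_square(example))               # 20.0
print(cramers_v(example))                # 0.5
print(contingency_coefficient(example))  # sqrt(0.2), about 0.447
```

Cramér's V is scaled to lie between 0 and 1 for any table size, while the maximum of the contingency coefficient depends on the table's dimensions, so the two are not directly interchangeable.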
Computerized delimitation of odorant areas in gas-chromatography-olfactometry by kernel density estimation: Data processing on French white wines
2017
GC-O using the detection frequency method gives a list of odor events (OEs), where each OE is described by a linear retention index (LRI) and by the aromatic descriptor given by a human assessor. The aim of the experimenter is to gather the OEs in a total olfactogram, on which odorant areas (OAs) are delimited, and then to compute each detection frequency. This paper proposes a computerized mathematical method based on kernel density estimation that builds the total olfactogram as a continuous and differentiable function from the LRIs of the OEs only. The corresponding curve looks like a chromatogram, the peaks of which are potential OAs. The limits of an OA are the LRI of the t…
A multiple-response chi-square framework for the analysis of Free-Comment and Check-All-That-Apply data
2021
Free-Comment (FC) and Check-All-That-Apply (CATA) provide a contingency table containing citation counts of descriptors by products. The analyses performed on this table are most often related to the chi-square statistic. However, such practices are not well suited because they consider experimental units as being the citations (one descriptor for one product by one subject) while the evaluations (vector of citations for one product by one subject) should be considered instead. This results in incorrect expected frequencies under the null hypothesis of independence between products and descriptors and thus in an incorrect chi-square statistic. Thus, analyses related …
FPGA-based Acceleration of Detecting Statistical Epistasis in GWAS
2014
Genotype-by-genotype interactions (epistasis) are believed to be a significant source of unexplained genetic variation causing complex chronic diseases but have been ignored in genome-wide association studies (GWAS) due to the computational burden of analysis. In this work we show how to benefit from FPGA technology for highly parallel creation of contingency tables in a systolic chain with a subsequent statistical test. We present the implementation for the FPGA-based hardware platform RIVYERA S6-LX150 containing 128 Xilinx Spartan6-LX150 FPGAs. For performance evaluation we compare against the method iLOCi[9]. iLOCi claims to outperform other available tools in terms of accuracy.…
Improving Estimates Accuracy of Voter Transitions. Two New Algorithms for Ecological Inference Based on Linear Programming
2022
The estimation of RxC ecological inference contingency tables from aggregate data is one of the most salient and challenging problems in the field of quantitative social sciences, with major solutions proposed from both the ecological regression and the mathematical programming frameworks. In recent decades, there has been a drive to find solutions stemming from the former, with the latter being less active. From the mathematical programming framework, this paper suggests a new direction for tackling this problem. For the first time in the literature, a procedure based on linear programming is proposed to attain estimates of local contingency tables. Based on this and the homogeneity hypot…