Search results for "Data Science"
showing 10 items of 495 documents
Expert-based versus citation-based ranking of scholarly and scientific publication channels
2016
The Finnish publication channel quality ranking system was established in 2010. The system is expert-based: separate panels decide and update the rankings of the set of publication channels allocated to them. The aggregated rankings have a notable role in the allocation of public resources to universities. The purpose of this article is to analyze this national ranking system. The analysis is mainly based on two publicly available databases containing the publication source information and the actual national publication activity information. Using citation-based indicators and other available information with association rule mining, decision trees, and confusion matrices, …
A model-based approach to Spotify data analysis: a Beta GLMM
2020
Digital music distribution is increasingly powered by automated mechanisms that continuously capture, sort and analyze large amounts of Web-based data. This paper deals with the management of songs' audio features from a statistical point of view. In particular, it explores the data capture mechanisms enabled by the Spotify Web API and suggests statistical tools for the analysis of these data. Special attention is devoted to song popularity, and a Beta model including random effects is proposed in order to give a first answer to questions like: which are the determinants of popularity? The identification of a model able to describe this relationship, the determination within the set of char…
An overview of robust Bayesian analysis
1994
Robust Bayesian analysis is the study of the sensitivity of Bayesian answers to uncertain inputs. This paper seeks to provide an overview of the subject, one that is accessible to statisticians outside the field. Recent developments in the area are also reviewed, though with very uneven emphasis. © 1994 SEIO.
Textual data compression in computational biology: a synopsis
2009
Motivation: Textual data compression, and the associated techniques coming from information theory, are often perceived as being of interest for data communication and storage. However, they are also deeply related to classification and data mining and analysis. In recent years, a substantial effort has been made for the application of textual data compression techniques to various computational biology tasks, ranging from storage and indexing of large datasets to comparison and reverse engineering of biological networks. Results: The main focus of this review is on a systematic presentation of the key areas of bioinformatics and computational biology where compression has been use…
Spatio-temporal small area surveillance of the COVID-19 pandemic
2022
The emergence of COVID-19 requires new effective tools for epidemiological surveillance. Spatio-temporal disease mapping models, which allow dealing with small units of analysis, are a priority in this context. These models provide geographically detailed and temporally updated overviews of the current state of the pandemic, making public health interventions more effective. These models also allow estimating epidemiological indicators highly demanded for COVID-19 surveillance, such as the instantaneous reproduction number R_t, even for small areas. In this paper, we propose a new spatio-temporal spline model particularly suited for COVID-19 surveillance, which allows estimating a…
Contributed discussion on article by Pratola
2016
The author should be commended for his outstanding contribution to the literature on Bayesian regression tree models. The author introduces three innovative sampling approaches which allow for efficient traversal of the model space. In this response, we add a fourth alternative.
Systematic handling of missing data in complex study designs: experiences from the Health 2000 and 2011 Surveys
2016
We present a systematic approach to the practical and comprehensive handling of missing data motivated by our experiences of analyzing longitudinal survey data. We consider the Health 2000 and 2011 Surveys (BRIF8901) where increased non-response and non-participation from 2000 to 2011 was a major issue. The model assumptions involved in the complex sampling design, repeated measurements design, non-participation mechanisms and associations are presented graphically using methodology previously defined as a causal model with design, i.e. a functional causal model extended with the study design. This tool forces the statistician to make the study design and the missing-data mechanism explicit…
Complex Detection in Protein-Protein Interaction Networks: A Compact Overview for Researchers and Practitioners
2012
The availability of large volumes of protein-protein interaction data has allowed the study of biological networks to unveil the complex structure and organization in the cell. It has been recognized by biologists that proteins interacting with each other often participate in the same biological processes, and that protein modules may often be associated with specific biological functions. Thus the detection of protein complexes is an important research problem in systems biology. In this review, recent graph-based approaches to clustering protein interaction networks are described and classified with respect to common peculiarities. The goal is to provide a useful guide and referenc…
Analysis of Chromatin Structure and Composition
1989
Introduction: Biochemistry, like many other sciences, is currently undergoing increasing specialization which is thought to be unavoidable because of the rapid progress within this field. Obviously education in Biochemistry and Molecular Biology is also affected. Consequently, the student may lose the ability to integrate his knowledge, which should be a requirement during the training of a scientist. The solution to this problem is quite easy in the case of theoretical courses because, here, the lecturer may include several 'integrative lessons' which give a global view of previously explained facts and place them within the general context of the course. However, in practical courses it is…
Standardized general purpose technologies: A note
2021
General purpose technologies (GPTs) have been important drivers of industrial revolutions and economic development, but their link to standards has not been analyzed systematically. We document that all of the most common examples of GPTs—steam, railway, electricity and information (and communication) technology—have been subject to standardization efforts over time. Standards development has acted as an institution that has more or less made an impact on the technological progress in these fields and their application sectors. While empirical studies of GPTs have utilized, among other things, patent data to identify GPTs, our observations indicate that the analysis of standards organizatio…