Search results for "Data Science"

Showing 10 of 495 documents

Expert-based versus citation-based ranking of scholarly and scientific publication channels

2016

Abstract The Finnish publication channel quality ranking system was established in 2010. The system is expert-based: separate panels decide and update the rankings of the set of publication channels allocated to them. The aggregated rankings play a notable role in the allocation of public resources to universities. The purpose of this article is to analyze this national ranking system. The analysis is mainly based on two publicly available databases containing the publication source information and the actual national publication activity information. Using citation-based indicators and other available information with association rule mining, decision trees, and confusion matrices, …

Statistics and Probability; Association rule learning; Performance-based funding; Computer science; media_common.quotation_subject; Decision tree; Scopus; Management Science and Operations Research; Library and Information Sciences; 050905 science studies; Modelling and Simulation; Scopus; Quality (business); Reference model; media_common; ta113; Information retrieval; Applied Mathematics; 05 social sciences; Rank (computer programming); Journal citation reports; Data science; Computer Science Applications; Ranking; Finnish ranking system; 0509 other social sciences; 050904 information & library sciences; Citation; Journal evaluation; Journal of Informetrics
researchProduct

A model-based approach to Spotify data analysis: a Beta GLMM

2020

Digital music distribution is increasingly powered by automated mechanisms that continuously capture, sort and analyze large amounts of Web-based data. This paper deals with the management of songs' audio features from a statistical point of view. In particular, it explores the data-capture mechanisms enabled by the Spotify Web API and suggests statistical tools for the analysis of these data. Special attention is devoted to song popularity, and a Beta model including random effects is proposed in order to give a first answer to questions such as: what are the determinants of popularity? The identification of a model able to describe this relationship, the determination within the set of char…

Statistics and Probability; Beta GLMM; Distribution (number theory); Computer science; Application Notes; 0211 other engineering and technologies; 02 engineering and technology; computer.software_genre; Web API; 01 natural sciences; Set (abstract data type); 010104 statistics & probability; Spotify Web API audio features Popularity Index Beta GLMM; sort; Spotify Web API; 0101 mathematics; Digital audio; 021103 operations research; Point (typography); Random effects model; Data science; Popularity; Identification (information); Popularity Index; Data mining; Statistics Probability and Uncertainty; computer; audio feature
researchProduct

An overview of robust Bayesian analysis

1994

Robust Bayesian analysis is the study of the sensitivity of Bayesian answers to uncertain inputs. This paper seeks to provide an overview of the subject, one that is accessible to statisticians outside the field. Recent developments in the area are also reviewed, though with very uneven emphasis. © 1994 SEIO.

Statistics and Probability; Computer science; Bayesian probability; computer.software_genre; Data science; Field (computer science); Bayesian robustness; N/A; Robust Bayesian analysis; Prior probability; Data mining; Sensitivity (control systems); Statistics Probability and Uncertainty; computer
researchProduct

Textual data compression in computational biology: a synopsis.

2009

Abstract Motivation: Textual data compression, and the associated techniques coming from information theory, are often perceived as being of interest for data communication and storage. However, they are also deeply related to classification and data mining and analysis. In recent years, a substantial effort has been made for the application of textual data compression techniques to various computational biology tasks, ranging from storage and indexing of large datasets to comparison and reverse engineering of biological networks. Results: The main focus of this review is on a systematic presentation of the key areas of bioinformatics and computational biology where compression has been use…

Statistics and Probability; Databases Factual; business.industry; Computer science; media_common.quotation_subject; Search engine indexing; compression data; Computational Biology; Information Storage and Retrieval; Computational biology; Biochemistry; Data science; Computer Science Applications; Computational Mathematics; Presentation; Software; Computational Theory and Mathematics; Benchmark (computing); business; Molecular Biology; Biological network; Software; Data compression; media_common; Bioinformatics (Oxford, England)
researchProduct

Spatio-temporal small area surveillance of the COVID-19 pandemic

2022

Abstract The emergence of COVID-19 requires new effective tools for epidemiological surveillance. Spatio-temporal disease mapping models, which allow dealing with small units of analysis, are a priority in this context. These models provide geographically detailed and temporally updated overviews of the current state of the pandemic, making public health interventions more effective. These models also allow estimating epidemiological indicators highly demanded for COVID-19 surveillance, such as the instantaneous reproduction number R_t, even for small areas. In this paper, we propose a new spatio-temporal spline model particularly suited for COVID-19 surveillance, which allows estimating a…

Statistics and Probability; medicine.medical_specialty; Coronavirus disease 2019 (COVID-19); instantaneous reproduction number; Computer science; spatio-temporal modelling; Public health; Public health interventions; disease mapping; COVID-19; Context (language use); Management Monitoring Policy and Law; Data science; Article; Spatio-temporal modelling; Unit of analysis; Pandemic; medicine; Epidemiological surveillance; Disease mapping; Instantaneous reproduction number; Computers in Earth Sciences; Tourism
researchProduct

Contributed discussion on article by Pratola

2016

The author should be commended for his outstanding contribution to the literature on Bayesian regression tree models. The author introduces three innovative sampling approaches which allow for efficient traversal of the model space. In this response, we add a fourth alternative.

Statistics and Probability; model selection; Markov Chain Monte Carlo (MCMC); Bayesian regression tree; Computer science; Big data; Bayesian regression tree (BRT) models; ComputingMilieux_LEGALASPECTSOFCOMPUTING; birth–death process; Machine learning; computer.software_genre; Sequential Monte Carlo methods; 01 natural sciences; population Markov chain Monte Carlo; 010104 statistics & probability; symbols.namesake; big data; 0502 economics and business; Bayesian Regression Trees (BART); 0101 mathematics; 050205 econometrics; Bayesian treed regression; Multiple Try Metropolis algorithms; INFERÊNCIA ESTATÍSTICA; business.industry; Applied Mathematics; Model selection; 05 social sciences; Rejection sampling; Data science; Variable-order Bayesian network; Tree (data structure); Tree traversal; Markov chain Monte Carlo; continuous time Markov process; symbols; Artificial intelligence; business; Bayesian linear regression; communication-free; computer; Gibbs sampling; Bayesian Analysis
researchProduct

Systematic handling of missing data in complex study designs : experiences from the Health 2000 and 2011 Surveys

2016

We present a systematic approach to the practical and comprehensive handling of missing data motivated by our experiences of analyzing longitudinal survey data. We consider the Health 2000 and 2011 Surveys (BRIF8901) where increased non-response and non-participation from 2000 to 2011 was a major issue. The model assumptions involved in the complex sampling design, repeated measurements design, non-participation mechanisms and associations are presented graphically using methodology previously defined as a causal model with design, i.e. a functional causal model extended with the study design. This tool forces the statistician to make the study design and the missing-data mechanism explicit…

Statistics and Probability; multiple imputation; Computer science; computer.software_genre; 01 natural sciences; 010104 statistics & probability; 03 medical and health sciences; 0302 clinical medicine; non-response; Sampling design; 030212 general & internal medicine; 0101 mathematics; Causal model; ta112; Clinical study design; Inverse probability weighting; Sampling (statistics); non-participation; Missing data; Data science; doubly robust methods; Survey data collection; Data mining; Statistics Probability and Uncertainty; computer; inverse probability weighting; Statistician; causal model with design; Journal of Applied Statistics
researchProduct

Complex Detection in Protein-Protein Interaction Networks: A Compact Overview for Researchers and Practitioners

2012

The availability of large volumes of protein-protein interaction data has allowed the study of biological networks to unveil the complex structure and organization in the cell. It has been recognized by biologists that proteins interacting with each other often participate in the same biological processes, and that protein modules may be often associated with specific biological functions. Thus the detection of protein complexes is an important research problem in systems biology. In this review, recent graph-based approaches to clustering protein interaction networks are described and classified with respect to common peculiarities. The goal is that of providing a useful guide and referenc…

Structure (mathematical logic); Computer science; Systems biology; Cell; Data Science; Nanotechnology; Computational biology; Protein protein interaction network; Bioinformatics network analysis; medicine.anatomical_structure; medicine; Graph (abstract data type); Lecture Notes in Computer Science; Cluster analysis; Protein modules; Biological network
researchProduct

Analysis of Chromatin Structure and Composition

1989

Introduction Biochemistry, like many other sciences, is currently undergoing increasing specialization, which is thought to be unavoidable because of the rapid progress within this field. Obviously, education in Biochemistry and Molecular Biology is also affected. Consequently, the student may lose the ability to integrate his knowledge, which should be a requirement during the training of a scientist. The solution to this problem is quite easy in the case of theoretical courses because, here, the lecturer may include several 'integrative lessons' which give a global view of previously explained facts and place them within the general context of the course. However, in practical courses it is…

Structure (mathematical logic); Computer science; media_common.quotation_subject; education; Specialization (functional); Context (language use); Simplicity; Data science; Curriculum; Field (computer science); Complement (complexity); media_common; Simple (philosophy)
researchProduct

Standardized general purpose technologies: A note

2021

General purpose technologies (GPTs) have been important drivers of industrial revolutions and economic development, but their link to standards has not been analyzed systematically. We document that all of the most common examples of GPTs—steam, railway, electricity and information (and communication) technology—have been subject to standardization efforts over time. Standards development has acted as an institution that has more or less made an impact on the technological progress in these fields and their application sectors. While empirical studies of GPTs have utilized, among other things, patent data to identify GPTs, our observations indicate that the analysis of standards organizatio…

Structure (mathematical logic); Empirical research; General purpose; Standardization; Technological change; Computer science; General purpose technology; Data science; SSRN Electronic Journal
researchProduct