Search results for "Dataset"
showing 10 items of 37 documents
A Stochastic Variance Factor Model for Large Datasets and an Application to S&P Data
2008
The aim of this paper is to consider multivariate stochastic volatility models for large dimensional datasets. We suggest the use of the principal component methodology of Stock and Watson [Stock, J.H., Watson, M.W., 2002. Macroeconomic forecasting using diffusion indices. Journal of Business and Economic Statistics, 20, 147–162] for the stochastic volatility factor model discussed by Harvey, Ruiz, and Shephard [Harvey, A.C., Ruiz, E., Shephard, N., 1994. Multivariate Stochastic Variance Models. Review of Economic Studies, 61, 247–264]. We provide theoretical and Monte Carlo results on this method and apply it to S&P data.
An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments
2020
The problem of training with a small set of positive samples is known as few-shot learning (FSL). It is widely known that traditional deep learning (DL) algorithms usually show very good performance when trained with large datasets. However, in many applications, it is not possible to obtain such a high number of samples. In the image domain, typical FSL applications include those related to face recognition. In the audio domain, music fraud or speaker recognition can clearly benefit from FSL methods. This paper deals with the application of FSL to the detection of specific and intentional acoustic events given by different types of sound alarms, such as door bells or fire alarms, usin…
Human experts vs. machines in taxa recognition
2020
The step of expert taxa recognition currently slows down the response time of many bioassessments. Shifting to quicker and cheaper state-of-the-art machine learning approaches is still met with expert scepticism towards the ability and logic of machines. In our study, we investigate both the differences in accuracy and in the identification logic of taxonomic experts and machines. We propose a systematic approach utilizing deep Convolutional Neural Nets with the transfer learning paradigm and extensively evaluate it over a multi-pose taxonomic dataset with hierarchical labels specifically created for this comparison. We also study the prediction accuracy on different ranks of taxonomic hier…
CArDIS : A Swedish Historical Handwritten Character and Word Dataset
2022
This paper introduces a new publicly available image-based Swedish historical handwritten character and word dataset named Character Arkiv Digital Sweden (CArDIS) (https://cardisdataset.github.io/CARDIS/). The samples in CArDIS are collected from 64,084 Swedish historical documents written by several anonymous priests between 1800 and 1900. The dataset contains 116,000 Swedish alphabet images in RGB color space with 29 classes, whereas the word dataset contains 30,000 image samples of ten popular Swedish names as well as 1,000 region names in Sweden. To examine the performance of different machine learning classifiers on the CArDIS dataset, three different experiments are conducted. In the …
Setting up of a machine learning algorithm for the identification of severe liver fibrosis profile in the general US population cohort
2022
Background: The progress of digital transformation in clinical practice opens the door to transforming the current clinical line for liver disease diagnosis from a late-stage diagnosis approach to an early-stage based one. Early diagnosis of liver fibrosis can prevent the progression of the disease and decrease liver-related morbidity and mortality. We developed here a machine learning (ML) algorithm containing standard parameters that can identify liver fibrosis in the general US population. Materials and methods: Starting from a public database (National Health and Nutrition Examination Survey, NHANES), representative of the American population with 7265 eligible subjects (control populati…
Victimisation and life satisfaction of gay and bisexual individuals in 44 European countries: the moderating role of country-level and person-level a…
2018
We examined the link between victimisation and life satisfaction for 85,301 gay and bisexual individuals across 44 European countries. We expected this negative link to be stronger when the internalised homonegativity of the victim was high (e.g. because the victim is more vulnerable) and weaker when victimisation occurs in countries that express intolerance towards homosexuality (e.g. because in such contexts victims expect victimisation more and they attribute it to their external environment). Additionally, we expected internalised homonegativity to relate negatively to life satisfaction. Multilevel analyses revealed that victimisation (i.e. verbal insults, threats of violence, minor or …
Disclosing progress in cancer survival with less delay
2019
Cancer registration plays a key role in monitoring the burden of cancer. However, cancer registry (CR) data are usually made available with substantial delay to ensure best possible completeness of case ascertainment. Here, we investigate empirically with routinely available data whether such a delay is mandatory for survival analyses or whether data can be used earlier to provide more up-to-date survival estimates. We compared distributions of prognostic factors and period relative survival estimates for three population-based CRs in Germany (Schleswig-Holstein (SH), Rhineland-Palatinate (RP), Saarland (SA)) computed on datasets extracted one (DY+1) to 5 years after the year of diagnosis (…
Perspectives on the Impact of Sampling Design and Intensity on Soil Microbial Diversity Estimates
2019
Soil bacterial communities have long been recognized as important ecosystem components, and have been the focus of many local and regional studies. However, there is a lack of data at large spatial scales on the biodiversity of soil microorganisms; national or more extensive studies to date have typically consisted of low replication of haphazardly collected samples. This has led to large spatial gaps in soil microbial biodiversity data. Using a pre-existing dataset of bacterial community composition across a 16-km regular sampling grid in France, we show that the number of detected OTUs changes little under different sampling designs (grid, random, or representative), but increases with t…
UNCLES: Method for the identification of genes differentially consistently co-expressed in a specific subset of datasets
2015
Background: Collective analysis of the increasingly emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is a common practice to test methods by application to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently repres…
Assessment of the 4-factor score: Retrospective analysis of 586 CLL patients receiving ibrutinib. A campus CLL study
2021
Not Available