Search results for "Dataset"

Showing 10 of 37 documents

A Stochastic Variance Factor Model for Large Datasets and an Application to S&P Data

2008

The aim of this paper is to consider multivariate stochastic volatility models for large dimensional datasets. We suggest the use of the principal component methodology of Stock and Watson [Stock, J.H., Watson, M.W., 2002. Macroeconomic forecasting using diffusion indices. Journal of Business and Economic Statistics, 20, 147–162] for the stochastic volatility factor model discussed by Harvey, Ruiz, and Shephard [Harvey, A.C., Ruiz, E., Shephard, N., 1994. Multivariate Stochastic Variance Models. Review of Economic Studies, 61, 247–264]. We provide theoretical and Monte Carlo results on this method and apply it to S&P data.

Economics and Econometrics; Multivariate statistics; Principal components; Stochastic volatility; jel:C32; jel:C33; jel:G12; Factor model; Principal component analysis; Econometrics; Economics; Stochastic volatility Factor models Principal components; Stochastic volatility; forecasting; stochastic volatility; large dataset; Finance; Factor analysis

researchProduct

An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments

2020

The problem of training with a small set of positive samples is known as few-shot learning (FSL). It is widely known that traditional deep learning (DL) algorithms usually show very good performance when trained with large datasets. However, in many applications, it is not possible to obtain such a high number of samples. In the image domain, typical FSL applications include those related to face recognition. In the audio domain, music fraud or speaker recognition can clearly benefit from FSL methods. This paper deals with the application of FSL to the detection of specific and intentional acoustic events given by different types of sound alarms, such as door bells or fire alarms, usin…

FOS: Computer and information sciences; Computer Science - Machine Learning; Sound (cs.SD); sound processing; audio dataset; machine listening; UNESCO::CIENCIAS TECNOLÓGICAS; Computer Science - Sound; Machine Learning (cs.LG); classification; Artificial Intelligence; Audio and Speech Processing (eess.AS); Signal Processing; FOS: Electrical engineering electronic engineering information engineering; few-shot learning; open-set recognition; Computer Vision and Pattern Recognition; Software; Electrical Engineering and Systems Science - Audio and Speech Processing

researchProduct

Human experts vs. machines in taxa recognition

2020

The step of expert taxa recognition currently slows down the response time of many bioassessments. Shifting to quicker and cheaper state-of-the-art machine learning approaches is still met with expert scepticism towards the ability and logic of machines. In our study, we investigate both the differences in accuracy and in the identification logic of taxonomic experts and machines. We propose a systematic approach utilizing deep Convolutional Neural Nets with the transfer learning paradigm and extensively evaluate it over a multi-pose taxonomic dataset with hierarchical labels specifically created for this comparison. We also study the prediction accuracy on different ranks of taxonomic hier…

FOS: Computer and information sciences; Computer Science - Machine Learning; hahmontunnistus (tietotekniikka); Computer science; Classification approach; Taxonomic expert; 02 engineering and technology; neuroverkot; computer.software_genre; Convolutional neural network; Quantitative Biology - Quantitative Methods; Field (computer science); Machine Learning (cs.LG); Machine learning approaches; Statistics - Machine Learning; Automated approach; Deep neural networks; 0202 electrical engineering electronic engineering information engineering; Taxonomic rank; Quantitative Methods (q-bio.QM); Classification (of information); Artificial neural network; systematiikka (biologia); Prediction accuracy; Identification (information); koneoppiminen; Multi-image data; Benchmark (computing); 020201 artificial intelligence & image processing; Convolutional neural networks; Computer Vision and Pattern Recognition; Classification errors; Machine Learning (stat.ML); Machine learning; State of the art; Electrical and Electronic Engineering; Taxonomy; Support vector machines; Learning systems; business.industry; Node (networking); 020206 networking & telecommunications; Computer circuits; Hierarchical classification; Convolution; Support vector machine; FOS: Biological sciences; Taxonomic hierarchy; Signal Processing; Biomonitoring; Benchmark datasets; Artificial intelligence; business; computer; taksonit; Software

researchProduct

CArDIS : A Swedish Historical Handwritten Character and Word Dataset

2022

This paper introduces a new publicly available image-based Swedish historical handwritten character and word dataset named Character Arkiv Digital Sweden (CArDIS) (https://cardisdataset.github.io/CARDIS/). The samples in CArDIS are collected from 64,084 Swedish historical documents written by several anonymous priests between 1800 and 1900. The dataset contains 116,000 Swedish alphabet images in RGB color space with 29 classes, whereas the word dataset contains 30,000 image samples of ten popular Swedish names as well as 1,000 region names in Sweden. To examine the performance of different machine learning classifiers on the CArDIS dataset, three different experiments are conducted. In the …

Handwriting recognition; Optical character recognition software; optical character recognition (OCR); Computer Sciences; Character recognition; old handwritten style; Image recognition; Character and word recognition; VDP::Teknologi: 500; Datavetenskap (datalogi); Machine learning; Swedish handwritten word dataset; machine learning methods; Feature extraction; Hidden Markov models; Swedish handwritten character dataset

researchProduct

Setting up of a machine learning algorithm for the identification of severe liver fibrosis profile in the general US population cohort

2022

Background: The progress of digital transformation in clinical practice opens the door to transforming the current clinical line for liver disease diagnosis from a late-stage diagnosis approach to an early-stage based one. Early diagnosis of liver fibrosis can prevent the progression of the disease and decrease liver-related morbidity and mortality. We developed here a machine learning (ML) algorithm containing standard parameters that can identify liver fibrosis in the general US population. Materials and methods: Starting from a public database (National Health and Nutrition Examination Survey, NHANES), representative of the American population with 7265 eligible subjects (control populati…

Imbalanced dataset; Machine learning; Oversampling technique; Liver fibrosis; NHANES; Health Informatics; test performance evaluation

researchProduct

Victimisation and life satisfaction of gay and bisexual individuals in 44 European countries: the moderating role of country-level and person-level a…

2018

We examined the link between victimisation and life satisfaction for 85,301 gay and bisexual individuals across 44 European countries. We expected this negative link to be stronger when the internalised homonegativity of the victim was high (e.g. because the victim is more vulnerable) and weaker when victimisation occurs in countries that express intolerance towards homosexuality (e.g. because in such contexts victims expect victimisation more and they attribute it to their external environment). Additionally, we expected internalised homonegativity to relate negatively to life satisfaction. Multilevel analyses revealed that victimisation (i.e. verbal insults, threats of violence, minor or …

Male; Health (social science); soziale Probleme; 050109 social psychology; Personal Satisfaction; 20500; Developmental psychology; violence; ddc:150; Surveys and Questionnaires; Psychology; Homosexuality; Crime Victims; Gewalt; Social policy; media_common; 05 social sciences; Homosexuality; homosexuality; psychophysical stress; Lebenszufriedenheit; Europe; anti-gay victimisation; internalised homonegativity; minority stress; European Values Study 2008 4th Wave Integrated Dataset. GESIS Data Archive Cologne Germany ZA4800 Dataset Version 2.0.0 (2010-11-30); Soziale Probleme und Sozialdienste; Bisexuality; 10700; Sozialpsychologie; Bisexualität; Europa; 0305 other medical science; Psychology; Social psychology; Adult; Social Psychology; Social Problems; Sexual Behavior; Viktimisierung; media_common.quotation_subject; satisfaction with life; Mehrebenenanalyse; Violence; Stress; Victimisation; 03 medical and health sciences; Country level; Humans; 0501 psychology and cognitive sciences; 030505 public health; minority; victimization; Public Health Environmental and Occupational Health; Life satisfaction; Diskriminierung; Minority stress; multi-level analysis; ddc:360; Attitude; Psychologie; Minderheit; bisexuality; Social problems and services; Homosexualität; discrimination; Culture, Health & Sexuality

researchProduct

Disclosing progress in cancer survival with less delay

2019

Cancer registration plays a key role in monitoring the burden of cancer. However, cancer registry (CR) data are usually made available with substantial delay to ensure best possible completeness of case ascertainment. Here, we investigate empirically with routinely available data whether such a delay is mandatory for survival analyses or whether data can be used earlier to provide more up-to-date survival estimates. We compared distributions of prognostic factors and period relative survival estimates for three population-based CRs in Germany (Schleswig-Holstein (SH), Rhineland-Palatinate (RP), Saarland (SA)) computed on datasets extracted one (DY+1) to 5 years after the year of diagnosis (…

Male; Oncology; Cancer Research; medicine.medical_specialty; Time Factors; Population; Cancer registration; Empirical Research; 03 medical and health sciences; 0302 clinical medicine; Germany; Neoplasms; Internal medicine; medicine; Humans; Registries; education; Lung cancer; education.field_of_study; Relative survival; business.industry; Cancer survival; medicine.disease; Survival Analysis; Cancer registry; Case ascertainment; Oncology; 030220 oncology & carcinogenesis; Female; business; Reference dataset; International Journal of Cancer

researchProduct

Perspectives on the Impact of Sampling Design and Intensity on Soil Microbial Diversity Estimates

2019

Soil bacterial communities have long been recognized as important ecosystem components, and have been the focus of many local and regional studies. However, there is a lack of data at large spatial scales on the biodiversity of soil microorganisms; national or more extensive studies to date have typically consisted of low replication of haphazardly collected samples. This has led to large spatial gaps in soil microbial biodiversity data. Using a pre-existing dataset of bacterial community composition across a 16-km regular sampling grid in France, we show that the number of detected OTUs changes little under different sampling designs (grid, random, or representative), but increases with t…

Microbiology (medical); Biome; lcsh:QR1-502; Biodiversity; Distribution (economics); Sample (statistics); Microbiology; lcsh:Microbiology; 03 medical and health sciences; global datasets; Sampling design; Citizen science; Ecosystem; national datasets; biogeography; 030304 developmental biology; biodiversity; 0303 health sciences; 030306 microbiology; business.industry; soil bacteria; Environmental resource management; Sampling (statistics); Perspective; Environmental science; business; Frontiers in Microbiology

researchProduct

UNCLES: Method for the identification of genes differentially consistently co-expressed in a specific subset of datasets

2015

Background: Collective analysis of the increasingly emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is a common practice to test methods by application to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently repres…

Multiple datasets analysis; Methodology Article; Gene Expression Profiling; Cell Cycle; Genes Fungal; Bi-CoPaM; Saccharomyces cerevisiae; Consistent co-expression; Biochemistry; Computer Science Applications; ComputingMethodologies_PATTERNRECOGNITION; Genome-wide analysis; UNCLES; Cluster Analysis; Genome Fungal; Molecular Biology; Oligonucleotide Array Sequence Analysis

researchProduct

Assessment of the 4-factor score: Retrospective analysis of 586 CLL patients receiving ibrutinib. A campus CLL study

2021

Not Available

Oncology; Male; chronic B cell leukemia; chronic lymphocytic leukemia; ibrutinib; 4-factor score; prognosis; Datasets as Topic; Severity of Illness Index; chemistry.chemical_compound; Piperidines; Retrospective analysis; Multicenter Studies as Topic; Chronic; Leukemia; Hematology; Middle Aged; Prognosis; Lymphocytic; Progression-Free Survival; Ibrutinib; Female; medicine.medical_specialty; real-word study; Factor score; Antineoplastic Agents; Adenine; Aged; Antineoplastic Agents; Datasets as Topic; Female; Follow-Up Studies; Humans; Leukemia Lymphocytic Chronic B-Cell; Male; Middle Aged; Multicenter Studies as Topic; Piperidines; Prognosis; Progression-Free Survival; Proportional Hazards Models; Protein Kinase Inhibitors; Reproducibility of Results; Retrospective Studies; Risk Assessment; Severity of Illness Index; Survival Analysis; Risk Assessment; NO; ibrutinib; Internal medicine; Severity of illness; medicine; Humans; Progression-free survival; Protein Kinase Inhibitors; Survival analysis; Aged; Proportional Hazards Models; Retrospective Studies; business.industry; Proportional hazards model; Adenine; B-Cell; Reproducibility of Results; Retrospective cohort study; Leukemia Lymphocytic Chronic B-Cell; Survival Analysis; Settore MED/15 - MALATTIE DEL SANGUE; chemistry; business; chronic lymphocytic leukaemia; Follow-Up Studies

researchProduct