Search results for "dataset"
Showing 10 of 77 documents
A European Multi Lake Survey dataset of environmental variables, phytoplankton pigments and cyanotoxins
2018
Under ongoing climate change and increasing anthropogenic activity, which continuously challenge ecosystem resilience, an in-depth understanding of ecological processes is urgently needed. Lakes, as providers of numerous ecosystem services, face multiple stressors that threaten their functioning. Harmful cyanobacterial blooms are a persistent problem resulting from nutrient pollution and climate-change-induced stressors, such as poor transparency, increased water temperature and enhanced stratification. Consistency in data collection and analysis methods is necessary to achieve fully comparable datasets and statistical validity, avoiding issues linked to disparate data sources. The Europea…
A Stochastic Variance Factor Model for Large Datasets and an Application to S&P Data
2008
The aim of this paper is to consider multivariate stochastic volatility models for large dimensional datasets. We suggest the use of the principal component methodology of Stock and Watson [Stock, J.H., Watson, M.W., 2002. Macroeconomic forecasting using diffusion indices. Journal of Business and Economic Statistics, 20, 147–162] for the stochastic volatility factor model discussed by Harvey, Ruiz, and Shephard [Harvey, A.C., Ruiz, E., Shephard, N., 1994. Multivariate Stochastic Variance Models. Review of Economic Studies, 61, 247–264]. We provide theoretical and Monte Carlo results on this method and apply it to S&P data.
USE-Net: Incorporating Squeeze-and-Excitation blocks into U-Net for prostate zonal segmentation of multi-institutional MRI datasets
2019
Prostate cancer is the most common malignant tumor in men, but prostate Magnetic Resonance Imaging (MRI) analysis remains challenging. Besides whole prostate gland segmentation, the ability to delineate the blurry boundary between the Central Gland (CG) and the Peripheral Zone (PZ) can support differential diagnosis, since tumor frequency and severity differ between these regions. To tackle the prostate zonal segmentation task, we propose a novel Convolutional Neural Network (CNN), called USE-Net, which incorporates Squeeze-and-Excitation (SE) blocks into U-Net. Specifically, SE blocks are added after every Encoder block (Enc USE-Net) or after every Encoder-Decoder block (Enc-Dec USE-Net). This study ev…
An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments
2020
The problem of training with a small set of positive samples is known as few-shot learning (FSL). It is widely known that traditional deep learning (DL) algorithms usually perform very well when trained with large datasets. However, in many applications it is not possible to obtain such a large number of samples. In the image domain, typical FSL applications include face recognition. In the audio domain, music fraud or speaker recognition can clearly benefit from FSL methods. This paper deals with the application of FSL to the detection of specific and intentional acoustic events produced by different types of sound alarms, such as doorbells or fire alarms, usin…
Human experts vs. machines in taxa recognition
2020
The step of expert taxa recognition currently slows down the response time of many bioassessments. Shifting to quicker and cheaper state-of-the-art machine learning approaches is still met with expert scepticism towards the ability and logic of machines. In our study, we investigate both the differences in accuracy and in the identification logic of taxonomic experts and machines. We propose a systematic approach utilizing deep Convolutional Neural Nets with the transfer learning paradigm and extensively evaluate it over a multi-pose taxonomic dataset with hierarchical labels specifically created for this comparison. We also study the prediction accuracy on different ranks of taxonomic hier…
Fast Estimation of Diffusion Tensors under Rician noise by the EM algorithm
2016
Diffusion tensor imaging (DTI) is widely used to characterize, in vivo, the white matter of the central nervous system (CNS). This biological tissue contains rich anatomical, structural and orientational information about fibers in the human brain. Spectral data from the displacement distribution of water molecules located in the brain tissue are collected by a magnetic resonance scanner and acquired in the Fourier domain. After the Fourier inversion, the noise distribution is Gaussian in both the real and imaginary parts and, as a consequence, the recorded magnitude data are corrupted by Rician noise. Statistical estimation of diffusion leads to a non-linear regression problem. In this paper, we present a f…
Ancestry and demography and descendants of Iron Age nomads of the Eurasian Steppe
2017
During the 1st millennium before the Common Era (BCE), nomadic tribes associated with the Iron Age Scythian culture spread over the Eurasian Steppe, covering a territory of more than 3,500 km in breadth. To understand the demographic processes behind the spread of the Scythian culture, we analysed genomic data from eight individuals and a mitochondrial dataset of 96 individuals originating in eastern and western parts of the Eurasian Steppe. Genomic inference reveals that Scythians in the east and the west of the steppe zone can best be described as a mixture of Yamnaya-related ancestry and an East Asian component. Demographic modelling suggests independent origins for eastern and western g…
Characterization of a fractured basement reservoir using high-resolution 3D seismic and logging datasets: A case study of the Sab'atayn Basin, Yemen
2018
The Sab'atayn Basin, in central Yemen, is one of the most prolific Mesozoic hydrocarbon basins. It has many oil-producing fields, including the Habban Field, with oil occurrences in fractured basement rocks. A comprehensive seismic analysis of the fractured basement reservoirs was performed to identify the structural pattern, the mechanism of hydrocarbon entrapment and the reservoir characteristics. A 3D post-stack time migration seismic cube and logging data from 20 wells were used, and several 2D seismic sections were constructed and interpreted. Depth structure maps were generated for the basement reservoir and overlying formations. The top of the basement reservoir is dissected by a set of NW-S…
Big Data in Medical Science–a Biostatistical View
2015
"Big data" is a universal buzzword in business and science, referring to the retrieval and handling of ever-growing amounts of information. It can be assumed, for example, that a typical hospital generates hundreds of terabytes (1 TB = 10¹² bytes) of data annually in the course of patient care (1). For instance, exome sequencing, which produces 5 gigabytes (1 GB = 10⁹ bytes) of data per patient, is on the way to becoming routine (2). The analysis of such enormous volumes of information, i.e., organizing and describing the data and drawing (scientifically valid) conclusions from them, can hardly be accomplished any longer with the traditional tools of computer science and statistics. For ex…
CArDIS: A Swedish Historical Handwritten Character and Word Dataset
2022
This paper introduces a new publicly available image-based Swedish historical handwritten character and word dataset named Character Arkiv Digital Sweden (CArDIS) (https://cardisdataset.github.io/CARDIS/). The samples in CArDIS are collected from 64,084 Swedish historical documents written by several anonymous priests between 1800 and 1900. The character dataset contains 116,000 Swedish alphabet images in RGB color space across 29 classes, whereas the word dataset contains 30,000 image samples of ten popular Swedish names as well as 1,000 region names in Sweden. To examine the performance of different machine learning classifiers on the CArDIS dataset, three different experiments are conducted. In the …