Search results for "Dataset"
showing 10 items of 77 documents
CNN-Based Prostate Zonal Segmentation on T2-Weighted MR Images: A Cross-Dataset Study
2020
Prostate cancer is the most common cancer among US men. However, prostate imaging is still challenging despite the advances in multi-parametric magnetic resonance imaging (MRI), which provides both morphologic and functional information pertaining to the pathological regions. Along with whole prostate gland segmentation, distinguishing between the central gland (CG) and peripheral zone (PZ) can guide toward differential diagnosis, since the frequency and severity of tumors differ in these regions; however, their boundary is often weak and fuzzy. This work presents a preliminary study on deep learning to automatically delineate the CG and PZ, aiming at evaluating the generalization ability o…
Comparing the use of ERA5 reanalysis dataset and ground-based agrometeorological data under different climates and topography in Italy
2022
Study region: The study region is represented by seven irrigation districts distributed under different climate and topography conditions in Italy. Study focus: This study explores the reliability and consistency of the global ERA5 single levels and ERA5-Land reanalysis datasets in predicting the main agrometeorological estimates commonly used for crop water requirements calculation. In particular, the reanalysis data was compared, variable-by-variable (e.g., solar radiation, R; air temperature, T; relative humidity, RH; wind speed, u; reference evapotranspiration, ET), with in situ agrometeorological observations obtained from 66 automatic weather stations (2008–2020). In addition, the pre…
Lightweight LCP construction for next-generation sequencing datasets
2012
The advent of "next-generation" DNA sequencing (NGS) technologies has meant that collections of hundreds of millions of DNA sequences are now commonplace in bioinformatics. Knowing the longest common prefix array (LCP) of such a collection would facilitate the rapid computation of maximal exact matches, shortest unique substrings and shortest absent words. CPU-efficient algorithms for computing the LCP of a string have been described in the literature, but require the presence in RAM of large data structures. This prevents such methods from being feasible for NGS datasets. In this paper we propose the first lightweight method that simultaneously computes, via sequential scans, the LCP and B…
Towards A Twitter Observatory: A Multi-Paradigm Framework For Collecting, Storing And Analysing Tweets
2016
In this article we show how a multi-paradigm framework can fulfil the requirements of tweets analysis and reduce the waiting time for researchers that use computational resources and storage systems to support large-scale data analysis. The originality of our approach is to combine concerns about data harvesting, data storage, data analysis and data visualisation into a framework that supports inductive reasoning in multidisciplinary scientific research. Our main contribution is a polyglot storage system with a generic data model to support logical data independence and a set of tools that can provide a suitable solution for mixing different types of algorithms in or…
Scalable robust clustering method for large and sparse data
2018
Datasets for unsupervised clustering can be large and sparse, with a significant portion of missing values. We present here a scalable version of a robust clustering method with the available-data strategy. More precisely, a general algorithm is described, and the accuracy and scalability of a distributed implementation of the algorithm are tested. The obtained results allow us to conclude that the proposed approach is viable.
A Dataset of Annotated Omnidirectional Videos for Distancing Applications
2021
Omnidirectional (or 360°) cameras are acquisition devices that, in the next few years, could have a big impact on video surveillance applications, research, and industry, as they can record a spherical view of a whole environment from every perspective. This paper presents two new contributions to the research community: the CVIP360 dataset, an annotated dataset of 360° videos for distancing applications, and a new method to estimate the distances of objects in a scene from a single 360° image. The CVIP360 dataset includes 16 videos acquired outdoors and indoors, annotated by adding information about the pedestrians in the scene (bounding boxes) and the distances to the camera of some point…
Exploring learning analytics on YouTube: a tool to support students interactions analysis
2021
YouTube is a free online video-sharing platform that is often used by students for their learning activities. The interactions of students when using the platform to shape new concepts are worth investigating to better understand and optimize the learning opportunities that take place on this platform. In this paper, we investigate which types of data are relevant to analyse the interactions of students with content on YouTube, and we introduce a new tool that emulates students’ interactions with the platform in order to provide data to be used in supporting Learning Analytics approaches. Our preliminary study inspects the tool's effectiveness in data collection and analyses the …
speedglm: Fitting Linear and Generalized Linear Models to large data sets.
2009
This is an R package to fit (generalized) linear models to large data sets. For data loaded in R memory the fitting is usually fast, especially if R is linked against an optimized BLAS. For data sets larger than R's memory, the fitting is performed by an updating algorithm.
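To illustrate what an updating fit can look like, here is a minimal Python sketch for ordinary least squares. It is not the package's code: it assumes one common approach, in which the cross-products X'X and X'y are accumulated chunk by chunk and the normal equations are solved once at the end. The function names `accumulate_cross_products` and `solve_linear_system` and the toy data are hypothetical.

```python
def accumulate_cross_products(chunks, p):
    """Add each chunk's contribution to X'X (p x p) and X'y (length p).

    Only one chunk has to be in memory at a time.
    """
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for rows, ys in chunks:
        for x, y in zip(rows, ys):
            for i in range(p):
                xty[i] += x[i] * y
                for j in range(p):
                    xtx[i][j] += x[i] * x[j]
    return xtx, xty


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Toy data (hypothetical): y = 1 + 2*x, split into two chunks.
# The first column of each row is the intercept term.
chunks = [
    ([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 3.0, 5.0]),
    ([[1.0, 3.0], [1.0, 4.0], [1.0, 5.0]], [7.0, 9.0, 11.0]),
]
xtx, xty = accumulate_cross_products(chunks, p=2)
beta = solve_linear_system(xtx, xty)  # [intercept, slope]
```

Because the accumulated matrices have a fixed size that depends only on the number of predictors, memory use does not grow with the number of rows. A generalized linear model would need this accumulation repeated inside each reweighting iteration, which the sketch does not cover.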
Selecting significant respondents from large audience datasets: The case of the World Hobbit Project
2016
International projects, online questionnaires, or data mining techniques now allow audience researchers to gather very large and complex datasets. But whilst data collection capacity is hugely growing, qualitative analysis, conversely, becomes increasingly difficult to conduct. In this paper, I suggest a strategy that might allow the researcher to manage this complexity. The World Hobbit Project dataset (36,109 cases), including answers to both closed and open-ended questions, was used for this purpose. The strategy proposed here is based on between-methods sequential triangulation, and tries to combine statistical techniques (k-means clustering) with textual analysis. K-means clustering pe…
Datos de investigación de ciencias de la salud con perspectiva de género [Dataset]
2020
Academic medical research data. This is a dataset on academic medical research, handled with a gender perspective.