Search results for "Data set"
showing 10 items of 154 documents
A population density grid for Spain
2013
This article describes a high-resolution land cover data set for Spain and its application to dasymetric population mapping (at census tract level). Finally, this vector layer is transformed into a grid format. The work parallels the effort of the Joint Research Centre (JRC) of the European Commission, in collaboration with Eurostat and the European Environment Agency (EEA), in building a population density grid for the whole of Europe, combining CORINE Land Cover with population data per commune. We solve many of the problems due to the low resolution of CORINE Land Cover, which are especially visible with Spanish data. An accuracy assessment is carried out from a simple aggregation of …
Using SOM and PCA for analysing and interpreting data from a P-removal SBR
2008
This paper focuses on the application of Kohonen self-organizing maps (SOM) and principal component analysis (PCA) to thoroughly analyse and interpret multidimensional data from a biological process. The process is aimed at enhanced biological phosphorus removal (EBPR) from wastewater. In this work, SOM and PCA are first applied to the data set in order to identify and analyse the relationships among the variables in the process. Afterwards, the K-means algorithm is used to find out how the observations can be grouped into different classes on the basis of their similarity. Finally, the information obtained using these intelligent tools is used for process interpretation and diagnosis. In the…
Studying the feasibility of a recommender in a citizen web portal based on user modeling and clustering algorithms
2006
This paper presents a methodology to estimate the future success of a collaborative recommender in a citizen web portal. This methodology consists of four stages, three of which are developed in this study. First of all, a user model, which takes into account some usual characteristics of web data, is developed to produce artificial data sets. These data sets are used to carry out a clustering algorithm comparison in the second stage of our approach. This comparison provides information about the suitability of each algorithm in different scenarios. The benchmarked clustering algorithms are the ones that are most commonly used in the literature: c-Means, Fuzzy c-Means, a set of hierarchical …
ViziQuer: A Web-Based Tool for Visual Diagrammatic Queries Over RDF Data
2018
We demonstrate the open source ViziQuer tool for web-based creation and execution of visual diagrammatic queries over RDF/SPARQL data. The tool supports data instance level and statistics queries, providing visual counterparts for most SPARQL 1.1 select query constructs, including aggregation and subqueries. A query environment can be created over a user-supplied SPARQL endpoint with a known data schema (a data schema exploration service is available as well). There are pre-defined demonstration query environments for a mini-university data set, a fragment of a synthetic, realistic hospital data set, and a variant of the Linked Movie Database RDF data set.
A hierarchical clustering strategy and its application to proteomic interaction data
2003
We describe a novel strategy of hierarchical clustering analysis, particularly useful to analyze proteomic interaction data. The logic behind this method is to use the information for all interactions among the elements of a set to evaluate the strength of the interaction of each pair of elements. Our procedure allows the characterization of protein complexes starting with partial data and the detection of "promiscuous" proteins that bias the results, generating false positive data. We demonstrate the usefulness of our strategy by analyzing a real case that involves 137 Saccharomyces cerevisiae proteins. Because most functional studies require the evaluation of similar data sets, our method…
New historical data for long-term swordfish ecological studies in the Mediterranean Sea
2021
Management of marine fisheries and ecosystems is constrained by knowledge based on datasets with limited temporal coverage. Many populations and ecosystems were perturbed long before scientific investigations began. This situation is particularly acute for the largest and commercially most valuable species. We hypothesized that historical trap fishery records for bluefin tuna (Thunnus thynnus Linnaeus, 1758) could contain catch data and information for other, bycatch species, such as swordfish (Xiphias gladius Linnaeus, 1758). This species has a long history of exploitation and is presently overexploited, yet indicators of its status (biomass) used in fishery management only start…
Iteratively reweighted least squares in crystal structure refinements
2011
The use of robust techniques in crystal structure multipole refinements of small molecules as an alternative to the commonly adopted weighted least squares is presented and discussed. As is well known, the main disadvantage of least-squares fitting is its sensitivity to outliers. The elimination from the data set of the most aberrant reflections (due to both experimental errors and incompleteness of the model) is an effective practice that could yield satisfactory results, but it is often complicated in the presence of a great number of bad data points, whose one-by-one elimination could become unattainable. This problem can be circumvented by means of a robust least-squares regression that…
A Conceptual Probabilistic Model for the Induction of Image Semantics
2010
In this paper we propose a model based on a conceptual space automatically induced from data. The model is inspired by a well-founded robotics cognitive architecture which is organized in three computational areas: sub-conceptual, linguistic and conceptual. Images are objects in the sub-conceptual area that become "knoxels" in the conceptual area. The application of the framework enables the automatic emergence of image semantics in the linguistic area. The core of the model is a conceptual space induced automatically from a set of annotated images that exploits and mixes different information concerning the set of images. Multiple low-level features are extracted to represent images and…
Active Learning Methods for Efficient Hybrid Biophysical Variable Retrieval
2016
Kernel-based machine learning regression algorithms (MLRAs) are potentially powerful methods for implementation in operational biophysical variable retrieval schemes. However, they face difficulties in coping with large training data sets. With the increasing amount of optical remote sensing data made available for analysis and the possibility of using a large amount of simulated data from radiative transfer models (RTMs) to train kernel MLRAs, efficient data reduction techniques will need to be implemented. Active learning (AL) methods make it possible to select the most informative samples in a data set. This letter introduces six AL methods for achieving optimized biophysical variable estimat…
The silver collection of San Gennaro treasure (Neaples): A multivariate statistic approach applied to X-ray fluorescence data
2021
In this work we report an X-ray fluorescence spectroscopy (XRF) study combined with a multivariate approach that makes it possible to detect compositional differences and similarities among the alloys used to make the items of the San Gennaro silver collection. The San Gennaro treasure in Naples (Italy) represents, in fact, one of the most important silver collections in the world. The classification of the collection items is very complex, not only because of the large number of objects, but also because between 1600 and 1700, in Naples, more than 350 laboratories were active, most of them specialized in a specific kind of work. As a consequence, a given collection object could…