Search results for " data"
showing 10 items of 7516 documents
Estimating aggregated nutrient fluxes in four Finnish rivers via Gaussian state space models
2013
Reliable estimates of the nutrient fluxes carried by rivers from land-based sources to the sea are needed for efficient abatement of marine eutrophication. Although nutrient concentrations in rivers generally display large temporal variation, sampling and analysis for nutrients, unlike flow measurements, are rarely performed on a daily basis. The infrequent data call for ways to reliably estimate the nutrient concentrations of the missing days. Here, we use Gaussian state space models with daily water flow as a predictor variable to predict missing nutrient concentrations for four agriculturally impacted Finnish rivers. Via simulation of Gaussian state space models, we are able to esti…
Weighted Clustering of Sparse Educational Data
2015
Clustering as an unsupervised technique is predominantly used in unweighted settings. In this paper, we present an efficient version of a robust clustering algorithm for sparse educational data that takes the weights, aligning a sample with the corresponding population, into account. The algorithm is utilized to divide the Finnish student population of PISA 2012 (the latest data from the Programme for International Student Assessment) into groups, according to their attitudes and perceptions towards mathematics, for which one third of the data is missing. Furthermore, necessary modifications of three cluster indices to reveal an appropriate number of groups are proposed and demonstrated. pe…
SparseHC: A Memory-efficient Online Hierarchical Clustering Algorithm
2014
Computing a hierarchical clustering of objects from a pairwise distance matrix is an important algorithmic kernel in computational science. Since the storage of this matrix requires quadratic space with respect to the number of objects, the design of memory-efficient approaches is of high importance to this research area. In this paper, we address this problem by presenting a memory-efficient online hierarchical clustering algorithm called SparseHC. SparseHC scans a sorted and possibly sparse distance matrix chunk-by-chunk. Meanwhile, a dendrogram is built by merging cluster pairs as and when the distance between them is determined to be the smallest among all remaining cluster pairs. The k…
Spatial spillovers in France: a study on individual count data at the city level
2007
Our study aims to measure the effects of spatial R&D spillovers on firms' patent production at the city level. We use an original method to estimate the spatial dimension of spillovers using count data. The method, based on a generalized cross entropy approach, allows us to test spatial auto-correlation. The main result is that when there are local spillovers, their impact on knowledge production is different according to the geographical area and the sector.
Detecting clusters in spatially correlated waveforms
2017
Seismic networks often record signals characterized by similar shapes that provide important information according to their geographic positions. We propose an approach to identify homogeneous clusters of seismic waves, combining analysis of waveforms with metadata and spectrogram information. In waveform clustering, cross-correlation measures between signals may present some limitations, so we refer to more recent contributions on data-depth-based clustering analysis. The mechanism for alignment is also an important topic of the analysis: warping (or aligning) procedures identify nuisance effects in phase variation that, if ignored, may result in a possible loss of information and t…
Spatial Econometrics and Spatial Data Pooled over Time: Towards an Adapted Modelling Approach
2013
Continuum: A spatiotemporal data model to represent and qualify filiation relationships
2013
This work introduces an ontology-based spatio-temporal data model to represent entities evolving in space and time. A dynamic phenomenon generates a complex relationship network between the entities involved in the process. At the abstract level, the relationships can be identity or topological filiations. The existence of an identity filiation depends on whether the object changes its identity or not. On the other hand, topological filiations are based exclusively on the spatial component, like in the case of growth, reduction, merging or splitting. When combining identity and topological filiations, six filiation relationships are obtained, forming a second abstrac…
Probabilistic and preferential sampling approaches offer integrated perspectives of Italian forest diversity
2023
Aim: Assessing the performances of different sampling approaches for documenting community diversity may help to identify optimal sampling efforts and strategies, and to enhance conservation and monitoring planning. Here, we used two data sets based on probabilistic and preferential sampling schemes of Italian forest vegetation to analyze the multifaceted performances of the two approaches across three major forest types at a large scale. Location: Italy. Methods: We pooled 804 probabilistic and 16,259 preferential forest plots as samples of vascular plant diversity across the country. We balanced the two data sets in terms of sizes, plot size, geographical position, and vegetation types. F…
Joint second-order parameter estimation for spatio-temporal log-Gaussian Cox processes
2018
We propose a new fitting method to estimate the set of second-order parameters for the class of homogeneous spatio-temporal log-Gaussian Cox point processes. With simulations, we show that the proposed minimum contrast procedure, based on the spatio-temporal pair correlation function, provides reliable estimates, and we compare the results with the currently available methods. Moreover, the proposed method can be used in the case of both separable and non-separable parametric specifications of the correlation function of the underlying Gaussian Random Field. We describe earthquake sequences comparing several Cox model specifications.
The Dawn of the Human-Machine Era: A forecast of new and emerging language technologies
2021
New language technologies are coming, thanks to the huge and competing private investment fuelling rapid progress; we can either understand and foresee their effects, or be taken by surprise and spend our time trying to catch up. This report sketches out some transformative new technologies that are likely to fundamentally change our use of language. Some of these may feel unrealistically futuristic or far-fetched, but a central purpose of this report - and the wider LITHME network - is to illustrate that these are mostly just the logical development and maturation of technologies currently in prototype. But will everyone benefit from all these shiny new gadgets? Throughout this report we …