Search results for "Data mining"
showing 10 items of 907 documents
PMT: New analytical framework for automated evaluation of geo-environmental modelling approaches
2019
Geospatial computation, data transformation for relevant statistical software, and step-wise quantitative performance assessment can be cumbersome, especially because the entire modelling procedure is repeatedly interrupted by several input/output steps, and the self-consistency and self-adaptive response to the modelled data and the features therein are lost when the data are handled across different working environments. To date, an automated and comprehensive validation system that includes both cutoff-dependent and cutoff-independent evaluation criteria for spatial modelling approaches has not been developed for GIS-based methodologies. This study, for the fir…
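To illustrate the distinction the abstract above draws between cutoff-dependent and cutoff-independent criteria, here is a minimal sketch. The abstract does not name its criteria, so accuracy at a fixed cutoff and the area under the ROC curve (AUC) are illustrative stand-ins, and the function names and data are hypothetical.

```python
def accuracy_at_cutoff(labels, scores, cutoff):
    """Cutoff-dependent: fraction of cases classified correctly at one threshold."""
    predictions = [1 if s >= cutoff else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def roc_auc(labels, scores):
    """Cutoff-independent: probability that a random positive outranks a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # Count concordant pairs; ties count as half a concordant pair.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores for four locations (1 = event observed).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(accuracy_at_cutoff(labels, scores, 0.5))  # changes with the cutoff
print(accuracy_at_cutoff(labels, scores, 0.9))
print(roc_auc(labels, scores))                  # a single value, no cutoff needed
```

The accuracy changes as the cutoff moves, while the AUC summarises the ranking quality of the scores across all cutoffs.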
A tool for filtering information in complex systems
2005
We introduce a technique to filter complex datasets by extracting a subgraph of representative links. Such filtering can be tuned to any desired level by controlling the genus of the resulting graph. We show that this technique is especially suitable for correlation-based graphs, giving filtered graphs which preserve the hierarchical organization of the minimum spanning tree but contain a larger amount of information in their internal structure. In particular, in the case of planar filtered graphs (genus equal to 0), triangular loops and 4-element cliques are formed. The application of this filtering procedure to 100 stocks in the US equity markets shows that such loops and cliqu…
Resolution enhancement in integral microscopy by physical interpolation
2015
Integral-imaging technology has demonstrated its capability for computing depth images from the microimages recorded in a single shot. This capability has been shown in macroscopic imaging and also in microscopy. Although the ability to refocus different planes from one snapshot is crucial for the study of some biological processes, the main drawback of integral imaging is the substantial reduction in spatial resolution. In this contribution we report a technique that increases the two-dimensional spatial resolution of the computed depth images in integral microscopy by a factor of √2. This is achieved by a double-shot approach, carried out by means of a rotating glass…
Enhanced transport-related air pollution prediction through a novel metamodel approach
2017
This research proposes a novel approach to improve the ability to forecast low frequency extreme events of transport-related pollution in urban areas using a limited input data set. The approach is based on the idea of a self-managing model, able to adapt to unexpected changes in pollution level. In more detail, for a given combination of variables, it selects the most suitable prediction model within a set of alternative air quality models, estimated for a wider range of locations and conditions. In this study, the new approach is tested for the prediction of nitrogen dioxide concentration in the United Kingdom (UK), specifically in an air quality monitoring site of the Greater Ma…
Polynomial Regression and Measurement Error
2020
Many of the phenomena of interest in information systems (IS) research are nonlinear, and it has consequently been recognized that by applying linear statistical models (e.g., linear regression), we may ignore important aspects of these phenomena. To address this issue, IS researchers are increasingly applying nonlinear models to their datasets. One popular analytical technique for the modeling and analysis of nonlinear relationships is polynomial regression, which in its simplest form fits a "U-shaped" curve to the data. However, the use of polynomial regression can be problematic when the independent variables are contaminated with measurement error, and the implications of error can be m…
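To make the concern in the abstract above concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions, not the authors' method: the true model (y = 1 + 0.5x + 2x²), the error sizes, and the names `fit_quadratic` and `simulate` are all hypothetical. It fits a quadratic once with the true predictor and once with a predictor contaminated by measurement error.

```python
import random


def fit_quadratic(xs, ys):
    """Ordinary least squares fit of y = b0 + b1*x + b2*x^2 via the normal equations."""
    # Build the augmented 3x3 system (X^T X | X^T y) for the columns 1, x, x^2.
    s = [sum(x ** k for x in xs) for k in range(5)]               # power sums of x
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(3):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def simulate(n=2000, error_sd=1.0, seed=0):
    """Return (coefficients using true x, coefficients using error-laden x)."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumed data-generating model: a U-shaped curve with small outcome noise.
    y = [1.0 + 0.5 * x + 2.0 * x * x + rng.gauss(0.0, 0.1) for x in true_x]
    # The analyst only sees x plus independent measurement error.
    observed_x = [x + rng.gauss(0.0, error_sd) for x in true_x]
    return fit_quadratic(true_x, y), fit_quadratic(observed_x, y)


clean, noisy = simulate()
print("fit on true x:    ", [round(b, 3) for b in clean])
print("fit on observed x:", [round(b, 3) for b in noisy])
```

In this setup the quadratic coefficient estimated from the error-laden predictor is pulled well towards zero, so the fitted curve looks flatter than the true one.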
Control of dataset bias in combined Affymetrix cohorts of triple negative breast cancer
2014
Heterogeneous subtypes of breast cancer need to be analyzed separately. Pooling of datasets can provide reasonable sample sizes, but dataset bias is an important concern. We assembled a combined dataset of 579 Affymetrix microarrays from triple negative breast cancer (TNBC) in Gene Expression Omnibus (GEO) series GSE31519. We developed a method for selecting comparable datasets and controlling for the amount of dataset bias of individual probesets.
Modeling recurrent distributions in streams using possible worlds
2015
Discovering changes in the data distribution of streams and discovering recurrent data distributions are challenging problems in data mining and machine learning. Both have received a lot of attention in the context of classification. With the ever-increasing growth of data, however, there is a high demand for compact and universal representations of data streams that enable the user to analyze current as well as historic data without having access to the raw data. As a first step in this direction, we propose a condensed representation that captures the various (possibly recurrent) data distributions of the stream by extending the notion of possible worlds. The representation en…
An Approach to Clinical Proteomics Data Quality Control and Import
2011
The biomedical domain, and proteomics in particular, faces an increasing volume of data. The heterogeneity of data sources implies heterogeneity in the representation and in the content of data. Data may also be incorrect or contain errors, which can compromise the analysis of experimental results. Our approach aims to ensure the initial quality of data during import into an information system dedicated to proteomics. It is based on the joint use of models, which represent the system sources, and ontologies, which are used as mediators between them. The controls we propose ensure the validity of values, semantics and data consistency during the import process.
CheS-Mapper - Chemical Space Mapping and Visualization in 3D
2012
Analyzing chemical datasets is a challenging task for scientific researchers in the field of chemoinformatics. It is important, yet difficult, to understand the relationship between the structure of chemical compounds, their physico-chemical properties, and biological or toxic effects. In this respect, visualization tools can help to better comprehend the underlying correlations. Our recently developed 3D molecular viewer CheS-Mapper (Chemical Space Mapper) divides large datasets into clusters of similar compounds and consequently arranges them in 3D space, such that their spatial proximity reflects their similarity. The user can indirectly determine similarity, by selecting which f…
Ergonomic Indicators and Physical Workload Risks in Food Production and Possibilities for Risk Prevention
2021
The food industry is the most important and largest manufacturing industry in Latvia, producing almost a third of all manufacturing output. Employees in food production enterprises are exposed to a variety of ergonomic risks: monotonous work movements that can be repeated up to 1000 times a day, loads that exceed 30 kg in lifting and moving operations, forced working postures, and a fast work pace. The aim of the study was to identify ergonomic indicators related to physical load for packers in one medium-sized company producing potato starch in Latvia. When summarizing the results of the survey on burden-lifting rates, it should be noted that, in a shift, 30% of packers lift the burden fr…