Search results for "Data mining"
Unveiling Bacterial Interactions through Multidimensional Scaling and Dynamics Modeling
2015
We propose a new strategy to identify and visualize bacterial consortia by conducting replicated culturing of environmental samples coupled with high-throughput sequencing and multidimensional scaling analysis, followed by identification of bacteria-bacteria correlations and interactions. We conducted a proof-of-concept assay with pine-tree resin-based media in ten replicates, which allowed us to detect and visualize dynamical bacterial associations in the form of statistically significant yet biologically relevant bacterial consortia.
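The abstract does not specify how the bacteria-bacteria correlations are computed. As a hedged illustration only, the sketch below screens pairs of taxa whose relative abundances co-vary across replicate cultures, using Pearson correlation and a fixed cut-off. The function names, the 0.8 threshold and the abundance values are hypothetical; the authors' pipeline additionally involves multidimensional scaling, significance testing and dynamics modeling, none of which is shown here.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(abundances, threshold=0.8):
    """Return (taxon, taxon, r) for every pair whose |r| across replicates is >= threshold.

    abundances maps a taxon name to its relative abundance in each replicate culture.
    """
    pairs = []
    for a, b in itertools.combinations(sorted(abundances), 2):
        r = pearson(abundances[a], abundances[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical abundances in five replicate cultures (the study used ten).
example = {
    "taxon_A": [0.10, 0.20, 0.30, 0.40, 0.50],
    "taxon_B": [0.21, 0.39, 0.62, 0.80, 1.01],
    "taxon_C": [0.50, 0.10, 0.40, 0.20, 0.30],
}
print(correlated_pairs(example))  # only the taxon_A / taxon_B pair passes the cut-off
```

A correlation screen like this only suggests candidate associations; it says nothing by itself about the direction or mechanism of an interaction.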
Consistency Analysis of Genome-Scale Models of Bacterial Metabolism: A Metamodel Approach
2015
Genome-scale metabolic models usually contain inconsistencies that manifest as blocked reactions and gap metabolites. To detect recurrent inconsistencies in metabolic models, a large-scale analysis was performed using a previously published dataset of 130 genome-scale models. The results showed that a large number of reactions (~22%) are blocked in all the models where they are present. To unravel the nature of such inconsistencies, a metamodel was constructed by joining the 130 models in a single network. This metamodel was manually curated using the unconnected modules approach and then used as a reference network to perform gap-filling on each individual genome-s…
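Blocked reactions and gap metabolites can be illustrated with a simplified, purely topological check, sketched below under stated assumptions: a metabolite is treated as a gap if no remaining reaction can produce it, none can consume it, or it occurs in only one reaction, and every reaction touching a gap is then removed, which can expose further gaps. This is not the metamodel or unconnected-modules procedure from the abstract; flux-based methods such as flux variability analysis find blocked reactions that this structural check can miss. The function name and the toy network are hypothetical.

```python
def dead_end_analysis(reactions):
    """Iteratively flag gap (dead-end) metabolites and the reactions they block.

    reactions maps a reaction id to (stoichiometry, reversible), where stoichiometry
    maps metabolite -> coefficient (negative = consumed, positive = produced).
    """
    active = dict(reactions)
    gaps = set()
    while True:
        producible, consumable, occurrences = set(), set(), {}
        for stoich, reversible in active.values():
            for met, coef in stoich.items():
                occurrences[met] = occurrences.get(met, 0) + 1
                if coef > 0 or reversible:
                    producible.add(met)
                if coef < 0 or reversible:
                    consumable.add(met)
        new_gaps = {m for m, n in occurrences.items()
                    if m not in producible or m not in consumable or n < 2}
        if not new_gaps:
            break
        gaps |= new_gaps
        # At steady state, any reaction touching a gap metabolite must carry zero flux.
        for rxn in [r for r, (s, _) in active.items() if any(m in new_gaps for m in s)]:
            del active[rxn]
    blocked = set(reactions) - set(active)
    return gaps, blocked


# Hypothetical toy network: uptake of A, A -> B -> C, secretion of C, plus dead branches.
toy = {
    "EX_A": ({"A": 1}, False),          # uptake of A from the medium
    "R1": ({"A": -1, "B": 1}, False),
    "R2": ({"B": -1, "C": 1}, False),
    "EX_C": ({"C": -1}, False),         # secretion of C
    "R3": ({"E": -1, "C": 1}, False),   # E is never produced
    "R4": ({"B": -1, "F": 1}, False),
    "R5": ({"F": -1, "G": 1}, False),   # G is never consumed
}
gaps, blocked = dead_end_analysis(toy)
print(sorted(gaps), sorted(blocked))  # ['E', 'F', 'G'] ['R3', 'R4', 'R5']
```

In the toy network, F only becomes a gap in the second pass, after R5 has been removed because G cannot be consumed.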
2021
Data collected in criminal investigations may suffer from issues such as (i) incompleteness, due to the covert nature of criminal organizations; (ii) incorrectness, caused by either unintentional data collection errors or intentional deception by criminals; and (iii) inconsistency, when the same information is recorded in law enforcement databases multiple times or in different formats. In this paper we analyze nine real criminal networks of different nature (i.e., Mafia networks, criminal street gangs and terrorist organizations) in order to quantify the impact of incomplete data and to determine which network type is most affected by it. The networks are first pruned using two specific m…
Predictability and prediction of lowest observed adverse effect levels in a structurally heterogeneous set of chemicals
2005
A database of chronic lowest observed adverse effect levels (LOAELs) for 234 compounds, previously compiled from different sources (Toxicology Letters 79, 131-143 (1995)), was modelled using graph-theoretical descriptors. This study reveals that the data are not homogeneous. Only the data originating from U.S. Environmental Protection Agency (EPA) reports could be modelled well by multilinear regression (MLR) and linear discriminant analysis (LDA). In contrast, data obtained from the specific procedures of the National Toxicology Program (NTP) database introduced noise and did not yield good models, either alone or in combination with the EPA data.
Comparison of different predictive models for nutrient estimation in a sequencing batch reactor for wastewater treatment
2006
In this paper, different predictive models for nutrient estimation in a sequencing batch reactor (SBR) for wastewater treatment are compared: principal component regression (PCR), partial least squares (PLS), and artificial neural networks (ANNs). Two unfolding procedures were used: batch-wise and variable-wise. For the latter, X and Y matrix augmentation with lagged variables was used in some models to incorporate process dynamics. The results show that batch-wise unfolding PLS models outperform the other approaches. The ANN models are good predictive models, but in this particular case study, they do not outperform those multivariate projection models that …
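The two unfolding procedures can be sketched for batch data stored as batches × time points × variables. Batch-wise unfolding gives one row per batch (all variables at all times side by side), while variable-wise unfolding gives one row per batch and time point; the lagged variant appends the previous sample's values, as one way to add process dynamics. The function names, the time-major column order and the toy values are assumptions for illustration, not the paper's code.

```python
def batch_wise(data):
    """One row per batch; columns ordered time-major: (t0, v0), (t0, v1), ..., (t1, v0), ..."""
    return [[x for sample in batch for x in sample] for batch in data]


def variable_wise(data):
    """One row per (batch, time point); columns are the process variables."""
    return [list(sample) for batch in data for sample in batch]


def variable_wise_lagged(data, lags=1):
    """Variable-wise unfolding with lagged copies appended to each row.

    The first `lags` samples of every batch are dropped because their lagged values
    would fall before the start of the batch.
    """
    rows = []
    for batch in data:
        for k in range(lags, len(batch)):
            row = []
            for lag in range(lags + 1):
                row.extend(batch[k - lag])  # current sample first, then older ones
            rows.append(row)
    return rows


# Hypothetical data: 2 batches x 3 time points x 2 variables.
runs = [
    [[1, 10], [2, 20], [3, 30]],
    [[4, 40], [5, 50], [6, 60]],
]
print(batch_wise(runs))               # 2 rows x 6 columns
print(variable_wise(runs))            # 6 rows x 2 columns
print(variable_wise_lagged(runs, 1))  # 4 rows x 4 columns
```

Batch-wise unfolding requires batches of equal length, since every time point becomes its own set of columns; variable-wise unfolding does not, which is one reason both are in use.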
Empirical Orthogonal Function and Functional Data Analysis Procedures to Impute Long Gaps in Environmental Data
2016
Air pollution data sets are usually spatio-temporal multivariate data consisting of time series of different pollutants recorded by a monitoring network. To improve the estimation of functional data when missing values, and especially long gaps, are present in the original data set, several procedures are proposed that jointly consider the Functional Data Analysis and Empirical Orthogonal Function approaches. To compare and validate the proposed procedures, a simulation plan is carried out and several performance indicators are computed. The results show that one of the proposed procedures works better than the others, providing a better reconstruction, especially in the presence of long gaps.
On the internal multivariate quality control of analytical laboratories. A case study: the quality of drinking water
2001
Multivariate statistical process control (MSPC) tools, based on principal component analysis (PCA), partial least squares (PLS) regression and other regression models, are used in the present study to automatically detect possible errors in the methods used for routine multiparametric analysis, in order to design an internal Multivariate Analytical Quality Control (iMAQC) program. Such tools can detect possible failures in the analytical methods without resorting to any external reference, since they use the laboratory's own analytical results as the source for diagnosing the method's quality. Pseudo-univariate control charts provide an attractive alternative to traditional univariate …
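As a rough illustration of why a multivariate statistic can flag results that univariate charts miss, the sketch below computes Hotelling's T² for new results against hypothetical in-control reference results. It uses the full covariance matrix rather than PCA or PLS scores, omits the control limit (which would normally come from an F or chi-squared distribution), and all names and values are assumptions rather than the study's iMAQC implementation.

```python
def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, mean):
    """Sample covariance matrix (divisor n - 1)."""
    n, p = len(rows), len(mean)
    cov = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mean)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += d[a] * d[b] / (n - 1)
    return cov


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def hotelling_t2(sample, mean, cov):
    """T^2 = (x - mean)^T S^-1 (x - mean), via a linear solve instead of an explicit inverse."""
    d = [x - m for x, m in zip(sample, mean)]
    w = solve(cov, d)
    return sum(di * wi for di, wi in zip(d, w))


# Hypothetical in-control results for two correlated analytical parameters.
reference = [[10.0, 20.0], [11.0, 21.0], [9.0, 19.0], [10.5, 19.5], [9.5, 20.5]]
mu = mean_vector(reference)
S = covariance(reference, mu)
print(hotelling_t2([11.0, 21.0], mu, S))  # follows the usual correlation
print(hotelling_t2([11.0, 19.0], mu, S))  # same per-parameter deviation, breaks the correlation
```

Both hypothetical samples deviate from the mean by the same amount on each parameter, yet the one that breaks the correlation seen in the reference results gets a larger T² (8 versus 2).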
Statistical Multivariate Techniques for the Stock Location Assignment Problem
1998
In previous papers we proposed applying multivariate statistical methodologies, such as Multidimensional Scaling (MDS) and Seriation, to the stock location assignment problem of a warehouse, which is often solved by considering the Cube per Order Index (COI). In this paper we compare the results of MDS, Seriation, a COI-based method and the Maximum Path criterion, using data from a whole year of operation of a Sicilian supermarket chain's warehouse. The comparison is based on the simulated times needed to fill a sample of real orders.
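The Cube per Order Index is commonly defined as an item's required storage space (cube) divided by its order frequency; items with the lowest COI are assigned to the locations nearest the input/output point. A minimal sketch of that rule follows, assuming one location per item and a known travel distance per location; the names and figures are hypothetical, and it does not reproduce the MDS, Seriation or Maximum Path methods compared in the paper.

```python
def coi_assignment(items, locations):
    """Assign items to locations by ascending Cube per Order Index (COI).

    items maps an item id to (storage_cube, order_frequency), e.g. volume and orders per year.
    locations maps a location id to its travel distance from the input/output point.
    Assumes each item occupies exactly one location.
    """
    if len(locations) < len(items):
        raise ValueError("not enough locations for all items")
    coi = {item: cube / freq for item, (cube, freq) in items.items()}
    ranked_items = sorted(coi, key=coi.get)                  # low COI: small, frequently ordered
    ranked_locations = sorted(locations, key=locations.get)  # nearest location first
    return dict(zip(ranked_items, ranked_locations)), coi


# Hypothetical items: (cubic metres required, orders per year); distances in metres.
items = {"pasta": (2.0, 100), "detergent": (6.0, 30), "spices": (0.5, 50)}
locations = {"L1": 5.0, "L2": 12.0, "L3": 25.0}
assignment, coi = coi_assignment(items, locations)
print(assignment)  # {'spices': 'L1', 'pasta': 'L2', 'detergent': 'L3'}
```

The COI rule looks at each item in isolation; it ignores which items tend to appear in the same orders, which is the kind of information a multivariate method such as MDS can exploit.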
Estimating brain connectivity when few data points are available: Perspectives and limitations
2017
Methods based on multivariate autoregressive (MVAR) modeling have proved to be an accurate and flexible tool for estimating brain functional connectivity. The multivariate approach, however, implies a model whose complexity (in terms of number of parameters) increases quadratically with the number of signals included in the problem. This can often lead to an underdetermined problem and to multicollinearity. The aim of this paper is to introduce and test an approach based on Ridge Regression combined with a modified version of the statistics usually adopted for these methods, to broaden the estimation of brain connectivity to those conditions in …
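The quadratic growth in model complexity follows from the model structure: an MVAR model of order p over N signals has p·N² autoregressive coefficients. The sketch below counts them and fits an order-1 MVAR model by ridge regression, solving (XᵀX + λI)a_j = Xᵀy_j for each signal j. The function names, the toy system and the λ values are assumptions, and the modified test statistics proposed in the paper are not included.

```python
def mvar_parameter_count(n_signals, order):
    """Autoregressive coefficients in an MVAR model: order * n_signals**2."""
    return order * n_signals ** 2


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_mvar1_ridge(series, lam):
    """Ridge estimate of A in x_t = A x_{t-1} + e_t.

    series is a list of time samples, each a list of N signal values.
    Row j of the result holds the weights of all signals at t-1 on signal j at t.
    """
    X, Y = series[:-1], series[1:]  # predictors x_{t-1}, targets x_t
    n = len(series[0])
    # Regularised Gram matrix X^T X + lam * I, shared by every target signal.
    G = [[sum(r[a] * r[b] for r in X) + (lam if a == b else 0.0) for b in range(n)]
         for a in range(n)]
    A_hat = []
    for j in range(n):
        rhs = [sum(r[a] * y[j] for r, y in zip(X, Y)) for a in range(n)]  # X^T y_j
        A_hat.append(solve(G, rhs))
    return A_hat


# Noiseless toy data from a known 2-signal system (hypothetical values).
A_true = [[0.5, 0.1], [0.2, 0.3]]
x = [1.0, 2.0]
series = [x]
for _ in range(6):
    x = [sum(A_true[i][k] * x[k] for k in range(2)) for i in range(2)]
    series.append(x)

print(mvar_parameter_count(10, 5), mvar_parameter_count(20, 5))  # 500 2000
print(fit_mvar1_ridge(series, 0.0))  # close to A_true
print(fit_mvar1_ridge(series, 1.0))  # coefficients shrunk toward zero
```

With λ = 0 this reduces to ordinary least squares; increasing λ shrinks the coefficients, which keeps the estimate stable when the number of samples is small relative to p·N².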
Archetypal analysis: contributions for estimating boundary cases in multivariate accommodation problem
2013
The use of archetypal analysis is proposed to determine a set of representative cases that encompass a certain percentage of the population in the accommodation problem. A well-known anthropometric database has been used to compare our methodology with the commonly used PCA approach, showing the advantages of our methodology: the target level of accommodation is reached, unlike with the PCA approach; no further adjustments are necessary; and the user can decide the number of archetypes to consider or leave the selection to a criterion. Unlike PCA, archetypal analysis aims to obtain extreme individuals, so it is the appropriate statistical technique for solving this type of…