Author: Alberto Ferrer

0000000000076193

0000-0001-7244-5947

showing 9 related works from this author

Evaluation of the effect of chance correlations on variable selection using Partial Least Squares-Discriminant Analysis

2013

Variable subset selection is often mandatory in high-throughput metabolomics and proteomics. However, depending on the variable-to-sample ratio, variable selection is significantly susceptible to chance correlations. Evaluating the predictive capability of PLSDA models by cross-validation after feature selection gives overly optimistic results if the selection is performed on the entire set and no external validation set is available. In this work, a simulation of the statistical null hypothesis is proposed to test whether the discrimination capability of a PLSDA model after variable selection estimated by cross-validation is statistically higher than t…
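The idea of simulating the null hypothesis can be illustrated with a label-permutation sketch. This is only a rough illustration under stated assumptions, not the procedure from the paper: a nearest-centroid classifier stands in for PLSDA, the variable filter (absolute difference of class means) is hypothetical, and all function names and parameter values are invented for the example.

```python
import numpy as np

def select_top_variables(x, y, n_keep):
    """Hypothetical filter: keep variables with the largest class-mean difference."""
    diff = np.abs(x[y == 1].mean(axis=0) - x[y == 0].mean(axis=0))
    return np.argsort(diff)[-n_keep:]

def cv_accuracy(x, y, n_folds, rng):
    """K-fold cross-validated accuracy of a nearest-centroid classifier
    (a simple stand-in for PLSDA, used only to keep the sketch short)."""
    idx = rng.permutation(len(y))
    correct = 0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        c0 = x[train][y[train] == 0].mean(axis=0)
        c1 = x[train][y[train] == 1].mean(axis=0)
        d0 = np.linalg.norm(x[test] - c0, axis=1)
        d1 = np.linalg.norm(x[test] - c1, axis=1)
        correct += np.sum((d1 < d0).astype(int) == y[test])
    return correct / len(y)

def selection_then_cv(x, y, n_keep, n_folds, rng):
    # Selection uses ALL samples before cross-validation: the optimistic setting.
    keep = select_top_variables(x, y, n_keep)
    return cv_accuracy(x[:, keep], y, n_folds, rng)

def permutation_p_value(x, y, n_keep=10, n_folds=5, n_perm=200, seed=0):
    """Compare the observed figure of merit with its distribution under
    permuted class labels, repeating selection + CV for every permutation."""
    rng = np.random.default_rng(seed)
    observed = selection_then_cv(x, y, n_keep, n_folds, rng)
    null = np.array([
        selection_then_cv(x, rng.permutation(y), n_keep, n_folds, rng)
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value

# Pure-noise data with many more variables than samples.
rng = np.random.default_rng(1)
x = rng.standard_normal((40, 500))
y = np.repeat([0, 1], 20)
observed, null, p_value = permutation_p_value(x, y)
```

Because the whole select-then-validate pipeline is rerun on every permutation, the null distribution carries the same selection bias as the observed value. On noise data like this, the permuted accuracies tend to sit well above 0.5, which is the optimism the abstract warns about.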

Variable selection; Statistics and Operations Research; Feature selection; Chance correlations; Analytical Chemistry; Set (abstract data type); Resampling; Partial least squares regression; Statistics; Humans; Metabolomics; Least-Squares Analysis; Selection (genetic algorithm); Probability; Gaucher Disease; Models, Statistical; Chemistry; Discriminant Analysis; Reproducibility of Results; Partial Least Squares-Discriminant Analysis (PLSDA); Linear discriminant analysis; Variable (computer science); Null hypothesis; Algorithms; Software

researchProduct

Using Unfold-PCA for batch-to-batch start-up process understanding and steady-state identification in a sequencing batch reactor

2007

In chemical and biochemical processes, steady-state models are widely used for process assessment, control and optimisation. In these models, parameter adjustment requires data collected under nearly steady-state conditions. Several approaches have been developed for steady-state identification (SSID) in continuous processes, but no attempt has been made to adapt them to the particular features of batch processes. The main aim of this paper is to propose an automated method based on batch-wise unfolding of the three-way batch process data followed by a principal component analysis (Unfold-PCA) in combination with the methodology of Brown and Rhinehart [2] for SSID. A second goal of this paper is to…
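Batch-wise unfolding followed by PCA can be sketched as below. This is a minimal illustration under assumptions: the array layout (batches × variables × time points), the autoscaling step, and the function names are choices made for the example, and the steady-state identification step itself is not shown.

```python
import numpy as np

def unfold_batchwise(x3):
    """Reshape a (batches, variables, times) array into (batches, times * variables).

    Each row holds one whole batch: all variables at time 1, then time 2, and so on.
    """
    i, j, k = x3.shape
    return x3.transpose(0, 2, 1).reshape(i, k * j)

def pca(x2, n_components):
    """PCA via SVD on autoscaled columns (assumed preprocessing)."""
    mean = x2.mean(axis=0)
    std = x2.std(axis=0, ddof=1)
    std[std == 0] = 1.0            # guard against constant columns
    z = (x2 - mean) / std          # centering removes the average batch trajectory
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    return scores, loadings

# Synthetic example: 10 batches, 3 variables, 20 time points.
rng = np.random.default_rng(0)
x3 = rng.standard_normal((10, 3, 20))
x2 = unfold_batchwise(x3)
scores, loadings = pca(x2, n_components=2)
```

Each batch becomes a single point in the score space, so a sequence of start-up batches can be followed from batch to batch as a trajectory of scores.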

Steady state; Process (engineering); Computer science; Applied Mathematics; Sequencing batch reactor; Start up; Analytical Chemistry; Chemometrics; Identification (information); Principal component analysis; Batch processing; Process engineering; Journal of Chemometrics

researchProduct

Process understanding of a wastewater batch reactor with block-wise PLS

2007

In this work, a systematic methodology, ‘block-wise PLS’, has been applied to thoroughly analyse data from a sequencing batch reactor (SBR) operated for biological phosphorus removal from wastewater. The aim of this study was to diagnose process variables (collected by the inexpensive and low-maintenance sensors installed in the SBR) likely related to the key indicator of process performance: the phosphorus removal efficiency (PRE), determined off-line in the quality control laboratory. In this way, it is intended to aid the process operators in the detection of abnormal values of these critical variables, which would indicate undesirable process performance, so that they could act on the…

Statistics and Probability; Computer science; Process (engineering); Ecological Modeling; Batch reactor; Sequencing batch reactor; Enhanced biological phosphorus removal; Wastewater; Quality (business); Process engineering; Block (data storage); Environmetrics

researchProduct

Missing Data

2009

In this chapter, we deal with the problem of missing data in principal component analysis (PCA) and partial least squares (PLS) methods. First, we review several statistical methods proposed in the literature for handling missing data. Both single and multiple imputation (MI) methods are studied and compared using simulated data. After this, we particularize the missing data problem for building and exploiting multivariate calibration models. Several approaches proposed in the literature are introduced and their performance compared based on several real data sets.

Computer science; Iterative method; Simulated data; Principal component analysis; Expectation–maximization algorithm; Partial least squares regression; Multivariate calibration; Missing data problem; Data mining; Missing data

researchProduct

MCR-ALS on metabolic networks: Obtaining more meaningful pathways

2015

With the aim of understanding the flux distributions across a metabolic network, i.e. within living cells, Principal Component Analysis (PCA) has been proposed to obtain a set of orthogonal components (pathways) capturing most of the variance in the flux data. The problems with this method are (i) that no additional information can be included in the model, and (ii) that orthogonality imposes a hard constraint, which is not always reasonable. To overcome these drawbacks, here we propose to use a more flexible approach such as Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) to obtain this set of biological pathways through the network. By using this method, different constraint…

Mathematical optimization; Process Chemistry and Technology; Statistics and Operations Research; Metabolic network; Least squares; Variance (accounting); Systems Engineering and Automation; Computer Science Applications; Analytical Chemistry; Set (abstract data type); Constraint (information theory); Orthogonality; Pichia pastoris; Principal component analysis; A priori and a posteriori; Multivariate Curve Resolution-Alternating; Grey modelling; Spectroscopy; Software; Mathematics

researchProduct

Metabolic flux understanding of Pichia pastoris grown on heterogenous culture media

2014

Within the emergent field of Systems Biology, mathematical models of microbial systems obtained from physicochemical laws (the so-called first principles-based models) are employed to discern the principles that govern cellular behaviour and achieve a predictive understanding of cellular functions. The reliance on this biochemical knowledge has the drawback that some of the assumptions (specific kinetics of the reaction system, unknown dynamics and values of the model parameters) may not be valid for all the possible metabolic states of the network. In this uncertainty context, the combined use of fundamental knowledge and data measured in the fermentation that describe the behaviour…

Principal Component Analysis; Mathematical model; Manufacturing process; Computer science; Process Chemistry and Technology; Systems biology; Monte Carlo sampling; Statistics and Operations Research; Cellular functions; Metabolic network; Missing-data methods for Exploratory Data Analysis; Context (language use); Systems Engineering and Automation; Computer Science Applications; Analytical Chemistry; Pichia pastoris; Econometrics; Biochemical engineering; Possibilistic consistency analysis; Flux (metabolism); Spectroscopy; Software

researchProduct

Multivariate SPC of a sequencing batch reactor for wastewater treatment

2007

Data from a sequencing batch reactor (SBR) operated for enhanced biological phosphorus removal from wastewater have been analysed in order to propose an efficient multivariate statistical process control (MSPC) scheme for the process. Different multivariate bilinear approaches have been applied and compared in terms of their capabilities for on-line and off-line fault detection and diagnosis. The typical three-way data structure from a batch process was unfolded batch-wise and variable-wise. In the latter case, two models were built: with (AT) and without (WKFH) removing the main non-linear behaviour of the process data. Since the process consists of several stages, the monitoring strategies tested include: one model for all stages a…

Multivariate statistics; Computer science; Process Chemistry and Technology; Process (computing); Bilinear interpolation; Sequencing batch reactor; Covariance; Data structure; Fault detection and isolation; Computer Science Applications; Analytical Chemistry; Batch processing; Process engineering; Spectroscopy; Software; Chemometrics and Intelligent Laboratory Systems

researchProduct

How to simulate normal data sets with the desired correlation structure

2010

The Cholesky decomposition is a widely used method to draw samples from a multivariate normal distribution with a non-singular covariance matrix. In this work we introduce a simple method that uses the singular value decomposition (SVD) to simulate multivariate normal data even if the covariance matrix is singular, which is often the case in chemometric problems. The covariance matrix can be specified by the user or can be generated by specifying a subset of the eigenvalues. The latter can be an advantage for simulating data sets with a particular latent structure. This can be useful for testing the performance of chemometric methods with data sets matching the theoretical conditions for their app…
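The SVD route can be sketched in a few lines. This is a minimal illustration assuming a user-supplied symmetric positive semi-definite covariance matrix; the function name and its parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def simulate_mvn_svd(cov, n_samples, mean=None, seed=None):
    """Draw samples from N(mean, cov) via the SVD of cov; cov may be singular."""
    cov = np.asarray(cov, dtype=float)
    rng = np.random.default_rng(seed)
    # For a symmetric positive semi-definite matrix, cov = U diag(s) U^T.
    u, s, _ = np.linalg.svd(cov)
    # A = U diag(sqrt(s)) satisfies A A^T = cov, even when some s are zero.
    a = u * np.sqrt(s)
    # Transform independent standard normal draws: x = z A^T has covariance cov.
    z = rng.standard_normal((n_samples, cov.shape[0]))
    x = z @ a.T
    if mean is not None:
        x = x + np.asarray(mean, dtype=float)
    return x

# A rank-1 (singular) covariance matrix, for which Cholesky would fail.
cov = np.array([[1.0, 1.0],
                [1.0, 1.0]])
samples = simulate_mvn_svd(cov, n_samples=1000, seed=0)
```

With this rank-1 covariance the two simulated variables are perfectly correlated, so the two columns of `samples` coincide up to floating-point error.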

Mathematical optimization; Covariance function; Covariance matrix; Process Chemistry and Technology; MathematicsofComputing_NUMERICALANALYSIS; Multivariate normal distribution; Covariance; Computer Science Applications; Analytical Chemistry; Estimation of covariance matrices; Scatter matrix; Matrix normal distribution; CMA-ES; Algorithm; Computer Science::Databases; Spectroscopy; Software; Mathematics; Chemometrics and Intelligent Laboratory Systems

researchProduct

Comparison of different predictive models for nutrient estimation in a sequencing batch reactor for wastewater treatment

2006

In this paper, different predictive models for nutrient estimation in a sequencing batch reactor (SBR) for wastewater treatment are compared: principal component regression (PCR), partial least squares (PLS), and artificial neural networks (ANNs). Two unfolding procedures were used: batch-wise and variable-wise. For the latter unfolding method, X and Y matrix augmentation with lagged variables was used in some models to incorporate process dynamics. The results have shown that batch-wise unfolding PLS models outperform the other approaches. The ANN models are good predictive models, but in this particular case study, they do not outperform those multivariate projection models that …

Multivariate statistics; Artificial neural network; Computer science; Process Chemistry and Technology; Sequencing batch reactor; Soft sensor; Machine learning; Missing data; Computer Science Applications; Analytical Chemistry; Partial least squares regression; Principal component regression; Artificial intelligence; Data mining; Model building; Spectroscopy; Software; Chemometrics and Intelligent Laboratory Systems

researchProduct