Search results for "Data mining"

Showing 10 of 907 documents

Unveiling Bacterial Interactions through Multidimensional Scaling and Dynamics Modeling

2015

Abstract We propose a new strategy to identify and visualize bacterial consortia by conducting replicated culturing of environmental samples coupled with high-throughput sequencing and multidimensional scaling analysis, followed by identification of bacteria-bacteria correlations and interactions. We conducted a proof-of-concept assay with pine-tree resin-based media in ten replicates, which allowed detecting and visualizing dynamical bacterial associations in the form of statistically significant and yet biologically relevant bacterial consortia.

Multidisciplinary, Bacteria, Computer science, Microbial Consortia, food and beverages, Identification (biology), Data mining, Multidimensional scaling, computer.software_genre, Models Biological, computer, Article
Scientific Reports
researchProduct

Consistency Analysis of Genome-Scale Models of Bacterial Metabolism: A Metamodel Approach.

2015

Genome-scale metabolic models usually contain inconsistencies that manifest as blocked reactions and gap metabolites. To detect recurrent inconsistencies in metabolic models, a large-scale analysis was performed using a previously published dataset of 130 genome-scale models. The results showed that a large number of reactions (~22%) are blocked in all the models where they are present. To unravel the nature of such inconsistencies, a metamodel was constructed by joining the 130 models in a single network. This metamodel was manually curated using the unconnected modules approach, and then used as a reference network to perform gap-filling on each individual genome-s…

Multidisciplinary, Consistency analysis, Bacteria, Process (engineering), lcsh:R, Genome scale, lcsh:Medicine, Biology, computer.software_genre, Bioinformatics, Models Biological, Metamodeling, Set (abstract data type), Consistency (database systems), Bacterial Proteins, Proof of concept, lcsh:Q, Data mining, Metagenomics, Completeness (statistics), lcsh:Science, computer, Genome Bacterial, Metabolic Networks and Pathways, Research Article
PLoS ONE
researchProduct

2021

Data collected in criminal investigations may suffer from issues such as (i) incompleteness, due to the covert nature of criminal organizations; (ii) incorrectness, caused by either unintentional data collection errors or intentional deception by criminals; and (iii) inconsistency, when the same information is collected into law enforcement databases multiple times or in different formats. In this paper we analyze nine real criminal networks of different nature (i.e., Mafia networks, criminal street gangs and terrorist organizations) in order to quantify the impact of incomplete data and to determine which network type is most affected by it. The networks are first pruned using two specific m…

Multidisciplinary, Data collection, Computer science, Node (networking), media_common.quotation_subject, Law enforcement, Deception, Missing data, computer.software_genre, Criminal investigation, Euclidean distance, Covert, Terrorism, Adjacency list, Graph (abstract data type), Data mining, computer, media_common
PLOS ONE
researchProduct

Predictability and prediction of lowest observed adverse effect levels in a structurally heterogeneous set of chemicals

2005

A database of chronic lowest observed adverse effect levels (LOAELs) for 234 compounds, previously compiled from different sources (Toxicology Letters 79, 131-143 (1995)), was modelled using graph theoretical descriptors. This study reveals that the data are not homogeneous. Only those data originating from the U.S. Environmental Protection Agency (EPA) reports could be well modelled by multilinear regression (MLR) and linear discriminant analysis (LDA). In contrast, data available from the specific procedures of the National Toxicology Program (NTP) database introduced noise and did not yield good models, either alone or in combination with the EPA data.

Multilinear map, Computer science, Linear model, Reproducibility of Results, Contrast (statistics), Bioengineering, General Medicine, Models Theoretical, Linear discriminant analysis, computer.software_genre, Regression, Lowest-observed-adverse-effect level, Set (abstract data type), Structure-Activity Relationship, Drug Discovery, Statistics, Linear Models, Animals, Molecular Medicine, Data mining, Organic Chemicals, Predictability, Toxicity Tests Chronic, computer
SAR and QSAR in Environmental Research
researchProduct

Comparison of different predictive models for nutrient estimation in a sequencing batch reactor for wastewater treatment

2006

Abstract In this paper, different predictive models for nutrient estimation in a sequencing batch reactor (SBR) for wastewater treatment are compared: principal component regression (PCR), partial least squares (PLS), and artificial neural networks (ANNs). Two unfolding procedures were used: batch-wise and variable-wise. For the latter unfolding method, X and Y matrix augmentation with lagged variables was used in some models to incorporate process dynamics. The results show that batch-wise unfolding PLS models outperform the other approaches. The ANN models are good predictive models, but in this particular case study they do not outperform those multivariate projection models that …

Multivariate statistics, Artificial neural network, business.industry, Computer science, Process Chemistry and Technology, Sequencing batch reactor, Soft sensor, Machine learning, computer.software_genre, Missing data, Computer Science Applications, Analytical Chemistry, Partial least squares regression, Principal component regression, Artificial intelligence, Data mining, business, computer, Model building, Spectroscopy, Software
Chemometrics and Intelligent Laboratory Systems
researchProduct

Empirical Orthogonal Function and Functional Data Analysis Procedures to Impute Long Gaps in Environmental Data

2016

Air pollution data sets are usually spatio-temporal multivariate data consisting of time series of different pollutants recorded by a monitoring network. To improve the estimation of functional data when missing values, and especially long gaps, are present in the original data set, several procedures are proposed here that jointly use Functional Data Analysis and Empirical Orthogonal Function approaches. To compare and validate the proposed procedures, a simulation plan is carried out and some performance indicators are computed. The results show that one of the proposed procedures works better than the others, providing a better reconstruction, especially in the presence of long gaps.

Multivariate statistics, Computer science, Functional data analysis, Empirical orthogonal functions, Missing data, computer.software_genre, Environmental data, EOF FDA Missing data Environmental data, Set (abstract data type), Singular value decomposition, Performance indicator, Data mining, Settore SECS-S/01 - Statistica, computer
researchProduct

On the internal multivariate quality control of analytical laboratories. A case study: the quality of drinking water

2001

Abstract Multivariate statistical process control (MSPC) tools, based on principal component analysis (PCA), partial least squares (PLS) regression and other regression models, are used in the present study for automatic detection of possible errors in the methods used for routine multiparametric analysis, in order to design an internal Multivariate Analytical Quality Control (iMAQC) program. Such tools can detect possible failures in the analytical methods without resorting to any external reference, since they use their own analytical results as a source for the diagnosis of the method's quality. Pseudo-univariate control charts provide an attractive alternative to traditional univariate …

Multivariate statistics, Computer science, Multiparametric Analysis, Process Chemistry and Technology, Univariate, Regression analysis, computer.software_genre, Computer Science Applications, Analytical Chemistry, Analytical quality control, Statistics, Principal component analysis, Partial least squares regression, Control chart, Data mining, computer, Spectroscopy, Software
Chemometrics and Intelligent Laboratory Systems
researchProduct

Statistical Multivariate Techniques for the Stock Location Assignment Problem

1998

In previous papers we proposed applying multivariate statistical methodologies, such as Multidimensional Scaling (MDS) and Seriation, to the stock location assignment problem of a warehouse, which is often solved by considering the Cube per Order Index (COI). In this paper we compare the results obtained with MDS, Seriation, a COI-based method and the Maximum Path criterion, using a whole year of data from the warehouse of a Sicilian supermarket chain. The comparison is based on the simulated times needed to satisfy a sample of real orders.

Multivariate statistics, Geography, Data mining, Multidimensional scaling, Minimum spanning tree, Multivariate statistical, computer.software_genre, computer, Assignment problem, Stock (geology)
researchProduct

Estimating brain connectivity when few data points are available: Perspectives and limitations

2017

Methods based on the use of multivariate autoregressive modeling (MVAR) have proved to be an accurate and flexible tool for the estimation of brain functional connectivity. The multivariate approach, however, implies the use of a model whose complexity (in terms of number of parameters) increases quadratically with the number of signals included in the problem. This can often lead to an underdetermined problem and to the condition of multicollinearity. The aim of this paper is to introduce and test an approach based on Ridge Regression combined with a modified version of the statistics usually adopted for these methods, to broaden the estimation of brain connectivity to those conditions in …

Multivariate statistics, Underdetermined system, 0206 medical engineering, Biomedical Engineering, Signal Processing; Biomedical Engineering; 1707; Health Informatics, Health Informatics, 02 engineering and technology, Machine learning, computer.software_genre, Brain Mapping Brain, 03 medical and health sciences, 0302 clinical medicine, False positive paradox, 1707, Mathematics, Brain Mapping, business.industry, Brain, 020601 biomedical engineering, Regression, Data point, Autoregressive model, Multicollinearity, Signal Processing, Settore ING-INF/06 - Bioingegneria Elettronica E Informatica, Ordinary least squares, Artificial intelligence, Data mining, business, computer, 030217 neurology & neurosurgery
2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
researchProduct

Archetypal analysis: contributions for estimating boundary cases in multivariate accommodation problem

2013

The use of archetypal analysis is proposed in order to determine a set of representative cases that entail a certain percentage of the population in the accommodation problem. A well-known anthropometric database has been used to compare our methodology with the commonly used PCA approach, showing the advantages of our methodology: the level of accommodation is reached, unlike with the PCA approach; no further adjustments are necessary; and the user can either decide the number of archetypes to consider or leave the selection to a criterion. Unlike PCA, the objective of archetypal analysis is to obtain extreme individuals, so it is the appropriate statistical technique for solving this type of…

Multivariate statistics, representative human model generation, General Computer Science, Computer science, Boundary (topology), Type (model theory), Anthropometry [Percentile], computer.software_genre, archetype, percentile, Set (abstract data type), Archetypal analysis, Statistics, Archetype, Selection (genetic algorithm), Archetype, anthropometry, representative case, business.industry, General Engineering, Representative human, Percentile: Anthropometry, Model generation, Representative case, Data mining, business, computer, Accommodation
researchProduct