Search results for "Data mining"

showing 10 items of 907 documents

PMT: New analytical framework for automated evaluation of geo-environmental modelling approaches

2019

Geospatial computation, data transformation to relevant statistical software, and step-wise quantitative performance assessment can be cumbersome, especially considering that the entire modelling procedure is repeatedly interrupted by several input/output steps, and that the self-consistency and self-adaptive response to the modelled data and the features therein are lost when the data are handled in different working environments. To date, an automated and comprehensive validation system that includes both cutoff-dependent and cutoff-independent evaluation criteria for spatial modelling approaches has not been developed for GIS-based methodologies. This study, for the fir…

Performance analysi; Environmental Engineering; Geospatial analysis; 010504 meteorology & atmospheric sciences; Computer science; Settore GEO/04 - Geografia Fisica E Geomorfologia; Computation; Goodness-of-fit; 010501 environmental sciences; computer.software_genre; 01 natural sciences; Robustness (computer science); Validation; Environmental Chemistry; Waste Management and Disposal; 0105 earth and related environmental sciences; computer.programming_language; Environmental modelling; Receiver operating characteristic; Spatial modelling; Performance analysis; Landslide; PMT; Python (programming language); 22/4 OA procedure; Pollution; Drought risk; ITC-ISI-JOURNAL-ARTICLE; Data mining; Predictive model evaluation framework; computer; Science of The Total Environment
researchProduct

A tool for filtering information in complex systems

2005

We introduce a technique to filter complex data-sets by extracting a subgraph of representative links. Such filtering can be tuned to any desired level by controlling the genus of the resulting graph. We show that this technique is especially suitable for correlation-based graphs, giving filtered graphs that preserve the hierarchical organization of the minimum spanning tree but contain a larger amount of information in their internal structure. In particular, in the case of planar filtered graphs (genus equal to 0), triangular loops and 4-element cliques are formed. The application of this filtering procedure to 100 stocks in the USA equity markets shows that such loops and cliqu…

Physics - Physics and Society; Computer science; Complex system; FOS: Physical sciences; Physics and Society (physics.soc-ph); Minimum spanning tree; computer.software_genre; Planar; Hierarchical organization; INTERNET; Condensed Matter - Statistical Mechanics; Complex data type; Multidisciplinary; Small-world network; Statistical Mechanics (cond-mat.stat-mech); SMALL-WORLD NETWORKS; Filter (signal processing); Disordered Systems and Neural Networks (cond-mat.dis-nn); Condensed Matter - Disordered Systems and Neural Networks; Complex network; WEB; DYNAMIC ASSET TREES; Physical Sciences; GRAPH; Data mining; Algorithm; computer; MathematicsofComputing_DISCRETEMATHEMATICS
researchProduct

Resolution enhancement in integral microscopy by physical interpolation

2015

Integral-imaging technology has demonstrated its capability for computing depth images from the microimages recorded after a single shot. This capability has been shown in macroscopic imaging and also in microscopy. Although the possibility of refocusing different planes from one snapshot is crucial for the study of some biological processes, the main drawback of integral imaging is the substantial reduction of the spatial resolution. In this contribution we report a technique that increases the two-dimensional spatial resolution of the computed depth images in integral microscopy by a factor of √2. This is achieved by a double-shot approach, carried out by means of a rotating glass…

Point spread function; Integral imaging; Computer science; business.industry; Resolution (electron density); ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Image processing; computer.software_genre; Article; Atomic and Molecular Physics and Optics; Biological specimen; Optics; Microscopy; Data mining; business; computer; Image resolution; Biotechnology; Interpolation; Biomedical Optics Express
researchProduct

Enhanced transport-related air pollution prediction through a novel metamodel approach

2017

This research proposes a novel approach to improve the ability to forecast low-frequency extreme events of transport-related pollution in urban areas using a limited input data set. The approach is based on the idea of a self-managing model that can adapt to unexpected changes in pollution level. In more detail, for a given combination of variables, it selects the most suitable prediction model from a set of alternative air quality models estimated for a wider range of locations and conditions. In this study, the new approach is tested for the prediction of nitrogen dioxide concentration in the United Kingdom (UK), specifically at an air quality monitoring site of the Greater Ma…

Pollution; Engineering; 010504 meteorology & atmospheric sciences; Mathematical model; business.industry; media_common.quotation_subject; Air pollution; Transportation; Statistical model; 010501 environmental sciences; Covariance; medicine.disease_cause; computer.software_genre; 01 natural sciences; Data set; medicine; Range (statistics); Data mining; business; computer; Air quality index; 0105 earth and related environmental sciences; General Environmental Science; Civil and Structural Engineering; media_common; Transportation Research Part D: Transport and Environment
researchProduct

Polynomial Regression and Measurement Error

2020

Many of the phenomena of interest in information systems (IS) research are nonlinear, and it has consequently been recognized that by applying linear statistical models (e.g., linear regression), we may ignore important aspects of these phenomena. To address this issue, IS researchers are increasingly applying nonlinear models to their datasets. One popular analytical technique for the modeling and analysis of nonlinear relationships is polynomial regression, which in its simplest form fits a "U-shaped" curve to the data. However, the use of polynomial regression can be problematic when the independent variables are contaminated with measurement error, and the implications of error can be m…

Polynomial; Computer Networks and Communications; Computer science; media_common.quotation_subject; piilevät muuttujat; epälineaariset mallit; computer.software_genre; lineaariset mallit; Management Information Systems; 0504 sociology; 0502 economics and business; Linear regression; attenuation; tietojärjestelmät; media_common; Polynomial regression; latent variables; Observational error; Variables; mittaus; 05 social sciences; Linear model; muuttujat; 050401 social sciences methods; Statistical model; error; Nonlinear system; mittausvirheet; polynomial regression; nonlinear SEM; measurement; Data mining; computer; 050203 business & management; ACM SIGMIS Database: the DATABASE for Advances in Information Systems
researchProduct

Control of dataset bias in combined Affymetrix cohorts of triple negative breast cancer

2014

Heterogeneous subtypes of breast cancer need to be analyzed separately. Pooling of datasets can provide reasonable sample sizes, but dataset bias is an important concern. We assembled a combined dataset of 579 Affymetrix microarrays from triple negative breast cancer (TNBC) in Gene Expression Omnibus (GEO) series GSE31519. We developed a method for selecting comparable datasets and controlling for the amount of dataset bias in individual probesets.

Pooling; lcsh:QH426-470; Microarray; Pooling; Computational biology; Microarray; Biology; computer.software_genre; Biochemistry; Breast cancer; Breast cancer; Data in Brief; Genetics; medicine; ddc:610; Affymetrix microarrays; Triple-negative breast cancer; Gene expression omnibus; medicine.disease; lcsh:Genetics; Sample size determination; Dataset bias; Molecular Medicine; Gene expression; Data mining; computer; Biotechnology; Genomics Data
researchProduct

Modeling recurrent distributions in streams using possible worlds

2015

Discovering changes in the data distribution of streams and discovering recurrent data distributions are challenging problems in data mining and machine learning. Both have received a lot of attention in the context of classification. With the ever-increasing growth of data, however, there is high demand for compact and universal representations of data streams that enable the user to analyze current as well as historic data without having access to the raw data. As a first step in this direction, we propose a condensed representation that captures the various, possibly recurrent, data distributions of the stream by extending the notion of possible worlds. The representation en…

Possible world; Basis (linear algebra); Computer science; Data stream mining; Representation (systemics); Context (language use); Data pre-processing; Data mining; Raw data; computer.software_genre; computer; Data modeling; 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
researchProduct

An Approach to Clinical Proteomics Data Quality Control and Import

2011

The biomedical domain, and proteomics in particular, is faced with an increasing volume of data. The heterogeneity of data sources implies heterogeneity in the representation and in the content of data. Data may also be incorrect or contain errors, which can compromise the analysis of experimental results. Our approach aims to ensure the initial quality of data during import into an information system dedicated to proteomics. It is based on the joint use of models, which represent the system sources, and ontologies, which are used as mediators between them. The controls we propose ensure the validity of values, semantics and data consistency during the import process.

Process (engineering); Computer science; media_common.quotation_subject; 02 engineering and technology; Ontology (information science); Proteomics; computer.software_genre; Domain (software engineering); 03 medical and health sciences; 020204 information systems; [ INFO.INFO-BI ] Computer Science [cs]/Bioinformatics [q-bio.QM]; 0202 electrical engineering electronic engineering information engineering; Information system; Quality (business); [ SDV.BIBS ] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; 030304 developmental biology; media_common; 0303 health sciences; [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB]; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; Data science; [ INFO.INFO-DB ] Computer Science [cs]/Databases [cs.DB]; Data quality; Data mining; [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]; computer
researchProduct

CheS-Mapper - Chemical Space Mapping and Visualization in 3D

2012

Analyzing chemical datasets is a challenging task for scientific researchers in the field of chemoinformatics. It is important, yet difficult, to understand the relationship between the structure of chemical compounds, their physico-chemical properties, and biological or toxic effects. In that respect, visualization tools can help to better comprehend the underlying correlations. Our recently developed 3D molecular viewer CheS-Mapper (Chemical Space Mapper) divides large datasets into clusters of similar compounds and then arranges them in 3D space, such that their spatial proximity reflects their similarity. The user can indirectly determine similarity, by selecting which f…

Process (engineering); Computer science; media_common.quotation_subject; Library and Information Sciences; computer.software_genre; 01 natural sciences; lcsh:Chemistry; 03 medical and health sciences; Similarity (psychology); Physical and Theoretical Chemistry; Function (engineering); 030304 developmental biology; media_common; Structure (mathematical logic); 0303 health sciences; lcsh:T58.5-58.64; lcsh:Information technology; 004 Informatik; Computer Graphics and Computer-Aided Design; Chemical space; Field (geography); 0104 chemical sciences; Visualization; Computer Science Applications; 010404 medicinal & biomolecular chemistry; lcsh:QD1-999; Cheminformatics; Data mining; computer; 004 Data processing; Software; Journal of Cheminformatics
researchProduct

Ergonomic Indicators and Physical Workload Risks in Food Production and Possibilities for Risk Prevention

2021

The food industry is the most important and largest manufacturing industry in Latvia, producing almost a third of all manufacturing output. Employees in food production enterprises are exposed to a variety of ergonomic risks: monotonous work movements that can be repeated up to 1000 times a day, loads that exceed 30 kg in lifting and moving operations, forced working postures, and a fast work pace. The aim of the study was to identify ergonomic indicators related to physical load for packers in one medium-sized company producing potato starch in Latvia. When summarizing the results of the survey on burden-lifting rates, it should be noted that, in a shift, 30% of packers lift the burden fr…

Product (business); Work (electrical); Food industry; business.industry; Lift (data mining); Manufacturing; Food processing; Human factors and ergonomics; Operations management; Workload; business
researchProduct