Search results for "mining"

showing 10 items of 1730 documents

Gaussian Process Regression (GPR) Representation in Predictive Model Markup Language (PMML)

2017

This paper describes Gaussian process regression (GPR) models represented in the Predictive Model Markup Language (PMML). PMML is an Extensible Markup Language (XML)-based standard language used to represent data-mining and predictive analytic models, as well as pre- and post-processed data. The previous PMML version, PMML 4.2, did not provide capabilities for representing probabilistic (stochastic) machine-learning algorithms that are widely used for constructing predictive models taking the associated uncertainties into consideration. The newly released PMML version 4.3, which includes the GPR model, provides new features: confidence bounds and distribution for the pred…

Computer science; computer.internet_protocol; 02 engineering and technology; computer.software_genre; Industrial and Manufacturing Engineering; Article; Set (abstract data type); [SPI]Engineering Sciences [physics]; Kriging; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Uncertainty quantification; Representation (mathematics); predictive model markup language (PMML); Probabilistic logic; data mining; Predictive analytics; XML; Computer Science Applications; predictive analytics; Control and Systems Engineering; Predictive Model Markup Language; standards; 020201 artificial intelligence & image processing; Data mining; computer; XML; Gaussian process regression

researchProduct

A methodology to assess the intrinsic discriminative ability of a distance function and its interplay with clustering algorithms for microarray data …

2013

Background: Clustering is one of the best-known activities in scientific investigation and the object of research in many disciplines, ranging from statistics to computer science. Following Handl et al., it can be summarized as a three-step process: (1) choice of a distance function; (2) choice of a clustering algorithm; (3) choice of a validation method. Although such a purist approach to clustering is hardly seen in many areas of science, genomic data require that level of attention if inferences made from cluster analysis are to be of some relevance to biomedical research. Results: A procedure is proposed for the assessment of the discriminative ability of a distance functi…

Computer science; computer.software_genre; Biochemistry; symbols.namesake; Discriminative model; Structural Biology; Cluster Analysis; Relevance (information retrieval); Cluster analysis; Molecular Biology; Oligonucleotide Array Sequence Analysis; Clustering discriminative ability of a distance function external validation indices; Settore INF/01 - Informatica; Research; Applied Mathematics; Mutual information; Pearson product-moment correlation coefficient; Computer Science Applications; Hierarchical clustering; Euclidean distance; Range (mathematics); Metric (mathematics); symbols; Data mining; Transcriptome; computer; Algorithms; BMC Bioinformatics

researchProduct

Indexing a sequence for mapping reads with a single mismatch

2014

Mapping reads against a genome sequence is an interesting and useful problem in computational molecular biology and bioinformatics. In this paper, we focus on the problem of indexing a sequence for mapping reads with a single mismatch. We first focus on a simpler problem where the length of the pattern is given beforehand during the data structure construction. This version of the problem is interesting in its own right in the context of the next generation sequencing. In the sequel, we show how to solve the more general problem. In both cases, our algorithm can construct an efficient data structure in time and space and can answer subsequent queries in time. Here, n is the length of the s…

Computer science; genome sequence; General Mathematics; [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]; General Physics and Astronomy; Context (language use); algorithms; computer.software_genre; Pattern matching; Sequence; Search engine indexing; General Engineering; Wildcard character; Articles; computer.file_format; Construct (python library); Data structure; mapping reads; pattern matching; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Data mining; [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]; Focus (optics); mismatch; computer; Algorithm; indexing; Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences

researchProduct

Automated Uncertainty Quantification Through Information Fusion in Manufacturing Processes

2017

Evaluation of key performance indicators (KPIs) such as energy consumption is essential for decision-making during the design and operation of smart manufacturing systems. The measurements of KPIs are strongly affected by several uncertainty sources such as input material uncertainty, the inherent variability in the manufacturing process, model uncertainty, and the uncertainty in the sensor measurements of operational data. A comprehensive understanding of the uncertainty sources and their effect on the KPIs is required to make the manufacturing processes more efficient. Towards this objective, this paper proposes an automated methodology to generate a hierarchical B…

Computer science; injection molding; 02 engineering and technology; computer.software_genre; Industrial and Manufacturing Engineering; [SPI]Engineering Sciences [physics]; GME; 0202 electrical engineering electronic engineering information engineering; Uncertainty quantification; uncertainty; automation; hierarchical; business.industry; Bayesian network; 020207 software engineering; meta-model; Automation; Computer Science Applications; Metamodeling; Information fusion; Bayesian network; Control and Systems Engineering; semantic; 020201 artificial intelligence & image processing; Data mining; business; computer

researchProduct

Mesh Visual Quality Assessment Metrics: A Comparison Study

2017

3D graphics technologies have progressed considerably in recent years, and several processing operations can be applied to 3D meshes, such as watermarking, compression, simplification and so forth. Mesh visual quality assessment has become an important issue for evaluating the visual appearance of the 3D shape after specific modifications. Several metrics have been proposed in this context, from the classical distance-based metrics to the perceptual-based metrics, which include perceptual information about the human visual system. In this paper, we propose to study the performance of several mesh visual quality metrics. First, the comparison is conducted regardless of the distortion types neither…

Computer science; media_common.quotation_subject; 020207 software engineering; Context (language use); 02 engineering and technology; computer.software_genre; Visual appearance; Visualization; Metric (mathematics); Human visual system model; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Quality (business); Polygon mesh; Data mining; computer; 3D computer graphics; media_common; 2017 13th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS)

researchProduct

Diversity in random subspacing ensembles

2004

Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. It was shown experimentally and theoretically that in order for an ensemble to be effective, it should consist of classifiers having diversity in their predictions. A number of ways are known to quantify diversity in ensembles, but little research has been done about their appropriateness. In this paper, we compare eight measures of the ensemble diversity with regard to their correlation with the accuracy improvement due to ensembles. We conduct experiments on 21 data sets from the UCI machine learning repository, comparing the correlations for random subspacing ensembles with diffe…

Computer science; media_common.quotation_subject; Ambiguity; Ensemble diversity; computer.software_genre; Ensemble learning; Data warehouse; Correlation; Information extraction; Knowledge extraction; Statistics; Entropy (information theory); Data mining; computer; media_common

researchProduct

Missing values in deduplication of electronic patient data

2011

Data deduplication refers to the process in which records referring to the same real-world entities are detected in datasets so that duplicated records can be eliminated. The term ‘record linkage’ is used here for the same problem [1]. A typical application is the deduplication of medical registry data [2, 3]. Medical registries are institutions that collect medical and personal data in a standardized and comprehensive way. The primary aims are the creation of a pool of patients eligible for clinical or epidemiological studies and the computation of certain indices such as the incidence in order to oversee the development of diseases. The latter task in particular requires a database in wh…

Computer science; media_common.quotation_subject; Inference; Health Informatics; Ambiguity; Patient data; Missing data; computer.software_genre; Research and Applications; Regression; Neoplasms; Statistics; Data deduplication; Electronic Health Records; Humans; Data mining; Imputation (statistics); Medical Record Linkage; Registries; computer; Record linkage; media_common

researchProduct

A Large-Scale Empirical Evaluation of Cross-Validation and External Test Set Validation in (Q)SAR.

2013

(Q)SAR model validation is essential to ensure the quality of inferred models and to indicate future model predictivity on unseen compounds. Proper validation is also one of the requirements of regulatory authorities in order to accept the (Q)SAR model, and to approve its use in real-world scenarios as an alternative testing method. However, at the same time, the question of how to validate a (Q)SAR model, in particular whether to employ variants of cross-validation or external test set validation, is still under discussion. In this paper, we empirically compare k-fold cross-validation with external test set validation. To this end we introduce a workflow that allows us to realistically simulate t…

Computer science; media_common.quotation_subject; Organic Chemistry; Scale (descriptive set theory); Variance (accounting); computer.software_genre; Cross-validation; Computer Science Applications; Model validation; Workflow; Structural Biology; Cheminformatics; Test set; Drug Discovery; Molecular Medicine; Quality (business); Data mining; computer; media_common; Molecular informatics

researchProduct

Application of model quality evaluation to systems biology

2008

The application of model quality evaluation to quasispecies models is presented. These models are useful for the analysis of DNA and RNA evolution and for the description of the population dynamics of viruses and bacteria. An estimate of the parameters, together with their interval of variability, is computed, and the quality evaluation is tested on the basis of the model prediction error capability.

Computer science; media_common.quotation_subject; Systems biology; set membership; Population; Viral quasispecies; Interval (mathematics); Computational biology; computer.software_genre; Settore ING-INF/04 - Automatica; Models of DNA evolution; molecular biophysics; Quality (business); education; genetics microorganisms; media_common; education.field_of_study; DNA; biochemistry evolution (biological); genetics microorganisms; molecular biophysics; reaction kinetics; identification; set membership; optimization; Basis (linear algebra); Estimation theory; DNA; DNA biochemistry evolution (biological) genetics microorganisms molecular biophysics reaction kinetics identification set membership optimization; biochemistry evolution (biological); identification; reaction kinetics; Data mining; computer; optimization

researchProduct

Managing sensor data streams in a smart home application

2020

A challenge in developing an ambient activity recognition system for use in elder care is finding a balance between the sophistication of the system and a cost structure that fits within the budgets of public and private sector healthcare organisations. Much activity recognition research in the context of elder care is based on dense networks of sensors and advanced methods, such as supervised machine learning algorithms. This paper presents the data processing aspects of an activity recognition system based on a simpler, knowledge-based unsupervised approach, designed for a sparse network of sensors. By structuring sensor data management as a streaming system, we provide a simple programmi…

Computer science; smart home; Computer Networks and Communications; Data management; sensor data streams; kotihoito (home care); Context (language use); sensor data processing; 02 engineering and technology; 01 natural sciences; Activity recognition; wireless sensor network; Home automation; älytalot (smart houses); passive infrared sensor; 0202 electrical engineering electronic engineering information engineering; activity recognition; anturit (sensors); Electrical and Electronic Engineering; geroteknologia (gerontechnology); Data stream mining; business.industry; 010401 analytical chemistry; Public sector; sensoriverkot (sensor networks); healthcare; 020206 networking & telecommunications; Data science; sensor data management; WSN; sensor data; 0104 chemical sciences; Computer Science Applications; PIR; Control and Systems Engineering; Programming paradigm; älytekniikka (smart technology); business; home care; Wireless sensor network; International Journal of Sensor Networks

researchProduct