Search results for "Mining"

Showing 10 of 1,730 documents

Executable Data Quality Models

2017

The paper discusses an external solution for data quality management in information systems. In contrast to traditional data quality assurance methods, the proposed approach uses a domain-specific language (DSL) for describing data quality models. Data quality models consist of graphical diagrams whose elements contain requirements for data objects' values and procedures for data object analysis. The DSL interpreter makes the data quality model executable, thereby enabling the measurement and improvement of data quality. The described approach can be applied: (1) to check the completeness, accuracy and consistency of accumulated data; (2) to support data migration in c…
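The idea of an executable quality model can be sketched as declarative rules interpreted against records; this is an illustration with hypothetical rule and field names, not the paper's actual DSL:

```python
# Sketch of an executable data quality model: declarative rules
# (hypothetical names, not the paper's DSL) interpreted against records.

records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": -5},
    {"id": 3, "email": "c@example.com", "age": None},
]

# Each rule: (name, field, predicate on the field's value).
rules = [
    ("email_present", "email", lambda v: bool(v)),
    ("age_valid", "age", lambda v: v is not None and 0 <= v <= 130),
]

def evaluate(records, rules):
    """Interpret the rules: return per-rule pass rate as a quality report."""
    report = {}
    for name, field, pred in rules:
        passed = sum(1 for r in records if pred(r.get(field)))
        report[name] = passed / len(records)
    return report

report = evaluate(records, rules)
```

Because the model is data rather than code, the same interpreter can execute quality models for different domains.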

Keywords: Computer science, Data transformation, Data modeling, Information system, Logical data model, Data element, Database, Information quality, Data warehouse, Data mapping, Data model, Data quality, Data pre-processing, Data architecture, Data mining, Software architecture, Data migration, Data virtualization
Published in: Procedia Computer Science

Editing prototypes in the finite sample size case using alternative neighborhoods

1998

The recently introduced concept of Nearest Centroid Neighborhood is applied to discard outliers and prototypes in class-overlapping regions in order to improve the performance of the Nearest Neighbor rule through an editing procedure. This approach is related to graph-based editing algorithms, which also define alternative neighborhoods in terms of geometric relations. Classical editing algorithms are compared to these alternative editing schemes using several synthetic and real data problems. The empirical results show that the proposed editing algorithm constitutes a good trade-off between performance and computational burden.
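The editing idea can be sketched with classical Wilson editing over an ordinary k-NN neighborhood (a simplified stand-in, not the paper's Nearest Centroid Neighborhood variant): a prototype is discarded when its leave-one-out neighbors disagree with its label:

```python
# Minimal sketch of prototype editing (classical Wilson editing with
# ordinary k-NN, not the Nearest Centroid Neighborhood variant).

def knn_label(x, prototypes, k=3, exclude=None):
    """Majority label among the k nearest prototypes (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, p)), y)
        for i, (p, y) in enumerate(prototypes) if i != exclude
    )
    votes = [y for _, y in dists[:k]]
    return max(set(votes), key=votes.count)

def wilson_edit(prototypes, k=3):
    """Keep only prototypes whose leave-one-out k-NN label agrees with them."""
    return [
        (p, y) for i, (p, y) in enumerate(prototypes)
        if knn_label(p, prototypes, k, exclude=i) == y
    ]

prototypes = [((0.0, 0.0), "A"), ((0.1, 0.1), "A"), ((0.2, 0.0), "A"),
              ((1.0, 1.0), "B"), ((1.1, 0.9), "B"),
              ((0.15, 0.05), "B")]  # a "B" outlier deep in the "A" region
edited = wilson_edit(prototypes)    # the outlier is discarded
```

The edited set trades a smaller prototype pool for a cleaner decision boundary, which is the trade-off the abstract refers to.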

Keywords: Computer science, Delaunay triangulation, Centroid, Machine learning, k-nearest neighbors algorithm, Sample size determination, Pattern recognition, Outlier, Artificial intelligence, Data mining

Entropy-Based Classifier Enhancement to Handle Imbalanced Class Problem

2017

The paper presents a possible enhancement of entropy-based classifiers to handle problems caused by class imbalance in the original dataset. The proposed method was tested on synthetic data in order to analyse its robustness in a controlled environment with different class proportions. The proposed method was also tested on real medical data with imbalanced classes and compared against the results of the original classification algorithm. The medical field was chosen for testing due to frequent situations with uneven class ratios.
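One way such an enhancement can work is to weight the entropy calculation by class, so a minority class contributes as if it were balanced; this is an illustrative weighting, not necessarily the paper's exact scheme:

```python
import math

def weighted_entropy(labels, class_weights=None):
    """Shannon entropy with per-class weights; inflating the minority
    class's weight can compensate for class imbalance."""
    class_weights = class_weights or {}
    weighted = {}
    for y in labels:
        weighted[y] = weighted.get(y, 0.0) + class_weights.get(y, 1.0)
    total = sum(weighted.values())
    return -sum((w / total) * math.log2(w / total)
                for w in weighted.values())

labels = ["neg"] * 90 + ["pos"] * 10          # 9:1 imbalance
plain = weighted_entropy(labels)              # low: skew looks "pure"
balanced = weighted_entropy(labels, {"pos": 9.0})  # reweighted to 50/50
```

A decision-tree split criterion built on the reweighted entropy no longer favors splits that simply absorb the majority class.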

Keywords: Computer science, Entropy (information theory), Decision tree, Pattern recognition, Synthetic data, Artificial intelligence, Data mining
Published in: Procedia Computer Science

MetNet: A two-level approach to reconstructing and comparing metabolic networks

2021

Comparing metabolic pathways and their interactions across different species can reveal important information for drug engineering and medical science. In the literature, proposals for reconstructing and comparing metabolic networks face two main problems: network reconstruction usually requires human intervention to integrate information from different sources, and, in metabolic comparison, the size of the networks leads to a challenging computational problem. We propose to automatically reconstruct a metabolic network on the basis of KEGG database information. Our proposal relies on a two-level representation of the huge metabolic network: the first level is graph-based and depicts pathways a…
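The two-level idea can be sketched with hypothetical data (not an actual KEGG download): a coarse graph over pathways, where each pathway node expands into its reaction set for the finer comparison:

```python
# Sketch of a two-level metabolic-network representation (hypothetical
# pathway and reaction identifiers, not real KEGG records).

pathway_graph = {            # level 1: pathway -> connected pathways
    "glycolysis": {"tca_cycle"},
    "tca_cycle": {"glycolysis", "oxidative_phosphorylation"},
    "oxidative_phosphorylation": {"tca_cycle"},
}

reactions = {                # level 2: pathway -> reaction identifiers
    "glycolysis": {"R01", "R02", "R03"},
    "tca_cycle": {"R03", "R04"},
    "oxidative_phosphorylation": {"R05"},
}

def pathway_similarity(a, b):
    """Jaccard similarity of the two pathways' reaction sets."""
    ra, rb = reactions[a], reactions[b]
    return len(ra & rb) / len(ra | rb)

# Compare coarsely on the pathway graph; expand to reaction level only
# where pathways are adjacent, which keeps the comparison tractable.
refined = {
    (a, b): pathway_similarity(a, b)
    for a, nbrs in pathway_graph.items() for b in nbrs if a < b
}
```

Restricting the expensive reaction-level comparison to adjacent pathway pairs is one way to sidestep the size problem the abstract mentions.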

Keywords: Computer science, Metabolic network, Biochemistry, Cluster analysis, Data management, Chemical reactions, Graph, Metabolic pathways, Computational problem, Network analysis, Metabolomics, KEGG, Data visualization, Software
Published in: PLoS ONE

Detection, tracking and event localization of jet stream features in 4-D atmospheric data

2012

We introduce a novel algorithm for the efficient detection and tracking of features in spatiotemporal atmospheric data, as well as for the precise localization of the occurring genesis, lysis, merging and splitting events. The algorithm works on data given on a four-dimensional structured grid. Feature selection and clustering are based on adjustable local and global criteria; feature tracking is predominantly based on spatial overlaps of the features' full volumes. The resulting 3-D features and the identified correspondences between features of consecutive time steps are represented as the nodes and edges of a directed acyclic graph, the event graph. Merging and splitting events appear in…
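Overlap-based correspondence and the resulting event graph can be sketched as follows (features reduced to sets of grid cells for illustration): an edge connects features in consecutive time steps whose volumes share cells, and a node with out-degree greater than one marks a split, in-degree greater than one a merge:

```python
# Minimal sketch of overlap-based feature tracking: event-graph edges link
# features of consecutive time steps whose (here 2-D) volumes overlap.

steps = [  # time step -> {feature id: set of occupied grid cells}
    {"f1": {(0, 0), (0, 1), (1, 0), (1, 1)}},
    {"f2": {(0, 0), (0, 1)}, "f3": {(1, 0), (1, 1)}},  # f1 splits in two
]

def track(prev, curr, min_overlap=1):
    """Edges (prev id, curr id) where feature volumes share enough cells."""
    return [
        (a, b)
        for a, va in prev.items()
        for b, vb in curr.items()
        if len(va & vb) >= min_overlap
    ]

edges = track(steps[0], steps[1])
# A splitting event: one predecessor with more than one successor.
splits = {a for a, _ in edges if sum(1 for x, _ in edges if x == a) > 1}
```

Chaining such edges over all time steps yields the directed acyclic event graph the abstract describes.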

Keywords: Computer science, Event (computing), Feature selection, Grid, Tracking, Directed acyclic graph, Feature (computer vision), Data mining, Cluster analysis, Algorithm, Atmospheric and Oceanic Physics

IntentStreams

2015

The user's understanding of information needs and the information available in the data collection can evolve during an exploratory search session. Search systems tailored for well-defined narrow search tasks may be suboptimal for exploratory search where the user can sequentially refine the expressions of her information needs and explore alternative search directions. A major challenge for exploratory search systems design is how to support such behavior and expose the user to relevant yet novel information that can be difficult to discover by using conventional query formulation techniques. We introduce IntentStreams, a system for exploratory search that provides interactive query refine…

Keywords: Computer science, Exploratory search, Search engine, User interface design, Parallel browsing, Information exploration, Information retrieval, Concept search, Web search query, Search analytics, Semantic search, Data mining, User interface
Published in: Proceedings of the 20th International Conference on Intelligent User Interfaces

Kernel-Based Framework for Multitemporal and Multisource Remote Sensing Data Classification and Change Detection

2008

The multitemporal classification of remote sensing images is a challenging problem, in which the efficient combination of different sources of information (e.g., temporal, contextual, or multisensor) can improve the results. In this paper, we present a general framework based on kernel methods for the integration of heterogeneous sources of information. Using the theoretical principles in this framework, three main contributions are presented. First, a novel family of kernel-based methods for multitemporal classification of remote sensing images is presented. The second contribution is the development of nonlinear kernel classifiers for the well-known difference and ratioing change detectio…
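The integration of heterogeneous sources via kernels can be sketched as a composite kernel (an illustrative convex combination, not the paper's exact kernel family): one RBF kernel over, say, spectral features and another over temporal features, combined with a mixing weight:

```python
import numpy as np

# Sketch of a composite kernel: a convex combination of RBF kernels over
# two heterogeneous feature blocks remains a valid (PSD) kernel.

def rbf(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix between row-vector sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def composite_kernel(Xs, Ys, Xt, Yt, mu=0.5):
    """mu * K_spectral + (1 - mu) * K_temporal, 0 <= mu <= 1."""
    return mu * rbf(Xs, Ys) + (1.0 - mu) * rbf(Xt, Yt)

rng = np.random.default_rng(0)
X_spec = rng.normal(size=(5, 4))   # e.g. spectral features per pixel
X_temp = rng.normal(size=(5, 3))   # e.g. temporal features per pixel
K = composite_kernel(X_spec, X_spec, X_temp, X_temp)
```

The composite matrix can be passed to any kernel machine (e.g. an SVM with a precomputed kernel), and `mu` tunes the relative weight of the two information sources.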

Keywords: Computer science, Feature vector, Data classification, Composite kernel, Multitemporal classification, Support vector domain description (SVDD), Remote sensing, Support vector machines, Contextual image classification, Kernel methods, Pattern recognition, Change detection, Artificial intelligence, Data mining, Information fusion, Multisource
Published in: IEEE Transactions on Geoscience and Remote Sensing

Optimal Filter Estimation for Lucas-Kanade Optical Flow

2012

Optical flow algorithms offer a way to estimate motion from a sequence of images. The computation of optical flow plays a key-role in several computer vision applications, including motion detection and segmentation, frame interpolation, three-dimensional scene reconstruction, robot navigation and video compression. In the case of gradient based optical flow implementation, the pre-filtering step plays a vital role, not only for accurate computation of optical flow, but also for the improvement of performance. Generally, in optical flow computation, filtering is used at the initial level on original input images and afterwards, the images are resized. In this paper, we propose an image filt…
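The role of the Gaussian pre-filter in a gradient-based Lucas-Kanade step can be sketched as follows (a generic single-window implementation on synthetic data, not the paper's proposed filter):

```python
import numpy as np

# Sketch: Gaussian pre-filtering followed by a single Lucas-Kanade step.

def gaussian_kernel(sigma, radius):
    """1-D Gaussian pre-filter, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def smooth(img, k):
    """Separable pre-filtering with reflect padding at the borders."""
    r = len(k) // 2
    pad = np.pad(img, r, mode="reflect")
    tmp = np.apply_along_axis(lambda m: np.convolve(m, k, "valid"), 0, pad)
    return np.apply_along_axis(lambda m: np.convolve(m, k, "valid"), 1, tmp)

def lucas_kanade(I1, I2, k):
    """One global LK step: least-squares fit of Ix*u + Iy*v = -It."""
    I1s, I2s = smooth(I1, k), smooth(I2, k)
    Iy, Ix = np.gradient(I1s)      # d/drow, d/dcol of the filtered image
    It = I2s - I1s
    m = 3                          # keep away from border artefacts
    ix, iy, it = (a[m:-m, m:-m].ravel() for a in (Ix, Iy, It))
    A = np.stack([ix, iy], axis=1)
    u, v = np.linalg.lstsq(A, -it, rcond=None)[0]
    return u, v

n = 12
r_, c_ = np.arange(n, dtype=float), np.arange(n, dtype=float)
I1 = np.outer(r_, c_)              # synthetic image I1[r, c] = r * c
I2 = np.outer(r_, c_ - 1.0)        # I1 shifted one pixel along the columns
k = gaussian_kernel(1.0, 2)
u, v = lucas_kanade(I1, I2, k)     # recovers u ≈ 1, v ≈ 0
```

Filtering before differentiation suppresses noise in the gradients `Ix`, `Iy`, which is why the pre-filtering step matters for accuracy.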

Keywords: Computer science, Optical flow, Gaussian blur, Gaussian filtering, Lucas–Kanade method, Optimal filtering, Segmentation, Computer vision, Motion detection, Filter (signal processing), Artificial intelligence, Data mining, Motion interpolation, Data compression
Published in: Sensors

Conventional and fuzzy comparisons of large scale land cover products: Application to CORINE, GLC2000, MODIS and GlobCover in Europe

2012

One of the major drawbacks of land cover products is the lack of interoperability among them. Since their development was driven by different national or international initiatives, they were developed for different purposes and hold diverse technical characteristics. Thus, comparison among products and quality monitoring is necessary in assessing their usefulness. This paper provides a methodology to compare global land cover maps that allows for differences in legend definitions among products. Two different approaches were considered for map comparison, a Boolean approach and a new methodology based on fuzzy set theory in which the Land Cover Classification System (LCCS) acted as a genera…
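The fuzzy comparison idea can be sketched with hypothetical legend classes and membership degrees (not the actual CORINE/GLC2000 legends): each product class is treated as a fuzzy set over a common reference legend, and agreement between two classes is the maximum min-membership over shared reference classes:

```python
# Sketch of fuzzy legend matching between two land cover products
# (hypothetical classes and memberships, for illustration only).

legend_a = {  # product A class -> {reference class: membership}
    "mixed_forest": {"broadleaved": 0.5, "needleleaved": 0.5},
}
legend_b = {  # product B class -> {reference class: membership}
    "broadleaf_forest": {"broadleaved": 1.0},
    "cropland": {"cultivated": 1.0},
}

def fuzzy_agreement(cls_a, cls_b):
    """max over shared reference classes of min(membership_a, membership_b)."""
    fa, fb = legend_a[cls_a], legend_b[cls_b]
    shared = set(fa) & set(fb)
    return max((min(fa[c], fb[c]) for c in shared), default=0.0)

soft = fuzzy_agreement("mixed_forest", "broadleaf_forest")  # partial match
hard = fuzzy_agreement("mixed_forest", "cropland")          # no overlap
```

Unlike a Boolean comparison, which would score "mixed_forest" vs. "broadleaf_forest" as pure disagreement, the fuzzy score credits the partial semantic overlap between legend definitions.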

Keywords: Computer science, Interoperability, Fuzzy set, Land cover, Fuzzy logic, Consistency, Data mining, Scale (map)
Published in: ISPRS Journal of Photogrammetry and Remote Sensing

Missing Data

2009

In this chapter, we deal with the problem of missing data in principal component analysis (PCA) and partial least squares (PLS) methods. First, we review several statistical methods proposed in the literature for handling missing data. Both single and multiple imputation (MI) methods are studied and compared using simulated data. After this, we particularize the missing data problem for building and exploiting multivariate calibration models. Several approaches proposed in the literature are introduced and their performance compared based on several real data sets.
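The contrast between single and iterative imputation can be sketched with a rank-1 SVD refit (a simplified stand-in for the PCA-based schemes the chapter discusses): missing entries start at the column mean and are then repeatedly re-estimated from a low-rank reconstruction:

```python
import numpy as np

# Sketch: single (mean) imputation as initialization, then iterative
# refinement from a rank-1 SVD reconstruction (a simplified EM-like loop).

rng = np.random.default_rng(1)
scores = rng.normal(size=(40, 1))
X_true = scores @ np.array([[1.0, 2.0, -1.0]])   # exact rank-1 data
X = X_true.copy()
mask = rng.random(X.shape) < 0.1                 # ~10% missing at random
X[mask] = np.nan

def impute_rank1(X, mask, n_iter=50):
    Xf = X.copy()
    col_mean = np.nanmean(X, axis=0)
    Xf[mask] = np.take(col_mean, np.where(mask)[1])  # single imputation
    for _ in range(n_iter):                          # iterative refinement
        U, s, Vt = np.linalg.svd(Xf, full_matrices=False)
        approx = s[0] * np.outer(U[:, 0], Vt[0])     # rank-1 reconstruction
        Xf[mask] = approx[mask]                      # refill missing only
    return Xf

X_hat = impute_rank1(X, mask)
err = np.abs(X_hat[mask] - X_true[mask]).max() if mask.any() else 0.0
```

Observed entries are never altered; only the missing cells are re-estimated, which mirrors how missing-data PCA alternates between model fitting and imputation.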

Keywords: Computer science, Iterative method, Principal component analysis, Expectation–maximization algorithm, Partial least squares regression, Multivariate calibration, Missing data, Data mining