
AUTHOR

Paulo J. G. Lisboa

Showing 15 related works from this author

Clustering categorical data: A stability analysis framework

2011

Clustering to identify inherent structure is an important first step in data exploration. The k-means algorithm is a popular choice, but it is not generally appropriate for categorical data. A specific extension of k-means to categorical data is the k-modes algorithm. Both of these partition clustering methods are sensitive to the initialization of prototypes, which makes it difficult to select the best solution for a given problem. In addition, selecting the number of clusters can be an issue. Further, the k-modes method is especially prone to instability when presented with ‘noisy’ data, since the calculation of the mode lacks the smoothing effect inherent in the calculation …
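
To make the contrast with k-means concrete, the sketch below implements a bare-bones k-modes step in Python: prototypes are attribute-wise modes rather than means, and distance is a simple mismatch (Hamming) count. The toy data, initialisation strategy and iteration count are illustrative assumptions, not taken from the paper.

# Minimal k-modes sketch (assumed toy data; not the paper's implementation).
import numpy as np

def hamming(a, b):
    """Count of attribute mismatches between categorical records."""
    return np.sum(a != b, axis=-1)

def k_modes(X, k, n_iter=10, rng=None):
    """Very small k-modes loop: assign to nearest mode, then recompute modes."""
    rng = np.random.default_rng(rng)
    modes = X[rng.choice(len(X), size=k, replace=False)]   # random initial prototypes
    for _ in range(n_iter):
        d = np.array([hamming(X, m) for m in modes])        # shape (k, n)
        labels = d.argmin(axis=0)                            # nearest prototype per record
        for j in range(k):                                   # update prototypes attribute-wise
            members = X[labels == j]
            if len(members):
                new_mode = []
                for col in members.T:
                    values, counts = np.unique(col, return_counts=True)
                    new_mode.append(values[counts.argmax()]) # most frequent category
                modes[j] = new_mode
    return labels, modes

# Toy categorical data: rows are records, columns are categorical attributes.
X = np.array([["red", "small"], ["red", "small"], ["blue", "large"],
              ["blue", "large"], ["blue", "small"]])
labels, modes = k_modes(X, k=2, rng=0)
print(labels, modes)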

Keywords: Computer science; Single-linkage clustering; Correlation clustering; Constrained clustering; Machine learning; Determining the number of clusters in a data set; Data stream clustering; CURE data clustering algorithm; Consensus clustering; Data mining; Artificial intelligence; Cluster analysis. Published in: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM).

An AI Walk from Pharmacokinetics to Marketing

2009

This work provides a review of real-life practical applications of Artificial Intelligence (AI) methods. We focus on the use of Machine Learning (ML) methods applied to real problems rather than to synthetic problems in standard, controlled environments. In particular, we describe the following problems in the next sections:

• Optimization of Erythropoietin (EPO) dosages in anaemic patients undergoing Chronic Renal Failure (CRF).
• Optimization of a recommender system for citizen web portal users.
• Optimization of a marketing campaign.

The choice of these problems is due to their relevance and heterogeneity. This heterogeneity shows the capabilities and versatility …

Keywords: Support vector machine; Engineering; Pattern recognition; Adaptive resonance theory; Artificial neural network; Multilayer perceptron; Reinforcement learning; Artificial intelligence; Cluster analysis; Fuzzy logic; Hierarchical clustering.

Towards interpretable classifiers with blind signal separation

2012

Blind signal separation (BSS) is a powerful tool for opening up complex signals into component sources that are often interpretable. However, BSS methods are generally unsupervised, so the assignment of class membership from the elements of the mixing matrix may be sub-optimal. This paper proposes a three-stage approach that uses the Fisher information metric to define a natural metric for the data, from which a Euclidean approximation can then be used to drive BSS. Results with synthetic data models of real-world high-dimensional data show that the classification accuracy of the method is good for challenging problems, while retaining interpretability.
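
As a plain illustration of the BSS step only, the sketch below unmixes synthetic signals with scikit-learn's FastICA; the paper's Fisher-metric preprocessing and three-stage pipeline are not reproduced, and the signal models are arbitrary assumptions.

# Plain BSS sketch with FastICA on synthetic mixtures (illustrative only;
# the Fisher-metric stage described in the paper is not reproduced here).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(2 * t),                      # smooth oscillation
                np.sign(np.sin(3 * t)),             # square wave
                rng.laplace(size=t.size)]           # heavy-tailed noise source
A = rng.normal(size=(3, 3))                         # unknown mixing matrix
X = sources @ A.T                                   # observed mixtures

ica = FastICA(n_components=3, random_state=0)
S_est = ica.fit_transform(X)                        # recovered component sources
print("estimated mixing matrix:\n", ica.mixing_)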

Keywords: Pattern recognition; Blind signal separation; Synthetic data; Data mapping; Component (UML); Metric (mathematics); Artificial intelligence; Fisher information; Fisher information metric; Interpretability; Mathematics.

Data Mining in Cancer Research [Application Notes]

2010

This article is not intended as a comprehensive survey of data mining applications in cancer. Rather, it provides starting points for further, more targeted, literature searches, by embarking on a guided tour of computational intelligence applications in cancer medicine, structured in increasing order of the physical scales of biological processes.

Keywords: Pattern recognition; Cancer Medicine; Artificial Intelligence; Computer science; Computational intelligence; Data mining; Data science; Theoretical Computer Science. Published in: IEEE Computational Intelligence Magazine.

An approach based on the Adaptive Resonance Theory for analysing the viability of recommender systems in a citizen Web portal

2007

This paper proposes a methodology to optimise the future accuracy of a collaborative recommender application in a citizen Web portal. There are four stages, namely user modelling, benchmarking of clustering algorithms, prediction analysis and recommendation. The first stage is to develop analytical models of common characteristics of Web-user data. These artificial data sets are then used to evaluate the performance of clustering algorithms, in particular benchmarking the ART2 neural network against K-means clustering. Afterwards, the predictive accuracy of the clusters is evaluated on a real-world data set derived from access logs to the citizen Web portal Infoville XXI (http://www…
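
A minimal version of the benchmarking stage could look like the sketch below: cluster synthetic "user profile" data with K-means and score the result against the known generating groups. The data model, the range of k and the adjusted Rand index are illustrative assumptions; ART2 is not available in standard libraries and is not reproduced here.

# Benchmarking sketch on assumed synthetic user data (K-means side only).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic users: 3 latent interest groups described by 10 usage features.
X, y_true = make_blobs(n_samples=600, centers=3, n_features=10, random_state=0)

for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}  ARI vs. generating groups: {adjusted_rand_score(y_true, labels):.3f}")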

Keywords: Information retrieval; Artificial neural network; Computer science; General Engineering; Recommender system; Computer Science Applications; Data set; Adaptive resonance theory; Artificial Intelligence; Collaborative filtering; Data mining; Cluster analysis. Published in: Expert Systems with Applications.

Preface to Data Mining in Biomedical Informatics and Healthcare

2013

Keywords: Engineering; Health Administration Informatics; Health care; Translational research informatics; Data mining; Health informatics; Data science. Published in: 2013 IEEE 13th International Conference on Data Mining Workshops.

Probabilistic quantum clustering

2020

Quantum Clustering is a powerful method for detecting clusters with complex shapes. However, it is very sensitive to a length parameter that controls the shape of the Gaussian kernel associated with a wave function, which is employed in the Schrödinger equation in the role of a density estimator. In addition, linking data points into clusters requires local estimates of covariance, which require further parameters. This paper proposes a Bayesian framework that provides an objective measure of goodness-of-fit to the data, in order to optimise the adjustable parameters. This also quantifies the probabilities of cluster membership, thus partitioning the data into a specific number of clusters, w…
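
For reference, the standard quantum clustering construction that the paper builds on (as introduced by Horn and Gottlieb) can be stated as follows; the notation is a common one and is not taken verbatim from the paper.

% Parzen-style wave function with length scale \sigma over data points x_i:
\psi(\mathbf{x}) = \sum_{i=1}^{N} \exp\!\left(-\frac{\lVert \mathbf{x}-\mathbf{x}_i\rVert^2}{2\sigma^2}\right)
% Requiring \psi to be the ground state of the Schr\"odinger equation
% \left(-\tfrac{\sigma^2}{2}\nabla^2 + V(\mathbf{x})\right)\psi = E\,\psi
% determines the potential, whose minima are taken as cluster centres:
V(\mathbf{x}) = E + \frac{\sigma^2}{2}\,\frac{\nabla^2 \psi(\mathbf{x})}{\psi(\mathbf{x})}
             = E - \frac{d}{2} + \frac{1}{2\sigma^2\,\psi(\mathbf{x})}
               \sum_{i=1}^{N} \lVert \mathbf{x}-\mathbf{x}_i\rVert^2
               \exp\!\left(-\frac{\lVert \mathbf{x}-\mathbf{x}_i\rVert^2}{2\sigma^2}\right)
% Here d is the data dimensionality and E is fixed by requiring \min_{\mathbf{x}} V(\mathbf{x}) = 0.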

Keywords: Information Systems and Management; Jaccard index; Computer science; Probabilistic logic; Estimator; Probability density function; Function (mathematics); Covariance; Measure (mathematics); Management Information Systems; Artificial Intelligence; Gaussian function; Cluster (physics); Statistical physics; Software; Quantum clustering. Published in: Knowledge-Based Systems.

Scalable implementation of measuring distances in a Riemannian manifold based on the Fisher Information metric

2019

This paper focuses on the scalability of the Fisher Information manifold by applying techniques of distributed computing. The main objective is to investigate methodologies to improve two bottlenecks associated with the measurement of distances in a Riemannian manifold formed by the Fisher Information metric. The first bottleneck is the quadratic increase in the number of pairwise distances. The second is the computation of global distances, approximated through a fully connected network of the observed pairwise distances, where the challenge is the computation of the all-sources shortest path (ASSP). The scalable implementation for the pairwise distances is performed in Spark. The scalable…
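
A small, non-distributed illustration of the two bottlenecks is sketched below: compute all pairwise distances, keep a k-nearest-neighbour graph, and approximate global distances with Dijkstra's algorithm via SciPy. The Spark implementation is not reproduced, and Euclidean distance stands in for the Fisher-information metric.

# Local (non-Spark) illustration of the pairwise-distance and shortest-path steps.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                 # toy data set

D = squareform(pdist(X))                      # first bottleneck: O(n^2) pairwise distances
k = 10
W = np.zeros_like(D)
for i, row in enumerate(D):                   # keep only the k nearest neighbours per point
    nn = np.argsort(row)[1:k + 1]
    W[i, nn] = row[nn]
W = np.maximum(W, W.T)                        # symmetrise the neighbourhood graph

# Second bottleneck: all-sources shortest paths over the graph (graph geodesics).
geodesics = dijkstra(csr_matrix(W), directed=False)
print(geodesics.shape)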

Keywords: Computer science; Riemannian manifold; Bottleneck; Manifold; Shortest path problem; Spark (mathematics); Scalability; Fisher information; Algorithm; Dijkstra's algorithm; Fisher information metric. Published in: 2019 International Joint Conference on Neural Networks (IJCNN).

Classical Training Methods

2006

This chapter reviews classical training methods for multilayer neural networks. These methods are widely used for classification and function modelling tasks. Nevertheless, they show a number of flaws or drawbacks that should be addressed in the development of such systems. They work by searching the minimum of an error function which defines the optimal behaviour of the neural network. Different standard problems are used to show the capabilities of these models; in particular, we have benchmarked the algorithms in a nonlinear classification problem and in three function modelling problems.
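
The kind of training loop the chapter refers to, gradient descent on an error function, can be sketched as below for a tiny multilayer perceptron on the XOR problem; the architecture, learning rate and iteration count are arbitrary illustrative choices, not the chapter's benchmarks, and convergence depends on the random initialisation.

# Gradient descent on a sum-of-squares error for a tiny MLP (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)           # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)             # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)             # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for epoch in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: gradients of E = 0.5 * sum((out - y)^2).
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 3))    # network outputs for the four XOR patterns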

Keywords: Artificial neural network; Computer science; Training methods; Machine learning; Error function; Delta rule; Multilayer perceptron; Artificial intelligence; Nonlinear classification; Function (engineering).

Quantum clustering in non-spherical data distributions: Finding a suitable number of clusters

2017

Quantum Clustering (QC) provides an alternative approach to clustering algorithms, several of which are based on geometric relationships between data points. Instead, QC makes use of quantum mechanics concepts to find structures (clusters) in data sets by finding the minima of a quantum potential. The starting point of QC is a Parzen estimator with a fixed length scale, which significantly affects the final cluster allocation. This dependence on an adjustable parameter is common to other methods. We propose a framework to find suitable values of the length parameter σ by optimising twin measures of cluster separation and consistency for a given cluster number. This is an extension of the Se…
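
Since the cluster allocation hinges on the length scale σ, one way to see its effect is to evaluate the quantum potential on a 1-D toy data set for several σ values and count its local minima (candidate cluster centres). The numerical sketch below follows the standard QC potential; it is not the paper's procedure for optimising the twin separation and consistency measures.

# Effect of the length scale sigma on the quantum potential of a 1-D toy data set.
# Counting local minima of V gives the number of candidate cluster centres.
import numpy as np

rng = np.random.default_rng(1)
x_data = np.r_[rng.normal(-3, 0.5, 50), rng.normal(0, 0.5, 50), rng.normal(3, 0.5, 50)]
grid = np.linspace(-6, 6, 1200)

def quantum_potential(grid, data, sigma):
    r2 = (grid[:, None] - data[None, :]) ** 2
    g = np.exp(-r2 / (2 * sigma ** 2))
    psi = g.sum(axis=1)                       # Parzen-style wave function
    V = (r2 * g).sum(axis=1) / (2 * sigma ** 2 * psi)
    return V - V.min()                        # shift so min(V) = 0 (constants absorbed)

for sigma in (0.2, 0.6, 1.5, 3.0):
    V = quantum_potential(grid, x_data, sigma)
    minima = np.sum((V[1:-1] < V[:-2]) & (V[1:-1] < V[2:]))
    print(f"sigma={sigma:>4}: {minima} local minima in V")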

Keywords: Clustering high-dimensional data; Mathematical optimization; Cognitive Neuroscience; Single-linkage clustering; Correlation clustering; Computer Science Applications; Hierarchical clustering; Determining the number of clusters in a data set; Artificial Intelligence; Cluster (physics); Cluster analysis; Algorithm; k-medians clustering; Mathematics. Published in: Neurocomputing.

Robust Conditional Independence maps of single-voxel Magnetic Resonance Spectra to elucidate associations between brain tumours and metabolites.

2020

The aim of the paper is two-fold. First, we show that structure finding with the PC algorithm can be inherently unstable and requires further operational constraints in order to consistently obtain models that are faithful to the data. We propose a methodology to stabilise the structure finding process, minimising both false positive and false negative error rates. This is demonstrated with synthetic data. Second, we apply the proposed structure finding methodology to a data set comprising single-voxel Magnetic Resonance Spectra of normal brain and three classes of brain tumours, to elucidate the associations between brain tumour types and a range of observed metabolites that are known to b…
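
The stabilisation idea, repeating structure finding over resamples and retaining only edges that recur, can be sketched as below. Thresholded partial correlations on Gaussian synthetic data stand in for the PC algorithm's conditional-independence tests, and the retention threshold is an arbitrary illustrative choice.

# Stability illustration: repeat a simple structure-finding step over bootstrap
# resamples and keep edges that recur (partial correlations stand in for PC tests).
import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 5
# Synthetic linear-Gaussian data with chain X0 -> X1 -> X2; X3, X4 independent.
X = np.zeros((n, p))
X[:, 0] = rng.normal(size=n)
X[:, 1] = 0.8 * X[:, 0] + rng.normal(scale=0.6, size=n)
X[:, 2] = 0.8 * X[:, 1] + rng.normal(scale=0.6, size=n)
X[:, 3] = rng.normal(size=n)
X[:, 4] = rng.normal(size=n)

def skeleton(data, thresh=0.1):
    """Adjacency from partial correlations (via the precision matrix)."""
    prec = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(d, d)
    np.fill_diagonal(pcorr, 0.0)
    return np.abs(pcorr) > thresh

edge_counts = np.zeros((p, p))
B = 200
for _ in range(B):
    idx = rng.integers(0, n, size=n)          # bootstrap resample
    edge_counts += skeleton(X[idx])

stable_edges = (edge_counts / B) >= 0.9       # keep edges present in >= 90% of resamples
print(np.argwhere(np.triu(stable_edges, k=1)))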

Keywords: False discovery rate; B Vitamins; Magnetic Resonance Spectroscopy; Computer science; Directed Acyclic Graphs; Biochemistry; Metabolites; Medicine and Health Sciences; Amino Acids; Neurological Tumors; Multidisciplinary; Directed Graphs; Organic Compounds; Brain Neoplasms; Total Cell Counting; Brain; Mutual information; Vitamins; Lipids; Chemistry; Conditional independence; Oncology; Neurology; Physical Sciences; Engineering and Technology; Medicine; Meningioma; Algorithm; Management Engineering; Algorithms; Research Article; Computer and Information Sciences; Science; Cell Enumeration Techniques; Glycine; Feature selection; Cholines; Research and Analysis Methods; Synthetic data; Insurance; Robustness (computer science); Humans; Metabolomics; Risk Management; Organic Chemistry; Chemical Compounds; Bayesian network; Biology and Life Sciences; Cancers and Neoplasms; Proteins; Bayes Theorem; Directed acyclic graph; Metabolism; Aliphatic Amino Acids; Graph Theory; Mathematics. Published in: PLoS ONE.

An integrated framework for risk profiling of breast cancer patients following surgery.

2006

Objective: An integrated decision support framework is proposed for clinical oncologists making prognostic assessments of patients with operable breast cancer. The framework may be delivered over a web interface. It comprises a triangulation of prognostic modelling, visualisation of historical patient data and an explanatory facility to interpret risk group assignments using empirically derived Boolean rules expressed directly in clinical terms. Methods and materials: The prognostic inferences in the interface are validated in a multicentre longitudinal cohort study by modelling retrospective data from 917 patients recruited at Christie Hospital, Wilmslow between 1983 and 1989 and predictin…

Keywords: Risk profiling; Adult; Decision support system; Medicine (miscellaneous); Breast Neoplasms; Machine learning; Models, Biological; Risk Assessment; Decision Support Techniques; User-Computer Interface; Breast cancer; Risk groups; Artificial Intelligence; Confidence Intervals; Health Status Indicators; Humans; Medical physics; Survival analysis; Mastectomy; Retrospective Studies; Internet; Patient Selection; Reproducibility of Results; Patient data; Middle Aged; Decision Support Systems, Clinical; Prognosis; Confidence interval; Treatment Outcome; Nottingham Prognostic Index; Female; Artificial intelligence; Neural Networks, Computer; Monte Carlo Method; Algorithms. Published in: Artificial Intelligence in Medicine.

A principled approach to network-based classification and data representation

2013

Measures of similarity are fundamental in pattern recognition and data mining. Typically the Euclidean metric is used in this context, weighting all variables equally and therefore assuming equal relevance, which is very rare in real applications. In contrast, given an estimate of a conditional density function, the Fisher information calculated in primary data space implicitly measures the relevance of variables in a principled way by reference to auxiliary data such as class labels. This paper proposes a framework that uses a distance metric based on Fisher information to construct similarity networks that achieve a more informative and principled representation of data. The framework ena…
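
The underlying construction referred to here is the Fisher information metric induced by a conditional density of auxiliary labels. In a common notation (ours, not necessarily the paper's), the local metric and the squared distance element are:

% Fisher information computed from the conditional density p(c | x) of
% auxiliary data (e.g. class labels c) in the primary data space:
F(\mathbf{x}) \;=\; \mathbb{E}_{c \sim p(c\mid\mathbf{x})}\!\left[
    \nabla_{\mathbf{x}} \log p(c\mid\mathbf{x})\,
    \nabla_{\mathbf{x}} \log p(c\mid\mathbf{x})^{\top}\right],
\qquad
\mathrm{d}s^2 \;=\; \mathrm{d}\mathbf{x}^{\top} F(\mathbf{x})\,\mathrm{d}\mathbf{x}.
% Directions that do not change p(c | x) contribute nothing to F, which is the
% sense in which the metric weights variables by their relevance to the labels.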

Keywords: Cognitive Neuroscience; Fisher kernel; Pattern recognition; Probability density function; Conditional probability distribution; External Data Representation; Computer Science Applications; Weighting; Euclidean distance; Data point; Artificial Intelligence; Data mining; Fisher information; Mathematics. Published in: Neurocomputing.

Making nonlinear manifold learning models interpretable: The manifold grand tour

2015

Highlights:
• Smooth nonlinear topographic maps of the data distribution to guide a Grand Tour visualisation.
• Prioritisation of the linear views of the data that are most consistent with data structure in the maps.
• Useful visualisations that cannot be obtained by other, more classical approaches.

Dimensionality reduction is required to produce visualisations of high dimensional data. In this framework, one of the most straightforward approaches to visualising high dimensional data is based on reducing complexity and applying linear projections while tumbling the projection axes in a defined sequence which generates a Grand Tour of the data. We propose using smooth nonlinear topographic maps of the data distribution to…
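
A minimal version of the linear part of this procedure, tumbling a 2-D projection plane through a sequence of orientations, can be sketched as follows; the smooth nonlinear topographic maps (e.g. GTM) the paper uses to prioritise views are not reproduced, and the interpolation scheme is a crude illustrative choice.

# Bare-bones grand-tour step: interpolate between random 2-D projection planes
# and project the data onto each intermediate plane (illustrative only).
import numpy as np

def random_plane(dim, rng):
    """Random 2-D orthonormal basis in `dim` dimensions (via QR)."""
    q, _ = np.linalg.qr(rng.normal(size=(dim, 2)))
    return q                                   # shape (dim, 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                  # toy high-dimensional data

A, B = random_plane(6, rng), random_plane(6, rng)
for t in np.linspace(0, 1, 5):
    # Interpolate between the two planes and re-orthonormalise.
    plane, _ = np.linalg.qr((1 - t) * A + t * B)
    view = X @ plane                           # 2-D view of the data at step t
    print(f"t={t:.2f}  view variance: {view.var(axis=0).round(2)}")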

Keywords: Clustering high-dimensional data; Nonlinear dimensionality reduction; Discriminative clustering; Computer science; Data visualization; Projection (mathematics); Information visualization; Artificial Intelligence; Dimensionality reduction; Grand tour; General Engineering; Topographic map; Data structure; Computer Science Applications; Visualization; Manifold learning; Data mining; Generative topographic mapping; Linear projections.

A Novel Semi-Supervised Methodology for Extracting Tumor Type-Specific MRS Sources in Human Brain Data

2013

Background: The clinical investigation of human brain tumors often starts with a non-invasive imaging study, providing information about the tumor extent and location, but little insight into the biochemistry of the analyzed tissue. Magnetic Resonance Spectroscopy can complement imaging by supplying a metabolic fingerprint of the tissue. This study analyses single-voxel magnetic resonance spectra, which represent signal information in the frequency domain. Given that a single voxel may contain a heterogeneous mix of tissues, signal source identification is a relevant challenge for the problem of tumor type classification from the spectroscopic signal. Methodology/Princ…
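
Signal-source identification of the kind described, separating a voxel's spectrum into constituent source signals, is often approached with non-negative matrix factorisation. The sketch below applies scikit-learn's NMF to synthetic non-negative "spectra" and is an illustration only; the paper's semi-supervised, tumour-type-specific source extraction is not reproduced.

# Unsupervised NMF on synthetic non-negative "spectra" (illustration only).
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
freq = np.linspace(0, 4, 200)
peak = lambda centre, width: np.exp(-(freq - centre) ** 2 / (2 * width ** 2))

# Two underlying "metabolite" source profiles and random mixing proportions.
sources = np.vstack([peak(1.3, 0.1) + 0.5 * peak(3.0, 0.15),
                     peak(2.0, 0.1) + 0.8 * peak(3.6, 0.1)])
mixing = rng.uniform(0.0, 1.0, size=(60, 2))
spectra = mixing @ sources + 0.01 * rng.uniform(size=(60, freq.size))

model = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(spectra)     # per-spectrum source contributions
H = model.components_                # recovered source profiles
print(W.shape, H.shape)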

Keywords: Magnetic Resonance Spectroscopy; Statistics as Topic; Bioinformatics; Signal; Diagnostic Radiology; Engineering; Discriminative model; Basic Cancer Research; Mathematical Computing; Neurological Tumors; Complement (set theory); Physics; Multidisciplinary; Brain Neoplasms; Applied Mathematics; Brain; Magnetic Resonance Imaging; Identification (information); Oncology; Frequency domain; Metric (mathematics); Medicine; Radiology; Algorithms; Research Article; Science; Lipid signaling; Glioblastoma multiforme; Matrix decomposition; Cancer Detection and Diagnosis; Humans; Prototypes; Fingerprint (computing); Cancers and Neoplasms; Data acquisition; Pattern recognition; Computing Methods; Computer Science; Signal Processing; Artificial intelligence; Mathematics.