
AUTHOR

Ian H. Jarman

showing 9 related works from this author

Clustering categorical data: A stability analysis framework

2011

Clustering to identify inherent structure is an important first step in data exploration. The k-means algorithm is a popular choice, but it is not generally appropriate for categorical data. A specific extension of k-means for categorical data is the k-modes algorithm. Both of these partition clustering methods are sensitive to the initialization of prototypes, which makes it difficult to select the best solution for a given problem. In addition, selecting the number of clusters can be an issue. Further, the k-modes method is especially prone to instability when presented with ‘noisy’ data, since the calculation of the mode lacks the smoothing effect inherent in the calculation …
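
As an illustration of the k-modes idea mentioned in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes the simple matching (Hamming) dissimilarity and a column-wise mode update; the function names `hamming`, `column_modes` and `k_modes` and the toy data are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category per column (ties go to the first value seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, prototypes, max_iter=20):
    """Alternate assignment and mode update, starting from given prototypes."""
    prototypes = [tuple(p) for p in prototypes]
    labels = []
    for _ in range(max_iter):
        # Assign each row to the nearest prototype under Hamming dissimilarity.
        labels = [
            min(range(len(prototypes)), key=lambda j: hamming(row, prototypes[j]))
            for row in data
        ]
        # Recompute each prototype as the column-wise mode of its members.
        updated = []
        for j, old in enumerate(prototypes):
            members = [row for row, lab in zip(data, labels) if lab == j]
            updated.append(column_modes(members) if members else old)
        if updated == prototypes:
            break
        prototypes = updated
    return labels, prototypes


# Toy categorical data (hypothetical), initialised from two of its own rows.
data = [("red", "small"), ("red", "medium"), ("blue", "large"), ("blue", "large")]
labels, modes = k_modes(data, prototypes=[data[0], data[2]])
```

Because the result depends on the starting prototypes, one way to probe the sensitivity described above would be to re-run `k_modes` from several different initialisations and compare the resulting partitions.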

Computer science; Single-linkage clustering; Correlation clustering; Constrained clustering; Machine learning; Determining the number of clusters in a data set; Data stream clustering; CURE data clustering algorithm; Consensus clustering; Data mining; Artificial intelligence; Cluster analysis

2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Towards interpretable classifiers with blind signal separation

2012

Blind signal separation (BSS) is a powerful tool to open up complex signals into component sources that are often interpretable. However, BSS methods are generally unsupervised; therefore, the assignment of class membership from the elements of the mixing matrix may be sub-optimal. This paper proposes a three-stage approach using the Fisher information metric to define a natural metric for the data, from which a Euclidean approximation can then be used to drive BSS. Results with synthetic data models of real-world high-dimensional data show that the classification accuracy of the method is good for challenging problems, while retaining interpretability.

Pattern recognition; Blind signal separation; Synthetic data; Data mapping; Component (UML); Metric (mathematics); Artificial intelligence; Fisher information; Fisher information metric; Interpretability; Mathematics

Probabilistic quantum clustering

2020

Quantum Clustering is a powerful method to detect clusters with complex shapes. However, it is very sensitive to a length parameter that controls the shape of the Gaussian kernel associated with a wave function, which is employed in the Schrödinger equation with the role of a density estimator. In addition, linking data points into clusters requires local estimates of covariance, which in turn requires further parameters. This paper proposes a Bayesian framework that provides an objective measure of goodness-of-fit to the data, to optimise the adjustable parameters. This also quantifies the probabilities of cluster membership, thus partitioning the data into a specific number of clusters, w…

Information Systems and Management; Jaccard index; Computer science; Probabilistic logic; Estimator; Probability density function; 02 engineering and technology; Function (mathematics); Covariance; Measure (mathematics); Management Information Systems; Artificial Intelligence; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Gaussian function; Cluster (physics); 020201 artificial intelligence & image processing; Statistical physics; QA; Software; Quantum clustering

Knowledge-Based Systems

Scalable implementation of measuring distances in a Riemannian manifold based on the Fisher Information metric

2019

This paper focuses on the scalability of the Fisher Information manifold by applying techniques of distributed computing. The main objective is to investigate methodologies to improve two bottlenecks associated with the measurement of distances in a Riemannian manifold formed by the Fisher Information metric. The first bottleneck is the quadratic increase in the number of pairwise distances. The second is the computation of global distances, approximated through a fully connected network of the observed pairwise distances, where the challenge is the computation of the all-sources shortest path (ASSP). The scalable implementation for the pairwise distances is performed in Spark. The scalable…
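
The second bottleneck described in this abstract can be illustrated on a single machine. The sketch below is an assumption-laden illustration, not the paper's Spark implementation: it takes a hypothetical symmetric matrix of local pairwise distances and runs Dijkstra's algorithm from every source to approximate global distances. The names `dijkstra`, `all_sources_shortest_paths` and the matrix `local` are invented for the example.

```python
import heapq


def dijkstra(weights, source):
    """Shortest-path distances from one source over a dense weight matrix."""
    n = len(weights)
    dist = [float("inf")] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter route was already found
        for v in range(n):
            if v != u:
                nd = d + weights[u][v]
                if nd < dist[v]:
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
    return dist


def all_sources_shortest_paths(weights):
    """Run Dijkstra once per source; each run is independent of the others."""
    return [dijkstra(weights, s) for s in range(len(weights))]


# Hypothetical local distances: the direct 0-2 edge is longer than going via 1.
local = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
]
geodesic = all_sources_shortest_paths(local)
```

The matrix has n² entries, which reflects the quadratic growth mentioned in the abstract, and each per-source run is independent, which is the property a distributed implementation could exploit.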

0209 industrial biotechnology; Computer science; 02 engineering and technology; Riemannian manifold; Bottleneck; Manifold; 020901 industrial engineering & automation; Shortest path problem; Spark (mathematics); Scalability; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Fisher information; Algorithm; Dijkstra's algorithm; Fisher information metric

2019 International Joint Conference on Neural Networks (IJCNN)

Quantum clustering in non-spherical data distributions: Finding a suitable number of clusters

2017

Quantum Clustering (QC) provides an alternative approach to clustering algorithms, several of which are based on geometric relationships between data points. Instead, QC makes use of quantum mechanics concepts to find structures (clusters) in data sets by finding the minima of a quantum potential. The starting point of QC is a Parzen estimator with a fixed length scale, which significantly affects the final cluster allocation. This dependence on an adjustable parameter is common to other methods. We propose a framework to find suitable values of the length parameter σ by optimising twin measures of cluster separation and consistency for a given cluster number. This is an extension of the Se…
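
To make the role of the length scale σ concrete, here is a minimal sketch of the quantum potential in the standard QC formulation, which is assumed here rather than taken from the paper: the Parzen wave function is ψ(x) = Σᵢ exp(−‖x − xᵢ‖² / 2σ²), and the potential, up to the additive constant E, is V(x) − E = −d/2 + (1 / 2σ²ψ) Σᵢ ‖x − xᵢ‖² exp(−‖x − xᵢ‖² / 2σ²). The function name `qc_potential` and the toy data are hypothetical.

```python
import math


def qc_potential(x, data, sigma):
    """Quantum potential V(x) - E for a Gaussian Parzen wave function.

    Assumes the standard QC formulation; the constant E is left out.
    """
    d = len(x)
    # Squared Euclidean distance from x to every data point.
    sq = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in data]
    # Gaussian kernel weights; their sum is the Parzen wave function psi(x).
    w = [math.exp(-s / (2 * sigma ** 2)) for s in sq]
    psi = sum(w)
    return -d / 2 + sum(s * wi for s, wi in zip(sq, w)) / (2 * sigma ** 2 * psi)


# Two well-separated 1-D groups (hypothetical data).
points = [(0.0,), (0.1,), (5.0,), (5.1,)]
v_centre = qc_potential((0.05,), points, sigma=0.5)
v_between = qc_potential((2.55,), points, sigma=0.5)
```

With these values the potential is lower inside a group than between the groups, which is the sense in which minima of the potential mark clusters. Changing `sigma` changes where the minima fall, which is the dependence the abstract sets out to control.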

0301 basic medicine; Clustering high-dimensional data; Mathematical optimization; Cognitive Neuroscience; Single-linkage clustering; Correlation clustering; 02 engineering and technology; Computer Science Applications; Hierarchical clustering; Determining the number of clusters in a data set; 03 medical and health sciences; 030104 developmental biology; Artificial Intelligence; 0202 electrical engineering electronic engineering information engineering; Cluster (physics); 020201 artificial intelligence & image processing; QA; Cluster analysis; Algorithm; k-medians clustering; Mathematics

Neurocomputing

Robust Conditional Independence maps of single-voxel Magnetic Resonance Spectra to elucidate associations between brain tumours and metabolites.

2020

The aim of the paper is two-fold. First, we show that structure finding with the PC algorithm can be inherently unstable and requires further operational constraints in order to consistently obtain models that are faithful to the data. We propose a methodology to stabilise the structure finding process, minimising both false positive and false negative error rates. This is demonstrated with synthetic data. Second, we apply the proposed structure finding methodology to a data set comprising single-voxel Magnetic Resonance Spectra of normal brain and three classes of brain tumours, to elucidate the associations between brain tumour types and a range of observed metabolites that are known to b…

False discovery rate; B Vitamins; Magnetic Resonance Spectroscopy; Computer science; Directed Acyclic Graphs; Biochemistry; 030218 nuclear medicine & medical imaging; 0302 clinical medicine; Metabolites; Medicine and Health Sciences; Amino Acids; QA; Neurological Tumors; 0303 health sciences; Multidisciplinary; Directed Graphs; Organic Compounds; Brain Neoplasms; Q; R; Total Cell Counting; Brain; Mutual information; Vitamins; Lipids; Chemistry; Conditional independence; Oncology; Neurology; Physical Sciences; Engineering and Technology; Medicine; Meningioma; Algorithm; Management Engineering; Algorithms; Research Article; Computer and Information Sciences; Science; Cell Enumeration Techniques; Glycine; Feature selection; Cholines; Research and Analysis Methods; Synthetic data; 03 medical and health sciences; Insurance; Robustness (computer science); Humans; Metabolomics; 030304 developmental biology; Risk Management; Organic Chemistry; Chemical Compounds; Bayesian network; Biology and Life Sciences; Cancers and Neoplasms; Proteins; Bayes Theorem; Directed acyclic graph; R1; Metabolism; Aliphatic Amino Acids; Graph Theory; Mathematics

PLoS ONE

An integrated framework for risk profiling of breast cancer patients following surgery.

2006

Objective: An integrated decision support framework is proposed for clinical oncologists making prognostic assessments of patients with operable breast cancer. The framework may be delivered over a web interface. It comprises a triangulation of prognostic modelling, visualisation of historical patient data and an explanatory facility to interpret risk group assignments using empirically derived Boolean rules expressed directly in clinical terms. Methods and materials: The prognostic inferences in the interface are validated in a multicentre longitudinal cohort study by modelling retrospective data from 917 patients recruited at Christie Hospital, Wilmslow between 1983 and 1989 and predictin…

Risk profiling; Adult; Decision support system; Medicine (miscellaneous); Breast Neoplasms; Machine learning; Models Biological; Risk Assessment; Decision Support Techniques; User-Computer Interface; Breast cancer; Risk groups; Artificial Intelligence; Confidence Intervals; Health Status Indicators; Humans; Medical physics; Survival analysis; Mastectomy; Retrospective Studies; Internet; Patient Selection; Reproducibility of Results; Patient data; Middle Aged; Decision Support Systems Clinical; Prognosis; Confidence interval; Treatment Outcome; Nottingham Prognostic Index; Female; Neural Networks Computer; Monte Carlo Method; Algorithms

Artificial intelligence in medicine

A principled approach to network-based classification and data representation

2013

Measures of similarity are fundamental in pattern recognition and data mining. Typically the Euclidean metric is used in this context, weighting all variables equally and therefore assuming equal relevance, which is very rare in real applications. In contrast, given an estimate of a conditional density function, the Fisher information calculated in primary data space implicitly measures the relevance of variables in a principled way by reference to auxiliary data such as class labels. This paper proposes a framework that uses a distance metric based on Fisher information to construct similarity networks that achieve a more informative and principled representation of data. The framework ena…
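
The claim that Fisher information weights variables by their relevance to the class labels can be seen in a small worked case. The sketch below assumes, purely for illustration, a binary logistic model p(c=1|x) = sigmoid(w·x + b) as the conditional density estimate; the abstract does not say which estimator the paper uses. For this model the Fisher information matrix in data space is F(x) = p(1 − p) w wᵀ, so the local distance for a small step dx is sqrt(p(1 − p)) |w·dx|. The function name and parameter values are hypothetical.

```python
import math


def fisher_local_distance(x, dx, w, b):
    """Local Fisher distance sqrt(dx^T F(x) dx) under a logistic model.

    Assumes p(c=1|x) = sigmoid(w.x + b), for which F(x) = p(1-p) w w^T.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    proj = sum(wi * di for wi, di in zip(w, dx))
    return math.sqrt(p * (1.0 - p)) * abs(proj)


# Hypothetical model in which only the first variable affects the class.
w, b = (2.0, 0.0), 0.0
origin = (0.0, 0.0)
step_relevant = fisher_local_distance(origin, (0.1, 0.0), w, b)
step_irrelevant = fisher_local_distance(origin, (0.0, 0.1), w, b)
```

Both steps have the same Euclidean length, yet the step along the variable that does not influence the class probability has zero Fisher distance, while the step along the relevant variable does not.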

Cognitive Neuroscience; Fisher kernel; Pattern recognition; Probability density function; Conditional probability distribution; External Data Representation; Computer Science Applications; Weighting; Euclidean distance; Data point; Artificial Intelligence; Data mining; Fisher information; Mathematics

Neurocomputing

A Novel Semi-Supervised Methodology for Extracting Tumor Type-Specific MRS Sources in Human Brain Data

2013

Background: The clinical investigation of human brain tumors often starts with a non-invasive imaging study, providing information about the tumor extent and location, but little insight into the biochemistry of the analyzed tissue. Magnetic Resonance Spectroscopy can complement imaging by supplying a metabolic fingerprint of the tissue. This study analyses single-voxel magnetic resonance spectra, which represent signal information in the frequency domain. Given that a single voxel may contain a heterogeneous mix of tissues, signal source identification is a relevant challenge for the problem of tumor type classification from the spectroscopic signal. Methodology/Princ…

Magnetic Resonance Spectroscopy; Statistics as Topic; Bioinformatics; Signal; Diagnostic Radiology; Engineering; Discriminative model; Basic Cancer Research; Mathematical Computing; Neurological Tumors; Complement (set theory); Physics; Multidisciplinary; Brain Neoplasms; Applied Mathematics; Q; R; Brain; Magnetic Resonance Imaging; Identification (information); Oncology; Frequency domain; Metric (mathematics); Medicine; Radiology; Algorithms; Research Article; Science; Lipid signaling; Glioblastoma multiforme; Matrix decomposition; RC0254; Cancer detection and diagnosis; Humans; Prototypes; Fingerprint (computing); Cancers and Neoplasms; Data acquisition; Pattern recognition; Computing Methods; R1; Computer Science; Signal Processing; RC0321; Artificial intelligence; Mathematics