Polar Classification of Nominal Data

Yaniv Shmueli, Amir Averbuch, Shachar Harussi, Guy Wolf

subject

Data set, Similarity (geometry), Computer science, Dimensionality reduction, Principal component analysis, Diffusion map, Cluster analysis, Measure (mathematics), Categorical variable, Algorithm

description

Many modern systems record various types of parameter values. Numerical values are relatively convenient for data analysis tools because there are many methods to measure distances and similarities between them, and applying dimensionality reduction techniques to data sets with such values is well-established practice. Nominal (i.e., categorical) values, on the other hand, pose problems for current methods. Most importantly, there is no meaningful distance between possible nominal values: any two values are either equal or unequal. Since many dimensionality reduction methods rely on preserving some form of similarity or distance measure, applying them to such data sets is not straightforward. We propose a method for clustering such data sets by applying the diffusion maps methodology to them. Our method is based on a distance metric that exploits the effect of the boolean nature of similarities between nominal values (i.e., equal or unequal) on the diffusion kernel and, in turn, on the embedded space obtained from its principal components. We use a multi-view approach, analyzing small sets of closely related parameters one at a time instead of the whole data set at once. In this way, we obtain a comprehensive understanding of the data set from many points of view.
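The abstract does not give the exact distance metric or kernel, so the following is only a minimal sketch of the general idea under stated assumptions. It is not the authors' method.

The sketch makes these assumptions:
- The distance between two records is the Hamming count of unequal attributes.
- The kernel is `exp(-distance / eps)`, which equals a Gaussian kernel on one-hot encodings and is therefore positive semidefinite.
- Diffusion-map coordinates come from power iteration with deflation rather than a full eigendecomposition, so the example needs only the standard library.
- A "view" is a caller-supplied group of column indices.

The names `hamming`, `diffusion_embedding`, `multi_view`, `eps`, `t` and `views` are all hypothetical.

```python
import math
import random


def hamming(a, b):
    """Count attributes on which two nominal records disagree (0/1 per attribute)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def _orthogonalize(v, basis):
    """Remove from v its components along each unit vector in basis."""
    for u in basis:
        c = _dot(v, u)
        v = [vi - c * ui for vi, ui in zip(v, u)]
    return v


def diffusion_embedding(records, eps=1.0, dim=1, t=1, iters=500, seed=0):
    """Diffusion-map coordinates for equal-length nominal records (sketch).

    Hypothetical kernel choice: k(x, y) = exp(-hamming(x, y) / eps).
    """
    n = len(records)
    K = [[math.exp(-hamming(records[i], records[j]) / eps) for j in range(n)]
         for i in range(n)]
    d = [sum(row) for row in K]
    # A = D^-1/2 K D^-1/2 is symmetric and shares its eigenvalues with P = D^-1 K.
    A = [[K[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    # The trivial top eigenvector of A (eigenvalue 1) is proportional to sqrt(d).
    total = math.sqrt(sum(d))
    basis = [[math.sqrt(di) / total for di in d]]
    rng = random.Random(seed)
    eigvals = []
    for _ in range(dim):
        v = _orthogonalize([rng.uniform(-1.0, 1.0) for _ in range(n)], basis)
        norm = math.sqrt(_dot(v, v))
        v = [x / norm for x in v]
        lam = 0.0
        for _ in range(iters):
            # Power iteration on A, deflated against eigenvectors already found.
            w = [_dot(row, v) for row in A]
            lam = _dot(v, w)  # Rayleigh quotient; v has unit length
            w = _orthogonalize(w, basis)
            norm = math.sqrt(_dot(w, w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        basis.append(v)
        eigvals.append(lam)
    # Right eigenvectors of P are D^-1/2 v; coordinate k is scaled by lambda_k^t.
    coords = [[eigvals[k] ** t * basis[k + 1][i] / math.sqrt(d[i])
               for k in range(dim)] for i in range(n)]
    return coords, eigvals


def multi_view(records, views, **kwargs):
    """Embed each small group of column indices (a "view") separately."""
    return {name: diffusion_embedding([[r[c] for c in cols] for r in records],
                                      **kwargs)[0]
            for name, cols in views.items()}


# Usage on toy data: two groups of records that share most attribute values.
records = [("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
           ("b", "z", "r"), ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r")]
coords, lams = diffusion_embedding(records, eps=0.5, dim=1)
# The first coordinate gives the first four records one sign and the last four
# the opposite sign (the overall sign of an eigenvector is arbitrary).
views = multi_view(records, {"shape": [0, 1], "flag": [2]}, eps=0.5)
```

Because equal/unequal is the only comparison available, each attribute contributes a factor of either 1 or `exp(-1/eps)` to the kernel. The single parameter `eps` therefore controls how sharply a mismatch reduces similarity.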

https://doi.org/10.1007/978-94-007-5288-7_14