RESEARCH PRODUCT

Robustness, Stability, and Fidelity of Explanations for a Deep Skin Cancer Classification Model

Mirka Saarela, Lilia Georgieva

subject

explainable artificial intelligence; interpretable machine learning; skin cancer; convolutional neural network; deep learning; integrated gradients; local model-agnostic explanations; machine learning; neural networks; decision support systems; diagnostics; Computer Science Applications; Fluid Flow and Transfer Processes; Process Chemistry and Technology; General Engineering; General Materials Science; Instrumentation

description

Skin cancer is one of the most prevalent of all cancers. Because it is widespread and externally observable, machine learning models integrated into artificial intelligence systems could allow self-screening and automatic analysis in the future. In particular, the recent success of various deep learning models suggests that patients could one day self-analyse their external signs of skin cancer by uploading pictures of these signs to an artificial intelligence system that runs such a deep learning model and returns the classification results. However, both patients and dermatologists, who might use such a system to aid their work, need to know why the system has made a particular decision. Recently, several techniques for explaining the decision-making process of deep learning algorithms have been introduced. This study compares two popular local explanation techniques for image data, integrated gradients and local model-agnostic explanations, applied to a well-performing (80% accuracy) deep learning algorithm trained on the HAM10000 dataset, a large public collection of dermatoscopic images. Our results show that both methods have full local fidelity. However, the integrated gradients explanations perform better on the quantitative evaluation metrics (stability and robustness), while the model-agnostic method seems to provide more intuitive explanations. We conclude that there is still a long way to go before such automatic systems can be used reliably in practice.
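Integrated gradients, one of the two techniques compared above, attributes a prediction to the input features. It does this by integrating the model's gradient along a straight path from a baseline input x′ to the actual input x: IG_i(x) = (x_i − x′_i) · ∫₀¹ ∂f(x′ + α(x − x′))/∂x_i dα. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation:

- A hypothetical three-feature logistic model stands in for the CNN. For an image, each pixel channel would be one feature.
- The all-zero baseline and the 200-step midpoint Riemann sum are assumed choices.
- Gradients are computed analytically here instead of by automatic differentiation.

```python
import math


def model(x, w, b):
    """Hypothetical toy classifier standing in for the CNN: a logistic score in (0, 1)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x, w, b):
    """Analytic gradient of the logistic score with respect to the input x."""
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum.

    The path runs in a straight line from `baseline` to `x`. The number of
    steps and the choice of baseline are assumptions of this sketch.
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        g = model_grad(point, w, b)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    # Scale the path-averaged gradient by the input's offset from the baseline.
    return [(xi - bi) * gi for xi, bi, gi in zip(x, baseline, avg_grad)]


if __name__ == "__main__":
    w, b = [2.0, -1.0, 0.5], -0.2  # hypothetical model parameters
    x, baseline = [1.0, 0.5, 2.0], [0.0, 0.0, 0.0]
    attributions = integrated_gradients(x, baseline, w, b)
    print("attributions:", attributions)
    print("sum of attributions:", sum(attributions))
    print("f(x) - f(baseline):", model(x, w, b) - model(baseline, w, b))
```

The attributions should sum approximately to f(x) − f(x′), which is the completeness property of integrated gradients. Comparing the two printed values is a quick sanity check that the number of integration steps is large enough.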

https://doi.org/10.3390/app12199545