An Experimental Investigation into the Evaluation of Explainability Methods

Sédrick Stassin, Alexandre Englebert, Gilles Peiffer, Geraldin Nanfack, Julien Albert, Nassim Versbraegen, Miriam Doh, Nicolas Riche, Benoît Frénay, Christophe De Vleeschouwer

March 2022

Abstract

EXplainable Artificial Intelligence (XAI) aims to help users grasp the reasoning behind the predictions of an Artificial Intelligence (AI) system. Many XAI approaches have emerged in recent years. Consequently, a subfield concerned with evaluating XAI methods has gained considerable attention, with the aim of determining which methods provide the best explanations according to various approaches and criteria. However, the literature lacks a comparison of the evaluation metrics themselves, that is, of the metrics one can use to evaluate XAI methods. This work aims to fill this gap by comparing 14 different metrics applied to 8 state-of-the-art XAI methods and 3 dummy methods (e.g., random saliency maps) used as reference baselines. We show which of these metrics produce concordant results, indicating redundancy, and which ones differ. We also demonstrate that specific hyperparameters have an important impact on the values of the evaluation metrics. Finally, we use the dummy (i.e., naive) methods to assess the reliability of the metrics in terms of ranking. We uncover four redundant metrics and show that varying a specific hyperparameter strongly hinders the coherence of the evaluation metrics. The main takeaway of our work is that metrics must be used carefully, with awareness of their potential limitations, when evaluating explainability methods.
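The abstract does not state how concordance between metrics is measured. One common choice is a rank correlation such as Kendall's tau over the ranking each metric induces on the XAI methods. The sketch below assumes that each metric yields one scalar score per method and that higher is better; the functions `kendall_tau` and `dummy_outranks`, the method names `A`, `B`, `C`, `random`, and all scores are hypothetical toy values for illustration, not results or code from this paper. It also shows a simple reliability check in the spirit of the dummy baselines: flagging real methods that a naive method scores at least as well as.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two metrics' scores over the same methods.

    Tau-a does not correct for ties: a tied pair counts as neither
    concordant nor discordant. A value near 1 means the two metrics rank
    the methods almost identically, which suggests redundancy.
    """
    methods = sorted(scores_a)
    concordant = discordant = 0
    for m1, m2 in combinations(methods, 2):
        # Same sign of the score differences means both metrics order the pair alike.
        prod = (scores_a[m1] - scores_a[m2]) * (scores_b[m1] - scores_b[m2])
        if prod > 0:
            concordant += 1
        elif prod < 0:
            discordant += 1
    n_pairs = len(methods) * (len(methods) - 1) // 2
    return (concordant - discordant) / n_pairs


def dummy_outranks(scores, dummies, higher_is_better=True):
    """Return the real methods that some dummy method scores at least as well as."""
    sign = 1 if higher_is_better else -1
    best_dummy = max(sign * scores[d] for d in dummies)
    return sorted(m for m in scores if m not in dummies and sign * scores[m] <= best_dummy)


# Hypothetical scores of three metrics for three real methods and one random baseline.
metric_1 = {"A": 0.9, "B": 0.7, "C": 0.5, "random": 0.1}
metric_2 = {"A": 0.8, "B": 0.6, "C": 0.4, "random": 0.2}
metric_3 = {"A": 0.2, "B": 0.5, "C": 0.6, "random": 0.9}

print(kendall_tau(metric_1, metric_2))      # 1.0: identical rankings
print(kendall_tau(metric_1, metric_3))      # -1.0: reversed rankings
print(dummy_outranks(metric_3, {"random"}))  # ['A', 'B', 'C']
```

In this toy setting, `metric_1` and `metric_2` would be flagged as redundant, while `metric_3` ranks the random baseline above every real method, which would cast doubt on its reliability. With real data, ties and the direction of each metric (some are "lower is better") must be handled explicitly, which is why `higher_is_better` is exposed as a parameter.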

Type

Conference paper

Publication

AIMLAI workshop at ECML-PKDD
