Consolidating Explanation Stability Metrics

Bogaert, Jeremie; Descampe, Antonin; Standaert, François-Xavier

doi:10.1007/978-3-032-08327-2_15

(2025) Explainable Artificial Intelligence - Third World Conference, xAI 2025. Location: Istanbul, Turkey (9 July 2025)

Details

Authors

Bogaert, Jeremie (Author)
Descampe, Antonin (Author)
Standaert, François-Xavier (Author, UCLouvain)

Abstract

The explanations of large language models (e.g., where each word is assigned a relevance score) have recently been shown to be sensitive to the randomness used during model training, creating a need to evaluate this sensitivity. While simple visualization tools such as box plots can provide a qualitative characterization, exploring the design space of the parameters that influence the explanation’s sensitivity to the training randomness may benefit from a more quantitative approach. First attempts in this direction explored simple (word-level univariate, first-order) explanations and proposed tentative information-theoretic metrics such as the explanation’s signal, noise and Signal-to-Noise Ratio (SNR). They left the suitability of such metrics as an open question, which we tackle in this work. For this purpose, we start by identifying corner cases where these metrics appear unable to capture intuitively desirable features of explanations corresponding to different training randomness. Namely, the SNR does not adequately reflect the relative differences of relevance (between words). We next put forward that the correlation with a mean explanation provides a better treatment of these corner cases, at the cost of being unable to reflect absolute differences of relevance (for single words). We then discuss how to turn these observations into a consolidated approach for analyzing the explanations’ sensitivity to the training randomness. While there is no silver bullet that perfectly deals with the full complexity of this sensitivity problem, we argue that design space exploration with the correlation metric and individual model analysis with box plots provide a good tradeoff. Besides, we put forward additional desirable features of the correlation metric (e.g., unbiased estimation thanks to cross-validation and simple confidence intervals).
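
The contrast between the two metrics can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the signal is taken as the variance across words of the mean relevance, the noise as the average per-word variance across models, and the cross-validation as a leave-one-out mean explanation. The function names (`pearson`, `snr`, `loo_correlation`) and the toy scores are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import fmean, pvariance, variance


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def snr(explanations):
    """Assumed SNR: variance of per-word means over mean per-word variance.

    explanations: one list per trained model, one relevance score per word.
    """
    per_word = list(zip(*explanations))  # regroup scores word by word
    signal = pvariance([fmean(w) for w in per_word])
    noise = fmean(variance(w) for w in per_word)
    return signal / noise


def loo_correlation(explanations):
    """Average correlation of each explanation with the mean of the others.

    Leaving the held-out model out of the mean is the (assumed) form of
    cross-validation used here to avoid correlating a model with itself.
    """
    scores = []
    for i, held_out in enumerate(explanations):
        others = [e for j, e in enumerate(explanations) if j != i]
        mean_others = [fmean(w) for w in zip(*others)]
        scores.append(pearson(held_out, mean_others))
    return fmean(scores)


# Toy data: three "models" whose explanations rank the words identically
# but differ in scale and offset (positive affine transforms of `base`).
base = [0.1, 0.5, 0.2, 0.9]
models = [base, [2 * v + 1 for v in base], [0.5 * v for v in base]]

print("correlation:", loo_correlation(models))
print("SNR:", snr(models))
```

In this toy case the correlation stays at its maximum (up to floating-point error) because the relative ordering and spacing of the words is preserved, while the absolute per-word differences between models only show up in the noise term of the SNR. This mirrors the tradeoff described in the abstract, under the stated assumptions.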

Affiliations

UCLouvain - SST/ICTM/ELEN - Pôle en ingénierie électrique

Citations

Bogaert, J., Descampe, A., & Standaert, F.-X. (2025). Consolidating Explanation Stability Metrics. Explainable Artificial Intelligence - Third World Conference, xAI 2025, pp. 310-323. https://doi.org/10.1007/978-3-032-08327-2_15