Files

coli.a.624.pdf
  • Open Access
  • Adobe PDF
  • 7.44 MB

Details

Authors
Abstract
This paper examines the variability of human and large language model (LLM) token-level explanations in the classification of French journalistic texts as news or opinion articles. To do so, we compare human annotations (highlighted tokens indicating the words readers perceived as decision-relevant) with Layer-wise Relevance Propagation (LRP) explanations from fine-tuned transformer models. Using ten texts annotated by multiple readers and classified by equivalent instantiations of an LLM, we analyze qualitative patterns and quantitative trends with a similarity score. Results show that humans highlight fewer, longer spans focused on linguistically salient cues, while LRP produces more diffuse, token-level attributions. Humans also tend to agree more on the most important tokens, whereas models align better when all tokens are considered, reflecting divergent sensitivities to granularity. Prediction class matters as well: humans are more consistent on opinion texts, while models show greater stability on news. To refine our variability analysis, we apply discretization schemes that align LRP values with categorical human judgments. Both linear and human-aligned discretization increase similarity with human explanations and thus improve visual plausibility in aggregated attention maps without altering model predictions. These findings suggest that model explanations are not systematically more variable than human ones but follow different dynamics depending on representation and scope. They also highlight the Rashomon effect in LLM explainability, showing that agreement on outputs does not imply convergence in reasoning. Our work demonstrates how methodological choices shape explanation variability and offers practical insights for bridging faithfulness and plausibility in explainable NLP.
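As a rough illustration of the discretization and similarity steps mentioned in the abstract, the sketch below bins continuous relevance values into a few categorical levels and compares the result with a human highlight vector. Everything in it is an assumption made for illustration: the abstract does not specify the binning rule, the number of levels, the handling of negative relevance, or the similarity score, so the function names `linear_discretize` and `cosine_similarity`, the equal-width bins, the clipping of negative values, and the choice of cosine similarity are hypothetical and should not be read as the authors' method. The numbers are made up.

```python
import math


def linear_discretize(scores, n_levels=3):
    """Map continuous relevance scores to integer levels 0..n_levels-1.

    Assumption: equal-width bins over [0, max score]; negative relevance
    is clipped to 0 (treated as "not highlighted").
    """
    clipped = [max(s, 0.0) for s in scores]
    top = max(clipped, default=0.0)
    if top == 0.0:
        # No positive relevance at all: every token gets the lowest level.
        return [0] * len(clipped)
    # The maximum score would land in bin n_levels, so cap it at n_levels - 1.
    return [min(int(s / top * n_levels), n_levels - 1) for s in clipped]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length token-level vectors."""
    if len(a) != len(b):
        raise ValueError("explanations must cover the same tokens")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen here: an empty explanation matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical per-token values for a five-token sentence.
lrp_scores = [0.05, 0.9, -0.2, 0.45, 0.0]   # continuous model relevance
human_levels = [0, 2, 0, 2, 0]              # categorical human highlights

model_levels = linear_discretize(lrp_scores, n_levels=3)
similarity = cosine_similarity(model_levels, human_levels)
print(model_levels, round(similarity, 3))
```

Discretizing before comparison puts the model's attributions on the same categorical scale as the human highlights, which is the motivation the abstract gives for this step. A different binning rule or similarity measure would change the numbers but not the overall workflow.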
Affiliations

Citations

Bogaert, J., Escouflaire, L., Descampe, A., Fairon, C., de Marneffe, M.-C., & Standaert, F.-X. (2026). Explanation Variability in Text Classification: Humans vs. LLMs. Computational Linguistics, 1-31. https://doi.org/10.1162/coli.a.624 (Original work published 2026)