The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective

This paper examines whether, and to what degree, different post hoc explainability tools provide consistent information about a model's behavior. It identifies specific scenarios in which these tools disagree, the reasons that drive such disagreement, and potential ways to resolve it. The evaluation combines empirical analysis with a survey of how practitioners contend with inconsistent explanations in their day-to-day work. The authors conclude that when explainability tools produce conflicting accounts of model behavior, practitioners have no principled or consistent method for resolving the conflict, and they call for the development of principled evaluation metrics to more reliably identify when such disagreements occur and why.
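To make the disagreement concrete, here is a minimal sketch (not from the paper) that compares LIME and SHAP attributions for a single prediction and computes a simple top-k feature agreement score, in the spirit of the paper's agreement metrics. The toy dataset, model, and the `top_k` helper are illustrative assumptions; the sketch assumes scikit-learn, lime, and shap are installed.

```python
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy setup: a small tabular classifier.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]
model = RandomForestClassifier(random_state=0).fit(X, y)
x = X[0]  # the instance to explain

# LIME attributions for one instance (weights for the positive class).
lime_exp = LimeTabularExplainer(
    X, feature_names=feature_names, mode="classification"
).explain_instance(x, model.predict_proba, num_features=X.shape[1])
lime_scores = np.zeros(X.shape[1])
for i, w in lime_exp.as_map()[1]:  # [(feature index, weight), ...]
    lime_scores[i] = w

# SHAP attributions for the same instance. Older shap versions return a
# list of per-class arrays; newer ones return (n_samples, n_features,
# n_classes), so handle both.
sv = shap.TreeExplainer(model).shap_values(x.reshape(1, -1))
if isinstance(sv, list):
    shap_scores = np.ravel(sv[1])
else:
    shap_scores = np.ravel(sv[0, :, 1]) if sv.ndim == 3 else np.ravel(sv)

def top_k(scores, k):
    """Indices of the k features with the largest |attribution|."""
    return set(np.argsort(-np.abs(scores))[:k].tolist())

# Feature agreement: fraction of top-k features the two explanations share.
# A value well below 1.0 is the disagreement problem in miniature.
k = 3
agreement = len(top_k(lime_scores, k) & top_k(shap_scores, k)) / k
print(f"Top-{k} feature agreement between LIME and SHAP: {agreement:.2f}")
```

Because the two methods approximate model behavior differently (local surrogate fitting versus Shapley value estimation), scores like this often fall short of full agreement, which is precisely the situation the paper's survey asks practitioners about.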

Satyapriya Krishna, Tessa Han, Alex Gu, Javin Pombra, Shahin Jabbari, Zhiwei Steven Wu, and Himabindu Lakkaraju, arXiv
February 2022
