Related Content For arXiv

This survey delves into challenges of federated machine learning beyond potential security issues that could affect adoption in industries like financial services. For example, the authors consider how asymmetric data and communications systems might make building networks between heterogeneous institutions difficult and increase the costs of uploading and downloading models or portions of models. These considerations may be especially important in underserved and emerging markets.
By Tian Li
This paper explores whether and to what degree different post hoc explainability tools provide consistent information about model behavior. It seeks to identify, in specific scenarios, the reasons these tools' outputs disagree and potential ways to resolve such disagreements. The evaluation includes empirical analysis and a survey of how users of these tools contend with inconsistent outputs. The authors conclude that when explainability tools produce inconsistent information about model behavior, there are no official or consistent methods for resolving these disagreements, and they call for the development of principled evaluation metrics to more reliably identify when such disagreements occur and what causes them.
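To make the idea of "disagreement" between explainability tools concrete, the sketch below compares two feature-attribution explanations of the same prediction by checking how many of their top-ranked features overlap. This is only an illustrative assumption about how such a comparison could be scored, not necessarily the metric the authors use; the function names, feature names, and attribution values are all hypothetical.

```python
def top_k_features(attributions, k):
    """Return the set of the k feature names with the largest absolute attribution."""
    ranked = sorted(attributions, key=lambda name: abs(attributions[name]), reverse=True)
    return set(ranked[:k])


def feature_agreement(attr_a, attr_b, k):
    """Fraction of the top-k features shared by two explanations (1.0 means identical sets)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    shared = top_k_features(attr_a, k) & top_k_features(attr_b, k)
    return len(shared) / k


# Hypothetical attributions from two different explainability tools
# for the same model prediction (values are made up for illustration).
explanation_a = {"income": 0.42, "age": -0.31, "balance": 0.05, "tenure": 0.01}
explanation_b = {"income": 0.10, "age": 0.02, "balance": -0.38, "tenure": 0.27}

for k in (2, 3):
    score = feature_agreement(explanation_a, explanation_b, k)
    print(f"top-{k} feature agreement: {score:.2f}")
```

A score near 1 would suggest the two tools highlight the same features, while a score near 0 would flag the kind of inconsistency the paper describes. The score says nothing about which explanation, if either, is correct.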
This paper addresses the importance of situating explainable AI approaches within human social interactions to improve model transparency. The paper focuses on the concept of “social transparency,” which incorporates the context of those social interactions into explanations of AI systems. Drawing on interviews with AI users and practitioners, the paper offers a conceptual framework for identifying and measuring social transparency in order to improve AI decision making, increase trust in AI, and nurture broader values of AI explainability.
This paper explores the problem of “underspecification,” a statistical phenomenon that occurs when an observed issue may have several possible causes, not all of which are accounted for in the model. The team of authors from Google examined case studies in computer vision, medical imaging, natural language processing, and medical genomics, and found that underspecification led to variation in model performance across a variety of ML pipelines. As a result, training processes that can produce sound models often produce poor ones instead, and the difference between the two will not be apparent until the model is in use and has to generalize to non-training data. Based on these findings, the authors call for greater rigor in specifying model requirements and for stress testing models before they are approved for use.