Summary of Disparate Impact Usability Results

Abstract

Stakeholders in consumer lending are debating whether lenders can responsibly use machine learning models in compliance with a range of pre-existing legal and regulatory requirements, including those that relate to non-discrimination and fairness.

We focus on how certain proprietary and open-source model diagnostic tools affect lenders’ ability to manage fairness concerns related to obligations to identify less discriminatory alternatives for models used to extend credit.

We evaluate these tools on a “usability” criterion that assesses whether and how well these tools enable lenders to construct alternative models that are less discriminatory.

We find that dropping features identified as drivers of disparities does not lead to less discriminatory alternative models, and often leads to substantial performance deterioration. In contrast, more automated tools that search for a range of less discriminatory alternative models can successfully improve fairness metrics.

The findings presented here are extracted from a larger study that evaluates certain model diagnostic tools in the context of additional regulatory requirements (FinRegLab et al., 2022).

Setup

FinRegLab, a non-profit research organization, collaborated with Stanford Professors Laura Blattner and Jann Spiess to evaluate the capabilities, limitations, and performance of model diagnostic tools from seven technology companies, as well as select open-source tools.

The full study assesses these tools in the context of other aspects of fairness and other kinds of regulatory requirements.

Approach

This poster focuses on a single aspect of the fairness evaluation: whether the tools enable lenders to construct less discriminatory alternative models, and at what cost to predictive performance.

Methodology

This graphic shows a rough layout of our process.

We focused on Logistic Regression and XGBoost models since we received the most complete responses for these models.

After training models, participants provided recommendations on how to create less discriminatory alternative models through various methods of their choosing.
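The simplest of those methods to illustrate is feature dropping: retrain the model with a feature flagged as a disparity driver removed, then compare the result against the baseline. The sketch below assumes hypothetical column names ("default", "debt_ratio") and generic XGBoost settings rather than any participant's actual configuration.

# Hypothetical feature-drop baseline: refit an XGBoost model with a
# flagged feature removed. "default" and "debt_ratio" are placeholder
# names, not columns from the study data.
import pandas as pd
from xgboost import XGBClassifier

def fit_without(train: pd.DataFrame, label_col: str, dropped: set) -> tuple:
    """Fit a classifier on every feature except those in `dropped`."""
    features = [c for c in train.columns
                if c != label_col and c not in dropped]
    model = XGBClassifier(n_estimators=200, max_depth=4,
                          eval_metric="logloss")
    model.fit(train[features], train[label_col])
    return model, features

# baseline, base_feats = fit_without(train_df, "default", dropped=set())
# alt, alt_feats = fit_without(train_df, "default", dropped={"debt_ratio"})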

We evaluated whether the less discriminatory alternative models proposed by each participating company reduce adverse impact when test data are run through them, and at what cost to predictive performance.
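As a rough illustration of this evaluation step, the sketch below computes both metrics used in the comparison: the Adverse Impact Ratio (the protected group's approval rate divided by the control group's) and AUC. The 0.2 approval cutoff, the group labels, and the variable names are assumptions made for the example, not values from the study.

# Hypothetical evaluation of one candidate model on held-out test data.
import numpy as np
from sklearn.metrics import roc_auc_score

def adverse_impact_ratio(approved, group, protected, control):
    """Approval rate of the protected group divided by that of the
    control group; values closer to 1 indicate more even approval rates."""
    return (approved[group == protected].mean()
            / approved[group == control].mean())

# p_default = alt.predict_proba(X_test[alt_feats])[:, 1]  # predicted default risk
# approved = p_default < 0.2                              # approve low-risk applicants
# air = adverse_impact_ratio(approved, group_test, "protected", "control")
# auc = roc_auc_score(y_test, p_default)                  # predictive performance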

Search for Less Discriminatory Models

Less Discriminatory Alternative Model Metrics:

The point at the top of the graph is the baseline, and the two company-modified baselines are colored blue. The listed methods are the companies' processes for identifying less discriminatory alternative models: three used automated tools (the solid line, the dotted line, and the X symbols) and three used versions of feature dropping (the circle, the square, and the diamond). The y-axis plots predictive performance, where a higher AUC indicates better performance; the x-axis plots the Adverse Impact Ratio (AIR), where an AIR closer to 1 represents a fairer model. The best less discriminatory alternative models would sit in the upper right corner of the plot, combining high predictive performance with high fairness.
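To make the plot's layout concrete, here is a minimal sketch that draws the same axes; the coordinates are placeholders for illustration only, not results from the study.

# Hypothetical AUC-vs-AIR trade-off plot; coordinates are placeholders.
import matplotlib.pyplot as plt

candidates = {                       # name -> (AIR, AUC), illustrative only
    "baseline":         (0.90, 0.75),
    "feature drop":     (0.92, 0.70),
    "automated search": (0.97, 0.74),
}

fig, ax = plt.subplots()
for name, (air, auc) in candidates.items():
    ax.scatter(air, auc, label=name)
ax.set_xlabel("Adverse Impact Ratio (closer to 1 is fairer)")
ax.set_ylabel("AUC (higher is better)")
ax.legend()
plt.show()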

Conclusions

Automated approaches outperform feature-drop and reweighing strategies (a generic reweighing sketch follows these conclusions).

Our results generalized well to different populations and different model types.

No single automated approach is always best; results varied depending on model type and fairness metric used.

More fairness is possible, but may come at some performance cost.
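Of the strategies named above, reweighing is simple enough to sketch. In the style of Kamiran and Calders' reweighing technique, each (group, label) cell of the training data receives a weight that makes group membership and the outcome look statistically independent, and the weights are passed to the learner. This is a generic sketch of the technique with illustrative variable names, not any participant's implementation.

# Generic reweighing sketch (after Kamiran & Calders): weight each
# (group, label) cell by expected mass under independence / observed mass.
import numpy as np

def reweighing_weights(group: np.ndarray, y: np.ndarray) -> np.ndarray:
    w = np.ones(len(y))
    for g in np.unique(group):
        for label in np.unique(y):
            cell = (group == g) & (y == label)
            if cell.any():
                w[cell] = ((group == g).mean() * (y == label).mean()
                           / cell.mean())
    return w

# model.fit(X_train[feats], y_train,
#           sample_weight=reweighing_weights(group_train, y_train))

A model retrained with these sample weights is then evaluated on the same AIR and AUC metrics as any other candidate.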

Related Publications

Machine Learning Explainability & Fairness: Insights from Consumer Lending (April 2022)


The Use of Machine Learning for Credit Underwriting: Market & Data Science Context (Sept. 2021)


Explainability and Fairness of Machine Learning in Credit Underwriting: Empirical Research

FinRegLab is working with a team of researchers from the Stanford Graduate School of Business to evaluate the explainability and fairness of machine learning for credit underwriting. Our focus is on measuring the ability of currently available model diagnostic tools to provide information about the performance and capabilities of machine learning underwriting models. This research will help stakeholders assess how machine learning models can be developed and used in compliance with regulatory expectations regarding model risk management, anti-discrimination, and adverse action reporting.
