
Summary of Disparate Impact Usability Results


Abstract

Stakeholders in consumer lending are debating whether lenders can responsibly use machine learning models in compliance with a range of pre-existing legal and regulatory requirements, including those that relate to non-discrimination and fairness.

We focus on how certain proprietary and open-source model diagnostic tools affect lenders’ ability to manage fairness concerns related to obligations to identify less discriminatory alternatives for models used to extend credit.

We evaluate these tools on a “usability” criterion that assesses whether and how well these tools enable lenders to construct alternative models that are less discriminatory.

We find that dropping features identified as drivers of disparities does not lead to less discriminatory alternative models, and often leads to substantial performance deterioration. In contrast, more automated tools that search for a range of less discriminatory alternative models can successfully improve fairness metrics.

The findings presented here are extracted from a larger study that evaluates certain model diagnostic tools in the context of additional regulatory requirements (FinRegLab et al., 2022).

Set Up

FinRegLab, a nonprofit research organization, collaborated with Stanford professors Laura Blattner and Jann Spiess to evaluate the capabilities, limitations, and performance of model diagnostic tools from seven technology companies and select open-source tools. The full study assesses these tools in the context of other aspects of fairness and other kinds of regulatory requirements. Participating companies include:

Fiddler AI

Fiddler AI provides artificial intelligence platforms that are designed to help enterprises create and offer AI and machine learning services that are transparent, explainable, and understandable. The company’s platform offers business and statistical metrics, model performance monitoring, and model explanations, which enables businesses to analyze, manage, and deploy machine learning models at scale.

H2O.ai

H2O provides Driverless AI, an enterprise AutoML platform that enables customers in industries ranging from financial services to healthcare to rapidly prototype and deploy competition-grade ML models built with a focus on explainability. H2O.ai has a long history of researching, identifying, and developing leading methods to make algorithms more transparent and explainable, and of building frontier technology that helps data scientists understand and trust their AI.

RelationalAI

RelationalAI’s cloud-based relational knowledge graph management system is designed for speed and seamless integration with information already in enterprise databases. Its technology accelerates deployments at scale and unlocks the commercial potential of a wider array of artificial intelligence algorithms, including counterfactual reasoning. Its knowledge-centric approach connects business intelligence with prediction systems. RelationalAI serves customers across sectors, from financial services to retail and telecom.

SolasAI

BLDS provides consulting services relating to the application of statistics and economics to questions of law and regulation. The firm has extensive experience in deploying statistical methods in the context of credit and marketing decisions that comply with anti-discrimination laws. The firm has also helped clients in a variety of other industries, including insurance and healthcare services. In recent years, BLDS has focused on the development and implementation of techniques that provide a clearer understanding of AI decision-making and evaluate the fairness of such models. BLDS established an algorithmic fairness software company and corresponding product, SolasAI, used by many of its clients to find fairer and highly predictive models.

Stratyfy

Stratyfy is an ethical artificial intelligence (AI) company that offers predictive analytics and decision optimization software for credit and risk teams, helping lenders provide more people with access to fair and transparent credit. Stratyfy’s unique solutions provide the level of understanding and control that regulated institutions require to proactively identify and mitigate bias and make better credit decisions. With Stratyfy’s solutions, users can seamlessly combine the precision of data and the wisdom of domain expertise, optimizing risk-based decisions without introducing regulatory or operational risk.

Zest AI

Zest AI machine learning software and services help lenders make more accurate and fairer credit underwriting decisions. Zest’s Model Management System allows credit teams to build, analyze, adopt, and operate ML decisioning models using hundreds or thousands of FCRA-compliant data points with speed and transparency. Since 2009, Zest has provided credit scores for hundreds of millions of prospective borrowers worldwide, including those with little to no credit history.

Approach

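This poster focuses on a single aspect of the fairness evaluation: how well the tools enable lenders to identify and construct less discriminatory alternative models.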

Methodology

We focused on logistic regression and XGBoost models because we received the most complete responses for these model types.

After training models, participants provided recommendations on how to create less discriminatory alternative models through various methods of their choosing.

We evaluated whether the less discriminatory alternative models proposed by each participating company reduce adverse impact when test data are run through the models – and at what cost to predictive performance.
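The evaluation above compares each alternative model with the baseline on two quantities: adverse impact, summarized by the Adverse Impact Ratio (AIR), and predictive performance, summarized by AUC. The following is a minimal sketch, not the study's code, assuming that scores are predicted repayment probabilities, that an applicant is approved when the score meets a hypothetical cutoff of 0.5, and that AIR is the protected group's approval rate divided by the reference group's approval rate (a common definition; the study's exact conventions may differ). AUC uses the rank-based (Mann-Whitney) formula. All function names and the toy data are illustrative.

```python
from typing import Sequence


def approval_rate(scores: Sequence[float], cutoff: float) -> float:
    """Share of applicants approved when a score at or above `cutoff` means approve."""
    return sum(1 for s in scores if s >= cutoff) / len(scores)


def adverse_impact_ratio(scores, groups, protected, reference, cutoff=0.5):
    """Approval rate of the protected group divided by that of the reference group.

    A value near 1 means both groups are approved at similar rates; a value
    well below 1 means the protected group is approved less often.
    """
    prot = [s for s, g in zip(scores, groups) if g == protected]
    ref = [s for s, g in zip(scores, groups) if g == reference]
    return approval_rate(prot, cutoff) / approval_rate(ref, cutoff)


def auc(scores, labels):
    """Rank-based AUC: chance a random repaid loan (label 1) outscores a
    random defaulted loan (label 0), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def compare_to_baseline(baseline, candidate, labels, groups, protected, reference, cutoff=0.5):
    """Change in AIR and in AUC when the candidate model replaces the baseline."""
    d_air = (adverse_impact_ratio(candidate, groups, protected, reference, cutoff)
             - adverse_impact_ratio(baseline, groups, protected, reference, cutoff))
    d_auc = auc(candidate, labels) - auc(baseline, labels)
    return d_air, d_auc


# Invented toy test set: group "B" is the protected group, "A" the reference group.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
labels = [1, 1, 0, 0, 1, 1, 0, 0]  # 1 = repaid, 0 = defaulted
baseline = [0.90, 0.80, 0.60, 0.30, 0.70, 0.40, 0.45, 0.20]
candidate = [0.90, 0.80, 0.60, 0.30, 0.70, 0.40, 0.75, 0.20]

d_air, d_auc = compare_to_baseline(baseline, candidate, labels, groups, "B", "A")
print(round(d_air, 3), round(d_auc, 4))  # prints: 0.333 -0.0625
```

In this invented example the candidate raises AIR from about 0.33 to about 0.67 but lowers AUC from 0.875 to 0.8125, the kind of fairness and performance trade-off the evaluation is designed to measure.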

Search for Less Discriminatory Models

Figure: Less Discriminatory Alternative Model Metrics

The point at the top of the graph is the baseline model, and the two company-modified baselines are colored blue. Each listed method is a company’s process for identifying less discriminatory alternative models. Three methods used automated tools (the solid line, the dotted line, and the X symbols), and three used versions of feature dropping (the circle, the square, and the diamond). The y-axis shows predictive performance, measured by AUC, where higher values indicate better performance. The x-axis shows the Adverse Impact Ratio (AIR), where a value closer to 1 indicates a fairer model. The best less discriminatory alternative models would therefore sit in the upper right corner of the plot, combining high predictive performance with high fairness.
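Looking for models in the upper right corner amounts to keeping only the candidates that no other candidate beats on both axes at once (the Pareto frontier). A minimal sketch, assuming each candidate is summarized by a hypothetical `Candidate(name, auc, air)` record and that fairness is scored as the distance |1 - AIR|, where smaller is better, consistent with the convention that an AIR closer to 1 is fairer. The candidate names and numbers are invented for illustration and do not reproduce the study's results.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    auc: float  # predictive performance: higher is better
    air: float  # Adverse Impact Ratio: closer to 1 is fairer


def disparity(c: Candidate) -> float:
    """Distance of AIR from parity (1.0); smaller means fairer."""
    return abs(1.0 - c.air)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on at least one."""
    no_worse = a.auc >= b.auc and disparity(a) <= disparity(b)
    better = a.auc > b.auc or disparity(a) < disparity(b)
    return no_worse and better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep candidates that are not dominated by any other candidate."""
    return [c for c in cands if not any(dominates(o, c) for o in cands if o is not c)]


# Invented candidates, for illustration only.
candidates = [
    Candidate("baseline", 0.80, 0.70),
    Candidate("alt_a", 0.74, 0.71),
    Candidate("alt_b", 0.79, 0.80),
    Candidate("alt_c", 0.77, 0.88),
]
for c in pareto_front(candidates):
    print(c.name)  # prints baseline, alt_b, alt_c (one per line)
```

Here `alt_a` drops out because `alt_b` has both higher AUC and an AIR closer to 1. Choosing among the remaining models then depends on how much predictive performance a lender is willing to give up for additional fairness.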

Conclusions

Automated approaches outperform feature-dropping and reweighing strategies.

Our results generalized well to different populations and different model types.

No single automated approach is always best; results varied depending on model type and fairness metric used.

Greater fairness is possible but may come at some cost to predictive performance.

Related Publications

  • Explainability and Fairness in Machine Learning for Credit Underwriting

    FinRegLab worked with a team of researchers from the Stanford Graduate School of Business to evaluate the explainability and fairness of machine learning for credit underwriting. We focused on measuring the ability of currently available model diagnostic tools to provide information about the performance and capabilities of machine learning underwriting models. This research helps stakeholders…


About FinRegLab

FinRegLab is an independent, nonprofit organization that conducts research and experiments with new technologies and data to drive the financial sector toward a responsible and inclusive marketplace. The organization also facilitates discourse across the financial ecosystem to inform public policy and market practices. To receive periodic updates on the latest research, subscribe to FRL’s newsletter and visit www.finreglab.org. Follow FinRegLab on LinkedIn and Twitter (X).

FinRegLab.org | 1701 K Street Northwest, Suite 1150, Washington, DC 20006