Fact Sheet: FinRegLab to Undertake Analysis of Machine Learning Underwriting Models with and without Cash-Flow Data

FinRegLab is launching a ground-breaking comparison of the financial inclusion benefits of using machine learning underwriting models with and without cash-flow data to increase responsible access to credit for consumers who may otherwise find it difficult to obtain safe and affordable loans.

Credit is critical in helping households bridge short-term financial gaps and make long-term investments in homes, reliable transportation, and small business formation. Yet nearly 20 percent of U.S. adults lack sufficient traditional credit history to generate scores using the most widely used scoring models, and an even larger number of consumers may struggle to access affordable credit because they have low credit scores. Limitations in traditional information and modeling affect some groups disproportionately. For instance, almost 30 percent of Black and Hispanic consumers cannot be scored by the most widely used models as compared to only 16 percent of White and Asian consumers.¹

In light of these limitations, large banks and fintechs are increasingly working to improve their underwriting models by using machine learning techniques, alternative data sources such as bank transaction account records, or a combination of the two innovations. However, no publicly available research directly compares each innovation’s separate and combined effects on predictiveness, inclusion, and fairness.

While stakeholders are interested in the potential of new data sources and analytical techniques to increase credit access, these more complex analytics also raise concerns about reliability, bias, and the ability to understand and manage more sophisticated models.

The new project could help to inform the entire lending ecosystem and be particularly helpful to smaller institutions, advocates, and policymakers in prioritizing resources and new initiatives. The project comes at a time when artificial intelligence (AI) applications are drawing particular scrutiny as to their potential benefits and risks. While machine learning models used for credit underwriting rely on more carefully curated data and are substantially less complex than those used in so-called “generative AI” applications such as ChatGPT, the consequences for borrowers and lenders of inaccurate or biased credit models can still be severe.

Phases of Research

The first phase of the project will involve building two groups of credit underwriting models: one using logistic regression and one using XGBoost, a machine learning technique based on gradient-boosted decision trees. The models will be trained on credit bureau data, consumer-permissioned bank account information, or a combination of the two sources.
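
The design described above is a grid of model family by data source. The sketch below illustrates the logistic-regression half of that grid on synthetic data. It assumes two hypothetical standardized features, `bureau_score` and `avg_net_inflow`, that stand in for credit bureau and cash-flow attributes. This is not FinRegLab's code, features, or data. The XGBoost models are omitted to keep the example dependency-free; one could swap in `xgboost.XGBClassifier` inside the same loop.

```python
import math
import random

# Hypothetical feature sets standing in for the three data configurations;
# the fact sheet does not name FinRegLab's actual features.
FEATURE_SETS = {
    "bureau_only": ["bureau_score"],
    "cash_flow_only": ["avg_net_inflow"],
    "bureau_plus_cash_flow": ["bureau_score", "avg_net_inflow"],
}


def make_synthetic_applicants(n, seed):
    """Return (features, defaulted) pairs from a made-up data-generating process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = {"bureau_score": rng.gauss(0, 1), "avg_net_inflow": rng.gauss(0, 1)}
        # Assumption: higher values of either feature lower the chance of default.
        logit = -1.0 - 1.5 * x["bureau_score"] - 1.5 * x["avg_net_inflow"]
        p_default = 1.0 / (1.0 + math.exp(-logit))
        rows.append((x, 1 if rng.random() < p_default else 0))
    return rows


def fit_logistic(rows, features, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent on the mean log loss."""
    w = [0.0] * len(features)
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(features)
        grad_b = 0.0
        for x, y in rows:
            z = b + sum(wj * x[f] for wj, f in zip(w, features))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted minus observed
            for j, f in enumerate(features):
                grad_w[j] += err * x[f]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """Chance a random defaulter is scored riskier than a random non-defaulter."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


train = make_synthetic_applicants(800, seed=1)
test = make_synthetic_applicants(400, seed=2)
test_labels = [y for _, y in test]

results = {}
for name, features in FEATURE_SETS.items():
    w, b = fit_logistic(train, features)
    # Higher score = higher predicted default risk (log-odds).
    scores = [b + sum(wj * x[f] for wj, f in zip(w, features)) for x, _ in test]
    results[name] = auc(scores, test_labels)

for name, value in results.items():
    print(f"{name}: test AUC = {value:.3f}")
```

Comparing the three test AUCs is one simple way to frame the question of each data source's separate and combined contribution. This toy uses a single random train/test split and reports only AUC.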

The overarching goal of the research is to investigate how incorporating machine learning methods and cash-flow information into consumer credit underwriting models affects accuracy, financial inclusion, and fairness. For example, the project will compare the performance of traditional underwriting models and machine learning models that use both credit bureau and cash-flow features against models that use only one of those data sets, and will analyze whether these results vary across demographic groups.
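
To make the group-level comparison concrete, the sketch below computes two simple per-group statistics from a model's predicted default probabilities. It reports the approval rate at a cutoff and the observed default rate among approved applicants. It also reports each group's approval rate relative to the highest-approval group, a ratio often called the adverse impact ratio. The cutoff, group labels, and data are hypothetical, and the fact sheet does not say which fairness metrics FinRegLab will use.

```python
from collections import defaultdict


def group_metrics(prob_default, defaulted, groups, cutoff=0.2):
    """Approval rate and default rate among approved applicants, per group.

    An applicant is approved when the predicted default probability is below
    the cutoff (the cutoff value is a hypothetical choice).
    """
    by_group = defaultdict(list)
    for p, y, g in zip(prob_default, defaulted, groups):
        by_group[g].append((p, y))
    metrics = {}
    for g, rows in by_group.items():
        approved_outcomes = [y for p, y in rows if p < cutoff]
        metrics[g] = {
            "approval_rate": len(approved_outcomes) / len(rows),
            # NaN when a group has no approved applicants.
            "default_rate_if_approved": (
                sum(approved_outcomes) / len(approved_outcomes)
                if approved_outcomes
                else float("nan")
            ),
        }
    return metrics


def adverse_impact_ratios(metrics):
    """Each group's approval rate divided by the highest group's approval rate."""
    best = max(m["approval_rate"] for m in metrics.values())
    if best == 0:
        return {g: float("nan") for g in metrics}
    return {g: m["approval_rate"] / best for g, m in metrics.items()}


# Toy scores for two hypothetical groups, "A" and "B".
prob_default = [0.05, 0.10, 0.30, 0.15, 0.25, 0.40, 0.12, 0.35]
defaulted = [0, 0, 1, 0, 0, 1, 1, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

metrics = group_metrics(prob_default, defaulted, groups)
ratios = adverse_impact_ratios(metrics)
for g in sorted(metrics):
    print(g, metrics[g], f"AIR={ratios[g]:.2f}")
```

In this toy data, group A is approved at a rate of 0.75 and group B at 0.25, so B's ratio is about 0.33. Running the same functions on the scores of two different models (for example, with and without cash-flow features) would show whether a change in data or technique narrows or widens such gaps.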

Later phases of the project will include additional workstreams focusing on alternative approaches to managing concerns about model explainability and fairness, such as building additional models using alternative techniques and comparing them to the original set to evaluate potential tradeoffs between model performance, simplicity, and fairness metrics.

The research aims to inform policymakers, lenders, other industry actors, advocates, and researchers in prioritizing initiatives to advance the responsible, fair, and inclusive use of machine learning and new data sources in credit underwriting.

Endnotes

[1] Mike Hepistall et al., Financial Inclusion and Access to Credit, Oliver Wyman (2022); Kenneth Brevoort et al., Data Point: Credit Invisibles, Consumer Financial Protection Bureau (2015).

About FinRegLab

FinRegLab is an independent, nonprofit organization that conducts research and experiments with new technologies and data to drive the financial sector toward a safe and responsible marketplace. The organization also facilitates discourse across the financial ecosystem to inform public policy and market practices. To receive periodic updates on the latest research, subscribe to FRL’s newsletter and visit www.finreglab.org. Follow FinRegLab on LinkedIn.

FinRegLab.org | 1701 K Street Northwest, Suite 1150, Washington, DC 20006
