FinRegLab Responds to Treasury Request for Information on Artificial Intelligence.
Department of the Treasury
1500 Pennsylvania Avenue, NW
Washington, D.C. 20220
Thank you for the opportunity to provide input on the uses of AI in the financial services sector and the opportunities and risks presented by AI developments and applications. We commend the Treasury Department’s ongoing engagement on this important and rapidly evolving topic.
FinRegLab is a nonprofit, nonpartisan innovation center that tests new technologies and data to inform public policy and drive the financial sector toward a responsible and inclusive financial marketplace. With our research insights, we facilitate discourse across the financial ecosystem to inform public policy and market practices. Financial inclusion is central to FinRegLab’s mission because an inclusive financial system helps to increase broader economic participation, financial health, and wealth building, particularly among historically marginalized and disadvantaged populations.
Much of our research has focused on the financial inclusion and fairness implications of data and machine learning (ML) in consumer and small business lending, where we have written in depth about potential benefits and risks, current practices, risk mitigation strategies and tools, and potential market and policy evolution. We incorporate those reports by reference here,1 in addition to highlighting a number of other AI use cases that have potentially important implications for financial inclusion. Our comment is divided into three sections:
- Background on AI adoption and evolution in financial services
- AI use cases with particular implications for financial inclusion
- Next steps to facilitate adjusting market practices and regulatory frameworks.
Overall, we emphasize the importance of considering the impacts of AI adoption on historically underserved and under-resourced populations. To the extent that AI and more representative data sources can help to produce more individually tailored financial services and to substantially reduce the costs of delivery, these populations could derive substantial benefits from such innovations since their needs are often not well met by existing systems. However, market incentives may not lead to such use cases being prioritized relative to more profitable customer segments. In addition, the risk of unintended consequences can be particularly high for such populations to the extent that AI adoption increases fraud and identity theft, perpetuates and exacerbates historical disparities, or creates new barriers to inclusion. We urge the Treasury Department, other policymakers, and financial services stakeholders more broadly to monitor these developments and to prioritize initiatives that ensure AI adoption improves access to responsible financial services over time.
Background: AI adoption and evolution in financial services
Artificial intelligence is a term coined in 1956 to describe computers that perform processes or tasks that “traditionally have required human intelligence.” Machine learning is often used to refer to the subset of artificial intelligence that gives “computers the ability to learn without being explicitly programmed.”2 In practice, these terms are often used to describe a broad range of models and applications that may be applied to a variety of data sources (e.g., tabular financial data, text, images, and voice) for a wide range of purposes (including predictive analytics, web and information searches, autofill functions, and generating new content in response to queries). The level of human involvement also varies depending on the particular application and technique. At a minimum, human oversight is critical in determining what data the algorithms are exposed to, choosing among different techniques and model architectures at the start, and validating, monitoring, and managing the models depending on the desired goals and the risks to address.
It is important to emphasize that ML/AI is not monolithic and that terminology can vary in different contexts (e.g., investor pitches compared to data science convenings compared to public policy debates), which can sometimes make it difficult to understand how much new innovations differ from older forms of statistical analysis and automation. The proposed definition in the Request for Information also poses this challenge, since it appears to include a wide range of automated decisioning.3 There also can be tremendous differences between sectors, use cases, and individual companies as to how ML/AI is deployed and managed.
The financial services sector has been an early adopter of statistical analysis and automation, including the use of machine learning models decades ago, particularly in contexts such as fraud detection, cybersecurity, and market trading.4 Improvements in computing power, access to digital information sources, and data science techniques for diagnosing and managing models have encouraged adoption in additional use cases over time, including marketing, credit and insurance underwriting, servicing, and customer support functions.5 In recent years, use of machine learning models in credit underwriting has been a particularly significant development, both because of the potential to improve access to credit for millions of people who are difficult to underwrite using traditional techniques and data sources, and because it has required lenders to grapple with a range of regulatory requirements, including individualized consumer disclosures and fair lending laws, in addition to more general risk management expectations. Given the consequences to both lenders and borrowers, use in underwriting has triggered substantially more scrutiny and debate than older applications such as fraud detection.6
In the past 18 months, interest in and debate about AI applications have accelerated in financial services, as in other sectors, after the release of ChatGPT in November 2022 underscored the potential for large language models (LLMs) and other forms of generative AI to create new content (including text and images) by relating prompts to learned patterns in training data.7 The largest and most technologically savvy financial services providers are testing these techniques across a broad range of use cases, and adoption has begun for selective functions in areas such as fraud detection and back office operations. However, as noted in the Treasury Department’s recent cybersecurity report, financial services providers have tended to be substantially more cautious in adopting these models than other sectors, due to several considerations:
- Data management and hygiene: Large language models are typically trained on massive datasets scraped from large portions of the internet.8 Successive generations have required more training data and become more complex as the models advance. For example, GPT (2018) has 117 million parameters and was trained on five gigabytes of data, GPT-2 (2019) has 1.5 billion parameters and was trained on 40 gigabytes of data, and GPT-3 (2020) has 175 billion parameters and was trained on 45 terabytes of data.9 Particularly with regard to proprietary LLMs, this raises substantial concerns for financial services providers about intellectual property considerations and risks in exposing their own proprietary information to such models, as well as accuracy, bias, and explainability as discussed below.
- Accuracy, consistency, and bias: The broad-based nature of LLMs’ training data, the frequency of inaccurate answers (often called “hallucinations”), variations in output wording, and the risk that the models will perpetuate or even exacerbate bias are shaping financial services providers’ decisions about when and how to deploy these technologies. Where queries are entered directly by consumers, those inquiries could also become a potential source of inconsistency and bias.10 Risks can be reduced by carefully curating and filtering training data, engaging in extensive fine-tuning processes, or using different embedded models to overcome hallucinations, but these measures can be expensive and their effectiveness is still being evaluated.11
- Explainability: The size and complexity of LLMs make it extremely difficult for data scientists to understand how the models are generating their answers for purposes of risk management, regulatory supervision, or consumer disclosure. While the broader technology community is working to develop techniques and strategies to address explainability concerns in the LLM context,12 financial services providers are particularly sensitive to the lack of agreed-upon practices given that they are subject to a range of explainability expectations in different activities, particularly in credit-related applications.13
All of these factors add to the cost and risks of deployment, particularly with regard to direct-to-customer applications. While a variety of large financial services providers have begun testing versions of these technologies, many banks have banned the use of ChatGPT, Dolly, GitHub Copilot, and similar proprietary products due to risk and security concerns.14 Instead, many providers are working to use open source LLMs such as LLaMA, Mistral, and open source GPTs, which they often bring within their firewalls before conducting supplemental training with their own data to build products or use them for various applications such as text summarization, extraction, generation, prohibited language detection, sentiment analysis, and computer vision.15 Financial service providers are also designing many of their early deployments as “employee assistant” models that are designed to assist coders, customer services representatives, and other knowledge-based employees, without replacing those employees or interacting directly with customers.16
Financial institutions and other firms are also tailoring their specific model choices to manage risks for the particular task at hand. For example, where a task involves text generation, companies may select different models based on the number of parameters. While a smaller model may not perform best in a general setting, it might excel in a specific use case and present less risk of hallucinations where it is trained on smaller but more curated data sources.17 Additionally, companies can use RAG (Retrieval-Augmented Generation) or fine-tune LLMs to reduce hallucinations, though these processes increase costs.18 For tasks such as text extraction and sentiment analysis, where hallucinations usually are not a concern but accuracy still matters, LLMs that have been pre-trained on large amounts of data can provide higher accuracy than traditional NLP models, but have downsides with regard to computational cost and complexity of deployment. If analyzing the broader context in which a word or phrase appears is important, smaller language models like BERT can be used in some contexts.19 Where data labels are available and context is not critical, convolutional neural networks (CNNs), recurrent neural networks (RNNs), and some other simpler, less expensive models can also be used for a range of NLP tasks.20
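To illustrate the retrieval-augmented pattern described above, the following is a minimal Python sketch under assumed conditions: the knowledge-base passages, query, and helper functions are hypothetical, the retrieval step uses scikit-learn's TF-IDF utilities, and the generation step is deliberately stubbed out rather than calling any particular LLM.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical curated knowledge base (e.g., approved policy or FAQ passages).
documents = [
    "Overdraft fees are waived for accounts enrolled in the low-balance alert program.",
    "Wire transfers initiated after 5 p.m. Eastern are processed the next business day.",
    "Disputed card transactions are provisionally credited within 10 business days.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Return the top_k knowledge-base passages most similar to the query."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    best = scores.argsort()[::-1][:top_k]
    return [documents[i] for i in best]

def answer(query: str) -> str:
    """Build a grounded prompt; a vetted LLM would generate from this context."""
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    # The call to the institution's approved generative model is stubbed out here.
    return prompt

print(answer("When are wire transfers processed?"))
```

The design point is that the model's answer is constrained to curated, institution-approved content retrieved at query time, which is one reason retrieval augmentation can reduce hallucinations relative to open-ended generation.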
AI use cases with significant inclusion implications
Although there are serious reliability issues to be sorted out with various forms of generative AI, the technology is causing stakeholders to consider potential advancements in various use cases that previously seemed many years away. We highlight three financial services use cases for evolving forms of AI with particularly important implications for financial inclusion of historically underserved and under-resourced populations.
A. Delivery of financial advice and coaching
Even prior to recent advancements in generative AI, the development and deployment of chatbots had been accelerating rapidly across various industries, particularly as customer service or sales agents. These chatbots can generally be categorized as either rule-based or generative AI-based.21 Rule-based chatbots may use either simple keywords or a natural language processing engine to determine what the user is asking (i.e., the intent), and then direct the user to a set of pre-written template responses. Generative AI chatbots use deep learning architectures, particularly transformers, to generate responses dynamically based on context rather than drawing from pre-scripted text. Some chatbot systems combine these approaches, for instance by pointing the user to pre-written content as well as generating commentary or a summary of the static content.
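The distinction between rule-based routing and a generative fallback can be illustrated with a short sketch. The intents, keywords, and templates below are hypothetical, and the generative step is replaced with a safe hand-off rather than a call to any actual model.

```python
# Hypothetical templates and keyword sets; a production system would be far richer.
TEMPLATES = {
    "balance": "Your current balance is shown on the Accounts page of the mobile app.",
    "dispute": "To dispute a charge, select the transaction and tap 'Report a problem'.",
}

KEYWORDS = {
    "balance": {"balance", "how much", "funds"},
    "dispute": {"dispute", "fraud", "unauthorized"},
}

def classify_intent(message: str) -> str | None:
    """Rule-based intent detection: match the message against simple keyword sets."""
    text = message.lower()
    for intent, words in KEYWORDS.items():
        if any(word in text for word in words):
            return intent
    return None

def respond(message: str) -> str:
    intent = classify_intent(message)
    if intent is not None:
        return TEMPLATES[intent]  # pre-written, reviewed response
    # No template matched: a generative model could summarize knowledge-base
    # content here; this sketch returns a safe hand-off instead.
    return "Let me connect you with a representative who can help."

print(respond("I think there is an unauthorized charge on my card"))
```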
As in other industries, customer service chatbots have become widespread in financial services, though these are almost entirely rule-based systems. All of the top 10 commercial banks have released such a chatbot, and about 37% of the US population is estimated to have interacted with a bank through a chatbot in 2022.22 Some banks have developed their own rule-based chatbots using first-party data such as customer conversation logs.23 Models such as Capital One’s Eno, released in 2017, and Bank of America’s Erica, released in 2018, have grown in adoption since their launch. Bank of America reported that Erica took four years to reach 1 billion interactions, yet surpassed 2 billion just 18 months later.24 Erica can perform a range of tasks, such as informing checking account-holders of their monthly subscription charges and of any bills due.
Other banks have partnered with third parties for these services. For example, Kasisto, a company that specializes in creating conversational financial assistants, was founded in 2013 and released its first model in 2018.25 Kasisto has now partnered to deploy its system, Kasisto AI (KAI), with 49 financial institutions, including over 30 credit unions and community banks, and serves more than 50 million customers. In 2018, the company partnered with JP Morgan to release a KAI-powered assistant to help corporate clients in its treasury services division.26
FinRegLab has spoken to some financial institutions that are testing the use of LLMs to process inquiries, though they continue to use rule-based outputs. In general, banks remain extremely averse to deploying fully LLM-generated responses in customer-facing external applications due to concerns about hallucinations, inconsistency in answer wording, and the potential for legal liability vis-à-vis consumers or regulators, in addition to reputational damage.27
One fintech startup, Cleo, has attempted to resolve some of these concerns with a combination of rule-based and LLM-generated responses. Cleo seeks to reach a younger audience that may find financial advisors inaccessible or intimidating by adopting what it refers to as a “big sister persona,”28 offering features including “Roast Mode,” which makes fun of a user’s spending habits, and “Hype Mode,” which congratulates the user on good financial decisions.29 Like the above applications, the chatbot uses LLMs to understand user intent. It then classifies the request: some queries (such as specific requests for financial advice) are given a pre-written response, and Cleo advertises that it has multiple comedians on staff who help write many of these responses. Other queries receive a response generated by a pre-trained LLM, such as GPT-4 or Claude, that summarizes material from Cleo’s customer service knowledge base.30
As financial chatbots become increasingly sophisticated, questions about how they present to consumers and how consumers react to their use have important implications for privacy, equity, and financial inclusion. One study found that consumers’ trust in a brand decreases more when disclosing personal information to an AI compared to a human, because consumers infer that data shared with an AI system is more likely to be shared with a wider audience.31 This effect can be mitigated when a bot is more anthropomorphic (i.e., with more human-like dialogue, names, or avatars), or when a customer is assured that their confidentiality will be protected.
In some sensitive contexts, the non-human nature of a chatbot may be an asset. A study within the UK’s National Health Service system suggested that using personalized chatbots increased self-referrals for mental healthcare services, an effect that was especially pronounced among non-binary individuals.32 Research in a similar vein suggests that customers who are shopping for sensitive personal items, such as anti-diarrheal pills or condoms, prefer to communicate with a chatbot over a human, while also favoring a less anthropomorphic chatbot that is clearly identifiable as such over a more human-like bot.33
This research suggests that chatbots may be especially well-received for assisting customers in contexts where self-presentation concerns are high. Counseling about financial distress could be one such context; financial hardship has often been found to induce feelings of embarrassment and shame, with one survey finding that 36% of Americans feel embarrassed by their finances.34 Individuals who are ashamed of their finances can be fearful of judgment from others, and therefore may not seek the help that they need.35 Research has indeed shown that shame can lead individuals to disengage from their financial situation—avoiding help, making counterproductive decisions, and ultimately creating a vicious cycle of shame and financial difficulty.36 This raises important questions about whether a financial advice chatbot that clearly presents as non-human, while also providing assurances that consumers’ information is private and secure, could increase the accessibility of financial advice and credit counseling. However, concerns about accuracy as discussed above become especially pertinent in such applications, where advice is often complex and varies from person to person, and the consequences of being wrong are especially high.
Research also raises important questions about the race and gender presentation of advice bots, and how these traits might affect customers’ receptiveness. Some research has found that compared to male-presenting bots, female chatbots are perceived as more human, helpful, and nurturing, though other studies have found no difference between the two.37 There are also questions about how users may respond to chatbots with avatars of different races. Research in other contexts has frequently found a homophily effect—that people are more drawn to and more favorable toward people of the same race as themselves, whereas one study found that consumers of all races perceived a bot with a Black avatar to be more responsive.38 Regardless, the appearance and style of a bot’s presentation may have implications for which customers feel best-served by the bot, and this is an area in which more research would be warranted.
With the increasing use and complexity of AI chatbots in financial services, these various questions and challenges will become increasingly relevant. AI may create opportunities to reach historically underserved customers, either because they are more comfortable with a chatbot than with a human, or because the chatbot can present information in a way that is highly tailored to the specific consumer. At the same time, many consumers will undoubtedly prefer to communicate with a human financial advisor—either in general, or when they have more complex inquiries. These considerations underscore the importance of financial institutions continuing to offer clearly advertised alternatives to chatbots and providing simple and straightforward off-ramps from chat-based channels for users to speak to a human.39
B. Data and technology applications for identity verification, fraud detection, and anti-money laundering
As discussed in Treasury’s cybersecurity report, fraud detection is one of the oldest and broadest examples of machine learning deployment in financial services. Machine learning models are often used to monitor bank account, credit card, or other financial transactions data to flag suspicious patterns, sometimes in combination with more traditional rules-based tools. Some anti-money laundering (AML) compliance programs also make use of ML models, although rules-based tools appear to be more common in AML than in fraud detection.40 In light of the nature and volume of the data, rapidly evolving fraud patterns, and other considerations, these ML models are often more complex and are updated more frequently than those used in other financial services contexts, although they are not necessarily fully dynamic in the sense of continuously adjusting themselves based on incoming data without developer initiation and validation prior to deployment.41 Even very large financial institutions often rely in part on models developed by vendors in this context, given that such platforms can provide insights across multiple institutions at the same time.
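As a simplified illustration of the ML-based transaction monitoring described above, the sketch below scores transactions for anomalousness using scikit-learn's IsolationForest on a few hypothetical tabular features (amount, hour of day, distance from home). The data is simulated and the model is a stand-in; production monitoring systems are far more complex and rely on proprietary features and models.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Simulated historical transactions: [amount_usd, hour_of_day, miles_from_home].
history = np.column_stack([
    rng.gamma(shape=2.0, scale=40.0, size=500),  # everyday purchase amounts
    rng.integers(8, 22, size=500),               # daytime and evening hours
    rng.exponential(scale=5.0, size=500),        # mostly close to home
])

# Fit an anomaly detector on the account's normal activity.
model = IsolationForest(contamination=0.01, random_state=0).fit(history)

# Score new activity: a routine purchase and a large 3 a.m. purchase far from home.
new_transactions = np.array([
    [45.0, 13, 2.0],
    [2400.0, 3, 850.0],
])
flags = model.predict(new_transactions)  # -1 = flagged as anomalous, 1 = normal
for row, flag in zip(new_transactions, flags):
    print(row, "flag for review" if flag == -1 else "ok")
```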
ML models are also used for identity verification in initial account opening and application processes to estimate the likelihood that an applicant is who they say they are. These models may analyze, for instance, whether recent account openings associated with the applicant’s Social Security number or ID appear suspicious. Related applications of artificial intelligence include systems that analyze an ID card–typically a driver’s license–to detect fakes.
While generative AI is not currently in wide use in identity verification or fraud detection processes, several organizations have announced upcoming generative AI-powered products. For example, some financial institutions and service providers are experimenting with transformer models (an architecture normally used for generative AI applications) for transaction monitoring.42 There is also interest in using generative AI to create synthetic datasets to train anti-fraud and AML models, although the efficacy of this technique is unclear.43 Public research into the utility and efficacy of the techniques and architectures used in generative AI for identity verification and transaction monitoring, including their potential inclusion implications, could be helpful. Some stakeholders argue that financial institutions should instead focus on more basic prevention and detection strategies, such as using two- or multi-factor authentication and refining step-up verification methods, before investing in generative AI for identity verification and transaction monitoring.44
Generative AI can also be used by fraudsters to attempt to fool AML and anti-fraud systems, however. As Treasury’s recent report detailed, AI may be used to increase the sophistication of social engineering attacks as well as to generate fake IDs and imitate victims to bypass voice authentication systems.45 Artificial intelligence and advanced analytical techniques can be used to detect these imitations with varying degrees of success.46 Yet whether these defenses can keep pace and whether safeguards implemented by commercial generative AI providers are having meaningful effects are open questions as AI-enabled fraudsters become more sophisticated.
While stakeholders tend to view identity verification and transaction monitoring almost exclusively through a security lens, these processes have large inclusion impacts that should be monitored and managed carefully as they evolve. An estimated 11.6 percent of unbanked households report that they do not have the personal identification required to open a bank account,47 and transaction monitoring models that mistakenly flag transactions as illegitimate can prevent consumers from completing important transactions and, in extreme cases, lead to account closures.48 Financially vulnerable consumers can also be disproportionately affected where financial institutions fail to detect fraud, since losses of even small dollar amounts can represent a significant percentage of monthly income or balance sheet savings. The Federal Trade Commission’s fraud surveys have found that Black and Latino consumers experience fraud at a higher rate than non-Latino White consumers,49 and the Identity Theft Resource Center (ITRC) reports that women and Black victims seek its assistance at higher rates than their proportion of the general population.50
Particularly as large data breaches and fraud losses continue to mount,51 financial system stakeholders are increasingly investing in new types of data, upgrading their monitoring models, and pursuing additional data sharing initiatives. These innovations include broadening identity verification systems to rely on an increasingly diverse range of data, such as cross-checking basic information with wireless carriers and other companies and analyzing “digital footprint” data concerning keystroke timing and other device usage patterns, as well as migrating away from rules-based systems toward increasing use of ML predictive models. Stakeholders are also exploring a range of “privacy enhancing technologies,” such as encryption and federated machine learning models, to balance the potential benefits of exposing analytical models to more data against the risks to privacy and security from moving and aggregating more customer information.
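Federated learning, one of the privacy enhancing technologies mentioned above, can be sketched in a few lines: each institution trains a model on its own data and shares only model parameters, which are then averaged into a shared model. The data, the simple logistic-regression-style model, and the single averaging round below are purely illustrative assumptions, not a description of any particular deployment.

```python
import numpy as np

rng = np.random.default_rng(1)

def local_train(X, y, epochs=200, lr=0.1):
    """Plain gradient-descent logistic regression on one institution's own data."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (preds - y) / len(y)
    return w

# Three hypothetical institutions, each with its own simulated labeled data.
true_w = np.array([2.0, -1.0, 0.5])
institutions = []
for _ in range(3):
    X = rng.normal(size=(200, 3))
    y = (X @ true_w > 0).astype(float)
    institutions.append((X, y))

# Each party trains locally; only the resulting weight vectors leave the institution.
local_weights = [local_train(X, y) for X, y in institutions]
global_weights = np.mean(local_weights, axis=0)  # one round of federated averaging
print("federated (averaged) weights:", global_weights.round(2))
```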
As identity verification and transaction monitoring become more advanced, the inclusion impacts of these developments–whether positive or negative–should be carefully monitored and considered. The adoption of more diverse data sources and sophisticated analytical techniques could potentially be helpful for consumers who lack extensive traditional financial histories. At the same time, as fraud and illicit finance accelerate, there is a risk that financial institutions may increasingly consider consumers with more limited identity documentation or less traditional financial patterns to be too risky to serve. A forthcoming FinRegLab paper will explore these trends and concerns in greater detail.
C. Back-office applications to increase the nimbleness of model and product development
Use of machine learning and natural language processing in back-office applications often receives less public attention but has potentially important implications for financial inclusion. These tools can help financial institutions be more nimble in processing data, developing new products, and migrating away from legacy systems toward platforms that can be used to tailor and deliver financial products and services to historically underserved populations.
One potentially compelling illustration of such use cases concerns the deployment of techniques and model architectures commonly used in generative AI applications to help parse the text fields on bank account statements that describe the payee or payor and other details of individual transactions. Financial institutions may wish to use this data for underwriting and fraud monitoring and to build budgeting and other personal financial management tools, among other uses. However, the language of these memo fields is not standardized, and formats vary widely across financial institutions. While categorization models have improved over time, financial institutions are eager for more accurate and reliable techniques.
Financial services providers are using a variety of approaches to convert the data into more structured and standardized formats for analysis. For example, the B2B payments processor Slope has leveraged various models, such as GPT and BERT, to create unsupervised clusters of inflows and outflows, allowing it to see streams of income and expenses.52 In late 2023 Slope also developed TransFormer, a highly fine-tuned model built on an open source LLM, which can label and categorize transactions with a reportedly high level of accuracy.53 Slope now uses this model in its credit monitoring dashboards and is planning to integrate it into its underwriting system. Another company working on similar technology is Hyperplane, which was recently acquired by Brazil’s Nubank.54 Hyperplane works with banks to convert their various forms of first-party data into a more usable form. As with Slope, transaction data is a large component of this work. Hyperplane offers services utilizing this data to train and fine-tune predictive models, and to develop marketing campaigns targeting users with financial products.55
Such AI technologies could potentially help to derive significant additional insight from bank account data for underwriting, personal financial management, and other applications, as well as to help financial services providers in a variety of other contexts such as mining customer service notes for trend spotting and compliance purposes and migrating their data from one platform to another.56 Especially for financial services providers who are reliant on older legacy systems, the ability to migrate to newer platforms could be a significant boost to promoting more innovative uses of data and technology.
Many of these use cases involve converting unstructured data into structured data, often requiring text extraction exercises. As discussed in the background section, companies have a variety of options to choose from depending on the specific task and potential trade-offs between complexity, cost, and accuracy. The accuracy of LLMs can be improved with RAG (Retrieval-Augmented Generation) or fine-tuning processes, although they can be costly and complex to deploy. Simpler traditional models are a potential alternative for specific tasks where the context surrounding a particular word or phrase is less critical.57
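As a concrete example of the simpler supervised alternative for transaction memo categorization, the sketch below uses character n-gram TF-IDF features with logistic regression in scikit-learn. The memo strings and category labels are hypothetical, and this is not a description of any particular vendor's model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled memo strings; real training sets would be far larger.
memos = [
    "ACH DEPOSIT PAYROLL ACME CORP",
    "POS PURCHASE GROCERY MART #1042",
    "ONLINE PMT ELECTRIC UTILITY CO",
    "ACH DEPOSIT PAYROLL GLOBEX INC",
    "POS PURCHASE SUPERMARKET 77",
    "ONLINE PMT WATER AND SEWER DIST",
]
labels = ["income", "groceries", "utilities", "income", "groceries", "utilities"]

# Character n-grams help cope with unstandardized abbreviations across institutions.
classifier = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    LogisticRegression(max_iter=1000),
)
classifier.fit(memos, labels)

print(classifier.predict([
    "ACH DEPOSIT PAYROLL INITECH LLC",
    "POS PURCHASE CORNER GROCERY",
]))
```

Character-level features are one design choice for handling the abbreviations and formatting quirks that memo fields exhibit across institutions; word-level features or larger pre-trained models are alternatives with different cost and accuracy trade-offs.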
Next steps to facilitate the adjustment of market practices and regulatory frameworks
While the regulatory frameworks governing financial services have provided a useful starting point in affirmatively managing AI risks that many other sectors do not have,58 it is important to consider how both business practices and regulatory expectations may need to evolve to promote consistently responsible, fair, and inclusive implementation of ML/AI applications in financial services. Given the diversity and speed of evolution in technologies, use cases, market practices, and policy debates, this requires particularly careful balancing between the advantages of consistent baseline standards and the need for tailored approaches. Diversity of backgrounds and disciplines is also critical, given that the issues posed by ML/AI are not simply technical in nature but rather implicate a range of broader economic, policy, and dignitary considerations.
As we look ahead to what actions Treasury and other stakeholders can take to promote responsible innovation and competition in the use of AI for financial services and to protect consumers and other potentially impacted parties from unintended consequences, several actions could help the financial services ecosystem move toward more rapid identification and implementation of best practices and regulatory safeguards:
- Increasing resources to support the production of public research, engagement by historically underrepresented and under-resourced actors, and broad intra- and intersector dialogue: As the Treasury Department has highlighted in its own past reports, technology resources and expertise have a significant impact on the extent to which stakeholders can adopt ML/AI applications and engage in the process of refining related market practices and policy frameworks. Considering ways to facilitate public research that is specifically grounded in the regulatory context of financial services as well as to support greater engagement by smaller financial services providers, historically underserved communities and their advocates, civil society and academic organizations, and government agencies (both regulators and law enforcement) is critical to ensuring that ML/AI adoption operates to the benefit of broader populations and the general economy rather than narrower groups of providers or customers. For instance, increasing resources for public research that considers the financial inclusion and health implications of particular use cases, and facilitating the creation of high quality datasets for the financial services sector to use in development and testing, could help to increase the knowledge base of all stakeholders about evolving technologies and market practices. Additional guidance and supervisory activity by regulators could help to address the particular challenges that smaller providers face in both adopting new technologies and performing due diligence on vendors.59 Creating opportunities for a broad range of financial services stakeholders to engage in dialogue about emerging issues and to draw upon broader debates about data science and standards in other sectors is also critically important at a time of rapid change.
- Careful consideration of data governance practices and standards: On a related note, questions regarding data quality, accessibility, and protection are fundamental to both the potential benefits and risks of ML/AI applications. Many of the worst headlines concerning ML/AI applications gone wrong relate to flaws in the nature or treatment of underlying data,60 and many of the most promising use cases of ML/AI for financial inclusion also hinge in significant part on the ability to access new data sources.61 Yet data bias and governance issues also arise in contexts that do not involve machine learning in the first instance, and therefore require continuing direct attention in their own right to further refine best practices and regulatory expectations. While federal law provides more detailed and robust protections for consumer financial data than for many other categories of consumer information, many of the consumer financial laws have not had meaningful updates in several decades. They also vary substantially as to what products and services are covered, how they extend to small business owners (if at all), and the level of regulatory monitoring and enforcement.62 Consideration of ways that synthetic data and privacy enhancing technologies can potentially facilitate improvements to model accuracy while avoiding the creation of large lakes of personally identifiable information is particularly critical as fraud pressures mount.
- Review of other risk management and customer protection frameworks that apply to automated decisionmaking: As detailed in FinRegLab’s prior research, model risk management expectations and fair lending requirements have pushed financial services providers to manage affirmatively for performance, fairness, and transparency risks when adopting ML/AI in the context of credit underwriting. However, those regulatory regimes do not apply equally to the entire financial services sector, prompting concerns both about concentration of risk and about level playing fields between competitors. Accordingly, some stakeholders have suggested that imposing basic governance expectations on nonbank financial services providers could be beneficial to the broader ecosystem. Stakeholders are also pointing to ways in which the existing regulatory guidance could be updated and expanded to address topics that are becoming more urgent in the ML/AI era, such as standards for evaluating post hoc explainability tools and for evaluating multiple underwriting models to determine whether they constitute a “less discriminatory alternative” for purposes of disparate impact compliance.63 In evaluating the utility of existing frameworks, one important consideration is whether to focus on machine learning and artificial intelligence specifically, on a broader range of statistical models and automated systems, or on the performance of the general function regardless of the extent it is executed by humans, computers, or some combination of the two.64 Consistent high-level principles or standards may be particularly useful at this stage of evolution, given the definitional challenges discussed above, the fact that technologies are evolving rapidly, and the universal importance of qualities such as accuracy and fairness.
- Broader efforts to increase opportunity, fairness, and economic participation: It is also critical to note that while filling information gaps and adopting more predictive models could help substantial numbers of consumers and small business owners access more affordable credit, such actions will not by themselves erase longstanding disparities in income and assets or recent hardships imposed by the pandemic. These factors will continue to shape whether and how customers access financial services, for instance by affecting the number of loan applicants who are assessed as presenting significant risk of default, which will in turn continue to affect whether they are granted credit and at what price. This underscores the importance of using many initiatives and policy levers to address the deep racial disparities in income and assets at the same time that stakeholders in the financial system continue to explore and implement promising data and modeling technique innovations. While there is reason to believe that the financial system can enhance its ability to provide fair and inclusive products and services, relying solely on it to address these cumulative, structural issues would produce too little change too slowly.
FinRegLab appreciates the opportunity to comment on the use, opportunities, and risks of AI in the financial services sector and to emphasize the importance of the issues of financial inclusion, equity, and consumer protection as the sector navigates advancements in AI technology. We would welcome an opportunity to engage further on these topics.
Endnotes
1 FinRegLab, “Explainability & Fairness in Machine Learning for Credit Underwriting: Policy Analysis” (Dec. 2023); FinRegLab, “Explainability & Fairness in Machine Learning for Credit Underwriting: Policy & Empirical Findings Overview” (July 2023); FinRegLab et al., “Machine Learning Explainability & Fairness: Insights from Consumer Lending” (updated July 2023); FinRegLab, “The Use of Machine Learning for Credit Underwriting: Market & Data Science Context” (Sept. 2021).
2 See, e.g., “Artificial Intelligence and Machine Learning in Financial Services” (Financial Stability Board, November 1, 2017); Ting Huang et al., “The History of Artificial Intelligence” (University of Washington, 2006).
3 “The term ‘artificial intelligence’ or ‘AI’ has the meaning set forth in 15 U.S.C. 9401(3): a machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations, or decisions influencing real or virtual environments. Artificial intelligence systems use machine and human-based inputs to perceive real and virtual environments; abstract such perceptions into models through analysis in an automated manner; and use model inference to formulate options for information or action.” 89 Fed. Reg. 50048, 50050 (June 12, 2024).
4 Maghsoud Amiri and Siavash Hekmat, “Banking Fraud: A Customer-Side Overview of Categories and Frameworks of Detection and Prevention,” Journal of Applied Intelligent Systems and Information Sciences 2, no. 2 (December 1, 2021): 58–67; Bonnie G. Buchanan, “Artificial Intelligence in Finance” (Alan Turing Institute, April 2019); PK Doppalapudi et al., “The Fight against Money Laundering: Machine Learning Is a Game Changer” (McKinsey & Company, October 7, 2022); T.J. Horan, “Evolution of Fraud Analytics – An Inside Story,” KDnuggets (blog), March 14, 2014; Oliver Wyman et al., “Artificial Intelligence Applications in Financial Services: Asset Management, Banking and Insurance” (2019); I Venkata Srihith et al., “Trading on Autopilot: The Rise of Algorithmic Trading” 3 (May 1, 2023): 2581–9429. See also U.S. Treasury Department, “Managing Artificial Intelligence-Specific Cybersecurity Risks in the Financial Services Sector” (March 2024).
5 Bonnie G. Buchanan, “Artificial Intelligence in Finance” (Alan Turing Institute, April 2019); CAS Machine Learning Working Party, “Machine Learning in Insurance” (Casualty Actuarial Society, Winter 2022); “Chatbots in Consumer Finance,” Issue Spotlight (Consumer Financial Protection Bureau, June 2023); “Artificial Intelligence and Machine Learning in Financial Services” (Financial Stability Board, November 1, 2017); FinRegLab, “The Use of Machine Learning for Credit Underwriting: Market & Data Science Context,” § 2 (Sept. 2021); Geneva Association, Promoting Responsible Artificial Intelligence in Insurance (2020); Oliver Wyman et al., “Artificial Intelligence Applications in Financial Services: Asset Management, Banking and Insurance” (2019).
6 See FinRegLab, “Explainability & Fairness in Machine Learning for Credit Underwriting: Policy Analysis” (December 2023).
7 See, e.g. “What Is ChatGPT, DALL-E, and Generative AI?” (McKinsey & Company, April 2, 2024); Mark Riedl, “A Very Gentle Introduction to Large Language Models without the Hype,” Medium (blog), May 25, 2023; Foley & Lardner LLP, “ChatGPT: Herald of Generative AI in 2023?” JD Supra, 2023; David De Cremer, Nicola Morini Bianzino, and Ben Falk, “How Generative AI Could Disrupt Creative Work,” Harvard Business Review, April 13, 2023. The content creation process in generative AI relies on transformer models that predict next words. Transformers weigh the importance of different words in a sentence, enabling the model to generate contextually relevant text based on patterns learned in large amounts of sequential data. For instance, auto-fill functions are a low-level version of generative AI that predict the most likely letters or phrases that follow the initial content. Depending on the use case, users can fine-tune these LLMs for specific domains, such as finance, medical, or technical fields, to improve their performance in specialized tasks.
8 See Mark Riedl, “A Very Gentle Introduction to Large Language Models without the Hype,” Medium (blog), May 25, 2023.
9 Min Zhang and Juntao Li, “A Commentary of GPT-3 in MIT Technology Review 2021,” Fundamental Research 1, no. 6 (November 1, 2021): 831–33. Scaling up pre-training data and using LLMs with a high number of parameters helps to improve their performance capability, but can introduce misinformation and biases. Emily M. Bender et al., “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21 (New York, NY, USA: Association for Computing Machinery, 2021), 610–23.
10 See Rem Hida, Masahiro Kaneko, and Naoaki Okazaki, “Social Bias Evaluation for Large Language Models Requires Prompt Variations” (arXiv, July 3, 2024).
11 Extensive adjustments to combat inaccurate and biased responses are very resource intensive, as they generally require humans to continually verify and provide feedback on responses. This can be done at scale by asking users whether they like the response they received, yet relying on user feedback for large language models in the context of financial services would generally be too high-risk due to user error and other factors. An external bad actor could bombard the model with incorrect feedback, for example.
12 See Haoyan Luo and Lucia Specia, “From Understanding to Utilization: A Survey on Explainability for Large Language Models” (arXiv, February 21, 2024); Johannes Schneider, “Explainable Generative AI (GenXAI): A Survey, Conceptualization, and Research Agenda” (arXiv, April 15, 2024).
13 For example, financial institutions must meet prudential regulators’ model risk management expectations that include model development and validation processes and model governance, policies, and controls. FRB, SR 11-7; Office of the Comptroller of the Currency, “Bulletin 2011-12: Sound Practices for Model Risk Management: Supervisory Guidance on Model Risk Management” (Apr. 4, 2011). Another example is explainability requirements for credit underwriting for the purpose of generating adverse action notices and meeting fair lending requirements. See FinRegLab, “Explainability & Fairness in Machine Learning for Credit Underwriting: Policy Analysis” (Dec. 2023).
14 See, e.g., Brian Bushard, “Workers’ ChatGPT Use Restricted At More Banks—Including Goldman, Citigroup,” Forbes (Feb. 24, 2023); “A Third of Banks Ban Employees from Using Gen AI. Here’s Why,” American Banker, March 18, 2024. Other concerns include the risk that information processed by these models could be retrieved by an external actor, putting customer data and proprietary information at risk. See, e.g., Jaydeep Borkar, “What Can We Learn from Data Leakage and Unlearning for Law?” (arXiv, July 19, 2023).
15 See also “How Can a Bank Pick the AI Model That Suits It Best?” American Banker, September 28, 2023.
16 For instance, Morgan Stanley is launching a chatbot to help their financial advisors quickly access relevant Morgan Stanley research and to provide them with administrative support. Tatiana Bautzer & Lananh Nguyen, “Morgan Stanley to Launch AI Chatbot to Woo Wealthy,” Reuters (Sept. 7, 2023). Intuit has announced a new generative AI financial assistant to provide small businesses and consumers with personalized information to make more informed financial decisions. Intuit, Press Release, “Introducing Intuit Assist: The Generative AI-Powered Financial Assistant for Small Businesses and Consumers” (Sept. 6, 2023).
17 For broader discussions of hallucination in language models, see “Vectara/Hallucination evaluation model,” Hugging Face, accessed August 12, 2024 and Lei Huang et al., “A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions” (arXiv, November 9, 2023).
18 Patrice Béchard and Orlando Marquez Ayala, “Reducing Hallucination in Structured Outputs via Retrieval-Augmented Generation” (arXiv, April 11, 2024).
19 For example, the BERT base model has 110 million parameters and the BERT large model has 340 million parameters, both of which are substantially smaller than LLaMA’s variants that range from 7 billion to 70 billion parameters. For comparison between language models, see “Large Language Models: A New Moore’s Law?” accessed August 2, 2024; Celeste Mottesi, “GPT-3 vs. BERT: Comparing the Two Most Popular Language Models,” accessed August 2, 2024. Language models like BERT, GPT, and other transformer-based models are designed to be context-aware, meaning they consider the context of a word or phrase within a sentence or passage to generate more accurate and relevant outputs. For more information, see Ashish Vaswani et al., “Attention Is All You Need” (arXiv, August 1, 2023).
20 For example, “Show and Tell: A Neural Image Caption Generator” (Vinyals et al. 2015) is a use case for text generation using an RNN. https://arxiv.org/abs/1411.4555. Andrej Karpathy’s blog post “The Unreasonable Effectiveness of Recurrent Neural Networks” also showcases a project where an RNN is trained on various text datasets, including Shakespeare’s works. https://www.datasciencecentral.com/the-unreasonable-effectiveness-of-recurrent-neural-networks
21 “What Is Chatbot Design?” IBM, August 30, 2023.
22 “Chatbots in Consumer Finance,” (Consumer Financial Protection Bureau, June 6, 2023).
23 Ibid.
24 “BofA’s Erica Surpasses 2 Billion Interactions, Helping 42 Million Clients Since Launch,” Bank of America, April 8, 2024.
25 “About Us,” Kasisto, accessed August 2, 2024.
26 Hugh Son, “JP Morgan Is Unleashing Artificial Intelligence on a Business That Moves $5 Trillion for Corporations Every Day,” CNBC, June 20, 2018.
27 For a discussion of why inconsistencies occur in LLMs, see Bill Franks, “No, That Is Not A Good Use Case For Generative AI!,” Analytics Matters (blog), August 8, 2023. Studies also highlight the challenges of factual inconsistencies and propose methodologies to detect and mitigate these issues. Liyan Xu et al., “Identifying Factual Inconsistencies in Summaries: Grounding Model Inference via Task Taxonomy” (arXiv, June 19, 2024).
28 The Writer Team, “‘Big Sis Energy:’ How Cleo Built a Gen Z Brand,” Writer (blog), April 16, 2021.
29 “AI Meets Money,” Cleo, accessed August 2, 2024.
30 Benj Pettit, “Can We Sub in GPT-4 Anytime You Chat to Cleo?,” accessed August 2, 2024.
31 Deniz Lefkeli, Mustafa Karataş, and Zeynep Gürhan-Canli, “Sharing Information with AI (versus a Human) Impairs Brand Trust: The Role of Audience Size Inferences and Sense of Exploitation,” International Journal of Research in Marketing 41, no. 1 (March 1, 2024): 138–55.
32 Johanna Habicht et al., “Closing the Accessibility Gap to Mental Health Treatment with a Personalized Self-Referral Chatbot,” Nature Medicine 30, no. 2 (February 2024): 595–602.
33 Jianna Jin, Jesse Walker, and Rebecca Walker Reczek, “Avoiding Embarrassment Online: Response to and Inferences about Chatbots When Purchases Activate Self-Presentation Concerns,” Journal of Consumer Psychology n/a, no. n/a, accessed August 2, 2024.
34 See, e.g., Bengt Starrin, Cecilia Åslund, and Kent W. Nilsson, “Financial Stress, Shaming Experiences and Psychosocial Ill-Health: Studies into the Finances-Shame Model,” Social Indicators Research 91, no. 2 (April 1, 2009): 283–98; “Two-Thirds of Americans Have Decreased Spending Due to Economy, Wells Fargo Money Study Finds,” accessed August 2, 2024.
35 Daniel Sznycer et al., “Shame Closely Tracks the Threat of Devaluation by Others, Even across Cultures,” Proceedings of the National Academy of Sciences 113, no. 10 (March 8, 2016): 2625–30.
36 Joe J. Gladstone et al., “Financial Shame Spirals: How Shame Intensifies Financial Hardship,” Organizational Behavior and Human Decision Processes 167 (November 1, 2021): 42–56.
37 Mathilde H. A. Bastiansen, Anne C. Kroon, and Theo Araujo, “Female Chatbots Are Helpful, Male Chatbots Are Competent?” Publizistik 67, no. 4 (November 1, 2022): 601–23; Sylvie Borau et al., “The Most Human Bot: Female Gendering Increases Humanness Perceptions of Bots and Acceptance of AI,” Psychology and Marketing 38, no. 7 (July 2021): 1052–68.
38 Nicole Davis et al., “I’m Only Human? The Role of Racial Stereotypes, Humanness, and Satisfaction in Transactions with Anthropomorphic Sales Bots,” Journal of the Association for Consumer Research 8, no. 1 (January 2023): 47–58.
39 Consumer Financial Protection Bureau, “Chatbots in Consumer Finance,” (June 6, 2023).
40 Berkan Oztas et al., “Transaction Monitoring in Anti-Money Laundering: A Qualitative Analysis and Points of View from Industry,” Future Generation Computer Systems 159, pages 161-171 (October 2024).
41 Maghsoud Amiri and Siavash Hekmat, “Banking Fraud: A Customer-Side Overview of Categories and Frameworks of Detection and Prevention,” Journal of Applied Intelligent Systems and Information Sciences 2, no. 2 (December 1, 2021): 58–67; Bonnie G. Buchanan, “Artificial Intelligence in Finance” (Alan Turing Institute, April 2019); PK Doppalapudi et al., “The Fight against Money Laundering: Machine Learning Is a Game Changer” (McKinsey & Company, October 7, 2022); T.J. Horan, “Evolution of Fraud Analytics – An Inside Story,” KDnuggets (blog), March 14, 2014.
42 See, e.g., Ryan Browne, “Mastercard Jumps into Generative AI Race with Model It Says Can Boost Fraud Detection by up to 300%,” CNBC, February 1, 2024; “Visa Uses Generative AI to Catch Suspicious Financial Transactions,” accessed August 2, 2024.
43 See “Leveraging Generative AI (GenAI) for Fraud Detection and Prevention,” accessed August 2, 2024; “Synthetic Data Generation: Definition, Types, Techniques, & Tools,” accessed August 2, 2024.
44 “Bankers Think AI Is an Anti-Fraud Lifeline. They’re Gravely Mistaken,” American Banker, July 31, 2024.
45 U.S. Treasury Department, “Managing Artificial Intelligence-Specific Cybersecurity Risks in the Financial Services Sector,” § 3 (March 2024). For generative AI use in the production of fake IDs, see, e.g., Joseph Cox, “An Instant Fake ID Factory,” 404 Media, February 5, 2024. The lack of security of voice authentication systems has been raised by the Senate Banking Committee and has been echoed by industry stakeholders that FinRegLab has interviewed. United States Committee on Banking, Housing, and Urban Affairs, “Brown Presses Banks on Voice Authentication Services,” May 4, 2023.
46 For instance, security systems can detect the use of virtual cameras for liveness checks and even analyze minuscule changes in skin color to detect a user’s heartbeat to guard against video deep fakes. See Hua Qi et al., “DeepRhythm: Exposing DeepFakes with Attentional Visual Heartbeat Rhythms” (arXiv, August 26, 2020). For a discussion of the detection of voice deep fakes, see Lauren Leffer, “AI Audio Deepfakes Are Quickly Outpacing Detection,” Scientific American, accessed August 2, 2024.
47 “2021 FDIC National Survey of Unbanked and Underbanked Households” (Federal Deposit Insurance Corporation, October 2022). This number is likely an undercount because many of those with little identity documentation are housing insecure or otherwise difficult to survey.
48 See, e.g., Tara Siegel Bernard and Ron Lieber, “Banks Are Closing Customer Accounts, With Little Explanation,” The New York Times, April 8, 2023, sec. Your Money; Penny Crosman, “Rushed Anti-Money-Laundering Calls Backfire. Can AI Help?” American Banker, December 4, 2023.
49 “Serving Communities of Color: A Staff Report on the Federal Trade Commission’s Efforts to Address Fraud and Consumer Issues Affecting Communities of Color” (Federal Trade Commission, October 2021).
50 “2023 Consumer Impact Report” (Identity Theft Resource Center, 2023); “Understanding Identity Crimes in Black Communities: Phase One” (Identity Theft Resource Center, 2023).
51 The Identity Theft Resource Center reported a 78 percent increase in data breaches from 2022 to 2023. “Identity Theft Resource Center 2023 Annual Data Breach Report Reveals Record Number of Compromises; 72 Percent Increase Over Previous High,” ITRC, accessed August 2, 2024. The Financial Crimes Enforcement Network found that financial institutions reported $212 billion in suspicious activity related to identity in 2021 Bank Secrecy Act filings. “Identity-Related Suspicious Activity: 2021 Threats and Trends,” Financial Trend Analysis (Financial Crimes Enforcement Network, January 2024).
52 Jason Huang, “SlopeGPT: The First Payments Risk Model Powered by GPT,” Slope Stories (blog), April 9, 2024. As between the two forms of generative AI, BERT is often used for extracting specific keywords, while LLMs can handle a wider range of applications.
53 Alex Wu, “Slope TransFormer: The First LLM Trained to Understand the Language of Banks,” Slope Stories (blog), April 9, 2024.
54 Joy Guo, “What Does Nubank’s Acquisition of Hyperplane Mean for the Future of AI in Fintech,” Typeshare.Co (blog), July 11, 2024.
55 “The Vertical AI for Banking,” accessed August 2, 2024.
56 Vijayakumar G A, “Unlock Innovation & Cut Costs: How Generative AI Transforms Your Legacy Apps,” Application Modernisation with Generative AI (blog), May 6, 2024; Behrooz Omidvar Tehrani, Ishaani M, and Anmol Anubhai, “Evaluating Human-AI Partnership for LLM-Based Code Migration,” in Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, CHI EA ’24 (New York, NY, USA: Association for Computing Machinery, 2024), 1–8. Rahul Agarwal et al., “The Future of Generative AI in Banking,” McKinsey & Company, March 1, 2024; Alex Johnson, “The Generative AI Revolution in Banking Is Perfectly Timed,” Fintech Takes, October 21, 2023. For a more specific use case, see Lucas Chapin, “How We Built Our First AI Prototype,” accessed August 6, 2024. For a general overview of using generative AI to leverage unstructured data outside of the financial services context, see Eilon Reshef, “Council Post: How AI Can Unlock The Power of Unstructured Data,” Forbes, accessed August 5, 2024.
57 For a discussion of transformer models, see Ashish Vaswani et al., “Attention Is All You Need” (arXiv, August 1, 2023). For discussion of simpler models such as Word2Vec for embedding or CNNs and RNNs for classification tasks, see Tomas Mikolov et al., “Efficient Estimation of Word Representations in Vector Space” (arXiv, September 6, 2013); Yoon Kim, “Convolutional Neural Networks for Sentence Classification” (arXiv, September 2, 2014).
58 For this reason, financial services risk management frameworks have been cited by the National Institute of Standards & Technology and other stakeholders focusing on ML/AI adoption across a wide variety of markets and use cases. See, e.g., NIST, “AI Risk Management Framework Playbook,” 2023; Michael Richards (U.S. Chamber of Commerce Technology Engagement Center), “Re: AI Risk Management Framework Request for Information,” September 15, 2021.
59 “Explainability & Fairness in Machine Learning for Credit Underwriting: Policy & Empirical Findings Overview” (FinRegLab, July 2023), § 5.3.
60 See, e.g., Gianluca Mauro and Hilke Schellmann, “‘There Is No Standard’: Investigation Finds AI Algorithms Objectify Women’s Bodies,” The Guardian, February 8, 2023, sec. Technology; Janus Rose, “Facebook’s New AI System Has a ‘High Propensity’ for Racism and Bias,” Vice (blog), May 9, 2022; Leonardo Nicoletti and Dina Bass Technology + Equality, “Humans Are Biased. Generative AI Is Even Worse,” Bloomberg, July 2, 2024; Steve Lohr, “Facial Recognition Is Accurate, If You’re a White Guy,” The New York Times, February 9, 2018, sec. Technology; Ed Yong, “A Popular Algorithm Is No Better at Predicting Crimes Than Random People,” The Atlantic (blog), January 17, 2018; Starre Vartan, “Racial Bias Found in a Major Health Care Risk Algorithm,” Scientific American, October 24, 2019.
61 The combination of more inclusive data sources and ML/AI applications also holds promise in identity verification for purposes of other financial products and services. Kathleen Yaworsky, Dwijo Goswami, and Prateek Shrivastava, “Unlocking the Promise of (Big) Data to Promote Financial Inclusion,” Accion Insights (Accion, March 2017).
62 See generally “Consumer Financial Data: Legal and Regulatory Landscape” (Financial Health Network, Flourish, FinRegLab, and Mitchell Sandler, n.d.).
63 See generally “Machine Learning Policy & Empirical Findings Overview.”
64 See generally Jonas Schuett, “Defining the Scope of AI Regulations,” Law, Innovation and Technology 15, no. 1 (January 2, 2023): 60–82.
About FinRegLab
FinRegLab is an independent, nonprofit organization that conducts research and experiments with new technologies and data to drive the financial sector toward a responsible and inclusive marketplace. The organization also facilitates discourse across the financial ecosystem to inform public policy and market practices. To receive periodic updates on the latest research, subscribe to FRL’s newsletter and visit www.finreglab.org. Follow FinRegLab on LinkedIn and Twitter (X).
FinRegLab.org | 1701 K Street Northwest, Suite 1150, Washington, DC 20006