Big Data takes a Big Step: CFPB Offers Insight into Its Fair Lending Proxy Methodology

By: Melanie Brody and Anjali Garg*
*Ms. Garg is not admitted in D.C. She is supervised by Stephanie Robinson, a member of the D.C. Bar.

The Home Mortgage Disclosure Act requires residential mortgage lenders to collect race and ethnicity information about loan applicants, and lenders, regulators, and others routinely use this information to evaluate statistically whether there is a risk that a lender has discriminated against borrowers on a prohibited basis. For other types of credit, where federal law generally prohibits the collection of demographic information, lenders and other interested parties must impute credit applicants’ race and ethnicity using proxies. For example, a lender could use the racial composition of the census tract in which a consumer resides to assign an assumed race to the consumer. Although proxying provides a way to evaluate fair lending risk in the absence of actual demographic data, there historically has been no generally accepted proxy methodology, which has made it particularly challenging to evaluate fair lending compliance for non-mortgage credit products.

On September 17, 2014, the CFPB released a white paper that may help address this challenge. The white paper, “Using publicly available information to proxy for unidentified race and ethnicity,” outlines the methodology and computer code that the Bureau uses to proxy for race and ethnicity in its fair lending program. The Bureau’s Office of Research and the Division of Supervision, Enforcement, and Fair Lending use the Bayesian Improved Surname Geocoding (“BISG”) proxy method to estimate the race and ethnicity of a consumer where lenders are not otherwise allowed to collect the consumer’s demographic information. The model assesses the probability that a consumer is of a particular race or ethnicity by analyzing the consumer’s geography and surname. The white paper outlines the data sources the CFPB uses, how the BISG method works, and the CFPB’s rationale for using this method.

Data Sources

The CFPB obtains surname data from the 2000 Census (released in 2007), which lists 151,671 surnames and, for each surname, the percentage of individuals with that surname who identify with each racial and ethnic group. The Bureau draws geographic data from the 2010 Census, using the race and ethnicity of adults by census block group, census tract, and 5-digit ZIP code. The CFPB will use updated Census data as they become available.

BISG Methodology

To run the proxy model, the CFPB uses the consumer’s residence address and surname. The white paper details how the CFPB “scrubs” these inputs before running the BISG model. For example, to determine the probability that a consumer with a particular surname belongs to a particular racial or ethnic group, the CFPB looks to the percentage of individuals with that surname who identify as that race or ethnicity. The CFPB uses ArcGIS Version 10.1 with Street Map Premium 2011 Release 3 to geocode the consumer’s address.

The white paper sets out the application of Bayes’ theorem used to calculate the probability that an individual with a particular surname, residing in a particular geography, belongs to a particular racial or ethnic group. The Bureau has also made available the statistical software code for the BISG proxy.
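To make the Bayesian step concrete, the following is a minimal sketch of how a surname-based probability can be updated with geographic information. It is an illustration only and is not the Bureau’s published code: the function name `bisg_posterior`, the simplified four-group breakdown, and all of the input figures are hypothetical assumptions chosen to keep the arithmetic easy to follow.

```python
# Illustrative sketch of a BISG-style Bayesian update.
# All names and numbers are hypothetical; they are not Census figures
# and this is not the CFPB's published statistical code.

def bisg_posterior(surname_probs, geo_shares):
    """Combine surname and geography information using Bayes' theorem.

    surname_probs[g]: share of people with the surname who identify as group g
                      (the surname-only probability).
    geo_shares[g]:    share of group g's total population that lives in the
                      consumer's geographic area (e.g. a census block group).
    Returns the probability of each group given both surname and geography.
    """
    # Numerator of Bayes' theorem for each group.
    joint = {g: surname_probs[g] * geo_shares[g] for g in surname_probs}
    # Denominator: sum of the numerators across all groups.
    total = sum(joint.values())
    if total == 0:
        raise ValueError("no group has positive probability")
    return {g: joint[g] / total for g in joint}


# Hypothetical surname-only probabilities (sum to 1).
surname_probs = {"hispanic": 0.05, "white": 0.70, "black": 0.20, "other": 0.05}

# Hypothetical share of each group's population living in the consumer's area.
geo_shares = {"hispanic": 0.00002, "white": 0.00001,
              "black": 0.00008, "other": 0.00002}

posterior = bisg_posterior(surname_probs, geo_shares)
for group, prob in posterior.items():
    print(f"{group}: {prob:.2f}")
```

In this made-up example the surname alone points most strongly to one group, but the consumer’s neighborhood shifts the combined estimate toward another. That is the sense in which geography “improves” a surname-only proxy.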

CFPB’s Rationale

The white paper explains in detail the CFPB’s rationale for using the BISG methodology, and outlines how the CFPB tested the model for accuracy based on home mortgage data. The CFPB notes that the BISG proxy is more accurate in approximating race and ethnicity than traditional proxy models, except in probabilities for Asian/Pacific Islander and Multiracial consumers. The CFPB uses the BISG probability directly and, unlike some practitioners, does not use a “threshold rule,” which assigns a consumer to a racial or ethnic group only when the estimated probability exceeds a set cutoff. The CFPB notes that a threshold rule reduces statistical precision and could result in underestimating the number of consumers in a particular group.
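The difference between the two approaches can be shown with a short, hypothetical sketch. The probabilities below are invented, and the 0.8 cutoff is an assumption used only for illustration; it is not a figure taken from the white paper.

```python
# Illustrative comparison of a threshold rule with direct use of probabilities.
# The probabilities and the 0.8 cutoff are hypothetical.

def count_with_threshold(probs, threshold=0.8):
    """Count only consumers whose probability meets or exceeds the cutoff."""
    return sum(1 for p in probs if p >= threshold)


def count_with_probabilities(probs):
    """Estimate group size as the sum of each consumer's probability."""
    return sum(probs)


# Hypothetical probabilities that each of five consumers belongs to one group.
probs = [0.95, 0.75, 0.60, 0.40, 0.30]

threshold_count = count_with_threshold(probs)
expected_count = count_with_probabilities(probs)

print("Threshold rule count:", threshold_count)
print("Probability-weighted estimate:", round(expected_count, 2))
```

Here the threshold rule counts a single consumer in the group, while summing the probabilities yields an estimate of about three. Consumers below the cutoff drop out entirely under the threshold rule, which illustrates how that approach can understate the size of a group.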

Impact on Lenders

The CFPB emphasizes that the methodology in the white paper is not a requirement for how lenders should analyze their own portfolios. However, the methods set forth in the white paper do offer lenders the opportunity to mimic the CFPB’s methods and better understand the Bureau’s fair lending oversight strategy outside of the mortgage originations context. The CFPB is open to continuing a dialogue regarding the methodology, which may come as good news to lenders using different methods to proxy for race and ethnicity. However, as the white paper and recent settlements indicate, the BISG proxy model is here to stay.

The white paper was released on the same day as a proposed rule that would enable the Bureau to oversee nonbank auto finance companies. Stay tuned for our analysis of the proposed rule.

Copyright © 2019, K&L Gates LLP. All Rights Reserved.