The Subprime Lending Data Exploration Project is a “big data” project designed to produce policy-relevant insights using an administrative data set that covers nearly 50 million individuals who have applied for or used subprime credit. The data set contains information on borrower demographics, loan types and terms, account types and balances, and repayment histories. To investigate whether there were distinct groups of borrowers in terms of loan usage patterns and outcomes, we used a data discovery process called K-means clustering.
To improve outcomes among high-interest borrowers, policymakers need to understand what is driving usage. This second post in MDRC’s Reflections on Methodology series discusses how a data discovery process revealed clusters of borrowers who differed greatly in the kinds of loans and lenders they used and in their loan outcomes.