Researchers across disciplines have long taken advantage of natural experiments to study the effects of policies at full scale. In the past decade, rapid growth in the number of charter schools and school district choice systems has given education researchers exciting opportunities to do the same: to use naturally occurring pockets of randomization to rigorously study the effects of policy-relevant education reforms that are already in place, often on a large scale.
Many researchers are concerned about a crisis in the credibility of social science research because of insufficient replicability and transparency in randomized controlled trials and in other kinds of studies. In this post we discuss some of the ways that MDRC strives to address these issues and ensure the rigor of its work.
Researchers are often interested in testing the effectiveness of an intervention on multiple outcomes, for multiple subgroups, at multiple points in time, or across multiple treatment groups. The resulting multiplicity of statistical hypothesis tests can increase the likelihood of spurious findings: that is, finding statistically significant effects that do not in fact exist. Without the use of a multiple testing procedure (MTP) to counteract this problem, the probability of false positive findings increases, sometimes dramatically, with the number of tests. Yet the use of an MTP can result in a substantial loss of statistical power, greatly reducing the probability of detecting effects when they do exist.
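To make the trade-off concrete, here is a minimal Python sketch. It rests on assumptions that are ours, not the post's: the tests are treated as independent, every null hypothesis is taken to be true when computing the false positive rate, and Bonferroni and Holm are used only as two familiar examples of MTPs. The function names are hypothetical.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across independent tests
    when every null hypothesis is true (an illustrative assumption)."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value never falls below
        # the adjusted value of a smaller raw p-value.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    for m in (1, 5, 10, 20):
        print(m, round(familywise_error_rate(0.05, m), 3))
    raw = [0.01, 0.04, 0.03]  # made-up p-values for illustration
    print(bonferroni_adjust(raw))
    print(holm_adjust(raw))
```

Under these assumptions, 10 tests at the 0.05 level carry roughly a 40 percent chance of at least one false positive. The adjusted p-values show the cost: each adjustment raises the bar for significance, which is where the loss of power comes from. Holm is never more conservative than Bonferroni.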
The Subprime Lending Data Exploration Project is a “big data” project designed to produce policy-relevant insights using an administrative data set that covers nearly 50 million individuals who have applied for or used subprime credit. The data set contains information on borrower demographics, loan types and terms, account types and balances, and repayment histories. To investigate whether there were distinct groups of borrowers in terms of loan usage patterns and outcomes, we used a data discovery process called K-means clustering.
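For readers unfamiliar with the technique, the following is a minimal sketch of the standard k-means procedure (Lloyd's algorithm) in plain Python. It is not the project's actual code. The two features, the toy values, and the choice of starting centroids are all hypothetical, and a real analysis would typically standardize features and rely on an established library.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial_centroids, max_iter=100):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the procedure has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignments


if __name__ == "__main__":
    # Hypothetical toy data: (feature_1, feature_2) per borrower.
    borrowers = [
        (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (8.0, 8.0), (8.2, 7.9), (7.9, 8.1),
    ]
    centroids, labels = k_means(borrowers, [borrowers[0], borrowers[3]])
    print(centroids)
    print(labels)
```

The number of clusters is fixed in advance by how many starting centroids are supplied, and results can depend on those starting points, so in practice the procedure is usually run several times and the solutions compared.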
Across policy domains, practitioners and researchers are benefiting from a trend toward greater access both to more detailed and more frequent data and to the increased computing power needed to work with large, longitudinal data sets. There is growing interest in using such data as a case management tool, to better understand patterns of behavior, better manage caseload dynamics, and better target individuals for interventions. In particular, predictive analytics, which has long been used in business and marketing research, is gaining currency as a way for social service providers to identify individuals who are at risk of adverse outcomes. MDRC has multiple predictive analytics efforts under way, which we summarize here while highlighting our methodological approach.