Reflections on Methodology
The Value of Predictive Analytics and Machine Learning to Predict Social Service Milestones
Lessons Learned from Career Pathways and Child First
Social service programs are increasingly looking for ways to forecast which participants are likely to reach major milestones so they can tailor services and allocate resources. In recent years, some programs have explored advanced predictive modeling approaches that can draw on potentially millions of data points and may incorporate machine learning: a family of algorithms that learn the relationships between predictive measures and an outcome from data.
While social service programs could benefit from advanced models, MDRC’s Center for Data Insights (CDI) has found that such methods do not always produce more reliable predictions, and they come with trade-offs: they are difficult to explain and can be expensive to develop and maintain.
This post outlines CDI’s approach to predictive analytics, using illustrations from two studies: Career Pathways, a workforce training program, and Child First, a home visiting program.
Before deploying a predictive model, CDI assesses whether the model makes reliable predictions, often using historical data. In doing so, a research team works with a program to understand how predictions might be used and what it might cost to implement complex methods. The goal is to understand how well progress toward milestones can be predicted using a simple approach and whether advanced methods add value. This process has five steps.
Step 1: Clearly define what milestone will be predicted, for whom the prediction will be made, and when it should be made.
Predicted milestones should closely relate to problems a program wants to address, and predictions should be made for participants whose information is similar to that of past participants whose milestone outcomes are known. Predictions should also be made while the program still has time to affect outcomes.
The Career Pathways research team wanted to predict, at program entry, a combination of milestones representing success: having completed or still being enrolled in training, and being employed 15 months after entering the program.
The Child First research team wanted to predict upon program entry which families were at risk of being in the program for fewer than 90 days, a common problem in the home visiting field.
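As an illustration of Step 1, the two milestones above can be written as binary outcome labels for past participants whose outcomes are already observed. This is a minimal sketch; the argument names (`completed_training`, `still_enrolled`, `employed_at_15_months`, `days_enrolled`) are hypothetical and do not come from either study's data.

```python
def career_pathways_success(completed_training: bool, still_enrolled: bool,
                            employed_at_15_months: bool) -> int:
    """Return 1 if the participant completed or is still enrolled in training
    AND is employed 15 months after program entry; otherwise 0.
    Argument names are hypothetical, chosen for this illustration."""
    in_training_or_done = completed_training or still_enrolled
    return int(in_training_or_done and employed_at_15_months)


def child_first_early_exit(days_enrolled: int, cutoff_days: int = 90) -> int:
    """Return 1 if the family left the program after fewer than `cutoff_days`
    days (the at-risk outcome); otherwise 0."""
    return int(days_enrolled < cutoff_days)


# Usage with made-up values:
print(career_pathways_success(False, True, True))  # still enrolled and employed -> 1
print(child_first_early_exit(45))                   # left after 45 days -> 1
```

In practice these labels would be computed from historical records to train and test a model; for new participants, the model predicts the label at program entry, before the outcome is known.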
Step 2: Determine the appropriate data needed to make predictions.
Models should use only measures that are available before predictions are made and that are collected and defined the same way for past and future participants. Because the choice of measures can introduce bias, for example by relying on data correlated with protected classes such as race, gender, or age, it is critical to consider how prediction mistakes can affect participants.
The Career Pathways research team aimed to predict program success and used measures collected upon program entry. Data for about 50 variables were available for 5,566 participants.
The Child First research team aimed to predict the likelihood of family retention and used intake data collected on participants. Data for over 100 variables were available for 3,750 families.
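A minimal sketch of the screening described in Step 2, assuming each candidate measure has a known collection day relative to program entry (day 0). The function name, the day-offset metadata, and the `PROTECTED` set are hypothetical. Setting protected characteristics aside for review does not by itself remove bias, because other measures can still be correlated with them.

```python
# Hypothetical set of protected characteristics for this illustration.
PROTECTED = {"race", "gender", "age"}


def select_predictors(collected_on_day: dict[str, int],
                      prediction_day: int = 0) -> tuple[list[str], list[str]]:
    """Split candidate measures into usable predictors and measures to review.

    `collected_on_day` maps each measure to the day, relative to program entry
    (day 0), on which it is recorded. Only measures recorded on or before the
    prediction day are eligible; protected characteristics are set aside for
    explicit review rather than being used automatically."""
    eligible = [m for m, day in collected_on_day.items() if day <= prediction_day]
    usable = sorted(m for m in eligible if m not in PROTECTED)
    review = sorted(m for m in eligible if m in PROTECTED)
    return usable, review


# Usage with hypothetical measures; attendance at day 90 is not known at entry.
measures = {"hs_diploma": 0, "recently_employed": 0, "age": 0,
            "barriers_count": 0, "month3_attendance": 90}
usable, review = select_predictors(measures)
print(usable)  # ['barriers_count', 'hs_diploma', 'recently_employed']
print(review)  # ['age']
```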
Step 3: Specify a “benchmark” prediction method.
The benchmark will be compared with more complex prediction methods to gauge how well those methods are performing. Typically, a benchmark prediction method consists of an indicator or set of indicators that the program already uses, or a regression model with a few measures.
Career Pathways used three different indicators as benchmarks. The first benchmark model predicted that a student would succeed if that student had a high school diploma or equivalent, the second model predicted success if the student had been recently employed, and the third model predicted success if the student experienced relatively few barriers to employment at program entry.
The Child First benchmark prediction used a regression model with a set of programmatically important variables that were identified collaboratively with the program.
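The three Career Pathways benchmarks are simple rules, so they can be sketched directly. The dictionary keys and the `max_barriers` cutoff standing in for “relatively few barriers” are assumptions for illustration; the study's actual definitions are not given here. The Child First benchmark, a regression model, is not shown.

```python
def benchmark_predictions(participant: dict, max_barriers: int = 1) -> dict[str, int]:
    """Apply three single-indicator benchmark rules; 1 = predicted success.

    The dictionary keys and the `max_barriers` cutoff for "relatively few
    barriers" are hypothetical choices for this sketch."""
    return {
        # Rule 1: has a high school diploma or equivalent.
        "diploma": int(bool(participant["hs_diploma_or_equivalent"])),
        # Rule 2: was recently employed.
        "recent_employment": int(bool(participant["recently_employed"])),
        # Rule 3: reports relatively few barriers to employment at entry.
        "few_barriers": int(participant["barrier_count"] <= max_barriers),
    }


# Usage with a made-up participant record:
person = {"hs_diploma_or_equivalent": True, "recently_employed": False, "barrier_count": 3}
print(benchmark_predictions(person))
# {'diploma': 1, 'recent_employment': 0, 'few_barriers': 0}
```

Each rule's predictions are then scored against observed outcomes in the same way as the more complex models.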
Step 4: Specify additional prediction methods that are gradually more complex.
Prediction models can be made more complex by adding predictors or incorporating machine learning. An incremental approach to model building helps to isolate how much increased complexity improves predictions.
Data are typically partitioned into a “training” set and a “testing” set to examine how well the prediction methods perform. Research teams typically identify a few well-performing models using the training data, often through cross-validation, and then check how well those models perform on the unseen testing data.
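A minimal sketch of this partitioning, using only the Python standard library: hold out a testing set, then run k-fold cross-validation within the training set to compare candidate models. The 30 percent test share, five folds, and fixed seed are illustrative assumptions, not the studies' settings.

```python
import random


def train_test_split(n: int, test_share: float = 0.3, seed: int = 0):
    """Randomly assign row indices 0..n-1 to a training set and a testing set."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_share)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def k_folds(train_idx: list[int], k: int = 5, seed: int = 0):
    """Yield (fit, validate) index pairs for k-fold cross-validation that stay
    inside the training set, so the testing set is never used for selection."""
    shuffled = list(train_idx)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validate = sorted(folds[i])
        fit = sorted(x for j, fold in enumerate(folds) if j != i for x in fold)
        yield fit, validate


# Usage with a sample the size of the Career Pathways data (5,566 rows):
train, test = train_test_split(5566)
print(len(train), len(test))  # 3896 1670
for fit, validate in k_folds(train):
    pass  # fit each candidate model on `fit`, score it on `validate`
```

Only the models that do well across the folds are then evaluated once on `test`.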
For Career Pathways, the team first specified a regression model with the three baseline indicators, then added predictors to create successively richer regression models. The team also tested machine learning models using the same predictors.
For Child First, the team used a similar approach, first increasing the complexity of regression models, and then using machine learning algorithms. The team cross-validated three models: the benchmark prediction method, a “hybrid” model using machine learning and the benchmark predictors, and an “advanced” model using machine learning and all data.
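A stdlib-only sketch of the incremental idea: fit one unregularized logistic regression per feature set, from the benchmark predictors up to a richer set, on the training rows, and score each on the testing rows. The gradient-descent fitting, learning rate, epoch count, and the use of accuracy at a 0.5 cutoff are all assumptions made to keep the sketch short; the studies' actual models are not reproduced here. A machine learning model from an established library would be added to the same comparison loop.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Unregularized logistic regression fit by batch gradient descent.
    X is a list of numeric feature rows; y is a list of 0/1 outcomes."""
    n, p = len(X), len(X[0])
    bias, w = 0.0, [0.0] * p
    for _ in range(epochs):
        g_bias, g_w = 0.0, [0.0] * p
        for row, target in zip(X, y):
            err = sigmoid(bias + sum(wj * xj for wj, xj in zip(w, row))) - target
            g_bias += err
            for j in range(p):
                g_w[j] += err * row[j]
        bias -= lr * g_bias / n
        w = [wj - lr * gj / n for wj, gj in zip(w, g_w)]
    return bias, w


def predict_proba(model, X):
    """Predicted probability of the outcome for each feature row."""
    bias, w = model
    return [sigmoid(bias + sum(wj * xj for wj, xj in zip(w, row))) for row in X]


def compare_nested(rows, y, train, test, feature_sets):
    """Fit one model per feature set on the training rows and return each
    model's accuracy at a 0.5 cutoff on the testing rows."""
    results = {}
    for name, cols in feature_sets.items():
        X_train = [[rows[i][c] for c in cols] for i in train]
        X_test = [[rows[i][c] for c in cols] for i in test]
        model = fit_logistic(X_train, [y[i] for i in train])
        preds = [int(prob >= 0.5) for prob in predict_proba(model, X_test)]
        results[name] = sum(pr == y[i] for pr, i in zip(preds, test)) / len(test)
    return results


# Toy data (not study data): outcome equals feature "a"; feature "b" is noise.
rows = [{"a": x % 2, "b": (x // 2) % 2} for x in range(40)]
y = [r["a"] for r in rows]
print(compare_nested(rows, y, list(range(28)), list(range(28, 40)),
                     {"benchmark": ["b"], "richer": ["b", "a"]}))
```

Because each model adds predictors to the previous one, any gain in test performance can be attributed to the added complexity.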
Step 5: Evaluate and compare model results.
Model performance can be evaluated using a variety of metrics, and the choice of metric may depend on the interests of the program and the milestone being predicted. Because many models produce a probability that is converted into a yes-or-no prediction using a cutoff (threshold), researchers should also evaluate metrics at several thresholds, since the choice of threshold can change the results.
When comparing results, there are trade-offs to consider. More complex models may predict better but be less transparent. Whether the trade-offs are worthwhile depends on the context in which the predictions are made. Programs must also consider the potential for bias in the data, which can lead to biased models. It is critical to consider how decisions could be affected by predictions and how prediction mistakes can affect participants.
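The threshold point can be illustrated with a short sketch that computes precision, recall (sensitivity), and the F-beta score for a list of predicted probabilities at several cutoffs; `beta=0.5` gives the F0.5 score. The function names are illustrative, and the labels and scores below are made-up toy values, not study data.

```python
def confusion_counts(y_true, scores, threshold):
    """Count true positives, false positives, and false negatives at a cutoff."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 1:
            fn += 1
    return tp, fp, fn


def precision_recall_fbeta(y_true, scores, threshold, beta=0.5):
    """Precision, recall, and F-beta at one cutoff (beta < 1 favors precision)."""
    tp, fp, fn = confusion_counts(y_true, scores, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0 and recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.6, 0.55, 0.4, 0.3, 0.1]
for cutoff in (0.35, 0.5, 0.7):
    p, r, f = precision_recall_fbeta(labels, scores, cutoff)
    print(f"cutoff={cutoff}: precision={p:.2f} recall={r:.2f} F0.5={f:.2f}")
```

With these toy values, raising the cutoff from 0.35 to 0.7 moves precision from 0.75 to 1.00 while recall falls from 1.00 to 0.33, which is why a single threshold can give a misleading picture.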
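One common way to check how prediction mistakes fall on different participants is to compare error rates across subgroups. The sketch below computes the false positive rate and false negative rate within each group; the function name and group labels are hypothetical, and this is a single diagnostic rather than a complete fairness review or the studies' documented procedure.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """False positive rate and false negative rate within each subgroup.
    A rate is None when the group has no cases of the relevant kind."""
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for y, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if y == 1:
            c["pos"] += 1
            c["fn"] += int(p == 0)  # missed a participant who reached the outcome
        else:
            c["neg"] += 1
            c["fp"] += int(p == 1)  # flagged a participant who did not
    return {
        g: {
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for g, c in counts.items()
    }


# Usage with made-up predictions for two hypothetical groups:
rates = error_rates_by_group([1, 0, 1, 0, 1, 0], [1, 1, 0, 0, 1, 0],
                             ["A", "A", "A", "B", "B", "B"])
print(rates["A"])  # {'false_positive_rate': 1.0, 'false_negative_rate': 0.5}
print(rates["B"])  # {'false_positive_rate': 0.0, 'false_negative_rate': 0.0}
```

Large gaps between groups signal that the consequences of prediction mistakes would not be shared evenly.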
For Career Pathways, the team assessed predictive performance using the F0.5 score because of its sensitivity to false positives, which the study particularly wanted to reduce. The team found only marginal differences in performance between the benchmark prediction and the most complex regression and machine learning models. The team decided that the marginal benefits did not warrant the costs associated with increased complexity, including staff resources, decreased transparency, and potential bias in the algorithm.
For Child First, the team wanted to minimize both false positive and false negative predictions, since false positives result in unnecessary expenditure of limited resources and false negatives result in families missing out on interventions. These goals led them to assess predictive performance using precision and sensitivity. The team found that the benchmark model predicted the outcome as well as more advanced models, but none of the models had high levels of both precision and sensitivity. For that reason, the team did not recommend that the Child First program use predictive modeling.
Although both studies found that the benchmark methods predicted milestones almost as well as advanced prediction methods and that machine learning added little value, these studies had access to much less data than is typically used in sectors that have found more success with predictive analytics. Whether more advanced predictive methods outperform simpler ones is an open question, and the answer can vary by case. However, the default assumption should not be that more advanced methods are necessary or better. Ultimately, research teams and social service programs need to weigh the predictive performance of machine learning models against their added costs before implementing such methods.
An infographic on the Career Pathways predictive analytics study is forthcoming. Home visiting models provide information, resources, and support to expectant parents and families with young children—typically infants and toddlers—in their homes. For more about the Child First predictive analytics study, see Samantha Xia, Zarni Htet, Kristin Porter, and Meghan McCormick, “Exploring the Value of Predictive Analytics for Strengthening Home Visiting: Evidence from Child First” (New York: MDRC, 2022).
See Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, An Introduction to Statistical Learning (New York: Springer, 2013).
See Jonas Wanner, Lukas-Valentin Herm, Kai Heinrich, and Christian Janiesch, “Stop Ordering Machine Learning Algorithms by Their Explainability! An Empirical Investigation of the Tradeoff between Performance and Explainability,” pages 245–258 in Denis Dennehy, Anastasia Griva, Nancy Pouloudi, Yogesh K. Dwivedi, Ilias Pappas, and Matti Mäntymäki (eds.), Responsible AI and Analytics for an Ethical and Inclusive Digitized Society (New York: Springer, Cham, 2021).
“Precision” captures the proportion of positive predictions that are correct. “Recall,” also called “sensitivity,” captures the proportion of actual positive cases that the model correctly predicts as positive. The F0.5 score combines precision and sensitivity into a single measure but weights precision more heavily than sensitivity.
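In terms of true positives (TP), false positives (FP), and false negatives (FN), the standard definitions behind these metrics are shown below. Setting beta = 0.5 in the general F-beta formula gives the F0.5 score; any beta below 1 gives precision more weight than recall.

```latex
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \\
F_{\beta} &= \frac{(1 + \beta^{2})\,\text{Precision}\cdot\text{Recall}}
                  {\beta^{2}\,\text{Precision} + \text{Recall}}, \qquad
F_{0.5} = \frac{1.25\,\text{Precision}\cdot\text{Recall}}
               {0.25\,\text{Precision} + \text{Recall}}.
\end{aligned}
```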