Research Advances: Using Cluster Random Assignment
Evaluation findings are only as credible as the research designs and data that underlie them. Ever since its founding more than 35 years ago, MDRC has been known for its use of randomized controlled trials to measure the effects of social and educational policy initiatives. Widely accepted as the “gold standard” of evaluation designs, a randomized controlled trial, if properly implemented, yields the most robust and credible estimates of a program’s effects. Typically, in this design, individuals who are eligible for a program are randomly assigned one by one either to a treatment group, whose members receive the specified intervention, or to a control group, whose members are embargoed from the program but free to receive other available services. If random assignment is done correctly, the two groups will, on average, be alike both in readily measurable traits (like gender, education, and economic status) and in important harder-to-measure characteristics (like motivation). When the two groups are followed over time, the differences in their outcomes therefore provide a reliable measure of the program’s effects, or impacts.
But there are circumstances in which random assignment of individuals is infeasible, undesirable, or both. MDRC has also led the way in developing and using alternative research designs that will yield the strongest possible evidence when random assignment of individuals is not the design of choice. In this methodology issue focus, the first in a series, we explain one such design, cluster — or group — random assignment. Under the leadership of Chief Social Scientist Howard Bloom, MDRC has been at the forefront of both the theoretical refinement and practical use of this methodology.
What Is Cluster Random Assignment? Why Use It?
As the name suggests, cluster random assignment means the random assignment of whole groups, or clusters, of people. The groups in question may be organizations, like schools or hospitals or businesses, or they may be geographically defined, like neighborhoods or even cities. These groups typically exert an influence of some kind on their individual members.
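The mechanics can be illustrated with a minimal Python sketch, assuming nothing more than a list of cluster identifiers. The function names and school labels are hypothetical, not MDRC code. The clusters themselves are shuffled and split, and every individual simply inherits the condition of the cluster to which he or she belongs.

```python
import random

def assign_clusters(cluster_ids, p_treat=0.5, seed=None):
    """Randomly assign whole clusters (e.g., schools) to treatment or control."""
    rng = random.Random(seed)
    clusters = list(cluster_ids)
    shuffled = clusters[:]
    rng.shuffle(shuffled)
    n_treat = round(p_treat * len(shuffled))
    treated = set(shuffled[:n_treat])
    return {c: ("treatment" if c in treated else "control") for c in clusters}

def assign_individuals(members, cluster_assignment):
    """Give each person the condition of his or her cluster; nobody is randomized alone."""
    return {person: cluster_assignment[cluster] for person, cluster in members}

# Hypothetical clusters and members, for illustration only.
schools = ["School A", "School B", "School C", "School D"]
students = [("s1", "School A"), ("s2", "School A"), ("s3", "School C")]
assignment = assign_clusters(schools, seed=2024)
print(assign_individuals(students, assignment))
```

Because s1 and s2 attend the same school, they always end up in the same condition, which is the defining feature of the design.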
Like random assignment of individuals, random assignment of groups yields unbiased estimates of program impacts, and there are a number of circumstances in which random assignment of groups may be the preferred option.
First, the program services being tested may be directed toward everyone in the group. Thus, a whole-school reform effort, for example, may seek to change the everyday practices of all administrators, teachers, staff, and students in a school; an effort to boost employment in a public housing development may be aimed at all of the development’s residents, not just selected residents.
Second, even if the services are not directed toward everyone, they may have “spillover” effects that would make a fair test of the services impossible. Suppose, for example, that a study called for half the teachers in a set of schools to be randomly selected to receive a certain kind of professional development, the other half to be assigned to a control group, and the instructional practices of both groups of teachers to be measured over time. It is likely that such a study would understate the effects of the professional development. Why? Because teachers (usually) talk with one another; they share what they are doing in their classes and how students are responding. So even if only some teachers received the professional development, other teachers would also learn about it, albeit at second hand. The diffusion of the key concepts and practices associated with the treatment among those not formally slated to receive that treatment — known as “contamination” in research parlance — would make a clean test of program impacts impossible.
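Why contamination biases the estimate downward can be seen with a deliberately simple, hypothetical model: suppose control-group teachers pick up some fixed share of the treatment’s effect secondhand. The measured treatment-control difference then shrinks by exactly that share. The function and parameter names are illustrative assumptions, not a model taken from any MDRC study.

```python
def estimated_impact_with_contamination(true_effect, spillover_share):
    """Expected treatment-control difference when controls absorb part of the effect.

    spillover_share = 0 means no contamination; 1 means the control group
    benefits as much as the treatment group (so no impact is detected).
    """
    if not 0.0 <= spillover_share <= 1.0:
        raise ValueError("spillover_share must be between 0 and 1")
    treatment_gain = true_effect
    control_gain = spillover_share * true_effect  # learned secondhand
    return treatment_gain - control_gain

print(estimated_impact_with_contamination(10.0, 0.25))
```

For example, a true gain of 10 points with a quarter of it spilling over to control teachers yields an estimated impact of only 7.5 points.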
Finally, in some situations, school officials or other administrators might not agree to random assignment of individuals but might permit an entire organization, such as a school, to be part of a study that randomized organizations. In these cases, cluster random assignment may be the only option.
Improving Cluster Random Assignment Methodology
MDRC has accumulated a wealth of experience and knowledge in the design and analysis of cluster random assignment experiments. Howard Bloom, alone and with colleagues inside and outside MDRC, has written a number of publications about cluster randomized trials. These works are read by graduate students at major universities and by researchers in public, private, and nonprofit organizations, and they are cited as recommended reading in requests for proposals (RFPs) released by foundations and federal agencies.
In an experiment, the difference in outcomes between the treatment group and the control group is an estimate of the program’s true impact. If a different mix of individuals had been randomly assigned to a study’s treatment and control groups, a somewhat different impact estimate would have been obtained. This variability, which can be referred to as “randomization error,” creates uncertainty about whether the estimated impact is the true impact of the intervention. The standard error of the impact estimate attaches a numerical value to this uncertainty. In an experiment involving individuals, the greater the variation in their outcomes, the more sensitive the impact estimate will be to just who is in the treatment group and who is in the control group, and the larger the standard error will be. Larger standard errors make it harder to determine that an impact is statistically significant, that is, that it reflects a real difference between the treatment and control groups and is unlikely to have arisen by chance. One way to reduce the standard error is to increase the number of individuals in the sample, since the larger the sample, the more closely the treatment and control groups are likely to resemble each other.
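To make the standard error concrete, here is a sketch for an individually randomized experiment, using the usual unequal-variance formula for a difference in means. The helper names and the outcome standard deviation of 10 are assumptions for illustration. The second function shows the sample-size point: quadrupling the number of people in each group halves the expected standard error.

```python
import math
import statistics

def impact_and_se(treatment, control):
    """Impact estimate (difference in means) and its standard error
    for an individually randomized experiment."""
    impact = statistics.mean(treatment) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treatment) / len(treatment)
                   + statistics.variance(control) / len(control))
    return impact, se

def se_for_sample_size(sd, n_per_group):
    """Expected SE with two equal groups sharing outcome standard deviation sd."""
    return sd * math.sqrt(2.0 / n_per_group)

# Hypothetical outcome SD of 10 points; larger groups give smaller SEs.
for n in (100, 400, 1600):
    print(n, round(se_for_sample_size(sd=10.0, n_per_group=n), 3))
```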
Until relatively recently, it was common for studies that used cluster random assignment to analyze the resulting data as if individuals rather than groups had been randomly assigned, a practice that could lead to erroneous conclusions. Because individuals are “nested” within the unit of randomization (the group or cluster), at least two sources of sampling error enter the picture: one resulting from variation in outcomes among groups in the treatment and control samples and one resulting from variation in outcomes among individuals within each group. Unless both kinds of sampling error are included, the standard error will be too small, and investigators may wrongly decide that a program is making a significant difference when, in fact, it is not. The strategy that Bloom and other leading social scientists employ in cluster randomized trials, referred to as “multilevel modeling” or “hierarchical modeling,” takes account of both sources of sampling error in producing impact estimates.
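The cost of ignoring clustering can be sketched with the textbook standard-error formula for a balanced two-level design, with outcomes standardized to variance 1 and the intraclass correlation (ICC) giving the share of outcome variance that lies between clusters. This is not a multilevel model itself; it is the variance that such a model accounts for in the simple equal-cluster-size case. The cluster counts and ICC below are hypothetical.

```python
import math

def naive_se(n_clusters, n_per_cluster, p_treat=0.5):
    """SE (effect-size units) if clustered data are wrongly analyzed
    as if individuals had been randomized one by one."""
    total_n = n_clusters * n_per_cluster
    return math.sqrt(1.0 / (p_treat * (1 - p_treat) * total_n))

def clustered_se(n_clusters, n_per_cluster, icc, p_treat=0.5):
    """SE including both between-cluster and within-cluster sampling error."""
    denom = p_treat * (1 - p_treat) * n_clusters
    between = icc / denom                       # variation among clusters
    within = (1 - icc) / (denom * n_per_cluster)  # variation within clusters
    return math.sqrt(between + within)

# Hypothetical design: 40 schools, 60 students each, ICC of 0.15.
J, n, rho = 40, 60, 0.15
print(round(naive_se(J, n), 3), round(clustered_se(J, n, rho), 3))
```

With these illustrative values the naive standard error is roughly a third of the clustered one, which is how an impact that is really within the range of chance can be mistaken for a significant one.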
The presence of two sources of sampling error means that the impact estimates produced by a cluster randomized trial, although unbiased, will inevitably be less precise than the impact estimates produced by a randomized trial involving the same number of ungrouped individuals. This is a key factor that MDRC researchers consider when they design cluster randomized trials, particularly when they determine the number of groups (e.g., schools, agencies) and the number of individuals in each group needed to produce impacts that are both statistically significant and large enough to be policy-relevant.
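One common way to express this loss of precision is Kish’s design effect, 1 + (n - 1)ρ, for clusters of equal size n with intraclass correlation ρ: the variance of the impact estimate is inflated by that factor relative to a sample of the same number of ungrouped individuals. The sketch below (sample sizes and ICCs hypothetical) converts it into an “effective sample size.”

```python
def design_effect(n_per_cluster, icc):
    """Variance inflation from clustering, for equal-sized clusters."""
    return 1 + (n_per_cluster - 1) * icc

def effective_sample_size(n_clusters, n_per_cluster, icc):
    """Number of independent individuals that would give the same precision."""
    total = n_clusters * n_per_cluster
    return total / design_effect(n_per_cluster, icc)

# Hypothetical design: 40 clusters of 60 people at several ICC values.
for icc in (0.05, 0.15, 0.25):
    print(icc, round(effective_sample_size(40, 60, icc)))
```

With 40 clusters of 60 people and an ICC of 0.15, the 2,400 individuals carry roughly the information of about 244 independent ones.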
To come up with the right sample sizes, it is important to know something about the extent of variation that exists both among and within the units to be randomized. Imagine, for example, two scenarios in which schools are the entities being randomly assigned to treatment or control conditions. In the first scenario, there is considerable variation in student performance within each school at the start of the experiment, but average performance is quite similar across all the schools. In the second scenario, the situation is just the reverse: at the start of the experiment, students within a school perform very similarly, but there is considerable variation in performance among the schools. Especially in this latter scenario, and particularly when the sample includes only a small number of schools, impact estimates will be highly sensitive to just which schools are in the treatment group and which are in the control group.
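The contrast between the two scenarios can be checked with a small simulation in which the program has no true effect, so any estimated impact is pure chance. Both scenarios have the same total outcome variance; only its split between and within schools differs. The school counts, standard deviations, and function name are illustrative assumptions.

```python
import random
import statistics

def impact_spread(n_schools, students_per_school, between_sd, within_sd,
                  reps=1000, seed=1):
    """SD of impact estimates across replications when the true impact is zero.

    Each replication draws new school means and students, randomly assigns
    half the schools to treatment, and records the treatment-control
    difference in school-average outcomes.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        school_avgs = []
        for _ in range(n_schools):
            school_mean = rng.gauss(0.0, between_sd)
            scores = [rng.gauss(school_mean, within_sd)
                      for _ in range(students_per_school)]
            school_avgs.append(statistics.fmean(scores))
        rng.shuffle(school_avgs)  # cluster random assignment
        half = n_schools // 2
        estimates.append(statistics.fmean(school_avgs[:half])
                         - statistics.fmean(school_avgs[half:]))
    return statistics.stdev(estimates)

# Scenario 1: large differences within schools, similar school averages.
s1 = impact_spread(10, 50, between_sd=0.1, within_sd=1.0)
# Scenario 2: students alike within schools, large differences between schools.
s2 = impact_spread(10, 50, between_sd=1.0, within_sd=0.1)
print(round(s1, 3), round(s2, 3))
```

With only 10 schools, the spread of chance “impacts” in the second scenario comes out several times larger than in the first.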
Just as precision in experiments that randomly assign individuals is improved by increasing the number of individuals in the sample, precision in cluster randomized trials is improved mainly by increasing the number of clusters that are randomized. And the greater the variation among (as opposed to within) the clusters, the greater the number of clusters that will need to be randomized to help ensure that effects of a given magnitude are detected as statistically significant. Increasing the number of individuals within the clusters also improves the precision of the estimate, but to a much smaller extent than increasing the number of clusters.
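The trade-off between adding clusters and adding individuals can be sketched with an approximate minimum detectable effect size (MDES) formula for a balanced two-level design. The multiplier here uses a normal approximation (about 2.8 for 80 percent power and a two-tailed 5 percent test) in place of the t-based multiplier that designs with few clusters would call for; the ICC of 0.20 and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def mdes(n_clusters, n_per_cluster, icc, p_treat=0.5, alpha=0.05, power=0.80):
    """Approximate minimum detectable effect size (in SD units) for a
    two-level cluster randomized trial, using a normal-approximation multiplier."""
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    denom = p_treat * (1 - p_treat) * n_clusters
    variance = icc / denom + (1 - icc) / (denom * n_per_cluster)
    return multiplier * math.sqrt(variance)

base = mdes(40, 50, icc=0.20)            # 40 schools, 50 students each
more_students = mdes(40, 100, icc=0.20)  # double students per school
more_schools = mdes(80, 50, icc=0.20)    # double the number of schools
print(round(base, 3), round(more_students, 3), round(more_schools, 3))
```

Doubling the number of students per school barely moves the MDES in this example, while doubling the number of schools lowers it by nearly 30 percent.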
Calculating the right sample sizes requires preexisting data about the extent of intra-school and inter-school variation. Bloom, along with colleagues inside and outside MDRC, has undertaken empirical work to estimate the proportion of total variation attributable to within-school rather than between-school differences. To do this, they have made use of a database MDRC has compiled that contains information from six major urban school districts. This empirical analysis, which is ongoing, makes it possible to put cluster random assignment theory into practice.
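What “the proportion of total variation attributable to between-school differences” means computationally can be shown with one simple textbook estimator of the intraclass correlation, the one-way ANOVA estimator for equal-sized groups. It is a sketch, not necessarily the method used in the MDRC analyses, and the toy scores are hypothetical. The within-school share is simply one minus the result.

```python
import statistics

def icc_anova(groups):
    """One-way ANOVA estimate of the intraclass correlation (ICC).

    groups: list of equal-length lists, one per cluster (e.g., student scores
    by school). Returns (MSB - MSW) / (MSB + (n - 1) * MSW), the estimated
    share of total variance that lies between clusters.
    """
    k = len(groups)
    n = len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 equal-sized groups of size >= 2")
    means = [statistics.fmean(g) for g in groups]
    grand = statistics.fmean(means)  # equal sizes, so mean of means = grand mean
    ssb = n * sum((m - grand) ** 2 for m in means)
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (k - 1)          # between-cluster mean square
    msw = ssw / (k * (n - 1))    # within-cluster mean square
    return (msb - msw) / (msb + (n - 1) * msw)

toy_scores = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical scores, three schools
print(round(icc_anova(toy_scores), 3))
```

The ANOVA estimator can come out slightly negative when between-group differences are smaller than chance alone would produce; in practice such values are often set to zero.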
Bloom and his associates have also given considerable attention to other issues associated with cluster random assignment: how to allocate clusters between the treatment and control conditions for maximum precision; how to increase the similarities between clusters in the treatment and control groups through pre-randomization matching; how to analyze results for subgroups defined by program, cluster, or individual characteristics; and how to account for mobility into and out of the clusters.
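Pre-randomization matching can be sketched as pairing clusters with adjacent baseline scores and then flipping a coin within each pair, so every pair contributes one treatment cluster and one control cluster. The baseline values and names are hypothetical, and real matching may draw on several baseline characteristics rather than one.

```python
import random

def matched_pair_assignment(baseline, seed=None):
    """Pair clusters with adjacent baseline scores, then randomize within pairs.

    baseline: dict mapping cluster id -> pre-randomization average outcome.
    Returns a dict mapping cluster id -> "treatment" or "control".
    """
    if len(baseline) % 2:
        raise ValueError("this sketch assumes an even number of clusters")
    rng = random.Random(seed)
    ranked = sorted(baseline, key=baseline.get)  # lowest to highest baseline
    assignment = {}
    for i in range(0, len(ranked), 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # coin flip within the matched pair
        assignment[pair[0]] = "treatment"
        assignment[pair[1]] = "control"
    return assignment

baseline = {"A": 50, "B": 52, "C": 70, "D": 71, "E": 90, "F": 88}  # hypothetical
print(matched_pair_assignment(baseline, seed=3))
```

On allocation, the standard cluster-trial variance formulas depend on the treated share P through 1/(P(1 - P)), which is smallest at P = 0.5; an even split therefore maximizes precision when clusters cost the same in either condition.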
What Are Some Examples of Cluster Random Assignment Studies?
MDRC has used cluster random assignment in a variety of settings. For example, in our evaluation of Achieve, an employer-based program for reducing job turnover rates among low-wage workers in the health care industry, 44 health care firms in Cleveland that volunteered to participate in the study were randomly assigned to program and control groups.
Schools provide especially opportune settings for cluster random assignment because many educational reforms aim to change the culture and practices of the school as a whole, or to affect the learning of students at the classroom level. MDRC, together with colleagues in other organizations, has undertaken several evaluations funded by the U.S. Department of Education that called for school-level randomization. In the Middle School Mathematics Professional Development Impact Study, 77 mid- to high-poverty middle schools in 12 districts were randomly assigned to either treatment or control conditions. In the control schools, teachers received the professional development services normally delivered by their districts; in the treatment schools, teachers received more intensive professional development, in the form of a summer institute, five day-long seminars during the school year, and two days of coaching following each seminar.
In the Reading Professional Development Impact Study, which was targeted toward teachers of second-graders, 90 elementary schools in six districts were randomly assigned to one of three conditions: a “business-as-usual” control group (with teachers at these schools receiving only the usual professional development provided by their districts) or one of two treatment groups that received alternative models of professional development. As in the Middle School Mathematics Professional Development Impact Study, one treatment group participated in a 40-hour summer institute and several one-day seminars during the school year and also received intensive coaching; the second group took part in the summer institute and the seminars but did not receive coaching.
Community colleges, too, have begun to sign on for cluster random assignment studies. In MDRC’s evaluation of the South Texas College Beacon Mentoring Program, 83 sections of developmental (remedial) math or college algebra were randomly assigned either to receive a mentor or to be part of the control group. The mentors were college employees who made short presentations in the treatment-group classes about services available on campus to help students succeed; they also worked with professors to identify struggling students and offer them help early on.
Cluster random assignment has become a valuable and frequently used tool in the program evaluation toolkit. MDRC is proud to have played a role in building appreciation and understanding of this methodology throughout the evaluation community.
MDRC resources on cluster or group random assignment:
Finite Sample Bias from Instrumental Variables Analysis in Randomized Trials by Howard S. Bloom, Pei Zhu, and Fatih Unlu (August 2010)
New Empirical Evidence for the Design of Group Randomized Trials in Education by Robin Jacob, Pei Zhu, and Howard S. Bloom (December 2009)
Empirical Issues in the Design of Group-Randomized Studies to Measure the Effects of Interventions for Children by Howard S. Bloom, Pei Zhu, Robin Jacob, Stephen Raudenbush, Andres Martinez, and Fen Lin (July 2008)
The Core Analytics of Randomized Experiments for Social Research by Howard S. Bloom (August 2006)
Using Covariates to Improve Precision: Empirical Guidance for Studies That Randomize Schools to Measure the Impacts of Educational Interventions by Howard S. Bloom, Lashawn Richburg-Hayes, and Alison Rebeck Black (November 2005)
Sample Design for an Evaluation of the Reading First Program by Howard S. Bloom (March 2003)
Using Cluster Random Assignment to Measure Program Impacts: Statistical Implications for the Evaluation of Education Programs by Howard S. Bloom, Johannes M. Bos, and Suk-Won Lee (August 1999)