An Evaluability Assessment of the Toyota Families in Schools Program


The Toyota Families in Schools (TFS) program is a new family literacy initiative developed by the National Center for Family Literacy (NCFL), with support from the Toyota Motor Corporation. The program adopts and adapts a model that NCFL previously used in programs for preschoolers and their parents; in TFS, the participating children are between the ages of 5 and 12 and attend Title I schools serving largely low-income populations. TFS was first implemented during the 1998–1999 academic year in three elementary schools in each of five cities (“sites”) across the country; during the 1999–2000 academic year, 15 schools in another five cities were added, and by the end of the three-year demonstration period, the program will be in place at 45 schools in 15 cities. During the first year, participation in TFS was voluntary in all but one site; there, parents who received Temporary Assistance for Needy Families (TANF) were required to attend TFS (or another approved program) to satisfy the requirements of the local welfare-to-work program.

TFS seeks to influence a broad range of outcomes for children and parents. For children, these outcomes include scores on standardized tests, school attendance, positive behavior, and attitudes toward learning. For parents, they include improved academic skills, greater use of critical thinking and problem-solving strategies, improved employability skills, enhanced knowledge of child development, and improved behavior management skills.

NCFL initially approached MDRC to design and conduct a definitive evaluation of the effects (or “impacts”) of the TFS program. After some discussion, the two parties agreed that because the program was just getting under way, and because at the outset it operated on such a small scale, it would be inadvisable to measure effects at this juncture. Instead, MDRC proposed to prepare this evaluability assessment, which would discuss the conditions under which a rigorous impact study could be conducted and the implications such a study would have for program operations.

To prepare the evaluability assessment, MDRC staff reviewed program documents and statistics on the characteristics of program enrollees and their participation in program activities. The principal author also attended an NCFL-sponsored conference for sites that were completing their first year of operations and for sites just coming on board. In addition, a consultant to MDRC visited the three first-year sites where, in the view of NCFL staff members, TFS had been implemented most successfully, and interviewed local site coordinators, adult education coordinators, and principals of the participating schools there.

The essential conclusion we have reached is that a random assignment experiment is the most feasible and methodologically rigorous way to evaluate TFS. Although we consider other outcomes as well, for purposes of illustration, much of the following discussion centers on the use of an experiment to measure how TFS affects children’s scores on standardized reading tests. We suspect that literacy gains are the main criterion by which funders and others will judge the effectiveness of family literacy interventions, and we doubt that any such intervention will be considered a success unless it raises reading scores. Furthermore, the same general principles that apply to measuring program effects on reading scores would apply to any other outcome measures that might be selected.

An experiment could be mounted with a reasonable level of effort under the following conditions, discussed in detail later in the paper:

  • The program would have to be operationally strong enough to produce effects of at least middling size (since small effects could not be measured with any degree of statistical reliability).
  • Programs would have to operate in a more uniform manner than at present, so that data could be pooled across sites.
  • Between three and seven schools would have to agree to participate in the evaluation.
  • If three schools were involved in the study, then at least 15 children in each school would be selected to enroll in TFS and another 15 would be assigned to a control group and excluded from the program; if four schools were involved in the study, then each school would need to include ten TFS children and ten control group children; and if seven schools were involved, each school would need to include five TFS children and five control group children.
  • To operate on this scale, there would need to be an adequate pool of interested families in the communities served by the participating schools, and the schools would have to recruit twice as many eligible and interested families as they could serve.
  • The schools would have to select at random which families participate in TFS.
  • Pretest scores (e.g., achievement scores from a previous year, the more recent the better) would need to be available for children in the study.
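The recruitment and random-selection conditions above describe, in effect, a lottery held separately within each participating school. The Python sketch below illustrates one way such a lottery might be run. It assumes each school supplies a list of eligible, interested families and a fixed number of TFS openings; the function and argument names are hypothetical, not part of any existing TFS or MDRC procedure.

```python
import random


def assign_within_schools(applicants, slots, seed=None):
    """Hypothetical helper: hold a separate random-assignment lottery in each school.

    applicants: dict mapping school name -> list of eligible, interested family IDs
    slots:      dict mapping school name -> number of TFS openings at that school
    Returns a dict mapping school name -> {"TFS": [...], "control": [...]}.
    """
    rng = random.Random(seed)  # a recorded seed lets the lottery be rerun and audited
    result = {}
    for school, families in applicants.items():
        n_slots = slots[school]
        # Mirrors the condition that schools recruit at least twice as many
        # eligible, interested families as they can serve.
        if len(families) < 2 * n_slots:
            raise ValueError(
                f"{school}: need at least {2 * n_slots} applicants, got {len(families)}"
            )
        pool = list(families)  # copy so the caller's list is not reordered
        rng.shuffle(pool)
        # The first n_slots shuffled families enroll in TFS; everyone else is a control.
        result[school] = {"TFS": pool[:n_slots], "control": pool[n_slots:]}
    return result
```

Holding the draw separately in each school keeps the program and control groups balanced school by school, so that differences between schools are not confounded with program effects.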

Deviations from most of these conditions would require a substantially larger number of families involved at each school or a substantially larger number of schools.
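Why deviations push the required sample up can be illustrated with the conventional minimum detectable effect size (MDES) formula for a two-group random assignment design: MDES ≈ M × sqrt((1 − R²) / (p(1 − p)N)), where N is the total sample, p is the share assigned to the program group, R² is the share of outcome variance explained by covariates such as pretest scores, and M is about 2.8 for 80 percent power with a two-tailed test at the 5 percent level. The Python sketch below is an illustration under stated assumptions, not the calculation behind the school and sample figures listed above: it uses a large-sample normal approximation and ignores variation in effects across schools, so it cannot reproduce the trade-off between the number of schools and the number of children per school.

```python
from math import sqrt
from statistics import NormalDist


def mdes(n_total: int, p_treat: float = 0.5, r_squared: float = 0.0,
         alpha: float = 0.05, power: float = 0.80) -> float:
    """Minimum detectable effect size, in standard-deviation units, for a
    simple two-group random assignment design.

    Assumes a large-sample normal approximation and a single pooled sample;
    it does NOT model school-level variation in effects.
    """
    if not 0 < p_treat < 1:
        raise ValueError("p_treat must be strictly between 0 and 1")
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    z = NormalDist()  # standard normal distribution
    # Two-tailed critical value plus the z-value for the desired power (about 2.8).
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Standard error of the impact estimate relative to the outcome's SD,
    # shrunk by the variance that covariates such as pretests explain.
    std_error = sqrt((1 - r_squared) / (p_treat * (1 - p_treat) * n_total))
    return multiplier * std_error


if __name__ == "__main__":
    n = 2 * 3 * 15  # three schools, 15 program and 15 control children each
    for r2 in (0.0, 0.5):  # hypothetical: no pretest vs. a pretest explaining half the variance
        print(f"N={n}, R^2={r2}: MDES = {mdes(n, r_squared=r2):.2f} SD")
    print(f"N={4 * n}, R^2=0.5: MDES = {mdes(4 * n, r_squared=0.5):.2f} SD")
```

Because the MDES shrinks only with the square root of N, halving the smallest detectable effect takes roughly four times as many children, and losing the pretest (R² = 0) inflates it by a factor of 1/sqrt(1 − R²). The R² of 0.5 in the example is a hypothetical placeholder, not an estimate for TFS.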

In addition, an experiment conducted under these conditions would involve a group of children who are extremely diverse in terms of their age, the language they speak at home, their pre-program level of literacy, and many other variables. Given this degree of heterogeneity, the estimates of program effects resulting from any evaluation are likely to be imprecise. If program administrators chose to focus the study on more homogeneous groups of children and their families (such as children in certain grades), the estimates would be more precise. Narrowing the scope of the evaluation would entail a trade-off, however, since the estimates of program impacts could not be generalized to the entire group of TFS participants.

We also want to point out several other considerations that NCFL will have to weigh carefully. First, we fully recognize that an evaluation of this nature would require major changes in the way that TFS has operated to date. Second, we want to emphasize that the number of schools and participants specified above represents the number needed to measure the effects of TFS overall. Often, we find that programs have especially strong effects on particular subgroups of the population (e.g., children whose mothers have not completed high school, children from families in which English is not the primary language). If these subgroups do not constitute a large share of the TFS population, the research sample might need to be expanded considerably in order to determine how the intervention affects them. Third, the cost of mounting an experiment on a small scale in multiple locations could be considerable and is yet another factor that NCFL will want to consider seriously.
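A rough rule of thumb shows why subgroups are costly to study. Assuming the usual approximation that the minimum detectable effect size (MDES) falls with the square root of the sample size, and that the subgroup is split evenly between the program and control groups, a subgroup making up a fraction f of the research sample can detect only effects roughly 1/√f times as large as the full sample can; keeping the same MDES for the subgroup requires scaling the total sample by 1/f:

```latex
\mathrm{MDES}_{\text{subgroup}} \approx \frac{\mathrm{MDES}_{\text{full}}}{\sqrt{f}},
\qquad
N_{\text{required}} \approx \frac{N_{\text{full}}}{f}
```

For example, if a subgroup made up a quarter of the sample (f = 0.25, a hypothetical value), detecting the same effect for it would require roughly four times as many children overall.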

Perhaps most important, we believe (and will argue later in this paper) that the first condition specified above has not yet been met: our examination suggests that the program might not yet be strong enough to yield large and lasting impacts. Our assessment has also pointed to pronounced differences in the way the program has been put in place in the different sites. We therefore suggest that it would be wise to defer an impact analysis until program operations have become more stable and more uniform.

The remainder of this paper is divided into eight sections. The next section examines the rationale for the TFS model and preliminary evidence about its potential effectiveness. Section III reviews salient features of the program’s early implementation in three cities (the strongest performers during its first year) as these relate to a possible impact evaluation. Sections IV through VII discuss the issues associated with conducting a rigorous study of program impacts, both in general and as they pertain to TFS in particular. Section VIII first surveys the magnitude of effects that have been achieved in other demonstration programs; it then considers the TFS program model and its capacity to produce the desired changes. Finally, Section IX recapitulates the major themes and points toward their implications for action.