Posted on February 22, 2013

The Shaky Science Behind Obama’s Universal Pre-K

Charles Murray, Bloomberg, February 20, 2013

“Study after study shows that the earlier a child begins learning, the better he or she does down the road,” said U.S. President Barack Obama in a Feb. 14 speech in Decatur, Georgia. “Every dollar we invest in high-quality early education can save more than seven dollars later on — boosting graduation rates, reducing teen pregnancy, reducing violent crime.”

{snip} There are just two problems with his [Obama’s] solution: The evidence used to support the positive long-term effects of early childhood education is tenuous, even for the most intensive interventions. And for the kind of intervention that can be implemented on a national scale, the evidence is zero.

Let me begin with the two studies in the early education literature that are so famous you may well have heard of them: the Perry Preschool Project and the Abecedarian Project. The Perry Preschool study took place a half-century ago, in the early 1960s. It treated 58 children ages 3 and 4 years old. The Abecedarian Project took place in the early 1970s. It treated 57 children, starting a few months after birth and continuing through age 5.

{snip}

Both programs achieved positive results of the kind that Obama described. But caveats about those results have troubled careful observers of the programs for years, especially when they hear Perry Preschool and Abecedarian cited as proof that early education accomplishes great things. The main problem is the small size of the samples. Treatment and control groups work best when the numbers are large enough that idiosyncrasies in the randomization process even out. When you’re dealing with small samples, even small disparities in the treatment and control groups can have large effects on the results. There are reasons to worry that such disparities existed in both programs.
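The small-sample worry can be made concrete with a quick simulation. The numbers below are invented for illustration (an IQ-style trait with mean 100 and standard deviation 15) and are not drawn from the studies; the point is only how much the treatment and control groups can differ at baseline purely through the luck of randomization.

```python
import random
import statistics

random.seed(42)

def mean_gap(sample_size, trials=2000):
    """Average absolute gap between treatment and control group means
    when a trait (mean 100, sd 15) is randomized into two groups of
    the given size -- the gap that exists before any treatment at all."""
    gaps = []
    for _ in range(trials):
        treat = [random.gauss(100, 15) for _ in range(sample_size)]
        control = [random.gauss(100, 15) for _ in range(sample_size)]
        gaps.append(abs(statistics.mean(treat) - statistics.mean(control)))
    return statistics.mean(gaps)

small = mean_gap(58)   # roughly the scale of Perry Preschool
large = mean_gap(600)  # roughly the scale of IHDP's control group
print(f"avg chance gap with n=58:  {small:.2f} points")
print(f"avg chance gap with n=600: {large:.2f} points")
```

With samples of 58, the two groups routinely start out a couple of points apart by chance alone — a gap large enough to masquerade as a program effect — while with samples in the hundreds the chance gap shrinks well below a point.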

Another problem is that the evaluations of both Perry Preschool and Abecedarian were overseen by the same committed, well-intentioned people who conducted the demonstration projects. Evaluations of social programs are built around lots of judgment calls — from deciding how the research is designed to figuring out how to analyze the data. People with a vested interest in the results shouldn’t be put in the position of making those judgments. {snip}

The most concrete reason for doubting the wider applicability of the Perry Preschool and Abecedarian effects is this: A large-scale, high-quality replication of the Abecedarian approach failed to achieve much of anything. Called the Infant Health and Development Program, it was begun in 1985. Like Abecedarian, IHDP identified infants at risk of developmental problems because of low birth weight and supplied similarly intensive intervention. Unlike Abecedarian, IHDP had a large sample (377 in the treatment group, 608 in the control group) spread over several sites assessed by independent researchers. IHDP provided a level of early intervention that couldn’t possibly be replicated nationwide, but it gave us by far the most thorough test of intensive early intervention to date.

{snip}

The follow-ups at ages 2 and 3 were positive, with large gains in cognitive functioning for the treatment group. But by age 5, those gains had attenuated. Where are things now? In the most recent report, the children in the study had reached 18. For the two-thirds of the sample who weighed no more than 2,000 grams (4.4 pounds) at birth, almost all of the outcome measures weren’t even in the right direction: The control group did slightly better. For those who weighed 2,001 to 2,500 grams at birth, the best news the analysts could find was positive differences on a math test and on a self-report of risky behaviors that reached statistical significance but were substantively small. Combine the results for both groups, and the IHDP showed no significant effects on any of the reported measures — not cognitive tests, measures of behavior problems and academic achievement, or arrest, incarceration and school-dropout rates.

{snip}

The disappointing results from the IHDP don’t mean that early education can’t do any good. Other studies of good technical quality have convinced me that the best early education programs sometimes have positive long-term effects, though much more modest than the ones ascribed to Perry Preschool and Abecedarian. That leaves us with one last problem: None of those first-rate programs are replicable on a large scale. The kind of nationwide expansion of early education that Obama wants won’t have the highly motivated administrators and hand-picked staffs that demonstration projects enjoy, and the per-child cost of interventions on the Perry Preschool and Abecedarian model is prohibitively high. If you’re going to have a national program, you’re going to get the kind of early education that Head Start provides.

{snip}

This brings us to the third-grade follow-up of the national impact assessment of Head Start, submitted to the government in October and released to the public late last year. Head Start has been operating since the 1960s. After decades of evaluations that mostly showed no effects, Congress decided in 1998 to mandate a large-scale, rigorous, independent evaluation of Head Start’s impact, including randomized assignment, representative samplings of programs and a comprehensive set of outcomes observed over time.

Of the 47 outcome measures reported separately for the 3-year-old and 4-year-old cohorts that were selected for the treatment group, 94 separate results in all, only six showed a statistically significant difference between the treatment and control groups at the .05 level of probability — just a little more than the number you would expect to occur by chance. The evaluators, recognizing this, applied a statistical test that guards against such “false discoveries.” Out of the 94 measures, just two survived that test, one positive and one negative.
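The arithmetic behind “the number you would expect to occur by chance” is worth spelling out. This is my back-of-envelope check, not the evaluators’ calculation: with 94 tests at the .05 level and no true effects at all, chance alone should hand you about five “significant” findings, and six is not a surprising count.

```python
from math import comb

n_tests = 94
alpha = 0.05

# Expected number of false positives if every true effect is zero
expected_by_chance = n_tests * alpha
print(f"expected false positives: {expected_by_chance:.1f}")

# Binomial tail: probability that chance alone yields 6 or more
# "significant" results out of 94 independent tests
p_at_least_6 = sum(
    comb(n_tests, k) * alpha**k * (1 - alpha) ** (n_tests - k)
    for k in range(6, n_tests + 1)
)
print(f"P(6 or more by chance alone): {p_at_least_6:.2f}")
```

Seeing six hits when chance predicts about 4.7 is roughly a one-in-three event, which is exactly why the evaluators’ false-discovery correction matters.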

One aspect of the Head Start study deserves elaboration. The results I gave refer to the sample of children who were selected to be part of the treatment group. But 15 percent of the 3-year-old cohort and 20 percent of the 4-year-old cohort were no-shows — a provocative finding in itself. When the analysis is limited to children who actually participated in Head Start, some of those outcomes do become statistically significant, though still substantively small. But keep in mind that we’re looking at selection artifacts: Children who end up coming to the program every day have cognitive, emotional or parental assets going for them that children who fail to participate don’t have. This means that if somehow the no-shows could be forced to attend, you couldn’t expect them to get the same benefit as those who participated voluntarily. If you’re asking what impact we could expect by making Head Start available to all the nation’s children who might need it, you have to base the calculation on offering access to the service, not on universal attendance.
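The selection artifact can be illustrated with a toy simulation. Every number here is invented (a small true benefit for attendees, a modest tilt toward attendance among better-off children), not taken from the Head Start data; the point is the direction of the bias, not its size.

```python
import random
import statistics

random.seed(0)

N = 10_000
true_effect = 2.0  # assumed genuine benefit for children who attend

scores_control = []   # no program offered
scores_offered = []   # offered a slot: the "intent-to-treat" view
scores_attended = []  # only the children who actually showed up

for _ in range(N):
    advantage = random.gauss(0, 10)  # family/child baseline assets
    baseline = 100 + advantage
    scores_control.append(random.gauss(baseline, 5))
    # children with more baseline assets are a bit likelier to attend
    attends = random.random() < (0.85 if advantage > 0 else 0.75)
    outcome = random.gauss(baseline + (true_effect if attends else 0), 5)
    scores_offered.append(outcome)
    if attends:
        scores_attended.append(outcome)

itt = statistics.mean(scores_offered) - statistics.mean(scores_control)
as_treated = statistics.mean(scores_attended) - statistics.mean(scores_control)
print(f"intent-to-treat gap: {itt:.2f}")
print(f"as-treated gap:      {as_treated:.2f}")
```

The as-treated comparison looks better than the intent-to-treat one for two reasons at once: attendees dilute the no-shows out of the average, and attendees were a better-off group to begin with. Only the first reason reflects the program; the second is the artifact.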

{snip}

So what should we make of all this? The take-away from the story of early childhood education is that the very best programs probably do a modest amount of good in the long run, while the early education program that can feasibly be deployed on a national scale, Head Start, has never demonstrated long-term results in its half-century of existence. In the most rigorous evaluation ever conducted, Head Start doesn’t show results that persist even until the third grade.

{snip}