A nice little how-do-you-do in the world of public education: The test scores that education officials in New York have been ballyhooing these past few years turn out to have been inflated, to put it charitably, or bogus, to put it more baldly. When adjusted for reality, they fall dramatically.
The proportion of elementary students proficient in English is not really 77 percent, it’s 53 percent. The proportion proficient in math is not really 86 percent, it’s 61 percent.
“We haven’t been testing the right things in the right ways,” said Regents Chancellor Mary H. Tisch, and isn’t that fascinating, after all the publicity about improving test scores and all the expertise brought to bear?
What finally provoked the state to take a second look was the grim fact that New York students kept doing worse on national tests even as they improved their scores on our state tests, with our barely proficient students dropping from the 36th to the 19th percentile nationally.
“That is a huge, massive difference,” said the state education commissioner, David M. Steiner, who took over last year.
So the state contracted with some experts at Harvard University to study this matter, and the experts found just what you might have guessed, given the stakes: “The performance standards had become very lenient,” in the words of Daniel Koretz, professor at the Harvard Graduate School of Education.
Sure. With the federal No Child Left Behind law, the push was on to improve test scores, and New York did it not by improving education but essentially by scoring tests more leniently.
“It was a political problem to everyone — you can only have [a] certain amount of schools failing in the state without people going into an uproar,” said Jacqueline Ancess, of Columbia University’s Teachers College.
I got in touch with her after I saw her letter to The New York Times declaring, “The tests aren’t dishonest; the test creators and test policy makers are.”
She laid the blame on Steiner’s predecessor, Richard P. Mills, who served as education commissioner from 1995 until retiring last year. (I was unable to reach Mills for a response.)
Surely someone did it. It didn’t happen by itself.
Parsing the scoring of the tests to figure out how it’s actually done is no easy matter, as I can testify, having tried it. But the essential thing is what’s called the cut score, that is, the score that serves as the dividing line between passing and failing. Or not exactly passing and failing, since such clear terminology is verboten.
The cut score falls between the two bottom levels and the two top levels of scores. And please note that even as the cut score (or passing grade) has been raised to give a more realistic result, the categories have been renamed, so that the second-lowest level, below passing, is no longer “partially meeting learning standards,” which was euphemistic enough, but instead “meets basic standard,” which sounds even better, sounds indeed like a passing grade.
I have heard teachers protest that far too little was required to pass, or be labeled “proficient.” Ancess says that even with the sobered-up changes, it’s not much better. She says that on one particular English test a student needed a mere 14 correct answers out of approximately 50 to meet the proficiency standard, and that has now been raised to 19. In other words, big deal.
You old-timers out there are probably wondering exactly what the passing grade, or “cut score,” is for an elementary student. Is it 85? 75? 65? 55? But too bad for you. It’s none of those. In fact it’s nothing that you (or I) could understand.
Scrutinizing tables graciously provided to me by the state Education Department, trying to figure out raw scores, weighted scores and scaled scores so as to understand how many questions a kid had to get right on a fairly simple test in order to be judged “proficient,” I came up empty. I can only say I’m glad I don’t have to take a standardized test in educational mathematics. I wouldn’t even “partially meet learning standards.”
“The whole thing is ludicrous,” says Ancess, and I agree. It’s an elaborate dodge to make it appear that students are learning like little dynamos when they might be or might not be. The dodge is necessary to justify the toil and the salaries of teachers and administrators, to reassure anxious parents, and to appease the federal government. Whether leaving no child behind or racing to the top, educators have to show measurable results. If anyone can figure a way to do that with guaranteed honesty, I’d like to hear about it. The temptation must always be great to jigger the scoring to produce the desired results, and the process employed is so esoteric that only initiates can plumb it.
Right now we know that the advertised test scores of the past few years, showing wonderful progress by our little ones, have been inflated. What comes next, we can’t be sure.