New school tests don’t make the grade

More than a dozen states are trying out new tests meant to free school kids from the tyranny of multiple-choice exams

Thousands rallied in Albany, N.Y., against the standardized testing that is required by New York state, in June 2013.
Shannon DeCelle/AP

MONTCLAIR, N.J. — This upscale, racially diverse suburb isn’t the first place you’d expect to see a pitched battle break out over school reform. In recent years, Montclair has become a frequent landing site for middle-class families looking to recreate Manhattan’s Upper West Side on a lawn-bedecked hilltop. It is home to New Jersey’s first medical marijuana dispensary as well as a healthy slice of The New York Times’ editorial staff and is, as one resident puts it, the place you flee to “so you don’t have to fight school wars anymore.”

Yet the last few months have seen everything from shouting matches at school board meetings to subpoenas leveled at parents for allegedly leaking new standardized tests online. The tests were imposed, over parent and teacher protests, by a new district superintendent who declared them necessary to prepare Montclair schools for an even bigger change: the new multistate standardized tests being prepared by the Partnership for Assessment of Readiness for College and Careers.

PARCC, as it’s universally known, is at the forefront of the push to reinvent the standardized test. From March 24 through early June, the consortium is conducting a series of three-hour field tests with about 1.2 million students in 14 states plus the District of Columbia, with plans to roll out the full assessments for 22 million students in 2015. The promise is to sweep away the panoply of fill-in-the-bubble exams that states currently use. They would be replaced by computerized assessments that, PARCC says, will provide schools with more complete data and better reflect the new Common Core State Standards, which have been adopted in 45 states since 2010.

But as the recent dustup over the revised SAT has shown, nothing is that simple in the world of schools and tests. Many in the growing crowd of PARCC critics, in Montclair and beyond, worry that the new assessments will only impose longer tests, divert money from classroom instruction and increase stresses on children and teachers alike without necessarily providing better information about what and whether students are learning.

“There’s kind of a belief in a town like Montclair that the more we test, the more we can be sure that our teachers are delivering a quality curriculum,” says Michelle Fine, a City University of New York psychology professor who is a member of the parent group Montclair Cares About Schools. “I think that’s magical thinking.”

The roots of PARCC go back to 2001, when President George W. Bush’s No Child Left Behind (NCLB) Act promised to evaluate school systems on how they educated all students, not just a rarefied few at the top. NCLB took an all-stick approach: schools were threatened with closure if they didn’t meet an escalating series of benchmarks, culminating in the impossible goal of 100 percent of students scoring as proficient by 2014. When it became clear that this approach was doomed to fail, President Barack Obama introduced a plan that relied more on the carrot. Obama’s Race to the Top provided $4.3 billion in new federal funding to be doled out to the states that best met certain criteria, including improving teacher performance and student evaluation, which means testing.

Of that funding, $350 million would go to the development of new multistate standardized tests for grades 3 through 11, which would, as Education Secretary Arne Duncan said at the time, “close the data gap that now handcuffs districts from tracking growth in student learning and improving classroom instruction.” The money ended up being split between two consortiums: PARCC, made up of 24 states mostly in the East and South, and Smarter Balanced, with 28, primarily in the West and Midwest. (Six states — Alabama, Colorado, Kentucky, North Dakota, Pennsylvania and South Carolina — hedged their bets and joined both groups, and some states joined neither group.)

Fans of the new PARCC assessments say they will be a much-needed reimagining of what standardized tests should look like. “You look at the assessments that were in place in so many of these states before, and so many of them were of such poor quality,” says Michael Brickman, policy director of the Thomas B. Fordham Institute, a conservative education think tank. “They were getting a low-quality assessment at a low price.”

Price, however, has proved one of the early sticking points of the new tests, as states face higher costs not only for the assessments but also for the technology needed to implement them. Over the past year, Georgia and Kentucky have announced they were dropping out of PARCC and seeking a cheaper alternative, and the New York State Board of Regents quietly put off its adoption of the new tests, largely over concerns that schools wouldn’t have the hardware to administer them.

In New Jersey, which is going full speed ahead, educators and parent advocates are worried about what the new tests will mean for already strapped school budgets. Tina Weishaus of a Highland Park, N.J., parent group says her school district recently defended layoffs as necessary to fund new technology that, she charges, “pertains to the PARCC tests.” Other New Jersey towns have raised similar concerns over PARCC’s imposing “unfunded mandates” on their communities.

Montclair recently announced it would spend $1 million on new computers for PARCC, even as, according to parent activist Maia Davis, multiple teachers have complained, “My classes don’t have textbooks, I’m having to get curriculum materials off the Web.” Some, she says, report having been told by supervisors to go to online charity sites to raise money for supplies — “and this is one of the most high-property-tax districts in New Jersey.”

For their part, PARCC and its supporters say the added costs are relatively small, rising from $349 million a year on testing to $357 million — a drop in the bucket of total school budgets. (Those figures, though, omit tech upgrade costs, according to the Brookings Institution’s Matthew Chingos, who compiled them.) And the result — better and more comprehensive data — is worth it, they argue.

The prospect of one-size-fits-all tests, though, has not pleased everyone. Some Republican-led states have balked at what’s seen as a Democratic initiative — Florida Gov. Rick Scott complained that the program has been “marked by overreaches from the federal government into education policy.” Meanwhile, the parent complaints have begun, particularly among families in the increasingly vocal opt-out movement, which argues that the new tests will tighten testing’s grip on modern education.

In Montclair, the first sign of conflict came last summer, when newly appointed district superintendent Penny MacCormack declared that in preparation for the PARCC exams, come fall, all local schools would create tests of their own: quarterly assessments drawn up by district administrators. Several hundred parents and high school students signed a petition asking for the new assessments to be delayed, but MacCormack refused.

Then in October, when the first quarterly tests were about to be given, 14 of the 60 assessments turned up at a scavenger website. The school board responded by issuing subpoenas against the test critics — a plan that backfired when it turned out that a school district staffer had merely saved the tests to a server with no password protection.

Montclair teachers, meanwhile, had begun speaking up during school board meetings against the new testing regimen. At a Jan. 27 meeting, one elementary school teacher testified, “As much as a third of some of the classroom time is presently being used to assess children.” Another said that the children “might be able to pass the PARCC, but they might not retain that content for the following September.”

John Wodnick, a Montclair parent who teaches in another New Jersey district, sees the PARCC fight as the bitter culmination of changes he has seen firsthand as his three children have gone through the Montclair school system. When his oldest son, now 12, was in the early grades, “you could tell every day there was something fun and energizing going on in his classroom.” In contrast, by the time his younger kids were in school, “there was definitely more of an emphasis on tests. ‘Have they reached this particular math benchmark?’”

None of the Montclair parents, meanwhile, have gotten a glimpse of the actual tests, which launched March 24. Even as the PARCC start date loomed, the only hints of its content were the sample items for each grade that have been posted on the PARCC website.

Those items reveal an exam at once familiar and transformed. There are still multiple-choice questions, though more skewed toward inferential thinking, often requiring students to incorporate elements from different sections. Some questions take advantage of the computer-based format by having students drag and drop elements in the correct order.

Jeff Nellhaus, who designed Massachusetts’ highly regarded state tests and now serves as PARCC’s policy, research and design director, promises that the shift to more complex questions will provide deeper data about student knowledge. The PARCC English tests, he says, will require students to read and consolidate multiple texts, then write an essay drawing on evidence from them. Math assessments, meanwhile, will place greater emphasis on real-world problems (one sample question asks third-graders to calculate how an art teacher can best use tiles to cover a wall) and on extended mathematical thinking. He says, “I think it’s going to be much more engaging for kids. This is the 21st century, and they’ll be in a 21st century environment. It won’t be, ‘Oh, here’s another bubble test.’”

Testing experts, however, have long warned that more elaborate questions come with a price. One common problem is construct-irrelevant variance — the risk of testing for things, such as the ability to navigate the new computer interface, that have nothing to do with academic skills. It doesn’t help that Common Core testing materials use language that can be confusing even to educators. “They’re written like Ikea instructions for putting together a desk,” says the Montclair parent group’s Fine.

Even the complexity of the questions can degrade the quality of test results. As Harvard education professor Daniel Koretz explained in his 2008 book “Measuring Up,” multiple-choice questions have one built-in advantage: You can ask lots of them, assessing students across a wide range of knowledge. More involved questions provide depth but not breadth — which means less opportunity for an especially good or poor score on any one topic to be evened out.

The new tests offer more open-ended responses, which raises yet another dilemma: How do you ensure that students’ varied responses are graded objectively?

Some former workers at testing companies such as Pearson and ETS warn that this is a bigger challenge than most people realize. In his book “Making the Grades,” longtime test-scoring leader Todd Farley described an industry dominated by low-paid temporary scorers, typically earning $11 to $13 an hour, who are forced to puzzle out illegible handwriting and unclear scoring rubrics (if a student is assigned a reading on “The Lion and the Squirrel” but calls the squirrel a mouse, should he get full credit?) on a tight schedule. As often as not, Farley wrote, scorers simply settled on whatever score they could all agree on, which kept their reliability ratings up without making the scores any more accurate.

Nellhaus promises a “very rigorous” scoring process for PARCC essay questions, with a certain percentage of questions read over by a second reader to confirm the scores. Farley, though, is unconvinced. “If there are a billion answers that need to be scored in a two-month span, how is it possible without hiring every person off the street you can and cutting corners?” he asks.

The most widespread complaint about the new tests, though, may be not what they contain but how they’re being used. High-stakes tests — those used to determine the fate not just of students but of teachers and schools — create strong incentives for those taking and giving the test to game the system, via everything from conducting massive test prep to outright cheating. “It’s been found in all kinds of fields, from doctors to bus drivers. If you put high stakes on a particular set of indicators, you distort everything else,” says Monty Neill, director of the National Center for Fair and Open Testing, a group skeptical about standardized testing. (The phenomenon is known in social science literature as Campbell’s law.)

This distorting effect only becomes more acute with complex, open-ended questions. The College Board announced earlier this month that it will make the SAT essay optional, largely in response to complaints that the essay serves mainly to identify which students have parents who could afford the test prep needed to learn how to hit all the right boxes on the essay-scoring rubric.

The tragedy of all this, say testing critics, is that there are plenty of ways to tell if schools are doing a good job without spending hundreds of millions of dollars and countless hours of classroom time. As Neill points out, several studies have found that classroom grades are as good at predicting college success as any standardized test, if not better.

A good school district, concludes Wodnick, “encourages teachers to take risks, to be inventive, to collaborate successfully, to take an interest in students as individuals” — and to use low-stakes assessments to gauge their students’ progress. By contrast, using tests in which the results “are going to become the [sole] measure of whether or not you’ve succeeded” invariably has a distorting impact on behavior, he says. “The individual questions can be designed in such a way that they might encourage someone to think critically or use research in a snazzy high-tech way, but it’s really all for show.”

Testing advocates, though, remain unmoved, and the pendulum swing toward more and higher-stakes assessments seems unlikely to be reversed anytime soon. It’s something Fine ascribes to “a swamp of multiple motives”: test companies looking to drum up business, school privatization advocates eager to find ways to redirect funding from public schools without running afoul of unions, and liberal reformers looking to measure how well schools teach the neediest.

“I wish this were a way of holding schools accountable,” Fine says. “It’s just not.”
