Earlier this week I tuned in to watch my beloved Iowa Hawkeyes taking on the Oregon Ducks in men’s basketball. I don’t follow sports as closely as I used to, and Iowa is not having a particularly good year, but the game was at home, and they were keeping it close, so I turned it on.
It was… painful to watch. Iowa lost, but my reaction wasn’t about the outcome or the quality of the play on the court. It was the stoppages. I swear the last two minutes of game time took 20 minutes of real time, thanks to video reviews of whether a shot left someone’s hand before the shot clock hit zero, whether a player’s foot grazed the out-of-bounds line, and whose finger was the last to touch a ball that went out of bounds. The announcers were watching the same replays I was, on repeat, and none of us could tell for certain what had actually happened.
I know I’m far from the first person to comment on the harmful effects of video review on sports. Earlier this year, Ken Pomeroy pointed out that college basketball games are getting longer every year, fueled in part by a rule that allows coaches to challenge every decision in the final two minutes, which disrupts the flow of the game. But his observation applies to more than just college basketball:
Because right now the overwhelming vibe from people involved with college basketball is that “we have to get every call right” regardless of how that affects watchability. And as long as that is your belief, there is no limit to the amount of things that should be reviewed or the amount of time needed to review them.
So what, you might ask? What do college sports have to do with education policy?
The answer is that education has its own problems with accuracy versus timeliness. Specifically, I think state testing systems have prioritized precision at the expense of speed.
I have a new piece out this week for EduProgress.org laying out a different vision for what state assessment systems could be:
The tests themselves could be much shorter than they are today. Most states have weeks-long testing windows, and schools often have to set aside an entire day for each subject. That’s too much, and it’s far longer than other tests take. The NWEA MAP Growth tests, for example, are untimed, but they take students about 30-100 minutes depending on the grade and subject. The NAEP tests take 90 minutes per subject.
Even high-stakes tests can get results with much less testing time than the typical state test. The ACT takes two hours and 55 minutes to assess students in four subjects, and it’s planning to go even shorter. The SAT assesses reading and math in two hours and 14 minutes of testing time. The DuoLingo English proficiency test takes one hour.
None of these tests are perfect, but perfection is not the goal. The whole point of state tests is for the results to be used.
There are a couple of points wrapped up here. One, state tests could be shorter. For example, there’s no reason that a 5th grader should have to spend more time on their end-of-year math test than a high schooler taking the ACT or SAT. The latter have actual consequences for students, while most state tests do not.
Two, state tests must be faster. There’s no good reason why the ACT, the SAT, and every other private-sector testing system can deliver results within 2-4 weeks while states take months to process theirs.
If states were to adopt shorter, faster tests, that would drive a host of downstream decisions about what gets tested, the types of questions offered on the tests, and how heavy the anti-cheating protocols need to be.
There are of course trade-offs here. The assessment system I’m imagining would probably not win any prizes for accuracy or sophistication. There might also be high-stakes use cases, such as a high school graduation exam, where 100% accuracy really is important. But state leaders need to think through these trade-offs more clearly. How short could their tests be and still offer a relatively accurate picture of student performance? How much accuracy would they have to sacrifice to get results back in days or weeks instead of months?
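For readers who want to put rough numbers on that question, one standard psychometric rule of thumb is the Spearman-Brown prophecy formula, which predicts how a test’s reliability falls as it is shortened. The piece itself doesn’t invoke this formula, and the sketch below is purely illustrative: the starting reliability value is an assumption, not a figure from any actual state assessment.

```python
# Illustrative sketch only: Spearman-Brown prophecy formula for predicting
# the reliability of a shortened test. The starting reliability (0.92) is
# an assumed value, not a figure from any real state test.

def shortened_reliability(original_reliability: float, length_fraction: float) -> float:
    """Predict reliability when a test is cut to `length_fraction` of its
    original length (e.g., 0.5 = half as many items)."""
    k = length_fraction
    r = original_reliability
    return (k * r) / (1 + (k - 1) * r)

if __name__ == "__main__":
    assumed_reliability = 0.92  # hypothetical full-length test
    for fraction in (0.75, 0.5, 0.33):
        predicted = shortened_reliability(assumed_reliability, fraction)
        print(f"{int(fraction * 100)}% of original length -> "
              f"predicted reliability {predicted:.2f}")
```

Under those assumed numbers, cutting a test in half drops predicted reliability from about 0.92 to roughly 0.85, which gives a feel for why the trade-off may be less dramatic than states fear.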
As some of the private testing companies show, these trade-offs may not be as dramatic as states might fear. More importantly, asking questions like these could help states prioritize speed and refocus their annual tests on their ultimate purpose: to serve as an honest check on schools and districts.
I think this misses a few key points about state tests. I'll caveat this with a note that state assessment systems vary; however, many states use the same vendors for similar assessments under different names. My own experience is as a district assessment coordinator in Washington State.
1. Preliminary results are available to school and district staff FAR earlier than they are made public, and the official results are rarely any different from the preliminary ones. For that reason, my district now sends preliminary score reports home at the end of the school year. Reasons for public delays at the state level include staff shortages within data departments, discrepancies in determining the accountability site for highly mobile students, and disputes over individual student scores. (I'm sure there are others I don't see from a district point of view.)
2. State tests are not always measuring the same constructs as other assessments like MAP and NAEP, which feeds into the varied testing times. If the state is assessing English Language Arts, the student will need to produce some amount of writing. Assessments that only measure reading can be shorter and are easier to machine-score. Something similar happens with math tests designed to elicit evidence of a student's thinking and reasoning. Whether we should measure these things at all is an open question; MAP is highly correlated with state test results, and maybe we don't need to assess writing, too.
3. State tests aren't necessarily any longer than tests like MAP or NAEP; they just look like they are. First, assessment developers report the "typical" time taken for a test, but teachers are affected by the time taken by the slowest student, which is generally much longer than vendors will share. Second, weeks-long testing windows show when the assessment could happen, not when it did happen. When testing does drag on too long, which it often does, it's frequently a result of the perceived stakes of the assessment: staff may feel an incentive to stretch out the big accountability measure to eke out any last score improvements.