One more thing that standardized achievement and aptitude tests are beginning to do wrong: they are having computers grade essays, a procedure sometimes called Robo-Grading. Several of the things that are wrong with Robo-Grading have already been ably pointed out, for example in Michael Winerip's New York Times article of 22-Apr-2012, "Facing a Robo-Grader? Just Keep Obfuscating Mellifluously."
My purpose here is to recommend two ultra-simple tests that every Robo-Grader should be able to pass before it is allowed to grade anyone's writing.
Under the heading GENIUS below is the beginning of Ernest Hemingway's The Old Man and the Sea. My justification for considering it writing of genius is the unstinting praise it received in the Nobel Prize in Literature Award Ceremony Presentation Speech of 1954. Part of that speech is reproduced immediately below, more to convey how wholehearted its enthusiasm for Hemingway's writing is than to detail the grounds for that enthusiasm:
When mentioning these principal elements in his production, one should not forget that his narrative skill often attains its highest point when cast in a smaller mould, in the laconic, drastically pruned short story, which, with a unique combination of simplicity and precision, nails its theme into our consciousness so that every blow tells. Such a masterpiece, more than any other, is The Old Man and the Sea (1952), the unforgettable story of an old Cuban fisherman's duel with a huge swordfish in the Atlantic. Within the frame of a sporting tale, a moving perspective of man's destiny is opened up; the story is a tribute to the fighting spirit, which does not give in even if the material gain is nil, a tribute to the moral victory in the midst of defeat. The drama is enacted before our eyes, hour by hour, allowing the robust details to accumulate and take on momentous significance. "But man is not made for defeat", the book says. "A man can be destroyed but not defeated." www.nobelprize.org/~
One thing that any Robo-Grader must be able to do, then, is to assign its highest available grade to any writing that has received such high acclaim as the above. However, it may be seriously doubted that any Robo-Grader will rate Hemingway highly, as Robo-Graders are impressed by long words and long sentences whereas Hemingway leaned toward short words and short sentences. But any Robo-Grader that disparages work of recognized genius is obviously a Robo-Grader that should not be given responsibility for grading student essays, because that could potentially result in devaluation and discouragement of genius. Robo-Graders, in other words, should be banned from the field of achievement and aptitude testing because they can be counted on to fail to recognize Genius.
And under the heading GIBBERISH below (Oops! Robo-Graders deduct marks for sentences that start with "and") is also the beginning of The Old Man and the Sea, but now with the sentence order reversed, a simple method for transforming Genius into Gibberish. While any reader familiar with the original may be more amused than annoyed, we can be sure that this style of writing would be met with cold disapproval and scathing evaluation no matter where it made its appearance. At the same time, my bet is that no Robo-Grader is able to assign a lower grade to the Gibberish version than it assigns to the Genius version. Quite simply, today's Robo-Graders can be counted on to fail the Gibberish Test as well as the Genius Test.
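The two tests can be written down as a small harness. The Python sketch below is illustrative only: `grade` stands for any Robo-Grader exposed as a function from text to a number, `top_grade` is that grader's highest available grade, and `toy_surface_grader` is a made-up stand-in that looks only at word length. None of these names comes from a real product, and the sentence splitter is deliberately naive.

```python
import re
from typing import Callable, List

Grader = Callable[[str], float]  # hypothetical interface: essay text -> numeric grade


def split_sentences(text: str) -> List[str]:
    """Naive splitter: cut at whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def reverse_sentence_order(text: str) -> str:
    """Turn Genius into Gibberish by reversing the order of the sentences."""
    return " ".join(reversed(split_sentences(text)))


def passes_genius_test(grade: Grader, genius_text: str, top_grade: float) -> bool:
    """Genius Test: acclaimed writing must receive the highest available grade."""
    return grade(genius_text) >= top_grade


def passes_gibberish_test(grade: Grader, genius_text: str) -> bool:
    """Gibberish Test: the reversed version must score strictly lower."""
    return grade(reverse_sentence_order(genius_text)) < grade(genius_text)


def toy_surface_grader(text: str) -> float:
    """Made-up stand-in grader: average word length, blind to sentence order."""
    words = re.findall(r"[A-Za-z'-]+", text)
    return sum(len(w) for w in words) / len(words) if words else 0.0


if __name__ == "__main__":
    opening = (
        "He was an old man who fished alone in a skiff in the Gulf Stream "
        "and he had gone eighty-four days now without taking a fish. "
        "In the first forty days a boy had been with him."
    )
    print(reverse_sentence_order(opening))
    print("Gibberish Test passed:", passes_gibberish_test(toy_surface_grader, opening))
```

Because the toy grader ignores sentence order, it gives both versions the same score and so fails the Gibberish Test by construction. A real Robo-Grader would have to be run through the same harness to see how it fares.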
GENIUS
Beginning of Ernest Hemingway's The Old Man and the Sea
Sentence order normal
He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish. In the first forty days a boy had been with him. But after forty days without a fish the boy's parents had told him that the old man was now definitely and finally salao, which is the worst form of unlucky, and the boy had gone at their orders in another boat which caught three good fish the first week. It made the boy sad to see the old man come in each day with his skiff empty and he always went down to help him carry either the coiled lines or the gaff and harpoon and the sail that was furled around the mast. The sail was patched with flour sacks and, furled, it looked like the flag of permanent defeat.
GIBBERISH
Beginning of Ernest Hemingway's The Old Man and the Sea
Sentence order reversed
"Between fishermen." "Why not?" the old man said.
Another consideration when thinking about Robo-Grading.
Robo-Grading vendors take credit for observing a positive correlation between the grades assigned by Robo-Graders and those assigned by human graders. However, as the human graders are expected to work for starvation wages on tasks of mind-numbing boredom, their ranks tend to include perhaps no summa cum laude graduates, perhaps no magna cum laude graduates either, and perhaps worse still. And when, on top of that, they may be allowed a maximum of two minutes per essay to come up with a grade, what can anyone expect them to deliver? Why, nothing better than grading so shallow that it begins to resemble Robo-Grading, which hands the Robo-Grading vendors the inflated correlation coefficients they hope will win their methodology public approval.
Supporting the above view are two reader comments accompanying the Winerip NYT article that I cited in my first paragraph at the top of this page:
Citizen Mom West Windsor, NJ
madrona Seattle, WA
A better evaluation of Robo-Grading would be to have PhDs in English Literature do the human grading, and to allow them as much time as they feel they need to fully appreciate each essay, and then see how closely the Robo-Grader is able to match such expert scores.
In other words, the efficacy of Robo-Grading should not be demonstrated by making human grading so shallow that it begins to approximate Robo-Grading. Rather, the efficacy of Robo-Grading should be demonstrated by making Robo-Grading so excellent that it begins to approximate expert human grading.
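The correlation that vendors cite, and the expert comparison proposed above, both come down to one statistic computed on paired grades. Here is a minimal Python sketch of the Pearson correlation coefficient. The function name and the idea of feeding it one list of Robo-Grader scores and one list of human scores for the same essays are assumptions made for illustration, not a description of any vendor's procedure.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of grades."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)
```

The comparison that matters is then between `pearson_r(robo_grades, rushed_human_grades)` and `pearson_r(robo_grades, expert_grades)` on the same set of essays, where all three lists are hypothetical inputs that would have to be collected. A high value for the first says little if the second is low.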
The Robo-Grading vendors are right in expecting that valid Robo-Grading will someday be a reality. They are wrong to withhold the further information that this happy day will not arrive within our lifetimes. The incentive of horrendous profits makes them reluctant to broadcast such a discouraging estimated time of arrival. Those profits come from reducing the exam administrator's grading costs to practically zero while keeping the grading charge to the examinee comfortably above zero:
The automated reader developed by the Educational Testing Service, e-Rater, can grade 16,000 essays in 20 seconds, according to David Williamson, a research director for E.T.S., which develops and administers 50 million tests a year, including the SAT. www.nytimes.com/~