TwelveByTwelve (TBT)      Genius-or-Gibberish


Can Robo-Grading Pass
The Genius-or-Gibberish Test?
by Luby Prytulak
You are at   www.twelvebytwelve.net/ets/genius-or-gibberish-test.html
Email comments to   Lubyprytulak@gmail.com
First posted  09 May 2013 05:40pm,  last edited  07 Jun 2013 09:32am


One more thing that standardized achievement and aptitude tests are beginning to do wrong: they are having computers grade essays, a procedure sometimes going by the name Robo-Grading.  Several of the things that are wrong with Robo-Grading have already been ably pointed out, as for example, in Michael Winerip's New York Times article of 22-Apr-2012, Facing a Robo-Grader?  Just Keep Obfuscating Mellifluously.    www.nytimes.com/~.

My purpose here is to recommend two ultra-simple tests that every Robo-Grader should be able to pass before it is allowed to grade anyone's writing.

Under the heading GENIUS on the left below (perhaps out of sight on a small screen unless you scroll down a bit), is the beginning of Ernest Hemingway's The Old Man and the Sea.  My justification for considering it writing of genius is its unstinted praise in the Nobel Prize For Literature Award Ceremony Presentation Speech of 1954, reproduced immediately below more to convey the wholeheartedness of its enthusiasm for Hemingway's writing than for its detailing the justification for that enthusiasm:


Ernest Hemingway
 

When mentioning these principal elements in his production, one should not forget that his narrative skill often attains its highest point when cast in a smaller mould, in the laconic, drastically pruned short story, which, with a unique combination of simplicity and precision, nails its theme into our consciousness so that every blow tells.  Such a masterpiece, more than any other, is The Old Man and the Sea (1952), the unforgettable story of an old Cuban fisherman's duel with a huge swordfish in the Atlantic.  Within the frame of a sporting tale, a moving perspective of man's destiny is opened up; the story is a tribute to the fighting spirit, which does not give in even if the material gain is nil, a tribute to the moral victory in the midst of defeat.  The drama is enacted before our eyes, hour by hour, allowing the robust details to accumulate and take on momentous significance.  "But man is not made for defeat", the book says.  "A man can be destroyed but not defeated."     www.nobelprize.org/~


One thing that any Robo-Grader must be able to do, then, is to assign its highest available grade to any writing that has received such high acclaim as the above.  However, it may be seriously doubted that any Robo-Grader will rate Hemingway highly, as Robo-Graders are impressed by long words and long sentences whereas Hemingway leaned toward short words and short sentences.  But any Robo-Grader that disparages work of recognized genius is obviously a Robo-Grader that should not be given responsibility for grading student essays, because that could potentially result in devaluation and discouragement of genius.  Robo-Graders, in other words, should be banned from the field of achievement and aptitude testing because they can be counted on to fail to recognize Genius.

And under the heading GIBBERISH on the right below (Oops! Robo-Graders deduct marks for sentences that start with "and"), is also the beginning of The Old Man and the Sea, but now with sentence order reversed, a simple method for transforming Genius into Gibberish.  While any reader familiar with the original may be more amused than annoyed, we can be sure that this style of writing would be met with cold disapproval and scathing evaluation no matter where it made its appearance.  At the same time, my bet is that no Robo-Grader is able to assign a lower grade to the Gibberish version than it assigns to the Genius version.  Quite simply, today's Robo-Graders can be counted on to fail the Gibberish Test as well as the Genius Test.

GENIUS (first passage below)
Beginning of Ernest Hemingway's The Old Man and the Sea
Sentence order normal

GIBBERISH (second passage below)
Beginning of Ernest Hemingway's The Old Man and the Sea
Sentence order reversed


He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish.  In the first forty days a boy had been with him.  But after forty days without a fish the boy's parents had told him that the old man was now definitely and finally salao, which is the worst form of unlucky, and the boy had gone at their orders in another boat which caught three good fish the first week.  It made the boy sad to see the old man come in each day with his skiff empty and he always went down to help him carry either the coiled lines or the gaff and harpoon and the sail that was furled around the mast.  The sail was patched with flour sacks and, furled, it looked like the flag of permanent defeat.

The old man was thin and gaunt with deep wrinkles in the back of his neck.  The brown blotches of the benevolent skin cancer the sun brings from its reflection on the tropic sea were on his cheeks.  The blotches ran well down the sides of his face and his hands had the deep-creased scars from handling heavy fish on the cords.  But none of these scars were fresh.  They were as old as erosions in a fishless desert.

Everything about him was old except his eyes and they were the same color as the sea and were cheerful and undefeated.

"Santiago," the boy said to him as they climbed the bank from where the skiff was hauled up.  "I could go with you again.  We've made some money."

The old man had taught the boy to fish and the boy loved him.

"No," the old man said.  "You're with a lucky boat.  Stay with them."

"But remember how you went eighty-seven days without fish and then we caught big ones every day for three weeks."

"I remember," the old man said.  "I know you did not leave me because you doubted."

"It was papa made me leave.  I am a boy and I must obey him."

"I know," the old man said.  "It is quite normal."

"He hasn't much faith."

"No," the old man said.  "But we have.  Haven't we?"

"Yes," the boy said.  "Can I offer you a beer on the Terrace and then we'll take the stuff home."

"Why not?" the old man said.  "Between fishermen."

 

"Between fishermen."  "Why not?" the old man said.

"Can I offer you a beer on the Terrace and then we'll take the stuff home."  "Yes," the boy said.

"Haven't we?  But we have."  "No," the old man said.

"He hasn't much faith."

"It is quite normal."  "I know," the old man said.

"I am a boy and I must obey him.  It was papa made me leave."

"I know you did not leave me because you doubted."  "I remember," the old man said.

"But remember how you went eighty-seven days without fish and then we caught big ones every day for three weeks."

"Stay with them.  You're with a lucky boat."  "No," the old man said.

The old man had taught the boy to fish and the boy loved him.

"I could go with you again.  We've made some money."  "Santiago," the boy said to him as they climbed the bank from where the skiff was hauled up.

Everything about him was old except his eyes and they were the same color as the sea and were cheerful and undefeated.

They were as old as erosions in a fishless desert.  But none of these scars were fresh.  The blotches ran well down the sides of his face and his hands had the deep-creased scars from handling heavy fish on the cords.  The brown blotches of the benevolent skin cancer the sun brings from its reflection on the tropic sea were on his cheeks.  The old man was thin and gaunt with deep wrinkles in the back of his neck.

The sail was patched with flour sacks and, furled, it looked like the flag of permanent defeat.  It made the boy sad to see the old man come in each day with his skiff empty and he always went down to help him carry either the coiled lines or the gaff and harpoon and the sail that was furled around the mast.  But after forty days without a fish the boy's parents had told him that the old man was now definitely and finally salao, which is the worst form of unlucky, and the boy had gone at their orders in another boat which caught three good fish the first week.  In the first forty days a boy had been with him.  He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish.
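The reversal used to produce the Gibberish passage, and the two tests proposed above, can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not the procedure actually used to prepare this page: the function names are invented here, the sentence splitter is deliberately naive, quotation marks are not rebalanced the way they are in the passage above, and toy_length_grader is a hypothetical stand-in that merely rewards long words, not any vendor's actual Robo-Grader.

```python
import re

def split_sentences(paragraph):
    """Naively split a paragraph after '.', '?' or '!' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", paragraph.strip()) if s]

def reverse_sentence_order(text):
    """Reverse paragraph order, and sentence order inside each paragraph."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n\n".join(
        "  ".join(reversed(split_sentences(p))) for p in reversed(paragraphs)
    )

def passes_genius_test(grader, genius_text, top_grade):
    """True if the grader gives the acclaimed text its highest available grade."""
    return grader(genius_text) >= top_grade

def passes_gibberish_test(grader, genius_text):
    """True if the reversed text gets a strictly lower grade than the original."""
    return grader(reverse_sentence_order(genius_text)) < grader(genius_text)

def toy_length_grader(text):
    """Hypothetical stand-in grader: mean word length, blind to sentence order."""
    words = text.split()
    return sum(len(w) for w in words) / len(words)

sample = (
    "The old man was thin and gaunt with deep wrinkles in the back of his neck.  "
    "But none of these scars were fresh.  "
    "They were as old as erosions in a fishless desert."
)

print(reverse_sentence_order(sample))
# The top grade of 6.0 is an assumed scale maximum, chosen only for illustration.
print(passes_genius_test(toy_length_grader, sample, top_grade=6.0))
print(passes_gibberish_test(toy_length_grader, sample))
```

Because the toy grader looks only at word lengths, it necessarily gives the original and the reversed text the same score, so it cannot pass the Gibberish Test.  Any real Robo-Grader could be plugged in as the grader argument in the same way.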



Another consideration when thinking about Robo-Grading.

Robo-Grading vendors take credit for observing a positive correlation between the grades assigned by Robo-Graders and those assigned by human graders.  However, as the human graders are expected to work for starvation wages on tasks of mind-numbing boredom, their ranks probably include no summa cum laude graduates, perhaps no magna cum laude graduates either, and perhaps even worse than that.  And when on top of that they may be allowed a maximum of two minutes per essay to come up with a grade, what can anyone expect them to deliver?  Why, nothing better than grading so shallow that it begins to resemble Robo-Grading, which hands the Robo-Grading vendors the inflated correlation coefficients they hope will win their methodology public approval.

Supporting the above view are two reader comments accompanying the Winerip NYT article that I cited in my first paragraph at the top of this page:


Citizen Mom     West Windsor, NJ

I score such essays for Pearson, and I am not allowed to give a lower score for inaccurate information.  It doesn't matter if the student states that the American Civil War took place in England or if he/she states that grad students make more money than professors.

What matters is the quality and development of the argument.  Yes, writing style and mechanics are taken into account, too, but a student could have a grammar error here and there and still receive a high score, if the essay meets enough of the characteristics.

Readers do not have the time to go through and fact-check every essay, so even obvious errors must be looked at as simply a detail.

I just hope [the Educational Testing Service's] e-Rater doesn't put me out of a job.
April 23, 2012 at 6:50 p.m.     www.nytimes.com/~




madrona     Seattle, WA

I scored essay tests for Pearson a few years ago.  Although many of us began the job with a certain warm interest in evaluating student writing, our training process made clear that this attitude would be detrimental.  We were whipped into scoring ever faster, under threat of losing the job, until we could score a 3-page essay in less than two minutes.  It was an $11 per hour temp job where you weren't even allowed to stand up and stretch or go to the restroom except during your ten-minute break.  Supervisors paced the room scolding anyone who looked away from their screen for too long.  Eventually we reached the point where we just read the first and last paragraph and searched for a few basic connector words.

Furthermore, when the aggregate daily scores we were producing turned out to all be on the low side for a particular test, suddenly the rubric got changed.  Things we had previously been told to take points off for suddenly became OK, from one day to the next.  We were all perfectly able to see that Pearson had some statistical agenda for what it provided its customers (the school districts), and it made sure that our scoring fit that agenda.  But we were literally not allowed to mention this topic out loud to each other.

So just in case anyone is still laboring under the illusion that a human scorer is "reflecting" on what a student writes, I'd like to point out that "robo-readers" are more or less what has been in place all along.
April 23, 2012 at 1:46 p.m.     www.nytimes.com/~


A better evaluation of Robo-Grading would be to have PhDs in English Literature do the human grading, and to allow them as much time as they feel they need to fully appreciate each essay, and then see how closely the Robo-Grader is able to match such expert scores.

In other words, the efficacy of Robo-Grading should not be demonstrated by making human grading so shallow that it begins to approximate Robo-Grading.  Rather, the efficacy of Robo-Grading should be demonstrated by making Robo-Grading so excellent that it begins to approximate expert human grading.
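The comparison proposed here comes down to computing a correlation coefficient between Robo-Grader scores and expert scores on the same essays.  A minimal sketch follows, assuming the Pearson coefficient is the statistic in question (the vendors' reports may use a different agreement measure).  The function name is invented for this example, and the two score lists are made-up placeholder numbers, not real grading data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / math.sqrt(var_x * var_y)

# Placeholder scores for four essays, invented purely to exercise the function.
robo_scores = [1, 2, 3, 4]
expert_scores = [2, 4, 6, 8]
print(pearson_r(robo_scores, expert_scores))
```

A high coefficient is persuasive only if the human scores being matched are themselves the product of careful, unhurried expert reading, which is the point of the paragraph above.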

The Robo-Grading vendors are right in expecting that valid Robo-Grading will someday be a reality.  They are wrong to hold back the further information that that happy day will not arrive within our lifetimes.  The incentive of horrendous profits makes them reluctant to broadcast such a discouraging estimated time of arrival — the horrendous profits that come from reducing grading costs to the exam administrator to practically zero while keeping the grading charge to the examinee comfortably above zero:

The automated reader developed by the Educational Testing Service, e-Rater, can grade 16,000 essays in 20 seconds, according to David Williamson, a research director for E.T.S., which develops and administers 50 million tests a year, including the SAT.     www.nytimes.com/~



