AI text detectors aren’t working. Is regulation the answer?

Tools developed to stamp out misconduct have been shown to be biased and inaccurate. Will AI creators themselves be forced to do it better?

August 9, 2023
A man walks along Brighton Beach using a metal detector.
Source: Getty Images

More regulation could make the job of detecting whether academic writing has been generated by artificial intelligence easier, amid concerns that tools created for this purpose are suffering from low accuracy rates and inbuilt biases.

Universities worldwide have embraced the use of AI detectors to combat the rising concern that the likes of ChatGPT and its successor GPT-4 can help students cheat on assignments, although many remain wary as an increasing body of evidence shows that they struggle in real-world scenarios.

In a paper published in June, researchers based across European universities concluded that “the available detection tools are neither accurate nor reliable and have a main bias towards classifying the output as human-written rather than detecting AI-generated text”. This followed another paper that showed that students whose second language was English were being disproportionately penalised because their vocabularies were more limited than native English speakers’.

A third study from academics at the University of Maryland confirmed inaccuracy concerns and found that detectors could be easily outwitted by students using paraphrasing tools to rewrite text initially generated by large language models (LLMs).




One of that study’s authors, Soheil Feizi, assistant professor of computer science, said the flaws in the tools had already had a “real-world impact”, with many cases of students suffering “trauma” after being falsely accused of misconduct.

“The issue is that the ‘AI detection camp’ is quite powerful and is successful in muddying the water: they often evaluate their detection accuracy under unrealistic or very specific scenarios and don’t report the full spectrum of false positive and detection rates,” he added.

One of the detectors Dr Feizi tested was the model created by OpenAI, the company behind ChatGPT, which was recently shelved in a move that many viewed as evidence that detection could not be done.

Turnitin – whose detector generally scored higher than most in the studies but did not prove infallible – recently revealed that its tool has already been used 65 million times. 

Annie Chechitelli, the company’s chief product officer, said the product was helping maintain “fairness and consistency in classrooms” but was also still “evolving” and the next step was to help educators better understand the numbers the detector produces and what this might indicate.

Swansea University was not yet using Turnitin's detector, according to Michael Draper, a professor of legal education who also serves as the university's academic integrity director.

He said he had “mixed feelings” about detection. “If you use a detection tool as a primary means of evidence when accusing a student of committing misconduct, then you are on a hiding to nothing,” he said.

“But I think using it as a first step is legitimate. You can then have an exploratory conversation with a student in relation to their submission. Some may volunteer they have used AI, or it will become clear they can’t adequately explain how they have arrived at their answer.”

Professor Draper said universities should consider asking students to submit a "research trail" alongside their final draft to show their working, which could form part of the assessment.

“These things can also be fabricated, but it is still a useful extra step in detection,” he said. “Anyway, it would be beneficial for students to develop this skill.”

AI detection was not going to go away, however, according to Professor Draper, who pointed to a recent voluntary commitment made in the US by many of the major companies creating LLMs to develop “robust technical mechanisms to ensure that users know when content is AI generated, such as a watermarking system”.

This, he said, would likely be followed by regulation if adequate detection methods were not produced voluntarily, in a “turning of the tide” against companies that “have a vested commercial interest in not having detection”.

“There is increasing recognition that we need to have the ability to differentiate between AI- and human-written text for a number of ethical and legal reasons. It is in everyone’s interest long term to know if something is AI generated or not,” Professor Draper said.

“Some people say detection will never keep up. That’s true when it’s an independent company trying to second-guess what will happen next, but when you have a commitment from the AI companies themselves to create a means of detection, you are on a much stronger wicket.”

Savvy and determined students would find ways around watermarking, but another issue was the blurring of the lines between AI and human writing as chatbots became embedded in everyday programs, according to Mike Sharples, emeritus professor at the Open University's Institute of Educational Technology.

For example, “Copilot” – Microsoft’s soon-to-launch AI assistant – promises to be able to “shorten, rewrite or give feedback” on a user’s written work.

“Rather than generating an entire essay with AI, students will just press the ‘continue’ button or equivalent when they get stuck,” said Professor Sharples.

“Or use it to rewrite a section, or to suggest references. AI will become part of the workflow. It will become increasingly difficult for AI detectors to call out these ‘AI-assisted’ student assignments.”

tom.williams@timeshighereducation.com



Reader's comments (5)

Like with elections, maybe there is something to be said for having to turn up on the day with pen and paper.
It's all very well looking at technical 'solutions' but surely it is more important that students learn that it is wrong to cheat? Students need clear and robust information as to what they should and should not be doing. This won't stop the truly dishonest, of course, but many students are penalised because they are not aware of all the nuances... last year I found that one student penalised for plagiarism was useless at paraphrasing and sat with them for a whole afternoon showing them how to rewrite source material in their own words.
Anyone with a reasonable English vocabulary can defeat Turnitin. Lord's Prayer: "The Good Lord is my shepherd, he leads me beside quiet waters". My version: "A wonderful deity is the keeper of my flock, he guides me to peaceful lakes and rivers". Not even God himself could detect any plagiarism there. Just hand out essay titles that require local study, e.g. of supermarkets, field boundaries, houses, towns, whatever you are teaching. Now see the students find something they can plagiarise on that.
What gives away those that are poorly researched and don't know what they are talking about is the context of their writing. For example those claiming to have re-written a part of the Lord's Prayer when actually it's a verse from the Psalms.
No, it is not the answer. There are a multitude of other things wrong with HE that could do with better regulation; students using ChatGPT is not one of them. Academics should wise up and think of better ways of assessing students, if indeed assessments are necessary.
