The Problem with Detecting AI Writing
In our Generative AI Syllabus guidance, we stress that detecting AI usage is problematic and nearly impossible. We’ll elaborate on this here. First, we’d like to emphasize that we would rather put our energy into preventing or discouraging academic dishonesty than into trying to catch students. We can do this by clearly stating generative AI expectations in our courses (what’s permitted, what isn’t, how to document, how to cite) and by changing the way we teach and assess. Those will be the topics of future articles as we continue to elaborate on the Gies Generative AI Guiding Principles.
How AI Detectors Work
AI detectors analyze writing in terms of perplexity and burstiness. Perplexity measures how predictable the word choices are: text that is predictable and less creative or complex has low perplexity. Burstiness measures how much sentences vary: text whose sentences look similar in structure and length has low burstiness. Text that scores low on both is considered to have a higher probability of being generated by AI. You can learn more about how detectors work.
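To make the two measures concrete, here is a minimal Python sketch. It is a toy illustration only, not how Turnitin or any other detector is actually implemented: real detectors estimate perplexity with a large language model, whereas this sketch assumes a simple word-frequency (unigram) model built from a small reference text, and it approximates burstiness as the variation in sentence length. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Split on sentence-ending punctuation; good enough for a sketch."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def unigram_perplexity(text, reference_counts):
    """Perplexity of `text` under an add-one-smoothed unigram model.

    ASSUMPTION: a word-frequency model stands in for the language model
    a real detector would use. Lower values mean more predictable text.
    """
    tokens = tokenize(text)
    total = sum(reference_counts.values())
    vocab_size = len(reference_counts) + 1  # +1 slot for unseen words
    log_prob = 0.0
    for tok in tokens:
        # Counter returns 0 for words missing from the reference text.
        p = (reference_counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))


def burstiness(text):
    """Coefficient of variation of sentence lengths, counted in words.

    ASSUMPTION: one simple proxy for burstiness. A value of 0 means
    every sentence has the same length.
    """
    lengths = [len(tokenize(s)) for s in split_sentences(text)]
    return pstdev(lengths) / mean(lengths)


reference = Counter(tokenize("The cat sat on the mat. The dog sat on the rug."))
uniform = "The cat sat on the mat. The dog sat on the rug."
varied = "Rain. The old cat, ignoring everyone, sat on the mat for hours. Why?"

for label, sample in [("uniform", uniform), ("varied", varied)]:
    print(label, unigram_perplexity(sample, reference), burstiness(sample))
```

In this toy setup, the uniform sample scores lower on both measures than the varied one, which is the pattern a detector would read as more likely AI-generated. The sketch also hints at why the approach is fragile: a human who writes plainly and evenly produces the same low scores.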
Inherent Problems
These new AI tools are based on large language models (LLMs) that are built to mimic human writing. Earlier versions were trained on millions of writing samples. Newer versions are trained on billions of writing samples. As the training expands, AI writing will be better able to mimic human writing, with increased creativity, complexity, and variation in structure and length.
Many tools claim to detect AI writing, such as GPTZero, Winston, and Originality.ai. These tools are available for free or for a small fee. Nothing stops someone from putting their generated writing into one of these tools to see whether it comes back with a high AI probability. If it does, they can refine their AI prompt, asking for the writing to be more creative and varied. A common technique is assignment “spinning,” which is the practice of rewriting text multiple times, in multiple tools, to avoid detection.
A recent research study found that AI detectors are biased, more often producing false positives for writing by non-native English writers.
Real World Testing
Instructors at the University of Illinois have access to the “AI Score” functionality within Turnitin in Canvas. When rolling out this feature, Turnitin claimed an accuracy rate of over 98% in detecting AI writing. However, in our testing, we have not been able to reproduce such high accuracy. We submitted nearly 1,000 papers that were written before generative AI platforms were available. Of those papers, 5% came back with AI scores ranging from 3% to 45%. These are false positives. We also submitted 10 completely AI-generated papers created from a simple prompt to ChatGPT 4. While all of these should have come back with AI scores of 100%, only 2 of the documents had an AI score of 100%, and 3 had AI scores of 0%. These are false negatives.
You may be familiar with Turnitin’s Originality Reports, which are an effective way of seeing if text has been plagiarized. These work well because there is no ambiguity: you can quickly see where plagiarized text comes from. Unfortunately, the Turnitin AI score feature does not work this way. It is not possible to see how (the prompt) or where (ChatGPT, Bard, something else?) text may have been generated.
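The arithmetic behind these results can be laid out in a short Python sketch. It uses only the figures reported above, with two assumptions labeled in the code: “nearly 1000” is rounded to 1,000, and the AI-generated papers that scored neither 100% nor 0% are assumed to have received some intermediate score. The function name is hypothetical.

```python
def summarize_test(human_papers, flagged_share, ai_papers, scored_100, scored_0):
    """Turn the reported test figures into counts and rates."""
    return {
        # Human-written papers that received a nonzero AI score.
        "false_positive_papers": round(human_papers * flagged_share),
        # AI-generated papers the tool scored as 100% AI.
        "fully_detected_rate": scored_100 / ai_papers,
        # AI-generated papers the tool scored as 0% AI.
        "missed_entirely_rate": scored_0 / ai_papers,
        # ASSUMPTION: the remainder scored somewhere in between.
        "partial_score_papers": ai_papers - scored_100 - scored_0,
    }


result = summarize_test(
    human_papers=1000,   # ASSUMPTION: "nearly 1000" rounded to 1000
    flagged_share=0.05,  # 5% came back with AI scores of 3% to 45%
    ai_papers=10,
    scored_100=2,
    scored_0=3,
)

for name, value in result.items():
    print(f"{name}: {value}")
```

Under these assumptions, roughly 50 human-written papers were flagged, while only 20% of the fully AI-generated papers were scored as fully AI-generated and 30% were missed entirely.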
Industry Trends
Turnitin recently lowered their accuracy claims, increased thresholds, and added more disclaimers to their AI detection tool. OpenAI, the makers of ChatGPT, recently shut down their own AI detection tool over inaccuracies. At the 2023 InstructureCon conference, Proctorio announced that they are putting on hold the release of their AI detection product due to accuracy concerns.
AI companies have committed to developing watermarking technologies as a more reliable way to detect AI-generated content. However, that technology is still to be developed and there will likely be limitations.
When is AI Detection Useful?
There are instances when AI detection, such as Turnitin’s AI score, can be useful. If you notice several papers that look similar, this could indicate that students used a very similar prompt and AI generation tool. You can then take a closer look at the AI scores and try reverse engineering a suspected AI prompt to see if you can get a similar answer. If a Turnitin AI score is very high, then you may want to take a closer look at the student’s work; the company touts increased accuracy when the AI score is higher.
Handling Suspected AI Misuse
If you suspect AI misuse, first avoid rushing to judgment. One Texas A&M professor was in the news for accusing students of cheating after asking ChatGPT if it had written the students’ papers (ChatGPT falsely said it had). Take the time to understand what the technology can and can’t do and to look closely at the student’s work. You may need to have a frank conversation with the student about AI misuse. Turnitin has several resources, listed below, that instructional teams and even students can find useful.
- Approaching a student regarding potential AI misuse: A guide to support honest, open dialog with students regarding their work.
- Discussion starters for tough conversations about AI: This guide helps educators focus on growth in the work that has been submitted and facilitate productive conversations.
- AI conversations: Handling false positives for educators: This guide shares strategies educators can consider before and after submissions when discovering a false positive.
- AI conversations: Handling false positives for students: This guide for students shares strategies they can consider before and after submissions when confronted with a false positive.
- Ethical AI use checklist for students: This checklist for students provides guidance during all phases of work and suggests ways to make decisions that support integrity and align with educator guidelines.
Ultimately, we encourage you to try out AI tools yourself and to try AI detection tools. Better yet, take a look at your assignments and how to improve them in light of this new technology. Here are two resources to get started:
- AI Misuse Rubric: A rubric to aid educators in reviewing assignment prompts for potential vulnerabilities to generative AI tools and/or creating prompts with fewer vulnerabilities.
- AI Misuse Checklist: A checklist that focuses on some areas that current models of generative AI tools are not generally capable of producing as a student writer might. If the responses to the questions are generally “no,” take a closer look at the assignment.
We also encourage you to look for ways to evaluate the learning process in addition to the learning product. This will give you as the instructor a more meaningful way to assess student learning.