User:Sophisticatedevening/Detecting LLM

From Wikipedia, the free encyclopedia

Current AI chatbots tend to follow certain patterns, regardless of which one is used. Sometimes they are easy to detect from obvious errors the editor missed while copying the output over, but detection becomes harder once those errors have been trimmed out. While the use of AI is not inherently bad, it is still prone to frequent errors and hallucinations, and it is important to be able to identify LLM creations so cleanup and fact-checking can be done.

What I've noticed

  • If any citation contains utm_source=chatgpt.com (usually at the end of the reference URL), then it is extremely likely that an LLM was used at some point in the edit. Special:AbuseFilter/1346 is able to catch a large number of these for you (a regex sketch of this check also appears after this list).
  • Phrases outlined in Wikipedia:WikiProject AI Cleanup/AI catchphrases; while not a guarantee that the text is generated (especially with newer models), they can be good for deciding borderline cases.
  • Large blocks of unreferenced (and sometimes unlinked) text that score 0% or very low on Earwig's Copyvio Detector.
  • Be wary of hits from Special:AbuseFilter/1325; while helpful, it is vulnerable to frequent false positives.
  • AI-sounding text in very short articles should not be assumed to be AI: a professional, AI-like tone is much easier to achieve in a few sentences, and most LLM outputs will not be that short.
  • Frequent use of * characters, likely because LLMs write in Markdown by default and the formatting survives as literal asterisks when pasted into wikitext (see the Markdown-residue sketch after this list).
  • If it contains a heading or subheading named "Conclusion", it is highly likely to be LLM output. For some reason, AI typically structures articles like an essay or "story", complete with a wrap-up at the end.
  • If it only contains level 2 headings. AI typically adds its own headings during the initial generation, and these all become level 2 headings when copied over.
  • Odd templates and template errors; LLMs are somewhat aware of wikimarkup, but when templates are involved they tend to give odd or broken results.
  • LLMs will frequently write in lists, especially a sentence ending in "including:" followed by a run of bullet points. The bullet points will usually be the least NPOV portion of the article.
  • Fake ISBNs and links. AI will sometimes flat-out make up an ISBN, complete with a fake title and author, so make sure to double-check that the book exists in the first place. https://isbnsearch.org/ is helpful for that, and the checksum sketch after this list can rule out malformed numbers offline.
  • On the digital side of hallucinations, realistic-sounding links on real domains often point to pages that do not exist, so make sure to check that none of them 404 (see the link-checking sketch after this list).
  • Sporadic use of bold text is another sign that AI may be involved. Just one more strange habit current LLMs have.
  • Ironically enough, a large share of the AI articles and drafts I have seen happen to be about a particular LLM or an AI-centered company.
  • In AfC drafts, LLMs will often output a submission template that is already marked as declined and is poorly formatted. In particular, the timestamp tends to be mangled, giving a human-readable {{AFC submission|d|ts=18:47, 29 April 2025 (UTC)}} instead of the usual numeric string like 20250430141340 (the first sketch below checks for this).
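
To make a couple of these checks concrete, here is a minimal Python sketch of the two pattern-based ones: the utm_source=chatgpt.com marker and the malformed {{AFC submission}} timestamp. The regexes and the helper name suspicious_citation_markers are my own illustrative assumptions, not the actual logic of Special:AbuseFilter/1346.

    import re

    # utm_source=chatgpt.com anywhere in the wikitext (usually inside a reference URL).
    CHATGPT_UTM = re.compile(r"utm_source=chatgpt\.com")

    # An {{AFC submission}} template whose ts= parameter is not the usual
    # 14-digit numeric timestamp (e.g. 20250430141340) is a red flag.
    AFC_BAD_TS = re.compile(
        r"\{\{AFC submission[^}]*\|ts=(?!\d{14}\b)[^|}]+", re.IGNORECASE
    )

    def suspicious_citation_markers(wikitext: str) -> list[str]:
        """Return the heuristic flags raised by a page's wikitext."""
        flags = []
        if CHATGPT_UTM.search(wikitext):
            flags.append("utm_source=chatgpt.com in a reference")
        if AFC_BAD_TS.search(wikitext):
            flags.append("malformed ts= in {{AFC submission}}")
        return flags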
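The Markdown-residue signs (stray asterisks, **bold**, a "Conclusion" heading, only level 2 headings) can be sketched the same way. The patterns and the threshold below are assumptions for illustration; none of them alone proves LLM use.

    import re

    MD_BOLD      = re.compile(r"\*\*[^*\n]+\*\*")  # Markdown **bold**, not wikitext '''bold'''
    CONCLUSION   = re.compile(r"^==+\s*Conclusion\s*==+\s*$", re.MULTILINE | re.IGNORECASE)
    WIKI_HEADING = re.compile(r"^(={2,6})[^=\n].*?\1\s*$", re.MULTILINE)

    def stray_asterisks(wikitext: str) -> int:
        # Count * characters that are not wikitext list markers at the start
        # of a line; Markdown residue tends to leave these inside prose.
        count = 0
        for line in wikitext.splitlines():
            count += line.count("*") - (len(line) - len(line.lstrip("*")))
        return count

    def markdown_residue_flags(wikitext: str) -> list[str]:
        flags = []
        if MD_BOLD.search(wikitext):
            flags.append("Markdown-style **bold**")
        if stray_asterisks(wikitext) >= 4:  # threshold is an arbitrary guess
            flags.append("frequent stray * characters")
        if CONCLUSION.search(wikitext):
            flags.append('heading named "Conclusion"')
        levels = {len(m.group(1)) for m in WIKI_HEADING.finditer(wikitext)}
        if levels == {2}:
            flags.append("only level 2 (==) headings")
        return flags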
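For fake ISBNs, the checksum can be verified offline before bothering with a lookup: a failing checksum proves the number was made up, while a passing one still needs to be checked against a catalog such as https://isbnsearch.org/. A sketch:

    def valid_isbn(isbn: str) -> bool:
        """Check the ISBN-10 or ISBN-13 checksum. A valid checksum does not
        prove the book exists, but a failed one proves the ISBN is fake."""
        digits = isbn.replace("-", "").replace(" ", "").upper()
        if len(digits) == 10:
            # ISBN-10: weighted sum (weights 10..1, X = 10) must be 0 mod 11.
            if not (digits[:9].isdigit() and (digits[9].isdigit() or digits[9] == "X")):
                return False
            total = sum((10 - i) * (10 if c == "X" else int(c))
                        for i, c in enumerate(digits))
            return total % 11 == 0
        if len(digits) == 13 and digits.isdigit():
            # ISBN-13: alternating weights 1 and 3, sum must be 0 mod 10.
            total = sum((1 if i % 2 == 0 else 3) * int(c)
                        for i, c in enumerate(digits))
            return total % 10 == 0
        return False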
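For hallucinated links, a lightweight liveness check is possible with just the standard library. The User-Agent string is a placeholder; some servers reject HEAD requests or block bots, so a failure here is a prompt to check manually, not proof the link is fake.

    import urllib.error
    import urllib.request

    def url_is_dead(url: str, timeout: float = 10.0) -> bool:
        """Return True if the URL clearly does not resolve to a live page."""
        # HEAD keeps the request lightweight; only explicit "gone" statuses
        # are treated as dead, since some servers mishandle HEAD.
        req = urllib.request.Request(
            url, method="HEAD",
            headers={"User-Agent": "link-checker-sketch"},  # placeholder UA
        )
        try:
            with urllib.request.urlopen(req, timeout=timeout):
                return False
        except urllib.error.HTTPError as e:
            return e.code in (404, 410)
        except urllib.error.URLError:
            return True  # DNS failure, refused connection, etc.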