
Wikipedia talk:WikiProject AI Cleanup

From Wikipedia, the free encyclopedia
(Redirected from Wikipedia:AINB)

WP:UPSD Update


Following Wikipedia:Village_pump_(policy)/Archive_201#URLs_with_utm_source=chatgpt.com_codes, I have added detection for possible AI-generated slop to my script.

Possible AI-slop sources will be flagged in orange, though I'm open to changing that color in the future if it causes issues. If you have the script, you can see it in action on those articles.

For now, the list of AI sources is limited to ChatGPT (utm_source=chatgpt.com), but if you know of other ChatGPT-like domains, let me know!

Headbomb {t · c · p · b} 22:24, 8 April 2025 (UTC)[reply]
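The check the script performs can be sketched as a simple URL-parameter test. A minimal illustration in Python; the function name and the domain set are assumptions for this sketch, and only chatgpt.com is confirmed in the discussion above:

```python
from urllib.parse import urlparse, parse_qs

# Referral domains treated as AI-assistant telltales. Only chatgpt.com is
# confirmed above; the set is written so more domains can be added later.
AI_UTM_SOURCES = {"chatgpt.com"}

def is_ai_referred(url: str) -> bool:
    """True if the URL's utm_source parameter names a known AI assistant."""
    query = parse_qs(urlparse(url).query)
    return any(value in AI_UTM_SOURCES for value in query.get("utm_source", []))

print(is_ai_referred("https://example.com/story?utm_source=chatgpt.com"))  # True
print(is_ai_referred("https://example.com/story?utm_source=newsletter"))   # False
```

Parsing the query string rather than substring-matching the whole URL avoids false positives on pages that merely mention the parameter in their path or fragment.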

Thanks, this is awesome, I've already found a bunch of garbage to revert. You're probably already aware of this, but there's also a filter for this, Special:AbuseFilter/1346, being trialed. Apocheir (talk) 21:52, 9 April 2025 (UTC)[reply]
Thanks for the EF, I'll add the other AI agents to my script! Headbomb {t · c · p · b} 21:57, 9 April 2025 (UTC)[reply]
@Samwalton9:, I've added m365copilot.com to the EF, since that was listed at Microsoft Copilot. I think I did it right? Headbomb {t · c · p · b} 22:10, 9 April 2025 (UTC)[reply]
If you want, you can take a look at a relevant Phabricator task where I tested out the outputs of a few LLMs to see if any others gave a utm_source parameter, it seems like it is exclusive to ChatGPT. Chaotic Enby (talk · contribs) 22:29, 9 April 2025 (UTC)[reply]
I found this thread after some searching, starting from the now-closed thread [1], where it was used as a telltale for LLM use. Anyway, there may be some urgency in searching insource:"utm_source=chatgpt.com", because there are also bots that go around stripping utm_source junk from URLs, and we want to catch it before it is cleaned away. Currently I'm seeing about 1400 of them. —David Eppstein (talk) 21:43, 26 April 2025 (UTC)[reply]
Strip it out from all articles using a script? scope_creepTalk 22:06, 26 April 2025 (UTC)[reply]
But we don't want to just strip it out. We want to find it and check that the text added with it is accurate and not an AI hallucination. Stripping it out would prevent us from finding it. —David Eppstein (talk) 22:57, 26 April 2025 (UTC)[reply]
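The insource search mentioned above can also be issued through the public MediaWiki API, which is how a tracking tool might poll the count over time. A sketch of the query construction only (the endpoint and parameter names are the standard search API; the result limit is arbitrary):

```python
from urllib.parse import urlencode

# Build a MediaWiki API request equivalent to the on-wiki search
# insource:"utm_source=chatgpt.com" (CirrusSearch insource: syntax).
params = {
    "action": "query",
    "list": "search",
    "srsearch": 'insource:"utm_source=chatgpt.com"',
    "srlimit": "50",
    "format": "json",
}
api_url = "https://en.wikipedia.org/w/api.php?" + urlencode(params)
print(api_url)
```

The JSON response's `query.searchinfo.totalhits` field would give the running total that the comment above pegs at about 1400.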

AI cleanup at NPP


I just became an NPP reviewer and have been trying it out, and I just ran into an article by the same author whose sources I can't access at all (they're mostly offline, and the ones that have links are mostly dead links). I'm not going to link it because it's probably not AI, but I just realized that NPP reviewers are supposed to prevent hoaxes and the like, yet for articles with mostly offline sources, especially those in different languages, there's no real way to tell whether an article is AI-generated without knowledge of the subject matter. Should (or does) NPP have some guidance on this? Mrfoogles (talk) 15:59, 28 April 2025 (UTC)[reply]

If everything is offline (and several different websites are cited) then either it's AI or all the servers are affected by the current Iberian blackout. Flounder fillet (talk) 20:23, 28 April 2025 (UTC)[reply]
No, I meant books, not dead links. Also, I'm guessing you looked through my contributions history, but you've gotten the wrong one. Mrfoogles (talk) 01:31, 30 April 2025 (UTC)[reply]
A first step is checking if the books exist. Not to say that AI can't pretend it's using a real book, but if the book doesn't exist that's a strong indicator. CMD (talk) 03:53, 30 April 2025 (UTC)[reply]
Yeah, I gave it a shot, but they're in Arabic, so it's hard to tell. Mrfoogles (talk) 17:37, 30 April 2025 (UTC)[reply]
Talk to the creator for more information. —Alalch E. 21:27, 1 May 2025 (UTC)[reply]

For the Mentions > Talk section: Wikipedia:WikiProject AI Cleanup/List of uses of ChatGPT on Wikipedia#Talk 2

Keyword(s) flagged | Talk page
ChatGPT | Talk:Carbon footprint
ChatGPT; LLM; AI | Talk:Climate Change
ChatGPT; LLM; AI | Talk:Donald Trump
ChatGPT; LLM | Talk:Earth
ChatGPT | Talk:Effects of Climate Change
ChatGPT; AI | Talk:Environmental, social, and governance
ChatGPT; LLM; AI | Talk:Generative artificial intelligence
ChatGPT; AI | Talk:Greenhouse gas
AI | Talk:Jimmy Carter
ChatGPT; AI; Quillbot | Talk:Meetup SDGs Communication
AI | Talk:Natural disaster
ChatGPT | Talk:Net-zero emissions
ChatGPT | Talk:Sustainable energy
ChatGPT; LLM; AI | Talk:Tesla Model S
ChatGPT; LLM | Talk:Wikiproject Climate change
ChatGPT; LLM; AI | Talk:Wikiproject Environment

For the Mentions > User talk section: Wikipedia:WikiProject AI Cleanup/List of uses of ChatGPT on Wikipedia#User talk 2

Keyword(s) flagged | User talk page
ChatGPT; Deep Seek; Le Chat | User talk:Wikipistemologist


Didn't want to add these directly in, in case you only wanted ChatGPT-related Talk pages.

Wikipistemologist (talk) 22:23, 30 April 2025 (UTC)[reply]

Proposal: adopting WP:LLM as this WikiProject's WP:ADVICEPAGE (2)

Previous proposal: Wikipedia talk:WikiProject AI Cleanup/Archive 1#Proposal: adopting WP:LLM as this WikiProject's WP:ADVICEPAGE

Nothing major, adopting this proposal would just mean that Wikipedia:Large language models is tagged with {{WikiProject advice}} instead of {{essay}} and that it is moved to Wikipedia:WikiProject AI Cleanup/Guide. The current incomplete "Guide" would be merged either with it or with Wikipedia:WikiProject AI Cleanup/AI catchphrases. —Alalch E. 21:57, 1 May 2025 (UTC)[reply]

The issue of llms has been discussed far more widely than this WikiProject, in very broad community forums. Things are a bit scattered, but there should be a central repository for the community directly in the Wikipedia space. CMD (talk) 23:04, 1 May 2025 (UTC)[reply]
It doesn't appear that WP:LLM is that "repository", or any kind of repository. It would rather be the case that this WikiProject is the central hub of interest in this topic on Wikipedia. The breadth of forums that have discussed LLMs and AI did not translate into breadth of support for the essay such that it might become anything other than an ordinary essay. At the same time, Wikipedia:Artificial intelligence is an information page also covering LLMs. —Alalch E. 23:56, 1 May 2025 (UTC)[reply]
Yep, having a central hub here could be helpful. Wikipedia:WikiProject AI Cleanup/Resources kinda does that, but we can consider a separate subpage for on-wiki discussions. Chaotic Enby (talk · contribs) 13:59, 12 May 2025 (UTC)[reply]

Has the "AI images in non-AI contexts" list served its purpose?


Wikipedia:WikiProject AI Cleanup/AI images in non-AI contexts has been documenting reasons given for removing AI-generated images from Wikipedia articles since 2023. Is there any reason to continue keeping track of this, now that WP:AIIMAGES has become policy? I assume the list page was created to help guide that eventual policy with organic examples from across Wikipedia, which would mean it is no longer really needed. Belbury (talk) 11:37, 12 May 2025 (UTC)[reply]

Yep, most of them have been deleted, and "what to do" is much clearer with the policy. Borderline cases (which will be less frequent, but will certainly happen) can be discussed on this very noticeboard. Chaotic Enby (talk · contribs) 14:06, 12 May 2025 (UTC)[reply]

Request for cleanup assistance at ANI


There is a request for cleanup assistance at WP:ANI § Cleaning up after User:M1rrorCr0ss's mess, which involves over 2,000 edits that need to be reviewed for AI-generated content. Some details were mentioned in an earlier discussion, WP:ANI § User:M1rrorCr0ss creating articles with fake sources, possibly with LLMs. — Newslinger talk 12:30, 21 May 2025 (UTC)[reply]

I've (hopefully) deleted all articles I can find created by M1rrorCr0ss, but (a) I'm not absolutely sure I've got them all, and (b) there are still the huge number of redirects and an unknown amount of garbage content inserted into other, legitimate, articles. Are there any tools for digging this sort of thing out, to allow root-and-branch removal of contributions by an editor? — The Anome (talk) 11:09, 22 May 2025 (UTC)[reply]
I don't know of any specific tool for that, but one could probably be coded using Wikipedia:WikiBlame to find the editor's additions. Chaotic Enby (talk · contribs) 11:20, 22 May 2025 (UTC)[reply]
The Edit Counter can identify all pages with live edits by this user, but not if their content is still in those articles. –LaundryPizza03 (d) 04:40, 24 May 2025 (UTC)[reply]
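A tool along the lines suggested above could pull the user's live edits from the MediaWiki API (list=usercontribs) and reduce them to a worklist of unique pages to review. A sketch over illustrative sample data; the entry shape mirrors the API's fields, but the titles below are placeholders, not real contributions:

```python
# Entries in the shape returned by the MediaWiki usercontribs API.
# These titles are made up for illustration only.
sample_contribs = [
    {"title": "Example article", "ns": 0, "revid": 101},
    {"title": "Example article", "ns": 0, "revid": 102},
    {"title": "Talk:Example article", "ns": 1, "revid": 103},
    {"title": "Another article", "ns": 0, "revid": 104},
]

def pages_to_review(contribs, namespace=0):
    """Unique pages in the given namespace with live edits by the user."""
    return sorted({c["title"] for c in contribs if c["ns"] == namespace})

print(pages_to_review(sample_contribs))  # ['Another article', 'Example article']
```

As LaundryPizza03 notes, this only says which pages were touched; whether the user's text survives in the current revision still needs a WikiBlame-style pass per page.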

Course of action for new AI account


Hello, a new user has begun editing and on their user page says that they "extensively utilize BIDAI (Boundaryless Information & Data Analysis Intelligence), an advanced analytical system engineered by EIF." I've found their edits to be extremely unproductive and have warned them of such, but I was wondering if there is a standard approach for dealing with such accounts? Reporting without warning or discussion seems extreme, but the potential for this user to cause significant damage to Wikipedia is also very real. I didn't see a clear-cut policy, but I also admittedly didn't look too deeply. Thanks. Vegantics (talk) 14:29, 22 May 2025 (UTC)[reply]

We don't specifically have policies for this yet (we still don't have a general AI-use policy), but the course of action for unproductive AI-using editors has usually been to report them to ANI. Chaotic Enby (talk · contribs) 14:37, 22 May 2025 (UTC)[reply]
Thanks @Chaotic Enby. I'll see if they respond to my Talk page comments/continue editing and will plan to report if they continue this disruptive pattern. Vegantics (talk) 14:39, 22 May 2025 (UTC)[reply]
I believe the obvious lack of any meaningful human oversight means that Spledia (talk · contribs) is merely acting as a facade for a computer program, and that their account is thus in effect a disguised bot account. I've suggested they request approval via the normal bot approval process. Given their past editing record, I think they have a mountain to climb with this, but the bot approval process seems like a good way to deal with this kind of blatant automated editing. In the meantime, I've blocked them from editing or creating article content. — The Anome (talk) 05:50, 23 May 2025 (UTC)[reply]

Collapsible templates


I've created the {{Collapse AI top}} and {{Collapse AI bottom}} templates that can be used for collapsing (hatting) disruptive talk page discussions that contain LLM-generated text. The {{cait}} and {{caib}} shortcuts are easier to use than the full template names. For an example of the template in action, see Talk:Ark of the Covenant.

The benefits of these AI-focused templates over generic collapsible templates like {{hat}} and {{hab}} are the convenient standardized message and the fact that transclusions of these templates can be tracked to monitor the extent of disruptive LLM use on talk pages.

Please let me know if you have any feedback, or simply improve the templates yourself. — Newslinger talk 09:25, 23 May 2025 (UTC)[reply]


Would it be possible to create a bot that would check new articles, follow all embedded links, such as citation links, and attempt to fetch them? 404-ing and similar broken reference links are an obvious sign of lazy AI slop, and it would be easy to catch these early this way and to tag articles for examination by editors. It could also try to check the linked references for at least some resemblance to the subject of the article: either through simple text comparison, or an ML method such as comparing embeddings (of which text comparison is a trivial case). It would obviously not detect sophisticated AI slop, but that's another issue entirely.

The obvious problem is the anti-crawler features of websites themselves that would tend to block accesses by the bot. Are there any services that can provide this kind of crawler access to third party sites in an ethical way, for example via a WMF-brokered use-whitelisted API obtained via an organization like Google, Cloudflare, Microsoft, Kagi ([2]) or the Internet Archive who have generally unrestricted access to crawling (something like, say, Google's "Fetch as Google" service)? — The Anome (talk) 10:42, 23 May 2025 (UTC)[reply]
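The first stage of the proposed bot, fetching each cited URL and flagging dead ones, could look roughly like this. A sketch under stated assumptions: the function name, the set of statuses treated as suspect, and the timeout are all illustrative choices, and a production bot would fetch through an archive-backed or whitelisted service as discussed above rather than hitting sites directly:

```python
import urllib.error
import urllib.request

# HTTP statuses that suggest a dead or removed reference.
SUSPECT_STATUSES = {404, 410}

def looks_dead(url: str, timeout: float = 10.0) -> bool:
    """True if the cited URL should be flagged for human review."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status in SUSPECT_STATUSES
    except urllib.error.HTTPError as err:
        return err.code in SUSPECT_STATUSES
    except (urllib.error.URLError, TimeoutError, ValueError):
        return True  # unreachable or malformed: worth a human look
```

Treating network failures and malformed URLs as "flag for review" errs on the side of human inspection, which matches the goal of tagging rather than auto-removing.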

See also this: https://news.ycombinator.com/item?id=23149841 While slow, the IA's fetch would be ideal for this purpose. Combined with a cache, it would be highly effective. It doesn't really matter if a fetch takes several minutes, for the purposes of bots, which can take as long as they like. Because it would generate a lot of hits, it would probably need a service agreement with the IA to prevent it being rate-limited or blocked by them. The IA also seems to offer an API: https://archive.org/help/wayback_api.php — The Anome (talk) 11:24, 23 May 2025 (UTC)[reply]
Some AI-generated content possibly goes under the radar, so this bot proposal is a good idea. But it will only help with new articles, which need to undergo patrolling, so there is already some human supervision there. For AI editors expanding existing articles with fake references, the bot would need to check every article that has seen a recent edit. CX Zoom[he/him] (let's talk • {CX}) 12:49, 23 May 2025 (UTC)[reply]
Absolutely. It will only catch the very dumbest AI slop content, but it appears that is currently low-hanging fruit, and still worth doing. I really like the idea of a content cache for already-fetched reference content; automated checking of references is a really promising research area, and one, I think, where using LLMs is entirely valid, if it is used with the correct threshold settings, so that it is more sceptical than the average human reviewer, and bad references can either be flagged as wholly bad (naive slop detection) or simply questionable (detecting either superior-quality slop, vandalism, or mediocre human contributions), and human review can then take over. — The Anome (talk) 13:30, 23 May 2025 (UTC)[reply]
I'll take the opportunity to point to #WP:UPSD Update above, in case @The Anome: didn't see it. Headbomb {t · c · p · b} 13:45, 23 May 2025 (UTC)[reply]

WP:LLMN?


Should this talk page be considered the LLM noticeboard (perhaps adding a couple of redirects like WP:LLMN and Wikipedia:Large language models/Noticeboard)? If not, should one be made? I wonder because I came across Zaida, Khyber Pakhtunkhwa and wanted someone more familiar with LLMs to take a look, though I did find a maintenance template, which I added to the article. Gråbergs Gråa Sång (talk) 05:31, 24 May 2025 (UTC)[reply]

I think it should. That seems like the intent of the WP:AINB shortcut, and there is precedent in designating a maintenance-oriented WikiProject talk page as a noticeboard: see Wikipedia talk:WikiProject Spam, which is listed on {{Noticeboard links}}. — Newslinger talk 06:19, 24 May 2025 (UTC)[reply]
I agree with Newslinger, this has been de facto our LLM noticeboard, and it makes sense to have WP:LLMN and similar shortcuts redirect here. Chaotic Enby (talk · contribs) 12:35, 24 May 2025 (UTC)[reply]
To facilitate searching for specific discussions in the archives, I suggest the active participants on this talk page consider whether they want to keep project discussion separate from discussions of specific situations. isaacl (talk) 15:41, 24 May 2025 (UTC)[reply]
That could also be a good alternative, assuming there are too many discussions and searching them ends up overwhelming. However, some discussions of specific situations can easily end up broadening in scope, so a separation between them might not always be practical. Chaotic Enby (talk · contribs) 15:46, 24 May 2025 (UTC)[reply]
I do think that a separately maintained page would be better, because I can only see the issue growing in size in the future. CX Zoom[he/him] (let's talk • {CX}) 17:57, 24 May 2025 (UTC)[reply]