Wikipedia:Edit filter/Requested

From Wikipedia, the free encyclopedia
    Requested edit filters

    This page can be used to request edit filters, or changes to existing filters. Edit filters are primarily used to address common patterns of harmful editing.

    Private filters should not be discussed in detail. If you wish to discuss creating an LTA filter, or changing an existing one, please instead email details to wikipedia-en-editfilters@lists.wikimedia.org.

    Otherwise, please add a new section at the bottom using the following format:

    == Brief description of filter ==
    *'''Task''': What is the filter supposed to do? To what pages and editors does it apply?
    *'''Reason''': Why is the filter needed?
    *'''Diffs''': Diffs of sample edits/cases. If the diffs are revdelled, consider emailing their contents to the mailing list.
    ~~~~
    

    Please note the following:

    • Edit filters are used primarily to prevent abuse. Contributors are not expected to have read all 200+ policies, guidelines and style pages before editing. Trivial formatting mistakes and edits that at first glance look fine but go against some obscure style guideline or arbitration ruling are not suitable candidates for an edit filter.
    • Filters are applied to all edits on all pages. Problematic changes that apply to a single page are likely not suitable for an edit filter. Page protection may be more appropriate in such cases.
    • Non-essential tasks or those that require access to complex criteria, especially information that the filter does not have access to, may be more appropriate for a bot task or external software.
    • To prevent the creation of pages with certain names, the title blacklist is usually a better way to handle the problem - see MediaWiki talk:Titleblacklist for details.
    • To prevent the addition of problematic external links, please make your request at the spam blacklist.
    • To prevent the registration of accounts with certain names, please make your request at the global title blacklist.
    • To prevent the registration of accounts with certain email addresses, please make your request at the email blacklist.




    Edits adding raw text maintenance tags instead of standard templates

    • Task: Log edits meeting above criteria
    • Reason: Editors have been adding maintenance tags in raw text instead of using the correct templates
    • Diffs: Examples of fix: Special:Diff/1293287735, 74 JWB edits

    I wish to log edits where editors add Wikipedia:, WP:, Help: (case insensitive) inside of <sup></sup> to check the extent to which new editors use raw text instead of actual maintenance tags, and whether a bot will be required for regular maintenance of this. Thanks! CX Zoom[he/him] (let's talk • {CX}) 21:35, 31 May 2025 (UTC)

    This might work:
    added_lines contains "<sup>[''[[Wikipedia:"
    
    『π』BalaM314〘talk〙 14:37, 11 June 2025 (UTC)
    The basics could be something like
    equals_to_any(page_namespace, 0) &
    added_lines irlike "<sup>(\[|&#x5B;)\'\'\[\[(Wikipedia|Help|WP|H)\:"
    
    I haven't checked yet for how many of those edits it works, but it could be a start. Nobody (talk) 06:10, 12 June 2025 (UTC)
    And instead of using equals_to_any(page_namespace, 0), we could just use page_namespace == 0 since there is only one namespace being checked here. – PharyngealImplosive7 (talk) 07:37, 12 June 2025 (UTC)
    I left it like that because I'm not sure if we should add the draftspace too. Nobody (talk) 07:40, 12 June 2025 (UTC)
    The regex looks to be working as intended. CX Zoom[he/him] (let's talk • {CX}) 01:21, 13 June 2025 (UTC)
    I'll test the regex with all of the hits from above. @CX Zoom: it looks like there are already some more according to this search. Do you want to start JWB again or wait until we have the filter in place? Nobody (talk) 06:29, 18 June 2025 (UTC)
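    For anyone who wants to try the proposed pattern offline before it goes into a filter, here is a minimal Python sketch (not AbuseFilter syntax). It emulates irlike with re.IGNORECASE; the helper name is_raw_maintenance_tag and the sample lines are made up for illustration and are not taken from the linked diffs.

```python
import re

# Pattern proposed in the thread; AbuseFilter's irlike is emulated with IGNORECASE.
SUP_TAG = re.compile(r"<sup>(\[|&#x5B;)''\[\[(Wikipedia|Help|WP|H):", re.IGNORECASE)


def is_raw_maintenance_tag(added_line: str) -> bool:
    """Return True if the line contains a hand-written superscript tag link."""
    return SUP_TAG.search(added_line) is not None


# Hypothetical sample lines (assumptions, not real diffs).
samples = [
    "Claim.<sup>[''[[Wikipedia:Citation needed|citation needed]]'']</sup>",
    "Claim.<sup>&#x5B;''[[wp:Verifiability|not verified]]'']</sup>",
    "Claim.{{citation needed|date=June 2025}}",
    "H<sup>2</sup>O",
]

for line in samples:
    print(is_raw_maintenance_tag(line), line)
```

    The first two samples are hand-written tags and should be caught; the template call and the ordinary superscript should not.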

    Adding nonexistent templates

    • Task: Log (and possibly tag?) whenever the user adds a transclusion of a template that does not exist. Also, warn on certain edits. Warning on all such edits would be far too bitey, but I think the following categories of edits would likely benefit from a warning. (Also, I think the warnings should mainly be restricted to mainspace or maybe talkspace; drafts and userpages should be free for now.) The categories of edits to warn on could, of course, be adjusted once the filter is in.
    • Edits by IPs and new users. In many cases, these are vandalism.
    • Edits made in the mobile Wikipedia app. (Since it doesn't have a visual editor, it's incredibly easy to forget to close your curly braces; a warning would really help with this.)
    • Malformed citations (this could be detected by looking for the word "cite" or a URL in the title).
    • Nonexistent "country data" templates; these can arise from misuse of templates like {{flag}} or {{flagicon}}
    • Nonexistent WikiProject tags in talkspace (which could both arise from vandalism and from AWB mistakes).
    (One possible approach could be to pair a log-only filter, which catches all nonexistent templates, with a separate warn filter. This two-filter approach has been tried before, for example in 1296 and 1297, so I don't see any reason why it wouldn't work here. Given the heterogeneity of these categories, it might be even better to have multiple warn filters, so we could show a different warning for each category.)
    • Diffs: For some of these categories:

    Duckmather (talk) 00:38, 3 June 2025 (UTC)

    @Duckmather: I'm commenting on the technical aspects here, to see if such a filter (or multiple) could exist. I'm not sure if the AbuseFilter can check whether a template exists or not; maybe it could check whether the page named in a template exists (in the template namespace, of course) using page_last_edit_age (if it's not null, then the page exists), but I'm not sure. The mobile app constraint is fairly easy to check: the AbuseFilter extension has a variable user_app built into it for this exact purpose.
    I'm not sure if the "country data" thing is actually possible with an AbuseFilter - that would require cross-checking with whatever module supports the {{flag}} and {{flagicon}} templates, which isn't possible as far as I know. A similar issue occurs with the WikiProject tags you bring up; the AbuseFilter cannot cross-reference a module as far as I know. I also don't think checking whether someone has properly closed template brackets is feasible with the AbuseFilter; the logic would be pretty complex and would still produce a lot of FPs. – PharyngealImplosive7 (talk) 03:50, 3 June 2025 (UTC)
    @PharyngealImplosive7: The filter is in fact doable. The key idea is to use the new_html variable, which contains the parsed HTML of the new revision and therefore reveals whether templates exist or not. For example, if I were to write {{fake example}}, this would translate into HTML as <a href="/w/index.php?title=Template:Fake_example&action=edit&redlink=1" class="new" title="Template:Fake example (page does not exist)">Template:Fake example</a>. You could in turn detect this redlink using the regex <a href="\/w\/index\.php\?title=Template:[^"]*\;redlink=1\" class=\"new\". Of course, this line of code by itself would generate false positives, as wikilinking a nonexistent template the usual way will also produce identical HTML, so it would need some refining. But I don't see any fundamental technical barriers preventing you from pulling this off. Duckmather (talk) 04:37, 3 June 2025 (UTC)
    @Duckmather: That indeed is a smart approach; I didn't think of that. However, one more thing to note is that new_html is a large variable, so ideally it should be placed at the end of any filter for performance reasons. I believe my point that detecting whether someone has left brackets unclosed is infeasible still stands, though. – PharyngealImplosive7 (talk) 06:07, 3 June 2025 (UTC)
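    A rough way to sanity-check the redlink idea offline is sketched here in Python (not AbuseFilter syntax). It assumes that the raw parser output held in new_html writes ampersands inside href attributes as &amp;, which would explain why the pattern anchors on ;redlink=1. The sample HTML and the helper name missing_templates are made up for illustration.

```python
import re

# Redlink pattern from the thread, with a capture group for the template title.
REDLINK = re.compile(
    r'<a href="/w/index\.php\?title=Template:([^" ]*);redlink=1" class="new"'
)


def missing_templates(html: str) -> list[str]:
    """Return the titles of Template: redlinks found in parsed HTML."""
    # The greedy capture also swallows "&amp;action=edit&amp", so cut at the first "&".
    return [captured.split("&", 1)[0] for captured in REDLINK.findall(html)]


# Hypothetical parser output: one missing template and one existing template.
html = (
    '<p><a href="/w/index.php?title=Template:Fake_example&amp;action=edit&amp;redlink=1" '
    'class="new" title="Template:Fake example (page does not exist)">Template:Fake example</a></p>'
    '<p><a href="/wiki/Template:Citation_needed" title="Template:Citation needed">existing</a></p>'
)

print(missing_templates(html))

# With entities decoded there is no ";" before "redlink=1", so nothing matches.
print(missing_templates(html.replace("&amp;", "&")))
```

    As noted above, a plain wikilink to a nonexistent template produces the same HTML, so a check like this cannot tell a transclusion from a link on its own.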
    After a moderate amount of thought and a lot of procrastination, I have some draft code. Define the following regular expressions:
    wikitext_template := "{{[^\||\n|}]*(\||\n|}})";
    common_template := "(?x){{(?:
    !
    |[Aa]nchor
    |[Aa]s\ of|[Aa]uthority\ control
    |[Bb]irth\ date(?:\ and\ age)?
    |[Bb]lockquote
    |[Cc]-SPAN
    |[Cc]bignore
    |[Cc]irca
    |[Cc]itation\ needed
    |[Cc]ite\ (?:AV\ media|book|conference|encyclopedia|interview|journal|magazine|news|press\ release|tweet|web)
    |[Cc]lear
    |[Cc]n
    |[Cc]oord
    |[Ee]fn
    |[Ee]?m(?:dash)?
    |[Ff]urther
    |[Gg]Burl
    |[Gg]loss
    |[Gg]oogle\ [Bb]ooks(?:\ URL)?
    |[Hh]arvnb
    |[Ii]PAc-(?:ar|cmn|en|hu|pl|yue)
    |(?:ISBN|isbn)\??
    |[Ii]nfobox\ (?:album|book|company|film|football\ biography|musical\ artist|NRHP|officeholder|person|settlement|song|television)
    |[Ll]angx?
    |[Ll]egend
    |[Mm]ain
    |[Mm]ath
    |[Mm]dash
    |[Mm]ultiple\ image
    |[Nn]bsp
    |[Nn]owrap
    |[Oo]fficial\ website
    |[Pp]lainlist
    |[Pp]p(?:-(blp|dispute|extended|semi-indef|sock|vandalism))?
    |[Pp]roQuest
    |[Rr]ef(?:begin|end|h|list)
    |[Ss]fnm?
    |[Ss]hort\ description
    |[Tt]OC\ limit
    |[Uu]se\ (dmy|mdy)\ dates
    |[Uu]se\ (American|Australian|British|Canadian|Hong\ Kong|Indian|Jamaican|Kenyan|Liberian|New\ Zealand|Nigerian|Pakistani|Philippine|Singapore|South\ African|Sri\ Lankan|Trinidad\ and\ Tobago|Ugandan)\ English
    |[Ww]ebarchive
    |[Ww]ikiProjectBannerShell
    |[Ww]ikiProject\ (Albums|Anthroponymy|Australia|Articles\ for\ creation|Biography|Canada|Cities|banner\ shell|Disambiguation|Football|Film|France|Germany|India|Lepidoptera|Lists|Military\ history|Olympics|Songs|Television|United\ States)
    )\s*(?:\||\n|}})";
    nonexistent_template := '<a href="\/w\/index\.php\?title=Template:([^" ]*)\;redlink=1\" class=\"new\"';
    (To explain: wikitext_template catches the use of any template in wikitext; common_template catches various commonly used templates; and nonexistent_template catches an HTML link to a nonexistent template. Part of why this took so long is that I had to try several different things before I could get a satisfactory list of common templates.)
    With these regular expressions in place, the logging filter could be defined as follows:
    added_lines rlike wikitext_template &
    rcount(common_template, added_lines) < rcount(wikitext_template, added_lines) &
    new_html rlike nonexistent_template
    and with the same regular expressions in place, the warning filter could be defined as follows:
    equals_to_any(page_namespace, 0, 1, 118) &
    rcount(common_template, added_lines) < rcount(wikitext_template, added_lines) &
    new_html rlike nonexistent_template &
    (
    !(contains_any(user_groups, "autoconfirmed", "bot", "confirmed"))
    | user_mobile
    | user_app
    | (summary rlike "^Created by translating the page" & page_id == 0)
    | new_html rlike '<a href="\/w\/index\.php\?title=Template:[^" ]*(cite|https?:\/\/|doi|isbn|(IPA|lang-)[\w-]+|\w+\ icon|wikiproject|[^\x00-\xFF\s–—])[^" ]*\;redlink=1\" class=\"new\"'
    ) &
    !(summary irlike "restor(?:ed?|ing)|revert(?:ed|ing)?|und(?:o|id)" & page_id != 0)
    Duckmather (talk) 05:43, 23 June 2025 (UTC)
    Also, other things to watch out for when it comes to malformed citations are DOIs (example) and ISBNs (example). Duckmather (talk) 05:14, 3 June 2025 (UTC)
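    The rcount comparison in the draft filters can be illustrated with a short Python sketch (not AbuseFilter syntax): if fewer common templates than templates overall were added, then at least one added template is outside the common list and is worth the more expensive new_html check. COMMON_TEMPLATE here is a deliberately shortened stand-in for the long list above, and the function name has_uncommon_template is hypothetical.

```python
import re

# Any template call: "{{", a name, then "|", a newline, or "}}".
ANY_TEMPLATE = re.compile(r"\{\{[^|\n}]*(?:\||\n|\}\})")

# Shortened stand-in for common_template (an assumption, not the full list).
COMMON_TEMPLATE = re.compile(
    r"\{\{(?:[Cc]ite (?:book|news|web)|[Cc]n|[Rr]eflist|[Ss]hort description)"
    r"\s*(?:\||\n|\}\})"
)


def has_uncommon_template(added_lines: str) -> bool:
    """Emulate rcount(common, text) < rcount(any, text) from the draft filter."""
    common = len(COMMON_TEMPLATE.findall(added_lines))
    total = len(ANY_TEMPLATE.findall(added_lines))
    return common < total


print(has_uncommon_template("{{cite web|url=https://example.org}} {{reflist}}"))
print(has_uncommon_template("{{cite web|url=https://example.org}} {{ciation needed}}"))
```

    Only the second call contains a template name outside the short list, so only that edit would go on to the redlink test.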

    Monitoring disruptive reclassification of Nigerian ethnic groups


    There's a fair bit of disruptive edit warring around Igboid and Ijaw languages and cultures, swapping them back and forth without sources. The most recent editor has been making edits primarily removing claims of Igbo heritage and inserting other ethnic names, most commonly Ijaw. [1][2][3] However, articles in this space have significant editing in the opposite direction as well, with some editors inserting claims of Igbo heritage into articles in the place of other ethnic groups. [4][5][6]

    I'm interested in helping out more with edit filters so I've tried to draft one below. It's my first time requesting an edit filter so I'm not sure if it's possible to handle both "Ijaw" → "Igbo" and "Igbo" → "Ijaw", etc.

    !contains_any(user_groups, "extendedconfirmed", "sysop", "bot") &
    page_namespace == 0 &
    (
      ethnic_groups := "(?i)ijaw|ijo|igbo|igboid|igboland|edo|edoid";
      
      /* ethnic group names present both before and after edit */
      added_lines rlike ethnic_groups & removed_lines rlike ethnic_groups
    )
    

    Dan Leonard (talk • contribs) 17:15, 3 June 2025 (UTC)

    It would probably also be smart to add a regex check so that edits adding references are not flagged. irlike also exists for case-insensitive matching. The current filter would also match someone adding/removing names of the same ethnic group. Here is a revised version of the filter:
    !contains_any(user_groups, "extendedconfirmed", "sysop", "bot") &
    page_namespace == 0 &
    (
      igbo_text := "\sigbo(?:id|land)?\s";
      ijaw_text := "\sij(?:aw|o)\s";
      edo_text := "\sedo(?:id)?\s";
      sourcing := "\{\{[Cc]ite\b|(?i)<ref(?:\s[^>]*?)?/?>";
      
      rcount(sourcing, added_lines) <= rcount(sourcing, removed_lines) &
      
      (
         added_lines irlike igbo_text &
         (removed_lines irlike ijaw_text | removed_lines irlike edo_text)
      ) ^
      (
         added_lines irlike ijaw_text &
         (removed_lines irlike igbo_text | removed_lines irlike edo_text)
      ) ^
      (
         added_lines irlike edo_text &
         (removed_lines irlike igbo_text | removed_lines irlike ijaw_text)
      )
    )
    
    PharyngealImplosive7 (talk) 18:02, 3 June 2025 (UTC)
    It's probably worth adding a single added_lines irlike "\b(group1|group2|group3)\b" as a pre-filter. Going straight to two rcount calls for any new user edit in article space would probably be more expensive than ideal. In the interior of the filter, it might be good to do counts on each ethnic group somewhat like 1338 (hist · log) then use some logic around numeric comparisons (I'm not sure about using XOR, but I'd want to test a filter on a lot more unique examples). @Dan Leonard: Could you dig up more example edits? The more unique edits from different users that we have to test, the better our initial filter will be. Thanks. Daniel Quinlan (talk) 22:27, 4 June 2025 (UTC)
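    One possible reading of the pre-filter plus per-group counting idea is sketched here in Python (not AbuseFilter syntax). It counts each group name in the added and removed text and treats an edit as a swap when one group's count rises while another's falls. The names GROUPS, PREFILTER, swapped_groups and looks_like_swap are hypothetical, the word lists are copied from the drafts above, and the sample sentences are invented rather than taken from the linked diffs.

```python
import re

# Per-group patterns, using \b word boundaries as suggested for the pre-filter.
GROUPS = {
    "igbo": re.compile(r"\bigbo(?:id|land)?\b", re.IGNORECASE),
    "ijaw": re.compile(r"\bij(?:aw|o)\b", re.IGNORECASE),
    "edo": re.compile(r"\bedo(?:id)?\b", re.IGNORECASE),
}

# Cheap single check that runs before any counting.
PREFILTER = re.compile(r"\b(?:igbo(?:id|land)?|ij(?:aw|o)|edo(?:id)?)\b", re.IGNORECASE)


def swapped_groups(added_lines: str, removed_lines: str) -> tuple[list[str], list[str]]:
    """Return (gained, lost): groups mentioned more often, or less often, after the edit."""
    if not PREFILTER.search(added_lines):
        return [], []
    delta = {
        name: len(rx.findall(added_lines)) - len(rx.findall(removed_lines))
        for name, rx in GROUPS.items()
    }
    gained = sorted(name for name, d in delta.items() if d > 0)
    lost = sorted(name for name, d in delta.items() if d < 0)
    return gained, lost


def looks_like_swap(added_lines: str, removed_lines: str) -> bool:
    """True when at least one group gains mentions while another loses them."""
    gained, lost = swapped_groups(added_lines, removed_lines)
    return bool(gained) and bool(lost)


print(looks_like_swap("He is of Ijaw descent.", "He is of Igbo descent."))
print(looks_like_swap("An Igbo musician from Igboland.", "An Igbo singer from Igboland."))
```

    Comparing counts avoids flagging edits that merely reword a sentence about the same group, which was the concern raised about the first draft. Whether this behaves better than the XOR version would still need testing against more real examples.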

    Prevent Undo of Rollbacks/tool edits by new users


    Not sure if this has been recommended before, but it's a pretty common vandalism pattern to just click "undo" on rollbacks/undos performed by tools like Twinkle, Huggle, RedWarn, etc. Is it worth considering a filter that would prevent new users from using the undo tool to reverse these edits? Ed (talk) 21:54, 3 June 2025 (UTC)

    Such a filter would raise similar concerns to what was discussed in this conversation about a similar suggested filter. A lot of rollbacks/undos will also be mistakes.
    On the technical side of things, we will have to rely on edit summaries to tell if an edit was made by Twinkle, is a rollback, etc., as the AbuseFilter can't tell as far as I know. – PharyngealImplosive7 (talk) 22:33, 3 June 2025 (UTC)
    In that case, this might be better off as a bot task instead. Duckmather (talk) 05:18, 4 June 2025 (UTC)
    I don't think this is a good task for a bot, as again, the false positive rate would be unacceptably high - rollbackers do make mistakes. A manually confirmed bot would essentially be doing nothing that interfaces like RecentChanges paired with Twinkle/rollback/etc. can't do. – PharyngealImplosive7 (talk) 05:56, 4 June 2025 (UTC)
    In that case, the bot could add a tag to such edits. Alternatively, the list of recent such edits could be a subpage of Wikipedia:Database reports. But then again, this is probably getting out of scope of this page. Duckmather (talk) 19:53, 22 June 2025 (UTC)
    As far as I know, a bot can't really add a tag - that is a job for an edit filter, but as Daniel Quinlan says, we need consensus before implementing such a filter. Someone here could create an RfC or go to the village pump. – PharyngealImplosive7 (talk) 21:49, 22 June 2025 (UTC)
    A filter with that level of impact would essentially be a major policy change. A much broader discussion would need to happen before considering technical feasibility. Daniel Quinlan (talk) 22:12, 4 June 2025 (UTC)

    Possible conflict of interest edits where username is suspiciously similar to the page title?

    • Task: see section title
    • Reason: I just reverted two edits by User:AndiArndt8, a username which I noticed is suspiciously similar to the title of the article, Andi Arndt. (This is also why I warned them with {{uw-coi}}.) There are two filters that check for exactly this - 148 and its logging companion 147 - but they only check page creations. Since COI is a huge problem on Wikipedia, this filter could be very beneficial.
    • Diffs: first, second

    Duckmather (talk) 14:43, 22 June 2025 (UTC)

    Hey Duckmather, this is Andi Arndt. Are we not allowed to edit info that is about us? I no longer live in the Shenandoah Valley so I updated it along with a few other items. I’m new to Wikipedia so if it’s not allowed we can just pretend I live in Virginia. 172.222.126.162 (talk) 17:14, 22 June 2025 (UTC)
    As a rule of thumb, no. See WP:COI. You should make an edit request instead, or at least cite a reliable source that isn't yourself. Duckmather (talk) 18:36, 22 June 2025 (UTC)
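    To make the requested check concrete, here is a minimal Python sketch (not AbuseFilter syntax) of one way a username could be compared with a page title. It is not a description of how filters 147 and 148 work. The normalization rule, the minimum length of 6 and the function names are all assumptions chosen for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters, dropping spaces, digits and punctuation."""
    return re.sub(r"[^a-z]", "", text.lower())


def username_resembles_title(username: str, page_title: str, min_len: int = 6) -> bool:
    """True if the normalized username contains the normalized title, or vice versa."""
    user, title = normalize(username), normalize(page_title)
    # Very short names would match far too many titles, so skip them.
    if len(user) < min_len or len(title) < min_len:
        return False
    return user in title or title in user


print(username_resembles_title("AndiArndt8", "Andi Arndt"))
print(username_resembles_title("Duckmather", "Andi Arndt"))
```

    A substring test like this would still produce false positives for users who happen to share a name with an article subject, so it would probably suit a log-only or tagging filter better than a warning.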