
Wikipedia:Wikipedia Signpost/Single

The Signpost
Single-page Edition
WP:POST/1
24 June 2025

 

File:Wikipedia-logo-v2-en-7m-square.svg (Chaotic Enby, CC BY-SA 4.0)

Happy 7 millionth!

Wikipedia administrators arrested in Belarus

Maksim Lepushenka (left) at a 2016 Wikipedia community meetup
Olga Sitnik at Wikimania 2014

Belarusian Wikipedia sysop and sole bureaucrat Maksim Lepushenka was arrested last month, becoming the latest Wikipedian detained in Belarus, as reported by Wikinews with reference to Nasha Niva and other sources.

With his arrest, granting sysop rights in Belarusian Wikipedia became impossible without stewards' intervention. Lepushenka's account is currently globally locked as a compromised account.

Another sysop, Olga Sitnik, was arrested in April; her account has likewise been globally locked. The arrests leave the project with only eight sysops to handle administrative tasks.

The arrests follow detentions and disappearances of other Belarusian community members since late 2024, and similar persecutions in 2022 (see Signpost coverage: "The Russian Wikipedia edit that resulted in arrest and jail time", "Second case of persecution of Wikipedians in Belarus"). – RS, H

7 million articles on English Wikipedia

Puzzle globe with ribbon surrounding, stating "7,000,000 articles", and "Wikipedia the free encyclopedia" written beneath

On May 28, 2025, at 02:26 UTC, the English Wikipedia reached another milestone: seven million articles. Several articles were created at about the same time, so after a discussion, Operators and Things, created by User:Therapyisgood, was selected to represent the milestone. Other articles created around the same time include 1955 Yuba–Sutter floods, British American Hospital, Khorastava rural council, Nikolay Alyokhin, and Taraxacum angustisectum. – RS

Signpost coverage of preceding milestones:

Tesla demands logo removal

TKTK
Tesla demands takedown of this image, claiming it passes the threshold of originality for copyrightable art.

The Wikimedia Foundation has asked Wikimedia reviewers to consider a DMCA media takedown demand that Tesla, Inc. made to Wikimedia. As usual, the Foundation is sharing the demand letter and inviting community conversation at the Wikimedia Foundation office action discussion board on Wikimedia Commons.

While DMCA takedown demands are serious, they are also routine; Wikimedia community volunteers have processed them regularly for years. As with all Wikimedia community processes, everyone is invited to join and engage, or simply to visit and observe the review process.

This is a standard DMCA request, but it may look different to outsiders because of the political climate surrounding Elon Musk. Current popular thought connects Tesla as a corporation, its CEO Elon Musk, and the so-called United States Department of Government Efficiency, which Musk led. Musk has also been criticizing Wikipedia, as The Signpost previously reported in January and February. A United States attorney (no longer in office) expressed an intent to have Wikipedia's nonprofit status revoked, as previously reported in May. And in February, Jamie Flood, Wikimedian in Residence at the United States National Agricultural Library, reported that DOGE had ended their role.

The logos to be removed are:

The conventional consensus of Wikimedia reviewers until now has been that these images are not eligible for copyright and are instead in the public domain: they consist of simple geometric shapes that fall below the threshold of originality a work must pass to gain copyright protection. The rationales the Wikipedia community uses to explain this are at Commons:Template:TOO-US and Commons:Template:PD-textlogo.

Of the four files above, the shield logo was deleted by the Foundation upon further review: the uploader had provided inaccurate information about it, and it was the exact file registered with the copyright office. – BR

OKI replaces A2K after Indian government's crackdown on WMF funding

Following the April 9 announcement by the Centre for Internet and Society (CIS) that it was halting its longstanding "Access to Knowledge" (A2K) program in India after losing its "Foreign Contribution (Regulation) Act" certificate (Signpost coverage: "India cuts off the Wikimedia Foundation's funding of the 'Access to Knowledge' program"), a May 27 posting on the Wikimediaindia mailing list (signup required) informed the Indian community that

As part of a strategic integration, Access to Knowledge (A2K) has become part of IIIT-Hyderabad's Raj Reddy Centre for Technology and Society and will continue its work as the Open Knowledge Initiatives team https://meta.wikimedia.org/wiki/IIITH-OKI. Most members of A2K have joined the Open Knowledge Initiatives team at IIIT-Hyderabad, alongside a few colleagues already working on Open Knowledge and Wikimedia within the institute.
[...] While the team’s structure has evolved, our dedication to collaborating with Indic Wikimedia and Open Knowledge communities remains unchanged.

The announcement was greeted with relief and excitement by various community members (including some WMF staff) on the mailing list and on Meta-wiki. The newly created IIITH-OKI page on Meta-wiki doesn't yet provide information about the initiative's funding, but describes its mission as centre[d] around two interconnected goals: Advancing language diversity and Expanding equitable access to knowledge to support and augment the open knowledge and technology ecosystem across the Indic languages. – H

AI summary trial halted

AI-generated summary of the Dopamine article (which repeatedly violates MOS:OUR)

On 2 June 2025, the Web Team introduced an upcoming trial in which AI-generated summaries would be placed at the top of articles. The community responded by criticising inaccuracies in the sample summary used in the mockup. Linus Media Group's Luke Lafreniere remarked in a podcast on June 13 that article leads written by humans are good and are what drive him to seek more information. The proposed trial faced substantial backlash from the community, and after 11 days of discussion the team decided to put the project on hold.

A limited experiment, held between 4 and 17 December 2024, later came to light. It was shown to readers who had installed a browser extension that served Content Discovery experiments, generated over 10,000 impressions, and recorded 825 clicks to open the generated summaries. The experiment measured participants' reactions with the question "Was this useful?" and binary 'Yes' or 'No' options; approximately 75 of 101 responses were 'Yes'. Checks on the summaries by editors revealed a number of issues, ranging from phrasing problems (including promotional writing) to inaccuracies.

Further discussion on the use of large language models on Wikipedia and related matters is ongoing at the original thread (though partially archived) and, as of publication, at an RfC on a community position on WMF AI development. – RS

Brief notes

Phuan women in traditional dress, appearing on the main page of Lao Wikipedia
  • Wikimedia chapter recognition: The Affiliations Committee recognized Wikimedia Aotearoa New Zealand as a chapter. It started as a user group in 2020.
  • Administrator recall: Two recall petitions were initiated: the petition for Bbb23 was certified on June 6, while that for Necrothesp was closed early after all signatories withdrew. Bbb23 stated that they will not try to regain the admin tools.
  • Milestones: The following Wikimedia projects have reached milestones in May or June:
  • Articles for Improvement: This week's Article for Improvement is Modern Pagan views on LGBT people (beginning 16 June), followed by Urarina language (beginning 23 June). Please be bold in helping improve these articles!
  • Resource pilot project: Wikipedia:Resource support pilot, a pilot project by the Foundation, has launched to give editors an avenue to acquire books as sources.
  • Terms of use update: The Terms of Use have been updated administratively by the Wikimedia Foundation. You can read the diff and a post about the changes.



Reader comments

File:Allan Alcorn in 2012 with first Pong ad.jpg (Digital Game Museum, CC BY 2.0)

Playing professor pong with prosecutorial discretion

Four-dimensional pong

Three-way pong, anyone?

There is a three-way pong match taking place, concerning the 2023 article "Wikipedia’s Intentional Distortion of the History of the Holocaust" published in The Journal of Holocaust Research and the conclusions that should be drawn from it.

To enter the fourth dimension, you can sample "Wikipedia's War on Truth: The Fight Against Bias Toward Israel" (2:45:36 duration), in which Boaz Hepner hosts a panel of Wikipedia experts, including journalists and Justapedians, on "the assumed role of Wikipedia's neutrality, and how far it appears to have veered from that path[, e]specially when analyzed by its stance on the Israel/Palestinian conflict, and Zionism". It was filmed in February 2025 and presented by The Jewish Journal of Greater Los Angeles in May. Or "Taking Sides: Wikipedia Advances Anti-Israel Narratives" in RealClearInvestigations, released after our writing deadline, which finds that "[t]he plethora of anti-Israel academics makes it easy to present anti-Israel narratives under the guise of neutrality".

For prior Signpost coverage see 2023-03-09 Recent research, 2023-05-22 In the media, 2023-05-22 Arbitration report, and 2023-06-05 In the media. – B, SR, AK

Ed Martin's record

The New York Times reports on Ed Martin's record as his tenure as acting U.S. Attorney for the District of Columbia ends, including his high jinks in trying to get the WMF's tax-exempt status revoked. The Times described Martin's actions as radically opposed to the accepted standards of prosecutorial discretion.

At another point, Mr. Martin said he viewed himself as being engaged in a "war over information." He cited a letter he had written to Wikipedia accusing it of bias and improperly shielding itself from scrutiny through its tax-exempt status.

He never suggested that Wikipedia had broken the law. He indicated that his letter was to air his view of their actions in the public arena.

"A prosecutor saying that about Wikipedia is vastly different than Tucker Carlson saying it," Mr. Martin said. "And that’s the point of the job."

The role of a prosecutor, he claimed, "is not just to find the right guy to prosecute" but to publicize their purported wrongdoing.

In a landmark 1940 speech, a top Justice Department official offered another vision.

Robert H. Jackson, then the attorney general, told department employees that the "greatest danger of abuse of prosecuting power" was when an official "finds some person whom he dislikes or desires to embarrass, or selects some group of unpopular persons and then looks for an offense."

Meanwhile, the Senate never voted to confirm Trump's appointment of Martin as US Attorney for DC, so Martin was instead appointed head DOJ Pardon Attorney and director of the DOJ's Weaponization Working Group, positions which don't require Senate confirmation. To be clear, the Weaponization Working Group's job is to counter weaponization that allegedly occurred during the Biden administration. – S

In brief

Donald Trump in the Oval Office, February 11, 2025 (public domain)



Do you want to contribute to "In the media" by writing a story or even just an "in brief" item? Edit our next issue in the Newsroom or leave a tip on the suggestions page.




Reader comments

File:Sock puppet and keyboard.jpg (Alexbrn, CC BY-SA 4.0)

Pardon me, Mr. President, have you seen my socks?

Get-out-of-jail-free card (Mark Strozier, CC BY 2.0)

Since his inauguration on January 20, US President Donald Trump has been passing out a lot of get-out-of-jail-free cards: pardons, remissions of fines and restitution, and other forms of executive clemency. But this isn't exactly free. Chris Christie divides Trump's clemencies into three categories:

"Pay-to-play pardons" implies that Christie believes that Trump is, in effect, selling pardons. He is not the only one.

Looking through a list of recent pardons,[1][2] you may see some stories of people you agree ought to have been pardoned, and some that look like they've simply had their records whitewashed.

Is whitewashing the way the world works now, or is it just that way in Washington, DC? Does it work that way, for example, on Wikipedia?

This Signpost investigation examines how it has worked on Wikipedia for some of the same people Trump has pardoned. For reasons of space, I limit the sample to white-collar criminals, each of whom, of course, needs to have an article on Wikipedia.

Have the articles been whitewashed? Have the subjects paid for someone to do the whitewashing? That can be difficult to determine, but there are good, if not perfect, records of which editors have edited each article and how often, which editors have been blocked by our administrators for undisclosed paid editing (UPE), and which editors have been blocked via a sockpuppet investigation (SPI). Sockpuppets (or just "socks") are alternate accounts used by editors to deceive other editors. A large group of socks is sometimes called a "sockfarm"; often SPIs show information that indicates which socks belong to which farm, and who the farm's usual types of customers are. There are limits, however, on how much we can learn from examining Wikipedia's extensive editing records. For example, some editors may try to embarrass an article subject by mimicking a subject's supporter in a Joe job. Wikipedia's rules also limit what we can publish about socks – or about any editor, for that matter.
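As a rough illustration of the record-checking this relies on, the sketch below pulls an article's revision authors from the live MediaWiki API and reports how many of them currently carry a block. It is only a sketch, under stated assumptions: the article title is just an example, it sees only current blocks (not expired ones), and it cannot say whether a block was for UPE or sockpuppetry; that classification requires reading block logs and SPI pages.

```python
# Minimal sketch: list an article's revision authors via the MediaWiki API,
# then check which of them are currently blocked.
import requests

API = "https://en.wikipedia.org/w/api.php"

def revision_authors(title: str) -> set[str]:
    """Return the set of usernames that have edited the given article."""
    authors, cont = set(), {}
    while True:
        params = {"action": "query", "format": "json", "prop": "revisions",
                  "titles": title, "rvprop": "user", "rvlimit": "max", **cont}
        data = requests.get(API, params=params).json()
        for page in data["query"]["pages"].values():
            authors.update(r["user"] for r in page.get("revisions", []) if "user" in r)
        if "continue" not in data:
            return authors
        cont = data["continue"]  # follow API continuation to older revisions

def currently_blocked(users: set[str]) -> set[str]:
    """Return the subset of users that currently carry a block."""
    out, users = set(), sorted(users)
    for i in range(0, len(users), 50):  # the API accepts up to 50 users per call
        params = {"action": "query", "format": "json", "list": "users",
                  "ususers": "|".join(users[i:i + 50]), "usprop": "blockinfo"}
        data = requests.get(API, params=params).json()
        out |= {u["name"] for u in data["query"]["users"] if "blockid" in u}
    return out

authors = revision_authors("Rod Blagojevich")
print(f"{len(currently_blocked(authors))} of {len(authors)} editors are blocked")
```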

White collar criminals

Rod Blagojevich, December 9, 2008 (US Marshals Service, public domain)

The editing records of articles on 18 convicted felons, all granted clemency this year by Trump, and of four enterprises associated with them, were examined. Editors who were later blocked for undisclosed paid editing or sockpuppetry, or who were blocked by checkusers, were recorded, as was the number of edits each made to each article. Most of these editors were likely working for the article subject, but some may have been working against the subject, perhaps for competitors or political enemies. Not all of the blocked socks or paid editors were making controversial edits; there are almost always some who make grammatical fixes, small updates, or other housekeeping-style edits.

Each entry below gives the article(s), the subject's legal milestones, the number of apparent socks, the number of sock edits, and selected socks and sockfarms.
Rod Blagojevich
  • 2009 impeached and removed as governor
  • 2012 Federal conviction
  • 2012–2020 time served
  • 2020 clemency
  • 2025 full pardon
Apparent socks: 40 · Sock edits: 94
Selected socks and sockfarms: Hello4321 (also a sockfarm name), Levineps, Freakshownerd, Oriole85, User:ChildofMidnight, User:[email protected]
Michael Grimm (politician)
  • 2014 indicted (20 counts)
  • 2014 pleaded guilty (1 count)
  • 2015 served 7 months in prison
Apparent socks: 15 · Sock edits: 147
Selected socks and sockfarms: Screwball23, Champaign Supernova, CFredkin, Marquis de la Eirron, DisuseKid, [email protected]
Trevor Milton, Nikola Corporation
  • 2021 indicted for 3 counts of fraud
  • 2022 convicted but not imprisoned pending appeal
  • 2025 pardoned
Apparent socks: 14 · Sock edits: 29
Selected socks and sockfarms: Harley.M.X, Dimentow, Sonarsavvy (UPE), Deltagammaz, Yoodaba (2 editors)
Todd Chrisley, Julie Chrisley (0 socks), Chrisley Knows Best
  • 2019 indicted on 12 bank fraud, wire fraud, and tax evasion counts
  • 2022 convicted on all counts: Todd sentenced to 12 years in prison, Julie to 7 years. They were to pay $17.8 million in restitution.
  • 2023 began prison sentences
  • 2024 convictions upheld
  • 2025 full and unconditional pardon
Apparent socks: 11 · Sock edits: 26
Selected socks and sockfarms: Highwatermark1, CLCStudent, FoCuSandLeArN, [email protected]
John G. Rowland
  • 2003 indicted for fraud
  • 2004 resigned governorship, pleaded guilty
  • 2005–2006 time served
Apparent socks: 10 · Sock edits: 14
Selected socks and sockfarms: Voter turnout252, CLCStudent, Love of Cory, Marquis de la Eirron, DisuseKid
Jeremy Young Hutchinson
  • 2018 indicted on 12 wire and tax fraud charges and resigned state senate seat
  • 2019 pleaded guilty
  • 2022 and 2023 sentenced to two 4 year prison terms, entered prison
  • 2025 released with full pardon
Apparent socks: 8 · Sock edits: 26
Selected socks and sockfarms: AmericanPolitical19, Sectra0, Virgoikonio, Marquis de la Eirron (3 editors)
Carlos Watson (journalist), Ozy Media
  • 2023 arrested for securities fraud
  • 2024 convicted and sentenced to ten years in prison
  • 2025 sentence commuted on same day the prison term was to start
Apparent socks: 8 · Sock edits: 12
Selected socks and sockfarms: Love of Cory, Oriole85, DisuseKid, Levineps
Ben Delo, Arthur Hayes (banker), BitMEX
  • 2020 BitMEX settled with the CFTC by paying $100 million
  • 2022 Delo and Hayes pleaded guilty to money laundering charges. They were fined $10 million each and Delo was sentenced to 30 months in prison
  • 2025 received full and unconditional pardons
Apparent socks: 6 · Sock edits: 8
Selected socks and sockfarms: Lunar Clock, Portuportu2, Vimcix, Yeatkai2006, FrankTursetta, Ituxiaoyao (UPE)
Brian Kelsey
  • 2021 indicted on 5 charges of campaign finance violations
  • 2022 pleaded guilty on two charges
  • 2023 sentenced to 21 months in prison
  • 2025 served 15 days in prison before being pardoned and released
Apparent socks: 4 · Sock edits: 16
Selected socks and sockfarms: Devinn, Didsomeonesaybacon, Virgoikonio, Ludivine, Marquis de la Eirron
Michele Fiore
  • 2024 indicted on 6 counts of wire fraud and 1 count of conspiracy (to defraud a charity), convicted on all counts
  • 2025 pardoned three weeks before scheduled sentencing.
Apparent socks: 4 · Sock edits: 4
Selected socks and sockfarms: Kbabej, Nvpolitico, Love of Corey, Parsley Man, DisuseKid (2 editors)
P.G. Sittenfeld
  • 2020 arrested for accepting bribes
  • 2021 convicted for bribery and extortion
  • 2022 sentenced to 16 months in prison.
  • 2025 appeal denied, pardoned
3 18 [email protected]
Tracescoops
Virgoikonio

Rowssusan
Marquis de la Eirron
Devon Archer, Draft:Jason Galanis
A very complicated legal history:
  • 2018 Galanis pleaded guilty to securities fraud and was sentenced to 14 years in prison; Archer convicted, with multiple appeals
  • 2022 Archer sentenced to one+ year in prison (never served).
  • 2025 Archer fully pardoned, Galanis sentence commuted
Apparent socks: 3 · Sock edits: 6
Selected socks and sockfarms: Miner Editor, MaryBeth1, SwisterTwister (proposed deletion only), Hannibal Corrector, Emetman
Imaad Zuberi
  • In 2019 and 2020 pleaded guilty to 4 charges involving falsifying records, obstruction of justice, illegal foreign campaign donations, and tax evasion
  • 2021 sentenced to 18 years in prison and $16.5 million in fines and restitution. Reported to prison
Apparent socks: 1 · Sock edits: 2
Selected socks and sockfarms: Williamsdoritios
Paul Walczak, Draft:Elizabeth Fago
  • April 2024 indicted on ten or more counts of tax evasion
  • November 2024 pleaded guilty to two counts
  • February 2025 pleaded guilty
  • April 2025 sentenced to 18 months in prison and ordered to pay restitution of $4.4 million
  • April 2025 received a full and unconditional pardon
Apparent socks: 0 · Sock edits: 0
James Callahan (unionist)
Apparent socks: 0 · Sock edits: 0

Rod Blagojevich is probably the best-known person granted a pardon this year. He was the governor of Illinois from 2003 to 2009; the article about him was created 21 years ago. He was removed from the governor's office by impeachment by the Illinois State House of Representatives and conviction by the Illinois State Senate. Following two federal trials and several appeals, he was convicted of 13 government corruption charges, including wire fraud and extortion. He'd made the mistake of telling FBI agents on tape that "I've got this thing, and it's fucking golden. I'm just not giving it up for fucking nothing," referring to the U.S. Senate seat vacated when Barack Obama ascended to the presidency.

Forty apparent sockpuppets have edited the article a total of 94 times. User:Hello4321 was the most active apparent sock on the Blagojevich article, and his edits seemed to favor Blagojevich. Sockpuppet investigators named a sockfarm of over 20 socks after Hello4321; other socks in the farm were noted for editing articles about politicians, courts, and journalists, as well as for aggressive edit warring.

Blagojevich was released from prison in 2020, having served almost eight years of his 14-year sentence, following his first grant of clemency from Trump. It's not clear what additional benefit he gets from this year's full pardon. It is unlikely, however, that none of the paid socks, whether Hello4321 or any of the 39 others, edited the article to favor Blagojevich.

Michael Grimm was a Republican congressman who represented part of New York City from 2011 to 2014. He was indicted on 20 counts of fraud, federal tax evasion, and perjury in 2014. As part of a plea deal, he pleaded guilty to one count of felony tax fraud and served seven months of his eight-month sentence. In 2024 he was paralyzed in an accident while playing polo.

The article about Grimm was edited 147 times by 15 apparent sockpuppets. At least seven of the sockpuppets appear to be part of well-known large sockfarms, or made important edits.

  • User:Screwball23 made the most edits, 71, of anybody who contributed to the article, starting with Grimm's underdog primary campaign for the Republican nomination against the incumbent congressman. Screwball23's edits generally favored Grimm, at first going against the establishment Republican position. Grimm won that primary election and the next three congressional elections in his district, until he pleaded guilty in 2014 shortly after that year's election. After the 2010 primary victory, Screwball23's edits were generally consistent with establishment Republican views until the account was blocked in November 2012. Screwball23 screwed up when User:Votevotevote2012 responded to a talk page message addressed to Screwball23, admitting that both accounts were controlled by the same person.
  • User:Champaign Supernova made 29 edits on the Grimm article, generally opposing Screwball23; User:CFredkin made 16 edits there as well. Both editors were accused of making biased edits to the article about 2016 Florida Republican primary candidate David Jolly: Jolly's campaign spokesperson, User:Bascomcomm, declared that they had edited the Jolly article and leveled the accusation. [Disclosure – Smallbones reported Bascomcomm to the COI noticeboard in 2016]

In 2014 Trevor Milton founded Nikola Corporation, which began developing ecologically friendly large trucks. In 2020 short-seller Hindenburg Research produced a report characterizing Nikola's marketing as an "ocean of lies" and the company as "an intricate fraud built on dozens of lies".[3] (See also this video.)

In 2021, Milton was indicted, and the next year he was convicted on three counts of fraud. He received a four-year prison sentence with a $1 million fine and was required to pay $168 million in restitution; he was not imprisoned pending appeal. In 2024 Milton and his wife gave $1.8 million to Trump's campaign. The 2025 full pardon voided the sentence.

Fourteen apparent socks made 29 edits to either the Trevor Milton or Nikola Corporation articles. User:Deltagammaz made 10 edits and was indefinitely blocked with the account globally locked. Users Harley.M.X and Dimentow were later blocked as part of the industrial scale Yoodaba sockfarm. Each made just one edit, which was neither neutral nor very destructive.

Todd Chrisley and Julie Chrisley were reality TV stars with their Chrisley Knows Best and other shows. They borrowed about $30 million based on falsified documents. The article about Todd was edited by two sockpuppets, but no apparent socks edited the article about Julie. Nevertheless, 11 sockpuppets or UPEs edited Chrisley Knows Best.

John G. Rowland served as the Republican governor of Connecticut from 1995 to 2004. A construction firm that had state contracts did free work on Rowland's vacation home. He resigned under threat of impeachment and later pleaded guilty to a federal charge of fraud, spending almost a year in prison. The ten apparent sockpuppets who edited the article about him included two who worked with large sockfarms.

Jeremy Young Hutchinson is part of a very large Arkansas political family. He pleaded guilty to bribery, campaign finance violations, and tax fraud. He was sentenced to a total of eight years in prison by federal courts in two states and served about two years. In his request for a pardon his lawyers stated "it is absolutely clear that Democrats at the Department of Justice and within the F.B.I. chose to prosecute the case because he was a high-profile conservative legislator from a Republican family." [1]

Carlos Watson founded Ozy Media in 2013. In 2021 The New York Times reported that Watson made serious misrepresentations to investors. Ozy's board closed the company the same day, though some operations continued for more than a year. He was arrested for fraud in 2023, convicted in 2024 and sentenced to ten years in prison. In 2025 Trump commuted the sentence on the same day Watson was to report to prison.

Ben Delo and Arthur Hayes are two of the three founders of BitMEX, a large cryptocurrency exchange. Like most of the crypto-establishment they've argued for years that old financial regulations don't, or at least shouldn't, apply to the new cryptocurrencies. The US regulator, the CFTC, disagreed and fined BitMEX $100 million because the exchange was trading with US residents without registering as an exchange or following anti-money laundering (AML) rules. This is just standard procedure in US financial regulation. The CFTC later fined Delo and Hayes $10 million each. Delo and Hayes then pleaded guilty to money laundering charges.

Six apparent sockpuppets or UPEs have edited the Delo, Hayes, and BitMEX articles. None of them appear to be part of the large sockfarms, but they do appear to be socks.

Brian Kelsey was a Tennessee state senator from 2009 to 2022. In 2017 he ran for the US Congress and violated federal campaign finance law; he pleaded guilty in 2022 and received a 21-month sentence. He entered prison in 2025 and was pardoned 15 days later. The Kelsey article was edited by the Marquis de la Eirron sockfarm, which edited four other articles in this report, and by the strange Ludivine sockfarm. That farm was originally identified because its members all edited "Ritchie Blackmore, French actresses/film, (and) Tennessee politics."

Michele Fiore, a long-term right-wing Nevada Republican politician, rallied support for Trump within days of the January 6 insurrection. She also raised $70,000 through a nonprofit for a memorial to Nevada police officers killed in the line of duty, then stole the money. Two of the four sockpuppets who edited the article about her were part of a 60-member sockfarm.

P.G. Sittenfeld was a Democratic Cincinnati city councilman convicted of taking bribes and extortion. In 2023 he was sentenced to 16 months in prison plus restitution, but served less than 5 months. In 2025 he received a full and unconditional pardon. Two of the three apparent sockpuppets who edited the article about him were members of large sockfarms.

Devon Archer and Jason Galanis were both convicted of fraud. They raised $60 million in the bond market for the Oglala Sioux tribal government and then used the funds for their own purposes. They also worked with Hunter Biden on different projects. In 2024 they testified in a Congressional closed-door inquiry about impeaching President Joe Biden. The impeachment drive failed. The article about Archer was only edited by one apparent sockpuppet. An article about Galanis was deleted and remains as a draft. It only had two apparent sockpuppets and in any case seems to have nothing to do with their legal problems or the Bidens.

Imaad Zuberi was a high-level fundraiser for presidential candidates – including Hillary Clinton, Obama, and Trump – who specialized in raising illegal campaign donations from foreigners.

Paul Walczak, who made $360,000 per year as the CEO of a nursing care company, apparently really doesn't like paying taxes. He didn't even file a tax return for three years, 2019–2021. He did, however, collect withholding taxes from his employees for income tax, Medicare, and Social Security, but never turned that money over to the IRS. All told, he failed to pay more than $10 million in taxes; instead he bought a $2 million yacht. In April 2024 he was indicted on about ten counts of tax-related crimes and then sprinted through the court system.

In November, soon after Trump won the presidential election, Walczak pleaded guilty to two counts; in February he pleaded guilty to the rest of the charges. On April 11 he was sentenced to 18 months in prison with $4.4 million in restitution. At about the same time, Walczak's mother, Elizabeth Fago, attended a $1 million-per-person fundraising dinner for MAGA Inc. at Mar-a-Lago. On April 23, 12 days after sentencing, his pardon came through, allowing him to avoid both prison and the $4.4 million in restitution.

The saga of the Wikipedia article about Paul Walczak might be equally unbelievable. There is no article about him (yet). But a draft article was submitted about his mother Elizabeth Fago in 2014. The editor who submitted the 400 word draft made only a single edit on Wikipedia. They registered their account, and 10 minutes later saved the draft (archive), never to be heard from again.

James Callahan was general president of the International Union of Operating Engineers when he received sports tickets and similar items worth $315,000, from a company that worked for the union placing advertisements. He did not report the receipt of these items to the Department of Labor as required. Apparently no UPEs or sockpuppets edited the article about Callahan.

Did paid socks try to whitewash felons on Wikipedia?

I've examined Wikipedia editing records for evidence on whether the 18 convicted white-collar felons given clemency by Trump this year hired editors to whitewash the articles about themselves. Trump has claimed that these people were treated badly by previous administrations and that he has corrected these miscarriages of justice. A brief survey of the mainstream reliable press and the Department of Justice's website does not confirm those statements. There were real crimes committed. They've had their day in court, and ten of them used it to plead guilty. Another four lost an appeal, and the others were convicted in open court by a judge or jury. Eight never served a day in prison on the charges; another served only 15 days. Two had already been released from prison. About six of those pardoned may no longer have to pay fines or restitution that was still owed.

Two of the felons had no blocked socks edit the articles about themselves, and a third had only a single blocked-sock edit. It would be difficult to conclude that they paid for a whitewash.

Eight felons had articles edited by 8–40 blocked socks, which would make it difficult to conclude that they had not paid for a whitewash. The remaining seven had between three and seven blocked socks edit their articles. I conclude that about half had paid for whitewashing.

The socks tended to come from the same sockfarms. Five of the articles were edited by the Marquis de la Eirron farm and four by the DisuseKid farm, with two of the articles edited by both. Four were edited by the [email protected] farm, two of which overlap the above group. The large-scale Yoodaba sockfarm edited only one. Thus ten of the articles were covered by these four large sockfarms. Again, it looks like about half (or more) of the felons employed these farms. Wikipedia's community, sockpuppet investigators, and checkusers should be congratulated for tracking down all these socks. These volunteers might be able to give the professional pardon screeners at the Department of Justice a few pointers.

References

  1. ^ a b New York Times list
  2. ^ DOJ pardons 2025
  3. ^ Nikola: How to Parlay An Ocean of Lies Into a Partnership With the Largest Auto OEM in America (Report). Hindenburg Research. 2020-09-10. Archived from the original on 2020-09-10. Retrieved 2020-09-11. Our Conclusion: Nikola is a Massive Fraud Constructed on Dozens of Lies



Reader comments

File:What should the Wikipedia community do in this moment (WikiWorkshop 2024 keynote, slide 40).pdf (Brent Hecht / Nohat, Wikimedia, CC BY-SA 3.0)

Wikipedia's political bias; "Ethical" LLMs accede to copyright owners' demands but ignore those of Wikipedians


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

Wikipedia's political bias: Halfway to liberal

"Distribution of Wikipedia's news media citation political polarization scores [...]
Negative: liberal; positive: conservative" (from the paper)
Reviewed by Clayoquot

Is Wikipedia "Wokepedia" as some have claimed? A 2024 paper[1] by Puyu Yang and Giovanni Colavizza sheds some light on the question. It adds to a corpus of research on ideological bias on Wikipedia; some previous studies have found leftist bias and one study found a center-right bias. The authors of the present study (whose previous work includes several papers on Wikipedia citations) had already reported on it in a 2022 preprint (see our earlier review), but it has since been published in the peer-reviewed journal Online Information Review, with some changes including an updated abstract.

The paper looks at the English Wikipedia's citations to news sources and associates each source with a score corresponding to its political bias. The bias scores come from a dataset called Media Bias Monitor (MBM), described in this 2018 paper.

The MBM dataset is based on the propensity of Facebook users to share links to particular sources. For instance, it presumes that if a source is shared more by self-identified liberals than by self-identified conservatives, the source has a liberal bias.

Yang and Colavizza find that on a scale ranging from –2 (very liberal) to +2 (very conservative), the average Wikipedia news citation has a score of –0.5, halfway between "moderate" and "liberal".
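To make the scale concrete, here is a toy version of that audience-based scoring idea: a source's score is the average self-reported ideology of the users sharing it, on the same –2 to +2 scale. The share counts are invented for illustration; the actual MBM methodology is described in the 2018 paper.

```python
# Toy audience-based polarization score on MBM's -2 (very liberal)
# to +2 (very conservative) scale. Share counts below are invented.
IDEOLOGY = {"very liberal": -2, "liberal": -1, "moderate": 0,
            "conservative": 1, "very conservative": 2}

def polarization_score(shares: dict[str, int]) -> float:
    """Share-weighted average ideology of the people sharing a source."""
    total = sum(shares.values())
    return sum(IDEOLOGY[group] * n for group, n in shares.items()) / total

example = {"very liberal": 300, "liberal": 500, "moderate": 400,
           "conservative": 250, "very conservative": 50}
print(polarization_score(example))  # -0.5: between "moderate" and "liberal"
```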

Could editors be preferring liberal news sources because they are more factually accurate? The paper anticipates this question. Through further analysis using ratings of factual reliability from Media Bias/Fact Check, Yang and Colavizza conclude that the favouring of liberal sources "persists when accounting for the factual reliability of the news media."

The authors say their findings "can be attributed to several factors, including the political leanings of Wikipedia contributors, the prominence and accessibility of liberal-oriented news sources, and potential methodological biases in gauging political polarization." With regard to the last two factors, The Guardian, which makes up more than half of Wikipedia's "very liberal" citations, owes some of its popularity to its open access. Its classification as "very liberal" is debatable, as other sources have described it as closer to the centre.

See also our earlier reviews of related research, in particular:


"Ethical" LLMs accede to copyright owners' demands but ignore those of Wikipedians

Reviewed by Tilman Bayer

Several years into the AI boom, controversies rage on about whether and to what extent the training of models on copyrighted material is covered by fair use or instead requires permission from copyright owners. Numerous lawsuits about the matter are still making their way through the courts in the US and elsewhere. And although various US judges have already dismissed many overwrought claims by copyright industry plaintiffs, independent legal scholars still consider it possible that some others may succeed. Separately, legislative changes have been proposed in several countries to either tighten or loosen requirements around AI training.

Two groups recently waded into these debates by releasing datasets for LLM training that are based only on text that is either in the public domain or "permissively licensed" (such as from Wikipedia), with the implication that they are free of such concerns. Notably, both have also already trained their own LLMs on these datasets, and claim competitive performance for the resulting models.

French startup PleIAs earlier this month published a preprint titled "Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training"[2]. From the abstract:

Large Language Models (LLMs) are pre-trained on large amounts of data from different sources and domains. These data most often contain trillions of tokens with large portions of copyrighted or proprietary content, which hinders the usage of such models under AI legislation. This raises the need for truly open pre-training data that is compliant with the data security regulations. In this paper, we introduce Common Corpus [https://huggingface.co/datasets/PleIAs/common_corpus], the largest open dataset for language model pre-training. The data assembled in Common Corpus are either uncopyrighted or under permissible licenses and amount to about two trillion tokens. The dataset contains a wide variety of languages [...]

This paper (presented as a "technical report") follows several earlier announcements for various versions of the same corpus, which had attracted media coverage as early as April 2024 ("This French start-up just proved OpenAI wrong. It claims you can train AI on non-copyrighted data").

Three days later, on June 5, a group of 28 authors (from e.g. the University of Toronto and other North American universities, EleutherAI, and Hugging Face) announced "The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text"[3]. From the abstract:

Large language models (LLMs) are typically trained on enormous quantities of unlicensed text, a practice that has led to scrutiny due to possible intellectual property infringement and ethical concerns. Training LLMs on openly licensed text presents a first step towards addressing these issues, but prior data collection efforts have yielded datasets too small or low-quality to produce performant LLMs. To address this gap, we collect, curate, and release the Common Pile v0.1 [https://huggingface.co/common-pile], an eight terabyte collection of openly licensed text designed for LLM pretraining. The Common Pile comprises content from 30 sources that span diverse domains including research papers, code, books, encyclopedias, educational materials, audio transcripts, and more.


The datasets, and how Wikimedia projects are represented in them

As is well known, Wikipedia text has long been a staple of LLM training – either via the dumps provided by the Wikimedia Foundation itself, or (perhaps more frequently) as part of larger datasets such as that of Common Crawl or "The Pile", which cover a large number of websites regardless of their copyright licenses. Common Crawl, a US-based nonprofit, has been offering these to the public since about 2008, relying on fair use. Similarly, "The Pile" was compiled by the nonprofit EleutherAI (also one of the driving forces behind the new "Common Pile" dataset) and distributed under fair use. But as the provision of datasets forms a significant early step in what one extensive legal analysis calls "the generative-AI supply chain", such invocations of fair use have come under increased scrutiny more recently. This evidently contributes to both groups' motivation for providing datasets that can be hosted everywhere without such legal considerations, by confining themselves to public domain and "permissively licensed" material – such as Wikipedia.

"The sources comprising the Common Pile, categorized by textual domain" (figure from the paper)

While Wikipedia and its sister projects form only a relatively small part of both datasets (see below for more detail), they are interesting from a Wikimedian perspective for several reasons. Firstly, they make much more content from Wikimedia projects available than what is commonly used in LLM training (which has often been confined to mainspace content from English Wikipedia, at least according to what is publicly known).

The Common Pile group

downloaded the official database dumps from March 2025 of the English-language wikis that are directly managed by the Wikimedia foundation [...]. These database dumps include the wikitext — Mediawiki's custom markup language — for each page as well as talk pages, where editors discuss changes made to a page. [...]
The Common Pile includes the following Wikimedia wikis: Wikipedia, Wikinews, Wikibooks, Wikiquote, Wikisource, Wikiversity, Wikivoyage, and Wiktionary.

(not, however, Wikidata)

PleIAs/Common Corpus on the other hand only draws from two of these eight Wikimedia projects, namely Wikipedia and Wikisource. But it includes several languages, and uses the newer HTML dumps instead of wikitext dumps:

Wikimedia projects have always been major sources for language model training due to their reliability, extensive coverage, and textbook-like style. Despite this centrality, there is still a range of unresolved challenges with the most common versions available for training. The raw source of Wikimedia projects is made available in a specific mediawiki syntax, including a lot of project-specific models, tags, and conventions. The parsing of models is especially not straightforward, as they can either format existing text or remove or include external content (transclusion). As part of Wikimedia Enterprise, the Wikimedia Foundation created entirely new dumps from the rendered HTML sources, which in effect ensure that they include all the text made available to readers.

Here, project-specific models apparently means templates, as a false friend mistranslation of the French modèle (Pierre-Carl Langlais, the vocal co-founder of PleIAs, is a longtime admin on French Wikipedia himself as User:Alexander Doria). The Common Corpus paper leaves it open how much of an issue this parsing of templates is in practice, or how much of an improvement the use of HTML dumps yields for the purpose of LLM training. In contrast, the Common Pile group simply converted wikitext to plain text using wtf_wikipedia.
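For comparison, the conversion step itself is short in either ecosystem. Below is a rough Python equivalent using the mwparserfromhell wikitext parser; the Common Pile group's wtf_wikipedia is a JavaScript tool, so this is a stand-in rather than their actual pipeline, and the sample wikitext is invented.

```python
# Strip wiki markup (bold, links, templates) down to plain text.
import mwparserfromhell

wikitext = "'''Dopamine''' is a [[neurotransmitter]]{{citation needed}} made in the brain."
plain = mwparserfromhell.parse(wikitext).strip_code()
print(plain)  # -> Dopamine is a neurotransmitter made in the brain.
```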

"Wikimedia Enterprise" refers to the API products of the Wikimedia Foundation's for-profit subsidiary Wikimedia LLC. The Wikimedia Enterprise HTML dumps had been available for public download since 2021 on the same dumps.wikimedia.org site as the regular wikitext dumps, but recently were taken down there and now require a signing up for a free account with Wikimedia Enterprise.

Apparently separately, in February 2025 Wikimedia Enterprise announced a partnership with PleIAs under which Pleias has leveraged Wikimedia Enterprise's structured datasets to develop verifiable language models, multilingual content enriched with metadata and credibility signals like RevertRisk, pre-parsed infoboxes, sections, and summaries. This presumably refers to the separate Structured Contents snapshots – see also last month's Signpost coverage: "New version of AI-optimized Wikipedia dataset released on Kaggle". The current paper does not mention this data yet. Wikimedia Enterprise keeps the list of its paying customers confidential (apart from a few exceptions like Google), so it's not clear whether PleIAs is among them or whether it is using this Wikimedia data without financial compensation.

The "Impact" section of the Common Corpus paper lists various third-party uses of the dataset, and last month Langlais posted "Happy to see Common Corpus has grown to become one of the most popular pretraining dataset on @huggingface". (At the time of writing, Hugging Face listed it as the 13th most downloaded text dataset.)

The first(?) Wikidata-derived text dataset for LLM training

Of particular interest to Wikimedians is the fact that besides using Wikipedia and Wikisource, PleIAs/Common Corpus appears to be the first to make Wikidata available as a source for pretraining of language models:

"Semantic data is the latest set added to Common Corpus and currently includes only one collection: Wikidata. [...] Despite the rising interest in mixed LLM/knowledge graph methods, Wikidata has hardly been used in language models. [...] A persistent challenge has been the exclusive availability of Wikidata dumps under formats optimized for data exchange rather than language model training. Thanks to a collaboration with Wikimedia Deutschland, the entire set of Wikidata has been adapted in natural language and added to Common Corpus. This is to date the only available textual collection of Wikidata covering the entire range of 300 languages. Data processing involved the translation of items and properties into formal language sequences as simple natural language sequences, without textual synthesis: "Q41309 | P:27 | Q171150" becoming "Franz Liszt country of citizenship Kingdom of Hungary". Within each entry, we provide all the available translations as consecutive blocks separated by a newline, anticipating that this may contribute to language alignment."

(This might be regarded as a rudimentary mini version of Abstract Wikipedia.)
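A minimal sketch of that triple-to-text conversion, with the label lookup hard-coded to the one example quoted above (in a real pipeline the labels would come from the Wikidata dump itself):

```python
# Verbalize a Wikidata triple as a plain-language sequence, following the
# scheme quoted above. The three labels are hard-coded for illustration.
labels = {"Q41309": "Franz Liszt",
          "P27": "country of citizenship",
          "Q171150": "Kingdom of Hungary"}

def verbalize(triple: str) -> str:
    """Turn 'Q41309 | P:27 | Q171150' into a natural-language sequence."""
    subj, prop, obj = (part.strip().replace("P:", "P") for part in triple.split("|"))
    return " ".join(labels[x] for x in (subj, prop, obj))

print(verbalize("Q41309 | P:27 | Q171150"))
# -> Franz Liszt country of citizenship Kingdom of Hungary
```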

License due diligence

The Common Pile authors emphasize that they "put a lot of work into our metadata" – specifically, vetting the license information that accompanies each piece of content in their dataset. A section in the paper covers "License due diligence", for example pointing out the common pitfall [of] "license laundering," where a copyrighted work is redistributed (typically by a non-rights holder) with an incorrect license (a problem that is well-known to Wikimedians, too).

The PleIAs/Common Corpus paper on the other hand seems less concerned with such problems (it instead devotes more space to data issues that might affect regulatory compliance in the EU, such as toxicity detection and PII removal). Correspondingly, the Common Corpus data seems rather cavalier with license information, containing thousands of rows whose license field provides only vague information like "Various open licenses", or (including for Wikipedia content) specifies the license as "CC-By-SA" without a version number. (Such issues were pointed out by this reviewer over half a year ago already, and acknowledged at the time with a "Yes it's planned", but don't seem to have been addressed yet.) Similarly, Table 3 in the Common Corpus paper provides token counts by "license type", but confusingly lists "CC-By" and "CC-BY-4.0" as separate types.
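The kind of metadata audit at issue here is easy to sketch: flag any row whose license field is not a precise, versioned identifier. The rows and the accepted-license pattern below are illustrative assumptions, not PleIAs' actual schema.

```python
# Flag dataset rows with vague license fields (no version number,
# catch-all labels). Rows and pattern are invented for illustration.
import re

PRECISE = re.compile(r"^(CC-BY(-SA|-NC)*-\d\.\d|CC0-1\.0|Public Domain)$", re.I)

rows = [
    {"id": 1, "license": "CC-BY-SA-4.0"},          # fine: versioned
    {"id": 2, "license": "CC-By-SA"},               # vague: no version
    {"id": 3, "license": "Various open licenses"},  # vague: catch-all
]

for row in rows:
    if not PRECISE.match(row["license"]):
        print(f"row {row['id']}: vague license field {row['license']!r}")
```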

Assuming that the provided license and attribution information is correct though (or will be fixed), few Wikipedians are likely to object to their content being included in these datasets. In the big picture, those join the numerous mirrors and forks that are intentionally enabled by the projects' free license and have existed almost since Wikipedia's founding.

The "ethical" LLMs that both projects train on their respective datasets present more complex questions, though.

The "ethical" models and their performance

The Common Pile authors ask "Can you train a performant language model using only openly licensed text?" and report a positive answer:

Crucially, we validate our efforts by training two 7 billion parameter LLMs on text from the Common Pile: Comma v0.1-1T and Comma v0.1-2T, trained on 1 and 2 trillion tokens respectively. Both models attain competitive performance to LLMs trained on unlicensed text with similar computational budgets, such as Llama 1 and 2 7B.

However, "competitive" has to be taken with a big grain of salt, or at least a thorough understanding of the qualifiers. For example, the Llama 2 model chosen for comparison dates from 2023, eons ago in GenAI terms. And even back then was the smallest and least capable LLM of Meta's LLama model family. AI expert Simon Willison called the Comma models "promising". But he also seemed a bit underwhelmed by their performance in a quick test, pointing out among other limitations that right now [...] it's a raw base model—it hasn't been instruction-tuned or set up for chat (unlike e.g. ChatGPT or most of the open-weight LLMs that have seen wider usage in recent times), which makes it a lot harder to evaluate. (On the other hand, as indicated by the "v0.1", the group still expects to be able to train bigger and better models on freely licensed data in the future.)

PleIAs' paper on the other hand provides less information about the models that the startup has already trained on its Common Corpus dataset. Instead, the company had described them briefly in a December 2024 blog post:

Today we release Pleias 1.0 models, a family of fully open small language models. Pleias 1.0 models include three base models: 350M, 1.2B, and 3B parameters. They feature two specialized models for knowledge retrieval with unprecedented performance for their size on multilingual Retrieval-Augmented Generation, Pleias-Pico (350M parameters) and Pleias-Nano (1.2B parameters).
These represent the first ever models trained exclusively on open data, meaning data that are either non-copyrighted or are published under a permissible license. These are the first fully EU AI Act compliant models. In fact, Pleias sets a new standard for safety and openness.

Like with the Common Pile paper, these "unprecedented performance" claims should be appropriately contextualized. While the Common Pile authors at least evaluated their LLMs with a number of standard, widely used third-party benchmarks, the PleIAs authors relate that this was not possible in their case because The most popular generalist benchmarks are not suitable for evaluating small models. Therefore, they had to grade themselves: Instead, we develop targeted benchmarks to evaluate key capabilities that are essential to our desired downstream application, namely for RAG (retrieval-augmented generation), the models' ability to refrain from switching languages while generating text for a variety of EU languages, and their avoidance of generating toxic or harmful text (in order to comply with EU regulations). PleIAs later also released some derived models specifically dedicated to the RAG use case.

The small size and limited performance of both groups' models seem entirely understandable given their presumably limited compute budgets. (Also, PleIAs' December 2024 announcement describes them as "small language models", a term that some have tried to establish recently for large language models with only up to a few billion parameters.)

Still, one can't help wondering how much this might limit their value as evidence for bold claims that fair use (i.e. training on copyrighted text) is unnecessary for producing useful and competitive LLMs. As mentioned, PleIAs' largest model has 3 billion parameters and the Common Pile group's Comma models have 7 billion, whereas some publicly known LLMs have already surpassed 1 trillion (1000 billion) parameters. While smaller models have many use cases, this is one of several reasons to doubt whether Common Corpus or Common Pile could form the basis of a model that performs as well as e.g. the current versions of ChatGPT or Claude.


Is Wikipedia "central" for modern AI, or merely the fifth most important freely licensed source for LLM training?

Somewhat in contrast to claims about the centrality of Wikipedia for contemporary AI, content from Wikimedia projects forms only a minority of the tokens in each of the two datasets, as can be seen in the above figure in the case of Common Pile. (This remains true if one disregards the public domain parts and focuses on the parts that are copyrighted but under a free license. In fact, in the Common Pile dataset, even non-Wikimedia wikis make up a slightly larger share than Wikimedia wikis.)

But another aspect of the Common Pile paper of specific interest to Wikimedians is that the authors conduct an evaluation of the quality and relative importance of the different sources (in terms of what they contribute to LLM performance):

"Recent work [...] has shown that up- or down-weighting pre-training data sources in accordance with some notion of data quality can produce more performant models. Indeed, the sources in the Common Pile vary drastically in their characteristics, and we don't necessarily expect that our largest sources contain the highest quality text. For example, patent text sourced from the USPTO (our second-largest source) exhibits substantially different wording, terminology, and repetition than typical natural language. [...] To determine mixing weights, we first trained per-source language models [....] for 28 billion tokens on all sources that were sufficiently large to be repeated less than four times at this data budget. Based on the performance of these per-source models, we heuristically set mixing weights to up- and down-weight high- and low-performance sources respectively [...]

In the filtered dataset used for this training, Wikimedia wikis contributed 57.4 GB out of a total size of 1838.3 GB, i.e. about 3.1%. (On a side note, Table 5 indicates that the Wikimedia projects were among the sources that did not require filtering for "toxicity", as opposed to e.g. "Ubuntu IRC", "Pre-1929 Books", or the non-Wikimedia wikis scraped by Wikiteam.)

Per "Table 7: Overview of the data mixing used to up/down-weight individual sources", Wikimedia wikis were assigned the highest weight (6 repetitions). However, they shared that honor with 18 other sources (i.e. the majority), such as arXiv, Foodista, LibreTexts, peS2o (a corpus of open access scientific publications), StackExchange and Ubuntu IRC. The downweighted sources with the lowest number of repeats (0.25) were Biodiversity Heritage Library, Library of Congress, USGPO and USPTO.

While this weight number can be regarded as a rough proxy for a source's quality from the perspective of LLM training, its resulting share of tokens in the overall training corresponds to its importance. Here Wikimedia wikis ended up at 8.616%, behind peS2o (27.409%), CC Common Crawl (8.716%), StackExchange (13.469%) and Stack V2 (13.009%). ("Stack V2" contains open source software code, and the "CC Common Crawl" slice consists of freely licensed web pages from the internet-wide Common Crawl; presumably without wikis, as the authors performed "deduplication across all sources".)
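The arithmetic behind these shares is simple: a source's share of the training mix is its token count times its repeat weight, normalized over all sources. A toy sketch, where only the Wikimedia size (57.4 GB) and the weights 6 and 0.25 come from the paper; the other numbers are invented:

```python
# Effective share of a weighted training mix: size * repeats, normalized.
sources = {
    "wikimedia_wikis": (57.4, 6),    # (size in GB, repeat weight)
    "peS2o": (400.0, 4),             # illustrative size
    "uspto_patents": (500.0, 0.25),  # illustrative size; down-weighted
}

weighted = {name: size * reps for name, (size, reps) in sources.items()}
total = sum(weighted.values())
for name, w in sorted(weighted.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {w / total:.1%} of the weighted mix")
```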

In other words, there is now objective evidence justifying the statement that Wikimedia wikis form the fifth most important freely licensed source for LLM training. This is somewhat in contrast to e.g. claims about Wikipedia's central role in the development of modern AI, to quote the framing chosen in the keynote of last year's Wiki Workshop conference by Microsoft's Brent Hecht (known to Wikimedians for various influential research publications during his time in academia). According to Hecht, there is a [s]trong argument [that Wikipedia] was the single most important dataset for AI research since about 2005.

One might be able to reconcile these different perspectives by recognizing the possibility – as Hecht did in the Q&A for his keynote (in response to a question by this reviewer) – that Wikipedia's early popularity as an AI dataset might have been caused by its convenience rather than – or in addition to – the unique qualities of its content. (The Wikimedia Foundation had been providing its easily downloadable dumps since at least 2003.) What's more, some other claims about Wikipedia's supposed centrality for AI simply stretch the facts. For example, in a 2023 article titled "Wikipedia's value in the age of generative AI", the Wikimedia Foundation's Chief Product and Technology Officer Selena Deckelmann proudly proclaimed that To date, every LLM is trained on Wikipedia content, and it is almost always the largest source of training data in their data sets. However, the study cited in that sentence ranked Wikipedia at #2, behind a corpus of patents, and at a mere 0.19% (less than 1 in 500) of tokens in the analyzed dataset.

"The amount of openly licensed text grows steadily over time. We visualize the cumulative proportion of data created up to various cutoff dates for sources in the Common Pile with reliable creation date metadata. This includes all sources except for the Caselaw Access Project, Data Provenance Initiative, and the sources covering early 20th century Public Domain books."

As another finding of interest to the free-culture movement as a whole, the Common Pile paper provides a chart depicting "Growth rates of openly licensed data", observing that

[...] approximately half of the Common Pile (around 3.8TB) was created since 2020. This trend provides insight into the growing availability of openly licensed data and suggests a promising trajectory for future LLMs trained entirely on openly licensed sources.

While the subcategory of "Wikis" appears to be one of the slower growing ones in that chart, it still has increased about tenfold in size from 2010 to 2024.

Coming back to the AI copyright disputes mentioned at the beginning: It seems very clear that both projects are driven by a motivation to provide legal and political ammunition to the copyright industries' side in those debates, i.e. against fair use defenses for the purpose of training LLMs. This is especially evident in PleIAs' communications. For example, the company titled its aforementioned December 2024 announcement "They Said It Couldn't Be Done", with a link in the first sentence making it clear that this title is meant as a debunking of statements about the necessity of fair use in AI training made in response to an inquiry of the UK House of Lords:

"Training large language models required copyrighted data until it did not. Today we release Pleias 1.0 models, a family of fully open small language models."

(As discussed above, PleIAs offers only very thin evidence for its bold claim that "copyrighted data" is no longer required, at least if that is understood to refer to the kind of state-of-the-art LLMs that hundreds of millions use today in ChatGPT and its competitors.)

The Common Pile paper was likewise interpreted as weighing in on the copyright industries' side of the debate, by the Washington Post: "AI firms say they can't respect copyright. These researchers tried. [...] That could have implications for the policy debate swirling around AI and copyright" (despite objections by one of the paper's authors that it is "not a tech policy writeup. It's a machine learning research paper").

Both papers also uncritically embrace copyright maximalist interpretations of "ethical" as being incompatible with fair use, i.e. requiring that IP owners be given total control over whether their content is used in training LLMs. PleIAs' "Common Corpus" paper already does so in its title ("The Largest Collection of Ethical Data for LLM Pre-Training"). The "Common Pile" authors begin their abstract by relating training on unlicensed text directly to intellectual property infringement and ethical concerns, again without acknowledging the existence of fair use defenses. (In fact, the term "fair use" isn't mentioned anywhere in the text of the paper; it only appears once in the "References" section, in the title of a publication cited for other reasons.) And several other statements in their paper implicitly denigrate fair use as unethical, similar to PleIAs.

What's more, both papers also adopt the rhetorical strategy of copyright industry advocates to focus on individual "content creators", rather than the corporations who in reality are the largest copyright owners, say Elsevier, Bertelsmann (owner of the Brockhaus encyclopedia, whose revenue was greatly diminished by Wikipedia), Murdoch, or Getty Images. Or a company like Adobe, whose Adobe Firefly image generator is an interesting related example of an AI model advertised as "ethical" – due to having been trained solely on CC-licensed images from Wikimedia Commons and Flickr Commons or images in the public domain, as well as hundreds of millions of images and videos users uploaded to the company's own Adobe Stock. It may illustrate how some individual creators' "no training without consent" demand can turn out to be a monkey's paw wish.

Neither the PleIAs paper nor the Common Pile paper attempts to offer policy arguments to justify their use of such loaded language and promotion of anti-fair use viewpoints. The authors never acknowledge the possible harms of elevating the interests of copyright owners over fair-use protections in AI – something that e.g. the Internet Archive's Brewster Kahle warned against last month in a post titled "Protect Fair Use, Especially Now", joined by the likes of the Electronic Frontier Foundation [3]. It is also worth noting that such advocacy for copyright industry viewpoints seems unnecessary for both papers' purposes: Instead of denigrating fair use as unethical, the authors could have confined themselves to framing their efforts in terms of minimizing legal risks for particular jurisdictions and situations. Indeed, two other recent papers adopt exactly this approach (see below).

Lastly, a commonly voiced concern regarding AI, about copyrighted works being used to enrich big tech companies like OpenAI, Google or Microsoft, does not really apply here, considering e.g. that both groups are in fact themselves releasing open-weight models under a free (software) license, or calling to mind the longstanding work of nonprofits like AI2 or indeed EleutherAI (of the Common Pile group) itself on truly open LLMs that are not controlled by Big Tech. More generally, fears about enclosure, i.e. AI companies like OpenAI absorbing and monopolizing knowledge from published sources, have been greatly mitigated in recent times by the rise of competitive open-weight LLMs such as those released by DeepSeek or Meta. In fact, PleIAs' and Common Pile's efforts to build AI that must not learn from any unfree source (no matter what the actual legal constraints might be) remind this reviewer of the perennially rejected proposal that information on Wikipedia should only be drawn from open access sources and must avoid paywalled ones (WP:FUTON).

What's more, there is a strong argument that the anti-fair-use advocacy of PleIAs and the Common Pile group (such as the former's evident attempt to influence an ongoing legislative debate in the UK) could in fact facilitate such enclosures by the likes of OpenAI. This kind of danger was pointed out by the Wikimedia Foundation in its 2023 response to a consultation by the US Copyright Office (Signpost coverage: "AI policy positions of the Wikimedia Foundation"), even while also voicing concerns about unattributed usage of Wikipedia by AI developers:

[...] we encourage the Office to consider the potential impacts that changes to copyright law could have on competition among AI developers. If copyright law changes are enacted such that the acquisition and use of training materials becomes more expensive or difficult, there is a risk that dominant firms with greater resources will become further entrenched while smaller companies, including nonprofit organizations, struggle to keep up with mounting development costs.

In the case of PleIAs, one can't help wondering if such attitudes are correlated with the project's funding. The Common Corpus paper states that

It was built up with the support and concerted efforts of AI Alliance, the state start-up LANGU:IA (start-up d'Etat), supported by the French Ministry of Culture and DINUM, as part of the prefiguration of the service offering of the Alliance for Language technologies EDIC (ALT-EDIC).

The French state (and its Ministry of Culture in particular) is not exactly known for its free culture advocacy. For example, a decade ago one report dryly noted that

The rights of French authors and artists have always been very well protected.

in order to explain the country's fierce resistance against Felix Reda's EU Copyright reform proposals (which had been welcomed by many European Wikimedians, see Signpost coverage at the time, which mentioned a mighty backlash, led in particular by French MEPs and the French government).

...but not to Wikipedians' concerns about AI training

For all the energy and enthusiasm they devote to acceding to the rhetoric and demands of copyright industry advocates who see AI as threatening their private business interests (no scraping or training without "consent", denigrating fair use as unethical, etc.), both groups exhibit clear disregard for the much more modest requirements of Wikipedians and other copyright owners who choose to release their works under a free license for the common good.

For example, both PleIAs and the Common Pile group released their aforementioned own "ethical" LLMs under an Apache License. While this license maintains attribution requirements, it does not contain a copyleft (share-alike) requirement – unlike Wikipedia's CC BY-SA 4.0 license.

In the case of Common Pile, a user raised that issue on the project's GitHub repository:

[...] the Comma models are Apache licensed. However the training data is at least in part CC-BY-SA. Since Apache is not ShareAlike (or compatible with it), could this be an issue?

There's a good case to be made that the models are derivative works, since you can probably extract a substantial amount of the data from them. Should the models be CC-BY-SA licensed instead?

More generally, if I train a model on all the data, is there a license under which I can safely distribute it, or are there multiple incompatible sharealike-style licenses combined?

At the time of writing, these questions have remained without response for almost two weeks.

Similarly, Michael Weinberg from the New York University School of Law called the Common Pile project

[...] cool, but nowhere do they talk about doing anything to . . . actually comply with the terms of the open licenses? That's a part of what makes this a hard question!

Open source lawyer Luis Villa (former Deputy General Counsel of the Wikimedia Foundation, who has more recently written about issues regarding open licenses and AI on his "Open(Ish) Machine Learning" blog) agreed, noting further that

All of the "we are doing training on permissively-licensed materials" sets have this problem, because permissive != no obligations.

PleIAs' cofounder Pierre-Carl Langlais had aggressively dismissed related concerns last year on social media, deriding

people being extremely protective of their content in open licenses

in particular regarding

inclusion of Wikipedia in the set which is released in the CC-By-SA, basically the gold standard of AI training for a decade and we had official support from Wikimedia enterprise. Kinda absurd

(Regardless of whether Langlais is correct in interpreting his lab's collaborations with Wikimedia LLC/Wikimedia Enterprise/WMF as an endorsement of his legal views, it is worth noting that individual contributors, not the Wikimedia Foundation, hold the copyright over Wikipedia's freely licensed content.)

Tellingly, PleIAs' November 2024 announcement in its initial version erroneously described its Common Corpus as a "copyright-free dataset" and as avoiding "copyrighted data" – that is, it assumed that "permissively licensed" material is no longer copyrighted, something that would be news to Wikipedians.

In stark contrast to PleIAs and Common Pile, two earlier papers which had similarly provided datasets enabling the training of LLMs without relying on fair use decided to exclude Wikipedia and other BY-SA licensed content precisely because of such concerns:

"The KL3M Data Project" presents "Copyright-Clean Training Resources for Large Language Models"[4]. In contrast to the Common Corpus and Common Pile papers, this paper by a small legal consultancy counts an actual legal scholar among its authors (Daniel Martin Katz, a professor of law at Chicago-Kent College of Law), alongside Michael and Jillian Bommarito, the firm's husband-and-wife CEO and Chief Risk Officer. They write:

2.3 Wikipedia: A Case Study in the Complexity of Compliance
Many foundational Internet resources are governed by complex licensing arrangements that are often overlooked by AI developers. As the most notable example, Wikipedia content is frequently included in LLM training datasets. However, Wikipedia and various other Wikimedia Foundation projects are governed by the Creative Commons Attribution-ShareAlike (CC BY-SA) license, which imposes important restrictions on the use of content.
[...]
In response to our direct legal inquiry regarding LLM training on Wikipedia content, the Wikimedia Foundation responded with their interpretation of these compliance requirements [ https://jillianbommarito.com/wikimedia-says-no-llm-training/ ]. Their response noted: "We are monitoring what many LLM companies do with Wikimedia data and generally to be upfront, many may not be compliant with the letter of the Creative Commons rules or the spirit of the licenses." When questioned about specific compliance mechanisms, they emphasized that downstream developers must "adhere to the 'attribution,' 'share-alike,' and other elements of the license."
Most critically for LLM developers, the Foundation explicitly rejected the simplified compliance approaches currently employed by virtually all AI companies: "Providing a general notice to customers would not be an adequate solution to compliance […] [T]he notice would need to be made to everyone the content is shared with, not just customers." This position directly contradicts the practices of commercial LLM developers who include Wikipedia content in their training data.
In the context of building or fine-tuning large language models, it is simple to provide a general attribution notice acknowledging input sources to a given dataset or model. However, specific attribution to the specific work or works that gave rise to a specific model output is a difficult and expensive, if not impossible, technical challenge.
While Wikimedia's interpretation of the CC BY-SA requirement is not the final word on this important legal question, we did not include this content given the risk that it could encumber downstream usage.

Overview of the SILO system. In contrast to PleIAs and Common Pile, it trains an LLM (center, in blue) on a low-risk corpus (left) that excludes Wikipedia as high-risk attribution-required data. Wikipedia, along with other sources determined to have high legal risk, is instead relegated to a separate datastore (right) that the system accesses at inference time. This datastore can be modified at any time, e.g., to respond to opt-out requests.

Similarly, a 2023 preprint by authors from the University of Washington and UC Berkeley (several also affiliated with the Allen Institute for AI, a US nonprofit which, similarly to EleutherAI, works on open LLMs), titled "SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore"[5], describes training LLMs on an "Open License Corpus" that excludes "high-risk" "attribution required" data such as ... Wikipedia and other sources under Creative Commons licenses, for the purpose of "copyright risk mitigation". In the initial (preprint) version of their paper, the authors justified this by claiming

For example, if a journalist writes a new article that cites information from Wikipedia (a CC-BY source), then they must provide a form of citation, link, or attribution back to the original source. In the context of machine learning, it is not clear what an attribution would constitute.

The authors did not respond to a question by this reviewer about this rather adventurous legal claim that merely using or "citing" information from Wikipedia would trigger such a requirement. In the paper's later peer-reviewed version, this was changed to "if a journalist quotes an article from Wikipedia (a CC-BY source), then they must provide a form of citation, link, or attribution back to the original source" (our emphasis) – still an evidently inaccurate claim, for example in omitting the share-alike terms of Wikipedia's license.
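To make the SILO architecture sketched in the figure caption above more concrete: the core mechanism is kNN-LM-style retrieval, where a parametric LM trained only on low-risk text is interpolated at inference time with a nonparametric datastore holding the higher-risk sources. The following Python is a rough illustration of that mechanism only – all names, data structures and parameters are this reviewer's placeholder choices, not the paper's actual implementation:

```python
import numpy as np

class KNNDatastore:
    """Nonparametric store mapping context embeddings to next tokens (kNN-LM style)."""

    def __init__(self, vocab_size: int):
        self.vocab_size = vocab_size
        self.entries = []  # (context embedding, next-token id, source name)

    def add(self, key: np.ndarray, next_token: int, source: str):
        self.entries.append((key, next_token, source))

    def remove_source(self, source: str):
        # Honoring an opt-out is a cheap edit here; no retraining of the LM needed.
        self.entries = [e for e in self.entries if e[2] != source]

    def next_token_probs(self, query: np.ndarray, k: int = 8) -> np.ndarray:
        # Weight the k nearest stored contexts by exp(-distance), then normalize.
        probs = np.zeros(self.vocab_size)
        if not self.entries:
            return probs
        keys = np.stack([e[0] for e in self.entries])
        dists = np.linalg.norm(keys - query, axis=1)
        for i in np.argsort(dists)[:k]:
            probs[self.entries[i][1]] += np.exp(-dists[i])
        return probs / probs.sum()

def silo_next_token_probs(lm_probs: np.ndarray, store: KNNDatastore,
                          query: np.ndarray, lam: float = 0.3) -> np.ndarray:
    """Interpolate a parametric LM (trained only on low-risk text) with the datastore."""
    store_probs = store.next_token_probs(query)
    if store_probs.sum() == 0:
        return lm_probs  # empty datastore: fall back to the LM alone
    return (1 - lam) * lm_probs + lam * store_probs
```

Because the high-risk text lives only in the datastore rather than in the model weights, removal requests reduce to deleting entries – no retraining required, which is the point of the design.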

To be sure, these concerns about legal risks from training LLMs on Wikipedia content may be overwrought in practice (and not just because of the naivete of the legal arguments in the SILO paper).

In an August 2023 article, Creative Commons' General Counsel Kat Walsh (known to Wikimedians as User:Mindspillage and as former WMF Trustee, including during the decision to adopt CC BY-SA as the license for Wikipedia's text) addressed the question "Can you use CC licenses to restrict how people use copyrighted works in AI training?" by pointing out that

there are strong arguments that, in most cases, using copyrighted works to train generative AI models would be fair use in the United States, and such training can be protected by the text and data mining exception in the EU

(Last month, Creative Commons followed up by publishing "Understanding CC Licenses and AI Training: A Legal Primer", reiterating that "[t]he short answer is: AI training is often permitted by copyright", while caveating in a more in-depth analysis that "[t]he application of copyright law to AI training is complicated, it varies depending on the jurisdiction where the use is made, and litigation related to generative AI training remains ongoing".)

Similarly, Andrés Guadamuz (an intellectual property law scholar at the University of Sussex) has argued that CC licences are fully compatible with AI training, and in fact allow it to take place without asking for permission from the licensor – again citing fair use and (regarding the UK) fair dealing among other arguments. And in a 2024 policy paper commissioned by "Open Future", two scholars from the University of Amsterdam similarly concluded that Share Alike/CopyLeft licenses are largely ineffective when materials licensed under them are used to train AI models.

And even though the Wikimedia Foundation has been calling out LLM developers' non-compliance with the "letter" and "spirit" of CC licenses in its responses to the KL3M authors (see above), it too has elsewhere acknowledged the possibility or even likelihood that training AI models is covered by fair use. For example in a March 2023 legal analysis:

[...] it is more likely than not if current precedent holds that training systems on copyrighted data will be covered by fair use in the United States, but there is significant uncertainty

or also in its aforementioned response to a consultation by the US Copyright Office later that year (Signpost coverage: "AI policy positions of the Wikimedia Foundation").


The hypocrisy of "ethical" anti-fair-use advocacy

So PleIAs and the Common Pile group might be at little legal risk for ignoring Wikipedia's license terms in the release of their "ethical" LLMs. But this is likely precisely because of the fair-use-type defenses that they denigrate as unethical and actively work to undermine in the case of other copyrighted content.

To be transparent, this reviewer, like many in the free knowledge movement, doesn't mind his Wikipedia and Wikimedia Commons contributions being used for AI training (and would find it very unfortunate if PleIAs or the Common Pile group became the target of a lawsuit by Wikimedians).

A slide from Brent Hecht's keynote about Wikipedia and AI at last year's Wiki Workshop

That said, many Wikimedians feel differently and are less comfortable with their work being used in LLMs without restrictions. An extreme example is the aforementioned keynote at last year's Wiki Workshop, where Microsoft's Brent Hecht called on the Wikipedia community to be aware of its "legal leverage" and "data leverage", and even to consider a "data strike", using a labor rights framing. Again, this reviewer personally doesn't find this line of argument very convincing (for example because it ignores the interest of Wikipedia readers, or more generally that of society at large, in accessing knowledge without restrictions), and finds himself agreeing much more with e.g. the Internet Archive's and the EFF's aforementioned arguments in favor of fair use with regard to AI.

However, it's difficult to see PleIAs' and the Common Pile group's apparent confidence that they can ignore such concerns by Wikimedians (and other owners of permissively licensed but copyrighted content) as anything other than glaring hypocrisy, when contrasted with both groups' advocacy for absolute deference to the demands of (some) professional intellectual property owners. To use Hecht's terminology, both appear to be working under the assumption that Wikipedians have zero "leverage" apart from dataset provenance transparency requirements, and thus won't be able to interfere with LLM developers' use of the Common Corpus and Common Pile datasets. Or seen from another angle: PleIAs and the Common Pile group work to undermine the fair use defenses that other AI labs rely on when training their LLMs on non-freely licensed content. But both implicitly rely on these themselves when releasing their own "ethical" LLMs.

In the aforementioned remarks where PleIAs' Langlais had derided people for being "extremely protective of their content in open licenses", he also expressed his puzzlement about

The counterintuitive thing while doing actually open ai: you can get more support from copyright collective/cultural industries than from long term open actors. The first are desperate enough at this point to see even a vaguely ethical alternative emerging.

It does not appear to occur to Langlais that many Wikimedians and other "open actors" who have worked long and hard to make knowledge freely accessible (so that people wouldn't have to pay said copyright collective/cultural industries for access) may not be very fond of seeing their work being used to promote legal changes that are likely to make knowledge less accessible, for the purpose of furthering the business interests of exactly these copyright industries – and to hand them major influence over the future of AI as an important new form of accessing knowledge.

The Wikimedia Foundation's Vision calls for "[i]magin[ing] a world in which every single human being can freely share in the sum of all knowledge", in future times like 20, 50, 100 years from today. While LLMs are clearly still imperfect at this point, AI continues to improve rapidly. And with ChatGPT recently surpassing Wikipedia in user numbers according to Similarweb, it is likely that in those future times, AI will be an important way in which human beings access knowledge. Despite their professed commitment to open LLMs, the anti-fair-use advocacy of PleIAs and the Common Pile group is likely to bring us closer to a world in which every human being can share in the sum of all knowledge only as long as enough revenue can be extracted from them to serve the business interests of the copyright industries.


Briefly

Wiki Workshop 2025, WMF Research Award

The annual Wiki Workshop (organized by the Wikimedia Foundation's research team with other collaborators) took place virtually last month, for the first time extended to two days. The 46 accepted extended abstracts are available online (non-archival), as are video recordings from the conference.

At the event, the Foundation's "Research Award of the Year" for "best paper" went to "Motivating Experts to Contribute to Digital Public Goods: A Personalized Field Experiment on Wikipedia"[6], a paper reporting on the results of an experiment conducted in 2015/16 that invited 3974 economists to improve Wikipedia articles (see also our 2017 review of a preprint about the same experiment: "ExpertIdeas: Incentivizing Domain Experts to Contribute to Wikipedia").

The award for best student paper went to "Low-Resourced Languages and Online Knowledge Repositories: A Need-Finding Study"[7], a CHI 2024 paper describing "(1) a thematic analysis of Wikipedia forum discussions and (2) a contextual inquiry study with 14 novice contributors [...] focused on three Ethiopian languages: Afan Oromo, Amharic, and Tigrinya".

Wikimedia Research Showcase

See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.


Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer

"WikiVideo: Article Generation from Multiple Videos"

From the abstract:[8]

"We present the challenging task of automatically creating a high-level Wikipedia-style article that aggregates information from multiple diverse videos about real-world events, such as natural disasters or political elections. [...] we introduce WikiVideo, a benchmark consisting of expert-written articles and densely annotated videos that provide evidence for articles' claims, facilitating the integration of video into RAG pipelines and enabling the creation of in-depth content that is grounded in multimodal sources. We further propose Collaborative Article Generation (CAG), a novel interactive method for article creation from multiple videos. CAG leverages an iterative interaction between an r1-style reasoning model and a VideoLLM to draw higher level inferences about the target event than is possible with VideoLLMs alone, which fixate on low-level visual features."

See also an explanatory thread by one of the authors.

Using Wikidata to pre-train LLMs for improved multilingual question answering in underrepresented languages

From the abstract:[9]

"Recent approaches to multilingual open-domain question answering (MLODQA) have achieved promising results given abundant language-specific training data. However, the considerable annotation cost limits the application of these methods for underrepresented languages. We introduce a few-shot learning approach to synthesise large-scale multilingual data from large language models (LLMs). Our method begins with large-scale self-supervised pre-training using WikiData, followed by training on high-quality synthetic multilingual data generated by prompting LLMs with few-shot supervision. The final model, FsModQA, significantly outperforms existing few-shot and supervised baselines in MLODQA and cross-lingual and monolingual retrieval.

Generating "Open Artificial Knowledge" synthetic data with LLMs, guided by Wikipedia categories

From the abstract:[10]

"[...] acquiring high-quality, diverse, and ethically sourced training data [for LLMs] remains a significant challenge. We introduce the Open Artificial Knowledge (OAK) dataset, a large-scale resource of over 500 million tokens (at the moment of writing) designed to address this issue. OAK leverages an ensemble of state-of-the-art LLMs [...], to generate high-quality text across diverse domains, guided by Wikipedia's main categories. Our methodology ensures broad knowledge coverage while maintaining coherence and factual accuracy. The OAK dataset aims to foster the development of more capable and aligned language models while addressing critical issues of data scarcity and privacy in LLM training [...].


"Synthetic Multimodal Question Generation" from Wikipedia

From the abstract:[11]

"[...] we propose SMMQG, a synthetic data generation framework. SMMQG leverages interplay between a retriever, large language model (LLM) and large multimodal model (LMM) to generate question and answer pairs directly from multimodal documents, with the questions conforming to specified styles and modalities. We use SMMQG to generate an MMRAG dataset of 1024 questions over Wikipedia documents and evaluate state-of-the-art models using it [...]


"EuroLLM: Multilingual Language Models for Europe" in all official European Union languages, trained on Wikipedia as "high quality data"

From the abstract:[12]

"The quality of open-weight LLMs has seen significant improvement, yet they remain predominantly focused on English. In this paper, we introduce the EuroLLM project, aimed at developing a suite of open-weight multilingual LLMs capable of understanding and generating text in all official European Union languages, as well as several additional relevant languages. We outline the progress made to date [...] Additionally, we release our initial models: EuroLLM-1.7B and EuroLLM-1.7B-Instruct and report their performance on multilingual general benchmarks and machine translation."

From the paper:

"To train the EuroLLM models, we collect and filter data from various sources for all supported languages. The data included in the final corpus can be divided into four categories: web data, parallel data, code / math data, and high-quality data
[...]
High-quality Data. Regarding higher quality data, we use the Wikipedia [sic] for all languages and the arXiv [...and two other sources] for English."


"Fake news, an internet troll, and a conspiracy theory about 'Wikipedia's Intentional Distortion of the History of the Holocaust'"

From the abstract:[13]

"In 2023, an essay alleged Wikipedia's 'intentional distortion' of the Holocaust. Subsequently, the Wikipedia community largely dismissed these claims during a formal investigation. The allegations repeated a narrative of a former Wikipedia volunteer banned from all Wikimedia projects for unethical behavior. While Wikipedia undoubtedly contains errors in its coverage of the Holocaust, there is no convincing evidence to prove that most of it is 'intentional', or that it can be attributed to the parties identified by the essay authors."

A response by one of the Wikipedia editors criticized in the 2023 article. See also our review at the time, and further coverage in this issue's "In the media", also about recent public statements by the two authors of the 2023 article.

References

  1. ^ Yang, Puyu; Colavizza, Giovanni (2024-01-18). "Polarization and reliability of news sources in Wikipedia". Online Information Review. 48 (5): 908–925. doi:10.1108/OIR-02-2023-0084. hdl:11585/953887. ISSN 1468-4527.
  2. ^ Langlais, Pierre-Carl; Hinostroza, Carlos Rosas; Nee, Mattia; Arnett, Catherine; Chizhov, Pavel; Jones, Eliot Krzystof; Girard, Irène; Mach, David; Stasenko, Anastasia; Yamshchikov, Ivan P. (2025-06-02). "Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training". arXiv:2506.01732 [cs.CL].
  3. ^ Kandpal, Nikhil; Lester, Brian; Raffel, Colin; Majstorovic, Sebastian; Biderman, Stella; Abbasi, Baber; Soldaini, Luca; Shippole, Enrico; Cooper, A. Feder; Skowron, Aviya; Kirchenbauer, John; Longpre, Shayne; Sutawika, Lintang; Albalak, Alon; Xu, Zhenlin; Penedo, Guilherme; Allal, Loubna Ben; Bakouch, Elie; Pressman, John David; Fan, Honglu; Stander, Dashiell; Song, Guangyu; Gokaslan, Aaron; Goldstein, Tom; Bartoldson, Brian R.; Kailkhura, Bhavya; Murray, Tyler (2025-06-05). "The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text". arXiv:2506.05209 [cs.CL].
  4. ^ Bommarito, Michael J., II; Bommarito, Jillian; Katz, Daniel Martin (2025-04-10). "The KL3M Data Project: Copyright-Clean Training Resources for Large Language Models". arXiv:2504.07854 [cs.CL].
  5. ^ Min, Sewon; Gururangan, Suchin; Wallace, Eric; Shi, Weijia; Hajishirzi, Hannaneh; Smith, Noah A.; Zettlemoyer, Luke (2024-07-31). "SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore". arXiv:2308.04430 [cs.CL]. (also as "Spotlight Poster" at ICLR 2024)
  6. ^ Chen, Yan; Farzan, Rosta; Kraut, Robert; YeckehZaare, Iman; Zhang, Ark Fangzhou (May 2024). "Motivating Experts to Contribute to Digital Public Goods: A Personalized Field Experiment on Wikipedia". Management Science. 70 (5): 3264–3280. doi:10.1287/mnsc.2023.4852. ISSN 0025-1909.
  7. ^ Nigatu, Hellina Hailu; Canny, John; Chasins, Sarah E. (2024-05-11). "Low-Resourced Languages and Online Knowledge Repositories: A Need-Finding Study.". Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. CHI '24. New York, NY, USA: Association for Computing Machinery. pp. 1–21. arXiv:2405.16669. doi:10.1145/3613904.3642605. ISBN 9798400703300.
  8. ^ Martin, Alexander; Kriz, Reno; Walden, William Gantt; Sanders, Kate; Recknor, Hannah; Yang, Eugene; Ferraro, Francis; Durme, Benjamin Van (2025-04-01). "WikiVideo: Article Generation from Multiple Videos". arXiv:2504.00939 [cs.CV]. / Dataset, repo
  9. ^ Jiang, Fan; Drummond, Tom; Cohn, Trevor (2025-02-27). "Few-Shot Multilingual Open-Domain QA from 5 Examples". arXiv:2502.19722 [cs.CL].
  10. ^ Borisov, Vadim; Schreiber, Richard H. (2024-07-19). "Open Artificial Knowledge". arXiv:2407.14371 [cs.CL]. / Poster at ICML 2024, dataset and report
  11. ^ Wu, Ian; Jayanthi, Sravan; Viswanathan, Vijay; Rosenberg, Simon; Pakazad, Sina Khoshfetrat; Wu, Tongshuang; Neubig, Graham (November 2024). "Synthetic Multimodal Question Generation". In Yaser Al-Onaizan; Mohit Bansal; Yun-Nung Chen (eds.). Findings of the Association for Computational Linguistics: EMNLP 2024. Findings 2024. Miami, Florida, USA: Association for Computational Linguistics. pp. 12960–12993. doi:10.18653/v1/2024.findings-emnlp.759.
  12. ^ Martins, Pedro Henrique; Fernandes, Patrick; Alves, João; Guerreiro, Nuno M.; Rei, Ricardo; Alves, Duarte M.; Pombal, José; Farajian, Amin; Faysse, Manuel; Klimaszewski, Mateusz; Colombo, Pierre; Haddow, Barry; de Souza, José G. C.; Birch, Alexandra; Martins, André F. T. (2025-01-01). "EuroLLM: Multilingual Language Models for Europe". Procedia Computer Science. Proceedings of the Second EuroHPC user day. 255: 53–62. doi:10.1016/j.procs.2025.02.260. ISSN 1877-0509.
  13. ^ Konieczny, Piotr (2025). "Fake news, an internet troll, and a conspiracy theory about 'Wikipedia's Intentional Distortion of the History of the Holocaust'". Holocaust Studies: 1–39. doi:10.1080/17504902.2025.2511459. ISSN 1750-4902. Closed access / Author's copy



Reader comments


All Sinners, a future, all Saints, a past

This traffic report is adapted from the Top 25 Report, prepared with commentary by Igordebraga, GN22, CAWylie, Ollieisanerd and Shuipzv3.

The angel's having fun, making winners out of Sinners (April 27 to May 3)

Rank Article Views Notes/about
1 Sinners (2025 film) 2,205,285 Finally atop this list after three weeks, and just when it gave the box office summit to #3. Like all Ryan Coogler (pictured) movies, it has Michael B. Jordan spearheading a very Black cast, telling the Prohibition Era story of two brothers whose juke joint has its opening ruined by violent guests. Lauded for blending genres, as this supernatural thriller is surprisingly heavy on people playing the blues, with great filmmaking and acting, Sinners made over $200 million in North America alone, a relief to those that think that Hollywood hardly gives room to original stories compared to adaptations and franchises.
2 2025 Canadian federal election 2,138,961 This election was initially projected to go to Pierre Poilievre and the Conservatives, but with the election of staunchly right-wing US president Donald Trump in 2024, Canadians understandably wanted a prime minister with opposing political views. Hence, #5 easily won the election, taking 160 seats to Poilievre's 119.
3 Thunderbolts* 2,000,832 The Marvel Cinematic Universe returns, taking the name of a team that used to be a group of villains disguised as heroes (who are mostly if not all ignored by the movie; the already introduced founder Baron Zemo is not involved, and while Geraldine Viswanathan plays a character named Mel, the movie doesn't confirm if she's one of the original Thunderbolts) and is now "Marvel's Suicide Squad", with villains and anti-heroes doing black ops. In this case, Yelena Belova, John Walker, and Ava Starr discover that their mission was actually their former employer Valentina Allegra de Fontaine trying to make them kill each other, forcing them to form an uneasy alliance with the help of Bucky Barnes and the Red Guardian to find out what Val's interest is in some mysterious guy named Bob (and Taskmaster only appears for a few minutes in spite of being on the poster!). While hype was low given the current state of the MCU, Thunderbolts* surprised critics and viewers alike with the usual action scenes and banter being complemented by strong character moments and thoughtful discussion of worthiness and mental health, leading to positive reviews and an opening weekend that will possibly already cover the movie's $180 million budget.
4 Vaibhav Suryavanshi 1,411,487 Last year the Report made an exception for the Indian Premier League regarding the excessive-mobile-views threshold, only for the views to plummet in the championship's final weeks, and this low performance repeated itself this year to show we shouldn't have bothered. Only now has India pushed up something regarding its national cricket tournament, as this Rajasthan Royals player, who had already become the youngest to sign an IPL contract, also managed to score a century at just 14.
5 Mark Carney 1,152,757 An economist by trade, Carney led the Liberal Party of Canada to win the most seats in #2, though just short of a majority.
6 Deaths in 2025 985,294 From #7's last Urbi et Orbi:
"Christ is risen! These words capture the whole meaning of our existence, for we were not made for death but for life. (...) How much violence we see, often even within families, directed at women and children! How much contempt is stirred up at times towards the vulnerable, the marginalized, and migrants! On this day, I would like all of us to hope anew and to revive our trust in others, including those who are different than ourselves, or who come from distant lands, bringing unfamiliar customs, ways of life and ideas!"
7 Pope Francis 870,473 The world is still mourning the death of a pontiff who, unbound by precedent, did things his own way. Instead of living at the Apostolic Palace, he decided to reside in the modest Domus Sanctae Marthae apartment, which is usually reserved for Cardinals during conclaves like the one beginning May 7. He took a non-composed regnal name that had never been chosen by any pontiff before – that of a priest dedicated to the poor and helpless – signaling how he would govern. He even became the first pope to release an album (of progressive rock) and a memoir, and chose to be buried at Santa Maria Maggiore instead of St. Peter's Basilica. Now, one question remains: will the Cardinals elect a Pope Francis II?
8 Mark Kerr (fighter) 849,585 This retired American mixed martial artist, who has won two UFC heavyweight titles and numerous wrestling medals, gets a second biopic made about him. The first was a 2002 HBO documentary and the second is a feature film to be released in October, starring Dwayne Johnson (pictured) in prosthetic makeup as Kerr.
9 Clair Obscur: Expedition 33 680,753 This French turn-based role-playing game, rendered on Unreal Engine 5, was released April 24. Within 12 days of its release, this Belle Époque-inspired dark fantasy received critical acclaim and sold two million units.
10 2025 Australian federal election 647,057 Happening in the same week as #2, this election also saw the red party triumphing over the blue party. The incumbent government formed by the Australian Labor Party, led by prime minister Anthony Albanese, was re-elected in a landslide victory. Within the opposition Coalition, the Liberal Party of Australia (which despite its name is conservative) lost a quarter of its seats, including that of its leader Peter Dutton, mirroring Pierre Poilievre. Commentators have attributed the result to voters rejecting Dutton's Trumpism-like views and rhetoric in the election campaign, not helped by the fact that many domestic commentators believe the Coalition ran a pretty poor campaign with several backflips and a lack of solid policy.

From the houses of the holy, we can watch the white doves go (May 4 to 10)

Rank Article Views Notes/about
1 Pope Leo XIV 17,326,227 Habemus papam! The 2025 papal conclave ended with a welcome surprise: the first #7 from the United States (he's also been a Peruvian citizen since 2015), who is also the first Augustinian pontiff, just as his predecessor was the first Jesuit one. Robert Francis Prevost was born and raised in Chicago, Illinois, and is a Villanova University graduate, former Bishop of Chiclayo, Peru, Prefect of the Dicastery for Bishops, President of the Pontifical Commission for Latin America, Cardinal Bishop of Albano, and a lifelong fan of the Chicago White Sox.
2 Thunderbolts* 2,422,210 The latest Marvel Cinematic Universe offering, featuring a bunch of misfits trying to evade their former contractor who decided to kill them off, had positive responses from both critics and audiences, and quickly became the fifth highest-grossing movie of 2025, racking up $272 million. Yes, the asterisk is there for a reason, even if Marvel somehow decided to spoil it by changing some publicity and websites to a new title that only appears in the closing scene.
3 Pope Francis 2,158,492 The late Argentine #7 would be happy to know that his successor (#1) has pledged to continue his groundbreaking church reforms and emphasis on serving the poor and marginalized. And as an aside concerning this list: in the early days of the Top 25 Report in 2013, redirects or renamed pages weren't counted together. Hence, unlike this week, when all the views for "Robert Francis Prevost" were counted as part of "Pope Leo XIV", after the last conclave "Pope Francis" and "Jorge Mario Bergoglio" had separate entries, which were #2 and #3 to a Google Doodle-boosted Douglas Adams.
4 List of popes 1,516,409 #1 is the 267th leader of the Catholic Church, and this page lists all his predecessors. Some interesting names come up, such as Hilarius, Simplicius, and even Lando! (If you enjoy this article, then I suggest you check out other interesting pope lists such as the list of lists of popes and who could forget the classic list of sexually active popes)
5 Pope Leo XIII 1,489,575 The previous #7 to have a leonine papal name was the first of the 20th century, reigning from 1878 to 1903, and #1 chose to pay homage to him because he developed Catholic social teaching, with particular concern for social justice and workers' rights.
6 Sinners (2025 film) 1,376,677 A brief break from all the holiness and saint-related stuff on this Report by returning to the movies, with the one that #2 surpassed at the box office. The film features twin brothers Michael B. Jordan and "Michael C. Jordan" — before you ask, his (or "their"?) father is A. Jordan, and that Michael Jordan is J. Jordan – returning to their Mississippi hometown and being confronted by a supernatural evil.
7 Pope 1,294,299 #1 was chosen to be the Bishop of Rome, chief of state of the Vatican, and leader of the Catholic Church.
8 2025 India–Pakistan conflict 1,242,909 For four days, the South Asian neighbors with a history of armed disputes escalated a standoff started by a terrorist attack into launching missiles and drones at each other. There was even a fighter jet battle, but no mobilization of ground troops by the time the two Director Generals of Military Operations talked to each other and eventually agreed on a ceasefire. In spite of accusations of breaking the truce (just like during the conflict both sides said the other started aiming at civilian targets), things thankfully haven't flared up again.
9 Pope Benedict XVI 1,205,310 The other two #7's of the 21st century: the Pole whose reign was one of the longest, from 1978 until his death in 2005, and his German successor, who resigned over health concerns in 2013, opening the way for #3's emergence, and remained pope emeritus until dying in 2022.
10 Pope John Paul II 1,201,970

If you know the Truth of the Lion (May 11 to 17)

Rank Article Views Notes/about
1 Pope Leo XIV 1,792,195 The new American pope is social-media savvy, fond of rapid-fire tweeting on his new @Pontifex Twitter (X) account, and had his inauguration mass on May 18.
2 Cassie Ventura 1,532,907 On May 12, this popular American singer, dancer, actress, and model began testifying in court about her abuse by #10.
3 Eurovision Song Contest 2025 1,479,059 Switzerland hosted the annual showcase of new songs by the countries of Europe (plus Australia and, to the chagrin of politicized people, Israel), often with extravagant musical numbers like a wiggling Estonian comparing his lover to coffee. Austria won with the song "Wasted Love", performed by countertenor JJ.
4 Thunderbolts* 1,099,885 The latest Marvel Cinematic Universe offering, where a bunch of misfits who mostly only know how to punch and shoot, such as former Russian agents Yelena Belova and Alexei Shostakov (portrayed by Florence Pugh and David Harbour, pictured to the left), go after the director of the CIA, who is trying to erase evidence of her former shady businesses. Critics and audiences alike praised the film's grounded and character-driven take, with an unusual climax that doesn't involve beating up the bad guy. Having just broken $325 million to surpass #6 at the global box office, it remains to be seen how much Thunderbolts* makes by the time The Fantastic Four: First Steps arrives.
5 Final Destination Bloodlines 1,047,191 The trailer for this movie carries the description "Death is a relentless son of a *****.", given that, 14 years after the fifth and seemingly final of all destinations, we again see the Grim Reaper making people who survived a disaster die in even more painful and elaborate ways. The plot has an extra level this time around: a woman prevented a tower explosion in 1968, and everyone who would've perished, along with their descendants, was targeted in the following decades; now it's time for her grandkids to meet grisly fates. Reviewers and audiences approved of Final Destination Bloodlines trying to add emotion and depth to the gallons of blood, including a send-off for the one recurring actor, Tony Todd (who plays coroner William Bludworth and died last year), and it not only took the top of the box office from the above movie, but already paid for itself, with $100 million worldwide on a budget of half that.
6 Sinners (2025 film) 958,864 Completing the current top 3 of the North American box office is the Michael B. Jordan-starring acclaimed combination of period piece, blues musical, and supernatural horror, which between this unusual mixture and not being an adaptation of any sort has done respectable numbers.
7 Deaths in 2025 943,421 Let me tell you now
I never felt like this before
Something's Got a Hold on Me that won't let go
I believe I'd die if I only could...
8 Andor (TV series) 881,335 The end came for the Disney+ series that told the story of one of the protagonists of Rogue One in a way that was surprisingly very grounded for Star Wars, as one could find rebellious/criminal activities, politics and even a Bollywoodian wedding but no Jedi. Andor was widely considered one of the best things to come out of the franchise.
9 2025 India–Pakistan conflict 716,668 While the Middle East has that awful war that doesn't end, South Asia had two countries that frequently go at odds with each other only battling for four days with drones and missiles before deciding to call a ceasefire.
10 Sean Combs 702,035 The trial of the rapper, who was charged over decades of unruly sexual behavior, usually at his Diddy parties, has begun. Among the witnesses giving testimony was the plaintiff of the first of the lawsuits that led to his imprisonment: his ex, #2, who was even filmed being beaten by Combs in a hotel hallway.

Life's just a blast, it's movin' really fast (May 18 to 24)

Rank Article Views Notes/about
1 Mission: Impossible – The Final Reckoning 1,284,238 After one extra year of waiting due to the 2023 Hollywood labor disputes, Ethan Hunt resumes his mission of stopping an artificial intelligence that, in true film tradition, has decided to kill off all humans. Along the way he faces life-threatening situations (an unarmed fight against armed guards, jumping into the Arctic Ocean, roaming through a submarine wreck about to plummet into the abyss, being forced to board a biplane and hang precariously from it) and sees many references to the previous movies in the franchise – except the one most people dislike – including the unexpected return of a minor character from the original. From its runtime of nearly 3 hours to a massive budget of at least $300 million, it's clear there's an intention to close off the series on a high note, and thus reviews are positive and The Final Reckoning is expected to make an impact at the box office after a $200 million opening, as even though it opened against a strong family-friendly competitor (#7), the next few weeks don't have anything comparable to how Barbenheimer made the previous one earn less than expected.
2 George Wendt 1,282,245 Known for his role on the sitcom Cheers, this American actor died on May 20, aged 76.
3 Final Destination Bloodlines 1,247,750 14 years after Final Destination 5 comes the return of the franchise about people who evaded death suffering even more painful demises, this time with a labored plot in which, after a premonition averts a tower explosion in the 1960s, Death spends decades going after the survivors – including coroner William Bludworth, in a scene where his actor Tony Todd says goodbye to audiences before terminal cancer took his life – before targeting the seer and her family. Bloodlines got the best reviews of the series and has become its highest-grossing installment.
4 Eurovision Song Contest 2025 1,057,319 Switzerland hosted a showcase of songs by each European country (plus Australia and Israel), often with performances full of spectacle, and at times pushing the boundaries of a supposedly family-friendly event (this year had a song about "Serving Kant"...). Austria won with "Wasted Love".
5 Deaths in 2025 980,871 Do you bury me when I'm gone?
Do you teach me while I'm here?
Just as soon as I belong
Then it's time I Disappear...
6 Mission: Impossible (film series) 940,550 In 1966, the spy show Mission: Impossible premiered on CBS. 30 years later, Paramount released a film adaptation starring Tom Cruise that started off one of their most successful franchises, revolving around agent Ethan Hunt either being disavowed by the Impossible Mission Force and having to clear his name or facing rogue ex-IMF agents, enduring some incredibly dangerous situations along the way. The eighth and possibly last movie of the series is up there at #1. (cue original theme music).
7 Lilo & Stitch (2025 film) 926,580 Disney again remade one of their animated features in live action, but unlike Snow White this one had good expectations and was deemed a worthy effort, even if still inferior to the cartoon. Again we see an alien created by a mad scientist to be the ultimate killing machine, who after a crash landing in Hawaii tries to avoid capture by his creator by passing as a dog, getting adopted by Lilo Pelekai, a fairly peculiar orphan who names him Stitch and, while dealing with the beastie's destructive ways, teaches him moral lessons regarding the importance of Ohana. Given Lilo & Stitch is expected to make a lot of money (the opening weekend alone surpassed $300 million worldwide, and everywhere there is Stitch merchandise to buy!), one could expect a follow-up based on all the extra material that the original movie got, namely a TV series and a few direct-to-video sequels.
8 Sinners (2025 film) 825,897 Interest is still high for the acclaimed combination of Blues musical and supernatural horror in 1930s Mississippi, as it's still in the North American top 5 and gets all these views here on Wikipedia.
9 Shai Gilgeous-Alexander 740,071 The Oklahoma City Thunder is trying to reach the NBA Finals thanks to the heroics of this Canadian, whose impressive play during the regular season (he led the league in scoring with an average of 32.7 points per game, helping the Thunder get to a league-best 68 wins) explains his selection as the NBA Most Valuable Player.
10 Thunderbolts* 680,794 In spite of a good reception by both reviewers and audiences, the fact that the latest Marvel Cinematic Universe installment has a cast of mostly supporting characters from previous works made it not as attention-grabbing to general viewers or those disappointed by Marvel's recent output, even if the studio decided to suddenly change the promotion to feature a title that appears in the final scene of Thunderbolts*, hoping it would be another draw. Thus Thunderbolts* will probably finish slightly below $400 million, which is still a good number, even if lower than many past MCU movies, including this year's Captain America: Brave New World.

Lord Almighty, I feel my temperature risin', mmm (May 25 to 31)

Rank Article Views Notes/about
1 Lilo & Stitch (2025 film) 1,391,233 Disney's live-action remakes get the top spot of this list for the first time since Mulan at the height of the pandemic. The story of Lilo Pelekai, a peculiar Hawaiian orphan, and Experiment 626, aka Stitch, an alien designed to be the ultimate destructive force who becomes Lilo's pet and companion, had already been a cartoon successful enough to get a TV series, direct-to-video sequels, and metric loads of Stitch merchandise. The director of Marcel the Shell with Shoes On headed its transition to live action, which had a mostly positive reception even if considered inferior to the cartoon (along with general discussion regarding dropped characters and plot changes, this here writer questioned why the movie tried to either enhance the existing sad moments or create new ones) and utterly wrecked the box office, setting a Memorial Day record in the United States and making over $300 million worldwide in a single weekend. Approaching the earnings of A Minecraft Movie or downright surpassing one billion seems likely!
2 Brigitte Macron 1,389,387 The wife of #4, she was filmed with her hand in her husband's face after their plane landed in Hanoi, Vietnam. The incident prompted renewed interest in the circumstances behind the couple's first meeting: she was a 39-year-old teacher and he was a 15-year-old student.
3 Mission: Impossible – The Final Reckoning 1,287,271 In 2002, Lilo & Stitch opened against Minority Report, which narrowly beat it on opening weekend and ultimately made more money overall. 23 years later the remake (#1) again shared theaters with Tom Cruise (seen to the left with director Christopher McQuarrie), in the eighth and final movie of his signature franchise, where Ethan Hunt does his usual death-defying stunts trying to prevent a rogue artificial intelligence from wiping out mankind. This time, between Stitch being carried by family-friendliness and nostalgia and The Final Reckoning having less screenings due to a runtime nearing 3 hours, Cruise was at a disadvantage. But he still had a strong showing, finishing the opening weekend with over $200 million worldwide, and the studio hopes the film's earnings have legs given The Final Reckoning ranks high on the list of most expensive films with a $300 million budget.
4 Emmanuel Macron 1,064,164 The president of France since 2017, he was filmed with his wife's hand in his face as they arrived in Vietnam for an official visit. He later said they were joking around.
5 Sirens (2025 TV series) 1,026,964 Meghann Fahy, seen last year in the show The Perfect Couple, and earlier this year in the thriller Drop, stars in this Netflix miniseries as a woman who spends a weekend at her sister's beach house and is unsettled by how things go there.
6 Deaths in 2025 992,354 Is there someone you know
Your loving them so
But taking them all for granted?
You may lose them one day
Someone takes them away
And they don't hear the words you long to say
7 Mission: Impossible (film series) 920,888 #3 is the eighth chapter in this franchise based on an old spy TV series, which started with four movies that had different directors and sensibilities, and then went on to four where Christopher McQuarrie upped the ante on absurd Tom Cruise stunts. Along with people doing dangerous things, the series is also marked by secret agents getting into trouble with their bosses, advanced gadgets, and masks that perfectly replicate faces.
8 Phil Robertson 876,254 RIP to Mr. "Happy Happy Happy", who died at age 79 on May 25. Phil was probably best known as the patriarch of the Duck Dynasty Robertson family; the reality television series about the family and their duck call-making hunting company aired from 2012 to 2017. What may be less well known about Phil is that he was a great quarterback in college for Louisiana Tech (rivaling teammate Terry Bradshaw), but he loved hunting wild game more and quit the sport.
9 Loretta Swit 769,799 Known for her role on M*A*S*H, this American actor died on May 30, aged 87.
10 Final Destination Bloodlines 729,777 Hollywood has very inflated budgets in the realms of action, adventure and science fiction, but not horror movies, which through a combination of modest budgets and a faithful audience are quite profitable. Case in point: the sixth in a franchise revolving around people avoiding dying in accidents, only for Death itself to ensure they are killed in elaborate and messy ways, considered by critics and audiences alike one of the best in the series. It was on the expensive side of the genre at $50 million, but that's still only half of what John Wick: Chapter 4 cost... and it has made nearly five times that budget at the box office, while also outgrossing the Snow White remake, which was nowhere near as liked as our #1.

Exclusions

  • These lists exclude the Wikipedia main page, non-article pages (such as redlinks), and anomalous entries (such as DDoS attacks or likely automated views). Since mobile view data became available to the Report in October 2014, we exclude articles that have almost no mobile views (5–6% or less) or almost all mobile views (94–95% or more), because they are very likely to be automated views based on our experience and research of the issue; a rough sketch of this filter appears below. Please feel free to discuss any removal on the Top 25 Report talk page if you wish.
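For illustration, the mobile-share heuristic just described can be expressed as a few lines of Python. This is only a sketch of the rule as stated above – the thresholds, field names and sample numbers are this writer's placeholders, not the Report's actual tooling:

```python
# A minimal sketch of the mobile-share exclusion rule described above.
# Thresholds and data are illustrative placeholders, not the Report's tooling.

def is_likely_automated(mobile_views: int, desktop_views: int,
                        low: float = 0.06, high: float = 0.94) -> bool:
    """Flag articles whose traffic is almost all (or almost no) mobile views."""
    total = mobile_views + desktop_views
    if total == 0:
        return True  # no recorded views at all
    mobile_share = mobile_views / total
    return mobile_share <= low or mobile_share >= high

# Hypothetical entries: the second would be excluded (99% mobile).
entries = [
    {"article": "Example A", "mobile": 620_000, "desktop": 380_000},
    {"article": "Example B", "mobile": 990_000, "desktop": 10_000},
]
kept = [e for e in entries
        if not is_likely_automated(e["mobile"], e["desktop"])]
print([e["article"] for e in kept])  # ['Example A']
```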

Most edited articles

For the April 19 – May 19 period, per this database report.

Title Revisions Notes
Pope Leo XIV 3199 The new Pontifex, who, like John Paul II, became pope right as his country suffered through government turmoil.
2025 papal conclave 2725 Like the two previous cardinal elections, which chose Benedict XVI and Francis, this one lasted only two days. And thankfully without outside violence interfering, unlike in Conclave and Angels & Demons.
Deaths in 2025 1916 Along with Pope Francis, the period had the departures of Sabu and Joe Don Baker.
2025 Pahalgam attack 1747 A terrorist attack in India, which Pakistani armed group The Resistance Front claimed responsibility for, led to a diplomatic crisis and armed skirmishes along the Line of Control, along with drone and missile strikes. A ceasefire was called before it escalated to full-on war.
2025 India–Pakistan strikes 1296
Herzgewächse 1295 One user decided to improve the page on a 1911 Austrian Lied (that is, a poem set to music).
2025 Canadian federal election 1294 The Conservative Party of Canada was initially projected to comfortably win, but the resignation of Justin Trudeau and replacement by Mark Carney, combined with tariffs, a trade war and threats of annexation of Canada from the United States swayed the outcome. Still, the Conservative Party picked up seats and finished in second place, though its leader Pierre Poilievre lost his seat of Carleton in Ontario. To return Poilievre to parliament, Damien Kurek, the Conservative MP elected to Battle River—Crowfoot in Alberta (which is halfway across the country and ultra-safe at more than 80% of the vote), announced his intention to resign, which will trigger a by-election where Poilievre can be parachuted in.
Death and funeral of Pope Francis 1214 The expected follow-up to His Holiness dying. Francis became the first pope since the 17th century to be buried at the Basilica of Santa Maria Maggiore instead of the Basilica of Saint Peter.
2025 Australian federal election 1178 The incumbent Labor Party led by prime minister Anthony Albanese took a comfortable 94 out of 150 seats. The Coalition took 43 seats, and one of its partners, the Liberal Party of Australia, will have to elect a new leader from its depleted ranks. The Greens were reduced to one seat, down from four, and will also have to elect a new leader. Eleven seats went to smaller parties and independents, and four remain too close to call. In the Australian Senate, Labor is set to pick up seats from the Coalition and will be able to pass legislation with support from the Greens, without having to rely on the Coalition or other parties and independents.
2025 World Snooker Championship 1126 The Crucible Theatre in Sheffield hosted this tournament, and the first Asian snooker world champion emerged in Zhao Xintong of China.
List of dignitaries at the funeral of Pope Francis 1101 Representatives from 142 out of 193 UN member states, along with Taiwan, Kosovo, and Palestine, went to pay homage to the Pontifex.
Department of Government Efficiency 1095 The controversial organization headed by Elon Musk, set up to cut down expenses of the U.S. government. DOGE has claimed to have saved $160 billion, while independent analysis estimated these savings cost taxpayers $135 billion.
2025 NBA playoffs 1040 Basketball's best started playing the postseason, and going against the NBA's usual lack of parity, the eventual final four ensured that every year of the 2020s so far will have had a different champion. 2025's title will be either the first for the New York Knicks since 1973, the first for the Oklahoma City Thunder if the Seattle SuperSonics' one doesn't count, or the first altogether if it's the Indiana Pacers or Minnesota Timberwolves.
Pope Francis 1032 The pope is pop, the pope is pop, pop doesn't spare anyone!
2025 Romanian presidential election 1030 Last year's election was annulled citing alleged Russian meddling. Nicușor Dan was chosen and will take office on May 26.



Reader comments

File:Ճնճղուկները ցայտաղբյուրի մեջ.jpg
Areg1974
CC-BY-SA 4.0

Call for candidates is now open: Wikimedia Foundation Board of Trustees

This article was originally published by the Wikimedia Foundation Elections Committee on June 17, 2025 on Diff and licensed CC-BY-SA 4.0.
Photo by Areg1974, CC-BY-SA 4.0

The call for candidates for the 2025 Wikimedia Foundation Board of Trustees selection is now open from June 17, 2025 – July 2, 2025 at 11:59 UTC. The Board of Trustees oversees the Wikimedia Foundation’s work, and each Trustee serves a three-year term. This is a volunteer position.

External videos
video icon What does the world need from Wikimedia now?

This year, the Wikimedia community will vote starting in August and ending in September 2025 to fill two (2) seats on the Foundation Board. You can see the timeline of the full selection process on Meta-wiki.

Traits

Wikimedia is a global movement and seeks candidates from the broader community. Ideal candidates are thoughtful, respectful, community-oriented, and align with the Wikimedia Foundation mission. The Board is looking for candidates with experience guiding organizational strategy, navigating change, and driving long-term strategic objectives. Candidates should think about what experiences and perspectives they will bring to the Board.

The Board would like to find perspectives and voices that are essential but underrepresented in our movement. Accordingly, all candidates will be asked to include statements in their application that speak to their experiences in the world and the perspectives they bring to the movement, and to share how these help them include those not already here.

Time Commitment for Trustees

Trustees serve a three-year term and can serve up to three consecutive terms. The expectation is that Trustees serve on at least one of the Board's committees. The time commitment is about 150 hours per year, excluding travel; that time is not evenly spread throughout the year, but concentrated around meetings.

Requirements to be a Trustee

English is the language of business for the Board. Candidates must be fluent in written and spoken English. Previous experience serving on a collective decision-making body, especially Boards or committees, and significant experience in Wikimedia (or equivalent) movement building and organizing are expected from candidates. You can review the candidate guidelines here.

How to Apply

Candidates from all projects and communities who meet the criteria to become a Wikimedia Trustee are welcome to apply. Could you – or someone you know – be a good fit to join the Wikimedia Foundation’s Board of Trustees? Encourage them to run for election. Candidates can find information and submit their nomination on the candidate application page. If you want to learn more about the role of Trustees or have questions, refer to this candidate resources page.

Community questions for candidates

The questions for the Board of Trustees candidates to answer have been published as part of the candidate application. From the list of questions submitted by the community, the Elections Committee selected five questions which the candidates are required to answer. Learn more about the community questions on this Meta-wiki page.



Reader comments

File:Ruwiki ads in Rostov-on-Don on Sivers Avenue.jpg
Altes
CC-By-SA 4.0

Russian Wiki-fork flails, failing readers and editors

RuWiki logo
RuWiki's user interface allows font customisation, listening to audio articles, and invites the reader to become an editor.

In 2023, Vladimir Medeyko, former head of the Russian Wikimedia chapter, launched RuWiki – a heavily promoted, ideologically curated fork of Russian Wikipedia. Stealing its nickname and copying 1.9 million articles, RuWiki – the fork – promised utopia: neutrality, professionalism, and paid editors. Instead, it delivered censorship and centralized control.

Despite the billboards, art installations, and AI hype, RuWiki remains a political clone: polished, controlled, and eerily quiet. As Russian Wikipedia still outpaces it by millions of searches, the question lingers: will the Kremlin try to force RuWiki's success by making its rival disappear?

Just like its older sister, the English Wikipedia, the Russian Wikipedia has spawned several copycats – most notably the far-right Tradicia (2007) and the pro-government Runiversalis (2022). And finally, the spork to end all sporks: the project launched in 2023 by Vladimir Medeyko, the former director of the now-defunct Russian chapter. Medeyko is currently globally blocked on all Wikimedia projects.

The clone is called RuWiki, which was the informal nickname for Russian Wikipedia used by its editors. The fork copied content not only from Russian Wikipedia, but also from Wikipedia editions in the minority languages of the Russian Federation, including Bashkir and Tatar.

In pre-launch interviews, Medeyko promised a utopian version of Wikipedia: more comprehensive, neutral, and up to date, and with AI support for users. Editors would be paid for their contributions and be free from the internal conflicts that usually plague community-run platforms.[1]

RuWiki initially imported over 1.9 million articles from Russian Wikipedia under the Creative Commons CC BY-SA 3.0 license. However, unlike its predecessor, it does not allow open editing. A limited group of approved editors maintains the site, and articles on politics, history, LGBTQIA+ and current events are revised under ideological supervision. According to Novaya Gazeta Europe, "all articles were edited… anything that could raise questions from the authorities was removed."[2]

large billboard near a road
A billboard in Moscow advertising RuWiki

An extensive promotional campaign accompanied RuWiki's launch. In Moscow and other large Russian cities, billboards displayed facts from the encyclopedia, along with RuWiki's name and logo. For six months, a train decorated inside and out in RuWiki's branding ran on the Moscow Metro's Sokolniki line – about as prime as advertising outside television gets. An art installation titled Smart Storage, created with RuWiki's involvement, was exhibited at the Nikola-Lenivets art park throughout the second half of 2023. The project also opened a small pavilion at the Russian Exhibition of Achievements of the National Economy.[2]

Funding for RuWiki, according to the same source, was linked to media mogul Yury Kovalchuk, a close ally of Vladimir Putin. An estimated 2 billion rubles (about $22.2 million) were invested in the project. Other reports corroborate that RuWiki is run by individuals previously employed by Kovalchuk's media holdings. While official statements describe it as an "objective encyclopedia", the content tells a different story. Articles on sensitive topics—such as the Russian invasion of Ukraine, political repression, or opposition movements—are either missing or rewritten to reflect state-approved narratives. For instance, the term "special military operation" replaces "war" when referring to the invasion of Ukraine.[2]

visualization of edited pages on Russian Wikipedia
The total number of Russian Wikipedia edited pages, 2024–2025

Two years after its launch, an independent blogger praised RuWiki's modern interface, which differs significantly from the traditional Wikimedia layout. Yet, political articles aside, he wasn't impressed, even after testing the AI used to customise the user experience and to create article summaries:[3]

...In fact, I don't know what to say about Ruwiki. From a visual point of view, the site really looks much better than its Western counterparts. Both the Russian Wikipedia and the American one...[But] why do we need this analogue? In essence, this is a copy-paste of information from Wikipedia with rare exceptions, mainly related to political topics and 18+ (adult) issues.

visualization showing decline of edited pages in Bashkir Wikipedia
The number of Bashkir Wikipedia edited pages (by editor type) shows a significant decline; the total is about 1% of the Russian Wikipedia total.

The project also failed to build a volunteer community. Several sources confirm that editing is carried out during regular working hours, hinting that contributors are salaried or assigned rather than self-motivated.[4] Medeyko’s promise of a peaceful editing environment takes on a dark irony: there are no fights among the gravediggers in this cemetery.

The impact on the Wikipedia communities is mixed: while Russian and Tatar Wikipedia show a modest decline in edits, in line with the decrease in registrations in the "mature Wikipedias",[5] Bashkir Wikipedia saw a significant drop – 29% in active editors. An anonymous source currently living in Russia confirms that this was caused by several of the most prolific Bashkir editors being lured away by paid editing in Medeyko's project.

Despite heavy advertising, RuWiki's popularity hasn't come close to Russian Wikipedia's. According to Yandex Wordstat, over the past month the term "RuWiki" was searched just 35,336 times, while "Wikipedia" garnered 5,523,689 searches.[2] The Wikipedia model doesn't seem to thrive under the heavy hand of government, especially in an era when AI is replacing browser search.

In conclusion, the fork of Russian Wikipedia, despite the use of AI and money being poured into its development, failed to attract readers, wow reviewers or make a significant dent in the large Wikipedia communities. We shouldn't congratulate ourselves too quickly on a battle won, however. Sooner or later, there will be a reckoning as to why RuWiki has failed so far. It might go the way of the other prospective "Russian Wikipedia killer", the Great Russian Encyclopedia, which lost its state funding and is no longer available online.[6] Or the Russian government may decide that the best way to make RuWiki succeed is to block Wikipedia entirely.

Notes

  1. ^ Davydov, Oleg. "Encyclopedia 'Ruwiki' is launched in Russia. Its creator talks about haters and why it shouldn't be confused with 'Wikipedia'". lenta.ru. Retrieved 4 June 2025.
  2. ^ a b c d Basmanov, Alexey. "'Ruviki' leads to Kovalchuk and Bank Rossiya". novayagazeta.eu. Retrieved 3 June 2025.
  3. ^ Sorokin, Nikita. "Analysis and comparison of RuWiki and Wikipedia: why, and most importantly why". habr.com. Retrieved 2 June 2025.
  4. ^ "The Kremlin is rewriting Wikipedia". The Economist. Retrieved 4 June 2025.
  5. ^ Shah-Quinn, Neil. "Investigating the 2021–present registration decline". people.wikimedia.org. Wikimedia Foundation. Retrieved 3 June 2025.
  6. ^ "The death of the 'Wikipedia killer'". Meduza. 17 July 2024. Retrieved 20 October 2024.



Reader comments

File:Dessert egg roll (6848762093).jpg
stu_spivack
CC BY-SA 2.0

EggRoll97's RfA2 debriefing

This is my RfA debrief, I suppose. I ran the gauntlet, so to speak, in April. My RfA closed on the 19th with a 65.8% support percentage. This put it in the discretionary range, and a 'crat chat was opened, which ended unanimously in no consensus to promote.

Pre-RfA

Before I pulled the trigger, so to speak, on a nomination, I was definitely nervous, and not really sure if it would even stand a chance. I decided to do so anyway, even sans nominator, because I wanted to help out, and I didn't see AELECT as much of an indication either way of how an RfA would go. I'm definitely not a big writer or anything, as can be clearly seen by the fact that this very reflection is only a few paragraphs long, so I'm sure my RfA nomination and question answers seemed pretty thin to a lot of people casting !votes.

During the RfA

During the RfA, I had a lot of support and more opposition than I originally expected. I knew I would face a significant opposing force due to content creation, though I never thought it would be as significant as it turned out. I really appreciated the supportive comments I received privately, and the polite tone that many of the oppose !votes had. It also helped with the stress to have others respond to some of the opposition, such as editors in the technical areas like 0xDeadbeef and Pppery, among others.

The big question

The biggest question, I guess, is "do you plan to run again", which I saw float around a bit near the end of the RfA. The answer is almost definitely no. I think one shot through a standalone RfA may well be where I draw my limit; I don't think I could stomach putting myself through it again. If at some point the viewdeleted rights are unbundled from the admin toolset, I would likely put myself before the community again for solely those unbundled rights, but absent that occurring, I am doubtful I would bite the bullet again.

As to where I go from here, I suppose just where I always have been: tucked away in the backend of the encyclopedia. While the admin toolset would have made many things far easier (viewdeleted to view deleted revisions, and editinterface to edit MediaWiki pages, for example), they are ultimately tools I can live without; I can ask admins to act for me, even if doing so is sometimes tedious.

The differences with AELECT

It's been pointed out on my talk page that I'm the first person to have gone through AELECT and then a traditional RfA, and I received some questions about the two processes and my experience with them. I definitely found AELECT to be less stressful, though I felt it had less focus on the individual candidate and more on the slate of candidates. On the other hand, a traditional RfA vets exclusively the specific candidate being put up for adminship, and tends to show more explicit participation than I received in my AELECT candidacy. As surprised as some may be at this, I actually found a traditional RfA to be less nerve-wracking, as it was a straight week of a nomination, and I could easily see all the !votes coming in, with reasons attached that I could read through and reflect on in the moment. AELECT, though, had a week of voting where I was completely in the dark. I'm sure that will be more helpful for some, and I found it to be an interesting new way to request adminship, but I feel like it didn't give me nearly as much of a chance to answer questions (indeed, a traditional RfA has almost double the questions, allowing more opportunities to respond directly to others) as a traditional RfA did.

Some overall reflections

I found a lot of really respectful and helpful discussion, and a lot of very well-researched points made on all sides of the RfA. I'm proud to have made it even into the discretionary range, especially with no significant content creation, which has been described on my talk page as something of a badge of honour. It's definitely something good I'm taking away from this RfA, and something I'm definitely happy about.

I absolutely didn't expect some of those in the support camp to be there, which was really moving. I genuinely found myself moved by the number of people who were willing to show up and opine at my RfA, regardless of which opinion they espoused or which side they were on. I also discovered the true attraction of nominators: frankly, having someone to ask about all of this. Self-noms are great to have around, and I received a few very positive private messages from various people with words of encouragement, but I was also definitely on my own with the answers to questions that, in hindsight, really were outside of my depth.

Personally, the thing I was truly shocked by was that RfA didn't make me feel stressed so much as helpless. I spent a little while staring at RfA !votes, slowly realizing how out of my depth I was from some of the oppose comments, while also being emboldened to keep the RfA open by the support comments still trickling in.



Reader comments

File:Cliff divers Acapulco.jpg
Jake73
CC-by SA 3.0

A Deep Dive Into Wikimedia (part 3)

User Feed Me Your Skin agreed to republish on The Signpost an original guide to Wikimedia from his personal blog. It is being presented as a multi-part series of columns in this space. – Signpost Editors

Part 3.1: The Wikimedia Foundation

The role of The Wikimedia Foundation in this article has been like the foundation's role in the movement: omnipresent yet in the background. Now, I want to look at the foundation as something deserving of study on its own.

Leadership

The foundation is run by a board of trustees composed of 16 people. Under current regulations, the board is made up of Jimmy Wales, 7 people appointed by the board, and 8 people elected by the community. The board votes on resolutions, with assistance from various committees. By regulation, the committees must have members from the board, but they also accept volunteer members if the board judges them to be sufficiently qualified. A list of all the committees that the board is involved with, as well as the list of trustees, can be found here. A list of board meetings and their recorded minutes can also be found here, though it seems to be incomplete.

Foundation Wiki

Unsurprisingly, the foundation has chosen to make information about itself available as a wiki. Not only does this include all the minutes and resolutions of the board, it also includes the resolutions and documentation for the Endowment and the Enterprise product. There's nothing here that's particularly interesting if you're not a lunatic who decides to write a blog post about all of Wikimedia, except for one thing...

The Memory Bank

Recognizing the importance of preserving historical information about Wikimedia, the Foundation Wiki has a memory bank to record key events, collect stories from individuals, and develop a cohesive narrative for how Wikimedia came to be. Call this a bold claim, but I think that this will be an incredibly important resource in the future when academics begin to study Wikimedia as something deserving of study in and of itself.

Financials

So, I don't think I need to say this, but running an organization of this size and scope is pretty expensive. In the past couple of years, the Wikimedia Foundation has caught flak for perceived mismanagement and waste of funds, so I want to dedicate a section of this blog post to looking at how the foundation gets and spends money.

Grants

You've probably noticed by now that there are a lot of people outside of the foundation doing important work, online and offline. The way they get the money to do these things is by asking the foundation for a grant. There are several different types of grants with different application processes, but they generally fund people or organizations that foster Wikimedia communities, do research on Wikimedia, work on tasks aligned with the Movement Initiative, run non-profits aligned with the Wikimedia movement in sub-Saharan Africa or ESEAP, run conferences about Wikimedia, carry out miscellaneous short-term projects, or promote equity within the Wikimedia community. The last category started in response to the BLM protests of 2020, which made it immediately controversial for its perceived politicization of Wikimedia.

The application process differs across the various grant categories, but the general process is to submit your proposal for review by the community at large, which will be used by the relevant committee of volunteers and designated Wikimedia employees to determine if your proposal should be accepted. Grants given to promote equity are a notably opaque exception to an otherwise exceptionally transparent process, which is deeply concerning given that these grants are typically 6 figures while other grants usually max out at 5 or even 4 figures. Notably, many of the other grants *also* support equity, which makes a dedicated equity fund even more odd. As we'll see later, these grants are a major source of controversy for critics of the foundation's spending.

Fundraising Banners

If you use Wikipedia without an adblocker or without disabling them in your account settings, you've probably noticed the banner ads begging you to donate to keep the website running. Despite the ads, Wikipedia isn't in danger of running out of money anytime soon; the need for funding is for all the other things. These ads have gotten some heated criticism, and for good reason. Even as an unapologetic Wikimedia fan, I have to admit that it's scummy to act as if you're going to run out of money in order to get donations from ordinary people. To make matters worse, an executive director admitted all the way back in 2012 that money isn't a limiting factor for the foundation. As far as I can tell, there isn't really a coherent reason why the foundation decided to seriously harm its reputation with these misleading banners. Is it as scummy as selling user data for profit? Eh, probably not, but that doesn't make it okay. Even if it gets them funds in the short term, in the long term this strategy alienates people and potentially harms the foundation's ability to secure funding in the future.

The Endowment

For its part, the Wikimedia Foundation has been working on reducing its dependence on donations. Part of this has been the Wikimedia Endowment, which generates revenue for the foundation to use indefinitely. The endowment was launched in 2016 with the goal of raising $100,000,000, which it reached ahead of schedule in 2021. However, it got there partly through substantial donations from the Wikimedia Foundation itself, making the endowment one of the very expenses it was meant to relieve.

Wikimedia Enterprise

Another way the foundation is reducing its dependence on donations is through a product called Wikimedia Enterprise. This provides APIs for accessing data from Wikimedia projects, like snapshots of an entire project or the contents of an entire article at any moment. To be blunt, there isn't any real direct benefit to using these APIs, since they can be easily replicated by a scraper or by downloading the free data dumps. The true reason somebody would want to use these APIs is that they come with official support, which at the very least lets you CYA if something goes wrong and your boss wants answers. The product made $3,200,000 in 2022–2023, which makes up almost 2% of Wikimedia's revenue.
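To make the comparison concrete, here is a minimal sketch of the free route mentioned above, using the public (non-Enterprise) REST API that anyone can call without a contract. The endpoint is the documented public one; the User-Agent string is a placeholder.

import requests

# Wikimedia etiquette: identify your client (placeholder contact below).
HEADERS = {"User-Agent": "DeepDiveSketch/0.1 (contact: example@example.org)"}

def page_summary(title, project="en.wikipedia.org"):
    """Fetch an article's lead summary from the public REST API."""
    url = f"https://{project}/api/rest_v1/page/summary/{title}"
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()

data = page_summary("Wikimedia_Foundation")
print(data["title"])
print(data["extract"][:200])  # first 200 characters of the plain-text summary

Wikimedia Enterprise's pitch is essentially this, plus bulk snapshots, uptime guarantees, and someone to call when it breaks.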

The Wikipedia Store

A more mundane way for the foundation to make money is by selling merch. Despite technically being for the movement as a whole, the store almost exclusively sells merch with Wikipedia labelling, which is an awful snub to the other projects that desperately need name recognition. The store even has its own travel section, but for some reason nobody thought to sell some stuff with Wikivoyage's symbol printed on it. That may sound like a small thing to get worked up over, but for somebody who's immersed themselves in every project for this blog post, the bias towards Wikipedia is genuinely appalling.

Does Wikipedia Have Cancer?

Whenever talking about Wikimedia's financials, it's difficult not to mention a famous essay called Wikipedia Has Cancer. It's worth reading the essay in its entirety, but the gist is that the Wikimedia Foundation has been rapidly increasing its spending year after year without any obvious need to spend more money. The concern is that one day the foundation will start to spend money that it doesn't have, harming or possibly even destroying the movement. You should draw your own conclusion about the essay, but here are my thoughts. First of all, deciding that the Wikimedia Foundation has nothing to do because Wikipedia is mostly the same as when the foundation started is simply wrong. Even back in 2005, which the author lists as a time when the Wikimedia Foundation was "healthy", the movement already included Wiktionary, Wikibooks, Wikiquote, Wikisource, Wikimedia Commons, Wikispecies, and Wikinews. The mission was always to make the world's information globally accessible through the internet, and Wikipedia is just one part of that.

Also, the cost of running several of the largest websites in the world (one of which is among the top 10 most visited) doesn't scale linearly with usage. Instead, there's overhead from having to hire people to manage data centres, and even more overhead from having to hire managers, HR, and fundraisers to pay for all these salaries. I'm also not convinced by the concern that the Wikimedia Foundation will take on debt to fund ever-increasing spending if the fundraising dries up. It wouldn't be pleasant, but the Wikimedia Foundation can tighten its belt, lay off some people, and turn down grant applications. The essay never says exactly why the Wikimedia Foundation would take on debt, but I think the author is assuming that the increased spending is driven by incompetence, not by self-contained programs that can be shut down in hard times without affecting the main projects. Even though I have a pretty negative view of this essay, I still think it brings up a very important point about the scope of the Wikimedia Foundation's vision. A huge portion of its spending goes to grants, which don't directly help the main projects. The foundation's expenses are incredibly large, and it's worthwhile to question whether the foundation is growing too fast.

The Impact Of Spending

I've spent quite a bit of time listing all the things that the Wikimedia Foundation spends money on, but I didn't go that much into impact. While there's less information on how much certain features are used, there's enough to paint a pretty pessimistic picture. Quarry has had fewer than 9,000 users in its entire lifespan at the time of writing, outreach campaigns rarely get more than 1,000 editors participating, and there are entire projects that feel borderline unusable. Is this worth the money that's spent on it? I genuinely don't know. For starters, it's often not really possible to get a good estimate of how much things even cost. For example, Quarry and PAWS were both developed by a very talented user called Yuvipanda, but were then worked on by actual staff members as part of their job; there's no good way to quantify the amount of money that indirectly went into the projects because of the foundation's support. Another thing is that impact isn't defined just by the number of people who use the product. ClueBot NG is only used by Wikipedia admins, but its impact is felt by everyone.

At the very least, it's okay to spend money while revenue keeps increasing. Unfortunately, when people see this amount of spending without a good explanation, it feels like the foundation is just throwing money away. To a small degree, when you're a foundation this large, that actually does happen, because there's always some level of false negative when trying to weed out scammers. In 2022, there was a viral Twitter thread where somebody discovered that Wikimedia gave a fairly substantial grant to an organization that made questionable content as part of its drive for racial equity, which in turn gave a grant to a person who did an experiment that accidentally killed a few octopi. This was then followed by countless people saying that they were no longer going to donate to Wikipedia because they were outraged over this misuse of money.

It would be easy to criticize these people for unquestioningly accepting what a random person on Twitter told them, but that misses the point. People aren't *just* getting fed up with Wikipedia because of a random person on Twitter or "Wikipedia Has Cancer"; they're getting fed up because these match their very legitimate impression that Wikipedia is exactly the same now as it was 10 years ago. Regardless of whether or not the money that the Wikimedia Foundation is spending has enough impact to justify the costs, users certainly have to believe that it does. Because the foundation has failed to effectively communicate the breadth and importance of the Wikimedia movement, it's facing a backlash, which ironically affects its ability to get donations in the future.

Transparency

A major complaint from the essay was that the foundation isn't sufficiently transparent about how the money it receives is spent. In particular, the author feels that the foundation doesn't do enough to explain what the grants are and why they're given, even though grants are one of the foundation's largest expenses. However, the foundation publishes its annual plan and a Q&A about its audit every year. Because of its open nature, even small expenses can often be found if you look for them, though it's difficult because of the massive breadth of Wikimedia (should have thought about that before writing this blog post...). To the author's credit, the Wikimedia Foundation could compile some of this information itself and provide it to the community for better understanding. To the foundation's credit, it already goes above and beyond when it comes to transparency by publishing as much financial information as possible and doing Q&As about its finances. The fact that it lets the community get involved with the grant approval process is also incredibly unusual and totally solves the author's complaint that the community doesn't have enough say in how the foundation's money is spent.

Part 3.2: The Other Groups

Besides the foundation proper, there's a whole cottage industry of organizations centred around the Wikimedia movement. These groups usually either represent the interests of certain geographic areas (most commonly entire countries) or exist to ensure accurate and exhaustive information about a particular subject (e.g. art). While it's difficult to quantify and research them all, there are many organizations officially approved by the Affiliations Committee. Strictly speaking, you don't actually need to be recognized to form a group focused on Wikimedia, but recognition grants you the right to use official trademarks and receive funding from the foundation. Since these organizations can be considered part of the Wikimedia movement, and virtually all of them have a significant presence on at least one project, they deserve to be discussed.

User Groups

These are the most basic kind of affiliates. The only requirements are to have at least 3 active editors who each have an account with more than 500 edits on a Wikimedia project (800 if it's Wikidata), to have at least 10 members overall, and to agree to the code of conduct. Even these requirements can be waived in the application. Because of its flexible structure, this is the most common type of Wikimedia affiliation. The downside is that user groups only get to use Wikimedia trademarks and are only eligible for the Community grant.

Chapters

These organizations are dedicated to geographical regions, most commonly entire countries. Not only do they get all the benefits of being a user group, they also get Wikimedia merchandise to hand out, access to the Annual Plan grant, and scholarships to send 2 members to Wikimania, among other things. The downside is that the requirements are much more stringent: they have to be legally incorporated, show that their motives are in line with the movement's mission, have at least 10 editors with accounts over 6 months old that have 300 edits on a single project, and have already existed for 2 years, preferably as a user group.

Thematic Organizations

These affiliates are very similar to chapters in what they get and what they have to do. However, instead of being focused on a geographical region, they are focused on an overarching thematic purpose. There are only 2 of these affiliates: Amical Wikimedia, which focuses on the Catalan language and culture, and the Wiki Project Med Foundation, which is focused on providing accurate medical information on the Wikimedia projects.

Reporting

Chapters and thematic organizations have to provide annual financial reports, while all three types of affiliates are expected to provide annual reports about their operations to make sure that they're still contributing to the Wikimedia mission. As part of Wikimedia's undying commitment to transparency, all of these reports are made available for public viewing. This is the main hub for people who want to learn more about the extended Wikimedia movement and how they interact with the official projects.

What They Do

There are undoubtedly some really cool things going on in the affiliated organizations, but I simply can't do a deep dive into all of them: there are almost 200 organizations, and many of them don't even work in English. Instead, I've lightly browsed each and every one of them, looked closer whenever I saw anything particularly interesting, and reported it here. As for the ones I don't mention, most of what they do is host edit-a-thons and workshops, and sometimes collaborate with educational institutions to have Wikimedia projects used in the classroom.

Wikimedia Sverige

This is the chapter that represents all of Sweden. Despite the many things they do to foster the Swedish Wikimedia community, I feel like the most interesting thing they're currently working on is Wikispeech, a text-to-speech MediaWiki plugin. To be frank, it's not that good at the moment, but once fully developed, it could help many people with literacy problems access the Wikimedia projects.

Wikimedia Deutschland

This chapter, which represents Germany, is unusually political. Whereas many organizations try to maintain an air of neutrality, Wikimedia Deutschland meticulously lists the various ways it represents the Wikimedia movement at the national and continental level. While I can only go off what they say, it seems like their intense lobbying efforts have led to real improvements in how the government treats access to knowledge as a human right. Additionally, this chapter is large enough to have its own funding program to foster various initiatives, although I can't tell you anything about the initiatives themselves because all the information is in German. While it can be hard to quantify the impact that a group has, Wikimedia Deutschland is likely the most powerful organization in the extended Wikimedia movement.

Wikimedia Tunisie

The chapter that represents Tunisia has collaborated with the University of Sfax to create the Data Engineering and Semantics Research Unit, which does research on knowledge graphs like Wikidata.

Wikimedia Turkey

Somebody from the chapter representing Turkey made a 10-video course teaching beginners how to edit Wikipedia. The audio is in Turkish, but it has English subtitles.

Wikimedia Morocco

Besides contributing content, this user group has a project to improve the organization's technological capacities. So far, the main focus has been developing bots that automate repetitive tasks on wikis in Moroccan languages.
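For a rough idea of what such a bot looks like, here is a generic Pywikibot sketch, not the group's actual code; the target wiki, sample size, cleanup rule, and edit summary are all made-up illustrations.

import pywikibot

# Moroccan Arabic (Darija) Wikipedia; any Wikimedia wiki works the same way.
site = pywikibot.Site("ary", "wikipedia")

# Walk a small sample of mainspace pages and apply a trivial cleanup.
for page in site.allpages(namespace=0, total=10):
    text = page.text
    cleaned = text.replace("  ", " ")  # hypothetical fix: collapse doubled spaces
    if cleaned != text:
        page.text = cleaned
        page.save(summary="Bot: collapse doubled spaces (sketch)")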

H-GAPS User Group

Unlike most user groups, these people are chiefly focused on Wikiversity. Their main mission is to make accurate information about psychological science freely accessible through the Wikimedia movement.

Wikimedia France

This organization runs a project called Lingua Libre, which is dedicated to recording and preserving audio of people speaking. The goal is to preserve endangered languages by creating databases that people can study from. That being said, they also welcome recordings of people speaking common languages.

MediaWiki Stakeholders' Group

Unlike virtually every other user group, the MediaWiki Stakeholders' Group isn't active on any of the Wikimedia projects. In fact, it's explicitly for people *outside* of the WMF and the projects. The mission is to work with developers and users from outside of Wikimedia to guide the development of MediaWiki in a way that doesn't hurt the user experience for non-Wikimedia websites built with the software. They also run a website that tracks every third-party website using MediaWiki, but it wasn't working at the time of writing this article. Predictably, this user group hosts information about itself on a wiki built with MediaWiki.

Wiki Project Med Foundation

I talked about them briefly before, but this organization is surprisingly complex on its own. They have several different projects and get funding from groups besides the Wikimedia Foundation, such as the World Health Organization. If I were to describe them, I would say that they're a group aspiring to create a version of the Wikimedia movement that's entirely focused on medical knowledge. They're surprisingly obscure for who they are, so I decided to do a deep dive into the work that they do.

MDWiki

MDWiki is the organization's equivalent of Wikipedia. As you probably guessed, this website is heavily focused on medicine and topics adjacent to medicine, like biology and funding. As with Wikipedia, there's also a namespace for articles about the website itself and how it's run. Unlike Wikipedia, or any of the Wikimedia projects, you have to apply for the right to edit pages, presumably to prevent trolls looking to cause potentially fatal vandalism. The process doesn't seem too intense, but they do reject people from time to time.

NC Commons

This is Wikimedia Medicine's answer to Wikimedia Commons: a repository for medical images that are NC- or ND-licensed instead of being under a truly free license. In other words, NC Commons is how Wikimedia Medicine handles medical images that can't be stored on Wikimedia Commons.

The App

In collaboration with Wikimedia Switzerland, Wikimedia Medicine has a mobile app called Medical Wikipedia that allows offline access to health information from both MDWiki and Wikipedia. This is essentially the same as getting a data dump and viewing it with Kiwix (which is what the app is built on). The app's size depends on which language you download and whether or not you want video, but several versions clock in at over 1 GB. The focus is obviously on mobile usage, but there are also builds for Windows and Linux, while iOS users have to download the ZIM file and use Kiwix directly.

Internet-In-A-Box

If you don't have good enough internet access to download the app, Wikimedia Medicine also sells an Internet-in-a-Box: a small device that serves as a hotspot so that people within a 100 m radius can access Wikipedia and medical content. You can purchase one from Wikimedia Medicine, but it could take several months to ship. You can also build one yourself, though the official instructions are pretty barebones; better guides can be found online, and the instructions do link to them.

VideoWiki

For quite a while, many people in the Wikimedia movement have wanted an equivalent of Wikipedia that uses video and audio instead of just text, in no small part to make information available to illiterate users. Enter VideoWiki. While this project was created by a user called Pratik.pks, it has since been adopted by Wikimedia Medicine, who run it to this day. The project uses Wikipedia as the editing platform and Wikimedia Commons as the source for background visuals, as described in the tutorial. It's still in beta, but it could be fairly big if completed.

Research And Outreach

Like the Wikimedia Foundation proper, Wikimedia Medicine conducts research on itself. MDWiki has an entire page on research conducted by members of the organization on Wikipedia's and Wikimedia Medicine's ability to democratize knowledge, but ironically, most of these articles are paywalled. Besides conducting research, Wikimedia Medicine also tries to spread information through the educational system as part of WikiEducation. And beyond educational material, Wikimedia Medicine has done important work advising major health organizations about the potential of the Wikimedia movement in medicine.


Next month: Part 4, The Future of Wikimedia and Conclusion.



Reader comments

File:Punch (1841) (14794122833) (cropped).jpg

Hamburgers


"You know, I can't help but notice that these hamburgers contain many paragraphs that are word-for-word identical to the ones served at the offline restaurant you cited in the references section."



Reader comments

If articles have been updated, you may need to refresh the single-page edition.