Help talk:Translation/Machine translation errors
Feedback and examples needed
Thinking about some people who might know about examples of seriously problematic MT from their best language(s) into English, that could be helpful if added to the worksheet here. Off the top of my head: @Yngvadottir, Kwamikagami, Austronesier, Nardog, Largoplazo, DreamRimmer, Elinruby, Scope creep, R Prazeres, Piotrus, Lambiam, DoubleGrazing, Borsoka, Jules*, Florian Blaschke, SergeWoodzing, and Gerda Arendt.
If you got pinged and are mystified, in a nutshell, this is a small project to gather data about mistakes made by automatic translation programs translating into English, and making serious errors of fact in the process. The goal is ultimately to influence guidelines about who may or may not use MT and under what circumstances, but before we can present a case for changing policy, we need supporting data. If you can help by adding an example or two as you come across them, that would be much appreciated. I don't believe I pinged anyone with Russian (or other Slavic) languages or CJKV, and I would also like to add users with knowledge of Arabic, Farsi, Indonesian, and South Asian languages, so if you know anyone who fits the bill, please ping or make suggestions. Thanks! Mathglot (talk) 01:07, 17 May 2025 (UTC)
- S Marshall can probably add some from the translations using the WMF's translation tool.
- There's a little collection of machine-mediated gobbledygook at the bottom of my user page. I think
The side wings of the parochial road are stralauer piereced as the triaxial side projects on to the facades of the Jews and the monastery road.
from Altes Stadthaus, Berlin might be what you're looking for. Introduced here during building of the article, although "piereced" and other spelling errors like "Berliner Zeritung" for "Berliner Zeitung" suggest the editor was copying rather than cutting and pasting. The passage in the German article as it was then read:
Die Seitenflügel an der Parochial- und Stralauer Straße durchstoßen als dreiachsige Seitenrisaliten die Fassaden an der Jüden- und der Klosterstraße.
A better translation would be: "The wings along Parochialstraße and Stralauer Straße terminated in triaxial avant-corps that projected from the primary façades facing Jüdenstraße and Klosterstraße." Yngvadottir (talk) 03:19, 17 May 2025 (UTC)
- Yngvadottir, thanks for the feedback. Gobbledygook has the advantage of being obvious to anyone, bilingual or not, as problematic content in need of attention or removal. What we *really* need are examples of the non-obvious stuff that will sneak under the radar and not be recognized, especially by monolinguals. The real danger is not gobbledygook, but translated English prose that is beautiful, but happens to be factually wrong, having turned a verifiable fact in German into an unverifiable falsehood in English. This is the kind of evidence that, imho, will help us militate for a change in MT guidelines, which is the goal of this project. Mathglot (talk) 07:52, 17 May 2025 (UTC)
- Since we last discussed this, I've tried to find examples, but so far no joy. I had a couple of editors in mind who were producing Finnish > English translations, and I think (thought) they were using MT to do this, but the articles I checked were quite well translated, so either they're doing it manually, or the MT is delivering good results. Will keep looking... -- DoubleGrazing (talk) 05:39, 17 May 2025 (UTC)
- DoubleGrazing, thanks, and not to worry. I think this is one of those slow-moving things where you can't really elicit examples when you want them, they just pop out when you least expect it, and the thing to do is trap them right then when they happen. The natural thing to do, of course, is to just shake one's head at the mistake and move on, but the hope is that with this appeal, now that folks know that we *really need* these examples, that instead of moving on, they will pause and record them, and this is the place to do it. At least, I hope so. Mathglot (talk) 07:44, 17 May 2025 (UTC)
- Argh! There were some absolute corkers in the WMF machine translation disaster, but the ones I found were all deleted (either under WP:CSD#X2, or via PROD, which I also used extensively during the cleanup). It's been eight years and a lot's gone on, and I just don't remember the specifics any more.—S Marshall T/C 08:27, 17 May 2025 (UTC)
- I have been using machine translation from Polish to English for years. These days the errors are minuscule; proofreading is still needed, of course, due to occasional false friends and such. That said, I don't bother with WMF tools, they are simply unwieldy and too often refuse to publish the complete translation due to some random code issues or local project policies or whatever. I mostly use Google Translate, occasionally ChatGPT these days. Piotr Konieczny aka Prokonsul Piotrus| reply here 12:42, 17 May 2025 (UTC)
- Yes, translation from Indo-European languages to English is definitely the least bad form of machine translation. In the hands of someone with dual fluency who checks the output, it's fine to use.—S Marshall T/C 17:17, 17 May 2025 (UTC)
- I wish I could help with this, but I'm afraid (or glad?) I haven't run across the kind of errors specified. If I do, I will try to get in touch at that time. Good luck! It's a very good cause, when popes from Chicago are AI'd into long shocking video statements speaking "English" with a heavy Italian accent and "John Lennon" sings an AI ditty looking more like JFK Jr than himself (we old folks know the difference). Best wishes, --SergeWoodzing (talk) 18:37, 17 May 2025 (UTC)
- S Marshall, you just nailed in a sentence what should be our policy on MT, imho. Let me restate, with emphasis: for "someone with dual fluency who checks the output, it's fine to use", and for anyone else, it is not fine (reworded in appropriate policy language). Thanks, Mathglot (talk) 07:00, 21 May 2025 (UTC)
- Aye. Common problem is that some folks will be lazy and won't check. (Happens a lot with my students). Piotr Konieczny aka Prokonsul Piotrus| reply here 01:56, 22 May 2025 (UTC)
- True, but that puts it into a different bucket so to speak as far as warnings and blocks. A bilingual student who can read the SL (and SL refs) should be allowed to use MT, and if they have not checked it, then it falls into the already existing path of adding unsourced/invalidly sourced content, which is a burden for others to double-check, and will likely not be checked (or only much later, if ever) and is blockable if they keep doing it after a warning. Under current guidelines, a monolingual follows in the same pipeline, and can only be warned/blocked when their faulty translation that they *cannot check* is caught (rarely, as before). But it should not be that way, as that is a profligate waste of our precious resource (editors who check sources), and they should be stopped sooner in the pipeline, by a guideline that warns/blocks them for using MT in the first place. Currently, the lack of teeth in the guideline implies, "Please don't, but if you do you won't be sanctioned for it, and anyway don't worry, because we have other users who are bilingual who will check your work." This flips WP:Verifiability on its head. Recall:
- All content must be verifiable. The burden to demonstrate verifiability lies with the editor who adds or restores material, and it is satisfied by providing an inline citation to a reliable source that directly supports the contribution.
- If a user *cannot demonstrate verifiability* even when challenged to do so, then they have no business adding the content in the first place, otherwise the burden has in reality switched from them to everybody else to verify it. This is unfair to other editors, and is contrary to the policy. By my reading, MT is already banned to monolinguals by verifiability, and this should be called out specifically in the MT guideline, linking WP:BURDEN. Mathglot (talk) 04:46, 22 May 2025 (UTC)