Module talk:DecodeEncode
Bug report: bad decoding of U+03B5 ε (epsilon)
About U+03B5 ε GREEK SMALL LETTER EPSILON (ε ε)
- Issue: after resolving the HTML entity ε by mw.text.decode(), the plain character is not found by mw.ustring.gsub(). No issue with the alternative HTML entity ε. ε good, ε bad.
- Report limitations: The original report and bug reproduction are at enwiki Module talk:DecodeEncode, where en:module:DecodeEncode and en:module:String are used live. At Phabricator, pseudocode may be used and some "results" may be hardcoded. In-text, the escape & is used, not in-function. Lua patterns are not used ("no %").
- To reproduce:
- 1. Create research string: Xε1Xε2X (shows live and unedited as: Xε1Xε2X)
- 2. Render the string by decode() (as inner function)
- 3. Then, on the rendered result, use gsub() to replace plain character ε → E (as outer function): mw.ustring.gsub( s=( mw.text.decode( s=Xε1Xε2X, decodeNamedEntities=true ) ), pattern=ε, repl=E ) [is pseudo-code, see note. 21:10, 7 February 2023 (UTC)]
- 4. Result3 (s&r pattern uses ε from Xε1X): XE1XE2X
- 5. Result4 (s&r pattern uses ε from Xε2X): XE1XE2X
- Expected: XE1XE2X (only one character ε exists)
- Note 21:10, 7 February 2023 (UTC): This step 3 is in pseudo-code. To reproduce, use Lua modules module:String and Module:DecodeEncode:
{{#invoke:String|replace|source={{#invoke:DecodeEncode|decode|s=Xε1Xε2X}}|pattern=ε|replace=E|plain=true}}
- → XE1XE2X
- -DePiep (talk) 21:10, 7 February 2023 (UTC)
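One way to check the "only one character ε exists" expectation is to dump the code points of both the decoded string and the search pattern and compare them. The sketch below is plain Lua and only an illustration: `codepoints` is a hypothetical helper name, and it assumes well-formed UTF-8 input. Inside Scribunto, `mw.ustring.codepoint( s, 1, -1 )` should give the same numbers without the hand-rolled decoder.

```lua
-- Return the code points of a UTF-8 string as space-separated "U+XXXX" labels.
-- Assumption: the input is well-formed UTF-8; no validation is attempted.
local function codepoints( s )
	local out = {}
	local i = 1
	while i <= #s do
		local b = s:byte( i )
		local len, cp
		if b < 0x80 then
			len, cp = 1, b            -- ASCII
		elseif b < 0xE0 then
			len, cp = 2, b % 0x20     -- 2-byte sequence
		elseif b < 0xF0 then
			len, cp = 3, b % 0x10     -- 3-byte sequence
		else
			len, cp = 4, b % 0x08     -- 4-byte sequence
		end
		-- fold in the 6 payload bits of each continuation byte
		for k = 1, len - 1 do
			cp = cp * 64 + s:byte( i + k ) % 0x40
		end
		out[#out + 1] = string.format( 'U+%04X', cp )
		i = i + len
	end
	return table.concat( out, ' ' )
end

-- "\206\181" is the UTF-8 encoding of U+03B5 (bytes CE B5)
print( codepoints( 'X' .. '\206\181' .. '1' ) )
```

If the decoded string and the pattern show different code points where ε is expected, the mismatch is in the decoding step rather than in gsub().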
Workaround A, ad hoc
Workaround A, ad hoc: add an innermost function that first replaces ε → ε in the research string:
- A1: {{#invoke:String|replace|source={{#invoke:DecodeEncode|decode|s={{#invoke:String|replace|source=Xε1Xε2X|pattern=ε|replace=ε|plain=true}}}}|pattern=ε|replace=E|plain=true}}
- → XE1XE2X
Workaround B, in module (THIN SPACE example)
Workaround B: early in :en:module:DecodeEncode, replace ε → ε
About THIN SPACE: it looks like character U+2009 THIN SPACE (   ) has a similar issue.   good,   bad.
Currently in code:
function p._decode( s, subset_only )
	local ret = nil;
	s = mw.ustring.gsub( s, ' ', ' ' ) -- Workaround for bug:   gets properly decoded in decode, but   doesn't.
	ret = mw.text.decode( s, not subset_only )
	return ret
end
In en:module:DecodeEncode/sandbox, I have coded a similar handling of EPSILON:
function p._decode( s, subset_only )
	local ret = nil;
	-- U+2009 THIN SPACE: workaround for bug: HTML entity   is decoded incorrect. Entity   gets decoded properly
	s = mw.ustring.gsub( s, ' ', ' ' )
	-- U+03B5 ε GREEK SMALL LETTER EPSILON: workaround for bug (phab:T328840): HTML entity ε is decoded incorrect for gsub(). Entity ε gets decoded properly
	s = mw.ustring.gsub( s, 'ε', 'ε' )
	ret = mw.text.decode( s, not subset_only )
	return ret
end
- /sandbox tests:
- B. {{#invoke:String|replace|source={{#invoke:DecodeEncode/sandbox|decode|s=Xε1Xε2X}}|pattern=ε|replace=E|plain=true}}
- B1. ResultB1 (s&r pattern uses ε from Xε1X): XE1XE2X
- B2. ResultB2 (s&r pattern uses ε from Xε2X): XE1XE2X
I propose to edit the module along this way.
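If more entities turn out to need the same treatment, the per-entity gsub lines could be folded into a table. The following is a rough sketch in plain Lua, not the sandbox code: `normalize_entities`, `entity_fixes` and the `&bad1;`/`&good1;` pair are hypothetical placeholders, to be replaced by the real entity pairs.

```lua
-- Hypothetical placeholder pairs; substitute the entities that actually need fixing.
local entity_fixes = {
	{ from = '&bad1;', to = '&good1;' },
}

-- Replace each "from" entity by its "to" entity, treating both as plain text.
local function normalize_entities( s, fixes )
	for _, fix in ipairs( fixes ) do
		-- escape punctuation so the entity is matched literally
		local pattern = fix.from:gsub( '%p', '%%%0' )
		-- escape % in the replacement string
		local repl = fix.to:gsub( '%%', '%%%%' )
		s = s:gsub( pattern, repl )
	end
	return s
end

print( normalize_entities( 'X&bad1;1X&good1;2X', entity_fixes ) )
```

In the module, the result would then be passed on to mw.text.decode() as in the code above.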
Workaround C (mw, Lua)
Changes in mw, Lua: I have no idea.
- I propose to consider module editing along § Workaround B. -DePiep (talk) 12:26, 4 February 2023 (UTC)
testcases EPSILON
EPSILON ε ⟨ε ⟩ error & fix proposal (16 Feb 2023)
1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|
id | entity code | plain | mod:.. decode(&entity;) | replace(decode(..)) with E, pattern=hardcoded ⟨ε⟩ from plain: (s=&entity;) (s=checkstring) | mod:..decode/sandbox |
checkstring | Xε1Xε2X | >Xε1Xε2X< | >Xε1Xε2X< | | |
EPSI | ε | >ε< | >ε< | E XE1XE2X | E XE1XE2X |
EPSILON | ε | >ε< | >ε< | E XE1XE2X | E XE1XE2X |
- See § Workaround B, in module (THIN SPACE example) for code change;
- Similar fix as U+2009 THIN SPACE ( ,  ) has (though original cause bug may be different for THIN SPACE).
- Phabricator T328840 did not gain traction. Would be mw-level, not this module.
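The checks in the table could also be run mechanically. The sketch below is plain Lua and not the module's actual test code: `run_cases` and its `decode` argument are hypothetical names, and on-wiki `decode` would be a wrapper around the module's decode function. For each test id it reports whether a plain (non-pattern) search finds the needle in the decoded string.

```lua
-- decode: function under test (assumption: takes and returns a string)
-- needle: the plain character to look for, e.g. a hardcoded ε
-- cases:  list of { id = ..., input = ... }
local function run_cases( decode, needle, cases )
	local rows = {}
	for _, case in ipairs( cases ) do
		local decoded = decode( case.input )
		-- fourth argument true = plain byte-wise search, no Lua patterns
		local found = string.find( decoded, needle, 1, true ) ~= nil
		rows[#rows + 1] = case.id .. ': ' .. ( found and 'found' or 'not found' )
	end
	return rows
end

-- Usage with an identity function standing in for the real decoder
local rows = run_cases(
	function( s ) return s end,
	'b',
	{ { id = 't1', input = 'abc' }, { id = 't2', input = 'xyz' } }
)
print( table.concat( rows, '\n' ) )
```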
Template-protected edit request on 16 February 2023
It is requested that an edit be made to the template-protected module at Module:DecodeEncode.
- Please copy all code from module:DecodeEncode/sandbox into module:DecodeEncode (diff)
- Issue: bad decoding of HTML entity ε
- re U+03B5 ε GREEK SMALL LETTER EPSILON (ε, ε)
- Change: fix by replacing with entity ε before applying decode(). See § Workaround B for code diff & background; minor comment change
- Discussion: (1) reported at T328840, no responses (mw-level); (2) bug report here not challenged
- Testcases: See § testcases EPSILON.
- DePiep (talk) 06:49, 16 February 2023 (UTC)