https://en.wikipedia.org/w/index.php?action=history&feed=atom&title=Module%3ASandbox%2FErutuon%2FUnicode
Module:Sandbox/Erutuon/Unicode - Revision history
2025-06-03T18:32:10Z
Revision history for this page on the wiki
MediaWiki 1.45.0-wmf.3
https://en.wikipedia.org/w/index.php?title=Module:Sandbox/Erutuon/Unicode&diff=885441693&oldid=prev
Erutuon: Module:Table → Module:TableTools
2019-02-28T01:35:22Z
<p><a href="/w/index.php?title=Module:Table&action=edit&redlink=1" class="new" title="Module:Table (page does not exist)">Module:Table</a> → <a href="/wiki/Module:TableTools" title="Module:TableTools">Module:TableTools</a></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 01:35, 28 February 2019</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 70:</td>
<td colspan="2" class="diff-lineno">Line 70:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>local fun = require "Module:Fun"</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>local fun = require "Module:Fun"</div></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>local m_table = require "Module:<del style="font-weight: bold; text-decoration: none;">Table</del>"</div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>local m_table = require "Module:<ins style="font-weight: bold; text-decoration: none;">TableTools</ins>"</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>local script_to_count_mt = {</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>local script_to_count_mt = {</div></td>
</tr>
</table>
Erutuon
https://en.wikipedia.org/w/index.php?title=Module:Sandbox/Erutuon/Unicode&diff=857167856&oldid=prev
Erutuon: fix errorf
2018-08-30T00:29:26Z
<p>fix errorf</p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 00:29, 30 August 2018</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 4:</td>
<td colspan="2" class="diff-lineno">Line 4:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>local function errorf(level, ...)</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>local function errorf(level, ...)</div></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> if type(level) == number then</div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> if type(level) == <ins style="font-weight: bold; text-decoration: none;">"</ins>number<ins style="font-weight: bold; text-decoration: none;">"</ins> then</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> return error(string.format(...), level + 1)</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> return error(string.format(...), level + 1)</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> else -- level is actually the format string.</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> else -- level is actually the format string.</div></td>
</tr>
</table>
Erutuon
https://en.wikipedia.org/w/index.php?title=Module:Sandbox/Erutuon/Unicode&diff=849319617&oldid=prev
Erutuon: removed language tag stuff
2018-07-08T05:11:56Z
<p>removed language tag stuff</p>
<a href="//en.wikipedia.org/w/index.php?title=Module:Sandbox/Erutuon/Unicode&diff=849319617&oldid=849319578">Show changes</a>
Erutuon
https://en.wikipedia.org/w/index.php?title=Module:Sandbox/Erutuon/Unicode&diff=849319578&oldid=prev
Erutuon: copied from Module:Sandbox/Erutuon
2018-07-08T05:11:29Z
<p>copied from <a href="/wiki/Module:Sandbox/Erutuon" title="Module:Sandbox/Erutuon">Module:Sandbox/Erutuon</a></p>
<p><b>New page</b></p><div>local p = {}<br />
<br />
local Unicode_data = require "Module:Unicode data/sandbox"<br />
<br />
local function errorf(level, ...)<br />
if type(level) == number then<br />
return error(string.format(...), level + 1)<br />
else -- level is actually the format string.<br />
return error(string.format(level, ...), 2)<br />
end<br />
end<br />
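-- Usage sketch (illustrative only; the argument values are made up):<br />
--   errorf("Parameter %s is required", "1")<br />
-- formats the message as "Parameter 1 is required" and raises it at level 2,<br />
-- so the error is attributed to the function that called errorf.<br />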
<br />
function mw.logf(...)<br />
return mw.log(string.format(...))<br />
end<br />
<br />
local output_mt = {}<br />
function output_mt:insert(str)<br />
self.n = self.n + 1<br />
self[self.n] = str<br />
end<br />
<br />
-- also in [[Module:Unicode data/documentation functions]]<br />
function output_mt:insert_format(...)<br />
self:insert(string.format(...))<br />
end<br />
<br />
output_mt.join = table.concat<br />
<br />
output_mt.__index = output_mt<br />
<br />
local function Output()<br />
return setmetatable({ n = 0 }, output_mt)<br />
end<br />
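-- Usage sketch (illustrative only):<br />
--   local out = Output()<br />
--   out:insert("a")<br />
--   out:insert_format("%d", 1)<br />
--   out:join()      -- "a1" (table.concat over the array part)<br />
--   out:join(", ")  -- "a, 1"<br />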
<br />
<br />
local Latn_pattern = table.concat {<br />
"[",<br />
"\n\32-\127",<br />
"\194\160-\194\172",<br />
"\195\128-\195\191",<br />
"\196\128-\197\191",<br />
"\198\128-\201\143",<br />
"\225\184\128-\225\187\191",<br />
"\226\177\160-\226\177\191",<br />
"\234\156\160-\234\159\191",<br />
"\234\172\176-\234\173\175",<br />
"\239\172\128-\239\172\134",<br />
"\239\188\129-\239\188\188",<br />
"–",<br />
"—",<br />
"«", "»",<br />
"]",<br />
};<br />
<br />
local get_codepoint = mw.ustring.codepoint<br />
local function expand_range(start, ending)<br />
local lower, higher = get_codepoint(start), get_codepoint(ending)<br />
if higher < lower then<br />
return nil<br />
end<br />
local chars = {}<br />
local i = 0<br />
for codepoint = lower, higher do<br />
i = i + 1<br />
chars[i] = mw.ustring.char(codepoint)<br />
end<br />
return table.concat(chars)<br />
end<br />
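-- Usage sketch (illustrative only):<br />
--   expand_range("a", "e")  -- "abcde"<br />
--   expand_range("e", "a")  -- nil, because the range is reversed<br />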
<br />
local fun = require "Module:Fun"<br />
local m_table = require "Module:Table"<br />
<br />
local script_to_count_mt = {<br />
__index = function (self, key)<br />
self[key] = 0<br />
return 0<br />
end,<br />
__call = function (self, ...)<br />
return setmetatable({}, self)<br />
end<br />
}<br />
setmetatable(script_to_count_mt, script_to_count_mt)<br />
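-- Usage sketch (illustrative only; "Latn" is just a sample key):<br />
--   local counts = script_to_count_mt()  -- new counter table via __call<br />
--   counts.Latn = counts.Latn + 1        -- missing keys start at 0, so this is 1<br />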
<br />
-- Uses an iterator (such as mw.ustring.gcodepoint) that generates a codepoint<br />
-- each time it is called with an optional state and another value.<br />
local function show_scripts(iterator, state, value)<br />
local script_to_count = script_to_count_mt()<br />
for codepoint in iterator, state, value do<br />
local script = Unicode_data.lookup_script(codepoint)<br />
script_to_count[script] = script_to_count[script] + 1<br />
end<br />
return table.concat(<br />
fun.mapIter(<br />
function (count, script)<br />
return ("%s (%d)"):format(script, count)<br />
end,<br />
m_table.sortedPairs(<br />
script_to_count,<br />
function (script1, script2)<br />
return script_to_count[script1] > script_to_count[script2]<br />
end)),<br />
", ")<br />
end<br />
<br />
local function get_chars_in_scripts(iterator, state, value)<br />
local script_to_char_set = {}<br />
for codepoint in iterator, state, value do<br />
local script = Unicode_data.lookup_script(codepoint)<br />
script_to_char_set[script] = script_to_char_set[script] or {}<br />
script_to_char_set[script][codepoint] = true<br />
end<br />
<br />
return script_to_char_set<br />
end<br />
<br />
local function print_char_set_map(script_to_char_set, format, separator)<br />
format = format or "%s: %s"<br />
separator = separator or "\n"<br />
return table.concat(<br />
fun.mapIter(<br />
function (char_set, script)<br />
local char_list = fun.mapIter(<br />
function (_, codepoint)<br />
return mw.ustring.char(codepoint)<br />
end,<br />
m_table.sortedPairs(char_set))<br />
return (format):format(script, mw.text.nowiki(table.concat(char_list)))<br />
end,<br />
m_table.sortedPairs(script_to_char_set)),<br />
separator)<br />
end<br />
<br />
function p.show(frame)<br />
local expanded_pattern = Latn_pattern<br />
:gsub("%[(.-)%]", "%1")<br />
:gsub( -- Find two UTF-8-encoded characters separated by hyphen-minus.<br />
"([%z\1-\127\194-\244][\128-\191]*)%-([%z\1-\127\194-\244][\128-\191]*)",<br />
function (char1, char2)<br />
return expand_range(char1, char2)<br />
end)<br />
<br />
return ('* <div style="overflow-wrap: break-word;">%s</div><br>%s')<br />
:format(expanded_pattern<br />
:gsub("^%s*", ""), -- Remove initial "\n " to avoid creating unwanted pre element.<br />
show_scripts(mw.ustring.gcodepoint(expanded_pattern)))<br />
end<br />
<br />
local function get_block_info_from_arg(args, arg)<br />
local block_name = args[arg]<br />
or errorf("Parameter %s is required", tostring(arg))<br />
<br />
local block_info = Unicode_data.get_block_info(block_name)<br />
or errorf("The block '%s' could not be found", block_name)<br />
<br />
return block_info<br />
end<br />
<br />
local function get_boolean_from_arg(args, arg)<br />
return args[arg] and require "Module:Yesno" (args[arg])<br />
end<br />
<br />
function p.scripts_in_block(frame)<br />
local block_info = get_block_info_from_arg(frame.args, 1)<br />
local show_block_name = get_boolean_from_arg(frame.args, 2)<br />
local script_list = show_scripts(fun.range(block_info[1], block_info[2]))<br />
if show_block_name then<br />
return ("%s: %s"):format(block_info[3], script_list)<br />
else<br />
return script_list<br />
end<br />
end<br />
<br />
local function link_block_name(block_name)<br />
if block_name:find " " then<br />
return ("[[%s]]"):format(block_name)<br />
else<br />
return ("[[%s (Unicode block)|%s]]"):format(block_name, block_name)<br />
end<br />
end<br />
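-- Usage sketch (illustrative only):<br />
--   link_block_name("Basic Latin")  -- "[[Basic Latin]]"<br />
--   link_block_name("Thai")         -- "[[Thai (Unicode block)|Thai]]"<br />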
<br />
function p.scripts_in_blocks(frame)<br />
local output = Output()<br />
local start = frame.args[1] and tonumber(frame.args[1], 16) or 0<br />
local ending = frame.args[2] and tonumber(frame.args[2], 16) or 0x4000<br />
<br />
local script_data = mw.loadData "Module:Unicode data/scripts"<br />
local singles = script_data.singles<br />
local ranges = script_data.ranges<br />
<br />
local function clear (self)<br />
for _, key in ipairs(m_table.keysToList(self, false)) do<br />
self[key] = nil<br />
end<br />
end<br />
<br />
local counts = {}<br />
setmetatable(counts, {<br />
__index = {<br />
increment = function(self, script_code, amount)<br />
self[script_code] = (self[script_code] or 0) + (amount or 1)<br />
end,<br />
clear = clear,<br />
}<br />
})<br />
local codepoints_per_script = {}<br />
setmetatable(codepoints_per_script, {<br />
__index = {<br />
add = function(self, script_code, codepoint)<br />
self[script_code] = self[script_code] or { n = 0 }<br />
if self[script_code].n <= 0x20<br />
and not (codepoint <= 0x9F and (codepoint >= 0x80<br />
or codepoint <= 0x1F)) then<br />
if self[script_code].n == 0x20 then<br />
local period = ("."):byte()<br />
for _ = 1, 3 do<br />
self[script_code].n = self[script_code].n + 1<br />
self[script_code][self[script_code].n] = period<br />
end<br />
else<br />
if script_code == "Zinh" then -- probably combining character<br />
self[script_code].n = self[script_code].n + 1<br />
self[script_code][self[script_code].n] = 0x25CC<br />
end<br />
self[script_code].n = self[script_code].n + 1<br />
self[script_code][self[script_code].n] = codepoint<br />
end<br />
end<br />
end,<br />
clear = clear,<br />
}<br />
})<br />
<br />
output:insert [[<br />
{| class="wikitable"<br />
|+ Scripts in each Unicode block<br />
! block !! codepoints !! scripts<br />
]]<br />
<br />
for _, block in pairs(mw.loadData "Module:Unicode data/blocks") do<br />
local codepoint = block[1]<br />
if codepoint > ending then break end<br />
<br />
if codepoint >= start then<br />
while codepoint <= block[2] do<br />
local script = singles[codepoint]<br />
local count<br />
if script then -- Codepoint is in "singles" map.<br />
counts:increment(script)<br />
codepoints_per_script:add(script, codepoint)<br />
codepoint = codepoint + 1<br />
count = 1 -- for potential future use<br />
else<br />
local range, index = Unicode_data.binary_range_search(codepoint, ranges)<br />
if range then -- Codepoint is in "ranges" array.<br />
count = 0<br />
script = range[3]<br />
while codepoint <= range[2] and codepoint <= block[2] do<br />
count = count + 1<br />
codepoints_per_script:add(script, codepoint)<br />
codepoint = codepoint + 1<br />
end<br />
counts:increment(script, count)<br />
else -- Codepoint doesn't have data; it's Zzzz.<br />
-- Get range immediately above codepoint.<br />
while ranges[index][2] < codepoint do<br />
index = index + 1<br />
end<br />
<br />
count = 0<br />
script = "Zzzz"<br />
local range = ranges[index]<br />
while codepoint < range[1] and codepoint <= block[2]<br />
and not singles[codepoint] do<br />
count = count + 1<br />
codepoint = codepoint + 1<br />
end<br />
counts:increment(script, count)<br />
end<br />
end<br />
end<br />
<br />
output:insert_format([[<br />
|-<br />
| %s<br />
| U+%04X&ndash;U+%04X<br />
| %s<br />
]], link_block_name(block[3]), block[1], block[2],<br />
table.concat(<br />
fun.map(<br />
function (count, script)<br />
return ('<abbr title="%s">%s</abbr> (<span title="%s">%d</span>)')<br />
:format(<br />
script_data.aliases[script], script,<br />
codepoints_per_script[script]<br />
and mw.text.nowiki(mw.ustring.char(<br />
unpack(codepoints_per_script[script])))<br />
or "",<br />
count)<br />
end,<br />
m_table.sortedPairs(<br />
counts,<br />
function (script1, script2)<br />
return counts[script1] > counts[script2]<br />
end)),<br />
", "))<br />
end<br />
<br />
-- mw.logObject(codepoints_per_script, block[3])<br />
counts:clear()<br />
codepoints_per_script:clear()<br />
end<br />
output:insert "|}"<br />
<br />
return output:join()<br />
end<br />
<br />
function p.chars_in_scripts_in_block(frame)<br />
local block_info = get_block_info_from_arg(frame.args, 1)<br />
local show_block_name = get_boolean_from_arg(frame.args, 2)<br />
local script_char_set_map = print_char_set_map(<br />
get_chars_in_scripts(fun.range(block_info[1], block_info[2])))<br />
if show_block_name then<br />
return ("%s: %s"):format(block_info[3], script_char_set_map)<br />
else<br />
return script_char_set_map<br />
end<br />
end<br />
<br />
function p.search_for_language_codes(frame)<br />
local page_name = frame.args[1] or "English language"<br />
<br />
local success, title_object = pcall(mw.title.new, page_name)<br />
if not (success and title_object) then<br />
mw.logf("Could not make title object for '%s'.", page_name)<br />
return<br />
end<br />
<br />
local content = title_object:getContent()<br />
<br />
local language_codes = {}<br />
for lang_template in content:gmatch "{{lang[^}]+" do<br />
local template_name = lang_template:match("{{([^|}]+)")<br />
local language_code<br />
if template_name == "lang" then<br />
language_code = lang_template:match "{{lang|([^|}]+)"<br />
elseif template_name:find "^lang-" then<br />
language_code = lang_template:match "{{lang-([^|}]+)"<br />
end<br />
if language_code then<br />
language_codes[language_code] = true<br />
end<br />
end<br />
<br />
return table.concat(m_table.keysToList(language_codes), ", ")<br />
end<br />
<br />
local parsed_subtags_mt = {<br />
__index = {<br />
-- "error" is the error message.<br />
-- "index" is the ordinal of the subtag in which the error was found.<br />
throw = function (self, error, index)<br />
self.error = self.error_messages[error]<br />
self.invalid = table.concat(self.input, "-", index)<br />
return self:remove_unnecessary_fields()<br />
end,<br />
<br />
remove_unnecessary_fields = function (self)<br />
-- Only useful internally.<br />
self.input = nil<br />
self:pretty_print()<br />
p.validate_lang_tag(self)<br />
return self<br />
end,<br />
<br />
-- Regularize capitalization of language subtags:<br />
-- ZH-LATN -> zh-Latn, FR-ca -> fr-CA<br />
pretty_print = function (self)<br />
for key, func in pairs(self.print_funcs) do<br />
if self[key] then<br />
self[key] = func(self[key])<br />
end<br />
end<br />
return self<br />
end,<br />
<br />
-- Re-create the original tag from the parsed subtags.<br />
get_tag = function (self)<br />
if self.tag then return self.tag end<br />
<br />
local tag = {}<br />
for _, subtag_name in ipairs(self.subtag_order) do<br />
if subtag_name == "private_use" and self.private_use then<br />
table.insert(tag, "x")<br />
end<br />
<br />
if type(self[subtag_name]) == "table" then<br />
for _, subtag in ipairs(self[subtag_name]) do<br />
table.insert(tag, subtag)<br />
end<br />
else<br />
table.insert(tag, self[subtag_name])<br />
end<br />
end<br />
<br />
tag = table.concat(tag, "-")<br />
self.tag = tag -- Cache the result.<br />
<br />
return tag<br />
end,<br />
<br />
subtag_order = {<br />
"language", "script", "region", "variant", "private_use"<br />
},<br />
<br />
error_messages = {<br />
invalid_characters = "invalid characters",<br />
no_language = "no language subtag",<br />
invalid_subtag = "invalid subtag",<br />
invalid_private_use = "length of private-use subtag out of range",<br />
empty_private_use = "empty private-use subtag",<br />
}<br />
}<br />
}<br />
local function initial_caps_helper(initial, rest)<br />
return string.upper(initial) .. string.lower(rest)<br />
end<br />
local function lower_or_map_lower(str)<br />
if type(str) == "table" then<br />
return fun.map(string.lower, str)<br />
else<br />
return string.lower(str)<br />
end<br />
end<br />
parsed_subtags_mt.__index.print_funcs = {<br />
language = string.lower,<br />
script = function (script_code)<br />
return (string.gsub(script_code, "^(%a)(%a%a%a)$", initial_caps_helper))<br />
end,<br />
region = string.upper,<br />
variant = lower_or_map_lower,<br />
private_use = lower_or_map_lower,<br />
}<br />
<br />
setmetatable(parsed_subtags_mt, {<br />
__call = function (self, input)<br />
return setmetatable({ input = input }, self)<br />
end<br />
})<br />
<br />
-- An array of patterns for each subtag, and a "type" field for the name<br />
-- of the subtag.<br />
-- The patterns are checked in order, and any of the subtags can be skipped.<br />
-- So, for example, the "language" subtag must precede the "script"<br />
-- subtag, but a tag may contain a "language" subtag, no "script" subtag<br />
-- and then a "region" subtag.<br />
-- If the full list of subtags has been iterated over, the remaining subtags<br />
-- must match the pattern for a private-use subtag, or the tag is invalid.<br />
local subtag_info = { -- can be put in data module<br />
{ "%a%a%a?", "1%a+", type = "language" }, -- ll or lll; special case<br />
-- include extlang?<br />
{ "%a%a%a%a", type = "script" }, -- Ssss<br />
{ "%a%a", "%d%d%d", type = "region" }, -- rr, DDD<br />
{<br />
"%d%d%d%d", -- 4 digits<br />
"%w%w%w%w%w%w?%w?%w?", -- 5-8 alnum characters<br />
type = "variant",<br />
repeatable = true, -- There can be multiple variants.<br />
}<br />
}<br />
<br />
-- A previous draft, in [[Module:Lang/sandbox]]:<br />
-- https://en.wikipedia.org/w/index.php?oldid=812819217<br />
<br />
-- Based on https://www.w3.org/International/articles/language-tags/.<br />
<br />
-- Parse a language tag.<br />
-- Returns nil if tag is not a string or empty.<br />
-- Else returns a table with a map of subtag type to subtag for all subtags that<br />
-- were parsed.<br />
-- If there was an error, returns an "error" field with a description of the<br />
-- error, and an "invalid" field with the suffix of the tag starting at the<br />
-- index where the error occurred.<br />
<br />
-- Does not recognize "extension" tags, such as those introduced by "u", as they<br />
-- are not needed on Wikipedia. Does not recognize "grandfathered" tags.<br />
-- Does not recognize extended language subtags, such as "zh-yue".<br />
-- https://www.rfc-editor.org/rfc/rfc6067.txt, https://tools.ietf.org/html/bcp47<br />
<br />
-- Only checks that the syntax is correct, not that the values are valid. For<br />
-- instance, will accept non-existent language codes, like "zz".<br />
function p.parse_IETF(tag)<br />
if type(tag) ~= "string" or tag == "" then<br />
return nil<br />
end<br />
<br />
-- This may contain the special fields "invalid" and "error".<br />
-- "error" indicates why the tag is invalid (if applicable).<br />
-- All other fields are subtags, and they appear in the tag in the following<br />
-- order:<br />
-- "language", "script", "region", "variant", "private_use", "invalid"<br />
-- All these subtags can be strings or nil, while "variant" can also be an<br />
-- array of strings if more than one variant subtag was found.<br />
-- "invalid" is the portion of the tag after the last valid subtag (minus a<br />
-- hyphen).<br />
local segments = mw.text.split(tag, "-")<br />
local parsed_subtags = parsed_subtags_mt(segments)<br />
<br />
-- Language tags probably only contain ASCII alphabetic and numerical<br />
-- characters and hyphen-minus.<br />
if not tag:find "^[A-Za-z0-9-]+$" then<br />
return parsed_subtags:throw(<br />
"invalid_characters",<br />
fun.indexOf(<br />
function (tag)<br />
return tag:find "[^A-Za-z0-9-]"<br />
end,<br />
segments))<br />
end<br />
<br />
local subtag_i = 1 -- Index of current item in subtag_info.<br />
local segment_i = 1 -- Index of current segment.<br />
local last_matched_segment_i = 0 -- Index of the last segment matched as a subtag.<br />
while segments[segment_i] and subtag_info[subtag_i] do<br />
local segment = segments[segment_i]<br />
local subtag_type<br />
while not subtag_type and subtag_info[subtag_i] do<br />
-- Check each pattern for the subtag type at "subtag_i" in "subtag_info".<br />
local cur_subtag = subtag_info[subtag_i]<br />
for _, pattern in ipairs(cur_subtag) do<br />
if segment:find("^" .. pattern .. "$") then<br />
subtag_type = cur_subtag.type<br />
-- There can be multiple "variant" subtags (and "extension"<br />
-- subtags, if those are added).<br />
if not cur_subtag.repeatable then<br />
subtag_i = subtag_i + 1<br />
end<br />
break<br />
end<br />
end<br />
<br />
if not subtag_type then -- No match; try next subtag.<br />
subtag_i = subtag_i + 1<br />
end<br />
end<br />
<br />
-- If language subtag has not been found, or the current segment has not<br />
-- been matched as a subtag, break the loop and check for<br />
-- a private-use subtag.<br />
if segment_i == 1 and subtag_type ~= "language" or not subtag_type then<br />
break<br />
else<br />
if parsed_subtags[subtag_type] then -- Create an array.<br />
if type(parsed_subtags[subtag_type]) == "string" then<br />
parsed_subtags[subtag_type] = { parsed_subtags[subtag_type] }<br />
end -- else table<br />
table.insert(parsed_subtags[subtag_type], segment)<br />
else<br />
parsed_subtags[subtag_type] = segment<br />
end<br />
last_matched_segment_i = segment_i<br />
end<br />
<br />
segment_i = segment_i + 1<br />
end<br />
<br />
if segments[segment_i] then -- More segments to scan?<br />
-- Not all potential subtags were matched. Check for private-use subtags.<br />
-- https://tools.ietf.org/html/bcp47#section-2.2.7<br />
-- Private-use subtags consist of one or more sequences of 1 to 8<br />
-- alphanumeric characters preceded by "x-".<br />
-- Alphanumericity has already been checked.<br />
<br />
-- A tag must start with either a language subtag or a private-use subtag.<br />
-- If next segment is not "x", introducing a private-use subtag, there<br />
-- is no private-use subtag.<br />
if segments[segment_i] and segments[segment_i]:lower() ~= "x" then<br />
if not parsed_subtags.language then<br />
return parsed_subtags:throw("no_language", 1)<br />
else<br />
return parsed_subtags:throw("invalid_subtag",<br />
segment_i)<br />
end<br />
elseif not segments[segment_i + 1] then<br />
return parsed_subtags:throw("empty_private_use",<br />
segment_i)<br />
end<br />
<br />
-- Check length of all segments after "x".<br />
for i = segment_i + 1, #segments do<br />
local length = #segments[i]<br />
<br />
if not (1 <= length and length <= 8) then<br />
return parsed_subtags<br />
:throw("invalid_private_use", segment_i)<br />
end<br />
end<br />
<br />
if not segments[last_matched_segment_i + 3] then -- There is only one private-use subtag.<br />
parsed_subtags.private_use = segments[segment_i + 1]<br />
else<br />
parsed_subtags.private_use = {}<br />
for i = segment_i + 1, #segments do<br />
table.insert(parsed_subtags.private_use, segments[i])<br />
end<br />
end<br />
end<br />
<br />
return parsed_subtags:remove_unnecessary_fields()<br />
end<br />
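-- Usage sketch (illustrative only; assumes the data modules loaded below are<br />
-- available, since validation runs before the table is returned):<br />
--   local parsed = p.parse_IETF("ZH-LATN-cn")<br />
--   -- parsed.language == "zh", parsed.script == "Latn", parsed.region == "CN"<br />
--   p.parse_IETF("")  -- nil<br />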
<br />
<br />
local lang_name_table = mw.loadData "Module:Language/name/data"<br />
local synonym_table = mw.loadData "Module:Lang/ISO 639 synonyms"<br />
local lang_data = mw.loadData "Module:Lang/data"<br />
<br />
function p.validate_lang_tag(parsed_subtags)<br />
-- Already checked that the tag starts with a language subtag or a private-use subtag.<br />
-- Script code is initially capitalized, region code is uppercase,<br />
-- everything else is lowercase.<br />
<br />
-- Check existence of language tag.<br />
if parsed_subtags.language and<br />
not (lang_data.override[parsed_subtags.language]<br />
or lang_name_table.lang[parsed_subtags.language]) then<br />
mw.log("Invalid language code", parsed_subtags.language, "in", parsed_subtags:get_tag())<br />
end<br />
<br />
-- Check existence of script tag.<br />
if parsed_subtags.script then<br />
local lower_script = parsed_subtags.script:lower()<br />
if not lang_name_table.script[lower_script] then<br />
mw.log("Invalid script code", parsed_subtags.script, "in", parsed_subtags:get_tag())<br />
end<br />
<br />
-- Check that script tag is not marked as superfluous (because<br />
-- it is considered the default one for the language).<br />
if lang_name_table.suppressed[lower_script]<br />
and parsed_subtags.language<br />
and m_table.inArray(<br />
lang_name_table.suppressed[lower_script],<br />
parsed_subtags.language:lower()) then<br />
mw.log(parsed_subtags.script, "is suppressed with",<br />
parsed_subtags.language, "in", parsed_subtags:get_tag())<br />
end<br />
end<br />
<br />
-- Check existence of region code.<br />
if parsed_subtags.region and not lang_name_table.region[parsed_subtags.region:lower()] then<br />
mw.log("Invalid region code", parsed_subtags.region, "in", parsed_subtags:get_tag())<br />
end<br />
<br />
-- Check that variant code is valid, and that it can validly be used with the<br />
-- given combination of language, script, region, and variant.<br />
-- Check for duplicate variant subtags?<br />
if parsed_subtags.variant then<br />
local lower_tag = parsed_subtags:get_tag():lower()<br />
<br />
for _, variant in ipairs(type(parsed_subtags.variant) == "table"<br />
and parsed_subtags.variant or { parsed_subtags.variant }) do<br />
if not lang_name_table.variant[variant] then<br />
mw.log("Invalid variant code", variant, "in", parsed_subtags:get_tag())<br />
else<br />
local prefix = parsed_subtags:get_tag():lower():match("^(.-)%-" .. variant)<br />
<br />
-- Check that at least one of the prefixes is found at the<br />
-- beginning of lower_tag.<br />
if not fun.some(function (prefix)<br />
return lower_tag:find(prefix, 1, true) == 1<br />
end,<br />
lang_name_table.variant[variant].prefixes) then<br />
mw.log("Variant tag", variant, "does not belong with prefix",<br />
prefix, "in", parsed_subtags:get_tag())<br />
end<br />
end<br />
end<br />
end<br />
<br />
-- Check that the private-use subtag is actually used by Wikipedia.<br />
if parsed_subtags.private_use and not lang_data.override[parsed_subtags.tag] then<br />
mw.log("Invalid private-use subtag in", parsed_subtags:get_tag())<br />
end<br />
end<br />
<br />
return p</div>
Erutuon