https://en.wikipedia.org/w/index.php?action=history&feed=atom&title=Module%3ASandbox%2FErutuon%2FUnicode Module:Sandbox/Erutuon/Unicode - Revision history 2025-06-03T18:32:10Z Revision history for this page on the wiki MediaWiki 1.45.0-wmf.3 https://en.wikipedia.org/w/index.php?title=Module:Sandbox/Erutuon/Unicode&diff=885441693&oldid=prev Erutuon: Module:Table → Module:TableTools 2019-02-28T01:35:22Z <p><a href="/w/index.php?title=Module:Table&amp;action=edit&amp;redlink=1" class="new" title="Module:Table (page does not exist)">Module:Table</a> → <a href="/wiki/Module:TableTools" title="Module:TableTools">Module:TableTools</a></p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 01:35, 28 February 2019</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 70:</td> <td colspan="2" class="diff-lineno">Line 70:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>local 
fun = require "Module:Fun"</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>local fun = require "Module:Fun"</div></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>local m_table = require "Module:<del style="font-weight: bold; text-decoration: none;">Table</del>"</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>local m_table = require "Module:<ins style="font-weight: bold; text-decoration: none;">TableTools</ins>"</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>local script_to_count_mt = {</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 
1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>local script_to_count_mt = {</div></td> </tr> </table> Erutuon https://en.wikipedia.org/w/index.php?title=Module:Sandbox/Erutuon/Unicode&diff=857167856&oldid=prev Erutuon: fix errorf 2018-08-30T00:29:26Z <p>fix errorf</p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 00:29, 30 August 2018</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 4:</td> <td colspan="2" class="diff-lineno">Line 4:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>local function errorf(level, ...)</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>local function 
errorf(level, ...)</div></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> if type(level) == number then</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> if type(level) == <ins style="font-weight: bold; text-decoration: none;">"</ins>number<ins style="font-weight: bold; text-decoration: none;">"</ins> then</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> return error(string.format(...), level + 1)</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> return error(string.format(...), level + 1)</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> else -- level is actually the format string.</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> else -- level is actually the format string.</div></td> </tr> </table> Erutuon 
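A minimal standalone sketch of what the "fix errorf" revision changes, assuming plain Lua 5.1 outside MediaWiki. Before the fix, `number` was an undefined global (`nil`), so `type(level) == number` was never true and a numeric level fell through to the `else` branch, where it was used as the format string. The `pcall` call and the sample arguments are illustrative and not part of the module.

```lua
-- errorf as it reads after the fix: "number" is now a quoted string.
local function errorf(level, ...)
	if type(level) == "number" then
		-- An explicit stack level was given; add 1 to skip errorf itself.
		return error(string.format(...), level + 1)
	else -- level is actually the format string.
		return error(string.format(level, ...), 2)
	end
end

-- Illustrative call with an explicit level (hypothetical arguments).
local ok, message = pcall(errorf, 1, "Parameter %s is required", "1")
print(ok, message)
```

With the quoted comparison, the numeric first argument is treated as a stack level and the remaining arguments are formatted into the message.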
https://en.wikipedia.org/w/index.php?title=Module:Sandbox/Erutuon/Unicode&diff=849319617&oldid=prev Erutuon: removed language tag stuff 2018-07-08T05:11:56Z <p>removed language tag stuff</p> <a href="//en.wikipedia.org/w/index.php?title=Module:Sandbox/Erutuon/Unicode&amp;diff=849319617&amp;oldid=849319578">Show changes</a> Erutuon https://en.wikipedia.org/w/index.php?title=Module:Sandbox/Erutuon/Unicode&diff=849319578&oldid=prev Erutuon: copied from Module:Sandbox/Erutuon 2018-07-08T05:11:29Z <p>copied from <a href="/wiki/Module:Sandbox/Erutuon" title="Module:Sandbox/Erutuon">Module:Sandbox/Erutuon</a></p> <p><b>New page</b></p><div>local p = {}<br /> <br /> local Unicode_data = require &quot;Module:Unicode data/sandbox&quot;<br /> <br /> local function errorf(level, ...)<br /> if type(level) == number then<br /> return error(string.format(...), level + 1)<br /> else -- level is actually the format string.<br /> return error(string.format(level, ...), 2)<br /> end<br /> end<br /> <br /> function mw.logf(...)<br /> return mw.log(string.format(...))<br /> end<br /> <br /> local output_mt = {}<br /> function output_mt:insert(str)<br /> self.n = self.n + 1<br /> self[self.n] = str<br /> end<br /> <br /> -- also in [[Module:Unicode data/documentation functions]]<br /> function output_mt:insert_format(...)<br /> self:insert(string.format(...))<br /> end<br /> <br /> output_mt.join = table.concat<br /> <br /> output_mt.__index = output_mt<br /> <br /> local function Output()<br /> return setmetatable({ n = 0 }, output_mt)<br /> end<br /> <br /> <br /> local Latn_pattern = table.concat {<br /> &quot;[&quot;,<br /> &quot;\n\32-\127&quot;,<br /> &quot;\194\160-\194\172&quot;,<br /> &quot;\195\128-\195\191&quot;,<br /> &quot;\196\128-\197\191&quot;,<br /> &quot;\198\128-\201\143&quot;,<br /> &quot;\225\184\128-\225\187\191&quot;,<br /> &quot;\226\177\160-\226\177\191&quot;,<br /> &quot;\234\156\160-\234\159\191&quot;,<br /> &quot;\234\172\176-\234\173\175&quot;,<br /> 
&quot;\239\172\128-\239\172\134&quot;,<br /> &quot;\239\188\129-\239\188\188&quot;,<br /> &quot;–&quot;,<br /> &quot;—&quot;,<br /> &quot;«&quot;, &quot;»&quot;,<br /> &quot;]&quot;,<br /> };<br /> <br /> local get_codepoint = mw.ustring.codepoint<br /> local function expand_range(start, ending)<br /> local lower, higher = get_codepoint(start), get_codepoint(ending)<br /> if higher &lt; lower then<br /> return nil<br /> end<br /> local chars = {}<br /> local i = 0<br /> for codepoint = lower, higher do<br /> i = i + 1<br /> chars[i] = mw.ustring.char(codepoint)<br /> end<br /> return table.concat(chars)<br /> end<br /> <br /> local fun = require &quot;Module:Fun&quot;<br /> local m_table = require &quot;Module:Table&quot;<br /> <br /> local script_to_count_mt = {<br /> __index = function (self, key)<br /> self[key] = 0<br /> return 0<br /> end,<br /> __call = function (self, ...)<br /> return setmetatable({}, self)<br /> end<br /> }<br /> setmetatable(script_to_count_mt, script_to_count_mt)<br /> <br /> -- Uses an iterator (such as mw.ustring.gcodepoint) that generates a codepoint<br /> -- each time it is called with an optional state and another value.<br /> local function show_scripts(iterator, state, value)<br /> local script_to_count = script_to_count_mt()<br /> for codepoint in iterator, state, value do<br /> local script = Unicode_data.lookup_script(codepoint)<br /> script_to_count[script] = script_to_count[script] + 1<br /> end<br /> return table.concat(<br /> fun.mapIter(<br /> function (count, script)<br /> return (&quot;%s (%d)&quot;):format(script, count)<br /> end,<br /> m_table.sortedPairs(<br /> script_to_count,<br /> function (script1, script2)<br /> return script_to_count[script1] &gt; script_to_count[script2]<br /> end)),<br /> &quot;, &quot;)<br /> end<br /> <br /> local function get_chars_in_scripts(iterator, state, value)<br /> local script_to_char_set = {}<br /> for codepoint in iterator, state, value do<br /> local script = 
Unicode_data.lookup_script(codepoint)<br /> script_to_char_set[script] = script_to_char_set[script] or {}<br /> script_to_char_set[script][codepoint] = true<br /> end<br /> <br /> return script_to_char_set<br /> end<br /> <br /> local function print_char_set_map(script_to_char_set, format, separator)<br /> format = format or &quot;%s: %s&quot;<br /> separator = separator or &quot;\n&quot;<br /> return table.concat(<br /> fun.mapIter(<br /> function (char_set, script)<br /> local char_list = fun.mapIter(<br /> function (_, codepoint)<br /> return mw.ustring.char(codepoint)<br /> end,<br /> m_table.sortedPairs(char_set))<br /> return (format):format(script, mw.text.nowiki(table.concat(char_list)))<br /> end,<br /> m_table.sortedPairs(script_to_char_set)),<br /> separator)<br /> end<br /> <br /> function p.show(frame)<br /> local expanded_pattern = Latn_pattern<br /> :gsub(&quot;%[(.-)%]&quot;, &quot;%1&quot;)<br /> :gsub( -- Find two UTF-8-encoded characters separated by hyphen-minus.<br /> &quot;([%z\1-\127\194-\244][\128-\191]*)%-([%z\1-\127\194-\244][\128-\191]*)&quot;,<br /> function (char1, char2)<br /> return expand_range(char1, char2)<br /> end)<br /> <br /> return (&#039;* &lt;div style=&quot;overflow-wrap: break-word;&quot;&gt;%s&lt;/div&gt;&lt;br&gt;%s&#039;)<br /> :format(expanded_pattern<br /> :gsub(&quot;^%s*&quot;, &quot;&quot;), -- Remove initial &quot;\n &quot; to avoid creating unwanted pre element.<br /> show_scripts(mw.ustring.gcodepoint(expanded_pattern)))<br /> end<br /> <br /> local function get_block_info_from_arg(args, arg)<br /> local block_name = args[arg]<br /> or errorf(&quot;Parameter %s is required&quot;, tostring(arg))<br /> <br /> local block_info = Unicode_data.get_block_info(block_name)<br /> or errorf(&quot;The block &#039;%s&#039; could not be found&quot;, block_name)<br /> <br /> return block_info<br /> end<br /> <br /> local function get_boolean_from_arg(args, arg)<br /> return args[arg] and require &quot;Module:Yesno&quot; 
(args[arg])<br /> end<br /> <br /> function p.scripts_in_block(frame)<br /> local block_info = get_block_info_from_arg(frame.args, 1)<br /> local show_block_name = get_boolean_from_arg(frame.args, 2)<br /> local script_list = show_scripts(fun.range(block_info[1], block_info[2]))<br /> if show_block_name then<br /> return (&quot;%s: %s&quot;):format(block_info[3], script_list)<br /> else<br /> return script_list<br /> end<br /> end<br /> <br /> local function link_block_name(block_name)<br /> if block_name:find &quot; &quot; then<br /> return (&quot;[[%s]]&quot;):format(block_name)<br /> else<br /> return (&quot;[[%s (Unicode block)|%s]]&quot;):format(block_name, block_name)<br /> end<br /> end<br /> <br /> function p.scripts_in_blocks(frame)<br /> local output = Output()<br /> local start = frame.args[1] and tonumber(frame.args[1], 16) or 0<br /> local ending = frame.args[2] and tonumber(frame.args[2], 16) or 0x4000<br /> <br /> local script_data = mw.loadData &quot;Module:Unicode data/scripts&quot;<br /> local singles = script_data.singles<br /> local ranges = script_data.ranges<br /> <br /> local function clear (self)<br /> for _, key in ipairs(m_table.keysToList(self, false)) do<br /> self[key] = nil<br /> end<br /> end<br /> <br /> local counts = {}<br /> setmetatable(counts, {<br /> __index = {<br /> increment = function(self, script_code, amount)<br /> self[script_code] = (self[script_code] or 0) + (amount or 1)<br /> end,<br /> clear = clear,<br /> }<br /> })<br /> local codepoints_per_script = {}<br /> setmetatable(codepoints_per_script, {<br /> __index = {<br /> add = function(self, script_code, codepoint)<br /> self[script_code] = self[script_code] or { n = 0 }<br /> if self[script_code].n &lt;= 0x20<br /> and not (codepoint &lt;= 0x9F and (codepoint &gt;= 0x80<br /> or codepoint &lt;= 0x1F)) then<br /> if self[script_code].n == 0x20 then<br /> local period = (&quot;.&quot;):byte()<br /> for _ = 1, 3 do<br /> self[script_code].n = self[script_code].n + 
1<br /> self[script_code][self[script_code].n] = period<br /> end<br /> else<br /> if script_code == &quot;Zinh&quot; then -- probably combining character<br /> self[script_code].n = self[script_code].n + 1<br /> self[script_code][self[script_code].n] = 0x25CC<br /> end<br /> self[script_code].n = self[script_code].n + 1<br /> self[script_code][self[script_code].n] = codepoint<br /> end<br /> end<br /> end,<br /> clear = clear,<br /> }<br /> })<br /> <br /> output:insert [[<br /> {| class=&quot;wikitable&quot;<br /> |+ Scripts in each Unicode block<br /> ! block !! codepoints !! scripts<br /> ]]<br /> <br /> for _, block in pairs(mw.loadData &quot;Module:Unicode data/blocks&quot;) do<br /> local codepoint = block[1]<br /> if codepoint &gt; ending then break end<br /> <br /> if codepoint &gt;= start then<br /> while codepoint &lt;= block[2] do<br /> local script = singles[codepoint]<br /> local count<br /> if script then -- Codepoint is in &quot;singles&quot; map.<br /> counts:increment(script)<br /> codepoints_per_script:add(script, codepoint)<br /> codepoint = codepoint + 1<br /> count = 1 -- for potential future use<br /> else<br /> local range, index = Unicode_data.binary_range_search(codepoint, ranges)<br /> if range then -- Codepoint is in &quot;ranges&quot; array.<br /> count = 0<br /> script = range[3]<br /> while codepoint &lt;= range[2] and codepoint &lt;= block[2] do<br /> count = count + 1<br /> codepoints_per_script:add(script, codepoint)<br /> codepoint = codepoint + 1<br /> end<br /> counts:increment(script, count)<br /> else -- Codepoint doesn&#039;t have data; it&#039;s Zzzz.<br /> -- Get range immediately above codepoint.<br /> while ranges[index][2] &lt; codepoint do<br /> index = index + 1<br /> end<br /> <br /> count = 0<br /> script = &quot;Zzzz&quot;<br /> local range = ranges[index]<br /> while codepoint &lt; range[1] and codepoint &lt;= block[2]<br /> and not singles[codepoint] do<br /> count = count + 1<br /> codepoint = codepoint + 1<br /> 
end<br /> counts:increment(script, count)<br /> end<br /> end<br /> end<br /> <br /> output:insert_format([[<br /> |-<br /> | %s<br /> | U+%04X&amp;ndash;U+%04X<br /> | %s<br /> ]], link_block_name(block[3]), block[1], block[2],<br /> table.concat(<br /> fun.map(<br /> function (count, script)<br /> return (&#039;&lt;abbr title=&quot;%s&quot;&gt;%s&lt;/abbr&gt; (&lt;span title=&quot;%s&quot;&gt;%d&lt;/span&gt;)&#039;)<br /> :format(<br /> script_data.aliases[script], script,<br /> codepoints_per_script[script]<br /> and mw.text.nowiki(mw.ustring.char(<br /> unpack(codepoints_per_script[script])))<br /> or &quot;&quot;,<br /> count)<br /> end,<br /> m_table.sortedPairs(<br /> counts,<br /> function (script1, script2)<br /> return counts[script1] &gt; counts[script2]<br /> end)),<br /> &quot;, &quot;))<br /> end<br /> <br /> -- mw.logObject(codepoints_per_script, block[3])<br /> counts:clear()<br /> codepoints_per_script:clear()<br /> end<br /> output:insert &quot;|}&quot;<br /> <br /> return output:join()<br /> end<br /> <br /> function p.chars_in_scripts_in_block(frame)<br /> local block_info = get_block_info_from_arg(frame.args, 1)<br /> local show_block_name = get_boolean_from_arg(frame.args, 2)<br /> local script_char_set_map = print_char_set_map(<br /> get_chars_in_scripts(fun.range(block_info[1], block_info[2])))<br /> if show_block_name then<br /> return (&quot;%s: %s&quot;):format(block_info[3], script_char_set_map)<br /> else<br /> return script_char_set_map<br /> end<br /> end<br /> <br /> function p.search_for_language_codes(frame)<br /> local page_name = frame.args[1] or &quot;English language&quot;<br /> <br /> local success, title_object = pcall(mw.title.new, page_name)<br /> if not (success and title_object) then<br /> mw.logf(&quot;Could not make title object for &#039;%s&#039;.&quot;, page_name)<br /> return<br /> end<br /> <br /> local content = title_object:getContent()<br /> <br /> local language_codes = {}<br /> for lang_template in 
content:gmatch &quot;{{lang[^}]+&quot; do<br /> local template_name = lang_template:match(&quot;{{([^|}]+)&quot;)<br /> local language_code<br /> if template_name == &quot;lang&quot; then<br /> language_code = lang_template:match &quot;{{lang|([^|}]+)&quot;<br /> elseif template_name:find &quot;^lang-&quot; then<br /> language_code = lang_template:match &quot;{{lang-([^|}]+)&quot;<br /> end<br /> if language_code then<br /> language_codes[language_code] = true<br /> end<br /> end<br /> <br /> return table.concat(m_table.keysToList(language_codes), &quot;, &quot;)<br /> end<br /> <br /> local parsed_subtags_mt = {<br /> __index = {<br /> -- &quot;error&quot; is the error message.<br /> -- &quot;index&quot; is the ordinal of the subtag in which the error was found.<br /> throw = function (self, error, index)<br /> self.error = self.error_messages[error]<br /> self.invalid = table.concat(self.input, &quot;-&quot;, index)<br /> return self:remove_unnecessary_fields()<br /> end,<br /> <br /> remove_unnecessary_fields = function (self)<br /> -- Only useful internally.<br /> self.input = nil<br /> self:pretty_print()<br /> p.validate_lang_tag(self)<br /> return self<br /> end,<br /> <br /> -- Regularize capitalization of language subtags:<br /> -- ZH-LATN -&gt; zh-Latn, FR-ca -&gt; fr-CA<br /> pretty_print = function (self)<br /> for key, func in pairs(self.print_funcs) do<br /> if self[key] then<br /> self[key] = func(self[key])<br /> end<br /> end<br /> return self<br /> end,<br /> <br /> -- Re-create the original tag from the parsed subtags.<br /> get_tag = function (self)<br /> if self.tag then return self.tag end<br /> <br /> local tag = {}<br /> for _, subtag_name in ipairs(self.subtag_order) do<br /> if subtag_name == &quot;private_use&quot; then<br /> table.insert(tag, &quot;x&quot;)<br /> end<br /> <br /> if type(self[subtag_name]) == &quot;table&quot; then<br /> for _, subtag in ipairs(self[subtag_name]) do<br /> table.insert(tag, subtag)<br /> end<br /> else<br 
/> table.insert(tag, self[subtag_name])<br /> end<br /> end<br /> <br /> tag = table.concat(tag, &quot;-&quot;)<br /> self.tag = tag -- Cache the result.<br /> <br /> return tag<br /> end,<br /> <br /> subtag_order = {<br /> &quot;language&quot;, &quot;script&quot;, &quot;region&quot;, &quot;variant&quot;, &quot;private_use&quot;<br /> },<br /> <br /> error_messages = {<br /> invalid_characters = &quot;invalid characters&quot;,<br /> no_language = &quot;no language subtag&quot;,<br /> invalid_subtag = &quot;invalid subtag&quot;,<br /> invalid_private_use = &quot;length of private-use subtag out of range&quot;,<br /> empty_private_use = &quot;empty private-use subtag&quot;,<br /> }<br /> }<br /> }<br /> local function initial_caps_helper(initial, rest)<br /> return string.upper(initial) .. string.lower(rest)<br /> end<br /> local function lower_or_map_lower(str)<br /> if type(str) == &quot;table&quot; then<br /> return fun.map(string.lower, str)<br /> else<br /> return string.lower(str)<br /> end<br /> end<br /> parsed_subtags_mt.__index.print_funcs = {<br /> language = string.lower,<br /> script = function (script_code)<br /> return (string.gsub(script_code, &quot;^(%a)(%a%a%a)$&quot;, initial_caps_helper))<br /> end,<br /> region = string.upper,<br /> variant = lower_or_map_lower,<br /> private_use = lower_or_map_lower,<br /> }<br /> <br /> setmetatable(parsed_subtags_mt, {<br /> __call = function (self, input)<br /> return setmetatable({ input = input }, self)<br /> end<br /> })<br /> <br /> -- An array of patterns for each subtag, and a &quot;type&quot; field for the name<br /> -- of the subtag.<br /> -- The patterns are checked in order, and any of the subtags can be skipped.<br /> -- So, for example, the &quot;language&quot; subtag must precede the &quot;script&quot;<br /> -- subtag, but a tag may contain a &quot;language&quot; subtag, no &quot;script&quot; subtag<br /> -- and then a &quot;region&quot; subtag.<br /> -- If the full list of subtags has been 
iterated over, the remaining subtags<br /> -- must match the pattern for a private-use subtag, or the tag is invalid.<br /> local subtag_info = { -- can be put in data module<br /> { &quot;%a%a%a?&quot;, &quot;1%a+&quot;, type = &quot;language&quot; }, -- ll or lll; special case<br /> -- include extlang?<br /> { &quot;%a%a%a%a&quot;, type = &quot;script&quot; }, -- Ssss<br /> { &quot;%a%a&quot;, &quot;%d%d%d&quot;, type = &quot;region&quot; }, -- rr, DDD<br /> {<br /> &quot;%d%d%d%d&quot;, -- 4 digits<br /> &quot;%w%w%w%w%w%w?%w?%w?&quot;, -- 5-8 alnum characters<br /> type = &quot;variant&quot;,<br /> repeatable = true, -- There can be multiple variants.<br /> }<br /> }<br /> <br /> -- A previous draft, in [[Module:Lang/sandbox]]:<br /> -- https://en.wikipedia.org/w/index.php?oldid=812819217<br /> <br /> -- Based on https://www.w3.org/International/articles/language-tags/.<br /> <br /> -- Parse a language tag.<br /> -- Returns nil if tag is not a string or empty.<br /> -- Else returns a table with a map of subtag type to subtag for all subtags that<br /> -- were parsed.<br /> -- If there was an error, returns an &quot;error&quot; field with a description of the<br /> -- error, and an &quot;invalid&quot; field with the suffix of the tag starting at the<br /> -- index where the error occurred.<br /> <br /> -- Does not recognize &quot;extension&quot; tags, such as those introduced by &quot;u&quot;, as they<br /> -- are not needed on Wikipedia. Does not recognize &quot;grandfathered&quot; tags.<br /> -- Does not recognize extended language subtags, such as &quot;zh-yue&quot;.<br /> -- https://www.rfc-editor.org/rfc/rfc6067.txt, https://tools.ietf.org/html/bcp47<br /> <br /> -- Only checks that the syntax is correct, not that the values are valid. 
For<br /> -- instance, will accept non-existent language codes, like &quot;zz&quot;.<br /> function p.parse_IETF(tag)<br /> if type(tag) ~= &quot;string&quot; or tag == &quot;&quot; then<br /> return nil<br /> end<br /> <br /> -- This may contain the special fields &quot;invalid&quot;, &quot;error&quot;.<br /> -- &quot;error&quot; indicates why the<br /> -- tag is invalid (if applicable).<br /> -- All other fields are subtags, and they appear in the tag in the following<br /> -- order:<br /> -- &quot;language&quot;, &quot;script&quot;, &quot;region&quot;, &quot;variant&quot;, &quot;private_use&quot;, &quot;invalid&quot;<br /> -- All these subtags can be strings or nil, while &quot;variant&quot; can also be an<br /> -- array of strings if more than one variant subtag was found.<br /> -- &quot;invalid&quot; is the portion of the tag after the last valid subtag (minus a<br /> -- hyphen).<br /> local segments = mw.text.split(tag, &quot;-&quot;)<br /> local parsed_subtags = parsed_subtags_mt(segments)<br /> <br /> -- Language tags probably only contain ASCII alphabetic and numerical<br /> -- characters and hyphen-minus.<br /> if not tag:find &quot;^[A-Za-z0-9-]+$&quot; then<br /> return parsed_subtags:throw(<br /> &quot;invalid_characters&quot;,<br /> fun.indexOf(<br /> function (tag)<br /> return tag:find &quot;[^A-Za-z0-9-]&quot;<br /> end,<br /> segments))<br /> end<br /> <br /> local subtag_i = 1 -- Index of current item in subtag_info.<br /> local segment_i = 1 -- Index of current segment.<br /> while segments[segment_i] and subtag_info[subtag_i] do<br /> local segment = segments[segment_i]<br /> local subtag_type<br /> while not subtag_type and subtag_info[subtag_i] do<br /> -- Check each pattern for the subtag type at &quot;subtag_i&quot; in &quot;subtag_info&quot;.<br /> local cur_subtag = subtag_info[subtag_i]<br /> for _, pattern in ipairs(cur_subtag) do<br /> if segment:find(&quot;^&quot; .. pattern .. 
&quot;$&quot;) then<br /> subtag_type = cur_subtag.type<br /> -- There can be multiple &quot;variant&quot; subtags (and &quot;extension&quot;<br /> -- subtags, if those are added).<br /> if not cur_subtag.repeatable then<br /> subtag_i = subtag_i + 1<br /> end<br /> break<br /> end<br /> end<br /> <br /> if not subtag_type then -- No match; try next subtag.<br /> subtag_i = subtag_i + 1<br /> end<br /> end<br /> <br /> -- If language subtag has not been found, or the current segment has not<br /> -- been matched as a subtag, break the loop and check for<br /> -- a private-use subtag.<br /> if segment_i == 1 and subtag_type ~= &quot;language&quot; or not subtag_type then<br /> break<br /> else<br /> if parsed_subtags[subtag_type] then -- Create an array.<br /> if type(parsed_subtags[subtag_type]) == &quot;string&quot; then<br /> parsed_subtags[subtag_type] = { parsed_subtags[subtag_type] }<br /> end -- else table<br /> table.insert(parsed_subtags[subtag_type], segment)<br /> else<br /> parsed_subtags[subtag_type] = segment<br /> end<br /> last_matched_segment_i = segment_i<br /> end<br /> <br /> segment_i = segment_i + 1<br /> end<br /> <br /> if segments[segment_i] then -- More segments to scan?<br /> -- Not all potential subtags were matched. 
Check for private-use subtags.<br /> -- https://tools.ietf.org/html/bcp47#section-2.2.7<br /> -- Private-use subtags consist of one or more sequences of 1 to 8<br /> -- alphanumeric characters preceded by &quot;x-&quot;.<br /> -- Alphanumericity has already been checked.<br /> <br /> -- A tag must start with either a language subtag or a private-use subtag.<br /> -- If next segment is not &quot;x&quot;, introducing a private-use subtag, there<br /> -- is no private-use subtag.<br /> if segments[segment_i] and segments[segment_i]:lower() ~= &quot;x&quot; then<br /> if not parsed_subtags.language then<br /> return parsed_subtags:throw(&quot;no_language&quot;, 1)<br /> else<br /> return parsed_subtags:throw(&quot;invalid_subtag&quot;,<br /> segment_i)<br /> end<br /> elseif not segments[segment_i + 1] then<br /> return parsed_subtags:throw(&quot;empty_private_use&quot;,<br /> segment_i)<br /> end<br /> <br /> -- Check length of all segments after &quot;x&quot;.<br /> for i = segment_i + 1, #segments do<br /> local length = #segments[i]<br /> <br /> if not (1 &lt;= length and length &lt;= 8) then<br /> return parsed_subtags<br /> :throw(&quot;invalid_private_use&quot;, segment_i)<br /> end<br /> end<br /> <br /> if not segments[last_matched_segment_i + 3] then -- There is only one private-use subtag.<br /> parsed_subtags.private_use = segments[segment_i + 1]<br /> else<br /> parsed_subtags.private_use = {}<br /> for i = segment_i + 1, #segments do<br /> table.insert(parsed_subtags.private_use, segments[i])<br /> end<br /> end<br /> end<br /> <br /> return parsed_subtags:remove_unnecessary_fields()<br /> end<br /> <br /> <br /> local lang_name_table = mw.loadData &quot;Module:Language/name/data&quot;<br /> local synonym_table = mw.loadData &quot;Module:Lang/ISO 639 synonyms&quot;<br /> local lang_data = mw.loadData &quot;Module:Lang/data&quot;<br /> <br /> function p.validate_lang_tag(parsed_subtags)<br /> -- Already checked that the tag starts with a language subtag or 
a private-use subtag.<br /> -- Script code is initially capitalized, region code is uppercase,<br /> -- everything else is lowercase.<br /> <br /> -- Check existence of language tag.<br /> if parsed_subtags.language and<br /> not (lang_data.override[parsed_subtags.language]<br /> or lang_name_table.lang[parsed_subtags.language]) then<br /> mw.log(&quot;Invalid language code&quot;, parsed_subtags.language, &quot;in&quot;, parsed_subtags:get_tag())<br /> end<br /> <br /> -- Check existence of script tag.<br /> if parsed_subtags.script then<br /> local lower_script = parsed_subtags.script:lower()<br /> if not lang_name_table.script[lower_script] then<br /> mw.log(&quot;Invalid script code&quot;, parsed_subtags.script, &quot;in&quot;, parsed_subtags:get_tag())<br /> end<br /> <br /> -- Check that script tag is not marked as superfluous (because<br /> -- it is considered the default one for the language).<br /> if lang_name_table.suppressed[lower_script]<br /> and parsed_subtags.language<br /> and m_table.inArray(<br /> lang_name_table.suppressed[lower_script],<br /> parsed_subtags.language:lower()) then<br /> mw.log(parsed_subtags.script, &quot;is suppressed with&quot;,<br /> parsed_subtags.language, &quot;in&quot;, parsed_subtags:get_tag())<br /> end<br /> end<br /> <br /> -- Check existence of region code.<br /> if parsed_subtags.region and not lang_name_table.region[parsed_subtags.region:lower()] then<br /> mw.log(&quot;Invalid region code&quot;, parsed_subtags.region, &quot;in&quot;, parsed_subtags:get_tag())<br /> end<br /> <br /> -- Check that variant code is valid, and that it can validly be used with the<br /> -- given combination of language, script, region, and variant.<br /> -- Check for duplicate variant subtags?<br /> if parsed_subtags.variant then<br /> local lower_tag = parsed_subtags:get_tag():lower()<br /> <br /> for _, variant in ipairs(type(parsed_subtags.variant) == &quot;table&quot;<br /> and parsed_subtags.variant or { parsed_subtags.variant 
}) do<br /> if not lang_name_table.variant[variant] then<br /> mw.log(&quot;Invalid variant code&quot;, variant, &quot;in&quot;, parsed_subtags:get_tag())<br /> else<br /> local prefix = parsed_subtags:get_tag():lower():match(&quot;^(.-)%-&quot; .. variant)<br /> <br /> -- Check that at least one of the prefixes is found at the<br /> -- beginning of lower_tag.<br /> if not fun.some(function (prefix)<br /> return lower_tag:find(prefix, 1, true) == 1<br /> end,<br /> lang_name_table.variant[variant].prefixes) then<br /> mw.log(&quot;Variant tag&quot;, variant, &quot;does not belong with prefix&quot;,<br /> prefix, &quot;in&quot;, parsed_subtags:get_tag())<br /> end<br /> end<br /> end<br /> end<br /> <br /> -- Check that the private-use subtag is actually used by Wikipedia.<br /> if parsed_subtags.private_use and not lang_data.override[parsed_subtags.tag] then<br /> mw.log(&quot;Invalid private-use subtag in&quot;, parsed_subtags:get_tag())<br /> end<br /> end<br /> <br /> return p</div> Erutuon
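The capitalization rules that the module's `pretty_print` applies to parsed subtags (language lowercase, script with an initial capital, region uppercase) can be tried in plain Lua without any MediaWiki libraries. The sketch below copies `initial_caps_helper` and the relevant `print_funcs` entries from the initial revision. `normalize` is a hypothetical stand-in for the metatable method, and the variant and private-use handling is left out.

```lua
local function initial_caps_helper(initial, rest)
	return string.upper(initial) .. string.lower(rest)
end

-- Subset of the module's print_funcs table.
local print_funcs = {
	language = string.lower,
	script = function (script_code)
		-- Parentheses discard gsub's second return value (the match count).
		return (string.gsub(script_code, "^(%a)(%a%a%a)$", initial_caps_helper))
	end,
	region = string.upper,
}

-- Hypothetical helper: apply the matching function to each known field.
local function normalize(subtags)
	local result = {}
	for key, value in pairs(subtags) do
		local func = print_funcs[key]
		result[key] = func and func(value) or value
	end
	return result
end

local with_script = normalize { language = "ZH", script = "LATN" }
local with_region = normalize { language = "FR", region = "ca" }
print(with_script.language .. "-" .. with_script.script) -- zh-Latn
print(with_region.language .. "-" .. with_region.region) -- fr-CA
```

These two cases match the examples given in the module's own comment (`ZH-LATN -> zh-Latn, FR-ca -> fr-CA`).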