Talk:Numeric character reference
This article is rated Start-class on Wikipedia's content assessment scale.
Numeric character converter
Perl
For everyday needs, here is a one-line converter in Perl:
while (<STDIN>) { s/(.)/(ord($1)>127)? ('&#'.ord($1).';'): $1/ge; print $_; }
(usage: % perl code.pl < fileIn.txt > fileOut.txt)
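It converts ISO Latin-1 text to XML-compatible ASCII. By default Perl reads STDIN as bytes, so for UTF-8 (Unicode) input the stream has to be decoded first, for example with binmode(STDIN, ':encoding(UTF-8)'); otherwise each byte of a multi-byte character is converted separately.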
JavaScript
function unicode_to_ncr(text) {
  var ncr_text = "";
  // for...of iterates by code point, so a surrogate pair yields one reference
  for (var character of text) {
    var ncr_character = character.codePointAt(0);
    if (ncr_character < 128) {
      ncr_text += character;
    } else {
      ncr_text += "&#" + ncr_character + ";";
    }
  }
  return ncr_text;
}
It also converts Unicode or ISO Latin text to XML-compatible ASCII.
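As a hedged illustration of the same idea, the sketch below can emit either decimal or hexadecimal references. The function name `toNcr` and its `hex` option are hypothetical and not part of the post above. It assumes an ES2015+ engine, since it relies on `codePointAt` and `for...of`.

```javascript
// Hypothetical helper (not from the original post): encode every non-ASCII
// code point as a decimal or hexadecimal numeric character reference.
function toNcr(text, options = {}) {
  const useHex = options.hex === true;
  let out = "";
  // for...of walks the string by code point, so a surrogate pair
  // is handled as a single character.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp < 128) {
      out += ch; // ASCII passes through unchanged
    } else if (useHex) {
      out += "&#x" + cp.toString(16).toUpperCase() + ";";
    } else {
      out += "&#" + cp + ";";
    }
  }
  return out;
}

console.log(toNcr("caf\u00E9"));                // caf&#233;
console.log(toNcr("caf\u00E9", { hex: true })); // caf&#xE9;
```

Note that characters such as `&` and `<` are ASCII and pass through untouched, so this is not a substitute for XML escaping.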
Terminology?
The nomenclature used in this article is not the same as the basic SGML one. SGML has two proper names, "character reference", which is the numeric character reference described here, and "entity reference", which is a macro resolving to any sequence of characters.
The entity references used in HTML all resolve to exactly one character. But that doesn't make them a special case, as the phrase "character entity reference" implies; they just all happen to be one-character strings. Pim 2 (talk) 11:26, 11 December 2011 (UTC)
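To make the distinction concrete, here is a rough sketch that sorts a reference into the two SGML categories by its syntax alone. `classifyReference` is a hypothetical name, and the entity-name pattern is a simplifying assumption (ASCII letters and digits only), narrower than what SGML or XML actually allow in names.

```javascript
// Hypothetical classifier, for illustration only.
function classifyReference(ref) {
  // "&#" followed by digits: a character reference in decimal form
  if (/^&#[0-9]+;$/.test(ref)) return "character reference (decimal)";
  // "&#x" followed by hex digits: a character reference in hexadecimal form
  if (/^&#[xX][0-9a-fA-F]+;$/.test(ref)) return "character reference (hexadecimal)";
  // "&name;": an entity reference, whatever its replacement text is
  // (assumption: names are restricted to ASCII letters and digits here)
  if (/^&[A-Za-z][A-Za-z0-9]*;$/.test(ref)) return "entity reference";
  return "not a reference";
}

console.log(classifyReference("&#233;"));   // character reference (decimal)
console.log(classifyReference("&eacute;")); // entity reference
```

Nothing in the syntax says how many characters `&eacute;` expands to; that depends only on how the entity is declared.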
HTML 5
HTML 5 specifies a concrete list of numeric character references and the characters they refer to: https://html.spec.whatwg.org/multipage/parsing.html#decimal-character-reference-start-state . I believe it is based on Windows-1252, or at least on that encoding's effect on historical pages, and it should eliminate the ambiguity about which encoding to use. SgeoTC 07:38, 8 May 2025 (UTC)
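A minimal sketch of how that remapping behaves, assuming the rules in the HTML Standard's "numeric character reference end state". The table below is only an excerpt of the spec's rows, and the names `C1_REMAP` and `resolveNumericReference` are hypothetical.

```javascript
// Excerpt (not the full table) of the Windows-1252-style remapping that the
// HTML Standard applies to numeric references in the 0x80-0x9F range.
const C1_REMAP = new Map([
  [0x80, 0x20AC], // EURO SIGN
  [0x85, 0x2026], // HORIZONTAL ELLIPSIS
  [0x91, 0x2018], // LEFT SINGLE QUOTATION MARK
  [0x92, 0x2019], // RIGHT SINGLE QUOTATION MARK
  [0x93, 0x201C], // LEFT DOUBLE QUOTATION MARK
  [0x94, 0x201D], // RIGHT DOUBLE QUOTATION MARK
  [0x96, 0x2013], // EN DASH
  [0x97, 0x2014], // EM DASH
  [0x99, 0x2122], // TRADE MARK SIGN
]);

// Hypothetical helper: map the number inside "&#N;" to the resulting string.
function resolveNumericReference(codePoint) {
  // Null, out-of-range values and surrogates become U+FFFD.
  if (codePoint === 0 || codePoint > 0x10FFFF) return "\uFFFD";
  if (codePoint >= 0xD800 && codePoint <= 0xDFFF) return "\uFFFD";
  const mapped = C1_REMAP.get(codePoint);
  return String.fromCodePoint(mapped !== undefined ? mapped : codePoint);
}

console.log(resolveNumericReference(0x80) === "\u20AC"); // true
console.log(resolveNumericReference(0xE9) === "\u00E9"); // true
```

So `&#128;` in an HTML document yields the euro sign rather than the C1 control character U+0080, which is the historical Windows-1252 behaviour being discussed.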