
Unicode collation algorithm: Difference between revisions

From Wikipedia, the free encyclopedia
BattyBot (talk | contribs)
Moved See also above References per MOS:ORDER and other General fixes
m Tools: clean up spacing around commas and other punctuation fixes, replaced: , → ,


===Tools===
* [https://icu4c-demos.unicode.org/icu-bin/locexp?_=en_US&x=col ICU Locale Explorer]: An online demonstration of the Unicode Collation Algorithm using [[International Components for Unicode]] (not working as of 2023-08-16).
* [https://icu4c-demos.unicode.org/icu-bin/collation.html An ICU collation demo] (not working as of 2023-08-16).
* [http://billposer.org/Software/msort.html msort]: A sort program that provides an unusual level of flexibility in defining collations and extracting keys.

Revision as of 00:01, 4 March 2024

The Unicode collation algorithm (UCA) is a customizable method, defined in Unicode Technical Standard #10 (UTS #10), for producing binary sort keys from strings of text in any writing system and language that Unicode can represent. The keys can then be compared efficiently byte by byte to collate or sort the strings according to the rules of the language, with options for ignoring case, accents, and other distinctions.[1]
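The idea of a multi-level binary sort key can be illustrated with a minimal sketch. The table `TOY_TABLE`, its weights, and the function `sort_key` are hypothetical and invented for this illustration; they are not DUCET values and not part of any library. The sketch assumes each character maps to one or more collation elements with primary, secondary, and tertiary weights, and that the key lists all non-zero weights of one level before a zero separator and the next level.

```python
import struct

# Hypothetical collation element table: NOT real DUCET weights.
# Each character maps to a list of (primary, secondary, tertiary) elements.
TOY_TABLE = {
    "a": [(0x0010, 0x0020, 0x0002)],
    "A": [(0x0010, 0x0020, 0x0008)],  # same letter, different tertiary (case)
    # U+00E1 (a with acute): base letter plus an accent element whose
    # primary weight is zero, so it only matters at the secondary level.
    "\u00e1": [(0x0010, 0x0020, 0x0002), (0x0000, 0x0024, 0x0002)],
    "b": [(0x0011, 0x0020, 0x0002)],
    "B": [(0x0011, 0x0020, 0x0008)],
}


def sort_key(text, levels=3):
    """Build a binary key: level-1 weights, separator, level-2 weights, ..."""
    elements = [ce for ch in text for ce in TOY_TABLE[ch]]
    weights = []
    for level in range(levels):
        if level > 0:
            weights.append(0)  # level separator, lower than any real weight
        # Zero weights are ignorable at this level and are skipped.
        weights.extend(ce[level] for ce in elements if ce[level] != 0)
    # Big-endian 16-bit units make byte order agree with weight order.
    return b"".join(struct.pack(">H", w) for w in weights)


words = ["b", "A", "\u00e1", "a", "ab", "B"]
print(sorted(words, key=sort_key))
# Comparing at fewer levels ignores finer distinctions:
print(sort_key("a", levels=1) == sort_key("\u00e1", levels=1))  # accents ignored
print(sort_key("a", levels=2) == sort_key("A", levels=2))       # case ignored
```

Because the keys are plain byte strings, the sort itself needs no knowledge of the language: all linguistic information is encoded in the weights. Restricting the key to fewer levels is one way to picture the options for ignoring case or accents.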

The same specification also defines the Default Unicode Collation Element Table (DUCET), a data file that specifies a default collation ordering. The DUCET can be customized for different languages.[1][2] Some such customizations can be found in the Unicode Common Locale Data Repository (CLDR).[3]
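Customizing a default table for a particular language can be pictured as overriding a few entries. The following is a simplified sketch under stated assumptions: `DEFAULT_TABLE`, `SWEDISH_STYLE_TAILORING`, and `make_key_function` are hypothetical names with invented weights, not DUCET or CLDR data, and the keys are plain tuples with only two levels. It models the well-known case of the Swedish alphabet, in which "ä" is a separate letter sorted after "z", against a default in which it sorts as "a" with an accent.

```python
# Hypothetical default table: (primary, secondary) weights, NOT DUCET values.
DEFAULT_TABLE = {
    "a": [(0x10, 0x20)],
    # U+00E4 (a with diaeresis): base letter plus a secondary-only accent.
    "\u00e4": [(0x10, 0x20), (0x00, 0x2B)],
    "o": [(0x1E, 0x20)],
    "z": [(0x29, 0x20)],
}

# Hypothetical tailoring: give U+00E4 its own primary weight after "z".
SWEDISH_STYLE_TAILORING = {"\u00e4": [(0x2B, 0x20)]}


def make_key_function(tailoring=None):
    """Return a key function using the default table plus optional overrides."""
    table = {**DEFAULT_TABLE, **(tailoring or {})}

    def key(text):
        elements = [ce for ch in text for ce in table[ch]]
        primary = tuple(p for p, _ in elements if p)
        secondary = tuple(s for _, s in elements if s)
        # Primary weights, a zero separator, then secondary weights.
        return primary + (0,) + secondary

    return key


words = ["z", "\u00e4", "a", "o"]
print(sorted(words, key=make_key_function()))
print(sorted(words, key=make_key_function(SWEDISH_STYLE_TAILORING)))
```

The default ordering places "ä" directly after "a", while the tailored ordering moves it past "z"; the rest of the table is untouched, which is the point of tailoring rather than replacing the default.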

An open-source implementation of the UCA is included in International Components for Unicode (ICU).[4][5] ICU supports tailoring and includes the collation tailorings from CLDR.[6][2]

See also

References

  1. ^ a b Whistler, Ken; Scherer, Markus; Davis, Mark (2022-08-26). "UTS #10: Unicode Collation Algorithm". Unicode. Retrieved 2023-08-16.
  2. ^ a b Hosken, Martin (2021-09-23). Unicode Sort Tailoring: Tutorial (PDF) (1.3 ed.). SIL Writing Systems Technology. pp. 2–3. Retrieved 2023-08-16.
  3. ^ "CLDR Releases/Downloads". Unicode CLDR. Retrieved 2023-08-16.
  4. ^ "ICU - International Components for Unicode". Unicode. Retrieved 2023-08-16.
  5. ^ "Collations". SyBooks Online. Retrieved 2023-08-16.
  6. ^ "Customization". ICU Documentation. Retrieved 2023-08-16.

Tools