Unicode collation algorithm

From Wikipedia, the free encyclopedia

Revision as of 18:55, 12 April 2023

The Unicode collation algorithm (UCA) is a customizable method, defined in Unicode Technical Standard #10 (formerly Unicode Technical Report #10), for producing binary sort keys from strings of text in any writing system and language that can be represented in Unicode. Two keys can then be compared efficiently, byte by byte, to collate or sort the original strings according to the rules of the language, with options such as ignoring differences of case or accent.
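
The idea of a multi-level sort key can be illustrated with a minimal Python sketch. It is not an implementation of the UCA: the table `TOY_TABLE`, its weights, and the function `sort_key` are hypothetical names and values invented for this illustration, and the table covers only the basic Latin letters and one combining accent. Each character maps to a (primary, secondary, tertiary) weight triple; the key lists all primary weights, then a separator, then the secondary weights, and so on, so that a plain byte comparison considers letter identity first, accents second, and case third.

```python
import unicodedata

# Toy collation element table: character -> (primary, secondary, tertiary).
# These weights are invented for illustration and are NOT DUCET values.
TOY_TABLE = {}
for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz", start=1):
    TOY_TABLE[ch] = (i, 1, 1)          # lowercase letter
    TOY_TABLE[ch.upper()] = (i, 1, 2)  # same letter, differs only at level 3
TOY_TABLE["\u0301"] = (0, 2, 1)        # combining acute accent: ignored at level 1


def sort_key(text, strength=3):
    """Build a byte-comparable key using the first `strength` levels."""
    # Decompose so that an accented letter becomes base letter + accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Characters missing from the toy table raise KeyError.
    elements = [TOY_TABLE[ch] for ch in decomposed]
    key = bytearray()
    for level in range(strength):
        if level > 0:
            key.append(0)  # level separator, lower than any real weight
        # Zero weights mean "ignorable at this level" and are skipped.
        key.extend(ce[level] for ce in elements if ce[level] != 0)
    return bytes(key)


words = ["résumé", "Resume", "rest", "resume"]
ordered = sorted(words, key=sort_key)
print(ordered)
```

Calling `sort_key(text, strength=1)` in this sketch yields keys that ignore both accents and case, and `strength=2` yields keys that ignore case only, which mirrors the kind of option described above.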

The same specification also defines the Default Unicode Collation Element Table (DUCET), a data file that specifies a default collation order. The DUCET can be tailored for different languages; some such tailorings can be found in the Unicode Common Locale Data Repository (CLDR).
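
How a language-specific tailoring changes a default order can be sketched as follows. This is a simplified illustration under stated assumptions, not the DUCET or any CLDR data: the tables `DEFAULT` and `SWEDISH_LIKE`, their weights, and the helper functions are hypothetical. The default table treats a diaeresis or ring as a secondary (accent) difference, while the tailored table adds entries that give "å", "ä" and "ö" their own primary weights after "z", as in Swedish alphabetical order.

```python
import unicodedata

# Hypothetical default table: string -> (primary, secondary).
# Weights are invented for illustration; they are not DUCET values.
DEFAULT = {ch: (i, 1) for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz", start=1)}
DEFAULT["\u0308"] = (0, 2)  # combining diaeresis: secondary difference only
DEFAULT["\u030a"] = (0, 3)  # combining ring above: secondary difference only

# Hypothetical tailoring: letter + mark sequences become separate letters
# that sort after "z".
SWEDISH_LIKE = dict(DEFAULT)
SWEDISH_LIKE["a\u030a"] = (27, 1)  # å
SWEDISH_LIKE["a\u0308"] = (28, 1)  # ä
SWEDISH_LIKE["o\u0308"] = (29, 1)  # ö


def collation_elements(text, table):
    """Map text to weight pairs, preferring two-character table entries."""
    s = unicodedata.normalize("NFD", text.lower())
    elements, i = [], 0
    while i < len(s):
        pair = s[i:i + 2]
        if len(pair) == 2 and pair in table:
            elements.append(table[pair])
            i += 2
        else:
            elements.append(table[s[i]])  # KeyError for unknown characters
            i += 1
    return elements


def sort_key(text, table):
    """Primary weights, a zero separator, then secondary weights."""
    ces = collation_elements(text, table)
    primary = [p for p, _ in ces if p]
    secondary = [s for _, s in ces if s]
    return bytes(primary + [0] + secondary)


words = ["zebra", "äpple", "apple", "öl", "ost"]
default_order = sorted(words, key=lambda w: sort_key(w, DEFAULT))
tailored_order = sorted(words, key=lambda w: sort_key(w, SWEDISH_LIKE))
print(default_order)
print(tailored_order)
```

In this sketch the same word list sorts with "äpple" next to "apple" under the default table, and with "äpple" and "öl" after "zebra" under the tailored one; only the table changes, not the key-building procedure.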

An open-source implementation of the UCA is included in International Components for Unicode (ICU). ICU supports tailoring and includes the collation tailorings from CLDR. The effects of tailoring, along with many language-specific tailorings, can be viewed in the online ICU Locale Explorer.
