Ordinal versus culture comparison
There are two basic algorithms for string comparison: ordinal and culture-sensitive.
Ordinal comparisons interpret characters simply as numbers (according to their
numeric Unicode value); culture-sensitive comparisons interpret characters with
reference to a particular alphabet. There are two special cultures: the “current
culture,” which is based on settings picked up from the computer’s control panel,
and the “invariant culture,” which is the same on every computer (and closely
matches American culture).
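The difference between the two algorithms can be seen on a single pair of characters. The following is a minimal sketch in Java, not the API the text has in mind: String.compareTo stands in for ordinal comparison, java.text.Collator stands in for culture-sensitive comparison, and Locale.ROOT and Locale.getDefault() are assumed to be only rough analogues of the invariant and current cultures. The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

class CompareDemo {
    // Ordinal: compares the numeric values of the characters (UTF-16 code units).
    static int ordinal(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    // Culture-sensitive: compares according to the collation rules of a locale.
    static int cultural(String a, String b, Locale locale) {
        return Integer.signum(Collator.getInstance(locale).compare(a, b));
    }

    public static void main(String[] args) {
        System.out.println((int) 'Z');                        // 90
        System.out.println((int) 'a');                        // 97
        System.out.println(ordinal("a", "Z"));                // 1, because 97 > 90
        System.out.println(cultural("a", "Z", Locale.ROOT));  // -1, a precedes Z in the alphabet
        // The result for the machine's own locale depends on its settings.
        System.out.println(cultural("a", "Z", Locale.getDefault()));
    }
}
```

The ordinal result follows purely from the numbers 97 and 90; the collator result follows from the alphabet that the locale supplies.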
For equality comparison, both ordinal and culture-sensitive algorithms are useful.
For ordering, however, culture-sensitive comparison is nearly always preferable: to
order strings alphabetically, you need an alphabet. Ordinal relies on the numeric
Unicode code point values, which happen to put English characters in alphabetical
order, but even then not exactly as you might expect. For example, assuming case
sensitivity, consider the strings “Atom”, “atom”, and “Zamia”. The invariant culture
puts them in the following order:
"atom", "Atom", "Zamia"
Ordinal arranges them instead as follows:
"Atom", "Zamia", "atom"
This is because the invariant culture encapsulates an alphabet, which considers
uppercase characters adjacent to their lowercase counterparts (aAbBcCdD…). The
ordinal algorithm, however, puts all the uppercase characters first, and then all
lowercase characters (A..Z, a..z). This is essentially a throwback to the ASCII
character set invented in the 1960s.
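The two orderings can be reproduced with a short sort. This is a hedged sketch in Java rather than the API the text describes: natural String ordering plays the role of ordinal comparison, and a java.text.Collator for Locale.ROOT is assumed to be a reasonable stand-in for an invariant, alphabet-based comparison. Java's default collation rules keep each letter's two cases adjacent and place the lowercase form first. SortDemo and its methods are hypothetical names.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class SortDemo {
    // Ordinal stand-in: String's natural order compares UTF-16 code unit values.
    static List<String> sortOrdinal(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Culture-sensitive stand-in: a Collator applies a locale's alphabet.
    static List<String> sortWithAlphabet(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(Locale.ROOT));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("atom", "Zamia", "Atom");
        System.out.println(sortOrdinal(words));       // [Atom, Zamia, atom]
        System.out.println(sortWithAlphabet(words));  // [atom, Atom, Zamia]
    }
}
```

In the ordinal result, "Zamia" lands between the two spellings of "atom" because every uppercase letter has a lower numeric value than any lowercase letter.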