Character encoding: String ordering in Lua
I'm reading Programming in Lua, 1st edition (yes, I know it's a bit outdated), and in section 3.2 (about relational operators) the author says:
"For instance, with a European Latin-1 locale, we have "acai" < "açaí" < "acorde"."
I don't get it. To me, it's OK to have "acai" < "açaí", but why "açaí" < "acorde"?
As far as I know (and Wikipedia seems to confirm), "c" < "ç". Or am I wrong?
You are referring to a code page, which maps codepoints to characters. Codepoints, being a finite set of non-negative integers, are well-ordered, distinct entities. However, that is not what characters are about.
Characters have a collation order, which is looser than a simple ordering of distinct items: two characters can compare "equal" without being the same character. Collation is a user-valued concept that varies by locale (and over time).
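To see why the intuition "c" < "ç" holds for codepoints but not necessarily for collation, the sketch below sorts the three words by raw codepoint and by Latin-1 byte value. It uses Python purely for illustration; it is not the book's code. The Lua reference manual says strings are compared according to the current locale, so in the default "C" locale you would expect Lua to give the same byte order shown here.

```python
# Plain comparison of Python str values uses Unicode codepoint order,
# and comparing Latin-1 encoded bytes uses raw byte values. Neither
# applies any locale's collation rules.
words = ["acorde", "açaí", "acai"]

# Codepoint order: 'ç' is U+00E7 (231), greater than 'o' (111),
# so "açaí" sorts after "acorde".
by_codepoint = sorted(words)

# Latin-1 byte order: 'ç' becomes byte 0xE7, so for these words the
# result matches codepoint order.
by_latin1_bytes = sorted(words, key=lambda w: w.encode("latin-1"))

print(by_codepoint)        # ['acai', 'acorde', 'açaí']
print(by_latin1_bytes)     # ['acai', 'acorde', 'açaí']
print(ord("c"), ord("ç"))  # 99 231
```

Under either order "acorde" comes before "açaí", which is the ordering the question expects and the opposite of the book's locale-aware example.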
Strings are more complicated because character sets (e.g. Unicode) can have combining characters. These allow a "character" to be represented either as a single codepoint or as a base character followed by combining characters. For example, "ä" (U+00E4) vs "a" followed by U+0308 COMBINING DIAERESIS. Since they represent the same conceptual character, they should be considered more equal than "ä" vs "a".
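A small sketch of that point, using Python's standard unicodedata module: the precomposed and decomposed forms differ as codepoint sequences but become identical after Unicode normalization, while "ä" and plain "a" stay different. The helper name canonically_equal is illustrative, not a library function.

```python
import unicodedata

precomposed = "\u00e4"   # "ä" as a single codepoint
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS

# As raw codepoint sequences the two strings differ...
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 1 2


def canonically_equal(a: str, b: str) -> bool:
    # ...but after normalizing both to the same form (NFC here) they match.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(canonically_equal(precomposed, decomposed))  # True
print(canonically_equal(precomposed, "a"))         # False
```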
In Spanish, "ch", "rr", and "ll" used to be treated as letters of the alphabet, and words were ordered accordingly; now they are not, but "ñ" still is.
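A hypothetical sketch of how the older Spanish convention changes sort order: each word is split into collation units, with "ch", "ll", and "rr" treated as single letters in the traditional alphabet, and each unit is ranked by its position in the alphabet. The names MODERN, TRADITIONAL, and make_key are illustrative; only lowercase words whose only accented letter is "ñ" are handled (anything else raises KeyError).

```python
# Hypothetical alphabets; ranks come from each letter's position in the list.
MODERN = list("abcdefghijklmn") + ["ñ"] + list("opqrstuvwxyz")
TRADITIONAL = (list("abc") + ["ch"] + list("defghijkl") + ["ll"]
               + list("mn") + ["ñ"] + list("opqr") + ["rr"] + list("stuvwxyz"))


def make_key(alphabet):
    """Build a sort key that ranks collation units by position in alphabet."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    digraphs = {letter for letter in alphabet if len(letter) == 2}

    def key(word):
        units = []
        i = 0
        while i < len(word):
            # Prefer a two-letter unit such as "ch" when the alphabet has one.
            if word[i:i + 2] in digraphs:
                units.append(rank[word[i:i + 2]])
                i += 2
            else:
                units.append(rank[word[i]])
                i += 1
        return units

    return key


words = ["oro", "ñame", "nube", "llama", "luz", "cosa", "chico"]
print(sorted(words, key=make_key(MODERN)))
# ['chico', 'cosa', 'llama', 'luz', 'nube', 'ñame', 'oro']
print(sorted(words, key=make_key(TRADITIONAL)))
# ['cosa', 'chico', 'luz', 'llama', 'nube', 'ñame', 'oro']
```

Under the traditional alphabet "chico" moves after every other word starting with "c", and "llama" after "luz", while "ñ" sorts between "n" and "o" in both.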
Similarly, in the past it was not uncommon for English speakers to sort surnames beginning with "Mc" and "Mac" after others beginning with "M".
Software libraries have to deal with such things because that is what users want. Thankfully, some of the older conventions have fallen out of use.
So a locale could have collation rules that result in "acai" < "açaí" < "acorde" if "c" has the same sort order as "ç" and "i" comes before "í". This case seems strange, but the possibility in general requires that our code allow for it.
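The rule just described can be sketched as a two-level sort key, a much simplified version of the multi-level approach used by the Unicode Collation Algorithm. The primary level compares letters with accents stripped (so "ç" ranks like "c" and "í" like "i"), and the secondary level breaks ties by putting unaccented letters first. The function collation_key is hypothetical and is not the Latin-1 locale's actual rule set; real code would rely on the platform's collation, for example Python's locale.strxfrm or ICU.

```python
import unicodedata


def collation_key(word: str) -> tuple:
    """Hypothetical two-level key: base letters first, accents as tie-breaker."""
    bases = []
    accents = []
    # Work on the precomposed (NFC) form so each loop step is one letter.
    for ch in unicodedata.normalize("NFC", word):
        # NFD splits e.g. 'ç' into 'c' + U+0327 and 'í' into 'i' + U+0301.
        decomposed = unicodedata.normalize("NFD", ch)
        bases.append(decomposed[0])
        accents.append(decomposed[1:])  # '' for an unaccented letter
    # Primary level: the unaccented spelling ("açaí" -> "acai").
    # Secondary level: per-letter accents, where '' sorts before any mark.
    return ("".join(bases), tuple(accents))


words = ["acorde", "açaí", "acai"]

print(sorted(words))                     # ['acai', 'acorde', 'açaí']
print(sorted(words, key=collation_key))  # ['acai', 'açaí', 'acorde']
```

"açaí" and "acai" tie at the primary level, so the accents decide between them; against "acorde" the primary level already decides ("acai" < "acorde"), which reproduces the book's order.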