icuSetCollate {base} | R Documentation |
Controls the way collation is done by ICU (an optional part of the R build).
icuSetCollate(...)
...
:Named arguments, see ‘Details’.
Optionally, R can be built to collate character strings by ICU (http://site.icu-project.org). For such systems, icuSetCollate can be used to tune the way collation is done. On other builds calling this function does nothing, with a warning.
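A minimal sketch of guarding the call so that the warning is avoided on builds without ICU. It assumes that capabilities("ICU") is available, which holds for recent versions of R but is not stated on this page; the helper name set_collation_if_icu is hypothetical.

```r
## Hypothetical helper: pass settings to icuSetCollate() only on ICU builds.
## Assumes capabilities("ICU") exists (recent versions of R).
set_collation_if_icu <- function(...) {
    has_icu <- isTRUE(unname(capabilities("ICU")))
    if (has_icu) icuSetCollate(...)
    invisible(has_icu)
}

## TRUE if the setting was passed on to ICU, FALSE if the call was skipped
used_icu <- set_collation_if_icu(case_first = "upper")
print(used_icu)

## Put case_first back to its default for the rest of the session
set_collation_if_icu(case_first = "default")
```

Returning the flag invisibly lets calling code decide whether to fall back to some other ordering when ICU is not present.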
Possible arguments are
locale
:A character string such as "da_DK" giving the language and country whose collation rules are to be used. If present, this should be the first argument.
case_first
:"upper", "lower" or "default", asking for upper- or lower-case characters to be sorted first. The default is usually lower-case first, but not in all languages (see the Danish example).
alternate_handling
:Controls the handling of ‘variable’ characters (mainly punctuation and symbols). Possible values are "non_ignorable" (primary strength) and "shifted" (quaternary strength).
strength
:Which components should be used? Possible values "primary", "secondary", "tertiary" (default), "quaternary" and "identical".
french_collation
:In a French locale the way accents affect collation is from right to left, whereas in most other locales it is from left to right. Possible values "on", "off" and "default".
normalization
:Should strings be normalized? Possible values are "on" and "off" (default). This affects the collation of composite characters.
case_level
:An additional level between secondary and tertiary, used to distinguish large and small Japanese Kana characters. Possible values "on" and "off" (default).
hiragana_quaternary
:Possible values "on" (sort Hiragana first at quaternary level) and "off".
Only the first three are likely to be of interest except to those with a detailed understanding of collation and specialized requirements.
Some examples are case_level="on", strength="primary" to ignore accent differences and alternate_handling="shifted" to ignore space and punctuation characters.
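The two combinations just mentioned can be tried out as follows. This is a sketch under some assumptions: the "en_US" locale and the sample strings are illustrative choices, not taken from this page, and capabilities("ICU") is assumed to be available. The locale is given explicitly so that the settings also take effect in a session whose collation locale is C. The resulting order depends on the ICU version and its data, so no output is shown.

```r
x <- c("resume", "r\u00e9sum\u00e9", "Resume", "co-op", "coop", "co op")
has_icu <- isTRUE(unname(capabilities("ICU")))  # assumed available (recent R)

## Ignore accent differences: primary strength plus the case level
if (has_icu)
    icuSetCollate(locale = "en_US", case_level = "on", strength = "primary")
sorted_no_accents <- sort(x)
print(sorted_no_accents)

## Ignore space and punctuation characters: shifted alternate handling,
## with strength and case level put back to the documented defaults
if (has_icu)
    icuSetCollate(locale = "en_US", case_level = "off", strength = "tertiary",
                  alternate_handling = "shifted")
sorted_no_punct <- sort(x)
print(sorted_no_punct)

## Return alternate handling to "non_ignorable"; the collator stays at "en_US"
if (has_icu) icuSetCollate(alternate_handling = "non_ignorable")
```

On builds without ICU both calls to sort() simply use the collation of the current locale.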
Note that these settings have no effect if collation is set to the C locale, unless locale is specified.
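A short sketch of what this means in practice. In the C locale strings are collated by the numerical order of their bytes, so all upper-case ASCII letters sort before the lower-case ones. The "en_US" locale and the use of capabilities("ICU") are assumptions made for illustration.

```r
old_collate <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))

x <- c("b", "A", "a", "B")
c_sorted <- sort(x)
print(c_sorted)  # "A" "B" "a" "b": byte order in the C locale

## Naming a locale makes ICU collation apply again (ICU builds only)
if (isTRUE(unname(capabilities("ICU")))) {
    icuSetCollate(locale = "en_US")
    print(sort(x))
}

## Restore the collation locale that was in force before
invisible(Sys.setlocale("LC_COLLATE", old_collate))
```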
As from R 2.9.0, ICU is used by default wherever it is available: this includes Mac OS >= 10.4 and many Linux installations.
The ICU user guide chapter on collation (http://userguide.icu-project.org/collation).
## these examples depend on having ICU available, and on the locale
x <- c("Aarhus", "aarhus", "safe", "test", "Zoo")
sort(x)
icuSetCollate(case_first="upper"); sort(x)
icuSetCollate(case_first="lower"); sort(x)
icuSetCollate(locale="da_DK", case_first="default"); sort(x)
icuSetCollate(locale="et_EE"); sort(x)