icuSetCollate {base}    R Documentation

Setup Collation by ICU

Description

Controls the way collation is done by ICU (an optional part of the R build).

Usage

icuSetCollate(...)

Arguments

...

Named arguments, see ‘Details’.

Details

Optionally, R can be built to collate character strings by ICU (http://site.icu-project.org). For such systems, icuSetCollate can be used to tune the way collation is done. On other builds, calling this function has no effect other than issuing a warning.
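One way to find out which case applies on a given build is to watch for that warning. The following is a minimal sketch, not an official test: icu_collation_usable is a hypothetical helper name (not part of base R), and the call it makes sets strength to "tertiary", which is the documented default.

```r
# Hypothetical helper (not part of base R): returns TRUE if icuSetCollate()
# runs without signalling a warning or an error, FALSE otherwise.
icu_collation_usable <- function() {
  tryCatch({
    # Side effect: sets strength to "tertiary", the documented default.
    icuSetCollate(strength = "tertiary")
    TRUE
  },
  warning = function(w) FALSE,  # builds without ICU are documented to warn
  error = function(e) FALSE)    # defensive: treat any error as "not usable"
}

icu_ok <- icu_collation_usable()
if (!icu_ok) message("ICU collation does not appear to be usable on this build")
```

The result only says whether the call completed quietly; it does not say whether ICU rules will actually be applied in the current collation locale.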

Possible arguments are

locale:

A character string such as "da_DK" giving the locale (language and country) whose collation rules are to be used. If present, this should be the first argument.

case_first:

"upper", "lower" or "default", asking for upper- or lower-case characters to be sorted first. The default is usually lower-case first, but not in all languages (see the Danish example).

alternate_handling:

Controls the handling of ‘variable’ characters (mainly punctuation and symbols). Possible values are "non_ignorable" (primary strength) and "shifted" (quaternary strength).

strength:

Which components should be used? Possible values "primary", "secondary", "tertiary" (default), "quaternary" and "identical".

french_collation:

In a French locale the way accents affect collation is from right to left, whereas in most other locales it is from left to right. Possible values "on", "off" and "default".

normalization:

Should strings be normalized? Possible values are "on" and "off" (default). This affects the collation of composite characters.

case_level:

An additional level between secondary and tertiary, used to distinguish large and small Japanese Kana characters. Possible values "on" and "off" (default).

hiragana_quaternary:

Possible values "on" (sort Hiragana first at quaternary level) and "off".

Only the first three are likely to be of interest, except to those with a detailed understanding of collation and specialized requirements.

Some examples are case_level="on", strength="primary" to ignore accent differences, and alternate_handling="shifted" to ignore space and punctuation characters.
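These two settings can be tried as follows. This is a sketch only: the vector x is illustrative data that does not come from this page, the results depend on ICU being available and on the locale, and restoring alternate_handling to "non_ignorable" at the end assumes that was the setting beforehand. On builds without ICU each call only produces a warning.

```r
# Illustrative data (an assumption, not from this help page);
# "\u00f4" is o with circumflex.
x <- c("role", "r\u00f4le", "roles", "co-op", "coop", "co op")

# Settings suggested in the text for ignoring accent differences.
icuSetCollate(case_level = "on", strength = "primary")
accent_blind <- sort(x)

# Return case_level and strength to their documented defaults, then
# ask for spaces and punctuation to be ignored.
icuSetCollate(case_level = "off", strength = "tertiary",
              alternate_handling = "shifted")
punct_blind <- sort(x)

# Assumption: "non_ignorable" was the handling in effect before this sketch.
icuSetCollate(alternate_handling = "non_ignorable")

print(accent_blind)
print(punct_blind)
```

Comparing the two printed orderings with plain sort(x) shows whether the settings took effect on a given system.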

Note that these settings have no effect if collation is set to the C locale, unless locale is specified.
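The interaction with the C locale can be explored with a sketch like the one below. It assumes ICU is available; the variable names are illustrative, and no particular ordering is claimed for either result.

```r
x <- c("Aarhus", "aarhus", "safe", "test", "Zoo")

# Remember the current collation locale so it can be restored afterwards.
old_collate <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", "C")

# No locale given: per the note above, this setting should have no effect
# while collation is in the C locale.
icuSetCollate(case_first = "upper")
in_c <- sort(x)

# Locale given explicitly, so ICU rules can apply despite the C locale.
icuSetCollate(locale = "da_DK", case_first = "default")
with_locale <- sort(x)

# Restore the original collation locale.
Sys.setlocale("LC_COLLATE", old_collate)

print(in_c)
print(with_locale)
```

The ICU locale chosen here may remain in effect for the rest of the session, so a further call to icuSetCollate with the desired locale may be needed afterwards.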

Note

As of R 2.9.0, ICU is used by default wherever it is available: this includes Mac OS >= 10.4 and many Linux installations.

See Also

Comparison, sort

The ICU user guide chapter on collation (http://userguide.icu-project.org/collation).

Examples

## these examples depend on having ICU available, and on the locale
x <- c("Aarhus", "aarhus", "safe", "test", "Zoo")
sort(x)
icuSetCollate(case_first="upper"); sort(x)
icuSetCollate(case_first="lower"); sort(x)

icuSetCollate(locale="da_DK", case_first="default"); sort(x)
icuSetCollate(locale="et_EE"); sort(x)

[Package base version 2.15.1 Index]