nchar {base} | R Documentation |
nchar takes a character vector as an argument and returns a vector whose elements contain the sizes of the corresponding elements of x.

nzchar is a fast way to find out if elements of a character vector are non-empty strings.
nchar(x, type = "chars", allowNA = FALSE)
nzchar(x)
x | character vector, or a vector to be coerced to a character vector. Giving a factor is an error.
type | character string: partial matching to one of "bytes", "chars", or "width".
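allowNA | logical: should NA be returned for invalid multibyte strings or "bytes"-encoded strings (rather than throwing an error)?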
The ‘size’ of a character string can be measured in one of three ways:
bytes
The number of bytes needed to store the string (plus in C a final terminator which is not counted).
chars
The number of human-readable characters.
width
The number of columns cat will use to print the string in a monospaced font. The same as chars if this cannot be calculated.
These will often be the same, and almost always will be in single-byte locales. There will be differences between the first two with multibyte character sequences, e.g. in UTF-8 locales.
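As a small illustration of the three measures, the sketch below uses a single accented letter. The string is passed through enc2utf8() so that the byte count does not depend on the session's locale; the variable names s and a are only for illustration.

```r
# "é" entered as a Unicode escape and stored as UTF-8
s <- enc2utf8("\u00e9")

nchar(s, type = "chars")  # 1: one human-readable character
nchar(s, type = "bytes")  # 2: UTF-8 needs two bytes for this character
nchar(s, type = "width")  # columns used by cat(); may depend on the locale

# For plain ASCII text, bytes and chars agree
a <- "abc"
nchar(a, type = "bytes") == nchar(a, type = "chars")  # TRUE
```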
The internal equivalent of the default method of as.character is performed on x (so there is no method dispatch). If you want to operate on non-vector objects, passing them through deparse first will be required.
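The coercion rules can be sketched as follows. The object names res and expr are hypothetical, and the tryCatch() wrapper is only there so that the factor case can be shown without stopping the script.

```r
# Atomic vectors are coerced as if by as.character()
nchar(123L)            # 3, the length of "123"
nchar(c(TRUE, FALSE))  # 4 5, the lengths of "TRUE" and "FALSE"

# A factor is rejected rather than coerced
res <- tryCatch(nchar(factor("abc")), error = function(e) "error")
res                    # "error"

# A language object is a non-vector object, so deparse it to text first
expr <- quote(a + b)
nchar(deparse(expr))   # 5, the length of "a + b"
```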
For nchar, an integer vector giving the sizes of each element, currently always 2 for missing values (for NA).
If allowNA = TRUE and an element is invalid in a multi-byte character set such as UTF-8, its number of characters and the width will be NA. Otherwise the number of characters will be non-negative, so !is.na(nchar(x, "chars", TRUE)) is a test of validity.
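A minimal sketch of the validity test, assuming a string that has been declared as UTF-8 but contains a byte that cannot occur in valid UTF-8. The names ok, bad and valid are illustrative only.

```r
ok  <- enc2utf8("\u00e9")  # valid UTF-8
bad <- "\xff"              # the byte 0xFF never occurs in valid UTF-8
Encoding(bad) <- "UTF-8"   # declare it as UTF-8 anyway (assumption for this demo)

# NA marks the invalid element, so !is.na() flags the valid ones
valid <- !is.na(nchar(c(ok, bad), "chars", allowNA = TRUE))
valid                      # TRUE FALSE
```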
A character string marked with "bytes" encoding has a number of bytes, but neither a known number of characters nor a width, so the latter two types are NA if allowNA = TRUE, otherwise an error.
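The behaviour for "bytes"-marked strings might look like this in practice. The example string and the names b and res are assumptions chosen for illustration; tryCatch() is used only to show the error case without halting.

```r
b <- "fa\xE7ile"        # contains one non-ASCII byte
Encoding(b) <- "bytes"  # mark the string as raw bytes

nchar(b, type = "bytes")                  # 6
nchar(b, type = "chars", allowNA = TRUE)  # NA
nchar(b, type = "width", allowNA = TRUE)  # NA

# With the default allowNA = FALSE the same request is an error
res <- tryCatch(nchar(b, type = "chars"), error = function(e) "error")
res                                       # "error"
```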
Names, dims and dimnames are copied from the input.
For nzchar, a logical vector of the same length as x, true if and only if the element has non-zero length.
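The return values described above can be checked with a short sketch; the vector x and matrix m are made-up inputs.

```r
x <- c(first = "abc", second = "", third = " ")

nchar(x)   # names are kept: first = 3, second = 0, third = 1
nzchar(x)  # FALSE only for the empty string; a single space still counts

# dims are copied from the input as well
m <- matrix(c("a", "bb", "ccc", "dddd"), nrow = 2)
dim(nchar(m))  # 2 2
```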
This does not by default give the number of characters that will be used to print() the string. Use encodeString to find the characters used to print the string.
This is particularly important on Windows when \uxxxx sequences have been used to enter Unicode characters not representable in the current encoding. Thus nchar("\u2642") is 1, and it is printed in Rgui as one character, but it will be printed in Rterm as <U+2642>, which is what encodeString gives.
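The difference between the stored size and the printed size can be seen with an escape sequence that is the same on every platform. This is a sketch using a newline rather than the Windows-specific case; the name s is illustrative.

```r
s <- "a\nb"

nchar(s)                             # 3: 'a', newline, 'b'
nchar(encodeString(s))               # 4: the newline is printed as the two characters \n
nchar(encodeString(s, quote = '"'))  # 6: plus the surrounding quotes that print() shows
```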
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
strwidth giving width of strings for plotting; paste, substr, strsplit
x <- c("asfef", "qwerty", "yuiop[", "b", "stuff.blah.yech")
nchar(x)
# 5  6  6  1 15

nchar(deparse(mean))
# 18 17