read.dta {foreign} | R Documentation |
Reads a file in Stata version 5–11 binary format into a data frame.
read.dta(file, convert.dates = TRUE, convert.factors = TRUE, missing.type = FALSE, convert.underscore = FALSE, warn.missing.labels = TRUE)
file
a filename or URL as a character string.
convert.dates |
Convert Stata dates to Date class?
convert.factors |
Use Stata value labels to create factors? (version 6.0 or later). |
missing.type |
For version 8 or later, store information about different types of missing data? |
convert.underscore |
Convert "_" in Stata variable names to "." in R names?
warn.missing.labels |
Warn if a variable is specified with value labels and those value labels are not present in the file. |
If the filename appears to be a URL (of schemes http:, ftp: or https:) the URL is first downloaded to a temporary file and then read. (https: is only supported on some platforms.)
The variables in the Stata data set become the columns of the data frame. Missing values are correctly handled. The data label, variable labels, and timestamp are stored as attributes of the data frame. Nothing is done with variable characteristics.
By default Stata dates (%d and %td formats) are converted to R's Date class and variables with Stata value labels are converted to factors. Ordinarily, read.dta will not convert a variable to a factor unless a label is present for every level. Use convert.factors = NA to override this. In any case the value label and format information is stored as attributes on the returned data frame.
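The conversions described above can be sketched with a small round trip. This is a minimal illustration, not the code inside read.dta: the data frame trial, its columns and the temporary file are hypothetical, and the date arithmetic assumes Stata's convention that %td dates count days from 1960-01-01.

```r
library(foreign)

# Stata %td dates count days from 1960-01-01; this is the same arithmetic
# done by hand (an illustration only, not read.dta's internal code).
stata_days <- c(0, 21929)
as_dates <- as.Date(stata_days, origin = "1960-01-01")
print(as_dates)

# Hypothetical toy data with one factor column.
trial <- data.frame(
  score = c(4.5, 3.2, 5.1),
  arm   = factor(c("control", "treated", "control"))
)
path <- tempfile(fileext = ".dta")
write.dta(trial, path)  # by default factors are written with value labels

labelled <- read.dta(path)                           # arm comes back as a factor
coded    <- read.dta(path, convert.factors = FALSE)  # arm stays as numeric codes

str(labelled$arm)
str(coded$arm)
```

Reading with convert.factors = FALSE is useful when the underlying codes matter more than the labels; the labels remain available through the attributes of the result.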
Stata 8.0 introduced a system of 27 different missing data values. If missing.type is TRUE a separate list is created with the same variable names as the loaded data. For string variables the list value is NULL. For other variables the value is NA where the observation is not missing and 0–26 when the observation is missing. This is attached as the "missing" attribute of the returned value.
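A short sketch of inspecting the "missing" attribute follows. The data frame obs and its columns are hypothetical, and the file is written with write.dta(version = 8L) on the assumption that a version 8 or later file is needed for missing.type to apply.

```r
library(foreign)

# Hypothetical toy data: a string column and a numeric column with one NA.
obs <- data.frame(
  id    = c("a", "b", "c"),
  score = c(1.5, NA, 3.0),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".dta")
write.dta(obs, path, version = 8L)  # missing-value types need a version 8+ file

x <- read.dta(path, missing.type = TRUE)
miss <- attr(x, "missing")

names(miss)     # same variable names as the data
miss$id         # NULL for a string variable
miss$score      # NA where observed, a code in 0-26 where missing
is.na(x$score)  # the data itself still uses NA for missing observations
```

Keeping the codes in a separate list means the data frame itself behaves as usual in R, while the Stata-specific distinction between missing types can still be recovered when needed.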
A data frame with attributes. These will include "datalabel", "time.stamp", "formats", "types", "val.labels", "var.labels" and "version" and may include "label.table". Possible versions are 5, 6, 7, -7 (Stata 7SE, ‘format-111’), 8 (Stata 8 and 9, ‘format-113’) and 10 (Stata 10 and 11, ‘format-114’). Stata 12 by default uses ‘format-115’, which is read as ‘format-114’ (the Stata documentation says its structure is identical).
The value labels in attribute "val.labels" name a table for each variable, or are an empty string. The tables are elements of the named list attribute "label.table": each is an integer vector with names.
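The relationship between "val.labels" and "label.table" can be sketched by applying a label table by hand. The data frame trial and the variable arm are hypothetical; the lookup assumes that the entries of "val.labels" are in the same order as the columns of the result.

```r
library(foreign)

# Hypothetical toy data with one labelled (factor) variable.
trial <- data.frame(
  score = c(4.5, 3.2, 5.1),
  arm   = factor(c("control", "treated", "control"))
)
path <- tempfile(fileext = ".dta")
write.dta(trial, path)

# Read the raw codes so the labels can be applied manually.
coded <- read.dta(path, convert.factors = FALSE)

attr(coded, "version")                    # file format version
label_names <- attr(coded, "val.labels")  # one entry per variable, "" if unlabelled
tables      <- attr(coded, "label.table") # named list of label tables

# Look up the table named for 'arm' and apply it by hand.
tab_name <- label_names[[match("arm", names(coded))]]
tab      <- tables[[tab_name]]
rebuilt  <- factor(coded$arm, levels = tab, labels = names(tab))
print(rebuilt)
```

This manual route is mainly of interest when the automatic conversion is switched off or when a label table is shared by several variables.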
Thomas Lumley and R-core members
The Stata Users Manual (versions 5 & 6), Programming manual (version 7), or online help (version 8 and later) describe the format of the files. The format is also described at http://www.stata.com/help.cgi?dta and http://www.stata.com/help.cgi?dta_113.
A different approach is available in package memisc: see its help for Stata.file.
write.dta, attributes, Date, factor
data(swiss)
write.dta(swiss, swissfile <- tempfile())
read.dta(swissfile)