survexp {survival} | R Documentation |
Returns either the expected survival of a cohort of subjects, or the individual expected survival for each subject.
survexp(formula, data, weights, subset, na.action, rmap, times, cohort=TRUE, conditional=FALSE, ratetable=survexp.us, scale=1, npoints, se.fit, model=FALSE, x=FALSE, y=FALSE)
formula |
formula object. The response variable is a vector of follow-up times
and is optional. The predictors consist of optional grouping variables
separated by the |
data |
data frame in which to interpret the variables named in
the |
weights |
case weights. |
subset |
expression indicating a subset of the rows of |
na.action |
function to filter missing data. This is applied to the model frame after
|
rmap |
an optional list that maps data set names to the ratetable names. See the details section below. |
times |
vector of follow-up times at which the resulting survival curve is
evaluated. If absent, the result will be reported for each unique
value of the vector of follow-up times supplied in |
cohort |
logical value: if |
conditional |
logical value: if |
ratetable |
a table of event rates, such as |
scale |
numeric value to scale the results. If |
npoints |
number of points at which to calculate intermediate results, evenly spaced
over the range of the follow-up times. The usual (exact) calculation is done
at each unique follow-up time. For very large data sets specifying |
se.fit |
compute the standard error of the predicted survival. The default is to compute standard errors whenever possible, which at this time is only for the Ederer method and a Cox model as the rate table. |
model,x,y |
flags to control what is returned. If any of these is true, then the model frame, the model matrix, and/or the vector of response times will be returned as components of the final result, with the same names as the flag arguments. |
Individual expected survival is usually used in models or testing, to
‘correct’ for the age and sex composition of a group of subjects.
For instance, assume that birth date, entry date into the study,
sex and actual survival time are all known for a group of subjects.
The survexp.us population tables contain expected death rates
based on calendar year, sex and age.
Then

    haz <- -log(survexp(fu.time ~ 1, data=mydata,
                        rmap = list(year=entry.dt, age=(entry.dt-birth.dt)),
                        cohort=FALSE))

gives for each subject the total hazard experienced up to their observed
death time or last follow-up time (variable fu.time). This cumulative
hazard can be used as a rescaled time value in models:

    glm(status ~ 1 + offset(log(haz)), family=poisson)
    glm(status ~ x + offset(log(haz)), family=poisson)

In the first model, a test for intercept=0 is the one sample log-rank test
of whether the observed group of subjects has equivalent survival to the
baseline population. The second model tests for an effect of variable x
after adjustment for age and sex.
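As a rough illustration of the two steps above, the following sketch applies them to the jasa data shipped with the survival package. The choice of data set, the assumption that every subject is male (jasa has no sex variable), and the removal of subjects with zero follow-up (so that log(haz) stays finite) are assumptions made for this example only.

```r
library(survival)

# Assumption: drop subjects with zero follow-up so that log(haz) is finite.
d <- subset(jasa, futime > 0)

# Per-subject expected survival up to each subject's own follow-up time.
# Assumption: sex is fixed at "male" because jasa has no sex variable.
esurv <- survexp(futime ~ 1, data = d,
                 rmap = list(sex = "male", year = accept.dt,
                             age = (accept.dt - birth.dt)),
                 cohort = FALSE)

# Cumulative expected hazard for each subject.
d$haz <- -log(as.numeric(esurv))

# One-sample comparison with the population: test whether the intercept is 0.
fit <- glm(fustat ~ 1 + offset(log(haz)), family = poisson, data = d)
print(summary(fit)$coefficients)
```

An intercept well above 0 would suggest higher mortality in the study group than in the reference population, on the log scale.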
The ratetable being used may have different variable names than the user's
data set; this is dealt with by the rmap argument.
The rate table for the above calculation was survexp.us; a call to
summary(survexp.us) reveals that it expects to have variables
age = age in days, sex, and year = the date of study entry. We create
age and year in the rmap line. The sex variable is not mapped, therefore
the code assumes that it exists in mydata in the correct format.
(Note: for factors such as sex, the program will match on
any unique abbreviation, ignoring case.)
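To check which variable names a rate table expects before writing the rmap list, one can inspect its dimension names directly. This is a small sketch. It assumes a version of the survival package in which the dimensions of survexp.us are named through names(dimnames(...)), with a fallback to the older "dimid" attribute. The name rt_vars is introduced here for illustration.

```r
library(survival)

# Names of the rate table dimensions, i.e. the names that rmap must supply.
# Assumption: newer versions name the dimnames list; older ones used "dimid".
rt_vars <- names(dimnames(survexp.us))
if (is.null(rt_vars)) rt_vars <- attr(survexp.us, "dimid")
print(rt_vars)

# Levels of the sex dimension; data values are matched against these.
print(dimnames(survexp.us)[[which(rt_vars == "sex")]])
```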
Cohort survival is used to produce an overall survival curve. This is then
added to the Kaplan-Meier plot of the study group for visual comparison
between these subjects and the population at large. There are three common
methods of computing cohort survival.
In the "exact method" of Ederer the cohort is not censored; this corresponds
to having no response variable in the formula. Hakulinen recommends censoring
the cohort at the anticipated censoring time of each patient, and Verheul
recommends censoring the cohort at the actual observation time of each
patient.
The last of these is the conditional method.
These are obtained by using the respective time values as the
follow-up time or response in the formula.
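The difference between the methods comes down to what is placed on the left-hand side of the formula. The sketch below contrasts the Ederer and conditional estimates on the jasa data. The evaluation times tt and the assumption that every subject is male are illustrative choices, and the Hakulinen variant is left out because it needs each subject's anticipated censoring time, which is not a variable in jasa.

```r
library(survival)

tt <- c(365, 730, 1095)  # evaluation times in days, chosen for illustration

# Ederer: no response in the formula, so the cohort is never censored.
# Assumption: sex is fixed at "male" because jasa has no sex variable.
ederer <- survexp(~ 1, data = jasa, times = tt,
                  rmap = list(sex = "male", year = accept.dt,
                              age = (accept.dt - birth.dt)))

# Conditional: the observed follow-up time is the response.
cond <- survexp(futime ~ 1, data = jasa, times = tt, conditional = TRUE,
                rmap = list(sex = "male", year = accept.dt,
                            age = (accept.dt - birth.dt)))

# Expected cohort survival under each method.
print(ederer$surv)
print(cond$surv)
```

Note that the rmap list is written out in each call rather than stored in a variable, matching the way it is supplied in the other calls on this page.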
If cohort=TRUE, an object of class survexp;
otherwise a vector of per-subject expected survival values.
The former contains the number of subjects at risk
and the expected survival for the cohort at each requested time.
Berry, G. (1983). The analysis of mortality by the subject-years method. Biometrics, 39:173-184.
Ederer, F., Axtell, L. and Cutler, S. (1961). The relative survival rate: a statistical methodology. Natl Cancer Inst Monogr, 6:101-121.
Hakulinen, T. (1982). Cancer survival corrected for heterogeneity in patient withdrawal. Biometrics, 38:933-942.
Verheul, H., Dekker, E., Bossuyt, P., Moulijn, A. and Dunning, A. (1993). Background mortality in clinical survival studies. Lancet, 341:872-875.
survfit, pyears, survexp.us, survexp.fit.
#
# Stanford heart transplant data
#  We don't have sex in the data set, but know it to be nearly all males.
# Estimate of conditional survival
survexp(futime ~ 1, rmap=list(sex="male", year=accept.dt,
          age=(accept.dt-birth.dt)), conditional=TRUE, data=jasa)
# Estimate of expected survival stratified by prior surgery
survexp(futime ~ surgery, rmap= list(sex="male", year=accept.dt,
          age=(accept.dt-birth.dt)), conditional=TRUE, data=jasa)

## Compare the survival curves for the Mayo PBC data to Cox model fit
##
pfit <- coxph(Surv(time,status>0) ~ trt + log(bili) + log(protime) + age +
                platelet, data=pbc)
plot(survfit(Surv(time, status>0) ~ trt, data=pbc))
lines(survexp( ~ trt, ratetable=pfit, data=pbc), col='purple')