R: Estimating Hazard Function Using Smoothing Splines

sshzd {gss}

R Documentation

Estimating Hazard Function Using Smoothing Splines

Description

Estimate hazard function using smoothing spline ANOVA models. The symbolic model specification via formula follows the same rules as in lm, but with the response of a special form.

Usage

sshzd(formula, type=NULL, data=list(), alpha=1.4, weights=NULL,
      subset, offset, na.action=na.omit, partial=NULL, id.basis=NULL,
      nbasis=NULL, seed=NULL, random=NULL, prec=1e-7, maxiter=30,
      skip.iter=FALSE)
sshzd1(formula, type=NULL, data=list(), alpha=1.4, weights=NULL,
       subset, na.action=na.omit, rho=list("marginal"), partial=NULL,
       id.basis=NULL, nbasis=NULL, seed=NULL, random=NULL, prec=1e-7,
       maxiter=30, skip.iter=FALSE)

Arguments

`formula`	Symbolic description of the model to be fit, where the response is of the form `Surv(futime,status,start=0)`.
`type`	List specifying the type of spline for each variable. See `mkterm` for details.
`data`	Optional data frame containing the variables in the model.
`alpha`	Parameter defining cross-validation score for smoothing parameter selection.
`weights`	Optional vector of counts for duplicated data.
`subset`	Optional vector specifying a subset of observations to be used in the fitting process.
`offset`	Optional offset term with known parameter 1.
`na.action`	Function which indicates what should happen when the data contain NAs.
`partial`	Optional symbolic description of parametric terms in partial spline models.
`id.basis`	Index of observations to be used as "knots."
`nbasis`	Number of "knots" to be used. Ignored when `id.basis` is specified.
`seed`	Seed to be used for the random generation of "knots." Ignored when `id.basis` is specified.
`random`	Input for parametric random effects (frailty) in nonparametric mixed-effect models. See `mkran` for details.
`prec`	Precision requirement for internal iterations.
`maxiter`	Maximum number of iterations allowed for internal iterations.
`skip.iter`	Flag indicating whether to use initial values of theta and skip theta iteration. See `ssanova` for notes on skipping theta iteration.
`rho`	rho function needed for sshzd1.

Details

The model specification via formula is for the log hazard. For example, Suve(t,d)~t*u prescribes a model of the form

log f(t,u) = C + g_{t}(t) + g_{u}(u) + g_{t,u}(t,u)

with the terms denoted by "1", "t", "u", and "t:u". Replacing t*u by t+u in the formula, one gets a proportional hazard model with g_{t,u}=0.

sshzd takes standard right-censored lifetime data, with possible left-truncation and covariates; in Surv(futime,status,start=0)~..., futime is the follow-up time, status is the censoring indicator, and start is the optional left-truncation time. The main effect of futime must appear in the model terms specified via ....

Parallel to those in a ssanova object, the model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.

The selection of smoothing parameters is through a cross-validation mechanism described in Gu (2002, Sec. 7.2), with a parameter alpha; alpha=1 is "unbiased" for the minimization of Kullback-Leibler loss but may yield severe undersmoothing, whereas larger alpha yields smoother estimates.

A subset of the observations are selected as "knots." Unless specified via id.basis or nbasis, the number of "knots" q is determined by max(30,10n^{2/9}), which is appropriate for the default cubic splines for numerical vectors.

Value

sshzd returns a list object of class "sshzd". sshzd1 returns a list object of class c("sshzd1","sshzd").

hzdrate.sshzd can be used to evaluate the estimated hazard function. hzdcurve.sshzd can be used to evaluate hazard curves with fixed covariates. survexp.sshzd can be used to calculated estimated expected survival.

The method project.sshzd can be used to calculate the Kullback-Leibler projection of "sshzd" objects for model selection; project.sshzd1 can be used to calculate the square error projection of "sshzd1" objects.

Note

The function Surv(futime,status,start=0) is defined and parsed inside sshzd, not quite the same as the one in the survival package.

Integration on the time axis is done by the 200-point Gauss-Legendre formula on c(min(start),max(futime)), returned from gauss.quad.

sshzd1 can be up to 50 times faster than sshzd, at the cost of performance degradation.

The results may vary from run to run. For consistency, specify id.basis or set seed.

Author(s)

Chong Gu, chong@stat.purdue.edu

References

Gu, C. (2002), Smoothing Spline ANOVA Models. New York: Springer-Verlag.

Du, P. and Gu, C. (2006), Penalized likelihood hazard estimation: efficient approximation and Bayesian confidence intervals. Statistics and Probability Letters, 76, 244–254.

Du, P. and Gu, C. (2009), Penalized Pseudo-Likelihood Hazard Estimation: A Fast Alternative to Penalized Likelihood. Journal of Statistical Planning and Inference, 139, 891–899.

Du, P. and Ma, S. (2010), Frailty Model with Spline Estimated Nonparametric Hazard Function, Statistica Sinica, 20, 561–580.

Examples

## Model with interaction
data(gastric)
gastric.fit <- sshzd(Surv(futime,status)~futime*trt,data=gastric)
## exp(-Lambda(600)), exp(-(Lambda(1200)-Lambda(600))), and exp(-Lambda(1200))
survexp.sshzd(gastric.fit,c(600,1200,1200),data.frame(trt=as.factor(1)),c(0,600,0))
## Clean up
## Not run: rm(gastric,gastric.fit)
dev.off()
## End(Not run)

## THE FOLLOWING EXAMPLE IS TIME-CONSUMING
## Proportional hazard model
## Not run: 
data(stan)
stan.fit <- sshzd(Surv(futime,status)~futime+age,data=stan)
## Evaluate fitted hazard
hzdrate.sshzd(stan.fit,data.frame(futime=c(10,20),age=c(20,30)))
## Plot lambda(t,age=20)
tt <- seq(0,60,leng=101)
hh <- hzdcurve.sshzd(stan.fit,tt,data.frame(age=20))
plot(tt,hh,type="l")
## Clean up
rm(stan,stan.fit,tt,hh)
dev.off()

## End(Not run)

[Package gss version 2.0-10 Index]