gssanova {gss} | R Documentation |
Fit smoothing spline ANOVA models in non-Gaussian regression. The
symbolic model specification via formula
follows the same
rules as in lm
and glm
.
gssanova(formula, family, type=NULL, data=list(), weights, subset, offset, na.action=na.omit, partial=NULL, alpha=NULL, nu=NULL, id.basis=NULL, nbasis=NULL, seed=NULL, random=NULL, skip.iter=FALSE)
formula |
Symbolic description of the model to be fit. |
family |
Description of the error distribution. Supported
are exponential families |
type |
List specifying the type of spline for each variable.
See |
data |
Optional data frame containing the variables in the model. |
weights |
Optional vector of weights to be used in the fitting process. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
offset |
Optional offset term with known parameter 1. |
na.action |
Function which indicates what should happen when the data contain NAs. |
partial |
Optional symbolic description of parametric terms in partial spline models. |
alpha |
Tuning parameter defining cross-validation; larger
values yield smoother fits. Defaults are |
nu |
Inverse scale parameter in accelerated life model families. Ignored for exponential families. |
id.basis |
Index designating selected "knots". |
nbasis |
Number of "knots" to be selected. Ignored when
|
seed |
Seed for reproducible random selection of "knots".
Ignored when |
random |
Input for parametric random effects in nonparametric
mixed-effect models. See |
skip.iter |
Flag indicating whether to use initial values of
theta and skip theta iteration. See |
The model specification via formula
is intuitive. For
example, y~x1*x2
yields a model of the form
y = C + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e
with the terms denoted by "1"
, "x1"
, "x2"
, and
"x1:x2"
.
The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.
Only one link is implemented for each family
. It is the
logit link for "binomial"
, and the log link for
"poisson"
, and "Gamma"
. For "nbinomial"
, the
working parameter is the logit of the probability p; see
NegBinomial
. For "weibull"
, "lognorm"
,
and "loglogis"
, it is the location parameter for the log
lifetime.
The selection of smoothing parameters is through direct
cross-validation. The cross-validation score used for
family="poisson"
is taken from density estimation as in Gu
and Wang (2003), and those used for other families are derived
following the lines of Gu and Xiang (2001).
A subset of the observations are selected as "knots." Unless
specified via id.basis
or nbasis
, the number of
"knots" q is determined by max(30,10n^{2/9}), which is
appropriate for the default cubic splines for numerical vectors.
gssanova
returns a list object of class
c("gssanova","ssanova")
.
The method summary.gssanova
can be used to obtain
summaries of the fits. The method predict.ssanova
can
be used to evaluate the fits at arbitrary points along with standard
errors, on the link scale. The method
project.gssanova
can be used to calculate the
Kullback-Leibler projection for model selection. The methods
residuals.gssanova
and fitted.gssanova
extract the respective traits from the fits.
For family="binomial"
, the response can be specified either
as two columns of counts or as a column of sample proportions plus a
column of total counts entered through the argument weights
,
as in glm
.
For family="nbinomial"
, the response may be specified as two
columns with the second being the known sizes, or simply as a single
column with the common unknown size to be estimated through the
maximum likelihood.
For family="weibull"
, "lognorm"
, or "loglogis"
,
the response consists of three columns, with the first giving the
follow-up time, the second the censoring status, and the third the
left-truncation time. For data with no truncation, the third column
can be omitted.
For simpler models and moderate sample sizes, the exact solution of
gssanova0
can be faster.
The results may vary from run to run. For consistency, specify
id.basis
or set seed
.
In gss versions earlier than 1.0, gssanova
was under
the name gssanova1
.
Chong Gu, chong@stat.purdue.edu
Gu, C. and Xiang, D. (2001), Cross validating non Gaussian data: generalized approximate cross validation revisited. Journal of Computational and Graphical Statistics, 10, 581–591.
Gu, C. and Wang, J. (2003), Penalized likelihood density estimation: Direct cross-validation and scalable approximation. Statistica Sinica, 13, 811–826.
## Fit a cubic smoothing spline logistic regression model test <- function(x) {.3*(1e6*(x^11*(1-x)^6)+1e4*(x^3*(1-x)^10))-2} x <- (0:100)/100 p <- 1-1/(1+exp(test(x))) y <- rbinom(x,3,p) logit.fit <- gssanova(cbind(y,3-y)~x,family="binomial") ## The same fit logit.fit1 <- gssanova(y/3~x,"binomial",weights=rep(3,101), id.basis=logit.fit$id.basis) ## Obtain estimates and standard errors on a grid est <- predict(logit.fit,data.frame(x=x),se=TRUE) ## Plot the fit and the Bayesian confidence intervals plot(x,y/3,ylab="p") lines(x,p,col=1) lines(x,1-1/(1+exp(est$fit)),col=2) lines(x,1-1/(1+exp(est$fit+1.96*est$se)),col=3) lines(x,1-1/(1+exp(est$fit-1.96*est$se)),col=3) ## Fit a mixed-effect logistic model data(bacteriuria) bact.fit <- gssanova(infect~trt+time,family="binomial",data=bacteriuria, id.basis=(1:820)[bacteriuria$id%in%c(3,38)],random=~1|id) ## Predict fixed effects predict(bact.fit,data.frame(time=2:16,trt=as.factor(rep(1,15))),se=TRUE) ## Estimated random effects bact.fit$b ## Clean up ## Not run: rm(test,x,p,y,logit.fit,logit.fit1,est,bacteriuria,bact.fit) dev.off() ## End(Not run)