ssllrm {gss} | R Documentation |
Fit smoothing spline log-linear regression models. The symbolic
model specification via formula
follows the same rules as in
lm
.
ssllrm(formula, response, type=NULL, data=list(), weights, subset, na.action=na.omit, alpha=1, id.basis=NULL, nbasis=NULL, seed=NULL, random=NULL, prec=1e-7, maxiter=30, skip.iter=FALSE)
formula |
Symbolic description of the model to be fit. |
response |
Formula listing response variables. |
type |
List specifying the type of spline for each variable.
See |
data |
Optional data frame containing the variables in the model. |
weights |
Optional vector of weights to be used in the fitting process. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
Function which indicates what should happen when the data contain NAs. |
alpha |
Parameter modifying GCV or Mallows' CL; larger absolute
values yield smoother fits; negative value invokes a stable and
more accurate GCV/CL evaluation algorithm but may take two to
five times as long. Ignored when |
id.basis |
Index designating selected "knots". |
nbasis |
Number of "knots" to be selected. Ignored when
|
seed |
Seed to be used for the random generation of "knots".
Ignored when |
random |
Input for parametric random effects in nonparametric
mixed-effect models. See |
prec |
Precision requirement for internal iterations. |
maxiter |
Maximum number of iterations allowed for internal iterations. |
skip.iter |
Flag indicating whether to use initial values of
theta and skip theta iteration. See |
The model is specified via formula
and response
, where
response
lists the response variables. For example,
ssllrm(~y1*y2*x,~y1+y2)
prescribe a model of the form
log f(y1,y2|x) = g_{1}(y1) + g_{2}(y2) + g_{12}(y1,y2) + g_{x1}(x,y1) + g_{x2}(x,y2) + g_{x12}(x,y1,y2) + C(x)
with the terms denoted by "y1"
, "y2"
, "y1:y2"
,
"y1:x"
, "y2:x"
, and "y1:y2:x"
; the term(s) not
involving response(s) are removed and the constant C(x)
is
determined by the fact that a conditional density integrates (adds)
to one on the y
axis.
The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.
A subset of the observations are selected as "knots." Unless
specified via id.basis
or nbasis
, the number of
"knots" q is determined by max(30,10n^{2/9}), which is
appropriate for the default cubic splines for numerical vectors.
ssllrm
returns a list object of class "ssllrm"
.
The method predict.ssllrm
can be used to evaluate
f(y|x)
at arbitrary x, or contrasts of log{f(y|x)}
such as the odds ratio along with standard errors. The method
project.ssllrm
can be used to calculate the
Kullback-Leibler projection for model selection.
The responses, or y-variables, must be factors, and there must be at
least one numerical x's. For response
, there is no difference
between ~y1+y2
and ~y1*y2
.
The results may vary from run to run. For consistency, specify
id.basis
or set seed
.
Chong Gu, chong@stat.purdue.edu
Gu, C. and Ma, P. (2009), Log-linear nonparametric regression models. Manuscript.
## Simulate data test <- function(x) {.3*(1e6*(x^11*(1-x)^6)+1e4*(x^3*(1-x)^10))-2} x <- (0:100)/100 p <- 1-1/(1+exp(test(x))) y <- rbinom(x,3,p) y1 <- as.ordered(y) y2 <- as.factor(rbinom(x,1,p)) ## Fit model fit <- ssllrm(~y1*y2*x,~y1+y2) ## Evaluate f(y|x) est <- predict(fit,data.frame(x=x), data.frame(y1=as.factor(0:3),y2=as.factor(rep(0,4)))) ## f(y|x) at all y values (fit$qd.pt) est <- predict(fit,data.frame(x=x)) ## Evaluate contrast of log f(y|x) est <- predict(fit,data.frame(x=x),odds=c(-1,.5,.5,0), data.frame(y1=as.factor(0:3),y2=as.factor(rep(0,4))),se=TRUE) ## Odds ratio log{f(0,0|x)/f(3,0|x)} est <- predict(fit,data.frame(x=x),odds=c(1,-1), data.frame(y1=as.factor(c(0,3)),y2=as.factor(c(0,1))),se=TRUE) ## KL projection kl <- project(fit,include=c("y2:x","y1:y2","y1:x","y2:x")) ## Clean up ## Not run: rm(test,x,p,y,y1,y2,fit,est,kl) dev.off() ## End(Not run)