summary.gam {mgcv}    R Documentation
Takes a fitted gam object produced by gam() and produces various useful summaries from it. (See sink to divert output to a file.)
## S3 method for class 'gam'
summary(object, dispersion=NULL, freq=FALSE, p.type = 0, ...)

## S3 method for class 'summary.gam'
print(x,digits = max(3, getOption("digits") - 3),
      signif.stars = getOption("show.signif.stars"),...)
object: a fitted gam object as produced by gam().
x: a summary.gam object produced by summary.gam().
dispersion: A known dispersion parameter.
freq: By default p-values for parametric terms are calculated using the Bayesian estimated covariance matrix of the parameter estimators. If this is set to TRUE, the frequentist covariance matrix of the parameters is used instead.
p.type: determines how p-values are computed for smooth terms. 0 uses a test statistic with distribution determined by the un-rounded edf of the term; 1 uses upwardly biased rounding of the edf; -1 uses a version of the test statistic with a null distribution that has to be simulated; 5 is the approximation in Wood (2006). Other options are poor, generate a warning, and are only of research interest. See details.
digits: controls the number of digits printed in the output.
signif.stars: should significance stars be printed alongside the output?
...: other arguments.
Model degrees of freedom are taken as the trace of the influence (or hat) matrix A for the model fit. Residual degrees of freedom are taken as the number of data minus the model degrees of freedom. Let P_i be the matrix giving the parameters of the ith smooth when applied to the data (or pseudodata in the generalized case), with zero rows for all other coefficients, and let X be the design matrix of the model. Then tr(XP_i) is the edf for the ith term. Since the P_i sum to the full parameter matrix P, and XP = A, this definition ensures that the term-wise edfs add up to the model degrees of freedom, tr(A). An alternative version of EDF, based on the trace of 2A - AA, is more appropriate for p-value computation.
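To make the degrees-of-freedom definitions above concrete, here is a minimal base-R sketch for a penalized least-squares fit with a fixed smoothing parameter. The design (an intercept plus two polynomial "terms"), the simple ridge-type penalty and lambda = 5 are illustrative assumptions, not what gam() constructs. Summing the diagonal of F = (X'X + lambda S)^{-1} X'X over a term's coefficients gives tr(XP_i) for that term.

```r
## Minimal sketch (not mgcv code): edf for a penalized least-squares fit with
## a fixed, made-up smoothing parameter and a simple ridge-type penalty.
set.seed(1)
n  <- 100
x1 <- runif(n)
x2 <- runif(n)
X  <- cbind(1, poly(x1, 4), poly(x2, 4))   # intercept + two 4-column "terms"
term <- c(0, rep(1, 4), rep(2, 4))         # term membership of each column
S  <- diag(as.numeric(term > 0))           # penalize all but the intercept
lambda <- 5                                # assumed, not estimated

M  <- crossprod(X) + lambda * S            # X'X + lambda S
A  <- X %*% solve(M, t(X))                 # influence (hat) matrix
Fm <- solve(M, crossprod(X))               # P X, where P = M^{-1} X' maps data to coefficients

edf.model <- sum(diag(A))                  # model d.f. = tr(A)
edf.term  <- tapply(diag(Fm), term, sum)   # tr(X P_i) for each term
edf.alt   <- sum(diag(2 * A - A %*% A))    # alternative edf, tr(2A - AA)
resid.df  <- n - edf.model                 # residual d.f.

edf.term
c(model = edf.model, alt = edf.alt, resid.df = resid.df)
```

Because A is symmetric with eigenvalues in [0, 1] in this setting, tr(2A - AA) is never smaller than tr(A), and the unpenalized intercept contributes exactly one degree of freedom.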
print.summary.gam tries to print various bits of summary information useful for term selection in a pretty way.
Unless p.type=5, p-values for smooth terms are usually based on a test statistic motivated by an extension of Nychka's (1988) analysis of the frequentist properties of Bayesian confidence intervals for smooths. These p-values have better frequentist performance (in terms of power and distribution under the null) than the alternative strictly frequentist approximation. When the Bayesian intervals have good across-the-function properties, the p-values have close to the correct null distribution and reasonable power (but there are no optimality results for the power).
Let f denote the vector of values of a smooth term evaluated at the original covariate values and let V_f denote the corresponding Bayesian covariance matrix. Let V*_f denote the rank r pseudoinverse of V_f, where r is the EDF for the term. The statistic used is then
T = f'V*_f f
(this can be calculated efficiently without forming the pseudoinverse explicitly). T is compared to an approximation to an appropriate weighted sum of chi-squared random variables.
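The statistic T can be evaluated from an eigen-decomposition of V_f without forming the pseudoinverse explicitly. The sketch below does this for an integer rank r only; the smooth interpolation for non-integer r is not reproduced, V and f are random stand-ins rather than quantities taken from a gam fit, and rank_r_wald() is a hypothetical helper.

```r
## Sketch: T = f' V^{r-} f with a rank-r eigen-truncated pseudoinverse,
## for INTEGER r only. V and f are random stand-ins, not taken from a gam fit.
rank_r_wald <- function(f, V, r) {
  e <- eigen(V, symmetric = TRUE)             # eigenvalues in decreasing order
  U <- e$vectors[, seq_len(r), drop = FALSE]  # leading r eigenvectors
  d <- e$values[seq_len(r)]                   # leading r eigenvalues
  z <- crossprod(U, f)                        # U'f, without forming the pseudoinverse
  sum(z^2 / d)                                # = f' U diag(1/d) U' f
}

set.seed(2)
B <- matrix(rnorm(25), 5, 5)
V <- crossprod(B)    # a positive definite 5 x 5 "covariance matrix"
f <- rnorm(5)

T3 <- rank_r_wald(f, V, r = 3)

## The same value from an explicitly formed rank-3 pseudoinverse
e   <- eigen(V, symmetric = TRUE)
Vr3 <- e$vectors[, 1:3] %*% diag(1 / e$values[1:3]) %*% t(e$vectors[, 1:3])
T3.explicit <- drop(t(f) %*% Vr3 %*% f)
c(T3 = T3, T3.explicit = T3.explicit)
```

With r equal to the full rank of V, the statistic reduces to the ordinary Wald form f'V^{-1}f.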
The non-integer rank truncated inverse is constructed to give an approximation varying smoothly between the bounding integer rank approximations, while yielding test statistics with the correct mean and variance under the null. Alternatively (p.type==1) r is obtained by biased rounding of the EDF: values less than .05 above the preceding integer are rounded down, while other values are rounded up. Another option (p.type==-1) uses a statistic of formal rank given by the number of coefficients for the smooth, but with its terms weighted by the eigenvalues of the covariance matrix, so that penalized terms are down-weighted; the null distribution of this statistic then requires simulation. Other options for p.type are 2 (naive rounding), 3 (round up) and 4 (numerical rank determination): these are poor options for theoretically known reasons, and will generate a warning.
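The integer rank choices listed above can be compared on a few edf values. biased_round() is a hypothetical helper written directly from the description of p.type==1; it is not a function exported by mgcv.

```r
## Sketch of the rank choices described above, applied to some edf values.
## biased_round() is hypothetical, written from the p.type = 1 description.
biased_round <- function(edf) {
  lower <- floor(edf)
  ifelse(edf - lower < 0.05, lower, ceiling(edf))  # < .05 above an integer: round down
}

edf <- c(1.02, 2.3, 3.97, 4)
cbind(edf,
      biased = biased_round(edf),  # p.type = 1
      naive  = round(edf),         # p.type = 2 (naive rounding)
      up     = ceiling(edf))       # p.type = 3 (round up)
```

Biased rounding only differs from rounding up when the edf sits just above an integer, as with 1.02 here.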
The default p-values also have a Bayesian interpretation: the probability of observing an f less probable than 0, under the approximation for the posterior for f implied by the truncation used in the test statistic.
Note that for smooths with no unpenalized components (that is, a zero-dimensional penalty null space, as for simple random effects), the Nychka (1988) requirement for smoothing bias to be substantially less than variance breaks down (see e.g. the appendix of Marra and Wood, 2012), and this results in an incorrect null distribution for p-values computed using the above approach. In this case it is necessary to use an alternative approach designed for random effects variance components.
In this zero-dimensional null space/random effects case, the p-values are again conditional on the smoothing parameters/variance component estimates, and may therefore be somewhat too low when these are subject to large uncertainty. The idea is to condition on the smoothing parameter estimates, and then to use the likelihood ratio test statistic conditional on those estimates. The distribution of this test statistic under the null is computable as a weighted sum of chi-squared random variables.
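Both the default smooth-term statistic and the random-effects likelihood ratio statistic are referred to a weighted sum of chi-squared random variables. Purely as an illustration, the tail probability of such a sum can be estimated by brute-force simulation. This does not claim to reproduce mgcv's own computation; the weights and observed statistic are made up, and pwchisq_mc() is a hypothetical helper.

```r
## Sketch: Monte Carlo estimate of P(sum_j w_j * X_j > t), X_j ~ chi-squared(1)
## independently. Weights and t are made up; brute force for illustration only.
pwchisq_mc <- function(t, w, nsim = 1e5) {
  sims <- matrix(rchisq(nsim * length(w), df = 1), nrow = nsim)  # nsim x length(w)
  stat <- drop(sims %*% w)                                       # weighted sums
  mean(stat > t)
}

set.seed(3)
w <- c(1, 1, 0.6, 0.2)              # e.g. eigenvalue-type weights
pwchisq_mc(t = 6, w = w)

## Sanity check: with unit weights the sum is chi-squared with length(w) d.f.
pwchisq_mc(t = 6, w = rep(1, 3))    # compare with pchisq(6, 3, lower.tail = FALSE)
```

Simulation error shrinks like 1/sqrt(nsim), so this is only practical as a check, not as a routine p-value method.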
In simulations the p-values have best behaviour under ML smoothness selection, with REML coming second. In general the p-values behave well, but conditioning on the smoothing parameters means that they may be somewhat too low when smoothing parameters are highly uncertain. High uncertainty happens in particular when smoothing parameters are poorly identified, which can occur with nested smooths or highly correlated covariates (high concurvity).
If p.type=5, then the frequentist approximation for p-values of smooth terms described in section 4.8.5 of Wood (2006) is used. The approximation is not great. If p_i is the parameter vector for the ith smooth term, and this term has estimated covariance matrix V_i, then the statistic is p_i'V_i^{k-}p_i, where V_i^{k-} is the rank k pseudo-inverse of V_i and k is the estimated rank of V_i. p-values are obtained as follows. In the case of a known dispersion parameter, they are obtained by comparing the chi.sq statistic to the chi-squared distribution with k degrees of freedom. If the dispersion parameter is unknown (in which case it will have been estimated), the statistic is compared to an F distribution with k upper d.f. and lower d.f. given by the residual degrees of freedom for the model. Typically the p-values will be somewhat too low.
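The two reference distributions for p.type=5 can be sketched as below. wald_pvalue() is a hypothetical helper, the numbers are illustrative, and it assumes that in the estimated-dispersion case the statistic is divided by k before comparison with F(k, residual d.f.), the usual Wald-to-F conversion.

```r
## Sketch of the p.type = 5 reference distributions. 'stat' stands for the
## rank-k statistic p_i' V_i^{k-} p_i. Assumption: for estimated dispersion the
## statistic is divided by k (Wald-to-F conversion). Numbers are illustrative.
wald_pvalue <- function(stat, k, resid.df = NULL) {
  if (is.null(resid.df)) {
    pchisq(stat, df = k, lower.tail = FALSE)                   # known dispersion
  } else {
    pf(stat / k, df1 = k, df2 = resid.df, lower.tail = FALSE)  # estimated dispersion
  }
}

wald_pvalue(12.3, k = 4)                  # chi-squared reference, 4 d.f.
wald_pvalue(12.3, k = 4, resid.df = 180)  # F(4, 180) reference
```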
By default the p-values for parametric model terms are also based on Wald tests using the Bayesian covariance matrix for the coefficients. This is appropriate when there are "re" terms present, and otherwise gives results rather similar to those from the frequentist covariance matrix (freq=TRUE), since the parametric terms themselves are usually unpenalized. Default p-values for parametric terms that are penalized using the paraPen argument will not be good. However, if such terms represent conventional random effects with full rank penalties, then setting freq=TRUE is appropriate.
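For a parametric term with several coefficients (a factor, say), a Wald test of the kind reported in pTerms.chi.sq and pTerms.pv can be sketched as follows. For simplicity it uses lm() and its frequentist covariance matrix as a stand-in for a gam fit, and it assumes the F form divides the statistic by the term's degrees of freedom; for an unpenalized Gaussian model this reproduces the usual nested-model F test.

```r
## Sketch: Wald test that a multi-coefficient parametric term (factor g) is zero.
## lm() and vcov() stand in for a gam fit and its covariance matrix.
set.seed(4)
d <- data.frame(g = factor(rep(c("a", "b", "c"), each = 20)), x = runif(60))
d$y <- 1 + 0.5 * (d$g == "b") + 2 * d$x + rnorm(60)

fit  <- lm(y ~ g + x, data = d)
idx  <- grep("^g", names(coef(fit)))          # coefficients of term g ("gb", "gc")
beta <- coef(fit)[idx]
Vb   <- vcov(fit)[idx, idx]

chi.sq <- drop(t(beta) %*% solve(Vb, beta))   # Wald statistic beta' V^{-1} beta
term.df <- length(idx)                        # term degrees of freedom
pv <- pf(chi.sq / term.df, term.df, df.residual(fit), lower.tail = FALSE)
c(chi.sq = chi.sq, df = term.df, p = pv)
```

With a penalized or Bayesian covariance matrix the same construction only gives approximate p-values, as noted above.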
summary.gam produces a list of summary information for a fitted gam object.
p.coeff: is an array of estimates of the strictly parametric model coefficients.
p.t: is an array of the p.coeff values divided by their standard errors.
p.pv: is an array of p-values for the null hypothesis that the corresponding parameter is zero. Calculated with reference to the t distribution with the estimated residual degrees of freedom for the model fit if the dispersion parameter has been estimated, and the standard normal if not.
m: The number of smooth terms in the model.
chi.sq: An array of test statistics for assessing the significance of model smooth terms. See details.
s.pv: An array of approximate p-values for the null hypotheses that each smooth term is zero. Be warned, these are only approximate.
se: array of standard error estimates for all parameter estimates.
r.sq: The adjusted r-squared for the model. Defined as the proportion of variance explained, where original variance and residual variance are both estimated using unbiased estimators. This quantity can be negative if your model is worse than a one parameter constant model, and can be higher for the smaller of two nested models! The proportion of null deviance explained is probably more appropriate for non-normal errors. Note that
dev.expl: The proportion of the null deviance explained by the model. The null deviance is computed taking account of any offset, so
edf: array of estimated degrees of freedom for the model terms.
residual.df: estimated residual degrees of freedom.
n: number of data.
method: The smoothing selection criterion used.
sp.criterion: The minimized value of the smoothness selection criterion. Note that for ML and REML methods, what is reported is the negative log marginal likelihood or negative log restricted likelihood.
scale: estimated (or given) scale parameter.
family: the family used.
formula: the original GAM formula.
dispersion: the scale parameter.
pTerms.df: the degrees of freedom associated with each parametric term (excluding the constant).
pTerms.chi.sq: a Wald statistic for testing the null hypothesis that each parametric term is zero.
pTerms.pv: p-values associated with the tests that each term is zero. For penalized fits these are approximate. The reference distribution is an appropriate chi-squared when the scale parameter is known, and is based on an F when it is not.
cov.unscaled: The estimated covariance matrix of the parameters (or estimators if freq=TRUE), divided by the scale parameter.
cov.scaled: The estimated covariance matrix of the parameters (estimators if freq=TRUE).
p.table: significance table for parameters.
s.table: significance table for smooths.
p.Terms: significance table for parametric model terms.
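The r.sq and dev.expl definitions can be checked on a simple Gaussian GLM, where the model degrees of freedom are just the number of coefficients (a gam would use its edf instead). This is an illustrative sketch on simulated data, not the code summary.gam runs. In this special case r.sq coincides with the adjusted R-squared and dev.expl with the ordinary R-squared reported by summary.lm.

```r
## Sketch: r.sq (unbiased residual variance over unbiased total variance) and
## dev.expl (proportion of null deviance explained) for a Gaussian GLM.
set.seed(5)
n <- 50
x <- runif(n)
y <- 2 + 3 * x + rnorm(n)

fit <- glm(y ~ x, family = gaussian)
resid.df <- n - length(coef(fit))     # residual d.f. (a gam would use n - edf)

r.sq <- 1 - (sum(residuals(fit, type = "response")^2) / resid.df) / var(y)
dev.expl <- (fit$null.deviance - fit$deviance) / fit$null.deviance
c(r.sq = r.sq, dev.expl = dev.expl)
```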
The p-values are approximate and conditional on the smoothing parameters; they are likely to be somewhat too low when smoothing parameter estimates are highly uncertain. Do read the details section.
P-values for terms penalized via ‘paraPen’ are unlikely to be correct.
Simon N. Wood simon.wood@r-project.org with substantial improvements by Henric Nilsson.
Wood, S.N. (2012) On p-values for smooth components of an extended generalized additive model. Biometrika (in press).
Marra, G. and S.N. Wood (2012) Coverage Properties of Confidence Intervals for Generalized Additive Model Components. Scandinavian Journal of Statistics, 39(1), 53-74.
Nychka, D. (1988) Bayesian Confidence Intervals for Smoothing Splines. Journal of the American Statistical Association, 83, 1134-1143.
Wood, S.N. (2006) Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC Press.
gam, predict.gam, gam.check, anova.gam, gam.vcomp, sp.vcov
library(mgcv)
set.seed(0)
dat <- gamSim(1,n=200,scale=2) ## simulate data
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
plot(b,pages=1)
summary(b)

## now check the p-values by using a pure regression spline.....
b.d <- round(summary(b)$edf)+1 ## get edf per smooth
b.d <- pmax(b.d,3) # can't have basis dimension less than 3!
bc <- gam(y~s(x0,k=b.d[1],fx=TRUE)+s(x1,k=b.d[2],fx=TRUE)+
          s(x2,k=b.d[3],fx=TRUE)+s(x3,k=b.d[4],fx=TRUE),data=dat)
plot(bc,pages=1)
summary(bc)

## Example where some p-values are less reliable...
dat <- gamSim(6,n=200,scale=2)
b <- gam(y~s(x0,m=1)+s(x1)+s(x2)+s(x3)+s(fac,bs="re"),data=dat)
## Here s(x0,m=1) can be penalized to zero, so p-value approximation
## cruder than usual...
summary(b)
## The p-value reported for the "re" term uses the random effects
## approach described in the details. In this case the p-value is OK,
## since the random effect structure is so simple.

## p-value check - increase k to make this useful!
k <- 20; n <- 200; p <- rep(NA,k)
for (i in 1:k) {
  ## the null hypothesis is true: y is pure noise
  b <- gam(y~te(x,z),data=data.frame(y=rnorm(n),x=runif(n),z=runif(n)),
           method="ML")
  p[i] <- summary(b)$s.pv[1] ## p-value for te(x,z)
}
plot(((1:k)-0.5)/k,sort(p)) ## uniform quantiles vs sorted p-values
abline(0,1,col=2)
ks.test(p,"punif") ## how close to uniform are the p-values?