Title: Smoothing Splines for Large Samples
Description: Fits smoothing spline regression models using scalable algorithms designed for large samples. Seven marginal spline types are supported: linear, cubic, different cubic, cubic periodic, cubic thin-plate, ordinal, and nominal. Random effects and parametric effects are also supported. Response can be Gaussian or non-Gaussian: Binomial, Poisson, Gamma, Inverse Gaussian, or Negative Binomial.
Authors: Nathaniel E. Helwig <[email protected]>
Maintainer: Nathaniel E. Helwig <[email protected]>
License: GPL (>= 2)
Version: 1.1-1
Built: 2024-10-29 06:41:41 UTC
Source: CRAN
Fits smoothing spline regression models using scalable algorithms designed for large samples. Seven marginal spline types are supported: linear, cubic, different cubic, cubic periodic, cubic thin-plate, ordinal, and nominal. Random effects and parametric effects are also supported. Response can be Gaussian or non-Gaussian: Binomial, Poisson, Gamma, Inverse Gaussian, or Negative Binomial.
The DESCRIPTION file:

Package: bigsplines
Type: Package
Title: Smoothing Splines for Large Samples
Version: 1.1-1
Date: 2018-05-25
Author: Nathaniel E. Helwig <[email protected]>
Maintainer: Nathaniel E. Helwig <[email protected]>
Depends: quadprog
Imports: stats, graphics, grDevices
Description: Fits smoothing spline regression models using scalable algorithms designed for large samples. Seven marginal spline types are supported: linear, cubic, different cubic, cubic periodic, cubic thin-plate, ordinal, and nominal. Random effects and parametric effects are also supported. Response can be Gaussian or non-Gaussian: Binomial, Poisson, Gamma, Inverse Gaussian, or Negative Binomial.
License: GPL (>= 2)
NeedsCompilation: yes
Packaged: 2018-05-25 05:53:06 UTC; Nate
Repository: CRAN
Date/Publication: 2018-05-25 06:47:54 UTC
Index of help topics:
bigspline               Fits Smoothing Spline
bigsplines-package      Smoothing Splines for Large Samples
bigssa                  Fits Smoothing Spline ANOVA Models
bigssg                  Fits Generalized Smoothing Spline ANOVA Models
bigssp                  Fits Smoothing Splines with Parametric Effects
bigtps                  Fits Cubic Thin-Plate Splines
binsamp                 Bin-Samples Strategic Knot Indices
imagebar                Displays a Color Image with Colorbar
makessa                 Makes Objects to Fit Smoothing Spline ANOVA Models
makessg                 Makes Objects to Fit Generalized Smoothing Spline ANOVA Models
makessp                 Makes Objects to Fit Smoothing Splines with Parametric Effects
ordspline               Fits Ordinal Smoothing Spline
plotbar                 Generic X-Y Plotting with Colorbar
plotci                  Generic X-Y Plotting with Confidence Intervals
predict.bigspline       Predicts for "bigspline" Objects
predict.bigssa          Predicts for "bigssa" Objects
predict.bigssg          Predicts for "bigssg" Objects
predict.bigssp          Predicts for "bigssp" Objects
predict.bigtps          Predicts for "bigtps" Objects
predict.ordspline       Predicts for "ordspline" Objects
print.bigspline         Prints Fit Information for bigsplines Model
ssBasis                 Smoothing Spline Basis for Polynomial Splines
summary.bigspline       Summarizes Fit Information for bigsplines Model
The function bigspline fits one-dimensional cubic smoothing splines (unconstrained or periodic). The function bigssa fits Smoothing Spline ANOVA (SSA) models (Gaussian data). The function bigssg fits Generalized Smoothing Spline ANOVA (GSSA) models (non-Gaussian data). The function bigssp is for fitting Smoothing Splines with Parametric effects (semi-parametric regression). The function bigtps fits one-, two-, and three-dimensional cubic thin-plate splines. There are corresponding predict, print, and summary functions for these methods.
Nathaniel E. Helwig <[email protected]>
Maintainer: Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Gu, C. and Wahba, G. (1991). Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM Journal on Scientific and Statistical Computing, 12, 383-398.
Gu, C. and Xiang, D. (2001). Cross-validating non-Gaussian data: Generalized approximate cross-validation revisited. Journal of Computational and Graphical Statistics, 10, 581-591.
Helwig, N. E. (2013). Fast and stable smoothing spline analysis of variance models for large samples with applications to electroencephalography data analysis. Unpublished doctoral dissertation. University of Illinois at Urbana-Champaign.
Helwig, N. E. (2016). Efficient estimation of variance components in nonparametric mixed-effects models with large samples. Statistics and Computing, 26, 1319-1336.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
# See examples for bigspline, bigssa, bigssg, bigssp, and bigtps
Given a real-valued response vector $\mathbf{y} = (y_1, \ldots, y_n)'$ and a real-valued predictor vector $\mathbf{x} = (x_1, \ldots, x_n)'$ with $x_i \in [a, b]$, a smoothing spline model has the form
$$y_i = \eta(x_i) + e_i$$
where $y_i$ is the $i$-th observation's response, $x_i$ is the $i$-th observation's predictor, $\eta$ is an unknown smooth function relating the response and predictor, and $e_i \sim \mathrm{N}(0, \sigma^2)$ is iid Gaussian error.
bigspline(x, y, type="cub", nknots=30, rparm=0.01, xmin=min(x),
          xmax=max(x), alpha=1, lambdas=NULL, se.fit=FALSE,
          rseed=1234, knotcheck=TRUE)
x: Predictor vector.

y: Response vector. Must be the same length as x.

type: Type of spline for x. Options include type="lin" for linear, type="cub" for cubic (default), type="cub0" for different cubic, and type="per" for cubic periodic. See Details.

nknots: Scalar giving the maximum number of knots to bin-sample. Use more knots for more jagged functions.

rparm: Rounding parameter for x. Use rparm=NA to fit an unrounded solution. See Details.

xmin: Minimum x value; must be less than or equal to min(x). See Note.

xmax: Maximum x value; must be greater than or equal to max(x). See Note.

alpha: Manual tuning parameter for the GCV score. Values larger than 1 produce smoother solutions.

lambdas: Vector of global smoothing parameters to try. Default estimates the smoothing parameter that minimizes the GCV score.

se.fit: Logical indicating if the standard errors of the fitted values should be estimated.

rseed: Random seed for the knot bin-sampling (input to set.seed).

knotcheck: If TRUE, only unique knots are used (for stability).
To estimate $\eta$ I minimize the penalized least-squares functional
$$\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \eta(x_i)\right)^2 + \lambda \int \left(\eta''(x)\right)^2 dx$$
where $\eta''$ denotes the second derivative of $\eta$ and $\lambda \geq 0$ is a smoothing parameter that controls the trade-off between fitting and smoothing the data.

Default use of the function estimates $\lambda$ by minimizing the GCV score:
$$\mathrm{GCV}(\lambda) = \frac{n\,\|(\mathbf{I}_n - \mathbf{S}_{\lambda})\mathbf{y}\|^2}{\left[n - \mathrm{tr}(\mathbf{S}_{\lambda})\right]^2}$$
where $\mathbf{I}_n$ is the identity matrix and $\mathbf{S}_{\lambda}$ is the smoothing matrix (see Computational Details).
Using the rounding parameter input rparm can greatly speed up and stabilize the fitting for large samples. When rparm is used, the spline is fit to a set of unique data points after rounding; the unique points are determined using the efficient algorithm described in Helwig (2013). For typical cases, I recommend using rparm=0.01, but smaller rounding parameters (e.g., rparm=0.001) may be needed for particularly jagged functions (or when x has outliers).
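As a rough illustration of the rounding trade-off, here is a minimal sketch using only arguments from the Usage section above (the simulated data and unique-point count are illustrative assumptions):

# sketch: rounding with rparm=0.01 fits the spline to the unique
# rounded predictor values, while rparm=NA fits the unrounded solution
set.seed(1)
x <- runif(10^5)
y <- sin(2*pi*x) + rnorm(10^5)
roundfit <- bigspline(x, y, rparm=0.01)   # fast: few unique design points
exactfit <- bigspline(x, y, rparm=NA)     # slower: unrounded solution
length(roundfit$xunique)      # at most 101 distinct points on [0,1]
c(roundfit$lambda, exactfit$lambda)       # selected smoothing parameters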
fitted.values: Vector of fitted values corresponding to the original data points in x (if rparm=NA) or the rounded data points in xunique (if rparm is used).

se.fit: Vector of standard errors of fitted.values (if input se.fit=TRUE).

x: Predictor vector (same as input).

y: Response vector (same as input).

type: Type of spline that was used.

xunique: Unique elements of x after applying the rounding parameter.

yunique: Mean of y for each element of xunique.

funique: Vector giving the frequency of each element of xunique.

sigma: Estimated error standard deviation, i.e., $\hat{\sigma}$.

ndf: Data frame with two elements: n is the number of data points used to fit the model, and df is the effective degrees of freedom of the fit model.

info: Model fit information: vector containing the GCV, multiple R-squared, AIC, and BIC of the fit model (assuming Gaussian error).

xrng: Predictor range: xrng = c(xmin, xmax).

myknots: Bin-sampled spline knots used for the fit.

rparm: Rounding parameter for x (same as input).

lambda: Optimal smoothing parameter.

coef: Spline basis function coefficients.

coef.csqrt: Matrix square-root of the covariance matrix of coef (used to form standard errors).
Cubic and cubic periodic splines transform the predictor to the interval [0,1] before fitting. So input xmin must be less than or equal to min(x), and input xmax must be greater than or equal to max(x).
When using rounding parameters, output fitted.values corresponds to the unique rounded predictor scores in output xunique. Use the predict.bigspline function to get fitted values for the full y vector.
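For instance, a minimal sketch (it assumes predict.bigspline accepts new predictor values through a newdata argument, which is not shown in the Usage above):

# sketch: recover a fitted value for every original observation
# after fitting with a rounding parameter
set.seed(2)
x <- runif(10^4)
y <- sin(2*pi*x) + rnorm(10^4)
mod <- bigspline(x, y, rparm=0.01)
length(mod$fitted.values)        # one value per unique rounded point
yhat <- predict(mod, newdata=x)  # assumed 'newdata' argument
length(yhat)                     # one value per original observation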
According to smoothing spline theory, the function $\eta$ can be approximated as
$$\eta(x) \approx \sum_{v=1}^{m} d_v \phi_v(x) + \sum_{h=1}^{q} c_h \rho(x, x_h^*)$$
where the $\{\phi_v\}$ span the null space (containing at most a constant and a linear function), $\rho$ is the reproducing kernel of the contrast (nonlinear) space, and $\{x_h^*\}_{h=1}^{q}$ are the selected spline knots.

This implies that the penalized least-squares functional can be rewritten as
$$\|\mathbf{y} - \mathbf{K}\mathbf{d} - \mathbf{J}\mathbf{c}\|^2 + n\lambda\, \mathbf{c}'\mathbf{Q}\mathbf{c}$$
where $\mathbf{K} = \{\phi_v(x_i)\}_{n \times m}$ is the null space basis function matrix, $\mathbf{J} = \{\rho(x_i, x_h^*)\}_{n \times q}$ is the contrast space basis function matrix, $\mathbf{Q} = \{\rho(x_g^*, x_h^*)\}_{q \times q}$ is the penalty matrix, and $\mathbf{d}$ and $\mathbf{c}$ are the unknown basis function coefficients.

Given the smoothing parameter $\lambda$, the optimal basis function coefficients have the form
$$\begin{pmatrix} \hat{\mathbf{d}} \\ \hat{\mathbf{c}} \end{pmatrix} = \left(\mathbf{T}'\mathbf{T} + n\lambda \begin{pmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{Q} \end{pmatrix}\right)^{\dagger} \mathbf{T}'\mathbf{y}$$
with $\mathbf{T} = [\mathbf{K}, \mathbf{J}]$, where $(\cdot)^{\dagger}$ denotes the pseudoinverse of the input matrix.

Given the optimal coefficients, the fitted values are given by $\hat{\mathbf{y}} = \mathbf{S}_{\lambda}\mathbf{y}$, where $\mathbf{S}_{\lambda} = \mathbf{T}(\mathbf{T}'\mathbf{T} + n\lambda\,\mathrm{diag}(\mathbf{0}, \mathbf{Q}))^{\dagger}\mathbf{T}'$ is the smoothing matrix, which depends on $\lambda$.
For a linear spline (type="lin") with $x \in [0,1]$, the needed functions are $\phi(x) = 1$ and $\rho(x, z) = k_1(x)k_1(z) + k_2(|x - z|)$, where $k_1(x) = x - 1/2$ and $k_2(x) = \frac{1}{2}\left(k_1^2(x) - \frac{1}{12}\right)$ are scaled Bernoulli polynomials; in this case $\mathbf{K} = \mathbf{1}_n$ and $\mathbf{Q} = \{\rho(x_g^*, x_h^*)\}_{q \times q}$.

For a cubic spline (type="cub") with $x \in [0,1]$, the needed functions are $\phi_1(x) = k_1(x)$ and $\rho(x, z) = k_2(x)k_2(z) - k_4(|x - z|)$, where $k_1$ and $k_2$ are defined above, and $k_4(x) = \frac{1}{24}\left(k_1^4(x) - \frac{k_1^2(x)}{2} + \frac{7}{240}\right)$.

For a different cubic spline (type="cub0") with $x \in [0,1]$, the needed functions are $\phi_1(x) = x$ and $\rho(x, z) = (x \wedge z)^2\left[3(x \vee z) - (x \wedge z)\right]/6$, where $x \wedge z = \min(x, z)$ and $x \vee z = \max(x, z)$.

Note that type="cub" and type="cub0" use different definitions of the averaging operator in the null space. The overall spline estimates should be the same (up to approximation accuracy), but the null and contrast space effect functions will differ (see predict.bigspline). See Helwig (2013) and Gu (2013) for a further discussion of polynomial splines.

For a periodic cubic spline (type="per") with $x \in [0,1]$, the needed functions are $\phi(x) = 1$ and $\rho(x, z) = -k_4(|x - z|)$, where $k_4$ is defined as it was for type="cub"; in this case $\mathbf{K} = \mathbf{1}_n$ and the null space contains only constant functions.
The spline is estimated using penalized least-squares, which does not require the Gaussian error assumption. However, the spline inference information (e.g., standard errors and fit information) requires the Gaussian error assumption.
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Helwig, N. E. (2013). Fast and stable smoothing spline analysis of variance models for large samples with applications to electroencephalography data analysis. Unpublished doctoral dissertation. University of Illinois at Urbana-Champaign.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
##########   EXAMPLE 1   ##########

# define relatively smooth function
set.seed(773)
myfun <- function(x){ sin(2*pi*x) }
x <- runif(10^6)
y <- myfun(x) + rnorm(10^6)

# linear, cubic, different cubic, and periodic splines
linmod <- bigspline(x,y,type="lin")
linmod
cubmod <- bigspline(x,y)
cubmod
cub0mod <- bigspline(x,y,type="cub0")
cub0mod
permod <- bigspline(x,y,type="per")
permod


##########   EXAMPLE 2   ##########

# define more jagged function
set.seed(773)
myfun <- function(x){ 2*x + cos(4*pi*x) }
x <- runif(10^6)*4
y <- myfun(x) + rnorm(10^6)

# try different numbers of knots
r1mod <- bigspline(x,y,nknots=20)
crossprod( myfun(r1mod$xunique) - r1mod$fitted )/length(r1mod$fitted)
r2mod <- bigspline(x,y,nknots=30)
crossprod( myfun(r2mod$xunique) - r2mod$fitted )/length(r2mod$fitted)
r3mod <- bigspline(x,y,nknots=40)
crossprod( myfun(r3mod$xunique) - r3mod$fitted )/length(r3mod$fitted)


##########   EXAMPLE 3   ##########

# define more jagged function
set.seed(773)
myfun <- function(x){ 2*x + cos(4*pi*x) }
x <- runif(10^6)*4
y <- myfun(x) + rnorm(10^6)

# try different rounding parameters
r1mod <- bigspline(x,y,rparm=0.05)
crossprod( myfun(r1mod$xunique) - r1mod$fitted )/length(r1mod$fitted)
r2mod <- bigspline(x,y,rparm=0.02)
crossprod( myfun(r2mod$xunique) - r2mod$fitted )/length(r2mod$fitted)
r3mod <- bigspline(x,y,rparm=0.01)
crossprod( myfun(r3mod$xunique) - r3mod$fitted )/length(r3mod$fitted)
Given a real-valued response vector $\mathbf{y} = (y_1, \ldots, y_n)'$, a Smoothing Spline ANOVA (SSA) has the form
$$y_i = \eta(\mathbf{x}_i) + e_i$$
where $y_i$ is the $i$-th observation's response, $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})$ is the $i$-th observation's nonparametric predictor vector, $\eta$ is an unknown smooth function relating the response and nonparametric predictors, and $e_i \sim \mathrm{N}(0, \sigma^2)$ is iid Gaussian error. The function can fit additive models, and also allows for 2-way and 3-way interactions between any number of predictors (see Details and Examples).
bigssa(formula, data=NULL, type=NULL, nknots=NULL, rparm=NA,
       lambdas=NULL, skip.iter=TRUE, se.fit=FALSE, rseed=1234,
       gcvopts=NULL, knotcheck=TRUE, gammas=NULL, weights=NULL,
       random=NULL, remlalg=c("FS","NR","EM","none"), remliter=500,
       remltol=10^-4, remltau=NULL)
formula: An object of class "formula": a symbolic description of the model to be fitted (see Details and Examples for more information).

data: Optional data frame, list, or environment containing the variables in formula.

type: List of smoothing spline types for predictors in formula (see Details). Options include "cub", "cub0", "per", "tps", "ord", and "nom".

nknots: Two possible options: (a) scalar giving the total number of random knots to sample, or (b) vector indexing which rows of the data to use as knots.

rparm: List of rounding parameters for each predictor. See Details.

lambdas: Vector of global smoothing parameters to try. Default estimates the smoothing parameter that minimizes the GCV score.

skip.iter: Logical indicating whether to skip the iterative smoothing parameter update. Using skip.iter=FALSE should provide a more precise fit, but can be time consuming (see Computational Details).

se.fit: Logical indicating if the standard errors of the fitted values should be estimated.

rseed: Random seed for knot sampling. Input is ignored if nknots is an input vector of knot indices.

gcvopts: Control parameters for optimization. List with 3 elements: (a) maxit: maximum number of algorithm iterations, (b) gcvtol: convergence tolerance for the iterative GCV update, and (c) alpha: tuning parameter for the GCV score.

knotcheck: If TRUE, only unique knots are used (for stability).

gammas: List of initial smoothing parameters for each predictor. See Details.

weights: Vector of positive weights for fitting (default is a vector of ones).

random: Adds random effects to the model (see Random Effects section).

remlalg: REML algorithm for estimating variance components (see Random Effects section). Input is ignored if random=NULL.

remliter: Maximum number of iterations for REML estimation of variance components. Input is ignored if random=NULL.

remltol: Convergence tolerance for REML estimation of variance components. Input is ignored if random=NULL.

remltau: Initial estimate of variance parameters for REML estimation of variance components. Input is ignored if random=NULL.
The formula syntax is similar to that used in lm and many other R regression functions. Use y~x to predict the response y from the predictor x. Use y~x1+x2 to fit an additive model of the predictors x1 and x2, and use y~x1*x2 to fit an interaction model. The syntax y~x1*x2 includes the interaction and main effects, whereas the syntax y~x1:x2 is not supported. See Computational Details for specifics about how nonparametric effects are estimated.
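For example, a short sketch using only the formula patterns described above (the simulated data are illustrative assumptions):

# sketch of the supported formula patterns
set.seed(3)
dat <- data.frame(x1=runif(200), x2=runif(200))
dat$y <- sin(2*pi*dat$x1) + dat$x2 + rnorm(200)
m1 <- bigssa(y~x1, data=dat, type="cub", nknots=20)      # one predictor
m2 <- bigssa(y~x1+x2, data=dat, type=list(x1="cub",x2="cub"),
             nknots=20)                                  # additive model
m3 <- bigssa(y~x1*x2, data=dat, type=list(x1="cub",x2="cub"),
             nknots=20)                                  # main effects + interaction
# y~x1:x2 is NOT supported by bigssa (bigssp supports that syntax)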
See bigspline for definitions of type="cub", type="cub0", and type="per" splines, which can handle one-dimensional predictors. See the Appendix of Helwig and Ma (2015) for information about type="tps" and type="nom" splines. Note that type="tps" can handle one-, two-, or three-dimensional predictors. I recommend using type="cub" if the predictor scores have no extreme outliers; when outliers are present, type="tps" may produce a better result.
Using the rounding parameter input rparm can greatly speed up and stabilize the fitting for large samples. For typical cases, I recommend using rparm=0.01 for cubic and periodic splines, but smaller rounding parameters may be needed for particularly jagged functions. For thin-plate splines, the data are NOT transformed to the interval [0,1] before fitting, so the rounding parameter should be on the raw data scale. Also, for type="tps" you can enter one rounding parameter for each predictor dimension. Use rparm=1 for ordinal and nominal splines.
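A minimal sketch of per-predictor rounding, assuming rparm accepts a named list as the argument description states (the data are illustrative):

# sketch: per-predictor rounding parameters entered as a named list
set.seed(4)
x1 <- runif(10^5)
x2 <- runif(10^5)
y <- sin(2*pi*x1) + log(x2+.1) + rnorm(10^5)
addmod <- bigssa(y~x1+x2, type=list(x1="cub",x2="cub"), nknots=30,
                 rparm=list(x1=0.01, x2=0.01))
addmod$ndf   # number of (rounded) points used and effective df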
fitted.values: Vector of fitted values corresponding to the original data points in xvars (if rparm=NA) or the rounded data points in xunique (if rparm is used).

se.fit: Vector of standard errors of fitted.values (if input se.fit=TRUE).

yvar: Response vector.

xvars: List of predictors.

type: Type of smoothing spline that was used for each predictor.

yunique: Mean of yvar for each row of xunique (if rparm is used).

xunique: Unique rows of xvars after applying the rounding parameters (if rparm is used).

sigma: Estimated error standard deviation, i.e., $\hat{\sigma}$.

ndf: Data frame with two elements: n is the number of data points used to fit the model, and df is the effective degrees of freedom of the fit model.

info: Model fit information: vector containing the GCV, multiple R-squared, AIC, and BIC of the fit model (assuming Gaussian error).

modelspec: List containing specifics of the fit model (needed for prediction).

converged: Convergence status: TRUE if the iterative update converged, FALSE if it failed to converge, and NA if skip.iter=TRUE.

tnames: Names of the terms in the model.

random: Random effects formula (same as input).

tau: Variance parameters such that sigma^2*tau are the variance components of the random effects.

blup: Best linear unbiased predictors of the random effects (if random is not NULL).

call: Called model in input formula.
Cubic and cubic periodic splines transform the predictor to the interval [0,1] before fitting.

When using rounding parameters, output fitted.values corresponds to the unique rounded predictor scores in output xunique. Use the predict.bigssa function to get fitted values for the full yvar vector.
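A minimal sketch (assuming predict.bigssa accepts new predictor values through a newdata data frame or list, which is not shown above):

# sketch: full-length fitted values after fitting with rounding
set.seed(5)
x <- runif(5000)
y <- sin(2*pi*x) + rnorm(5000)
mod <- bigssa(y~x, type="cub", nknots=20, rparm=list(x=0.01))
length(mod$fitted.values)                      # unique rounded points
yhat <- predict(mod, newdata=data.frame(x=x))  # assumed 'newdata' argument
length(yhat)                                   # one fit per observation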
To estimate $\eta$ I minimize the penalized least-squares functional
$$\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \eta(\mathbf{x}_i)\right)^2 + \lambda J(\eta)$$
where $J(\cdot)$ is a nonnegative penalty functional quantifying the roughness of $\eta$ and $\lambda > 0$ is a smoothing parameter controlling the trade-off between fitting and smoothing the data. Note that for $p$ nonparametric predictors, there are additional $\theta_k$ smoothing parameters embedded in $J$.

The penalized least-squares functional can be rewritten as
$$\|\mathbf{y} - \mathbf{K}\mathbf{d} - \mathbf{J}_{\theta}\mathbf{c}\|^2 + n\lambda\, \mathbf{c}'\mathbf{Q}_{\theta}\mathbf{c}$$
where $\mathbf{K}$ is the null (parametric) space basis function matrix, $\mathbf{J}_{\theta} = \sum_{k} \theta_k \mathbf{J}_k$ with $\mathbf{J}_k$ denoting the $k$-th contrast space basis function matrix, $\mathbf{Q}_{\theta} = \sum_{k} \theta_k \mathbf{Q}_k$ with $\mathbf{Q}_k$ denoting the $k$-th penalty matrix, and $\mathbf{d}$ and $\mathbf{c}$ are the unknown basis function coefficients. The optimal smoothing parameters $(\lambda, \boldsymbol{\theta})$ are chosen by minimizing the GCV score (see bigspline).
Note that this function uses the efficient SSA reparameterization described in Helwig (2013) and Helwig and Ma (2015); using this parameterization, there is one unique $\gamma_k$ smoothing parameter per predictor, and these $\gamma_k$ parameters determine the structure of the $\theta_k$ parameters in the tensor product space. To evaluate the GCV score, this function uses the improved (scalable) SSA algorithm discussed in Helwig (2013) and Helwig and Ma (2015).

For multiple predictors, initial values for the $\gamma_k$ parameters (that determine the structure of the $\theta_k$ parameters) are estimated using the smart starting algorithm described in Helwig (2013) and Helwig and Ma (2015).

Default use of this function (skip.iter=TRUE) fixes the $\gamma_k$ parameters after the smart start, and then finds the global smoothing parameter $\lambda$ (among the input lambdas) that minimizes the GCV score. This approach typically produces a solution very similar to the more optimal solution using skip.iter=FALSE.

Setting skip.iter=FALSE uses the same smart starting algorithm as setting skip.iter=TRUE. However, instead of fixing the $\gamma_k$ parameters after the smart start, using skip.iter=FALSE iterates between estimating the optimal $\lambda$ and the optimal $\gamma_k$ parameters. The R function nlm is used to minimize the GCV score with respect to the $\gamma_k$ parameters, which can be time consuming for models with many predictors and/or a large number of knots.
The input random adds random effects to the model assuming a variance components structure. Both nested and crossed random effects are supported. In all cases, the random effects are assumed to be independent zero-mean Gaussian variables with the variance depending on group membership.

Random effects are distinguished by vertical bars ("|"), which separate expressions for design matrices (left) from group factors (right). For example, the syntax ~1|group includes a random intercept for each level of group, whereas the syntax ~1+x|group includes both a random intercept and a random slope for each level of group. For crossed random effects, parentheses are needed to distinguish different terms, e.g., ~(1|group1)+(1|group2) includes a random intercept for each level of group1 and a random intercept for each level of group2, where both group1 and group2 are factors. For nested random effects, the syntax ~group|subject can be used, where both group and subject are factors such that the levels of subject are nested within those of group.
The input remlalg determines the REML algorithm used to estimate the variance components. Setting remlalg="FS" uses a Fisher Scoring algorithm (default). Setting remlalg="NR" uses a Newton-Raphson algorithm. Setting remlalg="EM" uses an Expectation Maximization algorithm. Use remlalg="none" to fit a model with known variance components (entered through remltau).

The input remliter sets the maximum number of iterations for the REML estimation. The input remltol sets the convergence tolerance for the REML estimation, which is determined via relative change in the REML log-likelihood. The input remltau sets the initial estimates of the variance parameters; the default is remltau = rep(1,ntau), where ntau is the number of variance components.
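A minimal sketch of the random-intercept syntax described above (the data-generating code and variable names are illustrative assumptions):

# sketch: cubic spline fixed effect plus a random intercept per subject
set.seed(6)
nsub <- 50                               # subjects
nobs <- 20                               # observations per subject
subj <- factor(rep(seq_len(nsub), each=nobs))
u <- rnorm(nsub, sd=0.5)                 # true random intercepts
x <- runif(nsub*nobs)
y <- sin(2*pi*x) + u[as.integer(subj)] + rnorm(nsub*nobs)
remod <- bigssa(y~x, type="cub", nknots=20, random=~1|subj,
                remlalg="FS", remliter=500, remltol=10^-4)
remod$tau    # REML variance parameter(s)
remod$blup   # predicted random intercepts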
The spline is estimated using penalized least-squares, which does not require the Gaussian error assumption. However, the spline inference information (e.g., standard errors and fit information) requires the Gaussian error assumption.
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Helwig, N. E. (2013). Fast and stable smoothing spline analysis of variance models for large samples with applications to electroencephalography data analysis. Unpublished doctoral dissertation. University of Illinois at Urbana-Champaign.
Helwig, N. E. (2016). Efficient estimation of variance components in nonparametric mixed-effects models with large samples. Statistics and Computing, 26, 1319-1336.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
##########   EXAMPLE 1   ##########

# define univariate function and data
set.seed(773)
myfun <- function(x){ sin(2*pi*x) }
x <- runif(500)
y <- myfun(x) + rnorm(500)

# cubic, periodic, and thin-plate spline models with 20 knots
cubmod <- bigssa(y~x,type="cub",nknots=20,se.fit=TRUE)
cubmod
permod <- bigssa(y~x,type="per",nknots=20,se.fit=TRUE)
permod
tpsmod <- bigssa(y~x,type="tps",nknots=20,se.fit=TRUE)
tpsmod


##########   EXAMPLE 2   ##########

# function with two continuous predictors
set.seed(773)
myfun <- function(x1v,x2v){ sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v)) }
x1v <- runif(500)
x2v <- runif(500)
y <- myfun(x1v,x2v) + rnorm(500)

# cubic splines with 50 randomly selected knots
intmod <- bigssa(y~x1v*x2v,type=list(x1v="cub",x2v="cub"),nknots=50)
intmod
crossprod( myfun(x1v,x2v) - intmod$fitted.values )/500

# fit additive model (with same knots)
addmod <- bigssa(y~x1v+x2v,type=list(x1v="cub",x2v="cub"),nknots=50)
addmod
crossprod( myfun(x1v,x2v) - addmod$fitted.values )/500


##########   EXAMPLE 3   ##########

# function with two continuous and one nominal predictor (3 levels)
set.seed(773)
myfun <- function(x1v,x2v,x3v){
  fval <- rep(0,length(x1v))
  xmeans <- c(-1,0,1)
  for(j in 1:3){
    idx <- which(x3v==letters[j])
    fval[idx] <- xmeans[j]
  }
  # note: after the loop, idx still indexes the last group ("c")
  fval[idx] <- fval[idx] + cos(4*pi*(x1v[idx]))
  fval <- (fval + sin(3*pi*x1v*x2v+pi)) / sqrt(2)
}
x1v <- runif(500)
x2v <- runif(500)
x3v <- sample(letters[1:3],500,replace=TRUE)
y <- myfun(x1v,x2v,x3v) + rnorm(500)

# 3-way interaction with 50 knots
cuimod <- bigssa(y~x1v*x2v*x3v,type=list(x1v="cub",x2v="cub",x3v="nom"),nknots=50)
crossprod( myfun(x1v,x2v,x3v) - cuimod$fitted.values )/500

# fit correct interaction model with 50 knots
cubmod <- bigssa(y~x1v*x2v+x1v*x3v,type=list(x1v="cub",x2v="cub",x3v="nom"),nknots=50)
crossprod( myfun(x1v,x2v,x3v) - cubmod$fitted.values )/500

# fit model using 2-dimensional thin-plate and nominal
x1new <- cbind(x1v,x2v)
x2new <- x3v
tpsmod <- bigssa(y~x1new*x2new,type=list(x1new="tps",x2new="nom"),nknots=50)
crossprod( myfun(x1v,x2v,x3v) - tpsmod$fitted.values )/500


##########   EXAMPLE 4   ##########

# function with four continuous predictors
set.seed(773)
myfun <- function(x1v,x2v,x3v,x4v){
  sin(2*pi*x1v) + log(x2v+.1) + x3v*cos(pi*(x4v))
}
x1v <- runif(500)
x2v <- runif(500)
x3v <- runif(500)
x4v <- runif(500)
y <- myfun(x1v,x2v,x3v,x4v) + rnorm(500)

# fit cubic spline model with x3v*x4v interaction
cubmod <- bigssa(y~x1v+x2v+x3v*x4v,type=list(x1v="cub",x2v="cub",x3v="cub",x4v="cub"),nknots=50)
crossprod( myfun(x1v,x2v,x3v,x4v) - cubmod$fitted.values )/500
Given an exponential family response vector $\mathbf{y} = (y_1, \ldots, y_n)'$, a Generalized Smoothing Spline ANOVA (GSSA) has the form
$$g(\mu_i) = \eta(\mathbf{x}_i)$$
where $\mu_i$ is the expected value of the $i$-th observation's response, $g$ is some invertible link function, $\mathbf{x}_i$ is the $i$-th observation's nonparametric predictor vector, and $\eta$ is an unknown smooth function relating the response and nonparametric predictors. The function can fit additive models, and also allows for 2-way and 3-way interactions between any number of predictors. The response can be one of five non-Gaussian distributions: Binomial, Poisson, Gamma, Inverse Gaussian, or Negative Binomial (see Details and Examples).
bigssg(formula, family, data=NULL, type=NULL, nknots=NULL, rparm=NA,
       lambdas=NULL, skip.iter=TRUE, se.lp=FALSE, rseed=1234,
       gcvopts=NULL, knotcheck=TRUE, gammas=NULL, weights=NULL,
       gcvtype=c("acv","gacv","gacv.old"))
formula: An object of class "formula": a symbolic description of the model to be fitted (see Details and Examples for more information).

family: Distribution for the response. One of five options: "binomial", "poisson", "Gamma", "inverse.gaussian", or "negbin" (see Response section).

data: Optional data frame, list, or environment containing the variables in formula.

type: List of smoothing spline types for predictors in formula (see Details). Options include "cub", "cub0", "per", "tps", "ord", and "nom".

nknots: Two possible options: (a) scalar giving the total number of random knots to sample, or (b) vector indexing which rows of the data to use as knots.

rparm: List of rounding parameters for each predictor. See Details.

lambdas: Vector of global smoothing parameters to try. Default estimates the smoothing parameter that minimizes the selected cross-validation criterion (see gcvtype).

skip.iter: Logical indicating whether to skip the iterative smoothing parameter update. Using skip.iter=FALSE should provide a more precise fit, but can be time consuming (see Computational Details).

se.lp: Logical indicating if the standard errors of the linear predictors should be estimated.

rseed: Random seed for knot sampling. Input is ignored if nknots is an input vector of knot indices.

gcvopts: Control parameters for optimization. List with 6 elements controlling the outer smoothing parameter update and the inner (iteratively reweighted) coefficient update: the maximum numbers of iterations, the convergence tolerances, and the tuning parameter for the cross-validation score.

knotcheck: If TRUE, only unique knots are used (for stability).

gammas: List of initial smoothing parameters for each predictor. See Details.

weights: Vector of positive weights for fitting (default is a vector of ones).

gcvtype: Cross-validation criterion for selecting smoothing parameters (see Details).
The formula syntax is similar to that used in lm and many other R regression functions. Use y~x to predict the response y from the predictor x. Use y~x1+x2 to fit an additive model of the predictors x1 and x2, and use y~x1*x2 to fit an interaction model. The syntax y~x1*x2 includes the interaction and main effects, whereas the syntax y~x1:x2 is not supported. See Computational Details for specifics about how nonparametric effects are estimated.

See bigspline for definitions of type="cub", type="cub0", and type="per" splines, which can handle one-dimensional predictors. See the Appendix of Helwig and Ma (2015) for information about type="tps" and type="nom" splines. Note that type="tps" can handle one-, two-, or three-dimensional predictors. I recommend using type="cub" if the predictor scores have no extreme outliers; when outliers are present, type="tps" may produce a better result.
Using the rounding parameter input rparm can greatly speed up and stabilize the fitting for large samples. For typical cases, I recommend using rparm=0.01 for cubic and periodic splines, but smaller rounding parameters may be needed for particularly jagged functions. For thin-plate splines, the data are NOT transformed to the interval [0,1] before fitting, so the rounding parameter should be on the raw data scale. Also, for type="tps" you can enter one rounding parameter for each predictor dimension. Use rparm=1 for ordinal and nominal splines.
fitted.values: Vector of fitted values (data scale) corresponding to the original data points in xvars (if rparm=NA) or the rounded data points in xunique (if rparm is used).

linear.predictors: Vector of fitted values (link scale) corresponding to the original data points in xvars (if rparm=NA) or the rounded data points in xunique (if rparm is used).

se.lp: Vector of standard errors of linear.predictors (if input se.lp=TRUE).

yvar: Response vector.

xvars: List of predictors.

type: Type of smoothing spline that was used for each predictor.

yunique: Mean of yvar for each row of xunique (if rparm is used).

xunique: Unique rows of xvars after applying the rounding parameters (if rparm is used).

dispersion: Estimated dispersion parameter (see Response section).

ndf: Data frame with two elements: n is the number of data points used to fit the model, and df is the effective degrees of freedom of the fit model.

info: Model fit information: vector containing the GCV, multiple R-squared, AIC, and BIC of the fit model.

modelspec: List containing specifics of the fit model (needed for prediction).

converged: Convergence status: TRUE if the iterative update converged, FALSE if it failed to converge, and NA if skip.iter=TRUE.

tnames: Names of the terms in the model.

family: Distribution family (same as input).

call: Called model in input formula.
Cubic and cubic periodic splines transform the predictor to the interval [0,1] before fitting.

When using rounding parameters, output fitted.values corresponds to the unique rounded predictor scores in output xunique. Use the predict.bigssg function to get fitted values for the full yvar vector.
Only one link is permitted for each family:

family="binomial": Logit link. Response should be a vector of proportions in the interval [0,1]. If the response is a sample proportion, the total count should be input through the weights argument (see the sketch after this list).

family="poisson": Log link. Response should be a vector of counts (non-negative integers).

family="Gamma": Inverse link. Response should be a vector of positive real-valued data. The estimated dispersion parameter is the inverse of the shape parameter, so the variance of the response increases as dispersion increases.

family="inverse.gaussian": Inverse-square link. Response should be a vector of positive real-valued data. The estimated dispersion parameter is the inverse of the shape parameter, so the variance of the response increases as dispersion increases.

family="negbin": Log link. Response should be a vector of counts (non-negative integers). The estimated dispersion parameter is the inverse of the size parameter, so the variance of the response increases as dispersion increases.

family=list("negbin",2): Log link. Response should be a vector of counts (non-negative integers). The second element is the known (common) dispersion parameter (2 in this case). The input dispersion parameter should be the inverse of the size parameter, so the variance of the response increases as dispersion increases.
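A short sketch of the binomial-with-weights convention (simulated data; the response is entered as a proportion and the totals through weights):

# sketch: sample proportions as response, totals through 'weights'
set.seed(7)
x <- runif(2000)
eta <- sin(2*pi*x)
p <- 1/(1+exp(-eta))
size <- sample(5:20, 2000, replace=TRUE)    # trials per observation
y <- rbinom(2000, size=size, prob=p)/size   # proportion in [0,1]
gmod <- bigssg(y~x, family="binomial", type="cub", nknots=20,
               weights=size)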
To estimate $\eta$ I minimize the (negative of the) penalized log-likelihood
$$-\frac{1}{n}\sum_{i=1}^{n}\log f\!\left(y_i \mid \eta(\mathbf{x}_i)\right) + \frac{\lambda}{2} J(\eta)$$
where $J(\cdot)$ is a nonnegative penalty functional quantifying the roughness of $\eta$ and $\lambda > 0$ is a smoothing parameter controlling the trade-off between fitting and smoothing the data. Note that for $p$ nonparametric predictors, there are additional $\theta_k$ smoothing parameters embedded in $J$.

Following standard exponential family theory, $\mu_i = \dot{b}(\xi_i)$ and $v_i = \ddot{b}(\xi_i)a(\varphi)$, where $\dot{b}$ and $\ddot{b}$ denote the first and second derivatives of the cumulant function $b$, $v_i$ is the variance of $y_i$, and $\varphi$ is the dispersion parameter. Given fixed smoothing parameters, the optimal $\eta$ can be estimated by iteratively minimizing the penalized reweighted least-squares functional
$$\frac{1}{n}\sum_{i=1}^{n} w_i \left(y_i^* - \eta(\mathbf{x}_i)\right)^2 + \lambda J(\eta)$$
where $w_i$ is the working weight, $y_i^*$ is the adjusted dependent variable, and both are formed from $\tilde{\eta}$, the current estimate of $\eta$.

The optimal smoothing parameters are chosen via direct cross-validation (see Gu & Xiang, 2001).

Setting gcvtype="acv" uses the Approximate Cross-Validation (ACV) score, an approximation of the leave-one-out log-likelihood in which the $i$-th case enters through $s_{ii}$, the $i$-th diagonal of the smoothing matrix $\mathbf{S}_{\lambda}$.

Setting gcvtype="gacv" uses the Generalized ACV (GACV) score, which replaces the individual diagonals $s_{ii}$ with a generalized version based on $\mathrm{tr}(\mathbf{S}_{\lambda})$, where $\mathbf{S}_{\lambda}$ is the smoothing matrix.

Setting gcvtype="gacv.old" uses an approximation of the GACV in which the trace term is further approximated. This option is included for backward compatibility (ver 1.0-4 and earlier), and is not recommended because the ACV or GACV often perform better.
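A sketch comparing the three criteria on the same simulated Poisson fit (only gcvtype changes; reading the criterion from info[1] is an assumption based on the Value section):

# sketch: same Poisson fit under the three cross-validation criteria
set.seed(8)
x <- runif(2000)
mu <- exp(sin(2*pi*x))
y <- rpois(2000, lambda=mu)
acvfit  <- bigssg(y~x, family="poisson", type="cub", nknots=20,
                  gcvtype="acv")
gacvfit <- bigssg(y~x, family="poisson", type="cub", nknots=20,
                  gcvtype="gacv")
oldfit  <- bigssg(y~x, family="poisson", type="cub", nknots=20,
                  gcvtype="gacv.old")
c(acvfit$info[1], gacvfit$info[1], oldfit$info[1])  # selected criterion values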
Note that this function uses the efficient SSA reparameterization described in Helwig (2013) and Helwig and Ma (2015); using this parameterization, there is one unique $\gamma_k$ smoothing parameter per predictor, and these $\gamma_k$ parameters determine the structure of the $\theta_k$ parameters in the tensor product space. To evaluate the ACV/GACV score, this function uses the improved (scalable) GSSA algorithm discussed in Helwig (in preparation).

For multiple predictors, initial values for the $\gamma_k$ parameters (that determine the structure of the $\theta_k$ parameters) are estimated using an extension of the smart starting algorithm described in Helwig (2013) and Helwig and Ma (2015).

Default use of this function (skip.iter=TRUE) fixes the $\gamma_k$ parameters after the smart start, and then finds the global smoothing parameter $\lambda$ (among the input lambdas) that minimizes the GCV score. This approach typically produces a solution very similar to the more optimal solution using skip.iter=FALSE.

Setting skip.iter=FALSE uses the same smart starting algorithm as setting skip.iter=TRUE. However, instead of fixing the $\gamma_k$ parameters after the smart start, using skip.iter=FALSE iterates between estimating the optimal $\lambda$ and the optimal $\gamma_k$ parameters. The R function nlm is used to minimize the approximate GACV score with respect to the $\gamma_k$ parameters, which can be time consuming for models with many predictors and/or a large number of knots.
The spline is estimated using penalized likelihood estimation. Standard errors of the linear predictors are formed using the Bayesian posterior variance (i.e., Bayesian confidence intervals).
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Gu, C. and Xiang, D. (2001). Cross-validating non-Gaussian data: Generalized approximate cross-validation revisited. Journal of Computational and Graphical Statistics, 10, 581-591.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
##########   EXAMPLE 1 (1-way GSSA)   ##########

# define univariate function and data
set.seed(1)
myfun <- function(x){ sin(2*pi*x) }
ndpts <- 1000
x <- runif(ndpts)

# binomial response (no weights)
set.seed(773)
lp <- myfun(x)
p <- 1/(1+exp(-lp))
y <- rbinom(n=ndpts,size=1,p=p)      ## y is binary data
gmod <- bigssg(y~x,family="binomial",type="cub",nknots=20)
crossprod( lp - gmod$linear.predictor )/length(lp)

# binomial response (with weights)
set.seed(773)
lp <- myfun(x)
p <- 1/(1+exp(-lp))
w <- sample(c(10,20,30,40,50),length(p),replace=TRUE)
y <- rbinom(n=ndpts,size=w,p=p)/w    ## y is proportion correct
gmod <- bigssg(y~x,family="binomial",type="cub",nknots=20,weights=w)
crossprod( lp - gmod$linear.predictor )/length(lp)

# poisson response
set.seed(773)
lp <- myfun(x)
mu <- exp(lp)
y <- rpois(n=ndpts,lambda=mu)
gmod <- bigssg(y~x,family="poisson",type="cub",nknots=20)
crossprod( lp - gmod$linear.predictor )/length(lp)

# Gamma response
set.seed(773)
lp <- myfun(x) + 2
mu <- 1/lp
y <- rgamma(n=ndpts,shape=4,scale=mu/4)
gmod <- bigssg(y~x,family="Gamma",type="cub",nknots=20)
1/gmod$dispersion    ## dispersion = 1/shape
crossprod( lp - gmod$linear.predictor )/length(lp)

# inverse gaussian response (not run: requires statmod package)
# require(statmod)
# set.seed(773)
# lp <- myfun(x) + 2
# mu <- sqrt(1/lp)
# y <- rinvgauss(n=ndpts,mean=mu,shape=2)
# gmod <- bigssg(y~x,family="inverse.gaussian",type="cub",nknots=20)
# 1/gmod$dispersion    ## dispersion = 1/shape
# crossprod( lp - gmod$linear.predictor )/length(lp)

# negative binomial response (known dispersion)
set.seed(773)
lp <- myfun(x)
mu <- exp(lp)
y <- rnbinom(n=ndpts,size=.5,mu=mu)
gmod <- bigssg(y~x,family=list("negbin",2),type="cub",nknots=20)
1/gmod$dispersion    ## dispersion = 1/size
crossprod( lp - gmod$linear.predictor )/length(lp)

# negative binomial response (unknown dispersion)
set.seed(773)
lp <- myfun(x)
mu <- exp(lp)
y <- rnbinom(n=ndpts,size=.5,mu=mu)
gmod <- bigssg(y~x,family="negbin",type="cub",nknots=20)
1/gmod$dispersion    ## dispersion = 1/size
crossprod( lp - gmod$linear.predictor )/length(lp)

## Not run: 
##########   EXAMPLE 2 (2-way GSSA)   ##########

# function with two continuous predictors
set.seed(1)
myfun <- function(x1v,x2v){ sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v)) }
ndpts <- 1000
x1v <- runif(ndpts)
x2v <- runif(ndpts)

# binomial response (no weights)
set.seed(773)
lp <- myfun(x1v,x2v)
p <- 1/(1+exp(-lp))
y <- rbinom(n=ndpts,size=1,p=p)      ## y is binary data
gmod <- bigssg(y~x1v*x2v,family="binomial",type=list(x1v="cub",x2v="cub"),nknots=50)
crossprod( lp - gmod$linear.predictor )/length(lp)

# binomial response (with weights)
set.seed(773)
lp <- myfun(x1v,x2v)
p <- 1/(1+exp(-lp))
w <- sample(c(10,20,30,40,50),length(p),replace=TRUE)
y <- rbinom(n=ndpts,size=w,p=p)/w    ## y is proportion correct
gmod <- bigssg(y~x1v*x2v,family="binomial",type=list(x1v="cub",x2v="cub"),nknots=50,weights=w)
crossprod( lp - gmod$linear.predictor )/length(lp)

# poisson response
set.seed(773)
lp <- myfun(x1v,x2v)
mu <- exp(lp)
y <- rpois(n=ndpts,lambda=mu)
gmod <- bigssg(y~x1v*x2v,family="poisson",type=list(x1v="cub",x2v="cub"),nknots=50)
crossprod( lp - gmod$linear.predictor )/length(lp)

# Gamma response
set.seed(773)
lp <- myfun(x1v,x2v)+6
mu <- 1/lp
y <- rgamma(n=ndpts,shape=4,scale=mu/4)
gmod <- bigssg(y~x1v*x2v,family="Gamma",type=list(x1v="cub",x2v="cub"),nknots=50)
1/gmod$dispersion    ## dispersion = 1/shape
crossprod( lp - gmod$linear.predictor )/length(lp)

# inverse gaussian response (not run: requires 'statmod' package)
# require(statmod)
# set.seed(773)
# lp <- myfun(x1v,x2v)+6
# mu <- sqrt(1/lp)
# y <- rinvgauss(n=ndpts,mean=mu,shape=2)
# gmod <- bigssg(y~x1v*x2v,family="inverse.gaussian",type=list(x1v="cub",x2v="cub"),nknots=50)
# 1/gmod$dispersion    ## dispersion = 1/shape
# crossprod( lp - gmod$linear.predictor )/length(lp)

# negative binomial response (known dispersion)
set.seed(773)
lp <- myfun(x1v,x2v)
mu <- exp(lp)
y <- rnbinom(n=ndpts,size=.5,mu=mu)
gmod <- bigssg(y~x1v*x2v,family=list("negbin",2),type=list(x1v="cub",x2v="cub"),nknots=50)
1/gmod$dispersion    ## dispersion = 1/size
crossprod( lp - gmod$linear.predictor )/length(lp)

# negative binomial response (unknown dispersion)
set.seed(773)
lp <- myfun(x1v,x2v)
mu <- exp(lp)
y <- rnbinom(n=ndpts,size=.5,mu=mu)
gmod <- bigssg(y~x1v*x2v,family="negbin",type=list(x1v="cub",x2v="cub"),nknots=50)
1/gmod$dispersion    ## dispersion = 1/size
crossprod( lp - gmod$linear.predictor )/length(lp)

## End(Not run)
Given a real-valued response vector $\mathbf{y} = (y_1, \ldots, y_n)'$, a semiparametric regression model has the form
$$y_i = \eta(\mathbf{x}_i) + \mathbf{z}_i'\boldsymbol{\beta} + e_i$$
where $y_i$ is the $i$-th observation's response, $\mathbf{x}_i$ is the $i$-th observation's nonparametric predictor vector, $\eta$ is an unknown smooth function relating the response and nonparametric predictors, $\mathbf{z}_i$ is the $i$-th observation's parametric predictor vector, $\boldsymbol{\beta}$ is the corresponding vector of parametric coefficients, and $e_i \sim \mathrm{N}(0, \sigma^2)$ is iid Gaussian error. The function can fit both additive and interactive non/parametric effects, and allows for 2-way and 3-way interactions between nonparametric and parametric effects (see Details and Examples).
bigssp(formula, data=NULL, type=NULL, nknots=NULL, rparm=NA,
       lambdas=NULL, skip.iter=TRUE, se.fit=FALSE, rseed=1234,
       gcvopts=NULL, knotcheck=TRUE, thetas=NULL, weights=NULL,
       random=NULL, remlalg=c("FS","NR","EM","none"), remliter=500,
       remltol=10^-4, remltau=NULL)
formula: An object of class "formula": a symbolic description of the model to be fitted (see Details and Examples for more information).

data: Optional data frame, list, or environment containing the variables in formula.

type: List of smoothing spline types for predictors in formula (see Details). Options include "cub", "cub0", "per", "tps", "ord", "nom", and "prm" (parametric effect).

nknots: Two possible options: (a) scalar giving the total number of random knots to sample, or (b) vector indexing which rows of the data to use as knots.

rparm: List of rounding parameters for each predictor. See Details.

lambdas: Vector of global smoothing parameters to try. Default estimates the smoothing parameter that minimizes the GCV score.

skip.iter: Logical indicating whether to skip the iterative smoothing parameter update. Using skip.iter=FALSE should provide a more precise fit, but can be time consuming (see Computational Details).

se.fit: Logical indicating if the standard errors of the fitted values should be estimated.

rseed: Random seed for knot sampling. Input is ignored if nknots is an input vector of knot indices.

gcvopts: Control parameters for optimization. List with 3 elements: (a) maxit: maximum number of algorithm iterations, (b) gcvtol: convergence tolerance for the iterative GCV update, and (c) alpha: tuning parameter for the GCV score.

knotcheck: If TRUE, only unique knots are used (for stability).

thetas: List of initial smoothing parameters for each predictor subspace. See Details.

weights: Vector of positive weights for fitting (default is a vector of ones).

random: Adds random effects to the model (see Random Effects section).

remlalg: REML algorithm for estimating variance components (see Random Effects section). Input is ignored if random=NULL.

remliter: Maximum number of iterations for REML estimation of variance components. Input is ignored if random=NULL.

remltol: Convergence tolerance for REML estimation of variance components. Input is ignored if random=NULL.

remltau: Initial estimate of variance parameters for REML estimation of variance components. Input is ignored if random=NULL.
The formula syntax is similar to that used in lm and many other R regression functions. Use y~x to predict the response y from the predictor x. Use y~x1+x2 to fit an additive model of the predictors x1 and x2, and use y~x1*x2 to fit an interaction model. The syntax y~x1*x2 includes the interaction and main effects, whereas the syntax y~x1:x2 only includes the interaction. See Computational Details for specifics about how non/parametric effects are estimated.
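A small sketch contrasting the two interaction syntaxes (simulated data; note that y~x1:x2 is supported here, unlike in bigssa):

# sketch: '*' keeps main effects, ':' fits the interaction alone
set.seed(9)
x1 <- runif(500)
x2 <- runif(500)
y <- x1*cos(pi*x2) + rnorm(500)
full <- bigssp(y~x1*x2, type=list(x1="cub",x2="cub"), nknots=30)     # mains + interaction
onlyint <- bigssp(y~x1:x2, type=list(x1="cub",x2="cub"), nknots=30)  # interaction only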
See bigspline for definitions of type="cub", type="cub0", and type="per" splines, which can handle one-dimensional predictors. See the Appendix of Helwig and Ma (2015) for information about type="tps" and type="nom" splines. Note that type="tps" can handle one-, two-, or three-dimensional predictors. I recommend using type="cub" if the predictor scores have no extreme outliers; when outliers are present, type="tps" may produce a better result.
Using the rounding parameter input rparm can greatly speed up and stabilize the fitting for large samples. For typical cases, I recommend using rparm=0.01 for cubic and periodic splines, but smaller rounding parameters may be needed for particularly jagged functions. For thin-plate splines, the data are NOT transformed to the interval [0,1] before fitting, so the rounding parameter should be on the raw data scale. Also, for type="tps" you can enter one rounding parameter for each predictor dimension. Use rparm=1 for ordinal and nominal splines.
fitted.values: Vector of fitted values corresponding to the original data points in xvars (if rparm=NA) or the rounded data points in xunique (if rparm is used).

se.fit: Vector of standard errors of fitted.values (if input se.fit=TRUE).

yvar: Response vector.

xvars: List of predictors.

type: Type of smoothing spline that was used for each predictor.

yunique: Mean of yvar for each row of xunique (if rparm is used).

xunique: Unique rows of xvars after applying the rounding parameters (if rparm is used).

sigma: Estimated error standard deviation, i.e., $\hat{\sigma}$.

ndf: Data frame with two elements: n is the number of data points used to fit the model, and df is the effective degrees of freedom of the fit model.

info: Model fit information: vector containing the GCV, multiple R-squared, AIC, and BIC of the fit model (assuming Gaussian error).

modelspec: List containing specifics of the fit model (needed for prediction).

converged: Convergence status: TRUE if the iterative update converged, FALSE if it failed to converge, and NA if skip.iter=TRUE.

tnames: Names of the terms in the model.

random: Random effects formula (same as input).

tau: Variance parameters such that sigma^2*tau are the variance components of the random effects.

blup: Best linear unbiased predictors of the random effects (if random is not NULL).

call: Called model in input formula.
Cubic and cubic periodic splines transform the predictor to the interval [0,1] before fitting.

When using rounding parameters, output fitted.values corresponds to the unique rounded predictor scores in output xunique. Use the predict.bigssp function to get fitted values for the full yvar vector.
To estimate $\eta$ I minimize the penalized least-squares functional
$$\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \eta(\mathbf{x}_i)\right)^2 + \lambda J(\eta)$$
where $J(\cdot)$ is a nonnegative penalty functional quantifying the roughness of $\eta$ and $\lambda > 0$ is a smoothing parameter controlling the trade-off between fitting and smoothing the data. Note that for $p$ nonparametric predictors, there are additional $\theta_k$ smoothing parameters embedded in $J$.

The penalized least-squares functional can be rewritten as
$$\|\mathbf{y} - \mathbf{K}\mathbf{d} - \mathbf{J}_{\theta}\mathbf{c}\|^2 + n\lambda\, \mathbf{c}'\mathbf{Q}_{\theta}\mathbf{c}$$
where $\mathbf{K}$ is the parametric space basis function matrix, $\mathbf{J}_{\theta} = \sum_{k} \theta_k \mathbf{J}_k$ with $\mathbf{J}_k$ denoting the $k$-th contrast space basis function matrix, $\mathbf{Q}_{\theta} = \sum_{k} \theta_k \mathbf{Q}_k$ with $\mathbf{Q}_k$ denoting the $k$-th penalty matrix, and $\mathbf{d}$ and $\mathbf{c}$ are the unknown basis function coefficients. The optimal smoothing parameters $(\lambda, \boldsymbol{\theta})$ are chosen by minimizing the GCV score (see bigspline).
Note that this function uses the classic smoothing spline parameterization (see Gu, 2013), so there is more than one smoothing parameter per predictor (if interactions are included in the model). To evaluate the GCV score, this function uses the improved (scalable) SSA algorithm discussed in Helwig (2013) and Helwig and Ma (2015). For multiple predictors, initial values for the $\theta_k$ parameters are estimated using Algorithm 3.2 described in Gu and Wahba (1991).
Default use of this function (skip.iter=TRUE) fixes the $\theta_k$ parameters after the smart start, and then finds the global smoothing parameter $\lambda$ (among the input lambdas) that minimizes the GCV score. This approach typically produces a solution very similar to the more optimal solution using skip.iter=FALSE.

Setting skip.iter=FALSE uses the same smart starting algorithm as setting skip.iter=TRUE. However, instead of fixing the $\theta_k$ parameters after the smart start, using skip.iter=FALSE iterates between estimating the optimal $\lambda$ and the optimal $\theta_k$ parameters. The R function nlm is used to minimize the GCV score with respect to the $\theta_k$ parameters, which can be time consuming for models with many predictors.
The input random adds random effects to the model assuming a variance components structure. Both nested and crossed random effects are supported. In all cases, the random effects are assumed to be independent zero-mean Gaussian variables with the variance depending on group membership.

Random effects are distinguished by vertical bars ("|"), which separate expressions for design matrices (left) from group factors (right). For example, the syntax ~1|group includes a random intercept for each level of group, whereas the syntax ~1+x|group includes both a random intercept and a random slope for each level of group. For crossed random effects, parentheses are needed to distinguish different terms, e.g., ~(1|group1)+(1|group2) includes a random intercept for each level of group1 and a random intercept for each level of group2, where both group1 and group2 are factors. For nested random effects, the syntax ~group|subject can be used, where both group and subject are factors such that the levels of subject are nested within those of group.
The input remlalg determines the REML algorithm used to estimate the variance components. Setting remlalg="FS" uses a Fisher Scoring algorithm (default). Setting remlalg="NR" uses a Newton-Raphson algorithm. Setting remlalg="EM" uses an Expectation Maximization algorithm. Use remlalg="none" to fit a model with known variance components (entered through remltau).

The input remliter sets the maximum number of iterations for the REML estimation. The input remltol sets the convergence tolerance for the REML estimation, which is determined via relative change in the REML log-likelihood. The input remltau sets the initial estimates of the variance parameters; the default is remltau = rep(1,ntau), where ntau is the number of variance components.
The spline is estimated using penalized least-squares, which does not require the Gaussian error assumption. However, the spline inference information (e.g., standard errors and fit information) requires the Gaussian error assumption.
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Gu, C. and Wahba, G. (1991). Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM Journal on Scientific and Statistical Computing, 12, 383-398.
Helwig, N. E. (2013). Fast and stable smoothing spline analysis of variance models for large samples with applications to electroencephalography data analysis. Unpublished doctoral dissertation. University of Illinois at Urbana-Champaign.
Helwig, N. E. (2016). Efficient estimation of variance components in nonparametric mixed-effects models with large samples. Statistics and Computing, 26, 1319-1336.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
```r
########## EXAMPLE ##########

# function with four continuous predictors
set.seed(773)
myfun <- function(x1v,x2v,x3v,x4v){
  sin(2*pi*x1v) + log(x2v+.1) + x3v*cos(pi*(x4v))
}
x1v <- runif(500)
x2v <- runif(500)
x3v <- runif(500)
x4v <- runif(500)
y <- myfun(x1v,x2v,x3v,x4v) + rnorm(500)

# fit cubic spline model with x3v*x4v interaction and x3v as "cub"
# (includes x3v and x4v main effects)
cubmod <- bigssp(y~x1v+x2v+x3v*x4v,
                 type=list(x1v="cub",x2v="cub",x3v="cub",x4v="cub"),nknots=50)
crossprod( myfun(x1v,x2v,x3v,x4v) - cubmod$fitted.values )/500

# fit cubic spline model with x3v*x4v interaction and x3v as "cub0"
# (includes x3v and x4v main effects)
cubmod <- bigssp(y~x1v+x2v+x3v*x4v,
                 type=list(x1v="cub",x2v="cub",x3v="cub0",x4v="cub"),nknots=50)
crossprod( myfun(x1v,x2v,x3v,x4v) - cubmod$fitted.values )/500

# fit model with x3v*x4v interaction treating x3v as parametric effect
# (includes x3v and x4v main effects)
cubmod <- bigssp(y~x1v+x2v+x3v*x4v,
                 type=list(x1v="cub",x2v="cub",x3v="prm",x4v="cub"),nknots=50)
crossprod( myfun(x1v,x2v,x3v,x4v) - cubmod$fitted.values )/500

# fit cubic spline model with x3v:x4v interaction and x3v as "cub"
# (excludes x3v and x4v main effects)
cubmod <- bigssp(y~x1v+x2v+x3v:x4v,
                 type=list(x1v="cub",x2v="cub",x3v="cub",x4v="cub"),nknots=50)
crossprod( myfun(x1v,x2v,x3v,x4v) - cubmod$fitted.values )/500

# fit cubic spline model with x3v:x4v interaction and x3v as "cub0"
# (excludes x3v and x4v main effects)
cubmod <- bigssp(y~x1v+x2v+x3v:x4v,
                 type=list(x1v="cub",x2v="cub",x3v="cub0",x4v="cub"),nknots=50)
crossprod( myfun(x1v,x2v,x3v,x4v) - cubmod$fitted.values )/500

# fit model with x3v:x4v interaction treating x3v as parametric effect
# (excludes x3v and x4v main effects)
cubmod <- bigssp(y~x1v+x2v+x3v:x4v,
                 type=list(x1v="cub",x2v="cub",x3v="prm",x4v="cub"),nknots=50)
crossprod( myfun(x1v,x2v,x3v,x4v) - cubmod$fitted.values )/500
```
Given a real-valued response vector $\mathbf{y} = \{y_i\}_{n \times 1}$, a thin-plate spline model has the form
$$y_i = \eta(\mathbf{x}_i) + e_i$$
where $y_i$ is the $i$-th observation's response, $\mathbf{x}_i$ is the $i$-th observation's nonparametric predictor vector, $\eta$ is an unknown smooth function relating the response and predictor, and $e_i \sim \mathrm{N}(0, \sigma^2)$ is iid Gaussian error. The function only fits interaction models (a single smooth function of the full predictor vector).
```r
bigtps(x, y, nknots=NULL, nvec=NULL, rparm=NA,
       alpha=1, lambdas=NULL, se.fit=FALSE,
       rseed=1234, knotcheck=TRUE)
```
| Argument | Description |
|---|---|
| `x` | Predictor vector or matrix with three or fewer columns. |
| `y` | Response vector. Must be the same length as `x` has rows. |
| `nknots` | Two possible options: (a) scalar giving the total number of random knots to sample, or (b) vector indexing which rows of `x` to use as knots. |
| `nvec` | Number of eigenvectors (and eigenvalues) to use in the approximation. Must be less than or equal to the number of knots and greater than `ncol(x)+1`. |
| `rparm` | Rounding parameter(s) for `x`. Use `rparm=NA` to fit the unrounded solution. See Details. |
| `alpha` | Manual tuning parameter for the GCV score. Values larger than the default `alpha=1` yield smoother fits. |
| `lambdas` | Vector of global smoothing parameters to try. Default estimates the smoothing parameter that minimizes the GCV score. |
| `se.fit` | Logical indicating if the standard errors of the fitted values should be estimated. |
| `rseed` | Random seed for knot sampling. Input is ignored if `nknots` is a vector of knot indices. |
| `knotcheck` | If `TRUE`, only unique knots are used (for stability). |
To estimate $\eta$, I minimize the penalized least-squares functional
$$\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \eta(\mathbf{x}_i)\right)^2 + \lambda J(\eta)$$
where $J(\eta)$ is the thin-plate penalty (see Helwig and Ma, 2015) and $\lambda \geq 0$ is a smoothing parameter that controls the trade-off between fitting and smoothing the data. Default use of the function estimates $\lambda$ by minimizing the GCV score (see `bigspline`).
Using the rounding parameter input `rparm` can greatly speed up and stabilize the fitting for large samples. When `rparm` is used, the spline is fit to a set of unique data points after rounding; the unique points are determined using the efficient algorithm described in Helwig (2013). The rounding parameter should be on the raw data scale.
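For example, a sketch along these lines (simulated data) shows the effect of rounding:

```r
set.seed(1)
x <- runif(10^5)
y <- sin(2*pi*x) + rnorm(10^5)

# rounding x to the nearest 0.01 means the spline is fit to at most
# 101 unique points rather than 10^5 raw points
roundmod <- bigtps(x, y, nknots = 30, rparm = 0.01)
length(roundmod$xunique)
```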
| Component | Description |
|---|---|
| `fitted.values` | Vector of fitted values corresponding to the original data points in `x`. |
| `se.fit` | Vector of standard errors of `fitted.values` (if input `se.fit=TRUE`). |
| `x` | Predictor vector (same as input). |
| `y` | Response vector (same as input). |
| `xunique` | Unique elements of `x` after applying the rounding parameter (if `rparm` is used). |
| `yunique` | Mean of `y` for each element of `xunique` (if `rparm` is used). |
| `funique` | Vector giving the frequency of each element of `xunique` (if `rparm` is used). |
| `sigma` | Estimated error standard deviation. |
| `ndf` | Data frame with two elements: the sample size `n` and the effective degrees of freedom `df` of the fit model. |
| `info` | Model fit information: vector containing the GCV, multiple R-squared, AIC, and BIC of the fit model (assuming Gaussian error). |
| `myknots` | Spline knots used for the fit. |
| `nvec` | Number of eigenvectors used for the solution. |
| `rparm` | Rounding parameter for `x` (same as input). |
| `lambda` | Optimal smoothing parameter. |
| `coef` | Spline basis function coefficients. |
| `coef.csqrt` | Matrix square-root of the covariance matrix of `coef`. |
Input `nvec` must be greater than `ncol(x)+1`.

When using rounding parameters, the output `fitted.values` corresponds to the unique rounded predictor scores in the output `xunique`. Use the `predict.bigtps` function to get fitted values for the full `y` vector.
According to thin-plate spline theory, the function $\eta$ can be approximated as
$$\eta(\mathbf{x}) = \sum_{k=1}^{M} d_k \phi_k(\mathbf{x}) + \sum_{h=1}^{q} c_h\, \xi(\mathbf{x}, \mathbf{x}_h^*)$$
where the $\phi_k$ are linear functions, $\xi$ is the thin-plate spline semi-kernel, $\{\mathbf{x}_h^*\}_{h=1}^{q}$ are the knots, and the $c_h$ coefficients are constrained to be orthogonal to the $\phi_k$ functions.

This implies that the penalized least-squares functional can be rewritten as
$$\|\mathbf{y} - \mathbf{K}\mathbf{d} - \mathbf{J}\mathbf{c}\|^2 + n\lambda\, \mathbf{c}'\mathbf{Q}\mathbf{c}$$
where $\mathbf{K} = \{\phi_k(\mathbf{x}_i)\}_{n \times M}$ is the null space basis function matrix, $\mathbf{J} = \{\xi(\mathbf{x}_i, \mathbf{x}_h^*)\}_{n \times q}$ is the contrast space basis function matrix, $\mathbf{Q} = \{\xi(\mathbf{x}_g^*, \mathbf{x}_h^*)\}_{q \times q}$ is the penalty matrix, and $\mathbf{d}$ and $\mathbf{c}$ are the unknown basis function coefficients, where the $\mathbf{c}$ coefficients are constrained to be orthogonal to the $\phi_k$ functions.

See Helwig and Ma (2015) for specifics about how the constrained estimation is handled.
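To make the matrix form concrete, here is an illustrative sketch (not the package's internal code) for a one-dimensional spline: it takes the semi-kernel as $\xi(x, z) = |x - z|^3$ with constants absorbed into $\lambda$, and it ignores the orthogonality constraint on $\mathbf{c}$ for simplicity:

```r
set.seed(1)
n <- 100
x <- sort(runif(n))
y <- sin(2*pi*x) + rnorm(n, sd = 0.3)
knots <- seq(0.05, 0.95, length.out = 10)

K <- cbind(1, x)                       # null space basis (linear functions)
J <- abs(outer(x, knots, "-"))^3       # contrast space basis
Q <- abs(outer(knots, knots, "-"))^3   # penalty matrix

# minimize ||y - K d - J c||^2 + n*lambda*c'Qc via the normal equations
lambda <- 1e-4
X <- cbind(K, J)
P <- matrix(0, ncol(X), ncol(X))
P[-(1:2), -(1:2)] <- n * lambda * Q
coefs <- solve(crossprod(X) + P, crossprod(X, y))
yhat <- X %*% coefs                    # penalized fitted values
```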
The spline is estimated using penalized least-squares, which does not require the Gaussian error assumption. However, the spline inference information (e.g., standard errors and fit information) requires the Gaussian error assumption.
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
```r
########## EXAMPLE 1 ##########

# define relatively smooth function
set.seed(773)
myfun <- function(x){ sin(2*pi*x) }
x <- runif(500)
y <- myfun(x) + rnorm(500)

# fit thin-plate spline (default 1 dim: 30 knots)
tpsmod <- bigtps(x,y)
tpsmod

########## EXAMPLE 2 ##########

# define more jagged function
set.seed(773)
myfun <- function(x){ 2*x+cos(2*pi*x) }
x <- runif(500)*4
y <- myfun(x) + rnorm(500)

# try different numbers of knots
r1mod <- bigtps(x,y,nknots=20,rparm=0.01)
crossprod( myfun(r1mod$xunique) - r1mod$fitted )/length(r1mod$fitted)
r2mod <- bigtps(x,y,nknots=35,rparm=0.01)
crossprod( myfun(r2mod$xunique) - r2mod$fitted )/length(r2mod$fitted)
r3mod <- bigtps(x,y,nknots=50,rparm=0.01)
crossprod( myfun(r3mod$xunique) - r3mod$fitted )/length(r3mod$fitted)

########## EXAMPLE 3 ##########

# function with two continuous predictors
set.seed(773)
myfun <- function(x1v,x2v){
  sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v))
}
x <- cbind(runif(500),runif(500))
y <- myfun(x[,1],x[,2]) + rnorm(500)

# fit thin-plate spline with 50 knots (default 2 dim: 100 knots)
tpsmod <- bigtps(x,y,nknots=50)
tpsmod
crossprod( myfun(x[,1],x[,2]) - tpsmod$fitted.values )/500

########## EXAMPLE 4 ##########

# function with three continuous predictors
set.seed(773)
myfun <- function(x1v,x2v,x3v){
  sin(2*pi*x1v) + log(x2v+.1) + cos(pi*x3v)
}
x <- cbind(runif(500),runif(500),runif(500))
y <- myfun(x[,1],x[,2],x[,3]) + rnorm(500)

# fit thin-plate spline with 50 knots (default 3 dim: 200 knots)
tpsmod <- bigtps(x,y,nknots=50)
tpsmod
crossprod( myfun(x[,1],x[,2],x[,3]) - tpsmod$fitted.values )/500
```
Breaks the predictor domain into a user-specified number of disjoint subregions, and randomly samples a user-specified number of observations from each (nonempty) subregion.
```r
binsamp(x, xrng=NULL, nmbin=11, nsamp=1, alg=c("new","old"))
```
| Argument | Description |
|---|---|
| `x` | Matrix of predictors. |
| `xrng` | Optional matrix of predictor ranges; defaults to the observed range of each column of `x`. |
| `nmbin` | Vector giving the number of marginal bins to use for each predictor. Inputting a scalar uses the same number of bins for each predictor. |
| `nsamp` | Scalar giving the number of observations to sample from each (nonempty) subregion. |
| `alg` | Bin-sampling algorithm. The new algorithm forms an equidistant grid, whereas the old algorithm forms an approximately equidistant grid. The new algorithm is the default for versions 1.0-1 and later. |
Returns an index vector indicating the rows of `x` that were bin-sampled.
If the $j$-th predictor is nominal with $g$ levels, the function requires `nmbin[j] = g` and the levels of that predictor to be coded as the integers $1, \ldots, g$.
The number of returned knots will depend on the distribution of the covariate scores. The maximum number of possible bin-sampled knots is `nsamp*prod(nmbin)`, but fewer knots will be returned if one (or more) of the bins is empty (i.e., if there is no data in one or more bins).
Nathaniel E. Helwig <[email protected]>
```r
########## EXAMPLE 1 ##########

# create 2-dimensional predictor (both continuous)
set.seed(123)
xmat <- cbind(runif(10^6),runif(10^6))

# Default use:
#   10 marginal bins for each predictor
#   sample 1 observation from each subregion
xind <- binsamp(xmat)

# get the corresponding knots
bknots <- xmat[xind,]

# compare to randomly-sampled knots
rknots <- xmat[sample(1:(10^6),100),]
par(mfrow=c(1,2))
plot(bknots,main="bin-sampled")
plot(rknots,main="randomly sampled")

########## EXAMPLE 2 ##########

# create 2-dimensional predictor (continuous and nominal)
set.seed(123)
xmat <- cbind(runif(10^6),sample(1:3,10^6,replace=TRUE))

# use 10 marginal bins for x1 and 3 marginal bins for x2
# and sample one observation from each subregion
xind <- binsamp(xmat,nmbin=c(10,3))

# get the corresponding knots
bknots <- xmat[xind,]

# compare to randomly-sampled knots
rknots <- xmat[sample(1:(10^6),30),]
par(mfrow=c(1,2))
plot(bknots,main="bin-sampled")
plot(rknots,main="randomly sampled")

########## EXAMPLE 3 ##########

# create 3-dimensional predictor (continuous, continuous, nominal)
set.seed(123)
xmat <- cbind(runif(10^6),runif(10^6),sample(1:2,10^6,replace=TRUE))

# use 10 marginal bins for x1 and x2, and 2 marginal bins for x3
# and sample one observation from each subregion
xind <- binsamp(xmat,nmbin=c(10,10,2))

# get the corresponding knots
bknots <- xmat[xind,]

# compare to randomly-sampled knots
rknots <- xmat[sample(1:(10^6),200),]
par(mfrow=c(2,2))
plot(bknots[1:100,1:2],main="bin-sampled, x3=1")
plot(bknots[101:200,1:2],main="bin-sampled, x3=2")
plot(rknots[rknots[,3]==1,1:2],main="randomly sampled, x3=1")
plot(rknots[rknots[,3]==2,1:2],main="randomly sampled, x3=2")
```
This is a modification of the R function `image` that adds a colorbar to the margin.
```r
imagebar(x, y, z, xlim=NULL, ylim=NULL, zlim=NULL,
         zlab=NULL, zcex.axis=NULL, zcex.lab=NULL,
         zaxis.at=NULL, zaxis.labels=TRUE,
         col=NULL, ncolor=21, drawbar=TRUE, zline=2,
         pltimage=c(.2,.8,.2,.8), pltbar=c(.82,.85,.2,.8), ...)
```
| Argument | Description |
|---|---|
| `x`, `y` | Locations of grid lines at which the values in `z` are measured. |
| `z` | A matrix containing the values to be plotted (`NA`s are allowed). |
| `xlim`, `ylim` | Ranges for the plotted `x` and `y` values. |
| `zlim` | The minimum and maximum `z` values for which colors should be plotted. |
| `zlab` | Label for the colorbar. |
| `zcex.axis` | The magnification to be used for the z-axis annotation (colorbar scale). |
| `zcex.lab` | The magnification to be used for the z-axis label (`zlab`). |
| `zaxis.at` | The points at which tick-marks are to be drawn for the colorbar. Points outside of the range of `zlim` will not be drawn. |
| `zaxis.labels` | This can either be a logical value specifying whether (numerical) annotations are to be made at the tickmarks, or a character or expression vector of labels to be placed at the tickpoints. |
| `col` | Color scheme to use. |
| `ncolor` | The number of colors to use in the color scheme. |
| `drawbar` | Logical indicating if the colorbar should be drawn. |
| `zline` | Number of lines into the margin at which the z-axis line will be drawn (see the `line` argument of `mtext`). |
| `pltimage` | A vector of the form `c(x1, x2, y1, y2)` giving the coordinates of the image region as fractions of the current figure region (see the `plt` argument of `par`). |
| `pltbar` | A vector of the form `c(x1, x2, y1, y2)` giving the coordinates of the colorbar region as fractions of the current figure region (see the `plt` argument of `par`). |
| `...` | Additional arguments to be passed to `image`. |
Produces an `image` plot with a colorbar.
Nathaniel E. Helwig <[email protected]>
```r
########## EXAMPLE 1 ##########

myfun <- function(x){
  2*sin(sqrt(x[,1]^2+x[,2]^2+.1))/sqrt(x[,1]^2+x[,2]^2+.1)
}
x <- expand.grid(seq(-8,8,l=100),seq(-8,8,l=100))
imagebar(seq(-8,8,l=100),seq(-8,8,l=100),matrix(myfun(x),100,100),
         xlab=expression(italic(x)[1]),ylab=expression(italic(x)[2]),
         zlab=expression(hat(italic(y))),zlim=c(-0.5,2),zaxis.at=seq(-0.5,2,by=0.5))

########## EXAMPLE 2 ##########

myfun <- function(x1v,x2v){
  sin(2*pi*x1v) + 2*sin(sqrt(x2v^2+.1))/sqrt(x2v^2+.1)
}
x <- expand.grid(x1v=seq(0,1,l=100),x2v=seq(-8,8,l=100))
imagebar(seq(0,1,l=100),seq(-8,8,l=100),matrix(myfun(x$x1v,x$x2v),100,100),
         col=c("red","orange","yellow","white"),xlab="x1v",ylab="x2v",
         zlab=expression(hat(italic(y))),zlim=c(-1.5,3),zaxis.at=seq(-1.5,3,by=0.5))

########## EXAMPLE 3 ##########

myfun <- function(x1v,x2v){
  sin(3*pi*x1v) + sin(2*pi*x2v) + 3*cos(pi*(x1v-x2v))
}
x <- expand.grid(x1v=seq(-1,1,l=100),x2v=seq(-1,1,l=100))
imagebar(seq(-1,1,l=100),seq(-1,1,l=100),matrix(myfun(x$x1v,x$x2v),100,100),
         col=c("blue","green","light green","yellow"),xlab="x1v",ylab="x2v",
         zlab=expression(hat(italic(y))),zlim=c(-5,5),zaxis.at=c(-5,0,5),
         zaxis.labels=c("low","med","high"))
```
This function creates a list containing the necessary information to fit a smoothing spline ANOVA model (see `bigssa`).
```r
makessa(formula, data=NULL, type=NULL, nknots=NULL, rparm=NA,
        lambdas=NULL, skip.iter=TRUE, se.fit=FALSE, rseed=1234,
        gcvopts=NULL, knotcheck=TRUE, gammas=NULL, weights=NULL,
        random=NULL, remlalg=c("FS","NR","EM","none"), remliter=500,
        remltol=10^-4, remltau=NULL)
```
| Argument | Description |
|---|---|
| `formula` | An object of class `"formula"`: a symbolic description of the model to be fitted (see `bigssa` for details). |
| `data` | Optional data frame, list, or environment containing the variables in `formula`. |
| `type` | List of smoothing spline types for predictors in `formula` (see `bigssa` for the supported types). |
| `nknots` | Two possible options: (a) scalar giving the total number of random knots to sample, or (b) vector indexing which rows of the predictor data to use as knots. |
| `rparm` | List of rounding parameters for each predictor. See Details. |
| `lambdas` | Vector of global smoothing parameters to try (see `bigssa` for the default). |
| `skip.iter` | Logical indicating whether to skip the iterative smoothing parameter update. Using `skip.iter=FALSE` should provide a more optimal solution, but can be time consuming (see `bigssa`). |
| `se.fit` | Logical indicating if the standard errors of the fitted values should be estimated. |
| `rseed` | Random seed for knot sampling. Input is ignored if `nknots` is a vector of knot indices. |
| `gcvopts` | Control parameters for the optimization; a list with 3 elements (see `bigssa`). |
| `knotcheck` | If `TRUE`, only unique knots are used (for stability). |
| `gammas` | List of initial smoothing parameters for each predictor. See Details. |
| `weights` | Vector of positive weights for fitting (default is a vector of ones). |
| `random` | Adds random effects to the model (see the Random Effects section). |
| `remlalg` | REML algorithm for estimating the variance components (see the Random Effects section). Input is ignored if `random=NULL`. |
| `remliter` | Maximum number of iterations for REML estimation of the variance components. Input is ignored if `random=NULL`. |
| `remltol` | Convergence tolerance for REML estimation of the variance components. Input is ignored if `random=NULL`. |
| `remltau` | Initial estimate of the variance parameters for REML estimation of the variance components. Input is ignored if `random=NULL`. |
See `bigssa` and the example below for more details.

An object of class `"makessa"`, which can be input to `bigssa`.

When inputting a `"makessa"` class object into `bigssa`, the formula input to `bigssa` must be a nested version of the original formula input to `makessa`. In other words, you cannot add any new effects after a `"makessa"` object has been created, but you can drop (remove) effects from the model.
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Helwig, N. E. (2013). Fast and stable smoothing spline analysis of variance models for large samples with applications to electroencephalography data analysis. Unpublished doctoral dissertation. University of Illinois at Urbana-Champaign.
Helwig, N. E. (2016). Efficient estimation of variance components in nonparametric mixed-effects models with large samples. Statistics and Computing, 26, 1319-1336.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
```r
########## EXAMPLE ##########

# function with two continuous predictors
set.seed(773)
myfun <- function(x1v,x2v){
  sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v))
}
x1v <- runif(500)
x2v <- runif(500)
y <- myfun(x1v,x2v) + rnorm(500)

# fit 2 possible models (create information 2 separate times)
system.time({
  intmod <- bigssa(y~x1v*x2v,type=list(x1v="cub",x2v="cub"),nknots=50)
  addmod <- bigssa(y~x1v+x2v,type=list(x1v="cub",x2v="cub"),nknots=50)
})

# fit 2 possible models (create information 1 time)
system.time({
  makemod <- makessa(y~x1v*x2v,type=list(x1v="cub",x2v="cub"),nknots=50)
  int2mod <- bigssa(y~x1v*x2v,makemod)
  add2mod <- bigssa(y~x1v+x2v,makemod)
})

# check difference (no difference)
crossprod( intmod$fitted.values - int2mod$fitted.values )
crossprod( addmod$fitted.values - add2mod$fitted.values )
```
This function creates a list containing the necessary information to fit a generalized smoothing spline ANOVA model (see `bigssg`).
```r
makessg(formula, family, data, type=NULL, nknots=NULL, rparm=NA,
        lambdas=NULL, skip.iter=TRUE, se.lp=FALSE, rseed=1234,
        gcvopts=NULL, knotcheck=TRUE, gammas=NULL, weights=NULL,
        gcvtype=c("acv","gacv","gacv.old"))
```
| Argument | Description |
|---|---|
| `formula` | An object of class `"formula"`: a symbolic description of the model to be fitted (see `bigssg` for details). |
| `family` | Distribution for the response. One of five options: `"binomial"`, `"poisson"`, `"Gamma"`, `"inverse.gaussian"`, or `"negbin"`. |
| `data` | Optional data frame, list, or environment containing the variables in `formula`. |
| `type` | List of smoothing spline types for predictors in `formula` (see `bigssg` for the supported types). |
| `nknots` | Two possible options: (a) scalar giving the total number of random knots to sample, or (b) vector indexing which rows of the predictor data to use as knots. |
| `rparm` | List of rounding parameters for each predictor. See Details. |
| `lambdas` | Vector of global smoothing parameters to try (see `bigssg` for the default). |
| `skip.iter` | Logical indicating whether to skip the iterative smoothing parameter update. Using `skip.iter=FALSE` should provide a more optimal solution, but can be time consuming (see `bigssg`). |
| `se.lp` | Logical indicating if the standard errors of the linear predictors should be estimated. |
| `rseed` | Random seed for knot sampling. Input is ignored if `nknots` is a vector of knot indices. |
| `gcvopts` | Control parameters for the optimization; a list with 6 elements (see `bigssg`). |
| `knotcheck` | If `TRUE`, only unique knots are used (for stability). |
| `gammas` | List of initial smoothing parameters for each predictor. See Details. |
| `weights` | Vector of positive weights for fitting (default is a vector of ones). |
| `gcvtype` | Cross-validation criterion for selecting the smoothing parameters (see `bigssg`). |
See `bigssg` and the example below for more details.

An object of class `"makessg"`, which can be input to `bigssg`.

When inputting a `"makessg"` class object into `bigssg`, the formula input to `bigssg` must be a nested version of the original formula input to `makessg`. In other words, you cannot add any new effects after a `"makessg"` object has been created, but you can drop (remove) effects from the model.
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Gu, C. and Xiang, D. (2001). Cross-validating non-Gaussian data: Generalized approximate cross-validation revisited. Journal of Computational and Graphical Statistics, 10, 581-591.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
```r
########## EXAMPLE ##########

# function with two continuous predictors
set.seed(1)
myfun <- function(x1v,x2v){
  sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v))
}
ndpts <- 1000
x1v <- runif(ndpts)
x2v <- runif(ndpts)

# binomial response (no weights)
set.seed(773)
lp <- myfun(x1v,x2v)
p <- 1/(1+exp(-lp))
y <- rbinom(n=ndpts,size=1,p=p)

# fit 2 possible models (create information 2 separate times)
system.time({
  intmod <- bigssg(y~x1v*x2v,family="binomial",type=list(x1v="cub",x2v="cub"),nknots=50)
  addmod <- bigssg(y~x1v+x2v,family="binomial",type=list(x1v="cub",x2v="cub"),nknots=50)
})

# fit 2 possible models (create information 1 time)
system.time({
  makemod <- makessg(y~x1v*x2v,family="binomial",type=list(x1v="cub",x2v="cub"),nknots=50)
  int2mod <- bigssg(y~x1v*x2v,data=makemod)
  add2mod <- bigssg(y~x1v+x2v,data=makemod)
})

# check difference (no difference)
crossprod( intmod$fitted.values - int2mod$fitted.values )
crossprod( addmod$fitted.values - add2mod$fitted.values )
```
This function creates a list containing the necessary information to fit a smoothing spline with parametric effects (see `bigssp`).
```r
makessp(formula, data=NULL, type=NULL, nknots=NULL, rparm=NA,
        lambdas=NULL, skip.iter=TRUE, se.fit=FALSE, rseed=1234,
        gcvopts=NULL, knotcheck=TRUE, thetas=NULL, weights=NULL,
        random=NULL, remlalg=c("FS","NR","EM","none"), remliter=500,
        remltol=10^-4, remltau=NULL)
```
| Argument | Description |
|---|---|
| `formula` | An object of class `"formula"`: a symbolic description of the model to be fitted (see `bigssp` for details). |
| `data` | Optional data frame, list, or environment containing the variables in `formula`. |
| `type` | List of smoothing spline types for predictors in `formula` (see `bigssp` for the supported types). |
| `nknots` | Two possible options: (a) scalar giving the total number of random knots to sample, or (b) vector indexing which rows of the predictor data to use as knots. |
| `rparm` | List of rounding parameters for each predictor. See Details. |
| `lambdas` | Vector of global smoothing parameters to try (see `bigssp` for the default). |
| `skip.iter` | Logical indicating whether to skip the iterative smoothing parameter update. Using `skip.iter=FALSE` should provide a more optimal solution, but can be time consuming (see `bigssp`). |
| `se.fit` | Logical indicating if the standard errors of the fitted values should be estimated. |
| `rseed` | Random seed for knot sampling. Input is ignored if `nknots` is a vector of knot indices. |
| `gcvopts` | Control parameters for the optimization; a list with 3 elements (see `bigssp`). |
| `knotcheck` | If `TRUE`, only unique knots are used (for stability). |
| `thetas` | List of initial smoothing parameters for each predictor subspace. See Details. |
| `weights` | Vector of positive weights for fitting (default is a vector of ones). |
| `random` | Adds random effects to the model (see the Random Effects section). |
| `remlalg` | REML algorithm for estimating the variance components (see the Random Effects section). Input is ignored if `random=NULL`. |
| `remliter` | Maximum number of iterations for REML estimation of the variance components. Input is ignored if `random=NULL`. |
| `remltol` | Convergence tolerance for REML estimation of the variance components. Input is ignored if `random=NULL`. |
| `remltau` | Initial estimate of the variance parameters for REML estimation of the variance components. Input is ignored if `random=NULL`. |
See `bigssp` and the example below for more details.

An object of class `"makessp"`, which can be input to `bigssp`.

When inputting a `"makessp"` class object into `bigssp`, the formula input to `bigssp` must be a nested version of the original formula input to `makessp`. In other words, you cannot add any new effects after a `"makessp"` object has been created, but you can drop (remove) effects from the model.
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Helwig, N. E. (2013). Fast and stable smoothing spline analysis of variance models for large samples with applications to electroencephalography data analysis. Unpublished doctoral dissertation. University of Illinois at Urbana-Champaign.
Helwig, N. E. (2016). Efficient estimation of variance components in nonparametric mixed-effects models with large samples. Statistics and Computing, 26, 1319-1336.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
```r
########## EXAMPLE ##########

# function with two continuous predictors
set.seed(773)
myfun <- function(x1v,x2v){
  sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v))
}
x1v <- runif(500)
x2v <- runif(500)
y <- myfun(x1v,x2v) + rnorm(500)

# fit 2 possible models (create information 2 separate times)
system.time({
  intmod <- bigssp(y~x1v*x2v,type=list(x1v="cub",x2v="cub"),nknots=50)
  addmod <- bigssp(y~x1v+x2v,type=list(x1v="cub",x2v="cub"),nknots=50)
})

# fit 2 possible models (create information 1 time)
system.time({
  makemod <- makessp(y~x1v*x2v,type=list(x1v="cub",x2v="cub"),nknots=50)
  int2mod <- bigssp(y~x1v*x2v,makemod)
  add2mod <- bigssp(y~x1v+x2v,makemod)
})

# check difference (no difference)
crossprod( intmod$fitted.values - int2mod$fitted.values )
crossprod( addmod$fitted.values - add2mod$fitted.values )
```
Given a real-valued response vector $\mathbf{y} = \{y_i\}_{n \times 1}$ and an ordinal predictor vector $\mathbf{x} = \{x_i\}_{n \times 1}$ with $x_i \in \{1, \ldots, K\}$, an ordinal smoothing spline model has the form
$$y_i = \eta(x_i) + e_i$$
where $y_i$ is the $i$-th observation's response, $x_i$ is the $i$-th observation's predictor, $\eta$ is an unknown function relating the response and predictor, and $e_i \sim \mathrm{N}(0, \sigma^2)$ is iid Gaussian error.
```r
ordspline(x, y, knots, weights, lambda, monotone=FALSE)
```
| Argument | Description |
|---|---|
| `x` | Predictor vector. |
| `y` | Response vector. Must be the same length as `x`. |
| `knots` | Either a scalar giving the number of equidistant knots to use, or a vector of values to use as the spline knots. If left blank, the number of knots is set internally. |
| `weights` | Weights vector (for weighted penalized least squares). Must be the same length as `x`. |
| `lambda` | Smoothing parameter. If left blank, `lambda` is estimated by minimizing the GCV score. |
| `monotone` | If `TRUE`, the fitted values are constrained to be monotonic increasing. |
To estimate $\eta$, I minimize the penalized least-squares functional
$$\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \eta(x_i)\right)^2 + \lambda J(\eta)$$
where $J(\eta)$ is the penalty functional and $\lambda \geq 0$ is a smoothing parameter that controls the trade-off between fitting and smoothing the data.

Default use of the function estimates $\lambda$ by minimizing the GCV score:
$$\mathrm{GCV}(\lambda) = \frac{n\,\|(\mathbf{I} - \mathbf{S}_{\lambda})\mathbf{y}\|^2}{[n - \mathrm{tr}(\mathbf{S}_{\lambda})]^2}$$
where $\mathbf{I}$ is the identity matrix and $\mathbf{S}_{\lambda}$ is the smoothing matrix.
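A minimal sketch of this criterion for a generic smoothing matrix (illustrative only; `ordspline` computes this internally):

```r
# GCV(lambda) = n*||(I - S)y||^2 / (n - tr(S))^2
gcv <- function(y, S) {
  n <- length(y)
  r <- y - S %*% y                # residuals (I - S)y
  n * sum(r^2) / (n - sum(diag(S)))^2
}
```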
| Component | Description |
|---|---|
| `fitted.values` | Vector of fitted values. |
| `se.fit` | Vector of standard errors of `fitted.values`. |
| `sigma` | Estimated error standard deviation. |
| `lambda` | Chosen smoothing parameter. |
| `info` | Model fit information: vector containing the GCV, R-squared, AIC, and BIC of the fit model (assuming Gaussian error). |
| `coef` | Spline basis function coefficients. |
| `coef.csqrt` | Matrix square-root of the covariance matrix of `coef`. |
| `n` | Number of data points, i.e., `length(x)`. |
| `df` | Effective degrees of freedom (trace of the smoothing matrix). |
| `xunique` | Unique elements of `x`. |
| `x` | Predictor vector (same as input). |
| `y` | Response vector (same as input). |
| `residuals` | Residual vector, i.e., `y - fitted.values`. |
| `knots` | Spline knots used for the fit. |
| `monotone` | Logical (same as input). |
When inputting user-specified `knots`, all values in `knots` must match a corresponding value in `x`.
The spline is estimated using penalized least-squares, which does not require the Gaussian error assumption. However, the spline inference information (e.g., standard errors and fit information) requires the Gaussian error assumption.
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Helwig, N. E. (2013). Fast and stable smoothing spline analysis of variance models for large samples with applications to electroencephalography data analysis. Unpublished doctoral dissertation. University of Illinois at Urbana-Champaign.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
```r
########## EXAMPLE ##########

# generate some data
n <- 100
nk <- 50
x <- seq(-3,3,length.out=n)
eta <- (sin(2*x/pi) + 0.25*x^3 + 0.05*x^5)/15
set.seed(1)
y <- eta + rnorm(n, sd=0.5)

# plot data and true eta
plot(x, y)
lines(x, eta, col="blue", lwd=2)

# fit ordinal smoothing spline
ossmod <- ordspline(x, y, knots=nk)
lines(ossmod$x, ossmod$fit, col="red", lwd=2)

# fit monotonic smoothing spline
mssmod <- ordspline(x, y, knots=nk, monotone=TRUE)
lines(mssmod$x, mssmod$fit, col="purple", lwd=2)
```
This is a modification of the R function `plot` that adds a colorbar to the margin.
```r
plotbar(x, y, z, xlim=NULL, ylim=NULL, zlim=NULL,
        zlab=NULL, zcex.axis=NULL, zcex.lab=NULL,
        zaxis.at=NULL, zaxis.labels=TRUE,
        col=NULL, ncolor=21, drawbar=TRUE, zline=2,
        pltimage=c(.2,.8,.2,.8), pltbar=c(.82,.85,.2,.8), ...)
```
| Argument | Description |
|---|---|
| `x`, `y` | The x and y coordinates of the points to plot. |
| `z` | Numeric vector the same length as `x` and `y` containing the values to be mapped to colors. |
| `xlim`, `ylim` | Ranges for the plotted `x` and `y` values. |
| `zlim` | The minimum and maximum `z` values for which colors should be plotted. |
| `zlab` | Label for the colorbar. |
| `zcex.axis` | The magnification to be used for the z-axis annotation (colorbar scale). |
| `zcex.lab` | The magnification to be used for the z-axis label (`zlab`). |
| `zaxis.at` | The points at which tick-marks are to be drawn for the colorbar. Points outside of the range of `zlim` will not be drawn. |
| `zaxis.labels` | This can either be a logical value specifying whether (numerical) annotations are to be made at the tickmarks, or a character or expression vector of labels to be placed at the tickpoints. |
| `col` | Color scheme to use. |
| `ncolor` | The number of colors to use in the color scheme. |
| `drawbar` | Logical indicating if the colorbar should be drawn. |
| `zline` | Number of lines into the margin at which the z-axis line will be drawn (see the `line` argument of `mtext`). |
| `pltimage` | A vector of the form `c(x1, x2, y1, y2)` giving the coordinates of the plot region as fractions of the current figure region (see the `plt` argument of `par`). |
| `pltbar` | A vector of the form `c(x1, x2, y1, y2)` giving the coordinates of the colorbar region as fractions of the current figure region (see the `plt` argument of `par`). |
| `...` | Additional arguments to be passed to `plot`. |
Produces a `plot` with a colorbar.
Nathaniel E. Helwig <[email protected]>
```r
########## EXAMPLE 1 ##########

myfun <- function(x){
  2*sin(sqrt(x[,1]^2+x[,2]^2+.1))/sqrt(x[,1]^2+x[,2]^2+.1)
}
x <- expand.grid(seq(-8,8,l=100),seq(-8,8,l=100))
plotbar(x[,1],x[,2],myfun(x),
        xlab=expression(italic(x)[1]),
        ylab=expression(italic(x)[2]),
        zlab=expression(hat(italic(y))))

########## EXAMPLE 2 ##########

myfun <- function(x1v,x2v){
  sin(2*pi*x1v) + 2*sin(sqrt(x2v^2+.1))/sqrt(x2v^2+.1)
}
x <- expand.grid(x1v=seq(0,1,l=100),x2v=seq(-8,8,l=100))
plotbar(x[,1],x[,2],myfun(x$x1v,x$x2v),
        col=c("red","orange","yellow","white"),
        xlab="x1v",ylab="x2v",zlab=expression(hat(italic(y))))

########## EXAMPLE 3 ##########

myfun <- function(x1v,x2v){
  sin(3*pi*x1v) + sin(2*pi*x2v) + 3*cos(pi*(x1v-x2v))
}
x <- expand.grid(x1v=seq(-1,1,l=100),x2v=seq(-1,1,l=100))
plotbar(x[,1],x[,2],myfun(x$x1v,x$x2v),
        col=c("blue","green","light green","yellow"),
        xlab="x1v",ylab="x2v",zlab=expression(hat(italic(y))))
```
This is a modification of the R function `plot` that adds confidence intervals to the plot.
```r
plotci(x, y, se, level=0.95, cval=NULL, col="blue",
       col.ci="cyan", alpha=0.65, add=FALSE, type="l",
       link=function(y){y}, axes=TRUE, bars=FALSE,
       barlty=1, barlwd=2, bw=0.2, ...)
```
| Argument | Description |
|---|---|
| `x`, `y` | The x and y coordinates of the points to plot. |
| `se` | Numeric vector the same length as `x` and `y` giving the standard errors of the `y` values. |
| `level` | Significance level for the confidence interval. Default forms a 95% interval. |
| `cval` | Critical value for the confidence interval. Default uses `cval = qnorm(1-(1-level)/2)`. |
| `col` | Color for plotting the relationship between `x` and `y`. |
| `col.ci` | Color for plotting the confidence interval. |
| `alpha` | Transparency used for plotting confidence polygons. Only used when `bars=FALSE`. |
| `add` | Logical indicating whether lines should be added to the current plot. |
| `type` | Type of plot to create (defaults to `"l"` for lines). |
| `link` | Link function to apply. See Details. |
| `axes` | Logical indicating if the axes should be drawn. |
| `bars` | Logical indicating if confidence bars should be plotted instead of polygons. |
| `barlty`, `barlwd` | Line type and width for the confidence bars. Only used when `bars=TRUE`. |
| `bw` | Positive scalar giving the width of the confidence bars. Only used when `bars=TRUE`. |
| `...` | Additional arguments to be passed to `plot`. |
The plotted confidence interval is `c(link(y-cval*se), link(y+cval*se))`, where `link` is the user-specified link function and `cval` is the user-specified critical value, which defaults to `cval = qnorm(1-(1-level)/2)`.
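For example, a hedged sketch of the `link` input (hypothetical values, not from the manual): a band is built on the logit scale and mapped through the inverse link, so the plotted interval stays inside (0, 1).

```r
eta <- seq(-3, 3, length.out = 50)   # hypothetical linear predictors
se  <- rep(0.4, 50)                  # hypothetical standard errors
plotci(eta, eta, se, link = function(z){ 1/(1 + exp(-z)) })
```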
Produces a `plot` with confidence intervals.
Nathaniel E. Helwig <[email protected]>
```r
########## EXAMPLE ##########

# define relatively smooth function
set.seed(773)
myfun <- function(x){ sin(2*pi*x) }
x <- runif(10^4)
y <- myfun(x) + rnorm(10^4)

# fit cubic smoothing spline
cubmod <- bigspline(x,y)
newdata <- data.frame(x=seq(0,1,length=20))
ypred <- predict(cubmod, newdata, se.fit=TRUE)

# plot predictions with CIs in two ways
plotci(newdata$x, ypred$fit, ypred$se.fit)
plotci(newdata$x, ypred$fit, ypred$se.fit, type="p", bars=TRUE, bw=0.02)
```
Get fitted values and standard error estimates for cubic smoothing splines.
```r
## S3 method for class 'bigspline'
predict(object, newdata=NULL, se.fit=FALSE,
        effect=c("all","0","lin","non"),
        design=FALSE, smoothMatrix=FALSE, ...)
```
| Argument | Description |
|---|---|
| `object` | Object of class `"bigspline"`, which is output from `bigspline`. |
| `newdata` | Vector containing new data points for prediction. See Details and Example. Default of `newdata=NULL` uses the original data from the fit model. |
| `se.fit` | Logical indicating whether the standard errors of the fitted values should be estimated. Default is `se.fit=FALSE`. |
| `effect` | Which effect to estimate: `effect="all"` gives the full fit, `effect="0"` gives the constant (intercept) portion, `effect="lin"` gives the linear portion, and `effect="non"` gives the nonlinear portion. |
| `design` | Logical indicating whether the design matrix should be returned. |
| `smoothMatrix` | Logical indicating whether the smoothing matrix should be returned. |
| `...` | Ignored. |
Uses the coefficient and smoothing parameter estimates from a fit cubic smoothing spline (estimated by `bigspline`) to predict for new data.

If `se.fit=FALSE`, `design=FALSE`, and `smoothMatrix=FALSE`, returns a vector of fitted values. Otherwise returns a list with elements:

| Component | Description |
|---|---|
| `fit` | Vector of fitted values. |
| `se.fit` | Vector of standard errors of the fitted values (if input `se.fit=TRUE`). |
| `X` | Design matrix used to create the fitted values (if `design=TRUE`). |
| `ix` | Index vector relating the columns of `X` to the model coefficients (if `design=TRUE`). |
| `S` | Smoothing matrix corresponding to the fitted values (if `smoothMatrix=TRUE`). |
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Helwig, N. E. (2013). Fast and stable smoothing spline analysis of variance models for large samples with applications to electroencephalography data analysis. Unpublished doctoral dissertation. University of Illinois at Urbana-Champaign.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
```r
########## EXAMPLE 1 ##########

# define univariate function and data
set.seed(773)
myfun <- function(x){ 2 + x + sin(2*pi*x) }
x <- runif(10^4)
y <- myfun(x) + rnorm(10^4)

# fit cubic spline model
cubmod <- bigspline(x,y)
crossprod( predict(cubmod) - myfun(x) )/10^4

# define new data for prediction
newdata <- data.frame(x=seq(0,1,length.out=100))

# get fitted values and standard errors for new data
yc <- predict(cubmod,newdata,se.fit=TRUE)

# plot results with 95% Bayesian confidence interval
plot(newdata$x,yc$fit,type="l")
lines(newdata$x,yc$fit+qnorm(.975)*yc$se.fit,lty=3)
lines(newdata$x,yc$fit-qnorm(.975)*yc$se.fit,lty=3)

# predict constant, linear, and nonlinear effects
yc0 <- predict(cubmod,newdata,se.fit=TRUE,effect="0")
ycl <- predict(cubmod,newdata,se.fit=TRUE,effect="lin")
ycn <- predict(cubmod,newdata,se.fit=TRUE,effect="non")
crossprod( yc$fit - (yc0$fit + ycl$fit + ycn$fit) )

# plot results with 95% Bayesian confidence intervals
par(mfrow=c(1,2))
plot(newdata$x,ycl$fit,type="l",main="Linear effect")
lines(newdata$x,ycl$fit+qnorm(.975)*ycl$se.fit,lty=3)
lines(newdata$x,ycl$fit-qnorm(.975)*ycl$se.fit,lty=3)
plot(newdata$x,ycn$fit,type="l",main="Nonlinear effect")
lines(newdata$x,ycn$fit+qnorm(.975)*ycn$se.fit,lty=3)
lines(newdata$x,ycn$fit-qnorm(.975)*ycn$se.fit,lty=3)

########## EXAMPLE 2 ##########

# define (same) univariate function and data
set.seed(773)
myfun <- function(x){ 2 + x + sin(2*pi*x) }
x <- runif(10^4)
y <- myfun(x) + rnorm(10^4)

# fit a different cubic spline model
cubamod <- bigspline(x,y,type="cub0")
crossprod( predict(cubamod) - myfun(x) )/10^4

# define (same) new data for prediction
newdata <- data.frame(x=seq(0,1,length.out=100))

# get fitted values and standard errors for new data
ya <- predict(cubamod,newdata,se.fit=TRUE)

# plot results with 95% Bayesian confidence interval
plot(newdata$x,ya$fit,type="l")
lines(newdata$x,ya$fit+qnorm(.975)*ya$se.fit,lty=3)
lines(newdata$x,ya$fit-qnorm(.975)*ya$se.fit,lty=3)

# predict constant, linear, and nonlinear effects
ya0 <- predict(cubamod,newdata,se.fit=TRUE,effect="0")
yal <- predict(cubamod,newdata,se.fit=TRUE,effect="lin")
yan <- predict(cubamod,newdata,se.fit=TRUE,effect="non")
crossprod( ya$fit - (ya0$fit + yal$fit + yan$fit) )

# plot results with 95% Bayesian confidence intervals
par(mfrow=c(1,2))
plot(newdata$x,yal$fit,type="l",main="Linear effect")
lines(newdata$x,yal$fit+qnorm(.975)*yal$se.fit,lty=3)
lines(newdata$x,yal$fit-qnorm(.975)*yal$se.fit,lty=3)
plot(newdata$x,yan$fit,type="l",main="Nonlinear effect")
lines(newdata$x,yan$fit+qnorm(.975)*yan$se.fit,lty=3)
lines(newdata$x,yan$fit-qnorm(.975)*yan$se.fit,lty=3)
```
Get fitted values and standard error estimates for smoothing spline anova models.
```r
## S3 method for class 'bigssa'
predict(object, newdata=NULL, se.fit=FALSE, include=object$tnames,
        effect=c("all","0","lin","non"), includeint=FALSE,
        design=FALSE, smoothMatrix=FALSE, intercept=NULL, ...)
```
| Argument | Description |
|---|---|
| `object` | Object of class `"bigssa"`, which is output from `bigssa`. |
| `newdata` | Data frame or list containing the new data points for prediction. Variable names must match those used in the `formula` input of `bigssa`. See Details and Example. Default of `newdata=NULL` uses the original data from the fit model. |
| `se.fit` | Logical indicating whether the standard errors of the fitted values should be estimated. Default is `se.fit=FALSE`. |
| `include` | Which terms to include in the estimate. You can get fitted values for any combination of terms in the fitted model. |
| `effect` | Which effect to estimate: `effect="all"` gives the full fit, `effect="0"` gives the constant (intercept) portion, `effect="lin"` gives the linear portion, and `effect="non"` gives the nonlinear portion. |
| `includeint` | Logical indicating whether the intercept should be included in the prediction. |
| `design` | Logical indicating whether the design matrix should be returned. |
| `smoothMatrix` | Logical indicating whether the smoothing matrix should be returned. |
| `intercept` | Logical indicating whether the intercept should be included in the prediction. When used, this input overrides the `includeint` input. |
| `...` | Ignored. |
Uses the coefficient and smoothing parameter estimates from a fit smoothing spline ANOVA (estimated by `bigssa`) to predict for new data.

If `se.fit=FALSE`, `design=FALSE`, and `smoothMatrix=FALSE`, returns a vector of fitted values. Otherwise returns a list with elements:

| Component | Description |
|---|---|
| `fit` | Vector of fitted values. |
| `se.fit` | Vector of standard errors of the fitted values (if input `se.fit=TRUE`). |
| `X` | Design matrix used to create the fitted values (if `design=TRUE`). |
| `ix` | Index vector relating the columns of `X` to the model coefficients (if `design=TRUE`). |
| `S` | Smoothing matrix corresponding to the fitted values (if `smoothMatrix=TRUE`). |
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Helwig, N. E. (2013). Fast and stable smoothing spline analysis of variance models for large samples with applications to electroencephalography data analysis. Unpublished doctoral dissertation. University of Illinois at Urbana-Champaign.
Helwig, N. E. (2016). Efficient estimation of variance components in nonparametric mixed-effects models with large samples. Statistics and Computing, 26, 1319-1336.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
```r
########## EXAMPLE 1 ##########

# define univariate function and data
set.seed(773)
myfun <- function(x){ 2 + x + sin(2*pi*x) }
x <- runif(500)
y <- myfun(x) + rnorm(500)

# fit cubic spline model
cubmod <- bigssa(y~x,type="cub",nknots=30)
crossprod( predict(cubmod) - myfun(x) )/500

# define new data for prediction
newdata <- data.frame(x=seq(0,1,length.out=100))

# get fitted values and standard errors for new data
yc <- predict(cubmod,newdata,se.fit=TRUE)

# plot results with 95% Bayesian confidence interval
plot(newdata$x,yc$fit,type="l")
lines(newdata$x,yc$fit+qnorm(.975)*yc$se.fit,lty=3)
lines(newdata$x,yc$fit-qnorm(.975)*yc$se.fit,lty=3)

# predict constant, linear, and nonlinear effects
yc0 <- predict(cubmod,newdata,se.fit=TRUE,effect="0")
ycl <- predict(cubmod,newdata,se.fit=TRUE,effect="lin")
ycn <- predict(cubmod,newdata,se.fit=TRUE,effect="non")
crossprod( yc$fit - (yc0$fit + ycl$fit + ycn$fit) )

# plot results with 95% Bayesian confidence intervals
par(mfrow=c(1,2))
plot(newdata$x,ycl$fit,type="l",main="Linear effect")
lines(newdata$x,ycl$fit+qnorm(.975)*ycl$se.fit,lty=3)
lines(newdata$x,ycl$fit-qnorm(.975)*ycl$se.fit,lty=3)
plot(newdata$x,ycn$fit,type="l",main="Nonlinear effect")
lines(newdata$x,ycn$fit+qnorm(.975)*ycn$se.fit,lty=3)
lines(newdata$x,ycn$fit-qnorm(.975)*ycn$se.fit,lty=3)

########## EXAMPLE 2 ##########

# define bivariate function and data
set.seed(773)
myfun <- function(x){
  2 + x[,1]/10 - x[,2]/5 + 2*sin(sqrt(x[,1]^2+x[,2]^2+.1))/sqrt(x[,1]^2+x[,2]^2+.1)
}
x1v <- runif(500)*16-8
x2v <- runif(500)*16-8
y <- myfun(cbind(x1v,x2v)) + rnorm(500)

# tensor product cubic splines with 50 knots
cubmod <- bigssa(y~x1v*x2v,type=list(x1v="cub",x2v="cub"),nknots=50)
crossprod( predict(cubmod) - myfun(cbind(x1v,x2v)) )/500

# define new data for prediction
xnew <- as.matrix(expand.grid(seq(-8,8,l=50),seq(-8,8,l=50)))
newdata <- list(x1v=xnew[,1],x2v=xnew[,2])

# get fitted values for new data
yp <- predict(cubmod,newdata)

# plot results
imagebar(seq(-8,8,l=50),seq(-8,8,l=50),matrix(yp,50,50),
         xlab=expression(italic(x)[1]),ylab=expression(italic(x)[2]),
         zlab=expression(hat(italic(y))))

# predict linear and nonlinear effects for x1v
newdata <- list(x1v=seq(-8,8,length.out=100))
yl <- predict(cubmod,newdata,include="x1v",effect="lin",se.fit=TRUE)
yn <- predict(cubmod,newdata,include="x1v",effect="non",se.fit=TRUE)

# plot results with 95% Bayesian confidence intervals
par(mfrow=c(1,2))
plot(newdata$x1v,yl$fit,type="l",main="Linear effect")
lines(newdata$x1v,yl$fit+qnorm(.975)*yl$se.fit,lty=3)
lines(newdata$x1v,yl$fit-qnorm(.975)*yl$se.fit,lty=3)
plot(newdata$x1v,yn$fit,type="l",main="Nonlinear effect",ylim=c(-.3,.4))
lines(newdata$x1v,yn$fit+qnorm(.975)*yn$se.fit,lty=3)
lines(newdata$x1v,yn$fit-qnorm(.975)*yn$se.fit,lty=3)
```
########## EXAMPLE 1 ########## # define univariate function and data set.seed(773) myfun <- function(x){ 2 + x + sin(2*pi*x) } x <- runif(500) y <- myfun(x) + rnorm(500) # fit cubic spline model cubmod <- bigssa(y~x,type="cub",nknots=30) crossprod( predict(cubmod) - myfun(x) )/500 # define new data for prediction newdata <- data.frame(x=seq(0,1,length.out=100)) # get fitted values and standard errors for new data yc <- predict(cubmod,newdata,se.fit=TRUE) # plot results with 95% Bayesian confidence interval plot(newdata$x,yc$fit,type="l") lines(newdata$x,yc$fit+qnorm(.975)*yc$se.fit,lty=3) lines(newdata$x,yc$fit-qnorm(.975)*yc$se.fit,lty=3) # predict constant, linear, and nonlinear effects yc0 <- predict(cubmod,newdata,se.fit=TRUE,effect="0") ycl <- predict(cubmod,newdata,se.fit=TRUE,effect="lin") ycn <- predict(cubmod,newdata,se.fit=TRUE,effect="non") crossprod( yc$fit - (yc0$fit + ycl$fit + ycn$fit) ) # plot results with 95% Bayesian confidence intervals par(mfrow=c(1,2)) plot(newdata$x,ycl$fit,type="l",main="Linear effect") lines(newdata$x,ycl$fit+qnorm(.975)*ycl$se.fit,lty=3) lines(newdata$x,ycl$fit-qnorm(.975)*ycl$se.fit,lty=3) plot(newdata$x,ycn$fit,type="l",main="Nonlinear effect") lines(newdata$x,ycn$fit+qnorm(.975)*ycn$se.fit,lty=3) lines(newdata$x,ycn$fit-qnorm(.975)*ycn$se.fit,lty=3) ########## EXAMPLE 2 ########## # define bivariate function and data set.seed(773) myfun<-function(x){ 2 + x[,1]/10 - x[,2]/5 + 2*sin(sqrt(x[,1]^2+x[,2]^2+.1))/sqrt(x[,1]^2+x[,2]^2+.1) } x1v <- runif(500)*16-8 x2v <- runif(500)*16-8 y <- myfun(cbind(x1v,x2v)) + rnorm(500) # tensor product cubic splines with 50 knots cubmod <- bigssa(y~x1v*x2v,type=list(x1v="cub",x2v="cub"),nknots=50) crossprod( predict(cubmod) - myfun(cbind(x1v,x2v)) )/500 # define new data for prediction xnew <- as.matrix(expand.grid(seq(-8,8,l=50),seq(-8,8,l=50))) newdata <- list(x1v=xnew[,1],x2v=xnew[,2]) # get fitted values for new data yp <- predict(cubmod,newdata) # plot results imagebar(seq(-8,8,l=50),seq(-8,8,l=50),matrix(yp,50,50), xlab=expression(italic(x)[1]),ylab=expression(italic(x)[2]), zlab=expression(hat(italic(y)))) # predict linear and nonlinear effects for x1v newdata <- list(x1v=seq(-8,8,length.out=100)) yl <- predict(cubmod,newdata,include="x1v",effect="lin",se.fit=TRUE) yn <- predict(cubmod,newdata,include="x1v",effect="non",se.fit=TRUE) # plot results with 95% Bayesian confidence intervals par(mfrow=c(1,2)) plot(newdata$x1v,yl$fit,type="l",main="Linear effect") lines(newdata$x1v,yl$fit+qnorm(.975)*yl$se.fit,lty=3) lines(newdata$x1v,yl$fit-qnorm(.975)*yl$se.fit,lty=3) plot(newdata$x1v,yn$fit,type="l",main="Nonlinear effect",ylim=c(-.3,.4)) lines(newdata$x1v,yn$fit+qnorm(.975)*yn$se.fit,lty=3) lines(newdata$x1v,yn$fit-qnorm(.975)*yn$se.fit,lty=3)
Get fitted values and standard error estimates for generalized smoothing spline ANOVA models.
## S3 method for class 'bigssg'
predict(object,newdata=NULL,se.lp=FALSE,include=object$tnames,
        effect=c("all","0","lin","non"),includeint=FALSE,
        design=FALSE,smoothMatrix=FALSE,intercept=NULL,...)
object | Object of class "bigssg", which is output from bigssg. |
newdata | Data frame or list containing the new data points for prediction. Variable names must match those used in the formula input of bigssg. See Details and Example. Default of newdata=NULL uses the original data in the fit model. |
se.lp | Logical indicating if the standard errors of the linear predictors (linear.predictors) should be estimated. Default is se.lp=FALSE. |
include | Which terms to include in the estimate. You can get fitted values for any combination of terms in the tnames element of the fit model object. |
effect | Which effect to estimate: effect="all" gives the full prediction, effect="0" gives the constant portion, effect="lin" gives the linear portion, and effect="non" gives the nonlinear portion. |
includeint | Logical indicating whether the intercept should be included in the prediction. If include=object$tnames and effect="all" (the defaults), the intercept is automatically included in the prediction. |
design | Logical indicating whether the design matrix should be returned. |
smoothMatrix | Logical indicating whether the smoothing matrix should be returned. |
intercept | Logical indicating whether the intercept should be included in the prediction. When used, this input overrides the includeint input. |
... | Ignored. |
Uses the coefficient and smoothing parameter estimates from a fitted generalized smoothing spline ANOVA model (estimated by bigssg) to predict for new data.
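For intuition, here is a minimal sketch (with a hypothetical log-link Poisson fit, not one of the examples below) of the usual workflow: form a Bayesian confidence interval on the link scale, then map it to the data scale with the inverse link.

library(bigsplines)
set.seed(1)
x <- runif(1000)
y <- rpois(n=1000, lambda=exp(sin(2*pi*x)))   # hypothetical log-link Poisson data
mod <- bigssg(y~x, family="poisson", type="cub", nknots=20)
newdata <- data.frame(x=seq(0,1,length.out=100))
pred <- predict(mod, newdata, se.lp=TRUE)     # standard errors on the link scale
lwr <- exp(pred$linear.predictors - qnorm(.975)*pred$se.lp)  # data-scale lower bound
upr <- exp(pred$linear.predictors + qnorm(.975)*pred$se.lp)  # data-scale upper bound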
Returns a list with elements:
fitted.values | Vector of fitted values (on data scale). |
linear.predictors | Vector of fitted values (on link scale). |
se.lp | Vector of standard errors of linear predictors (if se.lp=TRUE). |
X | Design matrix used to create linear predictors (if design=TRUE). |
ix | Index vector linking the columns of X to the fitted model coefficients (if design=TRUE). |
S | Smoothing matrix corresponding to fitted values (if smoothMatrix=TRUE). |
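Note that fitted.values and linear.predictors are related through the inverse link. Continuing the hypothetical Poisson sketch from the Details section (where the link is the log), the following check should hold up to numerical error:

# data-scale fits = inverse link applied to link-scale fits (assumed relation)
all.equal(exp(pred$linear.predictors), pred$fitted.values)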
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Gu, C. and Xiang, D. (2001). Cross-validating non-Gaussian data: Generalized approximate cross-validation revisited. Journal of Computational and Graphical Statistics, 10, 581-591.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
########## EXAMPLE 1 ##########

# define univariate function and data
set.seed(1)
myfun <- function(x){ sin(2*pi*x) }
ndpts <- 1000
x <- runif(ndpts)

# negative binomial response (unknown dispersion)
set.seed(773)
lp <- myfun(x)
mu <- exp(lp)
y <- rnbinom(n=ndpts,size=2,mu=mu)

# fit cubic spline model
cubmod <- bigssg(y~x,family="negbin",type="cub",nknots=20)
1/cubmod$dispersion   ## dispersion = 1/size
crossprod( lp - cubmod$linear.predictor )/length(lp)

# define new data for prediction
newdata <- data.frame(x=seq(0,1,length.out=100))

# get fitted values and standard errors for new data
yc <- predict(cubmod,newdata,se.lp=TRUE)

# plot results with 95% Bayesian confidence interval (link scale)
plot(newdata$x,yc$linear.predictor,type="l")
lines(newdata$x,yc$linear.predictor+qnorm(.975)*yc$se.lp,lty=3)
lines(newdata$x,yc$linear.predictor-qnorm(.975)*yc$se.lp,lty=3)

# plot results with 95% Bayesian confidence interval (data scale)
plot(newdata$x,yc$fitted,type="l")
lines(newdata$x,exp(yc$linear.predictor+qnorm(.975)*yc$se.lp),lty=3)
lines(newdata$x,exp(yc$linear.predictor-qnorm(.975)*yc$se.lp),lty=3)

# predict constant, linear, and nonlinear effects
yc0 <- predict(cubmod,newdata,se.lp=TRUE,effect="0")
ycl <- predict(cubmod,newdata,se.lp=TRUE,effect="lin")
ycn <- predict(cubmod,newdata,se.lp=TRUE,effect="non")
crossprod( yc$linear - (yc0$linear + ycl$linear + ycn$linear) )

# plot results with 95% Bayesian confidence intervals (link scale)
par(mfrow=c(1,2))
plot(newdata$x,ycl$linear,type="l",main="Linear effect")
lines(newdata$x,ycl$linear+qnorm(.975)*ycl$se.lp,lty=3)
lines(newdata$x,ycl$linear-qnorm(.975)*ycl$se.lp,lty=3)
plot(newdata$x,ycn$linear,type="l",main="Nonlinear effect")
lines(newdata$x,ycn$linear+qnorm(.975)*ycn$se.lp,lty=3)
lines(newdata$x,ycn$linear-qnorm(.975)*ycn$se.lp,lty=3)

# plot results with 95% Bayesian confidence intervals (data scale)
par(mfrow=c(1,2))
plot(newdata$x,ycl$fitted,type="l",main="Linear effect")
lines(newdata$x,exp(ycl$linear+qnorm(.975)*ycl$se.lp),lty=3)
lines(newdata$x,exp(ycl$linear-qnorm(.975)*ycl$se.lp),lty=3)
plot(newdata$x,ycn$fitted,type="l",main="Nonlinear effect")
lines(newdata$x,exp(ycn$linear+qnorm(.975)*ycn$se.lp),lty=3)
lines(newdata$x,exp(ycn$linear-qnorm(.975)*ycn$se.lp),lty=3)

########## EXAMPLE 2 ##########

# define bivariate function and data
set.seed(1)
myfun <- function(x1v,x2v){ sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v)) }
ndpts <- 1000
x1v <- runif(ndpts)
x2v <- runif(ndpts)

# binomial response (with weights)
set.seed(773)
lp <- myfun(x1v,x2v)
p <- 1/(1+exp(-lp))
w <- sample(c(10,20,30,40,50),length(p),replace=TRUE)
y <- rbinom(n=ndpts,size=w,p=p)/w   ## y is proportion correct
cubmod <- bigssg(y~x1v*x2v,family="binomial",type=list(x1v="cub",x2v="cub"),nknots=100,weights=w)
crossprod( lp - cubmod$linear.predictor )/length(lp)

# define new data for prediction
xnew <- as.matrix(expand.grid(seq(0,1,length=50),seq(0,1,length=50)))
newdata <- list(x1v=xnew[,1],x2v=xnew[,2])

# get fitted values for new data
yp <- predict(cubmod,newdata)

# plot linear predictor and fitted values
par(mfrow=c(2,2))
imagebar(seq(0,1,l=50),seq(0,1,l=50),matrix(myfun(newdata$x1v,newdata$x2v),50,50),
         xlab=expression(italic(x)[1]),ylab=expression(italic(x)[2]),
         zlab=expression(hat(italic(y))),zlim=c(-4.5,1.5),main="True Linear Predictor")
imagebar(seq(0,1,l=50),seq(0,1,l=50),matrix(yp$linear.predictor,50,50),
         xlab=expression(italic(x)[1]),ylab=expression(italic(x)[2]),
         zlab=expression(hat(italic(y))),zlim=c(-4.5,1.5),main="Estimated Linear Predictor")
newprob <- 1/(1+exp(-myfun(newdata$x1v,newdata$x2v)))
imagebar(seq(0,1,l=50),seq(0,1,l=50),matrix(newprob,50,50),
         xlab=expression(italic(x)[1]),ylab=expression(italic(x)[2]),
         zlab=expression(hat(italic(y))),zlim=c(0,0.8),main="True Probabilities")
imagebar(seq(0,1,l=50),seq(0,1,l=50),matrix(yp$fitted.values,50,50),
         xlab=expression(italic(x)[1]),ylab=expression(italic(x)[2]),
         zlab=expression(hat(italic(y))),zlim=c(0,0.8),main="Estimated Probabilities")

# predict linear and nonlinear effects for x1v (link scale)
newdata <- list(x1v=seq(0,1,length.out=100))
yl <- predict(cubmod,newdata,include="x1v",effect="lin",se.lp=TRUE)
yn <- predict(cubmod,newdata,include="x1v",effect="non",se.lp=TRUE)

# plot results with 95% Bayesian confidence intervals (link scale)
par(mfrow=c(1,2))
plot(newdata$x1v,yl$linear,type="l",main="Linear effect")
lines(newdata$x1v,yl$linear+qnorm(.975)*yl$se.lp,lty=3)
lines(newdata$x1v,yl$linear-qnorm(.975)*yl$se.lp,lty=3)
plot(newdata$x1v,yn$linear,type="l",main="Nonlinear effect")
lines(newdata$x1v,yn$linear+qnorm(.975)*yn$se.lp,lty=3)
lines(newdata$x1v,yn$linear-qnorm(.975)*yn$se.lp,lty=3)
Get fitted values and standard error estimates for smoothing splines with parametric effects.
## S3 method for class 'bigssp'
predict(object,newdata=NULL,se.fit=FALSE,include=object$tnames,
        effect=c("all","0","lin","non"),includeint=FALSE,
        design=FALSE,smoothMatrix=FALSE,intercept=NULL,...)
object | Object of class "bigssp", which is output from bigssp. |
newdata | Data frame or list containing the new data points for prediction. Variable names must match those used in the formula input of bigssp. See Details and Example. Default of newdata=NULL uses the original data in the fit model. |
se.fit | Logical indicating whether the standard errors of the fitted values should be estimated. Default is se.fit=FALSE. |
include | Which terms to include in the estimate. You can get fitted values for any combination of terms in the tnames element of the fit model object. |
effect | Which effect to estimate: effect="all" gives the full prediction, effect="0" gives the constant portion, effect="lin" gives the linear portion, and effect="non" gives the nonlinear portion. |
includeint | Logical indicating whether the intercept should be included in the prediction. If include=object$tnames and effect="all" (the defaults), the intercept is automatically included in the prediction. |
design | Logical indicating whether the design matrix should be returned. |
smoothMatrix | Logical indicating whether the smoothing matrix should be returned. |
intercept | Logical indicating whether the intercept should be included in the prediction. When used, this input overrides the includeint input. |
... | Ignored. |
Uses the coefficient and smoothing parameter estimates from a fitted smoothing spline with parametric effects (estimated by bigssp) to predict for new data.
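Because the estimated effects are additive, the full prediction should decompose into its constant, linear, and nonlinear pieces. A minimal sketch of that check, using a hypothetical univariate fit patterned on Example 1 below:

library(bigsplines)
set.seed(773)
x <- runif(500)
y <- 2 + x + sin(2*pi*x) + rnorm(500)
mod <- bigssp(y~x, type="cub", nknots=30)
newdata <- data.frame(x=seq(0,1,length.out=100))
yall <- predict(mod, newdata)                # full prediction (bare vector)
y0   <- predict(mod, newdata, effect="0")    # constant portion
ylin <- predict(mod, newdata, effect="lin")  # linear portion
ynon <- predict(mod, newdata, effect="non")  # nonlinear portion
max(abs(yall - (y0 + ylin + ynon)))          # should be numerically zero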
If se.fit=FALSE, design=FALSE, and smoothMatrix=FALSE, returns a vector of fitted values.
Otherwise returns a list with elements:
fit | Vector of fitted values. |
se.fit | Vector of standard errors of fitted values (if se.fit=TRUE). |
X | Design matrix used to create fitted values (if design=TRUE). |
ix | Index vector linking the columns of X to the fitted model coefficients (if design=TRUE). |
S | Smoothing matrix corresponding to fitted values (if smoothMatrix=TRUE). |
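One hedged reading of the S element: for a linear smoother, the smoothing matrix maps the observed responses to the fitted values, so (continuing the sketch in Details) S %*% y should reproduce the fits at the original data, and the trace of S gives the usual effective degrees of freedom. Treat this as an interpretation of the Value description, not package-guaranteed behavior:

pred <- predict(mod, smoothMatrix=TRUE)  # fits at original data, plus S
max(abs(pred$S %*% y - pred$fit))        # ~0 if S acts as the hat matrix (assumption)
sum(diag(pred$S))                        # effective degrees of freedom for a linear smoother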
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Gu, C. and Wahba, G. (1991). Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM Journal on Scientific and Statistical Computing, 12, 383-398.
Helwig, N. E. (2013). Fast and stable smoothing spline analysis of variance models for large samples with applications to electroencephalography data analysis. Unpublished doctoral dissertation. University of Illinois at Urbana-Champaign.
Helwig, N. E. (2016). Efficient estimation of variance components in nonparametric mixed-effects models with large samples. Statistics and Computing, 26, 1319-1336.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
########## EXAMPLE 1 ##########

# define univariate function and data
set.seed(773)
myfun <- function(x){ 2 + x + sin(2*pi*x) }
x <- runif(500)
y <- myfun(x) + rnorm(500)

# fit cubic spline model
cubmod <- bigssp(y~x,type="cub",nknots=30)
crossprod( predict(cubmod) - myfun(x) )/500

# define new data for prediction
newdata <- data.frame(x=seq(0,1,length.out=100))

# get fitted values and standard errors for new data
yc <- predict(cubmod,newdata,se.fit=TRUE)

# plot results with 95% Bayesian confidence interval
plot(newdata$x,yc$fit,type="l")
lines(newdata$x,yc$fit+qnorm(.975)*yc$se.fit,lty=3)
lines(newdata$x,yc$fit-qnorm(.975)*yc$se.fit,lty=3)

# predict constant, linear, and nonlinear effects
yc0 <- predict(cubmod,newdata,se.fit=TRUE,effect="0")
ycl <- predict(cubmod,newdata,se.fit=TRUE,effect="lin")
ycn <- predict(cubmod,newdata,se.fit=TRUE,effect="non")
sum( yc$fit - (yc0$fit + ycl$fit + ycn$fit) )

# plot results with 95% Bayesian confidence intervals
par(mfrow=c(1,2))
plot(newdata$x,ycl$fit,type="l",main="Linear effect")
lines(newdata$x,ycl$fit+qnorm(.975)*ycl$se.fit,lty=3)
lines(newdata$x,ycl$fit-qnorm(.975)*ycl$se.fit,lty=3)
plot(newdata$x,ycn$fit,type="l",main="Nonlinear effect")
lines(newdata$x,ycn$fit+qnorm(.975)*ycn$se.fit,lty=3)
lines(newdata$x,ycn$fit-qnorm(.975)*ycn$se.fit,lty=3)

########## EXAMPLE 2 ##########

# define bivariate function and data
set.seed(773)
myfun <- function(x){
  2 + x[,1]/10 - x[,2]/5 + 2*sin(sqrt(x[,1]^2+x[,2]^2+.1))/sqrt(x[,1]^2+x[,2]^2+.1)
}
x <- cbind(runif(500),runif(500))*16 - 8
y <- myfun(x) + rnorm(500)

# bidimensional thin-plate spline with 50 knots
tpsmod <- bigssp(y~x,type="tps",nknots=50)
crossprod( predict(tpsmod) - myfun(x) )/500

# define new data for prediction
xnew <- as.matrix(expand.grid(seq(-8,8,length=50),seq(-8,8,length=50)))
newdata <- list(x=xnew)

# get fitted values for new data
yp <- predict(tpsmod,newdata)

# plot results
imagebar(seq(-8,8,l=50),seq(-8,8,l=50),matrix(yp,50,50),
         xlab=expression(italic(x)[1]),ylab=expression(italic(x)[2]),
         zlab=expression(hat(italic(y))))

# predict linear and nonlinear effects
yl <- predict(tpsmod,newdata,effect="lin")
yn <- predict(tpsmod,newdata,effect="non")

# plot results
par(mfrow=c(1,2))
imagebar(seq(-8,8,l=50),seq(-8,8,l=50),matrix(yl,50,50),
         main="Linear effect",xlab=expression(italic(x)[1]),
         ylab=expression(italic(x)[2]),zlab=expression(hat(italic(y))))
imagebar(seq(-8,8,l=50),seq(-8,8,l=50),matrix(yn,50,50),
         main="Nonlinear effect",xlab=expression(italic(x)[1]),
         ylab=expression(italic(x)[2]),zlab=expression(hat(italic(y))))
Get fitted values and standard error estimates for thin-plate splines.
## S3 method for class 'bigtps'
predict(object,newdata=NULL,se.fit=FALSE,
        effect=c("all","0","lin","non"),
        design=FALSE,smoothMatrix=FALSE,...)
object | Object of class "bigtps", which is output from bigtps. |
newdata | Vector or matrix containing new data points for prediction. See Details and Example. Default of newdata=NULL uses the original data in the fit model. |
se.fit | Logical indicating whether the standard errors of the fitted values should be estimated. Default is se.fit=FALSE. |
effect | Which effect to estimate: effect="all" gives the full prediction, effect="0" gives the constant portion, effect="lin" gives the linear portion, and effect="non" gives the nonlinear portion. |
design | Logical indicating whether the design matrix should be returned. |
smoothMatrix | Logical indicating whether the smoothing matrix should be returned. |
... | Ignored. |
Uses the coefficient and smoothing parameter estimates from a fitted thin-plate spline (estimated by bigtps) to predict for new data.
If se.fit=FALSE, design=FALSE, and smoothMatrix=FALSE, returns a vector of fitted values.
Otherwise returns a list with elements:
fit | Vector of fitted values. |
se.fit | Vector of standard errors of fitted values (if se.fit=TRUE). |
X | Design matrix used to create fitted values (if design=TRUE). |
ix | Index vector linking the columns of X to the fitted model coefficients (if design=TRUE). |
S | Smoothing matrix corresponding to fitted values (if smoothMatrix=TRUE). |
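In other words, the return type depends on the flags: with se.fit, design, and smoothMatrix all at their FALSE defaults you get a bare numeric vector, otherwise a named list. A small sketch with a hypothetical one-dimensional fit:

library(bigsplines)
set.seed(773)
x <- runif(1000)
y <- 2 + x + sin(2*pi*x) + rnorm(1000)
tpsmod <- bigtps(x, y)
newx <- seq(0, 1, length.out=10)
p1 <- predict(tpsmod, newx)               # numeric vector of fits
p2 <- predict(tpsmod, newx, se.fit=TRUE)  # list with fit and se.fit
is.numeric(p1)
names(p2)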
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
########## EXAMPLE 1 ##########

# define univariate function and data
set.seed(773)
myfun <- function(x){ 2 + x + sin(2*pi*x) }
x <- runif(10^4)
y <- myfun(x) + rnorm(10^4)

# fit thin-plate spline (default 1 dim: 30 knots)
tpsmod <- bigtps(x,y)
crossprod( predict(tpsmod) - myfun(x) )/10^4

# define new data for prediction
newdata <- data.frame(x=seq(0,1,length.out=100))

# get fitted values and standard errors for new data
yc <- predict(tpsmod,newdata,se.fit=TRUE)

# plot results with 95% Bayesian confidence interval
plot(newdata$x,yc$fit,type="l")
lines(newdata$x,yc$fit+qnorm(.975)*yc$se.fit,lty=3)
lines(newdata$x,yc$fit-qnorm(.975)*yc$se.fit,lty=3)

# predict constant, linear, and nonlinear effects
yc0 <- predict(tpsmod,newdata,se.fit=TRUE,effect="0")
ycl <- predict(tpsmod,newdata,se.fit=TRUE,effect="lin")
ycn <- predict(tpsmod,newdata,se.fit=TRUE,effect="non")
crossprod( yc$fit - (yc0$fit + ycl$fit + ycn$fit) )

# plot results with 95% Bayesian confidence intervals
par(mfrow=c(1,2))
plot(newdata$x,ycl$fit,type="l",main="Linear effect")
lines(newdata$x,ycl$fit+qnorm(.975)*ycl$se.fit,lty=3)
lines(newdata$x,ycl$fit-qnorm(.975)*ycl$se.fit,lty=3)
plot(newdata$x,ycn$fit,type="l",main="Nonlinear effect")
lines(newdata$x,ycn$fit+qnorm(.975)*ycn$se.fit,lty=3)
lines(newdata$x,ycn$fit-qnorm(.975)*ycn$se.fit,lty=3)

########## EXAMPLE 2 ##########

# function with two continuous predictors
set.seed(773)
myfun <- function(x1v,x2v){ sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v)) }
x <- cbind(runif(10^4),runif(10^4))
y <- myfun(x[,1],x[,2]) + rnorm(10^4)

# fit thin-plate spline (default 2 dim: 100 knots)
tpsmod <- bigtps(x,y)

# define new data
newdata <- as.matrix(expand.grid(seq(0,1,length=50),seq(0,1,length=50)))

# get fitted values for new data
yp <- predict(tpsmod,newdata)

# plot results
imagebar(seq(0,1,length=50),seq(0,1,length=50),matrix(yp,50,50),
         xlab=expression(italic(x)[1]),ylab=expression(italic(x)[2]),
         zlab=expression(hat(italic(y))))

# predict linear and nonlinear effects
yl <- predict(tpsmod,newdata,effect="lin")
yn <- predict(tpsmod,newdata,effect="non")

# plot results
par(mfrow=c(1,2))
imagebar(seq(0,1,length=50),seq(0,1,length=50),matrix(yl,50,50),
         main="Linear effect",xlab=expression(italic(x)[1]),
         ylab=expression(italic(x)[2]),zlab=expression(hat(italic(y))))
imagebar(seq(0,1,length=50),seq(0,1,length=50),matrix(yn,50,50),
         main="Nonlinear effect",xlab=expression(italic(x)[1]),
         ylab=expression(italic(x)[2]),zlab=expression(hat(italic(y))))
Get fitted values and standard error estimates for ordinal smoothing splines.
## S3 method for class 'ordspline'
predict(object,newdata=NULL,se.fit=FALSE,...)
object | Object of class "ordspline", which is output from ordspline. |
newdata | Vector containing new data points for prediction. See Details and Example. Default of newdata=NULL uses the original data in the fit model. |
se.fit | Logical indicating whether the standard errors of the fitted values should be estimated. Default is se.fit=FALSE. |
... | Ignored. |
Uses the coefficient and smoothing parameter estimates from a fitted ordinal smoothing spline (estimated by ordspline) to predict for new data.
If se.fit=FALSE, returns a vector of fitted values.
Otherwise returns a list with elements:
fit | Vector of fitted values. |
se.fit | Vector of standard errors of fitted values. |
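A small sketch of turning the returned fit and se.fit into approximate 95% interval bounds by hand (the example below does the same thing graphically with plotci):

library(bigsplines)
set.seed(773)
x <- sample(1:20, size=500, replace=TRUE)
y <- 2 + x/2 + sin(x) + rnorm(500)
omod <- ordspline(x, y)
pred <- predict(omod, 1:20, se.fit=TRUE)
ci <- cbind(lwr = pred$fit - qnorm(.975)*pred$se.fit,   # lower bound
            upr = pred$fit + qnorm(.975)*pred$se.fit)   # upper bound
head(ci)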
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Helwig, N. E. (2013). Fast and stable smoothing spline analysis of variance models for large samples with applications to electroencephalography data analysis. Unpublished doctoral dissertation. University of Illinois at Urbana-Champaign.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
########## EXAMPLE ##########

# define univariate function and data
set.seed(773)
myfun <- function(x){ 2 + x/2 + sin(x) }
x <- sample(1:20, size=500, replace=TRUE)
y <- myfun(x) + rnorm(500)

# fit ordinal spline model
ordmod <- ordspline(x, y)
monmod <- ordspline(x, y, monotone=TRUE)
crossprod( predict(ordmod) - myfun(x) ) / 500
crossprod( predict(monmod) - myfun(x) ) / 500

# plot truth and predictions
ordfit <- predict(ordmod, 1:20, se.fit=TRUE)
monfit <- predict(monmod, 1:20, se.fit=TRUE)
plotci(1:20, ordfit$fit, ordfit$se.fit, ylab="f(x)")
plotci(1:20, monfit$fit, monfit$se.fit, col="red", col.ci="pink", add=TRUE)
points(1:20, myfun(1:20))
This function prints basic model fit information for a fitted bigsplines model.
## S3 method for class 'bigspline'
print(x,...)
## S3 method for class 'bigssa'
print(x,...)
## S3 method for class 'bigssg'
print(x,...)
## S3 method for class 'bigssp'
print(x,...)
## S3 method for class 'bigtps'
print(x,...)
## S3 method for class 'ordspline'
print(x,...)
## S3 method for class 'summary.bigspline'
print(x,digits=4,...)
## S3 method for class 'summary.bigssa'
print(x,digits=4,...)
## S3 method for class 'summary.bigssg'
print(x,digits=4,...)
## S3 method for class 'summary.bigssp'
print(x,digits=4,...)
## S3 method for class 'summary.bigtps'
print(x,digits=4,...)
x | Object of class "bigspline" (output from bigspline), "bigssa" (output from bigssa), "bigssg" (output from bigssg), "bigssp" (output from bigssp), "bigtps" (output from bigtps), "ordspline" (output from ordspline), or one of the corresponding "summary.*" classes (output from summary). |
digits | Number of decimal places to print. |
... | Ignored. |
See bigspline, bigssa, bigssg, bigssp, bigtps, and ordspline for more details.
"bigspline" objects: prints Spline Type, Fit Statistic information, and Smoothing Parameter.
"summary.bigspline" objects: prints Spline Type, five number summary of Residuals, Error Standard Deviation Estimate, Fit Statistics, and Smoothing Parameter.
"bigssa" objects: prints Spline Types, Fit Statistic information, and Algorithm Convergence status.
"summary.bigssa" objects: prints the formula Call, five number summary of Residuals, Error Standard Deviation Estimate, Fit Statistics, and Smoothing Parameters.
"bigssg" objects: prints Family, Spline Types, Fit Statistic information, and Algorithm Convergence status.
"summary.bigssg" objects: prints the Family, formula Call, five number summary of Residuals, Dispersion Estimate, Fit Statistics, and Smoothing Parameters (with selection criterion).
"bigssp" objects: prints Predictor Types, Fit Statistic information, and Algorithm Convergence status.
"summary.bigssp" objects: prints formula Call, five number summary of Residuals, Error Standard Deviation Estimate, Fit Statistics, and Smoothing Parameters.
"bigtps" objects: prints Spline Type, Fit Statistic information, and Smoothing Parameter.
"summary.bigtps" objects: prints Spline Type, five number summary of Residuals, Error Standard Deviation Estimate, Fit Statistics, and Smoothing Parameter.
"ordspline" objects: prints Monotonic, Fit Statistic information, and Smoothing Parameter.
Nathaniel E. Helwig <[email protected]>
### see examples for bigspline, bigssa, bigssg, bigssp, bigtps, and ordspline
Generate the smoothing spline basis matrix for a polynomial spline.
ssBasis(x, knots, m=2, d=0, xmin=min(x), xmax=max(x), periodic=FALSE, intercept=FALSE)
x | Predictor variable. |
knots | Spline knots. |
m | Penalty order: m=1 for linear smoothing spline, m=2 for cubic, and m=3 for quintic. |
d | Derivative order: d=0 for the smoothing spline basis, d=1 for the first derivative of the basis, and d=2 for the second derivative of the basis. |
xmin | Minimum value of x. |
xmax | Maximum value of x. |
periodic | If TRUE, the smoothing spline basis is periodic with respect to the interval [xmin, xmax]. |
intercept | If TRUE, a column of ones is included as the first column of the basis. |
X | Spline basis matrix. |
knots | Spline knots. |
m | Penalty order. |
d | Derivative order. |
xlim | Inputs xmin and xmax. |
periodic | Same as input. |
intercept | Same as input. |
Inputs x and knots should be within the interval [xmin, xmax].
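Since xmin and xmax default to the range of x, a predictor on an arbitrary scale satisfies this requirement automatically, as long as the knots are placed within that range. A quick sketch with hypothetical data:

library(bigsplines)
set.seed(1)
z <- rnorm(200, mean=50, sd=10)              # predictor on its native scale
knots <- seq(min(z), max(z), length.out=10)  # knots inside [min(z), max(z)]
B <- ssBasis(z, knots)                       # xmin/xmax default to range(z)
dim(B$X)                                     # inspect the basis dimensions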
Nathaniel E. Helwig <[email protected]>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Helwig, N. E. (2013). Fast and stable smoothing spline analysis of variance models for large samples with applications to electroencephalography data analysis. Unpublished doctoral dissertation. University of Illinois at Urbana-Champaign.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
########## EXAMPLE ##########

# define function and its derivatives
n <- 500
x <- seq(0, 1, length.out=n)
knots <- seq(0, 1, length=20)
y <- sin(4 * pi * x)
d1y <- 4 * pi * cos(4 * pi * x)
d2y <- - (4 * pi)^2 * sin(4 * pi * x)

# linear smoothing spline
linmat0 <- ssBasis(x, knots, m=1)
lincoef <- pinvsm(crossprod(linmat0$X)) %*% crossprod(linmat0$X, y)
linyhat <- linmat0$X %*% lincoef
linmat1 <- ssBasis(x, knots, m=1, d=1)
linyd1 <- linmat1$X %*% lincoef

# plot linear smoothing spline results
par(mfrow=c(1,2))
plot(x, y, type="l", main="Function")
lines(x, linyhat, lty=2, col="red")
plot(x, d1y, type="l", main="First Derivative")
lines(x, linyd1, lty=2, col="red")

# cubic smoothing spline
cubmat0 <- ssBasis(x, knots)
cubcoef <- pinvsm(crossprod(cubmat0$X)) %*% crossprod(cubmat0$X, y)
cubyhat <- cubmat0$X %*% cubcoef
cubmat1 <- ssBasis(x, knots, d=1)
cubyd1 <- cubmat1$X %*% cubcoef
cubmat2 <- ssBasis(x, knots, d=2)
cubyd2 <- cubmat2$X %*% cubcoef

# plot cubic smoothing spline results
par(mfrow=c(1,3))
plot(x, y, type="l", main="Function")
lines(x, cubyhat, lty=2, col="red")
plot(x, d1y, type="l", main="First Derivative")
lines(x, cubyd1, lty=2, col="red")
plot(x, d2y, type="l", main="Second Derivative")
lines(x, cubyd2, lty=2, col="red")

# quintic smoothing spline
quimat0 <- ssBasis(x, knots, m=3)
quicoef <- pinvsm(crossprod(quimat0$X)) %*% crossprod(quimat0$X, y)
quiyhat <- quimat0$X %*% quicoef
quimat1 <- ssBasis(x, knots, m=3, d=1)
quiyd1 <- quimat1$X %*% quicoef
quimat2 <- ssBasis(x, knots, m=3, d=2)
quiyd2 <- quimat2$X %*% quicoef

# plot quintic smoothing spline results
par(mfrow=c(1,3))
plot(x, y, type="l", main="Function")
lines(x, quiyhat, lty=2, col="red")
plot(x, d1y, type="l", main="First Derivative")
lines(x, quiyd1, lty=2, col="red")
plot(x, d2y, type="l", main="Second Derivative")
lines(x, quiyd2, lty=2, col="red")
This function summarizes basic model fit information for a fitted bigsplines model.
## S3 method for class 'bigspline'
summary(object, fitresid = TRUE, chunksize = 10000, ...)
## S3 method for class 'bigssa'
summary(object, fitresid = TRUE, chunksize = 10000, diagnostics = FALSE, ...)
## S3 method for class 'bigssg'
summary(object, fitresid = TRUE, chunksize = 10000, diagnostics = FALSE, ...)
## S3 method for class 'bigssp'
summary(object, fitresid = TRUE, chunksize = 10000, diagnostics = FALSE, ...)
## S3 method for class 'bigtps'
summary(object, fitresid = TRUE, chunksize = 10000, ...)
object | Object of class "bigspline" (output from bigspline), "bigssa" (output from bigssa), "bigssg" (output from bigssg), "bigssp" (output from bigssp), or "bigtps" (output from bigtps). |
fitresid | Logical indicating whether the fitted values and residuals should be calculated for all data points in the input object. |
chunksize | If fitresid=TRUE, the fitted values and residuals are calculated in chunks of size chunksize. |
diagnostics | If TRUE, cosine diagnostics are calculated for each model term (returned in the pi element). |
... | Ignored. |
See bigspline, bigssa, bigssg, bigssp, and bigtps for more details.
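For large fits, the fitted-value and residual pass is the expensive part of summarizing. A minimal sketch (with a hypothetical bigspline fit) of the two ways to control it:

library(bigsplines)
set.seed(773)
x <- runif(10^5)
y <- sin(2*pi*x) + rnorm(10^5)
mod <- bigspline(x, y)
s1 <- summary(mod, fitresid=FALSE)   # skip fitted values and residuals entirely
s2 <- summary(mod, chunksize=50000)  # or compute them in larger chunks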
call | Called model in the input formula. |
type | Type of smoothing spline that was used for each predictor. |
fitted.values | Vector of fitted values (if fitresid=TRUE). |
linear.predictors | Vector of linear predictors (only for class "bigssg" with fitresid=TRUE). |
residuals | Vector of residuals (if fitresid=TRUE). |
sigma | Estimated error standard deviation. |
deviance | Model deviance (only for class "bigssg"). |
dispersion | Estimated dispersion parameter (only for class "bigssg"). |
n | Total sample size. |
df | Effective degrees of freedom of the model. |
info | Model fit information: vector containing the GCV, multiple R-squared, AIC, and BIC of the fit model. |
converged | Convergence status: converged=TRUE if the iterative algorithm converged. |
iter | Number of iterative updates (NA for non-iterative fits). |
rparm | Rounding parameters used for model fitting. |
lambda | Global smoothing parameter used for model fitting. |
gammas | Vector of additional smoothing parameters (only for class "bigssa"). |
thetas | Vector of additional smoothing parameters (only for class "bigssp"). |
pi | Vector of cosine diagnostics (if diagnostics=TRUE). |
family | Distribution family (only for class "bigssg"). |
gcvtype | Smoothing parameter selection criterion (only for class "bigssg"). |
For "bigspline" and "bigtps" objects, the outputs call
, converged
, and iter
are NA.
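The summary is an ordinary named list, so the pieces tabulated above can be extracted directly. Continuing the sketch from the Details section:

s2$sigma   # error standard deviation estimate
s2$df      # effective degrees of freedom
s2$info    # GCV, R-squared, AIC, and BIC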
Nathaniel E. Helwig <[email protected]>
########## EXAMPLE 1 ##########

# define relatively smooth function
set.seed(773)
myfun <- function(x){ sin(2*pi*x) }
x <- runif(10^4)
y <- myfun(x) + rnorm(10^4)

# cubic spline
cubmod <- bigspline(x,y)
summary(cubmod)

########## EXAMPLE 2 ##########

# function with two continuous predictors
set.seed(773)
myfun <- function(x1v,x2v){ sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v)) }
x1v <- runif(10^4)
x2v <- runif(10^4)
y <- myfun(x1v,x2v) + rnorm(10^4)

# cubic splines with 100 randomly selected knots (efficient parameterization)
cubmod <- bigssa(y~x1v*x2v,type=list(x1v="cub",x2v="cub"),nknots=100)
summary(cubmod)

########## EXAMPLE 3 ##########

# function with two continuous predictors
set.seed(1)
myfun <- function(x1v,x2v){ sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v)) }
ndpts <- 1000
x1v <- runif(ndpts)
x2v <- runif(ndpts)

# poisson response
set.seed(773)
lp <- myfun(x1v,x2v)
mu <- exp(lp)
y <- rpois(n=ndpts,lambda=mu)

# generalized smoothing spline anova
genmod <- bigssg(y~x1v*x2v,family="poisson",type=list(x1v="cub",x2v="cub"),nknots=50)
summary(genmod)

########## EXAMPLE 4 ##########

# function with two continuous predictors
set.seed(773)
myfun <- function(x1v,x2v){ sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v)) }
x1v <- runif(10^4)
x2v <- runif(10^4)
y <- myfun(x1v,x2v) + rnorm(10^4)

# cubic splines with 100 randomly selected knots (classic parameterization)
cubmod <- bigssp(y~x1v*x2v,type=list(x1v="cub",x2v="cub"),nknots=100)
summary(cubmod)

########## EXAMPLE 5 ##########

# define relatively smooth function
set.seed(773)
myfun <- function(x){ sin(2*pi*x) }
x <- runif(10^4)
y <- myfun(x) + rnorm(10^4)

# thin-plate with default (30 knots)
tpsmod <- bigtps(x,y)
summary(tpsmod)