| Title: | Sparse Additive Modelling |
|---|---|
| Description: | Computationally efficient tools for high-dimensional predictive modeling (regression and classification). SAM is short for sparse additive modeling, and adopts the computationally efficient basis spline technique. We solve the optimization problems with several algorithms, including block coordinate descent, the fast iterative soft-thresholding algorithm, and the Newton method. The computation is further accelerated by warm-start and active-set tricks. |
| Authors: | Haoming Jiang [aut], Yukun Ma [aut], Han Liu [aut], Kathryn Roeder [aut], Xingguo Li [aut], Tuo Zhao [aut, cre] |
| Maintainer: | Tuo Zhao <[email protected]> |
| License: | GPL-2 |
| Version: | 1.3 |
| Built: | 2026-06-11 07:43:14 UTC |
| Source: | https://github.com/cran/SAM |
SAM provides sparse additive models for high-dimensional prediction tasks (regression and classification). It uses spline basis expansion and efficient optimization routines to compute full regularization paths.
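To make the spline basis expansion concrete, the sketch below expands each column of a small matrix into `p = 3` basis functions with `splines::ns()` from R's bundled `splines` package. Using `ns()` here is an assumption made for illustration (the text only says that a basis spline technique is used), and `expand_basis` is a hypothetical helper, not a SAM function.

```r
library(splines)

# Hypothetical helper: expand every column of X into p natural-spline
# basis functions and bind the resulting blocks side by side.
expand_basis <- function(X, p = 3) {
  blocks <- lapply(seq_len(ncol(X)), function(j) ns(X[, j], df = p))
  do.call(cbind, blocks)
}

set.seed(1)
X <- matrix(runif(20 * 4), 20, 4)   # 20 observations, 4 features
Z <- expand_basis(X, p = 3)
dim(Z)                              # 20 rows, 4 * 3 = 12 columns
```

Each feature contributes one block of `p` columns, which is what lets a group penalty switch a whole component function on or off.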
The package exposes four model families:
samQL: quadratic-loss sparse additive regression.
samLL: logistic-loss sparse additive classification.
samHL: hinge-loss sparse additive classification.
samEL: Poisson-loss sparse additive regression.
All models share a common spline representation and return regularization paths, allowing model selection after one fit.
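Model selection after one fit can be sketched as follows. The example assumes that predictions along the path are arranged as a matrix with one column per regularization parameter; that layout, the toy predictions, and the helper `select_lambda` are assumptions for illustration rather than part of the package.

```r
# Hypothetical helper: pick the path index with the smallest validation MSE.
# pred: matrix of predictions, one column per lambda (assumed layout).
select_lambda <- function(pred, y, lambda) {
  mse <- colMeans((pred - y)^2)
  best <- which.min(mse)
  list(index = best, lambda = lambda[best], mse = mse)
}

# Toy stand-in for path predictions: shrink the truth by different amounts.
set.seed(1)
y <- rnorm(50)
lambda <- c(1, 0.5, 0.1)
pred <- cbind(0 * y, 0.5 * y, 0.9 * y)
sel <- select_lambda(pred, y, lambda)
sel$lambda   # the least-shrunk column wins on this toy data
```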
Tuo Zhao, Xingguo Li, Haoming Jiang, Han Liu, and Kathryn Roeder
Maintainer: Tuo Zhao <[email protected]>
P. Ravikumar, J. Lafferty, H. Liu, and L. Wasserman. "Sparse Additive Models", Journal of the Royal Statistical Society: Series B, 2009.
T. Zhao and H. Liu. "Sparse Additive Machine", International Conference on Artificial Intelligence and Statistics, 2012.

Plot method for S3 class "samEL"
Plot the regularization path (regularization parameter versus functional norm).

    ## S3 method for class 'samEL'
    plot(x, ...)

| Argument | Description |
|---|---|
| x | An object with S3 class "samEL". |
| ... | Additional arguments passed to methods; currently unused. |

The x-axis shows regularization parameters on a log scale. The y-axis shows the functional norm of each component function.

Plot method for S3 class "samHL"
Plot the regularization path (regularization parameter versus functional norm).

    ## S3 method for class 'samHL'
    plot(x, ...)

| Argument | Description |
|---|---|
| x | An object with S3 class "samHL". |
| ... | Additional arguments passed to methods; currently unused. |

The x-axis shows regularization parameters on a log scale. The y-axis shows the functional norm of each component function.

Plot method for S3 class "samLL"
Plot the regularization path (regularization parameter versus functional norm).

    ## S3 method for class 'samLL'
    plot(x, ...)

| Argument | Description |
|---|---|
| x | An object with S3 class "samLL". |
| ... | Additional arguments passed to methods; currently unused. |

The x-axis shows regularization parameters on a log scale. The y-axis shows the functional norm of each component function.

Plot method for S3 class "samQL"
Plot the regularization path (regularization parameter versus functional norm).

    ## S3 method for class 'samQL'
    plot(x, ...)

| Argument | Description |
|---|---|
| x | An object with S3 class "samQL". |
| ... | Additional arguments passed to methods; currently unused. |

The x-axis shows regularization parameters on a log scale. The y-axis shows the functional norm of each component function.

Predict method for S3 class "samEL"
Predict expected counts for test data.

    ## S3 method for class 'samEL'
    predict(object, newdata, ...)

| Argument | Description |
|---|---|
| object | An object with S3 class "samEL". |
| newdata | Numeric test matrix with the same number of columns as the training matrix. |
| ... | Additional arguments passed to methods; currently unused. |

The test matrix is rescaled using the training X.min and X.ran, truncated to [0, 1], and expanded with the same spline basis used during training.
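The rescaling and truncation step described above can be sketched in base R. `rescale_newdata` is a hypothetical helper written for this illustration; it only mirrors the description (subtract the training minimums, divide by the training ranges, clip to [0, 1]) and is not taken from the package source.

```r
# Hypothetical helper mirroring the description: rescale with the training
# minimums and ranges, then clip the result to [0, 1].
rescale_newdata <- function(newdata, X.min, X.ran) {
  scaled <- sweep(newdata, 2, X.min, "-")
  scaled <- sweep(scaled, 2, X.ran, "/")
  pmin(pmax(scaled, 0), 1)
}

X <- matrix(c(0, 1, 2, 10, 20, 30), ncol = 2)     # training data
X.min <- apply(X, 2, min)                         # per-feature minimums
X.ran <- apply(X, 2, max) - X.min                 # per-feature ranges
Xt <- matrix(c(-1, 1, 3, 15, 20, 40), ncol = 2)   # test data, partly out of range
Xt_scaled <- rescale_newdata(Xt, X.min, X.ran)
Xt_scaled
```

Test values outside the training range end up at exactly 0 or 1, so the spline basis is never evaluated outside the interval it was built on.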
| Component | Description |
|---|---|
| expectations | Estimated expected counts. |
| expectation | Alias of expectations. |

Predict method for S3 class "samHL"
Predict decision values and class labels for test data.

    ## S3 method for class 'samHL'
    predict(object, newdata, thol = 0, ...)

| Argument | Description |
|---|---|
| object | An object with S3 class "samHL". |
| newdata | Numeric test matrix with the same number of columns as the training matrix. |
| thol | Decision-value threshold used to convert scores to labels. The default value is 0. |
| ... | Additional arguments passed to methods; currently unused. |

The test matrix is rescaled using the training X.min and X.ran, truncated to [0, 1], and expanded with the same spline basis used during training.

| Component | Description |
|---|---|
| values | Predicted decision values. |
| labels | Predicted class labels. |

Predict method for S3 class "samLL"
Predict class probabilities and labels for test data.

    ## S3 method for class 'samLL'
    predict(object, newdata, thol = 0.5, ...)

| Argument | Description |
|---|---|
| object | An object with S3 class "samLL". |
| newdata | Numeric test matrix with the same number of columns as the training matrix. |
| thol | Probability threshold used to convert probabilities to labels. The default value is 0.5. |
| ... | Additional arguments passed to methods; currently unused. |

The test matrix is rescaled using the training X.min and X.ran, truncated to [0, 1], and expanded with the same spline basis used during training.
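The role of `thol` can be illustrated with a small base R sketch. The 0/1 label encoding and the strict `>` comparison are assumptions chosen to match the 0/1 labels in the samLL example; `probs_to_labels` is a hypothetical helper, and the package may encode or compare differently.

```r
# Assumed label encoding: 1 when the probability exceeds thol, otherwise 0.
probs_to_labels <- function(probs, thol = 0.5) {
  ifelse(probs > thol, 1, 0)
}

probs <- matrix(c(0.10, 0.60, 0.45, 0.90), nrow = 2)   # toy probabilities
labels_default <- probs_to_labels(probs)               # default threshold 0.5
labels_strict  <- probs_to_labels(probs, thol = 0.8)   # stricter threshold
labels_default
labels_strict
```

Raising the threshold trades false positives for false negatives without refitting the model.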
| Component | Description |
|---|---|
| probs | Estimated posterior probabilities. |
| labels | Predicted class labels. |

Predict method for S3 class "samQL"
Predict responses for test data.

    ## S3 method for class 'samQL'
    predict(object, newdata, ...)

| Argument | Description |
|---|---|
| object | An object with S3 class "samQL". |
| newdata | Numeric test matrix with the same number of columns as the training matrix. |
| ... | Additional arguments passed to methods; currently unused. |

The test matrix is rescaled using the training X.min and X.ran, truncated to [0, 1], and expanded with the same spline basis used during training.

| Component | Description |
|---|---|
| values | Predicted responses. |

Print method for S3 class "samEL"
Print a summary of an object of class "samEL".

    ## S3 method for class 'samEL'
    print(x, ...)

| Argument | Description |
|---|---|
| x | An object with S3 class "samEL". |
| ... | Additional arguments passed to methods; currently unused. |

The output includes the regularization path length and its degrees of freedom.

Print method for S3 class "samHL"
Print a summary of an object of class "samHL".

    ## S3 method for class 'samHL'
    print(x, ...)

| Argument | Description |
|---|---|
| x | An object with S3 class "samHL". |
| ... | Additional arguments passed to methods; currently unused. |

The output includes the regularization path length and its degrees of freedom.

Print method for S3 class "samLL"
Print a summary of an object of class "samLL".

    ## S3 method for class 'samLL'
    print(x, ...)

| Argument | Description |
|---|---|
| x | An object with S3 class "samLL". |
| ... | Additional arguments passed to methods; currently unused. |

The output includes the regularization path length and its degrees of freedom.

Print method for S3 class "samQL"
Print a summary of an object of class "samQL".

    ## S3 method for class 'samQL'
    print(x, ...)

| Argument | Description |
|---|---|
| x | An object with S3 class "samQL". |
| ... | Additional arguments passed to methods; currently unused. |

The output includes the regularization path length and its degrees of freedom.
Fit a sparse additive model by dispatching to the appropriate family-specific function (samQL, samLL, samEL, or samHL).

    sam(X, y, p = 3, family = c("gaussian", "binomial", "poisson", "hinge"), ...)

| Argument | Description |
|---|---|
| X | Numeric training matrix with n rows (observations) and d columns (features). |
| y | Response vector of length n. |
| p | The number of basis spline functions. The default value is 3. |
| family | A string specifying the loss family. One of "gaussian", "binomial", "poisson", or "hinge". |
| ... | Additional arguments passed to the family-specific function. |

An S3 object of class samQL, samLL, samEL, or samHL, depending on the chosen family.
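The dispatch can be pictured with a short base R sketch. The mapping below (gaussian to samQL, binomial to samLL, poisson to samEL, hinge to samHL) is inferred from the loss each function is described as using, and `family_to_fitter` is a hypothetical helper that returns a function name instead of fitting anything.

```r
# Hypothetical helper: map a family string to the name of the fitting function.
family_to_fitter <- function(family = c("gaussian", "binomial", "poisson", "hinge")) {
  family <- match.arg(family)   # validates the string; first choice is the default
  switch(family,
         gaussian = "samQL",
         binomial = "samLL",
         poisson  = "samEL",
         hinge    = "samHL")
}

family_to_fitter()            # no family given, so "gaussian" is used
family_to_fitter("poisson")
```

Because `match.arg()` rejects unknown strings, a typo in `family` fails immediately instead of silently picking a default.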
    n <- 100; d <- 50
    X <- matrix(runif(n * d), n, d)
    y <- rnorm(n)
    fit <- sam(X, y, family = "gaussian")
    fit
Fit a sparse additive Poisson regression model on training data.

    samEL(X, y, p = 3, lambda = NULL, nlambda = NULL, lambda.min.ratio = 0.25,
          thol = 1e-05, max.ite = 1e+05, regfunc = "L1", dfmax = NULL,
          verbose = FALSE, dev.ratio.thr = NULL, dev.change.thr = NULL)

| Argument | Description |
|---|---|
| X | Numeric training matrix with n rows (observations) and d columns (features). |
| y | Response vector of length n. |
| p | The number of basis spline functions. The default value is 3. |
| lambda | Optional user-supplied regularization sequence. If provided, use a decreasing sequence; warm starts are used along the path and are usually much faster than fitting a single value. |
| nlambda | The number of lambda values. The default value is 20. |
| lambda.min.ratio | Smallest lambda as a fraction of the largest lambda on the path. The default value is 0.25. |
| thol | Stopping tolerance. The default value is 1e-5. |
| max.ite | Maximum number of iterations. The default value is 1e5. |
| regfunc | A string indicating the regularizer: "L1" (the default), "MCP", or "SCAD". |
| dfmax | Maximum number of non-zero groups allowed (the limit takes effect when the number of non-zero groups reaches dfmax). |
| verbose | Logical; if TRUE, progress information is printed during fitting. The default value is FALSE. |
| dev.ratio.thr | Deviance ratio threshold for early stopping. When the deviance ratio reaches this threshold, the path is terminated. |
| dev.change.thr | Relative deviance change threshold for early stopping. When the relative change in deviance over the last few lambda steps falls below this value, the path is terminated. |

The solver combines block coordinate descent, fast iterative soft-thresholding, and Newton updates. Computation is accelerated by warm starts and active-set screening.
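A decreasing sequence suitable for the `lambda` argument can be built in base R as sketched below. The log-spaced grid and the helper `make_lambda_path` are assumptions for illustration; in particular `lambda.max` is a hypothetical input here, since this page does not say how the package chooses the largest value.

```r
# Hypothetical helper: nlambda values, log-spaced and decreasing from
# lambda.max down to lambda.min.ratio * lambda.max.
make_lambda_path <- function(lambda.max, nlambda = 20, lambda.min.ratio = 0.25) {
  exp(seq(log(lambda.max), log(lambda.max * lambda.min.ratio), length.out = nlambda))
}

lambda <- make_lambda_path(lambda.max = 2, nlambda = 5)
lambda
all(diff(lambda) < 0)   # TRUE when the sequence is strictly decreasing
```

Starting from the largest value keeps early solutions sparse, so each later fit can warm-start from a nearby solution.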
| Component | Description |
|---|---|
| p | The number of basis spline functions used in training. |
| X.min | Per-feature minimums from training data (used to rescale test data). |
| X.ran | Per-feature ranges from training data (used to rescale test data). |
| lambda | Sequence of regularization parameters used in training. |
| w | Solution path matrix. |
| df | Degrees of freedom along the solution path (number of non-zero component functions). |
| knots | The knots of the spline basis for each variable. |
| Boundary.knots | The boundary knots of the spline basis for each variable. |
| func_norm | Functional norm matrix: the functional norm of each component function along the regularization path. |
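The link between a coefficient path and the degrees of freedom can be sketched as follows. The layout is an assumption: rows of the coefficient matrix are taken to come in consecutive blocks of `p` per feature, with one column per regularization parameter, and the Euclidean norm of each block stands in for the functional norm. `group_norms` is a hypothetical helper.

```r
# Assumed layout: rows of w come in consecutive blocks of p coefficients,
# one block per feature; columns index the regularization path.
group_norms <- function(w, p) {
  d <- nrow(w) / p
  grp <- rep(seq_len(d), each = p)
  sqrt(rowsum(w^2, grp))   # sum squared coefficients within each block
}

p <- 3
# Toy path: 2 features, 3 path points, increasingly dense.
w <- cbind(rep(0, 6), c(0.5, 0, 0, 0, 0, 0), c(1, 2, 2, 0, 3, 4))
norms <- group_norms(w, p)
norms                        # 2 x 3 matrix of per-feature norms
df <- colSums(norms > 0)     # non-zero components per path point
df
```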
SAM, plot.samEL, print.samEL, predict.samEL

    ## generating training data
    n = 200
    d = 100
    X = 0.5*matrix(runif(n*d),n,d) + matrix(rep(0.5*runif(n),d),n,d)
    u = exp(-2*sin(X[,1]) + X[,2]^2-1/3 + X[,3]-1/2 + exp(-X[,4])+exp(-1)-1+1)
    y = rep(0,n)
    for(i in 1:n) y[i] = rpois(1,u[i])

    ## Training
    out.trn = samEL(X,y)
    out.trn

    ## plotting solution path
    plot(out.trn)

    ## generating testing data
    nt = 1000
    Xt = 0.5*matrix(runif(nt*d),nt,d) + matrix(rep(0.5*runif(nt),d),nt,d)
    ut = exp(-2*sin(Xt[,1]) + Xt[,2]^2-1/3 + Xt[,3]-1/2 + exp(-Xt[,4])+exp(-1)-1+1)
    yt = rep(0,nt)
    for(i in 1:nt) yt[i] = rpois(1,ut[i])

    ## predicting response
    out.tst = predict(out.trn,Xt)
Fit a sparse additive classifier with hinge loss.

    samHL(X, y, p = 3, lambda = NULL, nlambda = NULL, lambda.min.ratio = 0.4,
          thol = 1e-05, mu = 0.05, max.ite = 1e+05, w = NULL, dfmax = NULL,
          verbose = FALSE)

| Argument | Description |
|---|---|
| X | Numeric training matrix with n rows (observations) and d columns (features). |
| y | Training labels of length n. |
| p | The number of basis spline functions. The default value is 3. |
| lambda | Optional user-supplied regularization sequence. If provided, use a decreasing sequence; warm starts are used along the path and are usually much faster than fitting a single value. |
| nlambda | The number of lambda values. The default value is 20. |
| lambda.min.ratio | Smallest lambda as a fraction of the largest lambda on the path. The default value is 0.4. |
| thol | Stopping tolerance. The default value is 1e-5. |
| mu | Smoothing parameter used to approximate hinge loss. The default value is 0.05. |
| max.ite | Maximum number of iterations. The default value is 1e5. |
| w | Optional positive observation weights of length n. |
| dfmax | Maximum number of non-zero groups allowed (the limit takes effect when the number of non-zero groups reaches dfmax). |
| verbose | Logical; if TRUE, progress information is printed during fitting. The default value is FALSE. |

The solver combines block coordinate descent, fast iterative soft-thresholding, and Newton updates. Computation is accelerated by warm starts and active-set screening.
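To show what a smoothing parameter such as `mu` does, here is one common smooth approximation of the hinge loss: quadratic within `mu` of the kink and linear beyond it. This particular form is an assumption for illustration, as the page does not state which smoothing the package uses, and `smoothed_hinge` is a hypothetical helper.

```r
# One common smooth approximation of the hinge loss max(0, 1 - z):
# zero for z >= 1, quadratic for 1 - mu < z < 1, linear (shifted) below that.
smoothed_hinge <- function(z, mu = 0.05) {
  r <- 1 - z
  ifelse(r <= 0, 0,
         ifelse(r < mu, r^2 / (2 * mu), r - mu / 2))
}

z <- c(2, 1, 0.98, 0, -1)          # margins y * f(x)
loss_smooth <- smoothed_hinge(z)   # smooth version
loss_hinge  <- pmax(0, 1 - z)      # plain hinge for comparison
rbind(loss_smooth, loss_hinge)
```

In this form the two losses never differ by more than `mu / 2`, so a smaller `mu` tracks the hinge more closely at the price of a less smooth objective.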
| Component | Description |
|---|---|
| p | The number of basis spline functions used in training. |
| X.min | Per-feature minimums from training data (used to rescale test data). |
| X.ran | Per-feature ranges from training data (used to rescale test data). |
| lambda | Sequence of regularization parameters used in training. |
| w | Solution path matrix. |
| df | Degrees of freedom along the solution path (number of non-zero component functions). |
| knots | The knots of the spline basis for each variable. |
| Boundary.knots | The boundary knots of the spline basis for each variable. |
| func_norm | Functional norm matrix: the functional norm of each component function along the regularization path. |

SAM, plot.samHL, print.samHL, predict.samHL
    ## generating training data
    n = 200
    d = 100
    X = 0.5*matrix(runif(n*d),n,d) + matrix(rep(0.5*runif(n),d),n,d)
    y = sign(((X[,1]-0.5)^2 + (X[,2]-0.5)^2)-0.06)

    ## flipping about 5 percent of y
    y = y*sign(runif(n)-0.05)

    ## Training
    out.trn = samHL(X,y)
    out.trn

    ## plotting solution path
    plot(out.trn)

    ## generating testing data
    nt = 1000
    Xt = 0.5*matrix(runif(nt*d),nt,d) + matrix(rep(0.5*runif(nt),d),nt,d)
    yt = sign(((Xt[,1]-0.5)^2 + (Xt[,2]-0.5)^2)-0.06)

    ## flipping about 5 percent of y
    yt = yt*sign(runif(nt)-0.05)

    ## predicting response
    out.tst = predict(out.trn,Xt)
Fit a sparse additive logistic regression model on training data.

    samLL(X, y, p = 3, lambda = NULL, nlambda = NULL, lambda.min.ratio = 0.1,
          thol = 1e-05, max.ite = 1e+05, regfunc = "L1", dfmax = NULL,
          verbose = FALSE, dev.ratio.thr = NULL, dev.change.thr = NULL)

| Argument | Description |
|---|---|
| X | Numeric training matrix with n rows (observations) and d columns (features). |
| y | Binary training labels of length n. |
| p | The number of basis spline functions. The default value is 3. |
| lambda | Optional user-supplied regularization sequence. If provided, use a decreasing sequence; warm starts are used along the path and are usually much faster than fitting a single value. |
| nlambda | The number of lambda values. The default value is 20. |
| lambda.min.ratio | Smallest lambda as a fraction of the largest lambda on the path. The default value is 0.1. |
| thol | Stopping tolerance. The default value is 1e-5. |
| max.ite | Maximum number of iterations. The default value is 1e5. |
| regfunc | A string indicating the regularizer: "L1" (the default), "MCP", or "SCAD". |
| dfmax | Maximum number of non-zero groups allowed (the limit takes effect when the number of non-zero groups reaches dfmax). |
| verbose | Logical; if TRUE, progress information is printed during fitting. The default value is FALSE. |
| dev.ratio.thr | Deviance ratio threshold for early stopping. When the deviance ratio reaches this threshold, the path is terminated. |
| dev.change.thr | Relative deviance change threshold for early stopping. When the relative change in deviance over the last few lambda steps falls below this value, the path is terminated. |

The solver combines block coordinate descent, fast iterative soft-thresholding, and Newton updates. Computation is accelerated by warm starts and active-set screening.
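For readers comparing the three `regfunc` choices, the sketch below evaluates the textbook L1, MCP, and SCAD penalties on a non-negative norm `t`. The shape parameters `gamma = 3` and `a = 3.7` are illustrative assumptions, not documented package defaults, and the `pen_*` functions are hypothetical helpers.

```r
# Penalty values for a non-negative (group) norm t at regularization level lambda.
# gamma (MCP) and a (SCAD) are shape parameters; 3 and 3.7 are illustrative.
pen_l1 <- function(t, lambda) lambda * t

pen_mcp <- function(t, lambda, gamma = 3) {
  ifelse(t <= gamma * lambda,
         lambda * t - t^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

pen_scad <- function(t, lambda, a = 3.7) {
  ifelse(t <= lambda,
         lambda * t,
         ifelse(t <= a * lambda,
                (2 * a * lambda * t - t^2 - lambda^2) / (2 * (a - 1)),
                lambda^2 * (a + 1) / 2))
}

t <- c(0, 0.5, 1, 5)
pen <- rbind(L1 = pen_l1(t, 1), MCP = pen_mcp(t, 1), SCAD = pen_scad(t, 1))
pen
```

L1 keeps growing with `t`, while MCP and SCAD flatten out, which is why the non-convex choices shrink large components less.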
| Component | Description |
|---|---|
| p | The number of basis spline functions used in training. |
| X.min | Per-feature minimums from training data (used to rescale test data). |
| X.ran | Per-feature ranges from training data (used to rescale test data). |
| lambda | Sequence of regularization parameters used in training. |
| w | Solution path matrix. |
| df | Degrees of freedom along the solution path (number of non-zero component functions). |
| knots | The knots of the spline basis for each variable. |
| Boundary.knots | The boundary knots of the spline basis for each variable. |
| func_norm | Functional norm matrix: the functional norm of each component function along the regularization path. |

SAM, plot.samLL, print.samLL, predict.samLL
    ## generating training data
    n = 200
    d = 100
    X = 0.5*matrix(runif(n*d),n,d) + matrix(rep(0.5*runif(n),d),n,d)
    y = sign(((X[,1]-0.5)^2 + (X[,2]-0.5)^2)-0.06)

    ## flipping about 5 percent of y
    y = y*sign(runif(n)-0.05)
    y = sign(y==1)

    ## Training
    out.trn = samLL(X,y)
    out.trn

    ## plotting solution path
    plot(out.trn)

    ## generating testing data
    nt = 1000
    Xt = 0.5*matrix(runif(nt*d),nt,d) + matrix(rep(0.5*runif(nt),d),nt,d)
    yt = sign(((Xt[,1]-0.5)^2 + (Xt[,2]-0.5)^2)-0.06)

    ## flipping about 5 percent of y
    yt = yt*sign(runif(nt)-0.05)
    yt = sign(yt==1)

    ## predicting response
    out.tst = predict(out.trn,Xt)
Fit a sparse additive regression model with quadratic loss.

    samQL(X, y, p = 3, lambda = NULL, nlambda = NULL, lambda.min.ratio = 0.005,
          thol = 1e-05, max.ite = 1e+05, regfunc = "L1", dfmax = NULL,
          verbose = FALSE, dev.ratio.thr = NULL, dev.change.thr = NULL,
          solver = c("actnewton", "actgd"),
          type.gaussian = c("naive", "covariance", "auto"))

| Argument | Description |
|---|---|
| X | Numeric training matrix with n rows (observations) and d columns (features). |
| y | Numeric response vector of length n. |
| p | The number of basis spline functions. The default value is 3. |
| lambda | Optional user-supplied regularization sequence. If provided, use a decreasing sequence; warm starts are used along the path and are usually much faster than fitting a single value. |
| nlambda | The number of lambda values. The default value is 30. |
| lambda.min.ratio | Smallest lambda as a fraction of the largest lambda on the path. The default value is 0.005. |
| thol | Stopping tolerance. The default value is 1e-5. |
| max.ite | Maximum number of iterations. The default value is 1e5. |
| regfunc | A string indicating the regularizer: "L1" (the default), "MCP", or "SCAD". |
| dfmax | Maximum number of non-zero groups allowed (the limit takes effect when the number of non-zero groups reaches dfmax). |
| verbose | Logical; if TRUE, progress information is printed during fitting. The default value is FALSE. |
| dev.ratio.thr | Deviance ratio threshold for early stopping. When the deviance ratio reaches this threshold, the path is terminated. |
| dev.change.thr | Relative deviance change threshold for early stopping. When the relative change in deviance over the last few lambda steps falls below this value, the path is terminated. |
| solver | Which solver to use: "actnewton" or "actgd". |
| type.gaussian | Which internal update strategy to use: "naive", "covariance", or "auto". |

The solver combines block coordinate descent, fast iterative soft-thresholding, and Newton updates. Computation is accelerated by warm starts and active-set screening.
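The soft-thresholding step named above can be illustrated for one block of spline coefficients. This is the standard group soft-thresholding operator, shown as a sketch of the idea; `group_soft_threshold` is a hypothetical helper and not the package's internal routine.

```r
# Group soft-thresholding: shrink the whole coefficient block toward zero
# and set it exactly to zero when its Euclidean norm is at most lambda.
group_soft_threshold <- function(v, lambda) {
  nrm <- sqrt(sum(v^2))
  if (nrm <= lambda) return(rep(0, length(v)))
  (1 - lambda / nrm) * v
}

kept   <- group_soft_threshold(c(3, 4), lambda = 1)       # norm 5 shrinks to norm 4
zeroed <- group_soft_threshold(c(0.3, 0.4), lambda = 1)   # norm 0.5 <= 1, block is zeroed
kept
zeroed
```

Zeroing a whole block at once is what removes an entire component function from the model, which is how sparsity across features arises.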
| Component | Description |
|---|---|
| p | The number of basis spline functions used in training. |
| X.min | Per-feature minimums from training data (used to rescale test data). |
| X.ran | Per-feature ranges from training data (used to rescale test data). |
| lambda | Sequence of regularization parameters used in training. |
| w | Solution path matrix. |
| intercept | The solution path of the intercept. |
| df | Degrees of freedom along the solution path (number of non-zero component functions). |
| knots | The knots of the spline basis for each variable. |
| Boundary.knots | The boundary knots of the spline basis for each variable. |
| func_norm | Functional norm matrix: the functional norm of each component function along the regularization path. |
| sse | Sum of squared errors along the solution path. |

SAM, plot.samQL, print.samQL, predict.samQL
    ## generating training data
    n = 100
    d = 500
    X = 0.5*matrix(runif(n*d),n,d) + matrix(rep(0.5*runif(n),d),n,d)

    ## generating response
    y = -2*sin(X[,1]) + X[,2]^2-1/3 + X[,3]-1/2 + exp(-X[,4])+exp(-1)-1

    ## Training
    out.trn = samQL(X,y)
    out.trn

    ## plotting solution path
    plot(out.trn)

    ## generating testing data
    nt = 1000
    Xt = 0.5*matrix(runif(nt*d),nt,d) + matrix(rep(0.5*runif(nt),d),nt,d)
    yt = -2*sin(Xt[,1]) + Xt[,2]^2-1/3 + Xt[,3]-1/2 + exp(-Xt[,4])+exp(-1)-1

    ## predicting response
    out.tst = predict(out.trn,Xt)