Title: | Sparse Nonparametric Regression for High-Dimensional Data |
---|---|
Description: | Estimation of sparse nonlinear functions in nonparametric regression using component selection and smoothing. Designed for the analysis of high-dimensional data, the models support various data types, including exponential family models and Cox proportional hazards models. The methodology is based on Lin and Zhang (2006) <doi:10.1214/009053606000000722>. |
Authors: | Jieun Shin [aut, cre] |
Maintainer: | Jieun Shin <jieunstat@uos.ac.kr> |
License: | GPL-3 |
Version: | 1.0 |
Built: | 2025-03-13 14:20:44 UTC |
Source: | CRAN |
The cossonet function fits a nonparametric regression model that estimates nonlinear components. It can be applied to continuous, count, binary, and survival responses. The user must specify a family and a kernel function, among other arguments. For cross-validation, the sequences lambda0 and lambda_theta must also be specified so that they are appropriate for the input data.
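The default grids for lambda0 and lambda_theta in the function signature have the form exp(seq(log(a), log(b), length.out = k)), which spaces the candidate values evenly on the log scale. The following is a minimal base R sketch of that construction; the helper name log_grid is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of cossonet): log-spaced grid between two powers of 2.
log_grid <- function(lo_exp, hi_exp, n = 20) {
  exp(seq(log(2^lo_exp), log(2^hi_exp), length.out = n))
}

lambda0 <- log_grid(-10, 10)       # same form as the default in the signature
lambda_theta <- log_grid(-8, -6)   # narrower range, as in the package example

# Consecutive ratios are constant, so the grid is geometric.
ratios <- lambda0[-1] / lambda0[-length(lambda0)]
print(range(lambda0))
print(all.equal(ratios, rep(ratios[1], length(ratios))))
```

Narrowing the exponent range, as the package example does, concentrates the same number of candidates in a smaller region of the tuning-parameter space.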
cossonet(
  x,
  y,
  family = c("gaussian", "binomial", "poisson", "Cox"),
  wt = rep(1, ncol(x)),
  scale = TRUE,
  nbasis,
  basis.id,
  kernel = c("linear", "gaussian", "poly", "spline"),
  effect = c("main", "interaction"),
  nfold = 5,
  kparam = 1,
  lambda0 = exp(seq(log(2^{-10}), log(2^{10}), length.out = 20)),
  lambda_theta = exp(seq(log(2^{-10}), log(2^{10}), length.out = 20)),
  gamma = 0.95,
  one.std = TRUE
)
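The kernel argument accepts "linear", "gaussian", "poly", and "spline", and kparam supplies the parameter for the Gaussian and polynomial kernels. As a rough illustration of what a kernel produces, the sketch below builds a Gaussian Gram matrix in base R. The parameterization exp(-sigma * ||u - v||^2), the role of sigma as a stand-in for kparam, and the function name gaussian_gram are all assumptions; the package may define its kernels differently.

```r
# Gaussian (RBF) Gram matrix; the parameterization exp(-sigma * ||u - v||^2)
# is an assumption, not taken from the package source.
gaussian_gram <- function(x, z = x, sigma = 1) {
  x <- as.matrix(x)
  z <- as.matrix(z)
  # Squared Euclidean distances via ||u||^2 + ||v||^2 - 2 u.v
  d2 <- outer(rowSums(x^2), rowSums(z^2), "+") - 2 * x %*% t(z)
  exp(-sigma * pmax(d2, 0))   # clamp tiny negative values caused by rounding
}

set.seed(1)
x <- matrix(runif(12), nrow = 4)   # 4 observations, 3 variables
K <- gaussian_gram(x, sigma = 1)
print(dim(K))
print(isSymmetric(K))
```

Each entry of K measures the similarity between two observations, with ones on the diagonal because every observation is at distance zero from itself.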
x | Input matrix or data frame of $n$ by $p$. |
y | A response vector with a continuous, binary, or count type. For survival responses, this should be a two-column matrix (or data frame) with columns called 'time' and 'status'. |
family | A distribution corresponding to the response type. |
wt | The weights assigned to the explanatory variables. The default is rep(1, ncol(x)). |
scale | Boolean for whether to scale continuous explanatory variables to values between 0 and 1. |
nbasis | The number of "knots". If |
basis.id | The index of the "knot" to select. |
kernel | The kernel function. One of four types: "linear", "gaussian", "poly", or "spline". |
effect | The effect of the component. |
nfold | The number of folds for cross-validation, that is, the number of subsets into which the data are divided to form the training and validation sets. |
kparam | Parameters for Gaussian and polynomial kernel functions. |
lambda0 | A vector of |
lambda_theta | A vector of |
gamma | Elastic-net mixing parameter. |
one.std | A logical value indicating whether to apply the "1-standard error rule." When set to |
A list containing information about the fitted model.
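The one.std argument refers to the "1-standard error rule" used with nfold-fold cross-validation. The sketch below shows the conventional form of that rule on made-up validation errors: pick the largest penalty whose mean error is within one standard error of the minimum. It is an illustration of the general technique only; the function name one_se_choice, the toy numbers, and the direction of the rule (larger value meaning more regularization) are assumptions, not the package's internal code.

```r
# Sketch of the conventional one-standard-error rule (an assumption about the
# general technique, not cossonet's internal code).
# cv_err: matrix of validation errors, one row per fold, one column per lambda.
one_se_choice <- function(cv_err, lambda) {
  nfold <- nrow(cv_err)
  m <- colMeans(cv_err)                       # mean CV error per lambda
  se <- apply(cv_err, 2, sd) / sqrt(nfold)    # standard error per lambda
  i_min <- which.min(m)
  ok <- m <= m[i_min] + se[i_min]             # within one SE of the minimum
  list(lambda_min = lambda[i_min], lambda_1se = max(lambda[ok]))
}

# Toy errors for 5 folds and 4 candidate values (made-up numbers).
lambda <- c(0.01, 0.1, 1, 10)
cv_err <- rbind(c(1.10, 1.00, 1.05, 2.0),
                c(1.30, 1.20, 1.25, 2.2),
                c(0.90, 0.80, 0.85, 1.8),
                c(1.20, 1.10, 1.15, 2.1),
                c(1.00, 0.90, 0.95, 1.9))
res <- one_se_choice(cv_err, lambda)
print(res)
```

In this toy case the minimum mean error occurs at 0.1, while the rule selects 1, trading a small increase in error for a more heavily penalized model.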
# Generate example data
set.seed(20250101)
tr = data_generation(n = 200, p = 20, SNR = 9, response = "continuous")
tr_x = tr$x
tr_y = tr$y
te = data_generation(n = 1000, p = 20, SNR = 9, response = "continuous")
te_x = te$x
te_y = te$y
# Fit the model
fit = cossonet(tr_x, tr_y, family = 'gaussian', gamma = 0.95, kernel = "spline", scale = TRUE,
  lambda0 = exp(seq(log(2^{-4}), log(2^{0}), length.out = 20)),
  lambda_theta = exp(seq(log(2^{-8}), log(2^{-6}), length.out = 20))
)
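The example sets scale = TRUE, which the argument list describes as scaling continuous explanatory variables to values between 0 and 1. One common way to do this is column-wise min-max rescaling, sketched below in base R. Whether the package uses exactly this transformation is not stated, so treat the helper rescale01 as a hypothetical illustration.

```r
# Hypothetical helper: min-max rescaling of each column to [0, 1].
rescale01 <- function(x) {
  x <- as.matrix(x)
  apply(x, 2, function(col) {
    rng <- range(col)
    if (rng[1] == rng[2]) return(rep(0, length(col)))  # constant column: avoid 0/0
    (col - rng[1]) / (rng[2] - rng[1])
  })
}

set.seed(1)
x <- cbind(rnorm(10, mean = 5, sd = 2), runif(10, -3, 3))
x01 <- rescale01(x)
print(apply(x01, 2, range))   # each column now spans 0 to 1
```

When predicting on new data, the same minimum and range from the training data would normally be reused so that both sets share one scale.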
The function cossonet.predict computes predicted values for new data from an object fitted by the cossonet function.
cossonet.predict(model, testx)
model | The fitted cossonet object. |
testx | The new data set to be predicted. |
A list of predicted values for the new data set.
set.seed(20250101)
tr = data_generation(n = 200, p = 20, SNR = 9, response = "continuous")
tr_x = tr$x
tr_y = tr$y
te = data_generation(n = 1000, p = 20, SNR = 9, response = "continuous")
te_x = te$x
te_y = te$y
# Fit the model
fit = cossonet(tr_x, tr_y, family = 'gaussian', gamma = 0.95, kernel = "spline", scale = TRUE,
  lambda0 = exp(seq(log(2^{-4}), log(2^{0}), length.out = 20)),
  lambda_theta = exp(seq(log(2^{-8}), log(2^{-6}), length.out = 20))
)
# Predict new dataset
pred = cossonet.predict(fit, te_x)
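The example creates te_y but does not use it; a natural follow-up for a continuous response is to compare the predictions with te_y. The sketch below computes a mean squared error on made-up numbers. The helper mse is hypothetical, and because the names of the elements in the list returned by cossonet.predict are not given here, the vectors stand in for te_y and the predicted values rather than indexing pred directly.

```r
# Hypothetical helper: mean squared prediction error.
mse <- function(y_true, y_hat) {
  stopifnot(length(y_true) == length(y_hat))
  mean((y_true - y_hat)^2)
}

# Made-up numbers standing in for te_y and the predicted values.
y_true <- c(1.0, 2.0, 3.0, 4.0)
y_hat <- c(1.5, 2.0, 2.5, 4.0)
err <- mse(y_true, y_hat)
print(err)
```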
The function data_generation generates an example dataset for applying the cossonet function.
data_generation(
  n,
  p,
  rho,
  SNR,
  response = c("continuous", "binary", "count", "survival")
)
n | Number of observations. |
p | Number of explanatory variables (dimension). |
rho | A positive integer indicating the correlation strength for the first four informative variables. |
SNR | Signal-to-noise ratio. |
response | The type of the response variable. |
a list of explanatory variables, response variables, and true functions.
# Generate example data
set.seed(20250101)
tr = data_generation(n = 200, p = 20, SNR = 9, response = "continuous")
tr_x = tr$x
tr_y = tr$y
te = data_generation(n = 1000, p = 20, SNR = 9, response = "continuous")
te_x = te$x
te_y = te$y
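To make the SNR argument concrete, the sketch below shows one common way a signal-to-noise ratio can control simulated noise: the noise variance is set to var(signal) / SNR. This is not the package's data_generation; the function toy_generation, the made-up true function, and this particular SNR definition are assumptions chosen for illustration.

```r
# Toy generator (not the package's data_generation): one common SNR definition
# is var(signal) / var(noise); the signal function below is made up.
toy_generation <- function(n, p, SNR) {
  x <- matrix(runif(n * p), n, p)
  f <- sin(2 * pi * x[, 1]) + (2 * x[, 2] - 1)^2   # only 2 informative columns
  sigma <- sqrt(var(f) / SNR)                      # noise sd implied by the SNR
  y <- f + rnorm(n, sd = sigma)
  list(x = x, y = y, f = f, sigma = sigma)
}

set.seed(20250101)
toy <- toy_generation(n = 200, p = 20, SNR = 9)
print(dim(toy$x))
print(var(toy$f) / toy$sigma^2)   # equals the requested SNR by construction
```

A larger SNR shrinks sigma, so the response tracks the true function more closely and the informative variables become easier to detect.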
The function metric provides a contingency table of the predicted class against the true class for binary classes.
metric(true, est)
true | binary true class. |
est | binary predicted class. |
a contingency table for the predicted results of binary class responses.
set.seed(20250101)
tr = data_generation(n = 200, p = 20, SNR = 9, response = "continuous")
tr_x = tr$x
tr_y = tr$y
te = data_generation(n = 1000, p = 20, SNR = 9, response = "continuous")
te_x = te$x
te_y = te$y
# Fit the model
fit = cossonet(tr_x, tr_y, family = 'gaussian', gamma = 0.95, kernel = "spline", scale = TRUE,
  lambda0 = exp(seq(log(2^{-4}), log(2^{0}), length.out = 20)),
  lambda_theta = exp(seq(log(2^{-8}), log(2^{-6}), length.out = 20))
)
# Predict new dataset
pred = cossonet.predict(fit, te_x)
# Calculate the contingency table for binary class
true_var = c(rep(1, 4), rep(0, 20-4))
est_var = ifelse(fit$theta_step$theta.new > 0, 1, 0)
metric(true_var, est_var)
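In the example above, true_var marks the first four of the 20 variables as informative and est_var marks the variables with a positive estimated theta, so the table summarizes variable selection. The exact output format of metric is not shown here; the base R sketch below builds a comparable cross-tabulation with table() on a made-up selection result and derives precision and recall from it.

```r
# Base R cross-tabulation of true vs. predicted binary labels.
true_var <- c(rep(1, 4), rep(0, 16))       # first 4 of 20 are truly informative
est_var <- c(1, 1, 1, 0, 1, rep(0, 15))    # made-up selection result
tab <- table(true = factor(true_var, levels = c(0, 1)),
             est = factor(est_var, levels = c(0, 1)))
print(tab)

tp <- tab["1", "1"]   # informative and selected
fp <- tab["0", "1"]   # not informative but selected
fn <- tab["1", "0"]   # informative but missed
precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
print(c(precision = precision, recall = recall))
```

Fixing the factor levels to c(0, 1) keeps the table 2 by 2 even when one class is never predicted.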