Title: | Global Adaptive Generative Adjustment Algorithm for Generalized Linear Models |
---|---|
Description: | Fits linear regression, logistic and multinomial regression models, Poisson regression, Cox model via Global Adaptive Generative Adjustment Algorithm. For more detailed information, see Bin Wang, Xiaofei Wang and Jianhua Guo (2022) <arXiv:1911.00658>. This paper provides the theoretical properties of Gaga linear model when the load matrix is orthogonal. Further study is going on for the nonorthogonal cases and generalized linear models. These works are in part supported by the National Natural Foundation of China (No.12171076). |
Authors: | Bin Wang [aut, cre], Xiaofei Wang [ctb], Jianhua Guo [ths] |
Maintainer: | Bin Wang <[email protected]> |
License: | GPL-2 |
Version: | 0.6.2 |
Built: | 2024-10-31 21:22:56 UTC |
Source: | CRAN |
Calculate ACC for classification, the inputs must be characters
cal.acc(predictions, truelabels)
cal.acc(predictions, truelabels)
predictions |
predictions |
truelabels |
true labels |
ACC
set.seed(2022) p_size = 30 sample_size=300 R1 = 3 R2 = 2 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y=X%*%beta_true + rnorm(sample_size,mean=0,sd=2) # Estimation fit = GAGAs(X,y,alpha = 3,family="gaussian") Eb = fit$beta #Create testing data X_t = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y_t=X_t%*%beta_true + rnorm(sample_size,mean=0,sd=2) #Prediction Ey = predict.GAGA(fit,newx=X_t) cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n perr:", norm(Ey-y_t,type="2")/sqrt(sample_size))
set.seed(2022) p_size = 30 sample_size=300 R1 = 3 R2 = 2 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y=X%*%beta_true + rnorm(sample_size,mean=0,sd=2) # Estimation fit = GAGAs(X,y,alpha = 3,family="gaussian") Eb = fit$beta #Create testing data X_t = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y_t=X_t%*%beta_true + rnorm(sample_size,mean=0,sd=2) #Prediction Ey = predict.GAGA(fit,newx=X_t) cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n perr:", norm(Ey-y_t,type="2")/sqrt(sample_size))
Computes Harrel's C index for predictions from a "cox"
object.
cal.cindex(pred, y, weights = rep(1, nrow(y)))
cal.cindex(pred, y, weights = rep(1, nrow(y)))
pred |
Predictions from a |
y |
a survival response object - a matrix with two columns "time" and "status"; see documentation for "glmnet" or see documentation for "GAGA" |
weights |
optional observation weights |
Computes the concordance index, taking into account censoring. This file fully references the Cindex.R file in glmnet package.
Harrel's C index
Trevor Hastie <[email protected]>
Harrel Jr, F. E. and Lee, K. L. and Mark, D. B. (1996) Tutorial in biostatistics: multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing error, Statistics in Medicine, 15, pages 361–387.
cv.glmnet
set.seed(10101) N = 1000 p = 30 nzc = p/3 x = matrix(rnorm(N * p), N, p) beta = rnorm(nzc) fx = x[, seq(nzc)] %*% beta/3 hx = exp(fx) ty = rexp(N, hx) tcens = rbinom(n = N, prob = 0.3, size = 1) # censoring indicator y = cbind(time = ty, status = 1 - tcens) # y=Surv(ty,1-tcens) with library(survival) fit = GAGAs(x, y, family = "cox") pred = predict(fit, newx = x) cat("\n Cindex:", cal.cindex(pred, y))
set.seed(10101) N = 1000 p = 30 nzc = p/3 x = matrix(rnorm(N * p), N, p) beta = rnorm(nzc) fx = x[, seq(nzc)] %*% beta/3 hx = exp(fx) ty = rexp(N, hx) tcens = rbinom(n = N, prob = 0.3, size = 1) # censoring indicator y = cbind(time = ty, status = 1 - tcens) # y=Surv(ty,1-tcens) with library(survival) fit = GAGAs(x, y, family = "cox") pred = predict(fit, newx = x) cat("\n Cindex:", cal.cindex(pred, y))
Calculate F1 score for classification, the inputs must be characters, and each of these elements must be either 'FALSE' or 'TRUE'.
cal.F1Score(predictions, truelabels)
cal.F1Score(predictions, truelabels)
predictions |
predictions |
truelabels |
true labels |
F1 score
set.seed(2022) p_size = 30 sample_size=300 R1 = 3 R2 = 2 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y=X%*%beta_true + rnorm(sample_size,mean=0,sd=2) # Estimation fit = GAGAs(X,y,alpha = 3,family="gaussian") Eb = fit$beta cat("\n F1 score:", cal.F1Score(as.character(Eb!=0),as.character(beta_true!=0)))
set.seed(2022) p_size = 30 sample_size=300 R1 = 3 R2 = 2 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y=X%*%beta_true + rnorm(sample_size,mean=0,sd=2) # Estimation fit = GAGAs(X,y,alpha = 3,family="gaussian") Eb = fit$beta cat("\n F1 score:", cal.F1Score(as.character(Eb!=0),as.character(beta_true!=0)))
Calculate the weighted ACC of the classification, the inputs must be characters
cal.w.acc(predictions, truelabels)
cal.w.acc(predictions, truelabels)
predictions |
predictions |
truelabels |
true labels |
weighted ACC
set.seed(2022) p_size = 30 sample_size=300 R1 = 3 R2 = 2 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y=X%*%beta_true + rnorm(sample_size,mean=0,sd=2) # Estimation fit = GAGAs(X,y,alpha = 3,family="gaussian") Eb = fit$beta #Create testing data X_t = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y_t=X_t%*%beta_true + rnorm(sample_size,mean=0,sd=2) #Prediction Ey = predict.GAGA(fit,newx=X_t) cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n perr:", norm(Ey-y_t,type="2")/sqrt(sample_size))
set.seed(2022) p_size = 30 sample_size=300 R1 = 3 R2 = 2 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y=X%*%beta_true + rnorm(sample_size,mean=0,sd=2) # Estimation fit = GAGAs(X,y,alpha = 3,family="gaussian") Eb = fit$beta #Create testing data X_t = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y_t=X_t%*%beta_true + rnorm(sample_size,mean=0,sd=2) #Prediction Ey = predict.GAGA(fit,newx=X_t) cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n perr:", norm(Ey-y_t,type="2")/sqrt(sample_size))
Fit a Cox model via the Global Adaptive Generative Adjustment algorithm. Part of this function refers to the coxphfit function in MATLAB 2016b.
cox_GAGA( X, t, alpha = 2, itrNum = 20, thresh = 0.001, flag = TRUE, lamda_0 = 0.5, fdiag = TRUE, subItrNum = 20 )
cox_GAGA( X, t, alpha = 2, itrNum = 20, thresh = 0.001, flag = TRUE, lamda_0 = 0.5, fdiag = TRUE, subItrNum = 20 )
X |
Input matrix, of dimension nobs*nvars; each row is an observation.
If the intercept term needs to be considered in the estimation process, then the first column of |
t |
A n*2 matrix, one column should be named "time", indicating the survival time; the other column must be named "status", and consists of 0 and 1, 0 indicates that the row of data is censored, 1 is opposite. |
alpha |
Hyperparameter. The suggested value for alpha is 2 or 3. |
itrNum |
Maximum number of iteration steps. In general, 20 steps are enough.
If the condition number of |
thresh |
Convergence threshold for beta Change, if |
flag |
It identifies whether to make model selection. The default is |
lamda_0 |
The initial value of the regularization parameter for ridge regression. The running result of the algorithm is not sensitive to this value. |
fdiag |
It identifies whether to use diag Approximation to speed up the algorithm. |
subItrNum |
Maximum number of steps for subprocess iterations. |
Coefficient vector.
set.seed(2022) p_size = 50 sample_size = 500 test_size = 1000 R1 = 3 R2 = 1 ratio = 0.5 #The ratio of zeroes in coefficients censoringRate = 0.25 #Proportion of censoring data in observation data # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,-R2,R2) beta_true[ind] = 0 # Generate training samples X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) z = X%*%beta_true u = runif(sample_size,0,1) t = ((-log(1-u)/(3*exp(z)))*100)^(0.1) cs = rep(0,sample_size) csNum = round(censoringRate*sample_size) ind = sample(1:sample_size,csNum) cs[ind] = 1 t[ind] = runif(csNum,0,0.8)*t[ind] y = cbind(t,1 - cs) colnames(y) = c("time", "status") #Estimation fit = GAGAs(X,y,alpha=2,family="cox") Eb = fit$beta #Generate testing samples X_t = R1*matrix(rnorm(test_size * p_size), ncol = p_size) z = X_t%*%beta_true u = runif(test_size,0,1) t = ((-log(1-u)/(3*exp(z)))*100)^(0.1) cs = rep(0,test_size) csNum = round(censoringRate*test_size) ind = sample(1:test_size,csNum) cs[ind] = 1 t[ind] = runif(csNum,0,0.8)*t[ind] y_t = cbind(t,1 - cs) colnames(y_t) = c("time", "status") #Prediction pred = predict(fit,newx=X_t) cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n Cindex:", cal.cindex(pred,y_t))
set.seed(2022) p_size = 50 sample_size = 500 test_size = 1000 R1 = 3 R2 = 1 ratio = 0.5 #The ratio of zeroes in coefficients censoringRate = 0.25 #Proportion of censoring data in observation data # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,-R2,R2) beta_true[ind] = 0 # Generate training samples X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) z = X%*%beta_true u = runif(sample_size,0,1) t = ((-log(1-u)/(3*exp(z)))*100)^(0.1) cs = rep(0,sample_size) csNum = round(censoringRate*sample_size) ind = sample(1:sample_size,csNum) cs[ind] = 1 t[ind] = runif(csNum,0,0.8)*t[ind] y = cbind(t,1 - cs) colnames(y) = c("time", "status") #Estimation fit = GAGAs(X,y,alpha=2,family="cox") Eb = fit$beta #Generate testing samples X_t = R1*matrix(rnorm(test_size * p_size), ncol = p_size) z = X_t%*%beta_true u = runif(test_size,0,1) t = ((-log(1-u)/(3*exp(z)))*100)^(0.1) cs = rep(0,test_size) csNum = round(censoringRate*test_size) ind = sample(1:test_size,csNum) cs[ind] = 1 t[ind] = runif(csNum,0,0.8)*t[ind] y_t = cbind(t,1 - cs) colnames(y_t) = c("time", "status") #Prediction pred = predict(fit,newx=X_t) cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n Cindex:", cal.cindex(pred,y_t))
Fit a Cox model via the Global Adaptive Generative Adjustment algorithm. Part of this function refers to the coxphfit function in MATLAB 2016b.
cpp_COX_gaga( X, y, cens, alpha = 2, itrNum = 50L, thresh = 0.001, flag = TRUE, lamda_0 = 0.5, fdiag = TRUE, subItrNum = 20L )
cpp_COX_gaga( X, y, cens, alpha = 2, itrNum = 50L, thresh = 0.001, flag = TRUE, lamda_0 = 0.5, fdiag = TRUE, subItrNum = 20L )
X |
Input matrix, of dimension nobs*nvars; each row is an observation.
If the intercept term needs to be considered in the estimation process, then the first column of |
y |
A n*1 matrix, indicating the survival time; |
cens |
A n*1 matrix, consists of 0 and 1, 1 indicates that the row of data is censored, 0 is opposite. |
alpha |
Hyperparameter. The suggested value for alpha is 2 or 3. |
itrNum |
Maximum number of iteration steps. In general, 20 steps are enough. |
thresh |
Convergence threshold for beta Change, if |
flag |
It identifies whether to make model selection. The default is |
lamda_0 |
The initial value of the regularization parameter for ridge regression. |
fdiag |
It identifies whether to use diag Approximation to speed up the algorithm. |
subItrNum |
Maximum number of steps for subprocess iterations. |
Coefficient vector
Fit a logistic model via the Global Adaptive Generative Adjustment algorithm using cpp
cpp_logistic_gaga( X, y, s_alpha, s_itrNum, s_thresh, s_flag, s_lamda_0, s_fdiag, s_subItrNum )
cpp_logistic_gaga( X, y, s_alpha, s_itrNum, s_thresh, s_flag, s_lamda_0, s_fdiag, s_subItrNum )
X |
Input matrix, of dimension nobs*nvars; each row is an observation.
If the intercept term needs to be considered in the estimation process, then the first column of |
y |
should be either a factor with two levels. |
s_alpha |
Hyperparameter. The suggested value for alpha is 1 or 2. When the collinearity of the load matrix is serious, the hyperparameters can be selected larger, such as 5. |
s_itrNum |
The number of iteration steps. In general, 20 steps are enough.
If the condition number of |
s_thresh |
Convergence threshold for beta Change, if |
s_flag |
It identifies whether to make model selection. The default is |
s_lamda_0 |
The initial value of the regularization parameter for ridge regression. The running result of the algorithm is not sensitive to this value. |
s_fdiag |
It identifies whether to use diag Approximation to speed up the algorithm. |
s_subItrNum |
Maximum number of steps for subprocess iterations. |
Coefficient vector.
Fit a multinomial model the Global Adaptive Generative Adjustment algorithm
cpp_multinomial_gaga( X, y, s_alpha, s_itrNum, s_thresh, s_flag, s_lamda_0, s_fdiag, s_subItrNum )
cpp_multinomial_gaga( X, y, s_alpha, s_itrNum, s_thresh, s_flag, s_lamda_0, s_fdiag, s_subItrNum )
X |
Input matrix, of dimension nobs*nvars; each row is an observation.
If the intercept term needs to be considered in the estimation process, then the first column of |
y |
a One-hot response matrix or a |
s_alpha |
Hyperparameter. The suggested value for alpha is 1 or 2. When the collinearity of the load matrix is serious, the hyperparameters can be selected larger, such as 5. |
s_itrNum |
The number of iteration steps. In general, 20 steps are enough.
If the condition number of |
s_thresh |
Convergence threshold for beta Change, if |
s_flag |
It identifies whether to make model selection. The default is |
s_lamda_0 |
The initial value of the regularization parameter for ridge regression. The running result of the algorithm is not sensitive to this value. |
s_fdiag |
It identifies whether to use diag Approximation to speed up the algorithm. |
s_subItrNum |
Maximum number of steps for subprocess iterations. |
Coefficient matrix with K-1 columns, where K is the class number. For k=1,..,K-1, the probability
. For k=K, the probability
.
Fit a poisson model the Global Adaptive Generative Adjustment algorithm
cpp_poisson_gaga( X, y, s_alpha, s_itrNum, s_thresh, s_flag, s_lamda_0, s_fdiag, s_subItrNum )
cpp_poisson_gaga( X, y, s_alpha, s_itrNum, s_thresh, s_flag, s_lamda_0, s_fdiag, s_subItrNum )
X |
Input matrix, of dimension nobs*nvars; each row is an observation.
If the intercept term needs to be considered in the estimation process, then the first column of |
y |
Non-negative count response vector. |
s_alpha |
Hyperparameter. The suggested value for alpha is 1 or 2. When the collinearity of the load matrix is serious, the hyperparameters can be selected larger, such as 5. |
s_itrNum |
The number of iteration steps. In general, 20 steps are enough.
If the condition number of |
s_thresh |
Convergence threshold for beta Change, if |
s_flag |
It identifies whether to make model selection. The default is |
s_lamda_0 |
The initial value of the regularization parameter for ridge regression. The running result of the algorithm is not sensitive to this value. |
s_fdiag |
It identifies whether to use diag Approximation to speed up the algorithm. |
s_subItrNum |
Maximum number of steps for subprocess iterations. |
Coefficient vector.
Fits linear, logistic and multinomial, poisson, and Cox regression models via Global Adaptive Generative Adjustment algorithm.
Fits linear, logistic and multinomial, poisson, and Cox regression models via the Global Adaptive Generative Adjustment algorithm.
GAGAs( X, y, family = c("gaussian", "binomial", "poisson", "multinomial", "cox"), alpha = 2, itrNum = 100, thresh = 0.001, QR_flag = FALSE, flag = TRUE, lamda_0 = 0.001, fdiag = TRUE, frp = TRUE, subItrNum = 20 )
GAGAs( X, y, family = c("gaussian", "binomial", "poisson", "multinomial", "cox"), alpha = 2, itrNum = 100, thresh = 0.001, QR_flag = FALSE, flag = TRUE, lamda_0 = 0.001, fdiag = TRUE, frp = TRUE, subItrNum = 20 )
X |
Input matrix, of dimension nobs*nvars; each row is an observation.
If the intercept term needs to be considered in the estimation process, then the first column of |
y |
Response variable. Quantitative for |
family |
Either a character string representing one of the built-in families,
|
alpha |
Hyperparameter. In general, alpha can be set to 1, 2 or 3.
for |
itrNum |
The number of iteration steps. In general, 20 steps are enough.
If the condition number of |
thresh |
Convergence threshold for beta Change, if |
QR_flag |
It identifies whether to use QR decomposition to speed up the algorithm. Currently only valid for linear models. |
flag |
It identifies whether to make model selection. The default is |
lamda_0 |
The initial value of the regularization parameter for ridge regression. The running result of the algorithm is not sensitive to this value. |
fdiag |
It identifies whether to use diag Approximation to speed up the algorithm. |
frp |
Identifies if a method is preprocessed to reduce the number of parameters |
subItrNum |
Maximum number of steps for subprocess iterations. |
Regression coefficients
GAGAs
# Gaussian set.seed(2022) p_size = 30 sample_size=300 R1 = 3 R2 = 2 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y=X%*%beta_true + rnorm(sample_size,mean=0,sd=2) # Estimation fit = GAGAs(X,y,alpha = 3,family="gaussian") Eb = fit$beta #Create testing data X_t = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y_t=X_t%*%beta_true + rnorm(sample_size,mean=0,sd=2) #Prediction Ey = predict.GAGA(fit,newx=X_t) cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n perr:", norm(Ey-y_t,type="2")/sqrt(sample_size)) # binomial set.seed(2022) cat("\n") cat("Test binomial GAGA\n") p_size = 30 sample_size=600 test_size=1000 R1 = 1 R2 = 3 ratio = 0.5 #The ratio of zeroes in coefficients #Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,R2*0.2,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) X[1:sample_size,1]=1 t = 1/(1+exp(-X%*%beta_true)) tmp = runif(sample_size,0,1) y = rep(0,sample_size) y[t>tmp] = 1 fit = GAGAs(X,y,family = "binomial", alpha = 1) Eb = fit$beta #Generate test samples X_t = R1*matrix(rnorm(test_size * p_size), ncol = p_size) X_t[1:test_size,1]=1 t = 1/(1+exp(-X_t%*%beta_true)) tmp = runif(test_size,0,1) y_t = rep(0,test_size) y_t[t>tmp] = 1 #Prediction Ey = predict(fit,newx = X_t) cat("\n--------------------") cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n pacc:", cal.w.acc(as.character(Ey),as.character(y_t))) cat("\n") # multinomial set.seed(2022) cat("\n") cat("Test multinomial GAGA\n") p_size = 20 C = 3 classnames = c("C1","C2","C3","C4") sample_size = 500 test_size = 1000 ratio = 0.5 #The ratio of zeroes in coefficients Num = 10 # Total number of experiments R1 = 1 R2 = 5 #Set the true coefficients beta_true = matrix(rep(0,p_size*C),c(p_size,C)) zeroNum = round(ratio*p_size) for(jj in 1:C){ ind = sample(1:p_size,zeroNum) tmp = runif(p_size,0,R2) tmp[ind] = 0 beta_true[,jj] = tmp } #Generate training samples X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) X[1:sample_size,1]=1 z = X%*%beta_true t = exp(z)/(1+rowSums(exp(z))) t = cbind(t,1-rowSums(t)) tt = t(apply(t,1,cumsum)) tt = cbind(rep(0,sample_size),tt) # y = matrix(rep(0,sample_size*(C+1)),c(sample_size,C+1)) y = rep(0,sample_size) for(jj in 1:sample_size){ tmp = runif(1,0,1) for(kk in 1:(C+1)){ if((tmp>tt[jj,kk])&&(tmp<=tt[jj,kk+1])){ # y[jj,kk] = 1 y[jj] = kk break } } } y = classnames[y] fit = GAGAs(X, y,alpha=1,family = "multinomial") Eb = fit$beta #Prediction #Generate test samples X_t = R1*matrix(rnorm(test_size * p_size), ncol = p_size) X_t[1:test_size,1]=1 z = X_t%*%beta_true t = exp(z)/(1+rowSums(exp(z))) t = cbind(t,1-rowSums(t)) tt = t(apply(t,1,cumsum)) tt = cbind(rep(0,test_size),tt) y_t = rep(0,test_size) for(jj in 1:test_size){ tmp = runif(1,0,1) for(kk in 1:(C+1)){ if((tmp>tt[jj,kk])&&(tmp<=tt[jj,kk+1])){ y_t[jj] = kk break } } } y_t = classnames[y_t] Ey = predict(fit,newx = X_t) cat("\n--------------------") cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n pacc:", cal.w.acc(as.character(Ey),as.character(y_t))) cat("\n") # Poisson set.seed(2022) p_size = 30 sample_size=300 R1 = 1/sqrt(p_size) R2 = 5 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) X[1:sample_size,1]=1 y = rpois(sample_size,lambda = as.vector(exp(X%*%beta_true))) y = as.vector(y) # Estimate fit = GAGAs(X,y,alpha = 2,family="poisson") Eb = fit$beta cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) # cox p_size = 50 sample_size = 500 test_size = 1000 R1 = 3 R2 = 1 ratio = 0.5 #The ratio of zeroes in coefficients censoringRate = 0.25 #Proportion of censoring data in observation data # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,-R2,R2) beta_true[ind] = 0 # Generate training samples X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) z = X%*%beta_true u = runif(sample_size,0,1) t = ((-log(1-u)/(3*exp(z)))*100)^(0.1) cs = rep(0,sample_size) csNum = round(censoringRate*sample_size) ind = sample(1:sample_size,csNum) cs[ind] = 1 t[ind] = runif(csNum,0,0.8)*t[ind] y = cbind(t,1 - cs) colnames(y) = c("time", "status") #Estimation fit = GAGAs(X,y,alpha=2,family="cox") Eb = fit$beta #Generate testing samples X_t = R1*matrix(rnorm(test_size * p_size), ncol = p_size) z = X_t%*%beta_true u = runif(test_size,0,1) t = ((-log(1-u)/(3*exp(z)))*100)^(0.1) cs = rep(0,test_size) csNum = round(censoringRate*test_size) ind = sample(1:test_size,csNum) cs[ind] = 1 t[ind] = runif(csNum,0,0.8)*t[ind] y_t = cbind(t,1 - cs) colnames(y_t) = c("time", "status") #Prediction pred = predict(fit,newx=X_t) cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n Cindex:", cal.cindex(pred,y_t))
# Gaussian set.seed(2022) p_size = 30 sample_size=300 R1 = 3 R2 = 2 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y=X%*%beta_true + rnorm(sample_size,mean=0,sd=2) # Estimation fit = GAGAs(X,y,alpha = 3,family="gaussian") Eb = fit$beta #Create testing data X_t = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y_t=X_t%*%beta_true + rnorm(sample_size,mean=0,sd=2) #Prediction Ey = predict.GAGA(fit,newx=X_t) cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n perr:", norm(Ey-y_t,type="2")/sqrt(sample_size)) # binomial set.seed(2022) cat("\n") cat("Test binomial GAGA\n") p_size = 30 sample_size=600 test_size=1000 R1 = 1 R2 = 3 ratio = 0.5 #The ratio of zeroes in coefficients #Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,R2*0.2,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) X[1:sample_size,1]=1 t = 1/(1+exp(-X%*%beta_true)) tmp = runif(sample_size,0,1) y = rep(0,sample_size) y[t>tmp] = 1 fit = GAGAs(X,y,family = "binomial", alpha = 1) Eb = fit$beta #Generate test samples X_t = R1*matrix(rnorm(test_size * p_size), ncol = p_size) X_t[1:test_size,1]=1 t = 1/(1+exp(-X_t%*%beta_true)) tmp = runif(test_size,0,1) y_t = rep(0,test_size) y_t[t>tmp] = 1 #Prediction Ey = predict(fit,newx = X_t) cat("\n--------------------") cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n pacc:", cal.w.acc(as.character(Ey),as.character(y_t))) cat("\n") # multinomial set.seed(2022) cat("\n") cat("Test multinomial GAGA\n") p_size = 20 C = 3 classnames = c("C1","C2","C3","C4") sample_size = 500 test_size = 1000 ratio = 0.5 #The ratio of zeroes in coefficients Num = 10 # Total number of experiments R1 = 1 R2 = 5 #Set the true coefficients beta_true = matrix(rep(0,p_size*C),c(p_size,C)) zeroNum = round(ratio*p_size) for(jj in 1:C){ ind = sample(1:p_size,zeroNum) tmp = runif(p_size,0,R2) tmp[ind] = 0 beta_true[,jj] = tmp } #Generate training samples X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) X[1:sample_size,1]=1 z = X%*%beta_true t = exp(z)/(1+rowSums(exp(z))) t = cbind(t,1-rowSums(t)) tt = t(apply(t,1,cumsum)) tt = cbind(rep(0,sample_size),tt) # y = matrix(rep(0,sample_size*(C+1)),c(sample_size,C+1)) y = rep(0,sample_size) for(jj in 1:sample_size){ tmp = runif(1,0,1) for(kk in 1:(C+1)){ if((tmp>tt[jj,kk])&&(tmp<=tt[jj,kk+1])){ # y[jj,kk] = 1 y[jj] = kk break } } } y = classnames[y] fit = GAGAs(X, y,alpha=1,family = "multinomial") Eb = fit$beta #Prediction #Generate test samples X_t = R1*matrix(rnorm(test_size * p_size), ncol = p_size) X_t[1:test_size,1]=1 z = X_t%*%beta_true t = exp(z)/(1+rowSums(exp(z))) t = cbind(t,1-rowSums(t)) tt = t(apply(t,1,cumsum)) tt = cbind(rep(0,test_size),tt) y_t = rep(0,test_size) for(jj in 1:test_size){ tmp = runif(1,0,1) for(kk in 1:(C+1)){ if((tmp>tt[jj,kk])&&(tmp<=tt[jj,kk+1])){ y_t[jj] = kk break } } } y_t = classnames[y_t] Ey = predict(fit,newx = X_t) cat("\n--------------------") cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n pacc:", cal.w.acc(as.character(Ey),as.character(y_t))) cat("\n") # Poisson set.seed(2022) p_size = 30 sample_size=300 R1 = 1/sqrt(p_size) R2 = 5 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) X[1:sample_size,1]=1 y = rpois(sample_size,lambda = as.vector(exp(X%*%beta_true))) y = as.vector(y) # Estimate fit = GAGAs(X,y,alpha = 2,family="poisson") Eb = fit$beta cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) # cox p_size = 50 sample_size = 500 test_size = 1000 R1 = 3 R2 = 1 ratio = 0.5 #The ratio of zeroes in coefficients censoringRate = 0.25 #Proportion of censoring data in observation data # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,-R2,R2) beta_true[ind] = 0 # Generate training samples X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) z = X%*%beta_true u = runif(sample_size,0,1) t = ((-log(1-u)/(3*exp(z)))*100)^(0.1) cs = rep(0,sample_size) csNum = round(censoringRate*sample_size) ind = sample(1:sample_size,csNum) cs[ind] = 1 t[ind] = runif(csNum,0,0.8)*t[ind] y = cbind(t,1 - cs) colnames(y) = c("time", "status") #Estimation fit = GAGAs(X,y,alpha=2,family="cox") Eb = fit$beta #Generate testing samples X_t = R1*matrix(rnorm(test_size * p_size), ncol = p_size) z = X_t%*%beta_true u = runif(test_size,0,1) t = ((-log(1-u)/(3*exp(z)))*100)^(0.1) cs = rep(0,test_size) csNum = round(censoringRate*test_size) ind = sample(1:test_size,csNum) cs[ind] = 1 t[ind] = runif(csNum,0,0.8)*t[ind] y_t = cbind(t,1 - cs) colnames(y_t) = c("time", "status") #Prediction pred = predict(fit,newx=X_t) cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n Cindex:", cal.cindex(pred,y_t))
Fit a linear model with a Gaussian noise via the Global Adaptive Generative Adjustment algorithm
LM_GAGA( X, y, alpha = 3, itrNum = 50, thresh = 0.001, QR_flag = FALSE, flag = TRUE, lamda_0 = 0.001, fix_sigma = FALSE, sigm2_0 = 1, fdiag = TRUE, frp = TRUE )
LM_GAGA( X, y, alpha = 3, itrNum = 50, thresh = 0.001, QR_flag = FALSE, flag = TRUE, lamda_0 = 0.001, fix_sigma = FALSE, sigm2_0 = 1, fdiag = TRUE, frp = TRUE )
X |
Input matrix, of dimension nobs*nvars; each row is an observation.
If the intercept term needs to be considered in the estimation process, then the first column of |
y |
Quantitative response vector. |
alpha |
Hyperparameter. The suggested value for alpha is 2 or 3. When the collinearity of the load matrix is serious, the hyperparameters can be selected larger, such as 5. |
itrNum |
The number of iteration steps. In general, 20 steps are enough.
If the condition number of |
thresh |
Convergence threshold for beta Change, if |
QR_flag |
It identifies whether to use QR decomposition to speed up the algorithm. Currently only valid for linear models. |
flag |
It identifies whether to make model selection. The default is |
lamda_0 |
The initial value of the regularization parameter for ridge regression. The running result of the algorithm is not sensitive to this value. |
fix_sigma |
It identifies whether to update the variance estimate of the Gaussian noise or not.
|
sigm2_0 |
The initial variance of the Gaussian noise. |
fdiag |
It identifies whether to use diag Approximation to speed up the algorithm. |
frp |
Identifies whether pre-processing is performed by the OMP method to reduce the number of parameters |
Coefficient vector.
# Gaussian set.seed(2022) p_size = 30 sample_size=300 R1 = 3 R2 = 2 ratio = 0.5 # The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y=X%*%beta_true + rnorm(sample_size,mean=0,sd=2) # Estimation fit = GAGAs(X,y,alpha = 3,family="gaussian") Eb = fit$beta #Create testing data X_t = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y_t=X_t%*%beta_true + rnorm(sample_size,mean=0,sd=2) #Prediction Ey = predict.GAGA(fit,newx=X_t) cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n perr:", norm(Ey-y_t,type="2")/sqrt(sample_size))
# Gaussian set.seed(2022) p_size = 30 sample_size=300 R1 = 3 R2 = 2 ratio = 0.5 # The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y=X%*%beta_true + rnorm(sample_size,mean=0,sd=2) # Estimation fit = GAGAs(X,y,alpha = 3,family="gaussian") Eb = fit$beta #Create testing data X_t = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y_t=X_t%*%beta_true + rnorm(sample_size,mean=0,sd=2) #Prediction Ey = predict.GAGA(fit,newx=X_t) cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n perr:", norm(Ey-y_t,type="2")/sqrt(sample_size))
Fit a logistic model via the Global Adaptive Generative Adjustment algorithm
logistic_GAGA( X, y, alpha = 1, itrNum = 30, thresh = 0.001, flag = TRUE, lamda_0 = 0.001, fdiag = TRUE, subItrNum = 20 )
logistic_GAGA( X, y, alpha = 1, itrNum = 30, thresh = 0.001, flag = TRUE, lamda_0 = 0.001, fdiag = TRUE, subItrNum = 20 )
X |
Input matrix, of dimension nobs*nvars; each row is an observation.
If the intercept term needs to be considered in the estimation process, then the first column of |
y |
should be either a factor with two levels. |
alpha |
Hyperparameter. The suggested value for alpha is 1 or 2. When the collinearity of the load matrix is serious, the hyperparameters can be selected larger, such as 5. |
itrNum |
The number of iteration steps. In general, 20 steps are enough.
If the condition number of |
thresh |
Convergence threshold for beta Change, if |
flag |
It identifies whether to make model selection. The default is |
lamda_0 |
The initial value of the regularization parameter for ridge regression. The running result of the algorithm is not sensitive to this value. |
fdiag |
It identifies whether to use diag Approximation to speed up the algorithm. |
subItrNum |
Maximum number of steps for subprocess iterations. |
Coefficient vector.
# binomial set.seed(2022) cat("\n") cat("Test binomial GAGA\n") p_size = 30 sample_size=600 test_size=1000 R1 = 1 R2 = 3 ratio = 0.5 #The ratio of zeroes in coefficients #Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,R2*0.2,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) X[1:sample_size,1]=1 t = 1/(1+exp(-X%*%beta_true)) tmp = runif(sample_size,0,1) y = rep(0,sample_size) y[t>tmp] = 1 fit = GAGAs(X,y,family = "binomial", alpha = 1) Eb = fit$beta #Generate test samples X_t = R1*matrix(rnorm(test_size * p_size), ncol = p_size) X_t[1:test_size,1]=1 t = 1/(1+exp(-X_t%*%beta_true)) tmp = runif(test_size,0,1) y_t = rep(0,test_size) y_t[t>tmp] = 1 #Prediction Ey = predict(fit,newx = X_t) cat("\n--------------------") cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n pacc:", cal.w.acc(as.character(Ey),as.character(y_t))) cat("\n")
# binomial set.seed(2022) cat("\n") cat("Test binomial GAGA\n") p_size = 30 sample_size=600 test_size=1000 R1 = 1 R2 = 3 ratio = 0.5 #The ratio of zeroes in coefficients #Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,R2*0.2,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) X[1:sample_size,1]=1 t = 1/(1+exp(-X%*%beta_true)) tmp = runif(sample_size,0,1) y = rep(0,sample_size) y[t>tmp] = 1 fit = GAGAs(X,y,family = "binomial", alpha = 1) Eb = fit$beta #Generate test samples X_t = R1*matrix(rnorm(test_size * p_size), ncol = p_size) X_t[1:test_size,1]=1 t = 1/(1+exp(-X_t%*%beta_true)) tmp = runif(test_size,0,1) y_t = rep(0,test_size) y_t[t>tmp] = 1 #Prediction Ey = predict(fit,newx = X_t) cat("\n--------------------") cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n pacc:", cal.w.acc(as.character(Ey),as.character(y_t))) cat("\n")
Fit a multinomial model the Global Adaptive Generative Adjustment algorithm
multinomial_GAGA( X, y, alpha = 1, itrNum = 50, thresh = 0.001, flag = TRUE, lamda_0 = 0.001, fdiag = TRUE, subItrNum = 20 )
multinomial_GAGA( X, y, alpha = 1, itrNum = 50, thresh = 0.001, flag = TRUE, lamda_0 = 0.001, fdiag = TRUE, subItrNum = 20 )
X |
Input matrix, of dimension nobs*nvars; each row is an observation.
If the intercept term needs to be considered in the estimation process, then the first column of |
y |
a One-hot response matrix or a |
alpha |
Hyperparameter. The suggested value for alpha is 1 or 2. When the collinearity of the load matrix is serious, the hyperparameters can be selected larger, such as 5. |
itrNum |
The number of iteration steps. In general, 20 steps are enough.
If the condition number of |
thresh |
Convergence threshold for beta Change, if |
flag |
It identifies whether to make model selection. The default is |
lamda_0 |
The initial value of the regularization parameter for ridge regression. The running result of the algorithm is not sensitive to this value. |
fdiag |
It identifies whether to use diag Approximation to speed up the algorithm. |
subItrNum |
Maximum number of steps for subprocess iterations. |
Coefficient matrix with K-1 columns, where K is the class number. For k=1,..,K-1, the probability
. For k=K, the probability
.
# multinomial set.seed(2022) cat("\n") cat("Test multinomial GAGA\n") p_size = 20 C = 3 classnames = c("C1","C2","C3","C4") sample_size = 500 test_size = 1000 ratio = 0.5 #The ratio of zeroes in coefficients Num = 10 # Total number of experiments R1 = 1 R2 = 5 #Set the true coefficients beta_true = matrix(rep(0,p_size*C),c(p_size,C)) zeroNum = round(ratio*p_size) for(jj in 1:C){ ind = sample(1:p_size,zeroNum) tmp = runif(p_size,0,R2) tmp[ind] = 0 beta_true[,jj] = tmp } #Generate training samples X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) X[1:sample_size,1]=1 z = X%*%beta_true t = exp(z)/(1+rowSums(exp(z))) t = cbind(t,1-rowSums(t)) tt = t(apply(t,1,cumsum)) tt = cbind(rep(0,sample_size),tt) # y = matrix(rep(0,sample_size*(C+1)),c(sample_size,C+1)) y = rep(0,sample_size) for(jj in 1:sample_size){ tmp = runif(1,0,1) for(kk in 1:(C+1)){ if((tmp>tt[jj,kk])&&(tmp<=tt[jj,kk+1])){ # y[jj,kk] = 1 y[jj] = kk break } } } y = classnames[y] fit = GAGAs(X, y,alpha=1,family = "multinomial") Eb = fit$beta #Prediction #Generate test samples X_t = R1*matrix(rnorm(test_size * p_size), ncol = p_size) X_t[1:test_size,1]=1 z = X_t%*%beta_true t = exp(z)/(1+rowSums(exp(z))) t = cbind(t,1-rowSums(t)) tt = t(apply(t,1,cumsum)) tt = cbind(rep(0,test_size),tt) y_t = rep(0,test_size) for(jj in 1:test_size){ tmp = runif(1,0,1) for(kk in 1:(C+1)){ if((tmp>tt[jj,kk])&&(tmp<=tt[jj,kk+1])){ y_t[jj] = kk break } } } y_t = classnames[y_t] Ey = predict(fit,newx = X_t) cat("\n--------------------") cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n pacc:", cal.w.acc(as.character(Ey),as.character(y_t))) cat("\n")
# multinomial set.seed(2022) cat("\n") cat("Test multinomial GAGA\n") p_size = 20 C = 3 classnames = c("C1","C2","C3","C4") sample_size = 500 test_size = 1000 ratio = 0.5 #The ratio of zeroes in coefficients Num = 10 # Total number of experiments R1 = 1 R2 = 5 #Set the true coefficients beta_true = matrix(rep(0,p_size*C),c(p_size,C)) zeroNum = round(ratio*p_size) for(jj in 1:C){ ind = sample(1:p_size,zeroNum) tmp = runif(p_size,0,R2) tmp[ind] = 0 beta_true[,jj] = tmp } #Generate training samples X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) X[1:sample_size,1]=1 z = X%*%beta_true t = exp(z)/(1+rowSums(exp(z))) t = cbind(t,1-rowSums(t)) tt = t(apply(t,1,cumsum)) tt = cbind(rep(0,sample_size),tt) # y = matrix(rep(0,sample_size*(C+1)),c(sample_size,C+1)) y = rep(0,sample_size) for(jj in 1:sample_size){ tmp = runif(1,0,1) for(kk in 1:(C+1)){ if((tmp>tt[jj,kk])&&(tmp<=tt[jj,kk+1])){ # y[jj,kk] = 1 y[jj] = kk break } } } y = classnames[y] fit = GAGAs(X, y,alpha=1,family = "multinomial") Eb = fit$beta #Prediction #Generate test samples X_t = R1*matrix(rnorm(test_size * p_size), ncol = p_size) X_t[1:test_size,1]=1 z = X_t%*%beta_true t = exp(z)/(1+rowSums(exp(z))) t = cbind(t,1-rowSums(t)) tt = t(apply(t,1,cumsum)) tt = cbind(rep(0,test_size),tt) y_t = rep(0,test_size) for(jj in 1:test_size){ tmp = runif(1,0,1) for(kk in 1:(C+1)){ if((tmp>tt[jj,kk])&&(tmp<=tt[jj,kk+1])){ y_t[jj] = kk break } } } y_t = classnames[y_t] Ey = predict(fit,newx = X_t) cat("\n--------------------") cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n pacc:", cal.w.acc(as.character(Ey),as.character(y_t))) cat("\n")
Fit a Poisson model the Global Adaptive Generative Adjustment algorithm
poisson_GAGA( X, y, alpha = 1, itrNum = 30, thresh = 0.001, flag = TRUE, lamda_0 = 0.5, fdiag = TRUE, subItrNum = 20 )
poisson_GAGA( X, y, alpha = 1, itrNum = 30, thresh = 0.001, flag = TRUE, lamda_0 = 0.5, fdiag = TRUE, subItrNum = 20 )
X |
Input matrix, of dimension nobs*nvars; each row is an observation.
If the intercept term needs to be considered in the estimation process, then the first column of |
y |
Non-negative count response vector. |
alpha |
Hyperparameter. The suggested value for alpha is 1 or 2. When the collinearity of the load matrix is serious, the hyperparameters can be selected larger, such as 5. |
itrNum |
The number of iteration steps. In general, 20 steps are enough.
If the condition number of |
thresh |
Convergence threshold for beta Change, if |
flag |
It identifies whether to make model selection. The default is |
lamda_0 |
The initial value of the regularization parameter for ridge regression. The running result of the algorithm is not sensitive to this value. |
fdiag |
It identifies whether to use diag Approximation to speed up the algorithm. |
subItrNum |
Maximum number of steps for subprocess iterations. |
Coefficient vector.
# Poisson set.seed(2022) p_size = 30 sample_size=300 R1 = 1/sqrt(p_size) R2 = 5 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) X[1:sample_size,1]=1 y = rpois(sample_size,lambda = as.vector(exp(X%*%beta_true))) y = as.vector(y) # Estimate fit = GAGAs(X,y,alpha = 2,family="poisson") Eb = fit$beta cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0)))
# Poisson set.seed(2022) p_size = 30 sample_size=300 R1 = 1/sqrt(p_size) R2 = 5 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) X[1:sample_size,1]=1 y = rpois(sample_size,lambda = as.vector(exp(X%*%beta_true))) y = as.vector(y) # Estimate fit = GAGAs(X,y,alpha = 2,family="poisson") Eb = fit$beta cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0)))
Get predictions from a GAGA cox model fit object
predict_cox_GAGA(fit, newx)
predict_cox_GAGA(fit, newx)
fit |
Fitted "GAGA" object. |
newx |
Matrix of new values for x at which predictions are to be made. Must be a
matrix. If the intercept term needs to be considered in the estimation process, then
the first column of |
Predictions
Get predictions from a GAGA linear model fit object
predict_LM_GAGA(fit, newx)
predict_LM_GAGA(fit, newx)
fit |
Fitted "GAGA" object. |
newx |
Matrix of new values for x at which predictions are to be made. Must be a
matrix. If the intercept term needs to be considered in the estimation process, then
the first column of |
Predictions
Get predictions from a GAGA logistic model fit object
predict_logistic_GAGA(fit, newx)
predict_logistic_GAGA(fit, newx)
fit |
Fitted "GAGA" object. |
newx |
Matrix of new values for x at which predictions are to be made. Must be a
matrix. If the intercept term needs to be considered in the estimation process, then
the first column of |
Predictions
Get predictions from a GAGA multinomial model fit object
predict_multinomial_GAGA(fit, newx)
predict_multinomial_GAGA(fit, newx)
fit |
Fitted "GAGA" object. |
newx |
Matrix of new values for x at which predictions are to be made. Must be a
matrix. If the intercept term needs to be considered in the estimation process, then
the first column of |
Predictions
Get predictions from a GAGA poisson model fit object
predict_poisson_GAGA(fit, newx)
predict_poisson_GAGA(fit, newx)
fit |
Fitted "GAGA" object. |
newx |
Matrix of new values for x at which predictions are to be made. Must be a
matrix. If the intercept term needs to be considered in the estimation process, then
the first column of |
Predictions
Gives fitted values from a fitted GAGA object.
## S3 method for class 'GAGA' predict(object, newx, ...)
## S3 method for class 'GAGA' predict(object, newx, ...)
object |
Fitted "GAGA" object. |
newx |
Matrix of new values for x at which predictions are to be made. Must be a
matrix. If the intercept term needs to be considered in the estimation process, then
the first column of |
... |
some other params |
Predictions
set.seed(2022) p_size = 30 sample_size=300 R1 = 3 R2 = 2 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y=X%*%beta_true + rnorm(sample_size,mean=0,sd=2) # Estimation fit = GAGAs(X,y,alpha = 3,family="gaussian") Eb = fit$beta #Create testing data X_t = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y_t=X_t%*%beta_true + rnorm(sample_size,mean=0,sd=2) #Prediction Ey = predict.GAGA(fit,newx=X_t) cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n perr:", norm(Ey-y_t,type="2")/sqrt(sample_size))
set.seed(2022) p_size = 30 sample_size=300 R1 = 3 R2 = 2 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y=X%*%beta_true + rnorm(sample_size,mean=0,sd=2) # Estimation fit = GAGAs(X,y,alpha = 3,family="gaussian") Eb = fit$beta #Create testing data X_t = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y_t=X_t%*%beta_true + rnorm(sample_size,mean=0,sd=2) #Prediction Ey = predict.GAGA(fit,newx=X_t) cat("\n err:", norm(Eb-beta_true,type="2")/norm(beta_true,type="2")) cat("\n acc:", cal.w.acc(as.character(Eb!=0),as.character(beta_true!=0))) cat("\n perr:", norm(Ey-y_t,type="2")/sqrt(sample_size))
Fit a linear model via the GAGA algorithm using cpp.
rcpp_lm_gaga( X, y, s_alpha, s_itrNum, s_thresh, s_QR_flag, s_flag, s_lamda_0, s_fix_sigma, s_sigm2_0, s_fdiag, s_frp )
rcpp_lm_gaga( X, y, s_alpha, s_itrNum, s_thresh, s_QR_flag, s_flag, s_lamda_0, s_fix_sigma, s_sigm2_0, s_fdiag, s_frp )
X |
Input matrix, of dimension nobs*nvars; each row is an observation.
If the intercept term needs to be considered in the estimation process, then the first column of |
y |
Quantitative response N*1 matrix. |
s_alpha |
Hyperparameter. The suggested value for alpha is 2 or 3. |
s_itrNum |
The number of iteration steps. In general, 20 steps are enough. |
s_thresh |
Convergence threshold for beta Change, if |
s_QR_flag |
It identifies whether to use QR decomposition to speed up the algorithm. |
s_flag |
It identifies whether to make model selection. The default is |
s_lamda_0 |
The initial value of the regularization parameter for ridge regression. |
s_fix_sigma |
It identifies whether to update the variance estimate of the Gaussian noise or not. |
s_sigm2_0 |
The initial variance of the Gaussian noise. |
s_fdiag |
It identifies whether to use diag Approximation to speed up the algorithm. |
s_frp |
Pre-processing by OMP method to reduce the number of parameters |
Coefficient vector
Print a summary of GAGA object
## S3 method for class 'GAGA' summary(object, ...)
## S3 method for class 'GAGA' summary(object, ...)
object |
Fitted "GAGA" object. |
... |
some other params |
set.seed(2022) p_size = 30 sample_size=300 R1 = 3 R2 = 2 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y=X%*%beta_true + rnorm(sample_size,mean=0,sd=2) # Estimation fit = GAGAs(X,y,alpha = 3,family="gaussian") summary(fit)
set.seed(2022) p_size = 30 sample_size=300 R1 = 3 R2 = 2 ratio = 0.5 #The ratio of zeroes in coefficients # Set the true coefficients zeroNum = round(ratio*p_size) ind = sample(1:p_size,zeroNum) beta_true = runif(p_size,0,R2) beta_true[ind] = 0 X = R1*matrix(rnorm(sample_size * p_size), ncol = p_size) y=X%*%beta_true + rnorm(sample_size,mean=0,sd=2) # Estimation fit = GAGAs(X,y,alpha = 3,family="gaussian") summary(fit)