Title: | Semi-Parametric Dimension Reduction Models Using Orthogonality Constrained Optimization |
---|---|
Description: | Utilize an orthogonality constrained optimization algorithm of Wen & Yin (2013) <DOI:10.1007/s10107-012-0584-1> to solve a variety of dimension reduction problems in the semiparametric framework, such as Ma & Zhu (2012) <DOI:10.1080/01621459.2011.646925>, Ma & Zhu (2013) <DOI:10.1214/12-AOS1072>, Sun, Zhu, Wang & Zeng (2019) <DOI:10.1093/biomet/asy064> and Zhou, Zhu & Zeng (2021) <DOI:10.1093/biomet/asaa087>. The package also implements some existing dimension reduction methods such as hMave by Xia, Zhang, & Xu (2010) <DOI:10.1198/jasa.2009.tm09372> and partial SAVE by Feng, Wen & Zhu (2013) <DOI:10.1080/01621459.2012.746065>. It also serves as a general purpose optimization solver for problems with orthogonality constraints, i.e., in Stiefel manifold. Parallel computing for approximating the gradient is enabled through 'OpenMP'. |
Authors: | Ruilin Zhao [aut, cph], Ruoqing Zhu [aut, cre, cph] , Jiyang Zhang [aut, cph], Wenzhuo Zhou [aut, cph], Peng Xu [aut, cph], James Joseph Balamuta [ctb] |
Maintainer: | Ruoqing Zhu <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.6.8 |
Built: | 2024-11-09 06:32:18 UTC |
Source: | CRAN |
The CP-SIR model for right-censored survival outcome. This model is correct only under very strong assumptions, however, since it only requires an SVD, the solution is used as the initial value in the orthoDr optimization.
CP_SIR(x, y, censor, bw = silverman(1, length(y)))
CP_SIR(x, y, censor, bw = silverman(1, length(y)))
x |
A matrix for features (continuous only). |
y |
A vector of observed time. |
censor |
A vector of censoring indicator. |
bw |
Kernel bandwidth for nonparametric estimations (one-dimensional), the default is using Silverman's formula. |
A list
consisting of
values |
The eigenvalues of the estimation matrix |
vectors |
The estimated directions, ordered by eigenvalues |
Sun, Q., Zhu, R., Wang, T. and Zeng, D. (2019) "Counting Process Based Dimension Reduction Method for Censored Outcomes." Biometrika, 106(1), 181-196. DOI: doi:10.1093/biomet/asy064
# This is setting 1 in Sun et. al. (2017) with reduced sample size library(MASS) set.seed(1) N <- 200 P <- 6 V <- 0.5^abs(outer(1:P, 1:P, "-")) dataX <- as.matrix(mvrnorm(N, mu = rep(0, P), Sigma = V)) failEDR <- as.matrix(c(1, 0.5, 0, 0, 0, rep(0, P - 5))) censorEDR <- as.matrix(c(0, 0, 0, 1, 1, rep(0, P - 5))) T <- rexp(N, exp(dataX %*% failEDR)) C <- rexp(N, exp(dataX %*% censorEDR - 1)) ndr <- 1 Y <- pmin(T, C) Censor <- (T < C) # fit the model cpsir.fit <- CP_SIR(dataX, Y, Censor) distance(failEDR, cpsir.fit$vectors[, 1:ndr, drop = FALSE], "dist")
# This is setting 1 in Sun et. al. (2017) with reduced sample size library(MASS) set.seed(1) N <- 200 P <- 6 V <- 0.5^abs(outer(1:P, 1:P, "-")) dataX <- as.matrix(mvrnorm(N, mu = rep(0, P), Sigma = V)) failEDR <- as.matrix(c(1, 0.5, 0, 0, 0, rep(0, P - 5))) censorEDR <- as.matrix(c(0, 0, 0, 1, 1, rep(0, P - 5))) T <- rexp(N, exp(dataX %*% failEDR)) C <- rexp(N, exp(dataX %*% censorEDR - 1)) ndr <- 1 Y <- pmin(T, C) Censor <- (T < C) # fit the model cpsir.fit <- CP_SIR(dataX, Y, Censor) distance(failEDR, cpsir.fit$vectors[, 1:ndr, drop = FALSE], "dist")
Calculate the Gaussian kernel distance between rows of X1 and rows of X2.
As a result, this is an extension to the stats::dist()
function.
dist_cross(x1, x2)
dist_cross(x1, x2)
x1 |
First data matrix |
x2 |
Second data matrix |
A distance matrix with its (i, j)th element being the Gaussian kernel
distance between ith row of X1
jth row of X2
.
# two matrices set.seed(1) x1 <- matrix(rnorm(10), 5, 2) x2 <- matrix(rnorm(6), 3, 2) dist_cross(x1, x2)
# two matrices set.seed(1) x1 <- matrix(rnorm(10), 5, 2) x2 <- matrix(rnorm(6), 3, 2) dist_cross(x1, x2)
Calculate the distance correlation between two linear spaces.
distance(s1, s2, type = "dist", x = NULL)
distance(s1, s2, type = "dist", x = NULL)
s1 |
First space |
s2 |
Second space |
type |
Type of distance measures: |
x |
The covariate values, for canonical correlation only. |
The distance between s1
and s2
.
# two spaces failEDR <- as.matrix(cbind( c(1, 1, 0, 0, 0, 0), c(0, 0, 1, -1, 0, 0) )) B <- as.matrix(cbind( c(0.1, 1.1, 0, 0, 0, 0), c(0, 0, 1.1, -0.9, 0, 0) )) distance(failEDR, B, "dist") distance(failEDR, B, "trace") N <- 300 P <- 6 dataX <- matrix(rnorm(N * P), N, P) distance(failEDR, B, "canonical", dataX)
# two spaces failEDR <- as.matrix(cbind( c(1, 1, 0, 0, 0, 0), c(0, 0, 1, -1, 0, 0) )) B <- as.matrix(cbind( c(0.1, 1.1, 0, 0, 0, 0), c(0, 0, 1.1, -0.9, 0, 0) )) distance(failEDR, B, "dist") distance(failEDR, B, "trace") N <- 300 P <- 6 dataX <- matrix(rnorm(N * P), N, P) distance(failEDR, B, "canonical", dataX)
This is an almost direct R translation of Xia, Zhang & Xu's (2010) hMave
MATLAB
code. We implemented further options for setting a different initial
value. The computational algorithm does not utilize the orthogonality
constrained optimization.
hMave(x, y, censor, m0, B0 = NULL)
hMave(x, y, censor, m0, B0 = NULL)
x |
A matrix for features. |
y |
A vector of observed time. |
censor |
A vector of censoring indicator. |
m0 |
number of dimensions to use |
B0 |
initial value of B. This is a feature we implemented. |
A list
consisting of
B |
The estimated B matrix |
cv |
Leave one out cross-validation error |
Xia, Y., Zhang, D., & Xu, J. (2010). Dimension reduction and semiparametric estimation of survival models. Journal of the American Statistical Association, 105(489), 278-290. DOI: doi:10.1198/jasa.2009.tm09372
# generate some survival data set.seed(1) P <- 7 N <- 150 dataX <- matrix(runif(N * P), N, P) failEDR <- as.matrix(cbind(c(1, 1.3, -1.3, 1, -0.5, 0.5, -0.5, rep(0, P - 7)))) T <- exp(dataX %*% failEDR + rnorm(N)) C <- runif(N, 0, 15) Y <- pmin(T, C) Censor <- (T < C) # fit the model hMave.fit <- hMave(dataX, Y, Censor, 1)
# generate some survival data set.seed(1) P <- 7 N <- 150 dataX <- matrix(runif(N * P), N, P) failEDR <- as.matrix(cbind(c(1, 1.3, -1.3, 1, -0.5, 0.5, -0.5, rep(0, P - 7)))) T <- exp(dataX %*% failEDR + rnorm(N)) C <- runif(N, 0, 15) Y <- pmin(T, C) Censor <- (T < C) # fit the model hMave.fit <- hMave(dataX, Y, Censor, 1)
Calculate the Gaussian kernel weights between rows of X1
and rows of X2
.
kernel_weight(x1, x2, kernel = "gaussian", dist = "euclidean")
kernel_weight(x1, x2, kernel = "gaussian", dist = "euclidean")
x1 |
First data matrix |
x2 |
Second data matrix |
kernel |
The kernel function, currently only using Gaussian kernel. |
dist |
The distance metric, currently only using the Euclidean distance. |
A distance matrix, with its (i, j)th element being the kernel weights for
the i th row of X1
jth row of X2
.
# two matrices set.seed(1) x1 <- matrix(rnorm(10), 5, 2) x2 <- matrix(rnorm(6), 3, 2) kernel_weight(x1, x2)
# two matrices set.seed(1) x1 <- matrix(rnorm(10), 5, 2) x2 <- matrix(rnorm(6), 3, 2) kernel_weight(x1, x2)
A general purpose optimization solver with orthogonality constraint.
The orthogonality constrained optimization method is a nearly direct
translation from Wen and Yin (2010)'s MATLAB
code.
ortho_optim( B, fn, grad = NULL, ..., maximize = FALSE, control = list(), maxitr = 500, verbose = FALSE )
ortho_optim( B, fn, grad = NULL, ..., maximize = FALSE, control = list(), maxitr = 500, verbose = FALSE )
B |
Initial |
fn |
A function that calculate the objective function value.
The first argument should be |
grad |
A function that calculate the gradient. The first argument
should be |
... |
Arguments passed to |
maximize |
By default, the solver will try to minimize the objective
function unless |
control |
A list of tuning variables for optimization. |
maxitr |
Maximum number of iterations |
verbose |
Should information be displayed |
A orthoDr
object that consists of a list
with named entries of:
B |
The optimal |
fn |
The final functional value |
itr |
The number of iterations |
converge |
convergence code |
Wen, Z., & Yin, W. (2013). A feasible method for optimization with orthogonality constraints. Mathematical Programming, 142(1), 397-434. DOI: doi:10.1007/s10107-012-0584-1
# an eigen value problem library(pracma) set.seed(1) n <- 100 k <- 6 A <- matrix(rnorm(n * n), n, n) A <- t(A) %*% A B <- gramSchmidt(matrix(rnorm(n * k), n, k))$Q fx <- function(B, A) -0.5 * sum(diag(t(B) %*% A %*% B)) gx <- function(B, A) -A %*% B fit <- ortho_optim(B, fx, gx, A = A) fx(fit$B, A) # compare with the solution from the eigen function sol <- eigen(A)$vectors[, 1:k] fx(sol, A)
# an eigen value problem library(pracma) set.seed(1) n <- 100 k <- 6 A <- matrix(rnorm(n * n), n, n) A <- t(A) %*% A B <- gramSchmidt(matrix(rnorm(n * k), n, k))$Q fx <- function(B, A) -0.5 * sum(diag(t(B) %*% A %*% B)) gx <- function(B, A) -A %*% B fit <- ortho_optim(B, fx, gx, A = A) fx(fit$B, A) # compare with the solution from the eigen function sol <- eigen(A)$vectors[, 1:k] fx(sol, A)
Performs the "Direct Learning & Pseudo-direct Learning" Method for personalized medicine.
orthoDr_pdose( x, a, r, ndr = ndr, B.initial = NULL, bw = NULL, lambda = 0.1, K = sqrt(length(r)), method = c("direct", "pseudo_direct"), keep.data = FALSE, control = list(), maxitr = 500, verbose = FALSE, ncore = 0 )
orthoDr_pdose( x, a, r, ndr = ndr, B.initial = NULL, bw = NULL, lambda = 0.1, K = sqrt(length(r)), method = c("direct", "pseudo_direct"), keep.data = FALSE, control = list(), maxitr = 500, verbose = FALSE, ncore = 0 )
x |
A |
a |
A |
r |
A |
ndr |
A dimension structure |
B.initial |
Initial |
bw |
A Kernel bandwidth, assuming each variables have unit variance |
lambda |
The penalty level for kernel ridge regression. If a range of values is specified, the GCV will be used to select the best tuning |
K |
A number of grids in the range of dose |
method |
Either |
keep.data |
Should the original data be kept for prediction |
control |
A list of tuning variables for optimization. |
maxitr |
Maximum number of iterations |
verbose |
Should information be displayed |
ncore |
the number of cores for parallel computing |
A orthoDr
object consisting of list
with named elements:
B |
The optimal |
fn |
The final functional value |
itr |
The number of iterations |
converge |
convergence code |
Zhou, W., Zhu, R., & Zeng, D. (2021). A parsimonious personalized dose-finding model via dimension reduction. Biometrika, 108(3), 643-659. DOI: doi:10.1093/biomet/asaa087
# generate some personalized dose scenario exampleset <- function(size, ncov) { X <- matrix(runif(size * ncov, -1, 1), ncol = ncov) A <- runif(size, 0, 2) Edr <- as.matrix(c(0.5, -0.5)) D_opt <- X %*% Edr + 1 mu <- 2 + 0.5 * (X %*% Edr) - 7 * abs(D_opt - A) R <- rnorm(length(mu), mu, 1) R <- R - min(R) datainfo <- list(X = X, A = A, R = R, D_opt = D_opt, mu = mu) return(datainfo) } # generate data set.seed(123) n <- 150 p <- 2 ndr <- 1 train <- exampleset(n, p) test <- exampleset(500, p) # the direct learning method orthofit <- orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = 0.1, method = "direct", K = sqrt(n), keep.data = TRUE, maxitr = 150, verbose = FALSE, ncore = 2 ) dose <- predict(orthofit, test$X) # ` # compare with the optimal dose dosedistance <- mean((test$D_opt - dose$pred)^2) print(dosedistance) # the pseudo direct learning method orthofit <- orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = seq(0.1, 0.2, 0.01), method = "pseudo_direct", K = as.integer(sqrt(n)), keep.data = TRUE, maxitr = 150, verbose = FALSE, ncore = 2 ) dose <- predict(orthofit, test$X) # compare with the optimal dose dosedistance <- mean((test$D_opt - dose$pred)^2) print(dosedistance)
# generate some personalized dose scenario exampleset <- function(size, ncov) { X <- matrix(runif(size * ncov, -1, 1), ncol = ncov) A <- runif(size, 0, 2) Edr <- as.matrix(c(0.5, -0.5)) D_opt <- X %*% Edr + 1 mu <- 2 + 0.5 * (X %*% Edr) - 7 * abs(D_opt - A) R <- rnorm(length(mu), mu, 1) R <- R - min(R) datainfo <- list(X = X, A = A, R = R, D_opt = D_opt, mu = mu) return(datainfo) } # generate data set.seed(123) n <- 150 p <- 2 ndr <- 1 train <- exampleset(n, p) test <- exampleset(500, p) # the direct learning method orthofit <- orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = 0.1, method = "direct", K = sqrt(n), keep.data = TRUE, maxitr = 150, verbose = FALSE, ncore = 2 ) dose <- predict(orthofit, test$X) # ` # compare with the optimal dose dosedistance <- mean((test$D_opt - dose$pred)^2) print(dosedistance) # the pseudo direct learning method orthofit <- orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = seq(0.1, 0.2, 0.01), method = "pseudo_direct", K = as.integer(sqrt(n)), keep.data = TRUE, maxitr = 150, verbose = FALSE, ncore = 2 ) dose <- predict(orthofit, test$X) # compare with the optimal dose dosedistance <- mean((test$D_opt - dose$pred)^2) print(dosedistance)
Performs the semiparametric dimension reduction method associated with Ma & Zhu (2012).
orthoDr_reg( x, y, method = "sir", ndr = 2, B.initial = NULL, bw = NULL, keep.data = FALSE, control = list(), maxitr = 500, verbose = FALSE, ncore = 0 )
orthoDr_reg( x, y, method = "sir", ndr = 2, B.initial = NULL, bw = NULL, keep.data = FALSE, control = list(), maxitr = 500, verbose = FALSE, ncore = 0 )
x |
A |
y |
A |
method |
Dimension reduction methods (semi-): |
ndr |
The number of directions |
B.initial |
Initial |
bw |
A Kernel bandwidth, assuming each variables have unit variance |
keep.data |
Should the original data be kept for prediction. Default
is |
control |
A list of tuning variables for optimization.
|
maxitr |
Maximum number of iterations |
verbose |
Should information be displayed |
ncore |
Number of cores for parallel computing. The default is the maximum number of threads. |
A orthoDr
object consisting of list
with named elements:
B |
The optimal |
fn |
The final functional value |
itr |
The number of iterations |
converge |
convergence code |
Ma, Y., & Zhu, L. (2012). A semiparametric approach to dimension reduction. Journal of the American Statistical Association, 107(497), 168-179. DOI: doi:10.1080/01621459.2011.646925
Ma, Y., & Zhu, L. (2013). Efficient estimation in sufficient dimension reduction. Annals of statistics, 41(1), 250. DOI: doi:10.1214/12-AOS1072
# generate some regression data set.seed(1) N <- 100 P <- 4 dataX <- matrix(rnorm(N * P), N, P) Y <- -1 + dataX[, 1] + rnorm(N) # fit the semi-sir model orthoDr_reg(dataX, Y, ndr = 1, method = "sir") # fit the semi-phd model Y <- -1 + dataX[, 1]^2 + rnorm(N) orthoDr_reg(dataX, Y, ndr = 1, method = "phd")
# generate some regression data set.seed(1) N <- 100 P <- 4 dataX <- matrix(rnorm(N * P), N, P) Y <- -1 + dataX[, 1] + rnorm(N) # fit the semi-sir model orthoDr_reg(dataX, Y, ndr = 1, method = "sir") # fit the semi-phd model Y <- -1 + dataX[, 1]^2 + rnorm(N) orthoDr_reg(dataX, Y, ndr = 1, method = "phd")
Models the data according to the counting process based semiparametric dimension reduction (IR-CP) model for right censored survival outcome.
orthoDr_surv( x, y, censor, method = "dm", ndr = ifelse(method == "forward", 1, 2), B.initial = NULL, bw = NULL, keep.data = FALSE, control = list(), maxitr = 500, verbose = FALSE, ncore = 0 )
orthoDr_surv( x, y, censor, method = "dm", ndr = ifelse(method == "forward", 1, 2), B.initial = NULL, bw = NULL, keep.data = FALSE, control = list(), maxitr = 500, verbose = FALSE, ncore = 0 )
x |
A |
y |
A |
censor |
A |
method |
Estimation equation to use. Either:
|
ndr |
The number of directions |
B.initial |
Initial |
bw |
A Kernel bandwidth, assuming each variables have unit variance. |
keep.data |
Should the original data be kept for prediction.
Default is |
control |
A list of tuning variables for optimization. |
maxitr |
Maximum number of iterations |
verbose |
Should information be displayed |
ncore |
Number of cores for parallel computing. The default is the maximum number of threads. |
A orthoDr
object consisting of list
with named elements:
B |
The optimal |
fn |
The final functional value |
itr |
The number of iterations |
converge |
convergence code |
Sun, Q., Zhu, R., Wang, T., & Zeng, D. (2019). Counting process-based dimension reduction methods for censored outcomes. Biometrika, 106(1), 181-196. DOI: doi:10.1093/biomet/asy064
# This is setting 1 in Sun et. al. (2017) with reduced sample size library(MASS) set.seed(1) N <- 200 P <- 6 V <- 0.5^abs(outer(1:P, 1:P, "-")) dataX <- as.matrix(mvrnorm(N, mu = rep(0, P), Sigma = V)) failEDR <- as.matrix(c(1, 0.5, 0, 0, 0, rep(0, P - 5))) censorEDR <- as.matrix(c(0, 0, 0, 1, 1, rep(0, P - 5))) T <- rexp(N, exp(dataX %*% failEDR)) C <- rexp(N, exp(dataX %*% censorEDR - 1)) ndr <- 1 Y <- pmin(T, C) Censor <- (T < C) # fit the model forward.fit <- orthoDr_surv(dataX, Y, Censor, method = "forward") distance(failEDR, forward.fit$B, "dist") dn.fit <- orthoDr_surv(dataX, Y, Censor, method = "dn", ndr = ndr) distance(failEDR, dn.fit$B, "dist") dm.fit <- orthoDr_surv(dataX, Y, Censor, method = "dm", ndr = ndr) distance(failEDR, dm.fit$B, "dist")
# This is setting 1 in Sun et. al. (2017) with reduced sample size library(MASS) set.seed(1) N <- 200 P <- 6 V <- 0.5^abs(outer(1:P, 1:P, "-")) dataX <- as.matrix(mvrnorm(N, mu = rep(0, P), Sigma = V)) failEDR <- as.matrix(c(1, 0.5, 0, 0, 0, rep(0, P - 5))) censorEDR <- as.matrix(c(0, 0, 0, 1, 1, rep(0, P - 5))) T <- rexp(N, exp(dataX %*% failEDR)) C <- rexp(N, exp(dataX %*% censorEDR - 1)) ndr <- 1 Y <- pmin(T, C) Censor <- (T < C) # fit the model forward.fit <- orthoDr_surv(dataX, Y, Censor, method = "forward") distance(failEDR, forward.fit$B, "dist") dn.fit <- orthoDr_surv(dataX, Y, Censor, method = "dn", ndr = ndr) distance(failEDR, dn.fit$B, "dist") dm.fit <- orthoDr_surv(dataX, Y, Censor, method = "dm", ndr = ndr) distance(failEDR, dm.fit$B, "dist")
orthoDr
modelsThe prediction function for orthoDr
fitted models
## S3 method for class 'orthoDr' predict(object, testx, ...)
## S3 method for class 'orthoDr' predict(object, testx, ...)
object |
A fitted |
testx |
Testing data |
... |
Additional parameters, not used. |
The predicted object
# generate some survival data N <- 100 P <- 4 dataX <- matrix(rnorm(N * P), N, P) Y <- exp(-1 + dataX[, 1] + rnorm(N)) Censor <- rbinom(N, 1, 0.8) # fit the model with keep.data = TRUE orthoDr.fit <- orthoDr_surv(dataX, Y, Censor, ndr = 1, method = "dm", keep.data = TRUE ) # predict 10 new observations predict(orthoDr.fit, matrix(rnorm(10 * P), 10, P)) # generate some personalized dose scenario exampleset <- function(size, ncov) { X <- matrix(runif(size * ncov, -1, 1), ncol = ncov) A <- runif(size, 0, 2) Edr <- as.matrix(c(0.5, -0.5)) D_opt <- X %*% Edr + 1 mu <- 2 + 0.5 * (X %*% Edr) - 7 * abs(D_opt - A) R <- rnorm(length(mu), mu, 1) R <- R - min(R) datainfo <- list(X = X, A = A, R = R, D_opt = D_opt, mu = mu) return(datainfo) } # generate data set.seed(123) n <- 150 p <- 2 ndr <- 1 train <- exampleset(n, p) test <- exampleset(500, p) # the direct learning method orthofit <- orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = 0.1, method = "direct", K = as.integer(sqrt(n)), keep.data = TRUE, maxitr = 150, verbose = FALSE, ncore = 2 ) predict(orthofit, test$X) # the pseudo direct learning method orthofit <- orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = seq(0.1, 0.2, 0.01), method = "pseudo_direct", K = as.integer(sqrt(n)), keep.data = TRUE, maxitr = 150, verbose = FALSE, ncore = 2 ) predict(orthofit, test$X)
# generate some survival data N <- 100 P <- 4 dataX <- matrix(rnorm(N * P), N, P) Y <- exp(-1 + dataX[, 1] + rnorm(N)) Censor <- rbinom(N, 1, 0.8) # fit the model with keep.data = TRUE orthoDr.fit <- orthoDr_surv(dataX, Y, Censor, ndr = 1, method = "dm", keep.data = TRUE ) # predict 10 new observations predict(orthoDr.fit, matrix(rnorm(10 * P), 10, P)) # generate some personalized dose scenario exampleset <- function(size, ncov) { X <- matrix(runif(size * ncov, -1, 1), ncol = ncov) A <- runif(size, 0, 2) Edr <- as.matrix(c(0.5, -0.5)) D_opt <- X %*% Edr + 1 mu <- 2 + 0.5 * (X %*% Edr) - 7 * abs(D_opt - A) R <- rnorm(length(mu), mu, 1) R <- R - min(R) datainfo <- list(X = X, A = A, R = R, D_opt = D_opt, mu = mu) return(datainfo) } # generate data set.seed(123) n <- 150 p <- 2 ndr <- 1 train <- exampleset(n, p) test <- exampleset(500, p) # the direct learning method orthofit <- orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = 0.1, method = "direct", K = as.integer(sqrt(n)), keep.data = TRUE, maxitr = 150, verbose = FALSE, ncore = 2 ) predict(orthofit, test$X) # the pseudo direct learning method orthofit <- orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = seq(0.1, 0.2, 0.01), method = "pseudo_direct", K = as.integer(sqrt(n)), keep.data = TRUE, maxitr = 150, verbose = FALSE, ncore = 2 ) predict(orthofit, test$X)
orthoDr
objectProvides a custom print wrapper for displaying orthoDr
fitted models.
## S3 method for class 'orthoDr' print(x, ...)
## S3 method for class 'orthoDr' print(x, ...)
x |
A fitted |
... |
Additional parameters, not used. |
Sliently returns the orthoDr
object supplied into the function to
allow for use with pipes.
# generate some survival data N <- 100 P <- 4 dataX <- matrix(rnorm(N * P), N, P) Y <- exp(-1 + dataX[, 1] + rnorm(N)) Censor <- rbinom(N, 1, 0.8) # fit the model orthoDr_surv(dataX, Y, Censor, ndr = 1, method = "dm")
# generate some survival data N <- 100 P <- 4 dataX <- matrix(rnorm(N * P), N, P) Y <- exp(-1 + dataX[, 1] + rnorm(N)) Censor <- rbinom(N, 1, 0.8) # fit the model orthoDr_surv(dataX, Y, Censor, ndr = 1, method = "dm")
The partial-SAVE model. This model is correct only under very strong
assumptions, the solution is used as the initial value in the orthoDr
optimization.
pSAVE(x, a, r, ndr = 2, nslices0 = 2)
pSAVE(x, a, r, ndr = 2, nslices0 = 2)
x |
A |
a |
A |
r |
A |
ndr |
The dimension structure |
nslices0 |
Number of slides used for save |
A list
consisting of:
vectors |
The basis of central subspace, ordered by eigenvalues |
Feng, Z., Wen, X. M., Yu, Z., & Zhu, L. (2013). On partial sufficient dimension reduction with applications to partially linear multi-index models. Journal of the American Statistical Association, 108(501), 237-246. DOI: doi:10.1080/01621459.2012.746065
A simple Silverman's rule of thumb bandwidth calculation.
silverman(d, n)
silverman(d, n)
d |
Number of dimension |
n |
Number of observation |
A simple bandwidth choice
silverman(1, 300)
silverman(1, 300)
The clinical variables of the SKCM dataset. The original data was obtained from The Cancer Genome Atlas (TCGA).
skcm.clinical
skcm.clinical
Contains 469 subjects with 156 failures. Each row contains one subject, subject ID is indicated by row name. Variables include:
Time
Censor
Gender
Age
Note: Age
has 8 missing values.
https://www.cancer.gov/ccg/research/genome-sequencing/tcga
The expression of top 20 genes of cutaneous melanoma literature based on the MelGene Database.
skcm.melgene
skcm.melgene
Each row contains one subject, subject ID is indicated by row name. Gene names in the columns. The columns are scaled.
Chatzinasiou, Foteini, Christina M. Lill, Katerina Kypreou, Irene Stefanaki, Vasiliki Nicolaou, George Spyrou, Evangelos Evangelou et al. "Comprehensive field synopsis and systematic meta-analyses of genetic association studies in cutaneous melanoma." Journal of the National Cancer Institute 103, no. 16 (2011): 1227-1235. Emmanouil I. Athanasiadis, Kyriaki Antonopoulou, Foteini Chatzinasiou, Christina M. Lill, Marilena M. Bourdakou, Argiris Sakellariou, Katerina Kypreou, Irene Stefanaki, Evangelos Evangelou, John P.A. Ioannidis, Lars Bertram, Alexander J. Stratigos, George M. Spyrou, A Web-based database of genetic association studies in cutaneous melanoma enhanced with network-driven data exploration tools, Database, Volume 2014, 2014, bau101, https://doi.org/10.1093/database/bau101 https://www.cancer.gov/ccg/research/genome-sequencing/tcga
Produce 2D or 3D plots of right censored survival data based on a given dimension reduction space
view_dr_surv( x, y, censor, B = NULL, bw = NULL, FUN = "log", type = "2D", legend.add = TRUE, xlab = "Reduced Direction", ylab = "Time", zlab = "Survival" )
view_dr_surv( x, y, censor, B = NULL, bw = NULL, FUN = "log", type = "2D", legend.add = TRUE, xlab = "Reduced Direction", ylab = "Time", zlab = "Survival" )
x |
A |
y |
A |
censor |
A |
B |
The dimension reduction subspace, can only be 1 dimensional |
bw |
A Kernel bandwidth (3D plot only) for approximating the survival function, default is the Silverman's formula |
FUN |
A scaling function applied to the time points |
type |
|
legend.add |
Should legend be added (2D plot only) |
xlab |
x axis label |
ylab |
y axis label |
zlab |
z axis label |
An rgl
object that is rendered.
Sun, Q., Zhu, R., Wang, T., & Zeng, D. (2019). Counting process-based dimension reduction methods for censored outcomes. Biometrika, 106(1), 181-196. DOI: doi:10.1093/biomet/asy064
# generate some survival data N <- 100 P <- 4 dataX <- matrix(rnorm(N * P), N, P) Y <- exp(-1 + dataX[, 1] + rnorm(N)) Censor <- rbinom(N, 1, 0.8) orthoDr.fit <- orthoDr_surv(dataX, Y, Censor, ndr = 1, method = "dm") view_dr_surv(dataX, Y, Censor, orthoDr.fit$B)
# generate some survival data N <- 100 P <- 4 dataX <- matrix(rnorm(N * P), N, P) Y <- exp(-1 + dataX[, 1] + rnorm(N)) Censor <- rbinom(N, 1, 0.8) orthoDr.fit <- orthoDr_surv(dataX, Y, Censor, ndr = 1, method = "dm") view_dr_surv(dataX, Y, Censor, orthoDr.fit$B)