Title: | Deep Neural Network Tools for Probability and Statistic Models |
---|---|
Description: | Contains tools to build deep neural network with flexible users define loss function and probability models. Several applications included in this package are, 1) The (deepAFT) model, a deep neural network model for accelerated failure time (AFT) model for survival data. 2) The (deepGLM) model, a deep neural network model for generalized linear model (glm) for continuous, categorical and Poisson data. |
Authors: | Bingshu E. Chen [aut, cre], Patrick Norman [aut, ctb], Wenyu Jiang [ctb], Wanlu Li [ctb] |
Maintainer: | Bingshu E. Chen <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.0.6 |
Built: | 2024-11-10 06:41:30 UTC |
Source: | CRAN |
This package provides tools for deep neural network which allow user define loss function for complex outcome data with probability and statistics models such as generalized linear models, accelerated failure time (AFT) models, and Cox proportional hazards models.
It contains the essential building blocks such as feed forward network and back propagation. This gives users the flexibility to write their own loss function (i.e. cost function) and train the neural network.
{dnn} is a R package for deep learning neural network with probability models that use the negative of the log-likelihood as the loss function. It provides functions for feed forward network from covariates to the output layer and back propagation to find the derivatives of the weight parameters. Different optimization methods such as stochastic gradient descent (SGD), Momentum and ADAM can be used to train the network.
Currently, { dnn } can be install by
the package source file 'dnn.tar.gz', use
install.packages("dnn.tar.gz", repos = NULL, type = "source")
users can use the following steps to install the most recent version of 'dnn' package:
1. First, you need to install the 'devtools' package. You can skip this step if you have 'devtools' installed in your R. Invoke R and then type
install.packages("devtools")
2. Load the devtools package.
library(devtools)
3. Install "dnn" package from github with R command
install_github("statapps/dnn")
A stable version of View the "dnn" package is also available from the Comprehensive R Archive Network (https://CRAN.R-project.org/package=dnn) and can be installed using R command
install.packages("dnn")
Bingshu E. Chen
Maintainer: Bingshu E. Chen <[email protected]>
dNNmodel
,
bwdNN
,
fwdNN
,
deepAFT
,
deepGLM
,
deepSurv
,
coxph
,
glm
survival
# Create the models with 3 layers model = dNNmodel(units=c(8, 6, 1), activation = c('elu', 'relu', 'sigmoid'), input_shape = c(3)) print(model) # # Feed forward network with dummy data x x = matrix(runif(15), nrow = 5, ncol = 3) cache = fwdNN(x, model) # # Back propagation with dummy dy = dL/dyhat and minin batch for SGD dy = as.matrix(runif(5, -0.1, 0.1), nrow = 5) dW = bwdNN(dy, cache, model) # # Gradient descent with SGD lr_rate = 0.0001 sgd = function(w, dw) {w-lr_rate*dw} model$params = mapply(sgd, w = model$params, dw = dW)
# Create the models with 3 layers model = dNNmodel(units=c(8, 6, 1), activation = c('elu', 'relu', 'sigmoid'), input_shape = c(3)) print(model) # # Feed forward network with dummy data x x = matrix(runif(15), nrow = 5, ncol = 3) cache = fwdNN(x, model) # # Back propagation with dummy dy = dL/dyhat and minin batch for SGD dy = as.matrix(runif(5, -0.1, 0.1), nrow = 5) dW = bwdNN(dy, cache, model) # # Gradient descent with SGD lr_rate = 0.0001 sgd = function(w, dw) {w-lr_rate*dw} model$params = mapply(sgd, w = model$params, dw = dW)
Different type of activation functions and the corresponding derivatives
sigmoid(x) elu(x) relu(x) lrelu(x) idu(x) dsigmoid(y) delu(y) drelu(y) dlrelu(y) dtanh(y) #activation function tanh(x) is already available in R
sigmoid(x) elu(x) relu(x) lrelu(x) idu(x) dsigmoid(y) delu(y) drelu(y) dlrelu(y) dtanh(y) #activation function tanh(x) is already available in R
x |
input of the activation function |
y |
input of the derivative of the activation function |
Each function returns either the activation function (e.g. sigmoid, relu) or its derivative (e.g. dsigmoid, drelu).
An activation function is applied to x and returns a matrix the same size as x. The detail formula for each activation function is:
sigmoid |
return 1/(1+exp(-x)) |
elu |
return x for x>0 and exp(x)-1 for x<0 |
relu |
return x for x>0 and 0 for x<0 |
lrelu |
return x for x>0 and 0.1*x for x<0 |
tanh |
return tanh(x) |
idu |
return (x) |
Bingshu E. Chen
bwdNN
,
fwdNN
,
dNNmodel
,
optimizerSGD
,
optimizerNAG
# Specify a dnn nodel with user define activation function in layer 2. softmax = function(x) {log(1+exp(x))} # y = log(1+exp(x)) dsoftmax = function(y) {sigmoid(y)} # x = exp(y)/(1+exp(y)) model = dNNmodel(units=c(8, 6, 1), activation= c('relu', 'softmax', 'sigmoid'), input_shape = c(3)) print(model)
# Specify a dnn nodel with user define activation function in layer 2. softmax = function(x) {log(1+exp(x))} # y = log(1+exp(x)) dsoftmax = function(y) {sigmoid(y)} # x = exp(y)/(1+exp(y)) model = dNNmodel(units=c(8, 6, 1), activation= c('relu', 'softmax', 'sigmoid'), input_shape = c(3)) print(model)
{bwdNN} is an R function for back propagation in DNN network.
# # To apply back propagation in with a feed forward model # # use # bwdNN(dy, cache, model) # # to calculate derivative of dL/dW
# # To apply back propagation in with a feed forward model # # use # bwdNN(dy, cache, model) # # to calculate derivative of dL/dW
dy |
the derivative of the cost function with respect to the output layer of the fwdNN function. |
cache |
the cached output of fwdNN. |
model |
a model return from dNNmodel function. |
Here 'dy' plays an import role in the back propagation { bwdNN } since the probability model's loss function takes the output layer of the { dnn } (denote as yhat) as one of its parameter. Then 'dy' equals to the partial derivative of the loss function (-Log Likelihood) with respect to yhat, that is, dy = dL/d(yhat). For example, if the 'dnn' predicts the probability (yhat = p) for the mixture of two populations f1 and f2, then the likelihood function is f = p*f1 + (1-p)*f2, and the loss function is L = -log(p*f1+(1-p)*f2). Hence, dy = dL/dp = -(f1-f2)/f.
'cache' is the cache of each input layer generated from the { fwdNN } function.
The function { bwdCheck } calculates the numerical derivatives of dL/dW, which can be used to check if the back propagation is correct or not, see example below.
A list contains the derivatives of weight parameter W is returned.
Bingshu E. Chen ([email protected])
dNNmodel
,
fwdNN
,
plot.dNNmodel
,
print.dNNmodel
,
summary.dNNmodel
,
### define a dnn model, calculate the feed forward network model = dNNmodel(units = c(8, 6, 1), activation = c("elu", "sigmoid", "sigmoid"), input_shape = 3) print(model) x = matrix(runif(15), nrow = 5, ncol = 3) cache = fwdNN(x, model) # dy = dL/dp, where L is the cost function such as the # log-likehood and p is the output layer parameter of the DNN dy = as.matrix(runif(5, -0.1, 0.1), nrow = 5) # a dummy dy for bwdNN input y = predict(model, x) + dy # back propagation dW = bwdNN(dy, cache, model) dw = bwdCheck(x, y, model) print(dW[[1]]) print(dw[[1]])
### define a dnn model, calculate the feed forward network model = dNNmodel(units = c(8, 6, 1), activation = c("elu", "sigmoid", "sigmoid"), input_shape = 3) print(model) x = matrix(runif(15), nrow = 5, ncol = 3) cache = fwdNN(x, model) # dy = dL/dp, where L is the cost function such as the # log-likehood and p is the output layer parameter of the DNN dy = as.matrix(runif(5, -0.1, 0.1), nrow = 5) # a dummy dy for bwdNN input y = predict(model, x) + dy # back propagation dW = bwdNN(dy, cache, model) dw = bwdCheck(x, y, model) print(dW[[1]]) print(dw[[1]])
Fit a deep learning survival regression model. These are location-scale models for an arbitrary transform of the time variable; the most common cases use a log transformation, leading to accelerated failure time models.
deepAFT(x, ...) ## S3 method for class 'formula' deepAFT(formula, model, data, control = list(...), method = c("BuckleyJames", "ipcw", "transform"), ...) ## Default S3 method: deepAFT(x, y, model, control, ...) ## S3 method for class 'ipcw' deepAFT(x, y, model, control, ...) # use: # deepAFT.ipcw(x, y, model, control) # or # class(x) = "ipcw" # deepAFT(x, y, model, control) # ## S3 method for class 'trans' deepAFT(x, y, model, control, ...) # use: # class(x) = "transform" # deepAFT(x, y, model, control)
deepAFT(x, ...) ## S3 method for class 'formula' deepAFT(formula, model, data, control = list(...), method = c("BuckleyJames", "ipcw", "transform"), ...) ## Default S3 method: deepAFT(x, y, model, control, ...) ## S3 method for class 'ipcw' deepAFT(x, y, model, control, ...) # use: # deepAFT.ipcw(x, y, model, control) # or # class(x) = "ipcw" # deepAFT(x, y, model, control) # ## S3 method for class 'trans' deepAFT(x, y, model, control, ...) # use: # class(x) = "transform" # deepAFT(x, y, model, control)
formula |
a formula expression as for other regression models. The response is usually a survival object as returned by the 'Surv' function. See the documentation for 'Surv', 'lm' and 'formula' for details. |
model |
deep neural network model, see below for details. |
data |
a data.frame in which to interpret the variables named in the formula. |
x |
Covariates for the AFT model |
y |
Surv object for the AFT model |
method |
methods to handle censoring data in deep AFT model fit, 'BuckleyJames' for Buckley and James method, 'ipcw' for inverse probability censoring weights method. 'transform' for transformation based on book of Fan and Gijbels (1996, page 168) |
control |
a list of control values, in the format produced by 'dnnControl'. The default value 'dnnControl()' |
... |
optional arguments |
See "Deep learning with R" for details on how to build a deep learning model.
The following parameters in 'dnnControl' will be used to control the model fit process.
'epochs': number of deep learning epochs, default is 100.
'batch_size': batch size, default is 128. 'NaN' may be generated if batch size is too small and there is not event in a batch.
'verbose': verbose = 1 for print out verbose during the model fit, 0 for not print.
'epsilon': epsilon for convergence check, default is epsilon = 0.001.
'max.iter': number of maximum iteration, default is max.iter = 100.
'censor.groups': a vector for censoring groups. A KM curve for censoring will be fit for each group. If a matrix is provided, then a Cox model will be used to predict the censoring probability.
When the variance for covariance matrix X is too large, please use xbar = apply(x, 2, stndx) to standardize X.
An object of class "deepAFT" is returned. The deepAFT object contains the following list components:
x |
Covariates for the AFT model |
y |
Survival object for the AFT model, y = Surv(time, event) |
model |
A fitted artificial neural network (ANN) model |
mean.ipt |
mean survival or censoring time |
predictor |
predictor score mu = f(x) |
risk |
risk score = exp(predictor) |
method |
method for deepAFT fitting, either Buckley-James, IPCW or transformed model |
For right censored survival time only
Chen, B. E. and Norman P.
Buckley, J. and James, I. (1979). Linear regression with cencored data. Biometrika, 66, page 429-436.
Norman, P. Li, W., Jiang, W. and Chen, B. E. (2024). DeepAFT: A nonparametric accelerated failure time model with artificial neural network. Manuscript submitted to Statistics in Medicine.
Chollet, F. and Allaire J. J. (2017). Deep learning with R. Manning.
print.deepAFT
, survreg
, ibs.deepAFT
## Example for deep learning model for AFT survival data set.seed(101) ### define model layers model = dNNmodel(units = c(4, 3, 1), activation = c("elu", "sigmoid", "sigmoid"), input_shape = 3) x = matrix(runif(15), nrow = 5, ncol = 3) time = exp(x[, 1]) status = c(1, 0, 1, 1, 1) fit = deepAFT(Surv(time, status) ~ x, model)
## Example for deep learning model for AFT survival data set.seed(101) ### define model layers model = dNNmodel(units = c(4, 3, 1), activation = c("elu", "sigmoid", "sigmoid"), input_shape = 3) x = matrix(runif(15), nrow = 5, ncol = 3) time = exp(x[, 1]) status = c(1, 0, 1, 1, 1) fit = deepAFT(Surv(time, status) ~ x, model)
Fit generalized linear models (Gaussian, Binomial and Poisson) using deep learning neural network (DNN). The glm formula is specified by giving a symbolic description of the predictor and a description of the error distribution.
deepGlm(formula, model, family = c("gaussian", "binomial", "poisson"), data, epochs = 200, lr_rate = 1e-04, batch_size = 64, alpha = 0.7, lambda = 1, verbose = 0, weights = NULL, ...)
deepGlm(formula, model, family = c("gaussian", "binomial", "poisson"), data, epochs = 200, lr_rate = 1e-04, batch_size = 64, alpha = 0.7, lambda = 1, verbose = 0, weights = NULL, ...)
formula |
a formula expression as for other regression models. The response is usually an object for glm response variable. See the documentation for 'glm', 'lm' and 'formula' for details. |
model |
a deep neural network model, created by function dNNmodel(). |
family |
a description of the error distribution and link function to be used in the model. This can be either a character string of 'gaussian', 'binomial', or 'poisson', naming a family function, or result of a call to a family function (See 'family' for details of family functions).) |
data |
a data.frame in which to interpret the variables named in the formula. |
epochs |
number of deep learning epochs, default is 200. |
batch_size |
batch size, default is 64. 'NaN' may be generated if batch size is too small and there is not event in a batch. |
lr_rate |
learning rate for the gradient descent algorithm, default is lr_rate = 1e-04. |
weights |
an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector. |
alpha |
momentum rate for the gradient descent method, alpha takes value in [0, 1), default is alpha = 0.70. |
lambda |
L2 regularization parameter for deep learning. |
verbose |
verbose = 1 for print out verbose during the model fit, 0 for not print. |
... |
optional arguments |
See dNNmodel
for details on how to specify a deep learning model.
The following parameters in 'dnnControl' will be used to control the model fit process.
'epochs': number of deep learning epochs, default is 30.
'verbose': verbose = 1 for print out verbose during the model fit, 0 for not print.
When the variance for covariance matrix X is too large, please use xbar = scale(x) to standardize X.
An object of class "deepGlm" is returned. The deepGlm object contains the following list components:
x |
Covariates for glm model |
y |
Object for glm model |
model |
dnn model |
predictor |
predictor score mu = f(x) |
risk |
risk score = exp(predictor) |
For glm models with Gaussian, Binomial and Poisson only
Chen, B. E.
Chollet, F. and Allaire J. J. (2017). Deep learning with R. Manning.
deepAFT
, dNNmodel
,
predict.deepGlm
,
print.deepSurv
, glm
## Example for deep learning for glm models set.seed(101) ### define model layers model = dNNmodel(units = c(4, 3, 1), activation = c("elu", "sigmoid", "sigmoid"), input_shape = 3) x = matrix(runif(15), nrow = 5, ncol = 3) y = exp(x[, 1] + rnorm(5)) fit = deepGlm(y ~ x, model, family = "gaussian")
## Example for deep learning for glm models set.seed(101) ### define model layers model = dNNmodel(units = c(4, 3, 1), activation = c("elu", "sigmoid", "sigmoid"), input_shape = 3) x = matrix(runif(15), nrow = 5, ncol = 3) y = exp(x[, 1] + rnorm(5)) fit = deepGlm(y ~ x, model, family = "gaussian")
Fit a survival regression model under the Cox proportional hazards assumption using deep learning neural network (DNN).
deepSurv(formula, model, data, epochs = 200, lr_rate = 1e-04, batch_size = 64, alpha = 0.7, lambda = 1, verbose = 0, weights = NULL, ...)
deepSurv(formula, model, data, epochs = 200, lr_rate = 1e-04, batch_size = 64, alpha = 0.7, lambda = 1, verbose = 0, weights = NULL, ...)
formula |
a formula expression as for other regression models. The response is usually a survival object as returned by the 'Surv' function. See the documentation for 'Surv', 'lm' and 'formula' for details. |
model |
a deep neural network model, created by function dNNmodel(). |
data |
a data.frame in which to interpret the variables named in the formula. |
epochs |
number of deep learning epochs, default is 200. |
batch_size |
batch size, default is 64. 'NaN' may be generated if batch size is too small and there is not event in a batch. |
lr_rate |
learning rate for the gradient descent algorithm, default is lr_rate = 1e-04. |
weights |
an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector. |
alpha |
momentum rate for the gradient descent method, alpha takes value in [0, 1), default is alpha = 0.70. |
lambda |
L2 regularization parameter for deep learning. |
verbose |
verbose = 1 for print out verbose during the model fit, 0 for not print. |
... |
optional arguments |
See "Deep learning with R" for details on how to build a deep learning model.
The following parameters in 'dnnControl' will be used to control the model fit process.
'epochs': number of deep learning epochs, default is 30.
'verbose': verbose = 1 for print out verbose during the model fit, 0 for not print.
'epsilon': epsilon for convergence check, default is epsilon = 0.001.
'max.iter': number of maximum iteration, default is max.iter = 30.
When the variance for covariance matrix X is too large, please use xbar = scale(x) to standardize X.
An object of class "deepSurv" is returned. The deepSurv object contains the following list components:
x |
Covariates for Cox model |
y |
Surv object for Cox model |
model |
dnn model |
predictor |
predictor score mu = f(x) |
risk |
risk score = exp(predictor) |
For right censored survival time only
Chen, B. E. wrote the R code using the partial likelihood cost function proposed by Katzman et al (2018).
Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, Kluger Y. DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Medical Research Methodology 2018; 18: 24.
deepAFT
, deepGlm
,
print.deepSurv
, survreg
## Example for deep learning proportional hazards survival model set.seed(101) ### define model layers model = dNNmodel(units = c(4, 3, 1), activation = c("elu", "sigmoid", "sigmoid"), input_shape = 3) x = matrix(runif(15), nrow = 5, ncol = 3) time = exp(x[, 1]) status = c(1, 0, 1, 1, 1) fit = deepSurv(Surv(time, status) ~ x, model = model)
## Example for deep learning proportional hazards survival model set.seed(101) ### define model layers model = dNNmodel(units = c(4, 3, 1), activation = c("elu", "sigmoid", "sigmoid"), input_shape = 3) x = matrix(runif(15), nrow = 5, ncol = 3) time = exp(x[, 1]) status = c(1, 0, 1, 1, 1) fit = deepSurv(Surv(time, status) ~ x, model = model)
dnnFit
dnnFit
dnnControl is an auxiliary function for dnnFit
. Typically only used internally by the dnn package, may be used to construct a control argument for the deep learning neural network model to specify parameters such as a loss function.
dnnControl(loss = c("mse", "cox", "bin", "log", "mae"), epochs = 300, batch_size = 64, verbose = 0, lr_rate = 0.0001, alpha = 0.5, lambda = 1.0, epsilon = 0.01, max.iter = 100, censor.group = NULL, weights = NULL)
dnnControl(loss = c("mse", "cox", "bin", "log", "mae"), epochs = 300, batch_size = 64, verbose = 0, lr_rate = 0.0001, alpha = 0.5, lambda = 1.0, epsilon = 0.01, max.iter = 100, censor.group = NULL, weights = NULL)
loss |
loss function for the neural network model, "mse" for mean square error (guassian glm model), "mae" for mean absolute error, "cox" for the Cox partial likelihood (proportional hazards model), "bin" for cross-entropy (binomial glm model), "log" for log-linear (poisson glm model). |
epochs |
number of deep learning epochs, default is 30. |
batch_size |
batch size, default is 64. 'NaN' may be generated if batch size is too small and there is not event in a batch. |
lr_rate |
learning rate, default is 0.0001. |
weights |
an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector, default is NULL. |
alpha |
alpha decay rate for momentum gradient descent, default is 0.5. |
lambda |
regularization term for dnn weighting parameters, 0.5*lambda*W*W), default is 1.0. |
verbose |
verbose = 1 for print out verbose during the model fit, 0 for not print. |
epsilon |
epsilon for convergence check, default is epsilon = 0.01. |
max.iter |
number of maximum iteration, default is max.iter = 100. This is used in the deepAFT function |
censor.group |
a vector for censoring groups. A KM curve for censoring will be fit for each group. If a matrix is provided, then a Cox model will be used to predict the censoring probability. Used only in the deepAFT function. |
dnnControl is used in model fitting of "dnnFit". Additional loss functions will be added to the library in the future.
This function checks the internal consistency and returns a list of values as input to control model fitting of "dnnFit".
For right censored survival time only
Chen, B. E.
Norman, P. and Chen, B. E. (2023). DeepAFAT: A nonparametric accelerated failure time model with artificial neural network. Manuscript to be submitted.
deepAFT
, deepGLM
, deepSurv
, dnnFit
## Example for dnnControl ## # model = dNNmodel() control = dnnControl(loss='mse') # can also be used in # fit = dnnFit(y ~ x, model, control) # print(fit)
## Example for dnnControl ## # model = dNNmodel() control = dnnControl(loss='mse') # can also be used in # fit = dnnFit(y ~ x, model, control) # print(fit)
dnnFit is used to train a deep learning neural network model based on a specified loss function.
dnnFit(x, y, model, control)
dnnFit(x, y, model, control)
x |
covariates for the neural network model |
y |
output (target) value for neural network model |
model |
the neural network model, see below for details |
control |
a list of control values, in the format produced by 'dnnControl'. The default value is dnnControl(loss='mse') |
The 'dnnFit' function takes the input data, the target values, the network architecture, and the loss function as arguments, and returns a trained model that minimizes the loss function. The function also supports various options for regularization and optimization of the model.
See dNNmodel
for details on how to specify a deep learning model.
Parameters in dnnControl
will be used to control the model fit process. The loss function can be specified as dnnControl(loss = "lossFunction"). Currently, the following loss functions are supported:
'mse': Mean square error loss = 0.5*sum(dy^2)
'cox': Cox partial likelihood loss = -sum(delta*(yhat - log(S0)))
'bin': Cross-entropy = -sum(y*log(p) + (1-y)*log(1-p))
'log': Log linear cost = -sum(y*log(lambda)-lambda)
'mae': Mean absolute error loss = sum(abs(dy))
Additional loss functions will be added to the library in the future.
{ dnnFit2 } is a C++ version of dnnFit, which runs about 20% faster, however, only loss = 'mse' and 'cox' are currently supported.
When the variance for covariance matrix X is too large, please use xbar = scale(x) to standardize X.
An object of class "dnnFit" is returned. The dnnFit object contains the following list components:
cost |
cost at the final epoch. |
dW |
the gradient at the final epoch dW = dL/dW. |
fitted.values |
predictor value mu = f(x). |
history |
a cost history at each epoch. |
lp |
predictor value mu = f(x). |
logLik |
-2*log Likelihood = cost. |
model |
a dNNmodel object. |
residuals |
raw residual dy = d log(L)/dmu |
dvi |
deviance dvi = dy*dy |
Chen, B. E. and Norman P.
Buckley, J. and James, I. (1979). Linear regression with censored data. Biometrika, 66, page 429-436.
Norman, P. and Chen, B. E. (2019). DeepAFAT: A nonparametric accelerated failure time model with artificial neural network. Manuscript to be submitted.
Chollet, F. and Allaire J. J. (2017). Deep learning with R. Manning.
deepAFT
, deepGlm
, deepSurv
, dnnControl
## Example for dnnFit with MSE loss function to do a non-linear regression set.seed(101) ### define model layers model = dNNmodel(units = c(4, 3, 1), activation = c("elu", "sigmoid", "sigmoid"), input_shape = 3) x = matrix(runif(15), nrow = 5, ncol = 3) y = exp(x[, 1]) control = dnnControl(loss='mse') fit = dnnFit(x, y, model, control)
## Example for dnnFit with MSE loss function to do a non-linear regression set.seed(101) ### define model layers model = dNNmodel(units = c(4, 3, 1), activation = c("elu", "sigmoid", "sigmoid"), input_shape = 3) x = matrix(runif(15), nrow = 5, ncol = 3) y = exp(x[, 1]) control = dnnControl(loss='mse') fit = dnnFit(x, y, model, control)
{dNNmodel} is an R function to create a deep neural network model that is to be used in the feed forward network { fwdNN } and back propagation { bwdNN }.
dNNmodel(units, activation=NULL, input_shape = NULL, type = NULL, N = NULL, Rcpp=TRUE, optimizer = c("momentum", "nag", "adam"))
dNNmodel(units, activation=NULL, input_shape = NULL, type = NULL, N = NULL, Rcpp=TRUE, optimizer = c("momentum", "nag", "adam"))
units |
number of nodes for each layer |
activation |
activation function |
input_shape |
the number of columns of input X, default is NULL. |
N |
the number of training sample, default is NULL. |
type |
default is "dense", currently only support dense layer. |
Rcpp |
use Rcpp (C++ for R) to speed up the fwdNN and bwdNN, default is "TRUE". |
optimizer |
optimizer used in SGD, default is "momentum". |
dNNmodel returns an object of class "dNNmodel".
The function "print" (i.e., "print.dNNmodel") can be used to print a summary of the dnn model,
The function "summary" (i.e., "summary.dNNmodel") can be used to print a summary of the dnn model,
An object of class "dNNmodel" is a list containing at least the following components:
units |
number of nodes for each layer |
activation |
activation function |
drvfun |
derivative of the activation function |
params |
the initial values of the parameters, to be updated in model training. |
input_shape |
the number of columns of input X, default is NULL. |
N |
the number of training sample, default is NULL. |
type |
default is "dense", currently only support dense layer. |
Bingshu E. Chen ([email protected])
plot.dNNmodel
,
print.dNNmodel
,
summary.dNNmodel
,
fwdNN
,
bwdNN
,
optimizerSGD
,
optimizerNAG
,
### To define a dnn model model = dNNmodel(units = c(8, 6, 1), activation = c("relu", "sigmoid", "sigmoid"), input_shape = c(3))
### To define a dnn model model = dNNmodel(units = c(8, 6, 1), activation = c("relu", "sigmoid", "sigmoid"), input_shape = c(3))
{fwdNN} is an R function for feed forward network.
fwdNN(X, model) # # to calculate a feed feedward model #
fwdNN(X, model) # # to calculate a feed feedward model #
X |
For "dNNmodel", X is a design matrix of dimension n * p. |
model |
a model return from dNNmodel function. |
'cache' is the cache of each input layer, will be used in the bwdNN function.
The function fwdNN return a list containing at least the following components:
cache |
a list contains the values of each output layer after activation function transformation and adding the intercept term (i.e. the bias term). The intercept does not add to the output layer in the cache. |
Bingshu E. Chen ([email protected])
bwdNN
,
plot.dNNmodel
,
print.dNNmodel
,
summary.dNNmodel
,
### define a dnn model, calculate the feed forward network model = dNNmodel(units = c(8, 6, 1), activation = c("elu", "sigmoid", "sigmoid"), input_shape = 3) ### feed forward with a dummy x matrix x = matrix(runif(15), nrow = 5, ncol = 3) cache = fwdNN(x, model)
### define a dnn model, calculate the feed forward network model = dNNmodel(units = c(8, 6, 1), activation = c("elu", "sigmoid", "sigmoid"), input_shape = 3) ### feed forward with a dummy x matrix x = matrix(runif(15), nrow = 5, ncol = 3) cache = fwdNN(x, model)
{ hyperTuning} is a tuning tool to find the optimal hyper parameter for the ANN model.
hyperTuning(x, y, model, ER = c("cindex", "mse"), method = c('BuckleyJames', 'ipcw', 'transform', 'deepSurv'), lower = NULL, upper = NULL, node = FALSE, K = 5, R = 25) ### additional function used in hyperTuning is cross-validation prediction error # # CVpredErr(x, y, model, control, method) #
hyperTuning(x, y, model, ER = c("cindex", "mse"), method = c('BuckleyJames', 'ipcw', 'transform', 'deepSurv'), lower = NULL, upper = NULL, node = FALSE, K = 5, R = 25) ### additional function used in hyperTuning is cross-validation prediction error # # CVpredErr(x, y, model, control, method) #
x |
Covariates for the deep neural network model |
y |
Surv object for the deep neural network model |
model |
A deep neural network model, created by function dNNmodel(). |
ER |
Prediction error measurement to be used in the cross vaditation, can be either a concordance index (cindex) or a mean square error (mse), default is cindex |
method |
Methods to handle censoring data in deep AFT model fit, 'BuckleyJames' for the Buckley and James method, 'ipcw' for the inverse probability censoring weights method. 'transform' for the transformation method based on book of Fan and Gijbels (1996, page 168). 'deepSurv' for the deepSurv model(Katzman, 2017) |
node |
Tuning the number of nodes in each hidden layer, default is FALSE |
K |
Number of folders of the cross-validatin, default is K = 5. |
lower , upper
|
Bounds on the hyper parameters for the deep learning method. If NULL, then the default value for lower = dnnControl(alpha = 0.5, lambda = 1.0, lr_rate = 0.0001), upper = dnnControl(alpha = 0.97, lambda = 10, lr_rate = 0.001). |
R |
Number of random sample draw from the hyper parameter space, default is R = 25. |
A random search method is used to optimal hyper parameter (Bergstra and Bengio, 2012). The function { CVpredErr} will be call to calculate the cross-validation prediction error for the given x and y with the specified method from the input argument.
A list of "model" and "dnnControl" is returned. The list contains at least the following components,
model |
The "model" contains the optimal number of nodes for each hidden layer in the model specified by |
control |
The "control" contains the optimal tuning parameters with list components the same as those created by |
Chen, B. E. ([email protected])
Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization. The Journal of Machine Learning Research. 13, page 281-305.
deepAFT
, deepGLM
, deepSurv
, dnnFit
### Tuning the hyper parameter for a deepAFT model: #### cross-validation take a long time to run. set.seed(101) ### define model layers model = dNNmodel(units = c(4, 3, 1), activation = c("elu", "sigmoid", "sigmoid"), input_shape = 3) x = matrix(runif(45), nrow = 15, ncol = 3) time = exp(x[, 1]) status = rbinom(15, 1, 0.5) y = Surv(time, status) ctl = dnnControl(epochs = 30) hyperTuning(x, y, model, method = "BuckleyJames", K = 2, R = 2, lower = ctl)
### Tuning the hyper parameter for a deepAFT model: #### cross-validation take a long time to run. set.seed(101) ### define model layers model = dNNmodel(units = c(4, 3, 1), activation = c("elu", "sigmoid", "sigmoid"), input_shape = 3) x = matrix(runif(45), nrow = 15, ncol = 3) time = exp(x[, 1]) status = rbinom(15, 1, 0.5) y = Surv(time, status) ctl = dnnControl(epochs = 30) hyperTuning(x, y, model, method = "BuckleyJames", K = 2, R = 2, lower = ctl)
The function ibs is used to calculate integrated Brier Score for deepAFT.
ibs(object, ...) ### To calculate Brier score for the original fitted data ## Default S3 method: ibs(object, ...) ### To calculate Brier score for new data with new outcomes ## S3 method for class 'deepAFT' ibs(object, newdata=NULL, newy = NULL, ...)
ibs(object, ...) ### To calculate Brier score for the original fitted data ## Default S3 method: ibs(object, ...) ### To calculate Brier score for new data with new outcomes ## S3 method for class 'deepAFT' ibs(object, newdata=NULL, newy = NULL, ...)
object |
the results of a deepAFT fit. |
newdata |
optional argument, if no null, new data and new y will be used for calculation. |
newy |
optional argument, used together with new data. |
... |
other unused arguments. |
ibs is called to calculate integrate Brier score for the deepAFT model deepAFT
.
A list contains the integrate Brier score and the Brier score is returned:
ibs |
Integerate Brier score |
bs |
Brier score |
Bingshu E. Chen
Compute Mean Square Error (mse) values for a survival object
## S3 method for class 'deepAFT' mseIPCW(object, newdata, newy)
## S3 method for class 'deepAFT' mseIPCW(object, newdata, newy)
object |
the results of a model fit using a deepAFT or a survreg function. |
newdata |
optional new data at which to do predictions. If absent, predictions are for the dataframe used in the original fit. |
newy |
optional new outcome variable y. |
predict is called to predict object from a deepAFT deepAFT
or a survreg model.
IPCW method is used to calcuate the mean square error for censored survival time.
mseIPCW returns the mse for the predicted survival data.
Bingshu E. Chen
The default method for predict predict
,
deepAFT
, survfit.dSurv
Different type of optimizer functions such as SGD, Momentum, AdamG and NAG.
optimizerMomentum(V, dW, W, alpha = 0.63, lr = 1e-4, lambda = 1)
optimizerMomentum(V, dW, W, alpha = 0.63, lr = 1e-4, lambda = 1)
V |
Momentum V = alpha*V - lr*(dW + lambda*W); W = W + V. NAG V = alpha*(V - lr*(dW + lambda*W); W = W + V - lr*(dW + lambda*W) |
dW |
derivative of cost with respect to W, can be founde by dW = bwdNN2(dy, cache, model), |
W |
weights for DNN model, optimizerd by W = W + V |
alpha |
Momentum rate 0 < alpha < 1, default is alpah = 0.5. |
lr |
learning rate, default is lr = 0.001. |
lambda |
regulation rate for cost + 0.5*lambda*||W||, default is lambda = 1.0. |
For SGD with momentum, use
V = 0; obj = optimizerMomentum(V, dW, W); V = obj$V; W = obj$W
For SDG with MAG
V = 0; obj = optimizerNAG(V, dW, W); V = obj$V; W = obj$W
return and updated W and other parameters such as V, V1 and V2 that will be used on SGD.
Bingshu E. Chen
activation
,
bwdNN
,
fwdNN
,
dNNmodel
,
dnnFit
Plot function for plotting of R objects in the dnn package.
Several different type of plots can be produced for the deep learning mdels. Plot method is used to provide a summary of outputs from "deepAFT", "deepGLM", "deepSurv" and "dnn".
Use "methods(plot)" and the documentation for these for other plot methods.
## S3 method for class 'dNNmodel' plot(x, ...) ## S3 method for class 'deepAFT' plot(x, type = c("predicted", "residuals", "baselineKM"), ...)
## S3 method for class 'dNNmodel' plot(x, ...) ## S3 method for class 'deepAFT' plot(x, type = c("predicted", "residuals", "baselineKM"), ...)
x |
a class of "dNNmodel". |
type |
type of plot in deepAFT object, "predicted" to plot the linear predicted values, "residuals" to plot residuals, "baselineKM" to plot baseline Kaplan-Meier survival curve. |
... |
other options used in plot(). |
plot.deepAFT is called to plot the fitted deep learning AFT model.
plot.dNNmodel is called to plot fitted dnn model
The default method, plot.default has its own help page. Use methods("plot") to get all the methods for the plot generic.
No return value, called to plot a figure.
Bingshu E. Chen
The default method for plot plot.default
.
glm
Compute predicted values for a deepAFT object
## S3 method for class 'deepAFT' ## S3 method for class 'dSurv' predict(object, newdata, newy=NULL, ...)
## S3 method for class 'deepAFT' ## S3 method for class 'dSurv' predict(object, newdata, newy=NULL, ...)
object |
the results of a model fit using the deepAFT function. |
newdata |
optional new data at which to do predictions. If absent, predictions are for the dataframe used in the original fit. |
newy |
optional new outcome variable y. |
... |
other options used in predict(). |
predict.dSurv is called to predict object from the deepAFT or deepSurv model deepAFT
.
The default method, predict has its own help page. Use methods("predict") to get all the methods for the predict generic.
predict.dSurv returns a list of predicted values, prediction error and residuals.
lp |
linear predictor of beta(w)*Z, where beta(w) is the fitted regression coefficient and Z is covariance matrix. |
risk |
risk score, exp(lp). When new y is provided, both lp and risk will be ordered by survival time of the new y. |
cumhaz |
cumulative hzard function. |
time |
time for cumulative hazard function. Time from new y will be used is provided |
Bingshu E. Chen
The default method for predict predict
,
deepAFT
, survfit.dSurv
print is used to provide a short summary of outputs from deepAFT
, deepSurv
, deepGLM
, and dNNmodel
.
## S3 method for class 'deepAFT' print(x, ...) ## S3 method for class 'summary.deepAFT' print(x, ...) ## S3 method for class 'deepAFT' summary(object, ...) ## S3 method for class 'dNNmodel' print(x, ...) ## S3 method for class 'dNNmodel' summary(object, ...)
## S3 method for class 'deepAFT' print(x, ...) ## S3 method for class 'summary.deepAFT' print(x, ...) ## S3 method for class 'deepAFT' summary(object, ...) ## S3 method for class 'dNNmodel' print(x, ...) ## S3 method for class 'dNNmodel' summary(object, ...)
x |
a class returned from deepAFT, deepSurv, deepGLM model fit or a dNNmodel |
object |
a class of deepAFT object |
... |
other options used in print() |
print.deepAFT is called to print object or summary of object from the deep learning AFT models deepAFT
.
summary(fit) provides detail summary of ‘deepAFT’ model fit, including predictors,
baseline survival function for T0=T/exp(mu), and martingale residuals for the fitted model.
print.dNNmodel is called to print object or summary of object from the dNNmodel.
The default method, print.default has its own help page. Use methods("print") to get all the methods for the print generic.
An object of class "summary.deepAFT" is returned. The object contains the following list components:
location |
location parameter exp(mu), to predice the mean value of survival time. |
sfit |
survfit object of the baselie survival function of T0=T/exp(mu). |
cindex |
Concordance index of the fitted deepAFT model. |
resid |
martingle residuals of the fitted deepAFT model. |
method |
the model used to fit the deepAFT model. |
Bingshu E. Chen
The default method for print print.default
. Other methods include
survreg
,
deepAFT
,
summary
Calculates martingale, deviance or Cox-Snell residuals for a previously fitted (deepAFT) model.
## S3 method for class 'deepAFT' ## S3 method for class 'dSurv' residuals(object, type = c("martingale", "deviance", "coxSnell"), ...)
## S3 method for class 'deepAFT' ## S3 method for class 'dSurv' residuals(object, type = c("martingale", "deviance", "coxSnell"), ...)
object |
the results of a (deepAFT) fit. |
type |
character string indicating the type of residual desired. Possible values are "martingale", "deviance". Only enough of the string to determine a unique match is required. |
... |
other unused arguments. |
residuals.deepAFT is called to compute baseline survival function S_T0(t) from the deepAFT model deepAFT
, where T0 = T/exp(mu), or log(T) = log(T) - mu.
The default method, residuals has its own help page. Use methods("residuals") to get all the methods for the residuals generic.
For martingale and deviance residuals, the returned object is a vector with one element for each subject. The row order will match the input data for the original fit.
See residuals
for more detail about other output values.
For deviance residuals, the status variable may need to be reconstructed.
Bingshu E. Chen
The default method for residuals residuals
,
predict.dSurv
, survfit.dSurv
, and deepAFT
.
Density, distribution function, quantile function and random variable generation for a survival distribution with a provided hazard function or cumulative hazard function
dsurv(x, h0 = NULL, H0 = function(x){x}, log=FALSE) psurv(q, h0 = NULL, H0 = function(x){x}, low.tail=TRUE, log.p=FALSE) qsurv(p, h0 = NULL, H0 = function(x){x}, low.tail=TRUE) rsurv(n, h0 = NULL, H0 = function(x){x}) rcoxph(n, h0 = NULL, H0 = function(x){x}, lp = 0)
dsurv(x, h0 = NULL, H0 = function(x){x}, log=FALSE) psurv(q, h0 = NULL, H0 = function(x){x}, low.tail=TRUE, log.p=FALSE) qsurv(p, h0 = NULL, H0 = function(x){x}, low.tail=TRUE) rsurv(n, h0 = NULL, H0 = function(x){x}) rcoxph(n, h0 = NULL, H0 = function(x){x}, lp = 0)
x , q
|
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. |
h0 |
hazard function, default is h0 = NULL. |
H0 |
cumulative hazard function, default is H0(x) = x. |
lp |
linear predictor for rcoxph, H(x) = H0(x)exp(lp). |
log , log.p
|
logical; if TRUE, probabilities p are give as log(p). |
low.tail |
logical; if TRUE, probabilities are P[X < or = x] otherwise, S(x) = P[X>x]. |
If { h0 } or { H0 } are not specified, they assume the default values of h0(x) = 1 and H0(x) = x, respectively.
The survival distribution function is given by,
S(x) = exp(-H0(x)),
where H0(x) is the cumulative hazard function. Only one of h0 or H0 can be specified, if h0 is given, then H0(x) = integrate(h0, 0, x, subdivisions = 500L)
To generate Cox PH survival time, use
u = exp(-H(t)*exp(lp))
then, -log(u)*exp(-lp) = H(t). Find t such that H(t) = -log(u)exp(-lp).
{ dsurv } gives the density h(x)/S(x), { psurv } gives the distribution function, { qsurv } gives the quantile function, { rsurv } generates random survival time, and { rcoxph } generates random survival time with Cox proportional hazards model.
The length of the result is determined by n for rsurv and rcoxph.
Bingshu E. Chen
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1. Wiley, New York.
Distributions
for other standard distributions, including dweibull
for the Weibull distribution.
#### use qsurv to generate quantiles for weibull distribution H1 = function(x) x^3 qsurv(seq(0.1, 0.9, 0.2), H0 = H1) ### shall be the same as qweibull(seq(0.1, 0.9, 0.2), 3) #### to get random survival time from the cumulative hazard function H1(t) rsurv(15, H0 = H1)
#### use qsurv to generate quantiles for weibull distribution H1 = function(x) x^3 qsurv(seq(0.1, 0.9, 0.2), H0 = H1) ### shall be the same as qweibull(seq(0.1, 0.9, 0.2), 3) #### to get random survival time from the cumulative hazard function H1(t) rsurv(15, H0 = H1)
Computes the predicted survival function of a previously fitted deepAFT or deepSurv model.
## S3 method for class 'deepAFT' or 'deepSurv' ## S3 method for class 'dSurv' survfit(formula, se.fit=TRUE, conf.int=.95, ...)
## S3 method for class 'deepAFT' or 'deepSurv' ## S3 method for class 'dSurv' survfit(formula, se.fit=TRUE, conf.int=.95, ...)
formula |
a deepAFT or deepSurv fit object. |
se.fit |
a logical value indicating whether standard errors shall be computed. Default is TRUE |
conf.int |
the level for a two-sided confidence interval on the survival curve. Default is 0.95 |
... |
other unused arguments. |
survfit.dSurv is called to compuate baseline survival function S_T0(t) from the deepAFT model deepAFT
, where T0 = T/exp(mu), or log(T) = log(T) - mu.
For the deepSurv model deepAFT
, survfit.dSurv evaluates the Nelson-Aalen estimate of the baseline survival function.
The default method, survfit has its own help page. Use methods("survfit") to get all the methods for the survfit generic.
survfit.deepAFT returns a list of predicted baseline survival function, cumulative hazard function and residuals.
surv |
Predicted baseline survival function for T0=T/exp(mu). |
cumhaz |
Baseline cumulative hazard function, -log(surv). |
hazard |
Baseline hazard function. |
varhaz |
Variance of the baseline hazard. |
residuals |
Martingale residuals of the (deepAFT) model. |
std.err |
Standard error for the cumulative hazard function, if se.fit = TRUE. |
See survfit
for more detail about other output values such as upper, lower, conf.type.
Confidence interval is based on log-transformation of survival function.
Bingshu E. Chen
The default method for survfit survfit
,
predict.dSurv