Title: | Stochastic Limited Memory Quasi-Newton Optimizers |
---|---|
Description: | Implementations of stochastic, limited-memory quasi-Newton optimizers, similar in spirit to the LBFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno) algorithm, for smooth stochastic optimization. Implements the following methods: oLBFGS (online LBFGS) (Schraudolph, N.N., Yu, J. and Guenter, S., 2007 <http://proceedings.mlr.press/v2/schraudolph07a.html>), SQN (stochastic quasi-Newton) (Byrd, R.H., Hansen, S.L., Nocedal, J. and Singer, Y., 2016 <arXiv:1401.7020>), adaQN (adaptive quasi-Newton) (Keskar, N.S., Berahas, A.S., 2016, <arXiv:1511.01169>). Provides functions for easily creating R objects with partial_fit/predict methods from some given objective/gradient/predict functions. Includes an example stochastic logistic regression using these optimizers. Provides header files and registered C routines for using it directly from C/C++. |
Authors: | David Cortes |
Maintainer: | David Cortes <[email protected]> |
License: | BSD_2_clause + file LICENSE |
Version: | 0.1.2-1 |
Built: | 2024-12-16 06:51:54 UTC |
Source: | CRAN |
Optimizes an empirical (possibly non-convex) loss function over batches of sample data.
adaQN(x0, grad_fun, obj_fun = NULL, pred_fun = NULL, initial_step = 0.01, step_fun = function(iter) 1/sqrt((iter/100) + 1), callback_iter = NULL, args_cb = NULL, verbose = TRUE, mem_size = 10, fisher_size = 100, bfgs_upd_freq = 20, max_incr = 1.01, min_curvature = 1e-04, y_reg = NULL, scal_reg = 1e-04, rmsprop_weight = 0.9, use_grad_diff = FALSE, check_nan = TRUE, nthreads = -1, X_val = NULL, y_val = NULL, w_val = NULL)
adaQN(x0, grad_fun, obj_fun = NULL, pred_fun = NULL, initial_step = 0.01, step_fun = function(iter) 1/sqrt((iter/100) + 1), callback_iter = NULL, args_cb = NULL, verbose = TRUE, mem_size = 10, fisher_size = 100, bfgs_upd_freq = 20, max_incr = 1.01, min_curvature = 1e-04, y_reg = NULL, scal_reg = 1e-04, rmsprop_weight = 0.9, use_grad_diff = FALSE, check_nan = TRUE, nthreads = -1, X_val = NULL, y_val = NULL, w_val = NULL)
x0 |
Initial values for the variables to optimize. |
grad_fun |
Function taking as unnamed arguments 'x_curr' (variable values), 'X' (covariates), 'y' (target variable), and 'w' (weights), plus additional arguments ('...'), and producing the expected value of the gradient when evalauted on that data. |
obj_fun |
Function taking as unnamed arguments 'x_curr' (variable values), 'X' (covariates), 'y' (target variable), and 'w' (weights), plus additional arguments ('...'), and producing the expected value of the objective function when evalauted on that data. Only required when using 'max_incr'. |
pred_fun |
Function taking an unnamed argument as data, another unnamed argument as the variable values, and optional extra arguments ('...'). Will be called when using 'predict' on the object returned by this function. |
initial_step |
Initial step size. |
step_fun |
Function accepting the iteration number as an unnamed parameter, which will output the number by which 'initial_step' will be multiplied at each iteration to get the step size for that iteration. |
callback_iter |
Callback function which will be called at the end of each iteration. Will pass three unnamed arguments: the current variable values, the current iteration number, and 'args_cb'. Pass 'NULL' if there is no need to call a callback function. |
args_cb |
Extra argument to pass to the callback function. |
verbose |
Whether to print information about iteration statuses when something goes wrong. |
mem_size |
Number of correction pairs to store for approximation of Hessian-vector products. |
fisher_size |
Number of gradients to store for calculation of the empirical Fisher product with gradients. If passing 'NULL', will force 'use_grad_diff' to 'TRUE'. |
bfgs_upd_freq |
Number of iterations (batches) after which to generate a BFGS correction pair. |
max_incr |
Maximum ratio of function values in the validation set under the average values of 'x' during current epoch vs. previous epoch. If the ratio is above this threshold, the BFGS and Fisher memories will be reset, and 'x' values reverted to their previous average. If not using a validation set, will take a longer batch for function evaluations (same as used for gradients when using 'use_grad_diff' = 'TRUE'). Pass 'NULL' for no function-increase checking. |
min_curvature |
Minimum value of (s * y) / (s * s) in order to accept a correction pair. Pass 'NULL' for no minimum. |
y_reg |
Regularizer for 'y' vector (gets added y_reg * s). Pass 'NULL' for no regularization. |
scal_reg |
Regularization parameter to use in the denominator for AdaGrad and RMSProp scaling. |
rmsprop_weight |
If not 'NULL', will use RMSProp formula instead of AdaGrad for approximated inverse-Hessian initialization. |
use_grad_diff |
Whether to create the correction pairs using differences between gradients instead of empirical Fisher matrix. These gradients are calculated on a larger batch than the regular ones (given by batch_size * bfgs_upd_freq). |
check_nan |
Whether to check for variables becoming NaN after each iteration, and reverting the step if they do (will also reset BFGS memory). |
nthreads |
Number of parallel threads to use. If set to -1, will determine the number of available threads and use all of them. Note however that not all the computations can be parallelized, and the BLAS backend might use a different number of threads. |
X_val |
Covariates to use as validation set (only used when passing 'max_incr'). If not passed, will use a larger batch of stored data, in the same way as for Hessian-vector products in SQN. |
y_val |
Target variable for the covariates to use as validation set (only used when passing 'max_incr'). If not passed, will use a larger batch of stored data, in the same way as for Hessian-vector products in SQN. |
w_val |
Sample weights for the covariates to use as validation set (only used when passing 'max_incr'). If not passed, will use a larger batch of stored data, in the same way as for Hessian-vector products in SQN. |
an 'adaQN' object with the user-supplied functions, which can be fit to batches of data through function 'partial_fit', and can produce predictions on new data through function 'predict'.
Keskar, N.S. and Berahas, A.S., 2016, September. "adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs." In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 1-16). Springer, Cham.
Wright, S. and Nocedal, J., 1999. "Numerical optimization." (ch 7) Springer Science, 35(67-68), p.7.
partial_fit , predict.stochQN_guided , adaQN_free
### Example regression with randomly-generated data library(stochQN) ### Will sample data y ~ Ax + epsilon true_coefs <- c(1.12, 5.34, -6.123) generate_data_batch <- function(true_coefs, n = 100) { X <- matrix( rnorm(length(true_coefs) * n), nrow=n, ncol=length(true_coefs)) y <- X %*% true_coefs + rnorm(n) return(list(X = X, y = y)) } ### Regular regression function that minimizes RMSE eval_fun <- function(coefs, X, y, weights=NULL, lambda=1e-5) { pred <- as.numeric(X %*% coefs) RMSE <- sqrt(mean((pred - y)^2)) reg <- 2 * lambda * as.numeric(coefs %*% coefs) return(RMSE + reg) } eval_grad <- function(coefs, X, y, weights=NULL, lambda=1e-5) { pred <- X %*% coefs grad <- colMeans(X * as.numeric(pred - y)) grad <- grad + 2 * lambda * as.numeric(coefs^2) return(grad) } pred_fun <- function(X, coefs, ...) { return(as.numeric(X %*% coefs)) } ### Initialize optimizer form arbitrary values x0 <- c(1, 1, 1) optimizer <- adaQN(x0, grad_fun=eval_grad, pred_fun=pred_fun, obj_fun=eval_fun, initial_step=1e-0) val_data <- generate_data_batch(true_coefs, n=100) ### Fit to 50 batches of data, 100 observations each for (i in 1:50) { set.seed(i) new_batch <- generate_data_batch(true_coefs, n=100) partial_fit( optimizer, new_batch$X, new_batch$y, lambda=1e-5) x_curr <- get_curr_x(optimizer) i_curr <- get_iteration_number(optimizer) if ((i_curr %% 10) == 0) { cat(sprintf( "Iteration %d - E[f(x)]: %f - values of x: [%f, %f, %f]\n", i_curr, eval_fun(x_curr, val_data$X, val_data$y, lambda=1e-5), x_curr[1], x_curr[2], x_curr[3])) } } ### Predict for new data new_batch <- generate_data_batch(true_coefs, n=10) yhat <- predict(optimizer, new_batch$X)
### Example regression with randomly-generated data library(stochQN) ### Will sample data y ~ Ax + epsilon true_coefs <- c(1.12, 5.34, -6.123) generate_data_batch <- function(true_coefs, n = 100) { X <- matrix( rnorm(length(true_coefs) * n), nrow=n, ncol=length(true_coefs)) y <- X %*% true_coefs + rnorm(n) return(list(X = X, y = y)) } ### Regular regression function that minimizes RMSE eval_fun <- function(coefs, X, y, weights=NULL, lambda=1e-5) { pred <- as.numeric(X %*% coefs) RMSE <- sqrt(mean((pred - y)^2)) reg <- 2 * lambda * as.numeric(coefs %*% coefs) return(RMSE + reg) } eval_grad <- function(coefs, X, y, weights=NULL, lambda=1e-5) { pred <- X %*% coefs grad <- colMeans(X * as.numeric(pred - y)) grad <- grad + 2 * lambda * as.numeric(coefs^2) return(grad) } pred_fun <- function(X, coefs, ...) { return(as.numeric(X %*% coefs)) } ### Initialize optimizer form arbitrary values x0 <- c(1, 1, 1) optimizer <- adaQN(x0, grad_fun=eval_grad, pred_fun=pred_fun, obj_fun=eval_fun, initial_step=1e-0) val_data <- generate_data_batch(true_coefs, n=100) ### Fit to 50 batches of data, 100 observations each for (i in 1:50) { set.seed(i) new_batch <- generate_data_batch(true_coefs, n=100) partial_fit( optimizer, new_batch$X, new_batch$y, lambda=1e-5) x_curr <- get_curr_x(optimizer) i_curr <- get_iteration_number(optimizer) if ((i_curr %% 10) == 0) { cat(sprintf( "Iteration %d - E[f(x)]: %f - values of x: [%f, %f, %f]\n", i_curr, eval_fun(x_curr, val_data$X, val_data$y, lambda=1e-5), x_curr[1], x_curr[2], x_curr[3])) } } ### Predict for new data new_batch <- generate_data_batch(true_coefs, n=10) yhat <- predict(optimizer, new_batch$X)
Optimizes an empirical (perhaps non-convex) loss function over batches of sample data. Compared to function/class 'adaQN', this version lets the user do all the calculations from the outside, only interacting with the object by means of a function that returns a request type and is fed the required calculation through methods 'update_gradient' and 'update_function'.
Order in which requests are made:
========== loop ===========
* calc_grad
... (repeat calc_grad)
if max_incr > 0:
* calc_fun_val_batch
if 'use_grad_diff':
* calc_grad_big_batch (skipped if below max_incr)
===========================
After running this function, apply 'run_adaQN_free' to it to get the first requested piece of information.
adaQN_free(mem_size = 10, fisher_size = 100, bfgs_upd_freq = 20, max_incr = 1.01, min_curvature = 1e-04, scal_reg = 1e-04, rmsprop_weight = 0.9, y_reg = NULL, use_grad_diff = FALSE, check_nan = TRUE, nthreads = -1)
adaQN_free(mem_size = 10, fisher_size = 100, bfgs_upd_freq = 20, max_incr = 1.01, min_curvature = 1e-04, scal_reg = 1e-04, rmsprop_weight = 0.9, y_reg = NULL, use_grad_diff = FALSE, check_nan = TRUE, nthreads = -1)
mem_size |
Number of correction pairs to store for approximation of Hessian-vector products. |
fisher_size |
Number of gradients to store for calculation of the empirical Fisher product with gradients. If passing 'NULL', will force 'use_grad_diff' to 'TRUE'. |
bfgs_upd_freq |
Number of iterations (batches) after which to generate a BFGS correction pair. |
max_incr |
Maximum ratio of function values in the validation set under the average values of 'x' during current epoch vs. previous epoch. If the ratio is above this threshold, the BFGS and Fisher memories will be reset, and 'x' values reverted to their previous average. Pass 'NULL' for no function-increase checking. |
min_curvature |
Minimum value of (s * y) / (s * s) in order to accept a correction pair. Pass 'NULL' for no minimum. |
scal_reg |
Regularization parameter to use in the denominator for AdaGrad and RMSProp scaling. |
rmsprop_weight |
If not 'NULL', will use RMSProp formula instead of AdaGrad for approximated inverse-Hessian initialization. |
y_reg |
Regularizer for 'y' vector (gets added y_reg * s). Pass 'NULL' for no regularization. |
use_grad_diff |
Whether to create the correction pairs using differences between gradients instead of Fisher matrix. These gradients are calculated on a larger batch than the regular ones (given by batch_size * bfgs_upd_freq). If 'TRUE', empirical Fisher matrix will not be used. |
check_nan |
Whether to check for variables becoming NaN after each iteration, and reverting the step if they do (will also reset BFGS and Fisher memory). |
nthreads |
Number of parallel threads to use. If set to -1, will determine the number of available threads and use all of them. Note however that not all the computations can be parallelized, and the BLAS backend might use a different number of threads. |
An 'adaQN_free' object, which can be used through functions 'update_gradient', 'update_fun', and 'run_adaQN_free'
Keskar, N.S. and Berahas, A.S., 2016, September. "adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs." In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 1-16). Springer, Cham.
Wright, S. and Nocedal, J., 1999. "Numerical optimization." (ch 7) Springer Science, 35(67-68), p.7.
update_gradient , update_fun , run_adaQN_free
### Example optimizing Rosenbrock 2D function ### Note that this example is not stochastic, as the ### function is not evaluated in expectation based on ### batches of data, but rather it has a given absolute ### form that never varies. library(stochQN) fr <- function(x) { ## Rosenbrock Banana function x1 <- x[1] x2 <- x[2] 100 * (x2 - x1 * x1)^2 + (1 - x1)^2 } grr <- function(x) { ## Gradient of 'fr' x1 <- x[1] x2 <- x[2] c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1), 200 * (x2 - x1 * x1)) } ### Initial values of x x_opt = as.numeric(c(0, 2)) cat(sprintf("Initial values of x: [%.3f, %.3f]\n", x_opt[1], x_opt[2])) ### Will use constant step size throughout ### (not recommended) step_size = 1e-2 ### Initialize the optimizer optimizer = adaQN_free() ### Keep track of the iteration number curr_iter <- 0 ### Run a loop for many iterations ### (Note that some iterations might require more ### than 1 calculation request) for (i in 1:200) { req <- run_adaQN_free(optimizer, x_opt, step_size) if (req$task == "calc_grad") { update_gradient(optimizer, grr(req$requested_on)) } else if (req$task == "calc_fun_val_batch") { update_fun(optimizer, fr(req$requested_on)) } ### Track progress every 10 iterations if (req$info$iteration_number > curr_iter) { curr_iter <- req$info$iteration_number } if ((curr_iter %% 10) == 0) { cat(sprintf( "Iteration %3d - Current function value: %.3f\n", req$info$iteration_number, fr(x_opt))) } } cat(sprintf("Current values of x: [%.3f, %.3f]\n", x_opt[1], x_opt[2]))
### Example optimizing Rosenbrock 2D function ### Note that this example is not stochastic, as the ### function is not evaluated in expectation based on ### batches of data, but rather it has a given absolute ### form that never varies. library(stochQN) fr <- function(x) { ## Rosenbrock Banana function x1 <- x[1] x2 <- x[2] 100 * (x2 - x1 * x1)^2 + (1 - x1)^2 } grr <- function(x) { ## Gradient of 'fr' x1 <- x[1] x2 <- x[2] c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1), 200 * (x2 - x1 * x1)) } ### Initial values of x x_opt = as.numeric(c(0, 2)) cat(sprintf("Initial values of x: [%.3f, %.3f]\n", x_opt[1], x_opt[2])) ### Will use constant step size throughout ### (not recommended) step_size = 1e-2 ### Initialize the optimizer optimizer = adaQN_free() ### Keep track of the iteration number curr_iter <- 0 ### Run a loop for many iterations ### (Note that some iterations might require more ### than 1 calculation request) for (i in 1:200) { req <- run_adaQN_free(optimizer, x_opt, step_size) if (req$task == "calc_grad") { update_gradient(optimizer, grr(req$requested_on)) } else if (req$task == "calc_fun_val_batch") { update_fun(optimizer, fr(req$requested_on)) } ### Track progress every 10 iterations if (req$info$iteration_number > curr_iter) { curr_iter <- req$info$iteration_number } if ((curr_iter %% 10) == 0) { cat(sprintf( "Iteration %3d - Current function value: %.3f\n", req$info$iteration_number, fr(x_opt))) } } cat(sprintf("Current values of x: [%.3f, %.3f]\n", x_opt[1], x_opt[2]))
Retrieve fitted coefficients from stochastic logistic regression object
## S3 method for class 'stoch_logistic' coef(object, ...)
## S3 method for class 'stoch_logistic' coef(object, ...)
object |
A 'stoch_logistic' object as output by function 'stochastic.logistic.regression'. Must have already been fit to at least 1 batch of data. |
... |
Not used. |
An (n x 1) matrix with the coefficients, in the same format as those from 'glm'.
stochastic.logistic.regression
Get current values of the optimization variables
get_curr_x(optimizer)
get_curr_x(optimizer)
optimizer |
An optimizer (guided-mode) from this module, as output by functions 'oLBFGS', 'SQN', 'adaQN'. |
A numeric vector with the current values of the variables being optimized.
Get current iteration number from the optimizer object
get_iteration_number(optimizer)
get_iteration_number(optimizer)
optimizer |
An optimizer (guided-mode) from this module, as output by functions 'oLBFGS', 'SQN', 'adaQN'. |
The current iteration number.
Optimizes an empirical (convex) loss function over batches of sample data.
oLBFGS(x0, grad_fun, pred_fun = NULL, initial_step = 0.01, step_fun = function(iter) 1/sqrt((iter/10) + 1), callback_iter = NULL, args_cb = NULL, verbose = TRUE, mem_size = 10, hess_init = NULL, min_curvature = 1e-04, y_reg = NULL, check_nan = TRUE, nthreads = -1)
oLBFGS(x0, grad_fun, pred_fun = NULL, initial_step = 0.01, step_fun = function(iter) 1/sqrt((iter/10) + 1), callback_iter = NULL, args_cb = NULL, verbose = TRUE, mem_size = 10, hess_init = NULL, min_curvature = 1e-04, y_reg = NULL, check_nan = TRUE, nthreads = -1)
x0 |
Initial values for the variables to optimize. |
grad_fun |
Function taking as unnamed arguments 'x_curr' (variable values), 'X' (covariates), 'y' (target variable), and 'w' (weights), plus additional arguments ('...'), and producing the expected value of the gradient when evalauted on that data. |
pred_fun |
Function taking an unnamed argument as data, another unnamed argument as the variable values, and optional extra arguments ('...'). Will be called when using 'predict' on the object returned by this function. |
initial_step |
Initial step size. |
step_fun |
Function accepting the iteration number as an unnamed parameter, which will output the number by which 'initial_step' will be multiplied at each iteration to get the step size for that iteration. |
callback_iter |
Callback function which will be called at the end of each iteration. Will pass three unnamed arguments: the current variable values, the current iteration number, and 'args_cb'. Pass 'NULL' if there is no need to call a callback function. |
args_cb |
Extra argument to pass to the callback function. |
verbose |
Whether to print information about iteration statuses when something goes wrong. |
mem_size |
Number of correction pairs to store for approximation of Hessian-vector products. |
hess_init |
Value to which to initialize the diagonal of H0. If passing 'NULL', will use the same initializion as for SQN ((s_last * y_last) / (y_last * y_last)). |
min_curvature |
Minimum value of (s * y) / (s * s) in order to accept a correction pair. Pass 'NULL' for no minimum. |
y_reg |
Regularizer for 'y' vector (gets added y_reg * s). Pass 'NULL' for no regularization. |
check_nan |
Whether to check for variables becoming NA after each iteration, and reverting the step if they do (will also reset BFGS memory). |
nthreads |
Number of parallel threads to use. If set to -1, will determine the number of available threads and use all of them. Note however that not all the computations can be parallelized, and the BLAS backend might use a different number of threads. |
an 'oLBFGS' object with the user-supplied functions, which can be fit to batches of data through function 'partial_fit', and can produce predictions on new data through function 'predict'.
Schraudolph, N.N., Yu, J. and Guenter, S., 2007, March. "A stochastic quasi-Newton method for online convex optimization." In Artificial Intelligence and Statistics (pp. 436-443).
Wright, S. and Nocedal, J., 1999. "Numerical optimization." (ch 7) Springer Science, 35(67-68), p.7.
partial_fit , predict.stochQN_guided , oLBFGS_free
### Example regression with randomly-generated data library(stochQN) ### Will sample data y ~ Ax + epsilon true_coefs <- c(1.12, 5.34, -6.123) generate_data_batch <- function(true_coefs, n = 100) { X <- matrix( rnorm(length(true_coefs) * n), nrow=n, ncol=length(true_coefs)) y <- X %*% true_coefs + rnorm(n) return(list(X = X, y = y)) } ### Regular regression function that minimizes RMSE eval_fun <- function(coefs, X, y, weights=NULL, lambda=1e-5) { pred <- as.numeric(X %*% coefs) RMSE <- sqrt(mean((pred - y)^2)) reg <- lambda * as.numeric(coefs %*% coefs) return(RMSE + reg) } eval_grad <- function(coefs, X, y, weights=NULL, lambda=1e-5) { pred <- X %*% coefs grad <- colMeans(X * as.numeric(pred - y)) grad <- grad + 2 * lambda * as.numeric(coefs^2) return(grad) } pred_fun <- function(X, coefs, ...) { return(as.numeric(X %*% coefs)) } ### Initialize optimizer form arbitrary values x0 <- c(1, 1, 1) optimizer <- oLBFGS(x0, grad_fun=eval_grad, pred_fun=pred_fun, initial_step=1e-1) val_data <- generate_data_batch(true_coefs, n=100) ### Fit to 50 batches of data, 100 observations each set.seed(1) for (i in 1:50) { new_batch <- generate_data_batch(true_coefs, n=100) partial_fit( optimizer, new_batch$X, new_batch$y, lambda=1e-5) x_curr <- get_curr_x(optimizer) i_curr <- get_iteration_number(optimizer) if ((i_curr %% 10) == 0) { cat(sprintf( "Iteration %d - E[f(x)]: %f - values of x: [%f, %f, %f]\n", i_curr, eval_fun(x_curr, val_data$X, val_data$y, lambda=1e-5), x_curr[1], x_curr[2], x_curr[3])) } } ### Predict for new data new_batch <- generate_data_batch(true_coefs, n=10) yhat <- predict(optimizer, new_batch$X)
### Example regression with randomly-generated data library(stochQN) ### Will sample data y ~ Ax + epsilon true_coefs <- c(1.12, 5.34, -6.123) generate_data_batch <- function(true_coefs, n = 100) { X <- matrix( rnorm(length(true_coefs) * n), nrow=n, ncol=length(true_coefs)) y <- X %*% true_coefs + rnorm(n) return(list(X = X, y = y)) } ### Regular regression function that minimizes RMSE eval_fun <- function(coefs, X, y, weights=NULL, lambda=1e-5) { pred <- as.numeric(X %*% coefs) RMSE <- sqrt(mean((pred - y)^2)) reg <- lambda * as.numeric(coefs %*% coefs) return(RMSE + reg) } eval_grad <- function(coefs, X, y, weights=NULL, lambda=1e-5) { pred <- X %*% coefs grad <- colMeans(X * as.numeric(pred - y)) grad <- grad + 2 * lambda * as.numeric(coefs^2) return(grad) } pred_fun <- function(X, coefs, ...) { return(as.numeric(X %*% coefs)) } ### Initialize optimizer form arbitrary values x0 <- c(1, 1, 1) optimizer <- oLBFGS(x0, grad_fun=eval_grad, pred_fun=pred_fun, initial_step=1e-1) val_data <- generate_data_batch(true_coefs, n=100) ### Fit to 50 batches of data, 100 observations each set.seed(1) for (i in 1:50) { new_batch <- generate_data_batch(true_coefs, n=100) partial_fit( optimizer, new_batch$X, new_batch$y, lambda=1e-5) x_curr <- get_curr_x(optimizer) i_curr <- get_iteration_number(optimizer) if ((i_curr %% 10) == 0) { cat(sprintf( "Iteration %d - E[f(x)]: %f - values of x: [%f, %f, %f]\n", i_curr, eval_fun(x_curr, val_data$X, val_data$y, lambda=1e-5), x_curr[1], x_curr[2], x_curr[3])) } } ### Predict for new data new_batch <- generate_data_batch(true_coefs, n=10) yhat <- predict(optimizer, new_batch$X)
Optimizes an empirical (convex) loss function over batches of sample data. Compared to function/class 'oLBFGS', this version lets the user do all the calculations from the outside, only interacting with the object by means of a function that returns a request type and is fed the required calculation through a method 'update_gradient'.
Order in which requests are made:
========== loop ===========
* calc_grad
* calc_grad_same_batch (might skip if using check_nan)
===========================
After running this function, apply 'run_oLBFGS_free' to it to get the first requested piece of information.
oLBFGS_free(mem_size = 10, hess_init = NULL, min_curvature = 1e-04, y_reg = NULL, check_nan = TRUE, nthreads = -1)
oLBFGS_free(mem_size = 10, hess_init = NULL, min_curvature = 1e-04, y_reg = NULL, check_nan = TRUE, nthreads = -1)
mem_size |
Number of correction pairs to store for approximation of Hessian-vector products. |
hess_init |
Value to which to initialize the diagonal of H0. If passing 'NULL', will use the same initializion as for SQN ((s_last * y_last) / (y_last * y_last)). |
min_curvature |
Minimum value of (s * y) / (s * s) in order to accept a correction pair. Pass 'NULL' for no minimum. |
y_reg |
Regularizer for 'y' vector (gets added y_reg * s). Pass 'NULL' for no regularization. |
check_nan |
Whether to check for variables becoming NA after each iteration, and reverting the step if they do (will also reset BFGS memory). |
nthreads |
Number of parallel threads to use. If set to -1, will determine the number of available threads and use all of them. Note however that not all the computations can be parallelized, and the BLAS backend might use a different number of threads. |
An 'oLBFGS_free' object, which can be used through functions 'update_gradient' and 'run_oLBFGS_free'
Schraudolph, N.N., Yu, J. and Guenter, S., 2007, March. "A stochastic quasi-Newton method for online convex optimization." In Artificial Intelligence and Statistics (pp. 436-443).
Wright, S. and Nocedal, J., 1999. "Numerical optimization." (ch 7) Springer Science, 35(67-68), p.7.
update_gradient , run_oLBFGS_free
### Example optimizing Rosenbrock 2D function ### Note that this example is not stochastic, as the ### function is not evaluated in expectation based on ### batches of data, but rather it has a given absolute ### form that never varies. ### Warning: this optimizer is meant for convex functions ### (Rosenbrock's is not convex) library(stochQN) fr <- function(x) { ## Rosenbrock Banana function x1 <- x[1] x2 <- x[2] 100 * (x2 - x1 * x1)^2 + (1 - x1)^2 } grr <- function(x) { ## Gradient of 'fr' x1 <- x[1] x2 <- x[2] c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1), 200 * (x2 - x1 * x1)) } ### Initial values of x x_opt = as.numeric(c(0, 2)) cat(sprintf("Initial values of x: [%.3f, %.3f]\n", x_opt[1], x_opt[2])) ### Will use a constant step size throughout ### (not recommended) step_size <- 1e-1 ### Initialize the optimizer optimizer <- oLBFGS_free() ### Keep track of the iteration number curr_iter <- 0 ### Run a loop for 100 iterations ### (Note that each iteration requires 2 calculations, ### hence the 200) for (i in 1:200) { req <- run_oLBFGS_free(optimizer, x_opt, step_size) if (req$task == "calc_grad") { update_gradient(optimizer, grr(req$requested_on)) } else if (req$task == "calc_grad_same_batch") { update_gradient(optimizer, grr(req$requested_on)) } ### Track progress every 10 iterations if (req$info$iteration_number > curr_iter) { curr_iter <- req$info$iteration_number if ((curr_iter %% 10) == 0) { cat(sprintf( "Iteration %3d - Current function value: %.3f\n", req$info$iteration_number, fr(x_opt) )) } } } cat(sprintf("Current values of x: [%.3f, %.3f]\n", x_opt[1], x_opt[2]))
### Example optimizing Rosenbrock 2D function ### Note that this example is not stochastic, as the ### function is not evaluated in expectation based on ### batches of data, but rather it has a given absolute ### form that never varies. ### Warning: this optimizer is meant for convex functions ### (Rosenbrock's is not convex) library(stochQN) fr <- function(x) { ## Rosenbrock Banana function x1 <- x[1] x2 <- x[2] 100 * (x2 - x1 * x1)^2 + (1 - x1)^2 } grr <- function(x) { ## Gradient of 'fr' x1 <- x[1] x2 <- x[2] c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1), 200 * (x2 - x1 * x1)) } ### Initial values of x x_opt = as.numeric(c(0, 2)) cat(sprintf("Initial values of x: [%.3f, %.3f]\n", x_opt[1], x_opt[2])) ### Will use a constant step size throughout ### (not recommended) step_size <- 1e-1 ### Initialize the optimizer optimizer <- oLBFGS_free() ### Keep track of the iteration number curr_iter <- 0 ### Run a loop for 100 iterations ### (Note that each iteration requires 2 calculations, ### hence the 200) for (i in 1:200) { req <- run_oLBFGS_free(optimizer, x_opt, step_size) if (req$task == "calc_grad") { update_gradient(optimizer, grr(req$requested_on)) } else if (req$task == "calc_grad_same_batch") { update_gradient(optimizer, grr(req$requested_on)) } ### Track progress every 10 iterations if (req$info$iteration_number > curr_iter) { curr_iter <- req$info$iteration_number if ((curr_iter %% 10) == 0) { cat(sprintf( "Iteration %3d - Current function value: %.3f\n", req$info$iteration_number, fr(x_opt) )) } } } cat(sprintf("Current values of x: [%.3f, %.3f]\n", x_opt[1], x_opt[2]))
Runs one iteration of the stochastic optimizer on the new data passed here.
partial_fit(optimizer, X, y = NULL, weights = NULL, ...)
partial_fit(optimizer, X, y = NULL, weights = NULL, ...)
optimizer |
A stochastic optimizer from this package as output by functions 'oLBFGS', 'SQN', 'adaQN'. Will be modified in-place. |
X |
Covariates to pass to the user-defined gradient / objective / Hessian-vector functions. |
y |
Target variable to pass to the user-defined gradient / objective / Hessian-vector functions. |
weights |
Target variable to pass to the user-defined gradient / objective / Hessian-vector functions. |
... |
Additional arguments to pass to the user-defined gradient / objective / Hessian-vector functions. |
No return value (object is modified in-place).
Perform a quasi-Newton iteration to update the model with new data.
partial_fit_logistic(logistic_model, X, y = NULL, w = NULL)
partial_fit_logistic(logistic_model, X, y = NULL, w = NULL)
logistic_model |
A 'stoch_logistic' object as output by function 'stochastic.logistic.regression'. Will be modified in-place. |
X |
Data with covariates. If passing a 'data.frame', the model object must have been initialized with a formula, and 'X' must also contain the target variable ('y'). If passing a matrix, must also pass 'y'. Note that whatever factor levels are present in the first batch of data, will be taken as the whole factor levels. |
y |
The target variable, when using matrices. Ignored when using formula. |
w |
Sample weights (optional). If required, must pass them at every partial fit iteration. |
No return value. Model object is updated in-place.
stochastic.logistic.regression
Makes predictions for new data from the fitted model. Model have already been fit to at least 1 batch of data.
## S3 method for class 'stoch_logistic' predict(object, newdata, type = "prob", ...)
## S3 method for class 'stoch_logistic' predict(object, newdata, type = "prob", ...)
object |
A 'stoch_logistic' object as output by function 'stochastic.logistic.regression'. |
newdata |
New data on which to make predictions. |
type |
Type of prediction to make. Can pass 'prob' to get probabilities for the positive class, or 'class' to get the predicted class. |
... |
Not used. |
A vector with the predicted classes or probabilities for 'newdata'.
stochastic.logistic.regression
Calls the user-defined predict function for an object optimized through this package's functions.
## S3 method for class 'stochQN_guided' predict(object, newdata, ...)
## S3 method for class 'stochQN_guided' predict(object, newdata, ...)
object |
Optimizer from this module as output by functions 'oLBFGS', 'SQN', 'adaQN'. Must have been constructed with a predict function. |
newdata |
Data on which to make predictions (will be passed to the user-provided function). |
... |
Additional arguments to pass to the user-provided predict function. |
Print summary info about adaQN guided-mode object
## S3 method for class 'adaQN' print(x, ...)
## S3 method for class 'adaQN' print(x, ...)
x |
An 'adaQN' object as output by function of the same name. |
... |
Not used. |
Print summary info about adaQN free-mode object
## S3 method for class 'adaQN_free' print(x, ...)
## S3 method for class 'adaQN_free' print(x, ...)
x |
An 'adaQN_free' object as output by function of the same name. |
... |
Not used. |
Print summary info about oLBFGS guided-mode object
## S3 method for class 'oLBFGS' print(x, ...)
## S3 method for class 'oLBFGS' print(x, ...)
x |
An 'oLBFGS' object as output by function of the same name. |
... |
Not used. |
Print summary info about oLBFGS free-mode object
## S3 method for class 'oLBFGS_free' print(x, ...)
## S3 method for class 'oLBFGS_free' print(x, ...)
x |
An 'oLBFGS_free' object as output by function of the same name. |
... |
Not used. |
Print summary info about SQN guided-mode object
## S3 method for class 'SQN' print(x, ...)
## S3 method for class 'SQN' print(x, ...)
x |
An 'SQN' object as output by function of the same name. |
... |
Not used. |
Print summary info about SQN free-mode object
## S3 method for class 'SQN_free' print(x, ...)
## S3 method for class 'SQN_free' print(x, ...)
x |
An 'SQN_free' object as output by function of the same name. |
... |
Not used. |
Print general info about stochastic logistic regression object
## S3 method for class 'stoch_logistic' print(x, ...)
## S3 method for class 'stoch_logistic' print(x, ...)
x |
A 'stoch_logistic' object as output by function 'stochastic.logistic.regression'. |
... |
Not used. |
stochastic.logistic.regression
Run the next step of an adaQN optimization procedure, after the last requested calculation has been fed to the optimizer. When run for the first time, there is no request, so the function just needs to be run on the object as it is returned from function 'adaQN_free'.
run_adaQN_free(optimizer, x, step_size)
run_adaQN_free(optimizer, x, step_size)
optimizer |
An 'adaQN_free' optimizer, for which its last request must have been served. Will be updated in-place. |
x |
Current values of the variables being optimized. Must be a numeric vector. Will be updated in-place. |
step_size |
Step size for the quasi-Newton update. |
A request with the next piece of required information. The output will be a list with the following levels:
task Requested task (one of "calc_grad", "calc_fun_val_batch", "calc_grad_big_batch").
requested_on Values of 'x' at which the requested information must be calculated.
info
x_changed_in_run Whether the 'x' vector was updated.
iteration_number Current iteration number (in terms of quasi-Newton updates).
iteration_info Information about potential problems encountered during the iteration.
Run the next step of an oLBFGS optimization procedure, after the last requested calculation has been fed to the optimizer. When run for the first time, there is no request, so the function just needs to be run on the object as it is returned from function 'oLBFGS_free'.
run_oLBFGS_free(optimizer, x, step_size)
run_oLBFGS_free(optimizer, x, step_size)
optimizer |
An 'oLBFGS_free' optimizer, for which its last request must have been served. Will be updated in-place. |
x |
Current values of the variables being optimized. Must be a numeric vector. Will be updated in-place. |
step_size |
Step size for the quasi-Newton update. |
A request with the next piece of required information. The output will be a list with the following levels:
task Requested task (one of "calc_grad" or "calc_grad_same_batch").
requested_on Values of 'x' at which the requested information must be calculated.
info
x_changed_in_run Whether the 'x' vector was updated.
iteration_number Current iteration number (in terms of quasi-Newton updates).
iteration_info Information about potential problems encountered during the iteration.
Run the next step of an SQN optimization procedure, after the last requested calculation has been fed to the optimizer. When run for the first time, there is no request, so the function just needs to be run on the object as it is returned from function 'SQN_free'.
run_SQN_free(optimizer, x, step_size)
run_SQN_free(optimizer, x, step_size)
optimizer |
An 'SQN_free' optimizer, for which its last request must have been served. Will be updated in-place. |
x |
Current values of the variables being optimized. Must be a numeric vector. Will be updated in-place. |
step_size |
Step size for the quasi-Newton update. |
A request with the next piece of required information. The output will be a list with the following levels:
task Requested task (one of "calc_grad", "calc_grad_big_batch", "calc_hess_vec").
requested_on
req_x Values of 'x' at which the requested information (gradient/Hessian) must be calculated.
req_vec Vector by which the Hessian must be multiplied. Will output 'NULL' when this calculation is not needed.
info
x_changed_in_run Whether the 'x' vector was updated.
iteration_number Current iteration number (in terms of quasi-Newton updates).
iteration_info Information about potential problems encountered during the iteration.
Optimizes an empirical (convex) loss function over batches of sample data.
SQN(x0, grad_fun, hess_vec_fun = NULL, pred_fun = NULL, initial_step = 0.001, step_fun = function(iter) 1/sqrt((iter/10) + 1), callback_iter = NULL, args_cb = NULL, verbose = TRUE, mem_size = 10, bfgs_upd_freq = 20, min_curvature = 1e-04, y_reg = NULL, use_grad_diff = FALSE, check_nan = TRUE, nthreads = -1)
SQN(x0, grad_fun, hess_vec_fun = NULL, pred_fun = NULL, initial_step = 0.001, step_fun = function(iter) 1/sqrt((iter/10) + 1), callback_iter = NULL, args_cb = NULL, verbose = TRUE, mem_size = 10, bfgs_upd_freq = 20, min_curvature = 1e-04, y_reg = NULL, use_grad_diff = FALSE, check_nan = TRUE, nthreads = -1)
x0 |
Initial values for the variables to optimize. |
grad_fun |
Function taking as unnamed arguments 'x_curr' (variable values), 'X' (covariates), 'y' (target variable), and 'w' (weights), plus additional arguments ('...'), and producing the expected value of the gradient when evalauted on that data. |
hess_vec_fun |
Function taking as unnamed arguments 'x_curr' (variable values), 'vec' (numeric vector), 'X' (covariates), 'y' (target variable), and 'w' (weights), plus additional arguments ('...'), and producing the expected value of the Hessian (with variable values at 'x_curr') when evalauted on that data, multiplied by the vector 'vec'. Not required when using 'use_grad_diff' = 'TRUE'. |
pred_fun |
Function taking an unnamed argument as data, another unnamed argument as the variable values, and optional extra arguments ('...'). Will be called when using 'predict' on the object returned by this function. |
initial_step |
Initial step size. |
step_fun |
Function accepting the iteration number as an unnamed parameter, which will output the number by which 'initial_step' will be multiplied at each iteration to get the step size for that iteration. |
callback_iter |
Callback function which will be called at the end of each iteration. Will pass three unnamed arguments: the current variable values, the current iteration number, and 'args_cb'. Pass 'NULL' if there is no need to call a callback function. |
args_cb |
Extra argument to pass to the callback function. |
verbose |
Whether to print information about iteration statuses when something goes wrong. |
mem_size |
Number of correction pairs to store for approximation of Hessian-vector products. |
bfgs_upd_freq |
Number of iterations (batches) after which to generate a BFGS correction pair. |
min_curvature |
Minimum value of (s * y) / (s * s) in order to accept a correction pair. Pass 'NULL' for no minimum. |
y_reg |
Regularizer for 'y' vector (gets added y_reg * s). Pass 'NULL' for no regularization. |
use_grad_diff |
Whether to create the correction pairs using differences between gradients instead of Hessian-vector products. These gradients are calculated on a larger batch than the regular ones (given by batch_size * bfgs_upd_freq). |
check_nan |
Whether to check for variables becoming NaN after each iteration, and reverting the step if they do (will also reset BFGS memory). |
nthreads |
Number of parallel threads to use. If set to -1, will determine the number of available threads and use all of them. Note however that not all the computations can be parallelized, and the BLAS backend might use a different number of threads. |
an 'SQN' object with the user-supplied functions, which can be fit to batches of data through function 'partial_fit', and can produce predictions on new data through function 'predict'.
Byrd, R.H., Hansen, S.L., Nocedal, J. and Singer, Y., 2016. "A stochastic quasi-Newton method for large-scale optimization." SIAM Journal on Optimization, 26(2), pp.1008-1031.
Wright, S. and Nocedal, J., 1999. "Numerical optimization." (ch 7) Springer Science, 35(67-68), p.7.
partial_fit , predict.stochQN_guided , SQN_free
### Example logistic regression with randomly-generated data library(stochQN) ### Will sample data y ~ Bernoulli(sigm(Ax)) true_coefs <- c(1.12, 5.34, -6.123) generate_data_batch <- function(true_coefs, n = 100) { X <- matrix(rnorm(length(true_coefs) * n), nrow=n, ncol=length(true_coefs)) y <- 1 / (1 + exp(-as.numeric(X %*% true_coefs))) y <- as.numeric(y >= runif(n)) return(list(X = X, y = y)) } ### Logistic regression likelihood/loss eval_fun <- function(coefs, X, y, weights=NULL, lambda=1e-5) { pred <- 1 / (1 + exp(-as.numeric(X %*% coefs))) logloss <- mean(-(y * log(pred) + (1 - y) * log(1 - pred))) reg <- lambda * as.numeric(coefs %*% coefs) return(logloss + reg) } eval_grad <- function(coefs, X, y, weights=NULL, lambda=1e-5) { pred <- 1 / (1 + exp(-(X %*% coefs))) grad <- colMeans(X * as.numeric(pred - y)) grad <- grad + 2 * lambda * as.numeric(coefs^2) return(as.numeric(grad)) } eval_Hess_vec <- function(coefs, vec, X, y, weights=NULL, lambda=1e-5) { pred <- 1 / (1 + exp(-as.numeric(X %*% coefs))) diag <- pred * (1 - pred) Hp <- (t(X) * diag) %*% (X %*% vec) Hp <- Hp / NROW(X) + 2 * lambda * vec return(as.numeric(Hp)) } pred_fun <- function(X, coefs, ...) { return(1 / (1 + exp(-as.numeric(X %*% coefs)))) } ### Initialize optimizer form arbitrary values x0 <- c(1, 1, 1) optimizer <- SQN(x0, grad_fun=eval_grad, pred_fun=pred_fun, hess_vec_fun=eval_Hess_vec, initial_step=1e-0) val_data <- generate_data_batch(true_coefs, n=100) ### Fit to 250 batches of data, 100 observations each set.seed(1) for (i in 1:250) { new_batch <- generate_data_batch(true_coefs, n=100) partial_fit(optimizer, new_batch$X, new_batch$y, lambda=1e-5) x_curr <- get_curr_x(optimizer) i_curr <- get_iteration_number(optimizer) if ((i_curr %% 10) == 0) { cat(sprintf("Iteration %3d - E[f(x)]: %f - values of x: [%f, %f, %f]\n", i_curr, eval_fun(x_curr, val_data$X, val_data$y, lambda=1e-5), x_curr[1], x_curr[2], x_curr[3])) } } ### Predict for new data new_batch <- generate_data_batch(true_coefs, n=10) yhat <- predict(optimizer, new_batch$X)
### Example logistic regression with randomly-generated data library(stochQN) ### Will sample data y ~ Bernoulli(sigm(Ax)) true_coefs <- c(1.12, 5.34, -6.123) generate_data_batch <- function(true_coefs, n = 100) { X <- matrix(rnorm(length(true_coefs) * n), nrow=n, ncol=length(true_coefs)) y <- 1 / (1 + exp(-as.numeric(X %*% true_coefs))) y <- as.numeric(y >= runif(n)) return(list(X = X, y = y)) } ### Logistic regression likelihood/loss eval_fun <- function(coefs, X, y, weights=NULL, lambda=1e-5) { pred <- 1 / (1 + exp(-as.numeric(X %*% coefs))) logloss <- mean(-(y * log(pred) + (1 - y) * log(1 - pred))) reg <- lambda * as.numeric(coefs %*% coefs) return(logloss + reg) } eval_grad <- function(coefs, X, y, weights=NULL, lambda=1e-5) { pred <- 1 / (1 + exp(-(X %*% coefs))) grad <- colMeans(X * as.numeric(pred - y)) grad <- grad + 2 * lambda * as.numeric(coefs^2) return(as.numeric(grad)) } eval_Hess_vec <- function(coefs, vec, X, y, weights=NULL, lambda=1e-5) { pred <- 1 / (1 + exp(-as.numeric(X %*% coefs))) diag <- pred * (1 - pred) Hp <- (t(X) * diag) %*% (X %*% vec) Hp <- Hp / NROW(X) + 2 * lambda * vec return(as.numeric(Hp)) } pred_fun <- function(X, coefs, ...) { return(1 / (1 + exp(-as.numeric(X %*% coefs)))) } ### Initialize optimizer form arbitrary values x0 <- c(1, 1, 1) optimizer <- SQN(x0, grad_fun=eval_grad, pred_fun=pred_fun, hess_vec_fun=eval_Hess_vec, initial_step=1e-0) val_data <- generate_data_batch(true_coefs, n=100) ### Fit to 250 batches of data, 100 observations each set.seed(1) for (i in 1:250) { new_batch <- generate_data_batch(true_coefs, n=100) partial_fit(optimizer, new_batch$X, new_batch$y, lambda=1e-5) x_curr <- get_curr_x(optimizer) i_curr <- get_iteration_number(optimizer) if ((i_curr %% 10) == 0) { cat(sprintf("Iteration %3d - E[f(x)]: %f - values of x: [%f, %f, %f]\n", i_curr, eval_fun(x_curr, val_data$X, val_data$y, lambda=1e-5), x_curr[1], x_curr[2], x_curr[3])) } } ### Predict for new data new_batch <- generate_data_batch(true_coefs, n=10) yhat <- predict(optimizer, new_batch$X)
Optimizes an empirical (convex) loss function over batches of sample data. Compared to function/class 'SQN', this version lets the user do all the calculations from the outside, only interacting with the object by means of a function that returns a request type and is fed the required calculation through methods 'update_gradient' and 'update_hess_vec'.
Order in which requests are made:
========== loop ===========
* calc_grad
... (repeat calc_grad)
if 'use_grad_diff':
* calc_grad_big_batch
else:
* calc_hess_vec
===========================
After running this function, apply 'run_SQN_free' to it to get the first requested piece of information.
SQN_free(mem_size = 10, bfgs_upd_freq = 20, min_curvature = 1e-04, y_reg = NULL, use_grad_diff = FALSE, check_nan = TRUE, nthreads = -1)
SQN_free(mem_size = 10, bfgs_upd_freq = 20, min_curvature = 1e-04, y_reg = NULL, use_grad_diff = FALSE, check_nan = TRUE, nthreads = -1)
mem_size |
Number of correction pairs to store for approximation of Hessian-vector products. |
bfgs_upd_freq |
Number of iterations (batches) after which to generate a BFGS correction pair. |
min_curvature |
Minimum value of (s * y) / (s * s) in order to accept a correction pair. Pass 'NULL' for no minimum. |
y_reg |
Regularizer for 'y' vector (gets added y_reg * s). Pass 'NULL' for no regularization. |
use_grad_diff |
Whether to create the correction pairs using differences between gradients instead of Hessian-vector products. These gradients are calculated on a larger batch than the regular ones (given by batch_size * bfgs_upd_freq). |
check_nan |
Whether to check for variables becoming NaN after each iteration, and reverting the step if they do (will also reset BFGS memory). |
nthreads |
Number of parallel threads to use. If set to -1, will determine the number of available threads and use all of them. Note however that not all the computations can be parallelized, and the BLAS backend might use a different number of threads. |
An 'SQN_free' object, which can be used through functions 'update_gradient', 'update_hess_vec', and 'run_SQN_free'
Byrd, R.H., Hansen, S.L., Nocedal, J. and Singer, Y., 2016. "A stochastic quasi-Newton method for large-scale optimization." SIAM Journal on Optimization, 26(2), pp.1008-1031.
Wright, S. and Nocedal, J., 1999. "Numerical optimization." (ch 7) Springer Science, 35(67-68), p.7.
update_gradient , update_hess_vec , run_oLBFGS_free
### Example optimizing Rosenbrock 2D function ### Note that this example is not stochastic, as the ### function is not evaluated in expectation based on ### batches of data, but rather it has a given absolute ### form that never varies. ### Warning: this optimizer is meant for convex functions ### (Rosenbrock's is not convex) library(stochQN) fr <- function(x) { ## Rosenbrock Banana function x1 <- x[1] x2 <- x[2] 100 * (x2 - x1 * x1)^2 + (1 - x1)^2 } grr <- function(x) { ## Gradient of 'fr' x1 <- x[1] x2 <- x[2] c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1), 200 * (x2 - x1 * x1)) } Hvr <- function(x, v) { ## Hessian of 'fr' by vector 'v' x1 <- x[1] x2 <- x[2] H <- matrix(c(1200 * x1^2 - 400*x2 + 2, -400 * x1, -400 * x1, 200), nrow = 2) as.vector(H %*% v) } ### Initial values of x x_opt = as.numeric(c(0, 2)) cat(sprintf("Initial values of x: [%.3f, %.3f]\n", x_opt[1], x_opt[2])) ### Will use constant step size throughout ### (not recommended) step_size = 1e-3 ### Initialize the optimizer optimizer = SQN_free() ### Keep track of the iteration number curr_iter <- 0 ### Run a loop for severa, iterations ### (Note that some iterations might require more ### than 1 calculation request) for (i in 1:200) { req <- run_SQN_free(optimizer, x_opt, step_size) if (req$task == "calc_grad") { update_gradient(optimizer, grr(req$requested_on$req_x)) } else if (req$task == "calc_hess_vec") { update_hess_vec(optimizer, Hvr(req$requested_on$req_x, req$requested_on$req_vec)) } ### Track progress every 10 iterations if (req$info$iteration_number > curr_iter) { curr_iter <- req$info$iteration_number } if ((curr_iter %% 10) == 0) { cat(sprintf( "Iteration %3d - Current function value: %.3f\n", req$info$iteration_number, fr(x_opt))) } } cat(sprintf("Current values of x: [%.3f, %.3f]\n", x_opt[1], x_opt[2]))
### Example optimizing Rosenbrock 2D function ### Note that this example is not stochastic, as the ### function is not evaluated in expectation based on ### batches of data, but rather it has a given absolute ### form that never varies. ### Warning: this optimizer is meant for convex functions ### (Rosenbrock's is not convex) library(stochQN) fr <- function(x) { ## Rosenbrock Banana function x1 <- x[1] x2 <- x[2] 100 * (x2 - x1 * x1)^2 + (1 - x1)^2 } grr <- function(x) { ## Gradient of 'fr' x1 <- x[1] x2 <- x[2] c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1), 200 * (x2 - x1 * x1)) } Hvr <- function(x, v) { ## Hessian of 'fr' by vector 'v' x1 <- x[1] x2 <- x[2] H <- matrix(c(1200 * x1^2 - 400*x2 + 2, -400 * x1, -400 * x1, 200), nrow = 2) as.vector(H %*% v) } ### Initial values of x x_opt = as.numeric(c(0, 2)) cat(sprintf("Initial values of x: [%.3f, %.3f]\n", x_opt[1], x_opt[2])) ### Will use constant step size throughout ### (not recommended) step_size = 1e-3 ### Initialize the optimizer optimizer = SQN_free() ### Keep track of the iteration number curr_iter <- 0 ### Run a loop for severa, iterations ### (Note that some iterations might require more ### than 1 calculation request) for (i in 1:200) { req <- run_SQN_free(optimizer, x_opt, step_size) if (req$task == "calc_grad") { update_gradient(optimizer, grr(req$requested_on$req_x)) } else if (req$task == "calc_hess_vec") { update_hess_vec(optimizer, Hvr(req$requested_on$req_x, req$requested_on$req_vec)) } ### Track progress every 10 iterations if (req$info$iteration_number > curr_iter) { curr_iter <- req$info$iteration_number } if ((curr_iter %% 10) == 0) { cat(sprintf( "Iteration %3d - Current function value: %.3f\n", req$info$iteration_number, fr(x_opt))) } } cat(sprintf("Current values of x: [%.3f, %.3f]\n", x_opt[1], x_opt[2]))
Stochastic Logistic Regression
stochastic.logistic.regression(formula = NULL, pos_class = NULL, dim = NULL, intercept = TRUE, x0 = NULL, optimizer = "adaQN", optimizer_args = list(initial_step = 0.1, verbose = FALSE), lambda = 0.001, random_seed = 1, val_data = NULL)
stochastic.logistic.regression(formula = NULL, pos_class = NULL, dim = NULL, intercept = TRUE, x0 = NULL, optimizer = "adaQN", optimizer_args = list(initial_step = 0.1, verbose = FALSE), lambda = 0.001, random_seed = 1, val_data = NULL)
formula |
Formula for the model, if it is fit to data.frames instead of matrices/vectors. |
pos_class |
If fit to data in the form of data.frames, a string indicating which of the classes is the positive one. If fit to data in the form of matrices/vector, pass 'NULL'. |
dim |
Dimensionality of the model (number of features). Ignored when passing 'formula' or when passing 'x0'. If the intercept is added from the option 'intercept' here, it should not be counted towards 'dim'. |
intercept |
Whether to add an intercept to the covariates. Only ussed when fitting to matrices/vectors. Ignored when passing formula (for formulas without intercept, put '-1' in the RHS to get rid of the intercept). |
x0 |
Initial values of the variables. If passed, will ignore 'dim' and 'random_seed'. If not passed, will generate random starting values ~ Norm(0, 0.1). |
optimizer |
The optimizer to use - one of 'adaQN' (recommended), 'SQN', 'oLBFGS'. |
optimizer_args |
Arguments to pass to the optimizer (same ones as the functions of the same name). Must be a list. See the documentation of each optimizer for the parameters they take. |
lambda |
Regularization parameter. Be aware that the functions assume the log-likelihood (a.k.a. loss) is divided by the number of observations, so this number should be small. |
random_seed |
Random seed to use for the initialization of the variables. Ignored when passing 'x0'. |
val_data |
Validation data (only used for 'adaQN'). If passed, must be a list with entries 'X', 'y' (if passing data.frames for fitting), and optionally 'w' (sample weights). |
Binary logistic regression, fit in batches using this package's own optimizers.
An object of class 'stoch_logistic', which can be fit to batches of data through functon 'partial_fit_logistic'.
partial_fit_logistic, coef.stoch_logistic , predict.stoch_logistic , adaQN , SQN, oLBFGS
library(stochQN) ### Load Iris dataset data("iris") ### Example with X + y interface X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]) y <- as.numeric(iris$Species == "setosa") ### Initialize model with default parameters model <- stochastic.logistic.regression(dim = 4) ### Fit to 10 randomly-subsampled batches batch_size <- as.integer(nrow(X) / 3) for (i in 1:10) { set.seed(i) batch <- sample(nrow(X), size = batch_size, replace=TRUE) partial_fit_logistic(model, X, y) } ### Check classification accuracy cat(sprintf( "Accuracy after 10 iterations: %.2f%%\n", 100 * mean( predict(model, X, type = "class") == y) )) ### Example with formula interface iris_df <- iris levels(iris_df$Species) <- c("setosa", "other", "other") ### Initialize model with default parameters model <- stochastic.logistic.regression(Species ~ ., pos_class="setosa") ### Fit to 10 randomly-subsampled batches batch_size <- as.integer(nrow(iris_df) / 3) for (i in 1:10) { set.seed(i) batch <- sample(nrow(iris_df), size=batch_size, replace=TRUE) partial_fit_logistic(model, iris_df) } cat(sprintf( "Accuracy after 10 iterations: %.2f%%\n", 100 * mean( predict( model, iris_df, type = "class") == iris_df$Species ) ))
library(stochQN) ### Load Iris dataset data("iris") ### Example with X + y interface X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]) y <- as.numeric(iris$Species == "setosa") ### Initialize model with default parameters model <- stochastic.logistic.regression(dim = 4) ### Fit to 10 randomly-subsampled batches batch_size <- as.integer(nrow(X) / 3) for (i in 1:10) { set.seed(i) batch <- sample(nrow(X), size = batch_size, replace=TRUE) partial_fit_logistic(model, X, y) } ### Check classification accuracy cat(sprintf( "Accuracy after 10 iterations: %.2f%%\n", 100 * mean( predict(model, X, type = "class") == y) )) ### Example with formula interface iris_df <- iris levels(iris_df$Species) <- c("setosa", "other", "other") ### Initialize model with default parameters model <- stochastic.logistic.regression(Species ~ ., pos_class="setosa") ### Fit to 10 randomly-subsampled batches batch_size <- as.integer(nrow(iris_df) / 3) for (i in 1:10) { set.seed(i) batch <- sample(nrow(iris_df), size=batch_size, replace=TRUE) partial_fit_logistic(model, iris_df) } cat(sprintf( "Accuracy after 10 iterations: %.2f%%\n", 100 * mean( predict( model, iris_df, type = "class") == iris_df$Species ) ))
Same as 'print' function. To check the fitted coefficients use function 'coef'.
## S3 method for class 'stoch_logistic' summary(object, ...)
## S3 method for class 'stoch_logistic' summary(object, ...)
object |
A 'stoch_logistic' object as output by function 'stochastic.logistic.regression'. |
... |
Not used. |
coef.stoch_logistic , stochastic.logistic.regression
Update the (expected) value of the objective function in an 'adaQN_free' object, after it has been requested by the optimizer (do NOT update it otherwise).
update_fun(optimizer, fun)
update_fun(optimizer, fun)
optimizer |
An 'adaQN_free' object which after the last run had requested a new function evaluation. |
fun |
Function as evaluated (in expectation) on the values of 'x' that were returned in the request. |
No return value (object is updated in-place).
Update the (expected) gradient in an optimizer from this package, after it has been requested by the optimizer (do NOT update it otherwise).
update_gradient(optimizer, gradient)
update_gradient(optimizer, gradient)
optimizer |
A free-mode optimizer from this package ('oLBFGS_free', 'SQN_free', 'adaQN_free') which after the last run had requested a new gradient evaluation.. |
gradient |
The (expected value of the) gradient as evaluated on the values of 'x' that were returned in the request. Must be a numeric vector. |
No return value (object is updated in-place).
Update the (expected) values of the Hessian-vector product in an 'SQN_free' object, after it has been requested by the optimizer (do NOT update it otherwise).
update_hess_vec(optimizer, hess_vec)
update_hess_vec(optimizer, hess_vec)
optimizer |
An 'SQN_free' optimizer which after the last run had requested a new Hessian-vector evaluation. |
hess_vec |
The (expected) value of the Hessian evaluated at the values of 'x' that were returned in the request, multiplied by the vector that was returned in the same request. Must be a numeric vector. |
No return value (object is updated in-place).