Package 'stochQN'

Title: Stochastic Limited Memory Quasi-Newton Optimizers
Description: Implementations of stochastic, limited-memory quasi-Newton optimizers, similar in spirit to the LBFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno) algorithm, for smooth stochastic optimization. Implements the following methods: oLBFGS (online LBFGS) (Schraudolph, N.N., Yu, J. and Guenter, S., 2007 <http://proceedings.mlr.press/v2/schraudolph07a.html>), SQN (stochastic quasi-Newton) (Byrd, R.H., Hansen, S.L., Nocedal, J. and Singer, Y., 2016 <arXiv:1401.7020>), adaQN (adaptive quasi-Newton) (Keskar, N.S., Berahas, A.S., 2016, <arXiv:1511.01169>). Provides functions for easily creating R objects with partial_fit/predict methods from some given objective/gradient/predict functions. Includes an example stochastic logistic regression using these optimizers. Provides header files and registered C routines for using it directly from C/C++.
Authors: David Cortes
Maintainer: David Cortes <[email protected]>
License: BSD_2_clause + file LICENSE
Version: 0.1.2-1
Built: 2024-11-16 06:43:10 UTC
Source: CRAN

Help Index


adaQN guided optimizer

Description

Optimizes an empirical (possibly non-convex) loss function over batches of sample data.

Usage

adaQN(x0, grad_fun, obj_fun = NULL, pred_fun = NULL,
  initial_step = 0.01, step_fun = function(iter) 1/sqrt((iter/100) +
  1), callback_iter = NULL, args_cb = NULL, verbose = TRUE,
  mem_size = 10, fisher_size = 100, bfgs_upd_freq = 20,
  max_incr = 1.01, min_curvature = 1e-04, y_reg = NULL,
  scal_reg = 1e-04, rmsprop_weight = 0.9, use_grad_diff = FALSE,
  check_nan = TRUE, nthreads = -1, X_val = NULL, y_val = NULL,
  w_val = NULL)

Arguments

x0

Initial values for the variables to optimize.

grad_fun

Function taking as unnamed arguments 'x_curr' (variable values), 'X' (covariates), 'y' (target variable), and 'w' (weights), plus additional arguments ('...'), and producing the expected value of the gradient when evaluated on that data.

obj_fun

Function taking as unnamed arguments 'x_curr' (variable values), 'X' (covariates), 'y' (target variable), and 'w' (weights), plus additional arguments ('...'), and producing the expected value of the objective function when evaluated on that data. Only required when using 'max_incr'.

pred_fun

Function taking an unnamed argument as data, another unnamed argument as the variable values, and optional extra arguments ('...'). Will be called when using 'predict' on the object returned by this function.

initial_step

Initial step size.

step_fun

Function accepting the iteration number as an unnamed parameter, which will output the number by which 'initial_step' will be multiplied at each iteration to get the step size for that iteration.

callback_iter

Callback function which will be called at the end of each iteration. Will pass three unnamed arguments: the current variable values, the current iteration number, and 'args_cb'. Pass 'NULL' if there is no need to call a callback function.

args_cb

Extra argument to pass to the callback function.

verbose

Whether to print information about iteration statuses when something goes wrong.

mem_size

Number of correction pairs to store for approximation of Hessian-vector products.

fisher_size

Number of gradients to store for calculation of the empirical Fisher product with gradients. If passing 'NULL', will force 'use_grad_diff' to 'TRUE'.

bfgs_upd_freq

Number of iterations (batches) after which to generate a BFGS correction pair.

max_incr

Maximum allowed ratio between the objective function evaluated (on the validation set) at the average 'x' values of the current epoch and at those of the previous epoch. If the ratio exceeds this threshold, the BFGS and Fisher memories will be reset and the 'x' values will be reverted to their previous average. If not using a validation set, the function will be evaluated on a longer batch of stored data (the same batch used for gradients when 'use_grad_diff' = 'TRUE'). Pass 'NULL' for no function-increase checking.

min_curvature

Minimum value of (s * y) / (s * s) in order to accept a correction pair. Pass 'NULL' for no minimum.

y_reg

Regularizer for the 'y' vector (the quantity y_reg * s gets added to it). Pass 'NULL' for no regularization.

scal_reg

Regularization parameter to use in the denominator for AdaGrad and RMSProp scaling.

rmsprop_weight

If not 'NULL', will use the RMSProp formula instead of AdaGrad for the approximate inverse-Hessian initialization.

use_grad_diff

Whether to create the correction pairs using differences between gradients instead of the empirical Fisher matrix. These gradients are calculated on a larger batch than the regular ones (given by batch_size * bfgs_upd_freq).

check_nan

Whether to check for variables becoming NaN after each iteration, and to revert the step if they do (this will also reset the BFGS memory).

nthreads

Number of parallel threads to use. If set to -1, will determine the number of available threads and use all of them. Note however that not all the computations can be parallelized, and the BLAS backend might use a different number of threads.

X_val

Covariates to use as validation set (only used when passing 'max_incr'). If not passed, will use a larger batch of stored data, in the same way as for Hessian-vector products in SQN.

y_val

Target variable for the covariates to use as validation set (only used when passing 'max_incr'). If not passed, will use a larger batch of stored data, in the same way as for Hessian-vector products in SQN.

w_val

Sample weights for the covariates to use as validation set (only used when passing 'max_incr'). If not passed, will use a larger batch of stored data, in the same way as for Hessian-vector products in SQN.

Value

An 'adaQN' object with the user-supplied functions, which can be fit to batches of data through function 'partial_fit', and can produce predictions on new data through function 'predict'.

References

  • Keskar, N.S. and Berahas, A.S., 2016, September. "adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs." In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 1-16). Springer, Cham.

  • Wright, S. and Nocedal, J., 1999. "Numerical optimization." (ch 7) Springer Science, 35(67-68), p.7.

See Also

partial_fit , predict.stochQN_guided , adaQN_free

Examples

### Example regression with randomly-generated data
library(stochQN)

### Will sample data y ~ Ax + epsilon
true_coefs <- c(1.12, 5.34, -6.123)

generate_data_batch <- function(true_coefs, n = 100) {
  X <- matrix(
    rnorm(length(true_coefs) * n),
    nrow=n, ncol=length(true_coefs))
  y <- X %*% true_coefs + rnorm(n)
  return(list(X = X, y = y))
}

### Regular regression function that minimizes RMSE
eval_fun <- function(coefs, X, y, weights=NULL, lambda=1e-5) {
  pred <- as.numeric(X %*% coefs)
  RMSE <- sqrt(mean((pred - y)^2))
  reg  <- lambda * as.numeric(coefs %*% coefs)
  return(RMSE + reg)
}

eval_grad <- function(coefs, X, y, weights=NULL, lambda=1e-5) {
  pred <- X %*% coefs
  grad <- colMeans(X * as.numeric(pred - y))
  grad <- grad + 2 * lambda * coefs
  return(grad)
}

pred_fun <- function(X, coefs, ...) {
  return(as.numeric(X %*% coefs))
}

### Initialize optimizer from arbitrary values
x0 <- c(1, 1, 1)
optimizer <- adaQN(x0, grad_fun=eval_grad,
  pred_fun=pred_fun, obj_fun=eval_fun, initial_step=1e-0)
val_data <- generate_data_batch(true_coefs, n=100)

### Fit to 50 batches of data, 100 observations each
for (i in 1:50) {
  set.seed(i)
  new_batch <- generate_data_batch(true_coefs, n=100)
  partial_fit(
    optimizer,
    new_batch$X, new_batch$y,
    lambda=1e-5)
  x_curr <- get_curr_x(optimizer)
  i_curr <- get_iteration_number(optimizer)
  if ((i_curr %% 10)  == 0) {
    cat(sprintf(
      "Iteration %d - E[f(x)]: %f - values of x: [%f, %f, %f]\n",
      i_curr,
      eval_fun(x_curr, val_data$X, val_data$y, lambda=1e-5),
      x_curr[1], x_curr[2], x_curr[3]))
  }
}

### Predict for new data
new_batch <- generate_data_batch(true_coefs, n=10)
yhat <- predict(optimizer, new_batch$X)

adaQN Free-Mode Optimizer

Description

Optimizes an empirical (perhaps non-convex) loss function over batches of sample data. Compared to function/class 'adaQN', this version lets the user do all the calculations from the outside, only interacting with the object by means of a function that returns a request type and is fed the required calculation through methods 'update_gradient' and 'update_function'.

Order in which requests are made:

========== loop ===========

* calc_grad

    ... (repeat calc_grad)

if max_incr > 0:

    * calc_fun_val_batch

if 'use_grad_diff':

    * calc_grad_big_batch (skipped if below max_incr)

===========================

After running this function, apply 'run_adaQN_free' to it to get the first requested piece of information.

Usage

adaQN_free(mem_size = 10, fisher_size = 100, bfgs_upd_freq = 20,
  max_incr = 1.01, min_curvature = 1e-04, scal_reg = 1e-04,
  rmsprop_weight = 0.9, y_reg = NULL, use_grad_diff = FALSE,
  check_nan = TRUE, nthreads = -1)

Arguments

mem_size

Number of correction pairs to store for approximation of Hessian-vector products.

fisher_size

Number of gradients to store for calculation of the empirical Fisher product with gradients. If passing 'NULL', will force 'use_grad_diff' to 'TRUE'.

bfgs_upd_freq

Number of iterations (batches) after which to generate a BFGS correction pair.

max_incr

Maximum allowed ratio between the objective function evaluated (on the validation set) at the average 'x' values of the current epoch and at those of the previous epoch. If the ratio exceeds this threshold, the BFGS and Fisher memories will be reset and the 'x' values will be reverted to their previous average. Pass 'NULL' for no function-increase checking.

min_curvature

Minimum value of (s * y) / (s * s) in order to accept a correction pair. Pass 'NULL' for no minimum.

scal_reg

Regularization parameter to use in the denominator for AdaGrad and RMSProp scaling.

rmsprop_weight

If not 'NULL', will use the RMSProp formula instead of AdaGrad for the approximate inverse-Hessian initialization.

y_reg

Regularizer for the 'y' vector (the quantity y_reg * s gets added to it). Pass 'NULL' for no regularization.

use_grad_diff

Whether to create the correction pairs using differences between gradients instead of the empirical Fisher matrix. These gradients are calculated on a larger batch than the regular ones (given by batch_size * bfgs_upd_freq). If 'TRUE', the empirical Fisher matrix will not be used.

check_nan

Whether to check for variables becoming NaN after each iteration, and to revert the step if they do (this will also reset the BFGS and Fisher memories).

nthreads

Number of parallel threads to use. If set to -1, will determine the number of available threads and use all of them. Note however that not all the computations can be parallelized, and the BLAS backend might use a different number of threads.

Value

An 'adaQN_free' object, which can be used through functions 'update_gradient', 'update_fun', and 'run_adaQN_free'.

References

  • Keskar, N.S. and Berahas, A.S., 2016, September. "adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs." In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 1-16). Springer, Cham.

  • Wright, S. and Nocedal, J., 1999. "Numerical optimization." (ch 7) Springer Science, 35(67-68), p.7.

See Also

update_gradient , update_fun , run_adaQN_free

Examples

### Example optimizing Rosenbrock 2D function
### Note that this example is not stochastic, as the
### function is not evaluated in expectation based on
### batches of data, but rather it has a given absolute
### form that never varies.
library(stochQN)


fr <- function(x) { ## Rosenbrock Banana function
	x1 <- x[1]
	x2 <- x[2]
	100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
grr <- function(x) { ## Gradient of 'fr'
	x1 <- x[1]
	x2 <- x[2]
	c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
	  200 * (x2 - x1 * x1))
}


### Initial values of x
x_opt <- c(0, 2)
cat(sprintf("Initial values of x: [%.3f, %.3f]\n",
			x_opt[1], x_opt[2]))

### Will use constant step size throughout
### (not recommended)
step_size <- 1e-2

### Initialize the optimizer
optimizer <- adaQN_free()

### Keep track of the iteration number
curr_iter <- 0

### Run a loop for many iterations
### (Note that some iterations might require more
###  than 1 calculation request)
for (i in 1:200) {
	req <- run_adaQN_free(optimizer, x_opt, step_size)
	if (req$task == "calc_grad") {
	  update_gradient(optimizer, grr(req$requested_on))
	} else if (req$task == "calc_fun_val_batch") {
	  update_fun(optimizer, fr(req$requested_on))
	}

	### Track progress every 10 iterations
	if (req$info$iteration_number > curr_iter) {
		curr_iter <- req$info$iteration_number
		if ((curr_iter %% 10) == 0) {
			cat(sprintf(
			  "Iteration %3d - Current function value: %.3f\n",
			  curr_iter, fr(x_opt)))
		}
	}
}
cat(sprintf("Current values of x: [%.3f, %.3f]\n",
			x_opt[1], x_opt[2]))

Retrieve fitted coefficients from stochastic logistic regression object

Description

Retrieve fitted coefficients from stochastic logistic regression object

Usage

## S3 method for class 'stoch_logistic'
coef(object, ...)

Arguments

object

A 'stoch_logistic' object as output by function 'stochastic.logistic.regression'. Must have already been fit to at least 1 batch of data.

...

Not used.

Value

An (n x 1) matrix with the coefficients, in the same format as those from 'glm'.

See Also

stochastic.logistic.regression
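
Examples

A minimal usage sketch follows; the data below is randomly generated for illustration, and the model must have been fit to at least one batch before calling 'coef'.

library(stochQN)

### Illustrative data and a single partial-fit iteration
set.seed(1)
X <- matrix(rnorm(100 * 3), nrow = 100, ncol = 3)
y <- as.numeric(X %*% c(1, -1, 0.5) + rnorm(100) > 0)
model <- stochastic.logistic.regression(dim = 3)
partial_fit_logistic(model, X, y)

### Coefficients in the same format as those from 'glm'
print(coef(model))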


Get current values of the optimization variables

Description

Get current values of the optimization variables

Usage

get_curr_x(optimizer)

Arguments

optimizer

An optimizer (guided-mode) from this package, as output by functions 'oLBFGS', 'SQN', 'adaQN'.

Value

A numeric vector with the current values of the variables being optimized.

See Also

oLBFGS , SQN , adaQN


Get current iteration number from the optimizer object

Description

Get current iteration number from the optimizer object

Usage

get_iteration_number(optimizer)

Arguments

optimizer

An optimizer (guided-mode) from this package, as output by functions 'oLBFGS', 'SQN', 'adaQN'.

Value

The current iteration number.

See Also

oLBFGS , SQN , adaQN
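
Examples

A short sketch of checking the iteration number and the current variable values (see also 'get_curr_x') between calls to 'partial_fit'; the deterministic quadratic gradient below is an illustrative stand-in for a user-supplied stochastic gradient.

library(stochQN)

### Gradient of the objective sum((x - 1)^2); the 'X', 'y', 'w'
### batch arguments are accepted but unused in this toy setting
grad_fun <- function(x_curr, X, y, w, ...) 2 * (x_curr - 1)
optimizer <- oLBFGS(c(0, 0), grad_fun = grad_fun, verbose = FALSE)

for (i in 1:5) {
  partial_fit(optimizer, X = matrix(0), y = NULL)
  cat(sprintf("Iteration %d - x: [%f, %f]\n",
    get_iteration_number(optimizer),
    get_curr_x(optimizer)[1], get_curr_x(optimizer)[2]))
}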


oLBFGS guided optimizer

Description

Optimizes an empirical (convex) loss function over batches of sample data.

Usage

oLBFGS(x0, grad_fun, pred_fun = NULL, initial_step = 0.01,
  step_fun = function(iter) 1/sqrt((iter/10) + 1),
  callback_iter = NULL, args_cb = NULL, verbose = TRUE,
  mem_size = 10, hess_init = NULL, min_curvature = 1e-04,
  y_reg = NULL, check_nan = TRUE, nthreads = -1)

Arguments

x0

Initial values for the variables to optimize.

grad_fun

Function taking as unnamed arguments 'x_curr' (variable values), 'X' (covariates), 'y' (target variable), and 'w' (weights), plus additional arguments ('...'), and producing the expected value of the gradient when evaluated on that data.

pred_fun

Function taking an unnamed argument as data, another unnamed argument as the variable values, and optional extra arguments ('...'). Will be called when using 'predict' on the object returned by this function.

initial_step

Initial step size.

step_fun

Function accepting the iteration number as an unnamed parameter, which will output the number by which 'initial_step' will be multiplied at each iteration to get the step size for that iteration.

callback_iter

Callback function which will be called at the end of each iteration. Will pass three unnamed arguments: the current variable values, the current iteration number, and 'args_cb'. Pass 'NULL' if there is no need to call a callback function.

args_cb

Extra argument to pass to the callback function.

verbose

Whether to print information about iteration statuses when something goes wrong.

mem_size

Number of correction pairs to store for approximation of Hessian-vector products.

hess_init

Value to which to initialize the diagonal of H0. If passing 'NULL', will use the same initialization as for SQN ((s_last * y_last) / (y_last * y_last)).

min_curvature

Minimum value of (s * y) / (s * s) in order to accept a correction pair. Pass 'NULL' for no minimum.

y_reg

Regularizer for 'y' vector (gets added y_reg * s). Pass 'NULL' for no regularization.

check_nan

Whether to check for variables becoming NaN after each iteration, and to revert the step if they do (this will also reset the BFGS memory).

nthreads

Number of parallel threads to use. If set to -1, will determine the number of available threads and use all of them. Note however that not all the computations can be parallelized, and the BLAS backend might use a different number of threads.

Value

An 'oLBFGS' object with the user-supplied functions, which can be fit to batches of data through function 'partial_fit', and can produce predictions on new data through function 'predict'.

References

  • Schraudolph, N.N., Yu, J. and Guenter, S., 2007, March. "A stochastic quasi-Newton method for online convex optimization." In Artificial Intelligence and Statistics (pp. 436-443).

  • Wright, S. and Nocedal, J., 1999. "Numerical optimization." (ch 7) Springer Science, 35(67-68), p.7.

See Also

partial_fit , predict.stochQN_guided , oLBFGS_free

Examples

### Example regression with randomly-generated data
library(stochQN)

### Will sample data y ~ Ax + epsilon
true_coefs <- c(1.12, 5.34, -6.123)

generate_data_batch <- function(true_coefs, n = 100) {
  X <- matrix(
    rnorm(length(true_coefs) * n),
    nrow=n, ncol=length(true_coefs))
  y <- X %*% true_coefs + rnorm(n)
  return(list(X = X, y = y))
}

### Regular regression function that minimizes RMSE
eval_fun <- function(coefs, X, y, weights=NULL, lambda=1e-5) {
  pred <- as.numeric(X %*% coefs)
  RMSE <- sqrt(mean((pred - y)^2))
  reg  <- lambda * as.numeric(coefs %*% coefs)
  return(RMSE + reg)
}

eval_grad <- function(coefs, X, y, weights=NULL, lambda=1e-5) {
  pred <- X %*% coefs
  grad <- colMeans(X * as.numeric(pred - y))
  grad <- grad + 2 * lambda * coefs
  return(grad)
}

pred_fun <- function(X, coefs, ...) {
  return(as.numeric(X %*% coefs))
}

### Initialize optimizer from arbitrary values
x0 <- c(1, 1, 1)
optimizer <- oLBFGS(x0, grad_fun=eval_grad,
  pred_fun=pred_fun, initial_step=1e-1)
val_data <- generate_data_batch(true_coefs, n=100)

### Fit to 50 batches of data, 100 observations each
set.seed(1)
for (i in 1:50) {
  new_batch <- generate_data_batch(true_coefs, n=100)
  partial_fit(
    optimizer,
    new_batch$X, new_batch$y,
    lambda=1e-5)
  x_curr <- get_curr_x(optimizer)
  i_curr <- get_iteration_number(optimizer)
  if ((i_curr %% 10)  == 0) {
    cat(sprintf(
      "Iteration %d - E[f(x)]: %f - values of x: [%f, %f, %f]\n",
      i_curr,
      eval_fun(x_curr, val_data$X, val_data$y, lambda=1e-5),
      x_curr[1], x_curr[2], x_curr[3]))
  }
}

### Predict for new data
new_batch <- generate_data_batch(true_coefs, n=10)
yhat <- predict(optimizer, new_batch$X)

oLBFGS Free-Mode Optimizer

Description

Optimizes an empirical (convex) loss function over batches of sample data. Compared to function/class 'oLBFGS', this version lets the user do all the calculations from the outside, only interacting with the object by means of a function that returns a request type and is fed the required calculation through a method 'update_gradient'.

Order in which requests are made:

========== loop ===========

* calc_grad

* calc_grad_same_batch (might skip if using check_nan)

===========================

After running this function, apply 'run_oLBFGS_free' to it to get the first requested piece of information.

Usage

oLBFGS_free(mem_size = 10, hess_init = NULL, min_curvature = 1e-04,
  y_reg = NULL, check_nan = TRUE, nthreads = -1)

Arguments

mem_size

Number of correction pairs to store for approximation of Hessian-vector products.

hess_init

Value to which to initialize the diagonal of H0. If passing 'NULL', will use the same initialization as for SQN ((s_last * y_last) / (y_last * y_last)).

min_curvature

Minimum value of (s * y) / (s * s) in order to accept a correction pair. Pass 'NULL' for no minimum.

y_reg

Regularizer for the 'y' vector (the quantity y_reg * s gets added to it). Pass 'NULL' for no regularization.

check_nan

Whether to check for variables becoming NaN after each iteration, and to revert the step if they do (this will also reset the BFGS memory).

nthreads

Number of parallel threads to use. If set to -1, will determine the number of available threads and use all of them. Note however that not all the computations can be parallelized, and the BLAS backend might use a different number of threads.

Value

An 'oLBFGS_free' object, which can be used through functions 'update_gradient' and 'run_oLBFGS_free'.

References

  • Schraudolph, N.N., Yu, J. and Guenter, S., 2007, March. "A stochastic quasi-Newton method for online convex optimization." In Artificial Intelligence and Statistics (pp. 436-443).

  • Wright, S. and Nocedal, J., 1999. "Numerical optimization." (ch 7) Springer Science, 35(67-68), p.7.

See Also

update_gradient , run_oLBFGS_free

Examples

### Example optimizing Rosenbrock 2D function
### Note that this example is not stochastic, as the
### function is not evaluated in expectation based on
### batches of data, but rather it has a given absolute
### form that never varies.
### Warning: this optimizer is meant for convex functions
### (Rosenbrock's is not convex)
library(stochQN)


fr <- function(x) { ## Rosenbrock Banana function
	x1 <- x[1]
	x2 <- x[2]
	100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
grr <- function(x) { ## Gradient of 'fr'
	x1 <- x[1]
	x2 <- x[2]
	c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
	  200 * (x2 - x1 * x1))
}

### Initial values of x
x_opt <- c(0, 2)
cat(sprintf("Initial values of x: [%.3f, %.3f]\n",
	x_opt[1], x_opt[2]))
	
### Will use a constant step size throughout
### (not recommended)
step_size <- 1e-1

### Initialize the optimizer
optimizer <- oLBFGS_free()

### Keep track of the iteration number
curr_iter <- 0

### Run a loop for 100 iterations
### (Note that each iteration requires 2 calculations,
###  hence the 200)
for (i in 1:200) {
	req <- run_oLBFGS_free(optimizer, x_opt, step_size)
	if (req$task == "calc_grad") {
	  update_gradient(optimizer, grr(req$requested_on))
	} else if (req$task == "calc_grad_same_batch") {
	  update_gradient(optimizer, grr(req$requested_on))
	}
	
	### Track progress every 10 iterations
	if (req$info$iteration_number > curr_iter) {
	  curr_iter <- req$info$iteration_number
	  if ((curr_iter %% 10) == 0) {
	    cat(sprintf(
	      "Iteration %3d - Current function value: %.3f\n",
	      curr_iter, fr(x_opt)))
	  }
	}
}
cat(sprintf("Current values of x: [%.3f, %.3f]\n",
	x_opt[1], x_opt[2]))

Partial fit stochastic model to new data

Description

Runs one iteration of the stochastic optimizer on the new data passed here.

Usage

partial_fit(optimizer, X, y = NULL, weights = NULL, ...)

Arguments

optimizer

A stochastic optimizer from this package as output by functions 'oLBFGS', 'SQN', 'adaQN'. Will be modified in-place.

X

Covariates to pass to the user-defined gradient / objective / Hessian-vector functions.

y

Target variable to pass to the user-defined gradient / objective / Hessian-vector functions.

weights

Sample weights to pass to the user-defined gradient / objective / Hessian-vector functions.

...

Additional arguments to pass to the user-defined gradient / objective / Hessian-vector functions.

Value

No return value (object is modified in-place).

See Also

oLBFGS , SQN , adaQN
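
Examples

A minimal sketch of the batch-fitting loop; the linear-model gradient and the simulated batches below are illustrative.

library(stochQN)

### Mean-squared-error gradient for a linear model
grad_fun <- function(x_curr, X, y, w, ...) {
  as.numeric(colMeans(X * as.numeric(X %*% x_curr - y)))
}
optimizer <- oLBFGS(c(0, 0), grad_fun = grad_fun, verbose = FALSE)

### One quasi-Newton update per batch
set.seed(1)
for (i in 1:20) {
  X_batch <- matrix(rnorm(50 * 2), nrow = 50)
  y_batch <- as.numeric(X_batch %*% c(2, -3) + rnorm(50))
  partial_fit(optimizer, X_batch, y_batch)
}
print(get_curr_x(optimizer))  ### should approach c(2, -3)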


Update stochastic logistic regression model with new batch of data

Description

Perform a quasi-Newton iteration to update the model with new data.

Usage

partial_fit_logistic(logistic_model, X, y = NULL, w = NULL)

Arguments

logistic_model

A 'stoch_logistic' object as output by function 'stochastic.logistic.regression'. Will be modified in-place.

X

Data with covariates. If passing a 'data.frame', the model object must have been initialized with a formula, and 'X' must also contain the target variable ('y'). If passing a matrix, must also pass 'y'. Note that whatever factor levels are present in the first batch of data, will be taken as the whole factor levels.

y

The target variable, when using matrices. Ignored when using formula.

w

Sample weights (optional). If required, must pass them at every partial fit iteration.

Value

No return value. Model object is updated in-place.

See Also

stochastic.logistic.regression
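
Examples

A brief sketch with the matrix interface; the batches are simulated, and the first batch fixes the model's dimensionality.

library(stochQN)

set.seed(1)
model <- stochastic.logistic.regression(dim = 2)
for (i in 1:5) {
  X_batch <- matrix(rnorm(50 * 2), nrow = 50)
  y_batch <- as.numeric(X_batch %*% c(1, -1) + rnorm(50) > 0)
  partial_fit_logistic(model, X_batch, y_batch)
}
print(coef(model))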


Prediction function for stochastic logistic regression

Description

Makes predictions for new data from the fitted model. The model must already have been fit to at least one batch of data.

Usage

## S3 method for class 'stoch_logistic'
predict(object, newdata, type = "prob", ...)

Arguments

object

A 'stoch_logistic' object as output by function 'stochastic.logistic.regression'.

newdata

New data on which to make predictions.

type

Type of prediction to make. Can pass 'prob' to get probabilities for the positive class, or 'class' to get the predicted class.

...

Not used.

Value

A vector with the predicted classes or probabilities for 'newdata'.

See Also

stochastic.logistic.regression
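
Examples

A minimal sketch contrasting the two prediction types on simulated data.

library(stochQN)

set.seed(1)
X <- matrix(rnorm(100 * 2), nrow = 100)
y <- as.numeric(X %*% c(1, -1) + rnorm(100) > 0)
model <- stochastic.logistic.regression(dim = 2)
partial_fit_logistic(model, X, y)

### Probabilities for the positive class vs. hard class labels
probs   <- predict(model, X, type = "prob")
classes <- predict(model, X, type = "class")
print(head(cbind(prob = probs, class = classes)))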


Predict function for stochastic optimizer object

Description

Calls the user-defined predict function for an object optimized through this package's functions.

Usage

## S3 method for class 'stochQN_guided'
predict(object, newdata, ...)

Arguments

object

Optimizer from this package as output by functions 'oLBFGS', 'SQN', 'adaQN'. Must have been constructed with a predict function.

newdata

Data on which to make predictions (will be passed to the user-provided function).

...

Additional arguments to pass to the user-provided predict function.

See Also

oLBFGS , SQN , adaQN
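
Examples

A short sketch; the optimizer must have been constructed with a 'pred_fun', here a simple linear predictor over simulated data.

library(stochQN)

grad_fun <- function(x_curr, X, y, w, ...) {
  as.numeric(colMeans(X * as.numeric(X %*% x_curr - y)))
}
pred_fun <- function(X, coefs, ...) as.numeric(X %*% coefs)

optimizer <- oLBFGS(c(0, 0), grad_fun = grad_fun,
  pred_fun = pred_fun, verbose = FALSE)
set.seed(1)
X <- matrix(rnorm(50 * 2), nrow = 50)
y <- as.numeric(X %*% c(2, -3) + rnorm(50))
partial_fit(optimizer, X, y)

### Calls the user-supplied 'pred_fun' on the new data
yhat <- predict(optimizer, X)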


Print summary info about adaQN guided-mode object

Description

Print summary info about adaQN guided-mode object

Usage

## S3 method for class 'adaQN'
print(x, ...)

Arguments

x

An 'adaQN' object as output by function of the same name.

...

Not used.


Print summary info about adaQN free-mode object

Description

Print summary info about adaQN free-mode object

Usage

## S3 method for class 'adaQN_free'
print(x, ...)

Arguments

x

An 'adaQN_free' object as output by function of the same name.

...

Not used.


Print summary info about oLBFGS guided-mode object

Description

Print summary info about oLBFGS guided-mode object

Usage

## S3 method for class 'oLBFGS'
print(x, ...)

Arguments

x

An 'oLBFGS' object as output by function of the same name.

...

Not used.


Print summary info about oLBFGS free-mode object

Description

Print summary info about oLBFGS free-mode object

Usage

## S3 method for class 'oLBFGS_free'
print(x, ...)

Arguments

x

An 'oLBFGS_free' object as output by function of the same name.

...

Not used.


Print summary info about SQN guided-mode object

Description

Print summary info about SQN guided-mode object

Usage

## S3 method for class 'SQN'
print(x, ...)

Arguments

x

An 'SQN' object as output by function of the same name.

...

Not used.


Print summary info about SQN free-mode object

Description

Print summary info about SQN free-mode object

Usage

## S3 method for class 'SQN_free'
print(x, ...)

Arguments

x

An 'SQN_free' object as output by function of the same name.

...

Not used.


Print general info about stochastic logistic regression object

Description

Print general info about stochastic logistic regression object

Usage

## S3 method for class 'stoch_logistic'
print(x, ...)

Arguments

x

A 'stoch_logistic' object as output by function 'stochastic.logistic.regression'.

...

Not used.

See Also

stochastic.logistic.regression


Run adaQN optimizer in free-mode

Description

Run the next step of an adaQN optimization procedure, after the last requested calculation has been fed to the optimizer. When run for the first time, there is no request, so the function just needs to be run on the object as it is returned from function 'adaQN_free'.

Usage

run_adaQN_free(optimizer, x, step_size)

Arguments

optimizer

An 'adaQN_free' optimizer, for which its last request must have been served. Will be updated in-place.

x

Current values of the variables being optimized. Must be a numeric vector. Will be updated in-place.

step_size

Step size for the quasi-Newton update.

Value

A request with the next piece of required information. The output will be a list with the following entries:

  • task Requested task (one of "calc_grad", "calc_fun_val_batch", "calc_grad_big_batch").

  • requested_on Values of 'x' at which the requested information must be calculated.

  • info

    • x_changed_in_run Whether the 'x' vector was updated.

    • iteration_number Current iteration number (in terms of quasi-Newton updates).

    • iteration_info Information about potential problems encountered during the iteration.

See Also

adaQN_free
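
Examples

A skeleton of the request-serving loop; the deterministic quadratic objective and gradient below are illustrative stand-ins for the user's own (possibly stochastic) calculations.

library(stochQN)

obj_fun  <- function(x) sum((x - 1)^2)
grad_fun <- function(x) 2 * (x - 1)

x <- c(0, 0)
optimizer <- adaQN_free()
for (i in 1:50) {
  req <- run_adaQN_free(optimizer, x, step_size = 0.1)
  ### Serve whichever calculation was requested, then loop again
  if (req$task == "calc_grad") {
    update_gradient(optimizer, grad_fun(req$requested_on))
  } else if (req$task == "calc_fun_val_batch") {
    update_fun(optimizer, obj_fun(req$requested_on))
  } else if (req$task == "calc_grad_big_batch") {
    update_gradient(optimizer, grad_fun(req$requested_on))
  }
}
print(x)  ### 'x' is updated in-place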


Run oLBFGS optimizer in free-mode

Description

Run the next step of an oLBFGS optimization procedure, after the last requested calculation has been fed to the optimizer. When run for the first time, there is no request, so the function just needs to be run on the object as it is returned from function 'oLBFGS_free'.

Usage

run_oLBFGS_free(optimizer, x, step_size)

Arguments

optimizer

An 'oLBFGS_free' optimizer, for which its last request must have been served. Will be updated in-place.

x

Current values of the variables being optimized. Must be a numeric vector. Will be updated in-place.

step_size

Step size for the quasi-Newton update.

Value

A request with the next piece of required information. The output will be a list with the following entries:

  • task Requested task (one of "calc_grad" or "calc_grad_same_batch").

  • requested_on Values of 'x' at which the requested information must be calculated.

  • info

    • x_changed_in_run Whether the 'x' vector was updated.

    • iteration_number Current iteration number (in terms of quasi-Newton updates).

    • iteration_info Information about potential problems encountered during the iteration.

See Also

oLBFGS_free


Run SQN optimizer in free-mode

Description

Run the next step of an SQN optimization procedure, after the last requested calculation has been fed to the optimizer. When run for the first time, there is no request, so the function just needs to be run on the object as it is returned from function 'SQN_free'.

Usage

run_SQN_free(optimizer, x, step_size)

Arguments

optimizer

An 'SQN_free' optimizer, for which its last request must have been served. Will be updated in-place.

x

Current values of the variables being optimized. Must be a numeric vector. Will be updated in-place.

step_size

Step size for the quasi-Newton update.

Value

A request with the next piece of required information. The output will be a list with the following entries:

  • task Requested task (one of "calc_grad", "calc_grad_big_batch", "calc_hess_vec").

  • requested_on

    • req_x Values of 'x' at which the requested information (gradient/Hessian) must be calculated.

    • req_vec Vector by which the Hessian must be multiplied. Will output 'NULL' when this calculation is not needed.

  • info

    • x_changed_in_run Whether the 'x' vector was updated.

    • iteration_number Current iteration number (in terms of quasi-Newton updates).

    • iteration_info Information about potential problems encountered during the iteration.

See Also

SQN_free
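
Examples

A skeleton of the SQN request-serving loop, showing the nested 'requested_on' structure ('req_x', 'req_vec'); the quadratic objective and its exact Hessian-vector product below are illustrative.

library(stochQN)

grad_fun     <- function(x) 2 * (x - 1)  ### gradient of sum((x - 1)^2)
hess_vec_fun <- function(x, v) 2 * v     ### Hessian is 2 * identity

x <- c(0, 0)
optimizer <- SQN_free()
for (i in 1:100) {
  req <- run_SQN_free(optimizer, x, step_size = 0.05)
  if (req$task %in% c("calc_grad", "calc_grad_big_batch")) {
    update_gradient(optimizer, grad_fun(req$requested_on$req_x))
  } else if (req$task == "calc_hess_vec") {
    update_hess_vec(optimizer,
      hess_vec_fun(req$requested_on$req_x, req$requested_on$req_vec))
  }
}
print(x)  ### 'x' is updated in-place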


SQN guided optimizer

Description

Optimizes an empirical (convex) loss function over batches of sample data.

Usage

SQN(x0, grad_fun, hess_vec_fun = NULL, pred_fun = NULL,
  initial_step = 0.001, step_fun = function(iter) 1/sqrt((iter/10) +
  1), callback_iter = NULL, args_cb = NULL, verbose = TRUE,
  mem_size = 10, bfgs_upd_freq = 20, min_curvature = 1e-04,
  y_reg = NULL, use_grad_diff = FALSE, check_nan = TRUE,
  nthreads = -1)

Arguments

x0

Initial values for the variables to optimize.

grad_fun

Function taking as unnamed arguments 'x_curr' (variable values), 'X' (covariates), 'y' (target variable), and 'w' (weights), plus additional arguments ('...'), and producing the expected value of the gradient when evaluated on that data.

hess_vec_fun

Function taking as unnamed arguments 'x_curr' (variable values), 'vec' (numeric vector), 'X' (covariates), 'y' (target variable), and 'w' (weights), plus additional arguments ('...'), and producing the expected value of the Hessian (with variable values at 'x_curr') when evaluated on that data, multiplied by the vector 'vec'. Not required when using 'use_grad_diff' = 'TRUE'.

pred_fun

Function taking an unnamed argument as data, another unnamed argument as the variable values, and optional extra arguments ('...'). Will be called when using 'predict' on the object returned by this function.

initial_step

Initial step size.

step_fun

Function accepting the iteration number as an unnamed parameter, which will output the number by which 'initial_step' will be multiplied at each iteration to get the step size for that iteration.

callback_iter

Callback function which will be called at the end of each iteration. Will pass three unnamed arguments: the current variable values, the current iteration number, and 'args_cb'. Pass 'NULL' if there is no need to call a callback function.

args_cb

Extra argument to pass to the callback function.

verbose

Whether to print information about iteration statuses when something goes wrong.

mem_size

Number of correction pairs to store for approximation of Hessian-vector products.

bfgs_upd_freq

Number of iterations (batches) after which to generate a BFGS correction pair.

min_curvature

Minimum value of (s * y) / (s * s) in order to accept a correction pair. Pass 'NULL' for no minimum.

y_reg

Regularizer for 'y' vector (gets added y_reg * s). Pass 'NULL' for no regularization.

use_grad_diff

Whether to create the correction pairs using differences between gradients instead of Hessian-vector products. These gradients are calculated on a larger batch than the regular ones (given by batch_size * bfgs_upd_freq).

check_nan

Whether to check for variables becoming NaN after each iteration, and to revert the step if they do (this will also reset the BFGS memory).

nthreads

Number of parallel threads to use. If set to -1, will determine the number of available threads and use all of them. Note however that not all the computations can be parallelized, and the BLAS backend might use a different number of threads.

Value

An 'SQN' object with the user-supplied functions, which can be fit to batches of data through function 'partial_fit', and can produce predictions on new data through function 'predict'.

References

  • Byrd, R.H., Hansen, S.L., Nocedal, J. and Singer, Y., 2016. "A stochastic quasi-Newton method for large-scale optimization." SIAM Journal on Optimization, 26(2), pp.1008-1031.

  • Wright, S. and Nocedal, J., 1999. "Numerical optimization." (ch 7) Springer Science, 35(67-68), p.7.

See Also

partial_fit , predict.stochQN_guided , SQN_free

Examples

### Example logistic regression with randomly-generated data
library(stochQN)

### Will sample data y ~ Bernoulli(sigm(Ax))
true_coefs <- c(1.12, 5.34, -6.123)

generate_data_batch <- function(true_coefs, n = 100) {
  X <- matrix(rnorm(length(true_coefs) * n), nrow=n, ncol=length(true_coefs))
  y <- 1 / (1 + exp(-as.numeric(X %*% true_coefs)))
  y <- as.numeric(y >= runif(n))
  return(list(X = X, y = y))
}

### Logistic regression likelihood/loss
eval_fun <- function(coefs, X, y, weights=NULL, lambda=1e-5) {
  pred    <- 1 / (1 + exp(-as.numeric(X %*% coefs)))
  logloss <- mean(-(y * log(pred) + (1 - y) * log(1 - pred)))
  reg     <- lambda * as.numeric(coefs %*% coefs)
  return(logloss + reg)
}

eval_grad <- function(coefs, X, y, weights=NULL, lambda=1e-5) {
  pred <- 1 / (1 + exp(-(X %*% coefs)))
  grad <- colMeans(X * as.numeric(pred - y))
  grad <- grad + 2 * lambda * coefs
  return(as.numeric(grad))
}

eval_Hess_vec <- function(coefs, vec, X, y, weights=NULL, lambda=1e-5) {
  pred <- 1 / (1 + exp(-as.numeric(X %*% coefs)))
  diag <- pred * (1 - pred)
  Hp   <- t(X * diag) %*% (X %*% vec)
  Hp   <- Hp / NROW(X) + 2 * lambda * vec
  return(as.numeric(Hp))
}

pred_fun <- function(X, coefs, ...) {
  return(1 / (1 + exp(-as.numeric(X %*% coefs))))
}


### Initialize optimizer from arbitrary values
x0 <- c(1, 1, 1)
optimizer <- SQN(x0, grad_fun=eval_grad, pred_fun=pred_fun,
  hess_vec_fun=eval_Hess_vec, initial_step=1e-0)
val_data <- generate_data_batch(true_coefs, n=100)

### Fit to 250 batches of data, 100 observations each
set.seed(1)
for (i in 1:250) {
  new_batch <- generate_data_batch(true_coefs, n=100)
  partial_fit(optimizer, new_batch$X, new_batch$y, lambda=1e-5)
  x_curr <- get_curr_x(optimizer)
  i_curr <- get_iteration_number(optimizer)
  if ((i_curr %% 10)  == 0) {
    cat(sprintf("Iteration %3d - E[f(x)]: %f - values of x: [%f, %f, %f]\n",
      i_curr, eval_fun(x_curr, val_data$X, val_data$y, lambda=1e-5),
      x_curr[1], x_curr[2], x_curr[3]))
  }
}

### Predict for new data
new_batch <- generate_data_batch(true_coefs, n=10)
yhat <- predict(optimizer, new_batch$X)

SQN Free-Mode Optimizer

Description

Optimizes an empirical (convex) loss function over batches of sample data. Compared to function/class 'SQN', this version lets the user do all the calculations from the outside, only interacting with the object by means of a function that returns a request type and is fed the required calculation through methods 'update_gradient' and 'update_hess_vec'.

Order in which requests are made:

========== loop ===========

* calc_grad

    ... (repeat calc_grad)

if 'use_grad_diff':

    * calc_grad_big_batch

else:

    * calc_hess_vec

===========================

After running this function, apply 'run_SQN_free' to it to get the first requested piece of information.

Usage

SQN_free(mem_size = 10, bfgs_upd_freq = 20, min_curvature = 1e-04,
  y_reg = NULL, use_grad_diff = FALSE, check_nan = TRUE,
  nthreads = -1)

Arguments

mem_size

Number of correction pairs to store for approximation of Hessian-vector products.

bfgs_upd_freq

Number of iterations (batches) after which to generate a BFGS correction pair.

min_curvature

Minimum value of (s * y) / (s * s) in order to accept a correction pair. Pass 'NULL' for no minimum.

y_reg

Regularizer for 'y' vector (gets added y_reg * s). Pass 'NULL' for no regularization.

use_grad_diff

Whether to create the correction pairs using differences between gradients instead of Hessian-vector products. These gradients are calculated on a larger batch than the regular ones (given by batch_size * bfgs_upd_freq).

check_nan

Whether to check for variables becoming NaN after each iteration, and to revert the step if they do (this will also reset the BFGS memory).

nthreads

Number of parallel threads to use. If set to -1, will determine the number of available threads and use all of them. Note however that not all the computations can be parallelized, and the BLAS backend might use a different number of threads.

Value

An 'SQN_free' object, which can be used through functions 'update_gradient', 'update_hess_vec', and 'run_SQN_free'.

References

  • Byrd, R.H., Hansen, S.L., Nocedal, J. and Singer, Y., 2016. "A stochastic quasi-Newton method for large-scale optimization." SIAM Journal on Optimization, 26(2), pp.1008-1031.

  • Wright, S. and Nocedal, J., 1999. "Numerical optimization." (ch 7) Springer Science, 35(67-68), p.7.

See Also

update_gradient , update_hess_vec , run_SQN_free

Examples

### Example optimizing Rosenbrock 2D function
### Note that this example is not stochastic, as the
### function is not evaluated in expectation based on
### batches of data, but rather it has a given absolute
### form that never varies.
### Warning: this optimizer is meant for convex functions
### (Rosenbrock's is not convex)
library(stochQN)


fr <- function(x) { ## Rosenbrock Banana function
	x1 <- x[1]
	x2 <- x[2]
	100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
grr <- function(x) { ## Gradient of 'fr'
	x1 <- x[1]
	x2 <- x[2]
	c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
	  200 * (x2 - x1 * x1))
}
Hvr <- function(x, v) { ## Hessian of 'fr' by vector 'v'
	x1 <- x[1]
	x2 <- x[2]
	H <- matrix(c(1200 * x1^2 - 400*x2 + 2,
			  -400 * x1, -400 * x1, 200),
			nrow = 2)
	as.vector(H %*% v)
}

### Initial values of x
x_opt <- c(0, 2)
cat(sprintf("Initial values of x: [%.3f, %.3f]\n",
			x_opt[1], x_opt[2]))

### Will use constant step size throughout
### (not recommended)
step_size <- 1e-3

### Initialize the optimizer
optimizer <- SQN_free()

### Keep track of the iteration number
curr_iter <- 0

### Run a loop for several iterations
### (Note that some iterations might require more
###  than 1 calculation request)
for (i in 1:200) {
  req <- run_SQN_free(optimizer, x_opt, step_size)
  if (req$task == "calc_grad") {
    update_gradient(optimizer, grr(req$requested_on$req_x))
  } else if (req$task == "calc_hess_vec") {
    update_hess_vec(optimizer,
      Hvr(req$requested_on$req_x, req$requested_on$req_vec))
  }

  ### Track progress every 10 iterations
  if (req$info$iteration_number > curr_iter) {
    curr_iter <- req$info$iteration_number
    if ((curr_iter %% 10) == 0) {
      cat(sprintf(
        "Iteration %3d - Current function value: %.3f\n",
        curr_iter, fr(x_opt)))
    }
  }
}
cat(sprintf("Current values of x: [%.3f, %.3f]\n",
			x_opt[1], x_opt[2]))

Stochastic Logistic Regression

Description

Stochastic Logistic Regression

Usage

stochastic.logistic.regression(formula = NULL, pos_class = NULL,
  dim = NULL, intercept = TRUE, x0 = NULL, optimizer = "adaQN",
  optimizer_args = list(initial_step = 0.1, verbose = FALSE),
  lambda = 0.001, random_seed = 1, val_data = NULL)

Arguments

formula

Formula for the model, if it is fit to data.frames instead of matrices/vectors.

pos_class

If fit to data in the form of data.frames, a string indicating which of the classes is the positive one. If fit to data in the form of matrices/vectors, pass 'NULL'.

dim

Dimensionality of the model (number of features). Ignored when passing 'formula' or when passing 'x0'. If the intercept is added through the 'intercept' option here, it should not be counted towards 'dim'.

intercept

Whether to add an intercept to the covariates. Only used when fitting to matrices/vectors. Ignored when passing a formula (for formulas without intercept, put '-1' in the RHS to get rid of the intercept).

x0

Initial values of the variables. If passed, will ignore 'dim' and 'random_seed'. If not passed, will generate random starting values ~ Norm(0, 0.1).

optimizer

The optimizer to use - one of 'adaQN' (recommended), 'SQN', 'oLBFGS'.

optimizer_args

Arguments to pass to the optimizer (same ones as the functions of the same name). Must be a list. See the documentation of each optimizer for the parameters they take.

lambda

Regularization parameter. Be aware that the functions assume the negative log-likelihood (i.e. the loss) is divided by the number of observations, so this number should be small.

random_seed

Random seed to use for the initialization of the variables. Ignored when passing 'x0'.

val_data

Validation data (only used for 'adaQN'). If passed, must be a list with entries 'X', 'y' (if passing data.frames for fitting), and optionally 'w' (sample weights).

Details

Binary logistic regression, fit in batches using this package's own optimizers.

Value

An object of class 'stoch_logistic', which can be fit to batches of data through function 'partial_fit_logistic'.

See Also

partial_fit_logistic , coef.stoch_logistic , predict.stoch_logistic , adaQN , SQN , oLBFGS

Examples

library(stochQN)

### Load Iris dataset
data("iris")

### Example with X + y interface
X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width",
  "Petal.Length", "Petal.Width")])
y <- as.numeric(iris$Species == "setosa")

### Initialize model with default parameters
model <- stochastic.logistic.regression(dim = 4)

### Fit to 10 randomly-subsampled batches
batch_size <- as.integer(nrow(X) / 3)
for (i in 1:10) {
  set.seed(i)
  batch <- sample(nrow(X),
      size = batch_size, replace=TRUE)
  partial_fit_logistic(model, X[batch, ], y[batch])
}

### Check classification accuracy
cat(sprintf(
  "Accuracy after 10 iterations: %.2f%%\n",
  100 * mean(
    predict(model, X, type = "class") == y)
  ))


### Example with formula interface
iris_df <- iris
levels(iris_df$Species) <- c("setosa", "other", "other")

### Initialize model with default parameters
model <- stochastic.logistic.regression(Species ~ .,
  pos_class="setosa")

### Fit to 10 randomly-subsampled batches
batch_size <- as.integer(nrow(iris_df) / 3)
for (i in 1:10) {
  set.seed(i)
  batch <- sample(nrow(iris_df),
      size=batch_size, replace=TRUE)
  partial_fit_logistic(model, iris_df[batch, ])
}
cat(sprintf(
  "Accuracy after 10 iterations: %.2f%%\n",
  100 * mean(
    predict(
      model, iris_df, type = "class") == iris_df$Species
      )
  ))

Print general info about stochastic logistic regression object

Description

Same as the 'print' function. To check the fitted coefficients, use function 'coef'.

Usage

## S3 method for class 'stoch_logistic'
summary(object, ...)

Arguments

object

A 'stoch_logistic' object as output by function 'stochastic.logistic.regression'.

...

Not used.

See Also

coef.stoch_logistic , stochastic.logistic.regression


Update objective function value (adaQN)

Description

Update the (expected) value of the objective function in an 'adaQN_free' object, after it has been requested by the optimizer (do NOT update it otherwise).

Usage

update_fun(optimizer, fun)

Arguments

optimizer

An 'adaQN_free' object which after the last run had requested a new function evaluation.

fun

Function as evaluated (in expectation) on the values of 'x' that were returned in the request.

Value

No return value (object is updated in-place).


Update gradient (oLBFGS, SQN, adaQN)

Description

Update the (expected) gradient in an optimizer from this package, after it has been requested by the optimizer (do NOT update it otherwise).

Usage

update_gradient(optimizer, gradient)

Arguments

optimizer

A free-mode optimizer from this package ('oLBFGS_free', 'SQN_free', 'adaQN_free') which after the last run had requested a new gradient evaluation.

gradient

The (expected value of the) gradient as evaluated on the values of 'x' that were returned in the request. Must be a numeric vector.

Value

No return value (object is updated in-place).


Update Hessian-vector product (SQN)

Description

Update the (expected) values of the Hessian-vector product in an 'SQN_free' object, after it has been requested by the optimizer (do NOT update it otherwise).

Usage

update_hess_vec(optimizer, hess_vec)

Arguments

optimizer

An 'SQN_free' optimizer which after the last run had requested a new Hessian-vector evaluation.

hess_vec

The (expected) value of the Hessian evaluated at the values of 'x' that were returned in the request, multiplied by the vector that was returned in the same request. Must be a numeric vector.

Value

No return value (object is updated in-place).