Title: | Bandit-Based Experiments and Policy Evaluation |
---|---|
Description: | Frequentist inference on adaptively generated data. The methods implemented are based on Zhan et al. (2021) <doi:10.48550/arXiv.2106.02029> and Hadad et al. (2021) <doi:10.48550/arXiv.1911.02768>. For illustration, several functions for simulating non-contextual and contextual adaptive experiments using Thompson sampling are also supplied. |
Authors: | Molly Offer-Westort [aut, cre, cph] , Yinghui Zhou [aut] , Ruohan Zhan [aut] |
Maintainer: | Molly Offer-Westort <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.0 |
Built: | 2024-11-29 13:56:41 UTC |
Source: | CRAN |
This function checks if the number of observations is greater than 1, which is required for conducting inference.
.check_A(A)
.check_A(A)
A |
An integer representing the number of observations. |
Returns NULL if the number of observations is valid; otherwise, throws an error.
This function checks if the first batch size is greater than or equal to the number of treatment arms.
.check_first_batch(batch_sizes, ys)
.check_first_batch(batch_sizes, ys)
batch_sizes |
A numeric vector specifying batch sizes. |
ys |
A matrix of counterfactual conditions. |
Returns NULL if the first batch size is valid; otherwise, throws an error.
This function checks the dimensional compatibility of 'gammahats' and 'contextual_probs'. It validates the dimensions and types based on whether the probability objects are contextual or non-contextual.
.check_shape(gammahats, contextual_probs)
.check_shape(gammahats, contextual_probs)
gammahats |
A matrix representing estimates. |
contextual_probs |
An object representing probabilities, either a matrix (non-contextual) or an array (contextual). |
Returns NULL if the shapes and types are valid; otherwise, throws an error.
Estimates the value of a policy based on AIPW scores and a policy matrix using non-contextual
adaptive weighting. If evalwts
is not provided, uses equal weights for all observations.
aw_estimate(scores, policy, evalwts = NULL)
aw_estimate(scores, policy, evalwts = NULL)
scores |
Numeric matrix. AIPW scores, shape |
policy |
Numeric matrix. Policy matrix |
evalwts |
Optional numeric vector. Non-contextual adaptive weights |
Numeric scalar. Estimated policy value.
scores <- matrix(c(0.5, 0.8, 0.6, 0.3, 0.9, 0.2, 0.5, 0.7, 0.4, 0.8, 0.2, 0.6), ncol = 3, byrow = TRUE) policy <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.1, 0.3, 0.4, 0.5, 0.1, 0.2, 0.7, 0.1), ncol = 3, byrow = TRUE) aw_estimate(scores = scores, policy = policy, evalwts = c(0.5, 1, 0.5, 1.5))
scores <- matrix(c(0.5, 0.8, 0.6, 0.3, 0.9, 0.2, 0.5, 0.7, 0.4, 0.8, 0.2, 0.6), ncol = 3, byrow = TRUE) policy <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.1, 0.3, 0.4, 0.5, 0.1, 0.2, 0.7, 0.1), ncol = 3, byrow = TRUE) aw_estimate(scores = scores, policy = policy, evalwts = c(0.5, 1, 0.5, 1.5))
Computes AIPW/doubly robust scores based on observed rewards, pulled arms, and inverse
probability scores. If mu_hat
is provided, compute AIPW scores, otherwise compute IPW scores.
aw_scores(yobs, ws, balwts, K, mu_hat = NULL)
aw_scores(yobs, ws, balwts, K, mu_hat = NULL)
yobs |
Numeric vector. Observed rewards. Must not contain NA values. |
ws |
Integer vector. Pulled arms. Must not contain NA values. Length must match |
balwts |
Numeric matrix. Inverse probability score |
K |
Integer. Number of arms. Must be a positive integer. |
mu_hat |
Optional numeric matrix. Plug-in estimator of arm outcomes, shape |
Numeric matrix. AIPW scores, shape [A, K]
.
aw_scores(yobs = c(0.5, 1, 0, 1.5), ws = c(1, 2, 2, 3), balwts = matrix(c(0.5, 2, 1, 0.5, 1, 1.5, 0.5, 1.5, 2, 1.5, 0.5, 1), ncol = 3), K = 3, mu_hat = matrix(c(0.5, 0.8, 0.6, 0.3, 0.9, 0.2, 0.5, 0.7, 0.4, 0.8, 0.2, 0.6), ncol = 3))
aw_scores(yobs = c(0.5, 1, 0, 1.5), ws = c(1, 2, 2, 3), balwts = matrix(c(0.5, 2, 1, 0.5, 1, 1.5, 0.5, 1.5, 2, 1.5, 0.5, 1), ncol = 3), K = 3, mu_hat = matrix(c(0.5, 0.8, 0.6, 0.3, 0.9, 0.2, 0.5, 0.7, 0.4, 0.8, 0.2, 0.6), ncol = 3))
Computes the variance of a policy value estimate based on AIPW scores, a policy matrix, and non-contextual adaptive weights.
aw_var(scores, estimate, policy, evalwts = NULL)
aw_var(scores, estimate, policy, evalwts = NULL)
scores |
Numeric matrix. AIPW scores, shape |
estimate |
Numeric scalar. Policy value estimate. |
policy |
Numeric matrix. Policy matrix |
evalwts |
Optional numeric vector. Non-contextual adaptive weights |
Numeric scalar. Variance of policy value estimate.
scores <- matrix(c(0.5, 0.8, 0.6, 0.3, 0.9, 0.2, 0.5, 0.7, 0.4, 0.8, 0.2, 0.6), ncol = 3, byrow = TRUE) policy <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.1, 0.3, 0.4, 0.5, 0.1, 0.2, 0.7, 0.1), ncol = 3, byrow = TRUE) estimate <- aw_estimate(scores = scores, policy = policy, evalwts = c(0.5, 1, 0.5, 1.5)) aw_var(scores = scores, estimate = estimate, policy = policy, evalwts = c(0.5, 1, 0.5, 1.5))
scores <- matrix(c(0.5, 0.8, 0.6, 0.3, 0.9, 0.2, 0.5, 0.7, 0.4, 0.8, 0.2, 0.6), ncol = 3, byrow = TRUE) policy <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.1, 0.3, 0.4, 0.5, 0.1, 0.2, 0.7, 0.1), ncol = 3, byrow = TRUE) estimate <- aw_estimate(scores = scores, policy = policy, evalwts = c(0.5, 1, 0.5, 1.5)) aw_var(scores = scores, estimate = estimate, policy = policy, evalwts = c(0.5, 1, 0.5, 1.5))
Calculates the inverse probability score of pulling arms, given the actions taken and the true probabilities of each arm being chosen.
calculate_balwts(ws, probs)
calculate_balwts(ws, probs)
ws |
Integer vector. Indicates which arm was chosen for observations at each time |
probs |
Numeric matrix or array. True probabilities of each arm being chosen at each time step. Shape |
A matrix or array containing the inverse probability score of pulling arms.
set.seed(123) A <- 5 K <- 3 ws <- sample(1:K, A, replace = TRUE) probs <- matrix(runif(A * K), nrow = A, ncol = K) balwts <- calculate_balwts(ws, probs)
set.seed(123) A <- 5 K <- 3 ws <- sample(1:K, A, replace = TRUE) probs <- matrix(runif(A * K), nrow = A, ncol = K) balwts <- calculate_balwts(ws, probs)
Computes the estimate and variance of a policy evaluation based on adaptive weights, AIPW scores, and a policy matrix.
calculate_continuous_X_statistics(h, gammahat, policy)
calculate_continuous_X_statistics(h, gammahat, policy)
h |
Numeric matrix. Adaptive weights |
gammahat |
Numeric matrix. AIPW scores, shape |
policy |
Numeric matrix. Policy matrix |
Named numeric vector with elements estimate
and var
, representing the estimated
policy value and the variance of the estimate, respectively.
h <- matrix(c(0.4, 0.3, 0.2, 0.1, 0.2, 0.3, 0.3, 0.2, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2, 0.1, 0.6), ncol = 4, byrow = TRUE) scores <- matrix(c(0.5, 0.8, 0.6, 0.3, 0.9, 0.2, 0.5, 0.7, 0.4, 0.8, 0.2, 0.6), ncol = 3, byrow = TRUE) policy <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.1, 0.3, 0.4, 0.5, 0.1, 0.2, 0.7, 0.1), ncol = 3, byrow = TRUE) gammahat <- scores - policy calculate_continuous_X_statistics(h = h, gammahat = gammahat, policy = policy)
h <- matrix(c(0.4, 0.3, 0.2, 0.1, 0.2, 0.3, 0.3, 0.2, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2, 0.1, 0.6), ncol = 4, byrow = TRUE) scores <- matrix(c(0.5, 0.8, 0.6, 0.3, 0.9, 0.2, 0.5, 0.7, 0.4, 0.8, 0.2, 0.6), ncol = 3, byrow = TRUE) policy <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.1, 0.3, 0.4, 0.5, 0.1, 0.2, 0.7, 0.1), ncol = 3, byrow = TRUE) gammahat <- scores - policy calculate_continuous_X_statistics(h = h, gammahat = gammahat, policy = policy)
Draws arms from a LinTS or non-contextual TS agent for multi-armed bandit problems.
draw_thompson(model, start, end, xs = NULL)
draw_thompson(model, start, end, xs = NULL)
model |
List. Contains the parameters of the model, generated by |
start |
Integer. Starting index of observations for which arms are to be drawn. Must be a positive integer. |
end |
Integer. Ending index of the observations for which arms are to be drawn. Must be an integer greater than or equal to |
xs |
Optional matrix. Covariates of shape |
A list containing the drawn arms (w
) and their corresponding probabilities (ps
).
set.seed(123) model <- LinTSModel(K = 5, p = 3, floor_start = 1/5, floor_decay = 0.9, num_mc = 100, is_contextual = TRUE) draws <- draw_thompson(model = model, start = 1, end = 10, xs = matrix(rnorm(30), ncol = 3))
set.seed(123) model <- LinTSModel(K = 5, p = 3, floor_start = 1/5, floor_decay = 0.9, num_mc = 100, is_contextual = TRUE) draws <- draw_thompson(model = model, start = 1, end = 10, xs = matrix(rnorm(30), ncol = 3))
Computes the estimate and variance of a policy evaluation based on non-contextual weights, AIPW scores, and a policy matrix.
estimate(w, gammahat, policy)
estimate(w, gammahat, policy)
w |
Numeric vector. Non-contextual weights, length |
gammahat |
Numeric matrix. AIPW scores, shape |
policy |
Numeric matrix. Policy matrix |
Named numeric vector with elements estimate
and var
, representing the estimated
policy value and the variance of the estimate, respectively.
w <- c(0.5, 1, 0.5, 1.5) scores <- matrix(c(0.5, 0.8, 0.6, 0.3, 0.9, 0.2, 0.5, 0.7, 0.4, 0.8, 0.2, 0.6), ncol = 3, byrow = TRUE) policy <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.1, 0.3, 0.4, 0.5, 0.1, 0.2, 0.7, 0.1), ncol = 3, byrow = TRUE) gammahat <- scores - policy estimate(w = w, gammahat = gammahat, policy = policy)
w <- c(0.5, 1, 0.5, 1.5) scores <- matrix(c(0.5, 0.8, 0.6, 0.3, 0.9, 0.2, 0.5, 0.7, 0.4, 0.8, 0.2, 0.6), ncol = 3, byrow = TRUE) policy <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.1, 0.3, 0.4, 0.5, 0.1, 0.2, 0.7, 0.1), ncol = 3, byrow = TRUE) gammahat <- scores - policy estimate(w = w, gammahat = gammahat, policy = policy)
Generates covariates and potential outcomes for a classification dataset.
generate_bandit_data(xs = NULL, y = NULL, noise_std = 1, signal_strength = 1)
generate_bandit_data(xs = NULL, y = NULL, noise_std = 1, signal_strength = 1)
xs |
Optional matrix. Covariates of shape |
y |
Optional vector. Labels of length |
noise_std |
Numeric. Standard deviation of the noise added to the potential outcomes. Default is |
signal_strength |
Numeric. Strength of the signal in the potential outcomes. Default is |
A list containing the generated data (xs
, ys
, muxs
, A
, p
, K
) and the true class probabilities (mus
).
data <- generate_bandit_data(xs = as.matrix(iris[,1:4]), y = as.numeric(iris[,5]), noise_std = 0.1, signal_strength = 1.0)
data <- generate_bandit_data(xs = as.matrix(iris[,1:4]), y = as.numeric(iris[,5]), noise_std = 0.1, signal_strength = 1.0)
Clips a numeric vector between two values.
ifelse_clip(lamb, x, y)
ifelse_clip(lamb, x, y)
lamb |
Numeric vector. Values to be clipped. |
x |
Numeric. Lower bound of the clip range. |
y |
Numeric. Upper bound of the clip range. |
Numeric vector. Clipped values.
lamb <- c(1, 2, 3, 4, 5) ifelse_clip(lamb, 2, 4)
lamb <- c(1, 2, 3, 4, 5) ifelse_clip(lamb, 2, 4)
Imposes a floor on the given array a, ensuring that its elements are greater than or equal to amin.
impose_floor(a, amin)
impose_floor(a, amin)
a |
Numeric vector. Must not contain NA values. |
amin |
Numeric. Minimum allowed value. Must be between 0 and 1. |
A numeric vector with the same length as a
, with the floor imposed on its elements.
a <- c(0.25, 0.25, 0.25, 0.25) imposed_a <- impose_floor(a = a, amin = 0.1)
a <- c(0.25, 0.25, 0.25, 0.25) imposed_a <- impose_floor(a = a, amin = 0.1)
Creates a linear Thompson Sampling model for multi-armed bandit problems.
LinTSModel( K, p = NULL, floor_start, floor_decay, num_mc = 100, is_contextual = TRUE )
LinTSModel( K, p = NULL, floor_start, floor_decay, num_mc = 100, is_contextual = TRUE )
K |
Integer. Number of arms. Must be a positive integer. |
p |
Integer. Dimension of the contextual vector, if |
floor_start |
Numeric. Specifies the initial value for the assignment probability floor. It ensures that at the start of the process, no assignment probability falls below this threshold. Must be a positive number. |
floor_decay |
Numeric. Decay rate of the floor. The floor decays with the number of observations in the experiment such that at each point in time, the applied floor is: |
num_mc |
Integer. Number of Monte Carlo simulations used to approximate the expected reward. Must be a positive integer. Default is 100. |
is_contextual |
Logical. Indicates whether the problem is contextual or not. Default is |
A list containing the parameters of the LinTSModel.
model <- LinTSModel(K = 5, p = 3, floor_start = 1/5, floor_decay = 0.9, num_mc = 100, is_contextual = TRUE)
model <- LinTSModel(K = 5, p = 3, floor_start = 1/5, floor_decay = 0.9, num_mc = 100, is_contextual = TRUE)
Calculates average response and differences in average response under counterfactual treatment policies.
Estimates are produced using provided inverse probability weighted (IPW) or augmented inverse probability weighted (AIPW) scores paired with various adaptive weighting schemes, as proposed in Hadad et al. (2021) and Zhan et al. (2021).
We briefly outline the target quantities:
For observations indexed , treatments
, we denote as
the potential outcome for the unit at time
under treatment
.
A policy
is a treatment assignment procedure that is the subject of evaluation, described in terms of treatment assignment probabilities for each subject to receive each counterfactual treatment.
We target estimation of average response under a specified policy:
The user may specify a list of list of policies to be evaluated, under policy1
.
Alternatively, they may estimate policy contrasts if policy0
is provided:
output_estimates( policy0 = NULL, policy1, contrasts = "combined", gammahat, probs_array, uniform = TRUE, non_contextual_minvar = TRUE, contextual_minvar = TRUE, non_contextual_stablevar = TRUE, contextual_stablevar = TRUE, non_contextual_twopoint = TRUE, floor_decay = 0 )
output_estimates( policy0 = NULL, policy1, contrasts = "combined", gammahat, probs_array, uniform = TRUE, non_contextual_minvar = TRUE, contextual_minvar = TRUE, non_contextual_stablevar = TRUE, contextual_stablevar = TRUE, non_contextual_twopoint = TRUE, floor_decay = 0 )
policy0 |
Optional matrix. Single policy probability matrix for contrast evaluation, dimensions |
policy1 |
List of matrices. List of counterfactual policy matrices for evaluation, dimensions |
contrasts |
Character. The method to estimate policy contrasts, either |
gammahat |
(A)IPW scores matrix with dimensions |
probs_array |
Numeric array. Probability matrix or array with dimensions |
uniform |
Logical. Estimate uniform weights. |
non_contextual_minvar |
Logical. Estimate non-contextual |
contextual_minvar |
Logical. Estimate contextual |
non_contextual_stablevar |
Logical. Estimate non-contextual |
contextual_stablevar |
Logical. Estimate contextual |
non_contextual_twopoint |
Logical. Estimate |
floor_decay |
Numeric. Floor decay parameter used in the calculation. Default is 0. |
A list of treatment effect estimates under different weighting schemes.
Hadad V, Hirshberg DA, Zhan R, Wager S, Athey S (2021). “Confidence intervals for policy evaluation in adaptive experiments.” Proceedings of the national academy of sciences, 118(15), e2014602118.
Zhan R, Hadad V, Hirshberg DA, Athey S (2021). “Off-policy evaluation via adaptive weighting with data from contextual bandits.” In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2125–2135.
set.seed(123) # In a non-contextual setting, generate example values for policy1, gammahat, and probs_array gammahat <- matrix(c(0.5, 0.8, 0.6, 0.3, 0.9, 0.2, 0.5, 0.7, 0.4, 0.8, 0.2, 0.6), ncol = 3, byrow = TRUE) policy0 <- matrix(c(1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0), ncol = 3, byrow = TRUE) policy1 <- list(matrix(c(0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0), ncol = 3, byrow = TRUE)) probs_array <- array(0, dim = c(4, 4, 3)) for (i in 1:4) { temp_vector <- runif(3) normalized_vector <- temp_vector / sum(temp_vector) probs_array[i, 1, ] <- normalized_vector } for (k in 1:3) { for (i in 1:4) { temp_vector <- runif(3) normalized_vector <- temp_vector / sum(temp_vector) probs_array[i, 2:4, k] <- normalized_vector } } estimates <- output_estimates(policy1 = policy1, policy0 = policy0, gammahat = gammahat, probs_array = probs_array) # plot plot_results <- function(result) { estimates <- result[, "estimate"] std.errors <- result[, "std.error"] labels <- rownames(result) # Define the limits for the x-axis based on estimates and std.errors xlims <- c(min(estimates - 2*std.errors), max(estimates + 2*std.errors)) # Create the basic error bar plot using base R invisible( plot(estimates, 1:length(estimates), xlim = xlims, xaxt = "n", xlab = "Coefficient Estimate", ylab = "", yaxt = "n", pch = 16, las = 1, main = "Coefficients and CIs") ) # Add y-axis labels invisible( axis(2, at = 1:length(estimates), labels = labels, las = 1, tick = FALSE, line = 0.5) ) # Add the x-axis values x_ticks <- x_ticks <- seq(from = round(xlims[1], .5), to = round(xlims[2], .5), by = 0.5) invisible( axis(1, at = x_ticks, labels = x_ticks) ) # Add error bars invisible( segments(estimates - std.errors, 1:length(estimates), estimates + std.errors, 1:length(estimates)) ) } sample_result <- estimates[[1]] op <- par(no.readonly = TRUE) par(mar=c(5, 12, 4, 2)) plot_results(sample_result) par(op)
set.seed(123) # In a non-contextual setting, generate example values for policy1, gammahat, and probs_array gammahat <- matrix(c(0.5, 0.8, 0.6, 0.3, 0.9, 0.2, 0.5, 0.7, 0.4, 0.8, 0.2, 0.6), ncol = 3, byrow = TRUE) policy0 <- matrix(c(1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0), ncol = 3, byrow = TRUE) policy1 <- list(matrix(c(0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0), ncol = 3, byrow = TRUE)) probs_array <- array(0, dim = c(4, 4, 3)) for (i in 1:4) { temp_vector <- runif(3) normalized_vector <- temp_vector / sum(temp_vector) probs_array[i, 1, ] <- normalized_vector } for (k in 1:3) { for (i in 1:4) { temp_vector <- runif(3) normalized_vector <- temp_vector / sum(temp_vector) probs_array[i, 2:4, k] <- normalized_vector } } estimates <- output_estimates(policy1 = policy1, policy0 = policy0, gammahat = gammahat, probs_array = probs_array) # plot plot_results <- function(result) { estimates <- result[, "estimate"] std.errors <- result[, "std.error"] labels <- rownames(result) # Define the limits for the x-axis based on estimates and std.errors xlims <- c(min(estimates - 2*std.errors), max(estimates + 2*std.errors)) # Create the basic error bar plot using base R invisible( plot(estimates, 1:length(estimates), xlim = xlims, xaxt = "n", xlab = "Coefficient Estimate", ylab = "", yaxt = "n", pch = 16, las = 1, main = "Coefficients and CIs") ) # Add y-axis labels invisible( axis(2, at = 1:length(estimates), labels = labels, las = 1, tick = FALSE, line = 0.5) ) # Add the x-axis values x_ticks <- x_ticks <- seq(from = round(xlims[1], .5), to = round(xlims[2], .5), by = 0.5) invisible( axis(1, at = x_ticks, labels = x_ticks) ) # Add error bars invisible( segments(estimates - std.errors, 1:length(estimates), estimates + std.errors, 1:length(estimates)) ) } sample_result <- estimates[[1]] op <- par(no.readonly = TRUE) par(mar=c(5, 12, 4, 2)) plot_results(sample_result) par(op)
Generates a plot of the cumulative assignment.
plot_cumulative_assignment(results, batch_sizes)
plot_cumulative_assignment(results, batch_sizes)
results |
List. Results of the experiment, including the actions taken ( |
batch_sizes |
Integer vector. Batch sizes used in the experiment. Must be positive integers. |
A plot of the cumulative assignment.
set.seed(123) A <- 1000 K <- 4 xs <- matrix(runif(A * K), nrow = A, ncol = K) ys <- matrix(rbinom(A * K, 1, 0.5), nrow = A, ncol = K) batch_sizes <- c(250, 250, 250, 250) results <- run_experiment(ys = ys, floor_start = 1/K, floor_decay = 0.9, batch_sizes = batch_sizes, xs = xs) plot_cumulative_assignment(results, batch_sizes)
set.seed(123) A <- 1000 K <- 4 xs <- matrix(runif(A * K), nrow = A, ncol = K) ys <- matrix(rbinom(A * K, 1, 0.5), nrow = A, ncol = K) batch_sizes <- c(250, 250, 250, 250) results <- run_experiment(ys = ys, floor_start = 1/K, floor_decay = 0.9, batch_sizes = batch_sizes, xs = xs) plot_cumulative_assignment(results, batch_sizes)
Initializes matrices needed for ridge regression to estimate the expected rewards of different arms.
ridge_init(p, K)
ridge_init(p, K)
p |
Integer. Number of covariates. Must be a positive integer. |
K |
Integer. Number of arms. Must be a positive integer. |
A list containing initialized matrices R_A
, R_Ainv
, b
, and theta
for each arm.
p <- 3 K <- 5 init <- ridge_init(p, K)
p <- 3 K <- 5 init <- ridge_init(p, K)
Computes leave-future-out ridge-basedn estimates of arm expected rewards based on provided data.
ridge_muhat_lfo_pai(xs, ws, yobs, K, batch_sizes, alpha = 1)
ridge_muhat_lfo_pai(xs, ws, yobs, K, batch_sizes, alpha = 1)
xs |
Matrix. Covariates of shape |
ws |
Integer vector. Indicates which arm was chosen for observations at each time |
yobs |
Numeric vector. Observed outcomes, length |
K |
Integer. Number of arms. Must be a positive integer. |
batch_sizes |
Integer vector. Sizes of batches in which data is processed. Must be positive integers. |
alpha |
Numeric. Ridge regression regularization parameter. Default is 1. |
A 3D array containing the expected reward estimates for each arm and each time t
, of shape [A, A, K]
.
set.seed(123) p <- 3 K <- 5 A <- 100 xs <- matrix(runif(A * p), nrow = A, ncol = p) ws <- sample(1:K, A, replace = TRUE) yobs <- runif(A) batch_sizes <- c(25, 25, 25, 25) muhat <- ridge_muhat_lfo_pai(xs, ws, yobs, K, batch_sizes) print(muhat)
set.seed(123) p <- 3 K <- 5 A <- 100 xs <- matrix(runif(A * p), nrow = A, ncol = p) ws <- sample(1:K, A, replace = TRUE) yobs <- runif(A) batch_sizes <- c(25, 25, 25, 25) muhat <- ridge_muhat_lfo_pai(xs, ws, yobs, K, batch_sizes) print(muhat)
Given previous matrices and a new observation, updates the matrices for ridge regression.
ridge_update(R_A, b, xs, t, yobs, alpha)
ridge_update(R_A, b, xs, t, yobs, alpha)
R_A |
Matrix. Current matrix |
b |
Numeric vector. Current vector |
xs |
Matrix. Covariates of shape |
t |
Integer. Current time or instance. |
yobs |
Numeric vector. Observed outcomes, length |
alpha |
Numeric. Ridge regression regularization parameter. Default is 1. |
A list containing updated matrices R_A
, R_Ainv
, b
, and theta
.
set.seed(123) p <- 3 K <- 5 init <- ridge_init(p, K) R_A <- init$R_A[[1]] b <- init$b[1, ] xs <- matrix(runif(10 * p), nrow = 10, ncol = p) yobs <- runif(10) t <- 1 alpha <- 1 updated <- ridge_update(R_A, b, xs, t, yobs[t], alpha)
set.seed(123) p <- 3 K <- 5 init <- ridge_init(p, K) R_A <- init$R_A[[1]] b <- init$b[1, ] xs <- matrix(runif(10 * p), nrow = 10, ncol = p) yobs <- runif(10) t <- 1 alpha <- 1 updated <- ridge_update(R_A, b, xs, t, yobs[t], alpha)
Runs a LinTS or non-contextual TS bandit experiment, given potential outcomes and covariates.
run_experiment( ys, floor_start, floor_decay, batch_sizes, xs = NULL, balanced = NULL )
run_experiment( ys, floor_start, floor_decay, batch_sizes, xs = NULL, balanced = NULL )
ys |
Matrix. Potential outcomes of shape |
floor_start |
Numeric. Specifies the initial value for the assignment probability floor. It ensures that at the start of the process, no assignment probability falls below this threshold. Must be a positive number. |
floor_decay |
Numeric. Decay rate of the floor. The floor decays with the number of observations in the experiment such that at each point in time, the applied floor is: |
batch_sizes |
Integer vector. Size of each batch. Must be positive integers. |
xs |
Optional matrix. Covariates of shape |
balanced |
Optional logical. Indicates whether to balance the batches. Default is |
A list containing the pulled arms (ws
), observed rewards (yobs
), assignment probabilities (probs
), and the fitted bandit model (fitted_bandit_model
).
set.seed(123) A <- 1000 K <- 4 xs <- matrix(runif(A * K), nrow = A, ncol = K) ys <- matrix(rbinom(A * K, 1, 0.5), nrow = A, ncol = K) batch_sizes <- c(250, 250, 250, 250) results <- run_experiment(ys = ys, floor_start = 1/K, floor_decay = 0.9, batch_sizes = batch_sizes, xs = xs)
set.seed(123) A <- 1000 K <- 4 xs <- matrix(runif(A * K), nrow = A, ncol = K) ys <- matrix(rbinom(A * K, 1, 0.5), nrow = A, ncol = K) batch_sizes <- c(250, 250, 250, 250) results <- run_experiment(ys = ys, floor_start = 1/K, floor_decay = 0.9, batch_sizes = batch_sizes, xs = xs)
Generates covariates and potential outcomes of a synthetic dataset for a simple tree model.
simple_tree_data( A, K = 5, p = 10, noise_std = 1, split = 1.676, signal_strength = 1, noise_form = "normal" )
simple_tree_data( A, K = 5, p = 10, noise_std = 1, split = 1.676, signal_strength = 1, noise_form = "normal" )
A |
Integer. Number of observations in the dataset. Must be a positive integer. |
K |
Integer. Number of arms. Must be a positive integer. |
p |
Integer. Number of covariates. Must be a positive integer. |
noise_std |
Numeric. Standard deviation of the noise added to the potential outcomes. Must be a non-negative number. |
split |
Numeric. Split point for creating treatment groups based on the covariates. |
signal_strength |
Numeric. Strength of the signal in the potential outcomes. |
noise_form |
Character. Distribution of the noise added to the potential outcomes. Can be either "normal" or "uniform". |
A list containing the generated data (xs
, ys
, muxs
) and the true potential outcome means (mus
).
set.seed(123) A <- 1000 K <- 4 # Number of treatment arms p <- 10 # Number of covariates synthetic_data <- simple_tree_data(A = A, K = K, p = p, noise_std = 1.0, split = 1.676, signal_strength = 1.0, noise_form = 'normal')
set.seed(123) A <- 1000 K <- 4 # Number of treatment arms p <- 10 # Number of covariates synthetic_data <- simple_tree_data(A = A, K = K, p = p, noise_std = 1.0, split = 1.676, signal_strength = 1.0, noise_form = 'normal')
Implements the stick breaking algorithm for calculating weights in the stable-var scheme.
stick_breaking(Z)
stick_breaking(Z)
Z |
Numeric array. Input array, shape |
Numeric array. Stick breaking weights, shape [A, K]
. Must not contain NA values.
set.seed(123) Z <- array(runif(10), dim = c(2, 5)) stick_breaking(Z)
set.seed(123) Z <- array(runif(10), dim = c(2, 5)) stick_breaking(Z)
Calculates the allocation ratio for a two-point stable-variance bandit, given the empirical mean and the discount parameter alpha.
twopoint_stable_var_ratio(A, e, alpha)
twopoint_stable_var_ratio(A, e, alpha)
A |
Integer. Size of the experiment. Must be a positive integer. |
e |
Numeric. Empirical mean. Must be a numeric value. |
alpha |
Numeric. Discount parameter. |
Numeric vector. Allocation ratio lambda.
# Calculate the allocation ratio for a two-point stable-variance bandit with e=0.1 and alpha=0.5 twopoint_stable_var_ratio(1000, 0.1, 0.5)
# Calculate the allocation ratio for a two-point stable-variance bandit with e=0.1 and alpha=0.5 twopoint_stable_var_ratio(1000, 0.1, 0.5)
Updates the parameters of a linear Thompson Sampling model for multi-armed bandit problems based on new observations.
update_thompson(ws, yobs, model, xs = NULL, ps = NULL, balanced = NULL)
update_thompson(ws, yobs, model, xs = NULL, ps = NULL, balanced = NULL)
ws |
Integer vector. Indicates which arm was chosen for observations at each time |
yobs |
Numeric vector. Observed outcomes, length |
model |
List. Contains the parameters of the LinTSModel. |
xs |
Optional matrix. Covariates of shape |
ps |
Optional matrix. Probabilities of selecting each arm for each observation, if the LinTSModel is balanced. Default is |
balanced |
Logical. Indicates whether to use balanced Thompson Sampling. Default is |
A list containing the updated parameters of the LinTSModel.
set.seed(123) model <- LinTSModel(K = 5, p = 3, floor_start = 1, floor_decay = 0.9, num_mc = 100, is_contextual = TRUE) A <- 1000 ws <- numeric(A) yobs <- numeric(A) model <- update_thompson(ws = ws, yobs = yobs, model = model)
set.seed(123) model <- LinTSModel(K = 5, p = 3, floor_start = 1, floor_decay = 0.9, num_mc = 100, is_contextual = TRUE) A <- 1000 ws <- numeric(A) yobs <- numeric(A) model <- update_thompson(ws = ws, yobs = yobs, model = model)