For two group models (i.e., treatment and control group with no covariates), we denote the parameter for the treatment group by μ_t and the parameter for the control group by μ_c. The default null and alternative hypotheses are given by H₀ : μ_t − μ_c ≥ δ and H₁ : μ_t − μ_c < δ, where δ is a prespecified constant.

We use the following definition of Bayesian power / type I error rate.

Let Θ₀ and Θ₁ denote the parameter spaces corresponding to H₀ and H₁. Let y⁽ⁿ⁾ denote the simulated current data associated with a sample size of n and let θ = (μ_t, μ_c, τ_c) denote the model parameters. Let π^(s)(θ) denote the sampling prior and let π^(f)(θ) denote the fitting prior. The sampling prior is used to generate the hypothetical data while the fitting prior is used to fit the model after the data is generated. Let π₀^(s)(θ) denote a sampling prior that only puts mass in the null region, i.e., θ ⊂ Θ₀. Let π₁^(s)(θ) denote a sampling prior that only puts mass in the alternative region, i.e., θ ⊂ Θ₁. To determine Bayesian sample size, we estimate the quantity β_sj⁽ⁿ⁾ = E_s[I{P(μ_t − μ_c < δ|y⁽ⁿ⁾, π^(f)) ≥ γ}] where j = 0 or 1, corresponding to the expectation taken with respect to π₀^(s)(θ) or π₁^(s)(θ). The constant γ > 0 is a prespecified posterior probability threshold for rejecting the null hypothesis (e.g., 0.975). The probability is computed with respect to the posterior distribution given the simulated data y⁽ⁿ⁾ and the fitting prior π^(f)(θ), and the expectation is taken with respect to the marginal distribution of y⁽ⁿ⁾ defined based on the sampling prior π^(s)(θ). Then β_s0⁽ⁿ⁾ corresponding to π^(s)(θ) = π₀^(s)(θ) is the Bayesian type I error rate, while β_s1⁽ⁿ⁾ corresponding to π^(s)(θ) = π₁^(s)(θ) is the Bayesian power.

1. Two Group Cases with Binary Outcomes

We first demonstrate a model for binary outcomes for treatment and control groups with no covariates.

We consider the non-inferiority design application of Chen et al. (2011). The goal was to design a trial to evaluate a new generation of drug-eluting stent (DES) (“test device”) with the first generation of DES (“control device”). The primary endpoint is the 12-month Target Lesion Failure (TLF) (binary).

Historical information can be borrowed from two previously conducted trials involving the first generation of DES. The table below summarizes the historical data.

Summary of the historical data.
	% TLF (# of failure/sample size)
Historical Trial 1	8.2% (44/535)
Historical Trial 2	10.9% (33/304)

Let y_t^(n_t) = (y_t1, ⋯, y_{tn_t}) and y_c^(n_c) = (y_c1, ⋯, y_{cn_c}) denote the responses from the current trial for the test device and the control device, respectively. The total sample size is n = n_t + n_c.

We assume the i-th observation from the test group y_ti follows Bern(μ_t), and the i-th observation from the control group y_ci follows Bern(μ_c).

We will illustrate Bayesian sample size determination (SSD) incorporating historical data using the power prior with fixed a₀ and the normalized power for a₀ modeled as random.

The hypotheses for non-inferiority testing are H₀ : μ_t − μ_c ≥ δ and H₁ : μ_t − μ_c < δ, where δ is a prespecified non-inferiority margin. We set δ = 4.1%.

We choose beta(10⁻⁴, 10⁻⁴) for the initial prior for μ_c, which performs similarly to the uniform improper initial prior for $\log\left(\frac{\mu_c}{1-\mu_c}\right)$ used in Chen et al. (2011) in terms of operating characteristics.

Power is computed under the assumption that μ_t = μ_c and type I error rate is computed under the assumption that μ_t = μ_c + δ.

For sampling priors, a point mass prior at μ_c = 9.2% is used for π^(s)(μ_c) where 9.2% is the pooled proportion for the two historical control datasets, and a point mass prior at μ_t = μ_c is used for π^(s)(μ_t).

For all computations, we use N = 10, 000, $\frac{n_t}{n_c} = 3$, and γ = 0.95.

1.1 Power Prior with Fixed a₀

When a₀ is fixed, the historical matrix is defined where each row represents a historical dataset, and the three columns represent the sum of responses, sample size and a₀, respectively, of the historical control data. We use a₀₁ = a₀₂ = 0.3.


historical <- matrix(0, ncol=3, nrow=2)
historical[1,] <- c(44, 535, 0.3)
historical[2,] <- c(33, 304, 0.3)

We consider n_t values ranging from 600 to 1000 to achieve the desired power of 0.8.

Since point mass sampling priors are used for μ_t and μ_c, samp.prior.mu.t and samp.prior.mu.c are both scalars.

For Bernoulli outcomes, beta initial priors are used for μ_t and μ_c, with hyperparameters specified by prior.mu.t.shape1, prior.mu.t.shape2, prior.mu.c.shape1 and prior.mu.c.shape2.


library(BayesPPD)

set.seed(1)
n.t_vals <- seq(from=600, to=1000, by=50)
powers <- NULL

for(i in 1:length(n.t_vals)){
  n.t <- n.t_vals[i]
  results <- power.two.grp.fixed.a0(data.type="Bernoulli", 
      n.t=n.t, n.c=round(n.t/3), historical=historical,
      samp.prior.mu.t=0.092, samp.prior.mu.c=0.092,
      prior.mu.t.shape1=0.0001, prior.mu.t.shape2=0.0001, 
      prior.mu.c.shape1=0.0001,prior.mu.c.shape2=0.0001,
      delta=0.041, N=10000)
  power <- results$`power/type I error`
  powers <- c(powers, power)
}

powers
#> [1] 0.7819 0.8112 0.8220 0.8383 0.8588 0.8763 0.8865 0.8922 0.9084

We can see that a sample size of 650 is required to achieve a power of at least 0.8.

A power curve is plotted below for sample sizes ranging from 600 to 1000.

library(ggplot2)

df <- data.frame(sample_size=n.t_vals, power=powers)
ggplot(data=df, aes(x=sample_size, y=powers)) +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE) +
  geom_point() +
  xlab("Sample Size") +
  ylab("Power")

We then compute the type I error rate for these sample sizes.

Since the type I error rate is computed under the assumption that μ_t = μ_c + δ, we use a point mass at μ_c = 9.2% for the sampling prior for μ_c, and a point mass at μ_t = 9.2% + 4.1% for the sampling prior for μ_t.

The following type I error rate calculations match the results given in Table 2 of Chen et al. (2011).

TIEs <- NULL

for(i in 1:length(n.t_vals)){
  n.t <- n.t_vals[i]
  results <- power.two.grp.fixed.a0(data.type="Bernoulli", 
      n.t=n.t, n.c=round(n.t/3), historical=historical,
      samp.prior.mu.t=0.092+0.041, samp.prior.mu.c=0.092,
      prior.mu.t.shape1=0.0001, prior.mu.t.shape2=0.0001, 
      prior.mu.c.shape1=0.0001,prior.mu.c.shape2=0.0001,
      delta=0.041, N=10000)
  TIE <- results$`power/type I error`
  TIEs <- c(TIEs, TIE)
}

TIEs
#> [1] 0.0275 0.0299 0.0310 0.0290 0.0307 0.0313 0.0295 0.0300 0.0316

1.2 Normalized Power Prior (a₀ Modeled as Random)

When a₀ is modeled as random, the normalized power prior is used and the priors for a₀₁ and a₀₂ are beta(1,1), as in Chen et al. (2011). We run 10,000 iterations of the slice sampler. We use the default settings for the upper limits, lower limits and slice widths for a₀₁ and a₀₂. The same initial priors and sampling priors are used as in the fixed a₀ case.

When a₀ is modeled as random, the historical matrix is defined where each row represents a historical dataset, and the two columns represent the sum of the responses and the sample size, respectively.

historical <- matrix(0, ncol=2, nrow=2)
historical[1,] <- c(44, 535)
historical[2,] <- c(33, 304)

The code below computes the power when n_t = 750.

n.t <- 750
results <- power.two.grp.random.a0(data.type="Bernoulli", 
      n.t=n.t, n.c=round(n.t/3),historical=historical,
      samp.prior.mu.t=0.092, samp.prior.mu.c=0.092,
      prior.mu.t.shape1=0.0001, prior.mu.t.shape2=0.0001, 
      prior.mu.c.shape1=0.0001,prior.mu.c.shape2=0.0001,
      prior.a0.shape1=1,prior.a0.shape2=1,
      delta=0.041, gamma=0.95,
      nMC=10000, nBI=250, N=10000)
summary(results)

2. Two Group Cases with Normally Distributed Outcomes

We now demonstrate a model for normally distributed outcomes for treatment and control groups with no covariates. We use simulated data for this example.

We assume the i-th observation from the treatment group y_ti follows N(μ_t, τ⁻¹) and the i-th observation from the control group y_ci follows N(μ_c, τ⁻¹), where τ is the precision parameter for the current data.

The null hypothesis is H₀ : μ_t − μ_c ≥ δ. We set δ = 0.

We assume the treatment group sample size (n_t) and the control group sample size (n_c) are both 100.

2.1 Power Prior with Fixed a₀

First, we assume a₀ is fixed. We simulate three historical datasets. For normally distributed data, the historical matrix is defined where each row represents a historical dataset, and the four columns represent the sum of the responses, the sample size, the sample variance and a₀, respectively.

data.type <- "Normal"
n.t <- 100
n.c <- 100

# Simulate three historical datasets
K <- 3
historical <- matrix(0, ncol=4, nrow=K)
# The columns are the sum of the responses, the sample size, the sample variance and a_0
historical[1,] <- c(50, 50, 1, 0.3)
historical[2,] <- c(30, 50, 1, 0.5)
historical[3,] <- c(20, 50, 1, 0.7)

To calculate power, we can provide the sampling prior of μ_t and μ_c such that the mass of μ_t − μ_c < 0. We generate the sampling prior for the variance parameter from a Gamma(1, 1) distribution.

# Generate sampling priors
set.seed(1)
samp.prior.mu.t <- rnorm(50000)
samp.prior.mu.c <- rnorm(50000)

sub_ind <- which(samp.prior.mu.t < samp.prior.mu.c) 
# Here, mass is put on the alternative region, so power is calculated. 
samp.prior.mu.t <- samp.prior.mu.t[sub_ind]
samp.prior.mu.c <- samp.prior.mu.c[sub_ind]
samp.prior.var.t <- rgamma(100, 1, 1)
samp.prior.var.c <- rgamma(100, 1, 1)

We run 10, 000 iterations of the Gibbs sampler for N = 100 simulated datasets. Note that N should be larger in practice.

set.seed(1)
results <- power.two.grp.fixed.a0(data.type=data.type, n.t=n.t, n.c=n.t, historical=historical,  
           samp.prior.mu.t=samp.prior.mu.t, samp.prior.mu.c=samp.prior.mu.c, 
           samp.prior.var.t=samp.prior.var.t, samp.prior.var.c=samp.prior.var.c,
           delta=0, nMC=10000, nBI=250, N=100) 

summary(results)
#>      average posterior mean   bias
#> mu_t                 -0.580 -0.007
#> mu_c                  0.535 -0.026
#> The power/type I error rate is  0.79 .
#> The average of the posterior probabilities of P(mu_t - mu_c < delta) is 0.842 .

Next, to calculate type I error, we can provide the sampling prior of μ_t and μ_c such that the mass of μ_t − μ_c > = 0.

# Generate sampling priors
set.seed(1)
samp.prior.mu.t <- rnorm(50000)
samp.prior.mu.c <- rnorm(50000)

sub_ind <- which(samp.prior.mu.t >= samp.prior.mu.c) 
# Here, mass is put on the null region, so type I error rate is calculated. 
samp.prior.mu.t <- samp.prior.mu.t[sub_ind]
samp.prior.mu.c <- samp.prior.mu.c[sub_ind]

set.seed(1)
results <- power.two.grp.fixed.a0(data.type=data.type, n.t=n.t, n.c=n.t, historical=historical,  
           samp.prior.mu.t=samp.prior.mu.t, samp.prior.mu.c=samp.prior.mu.c, 
           samp.prior.var.t=samp.prior.var.t, samp.prior.var.c=samp.prior.var.c,
           delta=0, nMC=10000, nBI=250, N=100) 

summary(results)
#>      average posterior mean   bias
#> mu_t                  0.488 -0.079
#> mu_c                 -0.454  0.111
#> The power/type I error rate is  0.14 .
#> The average of the posterior probabilities of P(mu_t - mu_c < delta) is 0.186 .

2.2 Normalized Power Prior (a₀ Modeled as Random)

Next, we model a₀ as random with the normalized power prior. We simulate three historical datasets. Here, the historical matrix is defined where each row represents a historical dataset, and the three columns represent the sum of the responses, the sample size, and the sample variance, respectively.

data.type <- "Normal"
n.t <- 100
n.c <- 100

# Simulate three historical datasets
K <- 3
historical <- matrix(0, ncol=3, nrow=K)
# The columns are the sum of the responses, the sample size, and the sample variance 
historical[1,] <- c(50, 50, 1)
historical[2,] <- c(30, 50, 1)
historical[3,] <- c(20, 50, 1)

# Generate sampling priors
set.seed(1)
samp.prior.mu.t <- rnorm(50000)
samp.prior.mu.c <- rnorm(50000)

sub_ind <- which(samp.prior.mu.t < samp.prior.mu.c) 
# Here, mass is put on the alternative region, so power is calculated. 
samp.prior.mu.t <- samp.prior.mu.t[sub_ind]
samp.prior.mu.c <- samp.prior.mu.c[sub_ind]
samp.prior.var.t <- rgamma(100, 1, 1)
samp.prior.var.c <- rgamma(100, 1, 1)

We use the default prior on a₀, the uniform prior. The average posterior means of a₀ and τ are also returned below. We run 10, 000 iterations of the Gibbs sampler (for μ_c) and the slice sampler (for a₀) for N = 100 simulated datasets. Note that N should be larger in practice.

set.seed(1)
results <- power.two.grp.random.a0(data.type=data.type, n.t=n.t, n.c=n.t, historical=historical,  
           samp.prior.mu.t=samp.prior.mu.t, samp.prior.mu.c=samp.prior.mu.c, 
           samp.prior.var.t=samp.prior.var.t, samp.prior.var.c=samp.prior.var.c,
           delta=0, nMC=10000, nBI=250, N=100) 

summary(results)
#>      average posterior mean   bias
#> mu_t                 -0.646 -0.074
#> mu_c                  0.614  0.053
#> The power/type I error rate is  0.77 .
#> The average of the posterior probabilities of P(mu_t - mu_c < delta) is 0.823 .
results$`average posterior means of a0`
#>           [,1]
#> [1,] 0.1752784
#> [2,] 0.1486350
#> [3,] 0.1751385
results$`average posterior mean of tau`
#> [1] 3.64496

Next, to calculate type I error, we can provide the sampling prior of μ_t and μ_c such that the mass of μ_t − μ_c > = 0.

# Generate sampling priors
set.seed(1)
samp.prior.mu.t <- rnorm(50000)
samp.prior.mu.c <- rnorm(50000)

sub_ind <- which(samp.prior.mu.t >= samp.prior.mu.c) 
# Here, mass is put on the null region, so type I error rate is calculated. 
samp.prior.mu.t <- samp.prior.mu.t[sub_ind]
samp.prior.mu.c <- samp.prior.mu.c[sub_ind]

set.seed(1)
results <- power.two.grp.random.a0(data.type=data.type, n.t=n.t, n.c=n.t, historical=historical,  
           samp.prior.mu.t=samp.prior.mu.t, samp.prior.mu.c=samp.prior.mu.c, 
           samp.prior.var.t=samp.prior.var.t, samp.prior.var.c=samp.prior.var.c,
           delta=0, nMC=10000, nBI=250, N=100) 

summary(results)
#>      average posterior mean   bias
#> mu_t                  0.621  0.053
#> mu_c                 -0.597 -0.032
#> The power/type I error rate is  0.12 .
#> The average of the posterior probabilities of P(mu_t - mu_c < delta) is 0.151 .

- 1. Two Group Cases with Binary Outcomes
  - 1.1 Power Prior with Fixed a₀
  - 1.2 Normalized Power Prior (a₀ Modeled as Random)
- 2. Two Group Cases with Normally Distributed Outcomes
  - 2.1 Power Prior with Fixed a₀
  - 2.2 Normalized Power Prior (a₀ Modeled as Random)

Bayesian Sample Size Determination for Two Group Models (Binary and Normal Outcomes)

1. Two Group Cases with Binary Outcomes

1.1 Power Prior with Fixed a0

1.2 Normalized Power Prior (a0 Modeled as Random)

2. Two Group Cases with Normally Distributed Outcomes

2.1 Power Prior with Fixed a0

2.2 Normalized Power Prior (a0 Modeled as Random)

1.1 Power Prior with Fixed a₀

1.2 Normalized Power Prior (a₀ Modeled as Random)

2.1 Power Prior with Fixed a₀

2.2 Normalized Power Prior (a₀ Modeled as Random)