Bayesian Sample Size Determination for Two Group Models (Binary and Normal Outcomes)

For two group models (i.e., treatment and control group with no covariates), we denote the parameter for the treatment group by μt and the parameter for the control group by μc. The default null and alternative hypotheses are given by H0 : μt − μc ≥ δ and H1 : μt − μc < δ, where δ is a prespecified constant.

We use the following definition of Bayesian power / type I error rate.

Let Θ0 and Θ1 denote the parameter spaces corresponding to H0 and H1. Let y(n) denote the simulated current data associated with a sample size of n and let θ = (μt, μc, τc) denote the model parameters. Let π(s)(θ) denote the sampling prior and let π(f)(θ) denote the fitting prior. The sampling prior is used to generate the hypothetical data while the fitting prior is used to fit the model after the data is generated. Let π0(s)(θ) denote a sampling prior that only puts mass in the null region, i.e., θ ⊂ Θ0. Let π1(s)(θ) denote a sampling prior that only puts mass in the alternative region, i.e., θ ⊂ Θ1. To determine Bayesian sample size, we estimate the quantity βsj(n) = Es[I{P(μt − μc < δ|y(n), π(f)) ≥ γ}] where j = 0 or 1, corresponding to the expectation taken with respect to π0(s)(θ) or π1(s)(θ). The constant γ > 0 is a prespecified posterior probability threshold for rejecting the null hypothesis (e.g., 0.975). The probability is computed with respect to the posterior distribution given the simulated data y(n) and the fitting prior π(f)(θ), and the expectation is taken with respect to the marginal distribution of y(n) defined based on the sampling prior π(s)(θ). Then βs0(n) corresponding to π(s)(θ) = π0(s)(θ) is the Bayesian type I error rate, while βs1(n) corresponding to π(s)(θ) = π1(s)(θ) is the Bayesian power.

1. Two Group Cases with Binary Outcomes

We first demonstrate a model for binary outcomes for treatment and control groups with no covariates.

We consider the non-inferiority design application of Chen et al. (2011). The goal was to design a trial to evaluate a new generation of drug-eluting stent (DES) (“test device”) with the first generation of DES (“control device”). The primary endpoint is the 12-month Target Lesion Failure (TLF) (binary).

Historical information can be borrowed from two previously conducted trials involving the first generation of DES. The table below summarizes the historical data.

Summary of the historical data.
% TLF (# of failure/sample size)
Historical Trial 1 8.2% (44/535)
Historical Trial 2 10.9% (33/304)

Let yt(nt) = (yt1, ⋯, ytnt) and yc(nc) = (yc1, ⋯, ycnc) denote the responses from the current trial for the test device and the control device, respectively. The total sample size is n = nt + nc.

We assume the i-th observation from the test group yti follows Bern(μt), and the i-th observation from the control group yci follows Bern(μc).

We will illustrate Bayesian sample size determination (SSD) incorporating historical data using the power prior with fixed a0 and the normalized power for a0 modeled as random.

The hypotheses for non-inferiority testing are H0 : μt − μc ≥ δ and H1 : μt − μc < δ, where δ is a prespecified non-inferiority margin. We set δ = 4.1%.

We choose beta(10−4, 10−4) for the initial prior for μc, which performs similarly to the uniform improper initial prior for $\log\left(\frac{\mu_c}{1-\mu_c}\right)$ used in Chen et al. (2011) in terms of operating characteristics.

Power is computed under the assumption that μt = μc and type I error rate is computed under the assumption that μt = μc + δ.

For sampling priors, a point mass prior at μc = 9.2% is used for π(s)(μc) where 9.2% is the pooled proportion for the two historical control datasets, and a point mass prior at μt = μc is used for π(s)(μt).

For all computations, we use N = 10, 000, $\frac{n_t}{n_c} = 3$, and γ = 0.95.

1.1 Power Prior with Fixed a0

When a0 is fixed, the historical matrix is defined where each row represents a historical dataset, and the three columns represent the sum of responses, sample size and a0, respectively, of the historical control data. We use a01 = a02 = 0.3.


historical <- matrix(0, ncol=3, nrow=2)
historical[1,] <- c(44, 535, 0.3)
historical[2,] <- c(33, 304, 0.3)

We consider nt values ranging from 600 to 1000 to achieve the desired power of 0.8.

Since point mass sampling priors are used for μt and μc, samp.prior.mu.t and samp.prior.mu.c are both scalars.

For Bernoulli outcomes, beta initial priors are used for μt and μc, with hyperparameters specified by prior.mu.t.shape1, prior.mu.t.shape2, prior.mu.c.shape1 and prior.mu.c.shape2.


library(BayesPPD)

set.seed(1)
n.t_vals <- seq(from=600, to=1000, by=50)
powers <- NULL

for(i in 1:length(n.t_vals)){
  n.t <- n.t_vals[i]
  results <- power.two.grp.fixed.a0(data.type="Bernoulli", 
      n.t=n.t, n.c=round(n.t/3), historical=historical,
      samp.prior.mu.t=0.092, samp.prior.mu.c=0.092,
      prior.mu.t.shape1=0.0001, prior.mu.t.shape2=0.0001, 
      prior.mu.c.shape1=0.0001,prior.mu.c.shape2=0.0001,
      delta=0.041, N=10000)
  power <- results$`power/type I error`
  powers <- c(powers, power)
}

powers
#> [1] 0.7819 0.8112 0.8220 0.8383 0.8588 0.8763 0.8865 0.8922 0.9084

We can see that a sample size of 650 is required to achieve a power of at least 0.8.

A power curve is plotted below for sample sizes ranging from 600 to 1000.

library(ggplot2)

df <- data.frame(sample_size=n.t_vals, power=powers)
ggplot(data=df, aes(x=sample_size, y=powers)) +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE) +
  geom_point() +
  xlab("Sample Size") +
  ylab("Power")

We then compute the type I error rate for these sample sizes.

Since the type I error rate is computed under the assumption that μt = μc + δ, we use a point mass at μc = 9.2% for the sampling prior for μc, and a point mass at μt = 9.2% + 4.1% for the sampling prior for μt.

The following type I error rate calculations match the results given in Table 2 of Chen et al. (2011).

TIEs <- NULL

for(i in 1:length(n.t_vals)){
  n.t <- n.t_vals[i]
  results <- power.two.grp.fixed.a0(data.type="Bernoulli", 
      n.t=n.t, n.c=round(n.t/3), historical=historical,
      samp.prior.mu.t=0.092+0.041, samp.prior.mu.c=0.092,
      prior.mu.t.shape1=0.0001, prior.mu.t.shape2=0.0001, 
      prior.mu.c.shape1=0.0001,prior.mu.c.shape2=0.0001,
      delta=0.041, N=10000)
  TIE <- results$`power/type I error`
  TIEs <- c(TIEs, TIE)
}

TIEs
#> [1] 0.0275 0.0299 0.0310 0.0290 0.0307 0.0313 0.0295 0.0300 0.0316

1.2 Normalized Power Prior (a0 Modeled as Random)

When a0 is modeled as random, the normalized power prior is used and the priors for a01 and a02 are beta(1,1), as in Chen et al. (2011). We run 10,000 iterations of the slice sampler. We use the default settings for the upper limits, lower limits and slice widths for a01 and a02. The same initial priors and sampling priors are used as in the fixed a0 case.

When a0 is modeled as random, the historical matrix is defined where each row represents a historical dataset, and the two columns represent the sum of the responses and the sample size, respectively.

historical <- matrix(0, ncol=2, nrow=2)
historical[1,] <- c(44, 535)
historical[2,] <- c(33, 304)

The code below computes the power when nt = 750.

n.t <- 750
results <- power.two.grp.random.a0(data.type="Bernoulli", 
      n.t=n.t, n.c=round(n.t/3),historical=historical,
      samp.prior.mu.t=0.092, samp.prior.mu.c=0.092,
      prior.mu.t.shape1=0.0001, prior.mu.t.shape2=0.0001, 
      prior.mu.c.shape1=0.0001,prior.mu.c.shape2=0.0001,
      prior.a0.shape1=1,prior.a0.shape2=1,
      delta=0.041, gamma=0.95,
      nMC=10000, nBI=250, N=10000)
summary(results)

2. Two Group Cases with Normally Distributed Outcomes

We now demonstrate a model for normally distributed outcomes for treatment and control groups with no covariates. We use simulated data for this example.

We assume the i-th observation from the treatment group yti follows N(μt, τ−1) and the i-th observation from the control group yci follows N(μc, τ−1), where τ is the precision parameter for the current data.

The null hypothesis is H0 : μt − μc ≥ δ. We set δ = 0.

We assume the treatment group sample size (nt) and the control group sample size (nc) are both 100.

2.1 Power Prior with Fixed a0

First, we assume a0 is fixed. We simulate three historical datasets. For normally distributed data, the historical matrix is defined where each row represents a historical dataset, and the four columns represent the sum of the responses, the sample size, the sample variance and a0, respectively.

data.type <- "Normal"
n.t <- 100
n.c <- 100

# Simulate three historical datasets
K <- 3
historical <- matrix(0, ncol=4, nrow=K)
# The columns are the sum of the responses, the sample size, the sample variance and a_0
historical[1,] <- c(50, 50, 1, 0.3)
historical[2,] <- c(30, 50, 1, 0.5)
historical[3,] <- c(20, 50, 1, 0.7)

To calculate power, we can provide the sampling prior of μt and μc such that the mass of μt − μc < 0. We generate the sampling prior for the variance parameter from a Gamma(1, 1) distribution.

# Generate sampling priors
set.seed(1)
samp.prior.mu.t <- rnorm(50000)
samp.prior.mu.c <- rnorm(50000)

sub_ind <- which(samp.prior.mu.t < samp.prior.mu.c) 
# Here, mass is put on the alternative region, so power is calculated. 
samp.prior.mu.t <- samp.prior.mu.t[sub_ind]
samp.prior.mu.c <- samp.prior.mu.c[sub_ind]
samp.prior.var.t <- rgamma(100, 1, 1)
samp.prior.var.c <- rgamma(100, 1, 1)

We run 10, 000 iterations of the Gibbs sampler for N = 100 simulated datasets. Note that N should be larger in practice.

set.seed(1)
results <- power.two.grp.fixed.a0(data.type=data.type, n.t=n.t, n.c=n.t, historical=historical,  
           samp.prior.mu.t=samp.prior.mu.t, samp.prior.mu.c=samp.prior.mu.c, 
           samp.prior.var.t=samp.prior.var.t, samp.prior.var.c=samp.prior.var.c,
           delta=0, nMC=10000, nBI=250, N=100) 

summary(results)
#>      average posterior mean   bias
#> mu_t                 -0.580 -0.007
#> mu_c                  0.535 -0.026
#> The power/type I error rate is  0.79 .
#> The average of the posterior probabilities of P(mu_t - mu_c < delta) is 0.842 .

Next, to calculate type I error, we can provide the sampling prior of μt and μc such that the mass of μt − μc >  = 0.

# Generate sampling priors
set.seed(1)
samp.prior.mu.t <- rnorm(50000)
samp.prior.mu.c <- rnorm(50000)

sub_ind <- which(samp.prior.mu.t >= samp.prior.mu.c) 
# Here, mass is put on the null region, so type I error rate is calculated. 
samp.prior.mu.t <- samp.prior.mu.t[sub_ind]
samp.prior.mu.c <- samp.prior.mu.c[sub_ind]

set.seed(1)
results <- power.two.grp.fixed.a0(data.type=data.type, n.t=n.t, n.c=n.t, historical=historical,  
           samp.prior.mu.t=samp.prior.mu.t, samp.prior.mu.c=samp.prior.mu.c, 
           samp.prior.var.t=samp.prior.var.t, samp.prior.var.c=samp.prior.var.c,
           delta=0, nMC=10000, nBI=250, N=100) 

summary(results)
#>      average posterior mean   bias
#> mu_t                  0.488 -0.079
#> mu_c                 -0.454  0.111
#> The power/type I error rate is  0.14 .
#> The average of the posterior probabilities of P(mu_t - mu_c < delta) is 0.186 .

2.2 Normalized Power Prior (a0 Modeled as Random)

Next, we model a0 as random with the normalized power prior. We simulate three historical datasets. Here, the historical matrix is defined where each row represents a historical dataset, and the three columns represent the sum of the responses, the sample size, and the sample variance, respectively.

data.type <- "Normal"
n.t <- 100
n.c <- 100

# Simulate three historical datasets
K <- 3
historical <- matrix(0, ncol=3, nrow=K)
# The columns are the sum of the responses, the sample size, and the sample variance 
historical[1,] <- c(50, 50, 1)
historical[2,] <- c(30, 50, 1)
historical[3,] <- c(20, 50, 1)

To calculate power, we can provide the sampling prior of μt and μc such that the mass of μt − μc < 0. We generate the sampling prior for the variance parameter from a Gamma(1, 1) distribution.

# Generate sampling priors
set.seed(1)
samp.prior.mu.t <- rnorm(50000)
samp.prior.mu.c <- rnorm(50000)

sub_ind <- which(samp.prior.mu.t < samp.prior.mu.c) 
# Here, mass is put on the alternative region, so power is calculated. 
samp.prior.mu.t <- samp.prior.mu.t[sub_ind]
samp.prior.mu.c <- samp.prior.mu.c[sub_ind]
samp.prior.var.t <- rgamma(100, 1, 1)
samp.prior.var.c <- rgamma(100, 1, 1)

We use the default prior on a0, the uniform prior. The average posterior means of a0 and τ are also returned below. We run 10, 000 iterations of the Gibbs sampler (for μc) and the slice sampler (for a0) for N = 100 simulated datasets. Note that N should be larger in practice.

set.seed(1)
results <- power.two.grp.random.a0(data.type=data.type, n.t=n.t, n.c=n.t, historical=historical,  
           samp.prior.mu.t=samp.prior.mu.t, samp.prior.mu.c=samp.prior.mu.c, 
           samp.prior.var.t=samp.prior.var.t, samp.prior.var.c=samp.prior.var.c,
           delta=0, nMC=10000, nBI=250, N=100) 

summary(results)
#>      average posterior mean  bias
#> mu_t                 -0.451 0.122
#> mu_c                  0.574 0.013
#> The power/type I error rate is  0.78 .
#> The average of the posterior probabilities of P(mu_t - mu_c < delta) is 0.865 .
results$`average posterior means of a0`
#>           [,1]
#> [1,] 0.4217424
#> [2,] 0.4130283
#> [3,] 0.4330019
results$`average posterior mean of tau`
#> [1] 1.121705

Next, to calculate type I error, we can provide the sampling prior of μt and μc such that the mass of μt − μc >  = 0.

# Generate sampling priors
set.seed(1)
samp.prior.mu.t <- rnorm(50000)
samp.prior.mu.c <- rnorm(50000)

sub_ind <- which(samp.prior.mu.t >= samp.prior.mu.c) 
# Here, mass is put on the null region, so type I error rate is calculated. 
samp.prior.mu.t <- samp.prior.mu.t[sub_ind]
samp.prior.mu.c <- samp.prior.mu.c[sub_ind]

set.seed(1)
results <- power.two.grp.random.a0(data.type=data.type, n.t=n.t, n.c=n.t, historical=historical,  
           samp.prior.mu.t=samp.prior.mu.t, samp.prior.mu.c=samp.prior.mu.c, 
           samp.prior.var.t=samp.prior.var.t, samp.prior.var.c=samp.prior.var.c,
           delta=0, nMC=10000, nBI=250, N=100) 

summary(results)
#>      average posterior mean   bias
#> mu_t                  0.531 -0.037
#> mu_c                 -0.235  0.330
#> The power/type I error rate is  0.16 .
#> The average of the posterior probabilities of P(mu_t - mu_c < delta) is 0.207 .