Computation of Fisher information matrix
- Example 1: GLM Fisher information matrix
- Example 2: MLM Fisher information matrix
Applications in paid research studies
- Example 3: trial_data & constrained sampling with GLM
- Example 4: trauma_data & constrained sampling with MLM
References

library(CDsampling)

In the context of paid research studies and clinical trials, budget considerations and patient sampling from available populations are subject to inherent constraints. CDsampling integrates optimal design theories within the framework of constrained sampling.

This package offers the possibility to find both D-optimal approximate and exact allocations for sampling with or without constraints.
Additionally, it provides functions to find constrained uniform sampling as a robust sampling strategy with limited model information.
It also provides tool for computation of Fisher information matrix of the generalized linear models (GLMs) including regular linear regression model and the multinomial logistic models (MLMs).
Two datasets are embedded in the package for application examples.

Computation of Fisher information matrix

Example 1: GLM Fisher information matrix

Consider a research study with a simple logistic regression model \[\log(\frac{\mu_i}{1-\mu_i}) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}\] where \(\mu_i = E(Y_i\mid {\mathbf x}_i)\), \({\mathbf x}_i = (x_{i1}, x_{i2})^\top \in \{(-1, -1), (-1, +1), (+1, -1)\}\) and parameters \(\boldsymbol \beta = (\beta_0, \beta_1, \beta_2) = (0.5, 0.5, 0.5)\). In this example, we have \(m=3\) design points \((x_{i1}, x_{i2})^\top \in \{(-1, -1), (-1, +1), (+1, -1)\}\), the design matrix \(\mathbf X\) (with the first column for the intercept): \[\begin{bmatrix} 1 & -1 & -1 \\ 1 & -1 & 1 \\ 1 & 1 & -1\\ \end{bmatrix}\].

To calculate Fisher information matrix of the design with GLM, we can use F_func_GLM( ) in the package with input of approximate allocation \(w\), coefficients \(\boldsymbol \beta\), and design matrix \(\mathbf X\).

beta = c(0.5, 0.5, 0.5) #coefficients
X = matrix(data=c(1,-1,-1,1,-1,1,1,1,-1), byrow=TRUE, nrow=3) #design matrix
w = c(1/3,1/3,1/3) #approximate allocation
CDsampling::F_func_GLM(w=w, beta=beta, X=X, link='logit')
#> Dimensions: 3 x 3 
#> Matrix:
#> ------------------------------ 
#>      [,1]        [,2]        [,3]       
#> [1,]  0.23500371 -0.07833457 -0.07833457
#> [2,] -0.07833457  0.23500371 -0.07833457
#> [3,] -0.07833457 -0.07833457  0.23500371
#> ------------------------------

Example 2: MLM Fisher information matrix

Consider a research study with cumulative non-proportional odds multinomial logit model with \(J=5\) response levels and covariates \((x_{i1}, x_{i2})=\{(1,0),(2,0),(3,0), (4,0),(1,1),(2,1),(3,1),(4,1)\}\). The model can be written as:\[\log(\frac{\pi_{i1}}{\pi_{i2}+\dots+\pi_{i5}}) = \beta_{11}+\beta_{12}x_{i1}+\beta_{13}x_{i2}\] \[\log(\frac{\pi_{i1}+\pi_{i2}}{\pi_{i3}+\pi_{i4}+\pi_{i5}}) = \beta_{21}+\beta_{22}x_{i1}+\beta_{23}x_{i2}\] \[\log(\frac{\pi_{i1}+\pi_{i2}+\pi_{i3}}{\pi_{i4}+\pi_{i5}}) = \beta_{31}+\beta_{32}x_{i1}+\beta_{33}x_{i2}\] \[\log(\frac{\pi_{i1}+\dots+\pi_{i4}}{\pi_{i5}}) = \beta_{41}+\beta_{42}x_{i1}+\beta_{43}x_{i2}\] where \(i=1,\dots,8\). We have \(m=8\) design points and \(p=12\) parameters. We assume the parameters \(\boldsymbol \beta = (\beta_{11}, \beta_{12}, \beta_{13}, \beta_{21}, \beta_{22}, \beta_{23}, \beta_{31}, \beta_{32}, \beta_{33}, \beta_{41}, \beta_{42}, \beta_{43})^\top = (-4.047, -0.131, 4.214, -2.225, -0.376,\) \(3.519, -0.302, -0.237, 2.420, 1.386, -0.120, 1.284)^\top\). The approximate allocation for the eight design points is \(\mathbf w = (1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)^\top\).

To calculate the Fisher information matrix with MLM, we can use the F_func_MLM( ) with input of approximate allocation \(w\), covariate coefficients \(\boldsymbol \beta\), design matrix \(\mathbf X\), and multinomial logit model (cumulative for this example).

The design matrix incorporates all \(8\) design points of covariates \((x_{i1}, x_{i2})\) specified by the cumulative logit model’s four equations. For example, when \(x_{i1}=1\), \(x_{i2}=0\), the design matrix takes the following format:

\[\begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \end{bmatrix}\].

J=5; p=12; m=8; #response levels; num of parameters; num of design points
beta = c(-4.047, -0.131, 4.214, -2.225, -0.376, 3.519, -0.302,
    -0.237,  2.420, 1.386,  -0.120,  1.284) 
Xi=rep(0,J*p*m) #design matrix   
dim(Xi)=c(J,p,m)
Xi[,,1] = rbind(c( 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

Xi[,,2] = rbind(c( 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

Xi[,,3] = rbind(c( 1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 3, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

 Xi[,,4] = rbind(c( 1, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 4, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 4, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

 Xi[,,5] = rbind(c( 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

 Xi[,,6] = rbind(c( 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 1),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

 Xi[,,7] = rbind(c( 1, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 3, 1, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 3, 1, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 1),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

 Xi[,,8] = rbind(c( 1, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 4, 1, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 4, 1, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 1),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
 alloc = rep(1/8,m) #approximate allocation
 CDsampling::F_func_MLM(w=alloc, beta=beta, X=Xi, link='cumulative')
#> Dimensions: 12 x 12 
#> Matrix:
#> ------------------------------------------------------------------------------------------------------------------------ 
#>       [,1]        [,2]        [,3]        [,4]        [,5]        [,6]       
#>  [1,]  0.44505694  1.37915564  0.43609135 -0.37247296 -1.21252053 -0.36379349
#>  [2,]  1.37915564  4.78410934  1.35694145 -1.21252053 -4.31436344 -1.19085831
#>  [3,]  0.43609135  1.35694145  0.43609135 -0.36379349 -1.19085831 -0.36379349
#>  [4,] -0.37247296 -1.21252053 -0.36379349  0.51192600  1.55177413  0.48018678
#>  [5,] -1.21252053 -4.31436344 -1.19085831  1.55177413  5.31193908  1.48241981
#>  [6,] -0.36379349 -1.19085831 -0.36379349  0.48018678  1.48241981  0.48018678
#>  [7,]  0.00000000  0.00000000  0.00000000 -0.09154268 -0.22027625 -0.07471168
#>  [8,]  0.00000000  0.00000000  0.00000000 -0.22027625 -0.64323991 -0.18445802
#>  [9,]  0.00000000  0.00000000  0.00000000 -0.07471168 -0.18445802 -0.07471168
#> [10,]  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
#> [11,]  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
#> [12,]  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
#>       [,7]        [,8]        [,9]        [,10]       [,11]       [,12]      
#>  [1,]  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
#>  [2,]  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
#>  [3,]  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
#>  [4,] -0.09154268 -0.22027625 -0.07471168  0.00000000  0.00000000  0.00000000
#>  [5,] -0.22027625 -0.64323991 -0.18445802  0.00000000  0.00000000  0.00000000
#>  [6,] -0.07471168 -0.18445802 -0.07471168  0.00000000  0.00000000  0.00000000
#>  [7,]  0.29320484  0.71978889  0.16205254 -0.10435894 -0.25399393 -0.06194131
#>  [8,]  0.71978889  2.13312603  0.41294396 -0.25399393 -0.74842743 -0.15305771
#>  [9,]  0.16205254  0.41294396  0.16205254 -0.06194131 -0.15305771 -0.06194131
#> [10,] -0.10435894 -0.25399393 -0.06194131  0.17861575  0.45287619  0.06925715
#> [11,] -0.25399393 -0.74842743 -0.15305771  0.45287619  1.37180187  0.17395849
#> [12,] -0.06194131 -0.15305771 -0.06194131  0.06925715  0.17395849  0.06925715
#> ------------------------------------------------------------------------------------------------------------------------

Applications in paid research studies

CDsampling package addresses constrained sampling problems in paid research studies or clinical trials where the number of qualified volunteers exceeds the available budget. The CDsampling package implements the designer’s sampling strategy and utilizes the constrained lift-one algorithm to find the optimal sample sizes for different subgroups with the goal to achieve the most accurate model estimates.

Example 3: trial_data & constrained sampling with GLM

The trial_data is a simulated dataset containing information for \(N=500\) volunteers with gender, age, and final efficacy information. The covariates considered in this example are:

1). Gender:

\(0\): for female
\(1\) for male

2). Age group:

\(age\_1 = 0\) and \(age\_2 = 0\): ages \(18\sim25\)
\(age\_1 = 1\) and \(age\_2 = 0\): ages \(26\sim64\)
\(age\_1 = 0\) and \(age\_2 = 1\): ages \(65\) and above.

There are \(m=6\) design points, that is, the number of combinations of gender and age groups \((x_{gender\_i}, x_{age\_1i}, x_{age\_2i})\):

1). \((0,0,0)\): Female, 18-25 \(N_1=50\))

2). \((0,1,0)\): Female, 26-64 (\(N_2=40\))

3). \((0,0,1)\): Female, 65+ (\(N_3=10\))

4). \((1,0,0)\): Male, 18-25 (\(N_4=200\))

5). \((1,1,0)\): Male, 26-64 (\(N_5=150\))

6). \((1,0,1)\): Male, 65+ (\(N_6=50\))

Suppose that a sample of \(n=200\) participants is required due to budget limit.Our goal is to find the constrained D-optimal allocation \((w_1, w_2, \dots, w_6)\) with feasible allocation \[S = \{(w_1, \ldots, w_m)^T \in S_0 \mid n w_i \leq N_i, i=1, \ldots, m\}.\]

We use constrained lift-one algorithm liftone_constrained_GLM( ) to find the locally D-optimal approximate sampling allocations with the input of design matrix \(X\), \(\mathbf W\) matrix which is the result returned from W_func_GLM( ), constraints setup (g.con, g.dir, and g.rhs), and boundaries in searching for lift-one weight (step 3 in the constrained lift-one algorithm, see reference).

We consider the logistic regression model for \(j=1,\dots,m\), \(i=1,\dots,n_j\) with \(\boldsymbol \beta=(\beta_{1}, \beta_{21}, \beta_{22})= (0,3,3,3)\):

\[\begin{equation}\label{eq:trial_logistic_model} {\rm logit} \{P(Y_{ij}=1 \mid x_{gender\_i}, x_{age\_1i}, x_{age\_2i})\} = \beta_0 + \beta_1 x_{gender\_i} + \beta_{21} x_{age\_1i} + \beta_{22} x_{age\_2i} \end{equation}\]

Use the following R codes to define the coefficients, sample size, and design matrix:

beta = c(0, 3, 3, 3) #coefficients 
#design matrix X
X=matrix(data=c(1,0,0,0,1,0,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,0,1), ncol=4, byrow=TRUE) 
nsample=200 #sample size

To run the liftone_constrained_GLM( ) function, we also need to the \(\mathbf W\) matrix from the calculation of Fisher information matrix, we can use the W_func_GLM( ) function in the package:

W_matrix=CDsampling::W_func_GLM(X=X, b=beta, link="logit") #W matrix

Lastly, we also need to define the constraints (number of patients from different gender and age group) and boundaries for constrained sampling (please see the reference for details of lower bound \(r_{i1}\) and upper bound \(r_{i2}\)):

rc = c(50, 40, 10, 200, 150, 50)/nsample #constraints for each subgroup
m = 6
g.con = matrix(0,nrow=(2*m+1), ncol=m)
g.con[1,] = rep(1, m)
g.con[2:(m+1),] = diag(m)
g.con[(m+2):(2*m+1), ] = diag(m)
g.dir = c("==", rep("<=", m), rep(">=", m))
g.rhs = c(1, rc, rep(0, m))

lower.bound=function(i, w){
  nsample = 200
  rc = c(50, 40, 10, 200, 150, 50)/nsample
  m=length(w) #num of categories
  temp = rep(0,m)
  temp[w>0]=1-pmin(1,rc[w>0])*(1-w[i])/w[w>0];
  temp[i]=0;
  max(0,temp);
}

upper.bound=function(i, w){
  nsample = 200
  rc = c(50, 40, 10, 200, 150, 50)/nsample
  m=length(w) #num of categories
  rc[i];
  min(1,rc[i])
}

We can define an optional subgroups label for the constrained approximate allocations:

label = c("F, 18-25", "F, 26-64", "F, >=65", "M, 18-25", "M, 26-64", "M, >=65")

Now, we can run the constrained lift-one algorithm to find optimal approximate allocations for the defined subgroups:

set.seed(2025)
approximate_design = CDsampling::liftone_constrained_GLM(X=X, W=W_matrix, g.con=g.con, g.dir=g.dir, 
                                               g.rhs=g.rhs, lower.bound=lower.bound, 
                                               upper.bound=upper.bound, label=label, reltol=1e-10, 
                                               maxit=100, random=TRUE, nram=4, w00=NULL, 
                                               epsilon=1e-8)

print(approximate_design)
#> Optimal Sampling Results:
#> ================================================================================
#> Optimal approximate allocation:
#>    F, 18-25 F, 26-64 F, >=65 M, 18-25 M, 26-64 M, >=65
#> w  0.25     0.2      0.05    0.5      0.0      0.0    
#> w0 0.25     0.2      0.05    0.5      0.0      0.0    
#> --------------------------------------------------------------------------------
#> maximum :
#> 2.8813e-08 
#> --------------------------------------------------------------------------------
#> convergence :
#> TRUE 
#> --------------------------------------------------------------------------------
#> itmax :
#> 1.0 
#> --------------------------------------------------------------------------------
#> deriv.ans :
#> 0.0, 3.6017e-08, 4.8528e-07, -1.1525e-07, -1.0310e-07, -7.9507e-08 
#> --------------------------------------------------------------------------------
#> gmax :
#> 0.0 
#> --------------------------------------------------------------------------------
#> reason :
#> "gmax <= 0" 
#> --------------------------------------------------------------------------------

The output contains several key components:

\(w\): the converged D-optimal approximate
\(w_0\): random initial weights used for the optimization
maximum: the achieved maximum determinant of the Fisher information matrix
reason: criteria for lift-one loop termination with either “all derivative <=0”, and “gmax <=0” (see reference).

To find the exact allocation (integer value of allocation), we can use the approxtoexact_constrained_func( ) with input of sample size \(n\), approximate allocation found by constrained lift-one algorithm, number of design point \(m\), coefficients beta, link type of GLM (“logit” in this example), Fdet_func=Fdet_func_GLM, and design matrix \(X\):

exact_design = CDsampling::approxtoexact_constrained_func(n=200, w=approximate_design$w, m=6, 
                                                           beta=beta, link='logit', X=X, 
                                                           Fdet_func=Fdet_func_GLM, 
                                                          iset_func=iset_func_trial, label=label) 

print(exact_design)
#> Optimal Sampling Results:
#> ================================================================================
#> Optimal exact allocation:
#>                 F, 18-25 F, 26-64 F, >=65 M, 18-25 M, 26-64 M, >=65
#> allocation      50.0     40.0     10.0    100.0    0.0      0.0    
#> allocation.real 0.25     0.2      0.05    0.5      0.0      0.0    
#> --------------------------------------------------------------------------------
#> det.maximum :
#> 46.1012 
#> --------------------------------------------------------------------------------

The allocation represents the final D-optimal exact allocation of the constrained sampling, while allocation.real is the approximate allocation found in the previous step.

Note: When prior estimates of beta coefficients are unavailable, consider using EW D-optimal allocations as an alternative option. EW D-optimal method requires prior distributions of beta coefficients, which substitutes the coefficients value with expectation of prior distributions in the computaion. (Please see reference for details).

If there is no knowledge of beta coefficients, we recommend using constrained uniform design with approxtoexact_constrained_func( ). We adjust Fdet_func =Fdet_func_unif and the approximate application \(w\) to be \(1\) subject in each subgroup. In this example, we have \(n=200\) sample thus \(w=(1/200, 1/200, 1/200, 1/200, 1/200, 1/200)\):

w00 = rep(1/200, 6)
unif_design = CDsampling::approxtoexact_constrained_func(n=200, w=w00, m=6, beta=NULL, 
  link=NULL, X=NULL, Fdet_func=Fdet_func_unif, iset_func=iset_func_trial)

print(unif_design)
#> Optimal Sampling Results:
#> ================================================================================
#> Optimal exact allocation:
#>                 1     2     3     4     5     6    
#> allocation      38.0  38.0  10.0  38.0  38.0  38.0 
#> allocation.real 0.005 0.005 0.005 0.005 0.005 0.005
#> --------------------------------------------------------------------------------
#> det.maximum :
#> 792351680.0 
#> --------------------------------------------------------------------------------

The allocation represents the constrained uniform exact allocation, while allocation.real is the initial allocation we provided.

Example 4: trauma_data & constrained sampling with MLM

The trauma_data contains information on \(N=802\) trauma patients with five ordered treatment outcomes:

1). Death (\(1\))

2). Vegetative state (\(2\))

3). Major disability (\(3\))

4). Minor disability (\(4\))

5). Good recovery (\(5\)).

We consider two key covariates:

1). Treatment dose levels:

\(x_{i1} = 1\): Placebo
\(x_{i1} = 2\): Low dose
\(x_{i1} = 3\): Medium dose
\(x_{i1} = 4\): High dose.

2). Trauma symptoms:

\(x_{i2} = 0\): mild symptoms (\(392\) mild symptoms patients)
\(x_{i2} = 1\): moderate /severe symptoms (\(410\) moderate/severe symptoms patients),

This creates \(m=8\) design points from the combination of all covariates levels.

In this study, we plan to enroll \(n=600\) patients and the constraint comes from the available patients in the two trauma symptoms subgroups (\(392\) and \(410\)). The collection of all feasible approximate allocation is: \[S=\{(w_1, \ldots, w_8)^\top \in S_0\mid n(w_1+w_2+w_3+w_4) \leq 392, n(w_5+w_6+w_7+w_8) \leq 410\}.\] This condition ensures that the number of mild patients sampled from all the four dose levels does not exceed the available mild patients, and the same applies to the moderate/severe patients.

The cumulative non-proportional logit model is specified as:\[\log(\frac{\pi_{i1}}{\pi_{i2}+\dots+\pi_{i5}}) = \beta_{11}+\beta_{12}x_{i1}+\beta_{13}x_{i2}\] \[\log(\frac{\pi_{i1}+\pi_{i2}}{\pi_{i3}+\pi_{i4}+\pi_{i5}}) = \beta_{21}+\beta_{22}x_{i1}+\beta_{23}x_{i2}\] \[\log(\frac{\pi_{i1}+\pi_{i2}+\pi_{i3}}{\pi_{i4}+\pi_{i5}}) = \beta_{31}+\beta_{32}x_{i1}+\beta_{33}x_{i2}\] \[\log(\frac{\pi_{i1}+\dots+\pi_{i4}}{\pi_{i5}}) = \beta_{41}+\beta_{42}x_{i1}+\beta_{43}x_{i2}\]

with the assumed parameter vector \(\boldsymbol\beta = (\hat\beta_{11}, \hat\beta_{12}, \hat\beta_{13}, \hat\beta_{21}, \hat\beta_{22}, \hat\beta_{23}, \hat\beta_{31}, \hat\beta_{32}, \hat\beta_{33}, \hat\beta_{41}, \hat\beta_{42}, \hat\beta_{43})^\top = (-4.047, -0.131, 4.214, -2.225, -0.376,\\ 3.519, -0.302, -0.237, 2.420, 1.386, -0.120, 1.284)^\top\).

For this example, we can reuse the set-up of design matrix and beta coefficients from Example 2: MLM Fisher information matrix:

J=5; p=12; m=8; #response levels; num of parameters; num of design points
beta = c(-4.047, -0.131, 4.214, -2.225, -0.376, 3.519, -0.302,
    -0.237,  2.420, 1.386,  -0.120,  1.284) #coefficients
Xi=rep(0,J*p*m) #design matrix
dim(Xi)=c(J,p,m)
Xi[,,1] = rbind(c( 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

Xi[,,2] = rbind(c( 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

Xi[,,3] = rbind(c( 1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 3, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

 Xi[,,4] = rbind(c( 1, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 4, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 4, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

 Xi[,,5] = rbind(c( 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

 Xi[,,6] = rbind(c( 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 1),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

 Xi[,,7] = rbind(c( 1, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 3, 1, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 3, 1, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 1),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

 Xi[,,8] = rbind(c( 1, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 1, 4, 1, 0, 0, 0, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 1, 4, 1, 0, 0, 0),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 1),
                c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

Similarly, we define the sample size (\(600\)), constraints, and boundaries using the following R codes:

nsample=600 #sample size
constraint = c(392, 410)  #mild:severe

#boundaries r_i1 and r_i2 in constrained lift-one
lower.bound <- function(i, w0){
  n = 600
  constraint = c(392,410)
  if(i <= 4){
    a.lower <- (sum(w0[5:8])-(constraint[2]/n)*(1-w0[i]))/(sum(w0[5:8]))
  }
  else{
    a.lower <- (sum(w0[1:4])-(constraint[1]/n)*(1-w0[i]))/(sum(w0[1:4]))
  }
  a.lower
}

upper.bound <- function(i, w0){
  n = 600
  constraint = c(392,410)
  if(i <= 4){
    b.upper <- ((constraint[1]/n)*(1-w0[i]) - (sum(w0[1:4])-w0[i]))/(1-sum(w0[1:4]))
  }
  else{
    b.upper <- ((constraint[2]/n)*(1-w0[i]) - (sum(w0[5:8])-w0[i]))/(1-sum(w0[5:8]))
  }
  b.upper
}

#constraints of the example
g.con = matrix(0,nrow=length(constraint)+1+m, ncol=m)
g.con[2:3,] = matrix(data=c(1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1), ncol = m, byrow=TRUE)
g.con[1,] = rep(1, m)
g.con[4:(length(constraint)+1+m), ] = diag(1, nrow=m)
g.dir = c("==", "<=","<=", rep(">=",m))
g.rhs = c(1, ifelse((constraint/nsample<1),constraint/nsample,1), rep(0, m))

Define an optional subgroups label for the constrained approximate allocations:

label = c("Placebo-Mild", "Low-Mild", "Medium-Mild", "High-Mild", "Placebo-Severe", "Low-Severe", "Medium-Severe", "High-Severe")

Then, we can use the constrained lift-one algorithm to find the locally D-optimal approximate sampling with liftone_constrained_MLM( ) function in the package:

set.seed(123)
approx_design = CDsampling::liftone_constrained_MLM(m=m, p=p, Xi=Xi, J=J, beta=beta, 
                                       lower.bound=lower.bound, upper.bound=upper.bound, 
                                        g.con=g.con, g.dir=g.dir, g.rhs=g.rhs, label=label, w00=NULL, 
                                        link='cumulative', Fi.func = Fi_func_MLM, 
                                        reltol=1e-5, maxit=500, delta = 1e-6, 
                                        epsilon=1e-8, random=TRUE, nram=1)

print(approx_design)
#> Optimal Sampling Results:
#> ================================================================================
#> Optimal approximate allocation:
#>    Placebo-Mild Low-Mild Medium-Mild High-Mild Placebo-Severe Low-Severe
#> w  0.2594       0.0      0.0         0.1666    0.2796         0.0       
#> w0 0.0          0.0      0.3167      0.0       0.6833         0.0       
#>    Medium-Severe High-Severe
#> w  0.0           0.2944     
#> w0 0.0           0.0        
#> --------------------------------------------------------------------------------
#> Maximum :
#> 7.4958e-11 
#> --------------------------------------------------------------------------------
#> itmax :
#> 12.0 
#> --------------------------------------------------------------------------------
#> convergence :
#> TRUE 
#> --------------------------------------------------------------------------------
#> deriv.ans :
#> -7.5216e-11, -4.0226e-11, -4.0226e-11, -3.7980e-10, -2.0680e-25, -4.0226e-11, -4.0226e-11, 5.2190e-11 
#> --------------------------------------------------------------------------------
#> gmax :
#> NA 
#> --------------------------------------------------------------------------------
#> reason :
#> "all derivative <=0" 
#> --------------------------------------------------------------------------------

Similar to Example 3: trial_data & constrained sampling with GLM, \(w\) reports the D-optimal approximate allocation, while \(w_0\) denotes the random initial weights used in the optimization procedure. maximum reports the determinant of Fisher information matrix, and reason is the termination criteria for the lift-one algorithm.

To convert the approximate allocation to exact allocation, we use the function approxtoexact_constrained_func( ) in the package, with Fdet_func=Fdet_func_MLM:

exact_design = CDsampling::approxtoexact_constrained_func(n=600, w=approx_design$w, m=8, 
 beta=beta, link='cumulative', X=Xi, Fdet_func=Fdet_func_MLM, 
 iset_func=iset_func_trauma, label=label)

print(exact_design)
#> Optimal Sampling Results:
#> ================================================================================
#> Optimal exact allocation:
#>                 Placebo-Mild Low-Mild Medium-Mild High-Mild Placebo-Severe
#> allocation      155.0        0.0      0.0         100.0     168.0         
#> allocation.real 0.2594       0.0      0.0         0.1666    0.2796        
#>                 Low-Severe Medium-Severe High-Severe
#> allocation      0.0        0.0           177.0      
#> allocation.real 0.0        0.0           0.2944     
#> --------------------------------------------------------------------------------
#> det.maximum :
#> 1.63163827059162e+23 
#> --------------------------------------------------------------------------------

If there is no coefficients information, we can find the constrained uniform sampling with approxtoexact_constrained_func( ) using Fdet_func=Fdet_func_unif:

iset_func_trauma <- function(allocation){
  iset = rep(1,8)
  if(sum(allocation[1:4])>=392){iset[1:4]=0}
  if(sum(allocation[5:8])>=410){iset[5:8]=0}
  return(iset)
 }

unif_design = CDsampling::approxtoexact_constrained_func(n=600, w=rep(1/600,8), m=8, beta=NULL, 
 link=NULL, X=NULL, label=label, Fdet_func=Fdet_func_unif, iset_func=iset_func_trauma)

unif_design
#> Optimal Sampling Results:
#> ================================================================================
#> Optimal exact allocation:
#>                 Placebo-Mild Low-Mild Medium-Mild High-Mild Placebo-Severe
#> allocation      75.0         75.0     75.0        75.0      75.0          
#> allocation.real 0.0017       0.0017   0.0017      0.0017    0.0017        
#>                 Low-Severe Medium-Severe High-Severe
#> allocation      75.0       75.0          75.0       
#> allocation.real 0.0017     0.0017        0.0017     
#> --------------------------------------------------------------------------------
#> det.maximum :
#> 1001129150390625 
#> --------------------------------------------------------------------------------

Note:
The reported determinant of the Fisher information matrix (det.maximum) is model-specific and depends on the choice of Fdet_func. For meaningful comparisons:
- Use Fdet_func_GLM( ) for GLMs with the derived allocations
- Use Fdet_func_MLM( ) for MLMs with the derived allocations

Do not compare values across functions.

References

Huang Y., Tong L., Yang J. (2025). Constrained D-optimal Design for Paid Research Study, Statistica Sinica, DOI: 10.5705/ss.202022.0414.

Introduction to CDsampling

Table of Contents

Computation of Fisher information matrix

Example 1: GLM Fisher information matrix

Example 2: MLM Fisher information matrix

Applications in paid research studies

Example 3: trial_data & constrained sampling with GLM

Example 4: trauma_data & constrained sampling with MLM

References