JAGS vs. Stan

library(BH)
library(rACMEMEEV)

Here we will compare and contrast the two different sampling backends that are available to users of this library. There are far more indepth reviews of when to use which sampler and why (Beraha et al. 2021; Homan et al. 2014), but, nevertheless, we can further add some guidance for this specific library.

Setting up the Workflow.

I’m going to skip over a lot of the background information on why measurement error adjustment is important in dietary exposure assessment as there are many far more detailed tutorials on this. We should begin by creating a custom dataset that can be used throughout the entirety of this example. Normally dietary consumption data is characterized by a gamma distribution (Passerelli et al. 2022), so we that’s the data that will be generated. It is very important to note here that this library will standardize (i.e. make the consumption distribution more normal) by default. So, please be careful when applying this library to your dataset.

x <- rgamma(100, shape = 1, rate = 0.01)
y <- rgamma(100, shape = 3, rate = 0.02)
z <- rgamma(100, shape = 3, rate = 0.3)

These will represent random consumption values, and accordingly, we will generate some example output data that is linked to these input datas.

output <- rlnorm(100, meanlog = 3.5, sdlog = 0.2)

Thus, the full dataset looks like:

df <- data.frame(
  list(x = x, y = y, z = z, output = output)
)

head(df, 10)
#>             x         y         z   output
#> 1  193.929578  90.74957  8.662289 32.25831
#> 2   18.041910 338.32089  6.448633 46.03330
#> 3   53.443286  42.33018  3.058837 45.20608
#> 4    2.396523  86.38140  6.098603 25.39488
#> 5   42.777424 380.82160  4.813437 30.38507
#> 6   14.315278 104.60287 11.283213 38.96740
#> 7  294.574967 138.23559  7.384926 33.34672
#> 8   45.663770  97.96775  6.283557 30.57872
#> 9    2.690928 184.23220  5.053584 31.23650
#> 10 233.469657 340.39325  3.089829 37.17521

x_v_coef <- generate_coefficient(1000, 0.4, 0.7, 0.95)
y_v_coef <- generate_coefficient(1000, 0.5, 0.7, 0.95)
z_v_coef <- generate_coefficient(1000, 0.3, 0.6, 0.95)


### Stan vs. JAGS Pre-Model

With the data generated, we can now fit the pre-model to solve for the covariances
in the observed data. [Muoka et. al 2021](#References) created a very clever situation
where the JAGS sampler will converge quite quickly as the inverse-wishart distribution
is used to create a semi-conjugate prior situation. This of course does not guarantee
that the sampled posterior will be fully realized, but it does give the sampler that
much more of a chance to converge. From an API perspective, the same function to fit
the pre-model with a JAGS backend can be used to fit with a Stan backend. The only thing
that needs to be adjusted is the `stan = TRUE` parameter for the function.


``` r
jags_output <- acme_model(df, c("x", "y", "z"))
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 100
#>    Unobserved stochastic nodes: 4
#>    Total graph size: 166
#> 
#> Initializing model

stan_output <- acme_model(df, c("x", "y", "z"), stan = TRUE)
#> 
#> SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 0.000153 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.53 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
#> Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
#> Chain 1: Iteration: 1501 / 10000 [ 15%]  (Sampling)
#> Chain 1: Iteration: 2500 / 10000 [ 25%]  (Sampling)
#> Chain 1: Iteration: 3500 / 10000 [ 35%]  (Sampling)
#> Chain 1: Iteration: 4500 / 10000 [ 45%]  (Sampling)
#> Chain 1: Iteration: 5500 / 10000 [ 55%]  (Sampling)
#> Chain 1: Iteration: 6500 / 10000 [ 65%]  (Sampling)
#> Chain 1: Iteration: 7500 / 10000 [ 75%]  (Sampling)
#> Chain 1: Iteration: 8500 / 10000 [ 85%]  (Sampling)
#> Chain 1: Iteration: 9500 / 10000 [ 95%]  (Sampling)
#> Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.96 seconds (Warm-up)
#> Chain 1:                5.873 seconds (Sampling)
#> Chain 1:                6.833 seconds (Total)
#> Chain 1:

Both of the models can be then introspected to see the underlying JAGS / Stan model code along with samples to get a better understanding of how the samplers can yield slightly different posteriors.

jags_output$model
#> JAGS model:
#> 
#> 
#>   model {
#>     for (i in 1:n) {
#>       model_data[i, 1:p] ~ dmnorm( zMu[1:p] , zInvCovMat[1:p, 1:p] )
#>     }
#> 
#>     for (varIdx in 1:p) {
#>       # -- uninformative prior for zMu -- #
#>       zMu[varIdx] ~ dnorm(0, 1000000)
#>     }
#>     zInvCovMat ~ dwish(zRmat[1:p, 1:p], zdf)
#> 
#>     # -- Convert invCovMat to SD and correlation -- #
#>     zCovMat <- inverse(zInvCovMat)
#>     for (varIdx in 1:p) {
#>       zSigma[varIdx] <- sqrt(zCovMat[varIdx,varIdx])
#>     }
#>     for (varIdx1 in 1:p) {
#>       for (varIdx2 in 1:p) {
#>         zRho[varIdx1, varIdx2] <- (
#>           zCovMat[varIdx1, varIdx2] / (zSigma[varIdx1] * zSigma[varIdx2])
#>         )
#>       }
#>     }
#> 
#>     for (varIdx in 1:p) {
#>       sigma[varIdx] <- zSigma[varIdx] * sd_orig[, varIdx]
#>       mu[varIdx] <- zMu[varIdx] * sd_orig[, varIdx] + mean_orig[, varIdx]
#>     }
#>     for (varIdx1 in 1:p) {
#>       for (varIdx2 in 1:p) {
#>         rho[varIdx1, varIdx2] <- zRho[varIdx1, varIdx2]
#>       }
#>     }
#>   }
#>   
#> Fully observed variables:
#>  mean_orig model_data n p sd_orig zRmat zdf

stan_output$model
#> S4 class stanmodel 'anon_model' coded as follows:
#>   data {
#>     int<lower=1> n;
#>     int<lower=1> p;
#>     matrix[n, p] model_data;
#>     matrix[p, p] zRmat;
#>     real<lower=0> zdf;
#>     vector[p] sd_orig;
#>     vector[p] mean_orig;
#>   }
#>   parameters {
#>     vector[p] zMu;
#>     cov_matrix[p] zInvCovMat;
#>   }
#>   transformed parameters {
#>     matrix[p, p] zCovMat;
#>     vector[p] zSigma;
#>     matrix[p, p] zRho;
#>     // -- Convert invCovMat to SD and correlation --
#>     zCovMat = inv(zInvCovMat);
#>     for (varIdx in 1:p) {
#>       zSigma[varIdx] = sqrt(zCovMat[varIdx, varIdx]);
#>     }
#>     for (varIdx1 in 1:p) {
#>       for (varIdx2 in 1:p) {
#>         zRho[varIdx1, varIdx2] = zCovMat[varIdx1, varIdx2] / (zSigma[varIdx1] * zSigma[varIdx2]);
#>       }
#>     }
#>   }
#>   model {
#>     for (i in 1:n) {
#>       model_data[i, 1:p] ~ multi_normal(zMu[1:p], zInvCovMat[1:p, 1:p]);
#>     }
#>     // --  uninformative prior for zMu --
#>     for (i in 1:p) {
#>       zMu[i] ~ normal(0, 1000000);
#>     }
#>     zInvCovMat ~ wishart(zdf, zRmat[1:p, 1:p]);
#>   }
#>   generated quantities {
#>     vector[p] mu;
#>     vector[p] sigma;
#>     matrix[p, p] rho;
#>     // -- transform sd and means back to original scale --
#>     for (varIdx in 1:p) {
#>       sigma[varIdx] = zSigma[varIdx] * sd_orig[varIdx];
#>       mu[varIdx] = zMu[varIdx] * sd_orig[varIdx] + mean_orig[varIdx];
#>     }
#>     rho = zRho;
#>   }
#>

To show exactly how the samplers can create differing results. We can directly look at the different metrics for the solved covariance structure as well as the actual resultant covariance matrix.

jags_output$covariance_matrix
#> 
#> Iterations = 1001:11000
#> Thinning interval = 1 
#> Number of chains = 1 
#> Sample size per chain = 10000 
#> 
#> 1. Empirical mean and standard deviation for each variable,
#>    plus standard error of the mean:
#> 
#>               Mean        SD  Naive SE Time-series SE
#> rho[1,1]   1.00000 1.409e-16 1.409e-18      0.0000000
#> rho[2,1]   0.08973 9.876e-02 9.876e-04      0.0010062
#> rho[3,1]  -0.13431 9.674e-02 9.674e-04      0.0009674
#> rho[1,2]   0.08973 9.876e-02 9.876e-04      0.0010062
#> rho[2,2]   1.00000 1.413e-16 1.413e-18      0.0000000
#> rho[3,2]  -0.13536 9.711e-02 9.711e-04      0.0009876
#> rho[1,3]  -0.13431 9.674e-02 9.674e-04      0.0009674
#> rho[2,3]  -0.13536 9.711e-02 9.711e-04      0.0009876
#> rho[3,3]   1.00000 0.000e+00 0.000e+00      0.0000000
#> sigma[1] 111.46477 7.851e+00 7.851e-02      0.0785135
#> sigma[2]  98.32318 7.020e+00 7.020e-02      0.0702036
#> sigma[3]   6.47109 4.602e-01 4.602e-03      0.0046861
#> 
#> 2. Quantiles for each variable:
#> 
#>             2.5%       25%       50%       75%     97.5%
#> rho[1,1]  1.0000   1.00000   1.00000   1.00000   1.00000
#> rho[2,1] -0.1095   0.02392   0.09162   0.15818   0.27672
#> rho[3,1] -0.3216  -0.19973  -0.13582  -0.06801   0.05336
#> rho[1,2] -0.1095   0.02392   0.09162   0.15818   0.27672
#> rho[2,2]  1.0000   1.00000   1.00000   1.00000   1.00000
#> rho[3,2] -0.3207  -0.20218  -0.13618  -0.07126   0.05988
#> rho[1,3] -0.3216  -0.19973  -0.13582  -0.06801   0.05336
#> rho[2,3] -0.3207  -0.20218  -0.13618  -0.07126   0.05988
#> rho[3,3]  1.0000   1.00000   1.00000   1.00000   1.00000
#> sigma[1] 97.1771 106.01193 111.08162 116.52160 127.90608
#> sigma[2] 85.8378  93.37880  97.89522 102.75721 113.37813
#> sigma[3]  5.6626   6.14172   6.44166   6.76914   7.45569

stan_output$covariance_matrix
#> $summary
#>                 mean      se_mean           sd        2.5%         25%
#> sigma[1]  108.681439 6.359603e-02 7.728919e+00   93.701564  103.405438
#> sigma[2]   95.641480 5.661347e-02 6.886170e+00   82.318185   90.903012
#> sigma[3]    6.312380 4.099654e-03 4.546054e-01    5.432135    5.998104
#> rho[1,1]    1.000000 1.582845e-18 1.364133e-16    1.000000    1.000000
#> rho[1,2]   -0.153791 8.288003e+00 7.645046e+02  -95.821564    4.143550
#> rho[1,3]    8.001893 1.198822e+01 1.103326e+03  -67.182934  -10.705513
#> rho[2,1]   -0.153791 8.288003e+00 7.645046e+02  -95.821564    4.143550
#> rho[2,2]    1.000000 2.244517e-18 1.353781e-16    1.000000    1.000000
#> rho[2,3]   12.704441 1.972487e+01 1.818550e+03  -76.509513  -10.740099
#> rho[3,1]    8.001893 1.198822e+01 1.103326e+03  -67.182934  -10.705513
#> rho[3,2]   12.704441 1.972487e+01 1.818550e+03  -76.509513  -10.740099
#> rho[3,3]    1.000000 1.867484e-18 1.364292e-16    1.000000    1.000000
#> lp__     -150.412423 3.688248e-02 2.161587e+00 -155.538379 -151.618380
#>                  50%         75%       97.5%     n_eff      Rhat
#> sigma[1]  108.565490  113.859423  124.025530 14769.896 1.0000109
#> sigma[2]   95.543536  100.307823  109.211892 14795.031 1.0000930
#> sigma[3]    6.308021    6.617028    7.220533 12296.312 0.9999330
#> rho[1,1]    1.000000    1.000000    1.000000  7427.390 0.9998823
#> rho[1,2]    7.201180   13.464637  111.826008  8508.646 0.9998860
#> rho[1,3]   -6.299569   -4.243921   57.410618  8470.296 1.0000241
#> rho[2,1]    7.201180   13.464637  111.826008  8508.646 0.9998860
#> rho[2,2]    1.000000    1.000000    1.000000  3637.899 0.9998823
#> rho[2,3]   -6.327430   -4.239735   56.159450  8500.058 0.9999862
#> rho[3,1]   -6.299569   -4.243921   57.410618  8470.296 1.0000241
#> rho[3,2]   -6.327430   -4.239735   56.159450  8500.058 0.9999862
#> rho[3,3]    1.000000    1.000000    1.000000  5337.044 0.9998823
#> lp__     -150.099312 -148.826037 -147.166337  3434.832 1.0002219
#> 
#> $c_summary
#> , , chains = chain:1
#> 
#>           stats
#> parameter         mean           sd        2.5%         25%         50%
#>   sigma[1]  108.681439 7.728919e+00   93.701564  103.405438  108.565490
#>   sigma[2]   95.641480 6.886170e+00   82.318185   90.903012   95.543536
#>   sigma[3]    6.312380 4.546054e-01    5.432135    5.998104    6.308021
#>   rho[1,1]    1.000000 1.364133e-16    1.000000    1.000000    1.000000
#>   rho[1,2]   -0.153791 7.645046e+02  -95.821564    4.143550    7.201180
#>   rho[1,3]    8.001893 1.103326e+03  -67.182934  -10.705513   -6.299569
#>   rho[2,1]   -0.153791 7.645046e+02  -95.821564    4.143550    7.201180
#>   rho[2,2]    1.000000 1.353781e-16    1.000000    1.000000    1.000000
#>   rho[2,3]   12.704441 1.818550e+03  -76.509513  -10.740099   -6.327430
#>   rho[3,1]    8.001893 1.103326e+03  -67.182934  -10.705513   -6.299569
#>   rho[3,2]   12.704441 1.818550e+03  -76.509513  -10.740099   -6.327430
#>   rho[3,3]    1.000000 1.364292e-16    1.000000    1.000000    1.000000
#>   lp__     -150.412423 2.161587e+00 -155.538379 -151.618380 -150.099312
#>           stats
#> parameter          75%       97.5%
#>   sigma[1]  113.859423  124.025530
#>   sigma[2]  100.307823  109.211892
#>   sigma[3]    6.617028    7.220533
#>   rho[1,1]    1.000000    1.000000
#>   rho[1,2]   13.464637  111.826008
#>   rho[1,3]   -4.243921   57.410618
#>   rho[2,1]   13.464637  111.826008
#>   rho[2,2]    1.000000    1.000000
#>   rho[2,3]   -4.239735   56.159450
#>   rho[3,1]   -4.243921   57.410618
#>   rho[3,2]   -4.239735   56.159450
#>   rho[3,3]    1.000000    1.000000
#>   lp__     -148.826037 -147.166337

As you can see the sigma values are quite similar, but the rho values vary quite a bit. This has direct implications for what the resultant structure of the output coefficient adjustments will look like. As referenced above, the samplers explore the posterior space in a bit of a different way, so this can lead to divergences between the two results. This should not be viewed as an error as the posterior here is not mathematically directly estimatable, but it is important to assess, for you, what makes more sense. The research team for this library believes that the JAGS sampler is best, so it is the default argument. All results from the Stan sampler should be viewed as experimental and not necessarily supported by the maintainers.

Stan vs. JAGS Error Adjustment

With the pre-model fit, now we can assess how the resulting coefficients are different depending on which sampler is used.

validity_coefficients <- c(x_v_coef, y_v_coef, z_v_coef)
jags_lambda <- attenuation_matrix(
  jags_output,
  c("x", "y", "z"),
  validity_coefficients
)
jags_model_output <- multivariate_model(
  "output ~ x + y + z",
  data = df,
  columns = c("x", "y", "z"),
  a_c_matrix = jags_lambda$matrix,
  sds = jags_lambda$sds,
  variances = jags_lambda$variances,
  univariate = TRUE
)

validity_coefficients <- c(x_v_coef, y_v_coef, z_v_coef)
stan_lambda <- attenuation_matrix(
  stan_output,
  c("x", "y", "z"),
  validity_coefficients,
  stan = TRUE
)
stan_model_output <- multivariate_model(
  "output ~ x + y + z",
  data = df,
  columns = c("x", "y", "z"),
  a_c_matrix = stan_lambda$matrix,
  sds = stan_lambda$sds,
  variances = stan_lambda$variances,
  univariate = TRUE
)

Please be aware that the stan = TRUE parameter is passed to each step of the adjustment pipeline as the underlying libraries have slightly different APIs. This will prevent unexpected errors if you accidentally try to pass a rJags object to a function expecting a Stan object.

jags_plots <- plot_covariates(jags_model_output, c("x", "y", "z"))
#> No id variables; using all as measure variables
#> No id variables; using all as measure variables
#> No id variables; using all as measure variables
jags_plots$x

jags_plots$y

jags_plots$z

stan_plots <- plot_covariates(stan_model_output, c("x", "y", "z"))
#> No id variables; using all as measure variables
#> No id variables; using all as measure variables
#> No id variables; using all as measure variables
stan_plots$x

stan_plots$y

stan_plots$z

Again, the differing pre-model results yield different adjustments for our synthetic dataset. It is worth repeating that one is not necessarily more accurate, but the divergences should be treated as important data points when assessing the measurement error structure in your own dataset.

Traceplots

And accordingly, the traceplots can be assessed to make sure that both the samplers have converged.

traceplots(stan_output$samples, c("x", "y", "z"), pre_model = TRUE, stan = TRUE)

traceplots(jags_output$samples, c("x", "y", "z"), pre_model = TRUE)

References

Passarelli, S., Free, C. M., Allen, L. H., Batis, C., Beal, T., Biltoft-Jensen, A. P., Bromage, S., Cao, L., Castellanos-Gutiérrez, A., Christensen, T., Crispim, S. P., Dekkers, A., De Ridder, K., Kronsteiner-Gicevic, S., Lee, C., Li, Y., Moursi, M., Moyersoen, I., Schmidhuber, J., … Golden, C. D. (2022). Estimating national and subnational nutrient intake distributions of global diets. The American Journal of Clinical Nutrition, 116(2), 551–560. https://doi.org/10.1093/ajcn/nqac108
Beraha, M., Falco, D., & Guglielmi, A. (2021). JAGS, NIMBLE, Stan: A detailed comparison among Bayesian MCMC software (No. arXiv:2107.09357). arXiv. https://doi.org/10.48550/arXiv.2107.09357
Homan, M. D., & Gelman, A. (2014). The No-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res., 15(1), 1593–1623.
Muoka, A. K., Agogo, G. O., Ngesa, O. O., & Mwambi, H. G. (2020b). A Method to adjust for measurement error in multiple exposure variables measured with correlated errors in the absence of an internal validation study. F1000Research, 9, 1486. https://doi.org/10.12688/f1000research.27892.1

Maintainers

westford14