--- title: "Introduction to bain" author: "Herbert Hoijtink, Caspar van Lissa, Joris Mulder, Xin Gu" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to bain} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Acknowledgment Besides the authors Marlyne Bosman, Camiel van Zundert, and Fayette Klaassen have contributed examples to this vignette. ## Introduction `bain` is an acronym for "Bayesian informative hypotheses evaluation". It uses the Bayes factor to evaluate hypotheses specified using equality and inequality constraints among (linear combinations of) parameters in a wide range of statistical models. Two tutorials are retrievable from the bain website (look under tutorials) at https://informative-hypotheses.sites.uu.nl/software/bain/. The first introducing Bayesian evaluation of informative hypotheses is provided by Hoijtink, H., Mulder, J., van Lissa, C., and Gu, X. (2019). A tutorial on testing hypotheses using the Bayes factor. Psychological Methods, 24, 539-556. By reading the tutorial in combination with executing the analyses contained in `easyBFtutorial.R` and `BFtutorial.R` you quickly learn the basics of Bayesian hypothesis evaluation. The second containing introduction to Bayesian evaluation of informative hypotheses in the context of structural equation models is provided by Van Lissa, C., Gu, X., Mulder, J., Rosseel, Y., van Zundert, C. and Hoijtink, H. (2020), Structural Equation Modeling, 28, 292-301. **Users are advised to read these tutorials and this vignette before using `bain`**. The full `bain` reference list with applications and statistical underpinnings can be found on the `bain` website. ## Usage `bain(x, hypothesis, fraction = 1, ...)` ## Arguments `x` An R object containing the outcome of a statistical analysis. Currently, the following objects can be processed: 1) `t_test()` objects (Student's t-test, Welch's t-test, paired samples t-test, one-group t-test, equivalence test). Note that, `t_test` can be used in the same way as `t.test`. 2) `lm()` objects (ANOVA, ANCOVA, multiple regression). 3) `lavaan` objects generated with the `sem()`, `cfa()`, and `growth` functions. 4) A named vector containing the estimates resulting from a statistical analysis. Using this option triggers the `bain.default()` method. Note that, named means that each estimate has to be named such that it can be referred to in hypotheses. `hypothesis` A character string containing the informative hypotheses to evaluate (see below the section on "The Specification of Hypotheses"). `fraction = 1` A number representing the fraction of information in the data used to construct the prior distribution. The default value 1 denotes the minimal fraction, 2 denotes twice the minimal fraction, etc. Example 2d. below shows how `fraction` can be employed to execute a sensitivity analysis. See also Hoijtink, Mulder, van Lissa, and Gu, 2019). `...` Additional arguments (see below). ## Using `bain` with a `lm` or `t_test` object The following steps need to be executed: 1) `x <- lm()` or `x <- t_test()`. Execute an analysis with `lm` or `t_test`. See the Examples section below for a complete elaboration of the calls to `lm` and `t_test` that can be processed by `bain`. Note that, `lm` and `t_test` will apply list-wise deletion if there are cases with missing values in the variables used. An imputation based method for dealing with missing values tailored to Bayesian hypothesis evaluation is illustrated in Section 2 "Examples Using a `lm` object" in Example e. (based on Hoijtink, Gu, Mulder, and Rosseel, 2019). 2) If `lm()` is used, display the estimates and their names using the command `coef(x)`. (Unique abbreviations of) the names will be used to specify `hypotheses`. If `t_test()` is used, hypotheses have to be specified using the names `x`, `y`, and `difference` (see Examples 1a through 1e which can be found below). 3) `set.seed(seed)`. Set `seed` equal to an integer number to create a repeatable random number sequence. `bain` uses sampling to compute Bayes factors and posterior model probabilities. It is therefore recommended to run analyses with two different seeds to ensure stability of the results. 4) `results <- bain(x,hypotheses,fraction = 1)` or `results <- bain(x,hypotheses,fraction = 1,standardize = FALSE)`. The first call to `bain` is used in case of `lm` implementations of ANOVA, ANCOVA, and `t_test`. The second call to `bain` is used in case of `lm` implementations of multiple regression. With `standardize = TRUE` hypotheses with respect to standardized regression coefficients are evaluated. With `standardize = FALSE` hypotheses with respect to unstandardized regression coefficients are evaluated. `fraction = 1` represents the fraction of information in the data used to construct the prior distribution. The default value 1 denotes the minimal fraction, 2 denotes twice the minimal fraction, etc. Example 2d. below shows how `fraction` can be employed to execute a sensitivity analysis. See also Hoijtink, Mulder, van Lissa, and Gu, 2019). 5) `print(results)` Print the results of an analysis with `bain`. 6) `summary(results, ci=0.95)` Present estimates and credibility intervals for the parameters used to specify the `hypotheses`. `ci` can be used to specify the confidence level of the credibility intervals. ## Using `bain` with a `lavaan` object The following steps need to be executed: 1) `x <- sem()` or `x <- cfa()` or `x <- growth()`. Execute an analysis with the sem, cfa, or growth functions implemented in [`lavaan`](https://lavaan.ugent.be). Note that, by default, `lavaan` will apply list-wise deletion if there are cases with missing values in the variables used. An imputation based method for dealing with missing values tailored to Bayesian hypothesis evaluation is illustrated in Section 2 "Examples Using a `lm` object" in Example e. (based on Hoijtink, Gu, Mulder, and Rosseel, 2019). If an analysis with `lavaan` is executed using `missing = "fiml"` the sample size is not corrected for the presence of missing values. This will affect (bias) the evaluation of hypotheses specified using (about) equality constraints. 2) Specify a `lavaan` model using the `model <- ...` command. In case of multiple group models, only models **without** between group restrictions can be processed by `bain` with a `lavaan` object as input. 3) Display the estimates and their names using the command `coef(x)`. Only parameters who's names contain `~` (regression coefficients), `~1` (intercepts), or `=~` (factor loadings) can be used in the specification of hypotheses. (Unique abbreviations of) the names can be used to specify `hypotheses`. For multiple group analyses the names have to end with a group label `.grp`. Group labels can be assigned using commands like `sesamesim$sex <- factor(sesamesim$sex, labels = c("boy", "girl"))`. If in a `lavaan` model parameters are labeled, e.g., as in `model <- 'age ~ c(a1, a2)*peabody + c(b1, b2)*1` then the labels have to be used in the specification of hypotheses. 4) `set.seed(seed)`. Set `seed` equal to an integer number to create a repeatable random number sequence. `bain` uses sampling to compute Bayes factors and posterior model probabilities. It is therefore recommended to run analyses with two different seeds to ensure stability of the results. 5) `results <- bain(x,hypotheses,fraction = 1,standardize = FALSE)`. With `standardize = TRUE` hypotheses with respect to standardized coefficients are evaluated. With `standardize = FALSE` hypotheses with respect to unstandardized coefficients are evaluated. `fraction = 1` represents the fraction of information in the data used to construct the prior distribution. The default value 1 denotes the minimal fraction, 2 denotes twice the minimal fraction, etc. Example 2d. below shows how `fraction` can be employed to execute a sensitivity analysis. See also Hoijtink, Mulder, van Lissa, and Gu, 2019). 6) `print(results)` Print the results of an analysis with `bain`. 7) `summary(results, ci=0.95)` Present estimates and credibility intervals for the parameters used to specify the `hypotheses`. `ci` can be used to specify the confidence level of the credibility intervals. ## Using `bain` with a named vector The following steps need to be executed: 1) Execute a statistical analysis. In case of a single group analysis, the following information has to be extracted from the statistical analysis and supplied to `bain`: a) A vector containing estimates of the parameters used to specify `hypotheses`; b) A list containing the covariance matrix of these parameters; and, c) The sample size used for estimation of the parameters. Note that, due to missing values this sample size may be smaller than the total sample size. An imputation based method for dealing with missing values tailored to Bayesian hypothesis evaluation is illustrated in Section 2 "Examples Using a `lm` object" in Example e. (based on Hoijtink, Gu, Mulder, and Rosseel, 2019). In case of a multiple group analysis, the following information has to be extracted from the statistical analysis and supplied to `bain`: a) A vector containing estimates of the group specific parameters possibly augmented with the estimates of parameters that are shared by the groups. The structure of this vector is [parameters of group 1, parameters of group 2, ..., the parameters that are shared by the groups]; b) A list containing, per group, the covariance matrix of the parameters corresponding to the group at hand and, possibly, the augmented parameters. In the rows and columns of each covariance matrix the parameters of the group at hand come first, possibly followed by the augmented parameters. c) Per group the sample size used for estimation of the parameters. Note that, due to missing values this sample size may be smaller than the total sample size per group. An imputation based method for dealing with missing values tailored to Bayesian hypothesis evaluation is illustrated in Section 2 "Examples Using a `lm` object" in Example e. (based on Hoijtink, Gu, Mulder, and Rosseel, 2019). 2) Assign names to the estimates using `names(estimates)<-c()`. Note that, `names` is a character vector containing new names for the estimates in `estimates`. Each name has to start with a letter, and may consist of "letters", "numbers", "`.`", "`_`", "`:`", "`~`", "`=~`", and "`~1`". These names are used to specify `hypotheses` (see below). An example is `names <- c("a", "b", "c")`. 3) `set.seed(seed)`. Set `seed` equal to an integer number to create a repeatable random number sequence. `bain` uses sampling to compute Bayes factors and posterior model probabilities. It is therefore recommended to run analyses with two different seeds to ensure stability of the results. 4) `results <- bain(estimates, hypotheses, n=., Sigma=., group_parameters = 2, joint_parameters = 0, fraction = 1)` executes `bain` with the following arguments: a) `estimates` A named vector with parameter estimates. b) `hypotheses` A character string containing the informative hypotheses to evaluate (the specification is elaborated below). c) `n` A vector containing the sample size of each group in the analysis. See, Hoijtink, Gu, and Mulder (2019), for an elaboration of the difference between one and multiple group analyses. A multiple group analysis is required when group specific parameters are used to formulate `hypotheses`. Examples are the Student's and Welch's t-test, ANOVA, and ANCOVA. See the Examples section for elaborations of the specification of multiple group analyses when a named vector is input for `bain`. d) `Sigma` A list of covariance matrices. In case of one group analyses the list contains one covariance matrix. In case of multiple group analyses the list contains one covariance matrix for each group. See the Examples section and Hoijtink, Gu, and Mulder (2019) for further instructions. e) `group_parameters` The number of group specific parameters. In, for example, an ANOVA with three groups, `estimates` will contain three sample means, `group_parameters = 1` because each group is characterized by one mean, and `joint_parameters = 0` because there are no parameters that apply to each of the groups. In, for example, an ANCOVA with three groups and two covariates, `estimates` will contain five parameters (first the three adjusted means, followed by the regression coefficients of the two covariates), `group_parameters = 1` because each group is characterized by one adjusted mean, and `joint_parameters = 2` because there are two regression coefficients that apply to each group. In, for example, a repeated measures design with four repeated measures and two groups (a between factor with two levels and a within factor with four levels) `estimates` will contain eight means (first the four for group 1, followed by the four for group 2), `group_parameters = 4` because each group is characterized by four means and `joint_parameters = 0` because there are no parameters that apply to each of the groups. f) `joint_parameters` In case of one group `joint_parameters = 0`. In case of two or more groups, the number of parameters in `estimates` shared by the groups. In, for example, an ANCOVA, the number of `joint_parameters` equals the number of covariates. g) `fraction = 1` A number representing the fraction of information in the data used to construct the prior distribution. The default value 1 denotes the minimal fraction, 2 denotes twice the minimal fraction, etc. Example 2d. below shows how `fraction` can be employed to execute a sensitivity analysis. See also Hoijtink, Mulder, van Lissa, and Gu, 2019). 5) `print(results)` Print the results of an analysis with `bain`. 6) `summary(results, ci=0.95)` Present estimates and credibility intervals for the parameters used to specify the `hypotheses`. `ci` can be used to specify the confidence level of the credibility intervals. ## The specification of `hypotheses` `hypotheses` is a character string that specifies which informative hypotheses have to be evaluated. A simple example is `hypotheses <- "a > b > c; a = b = c;"` which specifies two hypotheses using three estimates with names "a", "b", and "c", respectively. The hypotheses specified have to adhere to the following rules: * When using `bain` with a `lm` or `t_test` or `lavaan` object, (unique abbreviations of) the names displayed by `coef(x)` have to be used (but see the section "Using bain with a lavaan object" for additional instructions if multiple group analyses are executed and/or parameters are labeled). If, for example, the names are `cat` and `dog`, `c` and `d ` would be unique abbreviations. If, for example, the names are `cato` and `cata` the whole names have to be used. * When using `bain` with a named vector, parameters are referred to using the names specified in `names()`. * Linear combinations of parameters must be specified adhering to the following rules: a) Each parameter name is used at most once. b) Each parameter name may or may not be pre-multiplied with a number. c) A constant may be added or subtracted from each parameter name. d) A linear combination can also be a single number. Examples are: `3 * a + 5`; `a + 2 * b + 3 * c - 2`; `a - b`; and `5`. * (Linear combinations of) parameters can be constrained using <, >, and =. For example, `a > 0` or `a > b = 0` or `2 * a < b + c > 5`. * The ampersand `&` can be used to combine different parts of a hypothesis. For example, `a > b & b > c` which is equivalent to `a > b > c` or `a > 0 & b > 0 & c > 0`. * Sets of (linear combinations of) parameters subjected to the same constraints can be specified using (). For example, `a > (b,c)` which is equivalent to `a > b & a > c`. * The specification of a hypothesis is completed by typing ; For example, `hypotheses <- "a > b > c; a = b = c;"`, specifies two hypotheses. * Hypotheses have to be compatible, non-redundant and possible. What these terms mean will be elaborated below. **The set of hypotheses has to be compatible.** For the statistical background of this requirement see Gu, Mulder, Hoijtink (2019). Usually the sets of hypotheses specified by researchers are compatible, and if not, `bain` will return an error message. The following steps can be used to determine if a set of hypotheses is compatible: 1) Replace a range constraint, e.g., `1 < a1 < 3`, by an equality constraint in which the parameter involved is equated to the midpoint of the range, that is, `a1 = 2`. 2) Replace in each hypothesis the < and > by =. For example, `a1 = a2 > a3 > a4` becomes `a1 = a2 = a3 = a4`. 3) The hypotheses are compatible if there is at least one solution to the resulting set of equations. For the two hypotheses considered under 1. and 2., the solution is `a1 = a2 = a3 = a4 = 2`. An example of two non-compatible hypotheses is `hypotheses <- "a = 0; a > 2;"` because there is no solution to the equations `a=0` and `a=2`. **Each hypothesis in a set of hypotheses has to be non-redundant.** A hypothesis is redundant if it can also be specified with fewer constraints. For example, `a = b & a > 0 & b > 0` is redundant because it can also be specified as `a = b & a > 0`. `bain` will work correctly if hypotheses specified using only < and > are redundant. `bain` will return an error message if hypotheses specified using at least one = are redundant. **Each hypothesis in a set of hypotheses has to be possible.** An hypothesis is impossible if estimates in agreement with the hypothesis do not exist. For example: values for `a, b, c` in agreement with `a > b > c & a < c` do not exist. It is the responsibility of the user to ensure that the hypotheses specified are possible. If not, `bain` will either return an error message or render an output table containing `Inf`'s. ## Output from an analysis with `bain` The commands `bain()` or `results<-bain()` followed by `results` or `print(results)` will render the default (most important) output from `bain`. These concern for each hypothesis specified in `hypotheses` the fit, complexity, Bayes factor versus the unconstrained hypothesis, Bayes factor versus its complement, posterior model probability (based on equal prior model probabilities) excluding the unconstrained hypothesis, posterior model probability including the unconstrained hypothesis, and posterior model probability of each hypothesis specified and their joint complement. Note that, all the posterior model probabilities are computed from the Bayes factors of each hypothesis versus the unconstrained hypothesis. In Hoijtink, Mulder, van Lissa, and Gu (2019) it is elaborated how these quantities (and the other output presented below) should be interpreted. Additionally, using `summary(results, ci=0.95)`, a descriptives matrix can be obtained in which for each estimate, the name, the value, and a 95\% central credibility interval is presented. The following commands can be used to retrieve the default and additional information from the `bain` output object: * `results$fit` renders the default output, `results$fit$Fit` contains only the column containing the fit of each hypothesis. In the last command `Fit` can be replaced by `Com`, `BF`, `BF.u`, `BF.c`, `PMPa`, `PMPb`, `PMPc` to obtain the information in the corresponding columns of the default output. Note that, in the columns `BF` and `BF.c` the Bayes factor of the hypothesis at hand versus its complement is displayed. In the column `BF.u` the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis is displayed. `PMPa` renders the posterior model probabilities (based on equal prior model probabilities) of the hypotheses specified. `PMPb` renders the posterior model probabilities (based on equal prior model probabilities) of the hypotheses specified plus the unconstrained hypothesis. `PMPc` renders the posterior model probabilities (based on equal prior model probabilities) of the hypotheses specified plus the complement of the union of these hypotheses. If, in the latter case, the complexity of the complement of the union of all hypotheses specified is smaller than .05, the hypotheses specified (almost) completely cover the parameter space. In this case `PMPc` is not provided and instead `PMPa` should be used. * `results$BFmatrix` contains the matrix containing the mutual Bayes factors of the hypotheses specified in `hypotheses`. * `results$b` contains for each of the groups in the analysis the fraction of information of the data in the group at hand used to specify the covariance matrix of the prior distribution. * `results$prior` contains the covariance matrix of the prior distribution. * `results$posterior` contains the covariance matrix of the posterior distribution. * `results$call` displays the call to `bain`. * `results$model` displays the named vector or the R object that is input to `bain`. * `results$n` displays the sample sizes per group. * `results$independent_restrictions` displays the number of independent constraints in the set of hypotheses under consideration. Note that, in Gu, Mulder, and Hoijtink (2018) en Hoijtink, Gu, and Mulder (2019) the definition given was misprinted (besides R and S also r and s should have been added to the definition). * `results$fit$Fit_eq` displays the fit of the equality constrained part of each hypothesis. Replacing `Fit_eq` by `Fit_in`, renders the fit of the inequality constrained part of an hypothesis conditional on the fit of the equality constrained part. `Com_eq`, and `Com_in`, respectively, are the complexity counterparts of `Fit_eq`, and `Fit_in`. Note that, if you have specified two hypotheses that both have a small `BF.u` (close to zero), then there is no support in the data for these hypotheses. In these cases the corresponding entry in `results$BFmatrix` (the Bayes factor comparing both hypotheses) is very unstable and should only be interpreted if repeated analyses using different seeds (see `set.seed()`) render about the same results. ## Examples Note that, each of the examples given below can be run by 1) copy pasting them into the Source screen of `RStudio`. 2) by highlighting them followed by `ctrl-enter` or `cmd-enter`. Unless indicated otherwise, the examples that follow below use a simulated data set inspired by the Sesame Street data set from: Stevens, J. P. (1996). Applied Multivariate Statistics for the Social Sciences. Mahwah NJ: Lawrence Erlbaum. This data set is included in the bain package. The variables contained in sesamesim are subsequently: * sex (1 = boy, 2 = girl) of the child * site (1 = disadvantaged inner city, 2 = advantaged suburban , 3 = advantaged rural, 4 = disadvantaged rural, 5 = disadvantaged Spanish speaking) from which the child originates * setting (1 = at home, 2 = at school) in which the child watches sesame street * age (in months) of the child * viewenc (0 = no, 1 = yes), whether or not the child is encouraged to watch Sesame Street * peabody (mental age) score of the child (higher score is higher mental age) * prenumb (score on a numbers test before watching Sesame Street for a year) * postnumb (score on a numbers test after watching Sesame Street for a year) * funumb (follow up numbers test score measured one year after postnumb) * Bb Knowledge of body parts before * Bl Knowledge of letters before * Bf Knowledge of forms before * Bn Knowledge of numbers before * Br Knowledge of relations before * Bc Knowledge of classifications before * Ab Knowledge of body parts after * Al Knowledge of letters after * Af Knowledge of forms after * An Knowledge of numbers after * Ar Knowledge of relations after * Ac Knowledge of classifications after The examples that follow below are organized in four categories: 1) running `bain` with a `t_test` object 2) running `bain` with a `lm` object 3) running `bain` with a `lavaan` object 3) running bain with a named vector The ANOVA and ANCOVA examples are provided using both a `lm` object and a named vector as input for `bain`. Below you will find the following examples: **1. Examples Using a `t_test` Object** If a `t_test` object is used, the main output table is labeled: "Bayesian informative hypothesis testing for an object of class t_test" (which denotes that a `t-test` was executed). a) Bayesian Student's t-test (equal within group variances) b) Bayesian Welch's t-test (unequal within group variances c) Bayesian paired samples t-test d) Bayesian one group t-test e) Bayesian Equivalence test **2. Examples Using a `lm` Object** a) Bayesian ANOVA. The example concerns a one-way ANOVA. Two-way or higher order ANOVAs can only be handled by recoding all factors into one factor. If, for example, there is a factor `sex` with levels `man` and `woman`, and a factor `age` with levels `young` and `old`, these have to be recoded in a new factor `sexage` with levels `manyoung`, `manold`, `womanyoung`, `womanold`. If a `lm` "ANOVA" object is used, the main output table is labeled: "Bayesian informative hypothesis testing for an object of class lm (ANOVA)" b) Bayesian ANCOVA. The example concerns a one-way ANCOVA. Two-way or higher order ANCOVAs can only be handled by recoding all factors into one factor. **Note that calls to `lm` using functions of the predictors (for example, adding squared predictors to the model using commands like `y ~ x + I(x^2)`) can not be processed by `bain`.** However, one can compute a new variable (for example, a squared predictor) and add this variable to the model specification of `lm`. If a `lm` "ANCOVA" object is used, the main output table is labeled: "Bayesian informative hypothesis testing for an object of class lm (ANCOVA)". **Note that, in the ANCOVA the covariates are centered!** c) Bayesian multiple regression. **Note that calls to `lm` using functions of the predictors (for example, adding squared predictors to the model using commands like `y ~ x + I(x^2)`) can not be processed by `bain`.** However, one can compute a new variable (for example, a squared predictor) and add this variable to the model specification of `lm`. Note furthermore, that only if `standardize = FALSE` interactions between predictors should be processed, because a standardized interaction is not the same as the interaction between two standardized variables. If a `lm` "regression with only continuous predictors" object is used, the main output table is labeled: "Bayesian informative hypothesis testing for an object of class lm (continuous predictors)". If a `lm` object contains **two or more** factors and, optionally, continuous predictors, the main output table is labeled: "Bayesian informative hypothesis testing for an object of class lm (mixed predictors)". In case of mixed predictors, the one group approach to computing Bayes factors is used (see, Hoijtink, Gu, and Mulder, 2019). This may render inferior results if group sizes are unequal. d) Bayesian ANOVA. Sensitivity analysis. See Hoijtink, Mulder, van Lissa, and Gu (2019) for elaborations. e) Bayesian multiple regression with missing data **3. Examples Using a `lavaan` Object** If a `lavaan` object is used, the main output table is labeled: "Bayesian informative hypothesis testing for an object of class lavaan". See the tutorial by Van Lissa, Gu, Mulder, Rosseel, van Zundert, and Hoijtink (2020) for further elaborations. a) Bayesian confirmatory factor analysis b) Bayesian latent regression c) Bayesian multiple group latent regression. Note that, multiple group models with between group restrictions cannot be processed. **4. Examples Using a Named Vector** If a `named vector` object is used, the main output table is labeled: "Bayesian informative hypothesis testing for an object of class numeric" (which denotes that `named vector` input was used). a) Bayesian ANOVA b) Bayesian Welch's ANOVA (unequal within group variances) c) Bayesian robust (against non-normality) ANOVA (unequal within group variances) d) Bayesian ANCOVA e) Bayesian repeated measures analysis (one within factor) f) Bayesian repeated measures analysis (within between design) g) Bayesian one group logistic regression (counterpart of multiple regression) h) Bayesian multiple group logistic regression (counterpart of ANCOVA) i) Bayesian confirmatory factor analysis **1. Examples Using a `t_test` Object** a) An example of the Bayesian Student's t-test (equal within group variances): ```{r ttest_ex1, eval=FALSE} # load the bain package which includes the simulated sesamesim data set library(bain) # collect the data for the boys in the vector x and for the girls in the # vector y x<-sesamesim$postnumb[which(sesamesim$sex==1)] y<-sesamesim$postnumb[which(sesamesim$sex==2)] # execute student's t-test ttest <- t_test(x,y, var.equal = TRUE) # set a seed value set.seed(100) # test hypotheses with bain. The names of the means are x and y. results <- bain(ttest, "x = y; x > y; x < y") # display the results results # obtain the descriptives table summary(results, ci = 0.95) ``` b) An example of the Bayesian Welch's t-test (unequal within group variances): ```{r ttest_ex2, eval=FALSE} # load the bain package which includes the simulated sesamesim data set library(bain) # collect the data for the boys in the vector x and for the girs in the # vector y x<-sesamesim$postnumb[which(sesamesim$sex==1)] y<-sesamesim$postnumb[which(sesamesim$sex==2)] # execute student's t-test ttest <- t_test(x,y, var.equal = FALSE) # set a seed value set.seed(100) # test hypotheses with bain. The names of the means are x and y. results <- bain(ttest, "x = y; x > y; x < y") # display the results print(results) # obtain the descriptives table summary(results, ci = 0.95) ``` c) An example of the Bayesian paired samples t-test: ```{r ttest_ex3, eval=FALSE} # load the bain package which includes the simulated sesamesim data set library(bain) # compare the pre with the post measurements ttest <- t_test(sesamesim$prenumb,sesamesim$postnumb,paired = TRUE) # set a seed value set.seed(100) # test hypotheses with bain. Use difference to refer to the difference between means results <- bain(ttest, "difference=0; difference>0; difference<0") # display the results print(results) # obtain the descriptives table summary(results, ci = 0.95) ``` d) An example of the Bayesian one group t-test: ```{r ttest_ex4, eval=FALSE} # load the bain package which includes the simulated sesamesim data se library(bain) # compare post measurements with the reference value 30 ttest <- t_test(sesamesim$postnumb) # Check name of estimate set.seed(100) # test hypotheses with bain versus the reference value 30. Use x to refer to the mean results <- bain(ttest, "x=30; x>30; x<30") # display the results print(results) # obtain the descriptives table summary(results, ci = 0.95) ``` e) An example of the Bayesian equivalence test ```{r ttest_ex5, eval=FALSE} # load the bain package which includes the simulated sesamesim data set library(bain) # collect the data for the boys in the vector x and for the girs in the # vector y x<-sesamesim$postnumb[which(sesamesim$sex==1)] y<-sesamesim$postnumb[which(sesamesim$sex==2)] # execute student's t-test ttest <- t_test(x,y, var.equal = TRUE) # compute the pooled within standard deviation using the variance of x # (ttest$v[1]) and y (ttest$v[2]) pwsd <- sqrt(((length(x) -1) * ttest$v[1] + (length(y)-1) * ttest$v[2])/ ((length(x) -1) + (length(y) -1))) # print pwsd in order to be able to include it in the hypothesis. Its value # is 12.60 print(pwsd) # set a seed value set.seed(100) # test hypotheses (the means of boy and girl differ less than .2 * pwsd = # 2.52 VERSUS the means differ more than .2 * pwsd = 2.52) with bain # note that, .2 is a value for Cohen's d reflecting a "small" effect, that # is, the means differ less or more than .2 pwsd. # use x and y to refer to the means. results <- bain(ttest, "x - y > -2.52 & x - y < 2.52") # display the results print(results) # obtain the descriptives table summary(results, ci = 0.95) ``` **2. Examples Using a `lm` Object** a) An example of the Bayesian ANOVA ```{r ttest_ex6, eval=FALSE} # load the bain package which includes the simulated sesamesim data set library(bain) # make a factor of variable site sesamesim$site <- as.factor(sesamesim$site) # execute an analysis of variance using lm() which, due to the -1, returns # estimates of the means per group anov <- lm(postnumb~site-1,sesamesim) # take a look at the estimated means and their names coef(anov) # set a seed value set.seed(100) # test hypotheses with bain results <- bain(anov, "site1=site2=site3=site4=site5; site2>site5>site1> site3>site4") # display the results print(results) # obtain the descriptives table summary(results, ci = 0.95) ``` b) An example of the Bayesian ANCOVA ```{r ttest_ex7, eval=FALSE} # load the bain package which includes the simulated sesamesim data se library(bain) # make a factor of variable site sesamesim$site <- as.factor(sesamesim$site) # Center the covariate. If centered the coef() command below displays the # adjusted means. If not centered the intercepts are displayed. sesamesim$prenumb <- sesamesim$prenumb - mean(sesamesim$prenumb) # execute an analysis of covariance using lm() which, due to the -1, returns # estimates of the adjusted means per group ancov <- lm(postnumb~site+prenumb-1,sesamesim) # take a look at the estimated adjusted means, the regression coefficient # of the covariate and their names coef(ancov) # set a seed value set.seed(100) # test hypotheses with bain results <- bain(ancov, "site1=site2=site3=site4=site5; site2>site5>site1>site3>site4") # display the results print(results) # obtain the descriptives table summary(results, ci = 0.95) ``` c) An example of the Bayesian multiple regression ```{r ttest_ex8, eval=FALSE} # load the bain package which includes the simulated sesamesim data set library(bain) # execute a multiple regression using lm() regr <- lm(postnumb ~ age + peabody + prenumb,sesamesim) # take a look at the estimated regression coefficients and their names coef(regr) # set a seed value set.seed(100) # test hypotheses with bain. Note that standardize = FALSE denotes that the # hypotheses are in terms of unstandardized regression coefficients results<-bain(regr, "age = 0 & peab=0 & pre=0 ; age > 0 & peab > 0 & pre > 0" , standardize = FALSE) # display the results print(results) # obtain the descriptives table summary(results, ci = 0.95) # Since it is only meaningful to compare regression coefficients if they are # measured on the same scale, bain can also evaluate standardized regression # coefficients (based on the seBeta function by Jeff Jones and Niels Waller): # set a seed value set.seed(100) # test hypotheses with bain. Note that standardize = TRUE denotes that the # hypotheses are in terms of standardized regression coefficients results<-bain(regr, "age = peab = pre ; pre > age > peab",standardize = TRUE) # display the results print(results) # obtain the descriptives table summary(results, ci = 0.95) ``` d) An example of the Bayesian ANOVA: Sensitivity Analysis Note that, most the code below can be replaced by a call to the function `bain_sensitivity`. See the help file for this function for further instructions (call `?bain_sensitivity`). ```{r ttest_ex9, eval=FALSE} # load the bain package which includes the simulated sesamesim data set library(bain) # make a factor of variable site sesamesim$site <- as.factor(sesamesim$site) # execute an analysis of variance using lm() which, due to the -1, returns # estimates of the means per group anov <- lm(postnumb~site-1,sesamesim) # take a look at the estimated means and their names coef(anov) # set a seed value set.seed(100) # test hypotheses with bain results1 <- bain(anov, "site1=site2=site3=site4=site5; site2>site5>site1> site3>site4", fraction = 1) # display the results print(results1) # obtain the descriptives table summary(results1, ci = 0.95) # execute a sensitivity analysis. See Hoijtink, Mulder, # van Lissa, and Gu (2019) for elaborations. set.seed(100) results2 <- bain(anov, "site1=site2=site3=site4=site5; site2>site5>site1> site3>site4",fraction = 2) print(results2) set.seed(100) results3 <- bain(anov, "site1=site2=site3=site4=site5; site2>site5>site1> site3>site4",fraction = 3) print(results3) ``` e) An example of Bayesian Regression with missing data. RUNNING THIS EXAMPLE WILL TAKE ABOUT 60 SECONDS ```{r ttest_ex20, eval=FALSE} # load the bain package which includes the simulated sesamesim data set library(bain) # load mice (multiple imputation of missing data) # inspect the mice help file to obtain further information library(mice) # create missing values in four variables from the sesamesim data set misdat <- cbind(sesamesim$prenumb,sesamesim$postnumb,sesamesim$funumb,sesamesim$peabody) colnames(misdat) <- c("prenumb","postnumb","funumb","peabody") misdat <- as.data.frame(misdat) set.seed(1) pmis <- .80 # for (i in 1:240){ uni<-runif(1) if (pmis < uni) { misdat$funumb[i]<-NA } uni<-runif(1) if (pmis < uni) { misdat$prenumb[i]<-NA misdat$postnumb[i]<-NA } uni<-runif(1) if (pmis < uni) { misdat$peabody[i]<-NA } } # print data summaries - note the missing values (NAs) summary(misdat) # use mice to create 1000 imputed data matrices. Note that, the approach used # below # is only one manner in which mice can be instructed. Many other # options are available. M <- 1000 out <- mice(data = misdat, m = M, seed=999, meth=c("norm","norm","norm","norm"), diagnostics = FALSE, printFlag = FALSE) # create vectors in which 1000 fits and complexities can be stored for each # of two hypotheses and their complement fits1 <- vector("numeric",1000) compls1 <- vector("numeric",1000) fits2 <- vector("numeric",1000) compls2 <- vector("numeric",1000) fitscomplement <- vector("numeric",1000) complscomplement <- vector("numeric",1000) # analyse each imputed data set with lm and bain, store the resulting # fits and complexities set.seed(100) for(i in 1:M) { regr <- lm(funumb~prenumb+postnumb,complete(out,i)) result <- bain(regr,"prenumb=0 & postnumb=0;prenumb>0 & postnumb>0") fits1[i]<-result$fit$Fit[1] compls1[i]<-result$fit$Com[1] fits2[i]<-result$fit$Fit[2] compls2[i]<-result$fit$Com[2] fitscomplement[i]<-result$fit$Fit[4] complscomplement[i]<-result$fit$Com[4] } # compute the Bayes factor from the imputed fits and complexities # see Equation 23 in Hoijtink, H., Gu, X., Mulder, J., and Rosseel, Y. (2019). # Computing Bayes Factors from Data with Missing Values. # Psychological Methods, 24, 253-268. BF1u <- sum(fits1)/sum(compls1) BF2u <- sum(fits2)/sum(compls2) BFcomplementu <- sum(fitscomplement)/sum(complscomplement) # compute PMPc PMPcH1 <- BF1u/(BF1u + BF2u + BFcomplementu) PMPcH2 <- BF2u/(BF1u + BF2u + BFcomplementu) PMPcHcomplement <- BFcomplementu/(BF1u + BF2u + BFcomplementu) ``` **3. Examples Using a `lavaan` object** a) An example of Bayesian confirmatory factor analysis. See the tutorial by Van Lissa, Gu, Mulder, Rosseel, van Zundert, and Hoijtink (2020) for further elaborations. ```{r ttest_ex10, eval=FALSE} # Load the bain and lavaan libraries. Visit www.lavaan.org for # lavaan mini-tutorials, examples, and elaborations library(bain) library(lavaan) # Specify and fit the confirmatory factor model model1 <- ' A =~ Ab + Al + Af + An + Ar + Ac B =~ Bb + Bl + Bf + Bn + Br + Bc ' # Use the lavaan sem function to execute the confirmatory factor analysis fit1 <- sem(model1, data = sesamesim, std.lv = TRUE) # Inspect the parameter names coef(fit1) # Formulate hypotheses, call bain, obtain summary stats hypotheses1 <- " A=~Ab > .6 & A=~Al > .6 & A=~Af > .6 & A=~An > .6 & A=~Ar > .6 & A=~Ac >.6 & B=~Bb > .6 & B=~Bl > .6 & B=~Bf > .6 & B=~Bn > .6 & B=~Br > .6 & B=~Bc >.6" set.seed(100) y <- bain(fit1,hypotheses1,fraction=1,standardize=TRUE) sy <- summary(y, ci = 0.95) ``` b) An example of Bayesian latent regression. See the tutorial by Van Lissa, Gu, Mulder, Rosseel, van Zundert, and Hoijtink (2020) for further elaborations. ```{r ttest_ex11, eval=FALSE} # Load the bain and lavaan libraries. Visit www.lavaan.org for # lavaan mini-tutorials, examples, and elaborations library(bain) library(lavaan) # Specify and fit the latent regression model model2 <- ' A =~ Ab + Al + Af + An + Ar + Ac B =~ Bb + Bl + Bf + Bn + Br + Bc A ~ B + age + peabody ' fit2 <- sem(model2, data = sesamesim, std.lv = TRUE) # Inspect the parameter names coef(fit2) # Formulate hypotheses, call bain, obtain summary stats hypotheses2 <- "A~B > A~peabody = A~age = 0; A~B > A~peabody > A~age = 0; A~B > A~peabody > A~age > 0" set.seed(100) y1 <- bain(fit2, hypotheses2, fraction = 1, standardize = TRUE) sy1 <- summary(y1, ci = 0.99) ``` c) An example of Bayesian multiple group regression. See the tutorial by Van Lissa, Gu, Mulder, Rosseel, van Zundert, and Hoijtink (2020) for further elaborations. ```{r ttest_ex12, eval=FALSE} # Load the bain and lavaan libraries. Visit www.lavaan.org for # lavaan mini-tutorials, examples, and elaborations library(bain) library(lavaan) # Specify A multiple group regression model model3 <- ' postnumb ~ prenumb + peabody ' # Assign labels to the groups to be used when formulating # hypotheses sesamesim$sex <- factor(sesamesim$sex, labels = c("boy", "girl")) # Fit the multiple group regression model fit3 <- sem(model3, data = sesamesim, std.lv = TRUE, group = "sex") # Inspect the parameter names coef(fit3) # Formulate and evaluate the hypotheses hypotheses3 <- "postnumb~prenumb.boy = postnumb~prenumb.girl & postnumb~peabody.boy = postnumb~peabody.girl; postnumb~prenumb.boy < postnumb~prenumb.girl & postnumb~peabody.boy < postnumb~peabody.girl " set.seed(100) y1 <- bain(fit3, hypotheses3, standardize = TRUE) sy1 <- summary(y1, ci = 0.95) ``` **4. Examples Using a named vector** a) An example of the Bayesian ANOVA using a named vector as input for bain ```{r ttest_ex13, eval=FALSE} # load the bain package which includes the simulated sesamesim data se library(bain) # make a factor of variable site sesamesim$site <- as.factor(sesamesim$site) # execute an analysis of variance using lm() which, due to the -1, returns # estimates of the means per group anov <- lm(postnumb~site-1,sesamesim) # take a look at the estimated means and their names coef(anov) # collect the estimates means in a vector estimate <- coef(anov) # give names to the estimates in anov names(estimate) <- c("site1", "site2", "site3","site4","site5") # create a vector containing the sample size of each group # (in case of missing values in the variables used, the command # below has to be modified such that only the cases without # missing values are counted) ngroup <- table(sesamesim$site) # compute for each group the covariance matrix of the parameters # of that group and collect these in a list # for the ANOVA this is simply a list containing for each group the variance # of the mean note that, the within group variance as estimated using lm is # used to compute the variance of each of the means! See, Hoijtink, Gu, and # Mulder (2019) for further elaborations. var <- summary(anov)$sigma**2 cov1 <- matrix(var/ngroup[1], nrow=1, ncol=1) cov2 <- matrix(var/ngroup[2], nrow=1, ncol=1) cov3 <- matrix(var/ngroup[3], nrow=1, ncol=1) cov4 <- matrix(var/ngroup[4], nrow=1, ncol=1) cov5 <- matrix(var/ngroup[5], nrow=1, ncol=1) covlist <- list(cov1, cov2, cov3, cov4,cov5) # set a seed value set.seed(100) # test hypotheses with bain. Note that there are multiple groups # characterized by one mean, therefore group_parameters=0. Note that are no # joint parameters, therefore, joint_parameters=0. results <- bain(estimate, "site1=site2=site3=site4=site5; site2>site5>site1>site3>site4", n=ngroup,Sigma=covlist,group_parameters=1,joint_parameters = 0) # display the results print(results) # obtain the descriptives table summary(results, ci = 0.95) ``` b) An example of the Bayesian Welch's ANOVA (that is an ANOVA that does not assume equal within group variances) using a named vector as input for bain ```{r ttest_ex21, eval=FALSE} # Load the bain library library(bain) # make a factor of variable site sesamesim$site <- as.factor(sesamesim$site) # collect the estimated means in a vector estimates <- aggregate(sesamesim$postnumb,list(sesamesim$site),mean)[,2] # give names to the estimates names(estimates) <- c("site1", "site2", "site3","site4","site5") # create a vector containing the sample sizes of each group ngroup <- table(sesamesim$site) # compute the variances in each group and subquently the squared standard errors vars <- aggregate(sesamesim$postnumb,list(sesamesim$site),var) vargroup <- vars[,2]/ngroup # collect the variances of the means in a covariance listcov1 <- matrix(vargroup[1], nrow=1, ncol=1) cov2 <- matrix(vargroup[2], nrow=1, ncol=1) cov3 <- matrix(vargroup[3], nrow=1, ncol=1) cov4 <- matrix(vargroup[4], nrow=1, ncol=1) cov5 <- matrix(vargroup[5], nrow=1, ncol=1) covlist <- list(cov1, cov2, cov3, cov4,cov5) # set a seed value set.seed(100) # test hypotheses using bain. Note that there are multiple groups # characterized by one mean, therefore group_parameters=1. Note that # there are no joint parameters, therefore, joint_parameters=0. results <- bain(estimates, "site1=site2=site3=site4=site5;site2>site5>site1>site4>site3", n=ngroup,Sigma=covlist,group_parameters=1,joint_parameters = 0) print(results) # obtain the descriptives table summary(results, ci = 0.95) ``` c) Bayesian robust (against non-normality) ANOVA (unequal within group variances). Robust ANOVA is elaborated in Bosman and Hoijtink (unpublished) which can be downloaded from the `bain` website. ```{r ttest_ex19, eval=FALSE} # load the bain package which includes the simulated sesamesim data set library(bain) # load the WRS2 package which renders the trimmed sample mean and # corresponding standard error library(WRS2) # make a factor of variable site sesamesim$site <- as.factor(sesamesim$site) # create a vector containing the sample sizes of each group # (in case of missing values in the variables used, the command # below has to be modified such that only the cases without # missing values are counted) ngroup <- table(sesamesim$site) # Compute the 20\% sample trimmed mean for each site estimates <- c(mean(sesamesim$postnumb[sesamesim$site == 1], tr = 0.2), mean(sesamesim$postnumb[sesamesim$site == 2], tr = 0.2), mean(sesamesim$postnumb[sesamesim$site == 3], tr = 0.2), mean(sesamesim$postnumb[sesamesim$site == 4], tr = 0.2), mean(sesamesim$postnumb[sesamesim$site == 5], tr = 0.2)) # give names to the estimates names(estimates) <- c("s1", "s2", "s3","s4","s5") # display the estimates and their names print(estimates) # Compute the sample trimmed mean standard error for each site se <- c(trimse(sesamesim$postnumb[sesamesim$site == 1]), trimse(sesamesim$postnumb[sesamesim$site == 2]), trimse(sesamesim$postnumb[sesamesim$site == 3]), trimse(sesamesim$postnumb[sesamesim$site == 4]), trimse(sesamesim$postnumb[sesamesim$site == 5])) # Square the standard errors to obtain the variances of the sample # trimmed means var <- se^2 # Store the variances in a list of matrices covlist <- list(matrix(var[1]),matrix(var[2]), matrix(var[3]),matrix(var[4]), matrix(var[5])) # set a seed value set.seed(100) # test hypotheses with bain results <- bain(estimates,"s1=s2=s3=s4=s5;s2>s5>s1>s3>s4", n=ngroup,Sigma=covlist,group_parameters=1,joint_parameters= 0) # display the results print(results) # obtain the descriptives table summary(results, ci = 0.95) ``` d) An example of the Bayesian ANCOVA using a named vector as input for bain ```{r ttest_ex14, eval=FALSE} # load the bain package which includes the simulated sesamesim data se library(bain) # make a factor of variable site sesamesim$site <- as.factor(sesamesim$site) # center the covariate. If centered the coef() command below displays the # adjusted means. If not centered the intercepts are displayed. sesamesim$prenumb <- sesamesim$prenumb - mean(sesamesim$prenumb) # execute an analysis of covariance using lm() which, due to the -1, returns # estimates of the adjusted means per group ancov2 <- lm(postnumb~site+prenumb-1,sesamesim) # take a look at the estimated adjusted means and their names coef(ancov2) # collect the estimates of the adjusted means and regression coefficient of # the covariate in a vector (the vector has to contain first the # adjusted means and next the regression coefficients of the covariates) estimates <- coef(ancov2) # assign names to the estimates names(estimates)<- c("v.1", "v.2", "v.3", "v.4","v.5", "pre") # compute the sample size per group # (in case of missing values in the variables used, the command # below has to be modified such that only the cases without # missing values are counted) ngroup <- table(sesamesim$site) # compute for each group the covariance matrix of the parameters of that # group and collect these in a list note that, the residual variance as # estimated using lm is used to compute these covariance matrices var <- (summary(ancov2)$sigma)**2 # below, for each group, the covariance matrix of the adjusted mean and # covariate is computed # see Hoijtink, Gu, and Mulder (2019) for further explanation and elaboration cat1 <- subset(cbind(sesamesim$site,sesamesim$prenumb), sesamesim$site == 1) cat1[,1] <- 1 cat1 <- as.matrix(cat1) cov1 <- var * solve(t(cat1) %*% cat1) # cat2 <- subset(cbind(sesamesim$site,sesamesim$prenumb), sesamesim$site == 2) cat2[,1] <- 1 cat2 <- as.matrix(cat2) cov2 <- var * solve(t(cat2) %*% cat2) # cat3 <- subset(cbind(sesamesim$site,sesamesim$prenumb), sesamesim$site == 3) cat3[,1] <- 1 cat3 <- as.matrix(cat3) cov3 <- var * solve(t(cat3) %*% cat3) # cat4 <- subset(cbind(sesamesim$site,sesamesim$prenumb), sesamesim$site == 4) cat4[,1] <- 1 cat4 <- as.matrix(cat4) cov4 <- var * solve(t(cat4) %*% cat4) # cat5 <- subset(cbind(sesamesim$site,sesamesim$prenumb), sesamesim$site == 5) cat5[,1] <- 1 cat5 <- as.matrix(cat5) cov5 <- var * solve(t(cat5) %*% cat5) # covariances <- list(cov1, cov2, cov3, cov4,cov5) # set a seed value set.seed(100) # test hypotheses with bain. Note that there are multiple groups # characterized by one adjusted mean, therefore group_parameters=1 Note that # there is one covariate, therefore, joint_parameters=1. results2<-bain(estimates,"v.1=v.2=v.3=v.4=v.5;v.2 > v.5 > v.1 > v.3 >v.4;", n=ngroup,Sigma=covariances,group_parameters=1,joint_parameters = 1) # display the results print(results2) # obtain the descriptives table summary(results2, ci = 0.95) ``` e) An example of the Bayesian one within factor repeated measures analysis ```{r ttest_ex15, eval=FALSE} # load the bain package which includes the simulated sesamesim data set library(bain) # estimate the means of three repeated measures of number knowledge # the 1 denotes that these means have to be estimated within <- lm(cbind(prenumb,postnumb,funumb)~1, data=sesamesim) # take a look at the estimated means and their names coef(within) # note that the model specified in lm has three dependent variables. # Consequently, the estimates rendered by lm are collected in a "matrix". # Since bain needs a named vector containing the estimated means, the [1:3] # code is used to select the three means from a matrix and store them in a # vector. estimate <- coef(within)[1:3] # give names to the estimates in anov names(estimate) <- c("pre", "post", "fu") # compute the sample size # (in case of missing values in the variables used, the command # below has to be modified such that only the cases without # missing values are counted) ngroup <- nrow(sesamesim) # compute the covariance matrix of the three means covmatr <- list(vcov(within)) # set a seed value set.seed(100) # test hypotheses with bain. Note that there is one group, with three # means therefore group_parameters=3. results <- bain(estimate,"pre = post = fu; pre < post < fu", n=ngroup, Sigma=covmatr, group_parameters=3, joint_parameters = 0) # display the results print(results) # obtain the descriptives table summary(results, ci = 0.95) ``` f) An example of the Bayesian one between factor and one within factor repeated measures analysis ```{r ttest_ex16, eval=FALSE} # load the bain package which includes the simulated sesamesim data set library(bain) # make a factor of the variable sex sesamesim$sex <- factor(sesamesim$sex) # estimate the means of prenumb, postnumb, and funumb for boys and girls # the -1 denotes that the means have to estimated bw <- lm(cbind(prenumb, postnumb, funumb)~sex-1, data=sesamesim) # take a look at the estimated means and their names coef(bw) # collect the estimated means in a vector est1 <-coef(bw)[1,1:3] # the three means for sex = 1 est2 <-coef(bw)[2,1:3] # the three means for sex = 2 estimate <- c(est1,est2) # give names to the estimates in anov names(estimate) <- c("pre1", "post1", "fu1","pre2", "post2", "fu2") # determine the sample size per group # (in case of missing values in the variables used, the command # below has to be modified such that only the cases without # missing values are counted) ngroup<-table(sesamesim$sex) # cov1 has to contain the covariance matrix of the three means in group 1. # cov2 has to contain the covariance matrix in group 2 # typing vcov(bw) in the console pane highlights the structure of # the covariance matrix of all 3+3=6 means # it has to be dissected in to cov1 and cov2 cov1 <- c(vcov(bw)[1,1],vcov(bw)[1,3],vcov(bw)[1,5],vcov(bw)[3,1], vcov(bw)[3,3],vcov(bw)[3,5],vcov(bw)[5,1],vcov(bw)[5,3],vcov(bw)[5,5]) cov1 <- matrix(cov1,3,3) cov2 <- c(vcov(bw)[2,2],vcov(bw)[2,4],vcov(bw)[2,6],vcov(bw)[4,2], vcov(bw)[4,4],vcov(bw)[4,6],vcov(bw)[6,2],vcov(bw)[6,4],vcov(bw)[6,6]) cov2 <- matrix(cov2,3,3) covariance<-list(cov1,cov2) # set a seed value set.seed(100) # test hypotheses with bain. Note that there are multiple groups # characterized by three means, therefore group_parameters=3. Note that there # are no additional parameters, therefore, joint_parameters=0. results <-bain(estimate, "pre1 - pre2 = post1 - post2 = fu1 -fu2; pre1 - pre2 > post1 - post2 > fu1 -fu2" , n=ngroup, Sigma=covariance, group_parameters=3, joint_parameters = 0) # display the results print(results) # obtain the descriptives table summary(results, ci = 0.95) ``` g) An example of the Bayesian logistic regression as the counterpart of Bayesian multiple regression ```{r ttest_ex17, eval=FALSE} # load the bain package which includes the simulated sesamesim data set library(bain) # regression coefficients can only be mutually compared if the # corresponding predictors are measured on the same scale, therefore the # predictors are standardized sesamesim$age <- (sesamesim$age - mean(sesamesim$age))/sd(sesamesim$age) sesamesim$peabody <- (sesamesim$peabody - mean(sesamesim$peabody))/ sd(sesamesim$peabody) sesamesim$prenumb <- (sesamesim$prenumb - mean(sesamesim$prenumb))/ sd(sesamesim$prenumb) # estimate the logistic regression coefficients logreg <- glm(viewenc ~ age + peabody + prenumb, family=binomial, data=sesamesim) # take a look at the estimates and their names coef(logreg) # collect the estimated intercept and regression coefficients in a vector estimate <- coef(logreg) # give names to the estimates names(estimate) <- c("int", "age", "peab" ,"pre" ) # compute the sample size. Note that, this is an analysis with ONE group # (in case of missing values in the variables used, the command # below has to be modified such that only the cases without # missing values are counted) ngroup <- nrow(sesamesim) # compute the covariance matrix of the intercept and regression coefficients covmatr <- list(vcov(logreg)) # set a seed value set.seed(100) # test hypotheses with bain. Note that there is one group, with four # parameters (intercept plus three regression coefficients) therefore # group_parameters=4. results <- bain(estimate, "age = peab = pre; age > pre > peab", n=ngroup, Sigma=covmatr, group_parameters=4, joint_parameters = 0) # display the results print(results) # obtain the descriptives table summary(results, ci = 0.95) ``` h) An example of the Bayesian logistic regression as the counterpart of an ANCOVA. This is Example 3 from Hoijtink, Gu, and Mulder (2019). The research question concerns the probability of encouragement to view sesame street (a dichotomous 0/1 meaning no/yes dependent variable) for boys (sex=1) and girls (sex=2) adjusted for their age. This is a kind of ANCOVA with a dichotomous instead of a continuous dependent variable. ```{r ttest_ex18, eval=FALSE} # load the bain package which includes the simulated sesamesim data set library(bain) # load the numDeriv package which will be used to compute the covariance # matrix of the adjusted group coefficients and regression coefficient of age # for the boys and the girls # using the estimates obtained using the data for # the boys AND the girls. library(numDeriv) # make a factor of the variable sex sesamesim$sex <- factor(sesamesim$sex) # center the covariate age sesamesim$age <- sesamesim$age - mean(sesamesim$age) # determine sample size per sex group # (in case of missing values in the variables used, the command # below has to be modified such that only the cases without # missing values are counted) ngroup <- table(sesamesim$sex) # execute the logistic regression, -1 ensures that the coefficients # for boys and girl are estimated adjusted for the covariate age anal <- glm(viewenc ~ sex + age - 1, family=binomial, data=sesamesim) # take a look at the estimates and their names coef(anal) # collect the estimates obtained using the data of the boys AND the # girls in a vector. This vector contains first the group specific # parameters followed by the regression coefficient of the covariate. estimates <-coef(anal) # give names to the estimates names(estimates) <- c("boys", "girls", "age") # use numDeriv to compute the Hessian matrix and subsequently the # covariance matrix for each of the two (boys and girls) groups. # The vector f should contain the regression coefficient of the group # at hand and the regression coefficient of the covariate. # # the first group data <- subset(cbind(sesamesim$sex,sesamesim$age,sesamesim$viewenc), sesamesim$sex==1) f <- 1 f[1] <- estimates[1] # the regression coefficient of boys f[2] <- estimates[3] # the regression coefficient of age # # within the for loop below the log likelihood of the logistic # regression model is computed using the data for the boys logist1 <- function(x){ out <- 0 for (i in 1:ngroup[1]){ out <- out + data[i,3]*(x[1] + x[2]*data[i,2]) - log (1 + exp(x[1] + x[2]*data[i,2])) } return(out) } hes1 <- hessian(func=logist1, x=f) # multiply with -1 and invert to obtain the covariance matrix for the # first group cov1 <- -1 * solve(hes1) # # the second group data <- subset(cbind(sesamesim$sex,sesamesim$age,sesamesim$viewenc), sesamesim$sex==2) f[1] <- estimates[2] # the regression coefficient of girls f[2] <- estimates[3] # the regression coefficient of age # within the for loop below the log likelihood of the logistic # regression model is computed using the data for the girls logist2 <- function(x){ out <- 0 for (i in 1:ngroup[2]){ out <- out + data[i,3]*(x[1] + x[2]*data[i,2]) - log (1 + exp(x[1] + x[2]*data[i,2])) } return(out) } hes2 <- hessian(func=logist2, x=f) # multiply with -1 and invert to obtain the covariance matrix cov2 <- -1 * solve(hes2) # #make a list of covariance matrices covariance<-list(cov1,cov2) # # set a seed value set.seed(100) # test hypotheses with bain. Note that there are multiple groups # characterized by one adjusted group coefficient, # therefore group_parameters=1. Note that there is one covariate, # therefore, joint_parameters=1. results <- bain(estimates, "boys < girls & age > 0", n=ngroup, Sigma=covariance, group_parameters=1, joint_parameters = 1) # display the results print(results) # obtain the descriptives table summary(results, ci = 0.95) ``` i) An example of Bayesian confirmatory factor analysis with a named vector object. This option uses `bain.default()`, so the user must manually extract the required information. See the tutorial by Van Lissa, Gu, Mulder, Rosseel, van Zundert, and Hoijtink (2020) for further elaborations. First, we fit a structural equation model using `lavaan`: ```{r, eval = FALSE} library(bain) library(lavaan) model1 <- 'A =~ Ab + Al + Af + An + Ar + Ac B =~ Bb + Bl + Bf + Bn + Br + Bc' fit1 <- sem(model1, data = sesamesim, std.lv = TRUE) ``` We want to specify the following hypotheses: ```{r, eval = FALSE} hypotheses <- "(Ab, Al, Af, An, Ar, Ac) >.6 & (Bb, Bl, Bf, Bn, Br, Bc) >.6" ``` Next, we extract the required information from the `lavaan` model object, in the order of the parameter table above: ```{r, eval = FALSE} # Extract standardized parameter estimates (argument x) PE1 <- parameterEstimates(fit1, standardized = TRUE) # Identify which parameter estimates are factor loadings loadings <- which(PE1$op == "=~") # Collect the standardized "std.all" factor loadings "=~" estimates <- PE1$std.all[loadings] # Assign names to the standardized factor loadings names(estimates) <- c("Ab", "Al", "Af", "An", "Ar", "Ac", "Bb", "Bl", "Bf", "Bn", "Br", "Bc") # Extract sample size (argument n) n <- nobs(fit1) # Extract the standardized model parameter covariance matrix, # selecting only the rows and columns for the loadings covariance <- lavInspect(fit1, "vcov.std.all")[loadings, loadings] # Give the covariance matrix the same names as the estimate dimnames(covariance) <- list(names(estimates), names(estimates)) # Put the covariance matrix in a list covariance <- list(covariance) # Specify number of group parameters; this simply means that there # are 12 parameters in the estimate. group_parameters <- 12 # Then, the number of joint parameters. This is always 0 for evaluations of # SEM hypotheses: joint_parameters <- 0 ``` Now, we can evaluate the hypotheses using `bain.default()`: ```{r, eval = FALSE} res <- bain(x = estimates, hypothesis = hypotheses, n = n, Sigma = covariance, group_parameters = group_parameters, joint_parameters = joint_parameters) res ``` Finally, we print the results of the bain analysis and obtain the estimates and 95% central credibility intervals ```{r, eval = FALSE} sres <- summary(res, ci = 0.95) sres ```