--- title: "Introduction to MAIHDA" author: "Hamid Bulut" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to MAIHDA} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4.5 ) ``` ## Introduction The MAIHDA package provides specialized tools for conducting Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy. This modern epidemiological approach is highly effective for investigating intersectional health inequalities and understanding how joint social categories (e.g., Race x Gender x Education) influence individual outcomes. By utilizing multilevel mixed-effects models (via `lme4` or `brms`), MAIHDA allows researchers to: 1. Automatically construct intersectional strata. 2. Estimate between-stratum variance and Variance Partition Coefficients (VPC). 3. Evaluate the Proportional Change in Variance (PCV): a descriptive, model-dependent measure of how much the between-stratum variance changes when additive main effects are added (it is not a causal decomposition, and a negative PCV does not by itself establish hidden structural inequality). 4. Launch an interactive Shiny Dashboard for code-free analysis. The fastest way in is the `maihda()` function: a single call runs the whole standard analysis and returns one object you can `print()`, `summary()`, and `plot()`. This vignette leads with that workflow, then opens it up to show the lower-level building blocks (`fit_maihda()` and friends) for when you need finer control. ## Installation ```{r eval=FALSE} # Released version (CRAN): install.packages("MAIHDA") # Development version (GitHub): # install.packages("remotes") # remotes::install_github("hdbt/MAIHDA") ``` ## The data The package includes a pedagogical subset of the National Health and Nutrition Examination Survey (`maihda_health_data`). We use it to examine how Body Mass Index (BMI) varies across intersectional demographic groups. ```{r load} library(MAIHDA) data("maihda_health_data") # A few sections below add individual-level covariates (Age, Poverty) or compare # variances across models. health_complete <- maihda_health_data[complete.cases( maihda_health_data[, c("BMI", "Age", "Gender", "Race", "Education", "Poverty")] ), ] ``` ## The `maihda()` workflow `maihda()` runs the standard MAIHDA pipeline in a single call. It is intrinsically a two-model decomposition: it fits the null model (the intersectional random intercept, excluding the stratum dimensions' main effects) and the adjusted model (the null plus the additive main effects of the stratum dimensions), summarises the null model's VPC/ICC, and reports the PCV -- the proportional change in between-stratum variance from null to adjusted, i.e. the additive share of the intersectional inequality. Write the stratum dimensions' main effects in the formula and `maihda()` treats it as the fully-specified adjusted model, deriving the null by dropping them. (Omit them -- `BMI ~ 1 + (1 | Gender:Race:Education)` -- and `maihda()` adds them to the adjusted model for you, with a `message()`, so the decomposition stays explicit.) Either way `maihda()` fits both models on the same analytic sample. ```{r maihda-run} analysis <- maihda( BMI ~ Gender + Race + Education + (1 | Gender:Race:Education), data = health_complete ) analysis # VPC/ICC (null) and PCV (null -> adjusted) analysis$formula # null: BMI ~ (1 | stratum) analysis$adjusted_formula # adjusted: BMI ~ Gender + Race + Education + (1 | stratum) ``` The returned object carries everything: the full variance components, the PCV decomposition, and both fitted models. ```{r maihda-summary} summary(analysis) # variance components, VPC/ICC, stratum estimates analysis$pcv # proportional change in between-stratum variance ``` **Interpretation.** The VPC/ICC tells us what share of the total variance in BMI lies *between* the intersectional social groups rather than *within* them. The PCV tells us how much of that between-stratum variance is the additive sum of the dimensions' main effects: if the variance drops substantially (high PCV), the between-stratum differences are largely additive; what remains is often read as the *intersectional interaction*. The PCV is a model-dependent variance change, not a causal decomposition, and a low or negative value does not by itself automatically prove a true interaction (it can also reflect suppression, rescaling on the latent scale for non-Gaussian models, sample composition, or estimation uncertainty). For publication-ready uncertainty, add `bootstrap = TRUE` (with `n_boot`) to attach parametric-bootstrap confidence intervals to the VPC and PCV. It refits the models many times, so it is omitted here. ### Visual diagnostics The dedicated [plot interpretation vignette](interpreting_plots.html) explains how to read each one. ```{r maihda-plot-vpc} # Variance partition (VPC) -- null model plot(analysis, type = "vpc") ``` ```{r maihda-plot-predicted, fig.height = 10} # Predicted stratum values with 95% CIs -- null model. # 50 strata (Gender x Race x Education), so the figure is tall to keep the # rotated stratum labels readable. plot(analysis, type = "predicted") ``` When there are many strata, the **UpSet view** (`type = "upset"`) is a more compact alternative to the rotated labels above. A dot matrix encodes each stratum's category on every dimension -- a single present/absent row for a binary 0/1 dimension, one row per level for a multi-level factor such as `Race` or `Education` -- with an intersection-size bar above and the predicted values below, all sharing one column order (largest stratum first). Highlighting carries over: with `highlight_by = "rope"`, strata whose interaction is credibly outside a *region of practical equivalence* are drawn in the accent colour. Here a ROPE of +/-0.25 BMI flags only two intersections -- a reminder that the intersectional interactions are mostly negligible in this small teaching subset (the [BRFSS case study](case_study_brfss.html) runs on full data, where more surface). `n_strata` caps the columns for readability, but any flagged stratum is always kept. Use `maihda_upset_size()` to get a width/height matched to the content. ```{r maihda-plot-upset, fig.width = 9, fig.height = 7} # UpSet view of the same strata, highlighting ROPE-relevant interactions. plot(analysis, type = "upset", n_strata = 25, select = "deviation", highlight_interactions = TRUE, highlight_by = "rope", rope = 0.25) ``` ```{r maihda-plot-effect-decomp} # Additive versus intersectional effect decomposition -- adjusted model plot(analysis, type = "effect_decomp") ``` ```{r maihda-plot-deviation, warning = FALSE} # Individual prediction-deviance dashboard -- adjusted model plot(analysis, type = "prediction_deviation") ``` ### A crossed-dimensions alternative Alongside the two-model PCV, `maihda()` offers a **crossed-dimensions** decomposition (`decomposition = "crossed-dimensions"`) that estimates the additive and interaction parts from a single model entering each dimension's main effect as a random intercept rather than a fixed effect. See `vignette("cross_classified", package = "MAIHDA")`. ### A contextual cross-classified model To model the strata *crossed with* a higher-level place or institution -- the literature's cross-classified MAIHDA (patients within strata and hospitals, students within strata and schools) -- pass `context = "school"` to `maihda()` or `fit_maihda()`. The summary then partitions the unexplained variance into between-stratum vs. between-context vs. residual, and the headline VPC/ICC becomes the between-stratum share conditional on the context random effect. Also covered in `vignette("cross_classified", package = "MAIHDA")`. ### Design-weighted MAIHDA (survey data) For complex-survey data (NHANES, PISA, ...), pass the sampling-weight column via `sampling_weights`. Survey weights are not lme4's `weights=` (those are precision weights, which rescale the residual variance), so the fit routes through `WeMix::mix()` weighted pseudo-maximum-likelihood. The whole workflow (VPC/ICC, PCV, stratum summaries, prediction, plots, and the AUC for binary outcomes) is then design-weighted, with design-consistent standard errors for the fixed effects: ```{r maihda-weighted, eval=FALSE} weighted <- maihda(outcome ~ age + (1 | gender:race:education), data = survey_data, sampling_weights = "person_weight") weighted ``` The wemix engine covers `gaussian(identity)` / `binomial(logit)` MAIHDA with the single `(1 | stratum)` intercept; crossed random effects (`context =`, `decomposition = "crossed-dimensions"`) and bootstrap intervals require lme4/brms. `engine = "brms"` also accepts `sampling_weights`, as likelihood weights (a pseudo-posterior whose point estimates target the population-weighted estimand, but with credible intervals that are not design-based). ### Comparing across groups Add a higher-level grouping variable and `maihda()` also compares the decomposition across its levels. `maihda_country_data` (OECD PISA 2018) has a real country grouping -- it asks how much the joint gender x socioeconomic-status inequality in mathematics achievement differs across six countries. This workflow has its own [group comparison vignette](group_comparison.html), so here we just point to it: ```{r maihda-group, eval=FALSE} data("maihda_country_data") by_country <- maihda(math ~ gender + ses + (1 | gender:ses), data = maihda_country_data, group = "country") by_country plot(by_country, type = "group_vpc") ``` ## Under the hood: the building blocks `maihda()` is a convenience wrapper around a set of lower-level functions: `fit_maihda()` fits one model, `calculate_pvc()` compares two, `summary()` reads the variance components, and `compare_maihda_groups()` runs the group comparison. Reach for them directly when you need finer control than the one-call workflow gives. In particular, a custom adjusted model, a stepwise decomposition, or the discriminatory-accuracy extensions for binary outcomes. ### Fit a single model `fit_maihda()` builds the intersectional strata on the fly and fits one multilevel model. Use it when you only want a single fit, e.g. the null-model VPC on its own. ```{r bb-single} model_null <- fit_maihda(BMI ~ 1 + (1 | Gender:Race:Education), data = health_complete) summary(model_null) ``` ### A custom adjusted model and the PCV `maihda()` only ever adds the stratum dimensions' own main effects to the adjusted model. To ask a different question (like how much of the between-stratum variance is explained by individual-level covariates that are not strata dimensions, here Age and Poverty) build the two models yourself and compare them with `calculate_pvc()`. PCV compares variances across models, so both must use the same sample (the `health_complete` data we prepared above): ```{r bb-custom-pcv} model_cov <- fit_maihda(BMI ~ Age + Poverty + (1 | Gender:Race:Education), data = health_complete) calculate_pvc(model_null, model_cov) ``` This PCV answers "how much between-stratum variance do Age and Poverty account for?", which is distinct from the additive-share PCV that `maihda()` reports. Add `bootstrap = TRUE` for parametric-bootstrap confidence intervals (omitted here because it refits the model many times): ```{r bb-pcv-boot, eval=FALSE} calculate_pvc(model_null, model_cov, bootstrap = TRUE, n_boot = 500) ``` ### Stepwise PCV Often researchers want to know exactly *which* variable explained the variance. `stepwise_pcv()` adds covariates one-by-one and tracks the between-stratum variance dynamically. It works on a data frame that already carries the `stratum` column -- the `maihda()` object exposes one ready to use as `analysis$model$original_data`: ```{r bb-stepwise} stepwise_results <- stepwise_pcv( data = analysis$model$original_data, # carries the strata column outcome = "BMI", vars = c("Age", "Gender", "Race", "Education", "Poverty") ) print(stepwise_results) ``` Negative step PCVs in this table indicate an "unmasking"/suppression pattern: adding a variable increased the between-stratum variance. This is a model-dependent change in variance, not proof that a hidden structural inequality was causally revealed (a negative PCV can also reflect suppression, rescaling, sample composition, or estimation uncertainty). ### Discriminatory accuracy and the response-scale VPC For a binary outcome the discriminatory accuracy (AUC / C-statistic and Median Odds Ratio) is reported automatically. It rides along on the `maihda()` summaries and headline `print()` (and on `summary(fit_maihda(...))`), so the strata's AUC sits next to the VPC with no extra call. The probability-scale (response) VPC is estimated by simulation, so it is opt-in: add `response_vpc = TRUE` (with a `seed`). You can still call the helpers directly on the fitted models when you want them on their own: ```{r bb-da, eval=FALSE} # A binary-outcome analysis ob <- maihda(Obese ~ Gender + Race + (1 | Gender:Race), data = maihda_health_data, response_vpc = TRUE, seed = 1) ob summary(ob) # carries the discriminatory_accuracy (+ vpc_response) slots # ...or call the pieces directly on the fitted maihda_model objects: maihda_discriminatory_accuracy(ob$model) # AUC + MOR, null model maihda_discriminatory_accuracy(ob$model_adjusted) # adjusted model maihda_vpc_response(ob$model, seed = 1) # probability-scale VPC ``` The dedicated [binary outcomes vignette](binary_outcomes.html) walks through the logistic model, the latent-scale VPC, and this discriminatory-accuracy recipe in full. ### The group comparison directly `compare_maihda_groups()` is the function `maihda(group = ...)` calls under the hood. Use it directly for the same stratified comparison without the rest of the one-call pipeline: ```{r bb-group, eval=FALSE} data("maihda_country_data") compare_maihda_groups(math ~ 1 + (1 | gender:ses), data = maihda_country_data, group = "country") ``` ## Where to next This vignette is the hub for the rest of the documentation: - **[MAIHDA for binary outcomes](binary_outcomes.html)** -- logistic MAIHDA, the latent-scale VPC, and a discriminatory-accuracy (AUC) recipe. - **[Interpreting MAIHDA plots and diagnostics](interpreting_plots.html)** -- how to read every plot type shown above, and what *not* to conclude from each. - **[Comparing intersectional inequality across groups](group_comparison.html)** -- the full `maihda(group = ...)` / `compare_maihda_groups()` workflow. - **[Crossed random effects: dimensions and contexts](cross_classified.html)** -- the single-model additive-vs-interaction (crossed-dimensions) alternative to the two-model PCV, and the contextual cross-classified MAIHDA (strata crossed with schools, hospitals, regions) via `context =`. - **[Interactive data analysis](interactive_app.html)** -- the no-code Shiny dashboard. ## Interactive Shiny App The MAIHDA package ships with a fully-featured, interactive Shiny Dashboard. You can upload your own data (CSV, SPSS `.sav`, Stata `.dta`), dynamically select variables, and compute Stepwise PCV tables and prediction plots. ```{r eval=FALSE} # Launch the interactive interface run_maihda_app() ``` ## References - Evans, C. R., Williams, D. R., Onnela, J. P., & Subramanian, S. V. (2018). A multilevel approach to modeling health inequalities at the intersection of multiple social identities. *Social Science & Medicine*, 203, 64-73. - Merlo, J. (2018). Multilevel analysis of individual heterogeneity and discriminatory accuracy (MAIHDA) within an intersectional framework. *Social Science & Medicine*, 203, 74-80.