---
title: "Introduction to MAIHDA"
author: "Hamid Bulut"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to MAIHDA}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5
)
```

## Introduction

The MAIHDA package provides specialized tools for conducting Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy. This modern epidemiological approach is highly effective for investigating intersectional health inequalities and understanding how joint social categories (e.g., Race x Gender x Education) influence individual outcomes.

By utilizing multilevel mixed-effects models (via `lme4` or `brms`), MAIHDA allows researchers to:
1. Automatically construct intersectional strata.
2. Estimate between-stratum variance and Variance Partition Coefficients (VPC).
3. Evaluate the Proportional Change in Variance (PCV): a descriptive, model-dependent measure of how much the between-stratum variance changes when additive main effects are added (it is not a causal decomposition, and a negative PCV does not by itself establish hidden structural inequality).
4. Launch an interactive Shiny Dashboard for code-free analysis.

The fastest way in is the `maihda()` function: a single call runs the whole
standard analysis and returns one object you can `print()`, `summary()`, and
`plot()`. This vignette leads with that workflow, then opens it up to show the
lower-level building blocks (`fit_maihda()` and friends) for when you need finer
control.

## Installation

```{r eval=FALSE}
# Released version (CRAN):
install.packages("MAIHDA")

# Development version (GitHub):
# install.packages("remotes")
# remotes::install_github("hdbt/MAIHDA")
```

## The data

The package includes a pedagogical subset of the National Health and Nutrition
Examination Survey (`maihda_health_data`). We use it to examine how Body Mass
Index (BMI) varies across intersectional demographic groups.

```{r load}
library(MAIHDA)
data("maihda_health_data")

# A few sections below add individual-level covariates (Age, Poverty) or compare
# variances across models.
health_complete <- maihda_health_data[complete.cases(
  maihda_health_data[, c("BMI", "Age", "Gender", "Race", "Education", "Poverty")]
), ]
```

## The `maihda()` workflow

`maihda()` runs the standard MAIHDA pipeline in a single call. It is intrinsically
a two-model decomposition: it fits the null model (the intersectional
random intercept, excluding the stratum dimensions' main effects) and  the
 adjusted model (the null plus the additive main effects of the stratum
dimensions), summarises the null model's VPC/ICC, and reports the PCV -- the
proportional change in between-stratum variance from null to adjusted, i.e. the
additive share of the intersectional inequality.

Write the stratum dimensions' main effects in the formula and `maihda()` treats it
as the fully-specified adjusted model, deriving the null by dropping them. (Omit
them -- `BMI ~ 1 + (1 | Gender:Race:Education)` -- and `maihda()` adds them to the
adjusted model for you, with a `message()`, so the decomposition stays explicit.)
Either way `maihda()` fits both models on the same analytic sample.

```{r maihda-run}
analysis <- maihda(
  BMI ~ Gender + Race + Education + (1 | Gender:Race:Education),
  data = health_complete
)

analysis                # VPC/ICC (null) and PCV (null -> adjusted)
analysis$formula        # null:     BMI ~ (1 | stratum)
analysis$adjusted_formula  # adjusted: BMI ~ Gender + Race + Education + (1 | stratum)
```

The returned object carries everything: the full variance components, the PCV
decomposition, and both fitted models.

```{r maihda-summary}
summary(analysis)       # variance components, VPC/ICC, stratum estimates
analysis$pcv            # proportional change in between-stratum variance
```

**Interpretation.** The VPC/ICC tells us what share of the total variance in BMI
lies *between* the intersectional social groups rather than *within* them. The PCV
tells us how much of that between-stratum variance is the additive sum of the
dimensions' main effects: if the variance drops substantially (high PCV), the
between-stratum differences are largely additive; what remains is often read as
the *intersectional interaction*. The PCV is a model-dependent variance change, not a causal decomposition, and a low or negative
value does not by itself automatically prove a true interaction (it can also reflect
suppression, rescaling on the latent scale for non-Gaussian models, sample
composition, or estimation uncertainty).

For publication-ready uncertainty, add `bootstrap = TRUE` (with `n_boot`) to attach
parametric-bootstrap confidence intervals to the VPC and PCV. It refits the models
many times, so it is omitted here.

### Visual diagnostics

The dedicated [plot interpretation vignette](interpreting_plots.html) explains how
to read each one.

```{r maihda-plot-vpc}
# Variance partition (VPC) -- null model
plot(analysis, type = "vpc")
```

```{r maihda-plot-predicted, fig.height = 10}
# Predicted stratum values with 95% CIs -- null model.
# 50 strata (Gender x Race x Education), so the figure is tall to keep the
# rotated stratum labels readable.
plot(analysis, type = "predicted")
```

When there are many strata, the **UpSet view** (`type = "upset"`) is a more compact
alternative to the rotated labels above. A dot matrix encodes each stratum's
category on every dimension -- a single present/absent row for a binary 0/1
dimension, one row per level for a multi-level factor such as `Race` or
`Education` -- with an intersection-size bar above and the predicted values below,
all sharing one column order (largest stratum first). Highlighting carries over:
with `highlight_by = "rope"`, strata whose interaction is credibly outside a
*region of practical equivalence* are drawn in the accent colour. Here a ROPE of
+/-0.25 BMI flags only two intersections -- a reminder that the intersectional
interactions are mostly negligible in this small teaching subset (the
[BRFSS case study](case_study_brfss.html) runs on full data, where more surface).
`n_strata` caps the columns for readability, but any flagged stratum is always
kept. Use `maihda_upset_size()` to get a width/height matched to the content.

```{r maihda-plot-upset, fig.width = 9, fig.height = 7}
# UpSet view of the same strata, highlighting ROPE-relevant interactions.
plot(analysis, type = "upset", n_strata = 25, select = "deviation",
     highlight_interactions = TRUE, highlight_by = "rope", rope = 0.25)
```

```{r maihda-plot-effect-decomp}
# Additive versus intersectional effect decomposition -- adjusted model
plot(analysis, type = "effect_decomp")
```

```{r maihda-plot-deviation, warning = FALSE}
# Individual prediction-deviance dashboard -- adjusted model
plot(analysis, type = "prediction_deviation")
```

### A crossed-dimensions alternative

Alongside the two-model PCV, `maihda()` offers a **crossed-dimensions**
decomposition (`decomposition = "crossed-dimensions"`) that estimates the
additive and interaction parts from a single model entering each dimension's
main effect as a random intercept rather than a fixed effect. See
`vignette("cross_classified", package = "MAIHDA")`.

### A contextual cross-classified model

To model the strata *crossed with* a higher-level place or institution -- the
literature's cross-classified MAIHDA (patients within strata and hospitals,
students within strata and schools) -- pass `context = "school"` to `maihda()` or
`fit_maihda()`. The summary then partitions the unexplained variance into
between-stratum vs. between-context vs. residual, and the headline VPC/ICC becomes
the between-stratum share conditional on the context random effect. Also covered in
`vignette("cross_classified", package = "MAIHDA")`.

### Design-weighted MAIHDA (survey data)

For complex-survey data (NHANES, PISA, ...), pass the sampling-weight column via
`sampling_weights`. Survey weights are not lme4's `weights=` (those are
precision weights, which rescale the residual variance), so the fit routes through
`WeMix::mix()` weighted pseudo-maximum-likelihood. The whole workflow
(VPC/ICC, PCV, stratum summaries, prediction, plots, and the AUC for binary
outcomes) is then design-weighted, with design-consistent standard errors for the
fixed effects:

```{r maihda-weighted, eval=FALSE}
weighted <- maihda(outcome ~ age + (1 | gender:race:education),
                   data = survey_data, sampling_weights = "person_weight")
weighted
```

The wemix engine covers  `gaussian(identity)` / `binomial(logit)`
MAIHDA with the single `(1 | stratum)` intercept; crossed random effects
(`context =`, `decomposition = "crossed-dimensions"`) and bootstrap intervals
require lme4/brms. `engine = "brms"` also accepts `sampling_weights`, as
likelihood weights (a pseudo-posterior whose point estimates target the
population-weighted estimand, but with credible intervals that are not
design-based).

### Comparing across groups

Add a higher-level grouping variable and `maihda()` also compares the
decomposition across its levels. `maihda_country_data` (OECD PISA 2018) has a real
country grouping -- it asks how much the joint gender x socioeconomic-status
inequality in mathematics achievement differs across six countries. This workflow
has its own [group comparison vignette](group_comparison.html), so here we just
point to it:

```{r maihda-group, eval=FALSE}
data("maihda_country_data")
by_country <- maihda(math ~ gender + ses + (1 | gender:ses),
                     data = maihda_country_data, group = "country")
by_country
plot(by_country, type = "group_vpc")
```

## Under the hood: the building blocks

`maihda()` is a convenience wrapper around a set of lower-level functions:
`fit_maihda()` fits one model, `calculate_pvc()` compares two, `summary()` reads
the variance components, and `compare_maihda_groups()` runs the group comparison.
Reach for them directly when you need finer control than the one-call workflow
gives. In particular, a custom adjusted model, a stepwise decomposition,
or the discriminatory-accuracy extensions for binary outcomes.

### Fit a single model

`fit_maihda()` builds the intersectional strata on the fly and fits one multilevel model. Use it when you
only want a single fit, e.g. the null-model VPC on its own.

```{r bb-single}
model_null <- fit_maihda(BMI ~ 1 + (1 | Gender:Race:Education), data = health_complete)
summary(model_null)
```

### A custom adjusted model and the PCV

`maihda()` only ever adds the stratum dimensions' own main effects to the
adjusted model. To ask a different question (like how much of the between-stratum
variance is explained by individual-level covariates that are not strata
dimensions, here Age and Poverty) build the two models yourself and compare them
with `calculate_pvc()`. PCV compares variances across models, so both must use the
same sample (the `health_complete` data we prepared above):

```{r bb-custom-pcv}
model_cov <- fit_maihda(BMI ~ Age + Poverty + (1 | Gender:Race:Education),
                        data = health_complete)

calculate_pvc(model_null, model_cov)
```

This PCV answers "how much between-stratum variance do Age and Poverty account
for?", which is distinct from the additive-share PCV that `maihda()` reports.
Add `bootstrap = TRUE` for parametric-bootstrap confidence intervals (omitted here
because it refits the model many times):

```{r bb-pcv-boot, eval=FALSE}
calculate_pvc(model_null, model_cov, bootstrap = TRUE, n_boot = 500)
```

### Stepwise PCV

Often researchers want to know exactly *which* variable explained the variance.
`stepwise_pcv()` adds covariates one-by-one and tracks the between-stratum variance
dynamically. It works on a data frame that already carries the `stratum` column --
the `maihda()` object exposes one ready to use as `analysis$model$original_data`:

```{r bb-stepwise}
stepwise_results <- stepwise_pcv(
  data    = analysis$model$original_data,  # carries the strata column
  outcome = "BMI",
  vars    = c("Age", "Gender", "Race", "Education", "Poverty")
)

print(stepwise_results)
```

Negative step PCVs in this table indicate an "unmasking"/suppression pattern: adding a variable increased the between-stratum variance. This is a model-dependent change in variance, not proof that a hidden structural inequality was causally revealed (a negative PCV can also reflect suppression, rescaling, sample composition, or estimation uncertainty).

### Discriminatory accuracy and the response-scale VPC

For a binary outcome the discriminatory accuracy (AUC / C-statistic and Median
Odds Ratio) is reported automatically. It rides along on the `maihda()`
summaries and headline `print()` (and on `summary(fit_maihda(...))`), so the strata's
AUC sits next to the VPC with no extra call. The probability-scale (response) VPC is
estimated by simulation, so it is opt-in: add `response_vpc = TRUE` (with a `seed`).
You can still call the helpers directly on the fitted models when you want them on
their own:

```{r bb-da, eval=FALSE}
# A binary-outcome analysis
ob <- maihda(Obese ~ Gender + Race + (1 | Gender:Race), data = maihda_health_data,
             response_vpc = TRUE, seed = 1)
ob                         
summary(ob)                 # carries the discriminatory_accuracy (+ vpc_response) slots

# ...or call the pieces directly on the fitted maihda_model objects:
maihda_discriminatory_accuracy(ob$model)           # AUC + MOR, null model
maihda_discriminatory_accuracy(ob$model_adjusted)  # adjusted model
maihda_vpc_response(ob$model, seed = 1)            # probability-scale VPC
```

The dedicated [binary outcomes vignette](binary_outcomes.html) walks through the
logistic model, the latent-scale VPC, and this discriminatory-accuracy recipe in
full.

### The group comparison directly

`compare_maihda_groups()` is the function `maihda(group = ...)` calls under the
hood. Use it directly for the same stratified comparison without the rest of the
one-call pipeline:

```{r bb-group, eval=FALSE}
data("maihda_country_data")
compare_maihda_groups(math ~ 1 + (1 | gender:ses),
                      data = maihda_country_data, group = "country")
```

## Where to next

This vignette is the hub for the rest of the documentation:

- **[MAIHDA for binary outcomes](binary_outcomes.html)** -- logistic MAIHDA, the
  latent-scale VPC, and a discriminatory-accuracy (AUC) recipe.
- **[Interpreting MAIHDA plots and diagnostics](interpreting_plots.html)** -- how
  to read every plot type shown above, and what *not* to conclude from each.
- **[Comparing intersectional inequality across groups](group_comparison.html)** --
  the full `maihda(group = ...)` / `compare_maihda_groups()` workflow.
- **[Crossed random effects: dimensions and contexts](cross_classified.html)** --
  the single-model additive-vs-interaction (crossed-dimensions) alternative to the
  two-model PCV, and the contextual cross-classified MAIHDA (strata crossed with
  schools, hospitals, regions) via `context =`.
- **[Interactive data analysis](interactive_app.html)** -- the no-code Shiny
  dashboard.

## Interactive Shiny App

The MAIHDA package ships with a fully-featured, interactive Shiny Dashboard.

You can upload your own data (CSV, SPSS `.sav`, Stata `.dta`), dynamically select variables, and compute Stepwise PCV tables and prediction plots.

```{r eval=FALSE}
# Launch the interactive interface
run_maihda_app()
```

## References

- Evans, C. R., Williams, D. R., Onnela, J. P., & Subramanian, S. V. (2018). A multilevel approach to modeling health inequalities at the intersection of multiple social identities. *Social Science & Medicine*, 203, 64-73.

- Merlo, J. (2018). Multilevel analysis of individual heterogeneity and discriminatory accuracy (MAIHDA) within an intersectional framework. *Social Science & Medicine*, 203, 74-80.