A Tutorial on Mixtures of Quantile Regressions

This tutorial is written for applied researchers across education, economics, sociology, political science, public health, and adjacent fields, who suspect that one regression line is hiding more than one story, and who care about more than the average. You do not need prior exposure to quantile regression or to mixture models. Each idea is introduced in words before any equation, every analysis step is shown in runnable code, and every result is visualized.

By the end you will be able to: recognize when a mixture of quantile regressions is the right tool; fit one with mixqr; read and visualize the estimates; classify observations into the recovered subgroups; check whether the answer can be trusted; describe effects across the whole outcome distribution; and report the analysis reproducibly.

This tutorial covers the core of the framework. mixqr is built on a single EM platform with an engine/extension contract, so the same machinery also supports expectile and M-quantile component families, component-specific penalized selection, and non-crossing multi-quantile estimation (see the Extensions article), while the companion package mixqrgate adds location-varying gating.

library(mixqr)
library(ggplot2)
library(quantreg)

1. The problem: one line, two stories

Most regression models a single relationship: “a one-unit change in x moves y by β.” But populations are often mixtures of unobserved subgroups, each with its own relationship. A few examples from the social sciences:

  • Schools. The link between family socioeconomic status (SES) and a student’s test score is steep in some schools and shallow in others, and you may not have a clean label for “which kind of school.”
  • Labor markets. Returns to education differ between a primary, credentialed segment and a secondary, casualized one.
  • Program evaluation. A policy helps a “responder” subgroup and barely moves a “non-responder” subgroup. The average effect describes neither.

When you fit one line to a mixture, you get a number that averages away the groups and mirrors none of them. A finite mixture of regressions instead assumes the data come from a small number of latent groups (called components or regimes) and estimates a separate regression in each, discovering the grouping and the group-specific effects jointly, without a pre-existing label (McLachlan and Peel 2000).

mixqr does this for quantile regressions rather than mean regressions. That buys two things, explained next.

2. Why quantiles, and why a mixture

Why quantiles. A quantile regression at level \(\tau\) models the conditional \(\tau\)-th quantile of the outcome (the median at \(\tau = 0.5\), a lower tail at \(\tau = 0.1\), an upper tail at \(\tau = 0.9\)) instead of the mean (Koenker and Bassett 1978; Koenker 2005). The median is far more robust to skew and outliers than the mean, and fitting several quantiles reveals effects that a mean model cannot: a predictor may lift the bottom of the distribution while leaving the top alone.

Why a mixture. Within each latent regime, mixqr fits a quantile regression; across regimes, a mixing probability says how prevalent each regime is. Formally, with a latent class \(Z \in \{1, \dots, m\}\) and \(\Pr(Z = j) = \pi_j\), the conditional \(\tau\)-th quantile in regime \(j\) is \[ Q_\tau(Y \mid x, Z = j) = x'\beta_j , \] and the components are estimated by an EM algorithm so that the grouping and the \(\beta_j\) are found together (Wu and Yao 2016).

Why not simpler tools?

Approach What it misses
OLS / single mean regression averages the regimes away; sensitive to skew
A single quantile regression robust and distributional, but still one line
Cluster first (k-means), then regress two-stage; clusters ignore the relationship, and the second-stage SEs ignore the clustering uncertainty
mixqr finds regimes and their quantile-specific effects jointly, with calibrated uncertainty

3. An illustrative dataset

To keep everything reproducible and transparent, we simulate a student- achievement example with a known structure, then let mixqr recover it. (You would use your own data; the workflow is identical.)

We generate 600 students. Each belongs to one of two unobserved school regimes:

  • Supportive schools (60% of students): a high baseline and a strong SES gradient.
  • Under-resourced schools (40%): a lower baseline and a weaker gradient.

Two features make this a job for quantile methods. Scores are right-skewed (a few unusually high achievers), so the median is a steadier summary than the mean. Within each regime the spread of scores grows with SES, so SES affects the top of the distribution differently from the bottom, a pattern a mean model cannot express.

simulate_schools <- function(n = 600, pi_A = 0.6) {
  ses <- rnorm(n)                                   # standardized SES
  regime <- ifelse(runif(n) < pi_A, "A", "B")
  err <- exp(rnorm(n, 0, 0.45)) - 1                 # right-skewed, median 0
  spread <- ifelse(regime == "A", 6, 9) * exp(0.30 * ses)  # grows with SES
  score <- ifelse(regime == "A",
                  56 + 8.0 * ses,                   # supportive schools
                  46 + 3.0 * ses) + spread * err
  data.frame(score = score, ses = ses, regime = factor(regime))
}

schools <- simulate_schools()
head(schools)
#>      score         ses regime
#> 1 48.77229  0.52058907      B
#> 2 46.61438 -1.07969076      A
#> 3 55.72871  0.13923812      A
#> 4 58.51637 -0.08474878      A
#> 5 53.17665 -0.66663962      A
#> 6 40.02345 -2.51608903      B

The raw scatter already hints at two fans of points, but a single median line (dashed) splits the difference and represents neither regime:

single <- rq(score ~ ses, tau = 0.5, data = schools)

ggplot(schools, aes(ses, score)) +
  geom_point(alpha = 0.35, colour = "grey30") +
  geom_abline(intercept = coef(single)[1], slope = coef(single)[2],
              linewidth = 1.1, linetype = "dashed") +
  labs(x = "Family SES (standardized)", y = "Math score",
       title = "One median line hides the subgroups",
       subtitle = "A single quantile regression averages two regimes together") +
  theme_mixqr()

Scatter of score against SES with a single pooled median line.

4. Fitting your first model

We fit a two-component mixture of median regressions. We ask for stochastic-EM standard errors (the recommended, calibrated method; see Section 8) and several restarts because the mixture likelihood is multimodal.

fit <- mixqr(score ~ ses, data = schools, tau = 0.5, m = 2,
             nstart = 25, variance = "stochEM",
             vcontrol = mixqr_vcontrol(B = 200, seed = 1))
fit
#> Mixture of quantile regressions (mixqr)
#>   engine: ald   tau = 0.5   components m = 2   n = 600
#>   converged: TRUE in 29 iterations
#> 
#> Mixing probabilities (pi):
#>  comp1  comp2 
#> 0.2626 0.7374 
#> 
#> Component coefficients (beta):
#>               comp1   comp2
#> (Intercept) 44.8695 55.3545
#> ses          2.2444  7.7639
#> 
#> logLik = -1844.296   AIC = 3702.59   BIC = 3733.37

Read the output as two regressions plus the mixing probabilities. summary() adds standard errors, \(z\)-values, and the diagnostics discussed later:

summary(fit)
#> Mixture of quantile regressions (mixqr) -- summary
#>   engine: ald   tau = 0.5   m = 2   n = 600
#> 
#> Component 1  (pi = 0.2626):
#>             Estimate Std.Err z value Pr(>|z|)    
#> (Intercept)  44.8695  0.3719 120.649  < 2e-16 ***
#> ses           2.2444  0.3879   5.786 7.21e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Component 2  (pi = 0.7374):
#>             Estimate Std.Err z value Pr(>|z|)    
#> (Intercept)  55.3545  0.2026  273.27   <2e-16 ***
#> ses           7.7639  0.1926   40.31   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Missing-information fraction (separability): 0.242
#> Responsibility overlap (0 = separated, 1 = overlapping): 0.293
#> 
#> logLik = -1844.296 (ALD working likelihood)   AIC = 3702.59   BIC = 3733.37

The estimated components recover the design: one regime with a high intercept and a steep SES slope (the supportive schools), and one with a lower intercept and a shallower slope (the under-resourced schools). Components are ordered by slope, so component 1 is always the shallower-gradient regime.

5. Visualizing the estimates

5.1 The two regimes

The single most useful picture overlays the component median lines on the data, colored by each student’s most likely regime.

# Components are ordered by SES slope, so component 1 is the shallower-gradient
# (under-resourced) regime and component 2 the steeper (supportive) one.
comp_label <- c("Under-resourced schools", "Supportive schools")
pal2 <- c("Under-resourced schools" = "#e07b39",
          "Supportive schools"      = "#1b6ca8")

schools$regime_hat <- factor(comp_label[predict(fit, type = "class")],
                             levels = comp_label)

xseq <- seq(min(schools$ses), max(schools$ses), length.out = 100)
lines_df <- do.call(rbind, lapply(1:2, function(j) {
  data.frame(ses = xseq,
             score = as.numeric(cbind(1, xseq) %*% fit$beta[, j]),
             regime = factor(comp_label[j], levels = comp_label))
}))

ggplot(schools, aes(ses, score)) +
  geom_point(aes(colour = regime_hat), alpha = 0.45, size = 1.8) +
  geom_line(data = lines_df, aes(colour = regime), linewidth = 1.3) +
  scale_colour_manual(values = pal2, name = "Recovered regime") +
  labs(x = "Family SES (standardized)", y = "Math score",
       title = "Two SES–achievement gradients, recovered jointly",
       subtitle = "Component median regressions from mixqr") +
  theme_mixqr()

Score against SES, points coloured by recovered regime, with the two component median lines.

5.2 A coefficient comparison

For reporting, a coefficient plot with confidence intervals communicates the substantive contrast between regimes at a glance. We pull the component coefficients and their intervals straight from confint().

ci <- confint(fit)                      # Wald intervals from stochastic-EM SEs
keep <- grep("^comp", rownames(ci))
coef_df <- data.frame(
  label  = rownames(ci)[keep],
  est    = c(as.numeric(fit$beta)),
  lo     = ci[keep, 1], hi = ci[keep, 2]
)
comp_idx <- as.integer(sub("comp(\\d+):.*", "\\1", coef_df$label))
coef_df$regime <- factor(comp_label[comp_idx], levels = comp_label)
coef_df$term <- sub("comp\\d+:", "", coef_df$label)
coef_df$term <- factor(coef_df$term, labels = c("Intercept", "SES slope"))

ggplot(coef_df, aes(est, regime, colour = regime)) +
  geom_pointrange(aes(xmin = lo, xmax = hi), linewidth = 0.9, size = 0.7) +
  facet_wrap(~ term, scales = "free_x") +
  scale_colour_manual(values = pal2, guide = "none") +
  labs(x = "Estimate (95% CI)", y = NULL,
       title = "Regime-specific effects with uncertainty",
       subtitle = "The SES gradient is far steeper in the supportive regime") +
  theme_mixqr()

Coefficient estimates with 95% confidence intervals for each regime, by term.

6. Who is in which group?

Mixture models give soft memberships: each observation has a posterior probability of belonging to each regime. Hard labels (type = "class") take the most likely regime, but the probabilities themselves are informative: many points are assigned with near-certainty, a few are ambiguous.

post <- predict(fit, type = "posterior")
schools$p_supportive <- post[, 2]       # component 2 = supportive (steeper) regime

ggplot(schools, aes(p_supportive)) +
  geom_histogram(bins = 30, fill = "#1b6ca8", colour = "white") +
  labs(x = "Posterior probability of the supportive regime",
       y = "Students",
       title = "Most students are classified with confidence",
       subtitle = "Mass near 0 and 1 means confident assignments") +
  theme_mixqr()

Histogram of the probability of belonging to regime A.

Here about 75% of students are assigned to a regime with probability above 0.9. Only a few sit in the ambiguous middle. The package summarizes the separation in one number, the responsibility overlap (0 = perfectly separated, 1 = fully overlapping), moderate in this example:

fit$diagnostics$overlap
#> [1] 0.2934361

7. Diagnostics: can you trust it?

Two checks and a caveat matter for a mixture of quantile regressions.

7.1 The component error densities

mixqr’s semiparametric engine estimates each regime’s error distribution non-parametrically, constrained so its \(\tau\)-th quantile is exactly zero (Hall and Presnell 1999). Plotting these densities shows the shape of each regime’s residuals and confirms the constraint (the mass below 0 equals \(\tau\)).

fit_kd <- mixqr(score ~ ses, data = schools, tau = 0.5, m = 2,
                engine = "kdEM", init = "ald", nstart = 15)

rng <- range(vapply(fit_kd$density, function(d) range(d$grid), numeric(2)))
tt <- seq(rng[1], rng[2], length.out = 400)
dens_df <- do.call(rbind, lapply(1:2, function(j) {
  data.frame(resid = tt, density = fit_kd$density[[j]]$eval(tt),
             regime = factor(comp_label[j], levels = comp_label))
}))

ggplot(dens_df, aes(resid, density, colour = regime)) +
  geom_line(linewidth = 1.1) +
  geom_vline(xintercept = 0, linetype = 3, colour = "grey40") +
  scale_colour_manual(values = pal2, name = "Regime") +
  labs(x = "Residual", y = "Density",
       title = "Each regime has its own (skewed) error distribution",
       subtitle = "Estimated by the Wu & Yao kernel-density EM; tau-quantile fixed at 0") +
  theme_mixqr()

Estimated component error densities with the constraint marker at zero.

7.2 Two engines agree

The fast "ald" engine assumes a parametric (asymmetric Laplace) error; the "kdEM" engine estimates the error shape. When they give similar component lines, you can trust the parametric default; when they diverge, the data have structure the parametric model misses. Here they agree closely.

rbind(ALD = as.numeric(fit$beta), kdEM = as.numeric(fit_kd$beta))
#>          [,1]     [,2]     [,3]     [,4]
#> ALD  44.86947 2.244374 55.35448 7.763871
#> kdEM 45.08783 2.248773 55.42191 7.815351

7.3 A caveat worth knowing

The semiparametric estimator can be biased when the regimes overlap heavily and the errors are strongly asymmetric (a property of the estimating equations, not a bug; see Wu and Yao (2016), sec. 6). The overlap diagnostic above is your early warning, and cross-checking the two engines is the practical guard. With well-separated regimes (the common, useful case), the estimates are reliable.

8. Inference done right

mixqr offers two standard-error methods:

  • variance = "sparsity" — fast, but conditional on the classification: it treats group membership as known and therefore understates uncertainty. summary() flags this.
  • variance = "stochEM" — a stochastic-EM multiple-imputation estimator (Wu and Yao 2016, Alg. 3.1) that propagates the classification and mixing uncertainty. Use this method for inference. A Monte-Carlo benchmark (see the Validation article) shows its intervals reach roughly nominal 95% coverage for the regression coefficients.
confint(fit)
#>                        2.5 %    97.5 %
#> comp1:(Intercept) 44.1405626 45.598387
#> comp1:ses          1.4841146  3.004633
#> comp2:(Intercept) 54.9574726 55.751493
#> comp2:ses          7.3864166  8.141325
#> pi1                0.2140562  0.311070

One caveat: the mixing-probability estimate carries a finite-sample bias, so its interval is correctly sized but can sit off-center. Here the true split is 60/40, yet the estimate is about 74/26 — a live reminder to report \(\pi\) as an approximate split, not to the decimal.

9. Beyond the median: the whole distribution

The real payoff of quantile mixtures is describing each regime across the outcome distribution, beyond its median. We refit at \(\tau = 0.1, 0.5, 0.9\) and draw each regime’s conditional-quantile lines. (Components are slope-ordered, so they align across \(\tau\).)

taus <- c(0.1, 0.5, 0.9)
multi <- do.call(rbind, lapply(taus, function(tt) {
  f <- mixqr(score ~ ses, data = schools, tau = tt, m = 2, nstart = 15)
  do.call(rbind, lapply(1:2, function(j) {
    data.frame(ses = xseq,
               score = as.numeric(cbind(1, xseq) %*% f$beta[, j]),
               regime = factor(comp_label[j], levels = comp_label), tau = tt)
  }))
}))

ggplot(schools, aes(ses, score)) +
  geom_point(alpha = 0.18, colour = "grey45", size = 1.2) +
  geom_line(data = multi,
            aes(colour = regime, group = interaction(regime, tau),
                linewidth = factor(tau))) +
  scale_colour_manual(values = pal2, name = "Regime") +
  scale_linewidth_manual(values = c("0.1" = 0.5, "0.5" = 1.2, "0.9" = 0.5),
                         name = "Quantile") +
  labs(x = "Family SES (standardized)", y = "Math score",
       title = "Conditional quantile bands by regime",
       subtitle = "Wider bands at high SES reveal growing within-regime inequality") +
  theme_mixqr()

Conditional quantile lines at tau = 0.1, 0.5, 0.9 for each regime.

The vertical spread between a regime’s 10th- and 90th-percentile lines is its within-regime inequality at a given SES. Where that gap widens with SES, the top and bottom of the regime pull apart — a distributional finding invisible to a mean model. (Fitting several \(\tau\) separately can in principle let a regime’s quantile lines cross; jointly enforcing non-crossing is the role of a companion extension to mixqr.)

10. How many groups?

We assumed two regimes. In practice you choose m. mixqr_select() compares component counts. Cross-validated predictive log-likelihood ("cv") is engine- agnostic and penalizes complexity; AIC/BIC are available too (with the usual caveat that mixture-component counts sit on a parameter-space boundary).

sel <- mixqr_select(score ~ ses, data = schools, tau = 0.5, m = 1:4,
                    criterion = "cv", folds = 5, nstart = 8,
                    control = mixqr_control(seed = 1))
sel$table
#>   m cv_negloglik
#> 1 1     1977.857
#> 2 2     1850.712
#> 3 3     1849.940
#> 4 4     1856.446

ggplot(sel$table, aes(m, cv_negloglik)) +
  geom_line(colour = "#1b6ca8") +
  geom_point(size = 3, colour = "#1b6ca8") +
  geom_point(data = sel$table[sel$table$m == 2, ],
             size = 5, shape = 21, fill = "#e07b39", colour = "#e07b39") +
  labs(x = "Number of components (m)", y = "CV held-out negative log-likelihood",
       title = "Choosing the number of regimes",
       subtitle = "Lower is better; the elbow at two regimes is the parsimonious choice") +
  theme_mixqr()

Cross-validated score against number of components.

The sharp drop from one to two components, then the near-flat stretch beyond, is the signature of two genuine regimes. The strict cross-validation minimum here falls at m = 3 (1850 versus 1851 at m = 2 — a rounding-error apart, while m = 1 is far worse). Read the elbow, not the bare minimum: two regimes is the parsimonious, defensible choice.

11. Reporting and reproducibility

A complete, reproducible write-up needs little:

  • the quantile level \(\tau\), the number of components m and how it was chosen, the engine, and the number of restarts;
  • component coefficients with stochastic-EM intervals (Section 8);
  • the overlap / separability diagnostic (Section 6) and, ideally, the two-engine cross-check (Section 7);
  • a seed.
# everything needed to reproduce the headline fit
fit <- mixqr(score ~ ses, data = schools, tau = 0.5, m = 2,
             nstart = 25, variance = "stochEM",
             vcontrol = mixqr_vcontrol(B = 200, seed = 1))
summary(fit)

That is the whole arc. One model call turned “one line, two stories” into two regimes, their quantile-specific effects, calibrated intervals, and diagnostics that say when to trust them. The same few lines of code scale from this teaching example to your own data.

Citation

If mixqr supports your research, please cite the package and the underlying method:

citation("mixqr")

Venkitasubramanian, K. (2026). mixqr: Finite Mixtures of Quantile Regressions. R package version 0.1.0.9000. https://github.com/kvenkita/mixqr

Wu, Q. & Yao, W. (2016). Mixtures of quantile regressions. Computational Statistics & Data Analysis, 93, 162–176.

References

Hall, Peter, and Brett Presnell. 1999. “Density Estimation Under Constraints.” Journal of Computational and Graphical Statistics 8 (2): 259–77.
Koenker, Roger. 2005. Quantile Regression. Cambridge University Press.
Koenker, Roger, and Gilbert Bassett. 1978. “Regression Quantiles.” Econometrica 46 (1): 33–50.
McLachlan, Geoffrey, and David Peel. 2000. Finite Mixture Models. Wiley.
Wu, Qiang, and Weixin Yao. 2016. “Mixtures of Quantile Regressions.” Computational Statistics & Data Analysis 93: 162–76. https://doi.org/10.1016/j.csda.2014.04.014.