Three method extensions built on the core EM, each closing a documented gap in the mixture-of-quantile-regressions toolkit.
family = "expectile" /
"mquantile"). Asymmetric-least-squares (Newey & Powell 1987) and asymmetric-
Huber (Breckling & Chambers 1988) component losses, fitted by IRLS through new
registry engines. Expectile components are crossing-free in the asymmetry level
by construction; M-quantile dials between the quantile and the expectile.mixqr_pen()). A SCAD /
adaptive-LASSO / LASSO / MCP penalty on the weighted check-loss M-step (the
quantile analogue of Khalili & Chen 2007), with each component getting its own
sparse support, a mixture BIC path for tuning, and component pruning. The inner
solve reuses rqPen; selectedVars() reports the active set per component.mixqr_nc()). Fits a vector of quantile levels jointly with one latent
classification shared across all levels (a coupled E-step), closing the two
problems Wu & Yao (2016, sec. 5) leave open: cross-level classification
ambiguity and within-component crossing (repaired by monotone rearrangement,
Chernozhukov, Fernandez-Val & Galichon 2010). sim_mixqr_cross() provides a
crossing-exhibiting design for demonstrations.First CRAN release.
Exported two extension-API building blocks, weighted_rq() and
constrained_kde(), so companion packages (location-varying gating,
non-crossing) can reuse the component and error-density machinery without
forking the core.
Post-review refinements (correctness and performance) addressing two independent adversarial peer reviews:
Constraint integrity (R1). The constrained KDE now preserves the
tau-quantile = 0 constraint in every feasible case: the two-constant
Hall-Presnell weights are used only when non-negative and well-conditioned,
otherwise a per-point empirical-likelihood tilt (Hall & Presnell 1999) enforces
the constraint (verified for tau in {0.05, 0.5, 0.9, 0.95}). Genuinely
infeasible (one-sided) components are flagged via fit$diagnostics$constraint
and a warning, never silently mis-calibrated.
Faithful Algorithm 3.1 (R2). The stochastic-EM P-step now draws the mixing probabilities (rejection-sampled, eq. 3.4) and the error density (bootstrap), not only the regression coefficients.
Calibrated standard errors (R3). Sparsity SEs are disclosed as
classification-conditional (in summary()); under-supported components and
rank-deficient weighted designs now warn; se_method/se_conditional recorded.
kdEM performance (R4). The E-step uses O(n) grid interpolation and the grid
is built by a binned/FFT KDE (stats::density), removing the O(n^2) cost --
kdEM is now ~3x ALD (was ~220x), meeting the speed target.
Bounded separability diagnostic (R6). mi_fraction is now a bounded
trace ratio in [0, 1] (previously could return ~1e14 on imbalanced clusters).
New responsibility-based overlap diagnostic (fit$diagnostics$overlap),
independent of the stochastic-EM path.
Real data (R5). Ships the engine dataset (Brinkman 1981 ethanol-combustion
data, the Wu & Yao Fig. 5 example); the README example now uses it. Added a
golden test reproducing the Wu & Yao Table 1 simulation means.
Selection rigor (R7). mixqr_select() gains criterion = "cv"
(K-fold cross-validated held-out predictive log-likelihood) that PENALISES
complexity and works for either engine; AIC/BIC selection now emits the
mixture-boundary caveat and the ALD likelihood is labelled a working likelihood.
Slope-based identifiability (R8). Default label ordering is now by slope (aligned with Wu & Yao Thm 2.1's distinct-slope condition), and the distinctness guard uses a scale-relative threshold.
Robustness / UX (R10). Rank-deficient (collinear) designs now error
clearly; added confint.mixqr() (Wald intervals).
Calibrated standard errors. The sparsity variance now reads f(0) off a
kernel density estimate of the component residuals (Wu & Yao 2016, p.166)
rather than the ALD working density. A Monte-Carlo benchmark
(inst/benchmarks/se_coverage.R) shows variance = "stochEM" now achieves
~95% (near-nominal) coverage for the regression coefficients, up from ~67-77%;
the mixing-probability intervals reflect the documented finite-sample pi-bias.
Diagnostics & docs. New mixqr() help sections on the Wu & Yao sec.6
semiparametric bias and on standard-error validity; predict(type = "quantile_byclass"); component-collapse and ALD non-monotonicity warnings;
fit$total_iter (total EM iterations across starts). Removed the dead
package URL.
Documentation site. A full pkgdown website with a comprehensive applied
tutorial ("A Tutorial on Mixtures of Quantile Regressions") featuring
publication-ready ggplot2 visualizations, a get-started vignette, and a
validation & diagnostics article. Added inst/CITATION, author/affiliation
metadata, and documented every exported method.
First release. The frequentist EM substrate (sub-project 01 of the QMM suite).
mixqr() fits finite mixtures of tau-quantile regressions with two engines:
"ald" (fast parametric asymmetric-Laplace mixture, genuine likelihood + AIC/BIC)
and "kdEM" (Wu & Yao 2016 kernel-density EM with nonparametric component error
densities, unequal or pooled), via a generic pluggable EM driver mixqr_em().V_W + (1 + 1/B) V_B) with a
cluster-separability diagnostic.mixqr_select() for component-count selection (AIC/BIC).sim_mixqr2() / sim_mixqr3() reproducing the Wu & Yao
2- and 3-component designs.register_mixqr_engine()) and reserved
diagnostics$crossing / diagnostics$class_stability slots — the integration
channel for QMM sub-projects 03 (gating) and 04 (non-crossing).Note: this v0.1 is pure R. Rcpp acceleration of the KDE/E-step hot loops is planned.