--- title: "Gaussian VCMoE Simulation Tutorial" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Gaussian VCMoE Simulation Tutorial} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4 ) ``` This vignette shows the basic steps for a Gaussian varying-coefficient mixture-of-experts model: 1. install and load `VCMoE`; 2. simulate a small Gaussian example; 3. fit `vcmoe_fit()`; 4. inspect coefficient functions, posterior probabilities, and diagnostics. The example uses `k = 2`, which is the easiest setting for learning the package API and checking that the label-alignment diagnostics behave as expected. ## Installation from GitHub Install the package from GitHub with `remotes`: ```{r install, eval=FALSE} install.packages("remotes") remotes::install_github("qc-zhao/VCMoE") ``` Then load the package: ```{r} library(VCMoE) ``` ## Simulate Gaussian data `simulate_vcmoe_gaussian()` returns a data frame and the known truth used to generate it. The response is Gaussian, while the component-specific expert coefficients and gating probabilities vary with the continuous coordinate `u`. ```{r simulate} sim <- simulate_vcmoe_gaussian( n = 240, k = 2, seed = 1, separation = 1.4, scenario = "well_separated" ) head(sim$data) str(sim$truth, max.level = 1) ``` ## Fit the model The formula has two parts: - `y ~ z1` is the expert mean model; - `| x1` is the gating model. The varying coordinate is supplied separately through `u = "u"`. By default, `vcmoe_fit()` uses Epanechnikov density weights, a scaled local-linear basis, and unit scaling of `u`. ```{r fit} fit <- vcmoe_fit( y ~ z1 | x1, data = sim$data, u = "u", family = "gaussian", k = 2, bandwidth = 0.30, u_grid = seq(0.1, 0.9, length.out = 5), control = list(maxit = 80, n_starts = 2, seed = 2) ) fit ``` ## Coefficients and predictions Expert coefficients are returned as an array indexed by grid point, component, and term. Posterior probabilities are returned with one column per component. ```{r coefficients} expert_coef <- coef(fit, "expert") dim(expert_coef) expert_coef[, , "z1"] ``` ```{r predictions} posterior <- predict(fit, type = "posterior") head(posterior) rowSums(head(posterior)) fitted_mean <- predict(fit, type = "mean") head(fitted_mean) ``` ## Diagnostics Always inspect diagnostics before interpreting coefficient functions. The most important early checks are grid convergence, label ambiguity, component proportions, posterior entropy, and effective local sample size. ```{r diagnostics} diagnostics <- vcmoe_diagnostics(fit) diagnostics[, c("u", "converged", "ambiguous", "posterior_entropy", "effective_n")] ``` ## Plots `plot_coefficients()` shows the fitted coefficient functions. For a simulated example, the plot is mainly a quick visual check that the fitted paths are smooth and labels remain connected across the grid. ```{r coefficient-plot} plot_coefficients(fit, "expert") ``` `plot_posterior()` summarizes posterior component probabilities over `u`. ```{r posterior-plot} plot_posterior(fit) ``` ## Optional: bandwidth selection For real analyses, bandwidth should usually be selected rather than fixed by hand. The cross-validation selector uses held-out predictive log-likelihood and returns a final refit by default. ```{r bandwidth, eval=FALSE} selection <- vcmoe_select_bandwidth( y ~ z1 | x1, data = sim$data, u = "u", family = "gaussian", k = 2, bandwidth_grid = c(0.24, 0.30, 0.36), folds = 3, u_grid = seq(0.1, 0.9, length.out = 5), control = list(maxit = 80, n_starts = 2, seed = 3), seed = 4 ) selection selection$best_bandwidth ```