A quick tour of StMoE

Introduction

StMoE (Skew-t Mixture-of-Experts) provides a flexible and robust modelling framework for heterogeneous data with possibly skewed, heavy-tailed distributions, and for data corrupted by atypical observations. StMoE consists of a mixture of K skew-t expert regressors (polynomial regressions of degree p) gated by a softmax gating network (of degree q), and is parameterized by:

  • The gating network parameters: the softmax coefficients alpha’s.
  • The expert network parameters: the location parameters (regression coefficients) beta’s, the scale parameters sigma’s, the skewness parameters lambda’s, and the degrees-of-freedom parameters nu’s.

StMoE thus generalises mixtures of (normal, skew-normal, t, and skew-t) distributions, as well as mixtures of regressions with these distributions. For example, when q = 0 we retrieve mixtures of (skew-t, t, skew-normal, or normal) regressions, and when both p = 0 and q = 0 we retrieve mixtures of (skew-t, t, skew-normal, or normal) distributions. It also reduces to a single (normal, skew-normal, t, or skew-t) distribution when only one expert is used (K = 1).
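The gating network is a standard multinomial-logistic (softmax) model of the inputs. As a minimal illustration of the idea (a sketch, not the package's internals), the mixing proportions of K experts with q = 1 can be computed as:

```r
# Sketch of a softmax gating network (hypothetical helper, not part of meteorits).
# alphak: (q + 1) x (K - 1) matrix of gating coefficients; the last expert's
# coefficients are fixed at zero for identifiability.
softmax_gating <- function(x, alphak) {
  eta <- cbind(1, x) %*% cbind(alphak, 0)  # n x K linear predictors, last column = 0
  exp(eta) / rowSums(exp(eta))             # K mixing proportions per input
}

pik <- softmax_gating(seq(-1, 1, length.out = 5), matrix(c(0, 8), ncol = 1))
rowSums(pik)  # each row sums to 1
```

With the coefficients (0, 8) used in the simulation below, the first expert's proportion increases smoothly from near 0 at x = -1 to near 1 at x = 1.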

Model estimation/learning is performed by a dedicated expectation conditional maximization (ECM) algorithm that maximizes the observed-data log-likelihood. We provide simulated examples to illustrate the use of the model in model-based clustering of heterogeneous regression data and in fitting non-linear regression functions.
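The expert component densities entering that log-likelihood are skew-t densities. The following is a minimal sketch of an Azzalini-type skew-t density; the exact parameterization used by the package may differ, so treat this as an assumption for illustration only:

```r
# Hypothetical skew-t density sketch (Azzalini-type form, an assumption):
# f(y) = (2 / sigma) * t_nu(d) * T_{nu+1}(lambda * d * sqrt((nu + 1) / (nu + d^2))),
# where d = (y - mu) / sigma, t_nu is the Student-t density and T_{nu+1} its CDF.
dskewt <- function(y, mu = 0, sigma = 1, lambda = 0, nu = 10) {
  d <- (y - mu) / sigma
  2 / sigma * dt(d, df = nu) * pt(lambda * d * sqrt((nu + 1) / (nu + d^2)), df = nu + 1)
}

# lambda = 0 recovers the symmetric t density, since T_{nu+1}(0) = 1/2:
dskewt(0.5, lambda = 0, nu = 5) == dt(0.5, df = 5)  # TRUE
```

Positive lambda skews the density to the right and negative lambda to the left, while small nu thickens the tails, which is what makes the mixture robust to atypical observations.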

This vignette was written in R Markdown, using the knitr package for production.

See help(package = "meteorits") for further details, and citation("meteorits") for the references.

Application to a simulated dataset

Generate sample

n <- 500 # Size of the sample
alphak <- matrix(c(0, 8), ncol = 1) # Parameters of the gating network
betak <- matrix(c(0, -2.5, 0, 2.5), ncol = 2) # Regression coefficients of the experts
sigmak <- c(0.5, 0.5) # Standard deviations of the experts
lambdak <- c(3, 5) # Skewness parameters of the experts
nuk <- c(5, 7) # Degrees of freedom of the experts' t densities
x <- seq.int(from = -1, to = 1, length.out = n) # Inputs (predictors)

# Generate sample of size n
sample <- sampleUnivStMoE(alphak = alphak, betak = betak, sigmak = sigmak, 
                          lambdak = lambdak, nuk = nuk, x = x)
y <- sample$y

Set up StMoE model parameters

K <- 2 # Number of regressors/experts
p <- 1 # Order of the polynomial regression (regressors/experts)
q <- 1 # Order of the logistic regression (gating network)

Set up EM parameters

n_tries <- 1
max_iter <- 1500
threshold <- 1e-5
verbose <- TRUE
verbose_IRLS <- FALSE

Estimation

stmoe <- emStMoE(X = x, Y = y, K = K, p = p, q = q, n_tries = n_tries,
                 max_iter = max_iter, threshold = threshold,
                 verbose = verbose, verbose_IRLS = verbose_IRLS)
## EM - StMoE: Iteration: 1 | log-likelihood: -397.948314671532
## EM - StMoE: Iteration: 2 | log-likelihood: -350.93972199686
## EM - StMoE: Iteration: 3 | log-likelihood: -349.240283640744
## EM - StMoE: Iteration: 4 | log-likelihood: -347.770731236186
## EM - StMoE: Iteration: 5 | log-likelihood: -346.431831376919
## EM - StMoE: Iteration: 6 | log-likelihood: -345.159432404738
## EM - StMoE: Iteration: 7 | log-likelihood: -343.932277630414
## EM - StMoE: Iteration: 8 | log-likelihood: -342.746164097378
## EM - StMoE: Iteration: 9 | log-likelihood: -341.61066331776
## EM - StMoE: Iteration: 10 | log-likelihood: -340.529285089636
## EM - StMoE: Iteration: 11 | log-likelihood: -339.50610071294
## EM - StMoE: Iteration: 12 | log-likelihood: -338.546145505096
## EM - StMoE: Iteration: 13 | log-likelihood: -337.6486394321
## EM - StMoE: Iteration: 14 | log-likelihood: -336.812062829723
## EM - StMoE: Iteration: 15 | log-likelihood: -336.037063649317
## EM - StMoE: Iteration: 16 | log-likelihood: -335.323865049103
## EM - StMoE: Iteration: 17 | log-likelihood: -334.669120856753
## EM - StMoE: Iteration: 18 | log-likelihood: -334.065165483472
## EM - StMoE: Iteration: 19 | log-likelihood: -333.501607812029
## EM - StMoE: Iteration: 20 | log-likelihood: -332.965794988019
## EM - StMoE: Iteration: 21 | log-likelihood: -332.434277272932
## EM - StMoE: Iteration: 22 | log-likelihood: -331.878002557026
## EM - StMoE: Iteration: 23 | log-likelihood: -331.253653101185
## EM - StMoE: Iteration: 24 | log-likelihood: -330.504141103484
## EM - StMoE: Iteration: 25 | log-likelihood: -329.559752552549
## EM - StMoE: Iteration: 26 | log-likelihood: -328.349406321036
## EM - StMoE: Iteration: 27 | log-likelihood: -326.804558485056
## EM - StMoE: Iteration: 28 | log-likelihood: -324.878320187264
## EM - StMoE: Iteration: 29 | log-likelihood: -322.572712842797
## EM - StMoE: Iteration: 30 | log-likelihood: -319.948971160812
## EM - StMoE: Iteration: 31 | log-likelihood: -317.120231764782
## EM - StMoE: Iteration: 32 | log-likelihood: -314.228878599706
## EM - StMoE: Iteration: 33 | log-likelihood: -311.416074038568
## EM - StMoE: Iteration: 34 | log-likelihood: -308.789972994702
## EM - StMoE: Iteration: 35 | log-likelihood: -306.420022880788
## EM - StMoE: Iteration: 36 | log-likelihood: -304.330485207363
## EM - StMoE: Iteration: 37 | log-likelihood: -302.520532732544
## EM - StMoE: Iteration: 38 | log-likelihood: -300.967170468015
## EM - StMoE: Iteration: 39 | log-likelihood: -299.645575292584
## EM - StMoE: Iteration: 40 | log-likelihood: -298.523866635305
## EM - StMoE: Iteration: 41 | log-likelihood: -297.57539508599
## EM - StMoE: Iteration: 42 | log-likelihood: -296.771042600862
## EM - StMoE: Iteration: 43 | log-likelihood: -296.087197942475
## EM - StMoE: Iteration: 44 | log-likelihood: -295.505213405586
## EM - StMoE: Iteration: 45 | log-likelihood: -295.007135904701
## EM - StMoE: Iteration: 46 | log-likelihood: -294.579141923347
## EM - StMoE: Iteration: 47 | log-likelihood: -294.210334251161
## EM - StMoE: Iteration: 48 | log-likelihood: -293.891756643337
## EM - StMoE: Iteration: 49 | log-likelihood: -293.616274028923
## EM - StMoE: Iteration: 50 | log-likelihood: -293.37775658278
## EM - StMoE: Iteration: 51 | log-likelihood: -293.171799844088
## EM - StMoE: Iteration: 52 | log-likelihood: -292.994502716825
## EM - StMoE: Iteration: 53 | log-likelihood: -292.841571184246
## EM - StMoE: Iteration: 54 | log-likelihood: -292.709688698353
## EM - StMoE: Iteration: 55 | log-likelihood: -292.595840191259
## EM - StMoE: Iteration: 56 | log-likelihood: -292.497616932657
## EM - StMoE: Iteration: 57 | log-likelihood: -292.412820308307
## EM - StMoE: Iteration: 58 | log-likelihood: -292.339614741704
## EM - StMoE: Iteration: 59 | log-likelihood: -292.276450806372
## EM - StMoE: Iteration: 60 | log-likelihood: -292.221954128326
## EM - StMoE: Iteration: 61 | log-likelihood: -292.175007166992
## EM - StMoE: Iteration: 62 | log-likelihood: -292.135450971376
## EM - StMoE: Iteration: 63 | log-likelihood: -292.102121573629
## EM - StMoE: Iteration: 64 | log-likelihood: -292.074057543758
## EM - StMoE: Iteration: 65 | log-likelihood: -292.050459437671
## EM - StMoE: Iteration: 66 | log-likelihood: -292.030658635995
## EM - StMoE: Iteration: 67 | log-likelihood: -292.014093071031
## EM - StMoE: Iteration: 68 | log-likelihood: -292.000207062251
## EM - StMoE: Iteration: 69 | log-likelihood: -291.988697397527
## EM - StMoE: Iteration: 70 | log-likelihood: -291.979215566862
## EM - StMoE: Iteration: 71 | log-likelihood: -291.97144859128
## EM - StMoE: Iteration: 72 | log-likelihood: -291.965150137581
## EM - StMoE: Iteration: 73 | log-likelihood: -291.960103112887
## EM - StMoE: Iteration: 74 | log-likelihood: -291.956121596583
## EM - StMoE: Iteration: 75 | log-likelihood: -291.953046073114
## EM - StMoE: Iteration: 76 | log-likelihood: -291.950739478271

Summary

stmoe$summary()
## ------------------------------------------
## Fitted Skew t Mixture-of-Experts model
## ------------------------------------------
## 
## StMoE model with K = 2 experts:
## 
##  log-likelihood df       AIC       BIC       ICL
##       -291.9507 12 -303.9507 -329.2384 -329.3226
## 
## Clustering table (Number of observations in each expert):
## 
##   1   2 
## 246 254 
## 
## Regression coefficients:
## 
##     Beta(k = 1) Beta(k = 2)
## 1   -0.04092653  -0.1102991
## X^1  2.56992285  -2.4339405
## 
## Variances:
## 
##  Sigma2(k = 1) Sigma2(k = 2)
##      0.5989221     0.5147709

Plots

Mean curve

stmoe$plot(what = "meancurve")

Confidence regions

stmoe$plot(what = "confregions")

Clusters

stmoe$plot(what = "clusters")

Log-likelihood

stmoe$plot(what = "loglikelihood")

Application to a real dataset

Load data

library(MASS)
data("mcycle")
x <- mcycle$times
y <- mcycle$accel

Set up StMoE model parameters

K <- 4 # Number of regressors/experts
p <- 2 # Order of the polynomial regression (regressors/experts)
q <- 1 # Order of the logistic regression (gating network)

Set up EM parameters

n_tries <- 1
max_iter <- 1500
threshold <- 1e-5
verbose <- TRUE
verbose_IRLS <- FALSE

Estimation

stmoe <- emStMoE(X = x, Y = y, K = K, p = p, q = q, n_tries = n_tries,
                 max_iter = max_iter, threshold = threshold,
                 verbose = verbose, verbose_IRLS = verbose_IRLS)
## EM - StMoE: Iteration: 1 | log-likelihood: -600.826947369113
## EM - StMoE: Iteration: 2 | log-likelihood: -589.977357431034
## EM - StMoE: Iteration: 3 | log-likelihood: -587.167104670215
## EM - StMoE: Iteration: 4 | log-likelihood: -586.39063569506
## EM - StMoE: Iteration: 5 | log-likelihood: -585.772911963497
## EM - StMoE: Iteration: 6 | log-likelihood: -583.979686780225
## EM - StMoE: Iteration: 7 | log-likelihood: -577.639617948489
## EM - StMoE: Iteration: 8 | log-likelihood: -570.532765150934
## EM - StMoE: Iteration: 9 | log-likelihood: -565.320213419045
## EM - StMoE: Iteration: 10 | log-likelihood: -562.869076654372
## EM - StMoE: Iteration: 11 | log-likelihood: -562.282402938482
## EM - StMoE: Iteration: 12 | log-likelihood: -562.191915724672
## EM - StMoE: Iteration: 13 | log-likelihood: -562.190451737714

Summary

stmoe$summary()
## ------------------------------------------
## Fitted Skew t Mixture-of-Experts model
## ------------------------------------------
## 
## StMoE model with K = 4 experts:
## 
##  log-likelihood df       AIC       BIC      ICL
##       -562.1905 30 -592.1905 -635.5457 -635.542
## 
## Clustering table (Number of observations in each expert):
## 
##  1  2  3  4 
## 28 37 31 37 
## 
## Regression coefficients:
## 
##     Beta(k = 1) Beta(k = 2)  Beta(k = 3)  Beta(k = 4)
## 1    -7.1531379  992.484590 -2096.748499 193.34820147
## X^1   1.2479320 -103.890334   132.327916  -8.94276956
## X^2  -0.1088423    2.432542    -2.040076   0.09275632
## 
## Variances:
## 
##  Sigma2(k = 1) Sigma2(k = 2) Sigma2(k = 3) Sigma2(k = 4)
##       27.15949      454.1761      336.1893      1085.644

Plots

Mean curve

stmoe$plot(what = "meancurve")

Confidence regions

stmoe$plot(what = "confregions")

Clusters

stmoe$plot(what = "clusters")

Log-likelihood

stmoe$plot(what = "loglikelihood")