A quick tour of StMoE

Introduction

StMoE (Skew-t Mixture-of-Experts) provides a flexible and robust modelling framework for heterogeneous data with possibly skewed, heavy-tailed distributions, and for data corrupted by atypical observations. StMoE consists of a mixture of K skew-t expert regressors (of polynomial degree p) gated by a softmax gating network (of degree q), and is parameterized by:

  • The gating network parameters: the coefficients alpha's of the softmax net.
  • The expert network parameters: the location parameters (regression coefficients) beta's, the scale parameters sigma's, the skewness parameters lambda's, and the degrees-of-freedom parameters nu's.

StMoE thus generalises mixtures of normal, skew-normal, t, and skew-t distributions, as well as mixtures of regressions with these distributions. For example, when q = 0, we retrieve mixtures of skew-t, t, skew-normal, or normal regressions, and when both p = 0 and q = 0, it reduces to a mixture of skew-t, t, skew-normal, or normal distributions. With a single expert (K = 1), it further reduces to the standard normal, skew-normal, t, or skew-t distribution.
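The softmax gating network can be sketched as follows. This is an illustration only, not the meteorits internals; the parameterization shown (last expert's gating coefficients fixed to zero for identifiability) is an assumption:

```r
# Softmax gating probabilities for K experts with a degree-1 gate
# (illustrative sketch; parameterization assumed, not taken from meteorits)
softmax_gate <- function(x, alphak) {
  # alphak: (q + 1) x (K - 1) matrix of gating coefficients; here q = 1
  Xg <- cbind(1, x)                # gating design matrix: intercept and x
  eta <- cbind(Xg %*% alphak, 0)   # linear predictors; last expert fixed at 0
  exp(eta) / rowSums(exp(eta))     # softmax: one row of probabilities per input
}

# Example with K = 2 experts across the input range
probs <- softmax_gate(seq(-1, 1, length.out = 5), matrix(c(0, 8), ncol = 1))
rowSums(probs)  # each row sums to 1
```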

Model estimation/learning is performed by a dedicated expectation conditional maximization (ECM) algorithm that maximizes the observed-data log-likelihood. We provide simulated examples to illustrate the use of the model in model-based clustering of heterogeneous regression data and in fitting non-linear regression functions.
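Schematically, an ECM fit alternates an E-step with a sequence of conditional maximization (CM) steps until the log-likelihood stabilizes. The skeleton below is a sketch under assumed conventions (relative-change stopping rule; the e_step, cm_steps, and loglik functions are hypothetical placeholders), not the meteorits implementation:

```r
# Generic ECM skeleton (illustration only; not the package internals)
ecm_fit <- function(init, e_step, cm_steps, loglik,
                    max_iter = 1500, threshold = 1e-5) {
  theta <- init
  ll_old <- -Inf
  for (it in seq_len(max_iter)) {
    stats <- e_step(theta)            # E-step: expectations of the latent variables
    for (cm in cm_steps)              # CM-steps: update one parameter block at a time
      theta <- cm(theta, stats)
    ll <- loglik(theta)               # observed-data log-likelihood
    if (abs((ll - ll_old) / ll) < threshold) break  # assumed stopping rule
    ll_old <- ll
  }
  theta
}
```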

This vignette was written in R Markdown, using the knitr package for production.

See help(package = "meteorits") for further details and citation("meteorits") for the references.

Application to a simulated dataset

Generate sample

n <- 500 # Size of the sample
alphak <- matrix(c(0, 8), ncol = 1) # Parameters of the gating network
betak <- matrix(c(0, -2.5, 0, 2.5), ncol = 2) # Regression coefficients of the experts
sigmak <- c(0.5, 0.5) # Standard deviations of the experts
lambdak <- c(3, 5) # Skewness parameters of the experts
nuk <- c(5, 7) # Degrees of freedom of the experts' t densities
x <- seq.int(from = -1, to = 1, length.out = n) # Inputs (predictors)

# Generate sample of size n
sample <- sampleUnivStMoE(alphak = alphak, betak = betak, sigmak = sigmak, 
                          lambdak = lambdak, nuk = nuk, x = x)
y <- sample$y

Set up StMoE model parameters

K <- 2 # Number of regressors/experts
p <- 1 # Order of the polynomial regression (regressors/experts)
q <- 1 # Order of the logistic regression (gating network)

Set up EM parameters

n_tries <- 1          # Number of runs of the EM algorithm (the best solution is kept)
max_iter <- 1500      # Maximum number of EM iterations
threshold <- 1e-5     # Convergence threshold on the relative change of the log-likelihood
verbose <- TRUE       # Print the log-likelihood at each EM iteration
verbose_IRLS <- FALSE # Print the IRLS criterion values (gating network updates)

Estimation

stmoe <- emStMoE(X = x, Y = y, K = K, p = p, q = q, n_tries = n_tries, 
                 max_iter = max_iter, threshold = threshold, 
                 verbose = verbose, verbose_IRLS = verbose_IRLS)
## EM - StMoE: Iteration: 1 | log-likelihood: -401.81209245848
## EM - StMoE: Iteration: 2 | log-likelihood: -317.779679767392
## EM - StMoE: Iteration: 3 | log-likelihood: -314.67388631997
## EM - StMoE: Iteration: 4 | log-likelihood: -311.359628558564
## EM - StMoE: Iteration: 5 | log-likelihood: -307.78051824594
## EM - StMoE: Iteration: 6 | log-likelihood: -303.98230070978
## EM - StMoE: Iteration: 7 | log-likelihood: -300.097841709606
## EM - StMoE: Iteration: 8 | log-likelihood: -296.277122954763
## EM - StMoE: Iteration: 9 | log-likelihood: -292.6498848343
## EM - StMoE: Iteration: 10 | log-likelihood: -289.289896638777
## EM - StMoE: Iteration: 11 | log-likelihood: -286.232658562367
## EM - StMoE: Iteration: 12 | log-likelihood: -283.495626728641
## EM - StMoE: Iteration: 13 | log-likelihood: -281.066750731053
## EM - StMoE: Iteration: 14 | log-likelihood: -278.930154857514
## EM - StMoE: Iteration: 15 | log-likelihood: -277.065197229512
## EM - StMoE: Iteration: 16 | log-likelihood: -275.445036658335
## EM - StMoE: Iteration: 17 | log-likelihood: -274.037985350042
## EM - StMoE: Iteration: 18 | log-likelihood: -272.82433268013
## EM - StMoE: Iteration: 19 | log-likelihood: -271.78116148937
## EM - StMoE: Iteration: 20 | log-likelihood: -270.884197695598
## EM - StMoE: Iteration: 21 | log-likelihood: -270.113247329307
## EM - StMoE: Iteration: 22 | log-likelihood: -269.449817355314
## EM - StMoE: Iteration: 23 | log-likelihood: -268.879378039627
## EM - StMoE: Iteration: 24 | log-likelihood: -268.388133759789
## EM - StMoE: Iteration: 25 | log-likelihood: -267.965121843003
## EM - StMoE: Iteration: 26 | log-likelihood: -267.600625784425
## EM - StMoE: Iteration: 27 | log-likelihood: -267.286323898909
## EM - StMoE: Iteration: 28 | log-likelihood: -267.015175106688
## EM - StMoE: Iteration: 29 | log-likelihood: -266.781207013669
## EM - StMoE: Iteration: 30 | log-likelihood: -266.580691920397
## EM - StMoE: Iteration: 31 | log-likelihood: -266.409621052948
## EM - StMoE: Iteration: 32 | log-likelihood: -266.263401135313
## EM - StMoE: Iteration: 33 | log-likelihood: -266.138404785398
## EM - StMoE: Iteration: 34 | log-likelihood: -266.031407072634
## EM - StMoE: Iteration: 35 | log-likelihood: -265.94007965443
## EM - StMoE: Iteration: 36 | log-likelihood: -265.862906351406
## EM - StMoE: Iteration: 37 | log-likelihood: -265.797605376078
## EM - StMoE: Iteration: 38 | log-likelihood: -265.742320465321
## EM - StMoE: Iteration: 39 | log-likelihood: -265.695508142223
## EM - StMoE: Iteration: 40 | log-likelihood: -265.655867248312
## EM - StMoE: Iteration: 41 | log-likelihood: -265.622307336802
## EM - StMoE: Iteration: 42 | log-likelihood: -265.593910629074
## EM - StMoE: Iteration: 43 | log-likelihood: -265.569902082265
## EM - StMoE: Iteration: 44 | log-likelihood: -265.549449512629
## EM - StMoE: Iteration: 45 | log-likelihood: -265.532225052961
## EM - StMoE: Iteration: 46 | log-likelihood: -265.517739854904
## EM - StMoE: Iteration: 47 | log-likelihood: -265.505681018372
## EM - StMoE: Iteration: 48 | log-likelihood: -265.495950090745
## EM - StMoE: Iteration: 49 | log-likelihood: -265.488174458731
## EM - StMoE: Iteration: 50 | log-likelihood: -265.482022229976
## EM - StMoE: Iteration: 51 | log-likelihood: -265.47723313599
## EM - StMoE: Iteration: 52 | log-likelihood: -265.47358766487
## EM - StMoE: Iteration: 53 | log-likelihood: -265.470901303581
## EM - StMoE: Iteration: 54 | log-likelihood: -265.469018592482

Summary

stmoe$summary()
## ------------------------------------------
## Fitted Skew t Mixture-of-Experts model
## ------------------------------------------
## 
## StMoE model with K = 2 experts:
## 
##  log-likelihood df      AIC       BIC       ICL
##        -265.469 12 -277.469 -302.7567 -302.8938
## 
## Clustering table (Number of observations in each expert):
## 
##   1   2 
## 249 251 
## 
## Regression coefficients:
## 
##     Beta(k = 1) Beta(k = 2)
## 1    0.02163754  -0.0383771
## X^1  2.63534077  -2.6393947
## 
## Variances:
## 
##  Sigma2(k = 1) Sigma2(k = 2)
##      0.3947074     0.6059948
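As a sanity check, the reported criteria are consistent with penalized log-likelihood forms of AIC and BIC (a sketch under that assumption; the exact definitions used by meteorits may differ in sign convention):

```r
# Reproducing the reported criteria from the summary values above
ll <- -265.469  # log-likelihood
df <- 12        # number of free parameters
n  <- 500       # sample size
AIC <- ll - df               # penalized log-likelihood form: -277.469
BIC <- ll - df * log(n) / 2  # about -302.76, close to the reported value
```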

Plots

Mean curve

stmoe$plot(what = "meancurve")

Confidence regions

stmoe$plot(what = "confregions")

Clusters

stmoe$plot(what = "clusters")

Log-likelihood

stmoe$plot(what = "loglikelihood")

Application to a real dataset

Load data

library(MASS)
data("mcycle")
x <- mcycle$times
y <- mcycle$accel
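The mcycle data record head acceleration (in g) against time (in ms) from a simulated motorcycle crash experiment. A quick look at the raw data before fitting:

```r
library(MASS)  # for the mcycle data

# Scatter plot of the raw measurements
plot(mcycle$times, mcycle$accel,
     xlab = "Time (ms)", ylab = "Acceleration (g)",
     main = "mcycle data")
```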

Set up StMoE model parameters

K <- 4 # Number of regressors/experts
p <- 2 # Order of the polynomial regression (regressors/experts)
q <- 1 # Order of the logistic regression (gating network)

Set up EM parameters

n_tries <- 1          # Number of runs of the EM algorithm (the best solution is kept)
max_iter <- 1500      # Maximum number of EM iterations
threshold <- 1e-5     # Convergence threshold on the relative change of the log-likelihood
verbose <- TRUE       # Print the log-likelihood at each EM iteration
verbose_IRLS <- FALSE # Print the IRLS criterion values (gating network updates)

Estimation

stmoe <- emStMoE(X = x, Y = y, K = K, p = p, q = q, n_tries = n_tries, 
                 max_iter = max_iter, threshold = threshold, 
                 verbose = verbose, verbose_IRLS = verbose_IRLS)
## EM - StMoE: Iteration: 1 | log-likelihood: -597.426749159848
## EM - StMoE: Iteration: 2 | log-likelihood: -587.633539364609
## EM - StMoE: Iteration: 3 | log-likelihood: -585.542984585197
## EM - StMoE: Iteration: 4 | log-likelihood: -583.81949782034
## EM - StMoE: Iteration: 5 | log-likelihood: -578.866712252264
## EM - StMoE: Iteration: 6 | log-likelihood: -571.44800718929
## EM - StMoE: Iteration: 7 | log-likelihood: -566.442380825496
## EM - StMoE: Iteration: 8 | log-likelihood: -564.127085103519
## EM - StMoE: Iteration: 9 | log-likelihood: -563.424066027458
## EM - StMoE: Iteration: 10 | log-likelihood: -563.194976854722
## EM - StMoE: Iteration: 11 | log-likelihood: -563.031164889133
## EM - StMoE: Iteration: 12 | log-likelihood: -562.839394860444
## EM - StMoE: Iteration: 13 | log-likelihood: -562.630291064863
## EM - StMoE: Iteration: 14 | log-likelihood: -562.459796751008
## EM - StMoE: Iteration: 15 | log-likelihood: -562.355288404497
## EM - StMoE: Iteration: 16 | log-likelihood: -562.296261704008
## EM - StMoE: Iteration: 17 | log-likelihood: -562.257809497387
## EM - StMoE: Iteration: 18 | log-likelihood: -562.226835146055
## EM - StMoE: Iteration: 19 | log-likelihood: -562.19853863136
## EM - StMoE: Iteration: 20 | log-likelihood: -562.171508794506
## EM - StMoE: Iteration: 21 | log-likelihood: -562.145119027878
## EM - StMoE: Iteration: 22 | log-likelihood: -562.119182563836
## EM - StMoE: Iteration: 23 | log-likelihood: -562.093656670039
## EM - StMoE: Iteration: 24 | log-likelihood: -562.068288188687
## EM - StMoE: Iteration: 25 | log-likelihood: -562.043099343603
## EM - StMoE: Iteration: 26 | log-likelihood: -562.018145927714
## EM - StMoE: Iteration: 27 | log-likelihood: -561.993201058435
## EM - StMoE: Iteration: 28 | log-likelihood: -561.968291421414
## EM - StMoE: Iteration: 29 | log-likelihood: -561.943451338026
## EM - StMoE: Iteration: 30 | log-likelihood: -561.918286930147
## EM - StMoE: Iteration: 31 | log-likelihood: -561.893066482208
## EM - StMoE: Iteration: 32 | log-likelihood: -561.867607117531
## EM - StMoE: Iteration: 33 | log-likelihood: -561.84188765202
## EM - StMoE: Iteration: 34 | log-likelihood: -561.815834331202
## EM - StMoE: Iteration: 35 | log-likelihood: -561.789303167768
## EM - StMoE: Iteration: 36 | log-likelihood: -561.76216783815
## EM - StMoE: Iteration: 37 | log-likelihood: -561.734499446201
## EM - StMoE: Iteration: 38 | log-likelihood: -561.706092853058
## EM - StMoE: Iteration: 39 | log-likelihood: -561.676854780656
## EM - StMoE: Iteration: 40 | log-likelihood: -561.646726512576
## EM - StMoE: Iteration: 41 | log-likelihood: -561.615584508525
## EM - StMoE: Iteration: 42 | log-likelihood: -561.583301774297
## EM - StMoE: Iteration: 43 | log-likelihood: -561.54959415656
## EM - StMoE: Iteration: 44 | log-likelihood: -561.514542454158
## EM - StMoE: Iteration: 45 | log-likelihood: -561.477581939837
## EM - StMoE: Iteration: 46 | log-likelihood: -561.438874067672
## EM - StMoE: Iteration: 47 | log-likelihood: -561.398041545451
## EM - StMoE: Iteration: 48 | log-likelihood: -561.354549459744
## EM - StMoE: Iteration: 49 | log-likelihood: -561.3085452286
## EM - StMoE: Iteration: 50 | log-likelihood: -561.259571636992
## EM - StMoE: Iteration: 51 | log-likelihood: -561.207283700561
## EM - StMoE: Iteration: 52 | log-likelihood: -561.1513039981
## EM - StMoE: Iteration: 53 | log-likelihood: -561.090806954384
## EM - StMoE: Iteration: 54 | log-likelihood: -561.025147193872
## EM - StMoE: Iteration: 55 | log-likelihood: -560.95316748705
## EM - StMoE: Iteration: 56 | log-likelihood: -560.874243329723
## EM - StMoE: Iteration: 57 | log-likelihood: -560.787101069648
## EM - StMoE: Iteration: 58 | log-likelihood: -560.691094041454
## EM - StMoE: Iteration: 59 | log-likelihood: -560.585755222598
## EM - StMoE: Iteration: 60 | log-likelihood: -560.470671147694
## EM - StMoE: Iteration: 61 | log-likelihood: -560.347296217678
## EM - StMoE: Iteration: 62 | log-likelihood: -560.222412545517
## EM - StMoE: Iteration: 63 | log-likelihood: -560.11387488377
## EM - StMoE: Iteration: 64 | log-likelihood: -560.045708853332
## EM - StMoE: Iteration: 65 | log-likelihood: -560.017987203664
## EM - StMoE: Iteration: 66 | log-likelihood: -560.009261293748
## EM - StMoE: Iteration: 67 | log-likelihood: -560.006324812458

Summary

stmoe$summary()
## ------------------------------------------
## Fitted Skew t Mixture-of-Experts model
## ------------------------------------------
## 
## StMoE model with K = 4 experts:
## 
##  log-likelihood df       AIC       BIC       ICL
##       -560.0063 30 -590.0063 -633.3616 -633.3602
## 
## Clustering table (Number of observations in each expert):
## 
##  1  2  3  4 
## 28 37 31 37 
## 
## Regression coefficients:
## 
##     Beta(k = 1) Beta(k = 2)  Beta(k = 3) Beta(k = 4)
## 1   -3.44321376  992.803818 -1624.175596 303.6837848
## X^1  0.85421340 -104.020405    97.121743 -12.6171966
## X^2 -0.07986081    2.436082    -1.418995   0.1294649
## 
## Variances:
## 
##  Sigma2(k = 1) Sigma2(k = 2) Sigma2(k = 3) Sigma2(k = 4)
##       13.72938      453.6252      877.9573      512.8175

Plots

Mean curve

stmoe$plot(what = "meancurve")

Confidence regions

stmoe$plot(what = "confregions")

Clusters

stmoe$plot(what = "clusters")

Log-likelihood

stmoe$plot(what = "loglikelihood")