Introduction

TMoE (t Mixture-of-Experts) provides a flexible and robust modelling framework for heterogeneous data that may be heavy-tailed and corrupted by atypical observations. TMoE consists of a mixture of K t-distributed expert regressors (polynomial regressions of degree p) gated by a softmax gating network (of degree q), and is parameterised by:

The gating network parameters: the alpha’s of the softmax net.
The experts network parameters: the location parameters (regression coefficients) beta’s, the scale parameters sigma’s, and the degrees of freedom (robustness) parameters nu’s.

TMoE thus generalises mixtures of (normal, t) distributions and mixtures of regressions with these distributions. For example, when q = 0, we retrieve mixtures of (t-, or normal) regressions, and when both p = 0 and q = 0, it is a mixture of (t-, or normal) distributions. It also reduces to the standard (normal, t) distribution when we only use a single expert (K = 1).
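
In symbols, the model described above can be sketched as follows. The notation is an assumption and is not taken from the text: f is the conditional density, \Psi collects all parameters, \pi_k is the gating probability of expert k, \tilde{x}_d = (1, x, \dots, x^d)^\top is the polynomial feature vector of degree d, and the last gating vector is fixed at zero for identifiability.

```latex
f(y \mid x; \Psi) = \sum_{k=1}^{K} \pi_k(x; \alpha)\, t_{\nu_k}\!\left(y;\ \beta_k^\top \tilde{x}_p,\ \sigma_k\right),
\qquad
\pi_k(x; \alpha) = \frac{\exp\!\left(\alpha_k^\top \tilde{x}_q\right)}{\sum_{l=1}^{K} \exp\!\left(\alpha_l^\top \tilde{x}_q\right)},
\quad \alpha_K = 0,

t_{\nu}(y; \mu, \sigma) = \frac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}{\sigma \sqrt{\nu \pi}\; \Gamma\!\left(\frac{\nu}{2}\right)}
\left(1 + \frac{(y - \mu)^2}{\nu \sigma^2}\right)^{-\frac{\nu + 1}{2}}
```

With q = 0 the gating probabilities no longer depend on x, and with p = 0 the expert means are constants, which gives the special cases listed above.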

Model estimation is performed by a dedicated expectation conditional maximization (ECM) algorithm that maximizes the observed-data log-likelihood. We provide simulated examples to illustrate the use of the model in model-based clustering of heterogeneous regression data and in fitting non-linear regression functions.
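
To make the objective concrete, here is a minimal base-R sketch of the observed-data log-likelihood that such an algorithm would maximize. The function name tmoe_loglik, its argument layout (alpha with one column per non-reference expert, beta with one column per expert), and the choice of the last expert as the softmax reference are assumptions for illustration; this is not the meteorits implementation.

```r
# Hypothetical helper (not part of meteorits): observed-data log-likelihood
# of a univariate tMoE with polynomial experts and softmax gating.
tmoe_loglik <- function(x, y, alpha, beta, sigma, nu) {
  K <- length(sigma)
  p <- nrow(beta) - 1                      # expert polynomial degree
  q <- nrow(alpha) - 1                     # gating polynomial degree
  Xp <- outer(x, 0:p, "^")                 # n x (p + 1) design matrix
  Xq <- outer(x, 0:q, "^")                 # n x (q + 1) design matrix
  # Gating scores; the K-th expert's gating vector is fixed at zero (assumption)
  eta <- cbind(Xq %*% alpha, 0)
  eta <- eta - apply(eta, 1, max)          # stabilise the softmax
  gate <- exp(eta) / rowSums(exp(eta))     # n x K mixing proportions
  mu <- Xp %*% beta                        # n x K expert means
  # Location-scale Student-t density of each expert
  dens <- sapply(seq_len(K), function(k) {
    dt((y - mu[, k]) / sigma[k], df = nu[k]) / sigma[k]
  })
  sum(log(rowSums(gate * dens)))
}

# Illustrative call on a few made-up points
x_demo <- seq(-1, 1, length.out = 5)
y_demo <- c(2.4, 1.1, 0.2, -1.3, -2.6)
tmoe_loglik(x_demo, y_demo,
            alpha = matrix(c(0, 8), ncol = 1),
            beta = matrix(c(0, -2.5, 0, 2.5), ncol = 2),
            sigma = c(0.5, 0.5), nu = c(5, 7))
```

Each ECM iteration should not decrease this quantity, which is why the estimation logs in this document report it at every iteration.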

This vignette was written in R Markdown, using the knitr package for production.

See help(package="meteorits") for further details and references provided by citation("meteorits").

Application to a simulated dataset

Generate sample

library(meteorits) # Provides sampleUnivTMoE() and emTMoE()

n <- 500 # Size of the sample
alphak <- matrix(c(0, 8), ncol = 1) # Parameters of the gating network
betak <- matrix(c(0, -2.5, 0, 2.5), ncol = 2) # Regression coefficients of the experts
sigmak <- c(0.5, 0.5) # Standard deviations of the experts
nuk <- c(5, 7) # Degrees of freedom of the experts network t densities
x <- seq.int(from = -1, to = 1, length.out = n) # Inputs (predictors)

# Generate sample of size n
sample <- sampleUnivTMoE(alphak = alphak, betak = betak, sigmak = sigmak, 
                         nuk = nuk, x = x)
y <- sample$y
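
For intuition about what drawing from a tMoE involves, the following base-R sketch samples an expert label from the gating probabilities and then adds scaled Student-t noise around that expert's mean. The function simulate_tmoe is hypothetical and is not how sampleUnivTMoE is necessarily implemented; in particular, the parameter layout and the use of the last expert as the softmax reference are assumptions.

```r
# Hypothetical base-R sampler (an illustration, not the meteorits implementation)
simulate_tmoe <- function(x, alpha, beta, sigma, nu) {
  n <- length(x)
  K <- length(sigma)
  Xq <- outer(x, 0:(nrow(alpha) - 1), "^")   # gating design matrix
  Xp <- outer(x, 0:(nrow(beta) - 1), "^")    # expert design matrix
  eta <- cbind(Xq %*% alpha, 0)              # last gating vector fixed at 0 (assumption)
  gate <- exp(eta - apply(eta, 1, max))      # numerically stable softmax
  gate <- gate / rowSums(gate)
  # Draw one expert label per observation from its gating probabilities
  z <- apply(gate, 1, function(prob) sample.int(K, size = 1, prob = prob))
  mu <- (Xp %*% beta)[cbind(seq_len(n), z)]  # mean of the selected expert
  y_sim <- mu + sigma[z] * rt(n, df = nu[z]) # scaled Student-t noise
  list(y = y_sim, z = z)
}

set.seed(1)
sim <- simulate_tmoe(seq(-1, 1, length.out = 500),
                     alpha = matrix(c(0, 8), ncol = 1),
                     beta = matrix(c(0, -2.5, 0, 2.5), ncol = 2),
                     sigma = c(0.5, 0.5), nu = c(5, 7))
table(sim$z) # Number of observations drawn from each expert
```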

Set up tMoE model parameters

K <- 2 # Number of regressors/experts
p <- 1 # Order of the polynomial regression (regressors/experts)
q <- 1 # Order of the logistic regression (gating network)
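
The choice of K, p and q determines how many free parameters the model has. A rough count, assuming one gating vector is fixed as the softmax reference, can be sketched as below; the helper tmoe_df is hypothetical, but its results agree with the df values (10 and 26) reported by the two summaries in this document.

```r
# Hypothetical helper: number of free parameters of a univariate tMoE
tmoe_df <- function(K, p, q) {
  gating  <- (K - 1) * (q + 1)  # softmax coefficients (one expert as reference)
  experts <- K * (p + 1)        # polynomial regression coefficients
  scales  <- K                  # one scale parameter per expert
  dofs    <- K                  # one degrees-of-freedom parameter per expert
  gating + experts + scales + dofs
}

tmoe_df(K = 2, p = 1, q = 1) # 10
tmoe_df(K = 4, p = 2, q = 1) # 26
```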

Set up EM parameters

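n_tries <- 1 # Number of runs of the EM algorithm
max_iter <- 1500 # Maximum number of EM iterations
threshold <- 1e-5 # Convergence threshold on the change in log-likelihood
verbose <- TRUE # Print the log-likelihood at each iteration
verbose_IRLS <- FALSE # Print the IRLS criterion of the gating network updates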

Estimation

tmoe <- emTMoE(X = x, Y = y, K, p, q, n_tries, max_iter, 
               threshold, verbose, verbose_IRLS)
## EM - tMoE: Iteration: 1 | log-likelihood: -491.218197364768
## EM - tMoE: Iteration: 2 | log-likelihood: -487.769211081239
## EM - tMoE: Iteration: 3 | log-likelihood: -487.548170950104
## EM - tMoE: Iteration: 4 | log-likelihood: -487.4481122686
## EM - tMoE: Iteration: 5 | log-likelihood: -487.362448510582
## EM - tMoE: Iteration: 6 | log-likelihood: -487.286574823043
## EM - tMoE: Iteration: 7 | log-likelihood: -487.220110334276
## EM - tMoE: Iteration: 8 | log-likelihood: -487.162599398957
## EM - tMoE: Iteration: 9 | log-likelihood: -487.11339007083
## EM - tMoE: Iteration: 10 | log-likelihood: -487.071701338089
## EM - tMoE: Iteration: 11 | log-likelihood: -487.036689536179
## EM - tMoE: Iteration: 12 | log-likelihood: -487.007501902125
## EM - tMoE: Iteration: 13 | log-likelihood: -486.983316044807
## EM - tMoE: Iteration: 14 | log-likelihood: -486.963366331757
## EM - tMoE: Iteration: 15 | log-likelihood: -486.946959226574
## EM - tMoE: Iteration: 16 | log-likelihood: -486.933479985349
## EM - tMoE: Iteration: 17 | log-likelihood: -486.922393066811
## EM - tMoE: Iteration: 18 | log-likelihood: -486.913238317931
## EM - tMoE: Iteration: 19 | log-likelihood: -486.905624604248
## EM - tMoE: Iteration: 20 | log-likelihood: -486.899222684333
## EM - tMoE: Iteration: 21 | log-likelihood: -486.893761521271
## EM - tMoE: Iteration: 22 | log-likelihood: -486.88902219152
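
The run stops at iteration 22. Assuming the stopping rule compares the relative change in log-likelihood between consecutive iterations with the threshold (an assumption, although the logged values are consistent with it), the check can be reproduced from the last three lines of the log:

```r
# Last three log-likelihood values copied from the log above (iterations 20 to 22)
loglik_tail <- c(-486.899222684333, -486.893761521271, -486.88902219152)
tol <- 1e-5 # Same value as `threshold` above

# Relative change between consecutive iterations (assumed stopping criterion)
rel_change <- abs(diff(loglik_tail) / head(loglik_tail, -1))
converged <- rel_change <= tol
print(data.frame(iteration = 21:22, rel_change, converged))
```

Under this assumption, iteration 21 still exceeds the threshold and iteration 22 is the first to fall below it.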

Summary

tmoe$summary()
## -------------------------------------
## Fitted t Mixture-of-Experts model
## -------------------------------------
## 
## tMoE model with K = 2 experts:
## 
##  log-likelihood df      AIC       BIC       ICL
##        -486.889 10 -496.889 -517.9621 -518.0774
## 
## Clustering table (Number of observations in each expert):
## 
##   1   2 
## 249 251 
## 
## Regression coefficients:
## 
##     Beta(k = 1) Beta(k = 2)
## 1     0.1709799   0.1302157
## X^1   2.7391285  -2.6484651
## 
## Variances:
## 
##  Sigma2(k = 1) Sigma2(k = 2)
##      0.3020173     0.3840249
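
Note that the information criteria in this summary are negative and close to the log-likelihood. The reported values are consistent with a penalised log-likelihood convention (larger is better) rather than the more common -2 log-likelihood form. The following sketch reproduces them under that assumption, using the final log-likelihood from the estimation log:

```r
loglik_final <- -486.88902219152 # Final log-likelihood from the estimation log
n_par <- 10                      # df reported by the summary
n_obs <- 500                     # Sample size

aic <- loglik_final - n_par                  # Penalised log-likelihood form of AIC
bic <- loglik_final - n_par * log(n_obs) / 2 # Penalised log-likelihood form of BIC
round(c(AIC = aic, BIC = bic), 4)
```

ICL is not reproduced here because it also depends on the estimated posterior memberships.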

Plots

Mean curve

tmoe$plot(what = "meancurve")

Confidence regions

tmoe$plot(what = "confregions")

Clusters

tmoe$plot(what = "clusters")

Log-likelihood

tmoe$plot(what = "loglikelihood")

Application to a real dataset

Load data

library(MASS)
data("mcycle")
x <- mcycle$times
y <- mcycle$accel

Set up tMoE model parameters

K <- 4 # Number of regressors/experts
p <- 2 # Order of the polynomial regression (regressors/experts)
q <- 1 # Order of the logistic regression (gating network)

Set up EM parameters

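n_tries <- 1 # Number of runs of the EM algorithm
max_iter <- 1500 # Maximum number of EM iterations
threshold <- 1e-5 # Convergence threshold on the change in log-likelihood
verbose <- TRUE # Print the log-likelihood at each iteration
verbose_IRLS <- FALSE # Print the IRLS criterion of the gating network updates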

Estimation

tmoe <- emTMoE(X = x, Y = y, K, p, q, n_tries, max_iter, 
               threshold, verbose, verbose_IRLS)
## EM - tMoE: Iteration: 1 | log-likelihood: -584.086785478139
## EM - tMoE: Iteration: 2 | log-likelihood: -582.96450503363
## EM - tMoE: Iteration: 3 | log-likelihood: -582.132696078752
## EM - tMoE: Iteration: 4 | log-likelihood: -579.243910855542
## EM - tMoE: Iteration: 5 | log-likelihood: -570.575962920117
## EM - tMoE: Iteration: 6 | log-likelihood: -563.182226647835
## EM - tMoE: Iteration: 7 | log-likelihood: -560.05906772357
## EM - tMoE: Iteration: 8 | log-likelihood: -559.274294677457
## EM - tMoE: Iteration: 9 | log-likelihood: -558.647006619567
## EM - tMoE: Iteration: 10 | log-likelihood: -557.898561542372
## EM - tMoE: Iteration: 11 | log-likelihood: -557.020883226543
## EM - tMoE: Iteration: 12 | log-likelihood: -556.025252040372
## EM - tMoE: Iteration: 13 | log-likelihood: -554.997956365751
## EM - tMoE: Iteration: 14 | log-likelihood: -554.084355436493
## EM - tMoE: Iteration: 15 | log-likelihood: -553.324543599035
## EM - tMoE: Iteration: 16 | log-likelihood: -552.696047918341
## EM - tMoE: Iteration: 17 | log-likelihood: -552.199917195739
## EM - tMoE: Iteration: 18 | log-likelihood: -551.835354902599
## EM - tMoE: Iteration: 19 | log-likelihood: -551.583336113226
## EM - tMoE: Iteration: 20 | log-likelihood: -551.415871358505
## EM - tMoE: Iteration: 21 | log-likelihood: -551.306857875676
## EM - tMoE: Iteration: 22 | log-likelihood: -551.236451754857
## EM - tMoE: Iteration: 23 | log-likelihood: -551.190973878156
## EM - tMoE: Iteration: 24 | log-likelihood: -551.161471484484
## EM - tMoE: Iteration: 25 | log-likelihood: -551.142189830907
## EM - tMoE: Iteration: 26 | log-likelihood: -551.129468567439
## EM - tMoE: Iteration: 27 | log-likelihood: -551.120981300437
## EM - tMoE: Iteration: 28 | log-likelihood: -551.115245061542
## EM - tMoE: Iteration: 29 | log-likelihood: -551.111310019075

Summary

tmoe$summary()
## -------------------------------------
## Fitted t Mixture-of-Experts model
## -------------------------------------
## 
## tMoE model with K = 4 experts:
## 
##  log-likelihood df       AIC       BIC       ICL
##       -551.1113 26 -577.1113 -614.6858 -614.6819
## 
## Clustering table (Number of observations in each expert):
## 
##  1  2  3  4 
## 28 37 31 37 
## 
## Regression coefficients:
## 
##      Beta(k = 1) Beta(k = 2)  Beta(k = 3) Beta(k = 4)
## 1   -1.050030722 1012.267855 -1806.992474 296.6414020
## X^1 -0.101509157 -106.106794   111.228331 -12.3334506
## X^2 -0.008684436    2.492589    -1.665697   0.1265659
## 
## Variances:
## 
##  Sigma2(k = 1) Sigma2(k = 2) Sigma2(k = 3) Sigma2(k = 4)
##       1.653454      437.4696      571.2414      549.5563

Plots

Mean curve

tmoe$plot(what = "meancurve")

Confidence regions

tmoe$plot(what = "confregions")

Clusters

tmoe$plot(what = "clusters")

Log-likelihood

tmoe$plot(what = "loglikelihood")
