GPFR example

Suppose we have a functional response variable $y_m(t)$, $m = 1, \dots, M$, a functional covariate $x_m(t)$, and also a set of $p = 2$ scalar covariates $\mathbf{u}_m = (u_{m0}, u_{m1})$.

A Gaussian process functional regression (GPFR) model used in this example is defined by

$$y_m(t) = \mu_m(t) + \tau_m(x_m(t)) + \varepsilon_m(t),$$

where $\mu_m(t) = \mathbf{u}_m \boldsymbol{\beta}(t)$ is the mean function model across different curves and $\tau_m(x_m(t))$ is a Gaussian process with zero mean and covariance function $k_m(\boldsymbol{\theta} \mid x_m(t))$. That is, $\tau_m(x_m(t))$ defines the covariance structure of $y_m(t)$ for the different data points within the same curve.

The error term can be assumed to be $\varepsilon_m(t) \sim N(0, \sigma_\varepsilon^2)$, where the noise variance $\sigma_\varepsilon^2$ can be estimated as a hyperparameter of the Gaussian process.
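Taken together, these assumptions imply the within-curve covariance

$$\mathrm{Cov}\big(y_m(t_i),\, y_m(t_j)\big) = k_m\big(\boldsymbol{\theta} \mid x_m(t_i), x_m(t_j)\big) + \sigma_\varepsilon^2\,\delta_{ij},$$

where $\delta_{ij}$ is the Kronecker delta. This is exactly the matrix constructed in the simulation code below: cov.pow.ex supplies the kernel part, and exp(hp$vv) is added to the diagonal.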

In the example below, the training data consist of $M = 20$ realisations on $[-4, 4]$ with $n = 50$ points for each curve. We assume regression coefficient functions $\beta_0(t) = 1$ and $\beta_1(t) = \sin((0.5t)^3)$, scalar covariates $u_{m0} \sim N(0, 1)$ and $u_{m1} \sim N(10, 5^2)$, and a functional covariate $x_m(t) = \exp(t) + v$, where $v \sim N(0, 0.1^2)$. The term $\tau_m(x_m(t))$ is a zero-mean Gaussian process with exponential covariance kernel and $\sigma_\varepsilon^2 = 1$.

We also simulate an $(M+1)$th realisation, which is used to assess predictions obtained from the model estimated on the training data of size $M$. The curves $y_{M+1}(t)$ and $x_{M+1}(t)$ are observed at 60 equally spaced time points on $[-4, 4]$.

library(GPFDA)
library(MASS)  # for mvrnorm
set.seed(100)

M <- 20  # number of training curves (realisations)
n <- 50  # number of time points per curve
p <- 2   # number of scalar covariates

# True hyperparameters, stored on the log scale: v = 10, w = 1, sigma_eps^2 = 1
hp <- list('pow.ex.v' = log(10), 'pow.ex.w' = log(1), 'vv' = log(1))

## Training data: M realisations -----------------
 
tt <- seq(-4,4,len=n)
b <- sin((0.5*tt)^3)

scalar_train <- matrix(NA, M, p)
t_train <- matrix(NA, M, n)
x_train <- matrix(NA, M, n)
response_train <- matrix(NA, M, n)
for(i in 1:M){
  u0 <- rnorm(1)
  u1 <- rnorm(1, 10, 5)
  x <- exp(tt) + rnorm(n, 0, 0.1)
  # GP covariance from the powered exponential kernel (gamma = 1: exponential)
  Sigma <- cov.pow.ex(hyper = hp, input = x, gamma = 1)
  # add the noise variance exp(hp$vv) = sigma_eps^2 to the diagonal
  diag(Sigma) <- diag(Sigma) + exp(hp$vv)
  y <- u0 + u1*b + mvrnorm(n = 1, mu = rep(0, n), Sigma = Sigma)
  scalar_train[i,] <- c(u0, u1)
  t_train[i,] <- tt
  x_train[i,] <- x
  response_train[i,] <- y
}
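
# (Optional) quick look at the simulated training data with base graphics,
# one line per realisation; the plot method for the fitted model is shown later.
matplot(tt, t(response_train), type = 'l', lty = 1, col = 'grey60',
        xlab = 't', ylab = 'y(t)')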

## Test data (M+1)-th realisation ------------------
n_new <- 60
t_new <- seq(-4,4,len=n_new)
b_new <- sin((0.5*t_new)^3)
u0_new <- rnorm(1)
u1_new <- rnorm(1,10,5)
scalar_new <- cbind(u0_new, u1_new)
x_new <- exp(t_new) + rnorm(n_new, 0, 0.1)
Sigma_new <- cov.pow.ex(hyper = hp, input = x_new, gamma = 1)
diag(Sigma_new) <- diag(Sigma_new) + exp(hp$vv)
response_new <- u0_new + u1_new*b_new + mvrnorm(n=1, mu=rep(0,n_new), 
                                                Sigma=Sigma_new)
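
As an aside, the covariance matrices built above can be cross-checked against a direct implementation of the kernel. The closed form below, $k(x, x') = v \exp(-w\,|x - x'|^\gamma)$, is our reading of the powered exponential kernel for a one-dimensional input; it is an assumption to be verified against ?cov.pow.ex, and the check should return TRUE if it holds:

# Cross-check of the assumed kernel form k(x, x') = v * exp(-w * |x - x'|^gamma),
# here with gamma = 1. This parameterisation is an assumption; see ?cov.pow.ex.
x_chk <- c(-1, 0, 2)
K_pkg <- cov.pow.ex(hyper = hp, input = x_chk, gamma = 1)
K_man <- exp(hp$pow.ex.v) * exp(-exp(hp$pow.ex.w) * abs(outer(x_chk, x_chk, "-")))
all.equal(c(K_pkg), c(K_man))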

The estimation of the mean and covariance functions of the GPFR model is carried out by the gpfr function:

a1 <- gpfr(response = response_train, time = tt, uReg = scalar_train,
           fxReg = NULL, gpReg = x_train,
           fyList = list(nbasis = 23, lambda = 0.0001),
           uCoefList = list(list(lambda = 0.0001, nbasis = 23)),
           Cov = 'pow.ex', gamma = 1, fitting = TRUE)

Note that the estimated covariance function hyperparameters are similar to the true values:

unlist(lapply(a1$hyper, exp))
#> pow.ex.v pow.ex.w       vv 
#>   10.771    0.999    1.195
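
For comparison, the true values follow directly from the hp list defined at the start of the simulation:

unlist(lapply(hp, exp))
#> pow.ex.v pow.ex.w       vv 
#>       10        1        1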

Plot of raw data

To visualise all the realisations of the training data:

plot(a1, type='raw')

To visualise three realisations of the training data:

plot(a1, type='raw', realisations = 1:3)

FR fit for training data

The in-sample fit using the mean function from the FR model alone can be visualised via:

plot(a1, type = 'meanFunction', realisations = 1:3)

GPFR fit for training data

The GPFR model fit to the training data is visualised with:

plot(a1, type = 'fitted', realisations = 1:3)

Type I prediction: $y_{M+1}$ observed

If $y_{M+1}(t)$ is observed over the whole domain of $t$, Type I predictions are obtained via:

b1 <- gpfrPredict(a1, testInputGP = x_new, testTime = t_new,
                  uReg = scalar_new, fxReg = NULL,
                  gpReg = list('response' = response_new,
                               'input' = x_new,
                               'time' = t_new))

plot(b1, type = 'prediction', colourTrain = 'pink')
lines(t_new, response_new, type = 'b', col = 4, pch = 19, cex = 0.6, lty = 3, lwd = 2)

Type I prediction: $y_{M+1}$ partially observed

If we assume that $y_{M+1}(t)$ is only partially observed (here, at the first 20 of the 60 test time points), we can obtain Type I predictions via:

b2 <- gpfrPredict(a1, testInputGP = x_new, testTime = t_new,
                  uReg = scalar_new, fxReg = NULL,
                  gpReg = list('response' = response_new[1:20],
                               'input' = x_new[1:20],
                               'time' = t_new[1:20]))

plot(b2, type = 'prediction', colourTrain = 'pink')
lines(t_new, response_new, type = 'b', col = 4, pch = 19, cex = 0.6, lty = 3, lwd = 2)

Type II prediction: $y_{M+1}$ not observed

Type II prediction, which uses no information about $y_{M+1}(t)$, is obtained and visualised below.

b3 <- gpfrPredict(a1, testInputGP = x_new, testTime = t_new,
                  uReg = scalar_new, fxReg = NULL, gpReg = NULL)

plot(b3, type = 'prediction', colourTrain = 'pink')
lines(t_new, response_new, type = 'b', col = 4, pch = 19, cex = 0.6, lty = 3, lwd = 2)
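
The three types of prediction can also be compared numerically, for example via the root mean squared error of each predictive mean against the simulated truth. A minimal sketch follows; the slot name ypred.mean for the predictive mean is an assumption here, so verify the actual structure of the returned object with str(b1):

# RMSE of each predictive mean against the simulated test curve.
# NOTE: 'ypred.mean' is an assumed slot name; check str(b1) for the real one.
rmse <- function(pred, truth) sqrt(mean((pred - truth)^2))
c(typeI_full    = rmse(b1$ypred.mean, response_new),
  typeI_partial = rmse(b2$ypred.mean, response_new),
  typeII        = rmse(b3$ypred.mean, response_new))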