Semiparametric model

library(serosv)

Penalized splines

Proposed model

Penalized splines

A general model relating the prevalence to age can be written as a GLM

$$ g(P(Y_i = 1 | a_i)) = g(\pi(a_i)) = \eta(a_i) $$

  • Where g is the link function and η is the linear predictor

The linear predictor can be estimated semi-parametrically using a penalized spline with truncated power basis functions of degree p and fixed knots $\kappa_1, ..., \kappa_K$ as follows

$$ \eta(a_i) = \beta_0 + \beta_1 a_i + ... + \beta_p a_i^p + \sum_{k=1}^{K} u_k (a_i - \kappa_k)^p_+ $$

  • Where

    $$ (a_i - \kappa_k)^p_+ = \begin{cases} 0, & a_i \le \kappa_k \\ (a_i - \kappa_k)^p, & a_i > \kappa_k \end{cases} $$

In matrix notation, the mean structure model for $\eta(a_i)$ becomes

$$ \eta = X\beta + Zu $$

Where $\eta = [\eta(a_1), ..., \eta(a_N)]^T$, $\beta = [\beta_0, \beta_1, ..., \beta_p]^T$, and $u = [u_1, u_2, ..., u_K]^T$ are the regression coefficients, with corresponding design matrices

$$ X = \begin{bmatrix} 1 & a_1 & a_1^2 & ... & a_1^p \\ 1 & a_2 & a_2^2 & ... & a_2^p \\ \vdots & \vdots & \vdots & \dots & \vdots \\ 1 & a_N & a_N^2 & ... & a_N^p \end{bmatrix}, Z = \begin{bmatrix} (a_1 - \kappa_1 )_+^p & (a_1 - \kappa_2 )_+^p & \dots & (a_1 - \kappa_k)_+^p \\ (a_2 - \kappa_1 )_+^p & (a_2 - \kappa_2 )_+^p & \dots & (a_2 - \kappa_k)_+^p \\ \vdots & \vdots & \dots & \vdots \\ (a_N - \kappa_1 )_+^p & (a_N - \kappa_2 )_+^p & \dots & (a_N - \kappa_k)_+^p \end{bmatrix} $$
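
To make these design matrices concrete, here is a minimal illustrative R sketch of the truncated power basis. The function name, the degree, and the quantile-based knot placement are assumptions made for illustration only, not the construction serosv uses internally.

# Illustrative construction of X (polynomial part) and Z (truncated power part)
truncated_power_basis <- function(a, p, kappa) {
  X <- outer(a, 0:p, `^`)                                  # columns 1, a, ..., a^p
  Z <- outer(a, kappa, function(ai, k) pmax(ai - k, 0)^p)  # (a - kappa_k)_+^p
  list(X = X, Z = Z)
}

a <- 1:70
kappa <- as.numeric(quantile(a, probs = seq(0.1, 0.9, length.out = 8)))  # 8 knots
basis <- truncated_power_basis(a, p = 2, kappa = kappa)
dim(basis$X)  # N x (p + 1)
dim(basis$Z)  # N x K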

The FOI can then be derived as

$$ \hat{\lambda}(a_i) = \left[ \hat{\beta}_1 + 2\hat{\beta}_2 a_i + ... + p \hat{\beta}_p a_i^{p-1} + \sum^K_{k=1} p \hat{u}_k(a_i - \kappa_k)^{p-1}_+ \right] \delta(\hat{\eta}(a_i)) $$

  • Where δ(.) is determined by the link function used in the model (a worked example for the logit link follows)
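
For instance, for the logit link used in the fits below, and assuming the usual relation $\lambda(a) = \pi'(a)/(1 - \pi(a))$ with $\pi = g^{-1}(\eta)$, the factor δ(.) works out to

$$ \delta(\hat{\eta}(a_i)) = \frac{\exp(\hat{\eta}(a_i))}{1 + \exp(\hat{\eta}(a_i))} = \hat{\pi}(a_i) $$

so the estimated FOI is the derivative of the linear predictor multiplied by the estimated prevalence.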

Penalized likelihood framework

Refer to Chapter 8.2.1

Proposed approach

A first approach to fit the model is by maximizing the following penalized likelihood

$$ \phi^{-1}[y^T(X\beta + Zu) - c(X\beta + Zu)] + 1^Tc(y) - \frac{1}{2}\lambda \begin{bmatrix} \beta \\ u \end{bmatrix}^T D \begin{bmatrix} \beta \\ u \end{bmatrix} $$

Where:

  • Xβ + Zu is the linear predictor (refer to the Penalized splines section above)

  • D is a known semi-definite penalty matrix (Wahba 1978; Green and Silverman 1993)

  • y is the response vector

  • 1 is the unit vector and c(.) is determined by the link function used (a concrete example follows this list)

  • λ is the smoothing parameter (larger values give smoother curves)

  • ϕ is the overdispersion parameter and equals 1 if there is no overdispersion
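
As a concrete example, assuming the standard exponential-family parameterization, for the binomial family with the logit link used in the fits below the function c(.) applied (elementwise) to the linear predictor is

$$ c(\eta) = \log(1 + e^{\eta}) $$

so that $y^T(X\beta + Zu) - c(X\beta + Zu)$ recovers the usual Bernoulli log-likelihood.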

Fitting data

To fit the data using the penalized likelihood framework, specify framework = "pl"

The basis function can be defined via the s parameter; some possible values for s include:

  • "tp" for thin plate regression splines

  • "cr" for cubic regression splines

  • "ps" for P-splines proposed by Eilers and Marx (1996)

  • "ad" for adaptive smoothers

For more options, refer to the mgcv documentation (Wood 2017)
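
For instance, to request P-splines instead of the thin plate regression splines used in the example below, only the s argument changes (pl_ps is just an illustrative variable name):

data <- parvob19_be_2001_2003
pl_ps <- penalized_spline_model(data$age, status = data$seropositive, s = "ps", framework = "pl")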

data <- parvob19_be_2001_2003
pl <- penalized_spline_model(data$age, status = data$seropositive, s = "tp", framework = "pl") 
pl$info
#> 
#> Family: binomial 
#> Link function: logit 
#> 
#> Formula:
#> spos ~ s(age, bs = s, sp = sp)
#> 
#> Estimated degrees of freedom:
#> 6.16  total = 7.16 
#> 
#> UBRE score: 0.1206458
plot(pl)
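
The printed family, link, formula, and UBRE score suggest an underlying mgcv::gam() fit. A roughly equivalent direct call is sketched below; the data frame and column names (df, spos) are assumptions based on the printed formula, not serosv internals.

library(mgcv)
# Direct penalized-likelihood fit with mgcv::gam(); the smoothing parameter
# is selected automatically (UBRE, since the binomial scale is fixed at 1)
df <- data.frame(age = data$age, spos = data$seropositive)
gam_fit <- gam(spos ~ s(age, bs = "tp"), family = binomial(link = "logit"), data = df)
summary(gam_fit)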


Generalized Linear Mixed Model framework

Refer to Chapter 8.2.2

Proposed approach

Looking back at the penalized likelihood above, a constraint for u would be $\sum_k u_k^2 < C$ for some positive value C

This is equivalent to choosing (β, u) to maximize the penalized likelihood with D = diag(0, 1), where 0 denotes the zero vector of length p + 1 and 1 denotes the unit vector of length K

For a fixed value of λ, this is equivalent to fitting the following generalized linear mixed model (Ngo and Wand 2004)

$$ f(y|u) = exp\{ \phi^{-1} [y^T(X\beta + Zu) - c(X\beta + Zu)] + 1^Tc(y)\},\\ u \sim N(0, G) $$

  • With similar notation as before and $G = \sigma_u^2 I_{K \times K}$

Thus Z is penalized by assuming the corresponding coefficients u are random effects with $u \sim N(0, \sigma_u^2 I)$.
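
To make the diag(0, 1) structure concrete, here is a small illustrative R sketch; the values of p and K are arbitrary and not taken from the fits below.

# Block-diagonal penalty: a zero block for the p + 1 polynomial coefficients
# and an identity block for the K spline coefficients, so that
# [beta; u]^T D [beta; u] = sum(u^2), i.e. only u is penalized
p <- 2
K <- 8
D <- diag(c(rep(0, p + 1), rep(1, K)))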

Fitting data

To fit the data using the generalized linear mixed model framework, specify framework = "glmm"

data <- parvob19_be_2001_2003
glmm <- penalized_spline_model(data$age, status = data$seropositive, s = "tp", framework = "glmm") 
#> 
#>  Maximum number of PQL iterations:  20
#> iteration 1
#> iteration 2
#> iteration 3
#> iteration 4
glmm$info$gam
#> 
#> Family: binomial 
#> Link function: logit 
#> 
#> Formula:
#> spos ~ s(age, bs = s, sp = sp)
#> 
#> Estimated degrees of freedom:
#> 6.45  total = 7.45
plot(glmm)
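
The PQL iterations printed above, together with the fit being inspected via glmm$info$gam, suggest the underlying engine is mgcv::gamm(), which uses penalized quasi-likelihood for non-Gaussian families. A roughly equivalent direct call is sketched below; the data frame and column names are assumptions, not serosv internals.

library(mgcv)
# Mixed-model representation of the penalized spline, fitted via PQL
df <- data.frame(age = data$age, spos = data$seropositive)
glmm_direct <- gamm(spos ~ s(age, bs = "tp"), family = binomial(link = "logit"), data = df)
summary(glmm_direct$gam)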

Eilers, Paul H. C., and Brian D. Marx. 1996. “Flexible Smoothing with B-Splines and Penalties.” Statistical Science 11 (2). https://doi.org/10.1214/ss/1038425655.
Green, P. J., and Bernard W. Silverman. 1993. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman & Hall/CRC. https://doi.org/10.1201/b15710.
Ngo, Long, and Matthew P. Wand. 2004. “Smoothing with Mixed Model Software.” Journal of Statistical Software 9 (1). https://doi.org/10.18637/jss.v009.i01.
Ruppert, David, M. P. Wand, and R. J. Carroll. 2003. Semiparametric Regression. Cambridge University Press. https://doi.org/10.1017/cbo9780511755453.
Wahba, Grace. 1978. “Improper Priors, Spline Smoothing and the Problem of Guarding Against Model Errors in Regression.” Journal of the Royal Statistical Society Series B: Statistical Methodology 40 (3): 364–72. https://doi.org/10.1111/j.2517-6161.1978.tb01050.x.
Wand, M. P. 2003. “Smoothing and Mixed Models.” Computational Statistics 18 (2): 223–49. https://doi.org/10.1007/s001800300142.
Wood, Simon N. 2017. Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC. https://doi.org/10.1201/9781315370279.