GlarmaVarSel package

Introduction

The GlarmaVarSel package provides functions for performing variable selection in sparse GLARMA models, which are widely used for modeling discrete-valued time series. The method iteratively combines the estimation of the autoregressive moving average (ARMA) coefficients of the GLARMA model with regularized methods designed to perform variable selection on the regression coefficients of Generalized Linear Models (GLMs). For further details on the methodology, we refer the reader to [1].

We describe the GLARMA model for a single time series with additional covariates. Given the past history ℱ_{t−1} = σ(Y_s, s ≤ t − 1), we assume that

    Y_t | ℱ_{t−1} ∼ 𝒫(μ_t*),  with  μ_t* = exp(W_t),    (1)

where 𝒫(μ) denotes the Poisson distribution with mean μ. In (1),

    W_t = Σ_{i=0}^{p} β_i x_{t,i} + Z_t,  with  Z_t = Σ_{j=1}^{q} γ_j E_{t−j},

where the x_{t,i}'s are the p regressor variables at time t with p ≥ 1 and x_{t,0} = 1, β = (β_0, …, β_p) is the vector of regression coefficients, γ = (γ_1, …, γ_q) is an absolutely summable sequence, and

    E_t = (Y_t − μ_t*)/μ_t* = Y_t exp(−W_t) − 1.

Here E_t = 0 for all t ≤ 0 and 1 ≤ q ≤ ∞. When q = ∞, Z_t satisfies an ARMA-like recursion, because a causal ARMA process can be written as a moving average process of infinite order. The vector β is assumed to be sparse, meaning that most of its components are equal to zero. The goal of the package is to retrieve the indices of the nonzero components of β, also called active variables, from the observations Y_1, …, Y_n.
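To make the recursion concrete, here is a minimal sketch (not part of the package) that simulates observations from the Poisson GLARMA model above, assuming q = 1 and hypothetical values for β and γ; the design matrix follows the package convention of one column per time point, with the first row being the intercept.

```r
# Simulate n observations from a Poisson GLARMA model with q = 1.
# X is a (p+1) x n design matrix whose first row is the intercept.
simulate_glarma <- function(n, X, beta, gamma1) {
  Y <- numeric(n)
  E <- 0  # E_t = 0 for all t <= 0
  for (t in 1:n) {
    Z_t <- gamma1 * E                # Z_t = gamma_1 * E_{t-1}
    W_t <- sum(beta * X[, t]) + Z_t  # linear predictor W_t
    mu_t <- exp(W_t)
    Y[t] <- rpois(1, mu_t)
    E <- (Y[t] - mu_t) / mu_t        # working residual E_t
  }
  Y
}

set.seed(1)
n <- 50; p <- 2  # illustrative sizes, not the ones used in the package data
X <- rbind(1, matrix(rnorm(p * n), nrow = p))
Y_sim <- simulate_glarma(n, X, beta = c(0.5, 1, 0), gamma1 = 0.5)
```

The names `simulate_glarma` and `Y_sim` are illustrative only; the package data were generated by the authors' own procedure.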

Data

We load the package and the dataset of observations of size n = 50 provided within it.

library(GlarmaVarSel)
data(Y)

The number of regressor variables p is equal to 30. The data were generated with γ = (0.5) and a vector β such that all the β_i are equal to zero except three of them: β_1 = 1.73, β_3 = 0.38 and β_17 = −0.64. The design matrix X is built by taking the covariates in a Fourier basis.
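For later comparison of the estimates with the truth (for instance in the plot at the end of this vignette), the true coefficient vector described above can be reconstructed as follows; note that β is indexed from 0, so β_1 occupies position 2 of the R vector.

```r
p <- 30
# True regression coefficients beta_0, ..., beta_30 (indexed from 0):
# beta_1 = 1.73, beta_3 = 0.38, beta_17 = -0.64, all others zero.
beta <- rep(0, p + 1)
beta[c(1, 3, 17) + 1] <- c(1.73, 0.38, -0.64)
```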

Initialization

We initialize γ0 = (0) and take β0 to be the coefficients estimated by the glm function:

gamma0 = c(0)

glm_pois <- glm(Y ~ t(X)[, 2:(p + 1)], family = poisson)
beta0 <- as.numeric(glm_pois$coefficients)

Estimation of γ

We can estimate γ with the Newton-Raphson method implemented in the NR_gamma function. The output is the estimate of γ. The default number of iterations of the Newton-Raphson algorithm is 100.

gamma_est_nr = NR_gamma(Y, X, beta0, gamma0, n_iter=100)
gamma_est_nr
## [1] 0.4121483

This estimate is obtained with the initial values γ0 and β0; it improves once the initial values are replaced by the estimates γ̂ and β̂ obtained by the variable_selection function described below.

Variable selection

We perform variable selection with the variable_selection function and obtain the indices of the coefficients estimated to be active, together with the estimates of γ and β. We take the number of iterations of the algorithm equal to 2 (k_max = 2). We use method = "min", which corresponds to the stability selection method with minimal λ, where the threshold is equal to 0.7 and the number of replications nb_rep_ss is equal to 1000. For more details about stability selection and the choice of these parameters, we refer the reader to [1]. The function supports parallel computation (parallel = TRUE); this requires an additional package, which is not supported on Windows platforms.

result = variable_selection(Y, X, gamma0, k_max = 2, n_iter = 100,
    method = "min", nb_rep_ss = 1000, threshold = 0.7, parallel = FALSE,
    nb.cores = 1)
beta_est = result$beta_est
Estim_active = result$estim_active
gamma_est = result$gamma_est
cat("Estimated active coefficients: ", Estim_active, "\n")
cat("Estimated gamma: ", gamma_est, "\n")
## Estimated active coefficients:  1 3 17
## Estimated gamma:  0.5083098

Illustration of the estimation of β

We display a plot that illustrates which elements of β are selected as active and how close each estimated value β̂_i is to the true value β_i. True values of β are plotted as crosses and estimated values as dots.

# First, we make a dataset of estimated betas; plotting requires ggplot2
library(ggplot2)
beta_data = data.frame(beta_est)
colnames(beta_data)[1] <- "beta"
beta_data$Variable = seq(1, (p + 1), 1)
beta_data$y = 0
beta_data = beta_data[beta_data$beta != 0, ]
# Next, we make a dataset of true betas
beta_t_data = data.frame(beta)
colnames(beta_t_data)[1] <- "beta"
beta_t_data$Variable = seq(1, (p + 1), 1)
beta_t_data$y = 0
beta_t_data = beta_t_data[beta_t_data$beta != 0, ]
# Finally, we plot the result
plot = ggplot() +
    geom_point(data = beta_data, aes(x = Variable, y = y, color = beta),
        pch = 16, size = 5, stroke = 2) +
    geom_point(data = beta_t_data, aes(x = Variable, y = y, color = beta),
        pch = 4, size = 6, stroke = 2) +
    scale_color_gradient2(name = expression(hat(beta)), midpoint = 0,
        low = "steelblue", mid = "white", high = "red") +
    scale_x_continuous(breaks = c(1, seq(10, (p + 1), 10)),
        limits = c(0, (p + 1))) +
    scale_y_continuous(breaks = c(), limits = c(-1, 1)) +
    theme(legend.title = element_text(color = "black", size = 12, face = "bold"),
        legend.text = element_text(color = "black", size = 10)) +
    theme(axis.text.x = element_text(angle = 90),
        axis.text = element_text(size = 10),
        axis.title = element_text(size = 10, face = "bold"),
        axis.title.y = element_blank())
plot

As the plot shows, all the zero coefficients are estimated to be zero and all the nonzero coefficients are estimated to be nonzero. Moreover, the estimates also correctly preserve the signs of the coefficients.

References

[1] M. Gomtsyan, C. Lévy-Leduc, S. Ouadah and L. Sansonnet. "Variable selection in sparse GLARMA models", arXiv:2007.08623.