Title: | Estimation and Prediction Methods for High-Dimensional Mixed Frequency Time Series Data |
---|---|
Description: | The 'midasml' package implements estimation and prediction methods for high-dimensional mixed-frequency (MIDAS) time-series and panel data regression models. The regularized MIDAS models are estimated using orthogonal (e.g. Legendre) polynomials and sparse-group LASSO (sg-LASSO) estimator. For more information on the 'midasml' approach see Babii, Ghysels, and Striaukas (2021, JBES forthcoming) <doi:10.1080/07350015.2021.1899933>. The package is equipped with the fast implementation of the sg-LASSO estimator by means of proximal block coordinate descent. High-dimensional mixed frequency time-series data can also be easily manipulated with functions provided in the package. |
Authors: | Jonas Striaukas [cre, aut], Andrii Babii [aut], Eric Ghysels [aut], Alex Kostrov [ctb] (Contributions to analytical gradients for non-linear low-dimensional MIDAS estimation code) |
Maintainer: | Jonas Striaukas <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.10 |
Built: | 2024-12-21 06:39:29 UTC |
Source: | CRAN |
Estimation and Prediction Methods for High-Dimensional Mixed Frequency Time Series Data
Jonas Striaukas (maintainer) [email protected], Andrii Babii [email protected], Eric Ghysels [email protected]
ALFRED monthly and quarterly series vintages
data(alfred_vintages)
data(alfred_vintages)
A list
objects
data(alfred_vintages) i <- 1 alfred_vintages[[i]] # ith variable
data(alfred_vintages) i <- 1 alfred_vintages[[i]] # ith variable
Does k-fold cross-validation for panel data sg-LASSO regression model.
The function runs sglfit nfolds+1
times; the first to get the path solution in λ sequence, the rest to compute the fit with each of the folds omitted.
The average error and standard deviation over the folds is computed, and the optimal regression coefficients are returned for lam.min
and lam.1se
. Solutions are computed for a fixed γ
cv.panel.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, nfolds = 10, foldid, method = c("pooled", "fe"), nf = NULL, parallel = FALSE, ...)
cv.panel.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, nfolds = 10, foldid, method = c("pooled", "fe"), nf = NULL, parallel = FALSE, ...)
x |
NT by p data matrix, where NT and p respectively denote the sample size of pooled data and the number of regressors. |
y |
NT by 1 response variable. |
lambda |
a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own λ sequence based on |
gamma |
sg-LASSO mixing parameter. γ = 1 gives LASSO solution and γ = 0 gives group LASSO solution. |
gindex |
p by 1 vector indicating group membership of each covariate. |
nfolds |
number of folds of the cv loop. Default set to |
foldid |
the fold assignments used. |
method |
choose between 'pooled' and 'fe'; 'pooled' forces the intercept to be fitted in sglfit, 'fe' computes the fixed effects. User must input the number of fixed effects |
nf |
number of fixed effects. Used only if |
parallel |
if |
... |
Other arguments that can be passed to sglfit. |
method='pooled'
) method='fe'
) cv.panel.sglfit object.
Jonas Striaukas
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) cv.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10, standardize = FALSE, intercept = FALSE) ## Not run: # Parallel require(doMC) registerDoMC(cores = 2) x = matrix(rnorm(1000 * 20), 1000, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(1000) gindex = sort(rep(1:4,times=5)) system.time(cv.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10, standardize = FALSE, intercept = FALSE)) system.time(cv.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10, standardize = FALSE, intercept = FALSE, parallel = TRUE)) ## End(Not run)
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) cv.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10, standardize = FALSE, intercept = FALSE) ## Not run: # Parallel require(doMC) registerDoMC(cores = 2) x = matrix(rnorm(1000 * 20), 1000, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(1000) gindex = sort(rep(1:4,times=5)) system.time(cv.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10, standardize = FALSE, intercept = FALSE)) system.time(cv.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10, standardize = FALSE, intercept = FALSE, parallel = TRUE)) ## End(Not run)
Does k-fold cross-validation for sg-LASSO regression model.
The function runs sglfit nfolds+1
times; the first to get the path solution in λ sequence, the rest to compute the fit with each of the folds omitted.
The average error and standard deviation over the folds is computed, and the optimal regression coefficients are returned for lam.min
and lam.1se
. Solutions are computed for a fixed γ
cv.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, nfolds = 10, foldid, parallel = FALSE, ...)
cv.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, nfolds = 10, foldid, parallel = FALSE, ...)
x |
T by p data matrix, where T and p respectively denote the sample size and the number of regressors. |
y |
T by 1 response variable. |
lambda |
a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own λ sequence based on |
gamma |
sg-LASSO mixing parameter. γ = 1 gives LASSO solution and γ = 0 gives group LASSO solution. |
gindex |
p by 1 vector indicating group membership of each covariate. |
nfolds |
number of folds of the cv loop. Default set to |
foldid |
the fold assignments used. |
parallel |
if |
... |
Other arguments that can be passed to sglfit. |
cv.sglfit object.
Jonas Striaukas
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) cv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE) ## Not run: # Parallel require(doMC) registerDoMC(cores = 2) x = matrix(rnorm(1000 * 20), 1000, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(1000) gindex = sort(rep(1:4,times=5)) system.time(cv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE)) system.time(cv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE, parallel = TRUE)) ## End(Not run)
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) cv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE) ## Not run: # Parallel require(doMC) registerDoMC(cores = 2) x = matrix(rnorm(1000 * 20), 1000, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(1000) gindex = sort(rep(1:4,times=5)) system.time(cv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE)) system.time(cv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE, parallel = TRUE)) ## End(Not run)
Change the date to the beginning of the month date.
dateMatch(x, y)
dateMatch(x, y)
x |
date vector to match with y date vector. |
y |
date vector. |
changed date vector.
Jonas Striaukas
x <- seq(as.Date("2020-01-01"),as.Date("2020-12-01"), by = "day") set.seed(100) x <- x[-sample(1:336, 100)] y <- seq(as.Date("2020-01-01"),as.Date("2020-12-01"), by = "month") dateMatch(x,y)
x <- seq(as.Date("2020-01-01"),as.Date("2020-12-01"), by = "day") set.seed(100) x <- x[-sample(1:336, 100)] y <- seq(as.Date("2020-01-01"),as.Date("2020-12-01"), by = "month") dateMatch(x,y)
For a given set of points in X, computes the orthonormal Gegenbauer polynomials basis of L2 [a,b] for a given degree and parameter. The Gegenbauer polynomials are a special case of more general Jacobi polynomials. In turn, you may get Legendre polynomials from Gegenbauer by setting
= 0, or Chebychev's polynomials
by setting
= 1/2 or -1/2.
gb(degree, alpha, a = 0, b = 1, jmax = NULL, X = NULL)
gb(degree, alpha, a = 0, b = 1, jmax = NULL, X = NULL)
degree |
polynomial degree. |
alpha |
Gegenbauer polynomials parameter. |
a |
lower shift value (default - 0). |
b |
upper shift value (default - 1). |
jmax |
number of high-frequency lags. |
X |
optional evaluation grid vector. |
Psi weight matrix with Gegenbauer functions upto degree
.
Jonas Striaukas
degree <- 3 alpha <- 1 jmax <- 66 gb(degree = degree, alpha = alpha, a = 0, b = 1, jmax = jmax)
degree <- 3 alpha <- 1 jmax <- 66 gb(degree = degree, alpha = alpha, a = 0, b = 1, jmax = jmax)
Does information criteria for panel data sg-LASSO regression model.
The function runs sglfit 1 time; computes the path solution in lambda
sequence.
Solutions for BIC
, AIC
and AICc
information criteria are returned.
ic.panel.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, method = c("pooled","fe"), nf = NULL, ...)
ic.panel.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, method = c("pooled","fe"), nf = NULL, ...)
x |
NT by p data matrix, where NT and p respectively denote the sample size of pooled data and the number of regressors. |
y |
NT by 1 response variable. |
lambda |
a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own |
gamma |
sg-LASSO mixing parameter. |
gindex |
p by 1 vector indicating group membership of each covariate. |
method |
choose between 'pooled' and 'fe'; 'pooled' forces the intercept to be fitted in sglfit, 'fe' computes the fixed effects. User must input the number of fixed effects |
nf |
number of fixed effects. Used only if |
... |
Other arguments that can be passed to sglfit. |
method='pooled'
) method='fe'
) ic.panel.sglfit object.
Jonas Striaukas
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) ic.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE)
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) ic.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE)
Does information criteria for sg-LASSO regression model.
The function runs sglfit 1 time; computes the path solution in lambda
sequence.
Solutions for BIC
, AIC
and AICc
information criteria are returned.
ic.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, ...)
ic.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, ...)
x |
T by p data matrix, where T and p respectively denote the sample size and the number of regressors. |
y |
T by 1 response variable. |
lambda |
a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own |
gamma |
sg-LASSO mixing parameter. |
gindex |
p by 1 vector indicating group membership of each covariate. |
... |
Other arguments that can be passed to sglfit. |
ic.sglfit object.
Jonas Striaukas
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) ic.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE)
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) ic.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE)
For a given set of points in X, computes the orthonormal Legendre polynomials basis of L2 [a,b] for a given degree.
lb(degree, a = 0, b = 1, jmax = NULL, X = NULL)
lb(degree, a = 0, b = 1, jmax = NULL, X = NULL)
degree |
polynomial degree. |
a |
lower shift value (default - 0). |
b |
upper shift value (default - 1). |
jmax |
number of high-frequency lags. |
X |
optional evaluation grid vector. |
Psi weight matrix with Legendre functions upto degree
.
Jonas Striaukas
degree <- 3 jmax <- 66 lb(degree = degree, a = 0, b = 1, jmax = jmax)
degree <- 3 jmax <- 66 lb(degree = degree, a = 0, b = 1, jmax = jmax)
SNP500 returns
data(market_ret)
data(market_ret)
A data.frame
object.a
market_ret
- FRED
data(market_ret) market_ret$snp500ret
data(market_ret) market_ret$snp500ret
Fits MIDAS regression model with single high-frequency covariate. Options include linear-in-parameters polynomials (e.g. Legendre) or non-linear polynomials (e.g. exponential Almon). Nonlinear polynomial optimization routines are equipped with analytical gradients, which allows fast and accurate optimization.
midas.ardl(y, x, z = NULL, loss_choice = c("mse","logit"), poly_choice = c("legendre","expalmon","beta"), poly_spec = 0, legendre_degree = 3, nbtrials = 500)
midas.ardl(y, x, z = NULL, loss_choice = c("mse","logit"), poly_choice = c("legendre","expalmon","beta"), poly_spec = 0, legendre_degree = 3, nbtrials = 500)
y |
response variable. Continuous for |
x |
high-frequency covariate lags. |
z |
other lower-frequency covariate(s) or AR lags (both can be supplied in an appended matrix). Either must be supplied. |
loss_choice |
which loss function to fit: |
poly_choice |
which MIDAS lag polynomial function to use: |
poly_spec |
which Beta density function specification to apply (applicable only for |
legendre_degree |
the degree of legendre polynomials (applicable only for |
nbtrials |
number of initial values tried in multistart optimization. Default is set to |
poly_choice
): beta
: Beta polynomial expalmon
: exponential Almon polynomial legendre
: Legendre polynomials. midas.ardl object.
Jonas Striaukas
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) z = rnorm(100) y = rnorm(100) midas.ardl(y = y, x = x, z = z)
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) z = rnorm(100) y = rnorm(100) midas.ardl(y = y, x = x, z = z)
Creates a MIDAS data structure for a single high-frequency covariate and a single low-frequency dependent variable.
mixed_freq_data(data.y, data.ydate, data.x, data.xdate, x.lag, y.lag, horizon, est.start, est.end, disp.flag = TRUE)
mixed_freq_data(data.y, data.ydate, data.x, data.xdate, x.lag, y.lag, horizon, est.start, est.end, disp.flag = TRUE)
data.y |
n by 1 low-frequency time series data vector. |
data.ydate |
n by 1 low-frequency time series date vector. |
data.x |
m by 1 high-frequency time series data vector. |
data.xdate |
m by 1 high-frequency time series date vector. |
x.lag |
number of high-frequency lags to construct in high-frequency time units. |
y.lag |
number of low-frequency lags to construct in low-frequency time units. |
horizon |
forecast horizon relative to |
est.start |
estimation start date, taken as the first ... . |
est.end |
estimation end date, taken as the last ... . Remaining data after this date is dropped to out-of-sample evaluation data. |
disp.flag |
display flag to indicate whether or not to display obtained MIDAS data structure in console. |
a list of MIDAS data structure.
Jonas Striaukas
data(us_rgdp) rgdp <- us_rgdp$rgdp payems <- us_rgdp$payems payems[-1, 2] <- log(payems[-1, 2]/payems[-dim(payems)[1], 2])*100 payems <- payems[-1, ] rgdp[-1, 2] <- ((rgdp[-1, 2]/rgdp[-dim(rgdp)[1], 2])^4-1)*100 rgdp <- rgdp[-1, ] est.start <- as.Date("1990-01-01") est.end <- as.Date("2002-03-01") mixed_freq_data(rgdp[,2], as.Date(rgdp[,1]), payems[,2], as.Date(payems[,1]), x.lag = 9, y.lag = 4, horizon = 1, est.start, est.end, disp.flag = FALSE)
data(us_rgdp) rgdp <- us_rgdp$rgdp payems <- us_rgdp$payems payems[-1, 2] <- log(payems[-1, 2]/payems[-dim(payems)[1], 2])*100 payems <- payems[-1, ] rgdp[-1, 2] <- ((rgdp[-1, 2]/rgdp[-dim(rgdp)[1], 2])^4-1)*100 rgdp <- rgdp[-1, ] est.start <- as.Date("1990-01-01") est.end <- as.Date("2002-03-01") mixed_freq_data(rgdp[,2], as.Date(rgdp[,1]), payems[,2], as.Date(payems[,1]), x.lag = 9, y.lag = 4, horizon = 1, est.start, est.end, disp.flag = FALSE)
Creates a MIDAS data structure for a single high-frequency covariate based on low-frequency reference date.
mixed_freq_data_single(data.refdate, data.x, data.xdate, x.lag, horizon, est.start, est.end, disp.flag = TRUE)
mixed_freq_data_single(data.refdate, data.x, data.xdate, x.lag, horizon, est.start, est.end, disp.flag = TRUE)
data.refdate |
n by 1 date vector. |
data.x |
m by 1 high-frequency time series data vector. |
data.xdate |
m by 1 high-frequency time series date vector. |
x.lag |
number of high-frequency lags to construct in high-frequency time units. |
horizon |
forecast horizon relative to |
est.start |
estimation start date, taken as the first ... . |
est.end |
estimation end date, taken as the last ... . Remaining data after this date is dropped to out-of-sample evaluation data. |
disp.flag |
display flag to indicate whether or not to display obtained MIDAS data strcuture in console. |
a list of midas data structure.
Jonas Striaukas
data(us_rgdp) rgdp <- us_rgdp$rgdp cfnai <- us_rgdp$cfnai data.refdate <- rgdp$date data.x <- cfnai$cfnai data.xdate <- cfnai$date est.start <- as.Date("1990-01-01") est.end <- as.Date("2002-03-01") mixed_freq_data_single(data.refdate, data.x, data.xdate, x.lag = 12, horizon = 1, est.start, est.end, disp.flag = FALSE)
data(us_rgdp) rgdp <- us_rgdp$rgdp cfnai <- us_rgdp$cfnai data.refdate <- rgdp$date data.x <- cfnai$cfnai data.xdate <- cfnai$date est.start <- as.Date("1990-01-01") est.end <- as.Date("2002-03-01") mixed_freq_data_single(data.refdate, data.x, data.xdate, x.lag = 12, horizon = 1, est.start, est.end, disp.flag = FALSE)
Change the date to the beginning of the month date.
monthBegin(x)
monthBegin(x)
x |
date value. |
changed date value.
Jonas Striaukas
monthBegin(as.Date("2020-05-15"))
monthBegin(as.Date("2020-05-15"))
Change the date to the end of the month date.
monthEnd(x)
monthEnd(x)
x |
date value. |
changed date value.
Jonas Striaukas
monthEnd(as.Date("2020-05-15"))
monthEnd(as.Date("2020-05-15"))
Similar to other predict methods, this functions predicts fitted values from a fitted sglfit object.
## S3 method for class 'cv.panel.sglfit' predict( object, newx, s = c("lam.min", "lam.1se"), type = c("response"), method = c("pooled", "fe"), ... )
## S3 method for class 'cv.panel.sglfit' predict( object, newx, s = c("lam.min", "lam.1se"), type = c("response"), method = c("pooled", "fe"), ... )
object |
fitted |
newx |
matrix of new values for x at which predictions are to be made. NOTE: |
s |
choose between 'lam.min' and 'lam.1se'. |
type |
type of prediction required. Only response is available. Gives predicted response for regression problems. |
method |
choose between 'pooled', and 'fe'. |
... |
Not used. Other arguments to predict. |
s
is the new vector at which predictions are to be made. If s is not in the lambda sequence used for fitting the model, the predict function will use linear interpolation to make predictions. The new values are interpolated using a fraction of predicted values from both left and right indices.
The object returned depends on type.
Similar to other predict methods, this functions predicts fitted values from a fitted sglfit object.
## S3 method for class 'cv.sglfit' predict(object, newx, s = c("lam.min", "lam.1se"), type = c("response"), ...)
## S3 method for class 'cv.sglfit' predict(object, newx, s = c("lam.min", "lam.1se"), type = c("response"), ...)
object |
fitted |
newx |
matrix of new values for x at which predictions are to be made. NOTE: |
s |
choose between 'lam.min' and 'lam.1se'. |
type |
type of prediction required. Only response is available. Gives predicted response for regression problems. |
... |
Not used. Other arguments to predict. |
method |
choose between 'single', 'pooled', and 'fe'. |
s
is the new vector at which predictions are to be made. If s is not in the lambda sequence used for fitting the model, the predict function will use linear interpolation to make predictions. The new values are interpolated using a fraction of predicted values from both left and right indices.
The object returned depends on type.
Similar to other predict methods, this functions predicts fitted values from a fitted sglfit object.
## S3 method for class 'ic.panel.sglfit' predict( object, newx, s = c("bic", "aic", "aicc"), type = c("response"), method = c("pooled", "fe"), ... )
## S3 method for class 'ic.panel.sglfit' predict( object, newx, s = c("bic", "aic", "aicc"), type = c("response"), method = c("pooled", "fe"), ... )
object |
fitted |
newx |
matrix of new values for x at which predictions are to be made. NOTE: |
s |
choose between 'bic', 'aic', and 'aicc'. |
type |
type of prediction required. Only response is available. Gives predicted response for regression problems. |
method |
choose between 'pooled', and 'fe'. |
... |
Not used. Other arguments to predict. |
s
is the new vector at which predictions are to be made. If s is not in the lambda sequence used for fitting the model, the predict function will use linear interpolation to make predictions. The new values are interpolated using a fraction of predicted values from both left and right indices.
The object returned depends on type.
Similar to other predict methods, this functions predicts fitted values from a fitted sglfit object.
## S3 method for class 'ic.sglfit' predict(object, newx, s = c("bic", "aic", "aicc"), type = c("response"), ...)
## S3 method for class 'ic.sglfit' predict(object, newx, s = c("bic", "aic", "aicc"), type = c("response"), ...)
object |
fitted |
newx |
matrix of new values for x at which predictions are to be made. NOTE: |
s |
choose between 'bic', 'aic', and 'aicc'. |
type |
type of prediction required. Only response is available. Gives predicted response for regression problems. |
... |
Not used. Other arguments to predict. |
s
is the new vector at which predictions are to be made. If s is not in the lambda sequence used for fitting the model, the predict function will use linear interpolation to make predictions. The new values are interpolated using a fraction of predicted values from both left and right indices.
The object returned depends on type.
Similar to other predict methods, this functions predicts fitted values from a fitted sglfit object.
## S3 method for class 'sglpath' predict( object, newx, s = NULL, type = c("response"), method = c("single", "pooled", "fe"), ... )
## S3 method for class 'sglpath' predict( object, newx, s = NULL, type = c("response"), method = c("single", "pooled", "fe"), ... )
object |
fitted |
newx |
matrix of new values for x at which predictions are to be made. NOTE: |
s |
value(s) of the penalty parameter |
type |
type of prediction required. Only response is available. Gives predicted response for regression problems. |
method |
choose between 'single', 'pooled', and 'fe'. |
... |
Not used. Other arguments to predict. |
s
is the new vector at which predictions are to be made. If s is not in the lambda sequence used for fitting the model, the predict function will use linear interpolation to make predictions. The new values are interpolated using a fraction of predicted values from both left and right indices.
The object returned depends on type.
Fits panel data sg-LASSO regression model.
The function fits sg-LASSO regression based on chosen tuning parameter selection method_choice. Options include cross-validation and information criteria.
reg.panel.sgl(x, y, gamma = NULL, gindex, intercept = TRUE, method_choice = c("ic","cv"), nfolds = 10, method = c("pooled", "fe"), nf = NULL, verbose = FALSE, ...)
reg.panel.sgl(x, y, gamma = NULL, gindex, intercept = TRUE, method_choice = c("ic","cv"), nfolds = 10, method = c("pooled", "fe"), nf = NULL, verbose = FALSE, ...)
x |
NT by p data matrix, where NT and p respectively denote the sample size of pooled data and the number of regressors. |
y |
NT by 1 response variable. |
gamma |
sg-LASSO mixing parameter. |
gindex |
p by 1 vector indicating group membership of each covariate. |
intercept |
whether intercept be fitted ( |
method_choice |
choose between |
nfolds |
number of folds of the cv loop. Default set to |
method |
choose between 'pooled' and 'fe'; 'pooled' forces the intercept to be fitted in sglfit, 'fe' computes the fixed effects. User must input the number of fixed effects |
nf |
number of fixed effects. Used only if |
verbose |
flag to print information. |
... |
Other arguments that can be passed to |
method='pooled'
) method='fe'
) reg.panel.sgl object.
Jonas Striaukas
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) reg.panel.sgl(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10, standardize = FALSE, intercept = FALSE)
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) reg.panel.sgl(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10, standardize = FALSE, intercept = FALSE)
Fits sg-LASSO regression model.
The function fits sg-LASSO regression based on chosen tuning parameter selection method_choice
. Options include cross-validation and information criteria.
reg.sgl(x, y, gamma = NULL, gindex, intercept = TRUE, method_choice = c("tscv","ic","cv"), verbose = FALSE, ...)
reg.sgl(x, y, gamma = NULL, gindex, intercept = TRUE, method_choice = c("tscv","ic","cv"), verbose = FALSE, ...)
x |
T by p data matrix, where T and p respectively denote the sample size and the number of regressors. |
y |
T by 1 response variable. |
gamma |
sg-LASSO mixing parameter. |
gindex |
p by 1 vector indicating group membership of each covariate. |
intercept |
whether intercept be fitted ( |
method_choice |
choose between |
verbose |
flag to print information. |
... |
Other arguments that can be passed to |
reg.sgl object.
Jonas Striaukas
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) reg.sgl(x = x, y = y, gamma = 0.5, gindex = gindex)
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) reg.sgl(x = x, y = y, gamma = 0.5, gindex = gindex)
Real GDP release dates
data(rgdp_dates)
data(rgdp_dates)
A list
objects
data(rgdp_dates) rgdp_dates$Quarter_q # reference quarters in quarters rgdp_dates$Quarter_m # reference quarters in months rgdp_dates$Quarter_d # reference quarters in days rgdp_dates$`First release` # first release date for the reference rgdp_dates$`Second release` # second release date for the reference rgdp_dates$`Third release` # third release date for the reference
data(rgdp_dates) rgdp_dates$Quarter_q # reference quarters in quarters rgdp_dates$Quarter_m # reference quarters in months rgdp_dates$Quarter_d # reference quarters in days rgdp_dates$`First release` # first release date for the reference rgdp_dates$`Second release` # second release date for the reference rgdp_dates$`Third release` # third release date for the reference
Real GDP vintages
data(rgdp_vintages)
data(rgdp_vintages)
A list
objects
data(rgdp_vintages) rgdp_vintages$date # dates rgdp_vintages$time_series # series, q-q annual rate rgdp_vintages$realtime_period # real time dates
data(rgdp_vintages) rgdp_vintages$date # dates rgdp_vintages$time_series # series, q-q annual rate rgdp_vintages$realtime_period # real time dates
Fits sg-LASSO regression model.
The function fits sg-LASSO regression model for a sequence of tuning parameter and fixed
tuning parameter. The optimization is based on block coordinate-descent. Optionally, fixed effects are fitted.
sglfit(x, y, gamma = 1.0, nlambda = 100L, method = c("single", "pooled", "fe"), nf = NULL, lambda.factor = ifelse(nobs < nvars, 1e-02, 1e-04), lambda = NULL, pf = rep(1, nvars), gindex = 1:nvars, dfmax = nvars + 1, pmax = min(dfmax * 1.2, nvars), standardize = FALSE, intercept = FALSE, eps = 1e-08, maxit = 1000000L, peps = 1e-08)
sglfit(x, y, gamma = 1.0, nlambda = 100L, method = c("single", "pooled", "fe"), nf = NULL, lambda.factor = ifelse(nobs < nvars, 1e-02, 1e-04), lambda = NULL, pf = rep(1, nvars), gindex = 1:nvars, dfmax = nvars + 1, pmax = min(dfmax * 1.2, nvars), standardize = FALSE, intercept = FALSE, eps = 1e-08, maxit = 1000000L, peps = 1e-08)
x |
T by p data matrix, where T and p respectively denote the sample size and the number of regressors. |
y |
T by 1 response variable. |
gamma |
sg-LASSO mixing parameter. |
nlambda |
number of |
method |
choose between 'single', 'pooled' and 'fe'; 'single' implies standard sg-LASSO regression, 'pooled' forces the intercept to be fitted, 'fe' computes the fixed effects. User needs to input the number of fixed effects |
nf |
number of fixed effects. Used only if |
lambda.factor |
The factor for getting the minimal |
lambda |
a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own |
pf |
the ℓ1 penalty factor of length |
gindex |
p by 1 vector indicating group membership of each covariate. |
dfmax |
the maximum number of variables allowed in the model. Useful for very large |
pmax |
the maximum number of coefficients allowed ever to be nonzero. For example, once βi ≠ 0 for some i ∈ [p], no matter how many times it exits or re-enters the model through the path, it will be counted only once. Default is |
standardize |
logical flag for variable standardization, prior to fitting the model sequence. The coefficients are always returned to the original scale. It is recommended to keep |
intercept |
whether intercept be fitted ( |
eps |
convergence threshold for block coordinate descent. Each inner block coordinate-descent loop continues until the maximum change in the objective after any coefficient update is less than thresh times the null deviance. Defaults value is |
maxit |
maximum number of outer-loop iterations allowed at fixed lambda values. Default is |
peps |
convergence threshold for proximal map of sg-LASSO penalty. Each loop continues until G group difference sup-norm, || βkG - βk-1G ||∞, is less than |
sglfit object.
Jonas Striaukas
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) sglfit(x = x, y = y, gindex = gindex, gamma = 0.5)
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) sglfit(x = x, y = y, gindex = gindex, gamma = 0.5)
Fits the precision matrix Θ by running nodewise LASSO regressions.
thetafit(x, parallel = FALSE, ncores = getOption("mc.cores", NULL), intercept = FALSE, K = 20, l = 5, seed = NULL, verbose = FALSE, registerpar = TRUE, ...)
thetafit(x, parallel = FALSE, ncores = getOption("mc.cores", NULL), intercept = FALSE, K = 20, l = 5, seed = NULL, verbose = FALSE, registerpar = TRUE, ...)
x |
T by p data matrix, where T and p respectively denote the sample size and the number of regressors. |
parallel |
if |
ncores |
number of cores used in parallelization |
intercept |
whether intercept be fitted ( |
K |
number of folds of the cv loop. Default set to |
l |
the gap used to drop observations round test set data. See tscv.sglfit for more details. |
seed |
set a value for seed to control results replication, i.e. |
verbose |
if |
registerpar |
if |
... |
Other arguments that can be passed to tscv.sglfit. |
The function runs tscv.sglfit p
times by regressing j
-th covariate on all other covariates excluding j
-th covariate. The precision matrix is then constructed based on LASSO estimates. Each nodewise LASSO regression tuning parameter λ is optimized using time series cross-validation. See tscv.sglfit for more details on cross-validation implementation.
thetafit object.
Jonas Striaukas
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) thetafit(x = x, parallel = FALSE)
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) thetafit(x = x, parallel = FALSE)
Does k-fold time series cross-validation for sg-LASSO regression model.
The function runs sglfit K+1
times; the first to get the path solution in λ sequence, the rest to compute the fit with each of the test observation k ∈ K
The average error and standard deviation over the folds is computed, and the optimal regression coefficients are returned for lam.min
and lam.1se
. Solutions are computed for a fixed γ
tscv.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, K = 20, l = 5, parallel = FALSE, seed = NULL, ...)
tscv.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, K = 20, l = 5, parallel = FALSE, seed = NULL, ...)
x |
T by p data matrix, where T and p respectively denote the sample size and the number of regressors. |
y |
T by 1 response variable. |
lambda |
a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own λ sequence based on |
gamma |
sg-LASSO mixing parameter. γ = 1 gives LASSO solution and γ = 0 gives group LASSO solution. |
gindex |
p by 1 vector indicating group membership of each covariate. |
K |
number of observations drawn for the test set. Default set to |
l |
the gap used to drop observations round the test set data point. Default set to |
parallel |
if |
seed |
set a value for seed to control results replication, i.e. |
... |
Other arguments that can be passed to sglfit. |
tscv.sglfit object.
Jonas Striaukas
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) tscv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE) ## Not run: # Parallel require(doMC) registerDoMC(cores = 2) x = matrix(rnorm(1000 * 20), 1000, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(1000) gindex = sort(rep(1:4,times=5)) system.time(tscv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE)) system.time(tscv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE, parallel = TRUE)) ## End(Not run)
set.seed(1) x = matrix(rnorm(100 * 20), 100, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(100) gindex = sort(rep(1:4,times=5)) tscv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE) ## Not run: # Parallel require(doMC) registerDoMC(cores = 2) x = matrix(rnorm(1000 * 20), 1000, 20) beta = c(5,4,3,2,1,rep(0, times = 15)) y = x%*%beta + rnorm(1000) gindex = sort(rep(1:4,times=5)) system.time(tscv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE)) system.time(tscv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, standardize = FALSE, intercept = FALSE, parallel = TRUE)) ## End(Not run)
US real GDP, Chicago National Activity Index, Nonfarm payrolls and ADS Index
data(us_rgdp)
data(us_rgdp)
A list
object.a
data(us_rgdp) us_rgdp$rgdp # - GDP data us_rgdp$cfnai # - CFNAI predictor data us_rgdp$payems # - Nonfarm payrolls predictor data us_rgdp$ads # - ADS predictor data
data(us_rgdp) us_rgdp$rgdp # - GDP data us_rgdp$cfnai # - CFNAI predictor data us_rgdp$payems # - Nonfarm payrolls predictor data us_rgdp$ads # - ADS predictor data