Package 'PoSIAdjRSquared' reference manual

Title:	Post-Selection Inference for Adjusted R Squared
Description:	Conduct post-selection inference for regression coefficients in linear models after they have been selected by adjusted R squared. The p-values and confidence intervals are valid after model selection with the same data. This allows the user to use all data for both model selection and inference without losing control over the type I error rate. The provided tests are more powerful than data splitting, which bases inference on less data since it discards all information used for selection.
Authors:	Sarah Pirenne [aut, cre] , Gerda Claeskens [aut]
Maintainer:	Sarah Pirenne <sarah.pirenne@kuleuven.be>
License:	MIT + file LICENSE
Version:	0.0.0.1
Built:	2025-03-25 06:37:03 UTC
Source:	CRAN

Compute post-selection confidence interval with specified interval

Description

This function inverts a post-selection p-value to a confidence interval.

Usage

 compute_ci_with_specified_interval(z_interval,etaj,etajTy,Sigma,tn_mu,alpha)
compute_ci_with_specified_interval(z_interval,etaj,etajTy,Sigma,tn_mu,alpha)

Arguments

`z_interval`	The intervals of type "list" where the OLS estimator gets selected: can be obtained from function "solve_selection_event"
`etaj`	Vector of type "matrix" and dimension nx1: useful in orthogonal decomposition of y (see Lemma 1 for details)
`etajTy`	The OLS estimator of the j'th selected coefficient in the selected model of type "matrix" and dimension 1x1
`Sigma`	The variance covariance matrix of dimension nxn of the error in the model
`tn_mu`	Integer for the mean of the truncated sampling distribution of the test statistic under the null hypothesis: for example, if you want to test beta_j=0, specify 0 for the mean
`alpha`	Integer for the desired significance level of the confidence interval

Value

`ci`	The two-sided (1-alpha)% confidence interval valid after model selection

References

Pirenne, S. and Claeskens, G. (2024). Exact Post-Selection Inference for Adjusted R Squared.

Examples

  # Generate data
  n <- 100
  Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
  X <- Data$X
  y <- Data$y

  # Select model
  result <- fit_all_subset_linear_models(y, X, intercept=FALSE)
  phat <- result$phat
  X_M_phat <- result$X_M_phat
  k <- result$k
  R_M_phat <- result$R_M_phat
  kappa_M_phat <- result$kappa_M_phat
  R_M_k <- result$R_M_k
  kappa_M_k <- result$kappa_M_k

  # Estimate Sigma from residuals of full model
  full_model <- lm(y ~ 0 + X)
  sigma_hat <- sd(resid(full_model))
  Sigma <- diag(n)*(sigma_hat)^2

  # Construct test statistic
  Construct_test <- construct_test_statistic(j = 5, X_M_phat, y, phat, Sigma, intercept=FALSE)
  a <- Construct_test$a
  b <- Construct_test$b
  etaj <- Construct_test$etaj
  etajTy <- Construct_test$etajTy

  # Solve selection event
  Solve <- solve_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat,k)
  z_interval <- Solve$z_interval

  # Post-selection confidence interval
  compute_ci_with_specified_interval(z_interval, etaj, etajTy, Sigma, tn_mu = 0, alpha = 0.05)
# Generate data
  n <- 100
  Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
  X <- Data$X
  y <- Data$y

  # Select model
  result <- fit_all_subset_linear_models(y, X, intercept=FALSE)
  phat <- result$phat
  X_M_phat <- result$X_M_phat
  k <- result$k
  R_M_phat <- result$R_M_phat
  kappa_M_phat <- result$kappa_M_phat
  R_M_k <- result$R_M_k
  kappa_M_k <- result$kappa_M_k

  # Estimate Sigma from residuals of full model
  full_model <- lm(y ~ 0 + X)
  sigma_hat <- sd(resid(full_model))
  Sigma <- diag(n)*(sigma_hat)^2

  # Construct test statistic
  Construct_test <- construct_test_statistic(j = 5, X_M_phat, y, phat, Sigma, intercept=FALSE)
  a <- Construct_test$a
  b <- Construct_test$b
  etaj <- Construct_test$etaj
  etajTy <- Construct_test$etajTy

  # Solve selection event
  Solve <- solve_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat,k)
  z_interval <- Solve$z_interval

  # Post-selection confidence interval
  compute_ci_with_specified_interval(z_interval, etaj, etajTy, Sigma, tn_mu = 0, alpha = 0.05)

Construct adjusted R squared

Description

This function computes the adjusted R squared and returns some useful matrices from this computation.

Usage

construct_adj_r_squared(X, k, y, n, intercept = c(TRUE, FALSE), sst)
construct_adj_r_squared(X, k, y, n, intercept = c(TRUE, FALSE), sst)

Arguments

`X`	Design matrix of type "matrix" and dimension nxp
`k`	Index set included in model k
`y`	Response vector of type "matrix" and dimension nx1
`n`	An integer for the sample size
`intercept`	Logical value: TRUE if fitted models should contain intercept, FALSE if not
`sst`	An integer for the total sum of squares

Value

`X_M_k`	The design matrix of model k
`P_M_k`	The projection matrix of model k
`R_M_k`	The orthogonal projection matrix of model k
`kappa_M_k`	Adjustment factor for model complexity kappa of model k
`adj_r_squared`	The adjusted R squared value of model k

References

Pirenne, S. and Claeskens, G. (2024). Exact Post-Selection Inference for Adjusted R Squared.

Examples

# Generate data
n <- 100
k <- 1:10
Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
X <- Data$X
y <- Data$y
sst <- sum((y-mean(y))^2)

construct_adj_r_squared(X, k, y, n, intercept=FALSE, sst)
# Generate data
n <- 100
k <- 1:10
Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
X <- Data$X
y <- Data$y
sst <- sum((y-mean(y))^2)

construct_adj_r_squared(X, k, y, n, intercept=FALSE, sst)

This function contructs the selection event by computing c_k, d_k and e_k which are the constants in the quadratic inequalities which characterize the model selection event. The function is used internally by the function solve_selection_event, which returns the intervals of the OLS estimator where the selection event takes place.

Usage

construct_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat)
construct_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat)

Arguments

`a`	Residual vector of type "matrix" and dimension nx1 (see Lemma 1 for details)
`b`	Vector of type "matrix" and dimension nx1: useful in orthogonal decomposition of y (see Lemma 1 for details)
`R_M_k`	The orthogonal projection matrix of model k
`kappa_M_k`	Adjustment factor for model complexity kappa of model k
`R_M_phat`	The orthogonal projection matrix of the selected model
`kappa_M_phat`	Adjustment factor for model complexity kappa of the selected model

Value

`c_k`	Constant c_k in the quadratic inequality c_kZ^2+d_kZ+e_k>=0 which characterizes the model selection event of the selected model compared to model k (see Lemma 1 for details)
`d_k`	Constant d_k in the quadratic inequality c_kZ^2+d_kZ+e_k>=0 which characterizes the model selection event of the selected model compared to model k (see Lemma 1 for details)
`e_k`	Constant e_k in the quadratic inequality c_kZ^2+d_kZ+e_k>=0 which characterizes the model selection event of the selected model compared to model k (see Lemma 1 for details)

References

Pirenne, S. and Claeskens, G. (2024). Exact Post-Selection Inference for Adjusted R Squared.

Construct test statistic

Description

This function constructs the OLS estimator of the j'th selected coefficient in the selected model. The functions also returns some useful vectors for post-selection inference (a and b).

Usage

 construct_test_statistic(j, X_M_phat, y, phat, Sigma, intercept)
construct_test_statistic(j, X_M_phat, y, phat, Sigma, intercept)

Arguments

`j`	The index of type "integer" of the regression coefficient
`X_M_phat`	The design matrix in the selected model
`y`	Response vector of type "matrix" and dimension nx1
`phat`	Index set included in the selected model
`Sigma`	The variance covariance matrix of dimension nxn of the error in the model
`intercept`	Logical value: TRUE if the selected model contains an intercept, FALSE if not

Value

`etaj`	Vector of type "matrix" and dimension nx1: useful in orthogonal decomposition of y (see Lemma 1 for details)
`etajTy`	The OLS estimator of the j'th selected coefficient in the selected model of type "matrix" and dimension 1x1
`a`	Residual vector of type "matrix" and dimension nx1 (see Lemma 1 for details)
`b`	Vector of type "matrix" and dimension nx1: useful in orthogonal decomposition of y (see Lemma 1 for details)

References

Pirenne, S. and Claeskens, G. (2024). Exact Post-Selection Inference for Adjusted R Squared.

Examples

  # Generate data
  n <- 100
  Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
  X <- Data$X
  y <- Data$y

  # Select model
  result <- fit_all_subset_linear_models(y, X, intercept=FALSE)
  phat <- result$phat
  X_M_phat <- result$X_M_phat

  # Estimate Sigma from residuals of full model
  full_model <- lm(y ~ 0 + X)
  sigma_hat <- sd(resid(full_model))
  Sigma <- diag(n)*(sigma_hat)^2

  # Construct test statistic
  construct_test_statistic(j = 5, X_M_phat, y, phat, Sigma, intercept=FALSE)
# Generate data
  n <- 100
  Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
  X <- Data$X
  y <- Data$y

  # Select model
  result <- fit_all_subset_linear_models(y, X, intercept=FALSE)
  phat <- result$phat
  X_M_phat <- result$X_M_phat

  # Estimate Sigma from residuals of full model
  full_model <- lm(y ~ 0 + X)
  sigma_hat <- sd(resid(full_model))
  Sigma <- diag(n)*(sigma_hat)^2

  # Construct test statistic
  construct_test_statistic(j = 5, X_M_phat, y, phat, Sigma, intercept=FALSE)

Data generation normal

Description

Function to generate data according to the linear model of the form Y = X*beta + epsilon where the noise epsilon follows a standard normal distribution.

Usage

  datagen.norm(seed, n, p, rho, beta_vec)
datagen.norm(seed, n, p, rho, beta_vec)

Arguments

`seed`	Integer for seed
`n`	Integer for sample size
`p`	Integer for number of variables in the design matrix
`rho`	Integer for correlation between variables in the design matrix
`beta_vec`	True regression coefficient vector of length p

Value

`X`	Design matrix of type "matrix" and dimension nxp
`y`	Response vector of type "matrix" and dimension nx1
`true_y`	True response vector, i.e. without the noise, of type "matrix" and dimension nx1

Examples

  datagen.norm(seed = 7, n = 100, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
datagen.norm(seed = 7, n = 100, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))

Data generation normal with intercept

Description

Function to generate data according to the linear model of the form Y = X*beta + epsilon where the noise epsilon follows a standard normal distribution and the first column of X consists of 1's such that an intercept is included in the model.

Usage

  datagen.norm.intercept(seed, n, p, rho, beta_vec)
datagen.norm.intercept(seed, n, p, rho, beta_vec)

Arguments

`seed`	Integer for seed
`n`	Integer for sample size
`p`	Integer for number of variables in the design matrix
`rho`	Integer for correlation between variables in the design matrix
`beta_vec`	True regression coefficient vector of length p

Value

`X`	Design matrix of type "matrix" and dimension nxp
`y`	Response vector of type "matrix" and dimension nx1
`true_y`	True response vector, i.e. without the noise, of type "matrix" and dimension nx1

Examples

  datagen.norm.intercept(seed = 7, n = 100, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
datagen.norm.intercept(seed = 7, n = 100, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))

Equal tailed interval

Description

This function inverts a post-selection p-value to a confidence interval.

Usage

equal_tailed_interval(z_interval, etajTy, alpha, tn_mu, tn_sigma)
equal_tailed_interval(z_interval, etajTy, alpha, tn_mu, tn_sigma)

Arguments

`z_interval`	The intervals of type "list" where the OLS estimator gets selected: can be obtained from function "solve_selection_event"
`etajTy`	The OLS estimator of the j'th selected coefficient in the selected model of type "matrix" and dimension 1x1
`alpha`	Integer for the desired significance level of the confidence interval
`tn_mu`	Integer for the mean of the truncated sampling distribution of the test statistic under the null hypothesis: for example, if you want to test beta_j=0, specify 0 for the mean
`tn_sigma`	Integer for the variance of the truncated sampling distribution of the test statistic

Value

`L`	lower bound
`U`	upper bound

References

Pirenne, S. and Claeskens, G. (2024). Exact Post-Selection Inference for Adjusted R Squared.

Examples

# Generate data
n <- 100
Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
X <- Data$X
y <- Data$y

# Select model
result <- fit_all_subset_linear_models(y, X, intercept=FALSE)
phat <- result$phat
X_M_phat <- result$X_M_phat
k <- result$k
R_M_phat <- result$R_M_phat
kappa_M_phat <- result$kappa_M_phat
R_M_k <- result$R_M_k
kappa_M_k <- result$kappa_M_k

# Estimate Sigma from residuals of full model
full_model <- lm(y ~ 0 + X)
sigma_hat <- sd(resid(full_model))
Sigma <- diag(n)*(sigma_hat)^2

# Construct test statistic
Construct_test <- construct_test_statistic(j = 5, X_M_phat, y, phat, Sigma, intercept=FALSE)
a <- Construct_test$a
b <- Construct_test$b
etaj <- Construct_test$etaj
etajTy <- Construct_test$etajTy

# Solve selection event
Solve <- solve_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat,k)
z_interval <- Solve$z_interval

# Post-selection confidence interval
tn_sigma <- sqrt((t(etaj)%*%Sigma)%*%etaj)
equal_tailed_interval(z_interval, etajTy, alpha = 0.05, tn_mu = 0, tn_sigma)

# Generate data
n <- 100
Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
X <- Data$X
y <- Data$y

# Select model
result <- fit_all_subset_linear_models(y, X, intercept=FALSE)
phat <- result$phat
X_M_phat <- result$X_M_phat
k <- result$k
R_M_phat <- result$R_M_phat
kappa_M_phat <- result$kappa_M_phat
R_M_k <- result$R_M_k
kappa_M_k <- result$kappa_M_k

# Estimate Sigma from residuals of full model
full_model <- lm(y ~ 0 + X)
sigma_hat <- sd(resid(full_model))
Sigma <- diag(n)*(sigma_hat)^2

# Construct test statistic
Construct_test <- construct_test_statistic(j = 5, X_M_phat, y, phat, Sigma, intercept=FALSE)
a <- Construct_test$a
b <- Construct_test$b
etaj <- Construct_test$etaj
etajTy <- Construct_test$etajTy

# Solve selection event
Solve <- solve_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat,k)
z_interval <- Solve$z_interval

# Post-selection confidence interval
tn_sigma <- sqrt((t(etaj)%*%Sigma)%*%etaj)
equal_tailed_interval(z_interval, etajTy, alpha = 0.05, tn_mu = 0, tn_sigma)

f

Description

Function used internally by compute_ci_with_specified_interval for calculating valid confidence intervals post-selection.

Arguments

`z_interval`	The intervals of type "list" where the OLS estimator gets selected: can be obtained from function "solve_selection_event"
`etajTy`	The OLS estimator of the j'th selected coefficient in the selected model of type "matrix" and dimension 1x1
`mu`	Integer for the mean of the truncated sampling distribution of the test statistic (updated iteratively in compute_ci_with_specified_interval)
`tn_sigma`	Integer for the variance of the truncated sampling distribution of the test statistic

Value

The cumulative distribution function of a truncated gaussian distribution evaluated in the observed test statistic

References

Pirenne, S. and Claeskens, G. (2024). Exact Post-Selection Inference for Adjusted R Squared.

Find root

Description

This function is used internally by the function compute_ci_with_specified_interval for inverting post-selection p-values to confidence intervals.

Usage

find_root(z_interval, etajTy, tn_sigma, y, lb, ub, tol=1e-6)
find_root(z_interval, etajTy, tn_sigma, y, lb, ub, tol=1e-6)

Arguments

`z_interval`	The intervals of type "list" where the OLS estimator gets selected: can be obtained from function "solve_selection_event"
`etajTy`	The OLS estimator of the j'th selected coefficient in the selected model of type "matrix" and dimension 1x1
`tn_sigma`	Integer for the variance of the truncated sampling distribution of the test statistic
`y`	For example 1.0-0.5alpha for finding the lower bound of a (1-alpha)% confidence interval, and 0.5alpha for finding the upper bound of a (1-alpha)% confidence interval
`lb`	Lower bound in current iteration
`ub`	Upper bound in current iteration
`tol`	Tolerance parameter: default set to 1e-6

Value

Returns confidence interval bound

References

Pirenne, S. and Claeskens, G. (2024). Exact Post-Selection Inference for Adjusted R Squared.

Fit all subset linear models

Description

This function fits all possible combinations of linear models and returns the selected model based on adjusted R^2.

Usage

 fit_all_subset_linear_models(y, X, intercept)
fit_all_subset_linear_models(y, X, intercept)

Arguments

`y`	Response vector of type "matrix" and dimension nx1
`X`	Design matrix of type "matrix" and dimension nxp
`intercept`	Logical value: TRUE if fitted models should contain intercept, FALSE if not

Value

`k`	Index set included in model k
`best_model`	The selected model fit (lm object)
`phat`	Index set included in the selected model
`X_M_phat`	The design matrix in the selected model
`best_adj_r_squared`	The adjusted R^2 value of the selected model
`R_M_phat`	The orthogonal projection matrix of the selected model
`kappa_M_phat`	Adjustment factor for model complexity kappa of the selected model
`R_M_k`	The orthogonal projection matrix of model k
`kappa_M_k`	Adjustment factor for model complexity kappa of model k

References

Pirenne, S. and Claeskens, G. (2024). Exact Post-Selection Inference for Adjusted R Squared.

Examples

  # Generate data
  Data <- datagen.norm(seed = 7, n = 100, p = 3, rho = 0, beta_vec = c(1,0.5,0))
  X <- Data$X
  y <- Data$y

  # Select model
  fit_all_subset_linear_models(y, X, intercept=FALSE)
# Generate data
  Data <- datagen.norm(seed = 7, n = 100, p = 3, rho = 0, beta_vec = c(1,0.5,0))
  X <- Data$X
  y <- Data$y

  # Select model
  fit_all_subset_linear_models(y, X, intercept=FALSE)

Fit all linear models of a specified size

Description

This function fits all possible combinations of a pre-specified size of linear models and returns the selected model based on adjusted R^2.

Usage

 fit_specified_size_subset_linear_models(y, X, size, intercept)
fit_specified_size_subset_linear_models(y, X, size, intercept)

Arguments

`y`	Response vector of type "matrix" and dimension nx1
`X`	Design matrix of type "matrix" and dimension nxp
`size`	Size of type "integer" of the fitted models
`intercept`	Logical value: TRUE if fitted models should contain intercept, FALSE if not

Value

`k`	Index set included in model k
`best_model`	The selected model fit (lm object)
`phat`	Index set included in the selected model
`X_M_phat`	The design matrix in the selected model
`best_adj_r_squared`	The adjusted R^2 value of the selected model
`R_M_phat`	The orthogonal projection matrix of the selected model
`kappa_M_phat`	Adjustment factor for model complexity kappa of the selected model
`R_M_k`	The orthogonal projection matrix of model k
`kappa_M_k`	Adjustment factor for model complexity kappa of model k

References

Pirenne, S. and Claeskens, G. (2024). Exact Post-Selection Inference for Adjusted R Squared.

Examples

  # Generate data
  Data <- datagen.norm(seed = 7, n = 100, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
  X <- Data$X
  y <- Data$y

  # Select model
  fit_specified_size_subset_linear_models(y, X, size = 9, intercept=FALSE)
# Generate data
  Data <- datagen.norm(seed = 7, n = 100, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
  X <- Data$X
  y <- Data$y

  # Select model
  fit_specified_size_subset_linear_models(y, X, size = 9, intercept=FALSE)

Pivot with specified interval

Description

This function returns the value of the cumulative distribution function of a truncated gaussian distribution evaluated in the observed test statistic. Its output is used by the function postselp_value_specified_interval by taking 2*min(output,1-output) for a p-value for a two-sided test.

Usage

pivot_with_specified_interval(z_interval, etaj, etajTy, tn_mu, tn_sigma)
pivot_with_specified_interval(z_interval, etaj, etajTy, tn_mu, tn_sigma)

Arguments

`z_interval`	The intervals of type "list" where the OLS estimator gets selected: can be obtained from function "solve_selection_event"
`etaj`	Vector of type "matrix" and dimension nx1: useful in orthogonal decomposition of y (see Lemma 1 for details)
`etajTy`	The OLS estimator of the j'th selected coefficient in the selected model of type "matrix" and dimension 1x1
`tn_mu`	Integer for the mean of the truncated sampling distribution of the test statistic under the null hypothesis: for example, if you want to test beta_j=0, specify 0 for the mean
`tn_sigma`	Integer for the variance of the truncated sampling distribution of the test statistic

Value

The cumulative distribution function of a truncated gaussian distribution evaluated in the observed test statistic

References

Pirenne, S. and Claeskens, G. (2024). Exact Post-Selection Inference for Adjusted R Squared.

Examples

  # Generate data
  n <- 100
  Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
  X <- Data$X
  y <- Data$y

  # Select model
  result <- fit_all_subset_linear_models(y, X, intercept=FALSE)
  phat <- result$phat
  X_M_phat <- result$X_M_phat
  k <- result$k
  R_M_phat <- result$R_M_phat
  kappa_M_phat <- result$kappa_M_phat
  R_M_k <- result$R_M_k
  kappa_M_k <- result$kappa_M_k

  # Estimate Sigma from residuals of full model
  full_model <- lm(y ~ 0 + X)
  sigma_hat <- sd(resid(full_model))
  Sigma <- diag(n)*(sigma_hat)^2

  # Construct test statistic
  Construct_test <- construct_test_statistic(j = 5, X_M_phat, y, phat, Sigma, intercept=FALSE)
  a <- Construct_test$a
  b <- Construct_test$b
  etaj <- Construct_test$etaj
  etajTy <- Construct_test$etajTy

  # Solve selection event
  Solve <- solve_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat,k)
  z_interval <- Solve$z_interval

  # Post-selection inference for beta_j=0
  tn_sigma <- sqrt((t(etaj)%*%Sigma)%*%etaj)
  pivot_with_specified_interval(z_interval, etaj, etajTy, tn_mu = 0, tn_sigma)
# Generate data
  n <- 100
  Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
  X <- Data$X
  y <- Data$y

  # Select model
  result <- fit_all_subset_linear_models(y, X, intercept=FALSE)
  phat <- result$phat
  X_M_phat <- result$X_M_phat
  k <- result$k
  R_M_phat <- result$R_M_phat
  kappa_M_phat <- result$kappa_M_phat
  R_M_k <- result$R_M_k
  kappa_M_k <- result$kappa_M_k

  # Estimate Sigma from residuals of full model
  full_model <- lm(y ~ 0 + X)
  sigma_hat <- sd(resid(full_model))
  Sigma <- diag(n)*(sigma_hat)^2

  # Construct test statistic
  Construct_test <- construct_test_statistic(j = 5, X_M_phat, y, phat, Sigma, intercept=FALSE)
  a <- Construct_test$a
  b <- Construct_test$b
  etaj <- Construct_test$etaj
  etajTy <- Construct_test$etajTy

  # Solve selection event
  Solve <- solve_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat,k)
  z_interval <- Solve$z_interval

  # Post-selection inference for beta_j=0
  tn_sigma <- sqrt((t(etaj)%*%Sigma)%*%etaj)
  pivot_with_specified_interval(z_interval, etaj, etajTy, tn_mu = 0, tn_sigma)

Post-selection p-value specified interval

Description

This function returns a p-value for the test whether the regression coefficient equals tn_mu (e.g. 0) with a two-sided alternative. The p-value is valid given the model selection, because it conditions on the specified intervals of the OLS estimator where the regression coefficient actually gets selected. The intervals contained in the object "z_interval" can be obtained from the function "solve_selection_event".

Usage

 postselp_value_specified_interval(z_interval, etaj, etajTy, tn_mu, tn_sigma)
postselp_value_specified_interval(z_interval, etaj, etajTy, tn_mu, tn_sigma)

Arguments

`z_interval`	The intervals of type "list" where the OLS estimator gets selected: can be obtained from function "solve_selection_event"
`etaj`	Vector of type "matrix" and dimension nx1: useful in orthogonal decomposition of y (see Lemma 1 for details)
`etajTy`	The OLS estimator of the j'th selected coefficient in the selected model of type "matrix" and dimension 1x1
`tn_mu`	Integer for the mean of the truncated sampling distribution of the test statistic under the null hypothesis: for example, if you want to test beta_j=0, specify 0 for the mean
`tn_sigma`	Integer for the variance of the truncated sampling distribution of the test statistic

Value

p_value

The p-value for a two-sided test which is valid after model selection

References

Pirenne, S. and Claeskens, G. (2024). Exact Post-Selection Inference for Adjusted R Squared.

Examples

  # Generate data
  n <- 100
  Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
  X <- Data$X
  y <- Data$y

  # Select model
  result <- fit_all_subset_linear_models(y, X, intercept=FALSE)
  phat <- result$phat
  X_M_phat <- result$X_M_phat
  k <- result$k
  R_M_phat <- result$R_M_phat
  kappa_M_phat <- result$kappa_M_phat
  R_M_k <- result$R_M_k
  kappa_M_k <- result$kappa_M_k

  # Estimate Sigma from residuals of full model
  full_model <- lm(y ~ 0 + X)
  sigma_hat <- sd(resid(full_model))
  Sigma <- diag(n)*(sigma_hat)^2

  # Construct test statistic
  Construct_test <- construct_test_statistic(j = 5, X_M_phat, y, phat, Sigma, intercept=FALSE)
  a <- Construct_test$a
  b <- Construct_test$b
  etaj <- Construct_test$etaj
  etajTy <- Construct_test$etajTy

  # Solve selection event
  Solve <- solve_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat,k)
  z_interval <- Solve$z_interval

  # Post-selection inference for beta_j=0
  tn_sigma <- sqrt((t(etaj)%*%Sigma)%*%etaj)
  postselp_value_specified_interval(z_interval, etaj, etajTy, tn_mu = 0, tn_sigma)
# Generate data
  n <- 100
  Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
  X <- Data$X
  y <- Data$y

  # Select model
  result <- fit_all_subset_linear_models(y, X, intercept=FALSE)
  phat <- result$phat
  X_M_phat <- result$X_M_phat
  k <- result$k
  R_M_phat <- result$R_M_phat
  kappa_M_phat <- result$kappa_M_phat
  R_M_k <- result$R_M_k
  kappa_M_k <- result$kappa_M_k

  # Estimate Sigma from residuals of full model
  full_model <- lm(y ~ 0 + X)
  sigma_hat <- sd(resid(full_model))
  Sigma <- diag(n)*(sigma_hat)^2

  # Construct test statistic
  Construct_test <- construct_test_statistic(j = 5, X_M_phat, y, phat, Sigma, intercept=FALSE)
  a <- Construct_test$a
  b <- Construct_test$b
  etaj <- Construct_test$etaj
  etajTy <- Construct_test$etajTy

  # Solve selection event
  Solve <- solve_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat,k)
  z_interval <- Solve$z_interval

  # Post-selection inference for beta_j=0
  tn_sigma <- sqrt((t(etaj)%*%Sigma)%*%etaj)
  postselp_value_specified_interval(z_interval, etaj, etajTy, tn_mu = 0, tn_sigma)

Solve selection event

Description

This function solves the selection event by calculating the intervals of the OLS estimator where the regression coefficient actually gets selected.

Usage

 solve_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat,k)
solve_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat,k)

Arguments

`a`	Residual vector of type "matrix" and dimension nx1 (see Lemma 1 for details)
`b`	Vector of type "matrix" and dimension nx1: useful in orthogonal decomposition of y (see Lemma 1 for details)
`R_M_k`	The orthogonal projection matrix of model k
`kappa_M_k`	Adjustment factor for model complexity kappa of model k
`R_M_phat`	The orthogonal projection matrix of the selected model
`kappa_M_phat`	Adjustment factor for model complexity kappa of the selected model
`k`	Index set included in model k

Value

`intervals_list`	The intervals of the OLS estimator for which the inequality in Lemma 1 holds
`z_interval`	The intersection of the intervals of the OLS estimator for which the inequality in Lemma 1 holds: post-selection inference is conditioned on those intervals

Note

This function will give the error message: "Error in if (D >= 0.001): missing value where TRUE/FALSE needed" if it is run for a coefficient which is not part of the selected model. Only run solve_selection_event for selected indices.

References

Pirenne, S. and Claeskens, G. (2024). Exact Post-Selection Inference for Adjusted R Squared.

Examples

  # Generate data
  n <- 100
  Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
  X <- Data$X
  y <- Data$y

  # Select model
  result <- fit_all_subset_linear_models(y, X, intercept=FALSE)
  phat <- result$phat
  X_M_phat <- result$X_M_phat
  k <- result$k
  R_M_phat <- result$R_M_phat
  kappa_M_phat <- result$kappa_M_phat
  R_M_k <- result$R_M_k
  kappa_M_k <- result$kappa_M_k

  # Estimate Sigma from residuals of full model
  full_model <- lm(y ~ 0 + X)
  sigma_hat <- sd(resid(full_model))
  Sigma <- diag(n)*(sigma_hat)^2

  # Construct test statistic
  Construct_test <- construct_test_statistic(j = 5, X_M_phat, y, phat, Sigma, intercept=FALSE)
  a <- Construct_test$a
  b <- Construct_test$b

  # Solve selection event
  solve_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat,k)
# Generate data
  n <- 100
  Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
  X <- Data$X
  y <- Data$y

  # Select model
  result <- fit_all_subset_linear_models(y, X, intercept=FALSE)
  phat <- result$phat
  X_M_phat <- result$X_M_phat
  k <- result$k
  R_M_phat <- result$R_M_phat
  kappa_M_phat <- result$kappa_M_phat
  R_M_k <- result$R_M_k
  kappa_M_k <- result$kappa_M_k

  # Estimate Sigma from residuals of full model
  full_model <- lm(y ~ 0 + X)
  sigma_hat <- sd(resid(full_model))
  Sigma <- diag(n)*(sigma_hat)^2

  # Construct test statistic
  Construct_test <- construct_test_statistic(j = 5, X_M_phat, y, phat, Sigma, intercept=FALSE)
  a <- Construct_test$a
  b <- Construct_test$b

  # Solve selection event
  solve_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat,k)

Package 'PoSIAdjRSquared'

Help Index

Compute post-selection confidence interval with specified interval

Description

Usage

Arguments

Value

References

Examples

Construct adjusted R squared

Description

Usage

Arguments

Value

References

Examples

Construct selection event

Description

Usage

Arguments

Value

References

Construct test statistic

Description

Usage

Arguments

Value

References

Examples

Data generation normal

Description

Usage

Arguments

Value

Examples

Data generation normal with intercept

Description

Usage

Arguments

Value

Examples

Equal tailed interval

Description

Usage

Arguments

Value

References

Examples

f

Description

Arguments

Value

References

Find root

Description

Usage

Arguments

Value

References

Fit all subset linear models

Description

Usage

Arguments

Value

References

Examples

Fit all linear models of a specified size

Description

Usage

Arguments

Value

References

Examples

Pivot with specified interval

Description

Usage

Arguments

Value

References

Examples