Title: | Adaptive Approaches for Signal Detection in Pharmacovigilance |
---|---|
Description: | A collection of several pharmacovigilance signal detection methods based on adaptive lasso. Additional lasso-based and propensity score-based signal detection approaches are also supplied. See Courtois et al <doi:10.1186/s12874-021-01450-3>. |
Authors: | Emeline Courtois [cre], Ismaïl Ahmed [aut], Hervé Perdry [ctb] |
Maintainer: | Emeline Courtois <[email protected]> |
License: | GPL-2 |
Version: | 0.2-3 |
Built: | 2024-12-13 06:46:09 UTC |
Source: | CRAN |
This package fits adaptive lasso approaches in high dimension for signal detection in pharmacovigilance. In addition to classical implementations found in the literature, we implemented two approaches particularly suited to the variable selection framework, which is the relevant one in pharmacovigilance. The package also supplies signal detection approaches based on lasso regression and on propensity scores in high dimension.
Emeline Courtois
Maintainer:
Emeline Courtois <[email protected]>
Fit a first lasso regression and use the Bayesian Information Criterion (BIC) to determine adaptive weights (see the lasso_bic function for more details), then run an adaptive lasso with this penalty weighting.
BIC is also used for variable selection in the adaptive lasso.
Can deal with very large sparse data matrices.
Intended for binary response only (option family = "binomial" is forced).
Depends on the glmnet and relax.glmnet functions from the package glmnet.
adapt_bic(x, y, gamma = 1, maxp = 50, path = TRUE, betaPos = TRUE, ...)
x |
Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inherit from class
|
y |
Binary response variable, numeric. |
gamma |
Tuning parameter used to define the penalty weights. See details below. Default is set to 1. |
maxp |
A limit on how many relaxed coefficients are allowed.
Default is 50, in |
path |
Since |
betaPos |
Should the covariates selected by the procedure be
positively associated with the outcome ? Default is |
... |
Other arguments that can be passed to |
The adaptive weight for a given covariate i is defined by
$$w_i = \frac{1}{|\hat{\beta}_i^{BIC}|^{\gamma}}$$
where $\hat{\beta}_i^{BIC}$ is the NON PENALIZED regression coefficient associated with covariate i obtained with lasso-bic.
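As a rough illustration of how a tuning parameter gamma turns first-stage coefficients into penalty weights, here is a minimal base R sketch. It assumes the usual adaptive lasso form, weight = 1 / |coefficient|^gamma; the helper `adaptive_weights` and the coefficient values are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): turn a vector of
# first-stage coefficients into adaptive lasso penalty weights.
adaptive_weights <- function(beta, gamma = 1) {
  # A zero coefficient yields an infinite weight, which in an adaptive
  # lasso effectively excludes the covariate from the second-stage fit.
  1 / abs(beta)^gamma
}

# Made-up coefficients for three drug covariates
beta_first_stage <- c(drugs1 = 0.8, drugs2 = 0, drugs3 = -0.25)
w <- adaptive_weights(beta_first_stage, gamma = 1)
print(w)
```

Larger values of gamma penalize weakly associated covariates more strongly relative to strongly associated ones.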
An object with S3 class "adaptive"
.
aws |
Numeric vector of penalty weights derived from lasso-bic. Length equal to nvars. |
criterion |
Character, indicates which criterion is used with the
adaptive lasso for variable selection. For |
beta |
Numeric vector of regression coefficients in the adaptive lasso.
If |
selected_variables |
Character vector, names of variable(s) selected
with this adaptive approach.
If |
Emeline Courtois
Maintainer: Emeline Courtois
[email protected]
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
ab <- adapt_bic(x = drugs, y = ae, maxp = 50)
Compute the CISL procedure (see cisl for more details) to determine adaptive penalty weights, then run an adaptive lasso with this penalty weighting.
BIC is used for variable selection in the adaptive lasso.
Can deal with very large sparse data matrices.
Intended for binary response only (option family = "binomial" is forced).
Depends on the glmnet function from the package glmnet.
adapt_cisl( x, y, cisl_nB = 100, cisl_dfmax = 50, cisl_nlambda = 250, cisl_ncore = 1, maxp = 50, path = TRUE, betaPos = TRUE, ... )
x |
Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inherit from class
|
y |
Binary response variable, numeric. |
cisl_nB |
|
cisl_dfmax |
|
cisl_nlambda |
|
cisl_ncore |
|
maxp |
A limit on how many relaxed coefficients are allowed.
Default is 50, in |
path |
Since |
betaPos |
Should the covariates selected by the procedure be
positively associated with the outcome ? Default is |
... |
Other arguments that can be passed to |
The CISL procedure is first run with its default values, except for dfmax and nlambda, which are set through the parameters cisl_dfmax and cisl_nlambda.
In addition, the betaPos parameter is set to FALSE in cisl.
For each covariate $i$, cisl_nB values of the CISL quantity $\tau_i$ are estimated.
The adaptive weight for a given covariate $i$ is defined by
$$w_i = 1 - \frac{1}{cisl\_nB} \sum_{b=1}^{cisl\_nB} 1[\tau_i^b > 0]$$
If $\tau_i$ is the null vector, the associated adaptive weight is infinity.
If $\tau_i$ is always positive, rather than "forcing" the variable into the model, we set the corresponding adaptive weight to 1/cisl_nB.
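The weighting rule described above can be sketched in base R. This sketch assumes that the weight is one minus the share of sub-samples in which the CISL quantity is positive, with the two special cases stated in the text applied afterwards. The helper `cisl_weights` and the input matrix are hypothetical; the package's internal computation may differ.

```r
# Hypothetical helper: tau is an nvars x nB matrix of CISL quantities,
# one row per covariate and one column per sub-sample.
cisl_weights <- function(tau) {
  nB <- ncol(tau)
  prop_pos <- rowMeans(tau > 0)      # share of sub-samples with a positive value
  w <- 1 - prop_pos
  w[rowSums(tau != 0) == 0] <- Inf   # null vector: covariate is excluded
  w[prop_pos == 1] <- 1 / nB         # always positive: small weight instead of zero
  w
}

# Made-up CISL quantities for three covariates over four sub-samples
tau <- rbind(drugs1 = c(0, 0, 0, 0),
             drugs2 = c(0.2, 0.1, 0.3, 0.4),
             drugs3 = c(0.5, 0, -0.1, 0.2))
w <- cisl_weights(tau)
print(w)
```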
An object with S3 class "adaptive"
.
aws |
Numeric vector of penalty weights derived from CISL. Length equal to nvars. |
criterion |
Character, indicates which criterion is used with the
adaptive lasso for variable selection. For |
beta |
Numeric vector of regression coefficients in the adaptive lasso.
If |
selected_variables |
Character vector, names of variable(s) selected
with this adaptive approach.
If |
Emeline Courtois
Maintainer: Emeline Courtois
[email protected]
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
acisl <- adapt_cisl(x = drugs, y = ae, cisl_nB = 50, maxp=10)
Fit a first lasso regression with cross-validation to determine adaptive weights.
Run a cross-validation to determine an optimal lambda.
Two options for implementing cross-validation for the adaptive lasso are possible through the type_cv parameter (see below).
Can deal with very large sparse data matrices.
Intended for binary response only (option family = "binomial" is forced).
The cross-validation criterion used is deviance.
Depends on the cv.glmnet function from the package glmnet.
adapt_cv( x, y, gamma = 1, nfolds = 5, foldid = NULL, type_cv = "proper", betaPos = TRUE, ... )
x |
Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inherit from class
|
y |
Binary response variable, numeric. |
gamma |
Tuning parameter used to define the penalty weights. See details below. Default is set to 1. |
nfolds |
Number of folds - default is 5. Although |
foldid |
An optional vector of values between 1 and |
type_cv |
Character, indicates which implementation of cross-validation is performed for the adaptive lasso: a "naive" one, where adaptive weights obtained on the full data are used, and a "proper" one, where adaptive weights are calculated for each training set. Could be either "naive" or "proper". Default is "proper". |
betaPos |
Should the covariates selected by the procedure be
positively associated with the outcome ? Default is |
... |
Other arguments that can be passed to |
The adaptive weight for a given covariate i is defined by
$$w_i = \frac{1}{|\hat{\beta}_i^{CV}|^{\gamma}}$$
where $\hat{\beta}_i^{CV}$ is the PENALIZED regression coefficient associated with covariate i obtained with cross-validation.
An object with S3 class "adaptive"
.
aws |
Numeric vector of penalty weights derived from cross-validation. Length equal to nvars. |
criterion |
Character, indicates which criterion is used with the
adaptive lasso for variable selection. For |
beta |
Numeric vector of regression coefficients in the adaptive lasso.
If |
selected_variables |
Character vector, names of variable(s) selected
with this adaptive approach.
If |
Emeline Courtois
Maintainer: Emeline Courtois
[email protected]
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
acv <- adapt_cv(x = drugs, y = ae, nfolds = 5)
Compute odds ratios between each covariate of x and y, then derive adaptive weights to incorporate in an adaptive lasso.
Either BIC or cross-validation can be used for variable selection in the adaptive lasso.
Two options for implementing cross-validation for the adaptive lasso are possible through the type_cv parameter (see below).
Can deal with very large sparse data matrices.
Intended for binary response only (option family = "binomial" is forced).
The cross-validation criterion used is deviance.
Depends on the glmnet and relax.glmnet functions from the package glmnet.
adapt_univ( x, y, gamma = 1, criterion = "bic", maxp = 50, path = TRUE, nfolds = 5, foldid = NULL, type_cv = "proper", betaPos = TRUE, ... )
x |
Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inherit from class
|
y |
Binary response variable, numeric. |
gamma |
Tuning parameter used to define the penalty weights. See details below. Default is set to 1. |
criterion |
Character, indicates which criterion is used with the adaptive lasso for variable selection. Could be either "bic" or "cv". Default is "bic" |
maxp |
Used only if |
path |
Used only if |
nfolds |
Used only if |
foldid |
Used only if |
type_cv |
Used only if |
betaPos |
Should the covariates selected by the procedure be
positively associated with the outcome ? Default is |
... |
Other arguments that can be passed to |
The adaptive weight for a given covariate i is defined by
$$w_i = \frac{1}{|\hat{\beta}_i^{univ}|^{\gamma}}$$
where $\hat{\beta}_i^{univ} = \log(OR_i)$, with $OR_i$ the odds ratio associated with covariate i and the outcome.
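To make the univariate weighting concrete, the following base R sketch computes a crude log odds ratio for each drug column and turns it into a weight of the form 1 / |log(OR)|^gamma. The helper `log_or` and the 0.5 continuity correction are assumptions of this sketch; how the package actually estimates the odds ratios is not stated here.

```r
set.seed(15)
drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)

# Hypothetical helper: crude log odds ratio between one binary drug column and y.
# The 0.5 continuity correction (an assumption) avoids division by zero.
log_or <- function(xj, y) {
  n11 <- sum(xj == 1 & y == 1) + 0.5  # exposed, with event
  n10 <- sum(xj == 1 & y == 0) + 0.5  # exposed, without event
  n01 <- sum(xj == 0 & y == 1) + 0.5  # unexposed, with event
  n00 <- sum(xj == 0 & y == 0) + 0.5  # unexposed, without event
  log((n11 * n00) / (n10 * n01))
}

beta_univ <- apply(drugs, 2, log_or, y = ae)
gamma <- 1
aws <- 1 / abs(beta_univ)^gamma
head(aws)
```

A covariate with an odds ratio close to 1 gets a very large weight, so it is unlikely to enter the adaptive lasso model.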
An object with S3 class "adaptive"
.
aws |
Numeric vector of penalty weights derived from odds-ratios. Length equal to nvars. |
criterion |
Character, same as input. Could be either "bic" or "cv". |
beta |
Numeric vector of regression coefficients in the adaptive lasso.
If |
selected_variables |
Character vector, names of variable(s) selected
with this adaptive approach.
If |
Emeline Courtois
Maintainer: Emeline Courtois
[email protected]
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
au <- adapt_univ(x = drugs, y = ae, criterion ="cv", nfolds = 3)
Implementation of CISL and the stability selection according to subsampling options.
cisl( x, y, r = 4, nB = 100, dfmax = 50, nlambda = 250, nMin = 0, replace = TRUE, betaPos = TRUE, ncore = 1 )
x |
Input matrix, of dimension nobs x nvars. Each row is an
observation vector. Can be in sparse matrix format (inherit from class
|
y |
Binary response variable, numeric. |
r |
Number of controls in the CISL sampling. Default is 4. See details below for other implementations. |
nB |
Number of sub-samples. Default is 100. |
dfmax |
Corresponds to the maximum size of the models visited with the lasso (E in the paper). Default is 50. |
nlambda |
Number of lambda values as is |
nMin |
Minimum number of events for a covariate to be considered.
Default is 0, all the covariates from |
replace |
Should sampling be with replacement? Default is TRUE. |
betaPos |
If |
ncore |
The number of computing units used for parallel computing.
This has to be set to 1 if the |
CISL is a variation of the stability selection method adapted to the characteristics of pharmacovigilance databases.
The settings r = 4 and replace = TRUE implement our CISL sampling.
Alternatively, r = NULL and replace = FALSE can be used to implement the sampling in Stability Selection.
An object with S3 class "cisl"
.
prob |
Matrix of dimension nvars x |
q05 |
5 |
q10 |
10 |
q15 |
15 |
q20 |
20 |
Ismail Ahmed
Ahmed, I., Pariente, A., & Tubert-Bitter, P. (2018). "Class-imbalanced subsampling lasso algorithm for discovering adverse drug reactions". Statistical Methods in Medical Research. 27(3), 785–797, doi:10.1177/0962280216643116
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
lcisl <- cisl(x = drugs, y = ae, nB = 50)
Simple simulated data, used to demonstrate the features of functions from the adapt4pv package.
Large sparse and binary matrix with 117160 rows and 300 columns. Drug exposure matrix: each row corresponds to an individual and each column corresponds to a drug.
Large sparse and binary vector of length 117160. Indicator of the presence/absence of an adverse event for each individual. Only the first 30 drugs (out of the 300) are associated with the outcome.
data(ExamplePvData)
Estimate a propensity score (PS) for a given drug exposure by
(i) automatically selecting, among the other drug covariates in x, which ones to include in the PS estimation model using the lasso-bic approach,
(ii) estimating a score using a classical logistic regression with the previously selected covariates.
Internal function, not supposed to be used directly.
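Step (ii) can be pictured with a plain `glm` fit. In this base R sketch the set of selected covariates is simply assumed (the vector `score_variables` is made up for illustration); in the package it would come from the selection step (i).

```r
set.seed(15)
drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))

idx_expo <- 2
expo <- drugs[, idx_expo]  # exposure indicator for the drug of interest

# Hypothetical outcome of step (i): covariates retained for the PS model
score_variables <- c("drugs5", "drugs11")

# Step (ii): classical logistic regression of the exposure on those covariates
dat <- data.frame(expo = expo, drugs[, score_variables, drop = FALSE])
ps_fit <- glm(expo ~ ., data = dat, family = binomial)
score <- fitted(ps_fit)    # estimated propensity score, one value per individual
summary(score)
```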
est_ps_bic(idx_expo, x, penalty = rep(1, nvars - 1), ...)
idx_expo |
Index of the column in |
x |
Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inherit from class
|
penalty |
TEST OPTION penalty weights in the variable selection to include in the PS. |
... |
Other arguments that can be passed to |
The betaPos option of the lasso_bic function is set to FALSE and maxp is set to 20.
For optimal storage, the returned elements indicator_expo and score are Matrix objects with ncol = 1.
An object with S3 class "ps", "bic"
.
expo_name |
Character, name of the drug exposure for which the PS was
estimated. Correspond to |
.
indicator_expo |
One-column Matrix object.
Indicator of the drug exposure for which the PS was estimated.
Defined by |
.
score_variables |
Character vector, names of covariates(s) selected with the lasso-bic approach to include in the PS estimation model. Could be empty. |
score |
One-column Matrix object, the estimated score. |
Emeline Courtois
Maintainer: Emeline Courtois
[email protected]
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
psb2 <- est_ps_bic(idx_expo = 2, x = drugs)
psb2$score_variables # selected variables to include in the PS model of drug_2
Estimate a propensity score (PS) for a given drug exposure by
(i) automatically selecting, among the other drug covariates in x, which ones to include in the PS estimation model using the hdPS algorithm,
(ii) estimating a score using a classical logistic regression with the previously selected covariates.
Internal function, not supposed to be used directly.
est_ps_hdps(idx_expo, x, y, keep_total = 20)
idx_expo |
Index of the column in |
x |
Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inherit from class
|
y |
Binary response variable, numeric. |
keep_total |
number of covariates to include in the PS estimation model according to the hdps algorithm ordering. Default is 20. |
Compared to the classic use of hdPS, (i) there is only one dimension (the co-exposure matrix) and (ii) there is no need to expand covariates since they are already binary. In other words, in our situation hdPS consists of the "prioritize covariates" step from the original algorithm, using the Bross formula. We apply the correction to the interpretation of this formula made by Richard Wyss (drug epi).
An object with S3 class "ps", "hdps"
.
expo_name |
Character, name of the drug exposure for which the PS was
estimated. Correspond to |
.
indicator_expo |
One-column Matrix object. Indicator of the drug
exposure for which the PS was estimated.
Defined by |
.
score_variables |
Character vector, names of covariates(s) selected with the hdPS algorithm to include in the PS estimation model. Could be empty. |
score |
One-column Matrix object, the estimated score. |
Emeline Courtois
Maintainer: Emeline Courtois
[email protected]
Schneeweiss, S., Rassen, J. A., Glynn, R. J., Avorn, J., Mogun, H., Brookhart, M. A. (2009). "High-dimensional propensity score adjustment in studies of treatment effects using health care claims data". Epidemiology. 20, 512–522, doi:10.1097/EDE.0b013e3181a663cc
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
pshdps2 <- est_ps_hdps(idx_expo = 2, x = drugs, y = ae, keep_total = 10)
pshdps2$score_variables # selected variables to include in the PS model of drug_2
Estimate a propensity score for a given drug exposure (treatment) with extreme gradient boosting.
Depends on the xgboost package.
Internal function, not supposed to be used directly.
est_ps_xgb( idx_expo, x, parameters = list(eta = 0.1, max_depth = 6, objective = "binary:logistic", nthread = 1), nrounds = 200, ... )
idx_expo |
Index of the column in |
x |
Input matrix, of dimension nobs x nvars. Each row is an
observation vector. Can be in sparse matrix format (inherit from class
|
parameters |
correspond to |
nrounds |
Maximum number of boosting iterations. Default is 200. |
... |
Other arguments that can be passed to |
An object with S3 class "ps", "xgb"
.
expo_name |
Character, name of the drug exposure for which the PS was
estimated. Correspond to |
.
indicator_expo |
One-column Matrix object. Indicator of the drug
exposure for which the PS was estimated.
Defined by |
.
score_variables |
Character vector, names of covariate(s) used in
at least one tree in the gradient tree boosting algorithm.
Obtained with |
score |
One-column Matrix object, the estimated score. |
Emeline Courtois
Maintainer: Emeline Courtois
[email protected]
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
psxgb2 <- est_ps_xgb(idx_expo = 2, x = drugs, nrounds = 100)
psxgb2$score_variables # selected variables to include in the PS model of drug_2
Fit a lasso regression and use the Bayesian Information Criterion (BIC) to select a subset of covariates.
Can deal with very large sparse data matrices.
Intended for binary response only (option family = "binomial" is forced).
Depends on the glmnet and relax.glmnet functions from the package glmnet.
lasso_bic(x, y, maxp = 50, path = TRUE, betaPos = TRUE, ...)
x |
Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inherit from class
|
y |
Binary response variable, numeric. |
maxp |
A limit on how many relaxed coefficients are allowed.
Default is 50, in |
path |
Since |
betaPos |
Should the covariates selected by the procedure be
positively associated with the outcome ? Default is |
... |
Other arguments that can be passed to |
For each tested penalisation parameter $\lambda$, a standard version of the BIC is implemented:
$$BIC_\lambda = -2 l_\lambda + df(\lambda) \ln(N)$$
where $l_\lambda$ is the log-likelihood of the non-penalized multiple logistic regression model that includes the set of covariates with a non-zero coefficient in the penalised regression coefficient vector associated with $\lambda$, $df(\lambda)$ is the number of covariates with a non-zero coefficient in that vector, and $N$ is the number of observations.
The optimal set of covariates according to this approach is the one associated with
the classical multiple logistic regression model which minimizes the BIC.
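The BIC comparison described above can be sketched in base R for a couple of candidate covariate sets. This is only an illustration under assumptions: the helper `bic_for_set` is hypothetical, the candidate sets are made up rather than taken from a real lasso path, and the degrees of freedom count the selected covariates only (not the intercept).

```r
set.seed(15)
drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)

# Hypothetical helper: BIC of the unpenalized logistic model restricted
# to the covariates named in `active`.
bic_for_set <- function(x, y, active) {
  dat <- data.frame(y = y, x[, active, drop = FALSE])
  fit <- glm(y ~ ., data = dat, family = binomial)
  loglik <- as.numeric(logLik(fit))
  # df counts the selected covariates only, not the intercept
  -2 * loglik + length(active) * log(length(y))
}

# Two made-up candidate supports, as if visited along a lasso path
candidates <- list(c("drugs1", "drugs2"), c("drugs1", "drugs2", "drugs7"))
bic_values <- sapply(candidates, bic_for_set, x = drugs, y = ae)
best <- candidates[[which.min(bic_values)]]
print(best)
```

Refitting each candidate set without penalty is what makes the criterion comparable across values of the penalisation parameter.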
An object with S3 class "log.lasso"
.
beta |
Numeric vector of regression coefficients in the lasso.
In |
selected_variables |
Character vector, names of variable(s) selected with the
lasso-bic approach.
If |
Emeline Courtois
Maintainer: Emeline Courtois
[email protected]
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
lb <- lasso_bic(x = drugs, y = ae, maxp = 20)
cv.glmnet
Fit a lasso regression with cross-validation and return the selected covariates.
Can deal with very large sparse data matrices.
Intended for binary response only (option family = "binomial" is forced).
Depends on the cv.glmnet function from the package glmnet.
lasso_cv(x, y, nfolds = 5, foldid = NULL, betaPos = TRUE, ...)
x |
Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inherit from class
|
y |
Binary response variable, numeric. |
nfolds |
Number of folds - default is 5. Although |
foldid |
An optional vector of values between 1 and |
betaPos |
Should the covariates selected by the procedure be positively
associated with the outcome ? Default is |
... |
Other arguments that can be passed to |
An object with S3 class "log.lasso"
.
beta |
Numeric vector of regression coefficients in the lasso.
In |
selected_variables |
Character vector, names of variable(s) selected with the
lasso-cv approach.
If |
Emeline Courtois
Maintainer: Emeline Courtois
[email protected]
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
lcv <- lasso_cv(x = drugs, y = ae, nfolds = 3)
Perform K lasso logistic regressions with K different permuted versions of the outcome.
For each of the lasso regressions, the $\lambda_{max}$ (i.e. the smallest $\lambda$ such that all penalized regression coefficients are shrunk to zero) is obtained.
The median value of these K $\lambda_{max}$ is used for variable selection in the lasso regression with the non-permuted outcome.
Depends on the glmnet function from the package glmnet.
lasso_perm(x, y, K = 20, keep = NULL, betaPos = TRUE, ncore = 1, ...)
x |
Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inherit from class
|
y |
Binary response variable, numeric. |
K |
Number of permutations of |
keep |
Do some variables of |
betaPos |
Should the covariates selected by the procedure be positively
associated with the outcome ? Default is |
ncore |
The number of computing units used for parallel computing. Default is 1, no parallelization is implemented. |
... |
Other arguments that can be passed to |
The $\lambda$ selected with this approach is defined as the one closest to the median value of the K $\lambda_{max}$ obtained with permutation of the outcome.
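The permutation idea can be sketched without glmnet. The sketch below assumes a logistic lasso with an unpenalized intercept and no standardization of x, in which case the smallest penalty that zeroes all coefficients is max |x'(y - mean(y))| / n. The helper `lambda_max` and the penalty grid are hypothetical; glmnet standardizes covariates by default, so its values would differ.

```r
set.seed(15)
drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
ae <- rbinom(100, 1, 0.3)

# Smallest penalty that shrinks every coefficient to zero, for a logistic
# lasso with an unpenalized intercept and NO standardization of x
# (an assumption of this sketch).
lambda_max <- function(x, y) {
  max(abs(crossprod(x, y - mean(y)))) / nrow(x)
}

# One lambda_max per permuted outcome, then their median
K <- 10
lambda_perm <- replicate(K, lambda_max(drugs, sample(ae)))
lambda_med <- median(lambda_perm)

# Pick the value of a (hypothetical) penalty grid closest to the median
lambda_grid <- exp(seq(log(lambda_max(drugs, ae)), log(1e-4), length.out = 100))
lambda_sel <- lambda_grid[which.min(abs(lambda_grid - lambda_med))]
print(lambda_sel)
```

Permuting the outcome breaks any association with the covariates, so the median acts as a noise-level penalty.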
An object with S3 class "log.lasso"
.
beta |
Numeric vector of regression coefficients in the lasso
In |
selected_variables |
Character vector, names of variable(s) selected with the
lasso-perm approach.
If |
Emeline Courtois
Maintainer: Emeline Courtois
[email protected]
Sabourin, J. A., Valdar, W., & Nobel, A. B. (2015). "A permutation approach for selecting the penalty parameter in penalized model selection". Biometrics. 71(4), 1185–1194, doi:10.1111/biom.12359
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
lp <- lasso_perm(x = drugs, y = ae, K = 10)
Implement the adjustment on propensity score (PS) for all the drug exposures of the input drug matrix x which have more than a given number of co-occurrences with the outcome.
The binary outcome is regressed on a drug exposure and its estimated PS, for each drug exposure considered after filtering.
With this approach, a p-value is obtained for each drug and variable selection is performed on the p-values corrected for multiple comparisons.
ps_adjust( x, y, n_min = 3, betaPos = TRUE, est_type = "bic", threshold = 0.05, ncore = 1 )
x |
Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inherit from class
|
y |
Binary response variable, numeric. |
n_min |
Numeric, minimal number of co-occurrences between a drug covariate and the outcome y to estimate its score. See details below. Default is 3. |
betaPos |
Should the covariates selected by the procedure be
positively associated with the outcome ? Default is |
est_type |
Character, indicates which approach is used to estimate the PS. Could be either "bic", "hdps" or "xgb". Default is "bic". |
threshold |
Threshold for the p-values. Default is 0.05. |
ncore |
The number of computing units used for parallel computing. Default is 1, no parallelization is implemented. |
The PS could be estimated in different ways: using lasso-bic approach,
the hdps algorithm or gradient tree boosting.
The scores are estimated using the default parameter values of
est_ps_bic
, est_ps_hdps
and est_ps_xgb
functions
(see documentation for details).
We apply the same filter and the same multiple testing correction as in the paper UPCOMING REFERENCE: first, PS are estimated only for drug covariates which have more than n_min co-occurrences with the outcome y.
Adjustment on the PS is performed for these covariates and one-sided or two-sided (depending on the betaPos parameter) p-values are obtained.
The p-values of the covariates not retained after filtering are set to 1.
All these p-values are then adjusted for multiple comparisons with the Benjamini-Yekutieli correction.
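The filtering and correction steps can be illustrated with base R's `p.adjust`. The raw p-values below are made up for the example; only the handling of filtered covariates (p-value set to 1) and the Benjamini-Yekutieli method come from the description above.

```r
# Hypothetical raw p-values for five drugs; drugs4 and drugs5 did not
# pass the co-occurrence filter, so their p-values are set to 1.
pvals <- c(drugs1 = 0.0004, drugs2 = 0.03, drugs3 = 0.6, drugs4 = NA, drugs5 = NA)
pvals[is.na(pvals)] <- 1

# Benjamini-Yekutieli correction over all drugs, filtered ones included
corrected <- p.adjust(pvals, method = "BY")

# Keep the drugs whose corrected p-value falls below the threshold
threshold <- 0.05
selected <- names(corrected)[corrected < threshold]
print(selected)
```

Because filtered drugs still count in the correction, the number of tests is the total number of drug covariates, not just those that passed the filter.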
COULD BE VERY LONG. Since this approach (i) estimates a score for several drug covariates and (ii) performs an adjustment on these scores, parallelization is highly recommended.
An object with S3 class "ps", "adjust", "*", where "*" is "bic", "hdps" or "xgb" according to how the scores were estimated.
estimates |
Regression coefficients associated with the drug covariates. Numeric, length equal to the number of selected variables with this approach. Some elements could be NA if (i) the corresponding covariate was filtered out or (ii) the adjustment model did not converge. Trying to estimate the score in a different way could help, but this is not guaranteed. |
corrected_pvals |
One sided p-values if |
selected_variables |
Character vector, names of variable(s)
selected with the ps-adjust approach.
If |
Emeline Courtois
Maintainer: Emeline Courtois
[email protected]
Benjamini, Y., & Yekutieli, D. (2001). "The Control of the False Discovery Rate in Multiple Testing under Dependency". The Annals of Statistics. 29(4), 1165–1188, doi:10.1214/aos/1013699998.
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
adjps <- ps_adjust(x = drugs, y = ae, n_min = 10)
Implement the adjustment on propensity score for one drug exposure. The binary outcome is regressed on the drug exposure of interest and its estimated PS. Internal function, not supposed to be used directly.
ps_adjust_one(ps_est, y)
ps_est |
An object of class |
y |
Binary response variable, numeric. |
The PS could be estimated in different ways: using the lasso-bic approach, the hdPS algorithm or gradient tree boosting, with the functions est_ps_bic, est_ps_hdps and est_ps_xgb respectively.
An object with S3 class "ps","adjust"
expo_name |
Character, name of the drug exposure for which the PS was estimated. |
estimate |
Regression coefficient associated with the drug exposure in adjustment on PS. |
pval_1sided |
One sided p-value associated with the drug exposure in adjustment on PS. |
pval_2sided |
Two sided p-value associated with the drug exposure in adjustment on PS. |
Could return NA if the adjustment on the PS did not converge.
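A minimal base R sketch of the adjustment for a single exposure follows. The data are simulated and the propensity score is drawn at random rather than estimated, purely to keep the example self-contained. Reading the one-sided p-value as a test of positive association is an assumption of this sketch.

```r
set.seed(15)
n <- 500
ps <- runif(n, 0.1, 0.6)                      # hypothetical estimated propensity scores
expo <- rbinom(n, 1, ps)                      # drug exposure indicator
ae <- rbinom(n, 1, plogis(-1 + 0.8 * expo))   # simulated binary outcome

# Regress the outcome on the exposure and its propensity score
fit <- glm(ae ~ expo + ps, family = binomial)
coefs <- summary(fit)$coefficients

estimate <- coefs["expo", "Estimate"]
z <- coefs["expo", "z value"]
pval_2sided <- 2 * pnorm(-abs(z))
pval_1sided <- pnorm(z, lower.tail = FALSE)   # assumed direction: positive association
c(estimate = estimate, pval_1sided = pval_1sided, pval_2sided = pval_2sided)
```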
Emeline Courtois
Maintainer: Emeline Courtois
[email protected]
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
pshdps2 <- est_ps_hdps(idx_expo = 2, x = drugs, y = ae, keep_total = 10)
adjps2 <- ps_adjust_one(ps_est = pshdps2, y = ae)
adjps2$estimate # estimated strength of association between drug_2 and the outcome by PS adjustment
Implement the weighting on propensity score with Matching Weights (MW) or the Inverse Probability of Treatment Weighting (IPTW) for all the drug exposures of the input drug matrix x which have more than a given number of co-occurrences with the outcome.
The binary outcome is regressed on a drug exposure through a classical weighted regression, for each drug exposure considered after filtering.
With this approach, a p-value is obtained for each drug and variable selection is performed on the p-values corrected for multiple comparisons.
ps_pond( x, y, n_min = 3, betaPos = TRUE, weights_type = c("mw", "iptw"), truncation = FALSE, q = 0.025, est_type = "bic", threshold = 0.05, ncore = 1 )
x |
Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inherit from class
|
y |
Binary response variable, numeric. |
n_min |
Numeric. Minimal number of co-occurrences between a drug covariate and the outcome y required to estimate its score. See details below. Default is 3. |
betaPos |
Should the covariates selected by the procedure be
positively associated with the outcome? Default is |
weights_type |
Character. Indicates which type of weighting is implemented. Could be either "mw" or "iptw". |
truncation |
Boolean. Should the weights be truncated?
Default is |
q |
If |
est_type |
Character, indicates which approach is used to estimate the propensity score. Could be either "bic", "hdps" or "xgb". Default is "bic". |
threshold |
Threshold for the p-values. Default is 0.05. |
ncore |
The number of cores used for parallel computing. Default is 1, meaning no parallelization. |
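The MW are defined by
$$\mathrm{mw}_i = \frac{\min(\mathrm{PS}_i,\ 1 - \mathrm{PS}_i)}{\mathrm{expo}_i \, \mathrm{PS}_i + (1 - \mathrm{expo}_i)(1 - \mathrm{PS}_i)}$$
and weights from IPTW by
$$\mathrm{iptw}_i = \frac{\mathrm{expo}_i}{\mathrm{PS}_i} + \frac{1 - \mathrm{expo}_i}{1 - \mathrm{PS}_i}$$
where $\mathrm{expo}_i$ is the drug exposure indicator and $\mathrm{PS}_i$ is the estimated propensity score of observation $i$.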
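The two weighting schemes can be illustrated with a short base R sketch. It uses the standard definitions: for a binary exposure, the IPTW weight is `expo / ps + (1 - expo) / (1 - ps)`, and the matching weight is `min(ps, 1 - ps)` divided by `expo * ps + (1 - expo) * (1 - ps)`. The helper `ps_weights` is hypothetical and not part of the package. The truncation rule shown (clamping weights to their `q` and `1 - q` quantiles) is an assumption about what `truncation` and `q` do, not the package's confirmed behaviour.

```r
# Hypothetical helper (not part of adapt4pv): compute MW or IPTW weights
# from a propensity score vector `ps` and a 0/1 exposure indicator `expo`.
ps_weights <- function(ps, expo, weights_type = c("mw", "iptw"),
                       truncation = FALSE, q = 0.025) {
  weights_type <- match.arg(weights_type)
  # probability of the exposure status actually observed
  denom <- expo * ps + (1 - expo) * (1 - ps)
  w <- switch(weights_type,
    mw   = pmin(ps, 1 - ps) / denom,
    iptw = expo / ps + (1 - expo) / (1 - ps)
  )
  if (truncation) {
    # ASSUMPTION: truncation clamps the weights to their q and 1 - q quantiles
    bounds <- unname(quantile(w, probs = c(q, 1 - q)))
    w <- pmin(pmax(w, bounds[1]), bounds[2])
  }
  w
}

set.seed(15)
expo <- rbinom(200, 1, 0.3)      # simulated exposure indicator
ps   <- runif(200, 0.05, 0.95)   # stand-in for an estimated propensity score
w_mw    <- ps_weights(ps, expo, "mw")
w_iptw  <- ps_weights(ps, expo, "iptw")
w_trunc <- ps_weights(ps, expo, "iptw", truncation = TRUE)
```

Matching weights always lie in (0, 1], whereas IPTW weights are at least 1 and can become very large when the PS is close to 0 or 1, which is the usual motivation for truncation.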
The PS can be estimated in different ways: using the lasso-bic approach,
the hdPS algorithm or gradient tree boosting.
The scores are estimated using the default parameter values of the
est_ps_bic, est_ps_hdps and est_ps_xgb functions
(see documentation for details).
We apply the same filter and the same multiple testing correction as in
the paper UPCOMING REFERENCE: first, PS are estimated only for drug covariates which have
more than n_min co-occurrences with the outcome y.
Weighting on the PS is then performed for these covariates, and
one-sided or two-sided p-values (depending on the betaPos parameter)
are obtained.
The p-values of the covariates not retained after filtering are set to 1.
All these p-values are then adjusted for multiple comparisons with the
Benjamini-Yekutieli correction.
COULD BE VERY LONG. Since this approach (i) estimates a score for several
drug covariates and (ii) performs a weighted regression for each of these scores,
parallelization is highly recommended.
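The filtering and multiple testing steps described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the per-drug p-values come from a plain unweighted logistic regression used as a stand-in for the PS-weighted regression, the filter is read as "strictly more than `n_min` co-occurrences", and the object names (`n_cooc`, `keep`, `pvals`) are hypothetical. Only `p.adjust(..., method = "BY")` corresponds directly to the Benjamini-Yekutieli correction named in the text.

```r
set.seed(15)
drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
n_min <- 3
threshold <- 0.05

# Step 1: number of observations where each drug and the outcome co-occur
n_cooc <- as.vector(crossprod(drugs, ae))
names(n_cooc) <- colnames(drugs)
keep <- n_cooc > n_min  # ASSUMPTION: strict inequality ("more than n_min")

# Step 2: one-sided p-values; covariates filtered out keep a p-value of 1.
# Stand-in model: unweighted univariate logistic regression (NOT the
# PS-weighted regression performed by the package).
pvals <- rep(1, ncol(drugs))
names(pvals) <- colnames(drugs)
for (j in which(keep)) {
  fit <- glm(ae ~ drugs[, j], family = binomial)
  z <- coef(summary(fit))[2, "z value"]
  pvals[j] <- pnorm(z, lower.tail = FALSE)  # tests a positive association
}

# Step 3: Benjamini-Yekutieli correction over ALL covariates, then selection
corrected <- p.adjust(pvals, method = "BY")
selected <- names(corrected)[corrected < threshold]
```

Because the filtered covariates enter the correction with a p-value of 1, the number of tests used by the correction stays equal to the number of columns of the drug matrix.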
An object with S3 class "ps", "*", "**",
where "*" is "mw" or "iptw", same as the
input parameter weights_type, and "**" is
"bic", "hdps" or "xgb" according to how the
score was estimated.
estimates |
Regression coefficients associated with the drug covariates. Numeric, length equal to the number of selected variables with this approach. Some elements could be NA if (i) the corresponding covariate was filtered out, or (ii) the weighted regression did not converge. Estimating the score in a different way could help, but this is not guaranteed. |
corrected_pvals |
One sided p-values if |
selected_variables |
Character vector, names of variable(s)
selected with the weighting on PS based approach.
If |
Emeline Courtois
Maintainer: Emeline Courtois
[email protected]
Benjamini, Y., & Yekutieli, D. (2001). "The Control of the False Discovery Rate in Multiple Testing under Dependency". The Annals of Statistics. 29(4), 1165–1188, doi:10.1214/aos/1013699998.
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
pondps <- ps_pond(x = drugs, y = ae, n_min = 10, weights_type = "iptw")
Implement the weighting on propensity score with Matching Weights (MW) or the Inverse Probability of Treatment Weighting (IPTW) for one drug exposure. The binary outcome is regressed on the drug exposure of interest through a classical weighted regression. Internal function, not supposed to be used directly.
ps_pond_one(
  ps_est,
  y,
  weights_type = c("mw", "iptw"),
  truncation = FALSE,
  q = 0.025
)
ps_est |
An object of class |
y |
Binary response variable, numeric. |
weights_type |
Character. Indicates which type of weighting is implemented. Could be either "mw" or "iptw". |
truncation |
Boolean. Should the weights be truncated?
Default is |
q |
If |
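The MW are defined by
$$\mathrm{mw}_i = \frac{\min(\mathrm{PS}_i,\ 1 - \mathrm{PS}_i)}{\mathrm{expo}_i \, \mathrm{PS}_i + (1 - \mathrm{expo}_i)(1 - \mathrm{PS}_i)}$$
and weights from IPTW by
$$\mathrm{iptw}_i = \frac{\mathrm{expo}_i}{\mathrm{PS}_i} + \frac{1 - \mathrm{expo}_i}{1 - \mathrm{PS}_i}$$
where $\mathrm{expo}_i$ is the drug exposure indicator and $\mathrm{PS}_i$ is the estimated propensity score of observation $i$.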
The PS can be estimated in different ways: using the lasso-bic approach,
the hdPS algorithm or gradient tree boosting, with the functions
est_ps_bic, est_ps_hdps and est_ps_xgb, respectively.
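The weighted regression of the outcome on a single drug exposure can be sketched as below. This is a minimal base R illustration, not the package's internal code: the PS is simulated rather than estimated, IPTW weights are computed with the standard formula, and `glm` with a `quasibinomial` family is an assumed choice that accepts non-integer weights without warnings. The model-based standard errors used here ignore the fact that the weights are estimated; robust (sandwich) standard errors are often preferred in practice and are not shown.

```r
set.seed(15)
n    <- 200
expo <- rbinom(n, 1, 0.3)        # drug exposure indicator
y    <- rbinom(n, 1, 0.3)        # binary outcome
ps   <- runif(n, 0.1, 0.9)       # stand-in for an estimated propensity score

# IPTW weights (standard definition for a binary exposure)
w <- expo / ps + (1 - expo) / (1 - ps)

# Weighted logistic regression of the outcome on the exposure alone
fit   <- glm(y ~ expo, family = quasibinomial, weights = w)
coefs <- coef(summary(fit))

estimate    <- coefs["expo", "Estimate"]   # log odds ratio for the exposure
t_stat      <- coefs["expo", "t value"]
pval_2sided <- coefs["expo", "Pr(>|t|)"]
# One-sided p-value for a positive association (coefficient > 0)
pval_1sided <- pt(t_stat, df = fit$df.residual, lower.tail = FALSE)
```

The one-sided p-value is the natural choice when only positive associations between a drug and the adverse event are of interest, which mirrors the role of the `betaPos` argument in `ps_pond`.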
An object with S3 class "ps", "*",
where "*" is "mw" or "iptw", same as the
input parameter weights_type.
expo_name |
Character, name of the drug exposure for which the PS was estimated. |
estimate |
Regression coefficient associated with the drug exposure in adjustment on PS. |
pval_1sided |
One sided p-value associated with the drug exposure in adjustment on PS. |
pval_2sided |
Two sided p-value associated with the drug exposure in adjustment on PS. |
Could return NA if the adjustment on the PS did not converge.
Emeline Courtois
Maintainer: Emeline Courtois
[email protected]
library(adapt4pv)
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
# estimate the PS of the second drug (column "drugs2") with the hdPS algorithm
pshdps2 <- est_ps_hdps(idx_expo = 2, x = drugs, y = ae, keep_total = 10)
pondps2 <- ps_pond_one(ps_est = pshdps2, y = ae, weights_type = "iptw")
pondps2$estimate # estimated strength of association between drugs2 and the outcome by PS weighting
Return the sensitivity and the false discovery rate of an approach implemented by the main functions of the adapt4pv package.
summary_stat(object, true_pos, q = 10)
object |
An object of class |
true_pos |
Character vector, names of the true positive controls. |
q |
Quantile value for variable selection with
an object of class |
A data frame which details, for the signal detection method
implemented in object, its number of generated signals, its
sensitivity and its false discovery rate.
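As an illustration of the two metrics, the sketch below computes them from a vector of selected variable names and a vector of true positive controls. The helper `signal_metrics` and its column names are hypothetical and do not reproduce the output format of `summary_stat`. Sensitivity is taken as the share of true positives that are detected, and the false discovery rate as the share of generated signals that are not true positives (set to 0 here when no signal is generated, which is a convention chosen for this sketch).

```r
# Hypothetical helper (not part of adapt4pv)
signal_metrics <- function(selected, true_pos) {
  n_signals <- length(selected)
  n_true    <- sum(selected %in% true_pos)  # detected true positives
  data.frame(
    nb_signals  = n_signals,
    sensitivity = n_true / length(true_pos),
    # ASSUMPTION: FDR is reported as 0 when no signal is generated
    fdr         = if (n_signals > 0) (n_signals - n_true) / n_signals else 0
  )
}

true_pos <- paste0("drugs", 1:10)
selected <- c("drugs1", "drugs3", "drugs12")  # made-up selection for illustration
res <- signal_metrics(selected, true_pos)
res
```

In this made-up case, two of the three signals are true positives, so the sensitivity is 2/10 and the false discovery rate is 1/3.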
Emeline Courtois
Maintainer: Emeline Courtois
[email protected]
set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
lcv <- lasso_cv(x = drugs, y = ae, nfolds = 3)
summary_stat(object = lcv, true_pos = colnames(drugs)[1:10])
# the data are not simulated in such a way that there are true positives