Title: | Sparse Bayesian Models for Regression, Subgroup Analysis, and Panel Data |
---|---|
Description: | Sparse modeling provides a means of selecting a small number of non-zero effects from a large number of candidate effects. This package includes a suite of methods for sparse modeling: estimation via EM or MCMC, approximate confidence intervals with nominal coverage, and diagnostic and summary plots. The package implements sparse linear regression and sparse probit regression. Beyond regression analyses, applications include subgroup analysis, particularly for conjoint experiments, and panel data. Future versions will include extensions to models with truncated outcomes, propensity score analysis, and instrumental variable analysis. |
Authors: | Marc Ratkovic and Dustin Tingley |
Maintainer: | Marc Ratkovic <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2 |
Built: | 2024-12-22 06:48:11 UTC |
Source: | CRAN |
Sparse modeling provides a means of selecting a small number of non-zero effects from a large number of candidate effects. This package includes a suite of methods for sparse modeling: estimation via EM or MCMC, approximate confidence intervals with nominal coverage, and diagnostic and summary plots. Beyond regression analyses, applications include subgroup analysis, particularly for conjoint experiments, and panel data. Future versions will include extensions to limited dependent variables, models with truncated outcomes, and propensity score and instrumental variable analysis.
Package: | sparsereg |
Type: | Package |
Version: | 1.0 |
Date: | 2015-03-20 |
License: | GPL (>= 2) |
Author(s): Marc Ratkovic and Dustin Tingley
Maintainer: Marc Ratkovic ([email protected])
Ratkovic, Marc and Tingley, Dustin. 2015. "Sparse Estimation with Uncertainty: Subgroup Analysis in Large Dimensional Design." Working paper.
FindIt, glmnet
Function for plotting differences in posterior density estimates for separate parameters from sparse regression analysis.
difference(x,type="mode",var1=NULL,var2=NULL,plot.it=TRUE, main="Difference",xlabel="Effect", ylabel="Density")
x |
Object of class sparsereg. |
type |
Whether to difference the posterior mode or posterior mean. Options are "mode" and "mean". |
var1, var2 |
Names (or column numbers) of the two variables whose effects are differenced. |
plot.it |
Whether to plot the density of the difference. |
main, xlabel, ylabel |
Main title, x-axis label, and y-axis label. |
Generates a density of the estimated posterior of the difference between the effects of two variables.
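A minimal sketch of the idea behind differencing two posteriors, using simulated draws in place of a fitted sparsereg object. The matrix `draws`, its row names, and the summaries computed below are hypothetical illustrations, not the package's implementation; the layout (rows are effects, columns are posterior samples) is assumed from the description of beta.mode.

```r
# Hypothetical posterior draws standing in for two rows of a fitted
# object's beta.mode matrix (rows = effects, columns = posterior samples).
set.seed(1)
n_draws <- 1000
draws <- rbind(
  x1 = rnorm(n_draws, mean = 2, sd = 0.2),
  x2 = rnorm(n_draws, mean = -3, sd = 0.2)
)

# Posterior of the difference: subtract the two effects draw by draw,
# which keeps any posterior correlation between them.
diff_draws <- draws["x1", ] - draws["x2", ]

# Kernel density estimate of the difference, the curve a density plot shows.
dens <- density(diff_draws)

# Simple numerical summaries of the difference.
diff_mean <- mean(diff_draws)
prob_positive <- mean(diff_draws > 0)
```

Calling `plot(dens)` would draw the curve. Differencing draw by draw, rather than combining two marginal densities, is what makes the result a posterior for the difference itself.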
Ratkovic, Marc and Tingley, Dustin. 2015. "Sparse Estimation with Uncertainty: Subgroup Analysis in Large Dimensional Design." Working paper.
sparsereg, plot.sparsereg, summary.sparsereg, violinplot, print.sparsereg
## Not run:
library(MASS)  # provides mvrnorm()
library(sparsereg)
set.seed(1)
n <- 500
k <- 100
Sigma <- diag(k)
Sigma[Sigma == 0] <- .5
X <- mvrnorm(n, mu = rep(0, k), Sigma = Sigma)
y.true <- 3 + X[, 2]*2 + X[, 3]*(-3)
y <- y.true + rnorm(n)
## Fit a linear model with five covariates.
s1 <- sparsereg(y, X[, 1:5])
difference(s1, var1 = 1, var2 = 2)
## End(Not run)
Function for plotting coefficients from sparsereg analysis.
## S3 method for class 'sparsereg' plot(x,...)
x |
Object of class sparsereg, as returned by sparsereg. |
... |
Additional arguments to pass to plot. Supported options are listed below. |
The function returns up to three plots in one figure, corresponding to main effects, interaction effects, and two-way interactions. The additional options that can be passed are listed below.
main1, main2, main3 Main titles for the plots of main effects, interaction effects, and two-way interactions.
xlabel Label for the x-axis.
plot.one Either FALSE or one of 1, 2, or 3. A value of 1, 2, or 3 returns a single plot of main effects (1), interaction effects (2), or two-way interactions (3).
Ratkovic, Marc and Tingley, Dustin. 2015. "Sparse Estimation with Uncertainty: Subgroup Analysis in Large Dimensional Design." Working paper.
sparsereg, plot.sparsereg, summary.sparsereg, violinplot, difference
## Not run:
library(MASS)  # provides mvrnorm()
library(sparsereg)
set.seed(1)
n <- 500
k <- 100
Sigma <- diag(k)
Sigma[Sigma == 0] <- .5
X <- mvrnorm(n, mu = rep(0, k), Sigma = Sigma)
y.true <- 3 + X[, 2]*2 + X[, 3]*(-3)
y <- y.true + rnorm(n)
## Fit a linear model with five covariates.
s1 <- sparsereg(y, X[, 1:5])
plot(s1)
## End(Not run)
The function prints a summary of the estimated posterior mode of each parameter.
## S3 method for class 'sparsereg' print(x,... )
x |
Object of class sparsereg. |
... |
Additional arguments to pass to print. None supported in this version. |
Uses the summary function from the package coda to return a summary of the posterior mode of a sparsereg object.
Ratkovic, Marc and Tingley, Dustin. 2015. "Sparse Estimation with Uncertainty: Subgroup Analysis in Large Dimensional Design." Working paper.
sparsereg, plot.sparsereg, summary.sparsereg, violinplot, difference
## Not run:
library(MASS)  # provides mvrnorm()
library(sparsereg)
set.seed(1)
n <- 500
k <- 100
Sigma <- diag(k)
Sigma[Sigma == 0] <- .5
X <- mvrnorm(n, mu = rep(0, k), Sigma = Sigma)
y.true <- 3 + X[, 2]*2 + X[, 3]*(-3)
y <- y.true + rnorm(n)
## Fit a linear model with five covariates.
s1 <- sparsereg(y, X[, 1:5])
print(s1)
## End(Not run)
Function for fitting a Bayesian LASSOplus model for sparse models with uncertainty, facilitating the discovery of various types of interactions. The function takes a dependent variable, an optional matrix of (pre-treatment) covariates, and an optional matrix of categorical treatment variables. Uncertainty estimates are adjusted for data with repeated observations.
sparsereg(y, X, treat=NULL, EM=FALSE, gibbs=200, burnin=200, thin=10, type="linear", scale.type="none", baseline.vec=NULL, id=NULL, id2=NULL, id3=NULL, save.temp=FALSE, conservative=TRUE)
y |
Dependent variable. |
X |
Covariates, typically referred to as "pre-treatment" covariates. |
treat |
Matrix of categorical treatment variables. May be a matrix with one column in the case of there being only one treatment variable. |
EM |
If TRUE, the model is fit via EM; if FALSE (the default), via MCMC. EM is much faster but returns only point estimates. MCMC is slower but returns posterior intervals and approximate confidence intervals. |
gibbs |
Number of posterior samples to save. Between each saved sample, thin samples are drawn. |
burnin |
Number of burnin samples. Between each burnin sample, thin samples are drawn. These iterations will not be included in the resulting analysis. |
thin |
Extent of thinning of the MCMC chain. Between each posterior sample, whether burnin or saved, thin draws are made. |
type |
Type of regression model to fit. Allowed types are "linear" and "probit". |
baseline.vec |
Optional vector with one entry for each column of the treatment matrix. Each entry gives the baseline condition for that treatment; during pre-processing this condition is omitted from estimation so that it serves as the excluded category. |
id, id2, id3 |
Vectors the same length as the sample denoting clustering in the data. In a conjoint experiment with repeated observations, these correspond to respondent IDs. Up to three different sets of random effects are allowed. |
scale.type |
Indicates the types of interactions that will be created and used in estimation. scale.type="none" generates no interactions and corresponds to running LASSOplus with no interactions between variables. scale.type="TX" creates interactions between each X variable and each level of the treatment variables. scale.type="TT" creates interactions between each level of separate treatment variables. scale.type="TTX" interacts each X variable with all values generated by scale.type="TT". Alternatively, users can construct their own interactions of interest and set scale.type="none" to fit the sparse version of the user-specified model. |
save.temp |
Whether to save intermediate output in a file named temp_sparsereg. Useful for very long runs. |
conservative |
Experimental. If set to FALSE, the estimate is less conservative in selecting a variable. |
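Read together, the gibbs, burnin, and thin descriptions say that thin draws are made for every burnin sample and every saved sample. Under that reading, which is an assumption (the sampler's exact bookkeeping may differ by a draw per sample), the total number of MCMC iterations is (burnin + gibbs) * thin, of which only gibbs samples are kept. The helper `mcmc_iterations()` is hypothetical and not part of the package.

```r
# Hypothetical helper: MCMC iterations implied by gibbs, burnin and thin,
# assuming thin draws are made per burnin sample and per saved sample.
mcmc_iterations <- function(gibbs = 200, burnin = 200, thin = 10) {
  c(total = (burnin + gibbs) * thin,  # draws made by the sampler
    saved = gibbs)                    # draws retained for analysis
}

default_run <- mcmc_iterations()                                   # usage defaults
longer_run  <- mcmc_iterations(gibbs = 1000, burnin = 500, thin = 10)
```

With the defaults this gives 4000 iterations and 200 saved samples, which is one way to anticipate run time before raising gibbs or thin.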
The function sparsereg allows for estimation of a broad range of sparse regressions. The method allows for continuous, binary, and censored outcomes. In experimental data, it can be used for subgroup analysis: it pre-processes lower-order terms to generate higher-order interaction terms that are uncorrelated with their lower-order components, with the pre-processing controlled by scale.type. In observational data, it can be used in place of a standard regression, especially in the presence of a large number of variables. The method also adjusts uncertainty estimates for repeated observations through random effects. For example, a conjoint design may have the same people make several comparisons, or a panel data regression may have multiple observations on the same unit.
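The sketch below illustrates, in base R, the kind of pre-processing described above for a TX-style interaction: indicators for the non-baseline treatment levels are multiplied by a covariate, and each product is residualized by least squares on its lower-order terms so that it is uncorrelated with them. Least-squares residualization is a standard way to achieve this but is an assumption here, not sparsereg's internal code; all object names are hypothetical.

```r
set.seed(3)
n <- 500
x <- rnorm(n, mean = 1)                               # one covariate
treat <- sample(c("a", "b", "c"), n, replace = TRUE)  # one categorical treatment
baseline <- "a"                                       # cf. baseline.vec = "a"

# Indicators for every non-baseline treatment level (baseline is excluded).
lev <- setdiff(sort(unique(treat)), baseline)
D <- sapply(lev, function(l) as.numeric(treat == l))
colnames(D) <- paste0("treat", lev)

# Raw TX interactions: the covariate times each treatment indicator.
raw_int <- x * D
colnames(raw_int) <- paste0("x_x_", colnames(D))
raw_cor <- cor(raw_int[, 1], D[, 1])                  # clearly non-zero

# Residualize each interaction on its lower-order terms (intercept,
# covariate, matching indicator). OLS residuals are orthogonal to the
# regressors and have mean zero, so they are uncorrelated with them.
int_resid <- sapply(seq_len(ncol(raw_int)), function(j) {
  resid(lm(raw_int[, j] ~ x + D[, j]))
})
colnames(int_resid) <- colnames(raw_int)

max_abs_cor <- max(abs(c(cor(int_resid, x),
                         cor(int_resid[, 1], D[, 1]),
                         cor(int_resid[, 2], D[, 2]))))
```

Removing the shared component means the interaction's effect is not simply absorbing part of the main effects, which is the motivation the paragraph gives for this pre-processing.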
The object contains the estimated posterior for all of the modeled effects, and analyzing the object is facilitated by the functions plot, summary, violinplot, and difference.
beta.mode |
Matrix of sparse (mode) estimates, with one row per effect and one column per posterior sample. |
beta.mean |
Matrix of mean estimates, with one row per effect and one column per posterior sample. These estimates are not sparse, but they predict better than the mode. |
beta.ci |
Matrix of effects used to calculate approximate confidence intervals. |
sigma.sq |
Vector of posterior estimates of the error variance. |
X |
Matrix of covariates fit. Includes interaction terms, depending on scale.type. |
varmat |
Matrix showing which lower-order terms correspond to which effects. Used in producing figures. |
baseline |
Vector of baseline categories for treatments. |
modeltype |
Type of sparsereg model fit. In this case, onestage. Used by summary functions. |
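The returned posterior matrices can be summarized effect by effect. This sketch uses a hypothetical stand-in for beta.mode with the layout described above (rows are effects, columns are posterior samples) and computes each effect's posterior median and the share of samples that are non-zero. These roughly mirror the two selection criteria offered by summary's select option, but the package's exact rule is not reproduced here.

```r
# Hypothetical stand-in for beta.mode: 3 effects x 400 posterior samples,
# with many exact zeros, as a sparse posterior mode would produce.
set.seed(4)
n_samp <- 400
beta_mode <- rbind(
  x1 = rnorm(n_samp, 2, 0.1),
  x2 = ifelse(runif(n_samp) < 0.9, 0, rnorm(n_samp, 0, 0.1)),
  x3 = rnorm(n_samp, -3, 0.1)
)

# Row-wise summaries: one value per effect across posterior samples.
post_median  <- apply(beta_mode, 1, median)
prob_nonzero <- rowMeans(beta_mode != 0)
```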
Ratkovic, Marc and Tingley, Dustin. 2015. "Sparse Estimation with Uncertainty: Subgroup Analysis in Large Dimensional Design." Working paper.
Egami, Naoki and Imai, Kosuke. 2015. "Causal Interaction in High-Dimension." Working paper.
plot.sparsereg, summary.sparsereg, violinplot, difference, print.sparsereg
## Not run:
library(MASS)  # provides mvrnorm()
library(sparsereg)
set.seed(1)
n <- 500
k <- 5
treat <- sample(c("a","b","c"), n, replace = TRUE, prob = c(.5, .25, .25))
treat2 <- sample(c("a","b","c","d"), n, replace = TRUE, prob = c(.25, .25, .25, .25))
Sigma <- diag(k)
Sigma[Sigma == 0] <- .5
X <- mvrnorm(n, mu = rep(0, k), Sigma = Sigma)
y.true <- 3 + X[, 2]*2 + (treat == "a")*2 + (treat == "b")*(-2) +
  X[, 2]*(treat == "b")*(-2) + X[, 2]*(treat2 == "c")*2
y <- y.true + rnorm(n, sd = 2)
## Fit a linear model.
s1 <- sparsereg(y, X, cbind(treat, treat2), scale.type = "TX")
s1.EM <- sparsereg(y, X, cbind(treat, treat2), EM = TRUE, scale.type = "TX")
## Summarize results from MCMC fit
summary(s1)
plot(s1)
violinplot(s1)
## Summarize results from EM fit
summary(s1.EM)
plot(s1.EM)
## Extension using a baseline category
s1.base <- sparsereg(y, X, treat, scale.type = "TX", baseline.vec = "a")
summary(s1.base)
plot(s1.base)
violinplot(s1.base)
## End(Not run)
The function prints and returns a summary table for a sparsereg object.
## S3 method for class 'sparsereg' summary(object,... )
object |
Object of class sparsereg. |
... |
Additional arguments to pass to summary. Supported options are listed below. |
Generates a table for an object of class sparsereg. The additional arguments that can be passed to summary are listed below.
interval Coverage of the posterior interval to return. Must be between 0 and 1; the default is .9. The symmetric interval is returned.
ci Type of interval to return. Options are "quantile" (default) for quantiles and "HPD" for the highest posterior density interval.
order How to order the returned coefficients. Options are "magnitude", sorted by magnitude with zero effects omitted; "sort", sorted by size from highest to lowest with zero effects omitted; and "none", which returns all effects.
normal Whether to return the normal approximate confidence interval (default of TRUE) or posterior interval (FALSE).
select Either "mode" (the default) or a number between 0 and 1. With "mode", variables are selected for printing based on the median of the posterior mode; with a number, they are selected based on the posterior probability of being non-zero.
printit Whether to print a summary table.
stage Currently this argument is ignored.
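To make the interval and normal options concrete, this sketch computes both kinds of 90% interval from hypothetical posterior draws for a single effect. The equal-tailed quantile interval matches the "quantile" description; the normal interval is assumed here to be mean plus or minus z times the posterior standard deviation, which may differ from how sparsereg builds its approximate confidence interval.

```r
# Hypothetical posterior draws for one effect.
set.seed(5)
draws <- rnorm(2000, mean = 1.5, sd = 0.5)
interval <- 0.9                      # coverage, as in the summary option
alpha <- 1 - interval

# Equal-tailed (quantile) posterior interval.
q_int <- quantile(draws, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)

# Normal approximation (assumed form): mean +/- z * sd.
z <- qnorm(1 - alpha / 2)
n_int <- mean(draws) + c(-1, 1) * z * sd(draws)
```

For roughly normal posteriors the two intervals nearly coincide; the HPD variant can be computed with `HPDinterval()` from the coda package.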
Ratkovic, Marc and Tingley, Dustin. 2015. "Sparse Estimation with Uncertainty: Subgroup Analysis in Large Dimensional Design." Working paper.
sparsereg, plot.sparsereg, violinplot, difference, print.sparsereg
## Not run:
library(MASS)  # provides mvrnorm()
library(sparsereg)
set.seed(1)
n <- 500
k <- 100
Sigma <- diag(k)
Sigma[Sigma == 0] <- .5
X <- mvrnorm(n, mu = rep(0, k), Sigma = Sigma)
y.true <- 3 + X[, 2]*2 + X[, 3]*(-3)
y <- y.true + rnorm(n)
## Fit a linear model with five covariates.
s1 <- sparsereg(y, X[, 1:5])
summary(s1)
## End(Not run)
The function produces a violin plot for specified effects. This can be useful for presenting or examining particular marginal effects of interest.
violinplot(x, columns=NULL, newlabels=NULL, type="mode", stage=NULL)
x |
Object of class sparsereg. |
columns |
A vector of numbers (or strings) giving the columns (or column names) for which to produce plots. |
newlabels |
Labels to use in place of the variable names stored in the object. If empty, the variable names are used. |
type |
Options are "mode" and "mean". Whether to plot the posterior mode or mean. |
stage |
Currently, this argument is ignored. |
Generates a violin plot for coefficients from an object of class sparsereg. The desired coefficients can be requested using the columns argument, and they can be assigned new names through newlabels.
Ratkovic, Marc and Tingley, Dustin. 2015. "Sparse Estimation with Uncertainty: Subgroup Analysis in Large Dimensional Design." Working paper.
sparsereg, plot.sparsereg, summary.sparsereg, difference, print.sparsereg
## Not run:
library(MASS)  # provides mvrnorm()
library(sparsereg)
set.seed(1)
n <- 500
k <- 100
Sigma <- diag(k)
Sigma[Sigma == 0] <- .5
X <- mvrnorm(n, mu = rep(0, k), Sigma = Sigma)
y.true <- 3 + X[, 2]*2 + X[, 3]*(-3)
y <- y.true + rnorm(n)
## Fit a linear model with five covariates.
s1 <- sparsereg(y, X[, 1:5])
violinplot(s1, 1:3)
## End(Not run)