Title: | Easy, Fast, and Pretty Specification Curve Analysis |
---|---|
Description: | Making specification curve analysis easy, fast, and pretty. It improves upon existing offerings with additional features and 'tidyverse' integration. Users can easily visualize and evaluate how their models behave under different specifications with a high degree of customization. For a description and applications of specification curve analysis see Simonsohn, Simmons, and Nelson (2020) <doi:10.1038/s41562-020-0912-z>. |
Authors: | Zayne Sember [aut, cre, cph] |
Maintainer: | Zayne Sember <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.4.2 |
Built: | 2024-12-09 06:32:17 UTC |
Source: | CRAN |
A subset of data from the California Cooperative Oceanic Fisheries Investigations. Each observation describes a sample of ocean water collected.
bottles
## 'bottles' A data frame with 500 rows and 62 columns:
Cast count
Bottle Count
Line and Station
Depth ID
Bottle depth in meters
Water temperature in degrees Celsius
Salinity (Practical Salinity Scale 1978)
Milliliters of oxygen per liter of seawater
Potential Density (Sigma Theta), Kg/M³
Oxygen percent saturation
Oxygen micromoles per kilogram seawater
Niskin bottle sample was collected from
Record Indicator
Temperature Precision
Quality Code
Salinity Precision
Quality Code
Quality Code
Quality Code
Quality Code
Quality Code
Micrograms Chlorophyll-a per liter seawater
Quality Code
Micrograms Phaeopigment per liter seawater
Quality Code
Micromoles Phosphate per liter of seawater
Quality Code
Micromoles Silicate per liter of seawater
Quality Code
Micromoles Nitrite per liter of seawater
Quality Code
Micromoles Nitrate per liter of seawater
Quality Code
Micromoles Ammonia per liter of seawater
Quality Code
14C Assimilation of Replicate 1
Precision of 14C Assimilation of Replicate 1
Quality Code
14C Assimilation of Replicate 2
Precision of 14C Assimilation of Replicate 2
Quality Code
14C Assimilation of Dark/Control Bottle
Precision of 14C Assimilation of Dark/Control Bottle
Quality Code
Mean 14C Assimilation of Replicates 1 and 2
Precision of Mean 14C Assimilation of Replicates 1 and 2
Quality Code
Elapsed incubation time of primary productivity experiment
Light intensities of the incubation tubes
Reported Depth (from pressure) in meters
Reported (Potential) Temperature in degrees Celsius
Reported Salinity (from Specific Volume Anomaly, M³/Kg)
Reported Dynamic Height in units of dynamic meters
Reported Ammonium concentration
Reported Oxygen micromoles/kilogram
Dissolved Inorganic Carbon micromoles per kilogram solution
Dissolved Inorganic Carbon on a replicate sample
Total Alkalinity micromoles per kilogram solution
Total Alkalinity on a replicate sample
pH (the degree of acidity/alkalinity of a solution)
pH on a replicate sample
Quality Comment
<https://calcofi.org/data/oceanographic-data/bottle-database/>
Extracts the control variable names and coefficients from a model summary.
controlExtractor(model, x, feols_model = F)
model |
A model summary object. |
x |
A string containing the independent variable name. |
feols_model |
An indicator for whether 'model' is a 'fixest::feols()' model. Defaults to 'FALSE'. |
A dataframe with two columns, 'term' contains the name of the control and 'coef' contains the coefficient estimate.
m <- summary(lm(Salnty ~ STheta + T_degC, bottles))
controlExtractor(model = m, x = "STheta")
m <- summary(lm(Salnty ~ STheta*T_degC + O2Sat, bottles))
controlExtractor(model = m, x = "STheta")
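The extraction described above can be approximated in base R. This is a minimal sketch, not the package's implementation: it uses the built-in 'mtcars' data instead of 'bottles', treats "wt" as the independent variable, and does not attempt to reproduce how 'controlExtractor()' handles interaction terms or 'fixest::feols()' models.

```r
# Sketch only: pull control coefficients out of a model summary by hand.
m <- summary(lm(mpg ~ wt + hp + qsec, data = mtcars))

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
co <- coef(m)

# Drop the intercept and the independent variable; what remains are controls
keep <- !(rownames(co) %in% c("(Intercept)", "wt"))

controls <- data.frame(
  term = rownames(co)[keep],
  coef = unname(co[keep, "Estimate"])
)
controls
```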
Removes duplicate control variables from user input.
duplicate_remover(controls, x)
controls |
A vector of strings containing control variable names. |
x |
A string containing the independent variable name. |
A vector of strings containing the control variable names with duplicates removed.
duplicate_remover(controls = c("control1", "control2*control3"), x = "independentVariable")
Builds model formulae with every possible combination of control variables.
formula_builder(y, x, controls, fixedEffects = NA)
y |
A string containing the dependent variable name. |
x |
A string containing the independent variable name. |
controls |
A vector of strings containing control variable names. |
fixedEffects |
A string containing the name of a variable to use for fixed effects, defaults to 'NA' indicating no fixed effects desired. |
A vector of formula objects using every possible combination of controls.
formula_builder("dependentVariable", "independentVariable", c("control1", "control2"))
formula_builder("dependentVariable", "independentVariable", c("control1*control2"), fixedEffects="month")
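To make "every possible combination of controls" concrete, here is a minimal base R sketch. The helper 'build_formulae()' is hypothetical and is not the package's code; it ignores fixed effects and, as an assumption, includes the specification with no controls at all.

```r
# Hypothetical helper: enumerate all subsets of the controls with combn()
build_formulae <- function(y, x, controls) {
  rhs <- x  # specification with no controls (assumed to be included)
  for (k in seq_along(controls)) {
    subsets <- combn(controls, k, simplify = FALSE)
    rhs <- c(rhs, vapply(subsets,
                         function(s) paste(c(x, s), collapse = " + "),
                         character(1)))
  }
  # One formula object per right-hand side
  lapply(paste(y, "~", rhs), as.formula)
}

f <- build_formulae("Salnty", "T_degC", c("ChlorA", "O2Sat"))
length(f)  # with k controls this sketch yields 2^k formulae
```

Because the number of specifications doubles with each added control, it can be worth checking the count before estimating anything.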
'paste_factory()' constructs the right-hand side of the regression as a string, i.e. "x + control1 + control2".
paste_factory(controls, x)
controls |
A vector of strings containing control variable names. |
x |
A string containing the independent variable name. |
A string concatenating independent and control variables separated by '+'.
paste_factory(controls = c("control1", "control2"), x = "independentVariable")
plotAIC() plots the Akaike information criterion across model specifications. Only available for nonlinear regression models.
plotAIC(sca_data, title = "", showIndex = TRUE, plotVars = TRUE)
sca_data |
A data frame returned by 'sca()' containing model estimates from the specification curve analysis. |
title |
A string to use as the plot title. Defaults to an empty string, '""'. |
showIndex |
A boolean indicating whether to label the model index on the x-axis. Defaults to 'TRUE'. |
plotVars |
A boolean indicating whether to include a panel on the plot showing which variables are present in each model. Defaults to 'TRUE'. |
If 'plotVars = TRUE' returns a grid grob (i.e. the output of a call to 'grid.draw'). If 'plotVars = FALSE' returns a ggplot object.
plotAIC(sca_data = sca(y = "Salnty", x = "T_degC", controls = c("ChlorA", "O2Sat"), data = bottles, progressBar = TRUE, parallel = FALSE), title = "AIC")
plotAIC(sca_data = sca(y = "Salnty", x = "T_degC", controls = c("ChlorA*O2Sat"), data = bottles, progressBar = FALSE, parallel = FALSE), showIndex = FALSE, plotVars = FALSE)
plotAIC(sca_data = sca(y = "Salnty", x = "T_degC", controls = c("ChlorA*NO3uM", "O2Sat*NO3uM"), data = bottles, progressBar = TRUE, parallel = TRUE, workers = 2))
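The quantity plotted by plotAIC() can be inspected by hand for non-OLS fits. The sketch below is an illustration only: it fits Poisson GLMs to the built-in 'mtcars' data with base R, and the column names 'index', 'formula', and 'AIC' are chosen for this example rather than taken from the output of 'sca()'.

```r
# Sketch: AIC across a handful of GLM specifications (base R only)
specs <- c("carb ~ hp",
           "carb ~ hp + wt",
           "carb ~ hp + qsec",
           "carb ~ hp + wt + qsec")

aic_tab <- data.frame(
  index   = seq_along(specs),
  formula = specs,
  AIC     = vapply(specs, function(s) {
    AIC(glm(as.formula(s), family = poisson(), data = mtcars))
  }, numeric(1), USE.NAMES = FALSE)
)
aic_tab
```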
plotControlDistributions() plots the distribution of coefficients for each control variable included in the model specifications.
plotControlDistributions(sca_data, title = "", type = "density")
sca_data |
A data frame returned by 'sca()' containing model estimates from the specification curve analysis. |
title |
A string to use as the plot title. Defaults to an empty string, '""'. |
type |
A string indicating what type of distribution plot to produce. When 'type = "density"' density plots are produced. When 'type = "hist"' or 'type = "histogram"' histograms are produced. Defaults to '"density"'. |
A ggplot object.
plotControlDistributions(sca_data = sca(y="Salnty", x="T_degC", controls = c("ChlorA", "O2Sat"), data = bottles, progressBar = TRUE, parallel = FALSE), title = "Control Variable Distributions")
plotControlDistributions(sca_data = sca(y = "Salnty", x="T_degC", controls = c("ChlorA*O2Sat"), data = bottles, progressBar = FALSE, parallel = FALSE), type = "hist")
plotControlDistributions(sca_data = sca(y = "Salnty", x = "T_degC", controls = c("ChlorA*NO3uM", "O2Sat*NO3uM"), data = bottles, progressBar = TRUE, parallel = TRUE, workers = 2), type = "density")
plotCurve() takes the data frame output of sca() and produces a ggplot of the independent variable's coefficient (as indicated in the call to sca()) across model specifications. By default a panel is added showing which control variables are present in each model. Note that the ggplot output by this function can only be further customized when 'plotVars = FALSE', i.e. when the control variable panel is not included.
plotCurve(sca_data, title = "", showIndex = TRUE, plotVars = TRUE, ylab = "Coefficient", plotSE = "bar")
sca_data |
A data frame returned by 'sca()' containing model estimates from the specification curve analysis. |
title |
A string to use as the plot title. Defaults to an empty string, '""'. |
showIndex |
A boolean indicating whether to label the model index on the x-axis. Defaults to 'TRUE'. |
plotVars |
A boolean indicating whether to include a panel on the plot showing which variables are present in each model. Defaults to 'TRUE'. |
ylab |
A string to be used as the y-axis label. Defaults to '"Coefficient"'. |
plotSE |
A string indicating whether to display standard errors as bars or ribbons. For bars use 'plotSE = "bar"', for ribbons 'plotSE = "ribbon"'. If any other value is supplied, no standard errors are included. Defaults to '"bar"'. |
If 'plotVars = TRUE' returns a grid grob (i.e. the output of a call to 'grid.draw'). If 'plotVars = FALSE' returns a ggplot object.
plotCurve(sca_data = sca(y="Salnty", x="T_degC", c("ChlorA", "O2Sat"), data=bottles, progressBar=TRUE, parallel=FALSE), title = "Salinity and Temperature Models", showIndex = TRUE, plotVars = TRUE, ylab = "Coefficient value", plotSE = "ribbon")
plotCurve(sca_data = sca(y="Salnty", x="T_degC", c("ChlorA*O2Sat", "ChlorA", "O2Sat"), data=bottles, progressBar=FALSE, parallel=FALSE), showIndex = TRUE, plotVars = TRUE, plotSE = "ribbon")
plotCurve(sca_data = sca(y="Salnty", x="T_degC", c("ChlorA*NO3uM", "O2Sat", "ChlorA", "NO3uM"), data=bottles, progressBar = TRUE, parallel = TRUE, workers=2), plotSE="")
plotDeviance() plots the deviance of residuals across model specifications. Only available for linear regression models.
plotDeviance(sca_data, title = "", showIndex = TRUE, plotVars = TRUE)
sca_data |
A data frame returned by 'sca()' containing model estimates from the specification curve analysis. |
title |
A string to use as the plot title. Defaults to an empty string, '""'. |
showIndex |
A boolean indicating whether to label the model index on the x-axis. Defaults to 'TRUE'. |
plotVars |
A boolean indicating whether to include a panel on the plot showing which variables are present in each model. Defaults to 'TRUE'. |
If 'plotVars = TRUE' returns a grid grob (i.e. the output of a call to 'grid.draw'). If 'plotVars = FALSE' returns a ggplot object.
plotDeviance(sca_data = sca(y = "Salnty", x = "T_degC", controls = c("ChlorA", "O2Sat"), data = bottles, progressBar = TRUE, parallel = FALSE), title = "Model Deviance")
plotDeviance(sca_data = sca(y = "Salnty", x = "T_degC", controls = c("ChlorA*O2Sat"), data = bottles, progressBar = FALSE, parallel = FALSE), showIndex = FALSE, plotVars = FALSE)
plotDeviance(sca_data = sca(y = "Salnty", x="T_degC", controls = c("ChlorA*NO3uM", "O2Sat*NO3uM"), data = bottles, progressBar = TRUE, parallel = TRUE, workers = 2))
plotR2Adj() plots the adjusted R-squared across model specifications. Only available for linear regression models. Note that when fixed effects are specified the within adjusted R-squared is used (i.e. 'fixest::r2()' with 'type="war2"').
plotR2Adj(sca_data, title = "", showIndex = TRUE, plotVars = TRUE)
sca_data |
A data frame returned by 'sca()' containing model estimates from the specification curve analysis. |
title |
A string to use as the plot title. Defaults to an empty string, '""'. |
showIndex |
A boolean indicating whether to label the model index on the x-axis. Defaults to 'TRUE'. |
plotVars |
A boolean indicating whether to include a panel on the plot showing which variables are present in each model. Defaults to 'TRUE'. |
If 'plotVars = TRUE' returns a grid grob (i.e. the output of a call to 'grid.draw'). If 'plotVars = FALSE' returns a ggplot object.
plotR2Adj(sca_data = sca(y = "Salnty", x = "T_degC", controls = c("ChlorA", "O2Sat"), data = bottles, progressBar = TRUE, parallel = FALSE), title = "Adjusted R^2")
plotR2Adj(sca_data = sca(y="Salnty", x="T_degC", controls = c("ChlorA*O2Sat"), data = bottles, progressBar = FALSE, parallel = FALSE), showIndex = FALSE, plotVars = FALSE)
plotR2Adj(sca_data = sca(y = "Salnty", x = "T_degC", controls = c("ChlorA*NO3uM", "O2Sat*NO3uM"), data = bottles, progressBar = TRUE, parallel = TRUE, workers = 2))
plotRMSE() plots the root mean square error across model specifications. Only available for linear regression models.
plotRMSE(sca_data, title = "", showIndex = TRUE, plotVars = TRUE)
sca_data |
A data frame returned by 'sca()' containing model estimates from the specification curve analysis. |
title |
A string to use as the plot title. Defaults to an empty string, '""'. |
showIndex |
A boolean indicating whether to label the model index on the x-axis. Defaults to 'TRUE'. |
plotVars |
A boolean indicating whether to include a panel on the plot showing which variables are present in each model. Defaults to 'TRUE'. |
If 'plotVars = TRUE' returns a grid grob (i.e. the output of a call to 'grid.draw'). If 'plotVars = FALSE' returns a ggplot object.
plotRMSE(sca_data = sca(y="Salnty", x="T_degC", c("ChlorA", "O2Sat"), data=bottles, progressBar=TRUE, parallel=FALSE), title = "RMSE")
plotRMSE(sca_data = sca(y="Salnty", x="T_degC", c("ChlorA*O2Sat"), data=bottles, progressBar=FALSE, parallel=FALSE), showIndex = FALSE, plotVars = FALSE)
plotRMSE(sca_data = sca(y="Salnty", x="T_degC", c("ChlorA*NO3uM", "O2Sat*NO3uM"), data=bottles, progressBar = TRUE, parallel=TRUE, workers=2))
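The three linear-model fit measures plotted by plotDeviance(), plotR2Adj(), and plotRMSE() can be computed directly from 'lm()' fits. The following is a sketch on the built-in 'mtcars' data; the column names are chosen for this example, and the RMSE definition used here (square root of the mean squared residual) is an assumption that may differ from the package's by a degrees-of-freedom correction.

```r
# Sketch: fit measures across nested OLS specifications (base R only)
specs <- list(mpg ~ wt, mpg ~ wt + hp, mpg ~ wt + hp + qsec)

fit_stats <- do.call(rbind, lapply(specs, function(f) {
  fit <- lm(f, data = mtcars)
  res <- residuals(fit)
  data.frame(
    formula  = paste(deparse(f), collapse = " "),
    deviance = deviance(fit),               # residual sum of squares for lm
    adj_r2   = summary(fit)$adj.r.squared,  # adjusted R-squared
    rmse     = sqrt(mean(res^2))            # assumed RMSE definition
  )
}))
fit_stats
```

Deviance and RMSE cannot increase as controls are added to nested OLS models, whereas adjusted R-squared can fall, which is one reason to look at more than one measure.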
plotVars() plots the variables included in each model specification in order of model index. Returns a ggplot object that can then be combined with the output of other functions like plotRMSE() if further customization of each plot is desired.
plotVars(sca_data, title = "", colorControls = FALSE)
sca_data |
A data frame returned by 'sca()' containing model estimates from the specification curve analysis. |
title |
A string to use as the plot title. Defaults to an empty string, '""'. |
colorControls |
A boolean indicating whether to give each variable a color to improve readability. Defaults to 'FALSE'. |
A ggplot object.
plotVars(sca_data = sca(y = "Salnty", x = "T_degC", controls = c("ChlorA", "O2Sat"), data = bottles, progressBar = TRUE, parallel = FALSE), title = "Model Variable Specifications")
plotVars(sca_data = sca(y = "Salnty", x = "T_degC", controls = c("ChlorA*O2Sat"), data = bottles, progressBar = FALSE, parallel = FALSE), colorControls = TRUE)
plotVars(sca_data = sca(y = "Salnty", x = "T_degC", controls = c("ChlorA*NO3uM", "O2Sat*NO3uM"), data = bottles, progressBar = TRUE, parallel = TRUE, workers = 2))
sca() is the workhorse function of the package. It estimates models with every possible combination of the controls supplied and, by default, returns a data frame in which each row contains the pertinent information and parameters for one model. This data frame can then be passed to plotCurve() or any other plotting function in the package. Alternatively, if 'returnFormulae = TRUE', it returns a list of formula objects with every possible combination of controls without estimating the models.
sca(y, x, controls, data, weights = NULL, family = "linear", link = NULL, fixedEffects = NULL, returnFormulae = FALSE, progressBar = TRUE, parallel = FALSE, workers = 2)
y |
A string containing the column name of the dependent variable in data. |
x |
A string containing the column name of the independent variable in data. |
controls |
A vector of strings containing the column names of the control variables in data. |
data |
A dataframe containing y, x, controls, and (optionally) the variables to be used for fixed effects or clustering. |
weights |
Optional string with the column name in 'data' that contains weights. |
family |
A string indicating the family of models to be used. Defaults to "linear" for OLS regression but supports all families supported by 'glm()'. |
link |
A string specifying the link function to be used for the model. Defaults to 'NULL' for OLS regression using 'lm()' or 'fixest::feols()' depending on whether fixed effects are supplied. Supports all link functions supported by the family parameter of 'glm()'. |
fixedEffects |
A string containing the column name of the variable in data desired for fixed effects. Defaults to NULL in which case no fixed effects are included. |
returnFormulae |
A boolean. When 'TRUE' a list of model formula objects is returned but the models are not estimated. Defaults to 'FALSE' in which case a dataframe of model results is returned. |
progressBar |
A boolean indicating whether the user wants a progress bar for model estimation. Defaults to 'TRUE'. |
parallel |
A boolean indicating whether to parallelize model estimation. Parallelization only offers a speed advantage when a large (> 1000) number of models is being estimated. Defaults to 'FALSE'. |
workers |
An integer indicating the number of workers to use for parallelization. Defaults to 2. |
When 'returnFormulae' is 'FALSE', a dataframe where each row contains the independent variable coefficient estimate, standard error, test statistic, p-value, model specification, and measures of model fit.
sca(y = "Salnty", x = "T_degC", controls = c("ChlorA", "O2Sat"), data = bottles, progressBar = TRUE, parallel = FALSE)
sca(y = "Salnty", x = "T_degC", controls = c("ChlorA*NO3uM", "O2Sat*NO3uM"), data = bottles, progressBar = TRUE, parallel = TRUE, workers = 2)
sca(y = "Salnty", x = "T_degC", controls = c("ChlorA", "O2Sat*NO3uM"), data = bottles, progressBar = TRUE, parallel = FALSE, returnFormulae = TRUE)
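For intuition about what each row of the returned data frame holds, the core loop can be sketched in base R. This is not the implementation of 'sca()': it uses 'mtcars', treats "wt" as the independent variable, lists the specifications by hand, and uses column names invented for this example.

```r
# Sketch: estimate each specification and keep the row for the variable of interest
specs <- c("mpg ~ wt", "mpg ~ wt + hp", "mpg ~ wt + qsec", "mpg ~ wt + hp + qsec")

rows <- lapply(specs, function(s) {
  fit <- lm(as.formula(s), data = mtcars)
  est <- coef(summary(fit))["wt", ]  # Estimate, Std. Error, t value, Pr(>|t|)
  data.frame(formula   = s,
             coef      = est[["Estimate"]],
             se        = est[["Std. Error"]],
             statistic = est[["t value"]],
             p         = est[["Pr(>|t|)"]])
})

curve <- do.call(rbind, rows)

# Sorting by the coefficient gives the ordering a specification curve is drawn in
curve <- curve[order(curve$coef), ]
curve$index <- seq_len(nrow(curve))
curve
```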
Takes in the data frame output by 'sca()' and returns a list with the data frame and labels to make a plot to visualize the controls included in each spec curve model.
scp(sca_data)
sca_data |
A data frame output by 'sca()'. |
A list containing a data frame, control coefficients, and control names.
scp(sca(y = "Salnty", x = "T_degC", controls = c("ChlorA", "O2Sat"), data = bottles, progressBar=TRUE, parallel=FALSE))
Takes in a data frame, regression formula, and bootstrapping parameters and estimates bootstrapped standard errors for models with and without fixed effects.
se_boot(data, formula, n_x, n_samples, sample_size, weights = NULL)
data |
A data frame containing the variables provided in 'formula'. |
formula |
A string containing a regression formula, with or without fixed effects. |
n_x |
An integer representing the number of independent variables in the regression. |
n_samples |
An integer indicating how many times the model should be estimated with a random subset of the data. |
sample_size |
An integer indicating how many observations are in each random subset of the data. |
weights |
Optional string with the column name in 'data' that contains weights. |
A named list containing bootstrapped standard errors for each coefficient.
se_boot(data = bottles, formula = "Salnty ~ T_degC + ChlorA + O2Sat", n_x = 3, n_samples = 4, sample_size = 300)
se_boot(data = data.frame(x1 = rnorm(50000, mean=4, sd=10), x2 = rnorm(50000, sd=50), ID = rep(1:100, 500), area = rep(1:50, 1000), y = rnorm(50000)), formula = "y ~ x1 + x2 | ID", n_x = 2, n_samples = 10, sample_size = 1000)
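The idea behind bootstrapped standard errors can be sketched in a few lines of base R. The helper 'boot_se()' below is hypothetical and is not how 'se_boot()' is implemented: it handles only 'lm()' models without fixed effects or weights, and it assumes rows are drawn with replacement, which the description above does not specify.

```r
# Hypothetical helper: standard deviation of coefficients across resampled fits
boot_se <- function(data, formula, n_samples, sample_size) {
  draws <- replicate(n_samples, {
    # Assumption: classic bootstrap, i.e. sampling rows with replacement
    idx <- sample(nrow(data), sample_size, replace = TRUE)
    coef(lm(formula, data = data[idx, , drop = FALSE]))
  })
  # draws has one row per coefficient and one column per resample
  apply(draws, 1, sd)
}

set.seed(1)
se <- boot_se(mtcars, mpg ~ wt + hp, n_samples = 200, sample_size = nrow(mtcars))
se
```

With very few resamples, such as the 'n_samples = 4' used in the examples to keep them fast, the estimate itself is noisy, so larger values are preferable for real analyses.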
se_compare() takes in a regression formula (with or without fixed effects), data, and the types of standard errors desired, including clustered, heteroskedasticity-consistent, and bootstrapped. It then returns a data frame with coefficient and standard error estimates for easy comparison and plotting.
se_compare(formula, data, weights = NULL, types = "all", cluster = NULL, clusteredOnly = FALSE, fixedEffectsOnly = FALSE, bootSamples = NULL, bootSampleSize = NULL)
formula |
A string containing a regression formula, with or without fixed effects. |
data |
A data frame containing the variables provided in 'formula' and any clustering variables passed to 'cluster'. |
weights |
Optional string with the column name in 'data' that contains weights. |
types |
A string or vector of strings specifying what types of standard errors are desired. Defaults to "all". The following types are supported for non-fixed effects models: With clustering: "HC0", "HC1", "HC2", "HC3". Without clustering: "iid" (i.e. normal standard errors), "HC0", "HC1", "HC2", "HC3", "HC4", "HC4m", "HC5", "bootstrapped". The following types are supported for fixed effects models: With clustering: "CL_FE" (clustered by fixed effects, i.e. the default standard errors reported by 'feols()' if no clusters are supplied); if clusters are supplied, the conventional clustered standard errors from 'feols()' are estimated for each clustering variable. Two-way clustered standard errors are not supported at this time. Without clustering: "HC0", "HC1", "HC2", "HC3", "HC4", "HC4m", "HC5", "bootstrapped". |
cluster |
A string or vector of strings specifying variables present in 'data' to be used for clustering standard errors. |
clusteredOnly |
A boolean indicating whether only standard errors with clustering should be estimated, defaults to 'FALSE'. |
fixedEffectsOnly |
A boolean indicating whether only standard errors for fixed effects models should be estimated, defaults to 'FALSE'. |
bootSamples |
An integer or vector of integers indicating how many times the model should be estimated with a random subset of the data. If a vector then every combination of 'bootSamples' and 'bootSampleSize' are estimated. |
bootSampleSize |
An integer or vector of integers indicating how many observations are in each random subset of the data. If a vector then every combination of 'bootSamples' and 'bootSampleSize' are estimated. |
A data frame where each row represents an independent variable in the model and each column a type of standard error. Coefficient estimates for each variable are also included (column '"estimate"' for non-fixed effects models and column '"estimate_FE"' for fixed effects models). Columns are automatically named to specify the standard error type.
Some examples:
"iid" = normal standard errors, i.e. assuming homoskedasticity
"CL_FE" = standard errors clustered by fixed effects
"bootstrap_k8n300_FE" = bootstrapped standard errors for a fixed effects model where 'bootSamples = 8' and 'bootSampleSize = 300'
"CL_Depth_ID_FE" = standard errors clustered by the variable "Depth_ID" for a model with fixed effects
"HC0_Sta_ID" = HC0 standard errors clustered by the variable "Sta_ID"
Note: for fixed effects models the "(Intercept)" row will be all 'NA' because the intercept is not reported by 'feols()' when fixed effects are present.
se_compare(formula = "Salnty ~ T_degC + ChlorA + O2Sat | Sta_ID", data = bottles, types = "all", cluster = c("Depth_ID", "Sta_ID"), fixedEffectsOnly = FALSE, bootSamples=c(4, 8, 10), bootSampleSize=c(300, 500))
se_compare(formula = "Salnty ~ T_degC + ChlorA + O2Sat", data = bottles, types = "bootstrapped", bootSamples = c(8, 10), bootSampleSize = c(300, 500))
se_compare(formula = "Salnty ~ T_degC + ChlorA", data = bottles, types = c("HC0", "HC1", "HC3"))
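To show what distinguishes "iid" from heteroskedasticity-consistent columns, the HC0 and HC1 estimators can be written out with the standard sandwich formula in base R. This sketch uses 'mtcars' and covers only a non-fixed effects model without clustering; it makes no claim about how 'se_compare()' computes these internally.

```r
# Sketch: iid, HC0, and HC1 standard errors side by side (base R only)
fit <- lm(mpg ~ wt + hp, data = mtcars)
X <- model.matrix(fit)
e <- residuals(fit)
n <- nrow(X)
k <- ncol(X)

bread <- solve(crossprod(X))   # (X'X)^-1
meat  <- crossprod(X * e)      # X' diag(e^2) X; each row of X is scaled by its residual
vc_hc0 <- bread %*% meat %*% bread

se_tab <- data.frame(
  estimate = coef(fit),
  iid = sqrt(diag(vcov(fit))),            # assumes homoskedasticity
  HC0 = sqrt(diag(vc_hc0)),               # White's estimator
  HC1 = sqrt(diag(vc_hc0) * n / (n - k))  # HC0 with a degrees-of-freedom correction
)
se_tab
```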
Removes the 'AsIs' class attribute from the input. Taken from: <https://stackoverflow.com/a/12866609>
unAsIs(x)
x |
An object with the 'AsIs' class attribute. |
An object without the 'AsIs' class attribute.
unAsIs(x = I(c(1:4)))
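One way to drop the 'AsIs' class in base R is sketched below. The helper 'strip_asis()' is hypothetical and written for illustration; it is not necessarily identical to the package's 'unAsIs()' or to the linked answer.

```r
# Hypothetical helper: remove "AsIs" from the class attribute, keep other classes
strip_asis <- function(x) {
  cls <- setdiff(oldClass(x), "AsIs")
  # Setting oldClass to NULL removes the class attribute entirely
  oldClass(x) <- if (length(cls) > 0) cls else NULL
  x
}

v <- I(c(1:4))
inherits(v, "AsIs")              # TRUE
inherits(strip_asis(v), "AsIs")  # FALSE
```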