Title: | Estimations using Conley Standard Errors |
---|---|
Description: | Functions calculating Conley (1999) <doi:10.1016/S0304-4076(98)00084-0> standard errors. The package started by merging and extending multiple packages and other published scripts on this econometric technique. It strongly emphasizes computational optimization. Details are available in the function documentation and in the vignette. |
Authors: | Christian Düben [aut, cre], Richard Bluhm [cph], Luis Calderon [cph], Darin Christensen [cph], Timothy Conley [cph], Thiemo Fetzer [cph], Leander Heldring [cph] |
Maintainer: | Christian Düben <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.7 |
Built: | 2024-12-24 06:58:59 UTC |
Source: | CRAN |
This function estimates ols, logit, probit, and poisson models with Conley standard errors.
conleyreg(
  formula, data, dist_cutoff,
  model = c("ols", "logit", "probit", "poisson"),
  unit = NULL, time = NULL, lat = NULL, lon = NULL,
  kernel = c("bartlett", "uniform"),
  lag_cutoff = 0, intercept = TRUE, verbose = TRUE, ncores = NULL,
  par_dim = c("cross-section", "time", "r", "cpp"),
  dist_comp = NULL, crs = NULL, st_distance = FALSE, dist_which = NULL,
  sparse = FALSE, batch = TRUE, batch_ram_opt = NULL, float = FALSE,
  rowwise = FALSE, reg_ram_opt = FALSE,
  dist_mat = NULL, dist_mat_conv = TRUE, vcov = FALSE, gof = FALSE
)
formula |
regression equation as formula or character string. Avoid interactions and transformations inside the equation. I.e. use
|
data |
input data. Either (i) in non-spatial data frame format (includes tibbles and data tables) with columns denoting coordinates or (ii) in sf format. In
case of an sf object, all non-point geometry types are converted to spatial points, based on the feature's centroid. When using a non-spatial data frame format
with projected, i.e. non-longlat, coordinates, |
dist_cutoff |
the distance cutoff in km |
model |
the applied model. Either "ols" (the default), "logit", "probit", or "poisson". |
unit |
the variable identifying the cross-sectional dimension. Only needs to be specified if the data is not cross-sectional. Assumes that units do not change their location over time. |
time |
the variable identifying the time dimension. Only needs to be specified if the data is not cross-sectional. |
lat |
the variable specifying the latitude |
lon |
the variable specifying the longitude |
kernel |
the kernel applied within the radius. Either "bartlett" (the default) or "uniform". |
lag_cutoff |
the cutoff along the time dimension. Defaults to 0, meaning that standard errors are only adjusted cross-sectionally. |
intercept |
logical specifying whether to include an intercept. Defaults to TRUE. |
verbose |
logical specifying whether to print messages on intermediate estimation steps. Defaults to TRUE. |
ncores |
the number of CPU cores to use in the estimations. Defaults to the machine's number of CPUs. |
par_dim |
the dimension along which the function parallelizes in panel applications. Can be set to |
dist_comp |
choice between |
crs |
the coordinate reference system, if the data is projected. Object of class crs or input string to |
st_distance |
logical specifying whether distances should be computed via |
dist_which |
the type of distance to use when |
sparse |
logical specifying whether to use sparse rather than dense (regular) matrices in distance computations. Defaults to FALSE. |
batch |
logical specifying whether distances are inserted into a sparse matrix element by element ( |
batch_ram_opt |
the degree to which batch insertion should be optimized for RAM usage. Can be set to one out of the three levels: |
float |
logical specifying whether distance matrices should use the float ( |
rowwise |
logical specifying whether to store individual rows of the distance matrix only, instead of the full matrix. If |
reg_ram_opt |
logical specifying whether the regression should be optimized for RAM usage. Defaults to FALSE. |
dist_mat |
a distance matrix. Pre-computing a distance matrix and passing it to this argument is only more efficient than having |
dist_mat_conv |
logical specifying whether to convert the distance matrix to a list, if the panel turns out to be unbalanced because of missing values. This
setting is only relevant, if you enter a balanced panel's distance matrix not derived via |
vcov |
logical specifying whether to return variance-covariance matrix ( |
gof |
logical specifying whether to return goodness of fit measures. Defaults to FALSE. |
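To make the roles of dist_cutoff and kernel concrete, the following base R sketch computes a cross-sectional Conley-type sandwich variance for an OLS fit. It is an illustration under simplifying assumptions, not the package's implementation: the helper name conley_vcov_sketch, the planar toy coordinates, and the treatment of observations exactly at the cutoff are all assumptions made for this example.

```r
# Illustrative cross-sectional Conley-type variance estimator (not conleyreg's code).
# X: n x k regressor matrix, e: residuals, D: n x n distance matrix in km.
conley_vcov_sketch <- function(X, e, D, dist_cutoff,
                               kernel = c("bartlett", "uniform")) {
  kernel <- match.arg(kernel)
  # Kernel weights: linearly decaying (Bartlett) or constant (uniform) within the radius
  W <- if (kernel == "bartlett") {
    pmax(1 - D / dist_cutoff, 0)
  } else {
    (D <= dist_cutoff) * 1
  }
  Xe <- X * as.vector(e)            # row i holds x_i * e_i
  meat <- t(Xe) %*% W %*% Xe        # sum over pairs of w_ij * x_i e_i e_j x_j'
  bread <- solve(crossprod(X))      # (X'X)^{-1}
  bread %*% meat %*% bread
}

# Toy data on a 1000 km x 1000 km plane (hypothetical, for illustration only)
set.seed(1)
n <- 50
coords <- cbind(stats::runif(n, 0, 1000), stats::runif(n, 0, 1000))
D <- as.matrix(stats::dist(coords))
x1 <- stats::rnorm(n)
y <- 1 + 0.5 * x1 + stats::rnorm(n)
X <- cbind(1, x1)
fit <- stats::lm.fit(X, y)

V <- conley_vcov_sketch(X, fit$residuals, D, dist_cutoff = 200)
V
```

With a cutoff smaller than every pairwise distance, the weight matrix collapses to the identity and the sketch reduces to the heteroskedasticity-robust (HC0) sandwich, which is a convenient sanity check.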
This code is an extension and modification of earlier Conley standard error implementations by (i) Richard Bluhm, (ii) Luis Calderon and Leander Heldring, (iii) Darin Christensen and Thiemo Fetzer, and (iv) Timothy Conley. Results vary across implementations because of different distance functions and buffer shapes.
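As a rough illustration of why the choice of distance function matters, the base R sketch below compares a great-circle (haversine) distance with a simple equirectangular approximation. Both helpers and the Earth radius of 6371.01 km are assumptions made for this example; they are not taken from any of the implementations listed above.

```r
# Two hypothetical distance functions on longlat coordinates (degrees in, km out)
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371.01) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

equirect_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371.01) {
  to_rad <- pi / 180
  x <- (lon2 - lon1) * to_rad * cos((lat1 + lat2) / 2 * to_rad)
  y <- (lat2 - lat1) * to_rad
  radius_km * sqrt(x^2 + y^2)
}

# Two points at 60 degrees north, 90 degrees of longitude apart
d_hav <- haversine_km(60, 0, 60, 90)
d_eqr <- equirect_km(60, 0, 60, 90)
c(haversine = d_hav, equirectangular = d_eqr)
```

A pair of observations can therefore fall inside the cutoff radius under one distance function and outside it under another, which changes the set of pairs that enter the standard error correction.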
This function has reasonable defaults. If your machine has insufficient RAM to allocate the default dense matrices, try sparse matrices. If the RAM error persists, try setting a lower dist_cutoff, use floats, select a uniform kernel, or experiment with batch_ram_opt, reg_ram_opt, or batch.
Consult the vignette, vignette("conleyreg_introduction", "conleyreg"), for a more extensive discussion.
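A back-of-the-envelope estimate can indicate when the dense default is likely to run out of RAM. The sketch below assumes 8 bytes per cell for double precision and 4 bytes per cell for single-precision floats, and ignores any overhead; the helper name dist_mat_gib is hypothetical. Sparse matrices are not covered, because their size depends on how many pairs fall within dist_cutoff.

```r
# Approximate size of a dense n x n distance matrix in GiB
dist_mat_gib <- function(n, bytes_per_cell = 8) {
  n^2 * bytes_per_cell / 1024^3
}

sizes <- c(1e4, 5e4, 1e5)
footprint <- data.frame(
  n = sizes,
  double_gib = dist_mat_gib(sizes),      # regular numeric matrices
  float_gib = dist_mat_gib(sizes, 4)     # single-precision floats
)
footprint
```

Because the footprint grows quadratically with the number of observations, halving the precision only buys a factor of two, whereas a sparse matrix with a tight cutoff can drop most cells entirely.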
Returns a lmtest::coeftest matrix of coefficient estimates and standard errors by default. Can be changed to the variance-covariance matrix by specifying vcov = TRUE.
Calderon L, Heldring L (2020). “Spatial standard errors for several commonly used M-estimators.” Mimeo.
Conley TG (1999). “GMM estimation with cross sectional dependence.” Journal of Econometrics, 92(1), 1-45.
Conley TG (2008). “Spatial Econometrics.” In Durlauf SN, Blume LE (eds.), Microeconometrics, 303-313. London: Palgrave Macmillan.
## Not run:
# Generate cross-sectional example data
data <- rnd_locations(100, output_type = "data.frame")
data$y <- sample(c(0, 1), 100, replace = TRUE)
data$x1 <- stats::runif(100, -50, 50)

# Estimate ols model with Conley standard errors using a 1000 km radius
conleyreg(y ~ x1, data, 1000, lat = "lat", lon = "lon")

# Estimate logit model
conleyreg(y ~ x1, data, 1000, "logit", lat = "lat", lon = "lon")

# Estimate ols model with fixed effects
data$x2 <- sample(1:5, 100, replace = TRUE)
conleyreg(y ~ x1 | x2, data, 1000, lat = "lat", lon = "lon")

# Estimate ols model using panel data
data$time <- rep(1:10, each = 10)
data$unit <- rep(1:10, times = 10)
conleyreg(y ~ x1, data, 1000, unit = "unit", time = "time", lat = "lat", lon = "lon")

# Estimate same model with an sf object of another projection as input
data <- sf::st_as_sf(data, coords = c("lon", "lat"), crs = 4326) |>
  sf::st_transform(crs = "+proj=aeqd")
conleyreg(y ~ x1, data, 1000)
## End(Not run)
This function estimates the distance matrix separately from Conley standard errors. Such a step can be helpful when running multiple Conley standard error estimations based on the same distance matrix. A prerequisite for using this function is that the data must not be modified between applying this function and inserting the results into conleyreg.
dist_mat(
  data,
  unit = NULL, time = NULL, lat = NULL, lon = NULL,
  dist_comp = NULL, dist_cutoff = NULL, crs = NULL,
  verbose = TRUE, ncores = NULL,
  par_dim = c("cross-section", "time", "r", "cpp"),
  sparse = FALSE, batch = TRUE, batch_ram_opt = NULL,
  dist_round = FALSE, st_distance = FALSE, dist_which = NULL
)
data |
input data. Either (i) in non-spatial data frame format (includes tibbles and data tables) with columns denoting coordinates or (ii) in sf format. In
case of an sf object, all non-point geometry types are converted to spatial points, based on the feature's centroid. When using a non-spatial data frame format
with projected, i.e. non-longlat, coordinates, |
unit |
the variable identifying the cross-sectional dimension. Only needs to be specified if the data is not cross-sectional. Assumes that units do not change their location over time. |
time |
the variable identifying the time dimension. Only needs to be specified if the data is not cross-sectional. |
lat |
the variable specifying the latitude |
lon |
the variable specifying the longitude |
dist_comp |
choice between |
dist_cutoff |
the distance cutoff in km. If not specified, the distance matrices contain all bilateral distances. If specified, the cutoff must be at least
as large as the largest distance cutoff in the Conley standard error corrections in which you use the resulting matrix. If you e.g. specify distance cutoffs of
100, 200, and 500 km in the subsequent |
crs |
the coordinate reference system, if the data is projected. Object of class crs or input string to |
verbose |
logical specifying whether to print messages on intermediate estimation steps. Defaults to TRUE. |
ncores |
the number of CPU cores to use in the estimations. Defaults to the machine's number of CPUs. |
par_dim |
the dimension along which the function parallelizes in unbalanced panel applications. Can be set to |
sparse |
logical specifying whether to use sparse rather than dense (regular) matrices in distance computations. Defaults to FALSE. |
batch |
logical specifying whether distances are inserted into a sparse matrix element by element ( |
batch_ram_opt |
the degree to which batch insertion should be optimized for RAM usage. Can be set to one out of the three levels: |
dist_round |
logical specifying whether to round distances to full kilometers. This further reduces memory consumption and can be a solution when even sparse matrices cannot accommodate the data. Note, though, that this rounding introduces a bias. |
st_distance |
logical specifying whether distances should be computed via |
dist_which |
the type of distance to use when |
This function runs the distance matrix estimations separately from the Conley standard error correction. You can pass the resulting object to the dist_mat argument in conleyreg, skipping the distance matrix computations and various checks in that function. Pre-computing the distance matrix is only more efficient than deriving it via conleyreg when estimating various models that use the same distance matrices. The input data must not be modified between calling this function and inserting the results into conleyreg. Do not reorder the observations, add or delete variables, or undertake any other operation on the data.
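The reuse pattern described here can be sketched as follows: compute the distance matrix once with a cutoff at least as large as the largest radius you plan to use, then pass it to several conleyreg calls. The cutoffs of 100, 200, and 500 km and the simulated variables are purely illustrative, and the calls are guarded so that the snippet is skipped when the package is not installed.

```r
# Radii (in km) for the subsequent estimations; values chosen for illustration
cutoffs <- c(100, 200, 500)
# The pre-computed matrix must cover the largest of them
matrix_cutoff <- max(cutoffs)

if (requireNamespace("conleyreg", quietly = TRUE)) {
  # Finalize the data first: it must not change after dist_mat() is called
  data <- conleyreg::rnd_locations(100, output_type = "data.frame")
  data$y <- stats::rnorm(100)
  data$x1 <- stats::runif(100, -50, 50)

  # Compute distances once
  dm <- conleyreg::dist_mat(data, lat = "lat", lon = "lon",
                            dist_cutoff = matrix_cutoff, verbose = FALSE)

  # Reuse them across estimations with different radii
  fits <- lapply(cutoffs, function(ct) {
    conleyreg::conleyreg(y ~ x1, data, ct, dist_mat = dm, verbose = FALSE)
  })
  names(fits) <- paste0(cutoffs, "km")
}
```

All variables are created before dist_mat is called, in line with the requirement that the data stay untouched between the two steps.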
Returns an object of S3 class conley_dist. It contains modified distance matrices, the used dist_cutoff, a sparse matrix identifier, and information on the potential panel structure. In the cross-sectional case and the balanced panel case, the distances are stored in one matrix, while in unbalanced panel applications, distances come as a list of matrices. The function optimizes the distance matrices with respect to computational performance, setting distances beyond dist_cutoff to zero and actual off-diagonal zeros to NaN. Hence, these objects are only to be used in conleyreg.
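The encoding described above can be mimicked on a small dense matrix. This is a sketch of the idea only, with a hypothetical helper encode_dist_sketch; it says nothing about how the package builds or stores its matrices internally.

```r
# Mimic the described encoding: beyond-cutoff distances become 0,
# genuine zero distances between distinct observations become NaN
encode_dist_sketch <- function(D, dist_cutoff) {
  out <- D
  off_diag <- row(D) != col(D)
  out[off_diag & D == 0] <- NaN    # co-located observations stay distinguishable
  out[D > dist_cutoff] <- 0        # pairs outside the radius are dropped
  out
}

# Observations 1 and 2 share a location; observation 3 is 5 km and 2 km away
D <- matrix(c(0, 0, 5,
              0, 0, 2,
              5, 2, 0), nrow = 3, byrow = TRUE)
enc <- encode_dist_sketch(D, dist_cutoff = 3)
enc
```

After such a transformation a zero no longer means "same place", which is why the resulting matrices should not be read as ordinary distance matrices.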
## Not run:
# Generate cross-sectional example data
data <- rnd_locations(100, output_type = "data.frame")
data$y <- sample(c(0, 1), 100, replace = TRUE)
data$x1 <- stats::runif(100, -50, 50)

# Compute distance matrix in cross-sectional case
dm <- dist_mat(data, lat = "lat", lon = "lon")

# Compute distance matrix in panel case
data$time <- rep(1:10, each = 10)
data$unit <- rep(1:10, times = 10)
dm <- dist_mat(data, unit = "unit", time = "time", lat = "lat", lon = "lon")

# Use distance matrix in conleyreg function
conleyreg(y ~ x1, data, 1000, dist_mat = dm)
## End(Not run)
This function draws random locations in longlat format.
rnd_locations( nobs, xmin = -180, xmax = 180, ymin = -90, ymax = 90, output_type = c("data.table", "data.frame", "sf") )
nobs |
number of observations |
xmin |
minimum longitude |
xmax |
maximum longitude |
ymin |
minimum latitude |
ymax |
maximum latitude |
output_type |
type of output object. Either "data.table" (the default), "data.frame", or "sf". |
By default, this function draws a global sample of random locations. You can restrict it to a certain region by specifying xmin, xmax, ymin, and ymax. The function draws from a uniform spatial distribution that assumes the planet to be a perfect sphere. The spherical assumption is common in GIS functions, but deviates from Earth's exact shape.
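One common way to draw points uniformly on a sphere within a longitude-latitude box is to sample the longitude uniformly and the sine of the latitude uniformly, so that equal areas receive equal probability. The base R sketch below illustrates that idea with a hypothetical helper, rnd_lonlat_sketch; it is an assumption about how such sampling can be done, not a description of the code inside rnd_locations.

```r
# Uniform sampling on a sphere within a longlat bounding box (degrees)
rnd_lonlat_sketch <- function(nobs, xmin = -180, xmax = 180,
                              ymin = -90, ymax = 90) {
  to_rad <- pi / 180
  lon <- stats::runif(nobs, xmin, xmax)
  # Sampling sin(latitude) uniformly corrects for meridians converging at the poles
  z <- stats::runif(nobs, sin(ymin * to_rad), sin(ymax * to_rad))
  lat <- asin(z) / to_rad
  data.frame(lon = lon, lat = lat)
}

set.seed(42)
pts <- rnd_lonlat_sketch(1e5)
# Share of points between 30 degrees south and 30 degrees north;
# should be close to 0.5, since sin(30 degrees) = 0.5
mean(abs(pts$lat) < 30)
```

Drawing the latitude itself from a uniform distribution would instead oversample the polar regions, where a degree of longitude spans far less ground.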
Returns a data.table, data.frame, or sf object of unprojected (longlat) points.
data <- rnd_locations(1000)