Title: | Computing Scales of Spatial Smoothing for Confounding Adjustment |
---|---|
Description: | Computes the effective range of a smoothing matrix, which is a measure of the distance to which smoothing occurs. This is motivated by the application of spatial splines for adjusting for unmeasured spatial confounding in regression models, but the calculation of effective range can be applied to smoothing matrices in other contexts. For algorithmic details, see Rainey and Keller (2024) "spconfShiny: an R Shiny application..." <doi:10.1371/journal.pone.0311440> and Keller and Szpiro (2020) "Selecting a Scale for Spatial Confounding Adjustment" <doi:10.1111/rssa.12556>. |
Authors: | Kayleigh Keller [aut, cre], Maddie Rainey [aut] |
Maintainer: | Kayleigh Keller <[email protected]> |
License: | GPL-3 |
Version: | 1.0.1 |
Built: | 2024-11-04 03:33:46 UTC |
Source: | CRAN |
Calculates the effective range for a spline basis matrix.
compute_effective_range( X, coords = X[, c("x", "y")], df = 3, nsamp = min(1000, nrow(X)), smoothedCurve = FALSE, newd = seq(0, 1, 100), scale_factor = 1, returnFull = FALSE, cl = NULL, namestem = "tprs", inds = NULL, verbose = FALSE, span = 0.1 ) compute_effective_range_nochecks( X, inds, newd, D, smoothedCurve = FALSE, scale_factor = 1, returnFull = FALSE, cl = NULL, span = 0.1 )
compute_effective_range( X, coords = X[, c("x", "y")], df = 3, nsamp = min(1000, nrow(X)), smoothedCurve = FALSE, newd = seq(0, 1, 100), scale_factor = 1, returnFull = FALSE, cl = NULL, namestem = "tprs", inds = NULL, verbose = FALSE, span = 0.1 ) compute_effective_range_nochecks( X, inds, newd, D, smoothedCurve = FALSE, scale_factor = 1, returnFull = FALSE, cl = NULL, span = 0.1 )
X |
Matrix of spline values. See |
coords |
Matrix of point coordinates. Defaults to the |
df |
Degrees of freedom for which effective range should be computed. |
nsamp |
Number of observations from |
smoothedCurve |
Should the effective range be computed using the procedure introduced by Keller and Szpiro, 2020, ( |
newd |
Distance values at which to make loess predictions. Should correspond to distances in the same units as |
scale_factor |
Factor by which range should be scaled. Often physical distance corresponding to resolution of grid. Defaults to 1, so that range is reported on the same scale as distance in |
returnFull |
Should the mean and median curves be returned ( |
cl |
Cluster object, or number of cluster instances to create. Defaults to no parallelization. |
namestem |
Stem of names of columns of X corresponding to evaluated splines.
Defaults to |
inds |
Indices of observations to use for computation. Passed to |
verbose |
Control printing of a df counter to console. |
span |
Passed to |
D |
Distance matrix for coordinates. |
Using the given spline basis and the inputted coordinates, the effective bandwidth is computed for the given degrees of freedom. This is accomplished by computing a distance matrix from the coordinates and a smoothing matrix from the basis.
Setting smoothedCurve = TRUE
(see Keller and Szpiro, 2020, for details), for each column of smoothing weights, a LOESS curve is fit to the smoothing weights as a function of the distances, and the distance where the curve first crosses zero is obtained.
Setting smoothedCurve = FALSE
(see Rainey and Keller, 2024, for details), for each column of smoothing weights, the smallest distance that corresponds with the first negative smoothing weight is obtained.
Then, for both procedures, the median of the obtained distances is reported as the effective bandwidth.
The columns of X
are selected by name, and so are assumed to have a numeric value in the column name that indicates the spline number. For example, the columns containing the first three splines should be "1", "2", and "3". IF there is a fixed character prefix, that can be supplied via namestem
. For example, if the columns are "s1", "s2", "s3", then set namestem="s"
.
The effective bandwidth for each value of df
. If returnFull = FALSE
, then this is a vector of the same length as df
. If returnFull = TRUE
and smoothedCurve = TRUE
, this is a list that additionally contains values of the pointwise median and mean of the smoothed curves.
Keller and Szpiro (2020). Selecting a scale for spatial confounding adjustment. Journal of the Royal Statistical Society, Series A https://doi.org/10.1111/rssa.12556.
Rainey and Keller (2024). spconfShiny: An R Shiny application for calculating the spatial scale of smoothing splines for point data. PLOS ONE https://doi.org/10.1371/journal.pone.0311440
M <- 16 tprs_df <- 10 si <- seq(0, 1, length=M+1)[-(M+1)] gridcoords <- expand.grid(x=si, y=si) tprsX <- computeTPRS(coords = gridcoords, maxdf = tprs_df+1) compute_effective_range(X=tprsX, coords=gridcoords, df=3:10, smoothedCurve=FALSE) xloc <- runif(n=100, min=0, max=10) X <- splines::ns(x=xloc, df=4, intercept=TRUE) colnames(X) <- paste0("s", 1:ncol(X)) xplot <- 0:10 compute_effective_range(X=X, coords=as.matrix(xloc), df=2:4, newd=xplot, namestem="s", smoothedCurve = TRUE)
M <- 16 tprs_df <- 10 si <- seq(0, 1, length=M+1)[-(M+1)] gridcoords <- expand.grid(x=si, y=si) tprsX <- computeTPRS(coords = gridcoords, maxdf = tprs_df+1) compute_effective_range(X=tprsX, coords=gridcoords, df=3:10, smoothedCurve=FALSE) xloc <- runif(n=100, min=0, max=10) X <- splines::ns(x=xloc, df=4, intercept=TRUE) colnames(X) <- paste0("s", 1:ncol(X)) xplot <- 0:10 compute_effective_range(X=X, coords=as.matrix(xloc), df=2:4, newd=xplot, namestem="s", smoothedCurve = TRUE)
Calculates a loess curve for the smoothing matrix entries, as a function of distance between points.
compute_lowCurve(S, D, newd, cl = NULL, span = 0.1)
compute_lowCurve(S, D, newd, cl = NULL, span = 0.1)
S |
Smoothing matrix, or a subset of columns from a smoothing matrix. |
D |
Distance matrix, or a subset of columns from a distance matrix. |
newd |
Distances to use for loess prediction. |
cl |
Cluster object, or number of cluster instances to create. Defaults to no parallelization. |
span |
Passed to |
For each column in S
, a loess curve is fit to the values as a function of the distances between points, which are taken from the columns of D
. Thus, the order of rows and columns in S
should match the order of rows and columns in D
.
For a large number of locations, this procedure may be somewhat slow. The cl
argument can be used to parallelize the operation using clusterMap
.
List with three elements: -by-
matrix, where
is the length of
newd
and is the number of columns in
S
; a vector of length giving the median curve value; a vector of length
giving the mean curve value.
xloc <- runif(n=100, min=0, max=10) X <- splines::ns(x=xloc, df=4, intercept=TRUE) S <- computeS(X) d <- as.matrix(dist(xloc)) xplot <- 0:10 lC <- compute_lowCurve(S, D=d, newd=xplot) matplot(xplot, lC$SCurve, type="l", col="black") points(xplot, lC$SCurveMedian, type="l", col="red")
xloc <- runif(n=100, min=0, max=10) X <- splines::ns(x=xloc, df=4, intercept=TRUE) S <- computeS(X) d <- as.matrix(dist(xloc)) xplot <- 0:10 lC <- compute_lowCurve(S, D=d, newd=xplot) matplot(xplot, lC$SCurve, type="l", col="black") points(xplot, lC$SCurveMedian, type="l", col="red")
Calculates the smoothing (or "hat") matrix from a design matrix.
computeS(x, inds = 1:nrow(x))
computeS(x, inds = 1:nrow(x))
x |
Matrix of spline values, assumed to have full rank. A data frame is coerced into a matrix. |
inds |
Column indices of smoothing matrix to return (corresponding to rows in |
Given a matrix X
of spline values, this computes S=X(X'X)^(-1)X'
. When x
has many rows, this can be quite large. The inds
argument can be used to return a subset of columns from S
.
An -by-
matrix, where
is the length of
inds
and is the number of rows in
x
.
# Simple design matrix case X <- cbind(1, rep(c(0, 1), each=4)) S <- computeS(X) # More complex example xloc <- runif(n=100, min=0, max=10) X <- splines::ns(x=xloc, df=4, intercept=TRUE) S <- computeS(X) S2 <- computeS(X, inds=1:4)
# Simple design matrix case X <- cbind(1, rep(c(0, 1), each=4)) S <- computeS(X) # More complex example xloc <- runif(n=100, min=0, max=10) X <- splines::ns(x=xloc, df=4, intercept=TRUE) S <- computeS(X) S2 <- computeS(X, inds=1:4)
Compute TPRS basis for given spatial coordinates
computeTPRS(coords, maxdf, rearrange = TRUE, intercept = FALSE) arrangeTPRS(tprs, intercept = FALSE)
computeTPRS(coords, maxdf, rearrange = TRUE, intercept = FALSE) arrangeTPRS(tprs, intercept = FALSE)
coords |
Data frame containing the coordinates. |
maxdf |
Largest number of splines to include in TPRS basis |
rearrange |
Logical indicator of whether to rearrange the columns of TPRS basis. |
intercept |
Logical indicator of whether or not to remove the intercept column from the basis when |
tprs |
Matrix of TPRS basis values (from |
computeTPRS
creates a thin-plate regression spline (TPRS) basis from a two-dimensional set of coordinate locations using the mgcv
package.
The output from mgcv
is structured to have the linear terms as the last columns of the matrix. Use arrangeTPRS()
to arrange the matrix columns to be in order of increasing resolution. Specifically, it moves the last two columns to the left of the matrix and the third-from last column, which corresponds to the intercept, is optionally removed.
An -by-
matrix of spline basis functions where
is the number of rows in
coords
and is equal to
maxdf
x <- runif(100) y <- runif(100) mat <- computeTPRS(data.frame(x, y), maxdf=4)
x <- runif(100) y <- runif(100) mat <- computeTPRS(data.frame(x, y), maxdf=4)
Calculates the zero of a function by linear interpolation between the first two points either side of zero.
find_first_zero_cross(x)
find_first_zero_cross(x)
x |
Function values, assumed to be ordered |
Index of first value of x
that lies below 0. Decimal values will be returned using a simple interpolation of the two values straddling 0.
find_zeros_cross
, compute_effective_range
For a set of distance and smoothing matrix values, determines the smallest distance that corresponds with negative value for each column of the smoothing matrix.
find_zeros_cross(D, S)
find_zeros_cross(D, S)
D |
Distance matrix, or a subset of columns from a distance matrix. |
S |
Smoothing matrix, or a subset of columns from a smoothing matrix. |
Vector of length equal to the number of columns in D
and S
. Each value is the smallest observed distance (from a column of D
) that has a negative value in the corresponding column of S
.
Wrapper function for fitting and predicting from loess()
.
fitLoess(y, x, newx = x, span = 0.5, ...)
fitLoess(y, x, newx = x, span = 0.5, ...)
y |
Dependent variable values |
x |
Independent variable values |
newx |
Values of x to use for prediction. |
span |
Controls the amount of smoothing. Passed to |
... |
Additional arguments passed to |
A vector of the same length of newx
providing the predictions from a loess smooth.
x <- seq(0, 5, length=50) y <- cos(4*x) + rnorm(50, sd=0.5) xplot <- seq(0, 5, length=200) lfit <- fitLoess(y=y, x=x, newx=xplot, span=0.2) plot(x, y) points(xplot, lfit, type="l")
x <- seq(0, 5, length=50) y <- cos(4*x) + rnorm(50, sd=0.5) xplot <- seq(0, 5, length=200) lfit <- fitLoess(y=y, x=x, newx=xplot, span=0.2) plot(x, y) points(xplot, lfit, type="l")