--- title: "A Short Introduction to splines2" author: Wenjie Wang date: "`r Sys.Date()`" bibliography: - ../inst/bib/splines2.bib vignette: > %\VignetteIndexEntry{A Short Introduction to splines2} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} output: rmarkdown::html_vignette: number_sections: yes toc: yes --- ```{r setup, echo = FALSE} knitr::opts_knit$set(global.par = TRUE) knitr::opts_chunk$set(fig.width = 7, fig.height = 4) ``` ```{r set-par, echo = FALSE} library(graphics) par(mar = c(2.5, 2.5, 0.5, 0.1), mgp = c(1.5, 0.5, 0)) ```
# Introduction The R package **splines2** is intended to be a user-friendly supplementary package to the base package **splines**. It provides functions to construct a variety of regression spline basis functions that are not available from **splines**. Most functions have a very similar user interface with the function `splines::bs()`. More specifically, **splines2** allows users to construct the basis functions of - B-splines - M-splines - I-splines - C-splines - periodic splines - natural cubic splines - generalized Bernstein polynomials along with their integrals (except C-splines) and derivatives of given order by closed-form recursive formulas. Compared to **splines**, the package **splines2** provides convenient interfaces for spline derivatives with consistent handling on `NA`'s. Most of the implementations are in *C++* with the help of **Rcpp** and **RcppArmadillo** since v0.3.0, which boosted the computational performance. In the remainder of this vignette, we illustrate the basic usage of most functions in the package through examples. We refer readers to [Wang and Yan (2021)](https://dx.doi.org/10.6339/21-JDS1020) for a more formal introduction to the package with applications to shape-restricted regression. See the package manual for more details about function usage. ```{r load-lib} library(splines2) packageVersion("splines2") ```
# B-splines {#bSpline} ## B-spline Basis Functions The `bSpline()` function generates the basis matrix for B-splines and extends the function `bs()` of the package **splines** by providing 1) the piece-wise constant basis functions when `degree = 0`, 2) the derivatives of basis functions for a positive `derivs`, 3) the integrals of basis functions if `integral = TRUE`, 4) periodic basis functions based on B-splines if `periodic = TRUE`. One example of linear B-splines with three internal knots is as follows: ```{r bSpline, fig.cap="B-splines of degree one with three internal knots placed at 0.3, 0.5, and 0.6."} knots <- c(0.3, 0.5, 0.6) x <- seq(0, 1, 0.01) bsMat <- bSpline(x, knots = knots, degree = 1, intercept = TRUE) plot(bsMat, mark_knots = "all") ``` ## Integrals and Derivatives of B-splines For convenience, the package also provides functions `ibs()` and `dbs()` for constructing the B-spline integrals and derivatives, respectively. Two toy examples are as follows: ```{r ibs, fig.cap="Piecewise linear B-splines (left) and their integrals (right)."} ibsMat <- ibs(x, knots = knots, degree = 1, intercept = TRUE) op <- par(mfrow = c(1, 2)) plot(bsMat, mark_knots = "internal") plot(ibsMat, mark_knots = "internal") abline(h = c(0.15, 0.2, 0.25), lty = 2, col = "gray") ``` ```{r dbs, fig.cap="Cubic B-spline (left) and their first derivative (right)."} bsMat <- bSpline(x, knots = knots, intercept = TRUE) dbsMat <- dbs(x, knots = knots, intercept = TRUE) plot(bsMat, mark_knots = "internal") plot(dbsMat, mark_knots = "internal") ``` We may also obtain the derivatives easily by the `deriv()` method as follows: ```{r dbsMat} is_equivalent <- function(a, b) { all.equal(a, b, check.attributes = FALSE) } stopifnot(is_equivalent(dbsMat, deriv(bsMat))) ``` ## Periodic B-splines The function `bSpline()` produces periodic spline basis functions following @piegl1997nurbs [chapter 12] when `periodic = TRUE` is specified. Different from the regular basis functions, the `x` is allowed to be placed outside the boundary and the `Boundary.knots` defines the cyclic interval. For instance, one may obtain the periodic cubic B-spline basis functions with cyclic interval (0, 1) as follows: ```{r pbs} px <- seq(0, 3, 0.01) pbsMat <- bSpline(px, knots = knots, Boundary.knots = c(0, 1), intercept = TRUE, periodic = TRUE) ipMat <- ibs(px, knots = knots, Boundary.knots = c(0, 1), intercept = TRUE, periodic = TRUE) dp1Mat <- deriv(pbsMat) dp2Mat <- deriv(pbsMat, derivs = 2) par(mfrow = c(1, 2)) plot(pbsMat, ylab = "Periodic B-splines", mark_knots = "boundary") plot(ipMat, ylab = "The integrals", mark_knots = "boundary") plot(dp1Mat, ylab = "The 1st derivatives", mark_knots = "boundary") plot(dp2Mat, ylab = "The 2nd derivatives", mark_knots = "boundary") ``` For reference, the corresponding integrals and derivatives are also visualized.
# M-Splines {#mSpline} ## M-spline Basis Functions M-splines [@ramsay1988monotone] can be considered the normalized version of B-splines with unit integral within boundary knots. An example given by @ramsay1988monotone was a quadratic M-splines with three internal knots placed at 0.3, 0.5, and 0.6. The default boundary knots are the range of `x`, and thus 0 and 1 in this example. ```{r mSpline, fig.cap = "Quadratic M-spline with three internal knots placed at 0.3, 0.5, and 0.6."} msMat <- mSpline(x, knots = knots, degree = 2, intercept = TRUE) par(op) plot(msMat, mark_knots = "all") ``` The derivative of the given order of M-splines can be obtained by specifying a positive integer to argument `dervis` of `mSpline()`. Similarly, for an existing `mSpline` object generated by `mSpline()`, one can use the `deriv()` method for derivaitives. For example, the first derivative of the M-splines given in the previous example can be obtained equivalently as follows: ```{r mSpline-derivs} dmsMat1 <- mSpline(x, knots = knots, degree = 2, intercept = TRUE, derivs = 1) dmsMat2 <- deriv(msMat) stopifnot(is_equivalent(dmsMat1, dmsMat2)) ``` ## Periodic M-Splines The `mSpline()` function produces periodic splines based on M-spline basis functions when `periodic = TRUE` is specified. The `Boundary.knots` defines the cyclic interval, which is the same with the periodic B-splines. ```{r pms-basis, fig.cap = "Cubic periodic M-splines."} pmsMat <- mSpline(px, knots = knots, intercept = TRUE, periodic = TRUE, Boundary.knots = c(0, 1)) plot(pmsMat, ylab = "Periodic Basis", mark_knots = "boundary") ``` We may still specify the argument `derivs` in `mSpline()` or use the corresponding `deriv()` method to obtain the derivatives when `periodic = TRUE`. ```{r pms-deriv, fig.cap = "The first derivatives of the periodic M-splines."} dpmsMat <- deriv(pmsMat) plot(dpmsMat, ylab = "The 1st derivatives", mark_knots = "boundary") ``` Furthermore, we can obtain the integrals of the periodic M-splines by specifying `integral = TRUE`. The integral is integrated from the left boundary knot. ```{r pms-integral, fig.cap = "The integrals of the periodic M-splines."} ipmsMat <- mSpline(px, knots = knots, intercept = TRUE, periodic = TRUE, Boundary.knots = c(0, 1), integral = TRUE) plot(ipmsMat, ylab = "Integrals", mark_knots = "boundary") abline(h = seq.int(0, 3), lty = 2, col = "gray") ```
# I-Splines {#iSpline} I-splines [@ramsay1988monotone] are simply the integral of M-splines and thus monotonically nondecreasing with unit maximum value. A monotonically nondecreasing (nonincreasing) function can be fitted by a linear combination of I-spline basis functions with nonnegative (nonpositive) coefficients *plus a constant*, where the coefficient of the constant is unconstrained. The example given by @ramsay1988monotone was the I-splines corresponding to the quadratic M-splines with three internal knots placed at 0.3, 0.5, and 0.6. Notice that the degree of I-splines is defined from the associated M-splines instead of their polynomial degree. ```{r iSpline, fig.cap = "I-splines of degree two with three internal knots placed at 0.3, 0.5, and 0.6."} isMat <- iSpline(x, knots = knots, degree = 2, intercept = TRUE) plot(isMat, mark_knots = "internal") ``` The corresponding M-spline basis matrix can be obtained easily as the first derivatives of the I-splines by the `deriv()` method. ```{r msMat} stopifnot(is_equivalent(msMat, deriv(isMat))) ``` We may specify `derivs = 2` in the `deriv()` method for the second derivatives of the I-splines, which are equivalent to the first derivatives of the corresponding M-splines. ```{r dmsMat} dmsMat3 <- deriv(isMat, 2) stopifnot(is_equivalent(dmsMat1, dmsMat3)) ```
# C-Splines {#cSpline} Convex splines [@meyer2008inference] called C-splines are scaled integrals of I-splines with unit maximum value at the right boundary knot. @meyer2008inference applied C-splines to shape-restricted regression analysis. The monotone (nondecreasing) property of I-spines ensures the convexity of C-splines. A convex regression function can be estimated using linear combinations of the C-spline basis functions with nonnegative coefficients, plus an unconstrained linear combination of a constant and an identity function $g(x)=x$. If the underlying regression function is both increasing and convex, the coefficient on the identity function is restricted to be nonnegative as well. We may specify the argument `scale = FALSE` in the function `cSpline()` to disable the scaling of the integrals of I-splines. Then the actual integrals of the corresponding I-splines will be returned. If `scale = TRUE` (by default), each C-spline basis is scaled to have unit height at the right boundary knot. ```{r cSpline-scaled, fig.cap = "C-splines of degree two with three internal knots placed at 0.3, 0.5, and 0.6."} csMat1 <- cSpline(x, knots = knots, degree = 2, intercept = TRUE) plot(csMat1) abline(h = 1, v = knots, lty = 2, col = "gray") ``` Similarly, the `deriv()` method can be used to obtain the derivatives. A nested call of `deriv()` is supported for derivatives of a higher order. However, the argument `derivs` of the `deriv()` method can be specified directly for better computational performance. For example, the first and second derivatives can be obtained by the following equivalent approaches, respectively. ```{r cSpline-not-scaled} csMat2 <- cSpline(x, knots = knots, degree = 2, intercept = TRUE, scale = FALSE) stopifnot(is_equivalent(isMat, deriv(csMat2))) stopifnot(is_equivalent(msMat, deriv(csMat2, 2))) stopifnot(is_equivalent(msMat, deriv(deriv(csMat2)))) ``` # Generalized Bernstein Polynomials The Bernstein polynomials are equivalent to B-splines without internal knots and have also been applied to shape-constrained regression analysis [e.g., @wang2012csda]. The $i$-th basis of the generalized Bernstein polynomials of degree $n$ over $[a, b]$ is defined as follows: $$ B_i^n(x)=\frac{1}{(b-a)^n}{n\choose i}(x-a)^i (b-x)^{n-i},~i\in\{0,\ldots,n\}, $$ where $a\le x\le b$. It reduces to regular Bernstein polynomials defined over $[0, 1]$ when $a = 0$ and $b = 1$. We may obtain the basis matrix of the generalized using the function `bernsteinPoly()`. For example, the Bernstein polynomials of degree 4 over $[0, 1]$ and is generated as follows: ```{r bp-1, fig.cap = "Bernstein polynomials of degree 4 over [0, 1] (left) and the generalized version over [- 1, 1] (right)."} x1 <- seq.int(0, 1, 0.01) x2 <- seq.int(- 1, 1, 0.01) bpMat1 <- bernsteinPoly(x1, degree = 4, intercept = TRUE) bpMat2 <- bernsteinPoly(x2, degree = 4, intercept = TRUE) par(mfrow = c(1, 2)) plot(bpMat1) plot(bpMat2) ``` In addition, we may specify `integral = TRUE` or `derivs = 1` in `bernsteinPoly()` for their integrals or first derivatives, respectively. ```{r bp-2, fig.height=6, fig.cap = "The integrals (upper panel) and the first derivatives (lower panel) of Bernstein polynomials of degree 4."} ibpMat1 <- bernsteinPoly(x1, degree = 4, intercept = TRUE, integral = TRUE) ibpMat2 <- bernsteinPoly(x2, degree = 4, intercept = TRUE, integral = TRUE) dbpMat1 <- bernsteinPoly(x1, degree = 4, intercept = TRUE, derivs = 1) dbpMat2 <- bernsteinPoly(x2, degree = 4, intercept = TRUE, derivs = 1) par(mfrow = c(2, 2)) plot(ibpMat1, ylab = "Integrals") plot(ibpMat2, ylab = "Integrals") plot(dbpMat1, ylab = "Derivatives") plot(dbpMat2, ylab = "Derivatives") ``` Similarly, we may also use the `deriv()` method to get derivatives of an existing `bernsteinPoly` object. ```{r bp-deriv} stopifnot(is_equivalent(dbpMat1, deriv(bpMat1))) stopifnot(is_equivalent(dbpMat2, deriv(bpMat2))) stopifnot(is_equivalent(dbpMat1, deriv(ibpMat1, 2))) stopifnot(is_equivalent(dbpMat2, deriv(ibpMat2, 2))) ```
# Natural Cubic Splines ## Nonnegative Natural Cubic Basis Functions The package provides two variants of the natural cubic splines that can be constructed by `naturalSpline()` and `nsk()`, respectively, both of which are different from `splines::ns()`. The `naturalSpline()` function returns nonnegative basis functions (within the boundary) for natural cubic splines by utilizing a closed-form null space derived from the second derivatives of cubic B-splines. When `integral = TRUE`, the function `naturalSpline()` returns the integral of each natural spline basis. ```{r ns-basis, fig.cap = "Nonnegative natural cubic splines (left) and corresponding integrals (right)."} nsMat <- naturalSpline(x, knots = knots, intercept = TRUE) insMat <- naturalSpline(x, knots = knots, intercept = TRUE, integral = TRUE) par(mfrow = c(1, 2)) plot(nsMat, ylab = "Basis") plot(insMat, ylab = "Integrals") stopifnot(is_equivalent(nsMat, deriv(insMat))) ``` Similarly, one may directly specify the argument `derivs` in `naturalSpline()` or use the corresponding `deriv()` method to obtain the derivatives of spline basis functions. ```{r ns-deriv, fig.cap = "The derivatives of natural cubic splines."} d1nsMat <- naturalSpline(x, knots = knots, intercept = TRUE, derivs = 1) d2nsMat <- deriv(nsMat, 2) matplot(x, d1nsMat, type = "l", ylab = "The 1st derivatives") matplot(x, d2nsMat, type = "l", ylab = "The 2nd derivatives") ``` ## Natural Cubic Basis Functions with Unit Heights at Knots The function `nsk()` produces another variant of natural cubic splines, where only one of the spline basis functions is nonzero with unit height at every boundary and internal knot. As a result, the coefficients of the basis functions are the values of the spline function at the knots, which makes it more straightforward to interpret the coefficient estimates. This idea originated from the function `nsk()` of the **survival** package (introduced in version 3.2-8). The implementation of the `nsk()` of **splines2** essentially follows the `survival::nsk()` function. One noticeable argument for `nsk()` is `trim` (equivalent to the argument `b` of `survival::nsk()`). One may specify `trim = 0.05` to exclude 5% of the data from both sides when setting the boundary knots, which can be a more sensible choice in practice due to possible outliers. The `trim` argument is also available for `naturalSpline()`, which however is zero by default for backward compatibility. An illustration of the basis functions generated by `nsk()` is as follows: ```{r nsk} nskMat <- nsk(x, knots = knots, intercept = TRUE) par(op) plot(nskMat, ylab = "nsk()", mark_knots = "all") abline(h = 1, col = "red", lty = 3) ``` We can visually verify that only one basis function takes a value of one at each knot.
# Helper and Alias Functions ## Update Spline's Specification by `update()` {#update} The `update()` function is an S3 method to generate spline basis functions with new `x`, `degree`, `knots`, or `df` specifications. The first argument is an existing `splines2` object and additional named arguments will be passed to the corresponding functions to update the spline basis functions. Suppose we want to add two more knots to `nskMat` for natural cubic spline basis functions and exclude 5% of the data from both sides in total when placing the boundary knots. We can utilize the `update()` method as follows: ```{r update-nsk} nskMat2 <- update(nskMat, knots = c(knots, 0.2, 0.4), trim = 0.025) knots(nskMat2) stopifnot(all.equal(quantile(x, c(0.025, 0.975), names = FALSE), knots(nskMat2, "boundary"))) ``` ## Evaluation by `predict()` {#predict} The `predict()` method for `splines2` objects allows one to evaluate the spline function if a coefficient vector is specified via the `coef` argument. In addition, it internally calls the `update()` method to update the basis functions before computing the spline function, which can be useful to get the derivatives of the spline function. If the `coef` argument is not specified, the `predict()` method will be equivalent to the `update()` method. For instance, we can compute the first derivative of the I-spline function from the previous example with a coefficient vector `seq(0.1, by = 0.1, length.out = ncol(isMat))` at $x = (0.275, 0.525, 0.8)$ as follows: ```{r predict} new_x <- c(0.275, 0.525, 0.8) names(new_x) <- paste0("x=", new_x) (isMat2 <- predict(isMat, newx = new_x)) # the basis functions at new x stopifnot(all.equal(predict(isMat, newx = new_x), update(isMat, x = new_x))) ## compute the first derivative of the I-spline function in different ways beta <- seq(0.1, by = 0.1, length.out = ncol(isMat)) deriv_ispline1 <- predict(isMat, newx = new_x, coef = beta, derivs = 1) deriv_ispline2 <- predict(update(isMat, x = new_x, derivs = 1), coef = beta) deriv_ispline3 <- c(predict(deriv(isMat), newx = new_x) %*% beta) stopifnot(all.equal(deriv_ispline1, deriv_ispline2)) stopifnot(all.equal(deriv_ispline2, deriv_ispline3)) ``` ## Visualization by `plot()` {#plot} As one may notice in the previous examples, we may visualize the spline basis functions easily with the `plot()` method. By default, the spline basis functions are visualized at 101 equidistant grid points within the range of `x`, which can be tweaked by arguments `from`, `to`, and `n`. In addition, we can indicate the placement of knots by vertical lines through the argument `mark_knots`. The available options are `"none"`, `"internal"`, `"boundary"`, and `"all"`. A fitted spline function can be visualized by specifying the argument `coef`. An example of `nsk()` is as follows: ```{r plot-coef} beta <- seq.int(0.2, length.out = ncol(nskMat), by = 0.2) plot(nskMat, ylab = "nsk()", mark_knots = "all", coef = beta) abline(h = beta, col = seq_along(beta), lty = 3) ``` ## Including Spline Basis Functions in Model Formulas It is common to directly include spline basis functions in a model formula. To avoid a lengthy model formula, the package provides alias functions that are summarized in the following table: | Function | Equivalent Alias | |-------------------|------------------| | `bSpline()` | `bsp()` | | `mSpline()` | `msp()` | | `iSpline()` | `isp()` | | `cSpline()` | `csp()` | | `bernsteinPoly()` | `bpoly()` | | `naturalSpline()` | `nsp()` | One may create new alias functions. For example, we can create a new alias function simply named `b()` for B-splines and obtain equivalent models as follows: ```{r formula-alias} b <- bSpline # create an alias for B-splines mod1 <- lm(weight ~ b(height, degree = 1, df = 3), data = women) iknots <- with(women, knots(bSpline(height, degree = 1, df = 3))) mod2 <- lm(weight ~ bSpline(height, degree = 1, knots = iknots), data = women) pred1 <- predict(mod1, head(women, 10)) pred2 <- predict(mod2, head(women, 10)) stopifnot(all.equal(pred1, pred2)) ``` Nevertheless, there is a possible pitfall when using a customized wrapper function for spline basis functions along with a data-dependent placement of knots. When we make model predictions for a given new data, the placement of the internal/boundary knots can be different from the original placement that depends on the training set. As a result, the spline basis functions generated for prediction may not be the same as the counterparts used in the model fitting. A simple example is as follows: ```{r formula-wrap-failed} ## generates quadratic spline basis functions based on log(x) qbs <- function(x, ...) { splines2::bSpline(log(x), ..., degree = 2) } mod3 <- lm(weight ~ qbs(height, df = 5), data = women) mod4 <- lm(weight ~ bsp(log(height), degree = 2, df = 5), data = women) stopifnot(all.equal(unname(coef(mod3)), unname(coef(mod4)))) # the same coef pred3 <- predict(mod3, head(women, 10)) pred4 <- predict(mod4, head(women, 10)) all.equal(pred3, pred4) pred0 <- predict(qbs(women$height, df = 5), newx = head(log(women$height), 10), coef = coef(mod3)[- 1]) + coef(mod3)[1] stopifnot(all.equal(pred0, pred4, check.names = FALSE)) ``` Although the coefficient estimates are the same, the prediction results by using the `predict.lm()` differ. Using an alias function in the model formula produces correct results. To resolve this issue, we can create an S3 method for `stats::makepredictcall()` as follows: ```{r predict-qbs} ## generates quadratic spline basis functions based on log(x) with a new class qbs <- function(x, ...) { res <- splines2::bSpline(log(x), ..., degree = 2) class(res) <- c("qbs", class(res)) return(res) } ## a utility to help model.frame() create the right matrices makepredictcall.qbs <- function(var, call) { if (as.character(call)[1L] == "qbs" || (is.call(call) && identical(eval(call[[1L]]), qbs))) { at <- attributes(var)[c("knots", "Boundary.knots", "intercept", "periodic", "derivs", "integral")] call <- call[1L:2L] call[names(at)] <- at } call } ## the same example mod3 <- lm(weight ~ qbs(height, df = 5), data = women) mod4 <- lm(weight ~ bsp(log(height), degree = 2, df = 5), data = women) stopifnot(all.equal(unname(coef(mod3)), unname(coef(mod4)))) # the same coef pred3 <- predict(mod3, head(women, 10)) pred4 <- predict(mod4, head(women, 10)) all.equal(pred3, pred4) # should be TRUE this time ``` ## Extract Specifications by `$` The basis specifications are saved as attributes of the returned *splines2* objects, which means that we can extract one of the specifications by `attr()`. Alternatively, we can treat *splines2* objects as lists and use the corresponding `$` method. For example, it is straightforward to extract the specified `trim` of `nskMat2` by `attr(nskMat2, "trim")` or simply `nskMat2$trim`. ```{r extract} c(nskMat2$trim, attr(nskMat2, "trim")) ```
# Reference {-}