| Title: | Fast Local Polynomial Regression and Kernel Density Estimation |
|---|---|
| Description: | Non-Uniform Fast Fourier Transform ('NUFFT')-accelerated local polynomial regression and kernel density estimation for large, scattered, or complex-valued datasets. Provides automatic bandwidth selection via Generalized Cross-Validation (GCV) for regression and Likelihood Cross-Validation (LCV) for density estimation. This is the 'R' port of the 'fastLPR' 'MATLAB'/'Python' toolbox, achieving O(N + M log M) computational complexity through custom 'NUFFT' implementation with Gaussian gridding. Supports 1D/2D/3D data, complex-valued responses, heteroscedastic variance estimation, and confidence interval computation. Performance optimized with vectorized 'R' code and compiled helpers via 'Rcpp'/'RcppArmadillo'. Extends the 'FKreg' toolbox of Wang et al. (2022) <doi:10.48550/arXiv.2204.07716> with 'Python' and 'R' ports. Applied in Li et al. (2022) <doi:10.1016/j.neuroimage.2022.119190>. Uses 'NUFFT' methods based on Greengard and Lee (2004) <doi:10.1137/S003614450343200X>, binning-accelerated kernel estimation of Wand (1994) <doi:10.1080/10618600.1994.10474656>, and local polynomial regression framework of Fan and Gijbels (1996, ISBN:978-0412983214). |
| Authors: | Ying Wang [aut, cre], Min Li [aut] |
| Maintainer: | Ying Wang <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.1 |
| Built: | 2026-05-21 09:46:29 UTC |
| Source: | https://github.com/cran/fastlpr |
Performs kernel density estimation using NUFFT acceleration with automatic bandwidth selection via Likelihood Cross-Validation (LCV).
cv_fastkde(x, h = NULL, opt = NULL)cv_fastkde(x, h = NULL, opt = NULL)
x |
N x d matrix of data points (N samples, d dimensions). |
h |
Bandwidth parameter(s) or grid for LCV selection. |
opt |
Options list with fields similar to cv_fastlpr. |
KDE results list containing estimated density, evaluation grid, and LCV results.
x <- matrix(rnorm(200), ncol = 1) kde <- cv_fastkde(x)x <- matrix(rnorm(200), ncol = 1) kde <- cv_fastkde(x)
This is the main function for fastLPR toolbox. It performs nonparametric regression using kernel-weighted local polynomial methods with NUFFT (Non-Uniform Fast Fourier Transform) acceleration. The function automatically selects the optimal bandwidth via Generalized Cross-Validation (GCV) when multiple bandwidth candidates are provided.
cv_fastlpr(x, y, h = NULL, opt = NULL)cv_fastlpr(x, y, h = NULL, opt = NULL)
x |
N x d matrix of predictors (N samples, d dimensions). Can be real or complex-valued. Each row is one observation. For complex-valued predictors, use x = real + 1i*imag. |
y |
N x 1 vector of responses. Can be real or complex-valued. Must have same number of rows as x. |
h |
Bandwidth parameter(s) (optional, default: automatic selection). Scalar: same bandwidth for all dimensions. 1 x d vector: different bandwidth per dimension. k x d matrix: grid of k bandwidth combinations for GCV selection. Use get_hlist() to generate bandwidth candidates. |
opt |
Options list (optional) with fields:
|
Regression results list containing:
Fitted values at evaluation grid points (N_grid x 1)
Interpolant object for prediction at any point. Use: y_pred = regs$fpp_yhat(x_new)
GCV results list (if multiple bandwidths provided)
Selected bandwidth (1-SE rule)
Bandwidth with minimum GCV
GCV values for all bandwidths
Evaluation grid points (list for multi-dimensional)
Grid vectors (list)
Options used for regression
Original predictor data
Original response data
Ying Wang, Min Li
Wang, Y., & Li, M. (2024). Fast and Exact Kernel-Weighted Regression for Large-Scale Scattered and Complex-Valued Data. Journal of Statistical Software (under review).
cv_fastkde, get_hlist, fastlpr_interval, fastlpr_plot
# Example 1: 1D regression with automatic bandwidth selection x <- matrix(runif(500) * 20, ncol = 1) y <- sin(x) + 0.2 * rnorm(500) hlist <- get_hlist(20, c(0.01, 1), "logspace") opt <- list(order = 1) # Local linear regs <- cv_fastlpr(x, y, hlist, opt) # Plot results plot(x, y, pch = ".", col = "black") # Note: fpp_yhat is an interpolation function x_pred <- seq(min(x), max(x), length.out = 100) y_pred <- regs$fpp_yhat(x_pred) lines(x_pred, y_pred, col = "blue", lwd = 2)# Example 1: 1D regression with automatic bandwidth selection x <- matrix(runif(500) * 20, ncol = 1) y <- sin(x) + 0.2 * rnorm(500) hlist <- get_hlist(20, c(0.01, 1), "logspace") opt <- list(order = 1) # Local linear regs <- cv_fastlpr(x, y, hlist, opt) # Plot results plot(x, y, pch = ".", col = "black") # Note: fpp_yhat is an interpolation function x_pred <- seq(min(x), max(x), length.out = 100) y_pred <- regs$fpp_yhat(x_pred) lines(x_pred, y_pred, col = "blue", lwd = 2)
Visualizes kernel density estimation results.
fastkde_plot(fpp, x_range = NULL, n_points = 1000, ...)fastkde_plot(fpp, x_range = NULL, n_points = 1000, ...)
fpp |
KDE interpolator from cv_fastkde. |
x_range |
Optional range for plotting. |
n_points |
Number of points for plotting (default: 1000). |
... |
Additional plotting parameters. |
Called for its side effect (plotting). Returns NULL invisibly.
x <- matrix(rnorm(200), ncol = 1) kde <- cv_fastkde(x) fastkde_plot(kde$fpp)x <- matrix(rnorm(200), ncol = 1) kde <- cv_fastkde(x) fastkde_plot(kde$fpp)
Visualizes the bandwidth selection process for kernel density estimation, showing the LCV score as a function of bandwidth with the selected optimum marked.
fastkde_plot_bandwidth(kde, main = "Bandwidth Selection (LCV)", xlab = NULL, ylab = NULL, ...)fastkde_plot_bandwidth(kde, main = "Bandwidth Selection (LCV)", xlab = NULL, ylab = NULL, ...)
kde |
A |
main |
Title for the plot. |
xlab |
Label for the x-axis (default: auto-generated). |
ylab |
Label for the y-axis (default: auto-generated). |
... |
Additional arguments passed to |
Called for its side effect (plotting). Returns NULL invisibly.
x <- matrix(rnorm(500), ncol = 1) hlist <- get_hlist(20, c(0.05, 2)) kde <- cv_fastkde(x, hlist) fastkde_plot_bandwidth(kde)x <- matrix(rnorm(500), ncol = 1) hlist <- get_hlist(20, c(0.05, 2)) kde <- cv_fastkde(x, hlist) fastkde_plot_bandwidth(kde)
Computes pointwise confidence intervals for the conditional mean or prediction intervals for new observations, using the mean and variance estimates from local polynomial regression.
fastlpr_interval(mu, sigma, alpha = 0.05, type = "confidence")fastlpr_interval(mu, sigma, alpha = 0.05, type = "confidence")
mu |
Mean regression result from |
sigma |
Variance regression result from |
alpha |
Significance level (default: 0.05 for 95% intervals). |
type |
Type of interval: |
List with lower and upper interval bounds.
cv_fastlpr, fastlpr_plot_interval
x <- matrix(runif(200), ncol = 1) y <- sin(2 * pi * x) + rnorm(200, sd = 0.2) mu <- cv_fastlpr(x, y) sigma <- cv_fastlpr(x, (y - mu$yhat)^2) ci <- fastlpr_interval(mu, sigma)x <- matrix(runif(200), ncol = 1) y <- sin(2 * pi * x) + rnorm(200, sd = 0.2) mu <- cv_fastlpr(x, y) sigma <- cv_fastlpr(x, (y - mu$yhat)^2) ci <- fastlpr_interval(mu, sigma)
Visualizes fitted regression results.
fastlpr_plot(fpp_yhat, x_range = NULL, n_points = 1000, ...)fastlpr_plot(fpp_yhat, x_range = NULL, n_points = 1000, ...)
fpp_yhat |
Interpolant function or result object. |
x_range |
Optional range for plotting. |
n_points |
Number of points for plotting (default: 1000). |
... |
Additional plotting parameters. |
Called for its side effect (plotting). Returns NULL invisibly.
x <- matrix(runif(200), ncol = 1) y <- sin(2 * pi * x) + rnorm(200, sd = 0.2) fit <- cv_fastlpr(x, y) fastlpr_plot(fit$fpp_yhat)x <- matrix(runif(200), ncol = 1) y <- sin(2 * pi * x) + rnorm(200, sd = 0.2) fit <- cv_fastlpr(x, y) fastlpr_plot(fit$fpp_yhat)
Visualizes confidence or prediction interval bands for local polynomial regression estimates. Adds shaded interval regions to an existing plot.
fastlpr_plot_interval(ci, col = "green", alpha = 0.2, add = TRUE, ...)fastlpr_plot_interval(ci, col = "green", alpha = 0.2, add = TRUE, ...)
ci |
An interval structure returned by |
col |
Color for the interval band (default: |
alpha |
Transparency level for the band (default: 0.2). |
add |
Logical; if |
... |
Additional arguments passed to plotting functions. |
Called for its side effect (plotting). Returns NULL invisibly.
set.seed(42) x <- sort(runif(200)) y <- sin(2 * pi * x) + rnorm(200, sd = 0.3) hlist <- get_hlist(20, c(0.01, 0.5)) regs <- cv_fastlpr(x, y, hlist)set.seed(42) x <- sort(runif(200)) y <- sin(2 * pi * x) + rnorm(200, sd = 0.3) hlist <- get_hlist(20, c(0.01, 0.5)) regs <- cv_fastlpr(x, y, hlist)
Evaluates fitted regression at new predictor values.
fastlpr_predict(regs, x_new)fastlpr_predict(regs, x_new)
regs |
Regression results from cv_fastlpr. |
x_new |
New predictor values (matrix). |
Predicted response values.
x <- matrix(runif(200), ncol = 1) y <- sin(2 * pi * x) + rnorm(200, sd = 0.2) fit <- cv_fastlpr(x, y) x_new <- matrix(seq(0, 1, length.out = 50), ncol = 1) yhat <- fastlpr_predict(fit, x_new)x <- matrix(runif(200), ncol = 1) y <- sin(2 * pi * x) + rnorm(200, sd = 0.2) fit <- cv_fastlpr(x, y) x_new <- matrix(seq(0, 1, length.out = 50), ncol = 1) yhat <- fastlpr_predict(fit, x_new)
Creates a grid of bandwidth candidates for cross-validation. Port from MATLAB's get_hlist.m (unified API v2.0).
get_hlist(n, range, spacing = "logspace")get_hlist(n, range, spacing = "logspace")
n |
Number of bandwidth candidates per dimension. Scalar: same number of points for all dimensions. Vector: specify number of points per dimension. Typical values: 20 for 1D, c(15, 15) for 2D. |
range |
Range of bandwidths (min, max) for each dimension. 1D: c(h_min, h_max) or matrix with 1 row. Multi-D: matrix with one row per dimension. Rule of thumb: h_min = 0.1 * sd(x), h_max = 1.0 * sd(x). |
spacing |
Spacing function type (default: "logspace"). "logspace": logarithmic spacing (better for exploring bandwidth scales). "linear": linear spacing. |
Matrix of bandwidth candidates (n_total x d). Each row is one bandwidth combination.
# 1D: 20 bandwidths from 0.01 to 1 (log scale, default) hlist <- get_hlist(20, c(0.01, 1)) # 2D: 10x10 grid of bandwidths hlist <- get_hlist(10, rbind(c(0.01, 1), c(0.01, 1))) # Linear spacing hlist <- get_hlist(10, c(0.5, 0.6), "linear")# 1D: 20 bandwidths from 0.01 to 1 (log scale, default) hlist <- get_hlist(20, c(0.01, 1)) # 2D: 10x10 grid of bandwidths hlist <- get_hlist(10, rbind(c(0.01, 1), c(0.01, 1))) # Linear spacing hlist <- get_hlist(10, c(0.5, 0.6), "linear")
Returns information about the Rcpp backend including version and available acceleration features.
get_rcpp_info()get_rcpp_info()
A list with Rcpp version and capabilities if available, NULL otherwise.
get_rcpp_info()get_rcpp_info()
Tests whether an object is of class fastkde_result.
is_fastkde_result(x)is_fastkde_result(x)
x |
An R object to test. |
TRUE if x inherits from class "fastkde_result",
FALSE otherwise.
is_fastkde_result(list()) # FALSEis_fastkde_result(list()) # FALSE
Tests whether an object is of class fastlpr_result.
is_fastlpr_result(x)is_fastlpr_result(x)
x |
An R object to test. |
TRUE if x inherits from class "fastlpr_result",
FALSE otherwise.
is_fastlpr_result(list()) # FALSEis_fastlpr_result(list()) # FALSE
Utility function to generate multi-dimensional grids.
multispace(x_min, x_max, N, space = "linear", type = "array", transpose = FALSE)multispace(x_min, x_max, N, space = "linear", type = "array", transpose = FALSE)
x_min |
Minimum values per dimension. |
x_max |
Maximum values per dimension. |
N |
Grid size per dimension. |
space |
Grid spacing: |
type |
Output type: |
transpose |
Whether to transpose output (default: FALSE). |
List of grid vectors.
grid <- multispace(0, 1, 50)grid <- multispace(0, 1, 50)
Tests whether the compiled Rcpp code is loaded and functional. This is useful for checking if C++ acceleration will be used.
rcpp_available()rcpp_available()
Logical. TRUE if Rcpp functions are available, FALSE otherwise.
rcpp_available()rcpp_available()
Utility function to set a single default option field.
set_defaults(opt, field, default)set_defaults(opt, field, default)
opt |
User-provided options list. |
field |
Field name to set. |
default |
Default value for the field. |
Merged options list.
opt <- list(order = 1) opt <- set_defaults(opt, "kernel", "gaussian")opt <- list(order = 1) opt <- set_defaults(opt, "kernel", "gaussian")
Standardizes data to zero mean and unit variance.
zscore(x)zscore(x)
x |
Numeric vector or matrix. |
Standardized data with attributes for mean and standard deviation.
result <- zscore(rnorm(100, mean = 5, sd = 2)) mean(result$z) # approximately 0result <- zscore(rnorm(100, mean = 5, sd = 2)) mean(result$z) # approximately 0