This package is considered a duplicate. The official version of this package is found at:https://collinerickson.r-universe.dev/GauPro

Package: GauPro 0.2.14

Collin Erickson

GauPro: Gaussian Process Fitting

Fits a Gaussian process model to data. Gaussian processes are commonly used in computer experiments to fit an interpolating model. The model is stored as an 'R6' object and can be easily updated with new data. There are options to run in parallel, and 'Rcpp' has been used to speed up calculations. For more info about Gaussian process software, see Erickson et al. (2018) <doi:10.1016/j.ejor.2017.10.002>.

Authors:Collin Erickson [aut, cre]

GauPro_0.2.14.tar.gz
GauPro_0.2.14.tar.gz(r-4.5-noble)GauPro_0.2.14.tar.gz(r-4.4-noble)
GauPro_0.2.14.tgz(r-4.4-emscripten)GauPro_0.2.14.tgz(r-4.3-emscripten)
GauPro.pdf |GauPro.html✨
GauPro/json (API)
NEWS

# Install 'GauPro' in R:

install.packages('GauPro', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker:https://github.com/collinerickson/gaupro/issues0 issues

Uses libs:

openblas– Optimized BLAS
c++– GNU Standard C++ Library v3
openmp– GCC OpenMP (GOMP) support library

On CRAN:

openblas cpp openmp

4.35 score 1 packages 616 downloads 1 mentions 71 exports 63 dependencies

Last updated 24 days agofrom:c3e812a023. Checks:3 OK. Indexed: no.

Target	Result	Latest binary
Doc / Vignettes	OK	Mar 09 2025
R-4.5-linux-x86_64	OK	Mar 09 2025
R-4.4-linux-x86_64	OK	Mar 09 2025

Exports:arma_mult_cube_vec corr_cubic_matrix_symC corr_exponential_matrix_symC corr_gauss_dCdX corr_gauss_matrix corr_gauss_matrix_armaC corr_gauss_matrix_sym_armaC corr_gauss_matrix_symC corr_gauss_matrixC corr_latentfactor_matrix_symC corr_latentfactor_matrixmatrixC corr_matern32_matrix_symC corr_matern52_matrix_symC corr_orderedfactor_matrix_symC corr_orderedfactor_matrixmatrixC Cubic Exponential FactorKernel GauPro GauPro_base GauPro_Gauss GauPro_Gauss_LOO GauPro_kernel_model GauPro_kernel_model_LOO Gaussian Gaussian_devianceC Gaussian_hessianC Gaussian_hessianCC Gaussian_hessianR GowerFactorKernel gpkm gradfuncarray gradfuncarrayR IgnoreIndsKernel k_Cubic k_Exponential k_FactorKernel k_Gaussian k_GowerFactorKernel k_IgnoreIndsKernel k_LatentFactorKernel k_Matern32 k_Matern52 k_OrderedFactorKernel k_Periodic k_PowerExp k_RatQuad k_Triangle k_White kernel_cubic_dC kernel_exponential_dC kernel_gauss_dC kernel_latentFactor_dC kernel_matern32_dC kernel_matern52_dC kernel_orderedFactor_dC kernel_product kernel_sum LatentFactorKernel Matern32 Matern52 OrderedFactorKernel Periodic PowerExp RatQuad sqrt_matrix trend_0 trend_c trend_LM Triangle White

Dependencies:base64enc bslib cachem cli colorspace cpp11 digest dplyr evaluate fansi farver fastmap fontawesome fs generics ggplot2 glue gtable highr htmltools isoband jquerylib jsonlite knitr labeling lattice lbfgs lifecycle magrittr MASS Matrix memoise mgcv mime mixopt munsell nlme numDeriv pillar pkgconfig purrr R6 rappdirs RColorBrewer Rcpp RcppArmadillo rlang rmarkdown sass scales splitfngr stringi stringr tibble tidyr tidyselect tinytex utf8 vctrs viridisLite withr xfun yaml

A Guide to the GauPro R package

Collin Erickson

Rendered fromGauPro.Rmdusingknitr::rmarkdownon Mar 09 2025.

Last update: 2025-03-09
Started: 2017-08-17

Derivatives for estimating Gaussian process parameters

Collin Erickson

Rendered fromderivatives.Rmdusingknitr::rmarkdownon Mar 09 2025.

Last update: 2024-09-27
Started: 2017-08-17

Introduction to Gaussian Processes

Collin Erickson

Rendered fromIntroductionToGPs.Rmdusingknitr::rmarkdownon Mar 09 2025.

Last update: 2025-03-09
Started: 2017-08-17

Leave-one-out cross-validation and error correction

Collin Erickson

Rendered fromCrossValidationErrorCorrection.Rmdusingknitr::rmarkdownon Mar 09 2025.

Last update: 2022-11-14
Started: 2017-08-17

Spatial derivatives of Gaussian process models

Collin Erickson

Rendered fromsurface_derivatives.Rmdusingknitr::rmarkdownon Mar 09 2025.

Last update: 2025-03-09
Started: 2017-08-17

Citation

To cite package ‘GauPro’ in publications use:

Erickson C (2025). GauPro: Gaussian Process Fitting. R package version 0.2.14, https://CRAN.R-project.org/package=GauPro.

Corresponding BibTeX entry:

  @Manual{,
    title = {GauPro: Gaussian Process Fitting},
    author = {Collin Erickson},
    year = {2025},
    note = {R package version 0.2.14},
    url = {https://CRAN.R-project.org/package=GauPro},
  }

Readme and manuals

README

GauPro

GauPro

Overview

This package allows you to fit a Gaussian process regression model to a dataset. A Gaussian process (GP) is a commonly used model in computer simulation. It assumes that the distribution of any set of points is multivariate normal. A major benefit of GP models is that they provide uncertainty estimates along with their predictions.

Installation

You can install like any other package through CRAN.

install.packages('GauPro')

The most up-to-date version can be downloaded from my Github account.

# install.packages("devtools")
devtools::install_github("CollinErickson/GauPro")

Example in 1-Dimension

This simple shows how to fit the Gaussian process regression model to data. The function gpkm creates a Gaussian process kernel model fit to the given data.

library(GauPro)
#> Loading required package: mixopt
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> Loading required package: ggplot2
#> Loading required package: splitfngr
#> Loading required package: numDeriv
#> Loading required package: rmarkdown
#> Loading required package: tidyr

n <- 12
x <- seq(0, 1, length.out = n)
y <- sin(6*x^.8) + rnorm(n,0,1e-1)
gp <- gpkm(x, y)
#> * Argument 'kernel' is missing. It has been set to 'matern52'. See documentation for more details.

Plotting the model helps us understand how accurate the model is and how much uncertainty it has in its predictions. The green and red lines are the 95% intervals for the mean and for samples, respectively.

gp$plot1D()

Factor data: fitting the diamonds dataset`diamonds`

The model fit using gpkm can also be used with data/formula input and can properly handle factor data.

In this example, the diamonds data set is fit by specifying the formula and passing a data frame with the appropriate columns.

library(ggplot2)
diamonds_subset <- diamonds[sample(1:nrow(diamonds), 60), ]
dm <- gpkm(price ~ carat + cut + color + clarity + depth,
           diamonds_subset)
#> * Argument 'kernel' is missing. It has been set to 'matern52'. See documentation for more details.

Calling summary on the model gives details about the model, including diagnostics about the model fit and the relative importance of the features.

summary(dm)
#> Formula:
#>   price ~ carat + cut + color + clarity + depth 
#> 
#> Residuals:
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#> -6589.09  -217.68    37.85  -165.28   181.42  1619.37 
#> 
#> Feature importance:
#>   carat     cut   color clarity   depth 
#>  1.5497  0.2130  0.3275  0.3358  0.0003 
#> 
#> AIC: 1008.96 
#> 
#> Pseudo leave-one-out R-squared       :   0.901367 
#> Pseudo leave-one-out R-squared (adj.):   0.8427204 
#> 
#> Leave-one-out coverage on 60 samples (small p-value implies bad fit):
#>  68%:     0.7        p-value:   0.7839 
#>  95%:    0.95        p-value:   1

We can also plot the model to get a visual idea of how each input affects the output.

plot(dm)

Constructing a kernel

In this case, the kernel was chosen automatically by looking at which dimensions were continuous and which were discrete, and then using a Matern 5/2 on the continuous dimensions (1,5), and separate ordered factor kernels on the other dimensions since those columns in the data frame are all ordinal factors. We can construct our own kernel using products and sums of any kernels, making sure that the continuous kernels ignore the factor dimensions.

Suppose we want to construct a kernel for this example that uses the power exponential kernel for the two continuous dimensions, ordered kernels on cut and color, and a Gower kernel on clarity. First we construct the power exponential kernel that ignores the 3 factor dimensions. Then we construct

cts_kernel <- k_IgnoreIndsKernel(k=k_PowerExp(D=2), ignoreinds = c(2,3,4))
factor_kernel2 <- k_OrderedFactorKernel(D=5, xindex=2, nlevels=nlevels(diamonds_subset[[2]]))
factor_kernel3 <- k_OrderedFactorKernel(D=5, xindex=3, nlevels=nlevels(diamonds_subset[[3]]))
factor_kernel4 <- k_GowerFactorKernel(D=5, xindex=4, nlevels=nlevels(diamonds_subset[[4]]))

# Multiply them
diamond_kernel <- cts_kernel * factor_kernel2 * factor_kernel3 * factor_kernel4

Now we can pass this kernel into gpkm and it will use it.

dm <- gpkm(price ~ carat + cut + color + clarity + depth,
           diamonds_subset, kernel=diamond_kernel)
dm$plotkernel()

Using kernels

A key modeling decision for Gaussian process models is the choice of kernel. The kernel determines the covariance and the behavior of the model. The default kernel is the Matern 5/2 kernel (Matern52), and is a good choice for most cases. The Gaussian, or squared exponential, kernel (Gaussian) is a common choice but often leads to a bad fit since it assumes the process the data comes from is infinitely differentiable. Other common choices that are available include the Exponential, Matern 3/2 (Matern32), Power Exponential (PowerExp), Cubic, Rational Quadratic (RatQuad), and Triangle (Triangle).

These kernels only work on numeric data. For factor data, the kernel will default to a Latent Factor Kernel (LatentFactorKernel) for character and unordered factors, or an Ordered Factor Kernel (OrderedFactorKernel) for ordered factors. As long as the input is given in as a data frame and the columns have the proper types, then the default kernel will properly handle it by applying the numeric kernel to the numeric inputs and the factor kernel to the factor and character inputs.

Kernels are stored as R6 objects. They can all be created using the R6 object generator (e.g., Matern52$new()), or using the k_<kernel name> shortcut function (e.g., k_Matern52()). The latter is easier to use (and recommended) since R will show the function arguments and autocomplete.

The following table shows details on all the kernels available.

Kernel	Function	Continuous/ discrete	Equation	Notes
Gaussian	`k_Gaussian`	cts		Often causes issues since it assumes infinite differentiability. Experts don’t recommend using it.
Matern 3/2	`k_Matern32`	cts		Assumes one time differentiability. This is often too low of an assumption.
Matern 5/2	`k_Matern52`	cts		Assumes two time differentiability. Generally the best.
Exponential	`k_Exponential`	cts		Equivalent to Matern 1/2. Assumes no differentiability.
Triangle	`k_Triangle`	cts
Power exponential	`k_PowerExp`	cts
Periodic	`k_Periodic`	cts	$k(x, y) = \sigma^2 * \exp(-\sum(\alpha_isin(p (x_i-y_i))^2))$	The only kernel that takes advantage of periodic data. But there is often incoherence far apart, so you will likely want to multiply by one of the standard kernels.
Cubic	`k_Cubic`	cts
Rational quadratic	`k_RatQuad`	cts
Latent factor kernel	`k_LatentFactorKernel`	factor		This embeds each discrete value into a low dimensional space and calculates the distances in that space. This works well when there are many discrete values.
Ordered factor kernel	`k_OrderedFactorKernel`	factor		This maintains the order of the discrete values. E.g., if there are 3 levels, it will ensure that 1 and 2 have a higher correlation than 1 and 3. This is similar to embedding into a latent space with 1 dimension and requiring the values to be kept in numerical order.
Factor kernel	`k_FactorKernel`	factor		This fits a parameter for every pair of possible values. E.g., if there are 4 discrete values, it will fit 6 (4 choose 2) values. This doesn’t scale well. When there are many discrete values, use any of the other factor kernels.
Gower factor kernel	`k_GowerFactorKernel`	factor	$k(x,y) = \begin{cases} 1, & \text{if } x=y \ p, & \text{if } x \neq y \end{cases}$	This is a very simple factor kernel. For the relevant dimension, the correlation will either be 1 if the value are the same, or $p$ if they are different.
Ignore indices	`k_IgnoreIndsKernel`	N/A		Use this to create a kernel that ignores certain dimensions. Useful when you want to fit different kernel types to different dimensions or when there is a mix of continuous and discrete dimensions.

Factor kernels: note that these all only work on a single dimension. If there are multiple factor dimensions in your input, then they each will be given a different factor kernel.

Combining kernels

Kernels can be combined by multiplying or adding them directly.

The following example uses the product of a periodic and a Matern 5/2 kernel to fit periodic data.

x <- 1:20
y <- sin(x) + .1*x^1.3
combo_kernel <- k_Periodic(D=1) * k_Matern52(D=1)
gp <- gpkm(x, y, kernel=combo_kernel, nug.min=1e-6)
#> * nug is at minimum value after optimizing. Check the fit to see it this caused a bad fit. Consider changing nug.min. This is probably fine for noiseless data.
gp$plot()

For an example of a more complex kernel being constructed, see the diamonds section above.

Intro to GPs

(This section used to be the main vignette on CRAN for this package.)

This R package provides R code for fitting Gaussian process models to data. The code is created using the R6 class structure, which is why $ is used to access object methods.

A Gaussian process fits a model to a dataset, which gives a function that gives a prediction for the mean at any point along with a variance of this prediction.

Suppose we have the data below

x <- seq(0,1,l=10)
y <- abs(sin(2*pi*x))^.8
ggplot(aes(x,y), data=cbind(x,y)) +
  geom_point()

A linear model (LM) will fit a straight line through the data and clearly does not describe the underlying function producing the data.

ggplot(aes(x,y), data=cbind(x,y)) +
    geom_point() +
    stat_smooth(method='lm')
#> `geom_smooth()` using formula = 'y ~ x'

A Gaussian process is a type of model that assumes that the distribution of points follows a multivariate distribution.

In GauPro, we can fit a GP model with Gaussian correlation function using the function gpkm.

library(GauPro)
gp <- gpkm(x, y, kernel=k_Gaussian(D=1), parallel=FALSE)

Now we can plot the predictions given by the model. Shown below, this model looks much better than a linear model.

gp$plot1D()

A very useful property of GP’s is that they give a predicted error. The red lines give an approximate 95% confidence interval for the value at each point (measure value, not the mean). The width of the prediction interval is largest between points and goes to zero near data points, which is what we would hope for.

GP models give distributions for the predictions. Realizations from these distributions give an idea of what the true function may look like. Calling $cool1Dplot() on the 1-D gp object shows 20 realizations. The realizations are most different away from the design points.

if (requireNamespace("MASS", quietly = TRUE)) {
  gp$cool1Dplot()
}

Using kernels

The kernel, or covariance function, has a large effect on the Gaussian process being estimated. Many different kernels are available in the gpkm() function which creates the GP object.

The example below shows what the Matern 5/2 kernel gives.

kern <- k_Matern52(D=1)
gpk <- gpkm(matrix(x, ncol=1), y, kernel=kern, parallel=FALSE)
if (requireNamespace("MASS", quietly = TRUE)) {
  plot(gpk)
}

The exponential kernel is shown below. You can see that it has a huge effect on the model fit. The exponential kernel assumes the correlation between points dies off very quickly, so there is much more uncertainty and variation in the predictions and sample paths.

kern.exp <- k_Exponential(D=1)
gpk.exp <- gpkm(matrix(x, ncol=1), y, kernel=kern.exp, parallel=FALSE)
if (requireNamespace("MASS", quietly = TRUE)) {
  plot(gpk.exp)
}

Trends

Along with the kernel the trend can also be set. The trend determines what the mean of a point is without any information from the other points. I call it a trend instead of mean because I refer to the posterior mean as the mean, whereas the trend is the mean of the normal distribution. Currently the three options are to have a mean 0, a constant mean (default and recommended), or a linear model.

With the exponential kernel above we see some regression to the mean. Between points the prediction reverts towards the mean of 0.2986876. Also far away from any data the prediction will near this value.

Below when we use a mean of 0 we do not see this same reversion.

kern.exp <- k_Exponential(D=1)
trend.0 <- trend_0$new()
gpk.exp <- gpkm(matrix(x, ncol=1), y, kernel=kern.exp, trend=trend.0, parallel=FALSE)
if (requireNamespace("MASS", quietly = TRUE)) {
  plot(gpk.exp)
}

Help Manual

Help page	Topics
Kernel product	*.GauPro_kernel
Kernel sum	+.GauPro_kernel
Cube multiply over first dimension	arma_mult_cube_vec
Correlation Cubic matrix in C (symmetric)	corr_cubic_matrix_symC
Correlation Gaussian matrix in C (symmetric)	corr_exponential_matrix_symC
Correlation Gaussian matrix gradient in C using Armadillo	corr_gauss_dCdX
Gaussian correlation	corr_gauss_matrix
Correlation Gaussian matrix in C using Armadillo	corr_gauss_matrix_armaC
Correlation Gaussian matrix in C using Armadillo (symmetric)	corr_gauss_matrix_sym_armaC
Correlation Gaussian matrix in C (symmetric)	corr_gauss_matrix_symC
Correlation Gaussian matrix in C using Rcpp	corr_gauss_matrixC
Correlation Latent factor matrix in C (symmetric)	corr_latentfactor_matrix_symC
Correlation Latent factor matrix in C (symmetric)	corr_latentfactor_matrixmatrixC
Correlation Matern 3/2 matrix in C (symmetric)	corr_matern32_matrix_symC
Correlation Gaussian matrix in C (symmetric)	corr_matern52_matrix_symC
Correlation ordered factor matrix in C (symmetric)	corr_orderedfactor_matrix_symC
Correlation ordered factor matrix in C (symmetric)	corr_orderedfactor_matrixmatrixC
Cubic Kernel R6 class	Cubic k_Cubic
Exponential Kernel R6 class	Exponential k_Exponential
Factor Kernel R6 class	FactorKernel k_FactorKernel
GauPro_selector	GauPro
Class providing object with methods for fitting a GP model	GauPro_base
Corr Gauss GP using inherited optim	GauPro_Gauss
Corr Gauss GP using inherited optim	GauPro_Gauss_LOO
Kernel R6 class	GauPro_kernel
Beta Kernel R6 class	GauPro_kernel_beta
Gaussian process model with kernel	GauPro_kernel_model
Corr Gauss GP using inherited optim	GauPro_kernel_model_LOO
Trend R6 class	GauPro_trend
Gaussian Kernel R6 class	Gaussian k_Gaussian
Calculate the Gaussian deviance in C	Gaussian_devianceC
Calculate Hessian for a GP with Gaussian correlation	Gaussian_hessianC
Gaussian hessian in C	Gaussian_hessianCC
Calculate Hessian for a GP with Gaussian correlation	Gaussian_hessianR
Gower factor Kernel R6 class	GowerFactorKernel k_GowerFactorKernel
Gaussian process regression model	gpkm
Calculate gradfunc in optimization to speed up. NEEDS TO APERM dC_dparams Doesn't need to be exported, should only be useful in functions.	gradfuncarray
Calculate gradfunc in optimization to speed up. NEEDS TO APERM dC_dparams Doesn't need to be exported, should only be useful in functions.	gradfuncarrayR
Kernel R6 class	IgnoreIndsKernel k_IgnoreIndsKernel
Derivative of cubic kernel covariance matrix in C	kernel_cubic_dC
Derivative of Matern 5/2 kernel covariance matrix in C	kernel_exponential_dC
Derivative of Gaussian kernel covariance matrix in C	kernel_gauss_dC
Derivative of covariance matrix of X with respect to kernel parameters for the Latent Factor Kernel	kernel_latentFactor_dC
Derivative of Matern 5/2 kernel covariance matrix in C	kernel_matern32_dC
Derivative of Matern 5/2 kernel covariance matrix in C	kernel_matern52_dC
Derivative of covariance matrix of X with respect to kernel parameters for the Ordered Factor Kernel	kernel_orderedFactor_dC
Gaussian Kernel R6 class	kernel_product
Gaussian Kernel R6 class	kernel_sum
Latent Factor Kernel R6 class	k_LatentFactorKernel LatentFactorKernel
Matern 3/2 Kernel R6 class	k_Matern32 Matern32
Matern 5/2 Kernel R6 class	k_Matern52 Matern52
Ordered Factor Kernel R6 class	k_OrderedFactorKernel OrderedFactorKernel
Periodic Kernel R6 class	k_Periodic Periodic
Power Exponential Kernel R6 class	k_PowerExp PowerExp
Predict for class GauPro	predict.GauPro
Print summary.GauPro	print.summary.GauPro
Rational Quadratic Kernel R6 class	k_RatQuad RatQuad
Find the square root of a matrix	sqrt_matrix
Summary for GauPro object	summary.GauPro
Trend R6 class	trend_0
Trend R6 class	trend_c
Trend R6 class	trend_LM
Triangle Kernel R6 class	k_Triangle Triangle
White noise Kernel R6 class	k_White White

Package: GauPro 0.2.14

GauPro: Gaussian Process Fitting

A Guide to the GauPro R package

Derivatives for estimating Gaussian process parameters

Introduction to Gaussian Processes

Leave-one-out cross-validation and error correction

Spatial derivatives of Gaussian process models

Citation

Readme and manuals

README

GauPro

Overview

Installation

Example in 1-Dimension

Factor data: fitting the diamonds datasetdiamonds

Constructing a kernel

Using kernels

Combining kernels

Intro to GPs

Using kernels

Trends

Help Manual

Usage by other packages (reverse dependencies)

Factor data: fitting the diamonds dataset`diamonds`