Package 'NPRED'

Title: Predictor Identifier: Nonparametric Prediction
Description: Partial informational correlation (PIC) is used to identify the meaningful predictors to the response from a large set of potential predictors. Details of methodologies used in the package can be found in Sharma, A., Mehrotra, R. (2014). <doi:10.1002/2013WR013845>, Sharma, A., Mehrotra, R., Li, J., & Jha, S. (2016). <doi:10.1016/j.envsoft.2016.05.021>, and Mehrotra, R., & Sharma, A. (2006). <doi:10.1016/j.advwatres.2005.08.007>.
Authors: Ashish Sharma [aut] , Raj Mehrotra [aut], Sanjeev Jha [aut], Jingwan Li [aut], Ze Jiang [aut, cre]
Maintainer: Ze Jiang <[email protected]>
License: GPL-3
Version: 1.1.0
Built: 2024-10-07 06:47:23 UTC
Source: CRAN

Help Index


Calculate the ratio of conditional error standard deviations

Description

Calculate the ratio of conditional error standard deviations

Usage

calc.scaleSTDratio(x, zin, zout)

Arguments

x

A vector of response.

zin

A matrix containing the meaningful predictors selected from a large set of possible predictors (z).

zout

A matrix containing the remaining possible predictors after taking out the meaningful predictors (zin).

Value

The STD ratio.

References

Sharma, A., Mehrotra, R., 2014. An information theoretic alternative to model a natural system using observational information alone. Water Resources Research, 50(1): 650-660.


Generate predictor and response data.

Description

Generate predictor and response data.

Usage

data.gen.ar1(nobs, ndim = 9)

Arguments

nobs

The data length to be generated.

ndim

The number of potential predictors (default is 9).

Value

A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.

Examples

# AR1 model from paper with 9 dummy variables
data.ar1 <- data.gen.ar1(500)
stepwise.PIC(data.ar1$x, data.ar1$dp)

Generate predictor and response data.

Description

Generate predictor and response data.

Usage

data.gen.ar4(nobs, ndim = 9)

Arguments

nobs

The data length to be generated.

ndim

The number of potential predictors (default is 9).

Value

A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.

Examples

# AR4 model from paper with total 9 dimensions
data.ar4 <- data.gen.ar4(500)
stepwise.PIC(data.ar4$x, data.ar4$dp)

Generate predictor and response data.

Description

Generate predictor and response data.

Usage

data.gen.ar9(nobs, ndim = 9)

Arguments

nobs

The data length to be generated.

ndim

The number of potential predictors (default is 9).

Value

A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.

Examples

# AR9 model from paper with total 9 dimensions
data.ar9 <- data.gen.ar9(500)
stepwise.PIC(data.ar9$x, data.ar9$dp)

Sample data : AR9 model: x(i)=0.3*x(i-1)-0.6*x(i-4)-0.5*x(i-9)+eps

Description

A dataset containing 500 rows (data length) and 16 columns. The first column is response data and the rest columns are possible predictors.

Usage

data(data1)

Sample data : AR4 model: x(i)=0.6*x(i-1)-0.4*x(i-4)+eps

Description

A dataset containing 500 rows (data length) and 16 columns. The first column is response data and the rest columns are possible predictors.

Usage

data(data2)

Sample data : AR1 model: x(i)=0.9*x(i-1)+0.866*eps

Description

A dataset containing 500 rows (data length) and 16 columns. The first column is response data and the rest columns are possible predictors.

Usage

data(data3)

Modified k-nearest neighbour conditional bootstrap or regression function estimation with extrapolation

Description

Modified k-nearest neighbour conditional bootstrap or regression function estimation with extrapolation

Usage

knn(
  x,
  z,
  zout,
  k = 0,
  pw,
  reg = TRUE,
  nensemble = 100,
  tailcorrection = TRUE,
  tailprob = 0.25,
  tailfac = 0.2,
  extrap = TRUE
)

Arguments

x

A vector of response.

z

A matrix of existing predictors.

zout

A matrix of predictor values the response is to be estimated at.

k

The number of nearest neighbours used. The default value is 0, indicating Lall and Sharma default is used.

pw

A vector of partial weights of the same length of z.

reg

A logical operator to inform whether a conditional expectation should be output or not nensemble, Used if reg=F and represents the number of realisations that are generated Value.

nensemble

An integer the specifies the number of ensembles used. The default is 100.

tailcorrection

A logical value, T (default) or F, that denotes whether a reduced value of k (number of nearest neighbours) should be used in the tails of any conditioning plane. Whether one is in the tails or not is determined based on the nearest neighbour response value.

tailprob

A scalar that denotes the p-value of the cdf (on either extreme) the tailcorrection takes effect. The default value is 0.25.

tailfac

A scalar that specifies the lowest fraction of the default k that can be used in the tails. Depending on the how extreme one is in the tails, the actual k decreases linearly from k (for a p-value greater than tailprob) to tailfac*k proportional to the actual p-value of the nearest neighbour response, divided by tailprob. The default value is 0.2.

extrap

A logical value, T (default) or F, that denotes whether a kernel extraplation method is used to predict x.

Value

A matrix of responses having same rows as zout if reg=T, or having nensemble columns is reg=F.

References

Sharma, A., Tarboton, D.G. and Lall, U., 1997. Streamflow simulation: A nonparametric approach. Water resources research, 33(2), pp.291-308.

Sharma, A. and O'Neill, R., 2002. A nonparametric approach for representing interannual dependence in monthly streamflow sequences. Water resources research, 38(7), pp.5-1.

Examples

data(data1) # AR9 model   x(i)=0.3*x(i-1)-0.6*x(i-4)-0.5*x(i-9)+eps
x <- data1[, 1] # response
py <- data1[, -1] # possible predictors
ans.ar9 <- stepwise.PIC(x, py) # identify the meaningful predictors and estimate partial weights
z <- py[, ans.ar9$cpy] # predictor matrix
pw <- ans.ar9$wt # partial weights

# vector denoting where we want outputs, can be a matrix representing grid.
zout <- apply(z, 2, mean)

knn(x, z, zout, reg = TRUE, pw = pw) # knn regression estimate using partial weights.

knn(x, z, zout, reg = FALSE, pw = pw) # alternatively, knn conditional bootstrap (100 realisations).
# Mean of the conditional bootstrap estimate should be
# approximately the same as the regression estimate.

zout <- ts(data.gen.ar9(500, ndim = length(ans.ar9$cpy))$dp) # new input
xhat1 <- xhat2 <- x
xhat1 <- NPRED::knn(x, z, zout, k = 5, reg = TRUE, extrap = FALSE) # without extrapolation
xhat2 <- NPRED::knn(x, z, zout, k = 5, reg = TRUE, extrap = TRUE) # with extrapolation

ts.plot(ts(x), ts(xhat1), ts(xhat2),
  col = c("black", "red", "blue"), ylim = c(-5, 5),
  lwd = c(2, 2, 1)
)

Leave one out cross validation.

Description

Leave one out cross validation.

Usage

knnregl1cv(x, z, k = 0, pw)

Arguments

x

A vector of response.

z

A matrix of predictors.

k

The number of nearest neighbours used. The default is 0, indicating Lall and Sharma default is used.

pw

A vector of partial weights of the same length of z.

Value

A vector of L1CV estimates of the response.

References

Lall, U., Sharma, A., 1996. A Nearest Neighbor Bootstrap For Resampling Hydrologic Time Series. Water Resources Research, 32(3): 679-693.

Sharma, A., Mehrotra, R., 2014. An information theoretic alternative to model a natural system using observational information alone. Water Resources Research, 50(1): 650-660.


Calculate PIC

Description

Calculate PIC

Usage

pic.calc(X, Y, Z = NULL)

Arguments

X

A vector of response.

Y

A matrix of new predictors.

Z

A matrix of pre-existing predictors that could be NULL if no prior predictors exist.

Value

A list of 2 elements: the partial mutual information (pmi), and partial informational correlation (pic).

References

Sharma, A., Mehrotra, R., 2014. An information theoretic alternative to model a natural system using observational information alone. Water Resources Research, 50(1): 650-660.

Galelli S., Humphrey G.B., Maier H.R., Castelletti A., Dandy G.C. and Gibbs M.S. (2014) An evaluation framework for input variable selection algorithms for environmental data-driven models, Environmental Modelling and Software, 62, 33-51, DOI: 10.1016/j.envsoft.2014.08.015.


Calculate Partial Weight

Description

Calculate Partial Weight

Usage

pw.calc(x, py, cpy, cpyPIC)

Arguments

x

A vector of response.

py

A matrix containing possible predictors of x.

cpy

The column numbers of the meaningful predictors (cpy).

cpyPIC

Partial informational correlation (cpyPIC).

Value

A vector of partial weights(pw) of the same length of z.

References

Sharma, A., Mehrotra, R., 2014. An information theoretic alternative to model a natural system using observational information alone. Water Resources Research, 50(1): 650-660.


Calculate stepwise PIC

Description

Calculate stepwise PIC

Usage

stepwise.PIC(x, py, nvarmax = 100, alpha = 0.1)

Arguments

x

A vector of response.

py

A matrix containing possible predictors of x.

nvarmax

The maximum number of variables to be selected.

alpha

The significance level used to judge whether the sample estimate in Equation

PIC^=sqrt(1exp(2PI^)\hat{PIC} = sqrt(1-exp(-2\hat{PI})

is significant or not. A default alpha value is 0.1.

Value

A list of 2 elements: the column numbers of the meaningful predictors (cpy), and partial informational correlation (cpyPIC).

References

Sharma, A., Mehrotra, R., 2014. An information theoretic alternative to model a natural system using observational information alone. Water Resources Research, 50(1): 650-660.

Examples

data(data1) # AR9 model   x(i)=0.3*x(i-1)-0.6*x(i-4)-0.5*x(i-9)+eps
x <- data1[, 1] # response
py <- data1[, -1] # possible predictors
stepwise.PIC(x, py)

data(data2) # AR4 model:  x(i)=0.6*x(i-1)-0.4*x(i-4)+eps
x <- data2[, 1] # response
py <- data2[, -1] # possible predictors
stepwise.PIC(x, py)

data(data3) # AR1 model  x(i)=0.9*x(i-1)+0.866*eps
x <- data3[, 1] # response
py <- data3[, -1] # possible predictors
stepwise.PIC(x, py)

Sample data: Data over Sydney region

Description

A dataset containing Rainfall (15 stations), NCEP and CSIRO (7 atmospheric variables).

Usage

data(Sydney)