Title: | Linear Dynamical System Reconstruction |
---|---|
Description: | Streamflow (and climate) reconstruction using Linear Dynamical Systems. The advantage of this method is the additional state trajectory which can reveal more information about the catchment or climate system. For details of the method please refer to Nguyen and Galelli (2018) <doi:10.1002/2017WR022114>. |
Authors: | Hung Nguyen [aut, cre], Stefano Galelli [ctb] |
Maintainer: | Hung Nguyen <[email protected]> |
License: | GPL (>= 2.0) |
Version: | 0.0.2 |
Built: | 2024-12-05 07:08:42 UTC |
Source: | CRAN |
Calculate reconstruction metrics from the instrumental period
calculate_metrics(sim, obs, z, norm.fun = mean)
calculate_metrics(sim, obs, z, norm.fun = mean)
sim |
A vector of reconstruction output for instrumental period |
obs |
A vector of all observations |
z |
A vector of left out indices in cross validation |
norm.fun |
The function (unquoted name) used to calculate the normalizing constant. Default is |
A named vector of performance metrics
calculate_metrics(rnorm(100), rnorm(100), z = 1:10) calculate_metrics(rnorm(100), rnorm(100), z = 1:10, norm.fun = sd)
calculate_metrics(rnorm(100), rnorm(100), z = 1:10) calculate_metrics(rnorm(100), rnorm(100), z = 1:10, norm.fun = sd)
Call a reconstruction method subroutine according to the method required
call_method( y, u, v, method, init, num.restarts, return.init, ub, lb, num.islands, pop.per.island, niter, tol )
call_method( y, u, v, method, init, num.restarts, return.init, ub, lb, num.islands, pop.per.island, niter, tol )
y |
Catchment output, preprocessed from data |
u |
Input matrix for a single-model reconstruction, or a list of input matrices for an ensemble reconstruction. |
v |
Same as u. |
method |
By default this is "EM". There are experimental methods but you should not try. |
init |
A list, each element is a vector of initial values for the parameters. If |
num.restarts |
The number of initial conditions to start the EM search; ignored if |
return.init |
If |
ub |
Upper bounds, a vector whose length is the number of parameters |
lb |
Lower bounds |
num.islands |
Number of islands (if method is GA; experimental) |
pop.per.island |
Initial population per island (if method is GA; experimental) |
niter |
Maximum number of iterations, default 1000 |
tol |
Tolerance for likelihood convergence, default 1e-5. Note that the log-likelihood is normalized by dividing by the number of observations. |
The results as produced by LDS_EM_restart()
when the default method (EM) is called. Other methods are experimental.
Calculate the Pearson's correlation using the numerically stable formulation (see References). Internal function.
corr(x, y)
corr(x, y)
x |
First variable |
y |
Second variable |
Pearson's correlation
//www.johndcook.com/blog/2008/11/05/how-to-calculate-pearson-correlation-accurately/
Cross validate LDS model. This is a wrapper for LDS_reconstruction
cvLDS( Qa, u, v, start.year, method = "EM", transform = "log", num.restarts = 50, Z = make_Z(Qa$Qa), metric.space = "transformed", use.raw = FALSE, ub = NULL, lb = NULL, num.islands = 4, pop.per.island = 100, niter = 1000, tol = 1e-05 )
cvLDS( Qa, u, v, start.year, method = "EM", transform = "log", num.restarts = 50, Z = make_Z(Qa$Qa), metric.space = "transformed", use.raw = FALSE, ub = NULL, lb = NULL, num.islands = 4, pop.per.island = 100, niter = 1000, tol = 1e-05 )
Qa |
Observations: a data.frame of annual streamflow with at least two columns: year and Qa. |
u |
Input matrix for a single-model reconstruction, or a list of input matrices for an ensemble reconstruction. |
v |
Same as u. |
start.year |
Starting year of the climate proxies, i.e, the first year of the paleo period. |
method |
By default this is "EM". There are experimental methods but you should not try. |
transform |
Flow transformation, either "log", "boxcox" or "none". Note that if the Box-Cox transform is used, the confidence interval after back-transformation is simply the back-transform of the trained onfidence interval; this is hackish and not entirely accurate. |
num.restarts |
The number of initial conditions to start the EM search; ignored if |
Z |
A list of cross-validation folds. If |
metric.space |
Either "transformed" or "original", the space to calculate the performance metrics. |
use.raw |
Whether performance metrics are calculated on the raw time series. Experimental; don't use. |
ub |
Upper bounds, a vector whose length is the number of parameters |
lb |
Lower bounds |
num.islands |
Number of islands (if method is GA; experimental) |
pop.per.island |
Initial population per island (if method is GA; experimental) |
niter |
Maximum number of iterations, default 1000 |
tol |
Tolerance for likelihood convergence, default 1e-5. Note that the log-likelihood is normalized by dividing by the number of observations. |
A list of cross validation results
metrics.dist: distribution of performance metrics across all cross-validation runs; a matrix, one column for each metric, with column names.
metrics: average performance metrics; a named vector.
target: the (transformed) observations used for cross-validation; a data.table with two columns (year, y)
Ycv: the predicted streamflow in each cross validation run; a matrix, one column for each cross-validation run
Z: the cross-validation
# Make a shorter time horizon so that this example runs fast u <- v <- t(NPpc[601:813]) # We run this example without parallelism. foreach::registerDoSEQ() cvLDS(NPannual, u, v, start.year = 1800, num.restarts = 2, Z = make_Z(NPannual$Qa, nRuns = 1)) # Please refer to the vignette for the full run with parallel options. It takes a minute or two.
# Make a shorter time horizon so that this example runs fast u <- v <- t(NPpc[601:813]) # We run this example without parallelism. foreach::registerDoSEQ() cvLDS(NPannual, u, v, start.year = 1800, num.restarts = 2, Z = make_Z(NPannual$Qa, nRuns = 1)) # Please refer to the vignette for the full run with parallel options. It takes a minute or two.
Cross validation of PCR reconstruction.
cvPCR( Qa, pc, start.year, transform = "log", Z = NULL, metric.space = "transformed" )
cvPCR( Qa, pc, start.year, transform = "log", Z = NULL, metric.space = "transformed" )
Qa |
Observations: a data.frame of annual streamflow with at least two columns: year and Qa. |
pc |
For a single model: a data.frame, one column for each principal component. For an ensemble reconstruction: a list, each element is a data.frame of principal components. |
start.year |
Starting year of the climate proxies, i.e, the first year of the paleo period. |
transform |
Flow transformation, either "log", "boxcox" or "none". Note that if the Box-Cox transform is used, the confidence interval after back-transformation is simply the back-transform of the trained onfidence interval; this is hackish and not entirely accurate. |
Z |
A list of cross-validation folds. If |
metric.space |
Either "transformed" or "original", the space to calculate the performance metrics. |
A list of cross validation results
metrics.dist: distribution of performance metrics across all cross-validation runs; a matrix, one column for each metric, with column names.
metrics: average performance metrics; a named vector.
obs: the (transformed) observations, a data.table with two columns (year, y)
Ycv: the predicted streamflow in each cross validation run; a matrix, one column for each cross-validation run
Z: the cross-validation fold
cvPCR(NPannual, NPpc, start.year = 1200)
cvPCR(NPannual, NPpc, start.year = 1200)
Get the confidence interval of Q = exp(Y) when Y is normal, i.e, Q is log-normal.
exp_ci(y, sigma2)
exp_ci(y, sigma2)
y |
A vector of model estimates |
sigma2 |
The variance of y as learned from a model |
A data.table with the updated confidence intervals
Inverse Box-Cox transform
inv_boxcox(x, lambda)
inv_boxcox(x, lambda)
x |
A numeric vector to be transformed |
lambda |
Lambda parameter |
The inverse Box-Cox
inv_boxcox(x = rnorm(10), lambda = 1) inv_boxcox(x = rnorm(10), lambda = 0) # exp(x)
inv_boxcox(x = rnorm(10), lambda = 1) inv_boxcox(x = rnorm(10), lambda = 0) # exp(x)
Estimate the hidden state and expected log-likelihood given the observations, exogeneous input and system parameters. This is an internal function and should not be called directly.
Kalman_smoother(y, u, v, theta, stdlik = TRUE)
Kalman_smoother(y, u, v, theta, stdlik = TRUE)
y |
Observation matrix (may need to be normalized and centered before hand) (q rows, T columns) |
u |
Input matrix for the state equation (m_u rows, T columns) |
v |
Input matrix for the output equation (m_v rows, T columns) |
theta |
A list of system parameters (A, B, C, D, Q, R)' |
stdlik |
Boolean, whether the likelihood is divided by the number of observations. Standardizing the likelihood this way may speed up convergence in the case of long time series. |
A list of fitted elements (X, Y, V, J, and lik)
This code only works on one dimensional state and output at the moment. Therefore, transposing is skipped, and matrix inversion is treated as /, and log(det(Sigma)) is treated as log(Sigma).
Kling-Gupta Efficiency
KGE(yhat, y)
KGE(yhat, y)
yhat |
Model outputs |
y |
Observations |
KGE
KGE(rnorm(100), rnorm(100))
KGE(rnorm(100), rnorm(100))
Warning This is an experimental feature. Use with care.
LDS_BFGS(y, u, v, ub, lb, num.restarts = 100, parallel = TRUE)
LDS_BFGS(y, u, v, ub, lb, num.restarts = 100, parallel = TRUE)
y |
Transformed and standardized streamflow |
u |
Input matrix for a single-model reconstruction, or a list of input matrices for an ensemble reconstruction. |
v |
Same as u. |
ub |
Upper bounds, a vector whose length is the number of parameters |
lb |
Lower bounds |
num.restarts |
The number of initial conditions to start the EM search; ignored if |
parallel |
Logical, whether parallel computation is used |
A list of reconstruction results; see LDS_reconstruction
Warning This is an experimental feature. Use with care.
LDS_BFGS_with_update( y, u, v, lambda = 1, ub, lb, num.restarts = 100, parallel = TRUE )
LDS_BFGS_with_update( y, u, v, lambda = 1, ub, lb, num.restarts = 100, parallel = TRUE )
y |
Transformed and standardized streamflow |
u |
Input matrix for a single-model reconstruction, or a list of input matrices for an ensemble reconstruction. |
v |
Same as u. |
lambda |
weight for penalty |
ub |
Upper bounds, a vector whose length is the number of parameters |
lb |
Lower bounds |
num.restarts |
The number of initial conditions to start the EM search; ignored if |
parallel |
Logical, whether parallel computation is used |
A list of reconstruction results; see LDS_reconstruction
Estimate the hidden state and model parameters given observations and exogenous inputs using the EM algorithm. This is the key backend routine of this package.
LDS_EM(y, u, v, theta0, niter = 1000L, tol = 1e-05)
LDS_EM(y, u, v, theta0, niter = 1000L, tol = 1e-05)
y |
Observation matrix (may need to be normalized and centered before hand) (q rows, T columns) |
u |
Input matrix for the state equation (m_u rows, T columns) |
v |
Input matrix for the output equation (m_v rows, T columns) |
theta0 |
A vector of initial values for the parameters |
niter |
Maximum number of iterations, default 1000 |
tol |
Tolerance for likelihood convergence, default 1e-5. Note that the log-likelihood is normalized |
A list of model results
theta: model parameters (A, B, C, D, Q, R, mu1, V1) resulted from Mstep
fit: results of Estep
liks : vector of loglikelihood over the iteration steps
This code only works on one dimensional state and output at the moment. Therefore, transposing is skipped, and matrix inversion is treated as /, and log(det(Sigma)) is treated as log(Sigma).
This is the backend computation for LDS_reconstruction. You should not use this directly.
LDS_EM_restart(y, u, v, init, niter = 1000, tol = 1e-05, return.init = TRUE)
LDS_EM_restart(y, u, v, init, niter = 1000, tol = 1e-05, return.init = TRUE)
y |
Observation matrix (may need to be normalized and centered before hand) (q rows, T columns) |
u |
Input matrix for the state equation (m_u rows, T columns) |
v |
Input matrix for the output equation (m_v rows, T columns) |
init |
A list of initial parameters for the EM search. See make_init. |
niter |
Maximum number of iterations, default 1000 |
tol |
Tolerance for likelihood convergence, default 1e-5. Note that the log-likelihood is normalized by dividing by the number of observations. |
return.init |
Indicate whether the initial condition that results in the highest |
a list as produced by LDS_EM. If return.init is true, a vector of initial condition is included in the list as well.
Warning This is an experimental feature. Use with care.
LDS_GA( y, u, v, lambda = 1, ub, lb, num.islands = 4, pop.per.island = 100, niter = 1000, parallel = TRUE )
LDS_GA( y, u, v, lambda = 1, ub, lb, num.islands = 4, pop.per.island = 100, niter = 1000, parallel = TRUE )
y |
Transformed and standardized streamflow |
u |
Input matrix for a single-model reconstruction, or a list of input matrices for an ensemble reconstruction. |
v |
Same as u. |
lambda |
weight for penalty |
ub |
Upper bounds, a vector whose length is the number of parameters |
lb |
Lower bounds |
num.islands |
Number of islands (if method is GA; experimental) |
pop.per.island |
Initial population per island (if method is GA; experimental) |
niter |
Maximum number of iterations, default 1000 |
parallel |
Logical, whether parallel computation is used |
A list of reconstruction results; see LDS_reconstruction
The initial conditions can either be randomized (specifiled by num.restarts) or provided beforehand.
LDS_reconstruction( Qa, u, v, start.year, method = "EM", transform = "log", init = NULL, num.restarts = 50, return.init = FALSE, ub = NULL, lb = NULL, num.islands = 4, pop.per.island = 250, niter = 1000, tol = 1e-05, return.raw = FALSE )
LDS_reconstruction( Qa, u, v, start.year, method = "EM", transform = "log", init = NULL, num.restarts = 50, return.init = FALSE, ub = NULL, lb = NULL, num.islands = 4, pop.per.island = 250, niter = 1000, tol = 1e-05, return.raw = FALSE )
Qa |
Observations: a data.frame of annual streamflow with at least two columns: year and Qa. |
u |
Input matrix for a single-model reconstruction, or a list of input matrices for an ensemble reconstruction. |
v |
Same as u. |
start.year |
Starting year of the climate proxies, i.e, the first year of the paleo period. |
method |
By default this is "EM". There are experimental methods but you should not try. |
transform |
Flow transformation, either "log", "boxcox" or "none". Note that if the Box-Cox transform is used, the confidence interval after back-transformation is simply the back-transform of the trained onfidence interval; this is hackish and not entirely accurate. |
init |
A list, each element is a vector of initial values for the parameters. If |
num.restarts |
The number of initial conditions to start the EM search; ignored if |
return.init |
If |
ub |
Upper bounds, a vector whose length is the number of parameters |
lb |
Lower bounds |
num.islands |
Number of islands (if method is GA; experimental) |
pop.per.island |
Initial population per island (if method is GA; experimental) |
niter |
Maximum number of iterations, default 1000 |
tol |
Tolerance for likelihood convergence, default 1e-5. Note that the log-likelihood is normalized by dividing by the number of observations. |
return.raw |
If TRUE, state and streamflow estimates without measurement updates will be returned. |
A list of the following elements
rec: reconstruction results, a data.table with the following columns
year: calculated from Qa and the length of u
X: the estimated hidden state
Xl, Xu: lower and upper range for the 95\
Q: the reconstructed streamflow
Ql, Qu: lower and upper range for the 95\
theta: model parameters
lik: maximum likelihood
init: the initial condition that resulted in the maximum likelihood (if return.init = TRUE
).
# Make a shorter time horizon so that this example runs fast u <- v <- t(NPpc[601:813]) # We run this example without parallelism foreach::registerDoSEQ() LDS_reconstruction(NPannual, u, v, start.year = 1800, num.restarts = 1) # Please refer to the vignette for the full run with parallel options. It takes a second or two.
# Make a shorter time horizon so that this example runs fast u <- v <- t(NPpc[601:813]) # We run this example without parallelism foreach::registerDoSEQ() LDS_reconstruction(NPannual, u, v, start.year = 1800, num.restarts = 1) # Please refer to the vignette for the full run with parallel options. It takes a second or two.
Generate multiple stochastic time series from an LDS model. This is a convenient wrapper for one_LDS_rep.
LDS_rep( theta, u = NULL, v = NULL, years, num.reps = 100, mu = 0, exp.trans = TRUE )
LDS_rep( theta, u = NULL, v = NULL, years, num.reps = 100, mu = 0, exp.trans = TRUE )
theta |
A list of parameters: A, B, C, D, Q, R, x0, v0 |
u |
Input matrix for the state equation (m_u rows, T columns) |
v |
Input matrix for the output equation (m_v rows, T columns) |
years |
The years of the study horizon |
num.reps |
The number of stochastic replicates#' |
mu |
Mean of the log-transformed streamflow process |
exp.trans |
Whether exponential transformation back to the streamflow space is required. If TRUE, both Y and Q are returned, otherwise only Y. |
Same as one_LDS_rep, but the data.table consists of multiple replicates.
LDS_rep(theta, t(NPpc), t(NPpc), 1200:2012, num.reps = 10, mu = mean(log(NPannual$Qa)))
LDS_rep(theta, t(NPpc), t(NPpc), 1200:2012, num.reps = 10, mu = mean(log(NPannual$Qa)))
If init is a vector, make it a list of one element. If init is NULL, randomize it. In this case, the function will randomize the initial value by sampling uniformly within the range for each parameters (A in [0, 1], B in [-1, 1], C in [0, 1] and D in [-1, 1]).
make_init(p, q, num.restarts)
make_init(p, q, num.restarts)
p |
Dimension of u |
q |
Dimension of v |
num.restarts |
Number of randomized initial values |
A list of initial conditions, each element is an object of class theta
.
make_init(5, 5, 1) make_init(5, 5, 2)
make_init(5, 5, 1) make_init(5, 5, 2)
Make a list of cross-validation folds. Each element of the list is a vector of the cross-validation points for one cross-validation run.
make_Z(obs, nRuns = 30, frac = 0.1, contiguous = TRUE)
make_Z(obs, nRuns = 30, frac = 0.1, contiguous = TRUE)
obs |
Vector of observations. |
nRuns |
Number of repetitions. |
frac |
Fraction of left-out points. For leave-one-out, use |
contiguous |
Logical. If |
A list of cross-validation folds
Z <- make_Z(NPannual, nRuns = 30, frac = 0.25, contiguous = TRUE)
Z <- make_Z(NPannual, nRuns = 30, frac = 0.25, contiguous = TRUE)
If you already ran the cross-validation on transformed output and now wanted to calculate performance on the back-transformed one, or vice-versa, you don't have to rerun the whole cross-validation, but just need to transform or back-transform the cross-validation Ycv. This function helps you do that.
metrics_with_transform(cv, transform, lambda = NULL)
metrics_with_transform(cv, transform, lambda = NULL)
cv |
Cross-validation output as produced by cvLDS or cvPCR |
transform |
Either "log", "exp", "boxcox" or "inv_boxcox" |
lambda |
Lambda value used in Box-Cox or inverse Box-Cox |
A new cv object wit hthe new metrics
# Cross-validate with log-transform cv <- cvPCR(NPannual, NPpc, start.year = 1200, transform = 'log', metric.space = 'transformed') # Calculate metrics based on back-transformed values m <- metrics_with_transform(cv, 'exp')
# Cross-validate with log-transform cv <- cvPCR(NPannual, NPpc, start.year = 1200, transform = 'log', metric.space = 'transformed') # Calculate metrics based on back-transformed values m <- metrics_with_transform(cv, 'exp')
Maximizing expected likelihood using analytical solution
Mstep(y, u, v, fit)
Mstep(y, u, v, fit)
y |
Observation matrix (may need to be normalized and centered before hand) (q rows, T columns) |
u |
Input matrix for the state equation (m_u rows, T columns) |
v |
Input matrix for the output equation (m_v rows, T columns) |
fit |
result of Kalman_smoother |
An object of class theta
: a list of
Calculates the negative log-likelihood
negLogLik(theta, u, v, y)
negLogLik(theta, u, v, y)
theta |
A vector of model parameters, to be converted to a list |
u |
Input matrix |
v |
Input matrix |
y |
Observations |
The negative log-likelihood
A dataset contaning annual streamflow of the Mekong River measured at station Nakhon Phanom.
NPannual
NPannual
A data.table with two columns
from 1960 to 2005
mean annual streamflow, cubic meter per second
Mekong River Commission
A dataset contaning the principal components of MADA surrounding NP
NPpc
NPpc
A data.table
RMSE is normalized by the normalization constant
nRMSE(yhat, y, normConst)
nRMSE(yhat, y, normConst)
yhat |
Model outputs |
y |
Observations |
normConst |
The normalization constant |
normalized RMSE
x <- rnorm(100) y <- rnorm(100) nRMSE(x, y, sd(y))
x <- rnorm(100) y <- rnorm(100) nRMSE(x, y, sd(y))
Nash-Sutcliffe Efficiency
NSE(yhat, y)
NSE(yhat, y)
yhat |
Model outputs |
y |
Observations |
NSE
NSE(rnorm(100), rnorm(100))
NSE(rnorm(100), rnorm(100))
Make one prediction for one cross-validation run. This is a subroutine that is called by cvLDS, without any checks. You should not need to use this directly.
one_lds_cv( z, instPeriod, mu, y, u, v, method = "EM", num.restarts = 20, ub = NULL, lb = NULL, num.islands = 4, pop.per.island = 100, niter = 1000, tol = 1e-06, use.raw = FALSE )
one_lds_cv( z, instPeriod, mu, y, u, v, method = "EM", num.restarts = 20, ub = NULL, lb = NULL, num.islands = 4, pop.per.island = 100, niter = 1000, tol = 1e-06, use.raw = FALSE )
z |
A vector of left-out points, indexed according to the intrumental period |
instPeriod |
indices of the instrumental period in the whole record |
mu |
Mean of the observations |
y |
Catchment output, preprocessed from data |
u |
Input matrix for a single-model reconstruction, or a list of input matrices for an ensemble reconstruction. |
v |
Same as u. |
method |
By default this is "EM". There are experimental methods but you should not try. |
num.restarts |
The number of initial conditions to start the EM search; ignored if |
ub |
Upper bounds, a vector whose length is the number of parameters |
lb |
Lower bounds |
num.islands |
Number of islands (if method is GA; experimental) |
pop.per.island |
Initial population per island (if method is GA; experimental) |
niter |
Maximum number of iterations, default 1000 |
tol |
Tolerance for likelihood convergence, default 1e-5. Note that the log-likelihood is normalized by dividing by the number of observations. |
use.raw |
Whether performance metrics are calculated on the raw time series. Experimental; don't use. |
A vector of prediction.
Generate a single stochastic time series from an LDS model
one_LDS_rep( rep.num, theta, u = NULL, v = NULL, years, mu = 0, exp.trans = TRUE )
one_LDS_rep( rep.num, theta, u = NULL, v = NULL, years, mu = 0, exp.trans = TRUE )
rep.num |
The ID number of the replicate |
theta |
A list of parameters: A, B, C, D, Q, R, x0, v0 |
u |
Input matrix for the state equation (m_u rows, T columns) |
v |
Input matrix for the output equation (m_v rows, T columns) |
years |
The years of the study horizon |
mu |
Mean of the log-transformed streamflow process |
exp.trans |
Whether exponential transformation back to the streamflow space is required. If TRUE, both Y and Q are returned, otherwise only Y. |
A data.table. The first column is the years of the study horizon, as supplied by year
.
Subsequent columns are simX
, simY
, and simQ
which are the simulated catchment state (X),
log-transformed and centralized flow (Y) and flow (Q). The last column is the replicate ID number.
# Learn theta one_LDS_rep(1, theta, t(NPpc), t(NPpc), 1200:2012, mu = mean(log(NPannual$Qa)))
# Learn theta one_LDS_rep(1, theta, t(NPpc), t(NPpc), 1200:2012, mu = mean(log(NPannual$Qa)))
Make one prediction for one cross-validation run. This is a subroutine to be called by other cross-validation functions.
one_pcr_cv(df, z)
one_pcr_cv(df, z)
df |
The training data, with one column named y, the (transformed) observations. and other columns the predictors. |
z |
A vector of left-out points |
A vector of prediction.
Select the best reconstruction from an ensemble. Experimental, API may change.
PCR_ensemble_selection( Qa, pc, start.year, transform = "log", Z = NULL, agg.type = c("best member", "best overall"), criterion = c("RE", "CE", "nRMSE", "KGE"), return.all.metrics = FALSE )
PCR_ensemble_selection( Qa, pc, start.year, transform = "log", Z = NULL, agg.type = c("best member", "best overall"), criterion = c("RE", "CE", "nRMSE", "KGE"), return.all.metrics = FALSE )
Qa |
Observations: a data.frame of annual streamflow with at least two columns: year and Qa. |
pc |
For a single model: a data.frame, one column for each principal component. For an ensemble reconstruction: a list, each element is a data.frame of principal components. |
start.year |
Starting year of the climate proxies, i.e, the first year of the paleo period. |
transform |
Flow transformation, either "log", "boxcox" or "none". Note that if the Box-Cox transform is used, the confidence interval after back-transformation is simply the back-transform of the trained onfidence interval; this is hackish and not entirely accurate. |
Z |
A list of cross-validation folds. If |
agg.type |
Type of ensemble aggregate. There are 2 options: 'best member': the member with the best performance score is used; 'best overall': if the ensemble average is better than the best member, it will be used, otherwise the best member will be used. |
criterion |
The performance criterion to be used. |
return.all.metrics |
Logical, if TRUE, all members' performance scores (and the ensemble average's score, if |
A list of two elements:
choice: The index of the selection. If the ensemble is selected, returns 0.
cv: the cross-validation results of the choice, see cvPCR for details.
all.metrics: all members' scores, and if agg.type == 'best overall'
, the ensemble average's scores as well, in the last column.
PCR_ensemble_selection(NPannual, list(NPpc, NPpc[, 1:2]), start.year = 1200, agg.type = 'best overall', criterion = 'KGE') PCR_ensemble_selection(NPannual, list(NPpc, NPpc[, 1:2]), start.year = 1200, agg.type = 'best overall', criterion = 'KGE')
PCR_ensemble_selection(NPannual, list(NPpc, NPpc[, 1:2]), start.year = 1200, agg.type = 'best overall', criterion = 'KGE') PCR_ensemble_selection(NPannual, list(NPpc, NPpc[, 1:2]), start.year = 1200, agg.type = 'best overall', criterion = 'KGE')
Reconstruction with principal component linear regression.
PCR_reconstruction(Qa, pc, start.year, transform = "log")
PCR_reconstruction(Qa, pc, start.year, transform = "log")
Qa |
Observations: a data.frame of annual streamflow with at least two columns: year and Qa. |
pc |
For a single model: a data.frame, one column for each principal component. For an ensemble reconstruction: a list, each element is a data.frame of principal components. |
start.year |
Starting year of the climate proxies, i.e, the first year of the paleo period. |
transform |
Flow transformation, either "log", "boxcox" or "none". Note that if the Box-Cox transform is used, the confidence interval after back-transformation is simply the back-transform of the trained onfidence interval; this is hackish and not entirely accurate. |
A list of reconstruction results, with the following elements:
rec: reconstructed streamflow with 95% prediction interval; a data.table with four columns: year, Q, Ql (lower bound), and Qu (upper bound).
coeffs: the regression coefficients.
sigma: the residual standard deviation.
rec: the ensemble average reconstruction; a data.table with two columns: year and Q.
ensemble: a list of ensemble members, each element is reconstructed from one element of pc
and is itself a list of three elements: Q (a vector of reconstructed flow), coeffs and sigma.
Note that for ensemble reconstruction, ldsr
does not provide uncertainty estimates. It is up to the user to do so, for example, using ensemble spread.
PCR_reconstruction(NPannual, NPpc, start.year = 1200)
PCR_reconstruction(NPannual, NPpc, start.year = 1200)
Kalman_smoother returns X and likelihood. The penalized likelihood is the likelihood minus the sum-of-squares of the measurement update. This is used as the fitness function in genetic algorihm.
penalized_likelihood(y, u, v, theta.vec, lambda)
penalized_likelihood(y, u, v, theta.vec, lambda)
y |
Observation matrix (may need to be normalized and centered before hand) (q rows, T columns) |
u |
Input matrix for the state equation (m_u rows, T columns) |
v |
Input matrix for the output equation (m_v rows, T columns) |
theta.vec |
a vector of parameter elements (i.e, the vectorized version of theta
in |
lambda |
weight of the penalty |
The penalized likelihood (a real number)
This function propagates the state trajectory based on the exogenous inputs only (without measurement update), and calculates the corresponding log-likelihood
propagate(theta, u, v, y, stdlik = TRUE)
propagate(theta, u, v, y, stdlik = TRUE)
theta |
A list of system parameters (A, B, C, D, Q, R)' |
u |
Input matrix for the state equation (m_u rows, T columns) |
v |
Input matrix for the output equation (m_v rows, T columns) |
y |
Observations |
stdlik |
Boolean, whether the likelihood is divided by the number of observations. Standardizing the likelihood this way may speed up convergence in the case of long time series. |
A list of predictions and log-likelihood (X, Y, V, lik)
This code only works on one dimensional state and output at the moment. Therefore, transposing is skipped, and matrix inversion is treated as /, and log(det(Sigma)) is treated as log(Sigma).
Reduction of Error
RE(yhat, y, yc_bar)
RE(yhat, y, yc_bar)
yhat |
Model outputs in the validation set |
y |
Observations in the validation set |
yc_bar |
Mean observations in the calibration set |
RE
x <- rnorm(100) y <- rnorm(100) yc_bar <- mean(x[1:50]) RE(x[51:100], y[51:100], yc_bar)
x <- rnorm(100) y <- rnorm(100) yc_bar <- mean(x[1:50]) RE(x[51:100], y[51:100], yc_bar)
Theta values for Nakhon Phanom, precomputed to use in examples.
theta
theta
A list with elements A, B, C, D, Q, R, mu1, V1
Converts theta from a vector (as used in GA) to list (as used in Kalman smoothing)
vec_to_list(theta.vec, d)
vec_to_list(theta.vec, d)
theta.vec |
a vector of parameter elements |
d |
dimention of inputs |
A theta object, see make_init