Package 'sensitivity' reference manual

Title:	Global Sensitivity Analysis of Model Outputs and Importance Measures
Description:	A collection of functions for sensitivity analysis of model outputs (factor screening, global sensitivity analysis and robustness analysis), for variable importance measures of data, and for interpretability of machine learning models. Most functions apply to scalar outputs, but several support multi-dimensional outputs.
Authors:	Bertrand Iooss [aut, cre], Sebastien Da Veiga [aut], Alexandre Janon [aut], Gilles Pujol [aut]
Maintainer:	Bertrand Iooss
License:	GPL-2
Version:	1.30.1
Built:	2025-01-26 06:53:42 UTC
Source:	CRAN

Sensitivity Analysis

Description

Methods and functions for global sensitivity analysis of model outputs, importance measures and machine learning model interpretability

Details

The sensitivity package implements some global sensitivity analysis methods and importance measures:

Linear regression importance measures in regression or classification (logistic regression) contexts (Iooss et al., 2022; Clouvel et al., 2024):
- SRC and SRRC (src), and correlation ratio (correlRatio),
- PCC, SPCC, PRCC and SPRCC (pcc),
- LMG and LMG on ranks (lmg),
- PMVD and PMVD on ranks (pmvd),
- Johnson indices (johnson);
Bettonvil's sequential bifurcations (Bettonvil and Kleijnen, 1996) (sb);
Morris's "OAT" elementary effects screening method (morris);
Derivative-based Global Sensitivity Measures:
- Poincare constants for Derivative-based Global Sensitivity Measures (DGSM) (Lamboni et al., 2013; Roustant et al., 2017) (PoincareConstant) and (PoincareOptimal),
- Squared coefficients computation in generalized chaos via Poincare differential operators (Roustant et al., 2019) (PoincareChaosSqCoef),
- Distributed Evaluation of Local Sensitivity Analysis (DELSA) (Rakovec et al., 2014) (delsa);
Variance-based sensitivity indices (Sobol' indices) for independent inputs:
- Estimation of the Sobol' first order indices with B-spline Smoothing (Ratto and Pagano, 2010) (sobolSmthSpl),
- Monte Carlo estimation of Sobol' indices with independent inputs (also called pick-freeze method):
  - Sobol' scheme (Sobol, 1993) to compute the indices given by the variance decomposition up to a specified order (sobol),
  - Saltelli's scheme (Saltelli, 2002) to compute first order, second order and total indices (sobolSalt),
  - Saltelli's scheme (Saltelli, 2002) to compute first order and total indices (sobol2002),
  - Mauntz-Kucherenko's scheme (Sobol et al., 2007) to compute first order and total indices using improved formulas for small indices (sobol2007),
  - Jansen-Sobol's scheme (Jansen, 1999) to compute first order and total indices using improved formulas (soboljansen),
  - Martinez's scheme using correlation coefficient-based formulas (Martinez, 2011; Touati, 2016) to compute first order and total indices, associated with theoretical confidence intervals (sobolmartinez and soboltouati),
  - Janon-Monod's scheme (Monod et al., 2006; Janon et al., 2013) to compute first order indices with optimal asymptotic variance (sobolEff),
  - Mara's scheme (Mara and Joseph, 2008) to compute first order indices with a cost independent of the dimension, via permutations on a single matrix (sobolmara),
  - Mighty estimator of first-order sensitivity indices based on rank statistics (correlation coefficient of Chatterjee, 2019; Gamboa et al., 2020) (sobolrank),
  - Owen's scheme (Owen, 2013) to compute first order and total indices using improved formulas (via 3 input independent matrices) for small indices (sobolowen),
  - Total Interaction Indices using Liu-Owen's scheme (Liu and Owen, 2006) (sobolTIIlo) and pick-freeze scheme (Fruth et al., 2014) (sobolTIIpf),
- Replication-based procedures:
  - Estimation of the Sobol' first order and closed second order indices using replicated orthogonal array-based Latin hypercube sample (Tissot and Prieur, 2015) (sobolroalhs),
  - Recursive estimation of the Sobol' first order and closed second order indices using replicated orthogonal array-based Latin hypercube sample (Gilquin et al., 2016) (sobolrec),
  - Estimation of the Sobol' first order, second order and total indices using the generalized method with replicated orthogonal array-based Latin hypercube sample (Tissot and Prieur, 2015) (sobolrep),
  - Sobol' indices estimation under inequality constraints (Gilquin et al., 2015) by extension of the replication procedure (Tissot and Prieur, 2015) (sobolroauc),
- Estimation of the Sobol' first order and total indices with Saltelli's so-called "extended-FAST" method (Saltelli et al., 1999) (fast99),
- Estimation of the Sobol' first order and total indices with kriging-based global sensitivity analysis (Le Gratiet et al., 2014) (sobolGP);
Variance-based sensitivity indices valid for dependent inputs:
- Exact computation of Shapley effects in the linear Gaussian framework (Broto et al., 2019) (shapleyLinearGaussian),
- Computation of Shapley effects in the Gaussian linear framework with an unknown block-diagonal covariance matrix (Broto et al., 2020) (shapleyBlockEstimation),
- Johnson-Shapley indices (Iooss and Clouvel, 2024) (johnsonshap),
- Estimation of Shapley effects by examining all permutations of inputs (Song et al., 2016) (shapleyPermEx),
- Estimation of Shapley effects by randomly sampling permutations of inputs (Song et al., 2016) (shapleyPermRand),
- Estimation of Shapley effects from data using nearest neighbors method (Broto et al., 2018) (shapleySubsetMc),
- Estimation of Shapley effects and all Sobol indices from data using nearest neighbors (Broto et al., 2018) (using a fast approximate algorithm) or ranking (Gamboa et al., 2020) (shapleysobol_knn) and (sobolshap_knn),
- Estimation of Shapley effects from data using nearest neighbors method (Broto et al., 2018) with an optimized/parallelized computations and bootstrap confidence intervals estimations (shapleysobol_knn),
- Estimation of Proportional Marginal Effects (PME) (Herin et al., 2024) (pme_knn);
Support index functions (support) of Fruth et al. (2016);
Sensitivity Indices based on Csiszar f-divergence (sensiFdiv) (particular cases: Borgonovo's indices and mutual-information based indices) and Hilbert-Schmidt Independence Criterion (sensiHSIC and testHSIC) (Da Veiga, 2015; De Lozzo and Marrel, 2016; Meynaoui et al., 2019);
Non-parametric variable significance test based on the empirical process (EPtest) of Klein and Rochet (2022);
First-order quantile-oriented sensitivity indices as defined in Fort et al. (2016), via a related kernel-based estimator (Maume-Deschamps and Niang, 2018) (qosa);
Target Sensitivity Analysis via Hilbert-Schmidt Independence Criterion (sensiHSIC) (Spagnol et al., 2019);
Robustness analysis by the Perturbed-Law based Indices (PLI) of Lemaitre et al. (2015), (PLIquantile) of Sueur et al. (2017), (PLIsuperquantile) of Iooss et al. (2021), and their extensions (PLIquantile_multivar) and (PLIsuperquantile_multivar);
Extensions to multidimensional outputs for:
- Sobol' indices (sobolMultOut): Aggregated Sobol' indices (Lamboni et al., 2011; Gamboa et al., 2014) and functional (1D) Sobol' indices,
- Shapley effects and Sobol' indices (shapleysobol_knn) and (sobolshap_knn): Functional (1D) indices,
- HSIC indices (sensiHSIC) (Da Veiga, 2015): Aggregated HSIC, potentially via a PCA step (Da Veiga, 2015),
- Morris method (morrisMultOut).

Moreover, some utilities are provided: standard test-cases (testmodels), weight transformation function of the output sample (weightTSA) to perform Target Sensitivity Analysis, normal and Gumbel truncated distributions (truncateddistrib), squared integral estimate (squaredIntEstim), Addelman and Kempthorne construction of orthogonal arrays of strength two (addelman_const), discrepancy criteria (discrepancyCriteria_cplus), maximin criteria (maximin_cplus) and template file generation (template.replace).
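To make the pick-freeze idea behind the Monte Carlo estimators listed above concrete, here is a minimal base-R sketch of the basic first-order estimator $S_i = \mathrm{Cov}(Y, Y^i)/\mathrm{Var}(Y)$, where $Y^i$ shares ("freezes") input $X_i$ with $Y$ and redraws all other inputs. It is not the code of sobol, sobol2002 or any other function of the package; the helpers `ishigami` and `pick_freeze_first_order` are hypothetical, and the Ishigami function is written out with its usual constants a = 7, b = 0.1.

```r
# Ishigami function (a = 7, b = 0.1), inputs uniform on [-pi, pi]
ishigami <- function(X) {
  sin(X[, 1]) + 7 * sin(X[, 2])^2 + 0.1 * X[, 3]^4 * sin(X[, 1])
}

# Hypothetical helper: basic pick-freeze estimate of first-order indices.
# Y is computed on sample A; Y^i keeps column i of A and takes the
# other columns from an independent sample B.
pick_freeze_first_order <- function(model, p, n, lower, upper) {
  A <- matrix(runif(n * p, lower, upper), n, p)
  B <- matrix(runif(n * p, lower, upper), n, p)
  yA <- model(A)
  sapply(seq_len(p), function(i) {
    C <- B
    C[, i] <- A[, i]          # freeze input i
    yC <- model(C)
    # S_i = Cov(Y, Y^i) / Var(Y)
    (mean(yA * yC) - mean(yA) * mean(yC)) / var(yA)
  })
}

set.seed(123)
S <- pick_freeze_first_order(ishigami, p = 3, n = 1e5, lower = -pi, upper = pi)
round(S, 2)
# analytical values for comparison: S1 ~ 0.314, S2 ~ 0.442, S3 = 0
```

The estimates carry Monte Carlo error; the schemes listed above (Saltelli, Jansen, Martinez, Janon-Monod, etc.) differ mainly in how they reduce that error or extend it to total indices.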

Model managing

The sensitivity package has been designed to work both with models written in R and with external models such as computationally heavy codes. This is achieved with the input argument model, present in all functions of this package.

The argument model is expected to be either a function or a predictor (i.e. an object with a predict function such as lm).

If model = m where m is a function, it will be invoked once by y <- m(X).
If model = m where m is a predictor, it will be invoked once by y <- predict(m, X).

X is the design of experiments, i.e. a data.frame with p columns (the input factors) and n rows (each one an experiment), and y is the vector of length n of the model responses.

The model is invoked once for the whole design of experiments.

The argument model can be set to NULL. This is referred to as the decoupled approach and is used with external computational codes that rarely run on the statistician's computer. See decoupling.
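The two invocation rules above can be mimicked with a small dispatcher. This is a hypothetical helper (`run_model`) written for illustration, not the package's internal code; it only shows that a function is called once as m(X), that any other object is passed to predict, and that the whole design is handled in a single call.

```r
# Hypothetical helper mimicking the two invocation rules:
# a function is called as m(X), anything else as predict(m, X).
run_model <- function(model, X) {
  if (is.function(model)) model(X) else predict(model, X)
}

set.seed(1)
X <- data.frame(X1 = runif(20), X2 = runif(20))   # n = 20 experiments, p = 2 factors

# model given as an R function: one call on the whole design
f <- function(X) X$X1 + 2 * X$X2
y1 <- run_model(f, X)

# model given as a predictor: here a linear model fitted on (X, y1)
fit <- lm(y ~ X1 + X2, data = cbind(X, y = y1))
y2 <- run_model(fit, X)
```

Because the fitted lm reproduces the exact linear function, y1 and y2 agree up to rounding.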

Author(s)

Bertrand Iooss, Sebastien Da Veiga, Alexandre Janon and Gilles Pujol with contributions from Paul Lemaitre for PLI, Thibault Delage and Roman Sueur for PLIquantile, Vanessa Verges for PLIquantile, PLIsuperquantile, PLIquantile_multivar and PLIsuperquantile_multivar, Laurent Gilquin for sobolroalhs, sobolroauc, sobolSalt, sobolrep, sobolrec, as well as addelman_const, discrepancyCriteria_cplus and maximin_cplus, Loic le Gratiet for sobolGP, Khalid Boumhaout, Taieb Touati and Bernardo Ramos for sobolowen and soboltouati, Jana Fruth for PoincareConstant, sobolTIIlo and sobolTIIpf, Gabriel Sarazin, Amandine Marrel, Anouar Meynaoui and Reda El Amri for their contributions to sensiHSIC and testHSIC, Joseph Guillaume and Oldrich Rakovec for delsa and parameterSets, Olivier Roustant for PoincareOptimal, PoincareChaosSqCoef, squaredIntEstim and support, Eunhye Song, Barry L. Nelson and Jeremy Staum for shapleyPermEx and shapleyPermRand, Baptiste Broto for shapleySubsetMc, shapleyLinearGaussian and shapleyBlockEstimation, Filippo Monari for (sobolSmthSpl) and (morrisMultOut), Marouane Il Idrissi for lmg, pmvd and shapleysobol_knn, and, with Margot Herin, for pme_knn, Laura Clouvel for johnson, Paul Rochet for EPtest, Frank Weber and Roelof Oomen for other contributions.

(maintainer: Bertrand Iooss)

References

S. Da Veiga, F. Gamboa, B. Iooss and C. Prieur, Basics and trends in sensitivity analysis, Theory and practice in R, SIAM, 2021.

R. Faivre, B. Iooss, S. Mahevas, D. Makowski, H. Monod, editors, 2013, Analyse de sensibilite et exploration de modeles. Applications aux modeles environnementaux, Editions Quae.

L. Clouvel, B. Iooss, V. Chabridon, M. Il Idrissi and F. Robin, 2023, An overview of variance-based importance measures in the linear regression context: comparative analyses and numerical tests, Preprint. https://hal.science/hal-04102053

B. Iooss, V. Chabridon and V. Thouvenot, Variance-based importance measures for machine learning model interpretability, Congres lambda-mu23, Saclay, France, 10-13 octobre 2022. https://hal.science/hal-03741384

B. Iooss, R. Kennet and P. Secchi, 2022, Different views of interpretability, In: Interpretability for Industry 4.0: Statistical and Machine Learning Approaches, A. Lepore, B. Palumbo and J-M. Poggi (Eds), Springer.

B. Iooss and A. Saltelli, 2017, Introduction: Sensitivity analysis. In: Springer Handbook on Uncertainty Quantification, R. Ghanem, D. Higdon and H. Owhadi (Eds), Springer.

A. Saltelli, K. Chan and E. M. Scott eds, 2000, Sensitivity Analysis, Wiley.

Addelman and Kempthorne construction

Description

addelman_const implements the Addelman and Kempthorne construction of orthogonal arrays of strength two.

Usage

addelman_const(dimension, levels, choice="U")

Arguments

`dimension`	The number of columns of the orthogonal array.
`levels`	The number of levels of the orthogonal array. Either a prime number or a prime power number.
`choice`	A character from the list ("U","V","W","X") specifying which orthogonal array to construct (see "Details").

Details

The method of Addelman and Kempthorne can construct up to four orthogonal arrays; choice specifies which one is constructed. Note that the four orthogonal arrays depend on each other through linear equations.
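Strength two means that, for any pair of columns, every pair of levels appears the same number of times. The sketch below checks this property. Because the Addelman and Kempthorne construction is more involved, it builds the array with the simpler linear construction of Bose for a prime number of levels; `bose_oa` and `is_strength2` are hypothetical helpers, and the array they produce is not what addelman_const returns.

```r
# Hypothetical helper: Bose's linear construction OA(q^2, q + 1, q, 2)
# for a prime q. Columns are a, b and (a + k * b) mod q for k = 1, ..., q - 1.
bose_oa <- function(q, dimension) {
  stopifnot(dimension <= q + 1)
  grid <- expand.grid(a = 0:(q - 1), b = 0:(q - 1))
  cols <- list(grid$a, grid$b)
  for (k in seq_len(q - 1)) cols[[k + 2]] <- (grid$a + k * grid$b) %% q
  do.call(cbind, cols)[, seq_len(dimension), drop = FALSE]
}

# Hypothetical helper: TRUE if every pair of columns contains every pair
# of levels equally often (orthogonal array of strength two).
is_strength2 <- function(OA, q) {
  target <- nrow(OA) / q^2
  pairs <- combn(ncol(OA), 2)
  all(apply(pairs, 2, function(p) {
    tab <- table(factor(OA[, p[1]], levels = 0:(q - 1)),
                 factor(OA[, p[2]], levels = 0:(q - 1)))
    all(tab == target)
  }))
}

OA <- bose_oa(7, 6)   # same dimension and levels as in the Examples section
dim(OA)               # 49 6
is_strength2(OA, 7)   # TRUE
```

The same is_strength2 check can be applied to the matrix returned by addelman_const, with q set to its number of levels.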

Value

A matrix corresponding to the orthogonal array constructed.

Author(s)

Laurent Gilquin

References

A.S. Hedayat, N.J.A. Sloane and J. Stufken, 1999, Orthogonal Arrays: Theory and Applications, Springer Series in Statistics.

Examples

dimension <- 6
levels <- 7
OA <- addelman_const(dimension,levels,choice="U")

Correlation Ratio

Description

correlRatio computes the correlation ratio between a quantitative variable and a qualitative variable.

Usage

correlRatio(X, y)

Arguments

`X`	a vector containing the quantitative variable.
`y`	a vector containing the qualitative variable (e.g. a factor).

Value

The value of the correlation ratio
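As a cross-check, the squared correlation ratio $\eta^2$ can be computed by hand as the between-group sum of squares divided by the total sum of squares, which is also the $R^2$ of a one-way ANOVA of the quantitative variable on the groups. `eta_squared` is a hypothetical helper; this page does not state whether correlRatio reports $\eta$ or $\eta^2$, so compare with its output accordingly.

```r
# Hypothetical helper: squared correlation ratio eta^2 = SS_between / SS_total
eta_squared <- function(x, g) {
  g <- as.factor(g)
  group_means <- tapply(x, g, mean)
  group_sizes <- tapply(x, g, length)
  ss_between <- sum(group_sizes * (group_means - mean(x))^2)
  ss_total <- sum((x - mean(x))^2)
  ss_between / ss_total
}

set.seed(42)
x <- runif(100)
g <- round(x)          # qualitative variable with two levels, 0 and 1
eta_squared(x, g)
```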

Author(s)

Bertrand Iooss

References

L. Clouvel, B. Iooss, V. Chabridon, M. Il Idrissi and F. Robin, 2024, An overview of variance-based importance measures in the linear regression context: comparative analyses and numerical tests, Preprint. https://hal.science/hal-04102053

Examples

x <- runif(100)
y <- round(x)
correlRatio(x,y)

Decoupling Simulations and Estimations

Description

tell and ask are S3 generic methods for decoupling simulations from the estimation of sensitivity measures. In general, they are not needed for a simple R model, but rather for an external computational code. Most sensitivity analysis objects of this package overload tell, whereas ask is overloaded for iterative methods only. extract is used to post-process a sobolshap_knn object.

Usage

tell(x, y = NULL, ...)
ask(x, ...)
extract(x, ...)

Arguments

`x`	a typed list storing the state of the sensitivity study (parameters, data, estimates), as returned by sensitivity analyses objects constructors, such as `src`, `morris`, etc.
`y`	a vector of model responses.
`...`	additional arguments, depending on the method used.

Details

When a sensitivity analysis method is called with no model (i.e. argument model = NULL), it generates an incomplete object x that stores the design of experiments (field X), so that the user can run the corresponding simulations "by hand". The tell method then passes these simulation results to the incomplete object x and estimates the sensitivity measures.

The extract method is useful when the Shapley effects have been computed in a first step, so that sensitivity indices for all possible subsets are available. The resulting sobolshap_knn object can then be post-processed by extract to obtain first-order and total Sobol' indices directly.

When the method is iterative, the data to simulate are not stored in the sensitivity analysis object x, but generated at each iteration with the ask method; see for example sb.
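In practice, the decoupled workflow is often a file round trip between R and the external code. The sketch below uses only base R: the Ishigami function stands in for the external code, and `ishigami_external`, `design_file` and `response_file` are hypothetical names. In a real study the design would be taken from x$X of an object built with model = NULL, and the responses would then be passed to tell(x, y).

```r
# Hypothetical stand-in for an external code: the Ishigami function
# (a = 7, b = 0.1) written out explicitly
ishigami_external <- function(X) {
  sin(X[, 1]) + 7 * sin(X[, 2])^2 + 0.1 * X[, 3]^4 * sin(X[, 1])
}

set.seed(2)
n <- 50
# In a real study this would be x$X from an object built with model = NULL
design <- data.frame(X1 = runif(n, -pi, pi),
                     X2 = runif(n, -pi, pi),
                     X3 = runif(n, -pi, pi))

# 1. Export the design of experiments
design_file <- tempfile(fileext = ".csv")
write.csv(design, design_file, row.names = FALSE)

# 2. The "external code" reads the design and writes one response per row
ext_input <- as.matrix(read.csv(design_file))
response_file <- tempfile(fileext = ".csv")
write.csv(data.frame(y = ishigami_external(ext_input)), response_file,
          row.names = FALSE)

# 3. Read the responses back, keeping the row order of the design
y <- read.csv(response_file)$y
# With the sensitivity package, the next step would be: tell(x, y)
```

The only contract is that y has one response per row of the design, in the same order.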

Value

tell doesn't return anything. It computes the sensitivity measures, and stores them in the list x. Side effect: tell modifies its argument x.

ask returns the set of data to simulate.

Applied to a sobolshap_knn object, extract returns an object containing first-order and total Sobol' indices.

Author(s)

Gilles Pujol and Bertrand Iooss

Examples

# Example of use of fast99 with "model = NULL"
x <- fast99(model = NULL, factors = 3, n = 1000,
            q = "qunif", q.arg = list(min = -pi, max = pi))
y <- ishigami.fun(x$X)
tell(x, y)
print(x)
plot(x)

Distributed Evaluation of Local Sensitivity Analysis

Description

delsa implements Distributed Evaluation of Local Sensitivity Analysis to calculate first order parameter sensitivity at multiple locations in parameter space. The locations in parameter space can either be obtained by a call to parameterSets or by specifying X0 directly, in which case the prior variance of each parameter varprior also needs to be specified. The indices can be visualized with plot, which uses functions of the packages ggplot2 and reshape2.
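At a single location, the DELSA first-order index of Rakovec et al. (2014) weights each squared derivative by the prior variance of the parameter and normalizes by the sum over parameters (which corresponds to varoutput = "summation"). The sketch below is a hypothetical helper `delsa_local`, not the delsa code; in particular, the assumption that perturb acts as a relative forward step x0[j] * perturb is ours, and it requires non-zero parameter values.

```r
# Hypothetical helper: DELSA-style first-order indices at one location x0.
# Assumption: derivatives are forward finite differences with a relative
# step, x0[j] * perturb (so x0[j] must be non-zero).
delsa_local <- function(f, x0, varprior, perturb = 1.01) {
  y0 <- f(x0)
  deriv <- numeric(length(x0))
  for (j in seq_along(x0)) {
    x1 <- x0
    x1[j] <- x0[j] * perturb
    deriv[j] <- (f(x1) - y0) / (x1[j] - x0[j])
  }
  contrib <- deriv^2 * varprior   # (dy/dtheta_j)^2 * prior variance of theta_j
  contrib / sum(contrib)          # output variance taken as the sum ("summation")
}

f <- function(x) 2 * x[1] + x[2]^2
# Uniform priors on [0, 1] have variance 1/12
S <- delsa_local(f, x0 = c(0.5, 0.5), varprior = rep(1 / 12, 2))
S
```

Evaluating at another location, e.g. x0 = c(0.5, 0.9), gives a larger share to the second parameter because the derivative of x2^2 grows with x2; this location dependence is what delsa summarizes across parameter space.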

Usage

delsa(model = NULL, perturb=1.01,
  par.ranges, samples, method,
  X0, varprior, varoutput,
  ...)
  
## S3 method for class 'delsa'
tell(x, y = NULL,...)

## S3 method for class 'delsa'
print(x, ...)

## S3 method for class 'delsa'
plot(x, which=1:3, ask = dev.interactive(), ...)

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`perturb`	Perturbation used to calculate sensitivity at each evaluation location
`par.ranges`	A named list of minimum and maximum parameter values
`samples`	Number of samples to generate. For the `"grid"` and `"innergrid"` method, corresponds to the number of samples for each parameter, and may be a vector.
`method`	Sampling scheme. See `parameterSets`
`X0`	Parameter values at which to evaluate sensitivity indices. Can be used instead of specifying sampling `method`
`varprior`	Prior variance. If `X0` is specified, `varprior` must also be specified.
`varoutput`	Output variance. If `"summation"` is specified (default value), the output variance is computed by summing the first order effects. If `"empirical"` is specified, the output variance is estimated from the output sample.
`...`	any other arguments for `model` which are passed unchanged each time it is called.
`x`	a list of class `"delsa"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`which`	if a subset of the plots is required, specify a subset of the numbers 1:3
`ask`	logical; if TRUE, the user is asked before each plot, see `par(ask=.)`

Details

print shows a summary of the first order indices across parameter space.

plot shows: (1) the cumulative distribution function of first order sensitivity across parameter space, (2) variation of first order sensitivity in relation to model response, and (3) sensitivity in relation to parameter value.

Value

delsa returns a list of class "delsa", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	a vector of model responses.
`delsafirst`	the first order indices for each location in `X0`.
`deriv`	the values of derivatives for each location in `X0`

Author(s)

Conversion for sensitivity package by Joseph Guillaume, based on original R code by Oldrich Rakovec. Addition of the varoutput argument by Bertrand Iooss (2020).

References

Rakovec, O., M. C. Hill, M. P. Clark, A. H. Weerts, A. J. Teuling, R. Uijlenhoet (2014), Distributed Evaluation of Local Sensitivity Analysis (DELSA), with application to hydrologic models, Water Resour. Res., 50, 1-18

Examples

# Test case : the non-monotonic Sobol g-function
# (there are 8 factors, all following the uniform distribution on [0,1])


library(randtoolbox)
x <- delsa(model=sobol.fun,
           par.ranges=replicate(8,c(0,1),simplify=FALSE),
           samples=100,method="sobol")

# Summary of sensitivity indices of each parameter across parameter space
print(x)

library(ggplot2)
library(reshape2)
plot(x)


Discrepancy measure

Description

Compute discrepancy criteria. This function uses a C++ implementation of the function discrepancyCriteria from package DiceDesign.

Usage

discrepancyCriteria_cplus(design,type='all')

Arguments

design

a matrix corresponding to the design of experiments. The discrepancy criteria are computed for a design in the unit cube $[0,1]^d$. If this condition is not satisfied, the design is automatically rescaled.

type

type of discrepancies (single value or vector) to be computed:

`'all'`	all types of discrepancies (default)
`'C2'`	centered L2-discrepancy
`'L2'`	L2-discrepancy
`'L2star'`	L2star-discrepancy
`'M2'`	modified L2-discrepancy
`'S2'`	symmetric L2-discrepancy
`'W2'`	wrap-around L2-discrepancy

Details

The discrepancy measures how far a given distribution of points deviates from a perfectly uniform one. Different discrepancies are available. For example, if we denote by $Vol(J)$ the volume of a subset $J$ of $[0,1]^d$ and by $A(X, J)$ the number of points of $X$ falling in $J$, the $L2$ discrepancy is:

$D_{L2}(X) = \left[ \int_{[0,1]^{2d}} \left( \frac{A(X, J_{a,b})}{n} - Vol(J_{a,b}) \right)^{2} da \, db \right]^{1/2}$

where $a = (a_{1}, \dots, a_{d})'$, $b = (b_{1}, \dots, b_{d})'$ and $J_{a,b} = [a_{1}, b_{1}) \times \dots \times [a_{d}, b_{d})$. The other L2-discrepancies are defined according to the same principle, with a different form for the subset $J$. Among all the possibilities, discrepancyCriteria_cplus implements only the L2 discrepancies because they can be expressed analytically, even in high dimension.

The centered L2-discrepancy is computed using the analytical expression given by Hickernell (1998). See Pleming and Manteufel (2005) for more details about the wrap-around discrepancy.
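For reference, Hickernell's closed form of the squared centered L2-discrepancy of $n$ points $x_1, \dots, x_n$ in $[0,1]^d$ is
$CD^2(X) = \left(\frac{13}{12}\right)^d - \frac{2}{n} \sum_{i=1}^{n} \prod_{k=1}^{d} \left(1 + \frac{1}{2}|x_{ik} - \frac{1}{2}| - \frac{1}{2}|x_{ik} - \frac{1}{2}|^2\right) + \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \prod_{k=1}^{d} \left(1 + \frac{1}{2}|x_{ik} - \frac{1}{2}| + \frac{1}{2}|x_{jk} - \frac{1}{2}| - \frac{1}{2}|x_{ik} - x_{jk}|\right)$.
The plain-R sketch below evaluates it directly. `centered_l2` is a hypothetical helper (an O(n^2) loop, no rescaling), not the C++ code of discrepancyCriteria_cplus; it returns the square root, and whether the package reports $CD$ or $CD^2$ is worth checking before comparing values.

```r
# Hypothetical helper: centered L2-discrepancy (Hickernell, 1998) of a
# design already scaled to [0,1]^d; returns CD (the square root of CD^2)
centered_l2 <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  d <- ncol(X)
  z <- abs(X - 0.5)
  term1 <- (13 / 12)^d
  term2 <- (2 / n) * sum(apply(1 + 0.5 * z - 0.5 * z^2, 1, prod))
  term3 <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      term3 <- term3 + prod(1 + 0.5 * z[i, ] + 0.5 * z[j, ] -
                              0.5 * abs(X[i, ] - X[j, ]))
    }
  }
  sqrt(term1 - term2 + term3 / n^2)
}

set.seed(3)
X <- matrix(runif(40 * 2), 40, 2)   # same size as in the Examples section
centered_l2(X)
```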

Value

A list containing the L2-discrepancies of the design.

Author(s)

Laurent Gilquin

References

Fang K.T, Li R. and Sudjianto A. (2006) Design and Modeling for Computer Experiments, Chapman & Hall.

Hickernell F.J. (1998) A generalized discrepancy and quadrature error bound. Mathematics of Computation, 67, 299-322.

Pleming J.B. and Manteufel R.D. (2005) Replicated Latin Hypercube Sampling, 46th Structures, Structural Dynamics & Materials Conference, 16-21 April 2005, Austin (Texas) – AIAA 2005-1819.

Examples

dimension <- 2
n <- 40
X <- matrix(runif(n*dimension),n,dimension)
discrepancyCriteria_cplus(X)

Non-parametric variable significance test based on the empirical process

Description

EPtest builds the non-parametric variable significance test of Klein and Rochet (2022) for the null hypothesis $H_0: S^u = S$, where $S^u$ is the Sobol index for the inputs $X_i, i \in u$ and $S$ is the Sobol index for all the inputs in $X$.

Usage

EPtest(X, y, u = NULL, doe = NULL, Kdoe = 10, tau = 0.1)

Arguments

`X`	a matrix or data.frame that contains the numerical inputs as columns.
`y`	a vector of output.
`u`	the vector of indices of the columns of X for which we want to test the significance.
`doe`	the design of experiment on which the empirical process is to be evaluated. It should be independent from X.
`Kdoe`	if doe is NULL, the design of experiments consists of Kdoe points drawn independently and uniformly on the intervals delimited by the range of each input.
`tau`	a regularization parameter used to approximate the limiting chi-squared distribution of the test statistic under H0.

Value

EPtest returns a list containing:

`statistics`	The test statistic, which follows a chi-squared distribution under the null hypothesis.
`ddl`	The number of degrees of freedom of the limiting chi-squared distribution used for the test.
`p-value`	The test p-value.
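Since the statistic is compared with a chi-squared distribution with ddl degrees of freedom under H0, the p-value is presumably its upper-tail probability. The sketch below shows that relation on made-up numbers: `statistic` and `ddl` are hypothetical values, not EPtest output, and the upper-tail convention is an assumption.

```r
# Hypothetical values standing in for the "statistics" and "ddl"
# components returned by EPtest
statistic <- 12.3
ddl <- 5

# Assumption: upper-tail p-value of the limiting chi-squared distribution
p_value <- pchisq(statistic, df = ddl, lower.tail = FALSE)
reject_H0 <- p_value < 0.05   # decision at the 5% level
```

With EPtest's result stored in a variable res, the same decision would read res$`p-value` < 0.05 (backticks are needed because of the hyphen in the component name).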

Author(s)

Paul Rochet

References

T. Klein and P. Rochet, Test comparison for Sobol Indices over nested sets of variables, SIAM/ASA Journal on Uncertainty Quantification 10.4 (2022): 1586-1600.

Examples


# Model: Ishigami
  
n = 100
X = matrix(runif(3*n, -pi, pi), ncol = 3)
  
y = ishigami.fun(X)
	
# Test the significance of X1, H0: S1 = 0
EPtest(X[, 1], y, u = NULL)

# Test if X1 is sufficient to explain Y, H0: S1 = S123
EPtest(X, y, u = 1)
  
# Test if X3 is significant in presence of X2, H0: S2 = S23
EPtest(X[, 2:3], y, u = 1)
  

Extended Fourier Amplitude Sensitivity Test

Description

fast99 implements the so-called "extended-FAST" method (Saltelli et al. 1999). This method allows the estimation of first order and total Sobol' indices for all the factors (altogether $2p$ indices, where $p$ is the number of factors) at a total cost of $n \times p$ simulations.

Usage

fast99(model = NULL, factors, n, M = 4, omega = NULL,
       q = NULL, q.arg = NULL, ...)
## S3 method for class 'fast99'
tell(x, y = NULL, ...)
## S3 method for class 'fast99'
print(x, ...)
## S3 method for class 'fast99'
plot(x, ylim = c(0, 1), ...)

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`factors`	an integer giving the number of factors, or a vector of character strings giving their names.
`n`	an integer giving the sample size, i.e. the length of the discretization of the s-space (see Cukier et al.).
`M`	an integer specifying the interference parameter, i.e. the number of harmonics to sum in the Fourier series decomposition (see Cukier et al.).
`omega`	a vector giving the set of frequencies, one frequency for each factor (see details below).
`q`	a vector of quantile functions names corresponding to wanted factors distributions (see details below).
`q.arg`	a list of quantile functions parameters (see details below).
`x`	a list of class `"fast99"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`ylim`	y-coordinate plotting limits.
`...`	any other arguments for `model` which are passed unchanged each time it is called.

Details

If not given, the set of frequencies omega is taken from Saltelli et al. The first frequency of the vector omega is assigned to each factor $X_i$ in turn (corresponding to the estimation of Sobol' indices $S_i$ and $S_{T_i}$ ), other frequencies being assigned to the remaining factors.

If the arguments q and q.arg are not given, the factors are taken uniformly distributed on $[0,1]$. The argument q must be a list of character strings giving the names of the quantile functions (one for each factor), such as qunif, qnorm... It can also be a single character string, meaning the same distribution for all factors. The argument q.arg must be a list of lists, each one containing the additional parameters of the corresponding quantile function. For example, the parameters of the quantile function qunif could be list(min=1, max=2), giving a uniform distribution on $[1,2]$. If q is a single character string, then q.arg must be a single list (rather than a list of one list).
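The roles of omega, M, q and q.arg can be seen in a compact version of the classical FAST first-order estimator (Cukier et al.). This is a sketch, not fast99: it does not compute the total indices of the extended method, it uses no random phase shift, the helper `fast_first_order` is hypothetical, and the frequencies 11 and 21 are chosen for illustration only. Each factor follows the search curve $x_i(s) = 1/2 + \arcsin(\sin(\omega_i s))/\pi$, is mapped to its distribution through the quantile function, and its first-order variance is read off the first M harmonics of $\omega_i$.

```r
# Hypothetical helper: classical FAST first-order indices (no total indices).
# N should exceed 2 * M * max(omega) so that the harmonics used are resolved.
fast_first_order <- function(model, omega, N = 1001, M = 4,
                             q = "qunif", q.arg = list()) {
  s <- pi * (2 * seq_len(N) - N - 1) / N                         # N points on (-pi, pi)
  U <- sapply(omega, function(w) 0.5 + asin(sin(w * s)) / pi)    # search curve in [0,1]^p
  X <- apply(U, 2, function(u) do.call(q, c(list(u), q.arg)))    # quantile transform
  y <- model(X)
  V <- mean((y - mean(y))^2)                                     # total variance
  D <- sapply(omega, function(w) {
    j <- w * seq_len(M)                                          # first M harmonics of w
    A <- sapply(j, function(k) mean(y * cos(k * s)))
    B <- sapply(j, function(k) mean(y * sin(k * s)))
    2 * sum(A^2 + B^2)                                           # partial variance D_i
  })
  D / V
}

# Additive test model with known indices: S1 = 1/5, S2 = 4/5
S <- fast_first_order(function(X) X[, 1] + 2 * X[, 2], omega = c(11, 21),
                      q = "qunif", q.arg = list(min = -pi, max = pi))
S
```

The estimates fall slightly below 1/5 and 4/5 because the triangle-wave search curve puts a small part of each factor's variance on harmonics above M.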

Value

fast99 returns a list of class "fast99", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the factors sample values.
`y`	a vector of model responses.
`V`	the estimation of variance.
`D1`	the estimations of variances of the conditional expectations (VCE) with respect to each factor.
`Dt`	the estimations of VCE with respect to each factor complementary set of factors ("all but $X_i$ ").

Author(s)

Gilles Pujol

References

A. Saltelli, S. Tarantola and K. Chan, 1999, A quantitative, model independent method for global sensitivity analysis of model output, Technometrics, 41, 39–56.

R. I. Cukier, H. B. Levine and K. E. Schuler, 1978, Nonlinear sensitivity analysis of multiparameter model systems. J. Comput. Phys., 26, 1–42.

Examples

# Test case : the non-monotonic Ishigami function
x <- fast99(model = ishigami.fun, factors = 3, n = 1000,
            q = "qunif", q.arg = list(min = -pi, max = pi))
print(x)
plot(x)

Johnson indices

Description

johnson computes the Johnson indices, which measure the relative importance of correlated inputs by decomposing $R^2$ in linear and logistic regression models. These indices allocate a share of $R^2$ to each input based on the relative weight allocation (RWA) system, for dependent or correlated inputs.
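For the linear case, Johnson's (2000) relative weights can be written from correlations alone: with $R_{xx} = \Lambda^2$ ($\Lambda$ the symmetric square root), regress $y$ on the orthogonal counterparts of the inputs, $\beta = \Lambda^{-1} r_{xy}$, then set $\epsilon_j = \sum_k \lambda_{jk}^2 \beta_k^2$. The weights sum to the $R^2$ of the regression. `relative_weights` is a hypothetical helper for the linear, non-rank case; it is not claimed to be the computation inside johnson.

```r
# Hypothetical helper: Johnson's (2000) relative weights for linear regression
relative_weights <- function(X, y) {
  Rxx <- cor(X)
  rxy <- cor(X, y)                                               # p x 1 correlations with y
  e <- eigen(Rxx, symmetric = TRUE)
  Lambda <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)  # Rxx^(1/2)
  beta <- solve(Lambda, rxy)                                     # y on orthogonal inputs
  w <- (Lambda^2) %*% (beta^2)                                   # eps_j = sum_k lambda_jk^2 beta_k^2
  setNames(drop(w), colnames(X))
}

set.seed(1234)
n <- 500
Z <- matrix(rnorm(n * 3), n, 3)
# X1 independent, corr(X2, X3) = -0.8, mirroring the covariance in the Examples
X <- cbind(X1 = Z[, 1], X2 = Z[, 2], X3 = -0.8 * Z[, 2] + 0.6 * Z[, 3])
y <- drop(X %*% c(1, -1, 0.5)) + rnorm(n, 0, 2)

w <- relative_weights(X, y)
sum(w)   # equals the R^2 of the linear regression of y on X
```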

Usage

johnson(X, y, rank = FALSE, logistic = FALSE, nboot = 0, conf = 0.95)
## S3 method for class 'johnson'
print(x, ...)
## S3 method for class 'johnson'
plot(x, ylim = c(0,1), ...)
## S3 method for class 'johnson'
ggplot(data,  mapping = aes(), ylim = c(0, 1), ..., environment
                 = parent.frame())

Arguments

`X`	a data frame (or object coercible by `as.data.frame`) containing the design of experiments (model input variables).
`y`	a vector containing the responses corresponding to the design of experiments (model output variables).
`rank`	logical. If `TRUE`, the analysis is done on the ranks.
`logistic`	logical. If `TRUE`, the analysis is done via a logistic regression (binomial GLM).
`nboot`	the number of bootstrap replicates.
`conf`	the confidence level of the bootstrap confidence intervals.
`x`	the object returned by `johnson`.
`data`	the object returned by `johnson`.
`ylim`	the y-coordinate limits of the plot.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	arguments to be passed to methods, such as graphical parameters (see `par`).

Details

Logistic regression model (logistic = TRUE) and rank-based indices (rank = TRUE) are incompatible.

Value

johnson returns a list of class "johnson", containing the following components:

`call`	the matched call.
`johnson`	a data frame containing the estimations of the Johnson indices, bias and confidence intervals.

Author(s)

Bertrand Iooss and Laura Clouvel

References

J.W. Johnson, 2000, A heuristic method for estimating the relative weight of predictor variables in multiple regression, Multivariate Behavioral Research, 35:1-19.

J.W. Johnson and J.M. LeBreton, 2004, History and use of relative importance indices in organizational research, Organizational Research Methods, 7:238-257.

Examples


##################################
# Same example as the one in src()

# a 100-sample with X1 ~ U(0.5, 1.5)
#                   X2 ~ U(1.5, 4.5)
#                   X3 ~ U(4.5, 13.5)

library(boot)
n <- 100
X <- data.frame(X1 = runif(n, 0.5, 1.5),
                X2 = runif(n, 1.5, 4.5),
                X3 = runif(n, 4.5, 13.5))

# linear model : Y = X1 + X2 + X3

y <- with(X, X1 + X2 + X3)

# sensitivity analysis

x <- johnson(X, y, nboot = 100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)


#################################
# Same examples as the ones in lmg()

library(boot)
library(mvtnorm)

set.seed(1234)
n <- 1000
beta<-c(1,-1,0.5)
sigma<-matrix(c(1,0,0,
                0,1,-0.8,
                0,-0.8,1),
              nrow=3,
              ncol=3)

##########
# Gaussian correlated inputs

X <-rmvnorm(n, rep(0,3), sigma)
colnames(X)<-c("X1","X2", "X3")

#########
# Linear Model

y <- X%*%beta + rnorm(n,0,2)

# Without Bootstrap confidence intervals
x<-johnson(X, y)
print(x)
plot(x)

# With bootstrap confidence intervals
x<-johnson(X, y, nboot=100, conf=0.95)
print(x)
plot(x)

# Rank-based analysis
x<-johnson(X, y, rank=TRUE, nboot=100, conf=0.95)
print(x)
plot(x)

#######
# Logistic Regression
y<-as.numeric(X%*%beta + rnorm(n)>0)
x<-johnson(X,y, logistic = TRUE)
plot(x)
print(x)

#################################
# Test on a modified Linkletter function with:
# - multivariate normal inputs (all multicollinear)
# - in dimension 50 (there are 42 dummy inputs)
# - large-size sample (1e4)

library(mvtnorm)

n <- 1e4
d <- 50
sigma <- matrix(0.5,ncol=d,nrow=d)
diag(sigma) <- 1
X <- rmvnorm(n, rep(0,d), sigma)

y <- linkletter.fun(X)
joh <- johnson(X,y)
sum(joh$johnson) # gives the R2
plot(joh)

Johnson-Shapley indices

Description

johnsonshap computes the Johnson-Shapley indices for correlated input relative importance. These indices allocate a share of the output variance to each input based on the relative weight allocation system, in the case of dependent or correlated inputs.

Usage

johnsonshap(model = NULL, X1, N, nboot = 0, conf = 0.95)
## S3 method for class 'johnsonshap'
print(x, ...)
## S3 method for class 'johnsonshap'
plot(x, ylim = c(0,1), ...)
## S3 method for class 'johnsonshap'
ggplot(data,  mapping = aes(), ylim = c(0, 1), ..., 
                environment = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X1`	a data frame (or object coercible by `as.data.frame`) containing a design of experiments (model input variables).
`N`	an integer giving the size of each replicated design for the Sobol' indices computations via the `sobolrep()` function.
`nboot`	the number of bootstrap replicates.
`conf`	the confidence level of the bootstrap confidence intervals.
`x`	the object returned by `johnsonshap`.
`data`	the object returned by `johnsonshap`.
`ylim`	the y-coordinate limits of the plot.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	arguments to be passed to methods, such as graphical parameters (see `par`).

Details

X1 is not used to run the model, only to perform the SVD; the model is run on a specific design that is generated internally.

When bootstrap is used, the values in columns 'bias' and 'std. error' are arbitrarily set to 0 because they cannot be computed; the values in columns 'original', 'min c.i.' and 'max c.i.' are correctly computed.
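The SVD step can be pictured with a sketch of Johnson's (2000) orthogonal approximation: if the standardized inputs are decomposed as $X_s = U D V^\top$, then $Z = U V^\top$ is the uncorrelated matrix closest to $X_s$, and $X_s = Z W$ with $W = V D V^\top$. Normalizing the squared entries of $W$ so that each column sums to one is an assumption about how a standardized weight matrix such as `Wstar` may be built; the helper `orthogonalize_sketch` is hypothetical and does not reproduce the internals of `johnsonshap`.

```r
# Hypothetical helper (not the package code): SVD-based orthogonalization of the inputs
orthogonalize_sketch <- function(X1) {
  Xs <- scale(as.matrix(X1))                       # center and scale each input
  s <- svd(Xs)                                     # Xs = U D V'
  Z <- s$u %*% t(s$v)                              # orthonormal, centered columns closest to Xs
  W <- s$v %*% diag(s$d, length(s$d)) %*% t(s$v)   # weight matrix such that Xs = Z %*% W
  Wstar <- sweep(W^2, 2, colSums(W^2), "/")        # assumed standardization: columns sum to 1
  list(Z = Z, W = W, Wstar = Wstar)
}

set.seed(2)
X1 <- matrix(rnorm(300), ncol = 3)
X1[, 2] <- X1[, 1] + 0.5 * X1[, 2]     # make the first two inputs correlated
o <- orthogonalize_sketch(X1)
round(crossprod(o$Z), 10)              # identity matrix: the columns of Z are orthonormal
```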

Value

johnsonshap returns a list of class "johnsonshap", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a matrix containing the design of experiments.
`sobrepZ`	the Sobol' indices of the transformed (independent) inputs.
`Wstar`	the standardized weight matrix.
`johnsonshap`	a data frame containing the estimations of the Johnson-Shapley indices, bias and confidence intervals.

Author(s)

Bertrand Iooss

References

B. Iooss and L. Clouvel, Une methode d'approximation des effets de Shapley en grande dimension, 54emes Journees de Statistique, Bruxelles, Belgique, July 3-7, 2023

Examples


library(ggplot2)
library(boot)

#####################################################
# Test case: the non-monotonic Sobol g-function (with independent inputs)
n <- 1000
X <- data.frame(matrix(runif(8 * n), nrow = n))
x <- johnsonshap(model = sobol.fun, X1 = X, N = n)
print(x)
plot(x)
ggplot(x)


#############################################
# 3D analytical toy functions described in Iooss & Clouvel (2023)

library(mvtnorm)

Xall <- function(n) mvtnorm::rmvnorm(n,mu,Covmat)
# 2 correlated inputs
Cov3d2 <- function(rho){ # correl (X1,X2)
  Cormat <- matrix(c(1,rho,0,rho,1,0,0,0,1),3,3)
  return( ( sig %*% t(sig) ) * Cormat)
}
mu3d <- c(1,0,0) ; sig3d <- c(0.25,1,1)
d <- 3 ; mu <- mu3d ; sig <- sig3d ; Covm <- Cov3d2
Xvec <- c("X1","X2","X3")

n <- 1e4    # initial sample size
N <- 1e4    # cost to estimate indices 
rho <- 0.9  # correlation coef for dependent inputs' case

################
# Linear model + a strong 2nd order interaction

toy3d <- function(x) return(x[,1]*(1+x[,1]*(cos(x[,2]+x[,3])^2))) 
# interaction X2X3
toy <- toy3d 

# Independent case

Covmat <- Covm(0)
X <- as.data.frame(Xall(n))
Y <- toy(X)
joh <- johnson(X, Y, nboot=100)
print(joh)
johshap <- johnsonshap(model = toy, X1 = X, N = N, nboot=100)
print(johshap)
ggplot(johshap)

# Dependent case

Covmat <- Covm(rho)
Xdep <- as.data.frame(Xall(n))
Ydep <- toy(Xdep)
joh <- johnson(Xdep, Ydep, nboot=0)
print(joh)
johshap <- johnsonshap(model = toy, X1 = Xdep, N = N, nboot=100)
print(johshap)
ggplot(johshap)

################
# Strongly non-linear model + a strong 2nd order interaction

toy3dNL <- function(x) return(sin(x[,1]*pi/2)*(1+x[,1]*(cos(x[,2]+x[,3])^2))) 
# non linearity in X1
toy <- toy3dNL

# Independent case

Covmat <- Covm(0)
X <- as.data.frame(Xall(n))
Y <- toy(X)
joh <- johnson(X, Y, nboot=100)
print(joh)
johshap <- johnsonshap(model = toy, X1 = X, N = N, nboot=100)
print(johshap)
ggplot(johshap)

# Dependent case

Covmat <- Covm(rho)
Xdep <- as.data.frame(Xall(n))
Ydep <- toy(Xdep)
joh <- johnson(Xdep, Ydep, nboot=0)
print(joh)
johshap <- johnsonshap(model = NULL, X1 = Xdep, N = N, nboot=100)
y <- toy(johshap$X)
tell(johshap, y)
print(johshap)
ggplot(johshap)


LMG $R^2$ decomposition for linear and logistic regression models

Description

lmg computes the Lindeman, Merenda and Gold (LMG) indices for correlated input relative importance by $R^2$ decomposition for linear and logistic regression models. These indices allocate a share of $R^2$ to each input based on the Shapley attribution system, in the case of dependent or correlated inputs.

Usage

lmg(X, y, logistic = FALSE,  rank = FALSE, nboot = 0, 
    conf = 0.95, max.iter = 1000, parl = NULL)
## S3 method for class 'lmg'
print(x, ...)
## S3 method for class 'lmg'
plot(x, ylim = c(0,1), ...)

Arguments

`X`	a matrix or data frame containing the observed covariates (i.e., features, input variables...).
`y`	a numeric vector containing the observed outcomes (i.e., dependent variable). If `logistic=TRUE`, can be a numeric vector of zeros and ones, or a logical vector, or a factor.
`logistic`	logical. If `TRUE`, the analysis is done via a logistic regression (binomial GLM).
`rank`	logical. If `TRUE`, the analysis is done on the ranks.
`nboot`	the number of bootstrap replicates for the computation of confidence intervals.
`conf`	the confidence level of the bootstrap confidence intervals.
`max.iter`	if `logistic=TRUE`, the maximum number of iterative optimization steps allowed for the logistic regression. Default is `1000`.
`parl`	number of cores on which to parallelize the computation. If `NULL`, then no parallelization is done.
`x`	the object returned by `lmg`.
`ylim`	the y-coordinate limits of the plot.
`...`	arguments to be passed to methods, such as graphical parameters (see `par`).

Details

The computation uses the subset procedure defined in Broto, Bachoc and Depecker (2020): the $R^2$ of all possible sub-models are computed first, and the Shapley weights are then applied according to the Lindeman, Merenda and Gold (1980) definition.
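The subset procedure can be sketched in base R: the $R^2$ of all $2^d$ sub-models is computed once (subsets encoded as bit masks), and input $j$ then receives $\sum_{S \not\ni j} \frac{|S|!\,(d-|S|-1)!}{d!}\left(R^2_{S \cup \{j\}} - R^2_S\right)$. The function `lmg_sketch` is a hypothetical illustration for small $d$, without bootstrap, ranks or parallelism, and is not the package's implementation. By the efficiency property of Shapley values, the indices sum to the $R^2$ of the full model.

```r
# Hypothetical helper (not the package code): LMG indices from all sub-model R^2
lmg_sketch <- function(X, y) {
  X <- as.data.frame(X)
  y <- as.numeric(y)
  d <- ncol(X)
  masks <- 0:(2^d - 1)
  # Subset encoded by mask m: the inputs whose bit is set in m
  subsets <- lapply(masks, function(m) which(as.integer(intToBits(m))[seq_len(d)] == 1))
  # R^2 of every sub-model; the empty model has R^2 = 0
  R2 <- vapply(subsets, function(S) {
    if (length(S) == 0) return(0)
    summary(lm(y ~ ., data = data.frame(y = y, X[, S, drop = FALSE])))$r.squared
  }, numeric(1))
  lmg <- numeric(d)
  for (j in seq_len(d)) {
    for (m in masks) {
      S <- subsets[[m + 1]]
      if (j %in% S) next                                       # only subsets without j
      s <- length(S)
      w <- factorial(s) * factorial(d - s - 1) / factorial(d)  # Shapley weight
      lmg[j] <- lmg[j] + w * (R2[m + 2^(j - 1) + 1] - R2[m + 1])
    }
  }
  setNames(lmg, names(X))
}

set.seed(1234)
n <- 500
X <- matrix(rnorm(3 * n), ncol = 3, dimnames = list(NULL, c("X1", "X2", "X3")))
X[, 3] <- -0.8 * X[, 2] + 0.6 * X[, 3]   # correlated X2 and X3
y <- X %*% c(1, -1, 0.5) + rnorm(n, 0, 2)
v <- lmg_sketch(X, y)
sum(v)                                   # equals the R^2 of the full linear model
```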

For logistic regression (logistic=TRUE), the $R^2$ value is equal to:

$R^2 = 1-\frac{\textrm{model deviance}}{\textrm{null deviance}}$
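With base R's `glm`, this quantity can be read from the `deviance` and `null.deviance` components of the fitted object. The snippet below is an illustration on simulated data and does not call `lmg`.

```r
set.seed(42)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
yb <- as.numeric(x1 - x2 + rnorm(n) > 0)          # binary outcome
fit <- glm(yb ~ x1 + x2, family = binomial)       # logistic regression (binomial GLM)
r2_logit <- 1 - fit$deviance / fit$null.deviance  # R^2 = 1 - model deviance / null deviance
r2_logit
```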

If either a logistic regression model (logistic = TRUE), or any column of X is categorical (i.e., of class factor), then the rank-based indices cannot be computed. In both those cases, rank = FALSE is forced by default (with a warning).

If the value passed to the parl argument exceeds the number of cores available on the machine, the number of cores used defaults to the number of available cores minus one.

Value

lmg returns a list of class "lmg", containing the following components:

`call`	the matched call.
`lmg`	a data frame containing the estimations of the LMG indices.
`R2s`	the estimations of the $R^2$ for all possible sub-models.
`indices`	list of all subsets corresponding to the structure of R2s.
`w`	the Shapley weights.
`conf_int`	a matrix containing the estimations, bias and confidence intervals by bootstrap (if `nboot>0`).
`X`	the observed covariates.
`y`	the observed outcomes.
`logistic`	logical. `TRUE` if the analysis has been made by logistic regression.
`boot`	logical. `TRUE` if bootstrap estimates have been produced.
`nboot`	number of bootstrap replicates.
`rank`	logical. `TRUE` if a rank analysis has been made.
`parl`	number of chosen cores for the computation.
`conf`	level for the confidence intervals by bootstrap.

Author(s)

Marouane Il Idrissi

References

Broto B., Bachoc F. and Depecker M. (2020) Variance Reduction for Estimation of Shapley Effects and Adaptation to Unknown Input Distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2).

D.V. Budescu (1993). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114:542-551.

U. Gromping (2006). Relative importance for linear regression in R: the Package relaimpo. Journal of Statistical Software, 17:1-27.

M. Il Idrissi, V. Chabridon and B. Iooss (2021). Developments and applications of Shapley effects to reliability-oriented sensitivity analysis with correlated inputs, Environmental Modelling & Software, 143, 105115.

M. Il Idrissi, V. Chabridon and B. Iooss (2021). Mesures d'importance relative par decompositions de la performance de modeles de regression, Actes des 52emes Journees de Statistiques de la Societe Francaise de Statistique (SFdS), pp 497-502, Nice, France, Juin 2021

Lindeman RH, Merenda PF, Gold RZ (1980). Introduction to Bivariate and Multivariate Analysis. Scott, Foresman, Glenview, IL.

Examples

library(parallel)
library(doParallel)
library(foreach)
library(gtools)
library(boot)

library(mvtnorm)

set.seed(1234)
n <- 1000
beta<-c(1,-1,0.5)
sigma<-matrix(c(1,0,0,
                0,1,-0.8,
                0,-0.8,1),
              nrow=3,
              ncol=3)

############################
# Gaussian correlated inputs

X <-rmvnorm(n, rep(0,3), sigma)
colnames(X)<-c("X1","X2", "X3")

#############################
# Linear Model

y <- X%*%beta + rnorm(n,0,2)

# Without Bootstrap confidence intervals
x<-lmg(X, y)
print(x)
plot(x)

# With bootstrap confidence intervals
x<-lmg(X, y, nboot=100, conf=0.95)
print(x)
plot(x)

# Rank-based analysis
x<-lmg(X, y, rank=TRUE, nboot=100, conf=0.95)
print(x)
plot(x)

############################
# Logistic Regression
y<-as.numeric(X%*%beta + rnorm(n)>0)
x<-lmg(X,y, logistic = TRUE)
plot(x)
print(x)

# Parallel computing
#x<-lmg(X,y, logistic = TRUE, parl=2)
#plot(x)
#print(x)

Maximin criterion

Description

Compute the maximin criterion (also called mindist). This function uses a C++ implementation of the function mindist from package DiceDesign.

Usage

maximin_cplus(design)

Arguments

`design`	a matrix representing the design of experiments in the unit cube $[0,1]^d$. If this condition is not fulfilled, a transformation into $[0,1]^d$ is applied before the criterion is computed.

Details

The maximin criterion is defined by:

$\textrm{maximin} = \min_{x_{i} \in X} \gamma_{i}$

where $\gamma_{i}$ is the minimal distance between the point $x_{i}$ and the other points $x_{k}$ of the design.

A higher value corresponds to a more regular scattering of the design points.
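Since each $\gamma_i$ is a nearest-neighbour distance, the minimum over $i$ is the smallest pairwise Euclidean distance of the design, which base R's `dist()` gives directly. The helper `maximin_sketch` is hypothetical; the column-wise min-max rescaling applied to designs outside $[0,1]^d$ is an assumption and may differ from the transformation used by `maximin_cplus`.

```r
# Hypothetical helper (not the package code): maximin criterion via pairwise distances
maximin_sketch <- function(design) {
  design <- as.matrix(design)
  # Assumption: min-max rescale each column when the design leaves the unit cube
  if (min(design) < 0 || max(design) > 1) {
    rng <- apply(design, 2, range)
    design <- sweep(sweep(design, 2, rng[1, ]), 2, rng[2, ] - rng[1, ], "/")
  }
  D <- as.matrix(dist(design))   # Euclidean distances between all design points
  diag(D) <- Inf                 # ignore the distance of a point to itself
  gamma <- apply(D, 1, min)      # gamma_i: distance from x_i to its nearest neighbour
  min(gamma)
}

set.seed(3)
X <- matrix(runif(40 * 2), 40, 2)
maximin_sketch(X)                # expected to agree with maximin_cplus(X) for X in [0,1]^2
```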

Value

A real number equal to the value of the maximin criterion for the design.

Author(s)

Laurent Gilquin

References

Gunzburger M., Burkardt J. (2004) Uniformity measures for point samples in hypercubes https://people.sc.fsu.edu/~jburkardt/.

Johnson M.E., Moore L.M. and Ylvisaker D. (1990) Minimax and maximin distance designs, Journal of Statistical Planning and Inference, 26, 131-148.

Chen V.C.P., Tsui K.L., Barton R.R. and Allen J.K. (2003) A review of design and modeling in computer experiments, Handbook of Statistics, 22, 231-261.

Examples

dimension <- 2
n <- 40
X <- matrix(runif(n*dimension),n,dimension)
maximin_cplus(X)

Morris's Elementary Effects Screening Method

Description

morris implements Morris's elementary effects screening method (Morris, 1991). This method, based on design of experiments, makes it possible to identify the few important factors at a cost of $r \times (p+1)$ simulations (where $p$ is the number of factors). This implementation includes some improvements of the original method: space-filling optimization of the design (Campolongo et al. 2007) and simplex-based design (Pujol 2009).
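The cost $r \times (p+1)$ comes from the OAT trajectories: each repetition starts from a random grid point and moves the $p$ factors one at a time by $\Delta$, so $p+1$ runs yield one elementary effect $(f(x + \Delta e_i) - f(x))/\Delta$ per factor. The sketch below (hypothetical `morris_oat_sketch`, unit cube only) simplifies Morris's design by always stepping upward, with $\Delta = \texttt{jump}/(\texttt{levels}-1)$; it omits the random orientation, space-filling and simplex options of `morris`.

```r
# Hypothetical helper (not the package code): simplified Morris OAT elementary effects
morris_oat_sketch <- function(f, p, r, levels = 4, jump = levels / 2) {
  delta <- jump / (levels - 1)                  # step size on the grid
  grid <- (0:(levels - 1)) / (levels - 1)
  base <- grid[grid <= 1 - delta + 1e-12]       # start values that keep x + delta <= 1
  ee <- matrix(NA_real_, r, p)
  for (k in seq_len(r)) {                       # one trajectory = p + 1 model runs
    x <- base[sample.int(length(base), p, replace = TRUE)]
    fx <- f(x)
    for (i in sample.int(p)) {                  # random order of the one-at-a-time moves
      x_new <- x
      x_new[i] <- x[i] + delta
      f_new <- f(x_new)
      ee[k, i] <- (f_new - fx) / delta          # elementary effect of factor i
      x <- x_new
      fx <- f_new
    }
  }
  ee
}

set.seed(4)
ee <- morris_oat_sketch(function(x) x[1] + 2 * x[2] + x[3]^2, p = 3, r = 10)
mu.star <- apply(ee, 2, function(e) mean(abs(e)))
sigma <- apply(ee, 2, sd)                       # zero for the two linear factors
```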

Usage

morris(model = NULL, factors, r, design, binf = 0, bsup = 1,
       scale = TRUE, ...)
## S3 method for class 'morris'
tell(x, y = NULL, ...)
## S3 method for class 'morris'
print(x, ...)
## S3 method for class 'morris'
plot(x, identify = FALSE, atpen = FALSE, y_col = NULL, 
  y_dim3 = NULL, ...)
## S3 method for class 'morris'
plot3d(x, alpha = c(0.2, 0), sphere.size = 1, y_col = NULL, 
  y_dim3 = NULL)

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`factors`	an integer giving the number of factors, or a vector of character strings giving their names.
`r`	either an integer giving the number of repetitions of the design, i.e. the number of elementary effects computed per factor, or a vector of two integers `c(r1, r2)` for the space-filling improvement (Campolongo et al. 2007). In this case, `r1` is the wanted design size, and `r2` (> `r1`) is the size of the (bigger) population from which the design is extracted (this can throw a warning, see below).
`design`	a list specifying the design type and its parameters: `type = "oat"` for Morris's OAT design (Morris 1991), with the parameters: `levels` : either an integer specifying the number of levels of the design, or a vector of integers for different values for each factor. `grid.jump` : either an integer specifying the number of levels that are increased/decreased for computing the elementary effects, or a vector of integers for different values for each factor. If not given, it is set to `grid.jump = 1`. Notice that this default value of one does not follow Morris's recommendation of $\texttt{levels} / 2$ . `type = "simplex"` for simplex-based design (Pujol 2009), with the parameter: `scale.factor` : a numeric value, the homothety factor of the (isometric) simplexes. Edges equal one with a scale factor of one.
`binf`	either a number specifying the minimum value for the factors, or a vector giving a different value for each factor.
`bsup`	either a number specifying the maximum value for the factors, or a vector giving a different value for each factor.
`scale`	logical. If `TRUE`, the input design of experiments is scaled after building the design and before computing the elementary effects so that all factors vary within the range [0,1]. For each factor, the scaling is done relative to its corresponding bsup and binf.
`x`	a list of class `"morris"` storing the state of the screening study (parameters, data, estimates).
`y`	a vector of model responses.
`identify`	logical. If `TRUE`, the user selects with the mouse the factors to label on the $(\mu^*,\sigma)$ graph (see `identify`).
`atpen`	logical. If `TRUE` (and `identify = TRUE`), the user-identified labels (more precisely: their lower-left corners) of the factors are plotted at the place where the user had clicked (if near enough to one of the factor points). If `FALSE` (and `identify = TRUE`), the labels are automatically adjusted to the lower, left, upper or right side of the factor point. For further information, see `identify`. Defaults to `FALSE`.
`y_col`	an integer defining the index of the column of `x$y` to be used for plotting the corresponding Morris statistics $\mu^*$ and $\sigma$ (only applies if `x$y` is a matrix or an array). If set to `NULL` (as per default) and `x$y` is a matrix or an array, the first column (respectively the first element in the second dimension) of `x$y` is used (i.e. `y_col = 1`).
`y_dim3`	an integer defining the index in the third dimension of `x$y` to be used for plotting the corresponding Morris statistics $\mu^*$ and $\sigma$ (only applies if `x$y` is an array). If set to `NULL` (as per default) and `x$y` is a three-dimensional array, the first element in the third dimension of `x$y` is used (i.e. `y_dim3 = 1`).
`alpha`	a vector of two values between 0.0 (fully transparent) and 1.0 (opaque) (see `rgl.material`). The first value is for the cone, the second for the planes.
`sphere.size`	a numeric value, the scale factor for displaying the spheres.
`...`	for `morris`: any other arguments for `model` which are passed unchanged each time it is called. For `plot.morris`: arguments to be passed to `plot.default`.

Details

plot.morris draws the $(\mu^*,\sigma)$ graph.

plot3d.morris draws the $(\mu, \mu^*,\sigma)$ graph (requires the rgl package). On this graph, the points are in a domain bounded by a cone and two planes (application of the Cauchy-Schwarz inequality).
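The bounds follow from two inequalities on the $r$ elementary effects of a factor: $|\mu| \le \mu^*$ (the planes $\mu^* = \pm\mu$) and, by Cauchy-Schwarz, $\mu^{*2} \le \frac{1}{r}\sum_k EE_k^2 = \mu^2 + \frac{r-1}{r}\sigma^2 \le \mu^2 + \sigma^2$ (the cone), with $\sigma$ computed by `sd`. The base-R check below verifies them on arbitrary simulated effects; it uses neither rgl nor `plot3d.morris`.

```r
set.seed(5)
r <- 8
p <- 5
ee <- matrix(rnorm(r * p, mean = 1), r, p)       # arbitrary elementary effects
mu <- apply(ee, 2, mean)
mu.star <- apply(ee, 2, function(e) mean(abs(e)))
sigma <- apply(ee, 2, sd)
all(mu.star >= abs(mu))                         # points lie between the two planes
all(mu.star^2 <= mu^2 + sigma^2)                # points lie inside the cone
```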

When using the space-filling improvement (Campolongo et al. 2007) of the Morris design, we recommend installing the "pracma" R package beforehand: its "distmat" function makes runs with a large number of initial estimates (r2) significantly faster by accelerating the inter-point distance calculations.

This version of morris also supports matrices and three-dimensional arrays as output of model.

Value

morris returns a list of class "morris", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	either a vector, a matrix or a three-dimensional array of model responses (depends on the output of `model`).
`ee`	If `y` is a vector: a $(r \times p)$-matrix of elementary effects for all the factors. If `y` is a matrix: a $(r \times p \times ncol(y))$-array of elementary effects for all the factors and all columns of `y`. If `y` is a three-dimensional array: a $(r \times p \times dim(y)[2] \times dim(y)[3])$-array of elementary effects for all the factors and all elements of the second and third dimensions of `y`.

Note that the statistics of interest ($\mu$, $\mu^*$ and $\sigma$) are not stored. The print method displays them, but to extract their numerical values they must be computed with the following instructions:

If x$y is a vector:

mu <- apply(x$ee, 2, mean)
mu.star <- apply(x$ee, 2, function(x) mean(abs(x)))
sigma <- apply(x$ee, 2, sd)

If x$y is a matrix:

mu <- apply(x$ee, 3, function(M){
  apply(M, 2, mean)
})
mu.star <- apply(abs(x$ee), 3, function(M){
  apply(M, 2, mean)
})
sigma <- apply(x$ee, 3, function(M){
  apply(M, 2, sd)
})

If x$y is a three-dimensional array:

mu <- sapply(1:dim(x$ee)[4], function(i){
  apply(x$ee[, , , i, drop = FALSE], 3, function(M){
    apply(M, 2, mean)
  })
}, simplify = "array")
mu.star <- sapply(1:dim(x$ee)[4], function(i){
  apply(abs(x$ee)[, , , i, drop = FALSE], 3, function(M){
    apply(M, 2, mean)
  })
}, simplify = "array")
sigma <- sapply(1:dim(x$ee)[4], function(i){
  apply(x$ee[, , , i, drop = FALSE], 3, function(M){
    apply(M, 2, sd)
  })
}, simplify = "array")

It is highly recommended to use the function with the argument scale = TRUE to avoid an incorrect interpretation of factors that have different orders of magnitude.

Warning messages

"keeping r' repetitions out of r": when generating the design of experiments, identical repetitions are removed, leading to a lower number than requested.

Author(s)

Gilles Pujol, with contributions from Frank Weber (2016)

References

M. D. Morris, 1991, Factorial sampling plans for preliminary computational experiments, Technometrics, 33, 161–174.

F. Campolongo, J. Cariboni and A. Saltelli, 2007, An effective screening design for sensitivity, Environmental Modelling and Software, 22, 1509–1518.

G. Pujol, 2009, Simplex-based screening designs for estimating metamodels, Reliability Engineering and System Safety 94, 1156–1160.

Examples

# Test case : the non-monotonic function of Morris
x <- morris(model = morris.fun, factors = 20, r = 4,
            design = list(type = "oat", levels = 5, grid.jump = 3))
print(x)
plot(x)

library(rgl)
plot3d.morris(x)  # (requires the package 'rgl')


# Only for demonstration purposes: a model function returning a matrix
morris.fun_matrix <- function(X){
  res_vector <- morris.fun(X)
  cbind(res_vector, 2 * res_vector)
}
x <- morris(model = morris.fun_matrix, factors = 20, r = 4,
            design = list(type = "oat", levels = 5, grid.jump = 3))
plot(x, y_col = 2)
title(main = "y_col = 2")

# Also only for demonstration purposes: a model function returning a
# three-dimensional array
morris.fun_array <- function(X){
  res_vector <- morris.fun(X)
  res_matrix <- cbind(res_vector, 2 * res_vector)
  array(data = c(res_matrix, 5 * res_matrix), 
        dim = c(length(res_vector), 2, 2))
}
x <- morris(model = morris.fun_array, factors = 20, r = 4,
            design = list(type = "simplex", scale.factor = 1))
plot(x, y_col = 2, y_dim3 = 2)
title(main = "y_col = 2, y_dim3 = 2")

Morris's Elementary Effects Screening Method for Multidimensional Outputs

Description

morrisMultOut extends Morris's elementary effects screening method (Morris 1991) to models with multidimensional outputs.

Usage

morrisMultOut(model = NULL, factors, r, design, binf = 0, bsup = 1,
       scale = TRUE, ...)
## S3 method for class 'morrisMultOut'
tell(x, y = NULL, ...)

Arguments

`model`	NULL or a function returning a matrix whose columns are the model outputs.
`factors`	an integer giving the number of factors, or a vector of character strings giving their names.
`r`	either an integer giving the number of repetitions of the design, i.e. the number of elementary effects computed per factor, or a vector of two integers `c(r1, r2)` for the space-filling improvement (Campolongo et al. 2007). In this case, `r1` is the wanted design size, and `r2` (> `r1`) is the size of the (bigger) population from which the design is extracted (this can throw a warning, see the documentation of `morris`).
`design`	a list specifying the design type and its parameters: `type = "oat"` for Morris's OAT design (Morris 1991), with the parameters: `levels` : either an integer specifying the number of levels of the design, or a vector of integers for different values for each factor. `grid.jump` : either an integer specifying the number of levels that are increased/decreased for computing the elementary effects, or a vector of integers for different values for each factor. If not given, it is set to `grid.jump = 1`. Notice that this default value of one does not follow Morris's recommendation of $\texttt{levels} / 2$ . `type = "simplex"` for simplex-based design (Pujol 2009), with the parameter: `scale.factor` : a numeric value, the homothety factor of the (isometric) simplexes. Edges equal one with a scale factor of one.
`binf`	either a number specifying the minimum value for the factors, or a vector giving a different value for each factor.
`bsup`	either a number specifying the maximum value for the factors, or a vector giving a different value for each factor.
`scale`	logical. If `TRUE`, the input design of experiments is scaled after building the design and before computing the elementary effects so that all factors vary within the range [0,1]. For each factor, the scaling is done relative to its corresponding bsup and binf.
`x`	a list of class `"morris"` storing the state of the screening study (parameters, data, estimates).
`y`	a matrix of model responses, with one column per model output.
`...`	for `morrisMultOut`: any other arguments for `model` which are passed unchanged each time it is called. For `plot.morris`: arguments to be passed to `plot.default`.

Details

All the methods available for objects of class "morris" are also available for objects of class "morrisMultOut". See the documentation of the function "morris" for more details.

Value

morrisMultOut returns a list of class c("morrisMultOut", "morris"), containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	a matrix having as columns the model responses.
`ee`	a vector of aggregated elementary effects.

Author(s)

Filippo Monari

References

Monari F. and P. Strachan, 2017. Characterization of an airflow network model by sensitivity analysis: parameter screening, fixing, prioritizing and mapping. Journal of Building Performance Simulation, 2017, 10, 17-36.

Examples


  mdl <- function (X) t(atantemp.fun(X))

  # the model is evaluated directly on the generated design
  x = morrisMultOut(model = mdl, factors = 4, r = 50, 
  design = list(type = "oat", levels = 5, grid.jump = 3), binf = -1, bsup = 5, 
    scale = FALSE)
  print(x)
  plot(x)

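  # alternatively, build the design only, run the model separately,
  # then pass the responses to the object with tell()
  x = morrisMultOut(model = NULL, factors = 4, r = 50, 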
  design = list(type = "oat", levels = 5, grid.jump = 3), binf = -1, bsup = 5, 
    scale = FALSE)
  Y = mdl(x[['X']])
  tell(x, Y)	
  print(x)
  plot(x)	

Generate parameter sets

Description

Generate parameter sets from given ranges, with chosen sampling scheme

Usage

parameterSets(par.ranges, samples, method = c("sobol", "innergrid", "grid"))

Arguments

`par.ranges`	A named list of minimum and maximum parameter values
`samples`	Number of samples to generate. For the `"grid"` and `"innergrid"` methods, may be a vector giving the number of samples for each parameter.
`method`	the sampling scheme; see Details

Details

Method "sobol" generates uniformly distributed Sobol low discrepancy numbers, using the sobol function in the randtoolbox package.

Method "grid" generates a grid within the parameter ranges, including their extremes, with the number of points per parameter given by samples.

Method "innergrid" generates a grid within the parameter ranges whose edges are offset from the extremes. The offset is half the grid resolution, i.e. diff(par.ranges)/samples/2.
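
The difference between the two grid schemes can be seen on a single parameter with base R. The helpers below are hypothetical and only restate the formulas above (they are not the package's code): "grid" spans the range including its extremes, while "innergrid" shifts both edges inward by half the resolution, so the points sit at the centers of `samples` equal cells.

```r
# Hypothetical helpers illustrating the "grid" and "innergrid" schemes for one parameter
grid_points <- function(range, n) {
  seq(range[1], range[2], length.out = n)              # extremes included
}

innergrid_points <- function(range, n) {
  offset <- diff(range) / n / 2                        # half the grid resolution
  seq(range[1] + offset, range[2] - offset, length.out = n)
}

g  <- grid_points(c(1, 1000), 10)        # from 1 to 1000
ig <- innergrid_points(c(1, 1000), 10)   # from 50.95 to 950.05
diff(ig)                                 # constant spacing diff(range)/n = 99.9
```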

Value

The result is a matrix, with named columns for each parameter in par.ranges. Each row represents one parameter set.

Author(s)

Joseph Guillaume, based on similar function by Felix Andrews

Examples


X.grid <- parameterSets(par.ranges=list(V1=c(1,1000),V2=c(1,4)),
                          samples=c(10,10),method="grid")
plot(X.grid)

X.innergrid<-parameterSets(par.ranges=list(V1=c(1,1000),V2=c(1,4)),
                          samples=c(10,10),method="innergrid")
points(X.innergrid,col="red")


library(randtoolbox)
X.sobol<-parameterSets(par.ranges=list(V1=c(1,1000),V2=c(1,4)),
                           samples=100,method="sobol")
plot(X.sobol)


Partial Correlation Coefficients

Description

pcc computes the Partial Correlation Coefficients (PCC), Semi-Partial Correlation Coefficients (SPCC), Partial Rank Correlation Coefficients (PRCC) or Semi-Partial Rank Correlation Coefficients (SPRCC), which are variance-based measures based on linear (resp. monotonic) assumptions, in the case of (linearly) correlated factors.
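
The PCC of an input X_i is the correlation between the part of X_i and the part of y that are not linearly explained by the other inputs. The base-R sketch below computes it from the residuals of two regressions; applying the same helper to rank-transformed data gives the PRCC. It illustrates the definition only and is not the package's implementation; the helper name `pcc_by_hand` is hypothetical.

```r
# Hypothetical helper: PCC via the residuals of two linear regressions
pcc_by_hand <- function(X, y) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- X[, setdiff(names(X), v), drop = FALSE]
    # part of y not linearly explained by the other inputs
    ry <- resid(lm(.y ~ ., data = cbind(others, .y = y)))
    # part of X_v not linearly explained by the other inputs
    rx <- resid(lm(.x ~ ., data = cbind(others, .x = X[[v]])))
    cor(ry, rx)
  })
}

set.seed(1)
n <- 100
X <- data.frame(X1 = runif(n, 0.5, 1.5),
                X2 = runif(n, 1.5, 4.5),
                X3 = runif(n, 4.5, 13.5))
y <- with(X, X1^2 + X2 + X3)

pcc_by_hand(X, y)                                     # PCC
pcc_by_hand(as.data.frame(lapply(X, rank)), rank(y))  # PRCC
```

When y is an exact linear function of the inputs with positive coefficients, every PCC equals 1, since the residuals of y are then proportional to those of each input.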

Usage

pcc(X, y, rank = FALSE, semi = FALSE, logistic = FALSE, nboot = 0, conf = 0.95)
## S3 method for class 'pcc'
print(x, ...)
## S3 method for class 'pcc'
plot(x, ylim = c(-1,1), ...)
## S3 method for class 'pcc'
ggplot(data, mapping = aes(), ..., environment = parent.frame(), ylim = c(-1,1))

Arguments

`X`	a data frame (or object coercible by `as.data.frame`) containing the design of experiments (model input variables).
`y`	a vector containing the responses corresponding to the design of experiments (model output variables).
`rank`	logical. If `TRUE`, the analysis is done on the ranks.
`semi`	logical. If `TRUE`, semi-PCC are computed.
`logistic`	logical. If `TRUE`, the analysis is done via a logistic regression (binomial GLM).
`nboot`	the number of bootstrap replicates.
`conf`	the confidence level of the bootstrap confidence intervals.
`x`	the object returned by `pcc`.
`data`	the object returned by `pcc`.
`ylim`	the y-coordinate limits of the plot.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	arguments to be passed to methods, such as graphical parameters (see `par`).

Details

The logistic regression model (logistic = TRUE) and rank-based indices (rank = TRUE) are incompatible and cannot be requested together.

Value

pcc returns a list of class "pcc", containing the following components:

`call`	the matched call.
`PCC`	a data frame containing the estimations of the PCC indices, bias and confidence intervals (if `rank = FALSE` and `semi = FALSE`).
`PRCC`	a data frame containing the estimations of the PRCC indices, bias and confidence intervals (if `rank = TRUE` and `semi = FALSE`).
`SPCC`	a data frame containing the estimations of the SPCC indices, bias and confidence intervals (if `rank = FALSE` and `semi = TRUE`).
`SPRCC`	a data frame containing the estimations of the SPRCC indices, bias and confidence intervals (if `rank = TRUE` and `semi = TRUE`).

Author(s)

Gilles Pujol and Bertrand Iooss

References

J.W. Johnson and J.M. LeBreton, 2004, History and use of relative importance indices in organizational research, Organizational Research Methods, 7:238-257.

A. Saltelli, K. Chan and E. M. Scott eds, 2000, Sensitivity Analysis, Wiley.

Examples


# a 100-sample with X1 ~ U(0.5, 1.5)
#                   X2 ~ U(1.5, 4.5)
#                   X3 ~ U(4.5, 13.5)
library(boot)
n <- 100
X <- data.frame(X1 = runif(n, 0.5, 1.5),
                X2 = runif(n, 1.5, 4.5),
                X3 = runif(n, 4.5, 13.5))

# model (nonlinear in X1): Y = X1^2 + X2 + X3
y <- with(X, X1^2 + X2 + X3)

# sensitivity analysis
x <- pcc(X, y, nboot = 100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)
ggplot(x, ylim = c(-1.5,1.5))

x <- pcc(X, y, semi = TRUE, nboot = 100)
print(x)
plot(x)


Perturbed-Law based sensitivity Indices (PLI) for failure probability

Description

PLI computes the Perturbed-Law based Indices (PLI), also known as the Density Modification Based Reliability Sensitivity Indices (DMBRSI), which are robustness indices related to a probability of exceedance of a model output (i.e. a failure probability), estimated by a Monte Carlo method. See Lemaitre et al. (2015).

Usage

PLI(failurepoints,failureprobabilityhat,samplesize,deltasvector,
       InputDistributions,type="MOY",samedelta=TRUE)

Arguments

`failurepoints`	a matrix of failure points coordinates, one column per variable.
`failureprobabilityhat`	the crude Monte Carlo estimate of the failure probability P.
`samplesize`	the size of the sample used to estimate P. One must have failureprobabilityhat = dim(failurepoints)[1]/samplesize.
`deltasvector`	a vector containing the values of delta for which the indices will be computed.
`InputDistributions`	a list of lists. Each element is a list containing the name of the distribution to be used and its parameters. Implemented cases so far: for a mean perturbation, Gaussian, Uniform, Triangle, Left Truncated Gaussian and Left Truncated Gumbel (using Gumbel requires the package `evd`); for a variance perturbation, Gaussian and Uniform.
`type`	a character string specifying the type of perturbation. The meaning of "deltasvector" depends on it: if type = "MOY", deltasvector is a vector of perturbed means; if type = "VAR", deltasvector is a vector of perturbed variances, which must therefore be positive.
`samedelta`	a boolean used when type = "MOY". If TRUE, the same mean perturbation is applied to all the variables. If FALSE, the perturbed mean is new_mean = mean+sigma*delta, where mean and sigma are parameters defined in InputDistributions and delta is a value of deltasvector.

Value

PLI returns a list of two matrices:

A matrix storing the PLI. Each column corresponds to an input, each row to a perturbation (twist) of amplitude delta.
A matrix storing their standard deviations.
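
Since the perturbed failure probability is estimated from the same Monte Carlo failure points, it can be obtained by reweighting them with the likelihood ratio between the perturbed and the original density of input i. The sketch below does this for a standard Gaussian input whose mean is shifted to delta (the perturbed density is assumed to be N(delta, 1)), and takes the PLI as P_delta/P - 1 when P_delta >= P and 1 - P/P_delta otherwise, which is our reading of the definition in Lemaitre et al. (2015). The helper `pli_gauss_mean` and the toy failure event are hypothetical; this is not the package's implementation.

```r
# Hypothetical helper (not the package's implementation):
# PLI of input i for a mean shift of a standard Gaussian input,
# estimated by reweighting the failure points of a crude Monte Carlo run.
pli_gauss_mean <- function(failurepoints, samplesize, delta, i) {
  P <- nrow(failurepoints) / samplesize        # crude Monte Carlo estimate
  x <- failurepoints[, i]
  w <- dnorm(x, mean = delta) / dnorm(x)       # likelihood ratio N(delta,1) / N(0,1)
  Pdelta <- sum(w) / samplesize                # reweighted perturbed probability
  if (Pdelta >= P) Pdelta / P - 1 else 1 - P / Pdelta
}

set.seed(1)
N <- 1e5
X <- matrix(rnorm(3 * N), ncol = 3)
fail <- X[, 1] + 0.2 * X[, 2] > 2.5            # toy failure event
pts <- X[fail, , drop = FALSE]

pli <- sapply(c(-0.5, 0, 0.5), function(d) pli_gauss_mean(pts, N, d, i = 1))
pli
```

A zero shift leaves the weights equal to one, so the index is exactly 0; shifting the mean of X1 towards the failure region gives a positive index and away from it a negative one.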

Author(s)

Paul Lemaitre and Bertrand Iooss

References

C. Gauchy, J. Stenger, R. Sueur and B. Iooss, 2022, An information geometry approach for robustness analysis in uncertainty quantification of computer codes, Technometrics, 64:80-91.

P. Lemaitre, E. Sergienko, A. Arnaud, N. Bousquet, F. Gamboa and B. Iooss, 2015, Density modification based reliability sensitivity analysis, Journal of Statistical Computation and Simulation, 85:1200-1223.

E. Borgonovo and B. Iooss, 2017, Moment independent importance measures and a common rationale, In: Springer Handbook on UQ, R. Ghanem, D. Higdon and H. Owhadi (Eds).

Examples



# Model: Ishigami function with a threshold at -7
# Failure points are those < -7

  distributionIshigami = list()
	for (i in 1:3){
		distributionIshigami[[i]]=list("unif",c(-pi,pi))
		distributionIshigami[[i]]$r=("runif")
	}
  
# Monte Carlo sampling to obtain failure points

  N = 100000
	X = matrix(0,ncol=3,nrow=N)
	for( i in 1:3) X[,i] = runif(N,-pi,pi)
	T = ishigami.fun(X)
	s = sum(as.numeric(T < -7)) # Number of failures
	pdefchap = s/N      # Failure probability
	ptsdef = X[T < -7,]	# Failure points
	
# sensitivity indices with perturbation of the mean 
  
	v_delta = seq(-3,3,1/20) 
	Toto = PLI(failurepoints=ptsdef,failureprobabilityhat=pdefchap,samplesize=N,
		deltasvector=v_delta,InputDistributions=distributionIshigami,type="MOY",
		samedelta=TRUE)
	BIshm = Toto[[1]]
	SIshm = Toto[[2]]

	par(mfrow=c(1,1),mar=c(4,5,1,1))
	plot(v_delta,BIshm[,2],ylim=c(-4,4),xlab=expression(delta),
		ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
	points(v_delta,BIshm[,1],col="darkgreen",pch=15,cex=1.5)
	points(v_delta,BIshm[,3],col="red",pch=17,cex=1.5)
	lines(v_delta,BIshm[,2]+1.96*SIshm[,2],col="black")
	lines(v_delta,BIshm[,2]-1.96*SIshm[,2],col="black")
	lines(v_delta,BIshm[,1]+1.96*SIshm[,1],col="darkgreen")
	lines(v_delta,BIshm[,1]-1.96*SIshm[,1],col="darkgreen")
	lines(v_delta,BIshm[,3]+1.96*SIshm[,3],col="red")
	lines(v_delta,BIshm[,3]-1.96*SIshm[,3],col="red")
	abline(h=0,lty=2)
	legend(0,3,legend=c("X1","X2","X3"),
		col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)
  
# sensitivity indices with perturbation of the variance 

	v_delta = seq(1,5,1/4) # user parameter. (the true variance is 3.29)	
	Toto = PLI(failurepoints=ptsdef,failureprobabilityhat=pdefchap,samplesize=N,
		deltasvector=v_delta,InputDistributions=distributionIshigami,type="VAR",
		samedelta=TRUE)
	BIshv=Toto[[1]]
	SIshv=Toto[[2]]

	par(mfrow=c(2,1),mar=c(1,5,1,1)+0.1)
	plot(v_delta,BIshv[,2],ylim=c(-.5,.5),xlab=expression(V[f]),
		ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
	points(v_delta,BIshv[,1],col="darkgreen",pch=15,cex=1.5)
	points(v_delta,BIshv[,3],col="red",pch=17,cex=1.5)
	lines(v_delta,BIshv[,2]+1.96*SIshv[,2],col="black")
	lines(v_delta,BIshv[,2]-1.96*SIshv[,2],col="black")
	lines(v_delta,BIshv[,1]+1.96*SIshv[,1],col="darkgreen")
	lines(v_delta,BIshv[,1]-1.96*SIshv[,1],col="darkgreen")
	lines(v_delta,BIshv[,3]+1.96*SIshv[,3],col="red")
	lines(v_delta,BIshv[,3]-1.96*SIshv[,3],col="red")

	par(mar=c(4,5.1,1.1,1.1))
	plot(v_delta,BIshv[,2],ylim=c(-30,.7),xlab=expression(V[f]),
		ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
	points(v_delta,BIshv[,1],col="darkgreen",pch=15,cex=1.5)
	points(v_delta,BIshv[,3],col="red",pch=17,cex=1.5)
	lines(v_delta,BIshv[,2]+1.96*SIshv[,2],col="black")
	lines(v_delta,BIshv[,2]-1.96*SIshv[,2],col="black")
	lines(v_delta,BIshv[,1]+1.96*SIshv[,1],col="darkgreen")
	lines(v_delta,BIshv[,1]-1.96*SIshv[,1],col="darkgreen")
	lines(v_delta,BIshv[,3]+1.96*SIshv[,3],col="red")
	lines(v_delta,BIshv[,3]-1.96*SIshv[,3],col="red")
	legend(2.5,-10,legend=c("X1","X2","X3"),col=c("darkgreen","black","red"),
		pch=c(15,19,17),cex=1.5)
  
##############################################################
# Example with an inverse probability transform 
# (to obtain Gaussian inputs from Uniform ones)

# Monte Carlo sampling (the inputs are Uniform)

  N = 100000
	X = matrix(0,ncol=3,nrow=N)
	for( i in 1:3) X[,i] = runif(N,-pi,pi)
	T = ishigami.fun(X)
	s = sum(as.numeric(T < -7)) # Number of failures
	pdefchap = s/N      # Failure probability
	
# Empirical transform (applied on the sample)

  Xn <- matrix(0,nrow=N,ncol=3)
  for (i in 1:3){
    ecdfx <- ecdf(X[,i])
    q <- ecdfx(X[,i])
    Xn[,i] <- qnorm(q) # Gaussian anamorphosis
    # infinite max values => replace them by the symmetric of the min value
    Xn[which(Xn[,i]==Inf),i] <- - Xn[which.min(Xn[,i]),i] 
    }
# Visualization of a perturbed density (the one of X1 perturbed on the mean)
  delta_mean_gauss <- 1 # perturbed value on the mean of the Gaussian transform
  Xtr <- quantile(ecdf(X[,1]),pnorm(Xn[,1] + delta_mean_gauss)) # backtransform with the ECDF of X1
	par(mfrow=c(1,1))
  plot(density(Xtr), col="red") ; lines(density(X[,1]))
  
# sensitivity indices with perturbation of the mean 
  
  distributionIshigami = list()
	for (i in 1:3){
		distributionIshigami[[i]]=list("norm",c(0,1))
		distributionIshigami[[i]]$r=("rnorm")
	}
	
	ptsdef = Xn[T < -7,]	# failure points, with Gaussian marginals
	
	v_delta = seq(-1.5,1.5,1/20) 
	Toto = PLI(failurepoints=ptsdef,failureprobabilityhat=pdefchap,samplesize=N,
		deltasvector=v_delta,InputDistributions=distributionIshigami,type="MOY",
		samedelta=TRUE)
	BIshm = Toto[[1]]
	SIshm = Toto[[2]]

	par(mfrow=c(1,1),mar=c(4,5,1,1))
	plot(v_delta,BIshm[,2],ylim=c(-4,4),xlab=expression(delta),
		ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
	points(v_delta,BIshm[,1],col="darkgreen",pch=15,cex=1.5)
	points(v_delta,BIshm[,3],col="red",pch=17,cex=1.5)
	lines(v_delta,BIshm[,2]+1.96*SIshm[,2],col="black")
	lines(v_delta,BIshm[,2]-1.96*SIshm[,2],col="black")
	lines(v_delta,BIshm[,1]+1.96*SIshm[,1],col="darkgreen")
	lines(v_delta,BIshm[,1]-1.96*SIshm[,1],col="darkgreen")
	lines(v_delta,BIshm[,3]+1.96*SIshm[,3],col="red")
	lines(v_delta,BIshm[,3]-1.96*SIshm[,3],col="red")
	abline(h=0,lty=2)
	legend(0,3,legend=c("X1","X2","X3"),
		col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)
  


Perturbed-Law based sensitivity Indices (PLI) for quantile

Description

PLIquantile computes the Perturbed-Law based Indices (PLI) for quantiles, which are robustness indices related to a quantile of a model output, estimated by a Monte Carlo method. See Sueur et al. (2017) and Iooss et al. (2020).

Usage

PLIquantile(order,x,y,deltasvector,InputDistributions,type="MOY",samedelta=TRUE,
            percentage=TRUE,nboot=0,conf=0.95,bootsample=TRUE)

Arguments

`order`	the order of the quantile to estimate.
`x`	the matrix of simulation points coordinates, one column per variable.
`y`	the vector of model outputs.
`deltasvector`	a vector containing the values of delta for which the indices will be computed.
`InputDistributions`	a list of lists. Each element is a list containing the name of the distribution to be used and its parameters. Implemented cases so far: for a mean perturbation, Gaussian, Uniform, Triangle, Left Truncated Gaussian and Left Truncated Gumbel (using Gumbel requires the package `evd`); for a variance perturbation, Gaussian and Uniform.
`type`	a character string specifying the type of perturbation. The meaning of "deltasvector" depends on it: if type = "MOY", deltasvector is a vector of perturbed means; if type = "VAR", deltasvector is a vector of perturbed variances, which must therefore be positive.
`samedelta`	a boolean used when type = "MOY". If TRUE, the same mean perturbation is applied to all the variables. If FALSE, the perturbed mean is new_mean = mean+sigma*delta, where mean and sigma are parameters defined in InputDistributions and delta is a value of deltasvector.
`percentage`	a boolean that defines the formula used for the PLI. If FALSE, the initially proposed formula is used (see Sueur et al., 2017). If TRUE, the PLI is given as a percentage of variation of the quantile (see Iooss et al., 2020).
`nboot`	the number of bootstrap replicates.
`conf`	the confidence level for bootstrap confidence intervals.
`bootsample`	If TRUE, the uncertainty about the original quantile estimation is taken into account in the PLI confidence intervals (see Iooss et al., 2021). If FALSE, standard confidence intervals are computed for the PLI. It mainly changes the CI at small delta values.

Value

PLIquantile returns a list of matrices (each column corresponds to an input, each line corresponds to a twist of amplitude delta) containing the following components:

`PLI`	the PLI.
`PLICIinf`	the bootstrap lower confidence interval values of the PLI.
`PLICIsup`	the bootstrap upper confidence interval values of the PLI.
`quantile`	the perturbed quantile.
`quantileCIinf`	the bootstrap lower confidence interval values of the perturbed quantile.
`quantileCIsup`	the bootstrap upper confidence interval values of the perturbed quantile.
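
The perturbed quantile can likewise be estimated without new model runs, by weighting the original sample with the likelihood ratio of the perturbation and reading off a weighted empirical quantile. The sketch below assumes a standard Gaussian input whose mean is shifted to delta (perturbed density N(delta, 1)) and reports the relative variation q_delta/q - 1, as an illustration of the "percentage of variation" idea behind `percentage = TRUE`. The helper `perturbed_quantile` and its quantile convention are hypothetical and not the package's implementation.

```r
# Hypothetical helper: alpha-quantile of y under a mean shift delta of the
# standard Gaussian input x, via likelihood-ratio weights on the original sample
perturbed_quantile <- function(x, y, alpha, delta) {
  w <- dnorm(x, mean = delta) / dnorm(x)   # N(delta,1) / N(0,1)
  o <- order(y)
  cw <- cumsum(w[o]) / sum(w)              # weighted empirical CDF at the sorted y
  y[o][which(cw >= alpha)[1]]              # smallest y with CDF >= alpha
}

set.seed(1)
N <- 5000
X <- matrix(rnorm(3 * N), ncol = 3)
Y <- 2 * X[, 1] + X[, 2] + X[, 3] / 2
alpha <- 0.95

q  <- perturbed_quantile(X[, 1], Y, alpha, delta = 0)    # unperturbed quantile
qd <- perturbed_quantile(X[, 1], Y, alpha, delta = 0.5)  # mean of X1 shifted
qd / q - 1                                               # relative variation of the quantile
```

With delta = 0 all weights equal one and the helper returns the usual empirical quantile (the 4750th order statistic here).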

Author(s)

Paul Lemaitre, Bertrand Iooss, Thibault Delage and Roman Sueur

References

T. Delage, R. Sueur and B. Iooss, 2018, Robustness analysis of epistemic uncertainties propagation studies in LOCA assessment thermal-hydraulic model, ANS Best Estimate Plus Uncertainty International Conference (BEPU 2018), Lucca, Italy, May 13-19, 2018.

C. Gauchy, J. Stenger, R. Sueur and B. Iooss, 2022, An information geometry approach for robustness analysis in uncertainty quantification of computer codes, Technometrics, 64:80-91.

B. Iooss, V. Verges and V. Larget, 2022, BEPU robustness analysis via perturbed law-based sensitivity indices, Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, 236:855-865.

P. Lemaitre, E. Sergienko, A. Arnaud, N. Bousquet, F. Gamboa and B. Iooss, 2015, Density modification based reliability sensitivity analysis, Journal of Statistical Computation and Simulation, 85:1200-1223.

R. Sueur, N. Bousquet, B. Iooss and J. Bect, 2016, Perturbed-Law based sensitivity Indices for sensitivity analysis in structural reliability, Proceedings of the SAMO 2016 Conference, Reunion Island, France, December 2016.

R. Sueur, B. Iooss and T. Delage, 2017, Sensitivity analysis using perturbed-law based indices for quantiles and application to an industrial case, 10th International Conference on Mathematical Methods in Reliability (MMR 2017), Grenoble, France, July 2017.

Examples



# Model: 3D function 

  distribution = list()
	for (i in 1:3) distribution[[i]]=list("norm",c(0,1))
  
# Monte Carlo sampling 

  N = 5000
	X = matrix(0,ncol=3,nrow=N)
	for(i in 1:3) X[,i] = rnorm(N,0,1)
     
	Y = 2 * X[,1] + X[,2] + X[,3]/2
	alpha <- 0.95 # quantile order
	
	q95 = quantile(Y,alpha)
	
	nboot=20 # put nboot=200 for consistency
	
# sensitivity indices with perturbation of the mean 
  
	v_delta = seq(-1,1,1/10) 
	toto = PLIquantile(alpha,X,Y,deltasvector=v_delta,
	  InputDistributions=distribution,type="MOY",samedelta=TRUE,
	  percentage=FALSE,nboot=nboot)

# Plotting the PLI

  par(mar=c(4,5,1,1))
	plot(v_delta,toto$PLI[,2],ylim=c(-1.5,1.5),xlab=expression(delta),
		ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
	points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
	points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
	lines(v_delta,toto$PLICIinf[,2],col="black")
	lines(v_delta,toto$PLICIsup[,2],col="black")
	lines(v_delta,toto$PLICIinf[,1],col="darkgreen")
	lines(v_delta,toto$PLICIsup[,1],col="darkgreen")
	lines(v_delta,toto$PLICIinf[,3],col="red")
	lines(v_delta,toto$PLICIsup[,3],col="red")
	abline(h=0,lty=2)
	legend(0.8,1.5,legend=c("X1","X2","X3"),
		col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)
  
# Plotting the perturbed quantiles

  par(mar=c(4,5,1,1))
	plot(v_delta,toto$quantile[,2],ylim=c(1.5,6.5),xlab=expression(delta),
		ylab=expression(hat(q[i*delta])),pch=19,cex=1.5)
	points(v_delta,toto$quantile[,1],col="darkgreen",pch=15,cex=1.5)
	points(v_delta,toto$quantile[,3],col="red",pch=17,cex=1.5)
	lines(v_delta,toto$quantileCIinf[,2],col="black")
	lines(v_delta,toto$quantileCIsup[,2],col="black")
	lines(v_delta,toto$quantileCIinf[,1],col="darkgreen")
	lines(v_delta,toto$quantileCIsup[,1],col="darkgreen")
	lines(v_delta,toto$quantileCIinf[,3],col="red")
	lines(v_delta,toto$quantileCIsup[,3],col="red")
	abline(h=q95,lty=2)
	legend(0.5,2.4,legend=c("X1","X2","X3"),
		col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)
		
###########################################################		
# Plotting the PLI in percentage, with standard confidence intervals (bootsample=FALSE)

	toto = PLIquantile(alpha,X,Y,deltasvector=v_delta,
	  InputDistributions=distribution,type="MOY",samedelta=TRUE,
	  percentage=TRUE,nboot=nboot,bootsample=FALSE)
	  
  par(mar=c(4,5,1,1))
	plot(v_delta,toto$PLI[,2],ylim=c(-0.6,0.6),xlab=expression(delta),
		ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
	points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
	points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
	lines(v_delta,toto$PLICIinf[,2],col="black")
	lines(v_delta,toto$PLICIsup[,2],col="black")
	lines(v_delta,toto$PLICIinf[,1],col="darkgreen")
	lines(v_delta,toto$PLICIsup[,1],col="darkgreen")
	lines(v_delta,toto$PLICIinf[,3],col="red")
	lines(v_delta,toto$PLICIsup[,3],col="red")
	abline(h=0,lty=2)
	legend(0,0.6,legend=c("X1","X2","X3"),
		col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)

###################################################		
# another visualization by using the plotCI() fct 
# (from the plotrix package) for the CI plotting (from Vanessa Verges)

  library(plotrix)
  parameters = list(colors=c("darkgreen","black","red"),
                  symbols=c(15,19,17),overlay=c(FALSE,TRUE,TRUE))
  par(mar=c(4,5,1,1),xpd=TRUE)
  for (i in 1:3){
    plotCI(v_delta,toto$PLI[,i],ui=toto$PLICIsup[,i],li=toto$PLICIinf[,i],
         cex=1.5,col=parameters$colors[i],pch=parameters$symbols[i],
         add=parameters$overlay[i], xlab="", ylab="")
  }
  title(xlab=expression(delta),ylab=expression(hat(PLI[i*delta])),
       main=bquote("PLI-quantile (N ="~.(N) ~ ","~alpha~"="~.(alpha)~
       ") of Y="~2*X[1] + X[2] + X[3]/2))
  abline(h=0,lty=2)
  legend("topleft",legend=c("X1","X2","X3"),col=parameters$colors,
          pch=parameters$symbols,cex=1.5)

	

# Model: 3D function 

  distribution = list()
	for (i in 1:3) distribution[[i]]=list("norm",c(0,1))
  
# Monte Carlo sampling 

  N = 5000
	X = matrix(0,ncol=3,nrow=N)
	for(i in 1:3) X[,i] = rnorm(N,0,1)
     
	Y = 2 * X[,1] + X[,2] + X[,3]/2
	alpha <- 0.95 # quantile order
	
	q95 = quantile(Y,alpha)
	
	nboot=20 # put nboot=200 for consistency
	
# sensitivity indices with perturbation of the mean 
  
	v_delta = seq(-1,1,1/10) 
	toto = PLIquantile(alpha,X,Y,deltasvector=v_delta,
	  InputDistributions=distribution,type="MOY",samedelta=TRUE,
	  percentage=FALSE,nboot=nboot)

# Plotting the PLI

  par(mar=c(4,5,1,1))
	plot(v_delta,toto$PLI[,2],ylim=c(-1.5,1.5),xlab=expression(delta),
		ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
	points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
	points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
	lines(v_delta,toto$PLICIinf[,2],col="black")
	lines(v_delta,toto$PLICIsup[,2],col="black")
	lines(v_delta,toto$PLICIinf[,1],col="darkgreen")
	lines(v_delta,toto$PLICIsup[,1],col="darkgreen")
	lines(v_delta,toto$PLICIinf[,3],col="red")
	lines(v_delta,toto$PLICIsup[,3],col="red")
	abline(h=0,lty=2)
	legend(0.8,1.5,legend=c("X1","X2","X3"),
		col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)
  
# Plotting the perturbed quantiles

  par(mar=c(4,5,1,1))
	plot(v_delta,toto$quantile[,2],ylim=c(1.5,6.5),xlab=expression(delta),
		ylab=expression(hat(q[i*delta])),pch=19,cex=1.5)
	points(v_delta,toto$quantile[,1],col="darkgreen",pch=15,cex=1.5)
	points(v_delta,toto$quantile[,3],col="red",pch=17,cex=1.5)
	lines(v_delta,toto$quantileCIinf[,2],col="black")
	lines(v_delta,toto$quantileCIsup[,2],col="black")
	lines(v_delta,toto$quantileCIinf[,1],col="darkgreen")
	lines(v_delta,toto$quantileCIsup[,1],col="darkgreen")
	lines(v_delta,toto$quantileCIinf[,3],col="red")
	lines(v_delta,toto$quantileCIsup[,3],col="red")
	abline(h=q95,lty=2)
	legend(0.5,2.4,legend=c("X1","X2","X3"),
		col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)
		
###########################################################		
# Plotting the PLI in percentage with refined confidence intervals

	toto = PLIquantile(alpha,X,Y,deltasvector=v_delta,
	  InputDistributions=distribution,type="MOY",samedelta=TRUE,
	  percentage=TRUE,nboot=nboot,bootsample=FALSE)
	  
  par(mar=c(4,5,1,1))
	plot(v_delta,toto$PLI[,2],ylim=c(-0.6,0.6),xlab=expression(delta),
		ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
	points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
	points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
	lines(v_delta,toto$PLICIinf[,2],col="black")
	lines(v_delta,toto$PLICIsup[,2],col="black")
	lines(v_delta,toto$PLICIinf[,1],col="darkgreen")
	lines(v_delta,toto$PLICIsup[,1],col="darkgreen")
	lines(v_delta,toto$PLICIinf[,3],col="red")
	lines(v_delta,toto$PLICIsup[,3],col="red")
	abline(h=0,lty=2)
	legend(0,0.6,legend=c("X1","X2","X3"),
		col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)

###################################################		
# Another visualization using the plotCI() function
# from the plotrix package to draw the CIs (from Vanessa Verges)

  library(plotrix)
  parameters = list(colors=c("darkgreen","black","red"),
                  symbols=c(15,19,17),overlay=c(FALSE,TRUE,TRUE))
  par(mar=c(4,5,1,1),xpd=TRUE)
  for (i in 1:3){
    plotCI(v_delta,toto$PLI[,i],ui=toto$PLICIsup[,i],li=toto$PLICIinf[,i],
         cex=1.5,col=parameters$colors[i],pch=parameters$symbols[i],
         add=parameters$overlay[i], xlab="", ylab="")
  }
  title(xlab=expression(delta),ylab=expression(hat(PLI[i*delta])),
       main=bquote("PLI-quantile (N ="~.(N) ~ ","~alpha~"="~.(alpha)~
       ") of Y="~2*X[1] + X[2] + X[3]/2))
  abline(h=0,lty=2)
  legend("topleft",legend=c("X1","X2","X3"),col=parameters$colors,
          pch=parameters$symbols,cex=1.5)

Perturbed-Law based sensitivity Indices (PLI) for quantile and simultaneous perturbations of 2 inputs

Description

PLIquantile_multivar computes the Perturbed-Law based Indices (PLI) for a quantile under simultaneous perturbations of the means of 2 inputs, estimated by a Monte Carlo method.

Usage

PLIquantile_multivar(order,x,y,inputs,deltasvector,InputDistributions,samedelta=TRUE,
            percentage=TRUE,nboot=0,conf=0.95,bootsample=TRUE)

Arguments

`order`	the order of the quantile to estimate.
`x`	the matrix of simulation points coordinates, one column per variable.
`y`	the vector of model outputs.
`inputs`	the vector of the two inputs' indices for which the indices will be computed.
`deltasvector`	a vector containing the values of the perturbed means for which the indices will be computed. Warning: if samedelta=FALSE, deltasvector must be the vector of deltas (mean perturbations).
`InputDistributions`	a list of lists. Each element is a list containing the name of the distribution to be used and its parameters. Implemented cases so far (for a mean perturbation): Gaussian, Uniform, Triangle, Left Truncated Gaussian, Left Truncated Gumbel. Using Gumbel requires the package `evd`.
`samedelta`	a boolean (only mean perturbations are available in this function). If TRUE, the same mean perturbation is applied to all the variables. If FALSE, the perturbed mean is new_mean = mean+sigma*delta, where mean and sigma are parameters defined in InputDistributions and delta is a value of deltasvector.
`percentage`	a boolean that defines the formula used for the PLI. If it is set at FALSE, the initially proposed formula is used (see Sueur et al., 2017). If not (set as TRUE), the PLI is given in percentage of variation of the quantile (see Iooss et al., 2021).
`nboot`	the number of bootstrap replicates.
`conf`	the confidence level for bootstrap confidence intervals.
`bootsample`	If TRUE, the uncertainty about the original quantile estimation is taken into account in the PLI confidence intervals (see Iooss et al., 2021). If FALSE, standard confidence intervals are computed for the PLI. It mainly changes the CI at small delta values.
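
The indices are estimated from a single Monte Carlo sample: instead of re-running the model under the perturbed input law, the existing outputs are reweighted by the likelihood ratio between the perturbed and the original input densities. The base-R sketch below illustrates this for one Gaussian input. It assumes that the perturbed law is the Gaussian whose mean is moved to mean+sigma*delta (the samedelta=FALSE convention) with unchanged standard deviation, and it reports the PLI as the relative variation of the quantile, as with percentage=TRUE. The helpers `shift_weights` and `weighted_quantile` are hypothetical and are not the package's internal code.

```r
# Sketch only (not the package internals): PLI of a quantile for a mean shift
# of one Gaussian input, estimated by reweighting a single Monte Carlo sample.

# Hypothetical helper: likelihood ratio f_delta(x) / f(x) when N(mu, sigma)
# (sigma = standard deviation) is replaced by N(mu + sigma * delta, sigma).
shift_weights <- function(x, mu, sigma, delta) {
  new_mu <- mu + sigma * delta
  dnorm(x, new_mu, sigma) / dnorm(x, mu, sigma)
}

# Hypothetical helper: empirical alpha-quantile of y under weights w.
weighted_quantile <- function(y, w, alpha) {
  o <- order(y)
  cw <- cumsum(w[o]) / sum(w)
  y[o][which(cw >= alpha)[1]]
}

set.seed(1)
N <- 5000
X <- matrix(rnorm(3 * N), ncol = 3)
Y <- 2 * X[, 1] + X[, 2] + X[, 3] / 2
alpha <- 0.95

q_hat <- weighted_quantile(Y, rep(1, N), alpha)   # unperturbed quantile
deltas <- seq(-1, 1, 0.5)
PLI <- sapply(deltas, function(d) {
  w <- shift_weights(X[, 1], mu = 0, sigma = 1, delta = d)  # perturb X1 only
  weighted_quantile(Y, w, alpha) / q_hat - 1  # relative variation of the quantile
})
print(round(PLI, 2))
```

No new model runs are needed, but large |delta| values produce very uneven weights and therefore noisier estimates, which is where the bootstrap confidence intervals become useful.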

Details

This function does not allow perturbations on the variance of the inputs' distributions.

Value

PLIquantile_multivar returns a list of matrices (rows: delta twist of input 1; columns: delta twist of input 2) containing the following components:

`PLI`	the PLI.
`PLICIinf`	the bootstrap lower confidence interval values of the PLI.
`PLICIsup`	the bootstrap upper confidence interval values of the PLI.
`quantile`	the perturbed quantile.
`quantileCIinf`	the bootstrap lower confidence interval values of the perturbed quantile.
`quantileCIsup`	the bootstrap upper confidence interval values of the perturbed quantile.
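
The matrix layout described above can be illustrated with a small base-R sketch. It assumes independent standard Gaussian inputs, each shifted to N(delta, 1), so that the joint likelihood ratio is the product of the two marginal ratios exp(delta*x - delta^2/2). The helper `weighted_quantile` is hypothetical; the code only mimics the structure of the result and is not the package's implementation.

```r
# Sketch only: grid of PLI-quantile values for simultaneous mean shifts of
# X1 (rows) and X2 (columns), both standard Gaussian and independent.
weighted_quantile <- function(y, w, alpha) {   # hypothetical helper
  o <- order(y)
  cw <- cumsum(w[o]) / sum(w)
  y[o][which(cw >= alpha)[1]]
}

set.seed(2)
N <- 5000
X <- matrix(rnorm(3 * N), ncol = 3)
Y <- 2 * X[, 1] + X[, 2] + X[, 3] / 2
alpha <- 0.95
q_hat <- weighted_quantile(Y, rep(1, N), alpha)

deltas <- seq(-1, 1, 0.5)
PLI <- matrix(NA_real_, length(deltas), length(deltas),
              dimnames = list(as.character(deltas), as.character(deltas)))
for (a in seq_along(deltas)) {
  for (b in seq_along(deltas)) {
    # independent inputs: product of the marginal ratios N(delta, 1) / N(0, 1)
    w <- exp(deltas[a] * X[, 1] - deltas[a]^2 / 2) *
         exp(deltas[b] * X[, 2] - deltas[b]^2 / 2)
    PLI[a, b] <- weighted_quantile(Y, w, alpha) / q_hat - 1
  }
}
print(round(PLI, 2))
print(round(diag(PLI), 2))   # same delta on both inputs
```

The diagonal corresponds to equal shifts on both inputs, which is what the examples plot with diag(toto12$PLI).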

Author(s)

Bertrand Iooss

References

Examples



# Model: 3D function 

distribution = list()
for (i in 1:3) distribution[[i]]=list("norm",c(0,1))
N = 5000
X = matrix(0,ncol=3,nrow=N)
for(i in 1:3) X[,i] = rnorm(N,0,1)
Y = 2 * X[,1] + X[,2] + X[,3]/2
alpha <- 0.95
nboot <- 20 # put nboot=200 for consistency

q95 = quantile(Y,alpha)
v_delta = seq(-1,1,1/10) 
toto12 = PLIquantile_multivar(alpha,X,Y,c(1,2),deltasvector=v_delta,
    InputDistributions=distribution,samedelta=TRUE)
toto = PLIquantile(alpha,X,Y,deltasvector=v_delta,InputDistributions=distribution,
    type="MOY",samedelta=TRUE,nboot=0)

par(mar=c(4,5,1,1))
plot(v_delta,diag(toto12$PLI),ylim=c(-1,1),xlab=expression(delta),
    ylab=expression(hat(PLI[i*delta])),pch=16,cex=1.5,col="blue")
points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$PLI[,2],col="black",pch=19,cex=1.5)
points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
abline(h=0,lty=2)
legend(-1,1.,legend=c("X1","X2","X3","X1X2"),col=c("darkgreen","black","red","blue"),
    pch=c(15,19,17,16),cex=1.5)

# with bootstrap

v_delta = seq(-1,1,2/10) 

toto12 = PLIquantile_multivar(alpha,X,Y,c(1,2),deltasvector=v_delta,
    InputDistributions=distribution,samedelta=TRUE,nboot=nboot,bootsample=FALSE)
toto = PLIquantile(alpha,X,Y,deltasvector=v_delta,InputDistributions=distribution,
    type="MOY",samedelta=TRUE,nboot=nboot,bootsample=FALSE)

par(mar=c(4,5,1,1))
plot(v_delta,diag(toto12$PLI),ylim=c(-1,1),xlab=expression(delta),
    ylab=expression(hat(PLI[i*delta])),pch=16,cex=1.5,col="blue")
points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$PLI[,2],col="black",pch=19,cex=1.5)
points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
lines(v_delta,diag(toto12$PLICIinf),col="blue")
lines(v_delta,diag(toto12$PLICIsup),col="blue")
lines(v_delta,toto$PLICIinf[,2],col="black")
lines(v_delta,toto$PLICIsup[,2],col="black")
lines(v_delta,toto$PLICIinf[,1],col="darkgreen")
lines(v_delta,toto$PLICIsup[,1],col="darkgreen")
lines(v_delta,toto$PLICIinf[,3],col="red")
lines(v_delta,toto$PLICIsup[,3],col="red")
abline(h=0,lty=2)
legend(-1,1,legend=c("X1","X2","X3","X1X2"),col=c("darkgreen","black","red","blue"),
    pch=c(15,19,17,16),cex=1.5)

###################################################		
# Other visualizations using the plotrix,
# viridisLite, lattice and grid packages (from Vanessa Verges)

library(plotrix)

parameters = list(colors=c("darkgreen","black","red"),symbols=c(15,19,17))
par(mar=c(4,5,1,1),xpd=TRUE)
plotCI(v_delta,diag(toto12$PLI),ui=diag(toto12$PLICIsup),li=diag(toto12$PLICIinf),
       xlab=expression(delta),ylab=expression(hat(PLI[i*delta])),
       main=bquote("PLI-quantile (N ="~.(N) ~ ","~alpha~"="~.(alpha)~
       ") on "~X[1]~"and"~X[2]~"of Y="~2*X[1] + X[2] + X[3]/2),
       cex=1.5,col="blue",pch=16)
for (i in 1:3){
  plotCI(v_delta,toto$PLI[,i],ui=toto$PLICIsup[,i],li=toto$PLICIinf[,i],
         cex=1.5,col=parameters$colors[i],pch=parameters$symbols[i],
         add=TRUE)
}
abline(h=0,lty=2)
legend("topleft",legend=c("X1","X2","X3","X1X2"),
        col=c(parameters$colors,"blue"),pch=c(parameters$symbols,16),cex=1.5)

# Visualization of all the PLIs (for every pair of deltas)

library(viridisLite)
library(lattice)
library(grid)

colnames(toto12$PLI) = round(v_delta,2)
rownames(toto12$PLI) = round(v_delta,2)
coul = viridis(100)
levelplot(toto12$PLI, col.regions = coul, xlab=bquote(delta[X~.(1)]), 
  ylab=bquote(delta[X~.(2)]), main=bquote(hat(PLI)[quantile[~X[1]~X[2]]]))


Perturbed-Law based sensitivity Indices (PLI) for superquantile

Description

PLIsuperquantile computes the Perturbed-Law based Indices (PLI) for superquantile, which are robustness indices related to a superquantile of a model output, estimated by a Monte Carlo method. See Iooss et al. (2020).

Usage

PLIsuperquantile(order,x,y,deltasvector,InputDistributions,type="MOY",
  samedelta=TRUE, percentage=TRUE,nboot=0,conf=0.95,bootsample=TRUE,bias=TRUE)

Arguments

`order`	the order of the superquantile to estimate.
`x`	the matrix of simulation points coordinates, one column per variable.
`y`	the vector of model outputs.
`deltasvector`	a vector containing the values of delta for which the indices will be computed.
`InputDistributions`	a list of lists. Each element is a list containing the name of the distribution to be used and its parameters. Implemented cases so far: for a mean perturbation, Gaussian, Uniform, Triangle, Left Truncated Gaussian, Left Truncated Gumbel (using Gumbel requires the package `evd`); for a variance perturbation, Gaussian, Uniform.
`type`	a character string specifying the type of perturbation. The meaning of deltasvector depends on it: with type="MOY", deltasvector is a vector of perturbed means; with type="VAR", deltasvector is a vector of perturbed variances, which must therefore be positive.
`samedelta`	a boolean used when type="MOY". If TRUE, the same mean perturbation is applied to all the variables. If FALSE, the perturbed mean is new_mean = mean+sigma*delta, where mean and sigma are parameters defined in InputDistributions and delta is a value of deltasvector.
`percentage`	a boolean that defines the formula used for the PLI. If FALSE, the classical formula of the bibliographical references is used. If TRUE, the PLI is given as a percentage of variation of the superquantile (even if it is negative).
`nboot`	the number of bootstrap replicates.
`conf`	the confidence level for bootstrap confidence intervals.
`bootsample`	If TRUE, the uncertainty about the original quantile estimation is taken into account in the PLI confidence intervals (see Iooss et al., 2020). If FALSE, standard confidence intervals are computed for the PLI. This mainly changes the CIs at small delta values.
`bias`	defines the version of PLI-superquantile. If TRUE, it uses the mean of the outputs above the perturbed quantile (alternative formula). If FALSE, it uses the mean of the perturbed outputs above the perturbed quantile (original formula).
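
With type="VAR", deltasvector holds perturbed variances instead of mean shifts. The base-R sketch below shows the idea for a Gaussian input, assuming that the perturbed law of X1 ~ N(0, 1) is N(0, v) with the same mean and that the existing sample is reweighted by the ratio of the two densities. The helper `weighted_superquantile` is hypothetical and this is not the package's code. Note that, with this Gaussian reweighting, the weights have infinite variance as soon as v >= 2, so large variance perturbations need a much larger sample.

```r
# Sketch only (assumption, not the package internals): superquantile under a
# variance perturbation of X1, where N(0, 1) is replaced by N(0, v).
weighted_superquantile <- function(y, w, alpha) {   # hypothetical helper
  o <- order(y)
  cw <- cumsum(w[o]) / sum(w)
  q <- y[o][which(cw >= alpha)[1]]     # weighted alpha-quantile
  tail <- y > q
  sum(w[tail] * y[tail]) / sum(w[tail])  # weighted mean above it
}

set.seed(4)
N <- 10000
X <- matrix(rnorm(3 * N), ncol = 3)
Y <- 2 * X[, 1] + X[, 2] + X[, 3] / 2
alpha <- 0.95

variances <- c(0.5, 1, 1.5)   # perturbed variances: must be positive
sq <- sapply(variances, function(v)
  weighted_superquantile(Y, dnorm(X[, 1], 0, sqrt(v)) / dnorm(X[, 1], 0, 1), alpha))
PLI <- sq / sq[2] - 1         # relative variation with respect to v = 1
print(round(PLI, 2))
```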

Value

PLIsuperquantile returns a list of matrices (each column corresponds to an input, each row corresponds to a twist of amplitude delta) containing the following components:

`PLI`	the PLI.
`PLICIinf`	the bootstrap lower confidence interval values of the PLI.
`PLICIsup`	the bootstrap upper confidence interval values of the PLI.
`superquantile`	the perturbed superquantile.
`superquantileCIinf`	the bootstrap lower confidence interval values of the perturbed superquantile.
`superquantileCIsup`	the bootstrap upper confidence interval values of the perturbed superquantile.
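
The superquantile of order alpha is the mean of the output above its alpha-quantile. The examples compute two plug-in estimators of the unperturbed superquantile, sq95a and sq95b. The base-R sketch below reproduces them and adds one possible reweighted estimator of the perturbed superquantile for a mean shift of X1 ~ N(0, 1) to N(delta, 1). The helper `weighted_superquantile` is hypothetical, and whether it matches either setting of `bias` is an assumption, not something stated by the package.

```r
# Sketch only: plug-in superquantile estimators (sq95a, sq95b as in the
# examples) and a reweighted estimator under a mean shift of X1.
set.seed(3)
N <- 10000
X <- matrix(rnorm(3 * N), ncol = 3)
Y <- 2 * X[, 1] + X[, 2] + X[, 3] / 2
alpha <- 0.95
q95 <- quantile(Y, alpha)

sq95a <- mean(Y * (Y > q95) / (1 - alpha))  # E[Y 1{Y > q}] / (1 - alpha)
sq95b <- mean(Y[Y > q95])                   # empirical mean of the tail

weighted_superquantile <- function(y, w, alpha) {   # hypothetical helper
  o <- order(y)
  cw <- cumsum(w[o]) / sum(w)
  q <- y[o][which(cw >= alpha)[1]]     # weighted alpha-quantile
  tail <- y > q
  sum(w[tail] * y[tail]) / sum(w[tail])  # weighted mean above it
}

deltas <- c(-1, 0, 1)
sq_delta <- sapply(deltas, function(d)
  weighted_superquantile(Y, exp(d * X[, 1] - d^2 / 2), alpha))  # N(d,1)/N(0,1)
PLI <- sq_delta / sq_delta[2] - 1   # relative variation with respect to delta = 0
print(c(sq95a = unname(sq95a), sq95b = sq95b))
print(round(PLI, 2))
```

When exactly N*(1-alpha) outputs lie above the empirical quantile, the two plug-in estimators coincide up to rounding.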

Author(s)

Bertrand Iooss

References

Examples



# Model: 3D function 

  distribution = list()
	for (i in 1:3) distribution[[i]]=list("norm",c(0,1))
  
# Monte Carlo sampling 

  N = 10000
	X = matrix(0,ncol=3,nrow=N)
	for(i in 1:3) X[,i] = rnorm(N,0,1)
     
	Y = 2 * X[,1] + X[,2] + X[,3]/2
	alpha <- 0.95
	
	q95 = quantile(Y,alpha)
  sq95a <- mean(Y*(Y>q95)/(1-alpha)) ; sq95b <- mean(Y[Y>q95])
	
	nboot=20 # change to nboot=200 for consistency
	
# sensitivity indices with perturbation of the mean 
  
	v_delta = seq(-1,1,1/10) 
	toto = PLIsuperquantile(alpha,X,Y,deltasvector=v_delta,
	  InputDistributions=distribution,type="MOY",samedelta=TRUE,
	  percentage=FALSE,nboot=nboot,bias=TRUE)

# Plotting the PLI
  par(mar=c(4,5,1,1))
	plot(v_delta,toto$PLI[,2],ylim=c(-0.5,0.5),xlab=expression(delta),
		ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
	points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
	points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
	lines(v_delta,toto$PLICIinf[,2],col="black")
	lines(v_delta,toto$PLICIsup[,2],col="black")
	lines(v_delta,toto$PLICIinf[,1],col="darkgreen")
	lines(v_delta,toto$PLICIsup[,1],col="darkgreen")
	lines(v_delta,toto$PLICIinf[,3],col="red")
	lines(v_delta,toto$PLICIsup[,3],col="red")
	abline(h=0,lty=2)
	legend(-1,0.5,legend=c("X1","X2","X3"),
		col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)
  
# Plotting the perturbed superquantiles
  par(mar=c(4,5,1,1))
	plot(v_delta,toto$superquantile[,2],ylim=c(3,7),xlab=expression(delta),
		ylab=expression(hat(q[i*delta])),pch=19,cex=1.5)
	points(v_delta,toto$superquantile[,1],col="darkgreen",pch=15,cex=1.5)
	points(v_delta,toto$superquantile[,3],col="red",pch=17,cex=1.5)
	lines(v_delta,toto$superquantileCIinf[,2],col="black")
	lines(v_delta,toto$superquantileCIsup[,2],col="black")
	lines(v_delta,toto$superquantileCIinf[,1],col="darkgreen")
	lines(v_delta,toto$superquantileCIsup[,1],col="darkgreen")
	lines(v_delta,toto$superquantileCIinf[,3],col="red")
	lines(v_delta,toto$superquantileCIsup[,3],col="red")
	abline(h=q95,lty=2)
	legend(-1,7,legend=c("X1","X2","X3"),
		col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)
		
# Plotting the unbiased PLI in percentage with refined confidence intervals
	toto = PLIsuperquantile(alpha,X,Y,deltasvector=v_delta,
	  InputDistributions=distribution,type="MOY",samedelta=TRUE,percentage=TRUE,
	  nboot=nboot,bootsample=FALSE,bias=FALSE)
	  
  par(mar=c(4,5,1,1))
	plot(v_delta,toto$PLI[,2],ylim=c(-0.4,0.5),xlab=expression(delta),
		ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
	points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
	points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
	lines(v_delta,toto$PLICIinf[,2],col="black")
	lines(v_delta,toto$PLICIsup[,2],col="black")
	lines(v_delta,toto$PLICIinf[,1],col="darkgreen")
	lines(v_delta,toto$PLICIsup[,1],col="darkgreen") 
	lines(v_delta,toto$PLICIinf[,3],col="red")
	lines(v_delta,toto$PLICIsup[,3],col="red")
	abline(h=0,lty=2)
	legend(-1,0.5,legend=c("X1","X2","X3"),
		col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)

##################################################
# Another visualization using the plotCI() function
# from the plotrix package to draw the CIs (from Vanessa Verges)

	library(plotrix)
	parameters = list(colors=c("darkgreen","black","red"),symbols=c(15,19,17),
	  overlay=c(FALSE,TRUE,TRUE))
  par(mar=c(4,5,1,1),xpd=TRUE)
  for (i in 1:3){
  plotCI(v_delta,toto$PLI[,i],ui=toto$PLICIsup[,i],li=toto$PLICIinf[,i],
         cex=1.5,col=parameters$colors[i],pch=parameters$symbols[i],
         add=parameters$overlay[i], xlab="", ylab="")
  }
  title(xlab=expression(delta),ylab=expression(hat(PLI[i*delta])),
      main=bquote("PLI-superquantile (N ="~.(N) ~ ","~alpha~"="~.(alpha)~
      ") of Y="~2*X[1] + X[2] + X[3]/2))
  abline(h=0,lty=2)
  legend("topleft",legend=c("X1","X2","X3"),
          col=parameters$colors,pch=parameters$symbols,cex=1.5)


Perturbed-Law based sensitivity Indices (PLI) for superquantile and simultaneous perturbations of 2 inputs

Description

PLIsuperquantile_multivar computes the Perturbed-Law based Indices (PLI) for a superquantile under simultaneous perturbations of the means of 2 inputs, estimated by a Monte Carlo method.

Usage

PLIsuperquantile_multivar(order,x,y,inputs,deltasvector,InputDistributions,
  samedelta=TRUE, percentage=TRUE,nboot=0,conf=0.95,bootsample=TRUE,bias=TRUE)

Arguments

`order`	the order of the superquantile to estimate.
`x`	the matrix of simulation points coordinates, one column per variable.
`y`	the vector of model outputs.
`inputs`	the vector of the two inputs' indices for which the indices will be computed.
`deltasvector`	a vector containing the values of the perturbed means for which the indices will be computed. Warning: if samedelta=FALSE, deltasvector must be the vector of deltas (mean perturbations).
`InputDistributions`	a list of lists. Each element is a list containing the name of the distribution to be used and its parameters. Implemented cases so far (for a mean perturbation): Gaussian, Uniform, Triangle, Left Truncated Gaussian, Left Truncated Gumbel. Using Gumbel requires the package `evd`.
`samedelta`	a boolean (only mean perturbations are available in this function). If TRUE, the same mean perturbation is applied to all the variables. If FALSE, the perturbed mean is new_mean = mean+sigma*delta, where mean and sigma are parameters defined in InputDistributions and delta is a value of deltasvector.
`percentage`	a boolean that defines the formula used for the PLI. If it is set at FALSE, the initially proposed formula is used (see Sueur et al., 2017). If not (set as TRUE), the PLI is given in percentage of variation of the superquantile (see Iooss et al., 2021).
`nboot`	the number of bootstrap replicates.
`conf`	the confidence level for bootstrap confidence intervals.
`bootsample`	If TRUE, the uncertainty about the original quantile estimation is taken into account in the PLI confidence intervals (see Iooss et al., 2021). If FALSE, standard confidence intervals are computed for the PLI. It mainly changes the CI at small delta values.
`bias`	defines the version of PLI-superquantile. If TRUE, it uses the mean of the outputs above the perturbed quantile (alternative formula). If FALSE, it uses the mean of the perturbed outputs above the perturbed quantile (original formula).

Details

This function does not allow perturbations on the variance of the inputs' distributions.

Value

PLIsuperquantile_multivar returns a list of matrices (rows: delta twist of input 1; columns: delta twist of input 2) containing the following components:

`PLI`	the PLI.
`PLICIinf`	the bootstrap lower confidence interval values of the PLI.
`PLICIsup`	the bootstrap upper confidence interval values of the PLI.
`quantile`	the perturbed quantile.
`quantileCIinf`	the bootstrap lower confidence interval values of the perturbed superquantile.
`quantileCIsup`	the bootstrap upper confidence interval values of the perturbed superquantile.

Author(s)

Bertrand Iooss

References

Examples



# Model: 3D function 

distribution = list()
for (i in 1:3) distribution[[i]]=list("norm",c(0,1))
N = 10000
X = matrix(0,ncol=3,nrow=N)
for(i in 1:3) X[,i] = rnorm(N,0,1)
Y = 2 * X[,1] + X[,2] + X[,3]/2
alpha <- 0.95
nboot <- 20 # put nboot=200 for consistency

q95 = quantile(Y,alpha)
sq95a <- mean(Y*(Y>q95)/(1-alpha)) ; sq95b <- mean(Y[Y>q95])
  
v_delta = seq(-1,1,1/10) 
toto12 = PLIsuperquantile_multivar(alpha,X,Y,c(1,2),deltasvector=v_delta,
    InputDistributions=distribution,samedelta=TRUE,bias=FALSE)
toto = PLIsuperquantile(alpha,X,Y,deltasvector=v_delta,InputDistributions=distribution,
    type="MOY",samedelta=TRUE,nboot=0,bias=FALSE)

par(mar=c(4,5,1,1))
plot(v_delta,diag(toto12$PLI),ylim=c(-1,1),xlab=expression(delta),
    ylab=expression(hat(PLI[i*delta])),pch=16,cex=1.5,col="blue")
points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$PLI[,2],col="black",pch=19,cex=1.5)
points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
abline(h=0,lty=2)
legend(-1,1.,legend=c("X1","X2","X3","X1X2"),col=c("darkgreen","black","red","blue"),
    pch=c(15,19,17,16),cex=1.5)

# with bootstrap (not run in the CRAN tests because it takes too long)

v_delta = seq(-1,1,2/10) 

toto12 = PLIsuperquantile_multivar(alpha,X,Y,c(1,2),deltasvector=v_delta,
    InputDistributions=distribution,samedelta=TRUE,nboot=nboot,bootsample=FALSE,bias=FALSE)
toto = PLIsuperquantile(alpha,X,Y,deltasvector=v_delta,InputDistributions=distribution,
    type="MOY",samedelta=TRUE,nboot=nboot,bootsample=FALSE,bias=FALSE)

par(mar=c(4,5,1,1))
plot(v_delta,diag(toto12$PLI),ylim=c(-1,1),xlab=expression(delta),
    ylab=expression(hat(PLI[i*delta])),pch=16,cex=1.5,col="blue")
points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$PLI[,2],col="black",pch=19,cex=1.5)
points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
lines(v_delta,diag(toto12$PLICIinf),col="blue")
lines(v_delta,diag(toto12$PLICIsup),col="blue")
lines(v_delta,toto$PLICIinf[,2],col="black")
lines(v_delta,toto$PLICIsup[,2],col="black")
lines(v_delta,toto$PLICIinf[,1],col="darkgreen")
lines(v_delta,toto$PLICIsup[,1],col="darkgreen")
lines(v_delta,toto$PLICIinf[,3],col="red")
lines(v_delta,toto$PLICIsup[,3],col="red")
abline(h=0,lty=2)
legend(-1,1,legend=c("X1","X2","X3","X1X2"),col=c("darkgreen","black","red","blue"),
    pch=c(15,19,17,16),cex=1.5)

###################################################		
# Other visualizations using the plotrix,
# viridisLite, lattice and grid packages (from Vanessa Verges)

library(plotrix)
parameters = list(colors=c("darkgreen","black","red"),symbols=c(15,19,17))
par(mar=c(4,5,1,1),xpd=TRUE)
plotCI(v_delta,diag(toto12$PLI),ui=diag(toto12$PLICIsup),li=diag(toto12$PLICIinf),
       xlab=expression(delta),ylab=expression(hat(PLI[i*delta])),
       main=bquote("PLI-superquantile (N ="~.(N) ~ ","~alpha~"="~.(alpha)~
       ") on "~X[1]~"and"~X[2]~"of Y="~2*X[1] + X[2] + X[3]/2),
       cex=1.5,col="blue",pch=16)
for (i in 1:3){
  plotCI(v_delta,toto$PLI[,i],ui=toto$PLICIsup[,i],li=toto$PLICIinf[,i],
         cex=1.5,col=parameters$colors[i],pch=parameters$symbols[i],
         add=TRUE)
}
abline(h=0,lty=2)
legend("topleft",legend=c("X1","X2","X3","X1X2"),
        col=c(parameters$colors,"blue"),pch=c(parameters$symbols,16),cex=1.5)

# Visualization of all the PLIs (for every pair of deltas)

library(viridisLite)
library(lattice)
library(grid)

colnames(toto12$PLI) = round(v_delta,2)
rownames(toto12$PLI) = round(v_delta,2)
coul = viridis(100)
levelplot(toto12$PLI,col.regions=coul,main=bquote(hat(PLI)[superquantile[~X[1]~X[2]]]),
          xlab=bquote(delta[X~.(1)]),ylab=bquote(delta[X~.(2)]))


# Model: 3D function 

distribution = list()
for (i in 1:3) distribution[[i]]=list("norm",c(0,1))
N = 10000
X = matrix(0,ncol=3,nrow=N)
for(i in 1:3) X[,i] = rnorm(N,0,1)
Y = 2 * X[,1] + X[,2] + X[,3]/2
alpha <- 0.95
nboot <- 20 # put nboot=200 for consistency

q95 = quantile(Y,alpha)
sq95a <- mean(Y*(Y>q95)/(1-alpha)) ; sq95b <- mean(Y[Y>q95])
  
v_delta = seq(-1,1,1/10) 
toto12 = PLIsuperquantile_multivar(alpha,X,Y,c(1,2),deltasvector=v_delta,
    InputDistributions=distribution,samedelta=TRUE,bias=FALSE)
toto = PLIsuperquantile(alpha,X,Y,deltasvector=v_delta,InputDistributions=distribution,
    type="MOY",samedelta=TRUE,nboot=0,bias=FALSE)

par(mar=c(4,5,1,1))
plot(v_delta,diag(toto12$PLI),,ylim=c(-1,1),xlab=expression(delta),
    ylab=expression(hat(PLI[i*delta])),pch=16,cex=1.5,col="blue")
points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$PLI[,2],col="black",pch=19,cex=1.5)
points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
abline(h=0,lty=2)
legend(-1,1.,legend=c("X1","X2","X3","X1X2"),col=c("darkgreen","black","red","blue"),
    pch=c(15,19,17,16),cex=1.5)

# with bootstrap (put in comment because too long for the CRAN tests)

v_delta = seq(-1,1,2/10) 

toto12 = PLIsuperquantile_multivar(alpha,X,Y,c(1,2),deltasvector=v_delta,
    InputDistributions=distribution,samedelta=TRUE,nboot=nboot,bootsample=FALSE,bias=FALSE)
toto = PLIsuperquantile(alpha,X,Y,deltasvector=v_delta,InputDistributions=distribution,
    type="MOY",samedelta=TRUE,nboot=nboot,bootsample=FALSE,bias=FALSE)

par(mar=c(4,5,1,1))
plot(v_delta,diag(toto12$PLI),ylim=c(-1,1),xlab=expression(delta),
    ylab=expression(hat(PLI[i*delta])),pch=16,cex=1.5,col="blue")
points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$PLI[,2],col="black",pch=19,cex=1.5)
points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
lines(v_delta,diag(toto12$PLICIinf),col="blue")
lines(v_delta,diag(toto12$PLICIsup),col="blue")
lines(v_delta,toto$PLICIinf[,2],col="black")
lines(v_delta,toto$PLICIsup[,2],col="black")
lines(v_delta,toto$PLICIinf[,1],col="darkgreen")
lines(v_delta,toto$PLICIsup[,1],col="darkgreen")
lines(v_delta,toto$PLICIinf[,3],col="red")
lines(v_delta,toto$PLICIsup[,3],col="red")
abline(h=0,lty=2)
legend(-1,1,legend=c("X1","X2","X3","X1X2"),col=c("darkgreen","black","red","blue"),
    pch=c(15,19,17,16),cex=1.5)

###################################################		
# another visualizations by using the plotrix, 
# viridisLite, lattice and grid packages (from Vanessa Verges)

library(plotrix)
parameters = list(colors=c("darkgreen","black","red"),symbols=c(15,19,17))
par(mar=c(4,5,1,1),xpd=TRUE)
plotCI(v_delta,diag(toto12$PLI),ui=diag(toto12$PLICIsup),li=diag(toto12$PLICIinf),
       xlab=expression(delta),ylab=expression(hat(PLI[i*delta])),
       main=bquote("PLI-superquantile (N ="~.(N) ~ ","~alpha~"="~.(alpha)~
       ") on "~X[1]~"and"~X[2]~"of Y="~2*X[1] + X[2] + X[3]/2),
       cex=1.5,col="blue",pch=16)
for (i in 1:3){
  plotCI(v_delta,toto$PLI[,i],ui=toto$PLICIsup[,i],li=toto$PLICIinf[,i],
         cex=1.5,col=parameters$colors[i],pch=parameters$symbols[i],
         add=TRUE)
}
abline(h=0,lty=2)
legend("topleft",legend=c("X1","X2","X3","X1X2"),
        col=c(parameters$colors,"blue"),pch=c(parameters$symbols,16),cex=1.5)

# Visualization of all the PLIs (for all paired combinations of deltas)

library(viridisLite)
library(lattice)
library(grid)

colnames(toto12$PLI) = round(v_delta,2)
rownames(toto12$PLI) = round(v_delta,2)
coul = viridis(100)
levelplot(toto12$PLI,col.regions=coul,main=bquote(hat(PLI)[superquantile[~X[1]~X[2]]]),
          xlab=bquote(delta[X~.(1)]),ylab=bquote(delta[X~.(2)]))

Support index functions: Measuring the effect of input variables over their support

Description

Methods to plot the normalized support index functions (Fruth et al., 2016).

Usage

## S3 method for class 'support'
plot(x, i = 1:ncol(x$X),
        xprob = FALSE, p = NULL, p.arg = NULL,
        ylim = NULL, col = 1:3, lty = 1:3, lwd = c(2,2,1), cex = 1, ...)
## S3 method for class 'support'
scatterplot(x, i = 1:ncol(x$X), 
               xprob = FALSE, p = NULL, p.arg = NULL, 
               cex = 1, cex.lab = 1, ...)


Arguments

`x`	an object of class support.
`i`	an optional vector of integers indicating the subset of input variables `X_i` for plotting. Default is the entire set of input variables.
`xprob`	an optional boolean indicating whether the inputs should be plotted in probability scale.
`p`, `p.arg`	list of probability names and parameters for the input distribution.
`ylim`, `col`, `lty`, `lwd`, `cex`, `cex.lab`	usual graphical parameters.
`...`	additional graphical parameters to be passed to `scatterplot` method (`ggMarginal` function).

Details

If xprob = TRUE, the input variable X_i is plotted in probability scale according to the information provided in the arguments p and p.arg: the x-axis is then F(x), where F is the cdf of X_i. If these arguments are not provided, the empirical distribution is used for rescaling: the x-axis is then Fn(x), where Fn is the empirical cdf of X_i.
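The rescaling to probability scale can be sketched in a few lines of base R. The helper `to_prob_scale()` is hypothetical and not part of the package; it assumes that `p` is the name of a cdf function such as "pnorm" and that `p.arg` is a named list of its parameters, which may differ from how the plotting methods actually interpret these arguments.

```r
# Hypothetical helper (not a sensitivity function): map x to the probability scale.
# Assumption: `p` names a cdf (e.g. "pnorm") and `p.arg` is a list of its parameters.
to_prob_scale <- function(x, p = NULL, p.arg = NULL) {
  if (is.null(p)) {
    return(ecdf(x)(x))            # empirical cdf Fn(x)
  }
  do.call(p, c(list(x), p.arg))   # parametric cdf F(x)
}

to_prob_scale(c(3, 1, 2))                          # empirical: 1, 1/3, 2/3
to_prob_scale(0, "pnorm", list(mean = 0, sd = 1))  # parametric: 0.5
```

With the empirical cdf, the rescaled inputs are simply the ranks divided by the sample size, so the x-axis always spans (0, 1].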

Legend details:

zeta*T : normalized total support index function

zeta* : normalized 1st-order support index function

nu* : normalized DGSM

Notice that the sum of (normalized) DGSM (nu*) over all input variables is equal to 1. Furthermore, the expectation of the total support index function (zeta*T) is equal to the (normalized) DGSM (nu*).
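The two identities above can be checked numerically. The sketch below assumes the usual definitions, DGSM nu_i = E[(df/dx_i)^2] and total support index function zeta^T_i(x_i) = E[(df/dx_i)^2 | X_i = x_i], both divided by the sum of the DGSM; the toy model f(x) = x1 + 3 x1 x2 with U(-1/2, 1/2) inputs and the binning estimator of the conditional expectation are illustrative choices, not the estimator used by the package.

```r
# Toy model (an assumption for illustration): f(x) = x1 + 3 * x1 * x2,
# with independent inputs X1, X2 ~ U(-1/2, 1/2) and an analytic gradient.
set.seed(42)
n <- 1e4
X <- matrix(runif(2 * n, min = -0.5, max = 0.5), nrow = n, ncol = 2)
grad <- cbind(1 + 3 * X[, 2], 3 * X[, 1])   # df/dx1, df/dx2

dgsm <- colMeans(grad^2)            # DGSM: E[(df/dx_i)^2]
nu_star <- dgsm / sum(dgsm)         # normalized DGSM (exact values here: 0.7, 0.3)

# Binned estimate of the normalized total support index function of X1:
# zeta*T_1(x1) = E[(df/dx1)^2 | X1 = x1] / sum(dgsm)
bins1 <- cut(X[, 1], breaks = 20)
zeta1_T_star <- tapply(grad[, 1]^2, bins1, mean) / sum(dgsm)

# Averaging zeta*T_1 over the distribution of X1 (bin frequencies) recovers nu*_1
w1 <- as.vector(table(bins1)) / n
c(sum(nu_star), sum(w1 * zeta1_T_star), nu_star[1])
```

The frequency-weighted mean of the bin means equals the overall mean of the squared derivative, which is exactly the statement that the expectation of zeta*T equals nu*.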

Author(s)

O. Roustant

Data-given proportional marginal effects estimation via nearest-neighbors procedure

Description

pme_knn computes the proportional marginal effects (PME) of Herin et al. (2024) via nearest-neighbour estimation. Computations can be parallelized to speed up the estimation. The function handles categorical inputs (which are one-hot encoded before computing the nearest neighbours), dependent inputs and multiple outputs. For large sample sizes, the nearest-neighbour search can be significantly accelerated by using approximate nearest-neighbour search.

Usage

pme_knn(model=NULL, X, method = "knn", tol = NULL, marg = T, n.knn = 2, 
          n.limit = 2000, noise = F, rescale = F, nboot = NULL, 
          boot.level = 0.8, conf=0.95, parl=NULL, ...)
## S3 method for class 'pme_knn'
tell(x, y, ...)
## S3 method for class 'pme_knn'
print(x, ...)
## S3 method for class 'pme_knn'
plot(x, ylim = c(0,1), ...)
## S3 method for class 'pme_knn'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())

Arguments

`model`	a function defining the model to analyze, taking X as an argument.
`X`	a matrix or data frame containing the observed inputs.
`method`	the algorithm to be used for estimation, either "rank" or "knn", see details. Default is `method="knn"`.
`tol`	tolerance under which an input is considered as being a zero input. See details.
`marg`	whether to choose the closed Sobol' (`FALSE`) or total Sobol' (`TRUE`) indices as value functions.
`n.knn`	the number of nearest neighbours used for estimation.
`n.limit`	sample size limit above which approximate nearest neighbour search is activated.
`noise`	a logical which is TRUE if the model or the output sample is noisy. See details.
`rescale`	a logical indicating if continuous inputs must be rescaled before distance computations. If TRUE, continuous inputs are first whitened with the ZCA-cor whitening procedure (cf. whiten() function in package `whitening`). If the inputs are independent, this first step will have a very limited impact. Then, the resulting whitened inputs are individually modified via a copula transform such that each input has the same scale.
`nboot`	the number of bootstrap resamples for the bootstrap estimate of confidence intervals. See details.
`boot.level`	a numeric between 0 and 1 for the proportion of the bootstrap sample size.
`conf`	the confidence level of the bootstrap confidence intervals.
`parl`	number of cores on which to parallelize the computation. If `NULL`, then no parallelization is done.
`x`	the object returned by `pme_knn`.
`data`	the object returned by `pme_knn`.
`y`	a numeric univariate vector containing the observed outputs.
`ylim`	the y-coordinate limits for plotting.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	additional arguments to be passed to `model`, or to the methods, such as graphical parameters (see `par`).
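The `rescale = TRUE` preprocessing described in the arguments above can be sketched as follows. The package relies on whiten() from the whitening package; this hypothetical helper re-derives ZCA-cor whitening by hand (W = P^{-1/2} V^{-1/2}, with P the correlation matrix and V the diagonal matrix of variances), and the rank-based copula transform rank/(n+1) is an assumed choice for putting every input on the same scale.

```r
# Hypothetical helper: ZCA-cor whitening followed by a rank-based copula transform.
zca_cor_copula <- function(X) {
  X <- as.matrix(X)
  d <- ncol(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)            # centred inputs
  v_inv_sqrt <- diag(1 / sqrt(diag(cov(X))), nrow = d)     # V^{-1/2}
  e <- eigen(cor(X), symmetric = TRUE)
  p_inv_sqrt <- e$vectors %*% diag(1 / sqrt(e$values), nrow = d) %*% t(e$vectors)  # P^{-1/2}
  W <- p_inv_sqrt %*% v_inv_sqrt                           # ZCA-cor whitening matrix
  Z <- Xc %*% t(W)                                         # rows are whitened observations
  U <- apply(Z, 2, rank) / (nrow(Z) + 1)                   # same (0, 1) scale for every input
  list(Z = Z, U = U)
}

set.seed(7)
n <- 500
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)   # strongly correlated with x1
x3 <- 2 * rnorm(n)
res <- zca_cor_copula(cbind(x1, x2, x3))
round(cov(res$Z), 10)   # approximately the identity matrix
```

ZCA-cor is the whitening that keeps each whitened input as close as possible to its original (standardized) counterpart, which is why the nearest neighbours remain interpretable in terms of the original inputs.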

Details

The computation uses the subset procedure defined in Broto, Bachoc and Depecker (2020): all the closed Sobol' indices are first computed for all possible sub-models, and the proportional values are then computed recursively, as detailed in Feldman (2005), but with an extension to non-strictly positive games (Herin et al., 2024).
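The recursive step can be illustrated with a ratio-potential recursion, one standard way of computing Feldman's proportional value: R(empty set) = 1, R(S) = v(S) / sum over j in S of 1/R(S \ {j}), and PV_i = R(N) / R(N \ {i}). The helper below is a hypothetical sketch for strictly positive games only; it omits the extension to non-strictly positive games and the handling of zero inputs used by the package.

```r
# Hypothetical helper: proportional value of a strictly positive game.
# Subsets are bitmasks (bit j - 1 <-> input j); v[m] is the value of subset m,
# for m = 1, ..., 2^d - 1.
proportional_value <- function(v, d) {
  full <- 2^d - 1
  R <- numeric(full)
  R_of <- function(m) if (m == 0) 1 else R[m]   # R(empty set) = 1
  for (m in seq_len(full)) {                    # every proper subset has a smaller bitmask
    members <- which(bitwAnd(m, as.integer(2^(0:(d - 1)))) > 0)
    denom <- sum(vapply(members, function(j) 1 / R_of(m - 2^(j - 1)), numeric(1)))
    R[m] <- v[m] / denom
  }
  vapply(seq_len(d), function(i) R[full] / R_of(full - 2^(i - 1)), numeric(1))
}

# Two inputs: v({1}) = 1, v({2}) = 3, v({1,2}) = 8
proportional_value(c(1, 3, 8), d = 2)   # 2 6: v(N) split in proportion 1:3
```

The shares always sum to v(N), since R(N) times the sum of 1/R(N \ {i}) is v(N) by construction. The cost grows as 2^d, which is why all sub-model values are computed once up front.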

Since the bootstrap creates ties, which are not accounted for in the nearest-neighbour algorithm, confidence intervals are instead obtained by subsampling without replacement: each replicate draws, uniformly at random, a proportion boot.level of the total sample size.
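A minimal sketch of subsampling without replacement, with a plain percentile interval. The helper `subsample_ci()` is hypothetical; the correlation statistic stands in for a PME estimate, and the package's own interval construction (which also reports a bias estimate) may differ.

```r
# Hypothetical helper: percentile intervals from subsamples drawn WITHOUT replacement,
# so that no duplicated rows (ties) reach a nearest-neighbour estimator.
subsample_ci <- function(X, y, stat, nboot = 100, boot.level = 0.8, conf = 0.95) {
  n <- nrow(X)
  m <- floor(boot.level * n)                 # subsample size
  reps <- replicate(nboot, {
    idx <- sample.int(n, m)                  # uniform draw, no repeated index
    stat(X[idx, , drop = FALSE], y[idx])
  })
  alpha <- (1 - conf) / 2
  quantile(reps, c(alpha, 1 - alpha), names = FALSE)
}

set.seed(1)
X <- matrix(rnorm(200), nrow = 100, ncol = 2)
y <- X[, 1] + rnorm(100)
subsample_ci(X, y, function(X, y) cor(X[, 1], y), nboot = 50, boot.level = 0.7)
```

Note that each replicate uses only boot.level * n observations, so the spread of the replicates reflects that smaller sample size rather than n.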

If the outputs are noisy, the argument noise can be used: it only affects the estimation of one specific sensitivity index, namely $\mathrm{Var}(E(Y \mid X_1, \ldots, X_p))/\mathrm{Var}(Y)$. Without noise this index is equal to 1, while in the presence of noise it must be estimated.

The distance used for subsets with mixed inputs (continuous and categorical) is the Euclidean distance, computed after one-hot encoding of the categorical inputs.
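The following sketch shows Euclidean distances on mixed inputs after one-hot encoding, using base R's model.matrix(). The helper `one_hot()` and the toy data frame are hypothetical illustrations, not the package's internal code.

```r
# Hypothetical helper: full one-hot encoding (one indicator column per level)
one_hot <- function(df) {
  is_cat <- vapply(df, is.factor, logical(1))
  ctr <- lapply(df[is_cat], contrasts, contrasts = FALSE)   # identity contrasts: keep all levels
  model.matrix(~ . - 1, data = df, contrasts.arg = ctr)
}

df <- data.frame(x1 = c(0.1, 0.5, 0.9),
                 colour = factor(c("red", "blue", "red")))
Z <- one_hot(df)              # columns: x1, colourblue, colourred
D <- as.matrix(dist(Z))       # Euclidean distances between observations
D[1, 3]   # 0.8: same category, so only x1 differs
D[1, 2]   # sqrt(0.4^2 + 2): a category mismatch adds 2 to the squared distance
```

Because a category mismatch always adds 2 to the squared distance, the relative weight of categorical and continuous inputs depends on the scale of the continuous ones, which is one motivation for the `rescale` argument.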

If parl exceeds the number of cores available on the machine, the number of cores used defaults to the available cores minus one.

If marg = TRUE (default), the value function used to compute the proportional values is the total Sobol' index (the dual of the underlying cooperative game). If marg = FALSE, the closed Sobol' indices are used instead. The two choices can lead to different results.
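The dual game is w(S) = v(N) - v(N \ S), where v holds the closed Sobol' indices. Without noise v(N) = 1, so w(S) = 1 - S^clo_{N \ S}, which is the total Sobol' index of S. The hypothetical helper below computes the dual from closed indices stored by bitmask; how the package handles the noisy case is not covered here.

```r
# Hypothetical helper: dual game w(S) = v(N) - v(N \ S), with subsets as bitmasks
# (v[m] = closed index of the subset whose members are the set bits of m).
dual_game <- function(v, d) {
  full <- 2^d - 1
  vapply(seq_len(full), function(m) {
    comp <- full - m                            # bitmask of the complement N \ S
    v_comp <- if (comp == 0) 0 else v[comp]     # v(empty set) = 0
    v[full] - v_comp
  }, numeric(1))
}

# Closed indices for d = 2 with an interaction: S{1} = 0.3, S{2} = 0.5, S{1,2} = 1
dual_game(c(0.3, 0.5, 1), d = 2)   # 0.5 0.7 1: total indices of X1, X2 and of both
```

For an additive model the game is self-dual, so marg = TRUE and marg = FALSE coincide; differences only appear with interactions or dependent inputs.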

Zero inputs are defined by the tol argument. If tol = NULL, inputs with:

$S^T_{\{i\}} = 0$

are considered as zero inputs in the detection of spurious variables. Otherwise, zero inputs are detected when:

$S^T_{\{i\}} \leq \textrm{tol}$

Value

pme_knn returns a list of class "pme_knn":

`call`	the matched call.
`PME`	the estimations of the PME indices.
`VE`	the estimations of the closed Sobol' indices for all possible sub-models.
`indices`	list of all subsets corresponding to the structure of VE.
`method`	which estimation method has been used.
`conf_int`	a matrix containing the estimations, bias and confidence intervals by bootstrap (if `nboot>0`).
`X`	the observed covariates.
`y`	the observed outcomes.
`n.knn`	value of the `n.knn` argument.
`rescale`	whether the design matrix has been rescaled.
`n.limit`	value of the `n.limit` argument.
`boot.level`	value of the `boot.level` argument.
`noise`	whether the PME must sum up to one or not.
`boot`	logical, whether bootstrap confidence interval estimates have been performed.
`nboot`	value of the `nboot` argument.
`parl`	value of the `parl` argument.
`conf`	value of the `conf` argument.
`marg`	value of the `marg` argument.
`tol`	value of the `tol` argument.

Author(s)

Marouane Il Idrissi, Margot Herin

References

Azadkia M. and Chatterjee S. (2021). A simple measure of conditional dependence. Annals of Statistics, 49(6):3070-3102.

Chatterjee S. (2021). A new coefficient of correlation. Journal of the American Statistical Association, 116:2009-2022.

Gamboa F., Gremaud P., Klein T. and Lagnoux A. (2022). Global Sensitivity Analysis: a novel generation of mighty estimators based on rank statistics. Bernoulli, 28:2345-2374.

Broto B., Bachoc F. and Depecker M. (2020). Variance Reduction for Estimation of Shapley Effects and Adaptation to Unknown Input Distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2).

Herin M., Il Idrissi M., Chabridon V. and Iooss B. (2022). Proportional marginal effects for sensitivity analysis with correlated inputs. Proceedings of the 10th International Conference on Sensitivity Analysis of Model Output (SAMO 2022), pp. 42-43, Tallahassee, Florida, March 2022.

Herin M., Il Idrissi M., Chabridon V. and Iooss B. (2024). Proportional marginal effects for global sensitivity analysis. SIAM/ASA Journal on Uncertainty Quantification, 12:667-692.

Il Idrissi M., Chabridon V. and Iooss B. (2021). Developments and applications of Shapley effects to reliability-oriented sensitivity analysis with correlated inputs. Environmental Modelling & Software, 143, 105115.

Feldman B. (2005). Relative Importance and Value. SSRN Electronic Journal.

Examples

  
  
library(parallel)
library(doParallel)
library(foreach)
library(gtools)
library(boot)
library(RANN)

###########################################################
# Linear Model with Gaussian correlated inputs

library(mvtnorm)

set.seed(1234)
n <- 1000
beta<-c(1,-1,0.5)
sigma<-matrix(c(1,0,0,
                0,1,-0.8,
                0,-0.8,1),
              nrow=3,
              ncol=3)

X <-rmvnorm(n, rep(0,3), sigma)
colnames(X)<-c("X1","X2", "X3")


y <- X%*%beta + rnorm(n,0,2)

# Without Bootstrap confidence intervals
x<-pme_knn(model=NULL, X=X,
            n.knn=3,
            noise=TRUE)
tell(x,y)
print(x)
plot(x)

# With bootstrap confidence intervals
x<-pme_knn(model=NULL, X=X,
            nboot=10, 
            n.knn=3,
            noise=TRUE,
            boot.level=0.7, 
            conf=0.95)
tell(x,y)
print(x)
plot(x)

#####################################################
# Test case: the Ishigami function
# Example with given data and the use of approximate nearest neighbour search
n <- 5000
X <- data.frame(matrix(-pi+2*pi*runif(3 * n), nrow = n))
Y <- ishigami.fun(X)
x <- pme_knn(model = NULL, X = X,  method = "knn", n.knn = 5, 
                       n.limit = 2000)
tell(x,Y)
plot(x)

library(ggplot2) ; ggplot(x)

######################################################
# Test case : Linear model (3 Gaussian inputs including 2 dependent) with scaling
# See Iooss and Prieur (2019)
library(mvtnorm) # Multivariate Gaussian variables
library(whitening) # For scaling
modlin <- function(X) apply(X,1,sum)
d <- 3
n <- 10000
mu <- rep(0,d)
sig <- c(1,1,2)
ro <- 0.9
Cormat <- matrix(c(1,0,0,0,1,ro,0,ro,1),d,d)
Covmat <- ( sig %*% t(sig) ) * Cormat
Xall <- function(n) mvtnorm::rmvnorm(n,mu,Covmat)
X <- Xall(n)
x <- pme_knn(model = modlin, X = X, method = "knn", n.knn = 5, 
                       rescale = TRUE, n.limit = 2000)
print(x)
plot(x)


Proportional Marginal Variance Decomposition indices for linear and logistic models

Description

pmvd computes the PMVD indices derived from Feldman (2005), applied to the explained variance ($R^2$) as a performance metric. They provide relative importance indices by decomposing the $R^2$ of linear and logistic regression models. These indices allocate a share of $R^2$ to each input based on a proportional attribution system, so that covariates with null regression coefficients get indices equal to 0, despite their potential dependence on other covariates (exclusion principle).

Usage

pmvd(X, y, logistic = FALSE, tol = NULL, rank = FALSE, nboot = 0, 
    conf = 0.95, max.iter = 1000, parl = NULL)
## S3 method for class 'pmvd'
print(x, ...)
## S3 method for class 'pmvd'
plot(x, ylim = c(0,1), ...)

Arguments

`X`	a matrix or data frame containing the observed covariates (i.e., features, input variables...).
`y`	a numeric vector containing the observed outcomes (i.e., dependent variable). If `logistic=TRUE`, can be a numeric vector of zeros and ones, or a logical vector, or a factor.
`logistic`	logical. If `TRUE`, the analysis is done via a logistic regression (binomial GLM).
`tol`	covariates with absolute marginal contributions less or equal to `tol` are omitted. By default, if `tol=NULL`, only covariates with no marginal contribution are omitted.
`rank`	logical. If `TRUE`, the analysis is done on the ranks.
`nboot`	the number of bootstrap replicates for the computation of confidence intervals.
`conf`	the confidence level of the bootstrap confidence intervals.
`max.iter`	if `logistic=TRUE`, the maximum number of iterative optimization steps allowed for the logistic regression. Default is `1000`.
`parl`	number of cores on which to parallelize the computation. If `NULL`, then no parallelization is done.
`x`	the object returned by `pmvd`.
`ylim`	the y-coordinate limits of the plot.
`...`	arguments to be passed to methods, such as graphical parameters (see `par`).

Details

The PMVD are computed using the recursive method defined in Feldman (2005), combined with the subset procedure defined in Broto, Bachoc and Depecker (2020): the $R^2$ of all possible sub-models are computed first, and $P(.)$ is then computed recursively for all subsets of covariates. See Il Idrissi et al. (2021).
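The first stage of the subset procedure, fitting every non-empty sub-model once, can be sketched with lm(). The helper `all_subset_r2()` is hypothetical; it stores the $R^2$ values by bitmask (bit j - 1 stands for covariate j), which is one convenient layout for the recursive stage, not necessarily the package's own.

```r
# Hypothetical helper: OLS R^2 of every non-empty sub-model (subset procedure).
# r2[m] is the R^2 of the sub-model whose covariates are the set bits of m.
all_subset_r2 <- function(X, y) {
  X <- as.matrix(X)
  y <- as.numeric(y)
  d <- ncol(X)
  r2 <- numeric(2^d - 1)
  for (m in seq_len(2^d - 1)) {
    cols <- which(bitwAnd(m, as.integer(2^(0:(d - 1)))) > 0)
    fit <- lm(y ~ X[, cols, drop = FALSE])
    r2[m] <- summary(fit)$r.squared
  }
  r2
}

set.seed(1234)
X <- matrix(rnorm(300), nrow = 100, ncol = 3)
y <- drop(X %*% c(1, -2, 3)) + rnorm(100)
r2 <- all_subset_r2(X, y)   # 7 values; r2[7] is the full model
```

With d covariates this requires 2^d - 1 regressions, so the approach is practical only for a moderate number of covariates.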

For logistic regression (logistic=TRUE), the $R^2$ value is equal to:

$R^2 = 1-\frac{\textrm{model deviance}}{\textrm{null deviance}}$
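The deviance-based $R^2$ above can be computed from a fitted binomial glm(), which stores both the model deviance and the null deviance. The helper is hypothetical; mapping `max.iter` to glm.control(maxit = ...) is an assumption about what the argument controls.

```r
# Hypothetical helper: deviance-based R^2 of a logistic regression,
# R^2 = 1 - (model deviance) / (null deviance).
logistic_r2 <- function(X, y, max.iter = 1000) {
  dat <- data.frame(X)
  dat$y <- y
  fit <- glm(y ~ ., data = dat, family = binomial(),
             control = glm.control(maxit = max.iter))
  1 - fit$deviance / fit$null.deviance
}

set.seed(1234)
X <- matrix(rnorm(400), nrow = 200, ncol = 2)
y <- as.numeric(X[, 1] - X[, 2] + rnorm(200) > 0)
logistic_r2(X, y)                      # full model
logistic_r2(X[, 1, drop = FALSE], y)   # sub-model with X1 only
```

Since adding covariates cannot increase the maximized-likelihood deviance, this $R^2$ is non-decreasing along nested sub-models, as in the linear case.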

If parl exceeds the number of cores available on the machine, the number of cores used defaults to the available cores minus one.

Spurious covariates are defined by the tol argument. If tol = NULL, covariates with:

$w(\{i\}) = 0$

are omitted, and their PMVD index is set to zero. Otherwise, spurious covariates are detected by:

$|w(\{i\})| \leq \textrm{tol}$
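A sketch of this detection rule, reading $w(\{i\})$ as the marginal contribution of covariate i, that is $R^2$ of the full model minus $R^2$ of the model without i (an interpretation based on the description of `tol`, not a confirmed implementation detail). Both helpers are hypothetical.

```r
# Hypothetical helpers. Assumption: w({i}) = R^2(all covariates) - R^2(all but i).
marginal_contributions <- function(X, y) {
  X <- as.matrix(X)
  y <- as.numeric(y)
  r2_full <- summary(lm(y ~ X))$r.squared
  vapply(seq_len(ncol(X)), function(i)
    r2_full - summary(lm(y ~ X[, -i, drop = FALSE]))$r.squared, numeric(1))
}
is_spurious <- function(w, tol = NULL) {
  if (is.null(tol)) w == 0 else abs(w) <= tol
}

set.seed(3)
n <- 500
X <- matrix(rnorm(3 * n), nrow = n, ncol = 3)
y <- 2 * X[, 1] + X[, 2] + rnorm(n)   # X3 has a null coefficient
w <- marginal_contributions(X, y)
is_spurious(w, tol = 0.01)            # flags covariates whose contribution is at most tol
```

With sampled data the estimated contribution of an irrelevant covariate is small but almost never exactly zero, which is why a positive `tol` is usually needed in practice.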

Value

pmvd returns a list of class "pmvd", containing the following components:

`call`	the matched call.
`pmvd`	a data frame containing the estimations of the PMVD indices.
`R2s`	the estimations of the $R^2$ for all possible sub-models.
`indices`	list of all subsets corresponding to the structure of R2s.
`P`	the values of $P(.)$ of all subsets for recursive computing. Equal to `NULL` if bootstrap estimates are made.
`conf_int`	a matrix containing the estimations, bias and confidence intervals by bootstrap (if `nboot>0`).
`X`	the observed covariates.
`y`	the observed outcomes.
`logistic`	logical. `TRUE` if the analysis has been made by logistic regression.
`boot`	logical. `TRUE` if bootstrap estimates have been produced.
`nboot`	number of bootstrap replicates.
`rank`	logical. `TRUE` if a rank analysis has been made.
`parl`	number of chosen cores for the computation.
`conf`	level for the confidence intervals by bootstrap.

Author(s)

Marouane Il Idrissi

References

Broto B., Bachoc F. and Depecker M. (2020). Variance Reduction for Estimation of Shapley Effects and Adaptation to Unknown Input Distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2).

Budescu D.V. (1993). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114:542-551.

Feldman B. (2005). Relative Importance and Value. SSRN Electronic Journal.

Gromping U. (2006). Relative importance for linear regression in R: the Package relaimpo. Journal of Statistical Software, 17:1-27.

Examples

library(parallel)
library(gtools)
library(boot)

library(mvtnorm)

set.seed(1234)
n <- 100
beta<-c(1,-2,3)
sigma<-matrix(c(1,0,0,
                0,1,-0.8,
                0,-0.8,1),
              nrow=3,
              ncol=3)

############################
# Gaussian correlated inputs

X <-rmvnorm(n, rep(0,3), sigma)

#############################
# Linear Model

y <- X%*%beta + rnorm(n)

# Without Bootstrap confidence intervals
x<-pmvd(X, y)
print(x)
plot(x)

# With bootstrap confidence intervals
x<-pmvd(X, y, nboot=100, conf=0.95)
print(x)
plot(x)

# Rank-based analysis
x<-pmvd(X, y, rank=TRUE, nboot=100, conf=0.95)
print(x)
plot(x)

############################
# Logistic Regression
y<-as.numeric(X%*%beta + rnorm(n)>0)
x<-pmvd(X,y, logistic = TRUE)
plot(x)
print(x)

# Parallel computing
#x<-pmvd(X,y, logistic = TRUE, parl=2)
#plot(x)
#print(x)


Squared coefficients computation in generalized chaos

Description

This function computes the squared coefficient of the function decomposition in the tensor basis formed by eigenfunctions of Poincare differential operators. After division by the variance of the model output, it provides lower bounds of first-order and total Sobol' indices.

Usage

PoincareChaosSqCoef(PoincareEigen, multiIndex, design, output, outputGrad = NULL, 
                    inputIndex = 1, der = FALSE, method = "unbiased")

Arguments

`PoincareEigen`	output list from PoincareOptimal() function
`multiIndex`	vector of indices (l1, ..., ld). A coordinate equal to 0 corresponds to the constant basis function 1
`design`	design of experiments (matrix of size n x d) with d the number of inputs and n the number of observations
`output`	vector of length n (y1, ..., yn) of output values at `design` points
`outputGrad`	matrix n x d whose columns contain the output partial derivatives at `design` points
`inputIndex`	index of the input variable (between 1 and d)
`der`	logical (default=FALSE): should we use the formula with derivatives to compute the squared coefficient?
`method`	"biased" or "unbiased" formula when estimating the squared integral. See `squaredIntEstim`

Details

Similarly to polynomial chaos, where tensor products of polynomials are used, we consider here the tensor basis formed by eigenfunctions of Poincare differential operators. This basis is also orthonormal, and the Parseval formula leads to lower bounds for (unnormalized) first-order Sobol', total Sobol' and any other variance-based indices. Denoting by $e_{1, l_1} \cdots e_{d, l_d}$ one tensor basis function, the corresponding coefficient is equal to

$c_{l_1, \ldots, l_d} = \langle f, e_{1, l_1} \cdots e_{d, l_d} \rangle$.

For a given input variable (say $x_1$ to simplify notations), it can be rewritten with derivatives as:

$c_{l_1, \ldots, l_d} = \langle \partial f / \partial x_1, (\partial e_{1, l_1} / \partial x_1) \, e_{2, l_2} \cdots e_{d, l_d} \rangle / \lambda_{1, l_1}$

where $\lambda_{1, l_1}$ is the eigenvalue associated with $e_{1, l_1}$. The function returns an estimate of $c_{l_1, \ldots, l_d}^2$, corresponding to one of these two forms (derivative-free or derivative-based).
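Both forms can be checked by hand on the uniform case. For U(-1/2, 1/2), the Poincare operator has the Neumann cosine eigenfunctions, with first eigenpair e_1(x) = sqrt(2) cos(pi (x + 1/2)) and eigenvalue pi^2; the sketch below uses this closed form instead of PoincareOptimal(). The unbiased U-statistic for a squared mean is our stand-in for the "unbiased" option and is assumed, not confirmed, to match squaredIntEstim. On the toy model of the Examples, both estimates should approach the true lower bound 8/pi^4.

```r
# Closed-form first eigenpair of the Poincare operator for U(-1/2, 1/2)
# (Neumann cosine basis, orthonormal for the uniform probability measure).
e1      <- function(x) sqrt(2) * cos(pi * (x + 1/2))
de1     <- function(x) -sqrt(2) * pi * sin(pi * (x + 1/2))
lambda1 <- pi^2

# Unbiased (U-statistic) estimate of (E[z])^2: mean of z_i * z_j over pairs i != j
sq_int_unbiased <- function(z) {
  n <- length(z)
  (sum(z)^2 - sum(z^2)) / (n * (n - 1))
}

set.seed(0)
n <- 1e5
a <- 3
X <- matrix(runif(2 * n, min = -1/2, max = 1/2), nrow = n, ncol = 2)
fX  <- X[, 1] + a * X[, 1] * X[, 2]      # same toy model as in the Examples
df1 <- 1 + a * X[, 2]                    # analytic df/dx1

# multiIndex = c(1, 0): basis function e_{1,1}(x1) * 1
c2_free <- sq_int_unbiased(fX * e1(X[, 1]))                # derivative-free form
c2_der  <- sq_int_unbiased(df1 * de1(X[, 1]) / lambda1)    # derivative-based form
c(c2_free, c2_der, 8 / pi^4)
```

The derivative-based form integrates the smoother product of derivatives, which can reduce the estimator variance when gradients are available.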

Value

An estimate of the squared coefficient.

Author(s)

Olivier Roustant and Bertrand Iooss

References

O. Roustant, F. Gamboa and B. Iooss, Parseval inequalities and lower bounds for variance-based sensitivity indices, Electronic Journal of Statistics, 14:386-412, 2020

Examples


# A simple example

g <- function(x, a){
  res <- x[, 1] + a*x[, 1]*x[, 2]
  attr(res, "grad") <- cbind(1 + a * x[, 2], a * x[, 1])
  return(res)
}

n <- 1e3
set.seed(0)
X <- matrix(runif(2*n, min = -1/2, max = 1/2), nrow = n, ncol = 2)
a <- 3
fX <- g(X, a = a)

out_1 <- out_2 <- PoincareOptimal(distr = list("unif", -1/2, 1/2), 
                                  only.values = FALSE, der = TRUE, 
                                  method = "quad")
out <- list(out_1, out_2)

# Lower bounds for X1
c2_10 <- PoincareChaosSqCoef(PoincareEigen = out, multiIndex = c(1, 0), 
                             design = X, output = fX, outputGrad = attr(fX, "grad"), 
                             inputIndex = 1, der = FALSE)
c2_11 <- PoincareChaosSqCoef(PoincareEigen = out, multiIndex = c(1, 1), 
                             design = X, output = fX, outputGrad = attr(fX, "grad"), 
                             inputIndex = 1, der = FALSE)
c2_10_der <- PoincareChaosSqCoef(PoincareEigen = out, multiIndex = c(1, 0), 
                                 design = X, output = fX, outputGrad = attr(fX, "grad"), 
                                 inputIndex = 1, der = TRUE)
c2_11_der <- PoincareChaosSqCoef(PoincareEigen = out, multiIndex = c(1, 1), 
                                 design = X, output = fX, outputGrad = attr(fX, "grad"), 
                                 inputIndex = 1, der = TRUE)

LB1 <- c(8/pi^4, c2_10, c2_10_der)
LB1tot <- LB1 + c(64/pi^8 * a^2, c2_11, c2_11_der)
LB <- cbind(LB1, LB1tot)
rownames(LB) <- c("True lower bound value", 
                  "Estimated, no derivatives", "Estimated, with derivatives")
colnames(LB) <- c("D1", "D1tot")
cat("True values of D1 and D1tot:", c(1/12, 1/12 + a^2 / 144),"\n")
cat("Sample size: ", n, "\n")
cat("Lower bounds computed with the first Poincare eigenvalue:\n")
print(LB)
cat("\nN.B. Increase the sample size to see the convergence to true lower bound values.\n")

############################################################
# Flood model example (see Roustant et al., 2017, 2019)



library(evd) # Gumbel law
library(triangle) # Triangular law

# Flood model
Fcrues_full2=function(X,ans=0){ 
  # ans=1 gives Overflow output; ans=2 gives Cost output; ans=0 gives both
  mat=matrix(X,ncol=8);
  if (ans==0){ reponse=matrix(NA,nrow(mat),2);}
  else{ reponse=rep(NA,nrow(mat));}
  for (i in 1:nrow(mat)) {
    H = (mat[i,1] / (mat[i,2]*mat[i,8]*sqrt((mat[i,4] - mat[i,3])/mat[i,7])))^(0.6) ;
    S = mat[i,3] + H - mat[i,5] - mat[i,6] ;
    if (S > 0){ Cp = 1 ;}
    else{ Cp = 0.2 + 0.8 * (1 - exp(-1000 / S^4));}
    if (mat[i,5]>8){ Cp = Cp + mat[i,5]/20 ;}
    else{ Cp = Cp + 8/20 ;}
    if (ans==0){
      reponse[i,1] = S ;
      reponse[i,2] = Cp ;
    }
    if (ans==1){ reponse[i] = S ;}
    if (ans==2){ reponse[i] = Cp ;}
    
  }
  return(RES=reponse)
}

# Flood model derivatives (by finite-differences)
dFcrues_full2 <- function(X, i, ans, eps){
  X1 = X
  X1[,i] = X[,i]+eps   # perturb only the i-th input
  der = (Fcrues_full2(X1,ans) - Fcrues_full2(X,ans))/(eps)   # forward difference
  return(der)
}

# Function for flood model inputs sampling
EchantFcrues_full2<-function(taille){
  X = matrix(NA,taille,8)
  X[,1] = rgumbel.trunc(taille,loc=1013.0,scale=558.0,min=500,max=3000)
  X[,2] = rnorm.trunc(taille,mean=30.0,sd=8,min=15.)
  X[,3] = rtriangle(taille,a=49,b=51,c=50)
  X[,4] = rtriangle(taille,a=54,b=56,c=55)
  X[,5] = runif(taille,min=7,max=9)
  X[,6] = rtriangle(taille,a=55,b=56,c=55.5)
  X[,7] = rtriangle(taille,a=4990,b=5010,c=5000)
  X[,8] = rtriangle(taille,a=295,b=305,c=300)
  return(X)
}

d <- 8
n <- 1e3
eps <- 1e-7 # finite-differences for derivatives
x <- EchantFcrues_full2(n)
yy <- Fcrues_full2(x, ans=2)
y <- scale(yy, center = TRUE, scale = FALSE)[,1]
dy <- NULL
for (i in 1:d) dy <- cbind(dy, dFcrues_full2(x, i, ans=2, eps))

method <- "quad"
out_1 <- PoincareOptimal(distr = list("gumbel", 1013, 558), min=500,max=3000, 
                         only.values = FALSE, der = TRUE, method = method)
out_2 <- PoincareOptimal(distr = list("norm", 30, 8), min=15, max=200, 
                         only.values = FALSE, der = TRUE, method = method)
out_3 <- PoincareOptimal(distr = list("triangle", 49, 51, 50), 
                         only.values = FALSE, der = TRUE, method = method)
out_4 <- PoincareOptimal(distr = list("triangle", 54, 56, 55), 
                         only.values = FALSE, der = TRUE, method = method)
out_5 <- PoincareOptimal(distr = list("unif", 7, 9), 
                         only.values = FALSE, der = TRUE, method = method)
out_6 <- PoincareOptimal(distr = list("triangle", 55, 56, 55.5), 
                         only.values = FALSE, der = TRUE, method = method)
out_7 <- PoincareOptimal(distr = list("triangle", 4990, 5010, 5000), 
                         only.values = FALSE, der = TRUE, method = method)
out_8 <- PoincareOptimal(distr = list("triangle", 295, 305, 300), 
                         only.values = FALSE, der = TRUE, method = method)
out_ <- list(out_1,out_2,out_3,out_4,out_5,out_6,out_7,out_8)

c2 <- c2der <- c2tot <- c2totder <- rep(0,d)

for (i in 1:d){
  m <- diag(1,d,d) ; m[,i] <- 1
  
  for (j in 1:d){
    cc <- PoincareChaosSqCoef(PoincareEigen = out_, multiIndex = m[j,], 
            design = x, output = y, outputGrad = NULL, 
            inputIndex = i, der = FALSE)
    c2tot[i] <- c2tot[i] + cc
    if (j == i) c2[i] <- cc
    
    cc <- PoincareChaosSqCoef(PoincareEigen = out_, multiIndex = m[j,], 
            design = x, output = y, outputGrad = dy, 
            inputIndex = i, der = TRUE)
    c2totder[i] <- c2totder[i] + cc
    if (j == i) c2der[i] <- cc
  }
}

print("Lower bounds of first-order Sobol' indices without derivatives:")
print(c2/var(y))
print("Lower bounds of first-order Sobol' indices with derivatives:")
print(c2der/var(y))

print("Lower bounds of total Sobol' indices without derivatives:")
print(c2tot/var(y))
print("Lower bounds of total Sobol' indices with derivatives:")
print(c2totder/var(y))



c2 <- c2der <- c2tot <- c2totder <- rep(0,d)

for (i in 1:d){
  m <- diag(1,d,d) ; m[,i] <- 1
  
  for (j in 1:d){
    cc <- PoincareChaosSqCoef(PoincareEigen = out_, multiIndex = m[j,], 
            design = x, output = y, outputGrad = NULL, 
            inputIndex = i, der = FALSE)
    c2tot[i] <- c2tot[i] + cc
    if (j == i) c2[i] <- cc
    
    cc <- PoincareChaosSqCoef(PoincareEigen = out_, multiIndex = m[j,], 
            design = x, output = y, outputGrad = dy, 
            inputIndex = i, der = TRUE)
    c2totder[i] <- c2totder[i] + cc
    if (j == i) c2der[i] <- cc
  }
}

print("Lower bounds of first-order Sobol' indices without derivatives:")
print(c2/var(y))
print("Lower bounds of first-order Sobol' indices with derivatives:")
print(c2der/var(y))

print("Lower bounds of total Sobol' indices without derivatives:")
print(c2tot/var(y))
print("Lower bounds of total Sobol' indices with derivatives:")
print(c2totder/var(y))

Poincare constants for Derivative-based Global Sensitivity Measures (DGSM)

Description

A DGSM is a sensitivity index relying on the integral (over the domain of the input variables) of the squared derivative of a model output with respect to one model input variable. The product of a DGSM and a Poincare constant (Roustant et al., 2014; Roustant et al., 2017) gives an upper bound of the total Sobol' index (up to normalization by the output variance) corresponding to the same input (Lamboni et al., 2013; Kucherenko and Iooss, 2016).
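Writing $\nu_i = E[(\partial f/\partial x_i)^2]$ for the DGSM of input $X_i$ and $C(\mu_i)$ for the Poincare constant of its distribution, the bound reads $S^T_i \le C(\mu_i)\,\nu_i / \mathrm{Var}(Y)$. The base-R sketch below (it does not use the package) checks this on a hypothetical linear model $Y = X_1 + 2X_2$ with independent inputs uniform on $[0,1]$, whose optimal Poincare constant is $1/\pi^2$. The model, sample size and finite-difference step are illustrative assumptions.

```r
set.seed(42)
model <- function(X) X[, 1] + 2 * X[, 2]   # hypothetical test model
n <- 1e5
d <- 2
X <- matrix(runif(n * d), ncol = d)         # independent U[0,1] inputs
y <- model(X)
eps <- 1e-6
# Monte Carlo estimate of the DGSM nu_i = E[(dY/dX_i)^2] with forward differences
nu <- sapply(seq_len(d), function(i) {
  Xh <- X
  Xh[, i] <- Xh[, i] + eps
  mean(((model(Xh) - y) / eps)^2)
})
C_unif <- 1 / pi^2                          # optimal constant of U[0,1]: (1 - 0)^2 / pi^2
upper <- C_unif * nu / var(y)               # upper bounds of the total Sobol' indices
true_ST <- c(1, 4) / 12 / (5 / 12)          # analytic total indices: 0.2 and 0.8
print(rbind(upper = upper, true = true_ST))
```

For this model the bounds are about 0.24 and 0.97, so the ranking of the inputs is preserved even though the bounds are not tight.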

This Poincare constant depends on the type of probability distribution of the input variable. For log-concave distributions, analytical formulas based on the median value are available for the double-exponential transport (Lamboni et al., 2013). For truncated log-concave distributions, other formulas are available (Roustant et al., 2014). For general distributions (truncated or not), Poincare constants can be computed via a relatively simple optimization process, using different formulas derived from transport inequalities (Roustant et al., 2017).

Note that the analytical formula for log-concave distributions is a special case of the double-exponential transport. In all cases, the smallest constant returned by this function is obtained with the logistic transport formula. PoincareOptimal computes the best (optimal) constant using another (spectral) method.
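The two transport-based bounds can be sketched in base R as below. This is an illustration under stated assumptions, not the package's code: it assumes the bounds take the forms $C \le 4\sup_x [\min(F(x),1-F(x))/\rho(x)]^2$ (double-exponential) and $C \le 4\sup_x [F(x)(1-F(x))/\rho(x)]^2$ (logistic), with $\rho$ the density and $F$ the cdf, which is how we read Roustant et al. (2017). The supremum is searched with `stats::optimize()` over a user-chosen interval, in the spirit of the `optimize.interval` argument; the helper names are hypothetical.

```r
# Hypothetical helpers: dfct and pfct are one-argument density and cdf functions
bound_double_exp <- function(dfct, pfct, interval) {
  g <- function(x) { Fx <- pfct(x); pmin(Fx, 1 - Fx) / dfct(x) }
  4 * optimize(g, interval, maximum = TRUE)$objective^2
}
bound_logistic <- function(dfct, pfct, interval) {
  g <- function(x) { Fx <- pfct(x); Fx * (1 - Fx) / dfct(x) }
  4 * optimize(g, interval, maximum = TRUE)$objective^2
}
# Standard normal law, whose optimal constant is sd^2 = 1
vals <- c(double_exp = bound_double_exp(dnorm, pnorm, c(-10, 10)),
          logistic   = bound_logistic(dnorm, pnorm, c(-10, 10)),
          median     = 1 / dnorm(0)^2)   # log-concave: sup attained at the median
print(vals)
```

For a log-concave law the double-exponential supremum sits at the median $m$, so the bound reduces to $1/\rho(m)^2$ ($2\pi$ for the standard normal), while the logistic bound gives $\pi/2$; both exceed the optimal value 1, and the logistic one is the smaller.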

IMPORTANT: this function is not needed for the following two input distributions, whose optimal Poincare constants are known in closed form:

uniform on the interval $[min,max]$: the optimal Poincare constant is $\frac{(max-min)^2}{\pi^2}$.
normal with standard deviation $sd$: the optimal Poincare constant is $sd^2$.
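These closed forms can be checked numerically. The Poincare constant is the largest value of the Rayleigh ratio $\mathrm{Var}(f(X))/E[f'(X)^2]$, attained by $f(x) = \cos(\pi(x-min)/(max-min))$ for the uniform law and by $f(x) = x$ for the normal law. The base-R sketch below evaluates those ratios with `stats::integrate()`; the interval $[2,5]$ and $sd = 3$ are arbitrary illustrative choices.

```r
# Rayleigh ratio Var(f(X)) / E[f'(X)^2] for X with density dens on [lower, upper]
rayleigh <- function(f, df, dens, lower, upper) {
  m1 <- integrate(function(x) f(x) * dens(x), lower, upper)$value
  m2 <- integrate(function(x) f(x)^2 * dens(x), lower, upper)$value
  e  <- integrate(function(x) df(x)^2 * dens(x), lower, upper)$value
  (m2 - m1^2) / e
}
a <- 2; b <- 5                                # uniform on [min, max] = [2, 5]
r_unif <- rayleigh(function(x) cos(pi * (x - a) / (b - a)),
                   function(x) -pi / (b - a) * sin(pi * (x - a) / (b - a)),
                   function(x) dunif(x, a, b), a, b)
s <- 3                                        # normal with sd = 3
r_norm <- rayleigh(function(x) x, function(x) rep(1, length(x)),
                   function(x) dnorm(x, 0, s), -Inf, Inf)
print(c(uniform = r_unif, theory = (b - a)^2 / pi^2))
print(c(normal = r_norm, theory = s^2))
```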

Usage

PoincareConstant(dfct=dnorm, qfct=qnorm, pfct=pnorm,
                 logconcave=FALSE, transport="logistic", 
                 optimize.interval=c(-100, 100),
                 truncated=FALSE, min=0, max=1,  ...)

Arguments

`dfct`	the probability density function of the input variable
`qfct`	the quantile function of the input variable
`pfct`	the distribution function of the input variable
`logconcave`	logical value: TRUE for a log-concave distribution (an analytical formula is used; requires arguments 'dfct' and 'qfct'). FALSE (default value) means that the calculations are performed using transport-based formulas (applicable to both log-concave and non-log-concave cases)
`transport`	if logconcave=FALSE, the transport inequality to be used: "logistic" (default value) for logistic transport or "double_exp" for double-exponential transport. Requires arguments 'dfct' and 'pfct'
`optimize.interval`	In the transport-based case (logconcave=FALSE), a vector containing the end-points of the interval to be searched for the maximum of the function to be optimized
`truncated`	logical value: TRUE for a truncated distribution. Default value is FALSE
`min`	the minimal bound in the case of a truncated distribution
`max`	the maximal bound in the case of a truncated distribution
`...`	additional arguments

Details

In the case of truncated distributions (truncated=TRUE), in addition to the min and max arguments:
- the truncated distribution functions have to be passed in the 'dfct' and 'pfct' arguments if logconcave=FALSE;
- the non-truncated distribution functions have to be passed in the 'dfct' and 'qfct' arguments if logconcave=TRUE.
Moreover, if min and max are finite, optimize.interval must be set to c(min,max).
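To illustrate what "truncated distribution functions" means here, the base-R sketch below builds the density, cdf and quantile of a normal law truncated to $[min,max]$ by renormalizing the mass kept on that interval. The names `dnorm_trunc`, `pnorm_trunc` and `qnorm_trunc` are hypothetical helpers written for this illustration; the package's own truncated functions (such as `dnorm.trunc`, used in the Examples) may differ in detail.

```r
# Hypothetical truncated-normal helpers (renormalized on [min, max])
dnorm_trunc <- function(x, mean = 0, sd = 1, min = -Inf, max = Inf) {
  z <- pnorm(max, mean, sd) - pnorm(min, mean, sd)     # probability mass kept
  ifelse(x < min | x > max, 0, dnorm(x, mean, sd) / z)
}
pnorm_trunc <- function(q, mean = 0, sd = 1, min = -Inf, max = Inf) {
  z <- pnorm(max, mean, sd) - pnorm(min, mean, sd)
  p <- (pnorm(q, mean, sd) - pnorm(min, mean, sd)) / z
  pmin(pmax(p, 0), 1)                                  # clamp outside [min, max]
}
qnorm_trunc <- function(p, mean = 0, sd = 1, min = -Inf, max = Inf) {
  lo <- pnorm(min, mean, sd); hi <- pnorm(max, mean, sd)
  qnorm(lo + p * (hi - lo), mean, sd)
}
# N(0,1) truncated on [-1.87, 999], as in the Examples
total <- integrate(dnorm_trunc, -1.87, Inf, min = -1.87, max = 999)$value
q30 <- qnorm_trunc(0.3, min = -1.87, max = 999)
print(c(total_mass = total, cdf_at_q30 = pnorm_trunc(q30, min = -1.87, max = 999)))
```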

Value

PoincareConstant returns the value of the Poincare constant.

Author(s)

Jana Fruth, Bertrand Iooss and Olivier Roustant

References

S. Kucherenko and B. Iooss, Derivative-based global sensitivity measures, In: R. Ghanem, D. Higdon and H. Owhadi (eds.), Handbook of Uncertainty Quantification, 2016.

M. Lamboni, B. Iooss, A-L. Popelin and F. Gamboa, Derivative-based global sensitivity measures: General links with Sobol' indices and numerical tests, Mathematics and Computers in Simulation, 87:45-54, 2013.

O. Roustant, F. Barthe and B. Iooss, Poincare inequalities on intervals - application to sensitivity analysis, Electronic Journal of Statistics, Vol. 11, No. 2, 3081-3119, 2017.

O. Roustant, J. Fruth, B. Iooss and S. Kuhnt, Crossed-derivative-based sensitivity measures for interaction screening, Mathematics and Computers in Simulation, 105:105-118, 2014.

Examples


# Exponential law (log-concave)
PoincareConstant(dfct=dexp,qfct=qexp,pfct=NULL,rate=1,
  logconcave=TRUE) # log-concave assumption
PoincareConstant(dfct=dexp,qfct=NULL,pfct=pexp,rate=1,
  optimize.interval=c(0, 15)) # logistic transport approach

# Weibull law (log-concave)
PoincareConstant(dfct=dweibull,qfct=NULL,pfct=pweibull,
  optimize.interval=c(0, 15),shape=1,scale=1) # logistic transport approach


# Triangular law (log-concave)
library(triangle)
PoincareConstant(dfct=dtriangle, qfct=qtriangle, pfct=NULL, a=-1, b=1, c=0, 
  logconcave=TRUE) # log-concave assumption
PoincareConstant(dfct=dtriangle, qfct=NULL, pfct=ptriangle, a=-1, b=1, c=0, 
  transport="double_exp", optimize.interval=c(-1,1)) # Double-exp transport 
PoincareConstant(dfct=dtriangle, qfct=NULL, pfct=ptriangle, a=-1, b=1, c=0, 
  optimize.interval=c(-1,1)) # Logistic transport calculation

# Normal N(0,1) law truncated on [-1.87,+infty]
PoincareConstant(dfct=dnorm,qfct=qnorm,pfct=pnorm,mean=0,sd=1,logconcave=TRUE, 
  transport="double_exp", truncated=TRUE, min=-1.87, max=999) # log-concave hyp 
# Double-exponential transport approach
PoincareConstant(dfct=dnorm.trunc, qfct=qnorm.trunc, pfct=pnorm.trunc, 
  mean=0, sd=1, truncated=TRUE, min=-1.87, max=999,   transport="double_exp", 
    optimize.interval=c(-1.87,20)) 
# Logistic transport approach
PoincareConstant(dfct=dnorm.trunc, qfct=qnorm.trunc, pfct=pnorm.trunc, 
  mean=0, sd=1, truncated=TRUE, min=-1.87, max=999, optimize.interval=c(-1.87,20)) 


# Gumbel law (log-concave)
library(evd)
PoincareConstant(dfct=dgumbel, qfct=qgumbel, pfct=NULL, loc=0, scale=1, 
  logconcave=TRUE, transport="double_exp") # log-concave assumption
PoincareConstant(dfct=dgumbel, qfct=NULL, pfct=pgumbel, loc=0, scale=1, 
  transport="double_exp", optimize.interval=c(-3,20)) # Double-exp transport 
PoincareConstant(dfct=dgumbel, qfct=qgumbel, pfct=pgumbel, loc=0, scale=1, 
  optimize.interval=c(-3,20)) # Logistic transport approach

# Truncated Gumbel law (log-concave)
# Double-exponential transport approach
PoincareConstant(dfct=dgumbel, qfct=qgumbel, pfct=pgumbel, loc=0, scale=1, 
  logconcave=TRUE, transport="double_exp", truncated=TRUE, 
  min=-0.92, max=3.56) # log-concave assumption
PoincareConstant(dfct=dgumbel.trunc, qfct=NULL, pfct=pgumbel.trunc, loc=0, scale=1, 
  truncated=TRUE, min=-0.92, max=3.56, transport="double_exp", 
  optimize.interval=c(-0.92,3.56))
# Logistic transport approach
PoincareConstant(dfct=dgumbel.trunc, qfct=qgumbel.trunc, pfct=pgumbel.trunc, 
  loc=0, scale=1, truncated=TRUE, min=-0.92, max=3.56, 
  optimize.interval=c(-0.92,3.56)) 
  



Optimal Poincare constants for Derivative-based Global Sensitivity Measures (DGSM)

Description

A DGSM is a sensitivity index relying on the integral (over the domain of the input variables) of the squared derivative of a model output with respect to one model input variable. The product of a DGSM and a Poincare constant (Roustant et al., 2014; Roustant et al., 2017), which depends on the type of probability distribution of the input variable, gives an upper bound of the total Sobol' index (up to normalization by the output variance) corresponding to the same input (Lamboni et al., 2013; Kucherenko and Iooss, 2016).

This function provides the optimal Poincare constant as explained in Roustant et al. (2017). It numerically solves the spectral problem associated with the Poincare inequality, with Neumann boundary conditions: $f'' - V'f' = -\lambda f$ on $[a,b]$, with $f'(a) = f'(b) = 0$, where the input density is proportional to $e^{-V}$. The whole spectral decomposition can also be returned by the function. The eigenvalues are sorted in ascending order, starting from zero; the optimal constant is the inverse of the second eigenvalue (the spectral gap), and the corresponding eigenfunction is given in the second column of the returned eigenvectors.
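A self-contained way to see this spectral problem at work is the base-R sketch below. It discretizes the weak form $\int f'g'\rho = \lambda\int fg\rho$ with piecewise-linear finite elements and a lumped (trapezoidal) mass matrix; this discretization is our own choice and not necessarily the one used by PoincareOptimal. As for the `distr` argument, the density need not be normalized. The helper name `poincare_fd` is hypothetical.

```r
# Hypothetical helper: optimal Poincare constant of density dens on [a, b]
poincare_fd <- function(dens, a, b, n = 500) {
  x <- seq(a, b, length.out = n)
  h <- x[2] - x[1]
  mid <- (x[-1] + x[-n]) / 2
  w <- dens(mid) / h                          # element stiffness weights
  K <- matrix(0, n, n)                        # stiffness matrix: int f' g' rho
  for (k in seq_len(n - 1)) {
    idx <- c(k, k + 1)
    K[idx, idx] <- K[idx, idx] + w[k] * matrix(c(1, -1, -1, 1), 2, 2)
  }
  m <- dens(x) * h                            # lumped mass matrix: int f g rho
  m[c(1, n)] <- m[c(1, n)] / 2
  s <- 1 / sqrt(m)
  A <- K * outer(s, s)                        # symmetric form M^(-1/2) K M^(-1/2)
  values <- sort(eigen(A, symmetric = TRUE, only.values = TRUE)$values)
  list(values = values, opt = 1 / values[2])  # values[1] is numerically zero
}
res_norm <- poincare_fd(dnorm, -1, 1)                         # theory: 1/3
res_unif <- poincare_fd(function(x) dunif(x, -1, 1), -1, 1)   # theory: 4/pi^2
print(c(trunc_normal = res_norm$opt, uniform = res_unif$opt))
```

The truncated standard normal on $[-1,1]$ is passed unnormalized (dnorm), and still yields the constant 1/3 quoted in the Examples.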

IMPORTANT: this function is not needed for the following two input distributions, whose optimal Poincare constants are known in closed form:

uniform on the interval $[min,max]$: the optimal Poincare constant is $\frac{(max-min)^2}{\pi^2}$.
normal with standard deviation $sd$: the optimal Poincare constant is $sd^2$.

Usage

PoincareOptimal(distr=list("unif",c(0,1)), min=NULL, max=NULL, 
                n = 500, method = c("quadrature", "integral"), only.values = TRUE, 
                der = FALSE, plot = FALSE, ...)

Arguments

`distr`	a list or a function corresponding to the probability distribution. If it is a list, it contains the name of the R distribution of the variable and its parameters. Possible choices are: "unif" (uniform), "norm" (normal), "exp" (exponential), "triangle" (triangular from package triangle), "gumbel" (from package evd), "beta", "gamma", "weibull" and "lognorm" (lognormal). The values of the distribution parameters have to be passed in the same order as in the corresponding R function. If it is a function, it corresponds to the pdf. Note that the normalizing constant has no impact on the computation of the optimal Poincare constant and can be omitted.
`min`	see below
`max`	[min,max]: interval on which the distribution is truncated. For an unbounded distribution, choose low and high quantiles. Choose NULL for uniform and triangular distributions
`n`	number of discretization steps
`method`	method of integration: "quadrature" (default value) uses the trapezoidal quadrature rule (accurate and faster); "integral" is slower but makes no quadrature approximation
`only.values`	if TRUE, only eigenvalues are computed and returned; otherwise both eigenvalues and eigenvectors are returned (default value is TRUE)
`der`	if TRUE, compute the eigenfunction derivatives (default value is FALSE)
`plot`	logical: if TRUE and only.values=FALSE, plots a minimizer of the Rayleigh ratio (default value is FALSE)
`...`	additional arguments

Details

For the uniform, normal, triangular and Gumbel distributions, the optimal constants are computed on the corresponding standardized distributions (for better numerical efficiency). In these cases, the returned optimal constant and eigenvalues correspond to the original distributions.

Value

PoincareOptimal returns a list containing:

`opt`	the optimal Poincare constant
`values`	the eigenvalues in increasing order, starting from 0. Thus, the second one is the spectral gap, equal to the inverse of the Poincare constant
`vectors`	the values of eigenfunctions at `knots`
`der`	the values of eigenfunction derivatives at `knots`
`knots`	a sequence of length `n` formed by equally spaced real numbers in the support of the probability distribution, used for discretization

Author(s)

Olivier Roustant and Bertrand Iooss

References

O. Roustant, F. Barthe and B. Iooss, Poincare inequalities on intervals - application to sensitivity analysis, Electronic Journal of Statistics, Vol. 11, No. 2, 3081-3119, 2017.

O. Roustant, F. Gamboa and B. Iooss, Parseval inequalities and lower bounds for variance-based sensitivity indices, 2019, hal-02140127.

Examples



# uniform on [a, b]
a <- -1 ; b <- 1
out <- PoincareOptimal(distr = list("unif", a, b))
cat("Poincare constant (theory -- estimated):", (b-a)^2/pi^2, "--", out$opt, "\n")

# truncated standard normal on [-1, 1]
# the optimal Poincare constant is then equal to 1/3,
# as -1 and 1 are consecutive roots of the 2nd Hermite polynomial X*X - 1.
out <- PoincareOptimal(distr = dnorm, min = -1, max = 1, 
                       plot = TRUE, only.values = FALSE)
cat("Poincare constant (theory -- estimated):", 1/3, "--", out$opt, "\n")


# truncated standard normal on [-1.87, +infty]
out <- PoincareOptimal(distr = list("norm", 0, 1), min = -1.87, max = 5, 
                       method = "integral", n = 500)
print(out$opt)

# truncated Gumbel(0,1) on [-0.92, 3.56]
library(evd)
out <- PoincareOptimal(distr = list("gumbel", 0, 1), min = -0.92, max = 3.56, 
                       method = "integral", n = 500)
print(out$opt)

# symmetric triangular on [-1,1]
library(triangle)
out <- PoincareOptimal(distr = list("triangle", -1, 1, 0), min = NULL, max = NULL)
cat("Poincare constant (theory -- estimated):", 0.1729, "--", out$opt, "\n")


# Lognormal distribution
out <- PoincareOptimal(distr = list("lognorm", 1, 2), min = 3, max = 10, 
                       only.values = FALSE, plot = TRUE, method = "integral")
print(out$opt)


## -------------------------------

## Illustration for eigenfunctions on the uniform distribution
## (corresponds to Fourier series)
b <- 1
a <- -b
out <- PoincareOptimal(distr = list("unif", a, b), 
                       only.values = FALSE, der = TRUE, method = "quad")

# Illustration for 3 eigenvalues

par(mfrow = c(3,2))
eigenNumber <- 1:3 # eigenvalue number
for (k in eigenNumber[1:3]){ # keep the 3 first ones (for graphics)
  plot(out$knots, out$vectors[, k + 1], type = "l", 
       ylab = "", main = paste("Eigenfunction", k), 
       xlab = paste("Eigenvalue:", round(out$values[k+1], digits = 3)))
  sgn <- sign(out$vectors[1, k + 1])
  lines(out$knots, sgn * sqrt(2) * cos(pi * k * (out$knots/(b-a) + 0.5)), 
        col = "red", lty = "dotted")
  
  plot(out$knots, out$der[, k + 1], type = "l", 
       ylab = "", main = paste("Eigenfunction derivative", k), 
       xlab = "")
  sgn <- sign(out$vectors[1, k + 1])
  lines(out$knots, - sgn * sqrt(2) / (b-a) * pi * k * sin(pi * k * (out$knots/(b-a) + 0.5)), 
        col = "red", lty = "dotted")
}


# how to build interpolating functions for one eigenfunction and its derivative,
# from their values at the knots
eigenFun <- approxfun(x = out$knots, y = out$vectors[, 2])
eigenDerFun <- approxfun(x = out$knots, y = out$der[, 2])
x <- runif(n = 3, min = -1/2, max = 1/2)
eigenFun(x)
eigenDerFun(x)



Quantile-oriented sensitivity analysis

Description

qosa implements the estimation of first-order quantile-oriented sensitivity indices as defined in Fort et al. (2016), with a kernel-based estimator of conditional probability density functions closely related to the one proposed by Maume-Deschamps and Niang (2018). qosa also supports a kernel-based (Nadaraya-Watson) estimation of first-order Sobol' indices.
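As we read Fort et al. (2016), the first-order quantile-oriented index of $X_i$ compares the minimal expected check (pinball) loss $\psi_\alpha(y,\theta) = (y-\theta)(\alpha - \mathbf{1}\{y<\theta\})$ with and without conditioning: $S_i = 1 - E[\min_\theta E(\psi_\alpha(Y,\theta)\vert X_i)] / \min_\theta E[\psi_\alpha(Y,\theta)]$. The base-R sketch below estimates it by crudely binning $X_i$ into equal-frequency classes instead of using the kernel estimator of qosa; the bin count is an illustrative assumption and binning biases the estimates, so only the ranking should be trusted. For the test case of the Examples ($Y = X_1 - X_2$, exponential inputs, $\alpha = 0.1$), a hand calculation gives roughly 0.12 for $X_1$ and 0.64 for $X_2$.

```r
pinball <- function(y, theta, alpha) mean((y - theta) * (alpha - (y < theta)))
# Minimal empirical check loss, attained at a sample alpha-quantile (type 1)
min_loss <- function(y, alpha) {
  pinball(y, quantile(y, alpha, names = FALSE, type = 1), alpha)
}
qosa_binned <- function(x, y, alpha, nbins = 20) {
  bins <- cut(x, breaks = quantile(x, seq(0, 1, length.out = nbins + 1)),
              include.lowest = TRUE)
  cond <- tapply(y, bins, min_loss, alpha = alpha)   # per-bin minimal loss
  w <- tapply(y, bins, length) / length(y)            # bin probabilities
  1 - sum(w * cond) / min_loss(y, alpha)
}
set.seed(1)
n <- 1e4
X <- matrix(rexp(2 * n, 1), nrow = n)
Y <- X[, 1] - X[, 2]                                  # Fort et al. (2016) test case
S <- c(X1 = qosa_binned(X[, 1], Y, 0.1), X2 = qosa_binned(X[, 2], Y, 0.1))
print(S)
```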

Usage

qosa(model = NULL, X1, X2 = NULL, type = "quantile", alpha = 0.1, split.sample = 2/3, 
nsample = 1e4, nboot = 0, conf = 0.95, ...)
## S3 method for class 'qosa'
tell(x, y = NULL, ...)
## S3 method for class 'qosa'
print(x, ...)
## S3 method for class 'qosa'
plot(x, ylim = c(0, 1), ...)
## S3 method for class 'qosa'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment
                 = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X1`	a random sample of the inputs used for the estimation of conditional probability density functions. If `X2` is NULL, `X1` is split in two samples, with the first `split.sample` proportion of observations assigned to `X1` and the rest to `X2`.
`X2`	a random sample of the inputs used to evaluate the conditional probability density functions. If NULL, it is constructed with the last `(1-split.sample)` proportion of observations from `X1`, see above.
`type`	a string specifying which first-order sensitivity indices must be estimated: quantile-oriented indices (`type="quantile"`) or Sobol' indices (`type="mean"`).
`alpha`	if `type="quantile"` the quantile level.
`split.sample`	if `X2=NULL` the proportion of observations from `X1` assigned to the estimation of conditional probability density functions.
`nsample`	the number of samples from the conditional probability density functions used to estimate the conditional quantiles (if `type="quantile"`) or the conditional means (if `type="mean"`).
`nboot`	the number of bootstrap replicates.
`conf`	the confidence level for confidence intervals.
`x`	a list of class `"qosa"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"qosa"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`ylim`	y-coordinate plotting limits.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called.

Details

Quantile-oriented sensitivity indices were defined in Fort et al. (2016) as a special case of sensitivity indices based on contrast functions. The estimator used by qosa closely follows the one proposed by Maume-Deschamps and Niang (2018). The only difference is that Maume-Deschamps and Niang (2018) use the following kernel-based estimate of the conditional cumulative distribution function:

$\hat{F}(y\vert X=x) = \frac{ \sum_{i=1}^n K_{h_x}(x - X_i) \mathbf{1}\{Y_i< y\}}{\sum_{i=1}^n K_{h_x}(x - X_i)}$

whereas we use

$\hat{F}(y\vert X=x) = \frac{ \sum_{i=1}^n K_{h_x}(x - X_i) \int_{-\infty}^y K_{h_y}(t - Y_i)dt} {\sum_{i=1}^n K_{h_x}(x - X_i)},$

meaning that $\mathbf{1}\{Y_i< y\}$ is replaced by $\int_{-\infty}^y K_{h_y}(t - Y_i)dt = \Phi (\frac{y-Y_i}{h_y})$, where $\Phi$ is the cumulative distribution function of the standard normal distribution (since the kernel $K$ is Gaussian). The two definitions thus coincide in the limit $h_y \rightarrow 0$. Our formula arises from a kernel density estimator of the joint pdf with a diagonal bandwidth matrix. In a future version, it will be generalized to a general bandwidth matrix for improved performance.
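The smoothed estimator above can be written directly in base R. The sketch below evaluates $\hat{F}(y\vert X=x)$ with Gaussian kernels (the $1/h_x$ factors cancel in the ratio) and inverts it with `uniroot()` to get a conditional quantile. It is not the internal code of qosa: the helper names, the hypothetical model $Y \vert X=x \sim N(x, 0.1^2)$ and the rule-of-thumb bandwidths from `bw.nrd0()` are assumptions made for illustration.

```r
# Smoothed conditional cdf: sum_i K_hx(x - X_i) Phi((y - Y_i)/hy) / sum_i K_hx(x - X_i)
cond_cdf_hat <- function(y, x, X, Y, hx, hy) {
  wx <- dnorm((x - X) / hx)                  # Gaussian kernel weights in x
  sum(wx * pnorm((y - Y) / hy)) / sum(wx)
}
# Conditional alpha-quantile by inverting the (monotone) smoothed cdf
cond_quantile_hat <- function(alpha, x, X, Y, hx, hy) {
  g <- function(y) cond_cdf_hat(y, x, X, Y, hx, hy) - alpha
  uniroot(g, lower = min(Y) - 10 * hy, upper = max(Y) + 10 * hy)$root
}
set.seed(2)
n <- 2000
X <- runif(n)
Y <- X + rnorm(n, sd = 0.1)                  # hypothetical model: Y | X = x ~ N(x, 0.1^2)
hx <- bw.nrd0(X); hy <- bw.nrd0(Y)
F_hat <- cond_cdf_hat(0.5, x = 0.5, X, Y, hx, hy)      # should be close to 0.5
q_hat <- cond_quantile_hat(0.5, x = 0.5, X, Y, hx, hy) # should be close to 0.5
print(c(F_hat = F_hat, q_hat = q_hat))
```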

Value

qosa returns a list of class "qosa", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`X1`	a `data.frame` containing the design of experiments used for the estimation of conditional probability density functions.
`X2`	a `data.frame` containing the design of experiments used for the evaluation of conditional probability density functions.
`y`	a vector of model responses.
`S`	the estimations of the first-order sensitivity indices (quantile-oriented or Sobol', depending on `type`).

Author(s)

Sebastien Da Veiga

References

Fort, J. C., Klein, T., and Rachdi, N. (2016). New sensitivity analysis subordinated to a contrast. Communications in Statistics-Theory and Methods, 45(15), 4349-4364.

Maume-Deschamps, V., and Niang, I. (2018). Estimation of quantile oriented sensitivity indices. Statistics & Probability Letters, 134, 122-127.

Examples

 
library(ks)
library(ggplot2)
library(boot)

# Test case : difference of two exponential distributions (Fort et al. (2016))
# We use two samples with different sizes
n1 <- 5000
X1 <- data.frame(matrix(rexp(2 * n1,1), nrow = n1))
n2 <- 1000
X2 <- data.frame(matrix(rexp(2 * n2,1), nrow = n2))
Y1 <- X1[,1] - X1[,2]
Y2 <- X2[,1] - X2[,2]
x <- qosa(model = NULL, X1, X2, type = "quantile", alpha = 0.1)
tell(x,c(Y1,Y2))
print(x)
ggplot(x)

# Test case : difference of two exponential distributions (Fort et al. (2016))
# We use only one sample
n <- 1000 # put n=10000 for more consistency
X <- data.frame(matrix(rexp(2 * n,1), nrow = n))
Y <- X[,1] - X[,2]
x <- qosa(model = NULL, X1 = X, type = "quantile", alpha = 0.7)
tell(x,Y)
print(x)
ggplot(x)

# Test case : the Ishigami function
# We estimate first-order Sobol' indices (by specifying 'mean')
# The next lines are commented out because they take too long for CRAN tests
#n <- 5000 
#nboot <- 50 
#X <- data.frame(matrix(-pi+2*pi*runif(3 * n), nrow = n))
#x <- qosa(model = ishigami.fun, X1 = X, type = "mean", nboot = nboot)
#print(x)
#ggplot(x)



Sequential Bifurcations

Description

sb implements the Sequential Bifurcations screening method (Bettonvil and Kleijnen 1996).

Usage

sb(p, sign = rep("+", p), interaction = FALSE)
## S3 method for class 'sb'
ask(x, i = NULL, ...)
## S3 method for class 'sb'
tell(x, y, ...)
## S3 method for class 'sb'
print(x, ...)
## S3 method for class 'sb'
plot(x, ...)

Arguments

`p`	number of factors.
`sign`	a vector of length `p` filled with `"+"` and `"-"`, giving the (assumed) signs of the factor effects.
`interaction`	a boolean, `TRUE` if the model is supposed to be with interactions, `FALSE` otherwise.
`x`	a list of class `"sb"` storing the state of the screening study at the current iteration.
`y`	a vector of model responses.
`i`	an integer, used to force a wanted bifurcation instead of that proposed by the algorithm.
`...`	not used.

Details

The model without interaction is

$Y=\beta_0 + \sum_{i=1}^p \beta_i X_i$

while the model with interactions is

$Y=\beta_0 + \sum_{i=1}^p \beta_i X_i + \sum_{1 \leq i < j \leq p} \gamma_{ij} X_i X_j$

In both cases, the factors are assumed to be uniformly distributed on $[-1,1]$. This differs from Bettonvil and Kleijnen (1996), where the factors vary across $[0,1]$ in the former case and across $[-1,1]$ in the latter.

Another difference from Bettonvil and Kleijnen (1996) is that, in the current implementation, the groups are split exactly in the middle.
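For the model without interactions, the bifurcation logic can be sketched in a few lines of base R. Let $y_{(j)}$ be the output with factors $1,\dots,j$ at $+1$ and the others at $-1$; then $(y_{(hi)} - y_{(lo-1)})/2 = \sum_{k=lo}^{hi}\beta_k$, so a group whose effect is below a threshold can be discarded at once, while important groups are split in the middle. This standalone sketch assumes all effects are nonnegative (all signs `"+"`); the helper `sb_sketch`, the threshold and the test model are hypothetical, and the sketch does not reproduce the `ask`/`tell` interface of sb.

```r
# Hypothetical sketch of sequential bifurcation for a first-order model
sb_sketch <- function(f, p, threshold) {
  cache <- new.env()
  # y_j: output with factors 1..j at +1 and factors j+1..p at -1 (memoized runs)
  y_at <- function(j) {
    key <- as.character(j)
    if (is.null(cache[[key]])) cache[[key]] <- f(c(rep(1, j), rep(-1, p - j)))
    cache[[key]]
  }
  important <- numeric(0)
  groups <- list(c(1, p))                    # queue of groups (lo, hi)
  while (length(groups) > 0) {
    lo <- groups[[1]][1]; hi <- groups[[1]][2]
    groups <- groups[-1]
    effect <- (y_at(hi) - y_at(lo - 1)) / 2  # sum of beta_k for k in lo..hi
    if (effect <= threshold) next            # discard the whole group
    if (lo == hi) {
      important <- c(important, lo)
    } else {
      mid <- (lo + hi) %/% 2                 # split the group in the middle
      groups <- c(groups, list(c(lo, mid), c(mid + 1, hi)))
    }
  }
  list(important = sort(important), n_runs = length(ls(cache)))
}
p <- 16
beta <- numeric(p); beta[c(3, 11)] <- 5      # hypothetical: two active factors
f <- function(x) 1 + sum(beta * x)
res <- sb_sketch(f, p, threshold = 0.5)
print(res)
```

Here the two active factors are found with 9 model runs instead of the 17 needed to vary each factor individually.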

Value

sb returns a list of class "sb", containing all the input arguments detailed before, plus the following components:

`i`	the vector of bifurcations.
`y`	the vector of observations.
`ym`	the vector of mirror observations (model with interactions only).

The groups effects can be displayed with the print method.

Author(s)

Gilles Pujol

References

B. Bettonvil and J. P. C. Kleijnen, 1996, Searching for important factors in simulation models with many factors: sequential bifurcations, European Journal of Operational Research, 96, 180–194.

Examples

library(sensitivity)

# a model with interactions
p <- 50
beta <- numeric(length = p)
beta[1:5] <- runif(n = 5, min = 10, max = 50)
beta[6:p] <- runif(n = p - 5, min = 0, max = 0.3)
beta <- sample(beta)
gamma <- matrix(data = runif(n = p^2, min = 0, max = 0.1), nrow = p, ncol = p)
gamma[lower.tri(gamma, diag = TRUE)] <- 0
gamma[1,2] <- 5
gamma[5,9] <- 12
# drop() turns the 1 x 1 matrix from the quadratic form into a plain number
f <- function(x) { return(sum(x * beta) + drop(x %*% gamma %*% x)) }

# 10 iterations of SB
sa <- sb(p, interaction = TRUE)
for (i in 1 : 10) {
  x <- ask(sa)                 # named list of input vectors to evaluate
  y <- list()
  for (j in names(x)) {        # separate index: do not reuse the outer loop variable
    y[[j]] <- f(x[[j]])
  }
  tell(sa, y)                  # updates sa in the calling environment
}
print(sa)
plot(sa)

Sensitivity Indices based on Csiszar f-divergence

Description

sensiFdiv conducts a density-based sensitivity analysis where the impact of an input variable is defined in terms of dissimilarity between the original output density function and the output density function when the input variable is fixed. The dissimilarity between density functions is measured with Csiszar f-divergences. Estimation is performed through kernel density estimation and the function kde of the package ks.

Usage

sensiFdiv(model = NULL, X, fdiv = "TV", nboot = 0, conf = 0.95, ...)
## S3 method for class 'sensiFdiv'
tell(x, y = NULL, ...)
## S3 method for class 'sensiFdiv'
print(x, ...)
## S3 method for class 'sensiFdiv'
plot(x, ylim = c(0, 1), ...)
## S3 method for class 'sensiFdiv'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X`	a matrix or `data.frame` representing the input random sample.
`fdiv`	a string or a list of strings specifying the Csiszar f-divergence to be used. Available choices are "TV" (Total-Variation), "KL" (Kullback-Leibler), "Hellinger" and "Chi2" (Neyman chi-squared).
`nboot`	the number of bootstrap replicates
`conf`	the confidence level for confidence intervals.
`x`	a list of class `"sensiFdiv"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"sensiFdiv"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`ylim`	y-coordinate plotting limits.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called.

Details

Some of the Csiszar f-divergences produce sensitivity indices that have already been studied in the context of sensitivity analysis. In particular, "TV" leads to the importance measure proposed by Borgonovo (2007) (up to a constant), "KL" corresponds to the mutual information (Krzykacz-Hausmann 2001) and "Chi2" produces the squared-loss mutual information. See Da Veiga (2015) for details.
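To illustrate what the "TV" index measures, the sketch below approximates Borgonovo's importance measure $\frac{1}{2}E_{Xi}\left[\int |f_Y(y) - f_{Y|Xi}(y)|\,dy\right]$ in base R. It is deliberately simpler than sensiFdiv: instead of the `kde` function of the `ks` package, it conditions on $Xi$ by binning it into equal-frequency classes and uses `stats::density` for the output densities. The helper name `tv_index()` and its `nbins`/`ngrid` arguments are hypothetical.

```r
# Binned approximation of the total-variation (Borgonovo-type) index:
# 0.5 * E_{Xi}[ integral |f_Y(y) - f_{Y|Xi}(y)| dy ]
tv_index <- function(x, y, nbins = 10, ngrid = 512) {
  # common evaluation grid for all density estimates
  rng <- range(y)
  pad <- 0.1 * diff(rng)
  grid <- seq(rng[1] - pad, rng[2] + pad, length.out = ngrid)
  dy <- grid[2] - grid[1]
  f_y <- density(y, from = grid[1], to = grid[ngrid], n = ngrid)$y
  # equal-frequency bins stand in for "fixing" the input
  breaks <- quantile(x, probs = seq(0, 1, length.out = nbins + 1))
  bins <- cut(x, breaks = breaks, include.lowest = TRUE)
  tv <- sapply(split(y, bins), function(yb) {
    f_cond <- density(yb, from = grid[1], to = grid[ngrid], n = ngrid)$y
    0.5 * sum(abs(f_y - f_cond)) * dy       # Riemann sum of the TV distance
  })
  w <- as.numeric(table(bins)) / length(x)  # bin probabilities
  sum(w * tv)
}

set.seed(1)
n <- 2000
X <- matrix(runif(2 * n), ncol = 2)
y <- X[, 1] + 0.1 * rnorm(n)                # X2 has no effect on y
s <- sapply(1:2, function(i) tv_index(X[, i], y))
round(s, 3)
```

The influential input gets a large index, while the inert one only picks up estimation noise.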

Value

sensiFdiv returns a list of class "sensiFdiv", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	a vector of model responses.
`S`	the estimations of the Csiszar f-divergence sensitivity indices. If several divergences have been selected, `S` is a list where each element contains the estimations of the sensitivity indices for one of the divergences.

Author(s)

Sebastien Da Veiga, Snecma

References

Borgonovo E. (2007), A new uncertainty importance measure, Reliability Engineering and System Safety 92(6), 771–784.

Da Veiga S. (2015), Global sensitivity analysis with dependence measures, Journal of Statistical Computation and Simulation, 85(7), 1283–1305.

Krzykacz-Hausmann B. (2001), Epistemic sensitivity analysis based on the concept of entropy, Proceedings of SAMO2001, 53–57.

Examples


library(ks)

# Test case: the non-monotonic Sobol g-function
n <- 100
X <- data.frame(matrix(runif(8 * n), nrow = n))

# Density-based sensitivity analysis
# the next lines are commented out because they take too long for CRAN tests
#x <- sensiFdiv(model = sobol.fun, X = X, fdiv = c("TV","KL"), nboot=30)
#print(x)
#library(ggplot2)
#ggplot(x)

Sensitivity Indices based on the Hilbert-Schmidt Independence Criterion (HSIC)

Description

sensiHSIC conducts global sensitivity analysis (GSA) in many different contexts, using several sensitivity measures based on the Hilbert-Schmidt independence criterion (HSIC). The so-called HSIC sensitivity indices depend on the kernels assigned to the input variables $Xi$ and on the kernel assigned to the output object $Y$. For each random entity, a reproducing kernel Hilbert space (RKHS) is associated with the chosen kernel and makes it possible to represent probability distributions in an appropriate function space. The influence of $Xi$ on $Y$ is then measured through the distance between the joint probability distribution (true impact of $Xi$ on $Y$ in the numerical model) and the product of marginal distributions (no impact of $Xi$ on $Y$) after embedding both distributions into a bivariate RKHS. This GSA approach has three main advantages:

The input variables $Xi$ may be correlated.
Any kind of mathematical object is supported (provided that a kernel function is available).
Accurate estimation is possible even with very few data (a hundred input-output samples or fewer).

In sensiHSIC, each input variable $Xi$ is expected to be scalar (either discrete or continuous). By contrast, a much wider collection of mathematical objects is supported for the output variable $Y$. In particular, $Y$ may be:

A scalar output (either discrete or continuous). If so, one single kernel family is selected from the kernel collection.
A low-dimensional vector output. If so, a kernel is selected for each output variable and the final output kernel is built by tensorization.
A high-dimensional vector output or a functional output. In this case, the output data must be seen as time series observations. Three different methods are proposed.
1. A preliminary dimension reduction may be performed. To achieve this, a principal component analysis (PCA) based on the empirical covariance matrix helps identify the first terms of the Karhunen-Loeve expansion. The final output kernel is then built in the reduced subspace where the functional data are projected.
2. The dynamic time warping (DTW) algorithm may be combined with a translation-invariant kernel. The resulting DTW-based output kernel is well-adapted to measure similarity between two given time series.
3. The global alignment kernel (GAK) may be directly applied on the functional data. Unlike the DTW kernel, it was specifically designed to deal with time series and to measure the impact of one input variable on the shape of the output curve.

Many variants of the original HSIC indices are now available in sensiHSIC.

Normalized HSIC indices (R2-HSIC)
The original HSIC indices defined in Gretton et al. (2005) may be hard to interpret because they do not admit a universal upper bound. A first step to overcome this difficulty was taken by Da Veiga (2015) with the definition of the R2-HSIC indices. The resulting sensitivity indices cannot exceed $1$.
Target HSIC indices (T-HSIC)
They were designed by Marrel and Chabridon (2021) to meet the needs of target sensitivity analysis (TSA). The idea is to measure the impact of each input variable $Xi$ on a specific part of the output distribution (for example the upper tail). To achieve this, a weight function $w$ is applied to $Y$ before computing HSIC indices.
Conditional HSIC indices (C-HSIC)
They were designed by Marrel and Chabridon (2021) to meet the needs of conditional sensitivity analysis (CSA). The idea is to measure the impact of each input variable $Xi$ on $Y$ when a specific event occurs. This conditioning event is defined on the output variable $Y$, and its amplitude is taken into account through a weight function $w$.
HSIC-ANOVA indices
To improve the interpretability of HSIC indices, Da Veiga (2021) proposed an ANOVA-like decomposition that strictly separates main effects from interaction effects in the HSIC paradigm. First-order and total-order HSIC-ANOVA indices are then defined in the same way as first-order and total-order Sobol' indices. However, this framework only holds under two major assumptions: the input variables $Xi$ must be mutually independent, and all input kernels must belong to the very restricted class of ANOVA kernels.
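A minimal sketch of the R2-HSIC normalization, assuming Gaussian kernels whose bandwidth is the empirical standard deviation (the rule of thumb used for translation-invariant kernels) and the V-statistic estimator $\widehat{HSIC} = \mathrm{tr}(KHLH)/n^2$ with $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$. The ratio $HSIC(Xi,Y)/\sqrt{HSIC(Xi,Xi)\,HSIC(Y,Y)}$ lies in $[0,1]$ by the Cauchy-Schwarz inequality. The helper names are hypothetical and the code does not call sensiHSIC.

```r
# Gaussian Gram matrix; bandwidth = empirical standard deviation (rule of thumb)
gram_rbf <- function(z, theta = sd(z)) exp(-outer(z, z, "-")^2 / (2 * theta^2))

# V-statistic HSIC estimate: trace(K H L H) / n^2, with H the centering matrix
hsic_v <- function(K, L) {
  n <- nrow(K)
  H <- diag(n) - matrix(1 / n, n, n)
  sum((H %*% K %*% H) * L) / n^2         # trace(HKH L) for symmetric L
}

# R2-HSIC: HSIC(X, Y) / sqrt(HSIC(X, X) * HSIC(Y, Y))
r2_hsic <- function(x, y) {
  Kx <- gram_rbf(x)
  Ky <- gram_rbf(y)
  hsic_v(Kx, Ky) / sqrt(hsic_v(Kx, Kx) * hsic_v(Ky, Ky))
}

set.seed(2)
n <- 200
X <- matrix(runif(3 * n), ncol = 3)
y <- sin(2 * pi * X[, 1]) + 0.2 * X[, 2]   # X3 has no effect
r2 <- apply(X, 2, r2_hsic, y = y)
round(r2, 3)
```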

Like most sensitivity measures, HSIC indices can be used to rank the input variables $Xi$ according to their influence on the output variable $Y$. They can also be used for screening, that is, to distinguish truly influential input variables from non-influential ones. Users interested in this topic are invited to consult the documentation of the function testHSIC.

Usage

sensiHSIC(model = NULL, X, target = NULL, cond = NULL, 
          kernelX = "rbf", paramX = NA,
          kernelY = "rbf", paramY = NA,
          estimator.type = "V-stat",
          nboot = 0, conf = 0.95,
          anova = list(obj = "no", is.uniform = TRUE),
          sensi = NULL, 
          save.GM = list(KX = TRUE, KY = TRUE), ...)
          
## S3 method for class 'sensiHSIC'
tell(x, y = NULL, ...)

## S3 method for class 'sensiHSIC'
print(x, ...)

## S3 method for class 'sensiHSIC'
plot(x, ylim = c(0, 1), ...)

Arguments

`model`	A function, or a statistical model with a `predict` method. It defines the input-output model that needs to be studied.
`X`	An $n$-by-$p$ matrix containing all input samples. It comprises $n$ joint observations of the $p$ input variables. If the user only wants to estimate HSIC indices or R2-HSIC indices, the input variables can be correlated. If the user also wants to estimate HSIC-ANOVA indices, the input variables have to be mutually independent.
`target`	A list of options to perform TSA. `target` must at least contain an option named `"c"`; the other options have default values. `type` is a string specifying the weight function. Available choices include `"indicTh"`, `"zeroTh"`, `"logistic"` and `"exp1side"`. Default value is `"exp1side"`. `"indicTh"` and `"zeroTh"` only depend on a threshold parameter. `"logistic"` and `"exp1side"` depend on both a threshold parameter and a smoothness parameter. `c` is a scalar value specifying the threshold parameter. `upper` is a boolean indicating whether the target region is located above (`TRUE`) or below (`FALSE`) the threshold parameter `c`. Only relevant when `type` is `"indicTh"`, `"zeroTh"` or `"exp1side"`. Default value is `TRUE`. `param` is a scalar value specifying the smoothness parameter. Only relevant when `type` is `"logistic"` or `"exp1side"`. Default value is `1`.
`cond`	A list of options to perform CSA. `cond` must at least contain an option named `"c"`; the other options have default values. `type` is a string specifying the weight function. Available choices include `"indicTh"`, `"zeroTh"`, `"logistic"` and `"exp1side"`. Default value is `"exp1side"`. `"indicTh"` and `"zeroTh"` only depend on a threshold parameter. `"logistic"` and `"exp1side"` depend on both a threshold parameter and a smoothness parameter. `c` is a scalar value specifying the threshold parameter. `upper` is a boolean indicating whether the conditioning region is located above (`TRUE`) or below (`FALSE`) the threshold parameter `c`. Only relevant when `type` is `"indicTh"`, `"zeroTh"` or `"exp1side"`. Default value is `TRUE`. `param` is a scalar value specifying the smoothness parameter. Only relevant when `type` is `"logistic"` or `"exp1side"`. Default value is `1`.
`kernelX`	A string or a vector of $p$ strings that specifies how to choose input kernels. If only one string is provided, the associated kernel is assigned to all inputs. For dimension-wise kernel selection, a vector of $p$ strings must be provided. For each input variable, available choices include `"categ"` (categorical kernel), `"dcov"` (covariance kernel of the fractional Brownian motion), `"invmultiquad"` (inverse multiquadratic kernel), `"laplace"` (exponential kernel), `"linear"` (dot-product kernel), `"matern3"` (Matern $3/2$ kernel), `"matern5"` (Matern $5/2$ kernel), `"raquad"` (rational quadratic kernel), `"rbf"` (Gaussian kernel), `"sobolev1"` (Sobolev kernel with smoothness parameter $r=1$) and `"sobolev2"` (Sobolev kernel with smoothness parameter $r=2$). In addition, if all input variables are uniformly distributed on $[0,1]$, the kernels `"laplace"`, `"matern3"`, `"matern5"` and `"rbf"` can easily be transformed into ANOVA kernels. The resulting kernels are respectively called `"laplace_anova"`, `"matern3_anova"`, `"matern5_anova"` and `"rbf_anova"`. One-parameter kernels: `"categ"`, `"dcov"`, `"invmultiquad"`, `"laplace"`, `"laplace_anova"`, `"matern3"`, `"matern3_anova"`, `"matern5"`, `"matern5_anova"`, `"raquad"`, `"rbf"` and `"rbf_anova"`. Parameter-free kernels: `"linear"`, `"sobolev1"` and `"sobolev2"`.
`paramX`	A scalar value or a vector of $p$ values with input kernel parameters. If `paramX=NA`, input kernel parameters are computed automatically with rules of thumb. If `paramX` is a scalar value, it is assigned to all input kernels. For dimension-wise kernel parametrization, a vector of $p$ values must be provided. If `kernelX` combines one-parameter kernels and parameter-free kernels, `NA` must be specified for parameter-free kernels.
`kernelY`	A string, a vector of $q$ strings or a list of options that specifies how to construct the output kernel. Regardless of its mathematical nature, the model output must be viewed as a $q$-dimensional random vector. To deal with a scalar output or a low-dimensional vector output, it is advised to select one kernel per output dimension and to tensorize all selected kernels. In this case, `kernelY` must be a string or a vector of $q$ strings. If only one string is provided, the associated kernel is repeated $q$ times. For dimension-wise kernel selection, a vector of $q$ strings must be provided. See `kernelX` for an exhaustive list of available kernels. To deal with a high-dimensional vector output or a functional output, it is advised to reduce dimension or to use a dedicated kernel. In this case, `kernelY` must be specified as a list of options. `kernelY` must at least contain an option named `"method"`; the other options have default values. `method` is a string indicating the strategy used to construct the output kernel. Available choices include `"PCA"` (dimension reduction through principal component analysis), `"DTW"` (dynamic time warping) and `"GAK"` (global alignment kernel). If `method="PCA"`, the following options may also be specified: `data.centering` is a boolean indicating whether the output samples must be centered before performing the preliminary PCA. Default value is `TRUE`. `data.scaling` is a boolean indicating whether the output samples must be scaled before performing the preliminary PCA. Default value is `TRUE`. `fam` is a string specifying the kernel family applied to the principal components. Available choices only include `"dcov"`, `"invmultiquad"`, `"laplace"`, `"linear"`, `"matern3"`, `"matern5"`, `"raquad"` and `"rbf"`. Default value is `"rbf"`. `expl.var` is a scalar value (between $0$ and $1$) specifying the expected percentage of output variance that must be explained by PCA. Default value is `0.95`. `PC` is the expected number of principal components in PCA. Default value is `NA`. `combi` is a string indicating how the final output kernel is built in the reduced subspace. Available options are `"sum"` and `"prod"`. The chosen kernel in `fam` is applied to all principal components before summation (if `"sum"`) or tensorization (if `"prod"`). `position` is a string indicating whether weights are involved in the construction of the final output kernel in the reduced subspace. Available choices include `"nowhere"` (no weights), `"intern"` (weights applied to principal components) or `"extern"` (weights applied to kernels). Default value is `"intern"`. Remark: `expl.var` and `PC` are conflicting options. Only one of them needs to be specified and the other one must be set to `NA`. If both are specified, `expl.var` takes priority. If both are set to `NA`, `expl.var` is set to its default value. If `method="DTW"`, the following option may also be specified: `fam` is a string specifying the translation-invariant kernel which is combined with DTW. Available choices only include `"invmultiquad"`, `"laplace"`, `"matern3"`, `"matern5"`, `"raquad"` and `"rbf"`. Default value is `"rbf"`. If `method="GAK"`, there is no other option to specify.
`paramY`	A scalar value or a vector of values with output kernel parameters. If `paramY=NA`, output kernel parameters are computed automatically with rules of thumb. Otherwise, `paramY` must be specified consistently with `kernelY`. Case 1: `kernelY` is a string or a vector of $q$ strings. `paramY` must be a scalar value or a vector of $q$ values with output kernel parameters. If `paramY` is a scalar value, it is assigned to all output kernels. For dimension-wise kernel parametrization, a vector of $q$ values must be provided. If `kernelY` combines one-parameter kernels and parameter-free kernels, `NA` must be specified for parameter-free kernels. Case 2: `kernelY` is a list of options with `method="PCA"`. `paramY` must be set to `NA` because the parameters involved in the final output kernel are computed automatically. Their optimal tuning depends on the reduced subspace given by PCA. Case 3: `kernelY` is a list of options with `method="DTW"`. `paramY` must be set to `NA`. Case 4: `kernelY` is a list of options with `method="GAK"`. `paramY` must be a vector of $2$ values. If the user only wants to specify one parameter, the other one must be set to `NA`. The two parameters correspond to the arguments `sigma` and `window.size` in the function `gak` from the package `dtwclust`. However, automatic computation (specified by `paramY=NA`) is strongly advised for this kind of output kernel.
`estimator.type`	A string specifying the kind of estimator used for HSIC indices. Available choices include `"U-stat"` (U-statistics) and `"V-stat"` (V-statistics). U-statistics are unbiased estimators. V-statistics are biased but asymptotically unbiased. In the specific case of HSIC indices, V-statistics are non-negative and offer more flexibility for further test procedures (see `testHSIC`). Both kinds of estimators can be computed with complexity $O(n^2)$, where $n$ denotes the sample size.
`nboot`	Number of bootstrap replicates.
`conf`	A scalar value (between $0$ and $1$ ) specifying the level of confidence intervals.
`anova`	A list of parameters to achieve an ANOVA-like decomposition based on HSIC indices. `anova` must at least contain an option named `"obj"`; the other options have default values. `obj` is a string specifying which kinds of HSIC-ANOVA indices are expected. Available choices include `"no"` (`anova` is disabled), `"FO"` (first-order only), `"TO"` (total-order only) and `"both"` (first-order and total-order). `is.uniform` is a boolean indicating whether the samples stored in `X` come from random variables that are uniformly distributed on $[0,1]$. Recall that HSIC-ANOVA indices can only be computed with ANOVA kernels. Among available kernels, only `"laplace_anova"`, `"matern3_anova"`, `"matern5_anova"`, `"rbf_anova"`, `"sobolev1"` and `"sobolev2"` satisfy this constraint (provided that all input variables are uniformly distributed on $[0,1]$). If `is.uniform=TRUE`, the function checks that each input sample stored in `X` actually lies in $[0,1]$ and returns an error otherwise. If `is.uniform=FALSE`, non-parametric rescaling (based on empirical distribution functions) is applied.
`sensi`	An object of class `"sensiHSIC"` resulting from a prior call to this function. If such an object is provided, the following happens: if `sensi` contains an object named `"KX"`, it is extracted from `sensi` and the input Gram matrices (required to estimate HSIC indices) are not rebuilt from `X`, `kernelX` and `paramX`; if `sensi` contains an object named `"KY"`, it is extracted from `sensi` and the output Gram matrix (required to estimate HSIC indices) is not rebuilt from `model`, `kernelY` and `paramY`.
`save.GM`	A list of parameters indicating whether Gram matrices have to be saved. The list `save.GM` must contain options named `"KX"` and `"KY"`. `KX` is a boolean indicating whether the input Gram matrices have to be saved. `KY` is a boolean indicating whether the output Gram matrix has to be saved.
`x`	An object of class `"sensiHSIC"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	An $n$-by-$q$ matrix containing all output samples. It comprises $n$ observations of the $q$ output variables.
`ylim`	A vector of two values specifying the $y$ -coordinate plotting limits.
`...`	Any other arguments for `model` which are passed unchanged each time `model` is called.

Details

Let $(Xi,Y)$ be an input-output pair. The kernels assigned to $Xi$ and $Y$ are respectively denoted by $Ki$ and $KY$ .

For many global sensitivity measures, the influence of $Xi$ on $Y$ is measured in light of the probabilistic dependence within the input-output pair $(Xi,Y)$. For this, a dissimilarity measure is applied between the joint probability distribution (true impact of $Xi$ on $Y$ in the numerical model) and the product of marginal distributions (no impact of $Xi$ on $Y$). For instance, Borgonovo's sensitivity measure is built upon the total variation distance between those two probability distributions. See Borgonovo and Plischke (2016) for further details.

The HSIC-based sensitivity measure can be understood in this way, since the index $HSIC(Xi,Y)$ results from applying the Hilbert-Schmidt independence criterion (HSIC) to the pair $(Xi,Y)$. This criterion is a special kind of dissimilarity measure between the joint probability distribution and the product of marginal distributions. This dissimilarity measure is called the maximum mean discrepancy (MMD), and its definition relies on the selected kernels $Ki$ and $KY$. According to the theory of reproducing kernels, every kernel $K$ is associated with a reproducing kernel Hilbert space (RKHS). Then, if $K$ is assigned to a random variable $Z$, any probability distribution describing the random behavior of $Z$ may be represented within the induced RKHS. In this setup, the dissimilarity between the joint probability distribution and the product of marginal distributions is measured through the squared norm of the difference between their embeddings in the bivariate RKHS. The user is referred to Gretton et al. (2006) for additional details on the mathematical construction of HSIC indices.

In practice, it may be difficult to understand how $HSIC(Xi,Y)$ measures dependence within $(Xi,Y)$ . An alternative definition relies on the concept of feature map. Let us recall that the value taken by a kernel function can always be seen as the scalar product of two feature functions lying in a feature space. Definition 1 in Gretton et al. (2005) introduces $HSIC(Xi,Y)$ as the Hilbert-Schmidt norm of a covariance-like operator between random features. For this reason, having access to the input and output feature maps may help identify the dependence patterns captured by $HSIC(Xi,Y)$ .

Kernels must be chosen very carefully. There is a wide variety of kernels, but only a few of them meet the needs of GSA. Since $HSIC(Xi,Y)$ is meant to be a dependence measure, it must be equal to $0$ if and only if $Xi$ and $Y$ are independent. A sufficient condition for this equivalence is to take two characteristic kernels. The reader is referred to Fukumizu et al. (2004) for the mathematical definition of a characteristic kernel and to Sriperumbudur et al. (2010) for an overview of the major related results. In particular:

The Gaussian kernel, the Laplace kernel, the Matern $3/2$ kernel and the Matern $5/2$ kernel (all defined on $R^2$ ) are characteristic.
The transformed (ANOVA) versions of the four above-mentioned kernels (all defined on $[0,1]^2$) are characteristic.
All Sobolev kernels (defined on $[0,1]^2$ ) are characteristic.
The categorical kernel (defined on any discrete probability space) is characteristic.
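Why characteristic kernels matter can be seen on a toy case. The base-R sketch below (hypothetical helper names) takes $Y = X^2$ with $X$ uniform on $[-1,1]$: $Y$ is a deterministic function of $X$, yet uncorrelated with it. With the dot-product kernel ("linear"), which is not characteristic, the V-statistic R2-HSIC reduces exactly to the squared Pearson correlation and stays close to $0$; with Gaussian kernels (standard-deviation bandwidths, an assumption), it is clearly positive.

```r
set.seed(3)
n <- 300
x <- runif(n, -1, 1)
y <- x^2                     # y depends on x, but cor(x, y) is close to 0

# R2-HSIC from two Gram matrices (V-statistic form; the 1/n^2 factors cancel)
r2_hsic <- function(K, L) {
  m <- nrow(K)
  H <- diag(m) - matrix(1 / m, m, m)     # centering matrix
  Kc <- H %*% K %*% H
  Lc <- H %*% L %*% H
  sum(Kc * Lc) / sqrt(sum(Kc * Kc) * sum(Lc * Lc))
}
gram_rbf <- function(z, theta = sd(z)) exp(-outer(z, z, "-")^2 / (2 * theta^2))

r2_lin <- r2_hsic(outer(x, x), outer(y, y))   # linear kernel: not characteristic
r2_rbf <- r2_hsic(gram_rbf(x), gram_rbf(y))   # Gaussian kernel: characteristic
round(c(linear = r2_lin, squared_cor = cor(x, y)^2, rbf = r2_rbf), 4)
```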

Lemma 1 in Gretton et al. (2005) provides a third way of defining $HSIC(Xi,Y)$. Since the associated formula only involves three expectation terms, the corresponding estimation procedures are very simple and do not require a large number of input-output samples to be accurate. Two kinds of estimators may be used for $HSIC(Xi,Y)$: the V-statistic estimator (which is non-negative, biased and asymptotically unbiased) or the U-statistic estimator (unbiased). For both estimators, the computational complexity is $O(n^2)$, where $n$ is the sample size.
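For reference, Lemma 1 writes $HSIC(X,Y) = E[k(X,X')\,l(Y,Y')] + E[k(X,X')]\,E[l(Y,Y')] - 2E\big[E_{X'}[k(X,X')]\,E_{Y'}[l(Y,Y')]\big]$, where $(X',Y')$ is an independent copy of $(X,Y)$. The base-R sketch below replaces each expectation by an empirical mean (V-statistic) and also implements the standard unbiased U-statistic built on Gram matrices with zeroed diagonals; both use only $O(n^2)$ operations. It is an illustration under these assumptions (Gaussian kernels, standard-deviation bandwidths, hypothetical helper names), not the package's code.

```r
gram_rbf <- function(z, theta = sd(z)) exp(-outer(z, z, "-")^2 / (2 * theta^2))

# V-statistic: each expectation of Lemma 1 replaced by an empirical mean
hsic_v <- function(K, L) {
  mean(K * L) + mean(K) * mean(L) - 2 * mean(rowMeans(K) * rowMeans(L))
}

# U-statistic (unbiased): Gram matrices with zeroed diagonals
hsic_u <- function(K, L) {
  n <- nrow(K)
  diag(K) <- 0
  diag(L) <- 0
  term1 <- sum(K * L)                                   # tr(K L)
  term2 <- sum(K) * sum(L) / ((n - 1) * (n - 2))        # (1'K1)(1'L1) / ((n-1)(n-2))
  term3 <- 2 / (n - 2) * sum(rowSums(K) * rowSums(L))  # 2 / (n-2) * 1'K L 1
  (term1 + term2 - term3) / (n * (n - 3))
}

set.seed(4)
n <- 150
x <- runif(n)
y_dep <- cos(3 * x) + 0.1 * rnorm(n)   # depends on x
y_ind <- runif(n)                      # independent of x
Kx <- gram_rbf(x)
est <- c(V_dep = hsic_v(Kx, gram_rbf(y_dep)), V_ind = hsic_v(Kx, gram_rbf(y_ind)),
         U_dep = hsic_u(Kx, gram_rbf(y_dep)), U_ind = hsic_u(Kx, gram_rbf(y_ind)))
signif(est, 3)
```

The V-statistic above is algebraically equal to $\mathrm{tr}(KHLH)/n^2$ with $H$ the centering matrix, which is why it can never be negative.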

The user must always keep in mind the key steps leading to the estimation of $HSIC(Xi,Y)$ :

Input samples are simulated and the corresponding output samples are computed with the numerical model.
An input kernel $Ki$ and an output kernel $KY$ are selected.
In case of target sensitivity analysis: output samples are transformed by means of a weight function $w$ .
The input and output Gram matrices are constructed.
In case of conditional sensitivity analysis: conditioning weights are computed by means of a weight function $w$ .
The final estimate is computed. It depends on the selected estimator type (either a U-statistic or a V-statistic).
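These key steps can be strung together in a short base-R sketch for a target sensitivity analysis. Assumptions: a toy model in which X3 is inert, Gaussian input kernels with the standard-deviation rule of thumb, a hard indicator weight $w(y) = 1\{y > c\}$ with $c$ the empirical 90% quantile, the basic categorical kernel on the weighted output, and the V-statistic estimator. The indicator weight is our own illustrative choice; the package's weight functions (`"indicTh"`, `"exp1side"`, and so on) may be defined differently, and the conditional-weighting step is not shown.

```r
set.seed(5)
n <- 300; p <- 3
# Simulate input samples and compute outputs with a toy model (X3 is inert)
X <- matrix(runif(n * p), ncol = p)
y <- X[, 1] + 3 * X[, 2]^8

# Select kernels: Gaussian for inputs, basic categorical for the weighted output
gram_rbf   <- function(z, theta = sd(z)) exp(-outer(z, z, "-")^2 / (2 * theta^2))
gram_categ <- function(z) 1 * outer(z, z, "==")

# Target analysis: hypothetical indicator weight on the upper 10% of y
c_thr <- quantile(y, 0.9)
w <- as.numeric(y > c_thr)

# Build the Gram matrices and compute V-statistic estimates
hsic_v <- function(K, L) {
  mean(K * L) + mean(K) * mean(L) - 2 * mean(rowMeans(K) * rowMeans(L))
}
L <- gram_categ(w)
t_hsic <- sapply(seq_len(p), function(i) hsic_v(gram_rbf(X[, i]), L))
round(t_hsic, 4)
```

Because the upper tail of `y` is driven by large values of X2, the target index of X2 dominates here.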

Kernel functions for random variables

Everything that follows is written for a scalar output $Y$, but the same holds for any scalar input $Xi$.

Let $D$ denote the support of the output probability distribution. A kernel is a symmetric, positive definite function defined on $D \times D$. Different kernel families are available in sensiHSIC.

To deal with continuous probability distributions on $R$ , one can use:
- The covariance kernel of the fractional Brownian motion ("dcov"), the inverse multiquadratic kernel ("invmultiquad"), the exponential kernel ("laplace"), the dot-product kernel ("linear"), the Matern $3/2$ kernel ("matern3"), the Matern $5/2$ kernel ("matern5"), the rational quadratic kernel ("raquad") and the Gaussian kernel ("rbf").
To deal with continuous probability distributions on $[0,1]$, one can use:
- Any of the above-mentioned kernels (restricted to $[0,1]$).
- The transformed exponential kernel ("laplace_anova"), the transformed Matern $3/2$ kernel ("matern3_anova"), the transformed Matern $5/2$ kernel ("matern5_anova"), the transformed Gaussian kernel ("rbf_anova"), the Sobolev kernel with smoothness parameter $r=1$ ("sobolev1") and the Sobolev kernel with smoothness parameter $r=2$ ("sobolev2").
To deal with any discrete probability distribution, the categorical kernel ("categ") must be used.

Two kinds of kernels must be distinguished:

Parameter-free kernels: the dot-product kernel ("linear"), the Sobolev kernel with smoothness parameter $r=1$ ("sobolev1") and the Sobolev kernel with smoothness parameter $r=2$ ("sobolev2").
One-parameter kernels: the categorical kernel ("categ"), the covariance kernel of the fractional Brownian motion ("dcov"), the inverse multiquadratic kernel ("invmultiquad"), the exponential kernel ("laplace"), the transformed exponential kernel ("laplace_anova"), the Matern $3/2$ kernel ("matern3"), the transformed Matern $3/2$ kernel ("matern3_anova"), the Matern $5/2$ kernel ("matern5"), the transformed Matern $5/2$ kernel ("matern5_anova"), the rational quadratic kernel ("raquad"), the Gaussian kernel ("rbf") and the transformed Gaussian kernel ("rbf_anova").

A major issue related to one-parameter kernels is how to set the parameter. It mainly depends on the role played by the parameter in the kernel expression.

For translation-invariant kernels and their ANOVA variants (that is all one-parameter kernels except "categ" and "dcov"), the parameter may be interpreted as a correlation length (or a scale parameter). The rule of thumb is to compute the empirical standard deviation of the provided samples.
For the covariance kernel of the fractional Brownian motion ("dcov"), the parameter is an exponent. Default value is $1$ .
For the categorical kernel ("categ"), the parameter has no physical meaning. It simply acts as a binary switch between two variants:
- $0$ means the user wants to use the basic categorical kernel.
- $1$ means the user wants to use the weighted variant of the categorical kernel.
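The sketch below writes out common textbook forms of four translation-invariant kernels as functions of $r = |x - x'|$, together with the basic categorical kernel, and sets the scale parameter with the standard-deviation rule of thumb. The exact parametrizations used inside sensiHSIC may differ (for example in how the scale enters the formula), the weighted categorical variant is not shown, and all helper names are hypothetical.

```r
# Textbook forms, with r = |x - x'| and scale parameter theta (hypothetical names)
k_rbf     <- function(r, theta) exp(-r^2 / (2 * theta^2))
k_laplace <- function(r, theta) exp(-r / theta)
k_matern3 <- function(r, theta) (1 + sqrt(3) * r / theta) * exp(-sqrt(3) * r / theta)
k_matern5 <- function(r, theta) {
  (1 + sqrt(5) * r / theta + 5 * r^2 / (3 * theta^2)) * exp(-sqrt(5) * r / theta)
}

# Gram matrix of a translation-invariant kernel; rule of thumb: theta = sd(x)
gram <- function(x, kern, theta = sd(x)) kern(abs(outer(x, x, "-")), theta)

# Basic categorical kernel: 1 if both samples fall in the same category, else 0
gram_categ <- function(x) 1 * outer(x, x, "==")

set.seed(6)
x <- rnorm(50)
kernels <- list(rbf = k_rbf, laplace = k_laplace, matern3 = k_matern3, matern5 = k_matern5)
Ks <- lapply(kernels, function(k) gram(x, k))
Kc <- gram_categ(c("a", "b", "a", "c"))
```

Each resulting Gram matrix is symmetric with a unit diagonal and, up to rounding, positive semi-definite.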

How to deal with a low-dimensional vector output?

Let us assume that the output vector $Y$ is composed of $q$ random variables $Y1,...,Yq$ .

A kernel $Kj$ is assigned to each output variable $Yj$, which embeds the $j$-th output probability distribution in an RKHS denoted by $Hj$. The tensor product of $H1,...,Hq$ then yields the final RKHS, that is, the RKHS in which the $q$-variate output probability distribution describing the overall random behavior of $Y$ is embedded. In this situation:

The final output kernel is the tensor product of all output kernels.
The final output Gram matrix is the Hadamard product of all output Gram matrices.

Once the final output Gram matrix is built, HSIC indices can be estimated, just as in the case of a scalar output.
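A quick numerical check of the tensorization statement, assuming Gaussian kernels on each output with rule-of-thumb bandwidths (the data and names are hypothetical): the Hadamard product of the per-output Gram matrices coincides with the Gram matrix of the product kernel evaluated directly on the output vectors.

```r
gram_rbf <- function(z, theta = sd(z)) exp(-outer(z, z, "-")^2 / (2 * theta^2))

set.seed(7)
n <- 40
Y <- cbind(rnorm(n), runif(n))        # q = 2 output variables
theta <- apply(Y, 2, sd)              # one rule-of-thumb bandwidth per output

# Tensorized output Gram matrix: Hadamard (elementwise) product
KY <- gram_rbf(Y[, 1], theta[1]) * gram_rbf(Y[, 2], theta[2])

# Same matrix from the product kernel evaluated directly on the vectors
D2 <- outer(Y[, 1], Y[, 1], "-")^2 / theta[1]^2 + outer(Y[, 2], Y[, 2], "-")^2 / theta[2]^2
KY_direct <- exp(-D2 / 2)
all.equal(KY, KY_direct)   # TRUE
```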

How to deal with a high-dimensional vector output or a functional output?

In sensiHSIC, three different methods are proposed in order to compute HSIC-based sensitivity indices in presence of functional outputs.

Dimension reduction

This approach was initially proposed by Da Veiga (2015). The key idea is to approximate the random functional output by the first terms of its Kharunen-Loeve expansion. This can be achived with a principal component analysis (PCA) that is carried out on the empirical covariance matrix.

The eigenvectors (or principal directions) make it possible to approximate the (deterministic) functional terms involved in the Karhunen-Loeve decomposition.
The eigenvalues determine how many principal directions are needed to accurately represent the random function by its truncated Karhunen-Loeve expansion. The key idea behind dimension reduction is to keep as few principal directions as possible while preserving a prescribed level of explained variance.

The principal components are the coordinates of the functional output in the low-dimensional subspace resulting from PCA. They are computed for all output samples (time series observations). See Le Maitre and Knio (2010) for more detailed explanations.
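The selection of principal directions from a prescribed level of explained variance can be sketched with base R's prcomp. The functional output below is a made-up example, and the 95% threshold mirrors the expl.var option used in the examples; whether sensiHSIC also scales the data depends on its data.centering and data.scaling options, which this sketch does not reproduce.

```r
set.seed(3)
n <- 200                                   # number of output samples (curves)
tt <- seq(0, 1, length.out = 50)           # time grid
a <- rnorm(n)
b <- rnorm(n)
# Hypothetical functional output: row i is the i-th simulated time series
Y <- outer(a, sin(2 * pi * tt)) + outer(b, tt) +
  matrix(rnorm(n * length(tt), sd = 0.05), n)

pca <- prcomp(Y, center = TRUE, scale. = FALSE)
eig <- pca$sdev^2                          # eigenvalues of the empirical covariance
expl <- cumsum(eig) / sum(eig)             # cumulative share of explained variance
PC <- which(expl >= 0.95)[1]               # fewest directions reaching 95%
scores <- pca$x[, 1:PC, drop = FALSE]      # principal components, n x PC
```

Each column of scores is one principal component; an output kernel is then built in this reduced subspace.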

The last step consists in constructing a kernel in the reduced subspace. A single kernel family is selected and assigned to all principal directions, and all kernel parameters are computed automatically (with appropriate rules of thumb). Several strategies may then be considered.

The initial method described in Da Veiga (2015) is based on a direct tensorization. One can also decide to sum kernels.
This approach was improved by El Amri and Marrel (2021). For each principal direction, a weight coefficient (equal to the ratio between its eigenvalue and the sum of all selected eigenvalues) is computed.
- The principal components are multiplied by their respective weight coefficients before summing kernels or tensorizing kernels.
- The kernels can also be directly applied on the principal components before being linearly combined according to the weight coefficients.

In sensiHSIC, all these strategies correspond to the following specifications in kernelY:

Direct tensorization: kernelY=list(method="PCA", combi="prod", position="nowhere")
Direct sum: kernelY=list(method="PCA", combi="sum", position="nowhere")
Rescaled tensorization: kernelY=list(method="PCA", combi="prod", position="intern")
Rescaled sum: kernelY=list(method="PCA", combi="sum", position="intern")
Weighted linear combination: kernelY=list(method="PCA", combi="sum", position="extern")
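Three of these combinations can be sketched directly on Gram matrices. The principal components below are stand-ins (in practice they come from the PCA step), Gaussian kernels with a standard-deviation bandwidth are assumed, and the "intern" rescaling variants are omitted; this is an illustration of the combination rules, not sensiHSIC's implementation.

```r
gram_rbf <- function(y, theta) exp(-outer(y, y, "-")^2 / (2 * theta^2))

set.seed(4)
n <- 100
# Stand-in principal components (n x PC); in practice these come from PCA
scores <- cbind(rnorm(n, sd = 3), rnorm(n, sd = 1))
ev <- apply(scores, 2, var)        # eigenvalues of the retained directions
ratios <- ev / sum(ev)             # weight coefficients, summing to 1

# One kernel per principal direction, bandwidth from the rule of thumb
K_list <- lapply(seq_len(ncol(scores)),
                 function(j) gram_rbf(scores[, j], sd(scores[, j])))

K_prod   <- Reduce(`*`, K_list)                     # direct tensorization
K_sum    <- Reduce(`+`, K_list)                     # direct sum
K_extern <- Reduce(`+`, Map(`*`, ratios, K_list))   # weighted linear combination
```

Because the weights sum to 1, the weighted combination keeps a unit diagonal, whereas the direct sum has a diagonal equal to the number of retained directions.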

Dynamic Time Warping (DTW)

The DTW algorithm developed by Sakoe and Chiba (1978) can be combined with a translation-invariant kernel to create a kernel function for time series. The resulting DTW-based output kernel is well suited to measuring the similarity between two given time series.

Suitable translation-invariant kernels include the inverse multiquadratic kernel ("invmultiquad"), the exponential kernel ("laplace"), the Matern $3/2$ kernel ("matern3"), the Matern $5/2$ kernel ("matern5"), the rational quadratic kernel ("raquad") and the Gaussian kernel ("rbf").

Note that DTW-based kernels are not positive definite functions. As a consequence, many theoretical properties of HSIC indices no longer hold.

For faster computations, sensiHSIC uses the function dtw_dismat from the package incDTW.
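To show how a DTW distance is turned into an output kernel, here is a slow base-R sketch. The hand-written dtw_dist helper (absolute-difference local cost, no window constraint) is hypothetical and stands in for incDTW's dtw_dismat; the bandwidth choice is also an assumption.

```r
# Classic DTW distance between two numeric series (absolute-difference local cost)
dtw_dist <- function(a, b) {
  na <- length(a); nb <- length(b)
  D <- matrix(Inf, na + 1, nb + 1)
  D[1, 1] <- 0
  for (i in seq_len(na)) {
    for (j in seq_len(nb)) {
      cost <- abs(a[i] - b[j])
      D[i + 1, j + 1] <- cost + min(D[i, j + 1], D[i + 1, j], D[i, j])
    }
  }
  D[na + 1, nb + 1]
}

tt <- seq(0, 1, length.out = 30)
curves <- rbind(sin(2 * pi * tt),
                sin(2 * pi * (tt - 0.1)),   # time-shifted copy of the first curve
                cos(2 * pi * tt) + 1)       # a genuinely different curve
m <- nrow(curves)
Dm <- matrix(0, m, m)
for (i in 1:m) for (j in 1:m) Dm[i, j] <- dtw_dist(curves[i, ], curves[j, ])

theta <- sd(Dm[upper.tri(Dm)])          # hypothetical bandwidth choice
KY <- exp(-Dm^2 / (2 * theta^2))        # DTW distance plugged into a Gaussian kernel
```

The shifted sine ends up much closer to the original than the cosine curve, because DTW absorbs the time shift. As stated above, KY is not guaranteed to be positive definite.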

Global Alignment Kernel (GAK)

Unlike DTW-based kernels, the GAK is a positive definite function. This time-series kernel was originally introduced in Cuturi et al. (2007) and further investigated in Cuturi (2011). It was used to compute HSIC indices on a simplified compartmental epidemiological model in Da Veiga (2021).

For faster computations, sensiHSIC uses the function gak from the package dtwclust.

In sensiHSIC, two GAK-related parameters may be tuned by the user with paramY. They exactly correspond to the arguments sigma and window.size in the function gak.
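The recursion behind the global alignment kernel can be sketched in base R for univariate series, using the local kernel of Cuturi (2011). The helpers gak_raw and gak_norm are hypothetical; unlike dtwclust's implementation, they use no window constraint and no log-space computation, so they are only suitable for short series.

```r
# Hand-written global alignment kernel for univariate series (no window)
gak_raw <- function(x, y, sigma) {
  nx <- length(x); ny <- length(y)
  M <- matrix(0, nx + 1, ny + 1)
  M[1, 1] <- 1
  for (i in seq_len(nx)) {
    for (j in seq_len(ny)) {
      g <- exp(-(x[i] - y[j])^2 / (2 * sigma^2))
      kappa <- g / (2 - g)                        # local kernel exp(-phi_sigma)
      # sum over all alignments ending at (i, j)
      M[i + 1, j + 1] <- kappa * (M[i, j] + M[i, j + 1] + M[i + 1, j])
    }
  }
  M[nx + 1, ny + 1]
}

# Normalized version, equal to 1 when x and y coincide
gak_norm <- function(x, y, sigma) {
  gak_raw(x, y, sigma) / sqrt(gak_raw(x, x, sigma) * gak_raw(y, y, sigma))
}

tt <- seq(0, 1, length.out = 20)
a <- sin(2 * pi * tt)
b <- sin(2 * pi * (tt - 0.1))     # time-shifted copy of a
c2 <- cos(2 * pi * tt) + 1        # a different curve
sigma <- 0.5                      # hypothetical bandwidth
k_ab <- gak_norm(a, b, sigma)
k_ac <- gak_norm(a, c2, sigma)
```

Here sigma plays the role of the sigma argument mentioned above; the window.size constraint is not reproduced.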

About normalized HSIC indices (R2-HSIC)

Interpretability is arguably the major drawback of HSIC indices. This shortcoming led Da Veiga (2021) to introduce a normalized version of $HSIC(Xi,Y)$. The so-called R2-HSIC index is defined as the ratio between $HSIC(Xi,Y)$ and the square root of the normalizing constant $HSIC(Xi,Xi)*HSIC(Y,Y)$.

This normalized sensitivity measure is inspired by the distance correlation proposed by Szekely et al. (2007), and the resulting sensitivity indices are easier to interpret since they all fall in the interval $[0,1]$.
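The R2-HSIC ratio can be sketched with the V-statistic estimator tr(K H L H) / n^2, where H is the centering matrix. The model, kernels and helper names below are hypothetical; the point is the normalization, which bounds the index by 1 through the Cauchy-Schwarz inequality.

```r
gram_rbf <- function(x, theta = sd(x)) exp(-outer(x, x, "-")^2 / (2 * theta^2))

# V-statistic estimator of HSIC from two n x n Gram matrices
hsic_v <- function(K, L) {
  n <- nrow(K)
  H <- diag(n) - matrix(1 / n, n, n)   # centering matrix
  sum(diag(K %*% H %*% L %*% H)) / n^2
}

set.seed(7)
n <- 300
X <- matrix(runif(n * 2), n, 2)
Y <- sin(2 * pi * X[, 1]) + 0.1 * rnorm(n)   # only X1 matters (hypothetical model)

KY <- gram_rbf(Y)
R2 <- sapply(1:2, function(i) {
  KX <- gram_rbf(X[, i])
  hsic_v(KX, KY) / sqrt(hsic_v(KX, KX) * hsic_v(KY, KY))
})
```

R2[1] is clearly larger than R2[2], which stays close to 0 since X2 does not enter the model.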

About target HSIC indices (T-HSIC)

T-HSIC indices were designed by Marrel and Chabridon (2021) for target sensitivity analysis (TSA). They are only defined for a scalar output; vector and functional outputs are not supported. The main idea of TSA is to measure the influence of each input variable $Xi$ on a modified version of $Y$. To do so, a preliminary mathematical transform $w$ (called the weight function) is applied to $Y$, and HSIC indices are then estimated with respect to $w(Y)$. Here are two examples of situations where TSA is particularly relevant:

How to measure the impact of $Xi$ on the upper values taken by $Y$ (for example the values above a given threshold $T$ )?
- To answer this question, one may take $w(Y)=Y*1_{Y>T}$ (zero-thresholding).
  This can be specified in sensiHSIC with target=list(c=T, type="zeroTh", upper=TRUE).
How to measure the influence of $Xi$ on the occurrence of the event ${Y<T}$?
- To answer this question, one may take $w(Y)=1_{Y<T}$ (indicator-thresholding).
  This can be specified in sensiHSIC with target=list(c=T, type="indicTh", upper=FALSE).
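The two hard-thresholding weight functions are simple to write down. The helpers w_zeroTh and w_indicTh below are hypothetical (sensiHSIC computes these through its target argument and the function weightTSA), and the threshold is renamed thr to avoid masking R's T. The last lines assume the basic categorical kernel is the indicator 1{y = y'}.

```r
# Hypothetical helpers for the two hard-thresholding weight functions
w_zeroTh <- function(y, thr, upper = TRUE) {
  if (upper) y * (y > thr) else y * (y < thr)
}
w_indicTh <- function(y, thr, upper = TRUE) {
  if (upper) as.numeric(y > thr) else as.numeric(y < thr)
}

y <- c(-1, 0.2, 0.5, 1.3)
thr <- 0.4
w_zeroTh(y, thr)                  # 0.0 0.0 0.5 1.3
w_indicTh(y, thr, upper = FALSE)  # 1 1 0 0

# Categorical kernel on the resulting binary output: 1 if equal, 0 otherwise
wy <- w_indicTh(y, thr, upper = FALSE)
KY <- outer(wy, wy, "==") * 1
```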

In Marrel and Chabridon (2021), the two situations described above are referred to as "hard thresholding". To avoid using discontinuous weight functions, "smooth thresholding" may be used instead.

Spagnol et al. (2019): logistic transformation on both sides of the threshold $T$ .
Marrel and Chabridon (2021): exponential transformation above or below the threshold $T$ .

These two smooth relaxation functions depend on a tuning parameter that helps control smoothness. For further details, the user is invited to consult the documentation of the function weightTSA.

Remarks:

When type="indicTh" (indicator-thresholding), $w(Y)$ becomes a binary random variable. Accordingly, the output kernel selected in kernelY must be the categorical kernel.
In the spirit of R2-HSIC indices, T-HSIC indices can be normalized. The associated normalizing constant is equal to the square root of $HSIC(Xi,Xi)*HSIC(w(Y),w(Y))$ .
T-HSIC indices can be very naturally combined with the HSIC-ANOVA decomposition proposed by Da Veiga (2021). As a consequence, the arguments target and anova in sensiHSIC can be enabled simultaneously. Compared with basic HSIC indices, there are three main differences: the input variables must be mutually independent, ANOVA kernels must be used for all input variables and the output of interest is $w(Y)$ .
T-HSIC indices can be very naturally combined with the tests of independence proposed in testHSIC. In this context, the null hypothesis is $H0$ : " $Xi$ and $w(Y)$ are independent".

About conditional HSIC indices (C-HSIC)

C-HSIC indices were designed by Marrel and Chabridon (2021) for conditional sensitivity analysis (CSA). They are only defined for a scalar output; vector and functional outputs are not supported. The idea is to measure the impact of each input variable $Xi$ on $Y$ when a specific event occurs. This conditioning event is defined on $Y$ by means of a weight function $w$. To compute the conditioning weights, $w$ is applied to the output samples and an empirical normalization is carried out (so that the conditioning weights sum to $1$). The conditioning weights are then combined with the simulated Gram matrices to estimate C-HSIC indices. All formulas can be found in Marrel and Chabridon (2021). Here is an example of a situation where CSA is particularly relevant:

Let us imagine that the event ${Y>T}$ coincides with a system failure.
How to measure the influence of $Xi$ on $Y$ when failure occurs?
- To answer this question, one may take $w(Y) = 1_{Y>T}$ (indicator-thresholding).
  This can be specified in sensiHSIC with cond=list(c=T, type="indicTh", upper=TRUE).
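The empirical normalization of the conditioning weights can be sketched as follows for indicator-thresholding. The output sample is a stand-in; the sketch only computes the weights and the conditioning factor w(Y)/E[w(Y)] reported in the Value section, not the C-HSIC estimator itself.

```r
set.seed(9)
n <- 1000
y <- rnorm(n)                     # stand-in output sample
thr <- 1
w <- as.numeric(y > thr)          # indicator-thresholding with upper = TRUE
cond_weights <- w / sum(w)        # empirical normalization: weights sum to 1
cond_factor <- w / mean(w)        # empirical estimate of w(Y) / E[w(Y)]
```

The two vectors differ only by a factor n: cond_weights equals cond_factor / n.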

The three other weight functions proposed for TSA (namely "zeroTh", "logistic" and "exp1side") can also be used, but their role is less intuitive. See Marrel and Chabridon (2021) for further explanations.

Remarks:

Unlike in TSA, when type="indicTh", the output of interest $Y$ remains a continuous random variable. The categorical kernel is thus inappropriate, and a continuous kernel must be used instead.
In the spirit of R2-HSIC indices, C-HSIC indices can be normalized. The associated normalizing constant is equal to the square root of $C-HSIC(Xi,Xi)*C-HSIC(Y,Y)$ .
Only V-statistics are supported to estimate C-HSIC indices, because the normalized version of C-HSIC indices cannot always be estimated with U-statistics. In particular, the U-statistic estimate of $C-HSIC(Xi,Xi)*C-HSIC(Y,Y)$ may be negative, in which case its square root is undefined.
C-HSIC indices cannot be combined with the HSIC-ANOVA decomposition proposed in Da Veiga (2021). Indeed, the conditioning operation may introduce statistical dependence among the input variables, which rules out HSIC-ANOVA indices. As a consequence, the arguments cond and anova in sensiHSIC cannot be enabled simultaneously.
C-HSIC indices can hardly be combined with the tests of independence proposed in testHSIC. This is only possible if type="indicTh". In this context, the null hypothesis is $H0$: "$Xi$ and $Y$ are independent given that the event described in cond occurs".
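Why U-statistics can go negative while V-statistics cannot is easiest to see on plain (unconditioned) HSIC. The sketch below uses the usual unbiased U-statistic built from Gram matrices with zeroed diagonals; it is an illustration under that assumption, not the C-HSIC estimator of sensiHSIC, and all helper names are hypothetical.

```r
gram_rbf <- function(x, theta = sd(x)) exp(-outer(x, x, "-")^2 / (2 * theta^2))

# Biased but nonnegative V-statistic: tr(K H L H) / n^2
hsic_v <- function(K, L) {
  n <- nrow(K)
  H <- diag(n) - matrix(1 / n, n, n)
  sum(diag(K %*% H %*% L %*% H)) / n^2
}

# Unbiased U-statistic, computed from Gram matrices with zeroed diagonals
hsic_u <- function(K, L) {
  n <- nrow(K)
  diag(K) <- 0
  diag(L) <- 0
  term1 <- sum(K * L)                               # tr(K L) for symmetric K, L
  term2 <- sum(K) * sum(L) / ((n - 1) * (n - 2))
  term3 <- 2 / (n - 2) * sum(K %*% L)               # 1' K L 1
  (term1 + term2 - term3) / (n * (n - 3))
}

set.seed(10)
vals <- replicate(20, {
  x <- runif(50)
  y <- runif(50)                                    # independent of x
  K <- gram_rbf(x)
  L <- gram_rbf(y)
  c(U = hsic_u(K, L), V = hsic_v(K, L))
})
```

Under independence the U-statistic is centered at zero, so some replications come out negative; the V-statistic is a Frobenius inner product of two centered positive semi-definite matrices and therefore never is.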

About HSIC-ANOVA indices

In comparison with HSIC indices, R2-HSIC indices are easier to interpret. However, in terms of interpretability, Sobol' indices remain much more convenient since they can be understood as shares of the total output variance. Such an interpretation is made possible by the Hoeffding decomposition, also known as ANOVA decomposition.

It was proved in Da Veiga (2021) that an ANOVA-like decomposition can be achieved for HSIC indices under certain conditions:

The input variables must be mutually independent (which was not required to compute all other kinds of HSIC indices).
ANOVA kernels must be assigned to all input variables.

This ANOVA setup makes it possible to establish a strict separation between main effects and interaction effects in the HSIC sense. The first-order and total-order HSIC-ANOVA indices are then defined in the same fashion as first-order and total-order Sobol' indices. Note that the HSIC-ANOVA normalizing constant is equal to $HSIC(X,Y)$ and thus differs from the one used for R2-HSIC indices.
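Writing each input kernel as K_i = 1 + k_i, the construction can be summarized as follows. This sketch relies on the additive expansion of a product of ANOVA kernels and on the linearity of HSIC in the input kernel; HSIC_A denotes the HSIC term obtained with the kernel prod_{i in A} k_i, and the exact statements and proofs are in Da Veiga (2021).

```latex
\begin{aligned}
K_X(x, x') &= \prod_{i=1}^{p} \bigl(1 + k_i(x_i, x_i')\bigr)
            = \sum_{A \subseteq \{1,\dots,p\}} \prod_{i \in A} k_i(x_i, x_i'), \\
\mathrm{HSIC}(X, Y) &= \sum_{\emptyset \neq A \subseteq \{1,\dots,p\}} \mathrm{HSIC}_A, \\
S_i &= \frac{\mathrm{HSIC}(X_i, Y)}{\mathrm{HSIC}(X, Y)}, \qquad
S_{T_i} = \frac{\sum_{A \ni i} \mathrm{HSIC}_A}{\mathrm{HSIC}(X, Y)}
        = 1 - \frac{\mathrm{HSIC}(X_{-i}, Y)}{\mathrm{HSIC}(X, Y)}.
\end{aligned}
```

The empty set contributes a constant kernel, whose HSIC is zero, which is why the normalizing constant is HSIC(X,Y) itself.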

For a given probability measure $P$, an ANOVA kernel $K$ is a kernel that can be rewritten as $1+k$, where $k$ is an orthogonal kernel with respect to $P$. Among the well-known parametric families of probability distributions and kernel functions, there are very few examples of orthogonal kernels. One example is given by Sobolev kernels when they are matched with the uniform probability measure on $[0,1]$. See Wahba et al. (1995) for further details on Sobolev kernels.

Moreover, several strategies to construct orthogonal kernels from non-orthogonal kernels are recalled in Da Veiga (2021). One of them consists in translating the feature map so that the resulting kernel becomes centered at the prescribed probability measure $P$ . This can be done analytically for some basic kernels (Gaussian, exponential, Matern $3/2$ and Matern $5/2$ ) when $P$ is the uniform measure on $[0,1]$ . See Section 9 in Ginsbourger et al. (2016) for the corresponding formulas.

In sensiHSIC, ANOVA kernels are only available for the uniform probability measure on $[0,1]$ . This includes the Sobolev kernel with parameter $r=1$ ("sobolev1"), the Sobolev kernel with parameter $r=2$ ("sobolev2"), the transformed Gaussian kernel ("rbf_anova"), the transformed exponential kernel ("laplace_anova"), the transformed Matern $3/2$ kernel ("matern3_anova") and the transformed Matern $5/2$ kernel ("matern5_anova").
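The orthogonality property can be checked numerically for the Sobolev kernel with r=1. The sketch assumes the textbook closed form K(x, y) = 1 + B1(x) B1(y) + B2(|x - y|) / 2 with Bernoulli polynomials B1 and B2; the exact scaling of "sobolev1" inside sensiHSIC may differ.

```r
# Assumed orthogonal part of the Sobolev kernel (r = 1) on [0, 1]
B1 <- function(t) t - 1/2
B2 <- function(t) t^2 - t + 1/6
k_sob1 <- function(x, y) B1(x) * B1(y) + B2(abs(x - y)) / 2

# Integral of k(x, .) against the uniform measure on [0, 1]
orth_integral <- function(x) {
  f <- function(y) k_sob1(x, y)
  integrate(f, 0, x)$value + integrate(f, x, 1)$value   # split at the kink y = x
}

x0 <- c(0.1, 0.3, 0.77, 0.95)
integrals <- sapply(x0, orth_integral)   # all numerically zero
```

A vanishing integral for every x is exactly what makes k orthogonal with respect to the uniform measure, so K = 1 + k is an ANOVA kernel.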

As explained above, HSIC-ANOVA indices can only be computed if all input variables are uniformly distributed on $[0,1]$. Because of this limitation, a preliminary reformulation is needed if the GSA problem involves other input probability distributions: the probability integral transform (PIT) must be applied to each input variable $Xi$, and all quantile functions must be encapsulated in the numerical model, which may require reconsidering how the model is specified. In sensiHSIC, if check=TRUE is selected in anova, it is checked that all input samples lie in $[0,1]$. If this is not the case, a non-parametric rescaling (based on empirical distribution functions) is performed.
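Both routes to uniform inputs can be compared on a toy sample. The empirical rescaling below divides ranks by n + 1 (an assumption made here to stay strictly inside (0, 1); sensiHSIC's exact convention may differ), while the exact PIT uses the known distribution functions.

```r
set.seed(13)
n <- 500
X <- cbind(rnorm(n), rexp(n))             # inputs not supported on [0, 1]

# Non-parametric rescaling: each column through its own empirical CDF
U_hat <- apply(X, 2, function(col) rank(col) / (n + 1))

# Exact PIT when the input distributions are known
U_exact <- cbind(pnorm(X[, 1]), pexp(X[, 2]))
```

With known distributions, the exact transform matches the reformulation used in the examples (a model such as function(U) matyas.fun(qnorm(U))); the empirical version is close to it for moderate n.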

HSIC-ANOVA indices can be used for TSA; the only difference with GSA is the use of a weight function $w$. In contrast, CSA cannot be conducted with HSIC-ANOVA indices, because the conditioning operation may introduce statistical dependence among the input variables, which rules out the HSIC-ANOVA approach.

Value

sensiHSIC returns a list of class "sensiHSIC". It contains all the input arguments detailed before, except sensi which is not kept. It must be noted that some of them might have been altered, corrected or completed.

`kernelX`	A vector of $p$ strings with input kernels.
`paramX`	A vector of $p$ values with input kernel parameters. For each one-parameter kernel, a real number is returned. It is either the original value (if correct), a corrected value (if not) or the default value (computed from a rule of thumb when `NA` is specified). For each parameter-free kernel, `NA` is returned.
`kernelY`	A vector of $q$ strings or a list of options that specifies how the output kernel was constructed. In the case where `kernelY` is a list of options with `method="PCA"`, `kernelY` contains additional information resulting from PCA. If `kernelY` initially contained an option named `"expl.var"`, `kernelY` now also contains an option named `"PC"` that provides the associated number of principal components. If `kernelY` initially contained an option named `"PC"`, `kernelY` now also contains an option named `"expl.var"` that provides the associated percentage of output variance that is explained by PCA. If `kernelY` initially contained an option named `"position"` that was set to `"intern"` or `"extern"`, `kernelY` now contains an option named `"ratios"` that provides the weights used to combine kernels in the reduced subspace given by PCA.
`paramY`	A vector of values with output kernel parameters. Case 1: `kernelY` is a vector of $q$ strings. `paramY` is a vector of $q$ values. For each one-parameter kernel, a real number is returned. It is either the original value (if correct), a corrected value (if not) or the default value (computed with a rule of thumb if `NA` was initially specified). For each parameter-free kernel, `NA` is returned. Case 2: `kernelY` is a list of options with `method="PCA"`. `paramY` is a vector of `PC` values. For this method, recall that all kernels belong to the same family, which is specified by an option named `"fam"` within `kernelY`. For each dimension in the reduced subspace, the kernel parameter is computed (with a rule of thumb) from the corresponding principal component. If the kernel in `fam` is parameter-free, `paramY` is a vector where `NA` is repeated `PC` times. Case 3: `kernelY` is a list of options with `method="DTW"`. `paramY` remains equal to `NA`. Case 4: `kernelY` is a list of options with `method="GAK"`. `paramY` is a vector of $2$ values. For each parameter, the returned value is either the original value (if correct), a corrected value (if not) or the default value (computed with a rule of thumb if `NA` was initially specified).

More importantly, the list of class "sensiHSIC" contains all expected results (output samples, sensitivity measures and conditioning weights).

`call`	The matched call.
`y`	An $n$-row matrix containing all output samples. The $i$-th row in `y` is obtained from the $i$-th row in `X` after computing the model response. If `target` is passed to `sensiHSIC`, output samples in `y` are obtained after applying consecutively `model` and the specified weight function.
`HSICXY`	The estimated HSIC indices.
`S`	The estimated R2-HSIC indices (also called normalized HSIC indices).
`weights`	Only if `cond` is passed to `sensiHSIC`. A vector of $n$ values containing all conditioning weights. In the CSA context, the conditioning factor is defined by $w(Y)/E[w(Y)]$ . See Marrel and Chabridon (2021) for further explanations.

Depending on what is specified in anova, the list of class "sensiHSIC" may also contain the following objects:

`FO`	The estimated first-order HSIC-ANOVA indices.
`TO`	The estimated total-order HSIC-ANOVA indices.
`TO.num`	The estimated numerators of total-order HSIC-ANOVA indices.
`denom`	The estimated common denominator of HSIC-ANOVA indices.

Author(s)

Sebastien Da Veiga, Amandine Marrel, Anouar Meynaoui, Reda El Amri and Gabriel Sarazin.

References

Borgonovo, E. and Plischke, E. (2016), Sensitivity analysis: a review of recent advances, European Journal of Operational Research, 248(3), 869-887.

Cuturi, M., Vert, J. P., Birkenes, O. and Matsui, T. (2007), A kernel for time series based on global alignments, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP'07 (Vol. 2, pp. II-413), IEEE.

Cuturi, M. (2011), Fast global alignment kernels, Proceedings of the 28th International Conference on Machine Learning (ICML-11) (pp. 929-936).

Da Veiga, S. (2015), Global sensitivity analysis with dependence measures, Journal of Statistical Computation and Simulation, 85(7), 1283-1305.

Da Veiga, S. (2021), Kernel-based ANOVA decomposition and Shapley effects: application to global sensitivity analysis, arXiv preprint arXiv:2101.05487.

El Amri, M. R. and Marrel, A. (2021), More powerful HSIC-based independence tests, extension to space-filling designs and functional data. https://cea.hal.science/cea-03406956/

Fukumizu, K., Bach, F. R. and Jordan, M. I. (2004), Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, Journal of Machine Learning Research, 5(Jan), 73-99.

Ginsbourger, D., Roustant, O., Schuhmacher, D., Durrande, N. and Lenz, N. (2016), On ANOVA decompositions of kernels and Gaussian random field paths, Monte Carlo and Quasi-Monte Carlo Methods (pp. 315-330), Springer, Cham.

Gretton, A., Bousquet, O., Smola, A., and Scholkopf, B. (2005), Measuring statistical dependence with Hilbert-Schmidt norms, International Conference on Algorithmic Learning Theory (pp. 63-77), Springer, Berlin, Heidelberg.

Gretton, A., Borgwardt, K., Rasch, M., Scholkopf, B. and Smola, A. (2006), A kernel method for the two-sample-problem, Advances in Neural Information Processing Systems, 19.

Le Maitre, O. and Knio, O. M. (2010), Spectral methods for uncertainty quantification with applications to computational fluid dynamics, Springer Science & Business Media.

Marrel, A. and Chabridon, V. (2021), Statistical developments for target and conditional sensitivity analysis: application on safety studies for nuclear reactor, Reliability Engineering & System Safety, 214, 107711.

Sakoe, H. and Chiba, S. (1978), Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43-49.

Spagnol, A., Le Riche, R. and Da Veiga, S. (2019), Global sensitivity analysis for optimization with variable selection, SIAM/ASA Journal on Uncertainty Quantification, 7(2), 417-443.

Sriperumbudur, B., Fukumizu, K. and Lanckriet, G. (2010), On the relation between universality, characteristic kernels and RKHS embedding of measures, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (pp. 773-780). JMLR Workshop and Conference Proceedings.

Szekely, G. J., Rizzo, M. L. and Bakirov, N. K. (2007), Measuring and testing dependence by correlation of distances, The Annals of Statistics, 35(6), 2769-2794.

Wahba, G., Wang, Y., Gu, C., Klein, R. and Klein, B. (1995), Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy: the 1994 Neyman Memorial Lecture, The Annals of Statistics, 23(6), 1865-1895.

Examples

 

############################
### HSIC indices for GSA ###
############################

# Test case 1: the Friedman function
# --> 5 input variables

### GSA with a given model ###

n <- 800
p <- 5
X <- matrix(runif(n*p), n, p)

kernelX <- c("rbf", "rbf", "laplace", "laplace", "sobolev1")
paramX <- c(0.2, 0.3, 0.4, NA, NA)

# kernel for X1: Gaussian kernel with given parameter 0.2
# kernel for X2: Gaussian kernel with given parameter 0.3
# kernel for X3: exponential kernel with given parameter 0.4
# kernel for X4: exponential kernel with automatic computation of the parameter
# kernel for X5: Sobolev kernel (r=1) with no parameter

kernelY <- "raquad"
paramY <- NA 

sensi <- sensiHSIC(model=friedman.fun, X,
                   kernelX=kernelX, paramX=paramX, 
                   kernelY=kernelY, paramY=paramY)

print(sensi)
plot(sensi)
title("GSA for the Friedman function")

### GSA with given data ###

Y <- friedman.fun(X)
sensi <- sensiHSIC(model=NULL, X,
                   kernelX=kernelX, paramX=paramX, 
                   kernelY=kernelY, paramY=paramY)
tell(sensi, y=Y)

print(sensi)

### GSA from a prior object of class "sensiHSIC" ###

new.sensi <- sensiHSIC(model=friedman.fun, X,
                       kernelX=kernelX, paramX=paramX, 
                       kernelY=kernelY, paramY=paramY,
                       estimator.type="U-stat", 
                       sensi=sensi,
                       save.GM=list(KX=FALSE, KY=FALSE))

print(new.sensi)

# U-statistics are computed without rebuilding all Gram matrices.
# Those Gram matrices are not saved a second time.

##################################
### HSIC-ANOVA indices for GSA ###
##################################

# Test case 2: the Matyas function with Gaussian input variables
# --> 3 input variables (including 1 dummy variable)

n <- 10^3
p <- 3

X <- matrix(rnorm(n*p), n, p)

# The Sobolev kernel (with r=1) is used to achieve the HSIC-ANOVA decomposition.
# Both first-order and total-order HSIC-ANOVA indices are expected.

### AUTOMATIC RESCALING ###

kernelX <- "sobolev1"
anova <- list(obj="both", is.uniform=FALSE)

sensi.A <- sensiHSIC(model=matyas.fun, X, kernelX=kernelX, anova=anova)

print(sensi.A)
plot(sensi.A)
title("GSA for the Matyas function")

### PROBLEM REFORMULATION ###

U <- matrix(runif(n*p), n, p)
new.matyas.fun <- function(U){ matyas.fun(qnorm(U)) }

kernelX <- "sobolev1"
anova <- list(obj="both", is.uniform=TRUE)

sensi.B <- sensiHSIC(model=new.matyas.fun, U, kernelX=kernelX, anova=anova)

print(sensi.B)

####################################
### T-HSIC indices for target SA ###
####################################

# Test case 3: the Sobol function
# --> 8 input variables

n <- 10^3
p <- 8

X <- matrix(runif(n*p), n, p)

kernelY <- "categ"
target <- list(c=0.4, type="indicTh")

sensi <- sensiHSIC(model=sobol.fun, X, kernelY=kernelY, target=target)

print(sensi)
plot(sensi)
title("TSA for the Sobol function")

#########################################
### C-HSIC indices for conditional SA ###
#########################################

# Test case 3: the Sobol function
# --> 8 input variables

n <- 10^3
p <- 8

X <- matrix(runif(n*p), n, p)

cond <- list(c=0.2, type="exp1side", upper=FALSE)

sensi <- sensiHSIC(model=sobol.fun, X, cond=cond)

print(sensi)
plot(sensi)
title("CSA for the Sobol function")

##########################################
### How to deal with discrete outputs? ###
##########################################

# Test case 4: classification of the Ishigami output
# --> 3 input variables
# --> 3 categories

classif <- function(X){
  
  Ytemp <- ishigami.fun(X) 
  Y <- rep(NA, nrow(X))
  Y[Ytemp<0] <- 0
  Y[Ytemp>=0 & Ytemp<10] <- 1                
  Y[Ytemp>=10] <- 2  
  
  return(Y)
  
}

###

n <- 10^3
p <- 3

X <- matrix(runif(n*p, -pi, pi), n, p)

kernelY <- "categ"
paramY <- 0

sensi <- sensiHSIC(model=classif, X, kernelY=kernelY, paramY=paramY)
print(sensi)
plot(sensi)
title("GSA for the classified Ishigami function")

############################################
### How to deal with functional outputs? ###
############################################

# Test case 5: the arctangent temporal function
# --> 3 input variables (including 1 dummy variable)

n <- 500
p <- 3

X <- matrix(runif(n*p,-7,7), n, p)

### with a preliminary dimension reduction by PCA ###

kernelY <- list(method="PCA", 
                data.centering=TRUE, data.scaling=TRUE,
                fam="rbf", expl.var=0.95, combi="sum", position="extern")

sensi <- sensiHSIC(model=atantemp.fun, X, kernelY=kernelY)

print(sensi)
plot(sensi)
title("PCA-based GSA for the arctangent temporal function")

### with a kernel based on dynamic time warping ###

kernelY <- list(method="DTW", fam="rbf")

sensi <- sensiHSIC(model=atantemp.fun, X, kernelY=kernelY)

print(sensi)
plot(sensi)
title("DTW-based GSA for the arctangent temporal function")



### with the global alignment kernel ###

kernelY <- list(method="GAK")

sensi <- sensiHSIC(model=atantemp.fun, X, kernelY=kernelY)

print(sensi)
plot(sensi)
title("GAK-based GSA for the arctangent temporal function")


Computation of the Shapley effects in the Gaussian linear framework with an unknown block-diagonal covariance matrix

Description

shapleyBlockEstimation estimates the Shapley effects of a Gaussian linear model when its parameters are unknown and the number of inputs is large, by choosing the most likely block-diagonal structure of the covariance matrix.

Usage

shapleyBlockEstimationS(Beta, S, kappa=0,  M=20, tol=10^(-6))
shapleyBlockEstimationX(X, Y, delta=NULL, M=20, tol=10^(-6))

Arguments

`Beta`	A vector containing the (estimated) coefficients of the linear model.
`S`	Empirical covariance matrix of the inputs. Must be a positive semi-definite matrix whose size matches the length of `Beta`.
`X`	Matrix containing an i.i.d. sample of the inputs.
`Y`	Vector containing the corresponding i.i.d. sample of the (noisy) output.
`kappa`	The positive penalization coefficient that promotes block-diagonal matrices. It is advised to choose `kappa=0` to get the largest block structure such that the maximal block size is `M`.
`delta`	Positive number that sets the penalization coefficient `kappa` to $1/(p n^{\delta})$. It is advised to choose `delta=2/3` for a positive penalization, or `delta=NULL` to get the largest block structure such that the maximal block size is `M`.
`M`	Maximal size of the estimate of the block-diagonal structure. The computation time grows exponentially with `M`.
`tol`	A relative tolerance to detect zero singular values of the covariance matrix.
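As a worked example of the penalization rule for `delta`, take hypothetical values p = 40 inputs and n = 1000 observations with the advised delta = 2/3. Since 1000^(2/3) = 100, the resulting coefficient is 1/(40 * 100).

```r
p <- 40          # hypothetical number of inputs
n <- 1000        # hypothetical sample size
delta <- 2 / 3
kappa <- 1 / (p * n^delta)   # n^(2/3) = 100, so kappa = 1 / 4000 = 2.5e-4
kappa
```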

Details

If kappa = 0 or if delta = NULL, there is no penalization.

It is advised to choose M smaller than or equal to 20. For M greater than or equal to 25, the computation is very long.
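A rough count illustrates the advice on M, assuming that the cost is driven by the 2^M subsets of the largest block that an exact Shapley computation within that block has to visit. This is a back-of-the-envelope estimate, not a timing of the package.

```r
M <- c(10, 15, 20, 25)
n_subsets <- 2^M                       # number of subsets of a block of size M
setNames(n_subsets, paste0("M=", M))
ratio_25_20 <- 2^25 / 2^20             # 32 times more subsets at M = 25 than at M = 20
```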

Value

shapleyBlockEstimationS and shapleyBlockEstimationX return a list containing the following components:

`label`	a vector containing the label of the group of each input variable.
`S_B`	the block-diagonal estimated covariance matrix of the inputs.
`Shapley`	a vector containing all the estimated Shapley effects.
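Since the estimated blocks are treated as independent groups of inputs, the Shapley effects of the inputs of one block should add up to the share of the variance of Beta^T X carried by that block, assuming the effects are normalized so as to sum to one. The sketch below computes these shares from `S_B` and `label`; `block_shares` is a hypothetical helper and the toy matrix is made up for illustration.

```r
# Hypothetical helper: share of Var(Beta^T X) carried by each block of a
# block-diagonal covariance matrix S_B whose groups are given by 'label'.
block_shares <- function(Beta, S_B, label) {
  total <- drop(t(Beta) %*% S_B %*% Beta)
  sapply(split(seq_along(label), label), function(idx) {
    drop(t(Beta[idx]) %*% S_B[idx, idx, drop = FALSE] %*% Beta[idx]) / total
  })
}

# Toy case: inputs 1 and 3 form one block, input 2 is alone
S_toy <- matrix(c(1,   0, 0.5,
                  0,   2, 0,
                  0.5, 0, 1), nrow = 3, byrow = TRUE)
block_shares(Beta = c(1, 1, 1), S_B = S_toy, label = c(1, 2, 1))
# block "1": 0.6, block "2": 0.4
```

With the objects of the example of this section, `tapply(Estim$Shapley, Estim$label, sum)` can be set against `block_shares(Beta, S_B, Estim$label)`; close but not identical values are to be expected, since the estimated effects are obtained from a noisy sample.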

Author(s)

Baptiste Broto, CEA LIST

References

B. Broto, F. Bachoc, L. Clouvel and J-M. Martinez, 2022, Block-diagonal covariance estimation and application to the Shapley effects in sensitivity analysis, SIAM/ASA Journal on Uncertainty Quantification, 10, 379–403.

B. Broto, F. Bachoc, M. Depecker, and J-M. Martinez, 2019, Sensitivity indices for independent groups of variables, Mathematics and Computers in Simulation, 163, 19–31.

B. Iooss and C. Prieur, 2019, Shapley effects for sensitivity analysis with correlated inputs: comparisons with Sobol' indices, numerical estimation and applications, International Journal of Uncertainty Quantification, 9, 493–514.

A.B. Owen and C. Prieur, 2016, On Shapley value for measuring importance of dependent inputs, SIAM/ASA Journal on Uncertainty Quantification, 5, 986–1002.

Examples


# packages for the plots of the matrices
library(gplots)
library(graphics)


# the following function improves the plots of the matrices
sig=function(x,alpha=0.4)
{
  return(1/(1+exp(-x/alpha)))
}


# 1) we generate the parameters by groups in order

K=4 # number of groups

pk=rep(0,K)
for(k in 1:K)
{
  pk[k]=round(6+4*runif(1))
}
p=sum(pk)
Sigma_ord=matrix(0,nrow=p, ncol=p)
ind_min=0
L=5
for(k in 1:K)
{
  p_k=pk[k]
  ind=ind_min+(1:p_k)
  ind_min=ind_min+p_k
  
  A=2*matrix(runif(p_k*L),nrow=L,ncol=p_k)-1
  Sigma_ord[ind,ind]=t(A)%*%A + 0.2*diag(rep(1,p_k))
}


image((0:p)+0.5,(0:p)+0.5,z=sig(Sigma_ord),col=cm.colors(100), zlim=c(0,1),
      ylim=c(p+0.5,0.5), main=expression(Sigma["order"]), 
      cex.main=3,ylab = "", xlab = "",axes=FALSE)
box()


Beta_ord=3*runif(p)+1
eta_ord=shapleyLinearGaussian(Beta=Beta_ord, Sigma=Sigma_ord)
barplot(eta_ord,main=expression(eta["order"]),cex.axis = 2,cex.main=3)


# 2) We randomly permute the input variables to obtain a more general input ordering

samp=sample(1:p)
Sigma=Sigma_ord[samp,samp]

image((0:p)+0.5,(0:p)+0.5,z=sig(Sigma),col=cm.colors(100), zlim=c(0,1),
      ylim=c(p+0.5,0.5), main=expression(Sigma), 
      cex.main=3,ylab = "",xlab = "",axes=FALSE)
box()


Beta=Beta_ord[samp]
eta=shapleyLinearGaussian(Beta=Beta, Sigma=Sigma)
barplot(eta,main=expression(eta),cex.axis = 2,cex.main=3)




# 3) We generate the observations with these parameters

n=5*p #sample size


C=chol(Sigma)
X0=matrix(rnorm(p*n),ncol=p)
X=X0%*%C

S=var(X) #empirical covariance matrix
image((0:p)+0.5,(0:p)+0.5,z=sig(S),col=cm.colors(100), zlim=c(0,1),
      ylim=c(p+0.5,0.5), main=expression(S), 
      cex.main=3,ylab = "", xlab = "",axes=FALSE)
box()

beta0=rnorm(1)
Y=X%*%as.matrix(Beta)+beta0+0.2*rnorm(n) # one noise draw per observation



# 4) We estimate the block-diagonal covariance matrix 
# and the Shapley effects using the observations
# We assume that the group sizes are known to be smaller than 15

Estim=shapleyBlockEstimationX(X,Y,delta=3/4, M=15, tol=10^(-6))

eta_hat=Estim$Shapley
S_B=Estim$S_B

image((0:p)+0.5,(0:p)+0.5,z=sig(S_B),col=cm.colors(100), zlim=c(0,1),
      ylim=c(p+0.5,0.5), main=expression(S[hat(B)]), 
      cex.main=3,ylab = "",xlab = "",axes=FALSE)
box()

barplot(eta_hat,main=expression(hat(eta)),cex.axis = 2,cex.main=3)


sum(abs(eta_hat-eta))

Computation of the Shapley effects in the linear Gaussian framework

Description

shapleyLinearGaussian computes the Shapley effects in the linear Gaussian framework from the coefficients of the linear model (excluding the intercept) and the covariance matrix of the inputs. It exploits the block-diagonal structure of the covariance matrix (Broto et al., 2019), which makes high-dimensional cases (more than 25 inputs) tractable. A warning is issued if a block has dimension greater than 25.

Usage

shapleyLinearGaussian(Beta, Sigma, tol=10^(-6))

Arguments

`Beta`	a vector containing the coefficients of the linear model (excluding the intercept).
`Sigma`	covariance matrix of the inputs. Must be a positive semi-definite matrix whose dimension matches the length of `Beta`.
`tol`	a relative tolerance to detect zero singular values of Sigma.

Value

shapleyLinearGaussian returns a numeric vector containing all the Shapley effects.
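For small dimensions, the Shapley effects of a Gaussian linear model can be checked by brute force, enumerating all subsets of inputs and using the fact that, for Gaussian inputs, Var(Y | X_u) does not depend on the conditioning values. The sketch below is a hypothetical reference implementation, not the block-diagonal algorithm used by shapleyLinearGaussian; it assumes a positive-definite Sigma, and its cost grows like 2^p, so it is only meant for a handful of inputs.

```r
# Hypothetical brute-force reference (not part of 'sensitivity'): exact Shapley
# effects of Y = Beta^T X with X ~ N(mu, Sigma), Sigma positive definite.
shapley_gauss_bruteforce <- function(Beta, Sigma) {
  p <- length(Beta)
  varY <- drop(t(Beta) %*% Sigma %*% Beta)
  # E[Var(Y | X_u)], computed from the Gaussian conditional covariance
  cond_var <- function(u) {
    if (length(u) == 0) return(varY)
    if (length(u) == p) return(0)
    v <- setdiff(seq_len(p), u)
    S_vu <- Sigma[v, u, drop = FALSE]
    S_cond <- Sigma[v, v, drop = FALSE] -
      S_vu %*% solve(Sigma[u, u, drop = FALSE], t(S_vu))
    drop(t(Beta[v]) %*% S_cond %*% Beta[v])
  }
  sh <- numeric(p)
  for (j in seq_len(p)) {
    others <- setdiff(seq_len(p), j)
    for (k in 0:length(others)) {
      w <- factorial(k) * factorial(p - k - 1) / factorial(p)  # Shapley weight
      # index into 'others' to avoid combn()'s special case for a single number
      subsets <- if (k == 0) list(integer(0)) else
        lapply(combn(length(others), k, simplify = FALSE), function(id) others[id])
      for (u in subsets) {
        # variance explained by adding input j to the subset u
        sh[j] <- sh[j] + w * (cond_var(u) - cond_var(c(u, j)))
      }
    }
  }
  sh / varY
}

# Independent inputs: the effect of X_j reduces to Beta_j^2 Var(X_j) / Var(Y)
shapley_gauss_bruteforce(Beta = c(1, 1, 1), Sigma = diag(c(1, 4, 9)))
# 1/14, 4/14, 9/14
```

For a few correlated inputs, the result can be compared with `shapleyLinearGaussian(Beta, Sigma)`, which is expected to agree up to numerical tolerance.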

Author(s)

Baptiste Broto

References

B. Broto, F. Bachoc, M. Depecker, and J-M. Martinez, 2019, Sensitivity indices for independent groups of variables, Mathematics and Computers in Simulation, 163, 19–31.

A.B. Owen and C. Prieur, 2016, On Shapley value for measuring importance of dependent inputs, SIAM/ASA Journal on Uncertainty Quantification, 5, 986–1002.

Examples


library(MASS)
library(igraph)

# First example:

p=5 #dimension
A=matrix(rnorm(p^2),nrow=p,ncol=p)
Sigma=t(A)%*%A
Beta=runif(p)
Shapley=shapleyLinearGaussian(Beta,Sigma)
plot(Shapley)


# Second Example, block-diagonal:

K=5 #number of groups
m=5 # number of variables in each group
p=K*m
Sigma=matrix(0,ncol=p,nrow=p)

for(k in 1:K)
{
  A=matrix(rnorm(m^2),nrow=m,ncol=m)
  Sigma[(m*(k-1)+1):(m*k),(m*(k-1)+1):(m*k)]=t(A)%*%A
}
# we randomly permute the variables:
samp=sample(1:p,p)
Sigma=Sigma[samp,samp]

Beta=runif(p)
Shapley=shapleyLinearGaussian(Beta,Sigma)
plot(Shapley)

Estimation of Shapley effects by examining all permutations of inputs (algorithm of Song et al., 2016), for independent or dependent inputs

Description

shapleyPermEx implements the Monte Carlo estimation of the Shapley effects (Owen, 2014) and their standard errors by examining all permutations of inputs (Song et al., 2016; Iooss and Prieur, 2019). It also estimates the full first-order and independent total Sobol' indices (Mara et al., 2015). All these sensitivity indices can also be estimated when the inputs are dependent. The total cost of the algorithm is $Nv + d! \times (d-1) \times No \times Ni$ model evaluations.
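The cost formula can be evaluated directly to anticipate the run time. The hypothetical helper `cost_perm_ex` below simply codes the expression above; with the settings of the Ishigami example of this section it gives 46,000 model evaluations, and the d! factor makes the exhaustive algorithm impractical beyond a handful of inputs.

```r
# Number of model evaluations of shapleyPermEx, following the stated formula
# Nv + d! * (d - 1) * No * Ni (hypothetical helper, not part of the package).
cost_perm_ex <- function(d, Nv, No, Ni = 3) {
  Nv + factorial(d) * (d - 1) * No * Ni
}

# Settings of the Ishigami example: d = 3, Nv = 1e4, No = 1e3, Ni = 3
cost_perm_ex(d = 3, Nv = 1e4, No = 1e3)   # 46000 evaluations
# The d! factor quickly dominates as the number of inputs grows
sapply(3:8, cost_perm_ex, Nv = 1e4, No = 1e3)
```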

Usage

shapleyPermEx(model = NULL, Xall, Xset, d, Nv, No, Ni = 3, colnames = NULL, ...)
## S3 method for class 'shapleyPermEx'
tell(x, y = NULL, return.var = NULL, ...)
## S3 method for class 'shapleyPermEx'
print(x, ...)
## S3 method for class 'shapleyPermEx'
plot(x, ylim = c(0, 1), ...)
## S3 method for class 'shapleyPermEx'
ggplot(data, mapping = aes(), ylim = c(0, 1), title = NULL,
                 ..., environment = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`Xall`	Xall(n) is a function that generates an n-sample of the d-dimensional input vector (following the required joint distribution).
`Xset`	Xset(n, Sj, Sjc, xjc) is a function that generates an n-sample of the inputs with indices in Sj, conditional on the inputs with indices in Sjc taking the values xjc (following the required joint distribution).
`d`	number of inputs.
`Nv`	Monte Carlo sample size to estimate the output variance.
`No`	Outer Monte Carlo sample size to estimate the expectation of the conditional variance of the model output.
`Ni`	Inner Monte Carlo sample size to estimate the conditional variance of the model output.
`colnames`	Optional: A vector containing the names of the inputs.
`x`	a list of class `"shapleyPermEx"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"shapleyPermEx"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`return.var`	a vector of character strings giving further internal variables names to store in the output object `x`.
`ylim`	y-coordinate plotting limits.
`title`	a title of the plot with ggplot() function.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called.
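The conditional sampler passed as `Xset` in the Gaussian example of this section relies on condMVNorm::rcmvnorm. To show what such a function has to compute, here is a minimal base-R sketch based on the usual conditional mean and covariance of a multivariate normal vector. `make_gauss_Xset` is hypothetical; it assumes, as in the example, that xjc is a single vector of conditioning values and that Covmat is positive definite.

```r
# Hypothetical base-R conditional sampler usable as 'Xset' for Gaussian inputs
# X ~ N(mu, Covmat), with Covmat positive definite. It draws n values of
# X[Sj] given X[Sjc] = xjc, where xjc is a single vector of conditioning values.
make_gauss_Xset <- function(mu, Covmat) {
  function(n, Sj, Sjc, xjc) {
    if (is.null(Sjc) || length(Sjc) == 0) {
      m <- mu[Sj]                               # unconditional mean
      S <- Covmat[Sj, Sj, drop = FALSE]         # unconditional covariance
    } else {
      # regression coefficients of X[Sj] on X[Sjc]
      K <- Covmat[Sj, Sjc, drop = FALSE] %*% solve(Covmat[Sjc, Sjc, drop = FALSE])
      m <- mu[Sj] + drop(K %*% (as.numeric(xjc) - mu[Sjc]))  # conditional mean
      S <- Covmat[Sj, Sj, drop = FALSE] - K %*% Covmat[Sjc, Sj, drop = FALSE]
    }
    Z <- matrix(rnorm(n * length(Sj)), nrow = n)
    # rows of Z %*% chol(S) have covariance S; then add the mean column-wise
    sweep(Z %*% chol(S), 2, m, "+")
  }
}

# Usage with the covariance of the Gaussian example of this section
mu <- rep(0, 3)
Covmat <- (c(1, 1, 2) %*% t(c(1, 1, 2))) * matrix(c(1, 0, 0, 0, 1, 0.9, 0, 0.9, 1), 3, 3)
Xset <- make_gauss_Xset(mu, Covmat)
head(Xset(5, Sj = 3, Sjc = 2, xjc = 1))  # X3 | X2 = 1: mean 1.8, variance 0.76
```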

Details

This function requires R package "gtools".

The default value Ni = 3 is the optimal one obtained by the theoretical analysis of Song et al. (2016).

The computations of the standard errors (and hence of the confidence intervals) come from Iooss and Prieur (2019). Based on the outer Monte Carlo loop (estimation of the expectation of the conditional variance), the variance of the Monte Carlo estimate is divided by No. The standard error is then averaged over the exact permutation loop. The 95% confidence intervals correspond to ±1.96 standard errors.

Value

shapleyPermEx returns a list of class "shapleyPermEx", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	the response used.
`E`	the estimation of the output mean.
`V`	the estimation of the output variance.
`Shapley`	the estimations of the Shapley effects.
`SobolS`	the estimations of the full first-order Sobol' indices.
`SobolT`	the estimations of the independent total sensitivity Sobol' indices.

Further output variables can be requested with the argument return.var (for example, the list of permutations perms).

Author(s)

Bertrand Iooss, Eunhye Song, Barry L. Nelson, Jeremy Staum

References

T. Mara, S. Tarantola, P. Annoni, 2015, Non-parametric methods for global sensitivity analysis of model output with dependent inputs, Environmental Modelling & Software, 72, 173–183.

A.B. Owen, 2014, Sobol' indices and Shapley value, SIAM/ASA Journal on Uncertainty Quantification, 2, 245–251.

A.B. Owen and C. Prieur, 2016, On Shapley value for measuring importance of dependent inputs, SIAM/ASA Journal on Uncertainty Quantification, 5, 986–1002.

E. Song, B.L. Nelson, and J. Staum, 2016, Shapley effects for global sensitivity analysis: Theory and computation, SIAM/ASA Journal on Uncertainty Quantification, 4, 1060–1083.

Examples




##################################
# Test case : the Ishigami function (3 uniform independent inputs)
# See Iooss and Prieur (2019)

library(gtools)

d <- 3
Xall <- function(n) matrix(runif(d*n,-pi,pi),nc=d)
Xset <- function(n, Sj, Sjc, xjc) matrix(runif(n*length(Sj),-pi,pi),nc=length(Sj))

x <- shapleyPermEx(model = ishigami.fun, Xall=Xall, Xset=Xset, d=d, Nv=1e4, No = 1e3, Ni = 3)
print(x)
plot(x)

library(ggplot2)
ggplot(x)

##################################
# Test case : Linear model (3 Gaussian inputs including 2 dependent)
# See Iooss and Prieur (2019)

library(ggplot2)
library(gtools)
library(mvtnorm) # Multivariate Gaussian variables
library(condMVNorm) # Conditional multivariate Gaussian variables

modlin <- function(X) apply(X,1,sum)

d <- 3
mu <- rep(0,d)
sig <- c(1,1,2)
ro <- 0.9
Cormat <- matrix(c(1,0,0,0,1,ro,0,ro,1),d,d)
Covmat <- ( sig %*% t(sig) ) * Cormat

Xall <- function(n) mvtnorm::rmvnorm(n,mu,Covmat)

Xset <- function(n, Sj, Sjc, xjc){
  if (is.null(Sjc)){
    if (length(Sj) == 1){ rnorm(n,mu[Sj],sqrt(Covmat[Sj,Sj]))
    } else{ mvtnorm::rmvnorm(n,mu[Sj],Covmat[Sj,Sj])}
  } else{ condMVNorm::rcmvnorm(n, mu, Covmat, dependent.ind=Sj, given.ind=Sjc, 
                                X.given=xjc)}}

x <- shapleyPermEx(model = modlin, Xall=Xall, Xset=Xset, d=d, Nv=1e4, 
                    No = 1e3, Ni = 3)
print(x)
ggplot(x)

Estimation of Shapley effects by random permutations of inputs (algorithm of Song et al., 2016), for independent or dependent inputs

Description

shapleyPermRand implements the Monte Carlo estimation of the Shapley effects (Owen, 2014) and their standard errors by randomly sampling permutations of inputs (Song et al., 2016). It also estimates the full first-order and independent total Sobol' indices (Mara et al., 2015), together with their standard errors. All these sensitivity indices can also be estimated when the inputs are dependent. The total cost of the algorithm is $Nv + m \times (d-1) \times No \times Ni$ model evaluations.
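The random-permutation principle can be illustrated on a Gaussian linear model, where the conditional variances are available in closed form. In the sketch below, `shapley_perm_rand_gauss` is hypothetical: it averages the marginal contribution of each input over m random permutations, in the spirit of Song et al. (2016), but computes each conditional variance exactly instead of through the No and Ni Monte Carlo loops used by shapleyPermRand. A positive-definite Sigma is assumed.

```r
# Hypothetical sketch of Shapley estimation by random permutations for
# Y = Beta^T X, X ~ N(mu, Sigma), with exact conditional variances.
shapley_perm_rand_gauss <- function(Beta, Sigma, m) {
  p <- length(Beta)
  varY <- drop(t(Beta) %*% Sigma %*% Beta)
  cond_var <- function(u) {          # E[Var(Y | X_u)]
    if (length(u) == 0) return(varY)
    if (length(u) == p) return(0)
    v <- setdiff(seq_len(p), u)
    S_vu <- Sigma[v, u, drop = FALSE]
    S_cond <- Sigma[v, v, drop = FALSE] -
      S_vu %*% solve(Sigma[u, u, drop = FALSE], t(S_vu))
    drop(t(Beta[v]) %*% S_cond %*% Beta[v])
  }
  contrib <- matrix(0, nrow = m, ncol = p)
  for (l in seq_len(m)) {
    perm <- sample.int(p)            # one random ordering of the inputs
    prev <- varY                     # value for the empty set
    for (k in seq_len(p)) {
      cur <- cond_var(perm[seq_len(k)])
      contrib[l, perm[k]] <- prev - cur  # marginal contribution of perm[k]
      prev <- cur
    }
  }
  colMeans(contrib) / varY           # average over permutations, normalized
}

# Linear example of this section: Y = X1 + X2 + X3 with Cor(X2, X3) = 0.9
Covmat <- (c(1, 1, 2) %*% t(c(1, 1, 2))) * matrix(c(1, 0, 0, 0, 1, 0.9, 0, 0.9, 1), 3, 3)
set.seed(1)
shapley_perm_rand_gauss(rep(1, 3), Covmat, m = 1000)
```

The values can be compared with the output of shapleyPermRand on the same model; since X1 is independent of the other inputs, its effect here is exactly 1/9.6 whatever the sampled permutations.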

Usage

shapleyPermRand(model = NULL, Xall, Xset, d, Nv, m, No = 1, Ni = 3, 
                colnames = NULL, ...)
## S3 method for class 'shapleyPermRand'
tell(x, y = NULL, return.var = NULL, ...)
## S3 method for class 'shapleyPermRand'
print(x, ...)
## S3 method for class 'shapleyPermRand'
plot(x, ylim = c(0, 1), ...)
## S3 method for class 'shapleyPermRand'
ggplot(data, mapping = aes(), ylim = c(0, 1), 
                          title = NULL, ..., environment = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`Xall`	Xall(n) is a function that generates an n-sample of the d-dimensional input vector (following the required joint distribution).
`Xset`	Xset(n, Sj, Sjc, xjc) is a function that generates an n-sample of the inputs with indices in Sj, conditional on the inputs with indices in Sjc taking the values xjc (following the required joint distribution).
`d`	number of inputs.
`Nv`	Monte Carlo sample size to estimate the output variance.
`m`	Number of randomly sampled permutations.
`No`	Outer Monte Carlo sample size to estimate the expectation of the conditional variance of the model output.
`Ni`	Inner Monte Carlo sample size to estimate the conditional variance of the model output.
`colnames`	Optional: A vector containing the names of the inputs.
`x`	a list of class `"shapleyPermRand"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"shapleyPermRand"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`return.var`	a vector of character strings giving further internal variables names to store in the output object `x`.
`ylim`	y-coordinate plotting limits.
`title`	a title of the plot with ggplot() function.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called.

Details

This function requires R package "gtools".

The default values No = 1 and Ni = 3 are the optimal ones obtained by the theoretical analysis of Song et al. (2016).

The standard errors do not account for the samples used to estimate the expectations of conditional variances. They only reflect the random permutations and are based on the variance of the Monte Carlo estimates divided by m. The 95% confidence intervals correspond to ±1.96 standard errors.
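One reading of the standard-error computation described above is sketched below: the m per-permutation estimates of one Shapley effect are averaged, the standard error is the square root of their variance divided by m, and the interval spans ±1.96 standard errors. `perm_se_ci` is a hypothetical helper and the input values are simulated, not package output.

```r
# Hypothetical illustration (simulated numbers, not package output): mean of
# the m per-permutation estimates, standard error sqrt(var / m), and 95%
# confidence interval estimate +/- 1.96 standard errors.
perm_se_ci <- function(est_per_perm, z = 1.96) {
  m <- length(est_per_perm)
  est <- mean(est_per_perm)
  se <- sqrt(var(est_per_perm) / m)
  c(estimate = est, se = se, lower = est - z * se, upper = est + z * se)
}

set.seed(1)
perm_se_ci(rnorm(1000, mean = 0.3, sd = 0.2))  # simulated per-permutation values
```

Because the standard error shrinks like 1/sqrt(m), halving the interval width requires about four times as many permutations.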

Value

shapleyPermRand returns a list of class "shapleyPermRand", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	the response used.
`E`	the estimation of the output mean.
`V`	the estimation of the output variance.
`Shapley`	the estimations of the Shapley effects.
`SobolS`	the estimations of the full first-order Sobol' indices.
`SobolT`	the estimations of the independent total sensitivity Sobol' indices.

Further output variables can be requested with the argument return.var (for example, the list of permutations perms).

Author(s)

Bertrand Iooss, Eunhye Song, Barry L. Nelson, Jeremy Staum

References

T. Mara, S. Tarantola, P. Annoni, 2015, Non-parametric methods for global sensitivity analysis of model output with dependent inputs, Environmental Modelling & Software, 72, 173–183.

A.B. Owen, 2014, Sobol' indices and Shapley value, SIAM/ASA Journal on Uncertainty Quantification, 2, 245–251.

A.B. Owen and C. Prieur, 2016, On Shapley value for measuring importance of dependent inputs, SIAM/ASA Journal on Uncertainty Quantification, 5, 986–1002.

E. Song, B.L. Nelson, and J. Staum, 2016, Shapley effects for global sensitivity analysis: Theory and computation, SIAM/ASA Journal on Uncertainty Quantification, 4, 1060–1083.

Examples




##################################
# Test case : the Ishigami function
# See Iooss and Prieur (2019)

library(gtools)

d <- 3
Xall <- function(n) matrix(runif(d*n,-pi,pi),nc=d)
Xset <- function(n, Sj, Sjc, xjc) matrix(runif(n*length(Sj),-pi,pi),nc=length(Sj))

x <- shapleyPermRand(model = ishigami.fun, Xall=Xall, Xset=Xset, d=d, Nv=1e4, 
                      m=1e4, No = 1, Ni = 3)
print(x)
plot(x)

library(ggplot2)
ggplot(x)

##################################
# Test case : Linear model (3 Gaussian inputs including 2 dependent)
# See Iooss and Prieur (2019)

library(ggplot2)
library(gtools)
library(mvtnorm) # Multivariate Gaussian variables
library(condMVNorm) # Conditional multivariate Gaussian variables

modlin <- function(X) apply(X,1,sum)

d <- 3
mu <- rep(0,d)
sig <- c(1,1,2)
ro <- 0.9
Cormat <- matrix(c(1,0,0,0,1,ro,0,ro,1),d,d)
Covmat <- ( sig %*% t(sig) ) * Cormat

Xall <- function(n) mvtnorm::rmvnorm(n,mu,Covmat)

Xset <- function(n, Sj, Sjc, xjc){
  if (is.null(Sjc)){
    if (length(Sj) == 1){ rnorm(n,mu[Sj],sqrt(Covmat[Sj,Sj]))
    } else{ mvtnorm::rmvnorm(n,mu[Sj],Covmat[Sj,Sj])}
  } else{ condMVNorm::rcmvnorm(n, mu, Covmat, dependent.ind=Sj, given.ind=Sjc, 
                                X.given=xjc)}}

m <- 1e3 # set m = 1e4 for more precise results
x <- shapleyPermRand(model = modlin, Xall=Xall, Xset=Xset, d=d, Nv=1e3, m = m, 
                      No = 1, Ni = 3)
print(x)
ggplot(x)

Data given Shapley effects estimation via nearest-neighbors procedure

Description

shapleysobol_knn implements the estimation of several sensitivity indices using only N model evaluations via ranking (following Gamboa et al. (2022) and Chatterjee (2021)) or nearest-neighbour search (Broto et al. (2020) and Azadkia & Chatterjee (2021)). Parallel computation is available to accelerate the estimation. It can be used with categorical inputs (which are transformed with one-hot encoding), dependent inputs and multiple outputs. Sensitivity indices of any group of inputs can be computed, so that in particular (full) first-order Sobol' indices, (independent) total Sobol' indices and Shapley effects are accessible. For large sample sizes, the nearest-neighbour algorithm can be significantly accelerated by using approximate nearest-neighbour search. Shapley effects can also be estimated with the random permutation approach of Castro et al. (2009), where all the terms are obtained by ranking or nearest neighbours.

Usage

shapleysobol_knn(model=NULL, X, method = "knn", n.knn = 2, n.limit = 2000, 
          U = NULL, n.perm = NULL, noise = F, rescale = F, nboot = NULL, 
          boot.level = 0.8, conf=0.95, parl=NULL, ...)
## S3 method for class 'shapleysobol_knn'
tell(x, y, ...)
## S3 method for class 'shapleysobol_knn'
extract(x, ...)
## S3 method for class 'shapleysobol_knn'
print(x, ...)
## S3 method for class 'shapleysobol_knn'
plot(x, ylim = c(0,1), ...)
## S3 method for class 'shapleysobol_knn'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., 
                environment = parent.frame())
## S3 method for class 'sobol_knn'
print(x, ...)
## S3 method for class 'sobol_knn'
plot(x, ylim = c(0,1), ...)

Arguments

`model`	a function defining the model to analyze, taking X as an argument.
`X`	a matrix or data frame containing the observed inputs.
`method`	the algorithm to be used for estimation, either "rank" or "knn", see details. Default is `method="knn"`.
`n.knn`	the number of nearest neighbours used for estimation.
`n.limit`	sample size limit above which approximate nearest neighbour search is activated.
`U`	an integer equal to 0 (total Sobol' indices) or 1 (first-order Sobol' indices), or a list of index vectors defining the subsets of inputs whose sensitivity indices must be computed, or a matrix of 0s and 1s where each row encodes a subset of inputs whose sensitivity indices must be computed (see examples). Default value is `NULL`, meaning that Shapley effects are returned (see details).
`n.perm`	an integer, indicating the number of random permutations used for the Shapley effects' estimation. Default is `n.perm=NULL`, indicating that all possible permutations are used.
`noise`	a logical which is TRUE if the model or the output sample is noisy. See details.
`rescale`	a logical indicating if continuous inputs must be rescaled before distance computations. If TRUE, continuous inputs are first whitened with the ZCA-cor whitening procedure (cf. whiten() function in package `whitening`). If the inputs are independent, this first step will have a very limited impact. Then, the resulting whitened inputs are individually modified via a copula transform such that each input has the same scale.
`nboot`	the number of bootstrap resamples for the bootstrap estimate of confidence intervals. See details.
`boot.level`	a numeric between 0 and 1 for the proportion of the bootstrap sample size.
`conf`	the confidence level of the bootstrap confidence intervals.
`parl`	number of cores on which to parallelize the computation. If `NULL`, then no parallelization is done.
`x`	the object returned by `shapleysobol_knn`.
`data`	the object returned by `shapleysobol_knn`.
`y`	a numeric univariate vector containing the observed outputs.
`ylim`	the y-coordinate limits for plotting.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	additional arguments to be passed to `model`, or to the methods, such as graphical parameters (see `par`).
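The two steps described for `rescale` can be sketched as follows, using the standard definition of ZCA-cor whitening, W = P^{-1/2} V^{-1/2}, with P the correlation matrix and V the diagonal matrix of variances, followed by a rank-based copula transform. This is a hypothetical illustration, not the package's internal code; `zca_cor_whiten` and `copula_transform` are made-up names.

```r
# Hypothetical sketch of the two 'rescale' steps (not the package's code).
# Symmetric inverse square root of a positive-definite matrix
inv_sqrt_sym <- function(M) {
  e <- eigen(M, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}
# Step 1, ZCA-cor whitening: W = P^{-1/2} V^{-1/2}
zca_cor_whiten <- function(X) {
  X <- as.matrix(X)
  W <- inv_sqrt_sym(cor(X)) %*% diag(1 / apply(X, 2, sd), nrow = ncol(X))
  Xc <- sweep(X, 2, colMeans(X))      # center each column
  Xc %*% t(W)                         # rows are W %*% (x - mean)
}
# Step 2, copula transform: each column is replaced by its normalized ranks,
# so that every input lies in (0, 1) on a common scale
copula_transform <- function(Z) {
  apply(Z, 2, function(z) rank(z) / (length(z) + 1))
}

set.seed(2)
X <- matrix(rnorm(2000), ncol = 2) %*% matrix(c(1, 0.8, 0, 0.6), 2, 2)
Z <- zca_cor_whiten(X)
round(cov(Z), 10)                     # identity matrix, up to rounding
U <- copula_transform(Z)
```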

Details

It is the same algorithm as sobolshap_knn with method = "knn", with a slight computational improvement (the search for weight assignments is done on much smaller matrices, stored in a list indexed by their order), plus the ability to perform parallel computation and bootstrap confidence interval estimation.

The distance used for subsets with mixed inputs (continuous and categorical) is the Euclidean distance, thanks to a one-hot encoding of categorical inputs.
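A minimal illustration of this one-hot encoding, using base R's model.matrix: each level of a factor becomes a 0/1 indicator column, so two observations with different levels are at Euclidean distance sqrt(2) in the indicator columns. The exact encoding used inside the package (for example, any scaling of the indicator columns) is not shown here and is an assumption.

```r
# Hypothetical illustration: a factor is expanded into 0/1 indicator columns,
# so Euclidean distances can mix it with continuous inputs.
df <- data.frame(x1 = c(0.1, 0.2, 0.1),
                 f  = factor(c("a", "b", "c")))
Xenc <- cbind(x1 = df$x1,
              model.matrix(~ f - 1, data = df))  # one column per level of f
dist(Xenc)                                       # Euclidean by default
```

Rows 1 and 3 differ only by their level of `f`, so their distance is sqrt(2); rows 1 and 2 additionally differ by 0.1 in `x1`.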

If the number of cores passed to the parl argument exceeds the number available on the machine, it defaults to the number of available cores minus one.

If argument U is specified, only the estimated first-order or total Sobol' indices, or the estimated closed Sobol' indices of the selected subsets, are returned. The Shapley effects are then neither computed nor returned.

The extract method can be used for extracting first-order and total Sobol' indices, after the Shapley effects have been computed. It returns a list containing both sensitivity indices.
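The ranking approach mentioned in the Description can be illustrated for a single input with a few lines of base R: the sample is sorted by X_i, each output is paired with the output of the next observation in that ordering, and the empirical covariance of these pairs estimates the variance of E[Y | X_i]. This is a hypothetical sketch in the spirit of Gamboa et al. (2022) and Chatterjee (2021), not the package's internal code; it assumes a scalar output and uses the circular convention for the last observation.

```r
# Hypothetical rank-based first-order Sobol' estimator (not the package's code)
first_order_rank <- function(x, y) {
  n <- length(y)
  stopifnot(n >= 2, length(x) == n)
  ord <- order(x)                  # indices sorted by increasing x
  succ <- integer(n)
  succ[ord] <- ord[c(2:n, 1)]      # successor of each observation (circular)
  (mean(y * y[succ]) - mean(y)^2) / (mean(y^2) - mean(y)^2)
}

set.seed(123)
n <- 10000
X <- matrix(runif(2 * n), ncol = 2)
y <- X[, 1] + X[, 2]               # additive model: S1 = S2 = 0.5
c(S1 = first_order_rank(X[, 1], y), S2 = first_order_rank(X[, 2], y))
```

Only one sort per input is needed, which is what makes ranking attractive for first-order indices; subsets of several inputs require the nearest-neighbour search instead.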

Value

shapleysobol_knn returns a list of class "shapleysobol_knn" if U=NULL, containing the following components:

`call`	the matched call.
`Shap`	the estimations of the Shapley effect indices.
`VE`	the estimations of the closed Sobol' indices for all possible sub-models.
`indices`	list of all subsets corresponding to the structure of VE.
`method`	which estimation method has been used.
`n.perm`	number of random permutations.
`w`	the Shapley weights.
`conf_int`	a matrix containing the estimations, bias and confidence intervals by bootstrap (if `nboot>0`).
`X`	the observed covariates.
`y`	the observed outcomes.
`n.knn`	value of the `n.knn` argument.
`n.limit`	value of the `n.limit` argument, i.e. the sample size above which approximate nearest-neighbour search is used.
`U`	value of the `U` argument.
`rescale`	whether the design matrix has been rescaled.
`boot.level`	value of the `boot.level` argument.
`noise`	whether the Shapley values must sum up to one or not.
`boot`	logical, whether bootstrap confidence interval estimates have been performed.
`nboot`	value of the `nboot` argument.
`parl`	value of the `parl` argument.
`conf`	value of the `conf` argument.

shapleysobol_knn returns a list of class "sobol_knn" if U is specified, containing the following components:

`call`	the matched call.
`Sobol`	the estimations of the Sobol' indices.
`indices`	list of all subsets corresponding to the structure of VE.
`method`	which estimation method has been used.
`conf_int`	a matrix containing the estimations, bias and confidence intervals by bootstrap (if `nboot>0`).
`X`	the observed covariates.
`y`	the observed outcomes.
`U`	value of the `U` argument.
`n.knn`	value of the `n.knn` argument.
`rescale`	whether the design matrix has been rescaled.
`n.limit`	value of the `n.limit` argument.
`boot.level`	value of the `boot.level` argument.
`boot`	logical, whether bootstrap confidence interval estimates have been performed.
`nboot`	value of the `nboot` argument.
`parl`	value of the `parl` argument.
`conf`	value of the `conf` argument.

Author(s)

Marouane Il Idrissi, Sebastien Da Veiga

References

M. Azadkia and S. Chatterjee, 2021, A simple measure of conditional dependence, Annals of Statistics, 49(6), 3070–3102.

S. Chatterjee, 2021, A new coefficient of correlation, Journal of the American Statistical Association, 116, 2009–2022.

F. Gamboa, P. Gremaud, T. Klein and A. Lagnoux, 2022, Global sensitivity analysis: a novel generation of mighty estimators based on rank statistics, Bernoulli, 28, 2345–2374.

B. Broto, F. Bachoc and M. Depecker, 2020, Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution, SIAM/ASA Journal on Uncertainty Quantification, 8(2).

J. Castro, D. Gomez and J. Tejada, 2009, Polynomial calculation of the Shapley value based on sampling, Computers & Operations Research, 36(5), 1726–1730.

M. Il Idrissi, V. Chabridon and B. Iooss, 2021, Mesures d'importance relative par decompositions de la performance de modeles de regression, Preprint, 52emes Journees de Statistiques de la Societe Francaise de Statistique (SFdS), pp. 497–502, Nice, France, June 2021.

Examples

  
  
library(parallel)
library(doParallel)
library(foreach)
library(gtools)
library(boot)
library(RANN)

###########################################################
# Linear Model with Gaussian correlated inputs

library(mvtnorm)

set.seed(1234)
n <- 1000
beta<-c(1,-1,0.5)
sigma<-matrix(c(1,0,0,
                0,1,-0.8,
                0,-0.8,1),
              nrow=3,
              ncol=3)

X <-rmvnorm(n, rep(0,3), sigma)
colnames(X)<-c("X1","X2", "X3")


y <- X%*%beta + rnorm(n,0,2)

# Without Bootstrap confidence intervals
x<-shapleysobol_knn(model=NULL, X=X,
            n.knn=3,
            noise=TRUE)
tell(x,y)
print(x)
plot(x)

#Using the extract method to get first-order and total Sobol' indices
extract(x)

# With Bootstrap confidence intervals
x<-shapleysobol_knn(model=NULL, X=X,
            nboot=10, 
            n.knn=3,
            noise=TRUE,
            boot.level=0.7, 
            conf=0.95)
tell(x,y)
print(x)
plot(x)

#####################
# Extracting Sobol' indices with Bootstrap confidence intervals

nboot <- 10 # put nboot=50 for consistency

#Total Sobol' indices
x<-shapleysobol_knn(model=NULL, X=X,
            nboot=nboot, 
            n.knn=3,
            U=0,
            noise=TRUE,
            boot.level=0.7, 
            conf=0.95)
tell(x,y)
print(x)
plot(x)

#First-order Sobol' indices
x<-shapleysobol_knn(model=NULL, X=X,
            nboot=nboot, 
            n.knn=3,
            U=1,
            noise=TRUE,
            boot.level=0.7, 
            conf=0.95)
tell(x,y)
print(x)
plot(x)

#Closed Sobol' indices for specific subsets (list)
x<-shapleysobol_knn(model=NULL, X=X,
            nboot=nboot, 
            n.knn=3,
            U=list(c(1,2), c(1,2,3), 2),
            noise=TRUE,
            boot.level=0.7, 
            conf=0.95)
tell(x,y)
print(x)
plot(x)


#####################################################
# Test case: the non-monotonic Sobol g-function
# Example with a call to a numerical model
# First compute first-order indices with ranking
    
n <- 1000
X <- data.frame(matrix(runif(8 * n), nrow = n))
x <- shapleysobol_knn(model = sobol.fun, X = X, U = 1, method = "rank")
print(x)
plot(x)

library(ggplot2) ; ggplot(x)

# We can use the output sample generated for this estimation to compute total indices 
# without additional calls to the model
x2 <- shapleysobol_knn(model = NULL, X = X, U = 0, method = "knn", n.knn = 5)
tell(x2,x$y)
plot(x2)

ggplot(x2)


#####################################################
# Test case: the Ishigami function
# Example with given data and the use of approximate nearest neighbour search
n <- 5000
X <- data.frame(matrix(-pi+2*pi*runif(3 * n), nrow = n))
Y <- ishigami.fun(X)
x <- shapleysobol_knn(model = NULL, X = X, U = NULL, method = "knn", n.knn = 5, 
                       n.limit = 2000)
tell(x,Y)
plot(x)

library(ggplot2) ; ggplot(x)

# Extract first-order and total Sobol indices
x1 <- extract(x) ; print(x1)
    
######################################################
# Test case : Linear model (3 Gaussian inputs including 2 dependent) with scaling
# See Iooss and Prieur (2019)
library(mvtnorm) # Multivariate Gaussian variables
library(whitening) # For scaling
modlin <- function(X) apply(X,1,sum)
d <- 3
n <- 10000
mu <- rep(0,d)
sig <- c(1,1,2)
ro <- 0.9
Cormat <- matrix(c(1,0,0,0,1,ro,0,ro,1),d,d)
Covmat <- ( sig %*% t(sig) ) * Cormat
Xall <- function(n) mvtnorm::rmvnorm(n,mu,Covmat)
X <- Xall(n)
x <- shapleysobol_knn(model = modlin, X = X, U = NULL, method = "knn", n.knn = 5, 
                       rescale = TRUE, n.limit = 2000)
print(x)
plot(x)

Estimation of Shapley effects from data using nearest neighbors method

Description

shapleySubsetMc implements the estimation of the Shapley effects from data, using a nearest-neighbour method to generate samples from the conditional distributions of the inputs. It can be used with categorical inputs.

Usage

shapleySubsetMc(X,Y, Ntot=NULL, Ni=3, cat=NULL, weight=NULL, discrete=NULL, 
                noise=FALSE)
## S3 method for class 'shapleySubsetMc'
plot(x, ylim = c(0, 1), ...)

Arguments

`X`	a matrix or a dataframe of the input sample
`Y`	a vector of the output sample
`Ntot`	an integer giving the approximate desired cost (the total number of points for which nearest neighbours are computed)
`Ni`	the number of nearest neighbours taken for each point
`cat`	a vector giving the indices of the input categorical variables
`weight`	a vector of the same length as `cat` giving the weight of each categorical variable in the product distance
`discrete`	a vector giving the indices of the input variables that are real-valued (not categorical) but can take the same value several times
`noise`	logical. If FALSE (the default), the variable Y is a function of X
`x`	a list of class `"shapleySubsetMc"` storing the state of the sensitivity study (Shapley effects, cost, names of inputs)
`ylim`	y-coordinate plotting limits
`...`	any other arguments for plotting

Details

If weight = NULL, all the categorical variables will have the same weight 1.

If Ntot = NULL, the nearest neighbours will be computed for all the $n(2^p-2)$ points, where $n$ is the length of the sample. The estimation can then be very slow.
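To get a feel for this default cost, the count $n(2^p-2)$ can be evaluated for the two examples of this page; the factor $2^p-2$ is the number of non-empty proper subsets of the $p$ inputs. `full_cost` is a throwaway helper written for this illustration, not a package function.

```r
# Throwaway helper: number of points for which nearest neighbours are
# computed when Ntot = NULL, i.e. n * (2^p - 2).
full_cost <- function(n, p) n * (2^p - 2)

full_cost(500, 4)  # first example (p = 4, n = 500): 7000
full_cost(500, 8)  # second example (p = 8, n = 500): 127000
```

Both counts are well above the values passed as `Ntot` in the examples (5000 and 2000).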

Value

shapleySubsetMc returns a list of class "shapleySubsetMc", containing:

`shapley`	the Shapley effects estimates.
`cost`	the actual total cost of these estimates: the total number of points for which the nearest neighbours were computed.
`names`	the labels of the input variables.

Author(s)

Baptiste Broto

References

B. Broto, F. Bachoc, M. Depecker, 2020, Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution, SIAM/ASA Journal on Uncertainty Quantification, 8:693-716.

Examples



# First example: the linear Gaussian framework

# we generate a covariance matrix Sigma
p <- 4 #dimension
A <- matrix(rnorm(p^2),nrow=p,ncol=p)
Sigma <- t(A)%*%A # A'A is positive definite (almost surely), hence a valid covariance
C <- chol(Sigma)
n <- 500 #sample size (put n=2000 for more consistency)

Z=matrix(rnorm(p*n),nrow=n,ncol=p)
X=Z%*%C # X is a gaussian vector with zero mean and covariance Sigma
Y=rowSums(X) 
Shap=shapleySubsetMc(X=X,Y=Y,Ntot=5000)
plot(Shap)


#Second example: The Sobol model with heterogeneous inputs

p=8 #dimension
A=matrix(rnorm(p^2),nrow=p,ncol=p)
Sigma=t(A)%*%A
C=chol(Sigma)
n=500 #sample size (put n=5000 for more consistency)

Z=matrix(rnorm(p*n),nrow=n,ncol=p)
X=Z

#we create discrete and categorical variables
X[,1]=round(X[,1]/2) 
X[,2]=X[,2]>2
X[,4]=-2*round(X[,4])+4
X[(X[,6]>0 &X[,6]<1),6]=1

cat=c(1,2)  # we choose to take X1 and X2 as categorical variables 
            #   (with the discrete distance)
discrete=c(4,6) # we indicate that X4 and X6 can take the same value several times

Y=sobol.fun(X)
Ntot <- 2000 # put Ntot=20000 for more consistency
Shap=shapleySubsetMc(X=X,Y=Y, cat=cat, discrete=discrete, Ntot=Ntot, Ni=10)

plot(Shap)


Monte Carlo Estimation of Sobol' Indices

Description

sobol implements the Monte Carlo estimation of the Sobol' sensitivity indices (standard estimator). This method allows the estimation of the indices of the variance decomposition, sometimes referred to as functional ANOVA decomposition, up to a given order, at a total cost of $(N+1) \times n$, where $N$ is the number of indices to estimate. This function also allows the estimation of the so-called subset (or group) indices, i.e. the first-order indices with respect to single multidimensional inputs.
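For first-order indices, the standard (pick-freeze) estimator behind this cost can be sketched in base R. The sketch assumes a vectorised model `f` taking a matrix and two independent samples `X1`, `X2`; each index needs one extra sample that keeps column $i$ of `X1` and takes the other columns from `X2`, giving $(p+1) \times n$ runs for the $p$ first-order indices. It illustrates the idea and is not the package's code; `pick_freeze_first_order` is a hypothetical name. The toy inputs are drawn on [-0.5, 0.5] so that the output is centred, since this form of the estimator is sensitive to a large output mean.

```r
# Hypothetical helper: standard pick-freeze estimates of the first-order
# Sobol' indices, costing (p + 1) * n evaluations of the vectorised model f.
pick_freeze_first_order <- function(f, X1, X2) {
  y  <- f(X1)                    # n base runs
  f0 <- mean(y)
  V  <- mean(y^2) - f0^2         # total variance
  sapply(seq_len(ncol(X1)), function(i) {
    Xi <- X2
    Xi[, i] <- X1[, i]           # freeze X_i, resample all other inputs
    (mean(y * f(Xi)) - f0^2) / V # n extra runs per index
  })
}

set.seed(2)
n  <- 1e5
f  <- function(X) X[, 1] + 2 * X[, 2]      # exact indices: 0.2 and 0.8
X1 <- matrix(runif(2 * n, -0.5, 0.5), nrow = n)
X2 <- matrix(runif(2 * n, -0.5, 0.5), nrow = n)
S  <- pick_freeze_first_order(f, X1, X2)
S
```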

Usage

sobol(model = NULL, X1, X2, order = 1, nboot = 0, conf = 0.95, ...)
## S3 method for class 'sobol'
tell(x, y = NULL, return.var = NULL, ...)
## S3 method for class 'sobol'
print(x, ...)
## S3 method for class 'sobol'
plot(x, ylim = c(0, 1), ...)
## S3 method for class 'sobol'
plotMultOut(x, ylim = c(0, 1), ...)
## S3 method for class 'sobol'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X1`	the first random sample.
`X2`	the second random sample.
`order`	either an integer, the maximum order in the ANOVA decomposition (all indices up to this order will be computed), or a list of numeric vectors, the multidimensional compounds of the wanted subset indices.
`nboot`	the number of bootstrap replicates.
`conf`	the confidence level for bootstrap confidence intervals.
`x`	a list of class `"sobol"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"sobol"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`return.var`	a vector of character strings giving further internal variables names to store in the output object `x`.
`ylim`	y-coordinate plotting limits.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called.

Value

sobol returns a list of class "sobol", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	a vector of model responses.
`V`	the estimations of Variances of the Conditional Expectations (VCE) with respect to one factor or one group of factors.
`D`	the estimations of the terms of the ANOVA decomposition (not for subset indices).
`S`	the estimations of the Sobol' sensitivity indices (not for subset indices).

Users can request more output variables with the argument return.var (for example, the bootstrap outputs V.boot, D.boot and S.boot).

Author(s)

Gilles Pujol

References

I. M. Sobol, 1993, Sensitivity estimates for nonlinear mathematical models, Math. Modelling Comput. Exp., 1, 407–414.

Examples

# Test case : the non-monotonic Sobol g-function

# The method of sobol requires 2 samples
# (there are 8 factors, all following the uniform distribution on [0,1])
library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis
x <- sobol(model = sobol.fun, X1 = X1, X2 = X2, order = 2, nboot = 100)
print(x)
#plot(x)

library(ggplot2)
ggplot(x)
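Applying the $(N+1) \times n$ cost from the Description to this example: with 8 factors and `order = 2`, $N$ counts the 8 first-order terms and the $\binom{8}{2} = 28$ second-order terms. The helper `sobol_cost` is hypothetical and only does this arithmetic.

```r
# Hypothetical helper: model runs needed by sobol() for all indices up to
# 'order', using the (N + 1) * n cost stated in the Description.
sobol_cost <- function(n, p, order) {
  N <- sum(choose(p, seq_len(order)))   # number of indices to estimate
  (N + 1) * n
}

sobol_cost(1000, 8, 1)  # first order only: 9 * 1000 = 9000 runs
sobol_cost(1000, 8, 2)  # up to order 2: (8 + 28 + 1) * 1000 = 37000 runs
```

If this reading of the cost formula is right, `nrow(x$X)` after running the example above should also be 37000.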

Monte Carlo Estimation of Sobol' Indices (scheme by Saltelli 2002)

Description

sobol2002 implements the Monte Carlo estimation of the Sobol' indices for both first-order and total indices at the same time (altogether $2p$ indices), at a total cost of $(p+2) \times n$ model evaluations. These are called the Saltelli estimators.
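The $(p+2) \times n$ budget comes from evaluating the model on `X1`, on `X2`, and on one hybrid sample per input. The base-R sketch below shows one way such a design yields both index families at once, following the general Saltelli (2002) idea: products with a run sharing only $X_i$ give first-order effects, products with a run sharing all the other inputs give total effects, and $f_0^2$ is estimated from the two independent samples. It is not the package's implementation, and `saltelli2002_sketch` is a hypothetical name; the toy inputs are centred for the reason given in the Details.

```r
# Hypothetical helper: first-order and total indices from one
# (p + 2) * n design, in the spirit of Saltelli (2002).
saltelli2002_sketch <- function(f, X1, X2) {
  y1  <- f(X1)
  y2  <- f(X2)
  f02 <- mean(y1 * y2)           # f0^2 from two independent samples
  V   <- mean(y1^2) - f02        # total variance
  p   <- ncol(X1)
  first <- total <- numeric(p)
  for (i in seq_len(p)) {
    Xi <- X2
    Xi[, i] <- X1[, i]           # X_i from X1, all other inputs from X2
    yi <- f(Xi)
    first[i] <- (mean(y1 * yi) - f02) / V      # yi shares only X_i with y1
    total[i] <- 1 - (mean(y2 * yi) - f02) / V  # yi shares all but X_i with y2
  }
  list(first = first, total = total, runs = (p + 2) * nrow(X1))
}

set.seed(4)
n   <- 1e5
f   <- function(X) X[, 1] + 2 * X[, 2]    # additive: first = total = (0.2, 0.8)
X1  <- matrix(runif(2 * n, -0.5, 0.5), nrow = n)
X2  <- matrix(runif(2 * n, -0.5, 0.5), nrow = n)
res <- saltelli2002_sketch(f, X1, X2)
res$first; res$total; res$runs
```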

Usage

sobol2002(model = NULL, X1, X2, nboot = 0, conf = 0.95, ...)
## S3 method for class 'sobol2002'
tell(x, y = NULL, return.var = NULL, ...)
## S3 method for class 'sobol2002'
print(x, ...)
## S3 method for class 'sobol2002'
plot(x, ylim = c(0, 1), ...)
## S3 method for class 'sobol2002'
plotMultOut(x, ylim = c(0, 1), ...)
## S3 method for class 'sobol2002'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X1`	the first random sample.
`X2`	the second random sample.
`nboot`	the number of bootstrap replicates.
`conf`	the confidence level for bootstrap confidence intervals.
`x`	a list of class `"sobol2002"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"sobol2002"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`return.var`	a vector of character strings giving further internal variables names to store in the output object `x`.
`ylim`	y-coordinate plotting limits.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called.

Details

BE CAREFUL! This estimator suffers from a conditioning problem when estimating the variances behind the indices computations. This can seriously affect the Sobol' indices estimates when the output is far from centered (i.e. when its mean is large compared with its standard deviation). To avoid this effect, center the model output before applying "sobol2002". Functions "sobolEff", "soboljansen" and "sobolmartinez" do not suffer from this problem.
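The effect can be reproduced with a small base-R experiment (a sketch, not the package's code): the same product-type first-order estimate, with $f_0^2$ estimated as `mean(y1 * y2)`, is repeated with and without a constant added to the output. The true index is 0.2 in both cases, but with the offset the constant multiplies the sampling noise of the output means, so the replicates scatter far more. `s1_with_offset` is a hypothetical helper and the toy model is an assumption of this illustration.

```r
# Hypothetical helper: one product-type estimate of S_1 for
# Y = X1 + 2 * X2 + offset, inputs uniform on [-0.5, 0.5] (true S_1 = 0.2).
s1_with_offset <- function(offset, n = 1e4) {
  X1 <- matrix(runif(2 * n, -0.5, 0.5), nrow = n)
  X2 <- matrix(runif(2 * n, -0.5, 0.5), nrow = n)
  f  <- function(X) X[, 1] + 2 * X[, 2] + offset
  Xi <- X2
  Xi[, 1] <- X1[, 1]             # shares only X_1 with X1
  y1 <- f(X1); y2 <- f(X2); yi <- f(Xi)
  f02 <- mean(y1 * y2)           # estimate of f0^2
  (mean(y1 * yi) - f02) / (mean(y1^2) - f02)
}

set.seed(3)
centered <- replicate(20, s1_with_offset(0))
shifted  <- replicate(20, s1_with_offset(100))
c(mean_centered = mean(centered), sd_centered = sd(centered),
  sd_shifted = sd(shifted))
```

In practice, one way to follow the advice above is to subtract the sample mean from the model responses before passing them to `tell`; this is a suggestion, not a documented package step.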

Value

sobol2002 returns a list of class "sobol2002", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	the response used
`V`	the estimations of Variances of the Conditional Expectations (VCE) with respect to each factor and also with respect to the complementary set of each factor ("all but $X_i$ ").
`S`	the estimations of the Sobol' first-order indices.
`T`	the estimations of the Sobol' total sensitivity indices.

Users can request more output variables with the argument return.var (for example, the bootstrap outputs V.boot, S.boot and T.boot).

Author(s)

Gilles Pujol

References

A. Saltelli, 2002, Making best use of model evaluations to compute sensitivity indices, Computer Physics Communications, 145, 280–297.

Examples

# Test case : the non-monotonic Sobol g-function

# The method of sobol requires 2 samples
# There are 8 factors, all following the uniform distribution
# on [0,1]

library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis

x <- sobol2002(model = sobol.fun, X1, X2, nboot = 100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)

Monte Carlo Estimation of Sobol' Indices (improved formulas of Mauntz: Sobol et al. (2007) and Saltelli et al. (2010))

Description

sobol2007 implements the Monte Carlo estimation of the Sobol' indices for both first-order and total indices at the same time (altogether $2p$ indices), at a total cost of $(p+2) \times n$ model evaluations. These are called the Mauntz estimators.
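A base-R sketch of Mauntz-type estimators, written with the two samples called `A` and `B` as in Saltelli et al. (2010); how the package maps these onto `X1` and `X2` is not shown here, and `mauntz_sketch` is a hypothetical name. Both estimators average products with the difference `f(A_B^(i)) - f(A)`, where `A_B^(i)` is `A` with column $i$ taken from `B`. These differences shrink when input $i$ has little influence, which is a common explanation for why such estimators behave well for small indices. The budget is again $(p+2) \times n$ runs.

```r
# Hypothetical helper: first-order and total indices with difference-based
# (Mauntz-type) estimators from a (p + 2) * n design.
mauntz_sketch <- function(f, A, B) {
  yA <- f(A)
  yB <- f(B)
  V  <- var(c(yA, yB))                   # total variance from both samples
  p  <- ncol(A)
  first <- total <- numeric(p)
  for (i in seq_len(p)) {
    ABi <- A
    ABi[, i] <- B[, i]                   # A with column i taken from B
    d <- f(ABi) - yA                     # effect of resampling X_i only
    first[i] <- mean(yB * d) / V         # f(B) and f(ABi) share X_i
    total[i] <- -mean(yA * d) / V        # f(A) and f(ABi) share all but X_i
  }
  list(first = first, total = total, runs = (p + 2) * nrow(A))
}

set.seed(5)
n   <- 1e5
f   <- function(X) X[, 1] + 2 * X[, 2]    # additive: first = total = (0.2, 0.8)
A   <- matrix(runif(2 * n, -0.5, 0.5), nrow = n)
B   <- matrix(runif(2 * n, -0.5, 0.5), nrow = n)
res <- mauntz_sketch(f, A, B)
res$first; res$total
```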

Usage

sobol2007(model = NULL, X1, X2, nboot = 0, conf = 0.95, ...)
## S3 method for class 'sobol2007'
tell(x, y = NULL, return.var = NULL, ...)
## S3 method for class 'sobol2007'
print(x, ...)
## S3 method for class 'sobol2007'
plot(x, ylim = c(0, 1), ...)
## S3 method for class 'sobol2007'
plotMultOut(x, ylim = c(0, 1), ...)
## S3 method for class 'sobol2007'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X1`	the first random sample.
`X2`	the second random sample.
`nboot`	the number of bootstrap replicates.
`conf`	the confidence level for bootstrap confidence intervals.
`x`	a list of class `"sobol2007"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"sobol2007"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`return.var`	a vector of character strings giving further internal variables names to store in the output object `x`.
`ylim`	y-coordinate plotting limits.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called.

Details

This estimator is well suited to small first-order and total indices.

BE CAREFUL! This estimator suffers from a conditioning problem when estimating the variances behind the indices computations. This can seriously affect the Sobol' indices estimates when the output is far from centered (i.e. when its mean is large compared with its standard deviation). To avoid this effect, center the model output before applying "sobol2007". Functions "sobolEff", "soboljansen" and "sobolmartinez" do not suffer from this problem.

Value

sobol2007 returns a list of class "sobol2007", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	the response used
`V`	the estimations of Variances of the Conditional Expectations (VCE) with respect to each factor and also with respect to the complementary set of each factor ("all but $X_i$ ").
`S`	the estimations of the Sobol' first-order indices.
`T`	the estimations of the Sobol' total sensitivity indices.

Users can request more output variables with the argument return.var (for example, the bootstrap outputs V.boot, S.boot and T.boot).

Author(s)

Bertrand Iooss

References

I.M. Sobol, S. Tarantola, D. Gatelli, S.S. Kucherenko and W. Mauntz, 2007, Estimating the approximation errors when fixing unessential factors in global sensitivity analysis, Reliability Engineering and System Safety, 92, 957–960.

A. Saltelli, P. Annoni, I. Azzini, F. Campolongo, M. Ratto and S. Tarantola, 2010, Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index, Computer Physics Communications 181, 259–270.

Examples

# Test case : the non-monotonic Sobol g-function

# The method of sobol requires 2 samples
# There are 8 factors, all following the uniform distribution
# on [0,1]

library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis

x <- sobol2007(model = sobol.fun, X1, X2, nboot = 100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)

Monte Carlo Estimation of Sobol' Indices (formulas of Janon-Monod)

Description

sobolEff implements the Monte Carlo estimation of the Sobol' sensitivity indices using the asymptotically efficient formulas in section 4.2.4.2 of Monod et al. (2006). Either all first-order indices or all total-effect indices are estimated at a cost of $N \times (p+1)$ model calls, or all closed second-order indices are estimated at a cost of $N \times \left(\binom{p}{2}+1\right)$ model calls.
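For first-order indices, the Janon-Monod estimator (the $T_N$ of Janon et al., 2014) pools the two outputs that share input $i$ when estimating the mean and the variance. The base-R sketch below, with the hypothetical name `janon_monod_first_order`, spends $N$ runs on `X1` and $N$ per input, matching the $N \times (p+1)$ first-order cost. The pooled mean makes the formula exactly invariant to adding a constant to the output, consistent with the remark that sobolEff is not affected by a non-centered output; the last lines check this on a toy model chosen for the illustration.

```r
# Hypothetical helper: Janon-Monod (T_N) estimates of first-order indices,
# costing (p + 1) * n evaluations of the vectorised model f.
janon_monod_first_order <- function(f, X1, X2) {
  y <- f(X1)
  sapply(seq_len(ncol(X1)), function(i) {
    Xi <- X2
    Xi[, i] <- X1[, i]                   # shares only X_i with X1
    yi <- f(Xi)
    m  <- mean((y + yi) / 2)             # pooled mean of both outputs
    (mean(y * yi) - m^2) / (mean((y^2 + yi^2) / 2) - m^2)
  })
}

set.seed(6)
n  <- 1e5
f  <- function(X) X[, 1] + 2 * X[, 2]    # exact indices: 0.2 and 0.8
X1 <- matrix(runif(2 * n), nrow = n)     # inputs on [0, 1]: output not centred
X2 <- matrix(runif(2 * n), nrow = n)
S  <- janon_monod_first_order(f, X1, X2)
S_shift <- janon_monod_first_order(function(X) f(X) + 1000, X1, X2)
S; max(abs(S - S_shift))  # the difference is only floating-point rounding
```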

Usage

sobolEff(model = NULL, X1, X2, order=1, nboot = 0, conf = 0.95, ...)
## S3 method for class 'sobolEff'
tell(x, y = NULL, ...)
## S3 method for class 'sobolEff'
print(x, ...)
## S3 method for class 'sobolEff'
plot(x, ylim = c(0, 1), ...)
## S3 method for class 'sobolEff'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X1`	the first random sample.
`X2`	the second random sample.
`order`	an integer specifying the indices to estimate: 0 for total-effect indices, 1 for first-order indices and 2 for closed second-order indices.
`nboot`	the number of bootstrap replicates, or zero to use the asymptotic standard deviation estimates given in Janon et al. (2014).
`conf`	the confidence level for confidence intervals.
`x`	a list of class `"sobolEff"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"sobolEff"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`ylim`	y-coordinate plotting limits.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called.

Details

The estimator used by sobolEff is defined in Monod et al. (2006), Section 4.2.4.2, and studied under the name T_N in Janon et al. (2014). This estimator is well suited to large first-order indices.

Value

sobolEff returns a list of class "sobolEff", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	a vector of model responses.
`S`	the estimations of the Sobol' sensitivity indices.

Author(s)

Alexandre Janon, Laurent Gilquin

References

Monod, H., Naud, C., Makowski, D. (2006), Uncertainty and sensitivity analysis for crop models in Working with Dynamic Crop Models: Evaluation, Analysis, Parameterization, and Applications, Elsevier.

A. Janon, T. Klein, A. Lagnoux, M. Nodet, C. Prieur (2014), Asymptotic normality and efficiency of two Sobol index estimators, ESAIM: Probability and Statistics, 18:342-364.

Examples

# Test case : the non-monotonic Sobol g-function

# The method of sobol requires 2 samples
# (there are 8 factors, all following the uniform distribution on [0,1])
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis
x <- sobolEff(model = sobol.fun, X1 = X1, X2 = X2, nboot = 0)
print(x)

library(ggplot2)
ggplot(x)

Kriging-based sensitivity analysis

Description

Perform a kriging-based global sensitivity analysis taking into account both the meta-model and the Monte-Carlo errors. The Sobol indices are estimated with a Monte-Carlo integration in which the true function is replaced by a kriging model, built with the function km of the package DiceKriging. The complete conditional predictive distribution of the kriging model is considered (not only the predictive mean).

Usage

sobolGP(
	model,
	type="SK",
	MCmethod="sobol",
	X1,
	X2, 
	nsim=100,
	nboot=1,
	conf = 0.95,
	sequential = FALSE, 
	candidate, 
	sequential.tot=FALSE,
	max_iter = 1000) 

## S3 method for class 'sobolGP'
ask(x, tot = FALSE, ...)

## S3 method for class 'sobolGP'
tell(x, y=NULL, xpoint=NULL, newcandidate=NULL, ...)

## S3 method for class 'sobolGP'
print(x, ...)

## S3 method for class 'sobolGP'
plot(x,...)

Arguments

`model`	an object of class `"km"` specifying the kriging model built from package `"DiceKriging"` (see `km`).
`type`	a character string giving the type of the considered kriging model. `"SK"` refers to simple kriging and `"UK"` refers to universal kriging (see `km`).
`MCmethod`	a character string specifying the Monte-Carlo procedure used to estimate the Sobol indices. The available methods are `"sobol"`, `"sobol2002"`, `"sobol2007"`, `"sobolEff"` and `"soboljansen"`.
`X1`	a matrix representing the first random sample.
`X2`	a matrix representing the second random sample.
`nsim`	an integer giving the number of samples for the conditional Gaussian process. It is used to quantify the uncertainty due to the kriging approximation.
`nboot`	an integer representing the number of bootstrap replicates. It is used to quantify the uncertainty due to the Monte-Carlo integrations. We recommend setting `nboot = 100`.
`conf`	a numeric giving the confidence level of the intervals, which take into account the uncertainty due to both the bootstrap procedure and the Gaussian process samples.
`sequential`	a boolean. If `sequential=TRUE`, the procedure also proposes a new point at which to run a simulation: the point of `candidate` that minimizes the sum of the variances of the MAIN effect estimates, the variance being taken with respect to the conditional Gaussian process.
`candidate`	a matrix of candidate points among which the best new point to simulate is selected. Each row is a point and each column an input dimension.
`sequential.tot`	a boolean. If `sequential.tot=TRUE`, the procedure also proposes a new point at which to run a simulation: the point of `candidate` that minimizes the sum of the variances of the TOTAL effect estimates, the variance being taken with respect to the conditional Gaussian process.
`max_iter`	a numeric giving the maximal number of iterations for the propagative Gibbs sampler. It is used to simulate the realizations of the Gaussian process.
`x`	an object of S3 class `"sobolGP"` obtained from the procedure `sobolGP`. It stores the results of the kriging-based global sensitivity analysis.
`tot`	a boolean. If `tot=TRUE`, the procedure ask selects the point with respect to the uncertainty of the total Sobol' indices (instead of the first-order ones).
`xpoint`	a matrix representing a new point added to the kriging model.
`y`	a numeric giving the response of the function at `xpoint`.
`newcandidate`	a matrix representing the new candidate points where the best point to be simulated is selected. If `newcandidate=NULL`, these points correspond to `candidate` without the new point `xpoint`.
`...`	any other arguments to be passed

Details

The function ask provides the new point at which the function should be simulated. The function tell then performs a new kriging-based sensitivity analysis once the point xpoint, with its corresponding observation y, has been added to the kriging model.

Value

An object of class S3 sobolGP.

call : a list containing the arguments of the function sobolGP :
- X1 : X1
- X2 : X2
- conf : conf
- nboot : nboot
- candidate : candidate
- sequential : sequential
- max_iter : max_iter
- sequential.tot : sequential.tot
- model : model
- tot : tot
- method : MCmethod
- type : type
- nsim : nsim
S : a list containing the results of the kriging-based sensitivity analysis for the MAIN effects:
- mean : a matrix giving the mean of the Sobol index estimates.
- var : a matrix giving the variance of the Sobol index estimates.
- ci : a matrix giving the confidence intervals of the Sobol index estimates according to conf.
- varPG : a matrix giving the variance of the Sobol index estimates due to the Gaussian process approximation.
- varMC : a matrix giving the variance of the Sobol index estimates due to the Monte-Carlo integrations.
- xnew : if sequential=TRUE, a matrix giving the point in candidate which is the best to simulate.
- xnewi : if sequential=TRUE, an integer giving the index of the point in candidate which is the best to simulate.
T : a list containing the results of the kriging-based sensitivity analysis for the TOTAL effects:
- mean : a matrix giving the mean of the Sobol index estimates.
- var : a matrix giving the variance of the Sobol index estimates.
- ci : a matrix giving the confidence intervals of the Sobol index estimates according to conf.
- varPG : a matrix giving the variance of the Sobol index estimates due to the Gaussian process approximation.
- varMC : a matrix giving the variance of the Sobol index estimates due to the Monte-Carlo integrations.
- xnew : if sequential.tot=TRUE, a matrix giving the point in candidate which is the best to simulate.
- xnewi : if sequential.tot=TRUE, an integer giving the index of the point in candidate which is the best to simulate.
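The split of the estimator variance into varPG and varMC can be pictured with the law of total variance. The sketch below is an illustration under stated assumptions, not the package's code: it assumes the index estimates are arranged as a matrix with one row per Gaussian-process realization (nsim) and one column per bootstrap replicate (nboot), and it fills that matrix with simulated numbers instead of real sobolGP output. The names `S_hat`, `var_PG`, `var_MC` and the quantile-based interval are hypothetical.

```r
# Illustration only: simulated index estimates, NOT output of sobolGP().
# Assumed layout: rows = GP realizations (nsim), columns = bootstrap replicates (nboot).
set.seed(1)
nsim <- 100
nboot <- 100
gp_level <- rnorm(nsim, mean = 0.7, sd = 0.03)   # spread due to the kriging approximation
S_hat <- gp_level + matrix(rnorm(nsim * nboot, sd = 0.01), nrow = nsim)  # plus Monte-Carlo noise

# Population variance (divides by the number of values), so the identity below is exact
pop_var <- function(v) mean((v - mean(v))^2)

var_total <- pop_var(as.vector(S_hat))
var_PG <- pop_var(rowMeans(S_hat))          # variance of the per-realization means
var_MC <- mean(apply(S_hat, 1, pop_var))    # average within-realization (bootstrap) variance

# Law of total variance: var_total = var_PG + var_MC
c(total = var_total, PG = var_PG, MC = var_MC)

# An empirical interval from all nsim * nboot values (assumed construction)
conf <- 0.95
ci <- quantile(S_hat, c((1 - conf) / 2, (1 + conf) / 2))
ci
```

With equal numbers of bootstrap replicates per realization the identity holds exactly, so the two parts always add up to the total spread.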

Author(s)

Loic Le Gratiet, EDF R&D

References

L. Le Gratiet, C. Cannamela and B. Iooss (2014), A Bayesian approach for global sensitivity analysis of (multifidelity) computer codes, SIAM/ASA J. Uncertainty Quantification 2-1, pp. 336-363.

Examples



library(DiceKriging)

#--------------------------------------#
# kriging model building
#--------------------------------------#

d <- 2; n <- 16
design.fact <- expand.grid(x1=seq(0,1,length=4), x2=seq(0,1,length=4))
y <- apply(design.fact, 1, branin) 

m <- km(design=design.fact, response=y)

#--------------------------------------#
# sobol samples & candidate points
#--------------------------------------#

n <- 1000
X1 <- data.frame(matrix(runif(d * n), nrow = n))
X2 <- data.frame(matrix(runif(d * n), nrow = n))

candidate <- data.frame(matrix(runif(d * 100), nrow = 100))

#--------------------------------------#
# Kriging-based Sobol
#--------------------------------------#

nsim <- 10 # put nsim <- 100
nboot <- 10 # put nboot <- 100

res <- sobolGP(
model = m,
type="UK",
MCmethod="sobol",
X1,
X2,
nsim = nsim,
conf = 0.95,
nboot = nboot,
sequential = TRUE,
candidate,
sequential.tot=FALSE,
max_iter = 1000
) 

res
plot(res)

x <- ask(res)
y <- branin(x)

# The following call does not currently work (unresolved bug, apparently
#     an unused argument passed to km() via update()/eval() in tell.sobolGP())
#res.new <- tell(res,y,x)
#res.new


Monte Carlo Estimation of Sobol' Indices (improved formulas of Jansen (1999) and Saltelli et al. (2010))

Description

soboljansen implements the Monte Carlo estimation of the Sobol' indices for both first-order and total indices at the same time (altogether $2p$ indices), at a total cost of $(p+2) \times n$ model evaluations. These are called the Jansen estimators.
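The Jansen formulas can be written out in a few lines of base R. The following is a minimal sketch, not the package's implementation: the names `g_fun` and `jansen_sketch` are hypothetical, the g-function coefficients `a = c(0, 1, 4.5, 9, 99, 99, 99, 99)` are assumed for the test function, and the indices follow the usual Jansen form (first order from the pair of runs sharing only $X_i$, total from the pair sharing all but $X_i$). The loop makes the $(p+2) \times n$ cost visible: two base samples plus one mixed sample per input.

```r
# Sobol' g-function on [0,1]^p (coefficients a assumed for illustration)
g_fun <- function(X, a = c(0, 1, 4.5, 9, 99, 99, 99, 99)) {
  X <- as.matrix(X)
  y <- rep(1, nrow(X))
  for (j in seq_len(ncol(X))) y <- y * (abs(4 * X[, j] - 2) + a[j]) / (1 + a[j])
  y
}

# Hypothetical helper: Jansen estimators of first-order (S) and total (T) indices
jansen_sketch <- function(model, X1, X2) {
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  n <- nrow(X1)
  p <- ncol(X1)
  y1 <- model(X1)                       # n runs
  y2 <- model(X2)                       # n runs
  V <- var(c(y1, y2))                   # output variance from both independent samples
  first <- total <- numeric(p)
  for (i in seq_len(p)) {
    Xi <- X2
    Xi[, i] <- X1[, i]                  # shares only X_i with X1, all but X_i with X2
    yi <- model(Xi)                     # n runs per input: (p + 2) * n in total
    first[i] <- (V - sum((y1 - yi)^2) / (2 * n)) / V
    total[i] <- sum((y2 - yi)^2) / (2 * n) / V
  }
  list(S = first, T = total)
}

set.seed(42)
n <- 20000
X1 <- matrix(runif(8 * n), nrow = n)
X2 <- matrix(runif(8 * n), nrow = n)
res <- jansen_sketch(g_fun, X1, X2)

# Analytical indices of the g-function, for comparison
a <- c(0, 1, 4.5, 9, 99, 99, 99, 99)
Vi <- 1 / (3 * (1 + a)^2)
V_true <- prod(1 + Vi) - 1
S_true <- Vi / V_true
T_true <- Vi * prod(1 + Vi) / (1 + Vi) / V_true
round(rbind(S = res$S, S_true = S_true, T = res$T, T_true = T_true), 3)
```

The g-function is used because its indices are known in closed form, which makes the Monte Carlo error easy to inspect.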

Usage

soboljansen(model = NULL, X1, X2, nboot = 0, conf = 0.95, ...)
## S3 method for class 'soboljansen'
tell(x, y = NULL, return.var = NULL, ...)
## S3 method for class 'soboljansen'
print(x, ...)
## S3 method for class 'soboljansen'
plot(x, ylim = c(0, 1), y_col = NULL, y_dim3 = NULL, ...)
## S3 method for class 'soboljansen'
plotMultOut(x, ylim = c(0, 1), ...)
## S3 method for class 'soboljansen'
ggplot(data, mapping = aes(), ylim = c(0, 1), y_col = NULL,
                 y_dim3 = NULL, ..., environment = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X1`	the first random sample.
`X2`	the second random sample.
`nboot`	the number of bootstrap replicates.
`conf`	the confidence level for bootstrap confidence intervals.
`x`	a list of class `"soboljansen"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"soboljansen"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`return.var`	a vector of character strings giving further internal variables names to store in the output object `x`.
`ylim`	y-coordinate plotting limits.
`y_col`	an integer defining the index of the column of `x$y` to be used for plotting the corresponding sensitivity indices (only applies if `x$y` is a matrix or an array). If set to `NULL` (as per default) and `x$y` is a matrix or an array, the first column (respectively the first element in the second dimension) of `x$y` is used (i.e. `y_col = 1`).
`y_dim3`	an integer defining the index in the third dimension of `x$y` to be used for plotting the corresponding sensitivity indices (only applies if `x$y` is an array). If set to `NULL` (as per default) and `x$y` is a three-dimensional array, the first element in the third dimension of `x$y` is used (i.e. `y_dim3 = 1`).
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	for `soboljansen`: any other arguments for `model` which are passed unchanged each time it is called.

Details

This estimator performs well for large first-order indices and for both large and small total indices.

This version of soboljansen also supports matrices and three-dimensional arrays as output of model. If the model output is a matrix or an array, V, S and T are matrices or arrays as well (depending on the type of y and the value of nboot).

The bootstrap outputs V.boot, S.boot and T.boot can only be returned if the model output is a vector (using argument return.var). For matrix or array output, these objects can't be returned.
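For matrix output the same formulas apply column by column, which is why V, S and T become matrices. The sketch below is an illustration under assumptions, not the package's code: `g_fun`, `g_fun_matrix` and `jansen_matrix_sketch` are hypothetical names, and the second output column is simply twice the first, as in the matrix example of this section. Because Sobol' indices do not change when an output is rescaled, both columns should give the same indices.

```r
# Sobol' g-function on [0,1]^p (coefficients a assumed for illustration)
g_fun <- function(X, a = c(0, 1, 4.5, 9, 99, 99, 99, 99)) {
  X <- as.matrix(X)
  y <- rep(1, nrow(X))
  for (j in seq_len(ncol(X))) y <- y * (abs(4 * X[, j] - 2) + a[j]) / (1 + a[j])
  y
}

# Model with two output columns (second column = 2 * first)
g_fun_matrix <- function(X) {
  y <- g_fun(X)
  cbind(y, 2 * y)
}

# Hypothetical helper: Jansen estimators applied to every output column at once.
# Returns p x q matrices (rows = inputs, columns = outputs).
jansen_matrix_sketch <- function(model, X1, X2) {
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  n <- nrow(X1)
  p <- ncol(X1)
  y1 <- model(X1)
  y2 <- model(X2)
  V <- apply(rbind(y1, y2), 2, var)     # one variance per output column
  first <- total <- matrix(NA_real_, nrow = p, ncol = ncol(y1))
  for (i in seq_len(p)) {
    Xi <- X2
    Xi[, i] <- X1[, i]
    yi <- model(Xi)
    first[i, ] <- (V - colSums((y1 - yi)^2) / (2 * n)) / V
    total[i, ] <- colSums((y2 - yi)^2) / (2 * n) / V
  }
  list(S = first, T = total)
}

set.seed(7)
n <- 5000
X1 <- matrix(runif(8 * n), nrow = n)
X2 <- matrix(runif(8 * n), nrow = n)
res_mat <- jansen_matrix_sketch(g_fun_matrix, X1, X2)
dim(res_mat$S)   # 8 inputs x 2 outputs
```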

Value

soboljansen returns a list of class "soboljansen", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	either a vector, a matrix or a three-dimensional array of model responses (depends on the output of `model`).
`V`	the estimations of Variances of the Conditional Expectations (VCE) with respect to each factor and also with respect to the complementary set of each factor ("all but $X_i$ ").
`S`	the estimations of the Sobol' first-order indices.
`T`	the estimations of the Sobol' total sensitivity indices.

Users can request more output variables with the argument return.var (for example, the bootstrap outputs V.boot, S.boot and T.boot).

Author(s)

Bertrand Iooss, with contributions from Frank Weber (2016)

References

M.J.W. Jansen, 1999, Analysis of variance designs for model output, Computer Physics Communications, 117, 35–43.

Examples

# Test case : the non-monotonic Sobol g-function

# The method of soboljansen requires 2 samples
# There are 8 factors, all following the uniform distribution
# on [0,1]

library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis

x <- soboljansen(model = sobol.fun, X1, X2, nboot = 100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)


# Only for demonstration purposes: a model function returning a matrix
sobol.fun_matrix <- function(X){
  res_vector <- sobol.fun(X)
  cbind(res_vector, 2 * res_vector)
}
x_matrix <- soboljansen(model = sobol.fun_matrix, X1, X2)
plot(x_matrix, y_col = 2)
title(main = "y_col = 2")

# Also only for demonstration purposes: a model function returning a
# three-dimensional array
sobol.fun_array <- function(X){
  res_vector <- sobol.fun(X)
  res_matrix <- cbind(res_vector, 2 * res_vector)
  array(data = c(res_matrix, 5 * res_matrix), 
        dim = c(length(res_vector), 2, 2))
}
x_array <- soboljansen(model = sobol.fun_array, X1, X2)
plot(x_array, y_col = 2, y_dim3 = 2)
title(main = "y_col = 2, y_dim3 = 2")


Monte Carlo Estimation of Sobol' Indices via matrix permutations

Description

sobolmara implements the Monte Carlo estimation of the first-order Sobol' sensitivity indices using the formula of Mara and Joseph (2008), called the Mara estimator. This method estimates all p first-order indices at a cost of 2N model calls, where N is the random sample size; the cost is therefore independent of p, the number of inputs.

Usage

sobolmara(model = NULL, X1, ...)
## S3 method for class 'sobolmara'
tell(x, y = NULL, return.var = NULL, ...)
## S3 method for class 'sobolmara'
print(x, ...)
## S3 method for class 'sobolmara'
plot(x, ylim = c(0, 1), ...)
## S3 method for class 'sobolmara'
plotMultOut(x, ylim = c(0, 1), ...)
## S3 method for class 'sobolmara'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment
                 = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X1`	the random sample.
`x`	a list of class `"sobolmara"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"sobolmara"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`return.var`	a vector of character strings giving further internal variables names to store in the output object `x`.
`ylim`	y-coordinate plotting limits.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called.

Details

The estimator used by sobolmara is based on the rearrangement of a single matrix via random permutations (see Mara and Joseph, 2008). Bootstrap confidence intervals are not available.
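The permutation idea can be sketched in base R as follows. This is a minimal illustration under assumptions, not the package's code: `g_fun` and `mara_sketch` are hypothetical names, and the first-order index is estimated here as a covariance ratio between the outputs at the original points and the outputs at the permuted points, re-aligned so that both share the value of $X_i$. Only two model runs per row are needed, whatever the number of inputs.

```r
# Sobol' g-function on [0,1]^p (coefficients a assumed for illustration)
g_fun <- function(X, a = c(0, 1, 4.5, 9, 99, 99, 99, 99)) {
  X <- as.matrix(X)
  y <- rep(1, nrow(X))
  for (j in seq_len(ncol(X))) y <- y * (abs(4 * X[, j] - 2) + a[j]) / (1 + a[j])
  y
}

# Hypothetical helper illustrating a permutation-based first-order estimator
mara_sketch <- function(model, X1) {
  X1 <- as.matrix(X1)
  n <- nrow(X1)
  p <- ncol(X1)
  perm <- replicate(p, sample.int(n))   # one random permutation per column (n x p)
  Xb <- X1
  for (i in seq_len(p)) Xb[, i] <- X1[perm[, i], i]  # shuffle each column independently
  y <- model(X1)                        # n runs
  yb <- model(Xb)                       # n runs: 2n in total, independent of p
  S <- numeric(p)
  for (i in seq_len(p)) {
    # Row order(perm[, i])[k] of Xb carries the same X_i value as row k of X1,
    # while its other coordinates come from other, randomly chosen rows.
    yb_i <- yb[order(perm[, i])]
    S[i] <- cov(y, yb_i) / var(y)
  }
  S
}

set.seed(3)
n <- 20000
S_mara <- mara_sketch(g_fun, matrix(runif(8 * n), nrow = n))
round(S_mara, 3)
```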

Value

sobolmara returns a list of class "sobolmara", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	a vector of model responses.
`S`	the estimations of the Sobol' sensitivity indices.

Author(s)

Bertrand Iooss

References

Mara, T. and Joseph, O.R. (2008), Comparison of some efficient methods to evaluate the main effect of computer model factors, Journal of Statistical Computation and Simulation, 78:167–178

Examples

# Test case : the non-monotonic Sobol g-function

# The method of sobolmara requires 1 sample
# (there are 8 factors, all following the uniform distribution on [0,1])
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis
x <- sobolmara(model = sobol.fun, X1 = X1)
print(x)
plot(x)

library(ggplot2)
ggplot(x)

Monte Carlo Estimation of Sobol' Indices (formulas of Martinez (2011))

Description

sobolmartinez implements the Monte Carlo estimation of the Sobol' indices for both first-order and total indices using formulas based on correlation coefficients, at a total cost of $(p+2) \times n$ model evaluations. These are called the Martinez estimators.
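Both Martinez estimators are plain correlation coefficients, which is also what makes theoretical confidence intervals possible when nboot = 0. The sketch below is hypothetical (`g_fun` and `martinez_sketch` are illustrative names) and assumes the intervals come from the standard Fisher z-transformation of a correlation coefficient; the package's exact formulas may differ.

```r
# Sobol' g-function on [0,1]^p (coefficients a assumed for illustration)
g_fun <- function(X, a = c(0, 1, 4.5, 9, 99, 99, 99, 99)) {
  X <- as.matrix(X)
  y <- rep(1, nrow(X))
  for (j in seq_len(ncol(X))) y <- y * (abs(4 * X[, j] - 2) + a[j]) / (1 + a[j])
  y
}

# Hypothetical helper: correlation-based first-order (S) and total (T) indices
martinez_sketch <- function(model, X1, X2, conf = 0.95) {
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  n <- nrow(X1)
  p <- ncol(X1)
  y1 <- model(X1)
  y2 <- model(X2)
  z <- qnorm((1 + conf) / 2)
  # Fisher z-transformation interval for a correlation coefficient r (assumption)
  fisher_ci <- function(r) tanh(atanh(r) + c(-1, 1) * z / sqrt(n - 3))
  out <- matrix(NA_real_, nrow = p, ncol = 6,
                dimnames = list(paste0("X", seq_len(p)),
                                c("S", "S.min", "S.max", "T", "T.min", "T.max")))
  for (i in seq_len(p)) {
    Xi <- X2
    Xi[, i] <- X1[, i]
    yi <- model(Xi)
    r_S <- cor(y1, yi)                  # y1 and yi share only X_i -> first order
    r_T <- cor(y2, yi)                  # y2 and yi share all but X_i -> 1 - total
    out[i, ] <- c(r_S, fisher_ci(r_S), 1 - r_T, rev(1 - fisher_ci(r_T)))
  }
  out
}

set.seed(11)
n <- 20000
res_mz <- martinez_sketch(g_fun, matrix(runif(8 * n), nrow = n),
                          matrix(runif(8 * n), nrow = n))
round(res_mz, 3)
```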

Usage

sobolmartinez(model = NULL, X1, X2, nboot = 0, conf = 0.95, ...)
## S3 method for class 'sobolmartinez'
tell(x, y = NULL, return.var = NULL, ...)
## S3 method for class 'sobolmartinez'
print(x, ...)
## S3 method for class 'sobolmartinez'
plot(x, ylim = c(0, 1), y_col = NULL, y_dim3 = NULL, ...)
## S3 method for class 'sobolmartinez'
ggplot(data, mapping = aes(), ylim = c(0, 1), y_col = NULL,
                 y_dim3 = NULL, ..., environment = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X1`	the first random sample.
`X2`	the second random sample.
`nboot`	the number of bootstrap replicates, or zero to use theoretical formulas based on confidence intervals of correlation coefficients (Martinez, 2011).
`conf`	the confidence level for bootstrap confidence intervals.
`x`	a list of class `"sobolmartinez"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"sobolmartinez"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`return.var`	a vector of character strings giving further internal variables names to store in the output object `x`.
`ylim`	y-coordinate plotting limits.
`y_col`	an integer defining the index of the column of `x$y` to be used for plotting the corresponding sensitivity indices (only applies if `x$y` is a matrix or an array). If set to `NULL` (as per default) and `x$y` is a matrix or an array, the first column (respectively the first element in the second dimension) of `x$y` is used (i.e. `y_col = 1`).
`y_dim3`	an integer defining the index in the third dimension of `x$y` to be used for plotting the corresponding sensitivity indices (only applies if `x$y` is an array). If set to `NULL` (as per default) and `x$y` is a three-dimensional array, the first element in the third dimension of `x$y` is used (i.e. `y_dim3 = 1`).
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	for `sobolmartinez`: any other arguments for `model` which are passed unchanged each time it is called

Details

This estimator supports missing values (NA or NaN), which can occur when the model fails on some points of the design of experiments (due to code failure). Note, however, that when missing values are present the estimated Sobol' indices are no longer rigorous variance-based sensitivity indices; a warning is displayed in this case.

This version of sobolmartinez also supports matrices and three-dimensional arrays as output of model. Bootstrapping (including bootstrap confidence intervals) is also supported for matrix or array output. However, theoretical confidence intervals (for nboot = 0) are only supported for vector output. If the model output is a matrix or an array, V, S and T are matrices or arrays as well (depending on the type of y and the value of nboot).

The bootstrap outputs V.boot, S.boot and T.boot can only be returned if the model output is a vector (using argument return.var). For matrix or array output, these objects can't be returned.

Value

sobolmartinez returns a list of class "sobolmartinez", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	either a vector, a matrix or a three-dimensional array of model responses (depends on the output of `model`).
`V`	the estimations of normalized variances of the Conditional Expectations (VCE) with respect to each factor and also with respect to the complementary set of each factor ("all but $X_i$ ").
`S`	the estimations of the Sobol' first-order indices.
`T`	the estimations of the Sobol' total sensitivity indices.

Users can request more output variables with the argument return.var (for example, the bootstrap outputs V.boot, S.boot and T.boot).

Author(s)

Bertrand Iooss, with contributions from Frank Weber (2016)

References

J-M. Martinez, 2011, Analyse de sensibilite globale par decomposition de la variance, Presentation in the meeting of GdR Ondes and GdR MASCOT-NUM, January, 13th, 2011, Institut Henri Poincare, Paris, France.

M. Baudin, K. Boumhaout, T. Delage, B. Iooss and J-M. Martinez, 2016, Numerical stability of Sobol' indices estimation formula, Proceedings of the SAMO 2016 Conference, Reunion Island, France, December 2016

Examples

# Test case : the non-monotonic Sobol g-function

# The method of sobolmartinez requires 2 samples
# There are 8 factors, all following the uniform distribution
# on [0,1]

library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis

x <- sobolmartinez(model = sobol.fun, X1, X2, nboot = 0)
print(x)
plot(x)

library(ggplot2)
ggplot(x)


# Only for demonstration purposes: a model function returning a matrix
sobol.fun_matrix <- function(X){
  res_vector <- sobol.fun(X)
  cbind(res_vector, 2 * res_vector)
}
x_matrix <- sobolmartinez(model = sobol.fun_matrix, X1, X2)
plot(x_matrix, y_col = 2)
title(main = "y_col = 2")

# Also only for demonstration purposes: a model function returning a
# three-dimensional array
sobol.fun_array <- function(X){
  res_vector <- sobol.fun(X)
  res_matrix <- cbind(res_vector, 2 * res_vector)
  array(data = c(res_matrix, 5 * res_matrix), 
        dim = c(length(res_vector), 2, 2))
}
x_array <- sobolmartinez(model = sobol.fun_array, X1, X2)
plot(x_array, y_col = 2, y_dim3 = 2)
title(main = "y_col = 2, y_dim3 = 2")

Monte Carlo Estimation of Aggregated Sobol' Indices for multiple and functional outputs

Description

sobolMultOut implements the aggregated Sobol' indices for multiple outputs. They are obtained by averaging the Sobol' indices of all outputs, each weighted by the variance of its corresponding output. Moreover, this function computes the functional (unidimensional) Sobol' indices for functional (unidimensional) model outputs, which can be plotted via plotMultOut. Both first-order and total Sobol' indices are estimated by Monte Carlo formulas.
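The aggregation step itself is a variance-weighted average. Below is a minimal sketch with made-up numbers (the matrix `S_by_output`, the vector `var_by_output` and the helper `aggregate_sobol` are hypothetical): each output's indices are weighted by that output's variance, which amounts to summing the partial variances over outputs and dividing by the sum of the output variances.

```r
# Made-up first-order indices: rows = outputs (q = 2), columns = inputs (p = 2)
S_by_output <- rbind(c(0.5, 0.2),
                     c(0.1, 0.8))
var_by_output <- c(1, 3)                # Var(Y_k) for each output k

# Weighted average over outputs; S * v scales row k of S by v[k]
aggregate_sobol <- function(S, v) colSums(S * v) / sum(v)

S_agg <- aggregate_sobol(S_by_output, var_by_output)
S_agg   # 0.20 0.65
```

Outputs with a larger variance dominate the aggregated index, which is why the second input, influential on the high-variance output, ends up with the larger value.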

Usage

sobolMultOut(model = NULL, q = 1, X1, X2, MCmethod = "sobol", 
             ubiquitous = FALSE, ...)
## S3 method for class 'sobolMultOut'
print(x, ...)
## S3 method for class 'sobolMultOut'
plot(x, ylim = c(0, 1), ...)
## S3 method for class 'sobolMultOut'
plotMultOut(x, ylim = c(0, 1), ...)
## S3 method for class 'sobolMultOut'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment
                 = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`q`	dimension of the model output vector.
`X1`	the first random sample.
`X2`	the second random sample.
`MCmethod`	a character string specifying the Monte-Carlo procedure used to estimate the Sobol' indices. The available methods are: `"sobol"`, `"sobol2002"`, `"sobol2007"`, `"soboljansen"`, `"sobolmara"` and `"sobolGP"`.
`ubiquitous`	if TRUE, 1D functional Sobol indices are computed (default=FALSE).
`x`	a list of class `MCmethod` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `MCmethod` storing the state of the sensitivity study (parameters, data, estimates).
`ylim`	y-coordinate plotting limits.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called

Details

This function has several limitations: bootstrap estimation of confidence intervals is not available and the tell function does not work. Aggregated Sobol' indices can be plotted with the S3 method plot, and ubiquitous Sobol' indices can be visualized with the S3 method plotMultOut (which does not work with the "sobolGP" method).

Value

sobolMultOut returns a list of class MCmethod, containing all its input arguments, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	the response used
`V`	the estimations of the aggregated Variances of the Conditional Expectations (VCE) with respect to each factor and also with respect to the complementary set of each factor ("all but $X_i$ ").
`S`	the estimations of the aggregated Sobol' first-order indices.
`T`	the estimations of the aggregated Sobol' total sensitivity indices.
`Sfct`	the estimations of the functional Sobol' first-order indices (if ubiquitous=TRUE).
`Tfct`	the estimations of the functional Sobol' total sensitivity indices (if ubiquitous=TRUE).

Author(s)

Bertrand Iooss

References

M. Lamboni, H. Monod and D. Makowski, 2011, Multivariate sensitivity analysis to measure global contribution of input factors in dynamic models, Reliability Engineering and System Safety, 96:450-459.

F. Gamboa, A. Janon, T. Klein and A. Lagnoux, 2014, Sensitivity indices for multivariate outputs, Electronic Journal of Statistics, 8:575-603.

Examples

# Tests on the functional toy fct 'Arctangent temporal function'

y0 <- atantemp.fun(matrix(c(-7,0,7,-7,0,7),ncol=2))
#plot(y0[1,],type="l")
#apply(y0,1,lines)

n <- 100
X <- matrix(c(runif(2*n,-7,7)),ncol=2)
y <- atantemp.fun(X)
plot(y0[2,],ylim=c(-2,2),type="l")
apply(y,1,lines)

# Sobol indices computations

n <- 1000
X1 <- data.frame(matrix(runif(2*n,-7,7), nrow = n))
X2 <- data.frame(matrix(runif(2*n,-7,7), nrow = n))

sa <- sobolMultOut(model=atantemp.fun, q=100, X1, X2, 
                   MCmethod="soboljansen", ubiquitous=TRUE)
print(sa)
plot(sa)
plotMultOut(sa)

library(ggplot2)
ggplot(sa)

Monte Carlo Estimation of Sobol' Indices (improved formulas of Owen (2013))

Description

sobolowen implements the Monte Carlo estimation of the Sobol' indices for both first-order and total indices at the same time (altogether $2p$ indices). It takes three independent random samples as input. These are called the Owen estimators.
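The reason three samples are needed shows up when the first-order estimator is written as a product of two differences. The following base-R sketch is an illustration, not the package's code: `g_fun` and `owen_sketch` are hypothetical names, the first-order formula is a product-of-differences form in the spirit of Owen (2013), and the total index uses a Jansen-type formula chosen here for simplicity. When $X_i$ has little influence, the first difference is close to zero, so the product has a small variance and small indices are estimated accurately.

```r
# Sobol' g-function on [0,1]^p (coefficients a assumed for illustration)
g_fun <- function(X, a = c(0, 1, 4.5, 9, 99, 99, 99, 99)) {
  X <- as.matrix(X)
  y <- rep(1, nrow(X))
  for (j in seq_len(ncol(X))) y <- y * (abs(4 * X[, j] - 2) + a[j]) / (1 + a[j])
  y
}

# Hypothetical helper: first-order indices from products of differences (3 samples)
owen_sketch <- function(model, X1, X2, X3) {
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  X3 <- as.matrix(X3)
  p <- ncol(X1)
  y1 <- model(X1)                       # f(x)
  y2 <- model(X2)                       # f(y)
  V <- var(c(y1, y2))
  first <- total <- numeric(p)
  for (i in seq_len(p)) {
    Xa <- X2
    Xa[, i] <- X1[, i]                  # x_i with y_{-i}
    Xb <- X1
    Xb[, i] <- X3[, i]                  # z_i with x_{-i}
    ya <- model(Xa)
    yb <- model(Xb)
    # E[(f(x) - f(z_i:x_-i)) * (f(x_i:y_-i) - f(y))] = Var(E[Y | X_i])
    first[i] <- mean((y1 - yb) * (ya - y2)) / V
    # Jansen-type total index from the pair differing only in X_i (assumption)
    total[i] <- mean((y1 - yb)^2) / (2 * V)
  }
  list(S = first, T = total)
}

set.seed(5)
n <- 50000
res_ow <- owen_sketch(g_fun, matrix(runif(8 * n), nrow = n),
                      matrix(runif(8 * n), nrow = n),
                      matrix(runif(8 * n), nrow = n))
round(rbind(S = res_ow$S, T = res_ow$T), 4)
```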

Usage

sobolowen(model = NULL, X1, X2, X3, nboot = 0, conf = 0.95, varest = 2, ...)
## S3 method for class 'sobolowen'
tell(x, y = NULL, return.var = NULL, varest = 2, ...)
## S3 method for class 'sobolowen'
print(x, ...)
## S3 method for class 'sobolowen'
plot(x, ylim = c(0, 1), ...)
## S3 method for class 'sobolowen'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment
                 = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X1`	the first random sample.
`X2`	the second random sample.
`X3`	the third random sample.
`nboot`	the number of bootstrap replicates.
`conf`	the confidence level for bootstrap confidence intervals.
`varest`	choice for the variance estimator for the denominator of the Sobol' indices. varest=1 is for a classical estimator. varest=2 (default) is for the estimator proposed in Janon et al. (2012).
`x`	a list of class `"sobolowen"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"sobolowen"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`return.var`	a vector of character strings giving further internal variables names to store in the output object `x`.
`ylim`	y-coordinate plotting limits.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called
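The two choices of varest differ only in how the denominator variance is computed. A minimal sketch, assuming `y` and `yu` are the two output vectors of a pick-freeze pair; the helper names are hypothetical, and which samples sobolowen actually feeds into the denominator is not shown here. The second form pools both vectors, in the spirit of Janon et al. (2012).

```r
# varest = 1 style: classical (population) variance of one output vector
var_classical <- function(y) mean(y^2) - mean(y)^2

# varest = 2 style: variance pooled over both vectors of a pick-freeze pair
var_janon <- function(y, yu) {
  m <- mean((y + yu) / 2)
  mean((y^2 + yu^2) / 2) - m^2
}

y <- c(1, 2, 3)
yu <- c(3, 2, 1)
var_classical(y)
var_janon(y, yu)
```

When `y` and `yu` coincide the pooled form reduces to the classical one; otherwise it uses twice as many output values for the same number of model runs.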

Value

sobolowen returns a list of class "sobolowen", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	the response used.
`V`	the estimations of Variances of the Conditional Expectations (VCE) with respect to each factor and also with respect to the complementary set of each factor ("all but $X_i$").
`S`	the estimations of the Sobol' first-order indices.
`T`	the estimations of the Sobol' total sensitivity indices.

Users can request additional output variables through the argument return.var (for example, the bootstrap outputs V.boot, S.boot and T.boot).
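To show why three samples are needed, here is a minimal base-R sketch of an Owen (2013)-type first-order estimator. As we understand it, this estimator uses three independent samples x, y, z and averages (f(x) - f(z_j : x_{-j})) (f(x_j : y_{-j}) - f(y)). The helper owen_first_order is hypothetical: it only estimates first-order indices, uses no bootstrap, and is not the sobolowen source.

```r
# Owen (2013)-type three-sample estimator of first-order Sobol' indices
# (hypothetical helper for illustration; not the sobolowen source).
owen_first_order <- function(f, X1, X2, X3) {
  fx <- f(X1)                    # f(x)
  fy <- f(X2)                    # f(y)
  V  <- var(c(fx, fy))           # pooled estimate of Var(Y)
  sapply(seq_len(ncol(X1)), function(j) {
    Xz <- X1; Xz[, j] <- X3[, j] # (z_j, x_{-j}): only input j replaced
    Xy <- X2; Xy[, j] <- X1[, j] # (x_j, y_{-j}): only input j kept from x
    mean((fx - f(Xz)) * (f(Xy) - fy)) / V
  })
}

set.seed(42)
n  <- 50000
X1 <- matrix(runif(3 * n), nrow = n)
X2 <- matrix(runif(3 * n), nrow = n)
X3 <- matrix(runif(3 * n), nrow = n)
f  <- function(X) X[, 1] + 2 * X[, 2]   # the third input is inert
S  <- owen_first_order(f, X1, X2, X3)   # close to c(0.2, 0.8, 0)
```

For an input that has no effect, the first factor f(x) - f(z_j : x_{-j}) is exactly zero, so the estimate is exactly 0. This illustrates why this form suits small indices.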

Author(s)

Taieb Touati and Bernardo Ramos

References

A. Owen, 2013, Better estimation of small Sobol' sensitivity indices, ACM Transactions on Modeling and Computer Simulation (TOMACS), 23(2), 11.

Janon, A., Klein T., Lagnoux A., Nodet M., Prieur C. (2012), Asymptotic normality and efficiency of two Sobol index estimators. Accepted in ESAIM: Probability and Statistics.

Examples

# Test case : the non-monotonic Sobol g-function

# The method of sobolowen requires 3 samples
# There are 8 factors, all following the uniform distribution
# on [0,1]

library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))
X3 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis

x <- sobolowen(model = sobol.fun, X1, X2, X3, nboot = 10) # nboot = 10 keeps the example fast; use nboot = 100 in practice
print(x)
plot(x)

library(ggplot2)
ggplot(x)

First-order sensitivity indices estimation via ranking

Description

sobolrank estimates all first-order indices from only N model evaluations, using the rank-based estimator of Gamboa et al. (2022), which was inspired by Chatterjee (2021).

Usage

sobolrank(model = NULL, X, nboot = 0, conf = 0.95, nsample = round(0.8*nrow(X)), 
          ...)
## S3 method for class 'sobolrank'
tell(x, y = NULL, ...)
## S3 method for class 'sobolrank'
print(x, ...)
## S3 method for class 'sobolrank'
plot(x, ylim = c(0, 1), ...)
## S3 method for class 'sobolrank'
ggplot(data, mapping = aes(), ..., environment
                 = parent.frame(), ylim = c(0, 1))

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X`	a random sample of the inputs.
`nboot`	the number of bootstrap replicates, see details.
`conf`	the confidence level for confidence intervals, see details.
`nsample`	the size of the bootstrap sample, see details.
`x`	a list of class `"sobolrank"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"sobolrank"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`ylim`	y-coordinate plotting limits.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called.

Details

The estimator used by sobolrank is defined in Gamboa et al. (2022). It is based on ranking the inputs, as first proposed by Chatterjee (2021) for a Cramér-von Mises-based estimator. All first-order indices can be estimated with a single sample of size N. The bootstrap (resampling with replacement) creates ties that the algorithm does not account for. Confidence intervals are therefore obtained by subsampling without replacement, with subsample size nsample.
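Below is a minimal base-R sketch of the two ideas in this paragraph. First, a rank-based first-order estimate: sort the sample along X_i, then correlate each response with the response of its neighbour in that order. Second, intervals from subsamples drawn without replacement. Both helpers (rank_first_order, rank_first_order_ci) are hypothetical. The percentile interval is a simple assumption, and sobolrank may compute its estimate and its intervals differently.

```r
# Rank-based first-order indices (sketch in the spirit of Gamboa et al.,
# 2022): pair each response with the response of its right neighbour once
# the sample is sorted along X_i. Hypothetical helper, not the package code.
rank_first_order <- function(X, y) {
  n <- length(y)
  m <- mean(y)
  v <- mean(y^2) - m^2
  sapply(seq_len(ncol(X)), function(i) {
    ys <- y[order(X[, i])]             # responses sorted by X_i
    (mean(ys[-n] * ys[-1]) - m^2) / v  # neighbours share nearly equal X_i
  })
}

# Intervals from subsamples drawn WITHOUT replacement, so no tied X_i
# values are created (simple percentile interval; an assumption).
rank_first_order_ci <- function(X, y, nboot = 100,
                                nsample = round(0.8 * nrow(X)), conf = 0.95) {
  reps <- replicate(nboot, {
    idx <- sample(nrow(X), nsample)    # sample() defaults to replace = FALSE
    rank_first_order(X[idx, , drop = FALSE], y[idx])
  })
  reps <- matrix(reps, nrow = ncol(X)) # one row per input
  apply(reps, 1, quantile, probs = c((1 - conf) / 2, (1 + conf) / 2))
}

set.seed(3)
n  <- 10000
X  <- matrix(runif(3 * n), nrow = n)
y  <- X[, 1] + 2 * X[, 2]              # the third input is inert
S  <- rank_first_order(X, y)           # close to c(0.2, 0.8, 0)
ci <- rank_first_order_ci(X, y, nboot = 50)
```

The output ci has one column per input, with lower and upper bounds in its two rows.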

Value

sobolrank returns a list of class "sobolrank", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	a vector of model responses.
`S`	the estimations of the Sobol' sensitivity indices.

Author(s)

Sebastien Da Veiga

References

Gamboa, F., Gremaud, P., Klein, T., & Lagnoux, A., 2022, Global Sensitivity Analysis: a novel generation of mighty estimators based on rank statistics, Bernoulli 28: 2345-2374.

Chatterjee, S., 2021, A new coefficient of correlation, Journal of the American Statistical Association, 116:2009-2022.

Examples

# Test case : the non-monotonic Sobol g-function
# Example with a call to a numerical model
library(boot)
n <- 1000
X <- data.frame(matrix(runif(8 * n), nrow = n))
x <- sobolrank(model = sobol.fun, X = X, nboot = 100)
print(x)
library(ggplot2)
ggplot(x)
# Test case : the Ishigami function
# Example with given data
n <- 500
X <- data.frame(matrix(-pi+2*pi*runif(3 * n), nrow = n))
Y <- ishigami.fun(X)
x <- sobolrank(model = NULL, X)
tell(x,Y)
print(x)
ggplot(x)

Recursive estimation of Sobol' indices

Description

sobolrec implements a recursive version of the procedure introduced by Tissot & Prieur (2015) using two replicated nested designs. This function estimates either all first-order indices or all closed second-order indices at a total cost of $2 \times N$ model evaluations where $N$ is the size of each replicated nested design.

Usage

sobolrec(model=NULL, factors, layers, order, precision, method=NULL, tail=TRUE, 
          ...)
## S3 method for class 'sobolrec'
ask(x, index, ...)
## S3 method for class 'sobolrec'
tell(x, y = NULL, index, ...)
## S3 method for class 'sobolrec'
print(x, ...)
## S3 method for class 'sobolrec'
plot(x, ylim = c(0,1), ...)

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`factors`	an integer giving the number of factors, or a vector of character strings giving their names.
`layers`	If `order=1`, a vector specifying the respective sizes of each layer (see "Details"). If `order=2`, an integer specifying the size of all layers.
`order`	an integer specifying which indices to estimate: `1` for first-order indices, `2` for closed second-order indices.
`precision`	a vector containing: (1) the target precision for the stopping criterion; (2) the number of steps for the stopping criterion (must be greater than 1).
`tail`	a boolean specifying the method used to choose the number of levels of the orthogonal array (see "Warning messages").
`method`	If `order=2`, a character specifying the method used to construct the orthogonal arrays (see "Details"): `"al"` for the algebraic method, `"ar"` for the accept-reject method. Set to `NULL` if `order=1`.
`x`	a list of class `"sobolrec"` storing the state of the sensitivity study (parameters, data, estimates).
`index`	an integer specifying the step of the recursion.
`y`	the model response.
`ylim`	y-coordinate plotting limits.
`...`	any other arguments for `model` which are passed unchanged each time it is called.

Details

For first-order indices, layers is a vector:

$\left(s_1, ...,s_m \right)$

specifying the number $m$ of layers of the nested design, whose respective sizes are given by:

$\prod_{i=1}^{k-1} s_i, \ k=2, ...,m+1$

For closed second-order indices, layers directly specifies the size of all layers.
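The product formula for first-order layer sizes is a cumulative product of the layers vector. The arithmetic sketch below applies it to the value used in the Examples. It assumes that N, the size of each replicated nested design, is the size of the largest layer, so that running every step costs $2 \times N$ evaluations. The helper nested_layer_sizes is hypothetical.

```r
# Sizes of the nested layers implied by a first-order `layers` vector
# (s_1, ..., s_m): layer k has size prod(s_1, ..., s_{k-1}), k = 2, ..., m+1.
nested_layer_sizes <- function(layers) cumprod(layers)

sizes <- nested_layer_sizes(rep(2, 15))  # layers used in the Examples
sizes[c(1, 2, 15)]   # 2 4 32768
2 * sizes[15]        # 65536 evaluations if the recursion runs all 15 steps
```

In practice, the stopping criterion usually ends the recursion well before the last layer.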

For each Sobol' index $S$, the stopping criterion reads:

$\mid S_{l-1}-S_{l} \mid < \epsilon$

where $S_l$ denotes the estimate of $S$ at step $l$. This criterion is tested for the last $l_0$ steps (including the current one). $\epsilon$ and $l_0$ are, respectively, the target precision and the number of steps of the stopping criterion, both specified in precision.
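The stopping rule can be sketched in a few lines of base R. The function stop_reached is hypothetical. It assumes that "tested for the last $l_0$ steps" means the last $l_0$ successive differences must all be below $\epsilon$, for every index at once. sobolrec may implement the check differently.

```r
# Stopping rule |S_{l-1} - S_l| < eps checked on the last l0 steps,
# for every index at once. S_hist has one row per step, one column per index.
# Hypothetical helper; assumes "last l0 steps" means the last l0 differences.
stop_reached <- function(S_hist, precision) {
  eps <- precision[1]
  l0  <- precision[2]
  l   <- nrow(S_hist)
  if (l < l0 + 1) return(FALSE)           # not enough steps yet
  recent <- S_hist[(l - l0):l, , drop = FALSE]
  all(abs(diff(recent)) < eps)            # diff() works column by column
}

S_hist <- rbind(c(0.50, 0.20),
                c(0.30, 0.25),
                c(0.31, 0.24),
                c(0.305, 0.245))
stop_reached(S_hist, precision = c(0.05, 2))  # TRUE: last two changes are small
stop_reached(S_hist, precision = c(0.05, 3))  # FALSE: 0.50 -> 0.30 is too large
```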

sobolrec uses either an algebraic or an accept-reject method to construct the orthogonal arrays for the estimation of closed second-order indices. The algebraic method is less precise than the accept-reject method but offers more steps when the number of factors is small.
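One standard algebraic route to a strength-2 orthogonal array is the Bose construction (Hedayat et al., 1999). For a prime q, it gives $q^2$ runs and $q+1$ columns, so it can accommodate at most $d = q+1$ factors. This is one way to see where constraints of the form $q \geq d-1$ come from. The sketch below is illustrative only: the helper bose_oa is hypothetical, and the package's own algebraic construction may differ.

```r
# Algebraic (Bose-type) construction of an orthogonal array of strength 2
# with q^2 rows, q + 1 columns and q levels, for q prime (illustrative sketch;
# the package's own construction may differ).
bose_oa <- function(q) {
  stopifnot(q >= 2, all(q %% seq_len(q - 1)[-1] != 0))  # q must be prime
  grid <- expand.grid(a = 0:(q - 1), b = 0:(q - 1))
  a <- grid$a
  b <- grid$b
  # column k: a + k * b (mod q), k = 1, ..., q - 1
  extra <- sapply(seq_len(q - 1), function(k) (a + k * b) %% q)
  unname(cbind(a, b, extra))
}

O <- bose_oa(5)   # 25 runs, 6 columns: enough for up to d = 6 factors
dim(O)            # 25 6
```

Strength 2 means that any two columns contain every pair of levels exactly once. This is the property exploited to estimate closed second-order indices.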

sobolrec automatically assigns a uniform distribution on [0,1] to each input. Transformations of distributions (between U[0,1] and the wanted distribution) have to be performed before the call to tell().

Value

sobolrec returns a list of class "sobolrec", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments (row concatenation of the two replicated designs).
`y`	a list of the response used at each step.
`V`	a list of the model variance estimated at each step.
`S`	a list of the Sobol' indices estimated at each step.
`steps`	the number of steps performed.
`N`	the size of each replicated nested design.

Warning messages

"The value entered for layers is not the square of a prime number. It has been replaced by: ": when order=2, the value of layers must be the square of a prime power number. This warning message indicates that it was not the case and the value has been replaced depending on tail. If tail=TRUE (resp. tail=FALSE) the new value of layers is equal to the square of the prime number preceding (resp. following) the square root of layers.
"The value entered for layers is not satisfying the constraint. It has been replaced by: ": the value $N$ for layers must satisfy the constraint $N \geq (d-1)^{2}$, where $d$ is the number of factors. This warning message indicates that N was replaced by the square of the prime number following (or equal to) $d-1$.

References

A.S. Hedayat, N.J.A. Sloane and J. Stufken, 1999, Orthogonal Arrays: Theory and Applications, Springer Series in Statistics.

L. Gilquin, E. Arnaud, H. Monod and C. Prieur, 2021, Recursive estimation procedure of Sobol' indices based on replicated designs, Computational and Applied Mathematics, 40:1–23.

Examples

	
# Test case: the non-monotonic Sobol g-function

# The method requires 2 replicated nested designs
# (there are 8 factors, all following the uniform distribution on [0,1])

# first-order indices estimation
x <- sobolrec(model = sobol.fun, factors = 8, layers=rep(2,each=15), order=1,
              precision = c(5*10^(-2),2), method=NULL, tail=TRUE)
print(x)

# closed second-order indices estimation
x <- sobolrec(model = sobol.fun, factors = 8, layers=11^2, order=2,
              precision = c(10^(-2),3), method="al", tail=TRUE)
print(x)


# Test case: dealing with an external model
# (commented out because of a known bug in the use of ask())

#x <- sobolrec(model = NULL, factors = 8, layers=rep(2,each=15), order=1,
#              precision = c(5*10^(-2),2), method=NULL, tail=TRUE)
#toy <- sobol.fun
#k <- 1
#stop_crit <- FALSE
#while(!(stop_crit) & (k<length(x$layers))){
#  ask(x, index=k)
#  y <- toy(x$block)
#  tell(x, y, index=k)
#  stop_crit <- x$stop_crit
#  k <- k+1
#}
#print(x)


Sobol' indices estimation based on replicated orthogonal arrays

Description

sobolrep generalizes the estimation of the Sobol' sensitivity indices introduced by Tissot & Prieur (2015) using two replicated orthogonal arrays. This function estimates either

all first-order and second-order indices at a total cost of $2 \times N$ model evaluations,
or all first-order, second-order and total-effect indices at a total cost of $N \times (d+2)$ model evaluations,

where $N=q^{2}$, $d$ is the number of factors, and $q \geq d-1$ is a prime number corresponding to the number of levels of each orthogonal array.
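The two budgets above reduce to simple arithmetic. This sketch counts the indices and model evaluations for a given number of levels q and number of factors d. The helper sobolrep_budget is hypothetical. It only restates the costs given in this Description and does not call the package.

```r
# Number of indices and model evaluations for sobolrep, given the prime
# number of levels q and the number of factors d (arithmetic sketch only).
sobolrep_budget <- function(q, d, total = FALSE) {
  stopifnot(q >= d - 1)                    # required by the construction
  N <- q^2
  n_idx <- d + d * (d - 1) / 2             # first- and second-order indices
  if (total) n_idx <- n_idx + d            # plus total-effect indices
  c(N = N, indices = n_idx,
    evaluations = if (total) N * (d + 2) else 2 * N)
}

sobolrep_budget(q = 31, d = 8)                # N = 961, 36 indices, 1922 runs
sobolrep_budget(q = 31, d = 8, total = TRUE)  # N = 961, 44 indices, 9610 runs
```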

Usage

sobolrep(model = NULL, factors, N, tail=TRUE, 
			conf=0.95, nboot=0, nbrep=1, total=FALSE, ...)
## S3 method for class 'sobolrep'
tell(x, y = NULL, ...)
## S3 method for class 'sobolrep'
print(x, ...)
## S3 method for class 'sobolrep'
plot(x, ylim = c(0,1), choice, ...)

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`factors`	an integer giving the number of factors, or a vector of character strings giving their names.
`N`	an integer giving the size of each replicated design (for a total of $2 \times N$ model evaluations, or $N \times (d+2)$ when total=TRUE).
`tail`	a boolean specifying the method used to choose the number of levels of the orthogonal array (see "Warning messages").
`conf`	the confidence level for confidence intervals.
`nboot`	the number of bootstrap replicates.
`nbrep`	the number of times the estimation procedure is repeated (see "Details").
`total`	a boolean specifying whether or not total effect indices are estimated.
`x`	a list of class `"sobolrep"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	the model response.
`ylim`	y-coordinate plotting limits.
`choice`	an integer specifying which indices to plot: `1` for first-order indices, `2` for second-order indices, `3` for total-effect indices.
`...`	any other arguments for `model` which are passed unchanged each time it is called.

Details

sobolrep automatically assigns a uniform distribution on [0,1] to each input. Transformations of distributions (between U[0,1] and the wanted distribution) have to be performed before the call to tell() (see "Examples").

nbrep specifies the number of times the estimation procedure is repeated. Each repetition makes use of the orthogonal array structure to obtain a new set of Sobol' indices. It is important to note that no additional model evaluations are performed (the cost of the procedure remains the same).

Value

sobolrep returns a list of class "sobolrep", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments (row concatenation of the two replicated designs).
`y`	the response used.
`RP`	the matrix of permutations.
`V`	the model variance.
`S`	a data.frame containing estimations of the first-order Sobol' indices plus confidence intervals if specified.
`S2`	a data.frame containing estimations of the second-order Sobol' indices plus confidence intervals if specified.
`T`	a data.frame containing estimations of the total-effect indices plus confidence intervals if specified.

Warning messages

"The value entered for N is not the square of a prime number. It has been replaced by: ": the number of levels q of each orthogonal array must be a prime number. If N is not the square of a prime number, this warning message indicates that it was replaced depending on the value of tail. If tail=TRUE (resp. tail=FALSE) the new value of N is equal to the square of the prime number preceding (resp. following) the square root of N.
"The value entered for N is not satisfying the constraint $N \geq (d-1)^2$. It has been replaced by: ": the constraint $N \geq (d-1)^{2}$ must be satisfied, where $d$ is the number of factors. This warning message indicates that N was replaced by the square of the prime number following (or equal to) $d-1$.

References

A.S. Hedayat, N.J.A. Sloane and J. Stufken, 1999, Orthogonal Arrays: Theory and Applications, Springer Series in Statistics.

J-Y. Tissot and C. Prieur, 2015, A randomized orthogonal array-based procedure for the estimation of first- and second-order Sobol' indices, J. Statist. Comput. Simulation, 85:1358-1381.

Examples

# Test case: the non-monotonic Sobol g-function

# The method of sobol requires 2 samples
# (there are 8 factors, all following the uniform distribution on [0,1])

x <- sobolrep(model = sobol.fun, factors = 8, N = 1000, nboot=100, nbrep=1, total=FALSE)
print(x)
plot(x,choice=1)
plot(x,choice=2)

# Test case: dealing with non-uniform distributions

x <- sobolrep(model = NULL, factors = 3, N = 1000, nboot=0, nbrep=1, total=FALSE)

# X1 follows a log-normal distribution:
x$X[,1] <- qlnorm(x$X[,1])

# X2 follows a standard normal distribution:
x$X[,2] <- qnorm(x$X[,2])

# X3 follows a gamma distribution:
x$X[,3] <- qgamma(x$X[,3],shape=0.5)

# toy example
toy <- function(x){rowSums(x)}
y <- toy(x$X)
tell(x, y)
print(x)
plot(x,choice=1)
plot(x,choice=2)

Sobol' Indices Estimation Using Replicated OA-based LHS

Description

sobolroalhs implements the estimation of the Sobol' sensitivity indices introduced by Tissot & Prieur (2015) using two replicated designs (Latin hypercubes or orthogonal arrays). This function estimates either all first-order indices or all closed second-order indices at a total cost of $2 \times N$ model evaluations. For closed second-order indices $N=q^{2}$ where $q \geq d-1$ is a prime number corresponding to the number of levels of the orthogonal array, and where $d$ indicates the number of factors.
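The key idea of the replicated design for first-order indices can be sketched in base R. The second Latin hypercube reuses the columns of the first, each with its own row permutation. Every row of the second design therefore shares its value of $X_i$ with exactly one row of the first design. All d first-order indices then come from the same $2 \times N$ evaluations. The helper replicated_lhs_first_order is hypothetical and simplified: it has no bootstrap, no NA handling, and no orthogonal arrays, and it is not the package code.

```r
# Replicated Latin hypercube sketch for first-order indices: design 2 reuses
# the columns of design 1, each with its own row permutation
# (hypothetical helper, not the package code).
replicated_lhs_first_order <- function(f, N, d) {
  # design 1: random Latin hypercube on [0,1]^d
  X1 <- sapply(seq_len(d), function(j) (sample(N) - runif(N)) / N)
  perms <- replicate(d, sample(N))          # one row permutation per column
  X2 <- sapply(seq_len(d), function(j) X1[perms[, j], j])
  y1 <- f(X1)
  y2 <- f(X2)
  sapply(seq_len(d), function(i) {
    ya <- y1[perms[, i]]                    # row perms[k, i] of X1 shares X_i
    yb <- y2                                # with row k of X2
    m  <- mean(c(ya, yb))
    (mean(ya * yb) - m^2) / (mean(c(ya^2, yb^2)) - m^2)
  })
}

set.seed(7)
f <- function(X) X[, 1] + 2 * X[, 2]        # the third input is inert
S <- replicated_lhs_first_order(f, N = 20000, d = 3)  # close to c(0.2, 0.8, 0)
```

The cost is $2 \times N$ runs whatever the value of d, which is the main appeal of replicated designs.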

Usage

sobolroalhs(model = NULL, factors, N, p=1, order, tail=TRUE, conf=0.95, nboot=0, ...)
## S3 method for class 'sobolroalhs'
tell(x, y = NULL, ...)
## S3 method for class 'sobolroalhs'
print(x, ...)
## S3 method for class 'sobolroalhs'
plot(x, ylim = c(0,1), ...)
## S3 method for class 'sobolroalhs'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment
                 = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`factors`	an integer giving the number of factors, or a vector of character strings giving their names.
`N`	an integer giving the size of each replicated design (for a total of $2 \times N$ model evaluations).
`p`	an integer giving the number of model outputs.
`order`	an integer giving the order of the indices (1 or 2).
`tail`	a boolean specifying the method used to choose the number of levels of the orthogonal array (see "Warning messages").
`conf`	the confidence level for confidence intervals.
`nboot`	the number of bootstrap replicates.
`x`	a list of class `"sobolroalhs"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"sobolroalhs"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`ylim`	y-coordinate plotting limits.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called.

Details

sobolroalhs automatically assigns a uniform distribution on [0,1] to each input. Transformations of distributions (between U[0,1] and the wanted distribution) must be performed before the call to tell() (see "Examples").

Missing values (i.e., NA values) in outputs are automatically handled by the function.

This function also supports multidimensional outputs (matrices in y or as output of model). In this case, aggregated Sobol' indices are returned (see sobolMultOut).
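The aggregated index of Gamboa et al. (2014) weights each per-output index by that output's variance. For a factor $X_i$ and outputs $Y_1, \ldots, Y_p$, it equals $\sum_k \mathrm{Var}(E[Y_k \mid X_i]) / \sum_k \mathrm{Var}(Y_k)$. The sketch below applies the formula to made-up per-output indices. The helper aggregated_indices is hypothetical, and sobolMultOut remains the package's own route to aggregated indices.

```r
# Aggregated first-order index over p outputs (Gamboa et al., 2014):
#   S_i = sum_k Var(E[Y_k | X_i]) / sum_k Var(Y_k) = sum_k V_k S_ik / sum_k V_k
# S: d x p matrix of per-output indices, V: length-p vector of output variances.
aggregated_indices <- function(S, V) as.vector(S %*% V) / sum(V)

S <- rbind(c(0.5, 0.1),    # factor 1: index for output 1, output 2
           c(0.2, 0.6))    # factor 2
V <- c(1, 3)               # output 2 has three times the variance
aggregated_indices(S, V)   # 0.2 0.5
```

High-variance outputs dominate the aggregate. Here factor 2 ends up more influential overall, even though factor 1 dominates output 1.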

Value

sobolroalhs returns a list of class "sobolroalhs", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments (row concatenation of the two replicated designs).
`y`	the responses used.
`OA`	the orthogonal array constructed (`NULL` if `order`=1).
`V`	the estimations of Variances of the Conditional Expectations (VCE) with respect to each factor.
`S`	the estimations of the Sobol' indices.

Warning messages

"The value entered for N is not the square of a prime number. It has been replaced by: ": when order=2, the number of levels of the orthogonal array must be a prime number. If N is not the square of a prime number, this warning message indicates that it was replaced depending on the value of tail. If tail=TRUE (resp. tail=FALSE) the new value of N is equal to the square of the prime number preceding (resp. following) the square root of N.
"The value entered for N is not satisfying the constraint $N \geq (d-1)^2$. It has been replaced by: ": when order=2, the constraint $N \geq (d-1)^{2}$ must be satisfied, where $d$ is the number of factors. This warning message indicates that N was replaced by the square of the prime number following (or equal to) $d-1$.

Author(s)

Laurent Gilquin

References

A.S. Hedayat, N.J.A. Sloane and J. Stufken, 1999, Orthogonal Arrays: Theory and Applications, Springer Series in Statistics.

F. Gamboa, A. Janon, T. Klein and A. Lagnoux, 2014, Sensitivity indices for multivariate outputs, Electronic Journal of Statistics, 8:575-603.

J.Y. Tissot and C. Prieur, 2015, A randomized orthogonal array-based procedure for the estimation of first- and second-order Sobol' indices, J. Statist. Comput. Simulation, 85:1358-1381.

Examples

library(boot)
library(numbers)

####################
# Test case: the non-monotonic Sobol g-function

# The method of sobol requires 2 samples
# (there are 8 factors, all following the uniform distribution on [0,1])

# first-order sensitivity indices
x <- sobolroalhs(model = sobol.fun, factors = 8, N = 1000, order = 1, nboot=100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)

# closed second-order sensitivity indices
x <- sobolroalhs(model = sobol.fun, factors = 8, N = 1000, order = 2, nboot=100)
print(x)
ggplot(x)

####################
# Test case: dealing with non-uniform distributions

x <- sobolroalhs(model = NULL, factors = 3, N = 1000, order =1, nboot=0)

# X1 follows a log-normal distribution:
x$X[,1] <- qlnorm(x$X[,1])

# X2 follows a standard normal distribution:
x$X[,2] <- qnorm(x$X[,2])

# X3 follows a gamma distribution:
x$X[,3] <- qgamma(x$X[,3],shape=0.5)

# toy example
toy <- function(x){rowSums(x)}
y <- toy(x$X)
tell(x, y)
print(x)
ggplot(x)

####################
# Test case : multidimensional outputs


toy <- function(x){cbind(x[,1]+x[,2]+x[,1]*x[,2],2*x[,1]+3*x[,1]*x[,2]+x[,2])}
x <- sobolroalhs(model = toy, factors = 3, N = 1000, p=2, order =1, nboot=100)
print(x)
ggplot(x)


Sobol' Indices estimation under inequality constraints

Description

sobolroauc deals with the estimation of Sobol' sensitivity indices when there exists one or multiple sets of constrained factors. Constraints within a set are expressed as inequality constraints (simplex constraint). This function generalizes the procedure of Tissot and Prieur (2015) to estimate either all first-order indices or all closed second-order indices at a total cost of $2 \times N$ model evaluations. For closed second-order indices $N=q^{2}$ where $q \geq d-1$ is a prime number denoting the number of levels of the orthogonal array, and where $d$ indicates the number of independent factors or sets of factors.
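When the N passed to the function is not the square of a prime $q \geq d-1$, it is replaced as described under "Warning messages". The base-R sketch below reproduces those replacement rules as we read them. The helpers is_prime, prev_prime, next_prime and adjust_N are hypothetical, and the package's internal code may differ.

```r
# Replacement rules for N described under "Warning messages"
# (hypothetical helpers; the package's internal code may differ).
is_prime <- function(k) {
  if (k < 2) return(FALSE)
  if (k %% 2 == 0) return(k == 2)
  j <- 3
  while (j * j <= k) {
    if (k %% j == 0) return(FALSE)
    j <- j + 2
  }
  TRUE
}
prev_prime <- function(x) {            # largest prime <= x (at least 2)
  k <- floor(x)
  while (k > 2 && !is_prime(k)) k <- k - 1
  max(k, 2)
}
next_prime <- function(x) {            # smallest prime >= x
  k <- max(ceiling(x), 2)
  while (!is_prime(k)) k <- k + 1
  k
}
adjust_N <- function(N, d, tail = TRUE) {
  q <- sqrt(N)
  if (!(q == round(q) && is_prime(q))) {
    q <- if (tail) prev_prime(q) else next_prime(q)
  }
  if (q < d - 1) q <- next_prime(d - 1)  # enforce N >= (d - 1)^2
  q^2
}

adjust_N(1000, d = 8)                # 961  (31^2)
adjust_N(1000, d = 8, tail = FALSE)  # 1369 (37^2)
adjust_N(16, d = 8)                  # 49   (3^2 is below 7^2, so 7^2)
```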

Usage

sobolroauc(model = NULL, factors, constraints = NULL, N, p = 1, order, 
            tail = TRUE, conf = 0.95, nboot = 0, ...)
## S3 method for class 'sobolroauc'
tell(x, y = NULL, ...)
## S3 method for class 'sobolroauc'
print(x, ...)
## S3 method for class 'sobolroauc'
plot(x, ylim = c(0,1), ...)
## S3 method for class 'sobolroauc'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment
                 = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`factors`	an integer giving the number of factors, or a vector of character strings giving their names.
`constraints`	a list giving the sets of constrained factors (see "Details").
`N`	an integer giving the size of each replicated design (for a total of $2 \times N$ model evaluations).
`p`	an integer giving the number of model outputs.
`order`	an integer giving the order of the indices (1 or 2).
`tail`	a boolean specifying the method used to choose the number of levels of the orthogonal array (see "Warning messages").
`conf`	the confidence level for confidence intervals.
`nboot`	the number of bootstrap replicates.
`x`	a list of class `"sobolroauc"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"sobolroauc"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`ylim`	y-coordinate plotting limits.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called.

Details

constraints lists the sets of factors that depend on each other through inequality constraints (see "Examples"). A factor may not appear in more than one set. Factors not appearing in constraints are assumed to be independent, each following a uniform distribution on [0,1]. One Sobol' index is estimated for each independent factor or set of factors.
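Ordering constraints such as X1 <= X3 and X4 <= X6 (the Examples case) can be enforced by sorting i.i.d. U[0,1] values within each set. The resulting order statistics are uniform on the constrained region. The helper apply_order_constraints is hypothetical. It assumes that each set encodes a chain $X_{a_1} \leq X_{a_2} \leq \ldots$ in the listed order. The package cites Devroye (1986) for its sampling, so its method may differ.

```r
# Sampling factors under ordering constraints such as X1 <= X3 and X4 <= X6:
# sorting i.i.d. U[0,1] values within each set gives points that are uniform
# on the constrained region (order statistics). Hypothetical helper; each set
# is assumed to encode a chain X_a1 <= X_a2 <= ... in the listed order.
apply_order_constraints <- function(X, constraints) {
  for (set in constraints) {
    if (length(set) < 2) next                 # a single factor has no constraint
    X[, set] <- t(apply(X[, set, drop = FALSE], 1, sort))
  }
  X
}

set.seed(11)
X  <- matrix(runif(8 * 1000), nrow = 1000)
Xc <- apply_order_constraints(X, list(c(1, 3), c(4, 6)))
all(Xc[, 1] <= Xc[, 3])   # TRUE
```

Constrained factors are no longer marginally uniform: X1 here follows the distribution of the minimum of two uniforms. This is why one index is reported per set rather than per factor.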

Missing values (i.e., NA values) in the model responses are automatically handled by the function.

This function also supports multidimensional outputs (matrices in y or as output of model). In this case, aggregated Sobol' indices are returned (see sobolMultOut).

Value

sobolroauc returns a list of class "sobolroauc", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments (concatenation of two replicated designs).
`y`	the responses used.
`OA`	the orthogonal array constructed (`NULL` if `order`=1).
`V`	the estimations of Variances of the Conditional Expectations (VCE) with respect to each factor.
`S`	the estimations of the Sobol' indices.

Warning messages

"The value entered for N is not the square of a prime number. It has been replaced by: ": when order=2, the number of levels of the orthogonal array must be a prime number. If N is not the square of a prime number, this warning message indicates that it was replaced depending on the value of tail. If tail=TRUE (resp. tail=FALSE) the new value of N is equal to the square of the prime number preceding (resp. following) the square root of N.
"The value entered for N is not satisfying the constraint $N \geq (d-1)^2$. It has been replaced by: ": when order=2, the constraint $N \geq (d-1)^{2}$ must be satisfied, where $d$ is the number of independent factors or sets of factors. This warning message indicates that N was replaced by the square of the prime number following (or equal to) $d-1$.

Author(s)

Laurent Gilquin

References

L. Devroye, 1986, Non-Uniform Random Variate Generation. Springer-Verlag.

J. Jacques, C. Lavergne and N. Devictor, 2006, Sensitivity Analysis in presence of model uncertainty and correlated inputs. Reliability Engineering & System Safety, 91:1126-1134.

L. Gilquin, C. Prieur and E. Arnaud, 2015, Replication procedure for grouped Sobol' indices estimation in dependent uncertainty spaces, Information and Inference, 4:354-379.

J.Y. Tissot and C. Prieur, 2015, A randomized orthogonal array-based procedure for the estimation of first- and second-order Sobol' indices, J. Statist. Comput. Simulation, 85:1358-1381.

Examples

library(boot)
library(numbers)

# Test case: the non-monotonic Sobol g-function
# (there are 8 factors, all following the uniform distribution on [0,1])

# Suppose we have the inequality constraints: X1 <= X3 and X4 <= X6.

# first-order sensitivity indices
x <- sobolroauc(model = sobol.fun, factors = 8, constraints = list(c(1,3),c(4,6)), 
                N = 1000, order = 1, nboot=100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)

# closed second-order sensitivity indices
x <- sobolroauc(model = sobol.fun, factors = 8, constraints = list(c(1,3),c(4,6)), 
                N = 1000, order = 2, nboot=100)
print(x)
ggplot(x)

Monte Carlo Estimation of Sobol' Indices based on Saltelli schemes

Description

sobolSalt implements the Monte Carlo estimation of the Sobol' indices either for first-order and total effect indices together (altogether $2p$ indices) at a total cost of $n \times (p+2)$ model evaluations, or for first-order, second-order and total indices together (altogether $2p + p(p-1)/2$ indices) at a total cost of $n \times (2p+2)$ model evaluations.
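
To see where the two costs come from, the sketch below stacks a Saltelli (2002)-type pick-freeze design: the two samples plus one hybrid sample per input for scheme "A", and two hybrid samples per input for scheme "B". The helper `saltelli_design` is hypothetical, and the block order and the choice of which sample supplies the swapped column are assumptions that need not match the design built internally by sobolSalt.

```r
# Hypothetical helper (not part of 'sensitivity'): stacks a Saltelli-type
# pick-freeze design for schemes "A" and "B".
saltelli_design <- function(X1, X2, scheme = "A") {
  X1 <- as.data.frame(X1)
  X2 <- as.data.frame(X2)
  p <- ncol(X1)
  blocks <- list(X1, X2)
  for (i in seq_len(p)) {
    Xi <- X2
    Xi[, i] <- X1[, i]                   # X2 with column i taken from X1
    blocks[[length(blocks) + 1]] <- Xi
  }
  if (scheme == "B") {
    for (i in seq_len(p)) {
      Xi <- X1
      Xi[, i] <- X2[, i]                 # X1 with column i taken from X2
      blocks[[length(blocks) + 1]] <- Xi
    }
  }
  do.call(rbind, blocks)
}

n <- 1000
p <- 8
X1 <- data.frame(matrix(runif(p * n), nrow = n))
X2 <- data.frame(matrix(runif(p * n), nrow = n))
D <- saltelli_design(X1, X2, scheme = "A")
nrow(D)                                     # n * (p + 2) = 10000
nrow(saltelli_design(X1, X2, scheme = "B")) # n * (2 * p + 2) = 18000
```

With $p = 8$ and $n = 1000$, scheme "A" thus needs 10000 model runs for 16 indices, and scheme "B" 18000 runs for $16 + 28 = 44$ indices.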

Usage

sobolSalt(model = NULL, X1, X2, scheme="A", nboot = 0, conf = 0.95, ...)
## S3 method for class 'sobolSalt'
tell(x, y = NULL, ...)
## S3 method for class 'sobolSalt'
print(x, ...)
## S3 method for class 'sobolSalt'
plot(x, ylim = c(0, 1), choice, ...)
## S3 method for class 'sobolSalt'
ggplot(data, mapping = aes(), ylim = c(0, 1), choice, ..., environment
                 = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X1`	the first random sample (containing `n` points).
`X2`	the second random sample (containing `n` points).
`scheme`	a letter `"A"` or `"B"` indicating which scheme to use (see "Details")
`nboot`	the number of bootstrap replicates.
`conf`	the confidence level for bootstrap confidence intervals.
`x`	a list of class `"sobolSalt"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"sobolSalt"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`ylim`	y-coordinate plotting limits.
`choice`	an integer specifying which indices to plot: `1` for first-order and total effect indices, `2` for second-order indices.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called

Details

The estimators used are the ones implemented in sobolEff.

scheme specifies which of Saltelli's schemes is used: "A" to estimate first-order and total effect indices, "B" to estimate first-order, second-order and total effect indices.

Value

sobolSalt returns a list of class "sobolSalt", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	the response used.
`V`	the model variance.
`S`	the estimations of the Sobol' first-order indices.
`S2`	the estimations of the Sobol' second-order indices (only for scheme `"B"`).
`T`	the estimations of the Sobol' total sensitivity indices.

Author(s)

Laurent Gilquin

References

A. Janon, T. Klein, A. Lagnoux, M. Nodet, C. Prieur (2014), Asymptotic normality and efficiency of two Sobol index estimators, ESAIM: Probability and Statistics, 18:342-364.

A. Saltelli, 2002, Making best use of model evaluations to compute sensitivity indices, Computer Physics Communications, 145:280-297.

Examples

# Test case : the non-monotonic Sobol g-function

# The method of sobol requires 2 samples
# There are 8 factors, all following the uniform distribution
# on [0,1]

library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis

x <- sobolSalt(model = sobol.fun, X1, X2, scheme="A", nboot = 100)
print(x)
plot(x, choice=1)

library(ggplot2)
ggplot(x, choice=1)

Flexible sensitivity analysis via ranking / nearest neighbours

Description

WARNING: DEPRECATED function: use shapleysobol_knn instead. sobolshap_knn estimates several sensitivity indices using only N model evaluations, via ranking (following Gamboa et al. (2020) and Chatterjee (2019)) or nearest neighbour search (Broto et al. (2020) and Azadkia & Chatterjee (2020)). It can be used with categorical inputs (which are transformed with one-hot encoding), dependent inputs and multiple outputs. Sensitivity indices of any group of inputs can be computed; in particular, first-order and total Sobol' indices and Shapley effects are accessible. For large sample sizes, the nearest neighbour algorithm can be significantly accelerated by using approximate nearest neighbour search. Shapley effects can also be estimated with the random permutation approach of Castro et al. (2009), where all the terms are obtained with ranking or nearest neighbours.

Usage

sobolshap_knn(model = NULL, X, id.cat = NULL, U = NULL, method = "knn",
              n.knn = 2, return.shap = FALSE, randperm = FALSE, n.perm = 1e4,
              rescale = FALSE, n.limit = 2000, noise = FALSE, ...)
## S3 method for class 'sobolshap_knn'
tell(x, y = NULL, ...)
## S3 method for class 'sobolshap_knn'
extract(x, ...)
## S3 method for class 'sobolshap_knn'
print(x, ...)
## S3 method for class 'sobolshap_knn'
plot(x, ylim = c(0, 1), type.multout = "lines", ...)
## S3 method for class 'sobolshap_knn'
ggplot(data, mapping = aes(), ylim = c(0, 1),
       type.multout = "lines", ..., environment = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X`	a random sample of the inputs.
`id.cat`	a vector with the indices of the categorical inputs.
`U`	an integer equal to 0 (total Sobol indices) or 1 (first-order Sobol indices) or a list of vector indices defining the subsets of inputs whose sensitivity indices must be computed or a matrix of 0s and 1s where each row encodes a subset of inputs whose sensitivity indices must be computed (see examples) or NULL (all possible subsets).
`method`	the algorithm to be used for estimation, either "rank" or "knn", see details.
`n.knn`	the number of nearest neighbours used for estimation if `method="knn"`.
`return.shap`	a logical indicating if Shapley effects must be estimated, can only be TRUE if `U=NULL`.
`randperm`	a logical indicating if random permutations are used to estimate Shapley effects, only if `U=NULL` and `return.shap=TRUE`.
`n.perm`	the number of random permutations used for estimation if `randperm=TRUE`.
`rescale`	a logical indicating if continuous inputs must be rescaled before distance computations. If TRUE, continuous inputs are first whitened with the ZCA-cor whitening procedure (cf. whiten() function in package `whitening`). If the inputs are independent, this first step will have a very limited impact. Then, the resulting whitened inputs are individually modified via a copula transform such that each input has the same scale.
`n.limit`	the sample size limit above which approximate nearest neighbour search is activated, only used if `method="knn"`.
`noise`	a logical which is TRUE if the model or the output sample is noisy, see details.
`x`	a list of class `"sobolshap_knn"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"sobolshap_knn"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`ylim`	y-coordinate plotting limits.
`type.multout`	the plotting method in the case of multiple outputs, either "points" or "lines", see examples.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called.

Details

For method="rank", the estimator is defined in Gamboa et al. (2020) following Chatterjee (2019). For first-order indices it is based on an input ranking (same algorithm as in sobolrank), while for higher orders it uses an approximate heuristic solution of the traveling salesman problem applied to the input sample distances (cf. TSP() function in package TSP). For method="knn", ranking and TSP are replaced by a nearest neighbour search, as proposed in Broto et al. (2020), and in Azadkia & Chatterjee (2020) for a similar coefficient. The algorithm is the same as in shapleySubsetMc but with an optimized implementation. In particular, the distances used for subsets with mixed inputs (continuous and categorical) are the same, but here the additional one-hot encoding of categorical variables makes it possible to work only with Euclidean distances. A fast approximate nearest neighbour search is also available and is strongly recommended for large sample sizes. The main difference with shapleySubsetMc is that here the entire sample of size N is used to compute all indices, whereas in shapleySubsetMc the user can specify a total cost Ntot, which determines a specific allocation of sample sizes to the estimation of each index. In addition, the weights option is not yet available here. If the outputs are noisy, the argument noise can be used: it only affects the estimation of one specific sensitivity index, namely $\mathrm{Var}(\mathrm{E}(Y \mid X_1,\ldots,X_p))/\mathrm{Var}(Y)$. Without noise this index equals 1, while in the presence of noise it must be estimated.
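
The idea behind the rank-based first-order estimator can be sketched in a few lines of R: outputs are sorted by the values of $X_i$, and each output is paired with its successor in that order, so that the empirical covariance of the pairs approximates $\mathrm{Var}(\mathrm{E}(Y \mid X_i))$. This is a simplified illustration in the spirit of Chatterjee (2019) and Gamboa et al. (2020), not the implementation used by sobolshap_knn; the helper `rank_first_order` is hypothetical and the Ishigami function is coded inline (with $a = 7$, $b = 0.1$) rather than taken from ishigami.fun.

```r
# Hypothetical helper: rank-based first-order Sobol' indices.
rank_first_order <- function(X, Y) {
  X <- as.matrix(X)
  Yc <- Y - mean(Y)                  # centred outputs
  v <- mean(Yc^2)                    # empirical Var(Y)
  apply(X, 2, function(xi) {
    Ys <- Yc[order(xi)]              # outputs sorted by X_i
    Ynext <- c(Ys[-1], Ys[1])        # successor in rank order (wrapping)
    mean(Ys * Ynext) / v             # approx. Var(E(Y | X_i)) / Var(Y)
  })
}

set.seed(1)
n <- 20000
X <- matrix(runif(3 * n, -pi, pi), nrow = n)
# Ishigami function (a = 7, b = 0.1), coded inline
Y <- sin(X[, 1]) + 7 * sin(X[, 2])^2 + 0.1 * X[, 3]^4 * sin(X[, 1])
S <- rank_first_order(X, Y)
print(round(S, 2))   # known values: 0.31, 0.44, 0
```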

When randperm=TRUE, Shapley effects are no longer estimated by computing all possible subsets of variables, but only on subsets obtained with random permutations, as proposed in Castro et al. (2009). This is useful for problems with a large number of inputs, since the number of subsets increases exponentially with the dimension.
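
A minimal sketch of the random permutation estimator of Castro et al. (2009) follows. To isolate the permutation step, it assumes the exact cost function $c(u) = \mathrm{Var}(\mathrm{E}(Y \mid X_u))/\mathrm{Var}(Y)$ of a linear Gaussian model (the three-input model of the examples, $Y = X_1 + X_2 + X_3$ with standard deviations $(1, 1, 2)$ and $\mathrm{Corr}(X_2, X_3) = 0.9$), whereas sobolshap_knn estimates these terms by ranking or nearest neighbours. The helpers `shapley_randperm` and `cost` are hypothetical.

```r
# Hypothetical helper: Shapley effects from random permutations.
shapley_randperm <- function(cost, p, n.perm = 4000) {
  sh <- numeric(p)
  for (m in seq_len(n.perm)) {
    perm <- sample.int(p)
    prev <- cost(integer(0))
    for (k in seq_len(p)) {
      cur <- cost(perm[seq_len(k)])
      sh[perm[k]] <- sh[perm[k]] + (cur - prev)  # marginal contribution
      prev <- cur
    }
  }
  sh / n.perm
}

# Assumed exact cost function of the Gaussian linear model Y = beta' X
sig <- c(1, 1, 2)
ro <- 0.9
Cormat <- matrix(c(1, 0, 0, 0, 1, ro, 0, ro, 1), 3, 3)
Covmat <- (sig %*% t(sig)) * Cormat
beta <- rep(1, 3)
varY <- drop(t(beta) %*% Covmat %*% beta)
cost <- function(u) {
  if (length(u) == 0) return(0)
  Su <- Covmat[u, u, drop = FALSE]
  Cu <- Covmat[u, , drop = FALSE] %*% beta       # Cov(X_u, Y)
  drop(t(Cu) %*% solve(Su, Cu)) / varY           # Var(E(Y | X_u)) / Var(Y)
}

set.seed(1)
sh <- shapley_randperm(cost, p = 3)
print(round(sh, 3))
```

Because the marginal contributions of each permutation telescope to $c(\{1,\ldots,p\}) - c(\emptyset) = 1$, the estimated effects always sum to one; only their split between inputs is subject to Monte Carlo error.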

Value

sobolshap_knn returns a list of class "sobolshap_knn", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	a vector of model responses.
`U`	the subsets of inputs for which sensitivity indices have been computed.
`S`	the estimations of the Sobol sensitivity indices (see details).
`Shap`	the estimations of Shapley effects, if return.shap was set to TRUE.
`order`	0 (total indices), 1 (first-order indices) or NULL. Used for plotting defaults.

Author(s)

Sebastien Da Veiga

References

Azadkia M., Chatterjee S. (2021), A simple measure of conditional dependence, Ann. Statist. 49(6):3070-3102.

Broto B., Bachoc F., Depecker M. (2020), Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution, SIAM/ASA Journal of Uncertainty Quantification, 8:693-716.

Castro J., Gomez D, Tejada J. (2009). Polynomial calculation of the Shapley value based on sampling. Computers & Operations Research, 36(5):1726-1730.

Chatterjee, S., 2021, A new coefficient of correlation, Journal of the American Statistical Association, 116:2009-2022.

Gamboa, F., Gremaud, P., Klein, T., & Lagnoux, A., 2022, Global Sensitivity Analysis: a novel generation of mighty estimators based on rank statistics, Bernoulli 28: 2345-2374.

Examples

# Test case: the non-monotonic Sobol g-function
# Example with a call to a numerical model
# First compute first-order indices with ranking
n <- 1000
X <- data.frame(matrix(runif(8 * n), nrow = n))
x <- sobolshap_knn(model = sobol.fun, X = X, U = 1, method = "rank")
print(x)
library(ggplot2)
ggplot(x)
# We can use the output sample generated for this estimation to compute
# total indices without additional calls to the model
x2 <- sobolshap_knn(model = NULL, X = X, U = 0, method = "knn", n.knn = 5)
tell(x2,x$y)
ggplot(x2)

# Test case: the Ishigami function
# Example with given data and the use of approximate nearest neighbour search
library(RANN)
n <- 5000
X <- data.frame(matrix(-pi+2*pi*runif(3 * n), nrow = n))
Y <- ishigami.fun(X)
x <- sobolshap_knn(model = NULL, X = X, U = NULL, method = "knn", n.knn = 5,
                   return.shap = TRUE, n.limit = 2000)
tell(x,Y)
library(ggplot2)
ggplot(x)
# We can also extract first-order and total Sobol indices
x1 <- extract(x)
print(x1)

# Test case : Linear model (3 Gaussian inputs including 2 dependent) with scaling
# See Iooss and Prieur (2019)
library(mvtnorm) # Multivariate Gaussian variables
library(whitening) # For scaling
modlin <- function(X) apply(X,1,sum)
d <- 3
n <- 10000
mu <- rep(0,d)
sig <- c(1,1,2)
ro <- 0.9
Cormat <- matrix(c(1,0,0,0,1,ro,0,ro,1),d,d)
Covmat <- ( sig %*% t(sig) ) * Cormat
Xall <- function(n) mvtnorm::rmvnorm(n,mu,Covmat)
X <- Xall(n)
x <- sobolshap_knn(model = modlin, X = X, U = NULL, method = "knn", n.knn = 5,
                   return.shap = TRUE, rescale = TRUE, n.limit = 2000)
print(x)

# Test case: functional toy fct 'Arctangent temporal function'
n <- 3000
X <- data.frame(matrix(runif(2*n,-7,7), nrow = n))
Y <- atantemp.fun(X)
x <- sobolshap_knn(model = NULL, X = X, U = NULL, method = "knn", n.knn = 5,
                   return.shap = TRUE, n.limit = 2000)
tell(x,Y)
library(ggplot2)
library(reshape2)
ggplot(x, type.multout="lines")

Estimation of Sobol' First Order Indices with B-spline Smoothing

Description

Estimates the first-order Sobol' index Si of each individual input parameter through B-spline smoothing with a roughness penalty.
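
The quantity being estimated is $S_i = \mathrm{Var}(\mathrm{E}(Y \mid X_i))/\mathrm{Var}(Y)$, with $\mathrm{E}(Y \mid X_i)$ obtained by smoothing $Y$ against $X_i$. The sketch below illustrates this idea with the penalized cubic smoothing spline of stats::smooth.spline (smoothing parameter chosen by generalized cross-validation). This is a swapped-in smoother, not the algorithm implemented in sobolSmthSpl, and the helper `smooth_first_order` is hypothetical; the Ishigami function ($a = 7$, $b = 0.1$) is coded inline.

```r
# Hypothetical helper: smoothing-based first-order Sobol' indices.
smooth_first_order <- function(X, Y) {
  X <- as.matrix(X)
  apply(X, 2, function(xi) {
    fit <- smooth.spline(xi, Y)            # estimate of E(Y | X_i), GCV
    var(predict(fit, xi)$y) / var(Y)       # Var(E(Y | X_i)) / Var(Y)
  })
}

set.seed(2)
n <- 5000
X <- matrix(runif(3 * n, -pi, pi), nrow = n)
Y <- sin(X[, 1]) + 7 * sin(X[, 2])^2 + 0.1 * X[, 3]^4 * sin(X[, 1])
S <- smooth_first_order(X, Y)
print(round(S, 2))   # known values: 0.31, 0.44, 0
```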

Usage

sobolSmthSpl(Y, X)

Arguments

`Y`	vector of model responses.
`X`	matrix having as rows the input vectors corresponding to the responses in Y.

Details

WARNING: This function can give bad results for reasons that have not yet been investigated.

Value

sobolSmthSpl returns a list of class "sobolSmthSpl" containing the following components:

`call`	the matched call.
`X`	the provided input matrix.
`Y`	the provided matrix of model responses.
`S`	a matrix with the following columns: Si (the estimated first-order Sobol' indices), Si.e (the standard errors of the estimated first-order Sobol' indices) and q0.05 (the 0.05 quantiles of Normal distributions centred on the Si estimates with standard deviations equal to the computed standard errors, i.e. Si - 1.645 Si.e).

Author(s)

Filippo Monari

References

A. Saltelli, M. Ratto, T. Andres, F. Campolongo, J. Cariboni, D. Gatelli, M. Saisana and S. Tarantola, 2008, Global Sensitivity Analysis: The Primer, Wiley-Interscience.

M. Ratto and A. Pagano, 2010, Using recursive algorithms for the efficient identification of smoothing spline ANOVA models, Advances in Statistical Analysis, 94, 367–388.

Examples

X <- matrix(runif(5000), ncol = 10)
Y <- sobol.fun(X)
sa <- sobolSmthSpl(Y, X)
plot(sa)

Liu and Owen Estimation of Total Interaction Indices

Description

sobolTIIlo implements the asymptotically efficient formula of Liu and Owen (2006) for the estimation of total interaction indices, as described e.g. in Section 3.4 of Fruth et al. (2014). Total interaction indices (TII) are superset indices of pairs of variables and thus give the total influence of each second-order interaction. The total cost of the method is $(1 + N + \binom{N}{2}) \times n$ model evaluations, where $N$ is the number of input variables. Asymptotic confidence intervals are provided. Via plotFG (which uses functions of the package igraph), the TIIs can be visualized in a so-called FANOVA graph, as described in Section 2.2 of Muehlenstaedt et al. (2012).
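
The Liu and Owen formula for a pair $(i, j)$ is a squared mixed second difference, $\mathrm{TII}_{ij} = \frac{1}{4}\mathrm{E}\big[(f(x) - f(x \text{ with } z_i) - f(x \text{ with } z_j) + f(x \text{ with } z_i, z_j))^2\big]$, where $x$ and $z$ are independent draws. The R sketch below (hypothetical helper `tii_liu_owen`, Ishigami function coded inline) makes the evaluation count visible: one base sample, one sample per input and one per pair. It returns scaled point estimates only, omits the confidence intervals, and is not the package's code.

```r
# Hypothetical helper: Liu-Owen estimates of scaled TIIs for all pairs.
tii_liu_owen <- function(f, X1, X2) {
  p <- ncol(X1)
  y0 <- f(X1)                                  # base sample
  swap <- function(idx) {                      # X1 with columns idx from X2
    Z <- X1
    Z[, idx] <- X2[, idx]
    f(Z)
  }
  y1 <- lapply(seq_len(p), swap)               # one sample per input
  pairs <- combn(p, 2)
  tii <- apply(pairs, 2, function(ij) {
    yij <- swap(ij)                            # one sample per pair
    mean((y0 - y1[[ij[1]]] - y1[[ij[2]]] + yij)^2) / 4
  })
  names(tii) <- apply(pairs, 2, paste, collapse = "-")
  tii / var(y0)                                # scale by Var(Y)
}

ishi <- function(X) sin(X[, 1]) + 7 * sin(X[, 2])^2 + 0.1 * X[, 3]^4 * sin(X[, 1])
set.seed(3)
n <- 10000
X1 <- matrix(runif(3 * n, -pi, pi), nrow = n)
X2 <- matrix(runif(3 * n, -pi, pi), nrow = n)
tii <- tii_liu_owen(ishi, X1, X2)
print(round(tii, 3))
```

For the Ishigami function only the pair $(X_1, X_3)$ interacts, so the pairs 1-2 and 2-3 give zero (up to rounding) and the pair 1-3 should be close to the value 0.244 quoted in the examples.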

Usage

sobolTIIlo(model = NULL, X1, X2, conf = 0.95, ...)
## S3 method for class 'sobolTIIlo'
tell(x, y = NULL, ...)
## S3 method for class 'sobolTIIlo'
print(x, ...)
## S3 method for class 'sobolTIIlo'
plot(x, ylim = NULL, ...)
## S3 method for class 'sobolTIIlo'
ggplot(data, mapping = aes(), ylim = NULL, ..., environment
                 = parent.frame())
## S3 method for class 'sobolTIIlo'
plotFG(x)

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X1`	the first random sample.
`X2`	the second random sample.
`conf`	the confidence level for asymptotic confidence intervals, defaults to 0.95.
`x`	a list of class `"sobolTIIlo"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"sobolTIIlo"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called.
`ylim`	optional, the y limits of the plot.

Value

sobolTIIlo returns a list of class "sobolTIIlo", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	a vector of model responses.
`V`	the estimation of the overall variance.
`tii.unscaled`	the unscaled estimations of the TIIs.
`tii.scaled`	the scaled estimations of the TIIs together with asymptotic confidence intervals.

Author(s)

Jana Fruth

References

R. Liu, A. B. Owen, 2006, Estimating mean dimensionality of analysis of variance decompositions, JASA, 101 (474), 712–721.

J. Fruth, O. Roustant, S. Kuhnt, 2014, Total interaction index: A variance-based sensitivity index for second-order interaction screening, J. Stat. Plan. Inference, 147, 212–223.

T. Muehlenstaedt, O. Roustant, L. Carraro, S. Kuhnt, 2012, Data-driven Kriging models based on FANOVA-decomposition, Stat. Comput., 22 (3), 723–738.

Examples

# Test case : the Ishigami function

# The method requires 2 samples
n <- 1000
X1 <- data.frame(matrix(runif(3 * n, -pi, pi), nrow = n))
X2 <- data.frame(matrix(runif(3 * n, -pi, pi), nrow = n))

# sensitivity analysis (the true values of the scaled TIIs are 0, 0.244, 0)
x <- sobolTIIlo(model = ishigami.fun, X1 = X1, X2 = X2)
print(x)

# plot of tiis and FANOVA graph
plot(x)

library(ggplot2)
ggplot(x)


library(igraph)
plotFG(x)

Pick-freeze Estimation of Total Interaction Indices

Description

sobolTIIpf implements the pick-freeze estimation of total interaction indices, as described in Section 3.3 of Fruth et al. (2014). Total interaction indices (TII) are superset indices of pairs of variables and thus give the total influence of each second-order interaction. The pick-freeze estimation makes it possible to reuse model evaluations following the strategy of Saltelli (2002). The total cost is $(1+N) \times n$ model evaluations, where $N$ is the number of indices to estimate. Via plotFG, the TIIs can be visualized in a so-called FANOVA graph, as described in Section 2.2 of Muehlenstaedt et al. (2012).

Usage

sobolTIIpf(model = NULL, X1, X2, ...)
## S3 method for class 'sobolTIIpf'
tell(x, y = NULL, ...)
## S3 method for class 'sobolTIIpf'
print(x, ...)
## S3 method for class 'sobolTIIpf'
plot(x, ylim = NULL, ...)
## S3 method for class 'sobolTIIpf'
ggplot(data, mapping = aes(), ylim = NULL, ..., environment
                 = parent.frame())
## S3 method for class 'sobolTIIpf'
plotFG(x)

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X1`	the first random sample.
`X2`	the second random sample.
`x`	a list of class `"sobolTIIpf"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"sobolTIIpf"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called.
`ylim`	optional, the y limits of the plot.

Value

sobolTIIpf returns a list of class "sobolTIIpf", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	a vector of model responses.
`V`	the estimation of the overall variance.
`tii.unscaled`	the unscaled estimations of the TIIs.
`tii.scaled`	the scaled estimations of the TIIs.

Author(s)

Jana Fruth

References

J. Fruth, O. Roustant, S. Kuhnt, 2014, Total interaction index: A variance-based sensitivity index for second-order interaction screening, J. Stat. Plan. Inference, 147, 212–223.

A. Saltelli, 2002, Making best use of model evaluations to compute sensitivity indices, Comput. Phys. Commun., 145, 280-297.

T. Muehlenstaedt, O. Roustant, L. Carraro, S. Kuhnt, 2012, Data-driven Kriging models based on FANOVA-decomposition, Stat. Comput., 22 (3), 723–738.

Examples

# Test case : the Ishigami function

# The method requires 2 samples
n <- 1000
X1 <- data.frame(matrix(runif(3 * n, -pi, pi), nrow = n))
X2 <- data.frame(matrix(runif(3 * n, -pi, pi), nrow = n))

# sensitivity analysis (the true values are 0, 0.244, 0)
x <- sobolTIIpf(model = ishigami.fun, X1 = X1, X2 = X2)
print(x)

# plot of tiis and FANOVA graph
plot(x)

library(ggplot2)
ggplot(x)


library(igraph)
plotFG(x)

Monte Carlo Estimation of Sobol' Indices (formulas of Martinez (2011) and Touati (2016))

Description

soboltouati implements the Monte Carlo estimation of the Sobol' indices for both first-order and total indices using correlation-coefficient-based formulas, at a total cost of $(p+2) \times n$ model evaluations. These are called the Martinez estimators. It also computes their confidence intervals, based on asymptotic properties of empirical correlation coefficients.
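
The Martinez point estimators can be sketched directly with cor(): for each input, one extra sample is built from X1 with its i-th column taken from X2. Its output shares only $X_i$ with f(X2), so their correlation estimates the first-order index, and it shares all inputs but $X_i$ with f(X1), so one minus their correlation estimates the total index. The helper `martinez_indices` is hypothetical, the assignment of samples is an assumption that may differ from soboltouati, and the confidence intervals are omitted.

```r
# Hypothetical helper: correlation-based (Martinez) Sobol' estimates.
martinez_indices <- function(f, X1, X2) {
  p <- ncol(X1)
  y1 <- f(X1)
  y2 <- f(X2)
  S <- Tot <- numeric(p)
  for (i in seq_len(p)) {
    Xi <- X1
    Xi[, i] <- X2[, i]
    yi <- f(Xi)                   # shares X_i with X2, the rest with X1
    S[i] <- cor(y2, yi)           # first-order index
    Tot[i] <- 1 - cor(y1, yi)     # total index
  }
  list(S = S, T = Tot)
}

ishi <- function(X) sin(X[, 1]) + 7 * sin(X[, 2])^2 + 0.1 * X[, 3]^4 * sin(X[, 1])
set.seed(4)
n <- 10000
X1 <- matrix(runif(3 * n, -pi, pi), nrow = n)
X2 <- matrix(runif(3 * n, -pi, pi), nrow = n)
res <- martinez_indices(ishi, X1, X2)
lapply(res, round, 2)   # known: S = 0.31, 0.44, 0 and T = 0.56, 0.44, 0.24
```

The total cost is visible in the code: f(X1), f(X2) and one evaluation per input, i.e. $(p+2) \times n$ runs.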

Usage

soboltouati(model = NULL, X1, X2, conf = 0.95, ...)
## S3 method for class 'soboltouati'
tell(x, y = NULL, return.var = NULL, ...)
## S3 method for class 'soboltouati'
print(x, ...)
## S3 method for class 'soboltouati'
plot(x, ylim = c(0, 1), ...)
## S3 method for class 'soboltouati'
ggplot(data,  mapping = aes(), ylim = c(0, 1), ..., environment
                 = parent.frame())

Arguments

`model`	a function, or a model with a `predict` method, defining the model to analyze.
`X1`	the first random sample.
`X2`	the second random sample.
`conf`	the confidence level for confidence intervals, or zero to avoid their computation if they are not needed.
`x`	a list of class `"soboltouati"` storing the state of the sensitivity study (parameters, data, estimates).
`data`	a list of class `"soboltouati"` storing the state of the sensitivity study (parameters, data, estimates).
`y`	a vector of model responses.
`return.var`	a vector of character strings giving further internal variables names to store in the output object `x`.
`ylim`	y-coordinate plotting limits.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	any other arguments for `model` which are passed unchanged each time it is called

Value

soboltouati returns a list of class "soboltouati", containing all the input arguments detailed before, plus the following components:

`call`	the matched call.
`X`	a `data.frame` containing the design of experiments.
`y`	the response used.
`V`	the estimations of normalized variances of the conditional expectations (VCE) with respect to each factor and also with respect to the complementary set of each factor ("all but $X_i$").
`S`	the estimations of the Sobol' first-order indices.
`T`	the estimations of the Sobol' total sensitivity indices.

Author(s)

Taieb Touati, Khalid Boumhaout

References

T. Touati, 2016, Confidence intervals for Sobol' indices. Proceedings of the SAMO 2016 Conference, Reunion Island, France, December 2016.

T. Touati, 2017, Intervalles de confiance pour les indices de Sobol, 49emes Journees de la SFdS, Avignon, France, Juin 2017.

Examples

# Test case : the non-monotonic Sobol g-function

# The method of sobol requires 2 samples
# There are 8 factors, all following the uniform distribution
# on [0,1]

library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis

x <- soboltouati(model = sobol.fun, X1, X2)
print(x)
plot(x)

library(ggplot2)
ggplot(x)

Squared integral estimate

Description

This function provides two estimators of a squared expectation. The first one, naive, is the square of the sample mean; it is positively biased. The second one is a U-statistic and is unbiased. The two are equivalent for large sample sizes.

Usage

squaredIntEstim(x, method = "unbiased")

Arguments

`x`	a vector of observations assumed to be drawn independently from a square-integrable random variable.
`method`	if "unbiased", the U-statistic is computed; otherwise the square of the sample mean is computed.

Details

Let $X_1, \ldots, X_n$ be i.i.d. random variables. The aim is to estimate $t = \mathrm{E}(X_1)^2$. The naive estimator is the square of the sample mean, $T_1 = \left[(X_1 + \cdots + X_n)/n\right]^2$. It is positively biased, with bias equal to $s^2/n$, where $s^2 = \mathrm{Var}(X_1)$. The U-statistic estimator is the average of $X_i X_j$ over all unordered pairs $\{i, j\}$ with $i \neq j$. Equivalently, it equals $T_1$ minus the (unbiased) sample variance divided by $n$.
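
The two estimators and the identity stated above can be checked on a toy vector. The helper `squared_mean_estimates` is hypothetical (it is not how squaredIntEstim is written) and enumerates all pairs, at $O(n^2)$ cost, purely for illustration.

```r
# Hypothetical helper: naive and U-statistic estimates of E(X)^2.
squared_mean_estimates <- function(x) {
  n <- length(x)
  T1 <- mean(x)^2                                    # naive, biased upward
  prods <- outer(x, x)                               # all products x_i * x_j
  U <- sum(prods[upper.tri(prods)]) / choose(n, 2)   # mean over pairs i < j
  c(naive = T1, ustat = U, shortcut = T1 - var(x) / n)
}

squared_mean_estimates(c(1, 2, 3, 4))
# naive = 6.25; ustat = 35/6 (about 5.833); shortcut equals ustat
```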

Value

A real number, corresponding to the estimated value of the squared integral.

Author(s)

O. Roustant

References

O. Roustant, F. Gamboa and B. Iooss, Parseval inequalities and lower bounds for variance-based sensitivity indices, Electronic Journal of Statistics, 14:386-412, 2020

Van der Vaart, A. W. Asymptotic statistics. Vol. 3. Cambridge university press, 2000.

Examples

n <- 100  # sample size
nsim <- 100 # number of simulations
mu <- 0

T <- Tunb <- rep(NA, nsim)
theta <- mu^2  # E(X)^2, with X following N(mu, 1)

for (i in 1:nsim){
  x <- rnorm(n, mean = mu, sd = 1)
  T[i] <- squaredIntEstim(x, method = "biased")
  Tunb[i] <- squaredIntEstim(x, method = "unbiased")
}

par(mfrow = c(1, 1))
boxplot(cbind(T, Tunb))
abline(h = theta, col = "red")
abline(h = c(mean(T), mean(Tunb)), col = c("blue", "cyan"), lty = "dotted")
# look at the difference between median and mean

Standardized Regression Coefficients

Description

src computes the Standardized Regression Coefficients (SRC) or the Standardized Rank Regression Coefficients (SRRC). These sensitivity indices assume independent factors and a linear (SRC) or monotonic (SRRC) relationship between the inputs and the output.
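As an illustration of what these coefficients measure, the following base-R sketch assumes the usual definition SRC_j = beta_j * sd(X_j) / sd(Y), where beta_j is the ordinary least-squares coefficient of X_j, and obtains SRRC by applying the same computation to ranks. src_sketch is a hypothetical helper, not the package's src function, and has no bootstrap or logistic option.

```r
# Hypothetical helper: SRC_j = beta_j * sd(X_j) / sd(y) from an OLS fit;
# with use_ranks = TRUE the same computation on ranks gives SRRC
src_sketch <- function(X, y, use_ranks = FALSE) {
  X <- as.data.frame(X)
  if (use_ranks) {
    X <- as.data.frame(lapply(X, rank))
    y <- rank(y)
  }
  fit <- lm(y ~ ., data = cbind(X, y = y))
  beta <- coef(fit)[-1]              # drop the intercept
  beta * sapply(X, sd) / sd(y)
}

# Same design as the Examples section: Y = X1 + X2 + X3
set.seed(123)
n <- 100
X <- data.frame(X1 = runif(n, 0.5, 1.5),
                X2 = runif(n, 1.5, 4.5),
                X3 = runif(n, 4.5, 13.5))
y <- with(X, X1 + X2 + X3)
s <- src_sketch(X, y)
s
```

Since every beta_j equals 1 here, each SRC reduces to sd(X_j) / sd(Y), so the ranking follows the widths of the input ranges (1, 3 and 9).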

Usage

src(X, y, rank = FALSE, logistic = FALSE, nboot = 0, conf = 0.95)
## S3 method for class 'src'
print(x, ...)
## S3 method for class 'src'
plot(x, ylim = c(-1,1), ...)
## S3 method for class 'src'
ggplot(data, mapping = aes(), ylim = c(-1, 1), ..., environment = parent.frame())

Arguments

`X`	a data frame (or object coercible by `as.data.frame`) containing the design of experiments (model input variables).
`y`	a vector containing the responses corresponding to the design of experiments (model output variables).
`rank`	logical. If `TRUE`, the analysis is done on the ranks.
`logistic`	logical. If `TRUE`, the analysis is done via a logistic regression (binomial GLM).
`nboot`	the number of bootstrap replicates.
`conf`	the confidence level of the bootstrap confidence intervals.
`x`	the object returned by `src`.
`data`	the object returned by `src`.
`ylim`	the y-coordinate limits of the plot.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	arguments to be passed to methods, such as graphical parameters (see `par`).

Details

Logistic regression model (logistic = TRUE) and rank-based indices (rank = TRUE) are incompatible.

Value

src returns a list of class "src", containing the following components:

`call`	the matched call.
`SRC`	a data frame containing the estimations of the SRC indices, bias and confidence intervals (if `rank = FALSE`).
`SRRC`	a data frame containing the estimations of the SRRC indices, bias and confidence intervals (if `rank = TRUE`).

Author(s)

Gilles Pujol and Bertrand Iooss

References

A. Saltelli, K. Chan and E. M. Scott eds, 2000, Sensitivity Analysis, Wiley.

Examples


# a 100-sample with X1 ~ U(0.5, 1.5)
#                   X2 ~ U(1.5, 4.5)
#                   X3 ~ U(4.5, 13.5)

library(boot)
n <- 100
X <- data.frame(X1 = runif(n, 0.5, 1.5),
                X2 = runif(n, 1.5, 4.5),
                X3 = runif(n, 4.5, 13.5))

# linear model : Y = X1 + X2 + X3

y <- with(X, X1 + X2 + X3)

# sensitivity analysis

x <- src(X, y, nboot = 100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)

Support index functions: Measuring the effect of input variables over their support

Description

Function to estimate the first-order and total support index functions (Fruth et al., 2019).

Usage

support(model, X, Xnew = NULL, fX = NULL, gradfX = NULL, h = 1e-06, ...)

Arguments

`model`	a function, or a model with a predict method, defining the model to analyze.
`X`	a random sample of the input variables.
`Xnew`	an optional set of points at which to visualize the support indices. If missing, `X` is used.
`fX`	an optional vector containing the evaluations of `model` at `X`. If missing, `fX` is computed by evaluating `model` at `X`.
`gradfX`	an optional vector containing the evaluations of the gradient of `model` at `X`. If missing, `gradfX` is approximated by finite differences of `model` at `X`.
`h`	a small number for computing finite differences `(f(X_i + h) - f(X_i))/h`. Default is `1e-6`.
`...`	optional arguments to be passed to `model`.

Details

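The first-order support index of f(X) relative to X_i is the function x_i -> E[df/dX_i(X) | X_i = x_i]^2, i.e. the squared conditional expectation, given X_i, of the partial derivative of f with respect to X_i.

The total support index of f(X) relative to X_i is the function x_i -> E[(df/dX_i(X))^2 | X_i = x_i], i.e. the conditional expectation, given X_i, of the squared partial derivative of f with respect to X_i.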

These two functions measure the local influence of X_i, in the global space of the other input variables. Up to square transformations, support indices can be viewed as regression curves of partial derivatives df(X)/dX_i with respect to X_i. Estimation is performed by smoothing from the diagonal scatterplots (X_i, df/dX_i) with the function smooth.spline{stats} with the default options.
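The estimation route described above can be sketched in base R as follows, assuming the standard Ishigami function (a = 7, b = 0.1) written out by hand so that the block does not depend on the package. fd_gradient is a hypothetical helper using the forward differences (f(X + h) - f(X))/h; support() itself may differ in details such as the smoothing options.

```r
# Hypothetical helper: forward finite-difference partial derivatives at each row of X
fd_gradient <- function(f, X, h = 1e-6) {
  fX <- f(X)
  sapply(seq_len(ncol(X)), function(i) {
    Xh <- X
    Xh[, i] <- Xh[, i] + h
    (f(Xh) - fX) / h
  })
}

# Ishigami function with a = 7, b = 0.1 (assumed standard form)
ishigami <- function(X) sin(X[, 1]) + 7 * sin(X[, 2])^2 + 0.1 * X[, 3]^4 * sin(X[, 1])

set.seed(0)
n <- 2000
X <- matrix(runif(3 * n, -pi, pi), n, 3)
G <- fd_gradient(ishigami, X)          # n x 3 matrix of partial derivatives

# Support indices of X1, by smoothing the diagonal scatterplots with smooth.spline
grid <- seq(-pi, pi, length.out = 50)
main1  <- predict(smooth.spline(X[, 1], G[, 1]),   grid)$y^2   # E[df/dX1 | X1]^2
total1 <- predict(smooth.spline(X[, 1], G[, 1]^2), grid)$y     # E[(df/dX1)^2 | X1]

# DGSM: overall means of the squared partial derivatives (used for normalization)
dgsm <- colMeans(G^2)
```

Smoothing the raw derivative and squaring gives the first-order curve, while smoothing the squared derivative gives the total curve, which mirrors the two definitions above.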

For the sake of comparison, support index functions may be normalized. The proposed normalization is the sum of the DGSM, equal to the sum of the overall means of total support functions. Normalized support index functions can be plotted with the S3 method plot, as well as the underlying diagonal scatterplots of derivatives (S3 method scatterplot).

Value

`main`	a matrix whose columns contain the first-order support index functions, estimated at `Xnew`.
`total`	a matrix whose columns contain the total support index functions, estimated at `Xnew`.
`DGSM`	a vector containing an estimation of DGSM.
`X`, `Xnew`, `fX`, `gradfX`	the sample, visualization points, model evaluations and gradient evaluations used in the computation (see the Arguments section).

Author(s)

O. Roustant

References

J. Fruth, O. Roustant, S. Kuhnt, 2019, Support indices: Measuring the effects of input variables over their support, Reliability Engineering and System Safety, 187:17-27.

Examples



# -----------------
# ishigami function
# -----------------
n <- 5000
n.points <- 1000
d <- 3

set.seed(0)
X <- matrix(runif(d*n, min = -pi, max = pi), n, d)
Xnew <- matrix(seq(from = -pi, to = pi, length=n.points), n.points, d)

b <- support(model = ishigami.fun, X, Xnew)

# plot method (x-axis in probability scale), of the normalized support index functions
plot(b, col = c("lightskyblue4", "lightskyblue1", "black"), 
     xprob = TRUE, p = 'punif', p.arg = list(min = -pi, max = pi), ylim = c(0, 2))

# below : diagonal scatterplots of the gradient, 
# on which are based the estimation by smoothing
scatterplot(b, xprob = TRUE) 

# now with normal margins
# -----------------------
X <- matrix(rnorm(d*n), n, d)
Xnew <- matrix(rnorm(d*n.points), n.points, d)
b <- support(model = ishigami.fun, X, Xnew)

plot(b, col = c("lightskyblue4", "lightskyblue1", "black"), xprob = FALSE)
scatterplot(b, xprob = FALSE, type = "histogram", bins = 10, cex = 1, cex.lab = 1.5)


Replace Values in a Template Text

Description

template.replace replaces keys within special markups with values in a so-called template file. Pieces of R code can be put into the markups of the template file, and are evaluated during the replacement.

Usage

template.replace(text, replacement, eval = FALSE,
                 key.pattern = NULL, code.pattern = NULL)

Arguments

`text`	vector of character strings, the template text.
`replacement`	the list values to replace in `text`.
`eval`	boolean, `TRUE` if the code within `code.pattern` has to be evaluated, `FALSE` otherwise.
`key.pattern`	custom pattern for key replacement (see below)
`code.pattern`	custom pattern for code replacement (see below)

Details

Computational codes often read their inputs from a text file. A template file is like an input file in which some values, identified by generic keys, are to be replaced by specific values.

By default, the keys are enclosed in markups of the form $(KEY).

R code can also be put in the template text. Pieces of code must be enclosed in markups of the form @{CODE}. This is useful, for example, for formatting the key values (see example). To have the code interpreted, set eval = TRUE.

Users can define custom patterns. These patterns must be Perl-compatible regular expressions (see regexpr). The default ones are:

key.pattern = "\\$\\(KEY\\)"
code.pattern = "@\\{CODE\\}"

Note that special characters have to be escaped twice (once for Perl, once for R).
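The two-stage replacement described above can be sketched in base R with the default patterns: first substitute each $(KEY) markup, then evaluate each @{CODE} markup. This is a simplified illustration, not the package's implementation; replace_keys and eval_code are hypothetical helpers, and the code-markup regex assumes the code contains no closing brace.

```r
# Hypothetical helper: replace each $(KEY) markup (default key pattern)
replace_keys <- function(text, replacement) {
  for (key in names(replacement)) {
    pattern <- gsub("KEY", key, "\\$\\(KEY\\)", fixed = TRUE)  # e.g. \$\(name\)
    text <- gsub(pattern, as.character(replacement[[key]]), text, perl = TRUE)
  }
  text
}

# Hypothetical helper: evaluate each @{CODE} markup and substitute its result
eval_code <- function(text) {
  m <- gregexpr("@\\{[^}]*\\}", text, perl = TRUE)
  regmatches(text, m) <- lapply(regmatches(text, m), function(codes) {
    vapply(codes, function(code) {
      expr <- substr(code, 3, nchar(code) - 1)    # strip "@{" and "}"
      as.character(eval(parse(text = expr), envir = globalenv()))
    }, character(1), USE.NAMES = FALSE)
  })
  text
}

txt <- c("Hello $(name)!", "$(a) + $(b) = @{$(a)+$(b)}",
         "pi = @{format(pi,digits=5)}")
step1 <- replace_keys(txt, list(name = "world", a = 1, b = 2))
out <- eval_code(step1)
print(out)
```

Keys are replaced before code is evaluated, which is why "@{$(a)+$(b)}" becomes "@{1+2}" and then "3".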

Author(s)

Gilles Pujol

Examples

txt <- c("Hello $(name)!", "$(a) + $(b) = @{$(a)+$(b)}",
         "pi = @{format(pi,digits=5)}")
replacement <- list(name = "world", a = 1, b = 2)
# 1. without code evaluation:
txt.rpl1 <- template.replace(txt, replacement)
print(txt.rpl1)
# 2. with code evaluation:
txt.rpl2 <- template.replace(txt, replacement, eval = TRUE)
print(txt.rpl2)

Tests of Independence based on the Hilbert-Schmidt Independence Criterion (HSIC)

Description

testHSIC tests independence for all input-output pairs $(Xi, Y)$ after a preliminary sensitivity analysis based on HSIC indices. testHSIC takes an object of class sensiHSIC (produced by a prior call to the function sensiHSIC, which estimates HSIC indices) and returns the estimated p-values of the independence tests for all input-output pairs. For each input-output pair, the p-value helps the user decide whether the null hypothesis $H0$ : " $Xi$ and $Y$ are independent" must be accepted or rejected. If the kernels selected in sensiHSIC are all characteristic, $H0$ can be rewritten as " $HSIC(Xi, Y)=0$ ", which paves the way to several test procedures.

Depending on the sample size and the chosen test statistic (either a U-statistic or a V-statistic), there are up to four different methods to test $H0$ . The asymptotic test is recommended when the sample size $n$ is a few hundred or more. When $n$ is smaller, a permutation-based test must be considered instead. As a general rule, permutation-based tests can always be applied, but at a much heavier computational cost. However, if HSIC indices were initially estimated with V-statistics, the Gamma test is a parametric method that offers an attractive tradeoff.

Usage

testHSIC(sensi, test.method = "Asymptotic", B = 3000,
         seq.options = list(criterion = "screening", alpha = 0.05,
                            Bstart = 200, Bfinal = 5000, Bbatch = 100, 
                            Bconv = 200, graph = TRUE) )

## S3 method for class 'testHSIC'
print(x, ...)

## S3 method for class 'testHSIC'
plot(x, ylim = c(0, 1), err, ...)

Arguments

`sensi`	An object of class `"sensiHSIC"` which is produced by a prior call to the function `sensiHSIC`. In particular, `sensi` must contain objects named `"KX"` (3D-array filled with all input Gram matrices), `"KY"` (output Gram matrix), `"HSICXY"` (estimated HSIC indices) and `"estimator.type"` (either `"U-stat"` or `"V-stat"`). In addition, if `sensi` results from a conditional sensitivity analysis, `sensi` must also contain objects named `"cond"` (list of options describing the conditioning event) and `"weights"` (normalized conditioning weights).
`test.method`	A string specifying the numerical procedure used to estimate the p-values of the HSIC-based independence tests. Available procedures include `"Asymptotic"` (asymptotic test), `"Permutation"` (permutation-based test), `"Seq_Permutation"` (sequential permutation-based test) and `"Gamma"` (Gamma test). If `sensi` contains V-statistics, the asymptotic test (resp. the Gamma test) is recommended for large (resp. small) sample sizes; permutation-based tests can be used as well. If `sensi` contains U-statistics, the Gamma test must not be employed; the asymptotic test is recommended for large sample sizes, and permutation-based tests can be used as well.
`B`	Number of random permutations carried out on the output samples before the non-parametric estimation of p-values. Only relevant if `test.method="Permutation"`.
`seq.options`	A list of options guiding the sequential procedure. Only relevant if `test.method="Seq_Permutation"`. `criterion` is a string specifying the stopping criterion. Available criteria include `"screening"` (permutations stop as soon as the estimated p-values have sufficiently converged so that they can be compared to the reference threshold `alpha`), `"ranking"` (permutations stop as soon as the estimated p-values have sufficiently converged so that they can be ranked) and `"both"` (permutations stop as soon as the two previous criteria are fulfilled). `alpha` is a scalar value (between $0$ and $1$ ) specifying the type I error (probability of wrongly rejecting $H0$ ). Only relevant if `criterion` is `"screening"` or `"both"`. `Bstart` is the initial number of random permutations before the first criterion check. `Bfinal` is the maximum number of random permutations. `Bbatch` is the number of permutations added at each new iteration of the sequential procedure. `Bconv` is the number of permutations used to determine whether convergence has already occurred. For `criterion="screening"`, convergence is assumed to be reached if the positions of the estimated p-values with respect to `alpha` no longer change over the `Bconv` latest permutations. For `criterion="ranking"`, convergence is assumed to be reached if the rankings of the estimated p-values no longer change over the `Bconv` latest permutations. `graph` is a boolean indicating whether the estimated p-values have to be plotted against the number of permutations.
`x`	An object of class `"testHSIC"` storing the parameters and results of independence testing.
`ylim`	A vector of two values specifying the y-coordinate plotting limits.
`err`	A scalar value (between $0$ and $1$ ) specifying the reference type I error. This value is used to plot a vertical line.
`...`	Additional options.

Details

Why and how to properly choose kernels?

For a given input-output pair of variables, the Hilbert-Schmidt independence criterion (HSIC) is a dissimilarity measure between the joint bivariate distribution and the product of the marginal distributions. The dissimilarity between those two distributions is measured through the squared norm of the difference between their respective embeddings in a reproducing kernel Hilbert space (RKHS) that directly depends on the selected input kernel $Ki$ and the selected output kernel $KY$ .

It must always be kept in mind that this criterion can only detect dependence within the pair $(Xi, Y)$ reliably if the two kernels are characteristic.

If both kernels are characteristic, $H0$ : " $Xi$ and $Y$ are independent" is equivalent to $H0$ : " $HSIC(Xi, Y)=0$ " and any estimator of $HSIC(Xi, Y)$ emerges as a relevant test statistic.
If they are not, testing $H0$ : " $HSIC(Xi, Y)=0$ " is no longer sufficient for testing $H0$ : " $Xi$ and $Y$ are independent".

The reader is referred to Fukumizu et al. (2004) for the mathematical definition of a characteristic kernel and to Sriperumbudur et al. (2010) for an overview of the major related results.

Responsibility for kernel selection is left to the user when calling the function sensiHSIC. Recall that:

The Gaussian kernel, the exponential kernel, the Matern $3/2$ kernel and the Matern $5/2$ kernel (all defined on $R^2$ ) are characteristic. They remain characteristic when they are restricted to a compact domain $D$ within $R^2$ .
The transformed versions of the four abovementioned kernels (all defined on $[0,1]^2$ ) are characteristic.
All Sobolev kernels (defined on $[0,1]^2$ ) are characteristic.
The categorical kernel (defined on any discrete probability space) is characteristic.

Which test method is most appropriate?

The test statistic for the pair $(Xi, Y)$ is either the U-statistic or the V-statistic associated with $HSIC(Xi, Y)$ .

If a V-statistic was used in sensiHSIC, four different test methods can be considered.

The asymptotic test can be used if the sample size $n$ is large enough (at least a hundred samples). The asymptotic distribution of the test statistic is approximated by a Gamma distribution whose parameters are estimated with the method of moments. See Gretton et al. (2007) for more details about how to estimate the first two moments of the asymptotic Gamma distribution.
The permutation-based test is more expensive in terms of computational cost, but it can be used whatever the sample size $n$ . The initial output samples (stored in the object of class sensiHSIC) are randomly permuted $B$ times and the test statistic is recomputed as many times. This yields $B$ simulated observations of the test statistic under $H0$ , from which the p-value is estimated in a non-parametric way. See Meynaoui (2019) for more details on how to correctly estimate the p-value in order to preserve the expected level of the test.
The sequential permutation-based test is a goal-oriented variant of the previous test. The main idea is to reduce the computational cost by stopping permutations as soon as the estimated p-value has sufficiently converged so that it can be compared to a reference threshold or be given a final ranking. See El Amri and Marrel (2022) for more details on how to implement this sequential approach for the three stopping criteria (namely "ranking", "screening" or "both").
The Gamma test is a parametric alternative to permutation-based tests when $n$ is not large enough to resort to the asymptotic test. Permutation-based tests reveal that the test statistic under $H0$ follows a unimodal distribution with significant positive skewness. It therefore seems natural to estimate the p-value with a Gamma distribution, especially since the asymptotic distribution is properly approximated by this parametric family. See El Amri and Marrel (2021) for more details on how to estimate the parameters of the Gamma distribution with the method of moments. In particular, the first two moments of the test statistic under $H0$ are computed with the formulas initially provided in Kazi-Aoual et al. (1995).
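The permutation principle above can be sketched in base R for a single pair. The sketch assumes Gaussian kernels with bandwidth equal to the sample standard deviation (sensiHSIC leaves the kernel choice to the user), uses the V-statistic trace(K H L H) / n^2 with H the centering matrix, and applies the common (1 + count) / (B + 1) p-value convention. The Gamma p-value here simply matches moments on the permuted values; it is a crude stand-in, not the Kazi-Aoual formulas used by the Gamma test. hsic_tests and gram_gauss are hypothetical helpers.

```r
# Gaussian Gram matrix; bandwidth = sample standard deviation (an assumption)
gram_gauss <- function(x, bw = sd(x)) exp(-outer(x, x, "-")^2 / (2 * bw^2))

# Hypothetical helper: V-statistic HSIC(x, y) and two p-value estimates
hsic_tests <- function(x, y, B = 500) {
  n <- length(x)
  H <- diag(n) - matrix(1 / n, n, n)        # centering matrix
  Kc <- H %*% gram_gauss(x) %*% H           # centered input Gram matrix, computed once
  L <- gram_gauss(y)
  stat <- function(Lm) sum(Kc * Lm) / n^2   # equals trace(K H Lm H) / n^2
  h_obs <- stat(L)
  # Permuting the output samples permutes both rows and columns of L
  h_perm <- replicate(B, { p <- sample(n); stat(L[p, p]) })
  p_perm <- (1 + sum(h_perm >= h_obs)) / (B + 1)
  # Crude Gamma fit by moments on the permuted statistics (illustration only)
  m <- mean(h_perm); v <- var(h_perm)
  p_gamma <- pgamma(h_obs, shape = m^2 / v, scale = v / m, lower.tail = FALSE)
  list(stat = h_obs, p_perm = p_perm, p_gamma = p_gamma)
}

set.seed(1)
n <- 100
x1 <- runif(n); x2 <- runif(n)
y <- sin(2 * pi * x1)          # Y depends on X1 only
res1 <- hsic_tests(x1, y)
res2 <- hsic_tests(x2, y)
c(p1 = res1$p_perm, p2 = res2$p_perm)
```

Centering the input Gram matrix once keeps each permutation at O(n^2) cost, since only the output Gram matrix is reindexed.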

If a U-statistic was used in sensiHSIC, the estimated value of $HSIC(Xi,Y)$ may be negative.

The asymptotic test can no longer be conducted with a Gamma distribution (whose support is limited to $[0,+\infty[$ ). It is replaced by a Pearson III distribution (which is a left-shifted Gamma distribution).
The permutation-based test and the sequential permutation-based test can be applied directly.
The Gamma test no longer has any theoretical justification.

What about target and conditional HSIC indices?

In Marrel and Chabridon (2021), HSIC indices were adapted to target sensitivity analysis (thus becoming T-HSIC indices) and to conditional sensitivity analysis (thus becoming C-HSIC indices). Tests of independence can still be useful after estimating T-HSIC indices or C-HSIC indices.

For T-HSIC indices, the null hypothesis is $H0$ : " $Xi$ and $w(Y)$ are independent" where $w$ is the weight function selected in target and passed to the function sensiHSIC. Everything works just as for basic HSIC indices (apart from the fact that $w$ is applied to the original output variable $Y$ ). Available test methods include "Asymptotic", "Permutation", "Seq_Permutation" and "Gamma" (for V-statistics only).
For C-HSIC indices, the null hypothesis is $H0$ : " $Xi$ and $Y$ are independent if the event described in cond occurs". In this specific context, testing conditional independence is only relevant if the weight function is an indicator function. For this reason, if conditional independence has to be tested, the user must select type="indicTh" in cond while calling the function sensiHSIC. Let us recall that only V-statistic estimators can be used for C-HSIC indices. As a result, available test methods include "Asymptotic", "Permutation", "Seq_Permutation" and "Gamma".

Value

testHSIC returns a list of class "testHSIC". It contains test.method, B (for the permutation-based test), seq.options (for the sequential permutation-based test) and the following objects:

`call`	The matched call.
`pval`	The estimated p-values after testing independence for all input-output pairs.
`prop`	A vector of two strings. The first string indicates if the chosen test method is asymptotic or non-asymptotic. The second string indicates if the chosen test method is parametric or non-parametric.
`family`	Only if `test.method` is `"Asymptotic"` or `"Gamma"`. A string indicating the parametric family used to estimate p-values.
`param`	Only if `test.method` is `"Asymptotic"` or `"Gamma"`. A $2$ -column (resp. $3$ -column) matrix containing the parameters of the Gamma (resp. Pearson III) distributions used to estimate p-values.
`Hperm`	Only if `test.method="Permutation"`. A $B$ -column matrix containing simulated values of the test statistics after randomly permuting the output samples. Each column in `Hperm` corresponds to one random permutation.
`paths`	Only if `test.method="Seq_Permutation"`. A matrix containing all estimated p-values over the sequential test procedure. The $i$ -th row provides all estimates of the $i$ -th p-value as the number of permutations increases. If one row ends with a sequence of missing values `NA`, it means permutations were stopped earlier for this input variable. This can only happen if `criterion="screening"` in `seq.options`.

Author(s)

Sebastien Da Veiga, Amandine Marrel, Anouar Meynaoui, Reda El Amri and Gabriel Sarazin.

References

El Amri, M. R. and Marrel, A. (2022), Optimized HSIC-based tests for sensitivity analysis: application to thermalhydraulic simulation of accidental scenario on nuclear reactor, Quality and Reliability Engineering International, 38(3), 1386-1403.

El Amri, M. R. and Marrel, A. (2021), More powerful HSIC-based independence tests, extension to space-filling designs and functional data. https://cea.hal.science/cea-03406956/

Fukumizu, K., Bach, F. R. and Jordan, M. I. (2004), Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, Journal of Machine Learning Research, 5(Jan), 73-99.

Gretton, A., Fukumizu, K., Teo, C., Song, L., Scholkopf, B. and Smola, A. (2007), A kernel statistical test of independence, Advances in Neural Information Processing Systems, 20.

Kazi-Aoual, F., Hitier, S., Sabatier, R. and Lebreton, J. D. (1995), Refined approximations to permutation tests for multivariate inference, Computational Statistics & Data Analysis, 20(6), 643-656.

Meynaoui, A. (2019), New developments around dependence measures for sensitivity analysis: application to severe accident studies for generation IV reactors (Doctoral dissertation, INSA de Toulouse).

Examples

  

# Test case: the Ishigami function.

n <- 20   # very few input-output samples
p <- 3    # nb of input variables

########################################
### PRELIMINARY SENSITIVITY ANALYSIS ###
########################################

X <- matrix(runif(n*p), n, p)
sensi <- sensiHSIC(model=ishigami.fun, X)
print(sensi)
plot(sensi)
title("GSA for the Ishigami function")

#############################
### TESTS OF INDEPENDENCE ###
#############################

test.asymp <- testHSIC(sensi)

test.perm <- testHSIC(sensi, test.method="Permutation")

test.seq.screening <- testHSIC(sensi, test.method="Seq_Permutation")

test.seq.ranking <- testHSIC(sensi, test.method="Seq_Permutation", 
                             seq.options=list(criterion="ranking"))

test.seq.both <- testHSIC(sensi, test.method="Seq_Permutation", 
                          seq.options=list(criterion="both"))

test.gamma <- testHSIC(sensi, test.method="Gamma")

# comparison of p-values

res <- rbind( t(as.matrix(test.asymp$pval)), t(as.matrix(test.perm$pval)), 
              t(as.matrix(test.seq.screening$pval)), t(as.matrix(test.seq.ranking$pval)),
              t(as.matrix(test.seq.both$pval)), t(as.matrix(test.gamma$pval)) )

rownames(res) <- c("asymp", "perm", "seq_perm_screening", 
                   "seq_perm_ranking", "seq_perm_both", "gamma")
res

# Conclusion: n is too small for the asymptotic test.
# Take n=200 and all four test methods will provide very close p-values.

#####################
### VISUALIZATION ###
#####################

# simulated values of HSIC indices under H0 (random permutations)
Hperm <- t(unname(test.perm$Hperm))

for(i in 1:p){
  
  # histogram of the test statistic under H0 (random permutations)
  
  title <- paste0("Histogram of S", i, " = HSIC(X", i, ",Y)")
  
  hist(Hperm[,i], probability=TRUE,
       nclass=70, main=title, xlab="", ylab="", col="cyan")
  
  # asymptotic Gamma distribution
  
  shape.asymp <- test.asymp$param[i, "shape"]
  scale.asymp <- test.asymp$param[i, "scale"]
  
  xx <- seq(0, max(Hperm[,i]), length.out=200)
  dens.asymp <- dgamma(xx, shape=shape.asymp, scale=scale.asymp)
  
  lines(xx, dens.asymp, lwd=2, col="darkorchid")
  
  # finite-sample Gamma distribution
  
  shape.perm <- test.gamma$param[i, "shape"]
  scale.perm <- test.gamma$param[i, "scale"]
  
  dens.perm <- dgamma(xx, shape=shape.perm, scale=scale.perm)
  
  lines(xx, dens.perm, lwd=2, col="blue")
  
  all.cap <- c("Asymptotic Gamma distribution", "Finite-sample Gamma distribution")
  all.col <- c("darkorchid", "blue")
  
  legend("topright", legend=all.cap, col=all.col, lwd=2, y.intersp=1.3)

}
 

Test Models for Sensitivity Analysis

Description

These functions are standard test cases for sensitivity analysis benchmarks. For a scalar output (see Saltelli et al. 2000 and https://www.sfu.ca/~ssurjano/):

the g-function of Sobol' with 8 inputs, X ~ U[0,1];
the function of Ishigami with 3 inputs, X ~ U[-pi,pi];
the function of Morris with 20 inputs, X ~ U[0,1];
the Linkletter decreasing coefficients function, X ~ U[0,1] (Linkletter et al. (2006));
the heterdisc function with 4 inputs, X ~ U[0,20];
the Friedman function with 5 inputs, X ~ U[0,1] (Friedman, 1991);
the Matyas function with 2 inputs, X ~ U[0,1].

For functional output cases:

the Arctangent temporal function with 2 inputs, X ~ U[-7,7] (Auder, 2011). The functional support is on [0,2pi];
the Campbell1D function with 4 inputs, X ~ U[-1,5] (Campbell et al. 2006). The functional support is on [-90,90].
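These benchmarks are useful partly because some of them have closed-form sensitivity indices. As a sketch, the g-function of Sobol' with inputs U[0,1] has first-order partial variances V_i = 1/(3(1 + a_i)^2). The coefficient vector a below is an assumption (a common choice for 8 inputs), so check sobol.fun before relying on it; g_fun is a hypothetical re-implementation, not the package function.

```r
# Sobol' g-function: prod_i (|4 x_i - 2| + a_i) / (1 + a_i), inputs U[0,1]
a <- c(0, 1, 4.5, 9, 99, 99, 99, 99)     # assumed coefficients
g_fun <- function(X, a) {
  terms <- lapply(seq_along(a), function(i) (abs(4 * X[, i] - 2) + a[i]) / (1 + a[i]))
  Reduce(`*`, terms)                     # product over the inputs, row by row
}

# Closed-form variance decomposition
Vi <- 1 / (3 * (1 + a)^2)                # first-order partial variances
V  <- prod(1 + Vi) - 1                   # total variance of the output
S  <- Vi / V                             # first-order Sobol' indices
ST <- Vi * (prod(1 + Vi) / (1 + Vi)) / V # total Sobol' indices

# Monte Carlo check of the mean (equal to 1) and of the total variance
set.seed(42)
X <- matrix(runif(8 * 1e5), ncol = 8)
y <- g_fun(X, a)
c(mc_mean = mean(y), mc_var = var(y), exact_var = V)
round(rbind(S = S, ST = ST), 4)
```

Inputs with large a_i (here the last four) contribute almost nothing, which is why the g-function is a common screening benchmark.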

Usage

sobol.fun(X)
ishigami.fun(X)
morris.fun(X)
atantemp.fun(X, q = 100)
campbell1D.fun(X, theta = -90:90)
linkletter.fun(X)
heterdisc.fun(X)
friedman.fun(X)
matyas.fun(X)

Arguments

`X`	a matrix (or `data.frame`) containing the input sample.
`q`	for the atantemp() function: the number of discretization steps of the functional output
`theta`	for the campbell1D() function: the discretization steps (angles in degrees)

Value

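A vector of function responses for the scalar-output functions; for atantemp.fun, a matrix with one row per input sample and q columns.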

Author(s)

Gilles Pujol and Bertrand Iooss

References

A. Saltelli, K. Chan and E. M. Scott eds, 2000, Sensitivity Analysis, Wiley.

Examples



# Examples for the functional toy functions

# atantemp function

y0 <- atantemp.fun(matrix(c(-7,0,7,-7,0,7),ncol=2))
plot(y0[1,],type="l")
apply(y0,1,lines)

n <- 100
X <- matrix(c(runif(2*n,-7,7)),ncol=2)
y <- atantemp.fun(X)
plot(y0[2,],ylim=c(-2,2),type="l")
apply(y,1,lines)

# campbell1D function

N1=100         # number of simulations for the 1D curves
min=-1 ; max=5
nominal=(max+min)/2

X1 = NULL ; y1 = NULL
Xnom=matrix(nominal,nrow=1,ncol=4)
ynom=campbell1D.fun(Xnom,theta=-90:90)
plot(ynom,ylim=c(8,30),type="l",col="red")
for (i in 1:N1){
  X=matrix(runif(4,min=min,max=max),nrow=1,ncol=4)
  X1=rbind(X1,X)   # accumulate the input samples
  y=campbell1D.fun(X,theta=-90:90)
  y1=rbind(y1,y)   # accumulate the output curves
  lines(y)
}

Truncated distributions

Description

dnorm.trunc, pnorm.trunc, qnorm.trunc and rnorm.trunc are functions for the Truncated Normal Distribution. dgumbel.trunc, pgumbel.trunc, qgumbel.trunc and rgumbel.trunc are functions for the Truncated Gumbel Distribution.

Usage

dnorm.trunc(x, mean = 0, sd = 1, min = -1e6, max = 1e6)
pnorm.trunc(q, mean = 0, sd = 1, min = -1e6, max = 1e6)
qnorm.trunc(p, mean = 0, sd = 1, min = -1e6, max = 1e6)
rnorm.trunc(n, mean = 0, sd = 1, min = -1e6, max = 1e6)
dgumbel.trunc(x, loc = 0, scale = 1, min = -1e6, max = 1e6)
pgumbel.trunc(q, loc = 0, scale = 1, min = -1e6, max = 1e6)
qgumbel.trunc(p, loc = 0, scale = 1, min = -1e6, max = 1e6)
rgumbel.trunc(n, loc = 0, scale = 1, min = -1e6, max = 1e6)

Arguments

`x`, `q`	vector of quantiles
`p`	vector of probabilities
`n`	number of observations
`mean`, `sd`	mean and standard deviation parameters
`loc`, `scale`	location and scale parameters
`min`	vector of minimal bound values
`max`	vector of maximal bound values

Details

See dnorm for details on the Normal distribution. The Gumbel distribution comes from the evd package. See dgumbel for details on the Gumbel distribution.

Value

dnorm.trunc and dgumbel.trunc give the density, pnorm.trunc and pgumbel.trunc give the distribution function, qnorm.trunc and qgumbel.trunc give the quantile function, and rnorm.trunc and rgumbel.trunc generate random deviates.
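The usual construction of a truncated law renormalises the parent distribution on [min, max]: the density is f(x) / (F(max) - F(min)) inside the bounds, and sampling goes through the inverse distribution function. The base-R sketch below applies that construction generically; `make_trunc` and the `*gumbel0` helpers are hypothetical (the Gumbel functions are written out so the example does not need the evd package), and this is not necessarily how the package implements these functions.

```r
# Hypothetical generic truncation of a continuous law to [min, max],
# given its density d(), distribution function p() and quantile function q().
make_trunc <- function(d, p, q) {
  list(
    d = function(x, ..., min = -1e6, max = 1e6) {
      mass <- p(max, ...) - p(min, ...)
      ifelse(x < min | x > max, 0, d(x, ...) / mass)
    },
    p = function(x, ..., min = -1e6, max = 1e6) {
      mass <- p(max, ...) - p(min, ...)
      pmin(pmax((p(x, ...) - p(min, ...)) / mass, 0), 1)
    },
    q = function(u, ..., min = -1e6, max = 1e6) {
      q(p(min, ...) + u * (p(max, ...) - p(min, ...)), ...)
    },
    r = function(n, ..., min = -1e6, max = 1e6) {
      u <- runif(n)  # inverse-CDF sampling inside [F(min), F(max)]
      q(p(min, ...) + u * (p(max, ...) - p(min, ...)), ...)
    }
  )
}

# Hypothetical base-R Gumbel helpers (location loc, scale scale)
dgumbel0 <- function(x, loc = 0, scale = 1) {
  z <- (x - loc) / scale
  exp(-z - exp(-z)) / scale
}
pgumbel0 <- function(q, loc = 0, scale = 1) exp(-exp(-(q - loc) / scale))
qgumbel0 <- function(p, loc = 0, scale = 1) loc - scale * log(-log(p))

norm_tr   <- make_trunc(dnorm, pnorm, qnorm)
gumbel_tr <- make_trunc(dgumbel0, pgumbel0, qgumbel0)

set.seed(42)
x <- norm_tr$r(1000, mean = 0, sd = 1, min = -1, max = 2)
print(range(x))  # inside [-1, 2]

# The truncated density integrates to 1 over [min, max]
tot <- integrate(function(t) norm_tr$d(t, min = -1, max = 2), -1, 2)$value
print(tot)
```

Inverse-CDF sampling keeps every draw inside the bounds without rejection, which matters when [min, max] holds little of the parent mass.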

Author(s)

Gilles Pujol and Bertrand Iooss

Weight-function to transform an output variable in order to perform Target Sensitivity Analysis (TSA)

Description

Transformation function of one variable (vector sample)

Usage

weightTSA(Y, c, upper = TRUE, type="indicTh", param=1)

Arguments

`Y`	The output vector
`c`	The threshold
`upper`	TRUE for upper threshold and FALSE for lower threshold
`type`	The weight function type ("indicTh", "zeroTh", "logistic", "exp1side"): "indicTh": indicator thresholding; "zeroTh": zero thresholding (keeps the variable value above (upper=TRUE case) or below the threshold); "logistic": logistic transformation at the threshold; "exp1side": exponential transformation above (upper=TRUE case) or below the threshold (see Raguet and Marrel)
`param`	The parameter value for "logistic" and "exp1side" types

Details

The weight functions depend on a threshold $c$ and, for the smooth types, a relaxation parameter $param$. They are defined as follows:

if type = "indicTh": $w = 1_{Y>c}$ (upper threshold) and $w = 1_{Y<c}$ (lower threshold),
if type = "zeroTh": $w = Y 1_{Y>c}$ (upper threshold) and $w = Y 1_{Y<c}$ (lower threshold),
if type = "logistic":

$w = \left[ 1 + \exp{\left( -param\frac{Y-c}{|c|}\right)}\right]^{-1}$

(upper threshold) and

$w = \left[ 1 + \exp{\left( -param\frac{c-Y}{|c|}\right)}\right]^{-1}$

(lower threshold),
if type = "exp1side":

$w = \left[ 1 + \exp{\left( -\frac{\max(c - Y, 0)}{\frac{param}{5} \sigma(Y)}\right)}\right]$

(upper threshold) and

$w = \left[ 1 + \exp{\left( -\frac{\max(Y - c, 0)}{\frac{param}{5} \sigma(Y)}\right) }\right]$

(lower threshold), where $\sigma(Y)$ is an estimate of the standard deviation of Y and $param$ (default 1) is a parameter tuning the smoothness.
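The "indicTh", "zeroTh" and "logistic" definitions above translate directly into vectorised R. The sketch below is a hypothetical re-implementation for illustration only, not the package's `weightTSA`; it leaves out the "exp1side" type.

```r
# Hypothetical sketch of three TSA weight functions (not the package code)
weight_sketch <- function(Y, c, upper = TRUE, type = "indicTh", param = 1) {
  above <- if (upper) Y > c else Y < c  # side of the threshold that is kept
  switch(type,
    indicTh  = as.numeric(above),        # w = 1_{Y>c} or 1_{Y<c}
    zeroTh   = Y * above,                # w = Y 1_{Y>c} or Y 1_{Y<c}
    logistic = {
      z <- if (upper) (Y - c) / abs(c) else (c - Y) / abs(c)
      1 / (1 + exp(-param * z))          # smooth relaxation of the indicator
    },
    stop("type not covered by this sketch: ", type)
  )
}

Y <- c(-1, 0.5, 1.5, 2, 3)
print(weight_sketch(Y, c = 1.5))                     # 0 0 0 1 1
print(weight_sketch(Y, c = 1.5, type = "zeroTh"))    # 0 0 0 2 3
print(weight_sketch(Y, c = 1.5, type = "logistic"))  # 0.5 at Y == c
```

Larger values of `param` make the logistic weight approach the indicator weight.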

Value

The vector sample of the transformed variable

Author(s)

B. Iooss

References

H. Raguet and A. Marrel, Target and conditional sensitivity analysis with emphasis on dependence measures, Preprint, https://hal.archives-ouvertes.fr/hal-01694129

A. Marrel and V. Chabridon, 2021, Statistical developments for target and conditional sensitivity analysis: Application on safety studies for nuclear reactor, Reliability Engineering & System Safety, 214:107711.

A. Spagnol, Kernel-based sensitivity indices for high-dimensional optimization problems, PhD Thesis, Universite de Lyon, 2020

Spagnol A., Le Riche R., Da Veiga S. (2019), Global sensitivity analysis for optimization with variable selection, SIAM/ASA J. Uncertainty Quantification, 7(2), 417–443.

Examples

n <- 100  # sample size
c <- 1.5
Y <- rnorm(n)
Yt <- weightTSA(Y, c)

Package 'sensitivity'

Help Index

Sensitivity Analysis

Description

Details

Model managing

Author(s)

References

Addelman and Kempthorne construction

Description

Usage

Arguments

Details

Value

Author(s)

References

Examples

Correlation Ratio

Description

Usage

Arguments

Value

Author(s)

References

Examples

Decoupling Simulations and Estimations

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Distributed Evaluation of Local Sensitivity Analysis

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Discrepancy measure

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Non-parametric variable significance test based on the empirical process

Description

Usage

Arguments

Value

Author(s)

References

See Also

Examples

Extended Fourier Amplitude Sensitivity Test

Description

Usage

Arguments

Details

Value

Author(s)

References

Examples

Johnson indices

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

LMG $R^2$ decomposition for linear and logistic regression models