Monte Carlo Integration

The SI package provides several methods of Monte Carlo (MC) integration, including:

  • Stochastic Point Method
  • Mean Value Method
  • Importance Sampling Method
  • Stratified Sampling Method

Note that the Stochastic Point Method is only a crude baseline for estimating an integral; it is provided mainly so that it can be compared easily with the other MC methods. All four methods belong to the family of stochastic methods.

Stochastic Integration

Suppose that $h(x)\in C^1$ is well-defined and bounded on the finite interval C = [a, b]. W.L.O.G., assume 0 ≤ h(x) ≤ M. We want to calculate the integral $$I=\int_a^bh(x)\mathrm{d}x$$ That is to say, we need to calculate the area of the region under h(x): D = {(x, y) : 0 ≤ y ≤ h(x), x ∈ C}.

Stochastic Point Method

Sample uniformly from G = C × (0, M) N times to get the stochastic points $Z_1,\cdots,Z_N$, where $Z_i=(X_i,Y_i),\ i=1,2,\cdots,N$.

For i = 1, 2, ⋯, N, let $$\xi_i=\begin{cases} 1&,Z_i\in D\\ 0&,\text{otherwise} \end{cases}$$

Then the {ξi} are outcomes of independent repeated trials, so $\xi_1,\cdots,\xi_N\overset{i.i.d.}{\sim}Bernoulli(p)$ where $$p=\mathbb{P}(Z_i\in D)=\dfrac{I}{M(b-a)}$$

Thus, the stochastic point estimator is given by $$\hat{I}_1=M(b-a)\hat{p}$$ where $\hat{p}=\dfrac{1}{N}\sum\limits_{i=1}^N\xi_i$ is the estimator of p, given by the proportion of points {Zi} that lie under the curve h(x).

From the Strong Law of Large Numbers, $$\hat{I}_1\xrightarrow{a.s.}I$$ which means that we can use $\hat{I}_1$ to approximate I, and the approximation improves as N grows.

To estimate the variance (or precision), we use the Central Limit Theorem and get $$\dfrac{\hat{I}_1-I}{\sqrt{Var(\hat{I}_1)}}\xrightarrow{d}N(0,1),\qquad Var(\hat{I}_1)=\dfrac{1}{N}[M(b-a)]^2p(1-p)$$

Therefore, the variance can be estimated by $$Var(\hat{I}_1)\approx \dfrac{1}{N}[M(b-a)]^2\hat{p}(1-\hat{p})$$

For an m-dimensional function on $[a_1,b_1]\times\cdots\times[a_m,b_m]$, we analogously have $$\hat{I}_1=M\hat{p}\prod\limits_{j=1}^m(b_j-a_j)$$ where $\hat{p}$ is the proportion of points falling under the graph of h in $[a_1,b_1]\times\cdots\times[a_m,b_m]\times(0,M)$.

In the SI package, use the following code to carry out the stochastic point method.

## To integrate exp(x) from -1 to 1
set.seed(0)
h <- function(x){
    exp(x)
}
N <- 100000
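## SI.SPM call: -1 and 1 are the limits of integration, exp(1) is an upper
## bound M of exp(x) on [-1, 1], and N is the number of sampled points
## (argument meanings inferred from the text above)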
SPMresult <- SI.SPM(h,-1,1,exp(1),N)
I1 <- SPMresult[[1]]
VarI1 <- SPMresult[[2]]
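
For comparison, the estimator and variance formulas above can also be written out directly in a few lines of base R. The following is only an illustrative sketch, not the SI implementation; the name spm_sketch is made up here, and the code assumes h is vectorized and bounded above by M on [a, b].

## Hand-rolled hit-or-miss (stochastic point) sketch -- illustration only,
## not the SI implementation
spm_sketch <- function(h, a, b, M, N){
    x <- runif(N, a, b)                          # X-coordinates of Z_i
    y <- runif(N, 0, M)                          # Y-coordinates of Z_i
    p_hat <- mean(y <= h(x))                     # proportion of points under h
    I_hat <- M*(b - a)*p_hat                     # I_hat = M(b-a) * p_hat
    V_hat <- (M*(b - a))^2*p_hat*(1 - p_hat)/N   # estimated variance
    c(I_hat, V_hat)
}
spm_sketch(h, -1, 1, exp(1), N)                  # close to exp(1) - exp(-1)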

Mean Value Method

Let U ∼ Uniform(a, b); then $$\mathbb{E}[h(U)]=\int_a^bh(u)\dfrac{1}{b-a}\mathrm{d}u=\dfrac{I}{b-a}$$

Suppose that we first sample $U_1,\cdots,U_N\overset{i.i.d.}{\sim}Uniform(a,b)$; then Yi = h(Ui), i = 1, 2, ⋯, N are i.i.d. random variables. From the Strong Law of Large Numbers, $$\overline{Y}=\dfrac{1}{N}\sum\limits_{i=1}^Nh(U_i)\xrightarrow{a.s.}\mathbb{E}[h(U)]=\dfrac{I}{b-a}$$

Thus we get the mean value estimator $$\hat{I}_2=\dfrac{b-a}{N}\sum\limits_{i=1}^Nh(U_i)$$

To estimate the variance (or precision), we use the Central Limit Theorem and get $$\dfrac{\hat{I}_2-I}{\sqrt{Var(\hat{I}_2)}}\xrightarrow{d}N(0,1),\qquad Var(\hat{I}_2)=\dfrac{(b-a)^2}{N}Var[h(U)]$$ where $$Var[h(U)]=\int_a^b\{h(u)-\mathbb{E}[h(U)]\}^2\dfrac{1}{b-a}\mathrm{d}u$$

Since it can be estimated by the sample variance $$Var[h(U)]\approx \dfrac{1}{N}\sum\limits_{i=1}^N(Y_i-\overline{Y})^2$$ we can estimate the variance of $\hat{I}_2$ by $$Var(\hat{I}_2)\approx \dfrac{(b-a)^2}{N^2}\sum\limits_{i=1}^N(Y_i-\overline{Y})^2$$

For an m-dimensional function on $[a_1,b_1]\times\cdots\times[a_m,b_m]$, we analogously have $$\hat{I}_2=\dfrac{\prod_{j=1}^m(b_j-a_j)}{N}\sum\limits_{i=1}^Nh(U_i)$$ where the $U_i$ are sampled uniformly from $[a_1,b_1]\times\cdots\times[a_m,b_m]$.

In the SI package, use the following code to carry out the mean value method.

## To integrate exp(x) from -1 to 1
set.seed(0)
h <- function(x){
    exp(x)
}
N <- 100000
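## SI.MVM call: -1 and 1 are the limits of integration and N is the sample
## size (argument meanings inferred from the text above)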
MVMresult <- SI.MVM(h,-1,1,N)
I2 <- MVMresult[[1]]
VarI2 <- MVMresult[[2]]
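
Again purely for illustration, a minimal hand-rolled version of the mean value estimator and its variance estimate might look as follows; mvm_sketch is a made-up name and not part of the SI package.

## Hand-rolled mean value sketch -- illustration only, not the SI implementation
mvm_sketch <- function(h, a, b, N){
    u <- runif(N, a, b)                            # U_1, ..., U_N ~ Uniform(a, b)
    y <- h(u)                                      # Y_i = h(U_i)
    I_hat <- (b - a)*mean(y)                       # (b-a)/N * sum of h(U_i)
    V_hat <- (b - a)^2/N^2*sum((y - mean(y))^2)    # estimated variance
    c(I_hat, V_hat)
}
mvm_sketch(h, -1, 1, N)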

Importance Sampling Method

Suppose g(x) is a density whose shape is similar to that of |h(x)|, with h(x) = 0 whenever g(x) = 0 and h(x) = o(g(x)) as x → ∞.

Suppose that $X_i\overset{i.i.d.}{\sim}g(x),\ i=1,2,\cdots,N$ and let $$\eta_i=\dfrac{h(X_i)}{g(X_i)}$$ then $$\mathrm{E}\eta_i=\int_C\dfrac{h(x)}{g(x)}g(x)\mathrm{d}x=I$$

Therefore, the importance sampling estimator is given by $$\hat{I}_3=\overline{\eta}=\dfrac{1}{N}\sum\limits_{i=1}^N\dfrac{h(X_i)}{g(X_i)}$$

As in the mean value method, the variance can be estimated by the sample variance of the ratios: $$Var(\hat{I}_3)\approx\dfrac{1}{N^2}\sum\limits_{i=1}^N(\eta_i-\overline{\eta})^2$$

In the SI package, use the following code to carry out the importance sampling method.

## To integrate exp(x) from -1 to 1
## Use the sampling density (3/2+x)/3
set.seed(0)
h <- function(x){
    exp(x)
}
N <- 100000    
g <- function(x){return((3/2+x)/3)}
G_inv <- function(y){return(sqrt(6*y+1/4)-3/2)}
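## g is the sampling density on [-1, 1]; G_inv is its inverse CDF, used to
## draw samples from g by inverse-transform sampling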
ISMresult <- SI.ISM(h,g,G_inv,N)
I3 <- ISMresult[[1]]
VarI3 <- ISMresult[[2]]
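
For illustration, here is a minimal hand-rolled sketch of the importance sampling estimator, drawing from g by inverse-transform sampling with G_inv. The name ism_sketch is made up and not part of the SI package; the variance estimate follows the sample-variance approach used above.

## Hand-rolled importance sampling sketch -- illustration only,
## not the SI implementation
ism_sketch <- function(h, g, G_inv, N){
    x <- G_inv(runif(N))                 # inverse-transform samples from density g
    eta <- h(x)/g(x)                     # eta_i = h(X_i) / g(X_i)
    I_hat <- mean(eta)                   # importance sampling estimate
    V_hat <- sum((eta - I_hat)^2)/N^2    # estimated variance of I_hat
    c(I_hat, V_hat)
}
ism_sketch(h, g, G_inv, N)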

Stratified Sampling Method

Suppose that $C=\bigcup\limits_{j=1}^m C_j$ with $C_i\cap C_j=\varnothing$ for i ≠ j. For every $C_j$, use the mean value method to approximate $$I_{C_j}=\int_{C_j}h(x)\mathrm{d}x$$

Summing the per-stratum estimates gives $$\hat{I}_4=\sum\limits_{j=1}^m\hat{I}_{C_j}$$

It can be shown that the variance of $\hat{I}_4$ is smaller than that of $\hat{I}_2$ for the same total number of samples N.

In the SI package, use the following code to carry out the stratified sampling method.

## To integrate exp(x) from -1 to 1
set.seed(0)
h <- function(x){
    exp(x)
}
N <- 100000
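## SI.SSM call: -1 and 1 are the limits of integration; 10 is presumably the
## number of strata m, and N is the total sample size (inferred from the text)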
SSMresult <- SI.SSM(h,-1,1,10,N)
I4 <- SSMresult[[1]]
VarI4 <- SSMresult[[2]]
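
For illustration, a hand-rolled sketch of stratified sampling over equal-width strata, applying the mean value method within each stratum, could look as follows. The name ssm_sketch is made up and not part of the SI package; it assumes the N samples are split evenly across the m strata.

## Hand-rolled stratified sampling sketch -- illustration only,
## not the SI implementation
ssm_sketch <- function(h, a, b, m, N){
    edges <- seq(a, b, length.out = m + 1)   # stratum boundaries
    n_j <- floor(N/m)                        # samples per stratum
    I_hat <- 0
    V_hat <- 0
    for(j in 1:m){
        u <- runif(n_j, edges[j], edges[j + 1])
        y <- h(u)
        w <- edges[j + 1] - edges[j]                        # width of C_j
        I_hat <- I_hat + w*mean(y)                          # mean value estimate on C_j
        V_hat <- V_hat + w^2/n_j^2*sum((y - mean(y))^2)     # variances add across strata
    }
    c(I_hat, V_hat)
}
ssm_sketch(h, -1, 1, 10, N)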