Introduction to epandist

The scope of this package

Analyzing censored variables usually requires the use of optimization algorithms. The epandist-package provides an alternative algebraic approach to the task of determining the expected value of a random censored variable with a known censoring point. Likewise this approach allows for the determination of the censoring point if the expected value is known. These results are derived under the assumption that the variable follows an Epanechnikov kernel distribution with known mean and range prior to censoring.

The Epanechnikov distribution

The Epanechnikov kernel is often used in the context of non-parametric estimation. However the kernel may also be considered a distribution in its own right. The Epanechnikov “distribution” is simply a concave pylonomial of second degree. As such the distribution entails some desirable properties. For one thing it is a straight forward way of achieving an s-shaped cumulative distribution function. Furthermore the distribution allows for untroublesome calculation of the expected value of a censored variable.

require("epandist")

## Loading required package: epandist

curve(depan(x), xlim=c(-2.5, 2.5), yaxs='i', col="blue")  #Mean=0, sd=1
title("The Epanechnikov probability distribution function", cex.main=1)

The Epanechnikov distribution is controlled by two parameters: μ and r. μ represents the mean, mode and median, which all coincide since the distribution is symmetrical. r represents the spread and corresponds to the distance between the mean and the smallest/largest possible value supported by the distribution, i.e. half the range. $\;r=\sqrt{5}$ yields a standard deviation of exactly 1.

The cumulative distribution function and the quantile function

The cumulative distribution function is given by $$P(X<k\; | \; \mu,\; r)=\frac{ -\Big(\frac{k-\mu}{r}\Big)^3 + 3\Big(\frac{k-\mu}{r}\Big) + 2}{4} \qquad k \in (\mu-r;\; \mu+r)$$

This equation is implemented in the pepan-function.

Solving the cubic equation above yeilds the quantile function. It is beyond the scope of this introduction to show how this is done but the solution turns out to be

$$Q(p\; | \; \mu,\; r)=2\,sin\Big(\frac{asin(2p-1)}{3}\Big)r+\mu \qquad p \in (0;\; 1)$$

This equation is implemented in the qepan-function.

Example - using `pepan` and `qepan`

Consider an epanechnikov-distributed variable with distribution parameters μ = 10 and r = 4 (note that only values between 6 and 14 are supported by the distribution).

What is the probability that x < 8 ? To answer this question use the pepan-function:

pepan(x=8, mu=10, r=4) #Calculate probability

## [1] 0.15625

The propability of drawing a value below 8 is thus 15.6 percent.

Where is the lower and upper quartile located? To answer this question use the qepan-function:

qepan(p=0.25, mu=10, r=4) #Lower quantile 
qepan(p=0.75, mu=10, r=4) #Upper quantile

## [1] 8.610815
## [1] 11.38919

Thus half of the observations drawn from the distribution are located between 8.6 and 11.4.

##Working with censored variables As has already been stated the Epanechnikov distribution is convenient when working with censored variables since it allows for calculating of the expected value and the censoring point. In the remaining part of this introduction these properties are demonstrated.

Example - using `evepan`

Consider an epanechnikov-distributed variable with zero mean and a standard deviation of 1 (corresponding to $r=\sqrt{5}$). The distribution is left-censored at $-\frac{1}{2}$. What is the expected value? To answer this question use the evepan-function:

evepan(c=-.5, mu=0, r=5^.5, side_censored = "left") #Calculate expected value

## [1] 0.2108396

Not surprisingly the expected value is greater than μ.

Mathematical foundation

Readers who are mathematically inclined may also think of this type of problem in terms of mathematical notation. Finding the expected abatement is the equivalent of solving the integral below:

E(x | c) = ∫_c^∞pdf(x) * (x − c) dx

where c denotes the censoring point and pdf(x) is the probability distribution function. Inserting the Epanechnikov probability distribution function - i.e. a concave pylonomial of second degree - yields

$$E(x\; | \; \mu,\; r,\; c)\; =\;\frac{r}{16}\Big(1-\frac{c-\mu}{r}\Big)^3\Big(3+\frac{c-\mu}{r}\Big)+c \qquad c \in (\mu-r;\; \mu+r) $$

The evepan-function is essentially just an implementation of an integral with an algebraic solution; an equation of fourth degree.

Example - using `cepan`

In the previous example the expected value was calculated for a variable with a known censoring point. The opposite problem may also occur.

Which censoring point yields an expected value of 1? To answer this question use the cepan-function:

cepan(ev=1, mu=0, r=5^.5, side_censored = "left") #Calculate censoring point

## [1] 0.8981712

Thus a censoring point at 0.9 yields an expected value of 1.

The result above is calculated by solving the previously mentioned fourth degree equation. The resulting expression is rather lengthy and contains several square- and cubic-roots (which the cepan-functions evaluates behind the scenes). However it is noteworthy that an exact solution can be derived in the first place (a task that requires patience and determination). Users are encouraged to take a look at the source code.

Case study - the effect of an emission ceiling

When may the expected value of a censored variable be of interest? If e.g. the effect of a proposed emission ceiling ought to be evaluated. The emission level in the absence of the ceiling may be thought of as a random variable reflecting that future emissions are subject to uncertainty. The ceiling is simply the censoring point. With a bit of ingenuity the expected abatement can be calculated using the evepan-function.

Calculating expected abatement

Assume that future emissions in the absence of the ceiling are Epanechnikov-distributed between 90 and 110. A ceiling at 101 is suggested. What is the expected abatement?

Clearly the answer is not 0 since it is possible that the realized emissions in the absence of the ceiling would exceed 100. In the presence of an emission ceiling the realized emission may be thought of as a random right-censored variable. Subtracting the expected value of this variable from the initial mean (i.e. 100) yields the expected abatement. Thus the expected abatement may be calculated by using the evepan-function:

100 - evepan(c=101, mu=100, r=10, side_censored = "right") #Calculate expected abatement

## [1] 1.412437

Hence the expected abatement is around 1.4 percent.

Let us check if the answer is correct by generating random data which can be used to calculate an approximate solution:

censoringpoint <- 101

dist_mean <- 100 #Mean prior to censoring

dist_range <- 10 #Half of distribution range

x <- repan(1000000, dist_mean, dist_range) #Generating epan-distibuted random data

x[x > censoringpoint] <- censoringpoint #Censoring data

dist_mean - mean(x) #Approximate expected abatement

## [1] 1.417749

The author finds this verification very pleasing.

Setting the ceiling

Say that the 1.4 percent expected abatement found in the previous example is considered too ambitious by policy makers. Which ceiling level corresponds to an expected abatement of exactly 1 percent?

cepan(ev=99, mu=100, r=10, side_censored = "right") #Calculate censoring point

## [1] 102.0687

Thus an emission ceiling 2.1 percent above the “main scenario”-level (i.e. at 102.1) will result in an expected abatement of 1 percent.

Assuming that future emissions are subject to Epanechnikov-distributed uncertainty is of course not entirely innocent. Nevertheless this is much preferable to ignoring uncertainty altogether which regrettably is a common approach.