Analyzing censored variables usually requires the use of optimization
algorithms. The epandist
-package provides an alternative
algebraic approach to the task of determining the expected value of a
random censored variable with a known censoring point. Likewise this
approach allows for the determination of the censoring point if the
expected value is known. These results are derived under the assumption
that the variable follows an Epanechnikov kernel distribution with known
mean and range prior to censoring.
The Epanechnikov kernel is often used in the context of non-parametric estimation. However the kernel may also be considered a distribution in its own right. The Epanechnikov “distribution” is simply a concave pylonomial of second degree. As such the distribution entails some desirable properties. For one thing it is a straight forward way of achieving an s-shaped cumulative distribution function. Furthermore the distribution allows for untroublesome calculation of the expected value of a censored variable.
## Loading required package: epandist
curve(depan(x), xlim=c(-2.5, 2.5), yaxs='i', col="blue") #Mean=0, sd=1
title("The Epanechnikov probability distribution function", cex.main=1)
The Epanechnikov distribution is controlled by two parameters: μ and r. μ represents the mean, mode and median, which all coincide since the distribution is symmetrical. r represents the spread and corresponds to the distance between the mean and the smallest/largest possible value supported by the distribution, i.e. half the range. $\;r=\sqrt{5}$ yields a standard deviation of exactly 1.
The cumulative distribution function is given by $$P(X<k\; | \; \mu,\; r)=\frac{ -\Big(\frac{k-\mu}{r}\Big)^3 + 3\Big(\frac{k-\mu}{r}\Big) + 2}{4} \qquad k \in (\mu-r;\; \mu+r)$$
This equation is implemented in the pepan
-function.
Solving the cubic equation above yeilds the quantile function. It is beyond the scope of this introduction to show how this is done but the solution turns out to be
$$Q(p\; | \; \mu,\; r)=2\,sin\Big(\frac{asin(2p-1)}{3}\Big)r+\mu \qquad p \in (0;\; 1)$$
This equation is implemented in the qepan
-function.
pepan
and qepan
Consider an epanechnikov-distributed variable with distribution parameters μ = 10 and r = 4 (note that only values between 6 and 14 are supported by the distribution).
What is the probability that x < 8 ? To answer this question
use the pepan
-function:
## [1] 0.15625
The propability of drawing a value below 8 is thus 15.6 percent.
Where is the lower and upper quartile located? To answer this
question use the qepan
-function:
## [1] 8.610815
## [1] 11.38919
Thus half of the observations drawn from the distribution are located between 8.6 and 11.4.
##Working with censored variables As has already been stated the Epanechnikov distribution is convenient when working with censored variables since it allows for calculating of the expected value and the censoring point. In the remaining part of this introduction these properties are demonstrated.
evepan
Consider an epanechnikov-distributed variable with zero mean and a
standard deviation of 1 (corresponding to $r=\sqrt{5}$). The distribution is
left-censored at $-\frac{1}{2}$. What
is the expected value? To answer this question use the
evepan
-function:
## [1] 0.2108396
Not surprisingly the expected value is greater than μ.
Readers who are mathematically inclined may also think of this type of problem in terms of mathematical notation. Finding the expected abatement is the equivalent of solving the integral below:
E(x | c) = ∫c∞pdf(x) * (x − c) dx
where c denotes the censoring point and pdf(x) is the probability distribution function. Inserting the Epanechnikov probability distribution function - i.e. a concave pylonomial of second degree - yields
$$E(x\; | \; \mu,\; r,\; c)\; =\;\frac{r}{16}\Big(1-\frac{c-\mu}{r}\Big)^3\Big(3+\frac{c-\mu}{r}\Big)+c \qquad c \in (\mu-r;\; \mu+r) $$
The evepan
-function is essentially just an
implementation of an integral with an algebraic solution; an equation of
fourth degree.
cepan
In the previous example the expected value was calculated for a variable with a known censoring point. The opposite problem may also occur.
Which censoring point yields an expected value of 1? To answer this
question use the cepan
-function:
## [1] 0.8981712
Thus a censoring point at 0.9 yields an expected value of 1.
The result above is calculated by solving the previously mentioned
fourth degree equation. The resulting expression is rather lengthy and
contains several square- and cubic-roots (which the
cepan
-functions evaluates behind the scenes). However it is
noteworthy that an exact solution can be derived in the first place (a
task that requires patience and determination). Users are encouraged to
take a look at the source code.
When may the expected value of a censored variable be of interest? If
e.g. the effect of a proposed emission ceiling ought to be evaluated.
The emission level in the absence of the ceiling may be thought of as a
random variable reflecting that future emissions are subject to
uncertainty. The ceiling is simply the censoring point. With a bit of
ingenuity the expected abatement can be calculated using the
evepan
-function.
Assume that future emissions in the absence of the ceiling are Epanechnikov-distributed between 90 and 110. A ceiling at 101 is suggested. What is the expected abatement?
Clearly the answer is not 0 since it is possible that the realized
emissions in the absence of the ceiling would exceed 100. In the
presence of an emission ceiling the realized emission may be thought of
as a random right-censored variable. Subtracting the expected value of
this variable from the initial mean (i.e. 100) yields the expected
abatement. Thus the expected abatement may be calculated by using the
evepan
-function:
## [1] 1.412437
Hence the expected abatement is around 1.4 percent.
Let us check if the answer is correct by generating random data which can be used to calculate an approximate solution:
censoringpoint <- 101
dist_mean <- 100 #Mean prior to censoring
dist_range <- 10 #Half of distribution range
x <- repan(1000000, dist_mean, dist_range) #Generating epan-distibuted random data
x[x > censoringpoint] <- censoringpoint #Censoring data
dist_mean - mean(x) #Approximate expected abatement
## [1] 1.41425
The author finds this verification very pleasing.
Say that the 1.4 percent expected abatement found in the previous example is considered too ambitious by policy makers. Which ceiling level corresponds to an expected abatement of exactly 1 percent?
## [1] 102.0687
Thus an emission ceiling 2.1 percent above the “main scenario”-level (i.e. at 102.1) will result in an expected abatement of 1 percent.
Assuming that future emissions are subject to Epanechnikov-distributed uncertainty is of course not entirely innocent. Nevertheless this is much preferable to ignoring uncertainty altogether which regrettably is a common approach.