---
title: "Distributions with Delay"
author: matthias.kuhn@tu-dresden.de
date: 2024-07-25
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Distributions with Delay}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library("incubate")

library("dplyr")
library("ggplot2")
library("patchwork")
```


The delayed exponential distribution and the delayed Weibull distribution
are the foundational data models in the `incubate`-package.


## Three-parameter Weibull
The classical 2-parameter Weibull distribution has shape parameter $γ>0$ and scale parameter $β>0$.
As additional third parameter we use delay $α>0$ and then, density (PDF) and cumulative distribution function (CDF) read, for any $x>0$:
$$
\begin{align*}
f(x) &= \frac{γ· (x-α)^{γ-1}}{β^γ} · \exp(-\big(\frac{x-α}{β}\big)^γ)\\
F(x) &= 1 - \exp(-\big(\frac{x-α}{β}\big)^γ) \qquad \text{for } x > α \text{ and 0 otherwise}
\end{align*}
$$

If we have a data sample $\mathbf{x}$ of size $n$ and want to apply **maximum
likelihood (ML)** estimation for parameters we get as log-likelihood function

$$
l_\mathbf{x}(α, γ, β) = n \log(γ) - nγ \log(β) + (γ-1) · ∑_{i=1}^n \log(x_i-α) - β^{-γ} · ∑_{i=1}^n (x_i - α)^γ
$$

In the presence of **right-censored observations**, the log-likelihood function changes as well.
Assume, we have in total $n=n_e + n_r$ observations, where

* $n_e$ event observations with index set $O$ _and_
* $n_r$ right censored observations with index set $R$, _hence_
* $|O| + |R| = n$ and $O ⨃ R = \{1, .., n\}$

<!-- labelled as $x_{r_1}, ..., x_{r_{n_r}}$. -->

\begin{multline*}
l_\mathbf{x}(α, γ, β) = n_e \log(γ) - n_e γ \log(β) + (γ-1) · ∑_{i ∈ O} \log(x_i-α) - β^{-γ} · ∑_{i ∈ O} (x_i - α)^γ\\
- β^{-γ} · ∑_{r ∈ R} (x_r - α)^γ
\end{multline*}

This simplifies to
$$
l_\mathbf{x}(α, γ, β) = n_e \log(γ) - n_e γ \log(β) + (γ-1) · ∑_{i ∈ O} \log(x_i-α) - β^{-γ} · ∑_{i=1}^n (x_i - α)^γ
$$


One can directly go to (numerically) maximize this log-likelihood function.
Cousineau (2009) calls this the **iterative approach**.


### Profiling scale parameter β
We are able to profile out the scale parameter estimate $\hat β$:
the candidate value for a local extremum with respect to scale β is found
by taking the 1^st^ derivative and equating to 0:
$$
\frac{\partial l}{\partial β}(α, γ, β) = -\frac{nγ}{β} + γ β^{-γ-1} · ∑_{i=1}^n (x_i - α)^γ \overset{!}{=}0
$$
Hence (as $γ>0$), it must hold
$-n + β^{-γ} · ∑_{i=1}^n (x_i - α)^γ \overset{!}{=}0$
and further simplifying,
$\hat β = \sqrt[γ]{\frac{1}{n} ∑_{i=1}^n (x_i - α)^γ}$.

With **right-censored observations** present in the data, the result is
similar:

$$
\frac{\partial l}{\partial β}(α, γ, β) = -\frac{n_e γ}{β} + γ β^{-γ-1} · ∑_{i=1}^n (x_i - α)^γ
\overset{!}{=}0
$$
Hence (as $γ>0$), it must hold
$-n_e + β^{-γ} · ∑_{i=1}^n (x_i - α)^γ \overset{!}{=}0$.
The solution is
$\hat β = \sqrt[γ]{\frac{1}{n_e} ∑_{i=1}^n (x_i - α)^γ}$, we divide by
the number of events $n_e$ only but we sum over all $x_i$ (observed and
right-censored).

The only problem is when we have right-censorings earlier than the (estimated)
delay α, then we potentially get negative contributions to the radicand. It
seems appropriate to set these terms to zero.


### Simplified log-likelihood
If we substitute the β-estimate into the log-likelihood equation we get a
simplified function to maximize.
$$
\begin{align}
l^P_{\mathbf{x}}(α, γ) &= n \log(γ) - n \log(\frac{1}{n} ∑_{i=1}^n (x_i - α)^γ) + (γ-1) · ∑_{i=1}^n \log(x_i-α) - n\\
   &= n · \Big[ \log(γ) - \log\big(\frac{1}{n} ∑_{i=1}^n (x_i - α)^γ\big) + (γ-1) · \frac{1}{n} ∑_{i=1}^n \log(x_i-α) - 1\Big]
\end{align}
$$

In the presence of **right-censored observations**, the equation for
$l^P_{\mathbf{x}}(α, γ)$ is similar but different. We have to use the number of
observed event $n_e$ (instead of $n$) in most cases.

\begin{align*}
l^P_{\mathbf{x}}(α, γ) &= n_e \log(γ) - n_e \log\big(\frac{1}{n_e} ∑_{i=1}^n (x_i - α)^γ\big) + (γ-1) · ∑_{i∈O} \log(x_i-α) - n_e\\
   &= n_e · \Big[ \log(γ) - \log\big(\frac{1}{n_e} ∑_{i=1}^n (x_i - α)^γ\big) + (γ-1) · \frac{1}{n_e} ∑_{i∈O} \log(x_i-α) - 1 \Big]
\end{align*}

The right-censored observations occur both in the simplified log-likelihood as
well as in the scale parameter estimate $\hat β$.


### Partial Derivatives of simplified log-likelihood
We can use derivatives of the simplified log-likelihood to find candidates for
**local maxima** of it. Cousineau (2009) calls this
the **two step approach** (as opposed to the non-profiled iterative approach
above).


Forming the partial derivatives of $l^P(α, γ)$ we get:
$$
\begin{aligned}
\frac{\partial l^P}{\partial α}(α, γ) &= n · \Big[ \frac{\frac{1}{n} ∑_{i=1}^n γ (x_i - α)^{γ-1}}{\frac{1}{n} ∑_{i=1}^n (x_i - α)^γ} - (γ-1) · \frac{1}{n} ∑_{i=1}^n \frac{1}{x_i-α}\Big]\\
  &= γ · \frac{∑_{i=1}^n (x_i - α)^{γ-1}}{\frac{1}{n} ∑_{i=1}^n (x_i - α)^γ} - (γ-1) · ∑_{i=1}^n \frac{1}{x_i-α}\\
\frac{\partial l^P}{\partial γ}(α, γ) &= n · \Big[ \frac{1}{γ} - \frac{\frac{1}{n} ∑_i \log(x_i-α) · (x_i-α)^γ}{\frac{1}{n} ∑_{i=1}^n (x_i - α)^γ} + \frac{1}{n} ∑_{i=1}^n \log(x_i-α) \Big]\\
  &= \frac{n}{γ} - \frac{∑_i \log(x_i-α) · (x_i-α)^γ}{\frac{1}{n} ∑_{i=1}^n (x_i - α)^γ} + ∑_{i=1}^n \log(x_i-α)
\end{aligned}
$$


In the presence of **right-censored observations**, the derivatives become:

\begin{align*}
\frac{\partial l^P}{\partial α}(α, γ) &= n_e · \Big[ \frac{\frac{1}{n_e} ∑_{i=1}^n γ (x_i - α)^{γ-1}}{\frac{1}{n_e} ∑_{i=1}^n (x_i - α)^γ} - (γ-1) · \frac{1}{n_e} ∑_{i∈O} \frac{1}{x_i-α}\Big]\\
  &= γ · \frac{∑_{i=1}^n (x_i - α)^{γ-1}}{\frac{1}{n_e}∑_{i=1}^n (x_i - α)^γ} - (γ-1) · ∑_{i∈O} \frac{1}{x_i-α}\\
\frac{\partial l^P}{\partial γ}(α, γ) &= n_e · \Big[ \frac{1}{γ} - \frac{\frac{1}{n_e} ∑_{i=1}^n \log(x_i-α) · (x_i-α)^γ}{\frac{1}{n_e} ∑_{i=1}^n (x_i - α)^γ} + \frac{1}{n_e} ∑_{i∈O} \log(x_i-α) \Big]\\
  &= \frac{n_e}{γ} - \frac{∑_{i=1}^n\log(x_i-α) · (x_i-α)^γ}{\frac{1}{n_e} ∑_{i=1}^n (x_i - α)^γ} + ∑_{i∈O} \log(x_i-α)
\end{align*}


### Score equations: local extrema candidates
Setting these two partial derivatives to 0 we get the **score equations** that
lead to candidate points for local extrema:

* from delay α

\begin{align*}
\frac{γ}{γ-1} · \frac{∑_{i=1}^n (x_i - α)^{γ-1}}{∑_{i=1}^n (x_i - α)^γ} - \frac{1}{n} ∑_{i=1}^n \frac{1}{x_i-α} &= 0\\
\frac{γ}{γ-1} - \frac{1}{n} \big(∑_{i=1}^n \frac{1}{x_i-α}\big) · \frac{∑_{i=1}^n (x_i - α)^γ}{∑_{i=1}^n (x_i - α)^{γ-1}} &= 0
\end{align*}


* from shape $γ>0$
$$
\frac{1}{γ} - \frac{∑_{i=1}^n \log(x_i-α) · (x_i-α)^γ}{∑_{i=1}^n (x_i - α)^γ} + \frac{1}{n} ∑_{i=1}^n \log(x_i-α) = 0
$$


In the presence of **right-censored observations**, the solution equations are:

* from delay α
$$
\frac{γ}{γ-1} - \frac{1}{n_e} \big(∑_{i∈O} \frac{1}{x_i-α}\big) · \frac{∑_{i=1}^n (x_i - α)^γ}{∑_{i=1}^n (x_i - α)^{γ-1}} = 0
$$

* from shape $γ>0$
$$
\frac{1}{γ} - \frac{∑_{i=1}^n \log(x_i-α) · (x_i-α)^γ}{∑_{i=1}^n (x_i - α)^γ} + \frac{1}{n_e} ∑_{i∈O} \log(x_i-α) = 0
$$


### Weighted MLE (MLEw)
Cousineau (2009) proposes a slightly varied equation to solve for the parameter
estimates. The scale parameter β gets estimated using a weighting factor $W_1$,
$$
\hat β = \sqrt[γ]{\frac{1}{n} ∑_{i=1}^n \frac{(x_i - α)^γ}{W_1}}
$$

For the remaining parameters, we modify the partial derivatives equations of the
profiled log-likelihood function. It appears to come from the following
partial derivatives of a modified log-likelihood function, where $W_1$, $W_2$
and $W_3(γ)$ are seen as weights that depend on sample size $n$.
<!-- XXX Not 100% sure, though! Re-think here! -->
Actually, these weights are random variables that ultimately depend on the true
parameter α and $γ$ but we treat $W_1$ and $W_2$ as fixed for given $n$ and we
model $W_3$ as function of shape $γ$ for given $n$.

$$
\begin{aligned}
\frac{\partial l_w^{P}}{\partial α}(α, γ) &= n · \Big[ W_3(γ) · (γ-1) · \frac{∑_{i=1}^n (x_i - α)^{γ-1}}{∑_{i=1}^n (x_i - α)^γ} - (γ-1) · \frac{1}{n} ∑_{i=1}^n \frac{1}{x_i-α}\Big]\\
\frac{\partial l_w^{P}}{\partial γ}(α, γ) &= n · \Big[ \frac{W_2}{γ} - \frac{∑_i \log(x_i-α) · (x_i-α)^γ}{∑_{i=1}^n (x_i - α)^γ} + \frac{1}{n} ∑_{i=1}^n \log(x_i-α) \Big]
\end{aligned}
$$

Compared to the MLE-equations (partial derivative equation for delay), we see that $W_3(γ)$ takes the role of $\frac{γ}{γ-1}$.


## Parameter transformation
The model fitting uses numeric optimization routines.
We use parameter transformation to improve this process,
in particular through suitable parameter transformation we can avoid constraints on the bounds of the parameters.

For instance, the initial delay parameter $α_1$ is by definition bounded upwards by the first observation $x_{(1)}$
and conventionally we assume $α_1 ≥ 0$ as well, in particular when we model a survival time. This 2^nd^ assumption is not strictly necessary, however, as a model with negative delay might also describe well a survival data.

To get rid of the (upper) bounds, an idea is to use the 1^st^ observation as a normalization for $α_1$ and find a transformation to **whole ℝ**:

### Log-based transformation
For instance, assuming only an upper bound ($α_1 ≤ x_{(1)}$, also negative values allowed) we get a transformation involving the logarithm:
$$
\tilde{α}_1 := \log(\frac{x_{(1)}-α_1}{x_{(1)}}) = \log(1 - \frac{α_1}{x_{(1)}})
$$

### Logit-based transformation
Alternatively, if we want to enforce also the lower bound for $α_1$
we can apply a **logit-based transformation** to rescale the delay
from $[0,x_{(1)}]$ to whole ℝ:
$$
\tilde{α}_1 := \operatorname{logit}(\frac{α_1}{x_{(1)}}) = -\log(1 + \frac{x_{(1)} - 2α_1}{α_1})
$$
In R, the logit function is implemented efficiently in `stats::qlogis`.
The back-transformation is $α_1 = \frac{x_{(1)}}{1+\exp(-\tilde{α}_1)}$ (cf `stats::plogis` in R).
We visualize this logit-based transformation:
```{r plot_logit_transf, echo=FALSE, fig.align='center', fig.width=8, fig.height=3}
x1 <- 7

ggplot() + xlim(0, 7) +
  geom_function(fun = \(alpha1) -log1p((x1 - 2 * alpha1)/alpha1), n = 1001) +
  labs(x = expression(alpha), y = expression(tilde(alpha))) |

  ggplot() + xlim(-9, 9) +
  geom_function(fun = \(alpha1_tr) x1 / (expm1(-alpha1_tr)+2), n = 1001) +
  labs(x = expression(tilde(alpha)), y = expression(alpha),
       caption = bquote("transformation for data with " * x[(1)] == .(x1)))
```


## Random right censoring
The functions for random generation of data `rexp_delayed()` and
`rweib_delayed()` have an option to produce right censored data. Currently, we
implement random right censoring under a **uniform censoring scheme**, starting
after the initial delay phase.^[Question is if we allow $C$ to occur during the
incubation period. If yes and there is a long delay then we will see many
censored observations during the incubation period which might be odd.
Therefore, we model the censoring process as $C \sim U(α, Z)$.] The
censoring process $C$ is independent of the event process $X$.
The observed time is $\min(X,C)$ and it is an event if and only if $X≤C$,
i.e. when the event happens before the censoring does.

You provide the desired expected proportion of right censoring π via parameter
`cens=`. This information determines the upper limit `Z` for the censoring
process `U(α,Z)`. The expected proportion of censorings π is $P(C<X)$. 
We compute this probability as an integral of the density, we start integrating
after the initial delay.
$$
\begin{equation*}
\begin{split}
π = P(C<X) &= ∫_α^∞ ∫_α^x f_{X,C}(x,c)dc dx\\
  &= ∫_α^∞ ∫_α^x f_X(x) f_C(c) dc dx \qquad \text{ (independence) }\\
  &= ∫_α^∞ f_X(x) · \Big(∫_α^x f_C(c) dc\Big) dx\\
  &= ∫_α^∞ f_X(x) · F_C(x) dx
\end{split}
\end{equation*}
$$

For a uniform censoring process with lower bound α and upper bound $Z$
we have $F_C(x) = \frac{x-α}{Z-α}$ for $x ∈ [α,Z]$
and $F_C(x) = 0$ for $x < α$ and $F_C(x) = 1$ for $x > Z$.


### Exponential
Substituting the density $f_X(x) = 𝟙_{x≥α} · λ \exp(-λ·(x-α))$ and using
integration by parts, we aim to solve for the upper bound $Z$ of the censoring process.

$$
\begin{equation*}
\begin{split}
π = P(C<X) &= ∫_α^∞ f_X(x) · F_C(x) dx\\
  &= ∫_α^Z λ \exp(-λ·(x-α)) \frac{x-α}{Z-α} dx + ∫_Z^∞ λ \exp(-λ·(x-α)) dx\\
  &= ∫_0^{Z-α} λ \exp(-λ·t) \frac{t}{Z-α} dt   + ∫_{Z-α}^∞ λ \exp(-λ·t) dt\\
  &= \frac{1}{Z-α} · \Big\{ [-\exp(-λ·t) t]_{t=0}^{t=Z-α} - ∫_0^{Z-α} -\exp(-λ·t) dt  \Big\} + [-\exp(-λ·t)]_{t=Z-α}^{t=∞} \\
  &= \frac{1}{Z-α} · \Big\{ [-\exp(-λ·(Z-α)) (Z-α) - 0] + ∫_0^{Z-α} \exp(-λ·t) dt  \Big\} - [0 - \exp(-λ·(Z-α))]\\
  &= -\exp(-λ·(Z-α)) + \frac{1}{Z-α} · \Big[ -\frac{1}{λ} \exp(-λ·t) \Big]_{t=0}^{t=Z-α} + \exp(-λ·(Z-α))\\
  &= - \frac{1}{λ(Z-α)} · \Big[ \exp(-λ·(Z-α)) - 1 \Big]\\
  &= \frac{1}{λ(Z-α)} + \frac{\exp(-λ·(Z-α))}{-λ(Z-α)}\\
\end{split}
\end{equation*}
$$

We aim to solve for $Z$ given the desired expected proportion of right censoring π.
When writing $t := λ·(Z-α)$ we need to solve $1 - π t = \exp(-t)$ for $t > 0$ (as we set $Z > α$).
We rearrange terms and get
$$
\underbrace{- \frac{1}{π} · \exp(-\frac{1}{π})}_{=: x} = -\frac{1-πt}{π} · \exp(-\frac{1-πt}{π})
$$
The principal branch $W_0$ of Lambert W function gives a solution $t_0$ which happens to be greater 0 here (when π < 1).^[The other solution through branch $W_{-1}$ might be negative and we don't investigate it further here.]
$$
-\frac{1-π t_0}{π} = W_0(x) \qquad ⇔ \qquad
t_0 = \frac{1}{π} + W_0(x)
$$
Substituting the original expression for $t$, we get $\hat Z = α + \frac{1}{λ} · (\frac{1}{π} + W_0(x))$.

As an example, to draw a random sample from a delayed exponential that (on
average) has 30% right censorings, we can use the following call:
```{r}
#| label: rcens_exp_exmpl

(x <- rexp_delayed(n = 24, delay1 = 5, rate1 = .2, cens = .3))
class(x)
sum(x[,2] == 0)/length(x)
```


### Weibull
The density of delayed Weibull is
$f_X(x) = 𝟙_{x≥α} · \frac{γ}{σ} · (\frac{x-α}{σ})^{γ-1} \exp(-(\frac{x-α}{σ})^γ)$
with shape parameter $γ$ and scale parameter $σ$.
Then, using substitution $t = x - α$ 
we get
$$
\begin{equation*}
\begin{split}
π &= P(C<X) = ∫_α^∞ f_X(x) · F_C(x) dx\\
  &= ∫_α^Z \frac{γ}{σ} · (\frac{x-α}{σ})^{γ-1} \exp(-(\frac{x-α}{σ})^γ) · \frac{x-α}{Z-α} dx + ∫_Z^∞ \frac{γ}{σ} · (\frac{x-α}{σ})^{γ-1} \exp(-(\frac{x-α}{σ})^γ)  dx\\
  &= ∫_0^{Z-α} \frac{γ}{σ} · (\frac{t}{σ})^{γ-1} \exp(-(\frac{t}{σ})^γ) \frac{t}{Z-α} dt + ∫_{Z-α}^∞ \frac{γ}{σ} · (\frac{t}{σ})^{γ-1} \exp(-(\frac{t}{σ})^γ)  dt\\
  &= \frac{γ}{σ(Z-α)} · ∫_0^{Z-α} t · (\frac{t}{σ})^{γ-1} \exp(-(\frac{t}{σ})^γ) dt + \frac{γ}{σ} · ∫_{Z-α}^∞ (\frac{t}{σ})^{γ-1} \exp(-(\frac{t}{σ})^γ) dt\\
\end{split}
\end{equation*}
$$


We apply integration by parts in the first integral (with $u=t$ and $v=-\frac{σ}{γ} \exp(-(\frac{t}{σ})^γ)$).
At the end, we use
$∫_0^s \exp(-x^b) dx = \frac{1}{b} · γ(\frac{1}{b}, s^b)$,
where γ(,) is the incomplete (not regularized) gamma function.
$$
\begin{equation*}
\begin{split}
\text{1st term} &= \frac{γ}{σ(Z-α)} · \Big\{ [-t · \frac{σ}{γ} · \exp(-(\frac{t}{σ})^γ)]_{t=0}^{t=Z-α} - ∫_0^{Z-α} -\frac{σ}{γ} · \exp(-(\frac{t}{σ})^γ) dt  \Big\}\\
  &= \frac{γ}{σ(Z-α)} · \Big\{ [-(Z-α) · \frac{σ}{γ} · \exp(-(\frac{Z-α}{σ})^γ) - 0] + \frac{σ}{γ} ∫_0^{Z-α} \exp(-(\frac{t}{σ})^γ) dt \Big\} \\
  &= -\exp(-(\frac{Z-α}{σ})^γ) + \frac{1}{Z-α} · ∫_0^{Z-α} \exp(-(\frac{t}{σ})^γ) dt \\
  &= -\exp(-(\frac{Z-α}{σ})^γ) + \frac{σ}{Z-α} · ∫_0^{\frac{Z-α}{σ}} \exp(-s^γ) ds \\
  %% 1st term: incomplete gamma function
  &= -\exp(-(\frac{Z-α}{σ})^γ) + \frac{σ}{Z-α} · \frac{1}{γ} ·\gamma(\frac{1}{γ}, (\frac{Z-α}{σ})^γ)
  %\quad \text{(incomplete non-normalized gamma function)}
\end{split}
\end{equation*}
$$

The integral in the second term has a closed form solution,
using anti-derivative $-\frac{σ}{γ} \exp(-(\frac{t}{σ})^γ)$ (exactly the substitution $v$ in first term).
$$
\begin{equation*}
\begin{split}
\text{2nd term} &= \frac{γ}{σ} · \left.\Big( -\frac{σ}{γ} \exp(-(\frac{t}{σ})^γ) \Big) \right\rvert_{t=Z-α}^{t=∞}\\
  &= \frac{γ}{σ} · \Big\{ 0 - \Big(-\frac{σ}{γ} \exp(-(\frac{Z-α}{σ})^γ)\Big) \Big\}\\
  &= \exp(-(\frac{Z-α}{σ})^γ)
\end{split}
\end{equation*}
$$

Together, we get
$$
\begin{equation*}
\begin{split}
π &= -\exp(-(\frac{Z-α}{σ})^γ) + \frac{σ}{Z-α} · \frac{1}{γ} ·\gamma(\frac{1}{γ}, (\frac{Z-α}{σ})^γ) + \exp(-(\frac{Z-α}{σ})^γ)\\
  &= \frac{σ}{Z-α} · \frac{1}{γ} ·\gamma(\frac{1}{γ}, (\frac{Z-α}{σ})^γ)\\
  &= kr^{-k} · \gamma(k, r) \qquad \text{with } k = \frac{1}{γ}>0 \text{ and } r = (\frac{Z-α}{σ})^γ
\end{split}
\end{equation*}
$$
We take log and gather terms on one side and consider the objective function
$$
f_k(r) = \log(k) - k \log(r) + \log(\gamma(k, r)) - \log(π)
$$
For given $k$, it is a monotonically decreasing function in $r$.
We can solve for $r$ by means of root finding. 
A root $r_0$ of $f_k$ gives us the estimation for $Z$,
namely $\hat Z = α + σ r_0^{\frac{1}{γ}}$.


As an example, to draw a random sample from a delayed Weibull with different
shape parameters that (on average) has 30% right censorings, we can use the
following call:
```{r}
#| label: rcens_weibu_exmpl
#| fig.width: 7
#| fig.height: 4

(x1 <- rweib_delayed(n = 24, delay1 = 5, scale1 = 3.5, shape1 = 1.7, cens = .3))
class(x1)
sum(x1[,2] == 0) / length(x1)

(x2 <- rweib_delayed(n = 24, delay1 = 5, scale1 = 3.5, shape1 = .4, cens = .3))
sum(x2[,2] == 0) / length(x2)

survival::survfit(c(x1, x2) ~ rep(c("x1","x2"), each = 24)) |> 
  plot(mark.time=T, conf.int=F, lty = 1:2, col = 1:2)
  
```

### Implementation note
Generating data depends on the pseudo-random number generator (PRNG) and the seed.
For censored observations, we generate both the event time $X$ and the censoring time $C$ and then we take the minimum of both as observed time.
Therefore, the actual proportion of right censorings in a sample will generally fluctuate around the expected proportion π, we might see more or less than π right censorings in a sample. The larger the sample size, the closer we expect to get to the expected proportion π (law of large numbers).


## MLEc
Corrected MLE (MLEc) was proposed by Cheng and Iles (1987). It uses the normal likelihood except for the first observation that is instrumental for the delay estimation. For the first observation MLEc uses the difference in cumulative distribution function between the first two observations.
The log-likelihood function becomes, using $z_i := x_i - α$,

$$
\begin{align}
l_\mathbf{x}(α, γ, β) &= \log(\exp(-(\frac{z_2}{β})^γ) - \exp(-(\frac{z_1}{β})^γ))
+ (n-1) \log(γ) - (n-1) γ \log(β) + (γ-1) · ∑_{i=2}^n \log z_i - β^{-γ} · ∑_{i=2}^n z_i^γ\\
&= -(\frac{z_1}{β})^γ + \log(1-\exp(-\frac{z_2^γ - z_1^γ}{β^γ}) + (n-1) \log(γ) - (n-1) γ \log(β) + (γ-1) · ∑_{i=2}^n \log z_i - β^{-γ} · ∑_{i=2}^n z_i^γ
\end{align}
$$
We could use 1st Taylor approximation to simplify $\log(1+z) = z - \frac{z^2}{2} + \frac{z^3}{3} ∓ ... ≈ z$ for all $z$ close to zero, $|z|<1$:
$$
l_\mathbf{x}(α, γ, β) ≈ -(\frac{z_1}{β})^γ - \exp(-\frac{z_2^γ - z_1^γ}{β^γ}) + (n-1) \log(γ) - (n-1) γ \log(β) + (γ-1) · ∑_{i=2}^n \log z_i - β^{-γ} · ∑_{i=2}^n z_i^γ
$$

We could use partial derivatives to find local maxima of this approximate log-likelihood.