Any continuous density function f on a known closed interval [a, b] can be approximated by a Bernstein polynomial $f_m(x; p)=(b-a)^{-1}\sum_{i=0}^m p_i\beta_{mi}[(x-a)/(b-a)]$, where $p_i \ge 0$, $\sum_{i=0}^m p_i=1$, and $\beta_{mi}(u)=(m+1){m\choose i}u^i(1-u)^{m-i}$, i = 0, 1, …, m, is the beta density with shapes (i + 1, m − i + 1). This provides a way to approximately model an unknown underlying density with a beta mixture model of an appropriate degree m and to solve a nonparametric or semiparametric statistical problem "parametrically" using the maximum likelihood method. For instance, based on one-sample data, x1, …, xn, one can estimate a nonparametric density f parametrically by maximizing the approximate likelihood $\ell(p)=\sum_{j=1}^n\log f_m(x_j; p)$ with respect to p (Guan 2016).
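As a quick illustration of the approximation itself (not of the package's fitting algorithm), one can evaluate a beta mixture whose coefficients are taken proportional to f(i/m); the helper dmixbeta() from the package is assumed here, and the target density is an arbitrary choice:
library(mable)
f <- function(x) dbeta(x, 2, 3)       # a known density on [0, 1], for illustration
m <- 20
p <- f((0:m)/m); p <- p/sum(p)        # naive coefficients p_i proportional to f(i/m)
u <- seq(0, 1, length.out = 101)
max(abs(dmixbeta(u, p, interval = c(0, 1)) - f(u)))  # small uniform error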
Since the Bernstein polynomial model of degree m is nested in the model of degree m + 1, the maximum likelihood is nondecreasing in m. The increase becomes negligible once the model starts to overfit. Therefore, an optimal degree can be chosen as the change-point of the log-likelihood ratios over a set of consecutive candidate model degrees.
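The following is a minimal sketch of this idea, not the package's internal implementation: treat the successive log-likelihood increments as exponential observations and locate the change point maximizing the profile likelihood ratio (loglik is a hypothetical vector of maximum log-likelihoods at the degrees M[1]:M[2]):
cp.degree <- function(loglik, M) {
  d <- diff(loglik)                     # increments, modeled as exponential
  k <- length(d)
  lr <- sapply(1:(k - 1), function(j)   # log-likelihood ratio of a change at j
    k*log(mean(d)) - j*log(mean(d[1:j])) - (k - j)*log(mean(d[(j + 1):k])))
  M[1] + which.max(lr)                  # degree after which the increase levels off
}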
This approach works surprisingly well even for more complicated models and data. With an estimate p̂ = (p̂0, …, p̂m) of p, one can estimate the cumulative distribution function F by $\hat F(x)=F_m(x;\hat p)=\sum_{i=0}^m \hat p_i B_{mi}[(x-a)/(b-a)]$, where $B_{mi}(u)=\int_0^u\beta_{mi}(t)dt$, i = 0, …, m, is the beta distribution function with shapes (i + 1, m − i + 1). This manual illustrates the use of the R package mable for obtaining not only smooth estimates of density, cumulative distribution, and survival functions but also estimates of parameters such as regression coefficients.
Let x1, …, xn be a sample from a population with cumulative distribution function F and density function f on [a, b]. If [a, b] is unknown, we choose [a, b] ⊃ [x(1), x(n)], where x(1) and x(n) are the minimum and maximum order statistics, respectively. A lower bound for the model degree of a unimodal density is mb = μ(1 − μ)/σ2 − 3, where μ and σ2 are, respectively, the mean and variance of the distribution after the transformation Y = (X − a)/(b − a).
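A hedged sketch of computing this bound for a simulated sample (the sample and the endpoints a and b are illustrative choices):
set.seed(1)
x <- rgamma(100, shape = 5)            # an illustrative unimodal sample
a <- 0; b <- ceiling(max(x)) + 1       # [a, b] containing [x_(1), x_(n)]
y <- (x - a)/(b - a)                   # transform to [0, 1]
mb <- ceiling(mean(y)*(1 - mean(y))/var(y) - 3)  # lower bound for the degree m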
For the annual flow data of the Vaal River at Standerton, given in Table 1.1 of Linhart and Zucchini (1986) as flow in millions of cubic metres,
## Year Flow
## 1 1905 222
## 2 1906 1094
## 3 1907 452
we want to estimate the density and the distribution functions of the annual flow.
vaal<-mable(Vaal.Flow$Flow, M = c(2,100), interval = c(0, 3000), IC = "all",
controls = mable.ctrl(sig.level = 1e-8, maxit = 2000, eps = 1.0e-9))
Here we truncate the density to the interval (0, 3000) via interval = c(0, 3000) and choose an optimal degree m among the candidate degrees M[1]:M[2] using the method of change-point. The maximum number of iterations is maxit = 2000 and the convergence criterion is eps = 1.0e-9 for each m in M[1]:M[2]. The search for an optimal degree stops when the p-value of the change-point test falls below the specified significance level sig.level or when the largest degree, M[2], has been reached; in the latter case a warning message shows up and we should check the last p-value of the change-point test. In the above example we got the warning message, and the last p-value, 1.3396916e-08, is small enough. The selected optimal degree is m = 19. One can also look at the Bayesian information criterion (BIC) and other information criteria, the Akaike information criterion (AIC) and the Hannan–Quinn criterion (QHC), at each candidate degree. These information criteria are not reliable due to the difficulty of determining the model dimension. The plot method for class "mable" objects can visualize some of the results.
op <- par(mfrow = c(1,2))
layout(rbind(c(1, 2), c(3, 3)))
plot(vaal, which = "likelihood", cex = .5)
plot(vaal, which = "change-point", lgd.x = "topright")
hist(Vaal.Flow$Flow, prob = TRUE, xlim = c(0,3000), ylim =c(0,.0022), breaks =100*(0:30),
main = "Histogram and Densities of the Annual Flow of Vaal River",
border = "dark grey",lwd = 1, xlab = "Flow", ylab = "Density", col ="light grey")
lines(density(x<-Vaal.Flow$Flow, bw = "nrd0", adjust = 1), lty = 2, col = 2,lwd = 2)
lines(y<-seq(0, 3000, length=100), dlnorm(y, mean(log(x)), sqrt(var(log(x)))),
lty = 4, col = 4, lwd = 2)
plot(vaal, which = "density", add = TRUE, lwd = 2)
legend("topright", lty = c(1, 4, 2), col = c(1, 4, 2), bty = "n",lwd = 2,
c(expression(paste("MABLE: ",hat(f)[B])), expression(paste("Log-Normal: ",hat(f)[P])),
expression(paste("KDE: ",hat(f)[K]))))
In the figure above, the unknown density f is estimated by f̂B using the optimal degree m = 19 selected by the exponential change-point method, by the parametric estimate f̂P based on the log-normal model, and by the kernel density estimate f̂K.
We can also look at the plots of AIC, BIC, and QHC, and of the likelihood ratios (LR) of the gamma change-point model.
M <- vaal$M[1]:vaal$M[2]
aic <- vaal$ic$AIC
bic <- vaal$ic$BIC
qhc <- vaal$ic$QHC
vaal.gcp <- optim.gcp(vaal) # choose m by gamma change-point model
lr <- vaal.gcp$lr
plot(M, aic, cex = 0.7, xlab = "m", ylab = "", main = "AIC, BIC, QHC, and LR",
ylim = c(ymin<-min(aic,bic,qhc,lr), ymax<-max(aic,bic,qhc,lr)), col = 1)
points(M, bic, pch = 2, cex = 0.7, col = 2)
points(M, qhc, pch = 3, cex = 0.7, col = 3)
points(M[-1], lr, pch = 4, cex = 0.7, col = 4)
segments(which.max(aic)+M[1]-1->m1, ymin, m1, max(aic), lty = 2)
segments(which.max(bic)+M[1]-1->m2, ymin, m2, max(bic), lty = 2, col = 2)
segments(which.max(qhc)+M[1]-1->m3, ymin, m3, max(qhc), lty = 2, col = 3)
segments(which.max(lr)+M[1]->m4, ymin, m4, max(lr), lty = 2, col = 4)
axis(1, c(m1,m2, m3, m4), as.character(c(m1,m2,m3,m4)), col.axis = 4)
legend("topright", pch=c(1,2,3,4), c("AIC", "BIC", "QHC", "LR"), bty="n", col=c(1,2,3,4))
From the figure above we see that the gamma change-point method gives the same optimal degree m = 19 as the exponential method; BIC and QHC give the same degree m = 11; the degree m = 16 selected by AIC is closer to the one selected by the change-point methods. From this figure we also see that, unlike the LR plot, the information criteria do not have clear peak points.
For any given degree m, one can fit the data by specifying M = m in mable() to obtain an estimate of f.
The summary method prints, and returns invisibly, the summarized results.
## Call: mable() for raw data
## Obj Class: mable
## Input Data: Vaal.Flow$Flow
## Dimension of data: 1
## Optimal degree m: 19
## P-value of Change-point: 1.339692e-08
## Maximum loglikelihood: 56.81939
## MABLE of p: can be retrieved using name 'p'
## Note: the optimal model degrees are selected by the method of change-point.
The mixing proportions, i.e., the coefficients of the Bernstein polynomials, p, can be retrieved as vaal$p:
## [1] 6.663951e-142 1.049278e-01 7.594313e-01 2.309036e-08 2.508441e-16
## [6] 7.194500e-21 1.715117e-20 1.049294e-07 1.202354e-01 1.097612e-22
## [11] 3.689746e-96 2.261501e-228 0.000000e+00 0.000000e+00 0.000000e+00
## [16] 5.038909e-234 1.226874e-39 1.540542e-02 4.735457e-197 0.000000e+00
The method of change-point for choosing the model degree is computer-intensive, especially for multimodal densities. A better lower bound for the model degree is $$m_b=\frac{\mu(1-\mu)-\sigma^2}{\sigma^2-\sum_{i=1}^k\lambda_i(\mu_i-\mu)^2}-2,$$ where λi and μi are, respectively, the mixing proportion and mean value of the ith component density. Less computer-intensive methods such as the method of moments and the method of mode, implemented by optimable(), can be used (see the figure below).
x<-faithful
x1<-faithful[,1]
x2<-faithful[,2]
a<-c(0, 40); b<-c(7, 110)
mu<-(apply(x,2,mean)-a)/(b-a)
s2<-apply(x,2,var)/(b-a)^2
# mixing proportions
lambda<-c(mean(x1<3),mean(x2<65)) # guess component mean
mu1<-(c(mean(x1[x1<3]), mean(x2[x2<65]))-a)/(b-a)
mu2<-(c(mean(x1[x1>=3]), mean(x2[x2>=65]))-a)/(b-a)
# estimate lower bound for m
mb<-ceiling((mu*(1-mu)-s2)/(s2-lambda*(1-lambda)*(mu1-mu2)^2)-2)
m1<-optimable(x1, interval=c(a[1],b[1]), nmod=2, modes=c(2,4.5))$m
m2<-optimable(x2, interval=c(a[2],b[2]), nmod=2, modes=c(52.5,80))$m
erupt1<-mable(x1, M=mb[1], interval=c(a[1],b[1]))
erupt2<-mable(x1, M=m1, interval=c(a[1],b[1]))
wait1<-mable(x2, M=mb[2],interval=c(a[2],b[2]))
wait2<-mable(x2, M=m2,interval=c(a[2],b[2]))
For this data set, we obtain the lower bounds mb = (78, 28) for the degrees of the marginal density models and the degree estimates m̂ = (122, 34) from optimable().
op<-par(mfrow=c(1,2), cex=0.8)
hist(x1, probability = TRUE, col="grey", border="white", main="", xlab="Eruptions",
ylim=c(0,.65), las=1)
plot(erupt1, add=TRUE,"density")
plot(erupt2, add=TRUE,"density",lty=2,col=2)
legend("topleft", lty=c(1,2),col=1:2, bty="n", cex=.7,
c(expression(paste("m = ", m[b])),expression(paste("m = ", hat(m)))))
hist(x2, probability = TRUE, col="grey", border="white", main="", xlab="Waiting", las=1)
plot(wait1, add=TRUE,"density")
plot(wait2, add=TRUE,"density",lty=2,col=2)
legend("topleft", lty=c(1,2),col=1:2, bty="n", cex=.7,
c(expression(paste("m = ", m[b])),expression(paste("m = ", hat(m)))))
With a grouped data set, i.e., a frequency table of data from a continuous population, one can estimate the density from the histogram using mable.group(), either with an optimal degree m chosen from M[1]:M[2] or with a given degree m specified by M = m (Guan 2017).
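For class intervals (t_{j−1}, t_j] with frequencies n_j, the quantity being maximized is presumably the approximate grouped-data log-likelihood built from the class probabilities under the Bernstein model; a sketch: $$\ell(p)=\sum_{j} n_j\log\big[F_m(t_j; p)-F_m(t_{j-1}; p)\big].$$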
Consider the chicken embryo data, which contain the number of eggs hatched on each day of the 21-day incubation period. The hatching times (n = 43) are treated as grouped into intervals of equal width of one day. The data were first studied by Jassim et al. (1996). Kuurman et al. (2003) and Lin and He (2006) also analyzed the data using the minimum Hellinger distance estimator, in addition to other methods assuming parametric mixture models, including the Weibull model.
## $day
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
##
## $nT
## [1] 6 5 11 2 2 3 0 0 0 0 1 0 0 0 1 0 1 5 1 4 1
a <- 0; b <- 21
day <- chicken.embryo$day; nT <- chicken.embryo$nT  # chicken.embryo data in the mable package
embryo <- mable.group(x = nT, breaks = a:b, M = c(2,100), interval = c(a, b), IC = "aic",
    controls = mable.ctrl(sig.level = 1e-6, maxit = 2000, eps = 1.0e-7))
Day <- rep(day,nT)
op <- par(mfrow = c(1,2), lwd = 2)
layout(rbind(c(1, 2), c(3, 3)))
plot(embryo, which = "likelihood")
plot(embryo, which = "change-point")
fk <- density(x = rep((0:20)+.5, nT), bw = "sj", n = 101, from = a, to = b)
hist(Day, breaks = seq(a,b, length = 12), freq = FALSE, col = "grey", border = "white",
main = "Histogram and Density Estimates")
plot(embryo, which = "density", cex = 0.7, add = TRUE)
lines(fk, lty = 2, col = 2)
legend("top", lty = c(1:2), c("MABLE", "Kernel"), bty = "n", col = c(1:2))
We see in the figure below that AIC and the gamma change-point method give the same optimal degree as the one, m = 13, given by the exponential change-point method. However, BIC fails to choose a useful model degree.
M <- embryo$M[1]:embryo$M[2]
aic <- embryo$ic$AIC
bic <- embryo$ic$BIC
res.gcp <- optim.gcp(embryo) # choose m by gamma change-point model
lr <- res.gcp$lr
plot(M, aic, cex = 0.7, col = 1, xlab = "m", ylab = "", ylim = c(ymin<-min(aic,bic,lr),
ymax<-max(aic,bic,lr)), main = "AIC, BIC, and LR")
points(M, bic, pch = 2, cex = 0.7, col = 2)
points(M[-1], lr, pch = 3, cex = 0.7, col = 4)
segments(which.max(aic)+M[1]-1->m1, ymin, m1, max(aic), lty = 2)
segments(which.max(bic)+M[1]-1->m2, ymin, m2, max(bic), lty = 2, col = 2)
segments(which.max(lr)+M[1]->m3, ymin, m3, max(lr), lty = 2, col = 4)
axis(1, c(m1,m2, m3), as.character(c(m1,m2,m3)), col.axis = 4)
legend("right", pch = 1:3, c("AIC", "BIC", "LR"), bty = "n", col = c(1,2,4))
The results are summarized as follows.
## Call: mable.group() for grouped data
## Obj Class: mable
## Input Data: nT
## Dimension of data: 1
## Optimal degree m: 13
## P-value of Change-point: 9.925959e-07
## Maximum loglikelihood: -107.318
## MABLE of p: can be retrieved using name 'p'
## Note: the optimal model degrees are selected by the method of change-point.
Consider the additive measurement error model Y = X + ϵ, where X has an unknown distribution F, ϵ has a known distribution G, and X and ϵ are independent. We want to estimate the density f = F′ based on independent observations, yi = xi + ϵi, i = 1, …, n, of Y, where the xi's and ϵi's are unobservable. The function mable.decon() implements the method of Guan (2021a) and gives an estimate of the density f using the approximate Bernstein polynomial model.
set.seed(123)
mu <- 1; sig <- 2; a <- mu - sig*5; b <- mu + sig*5;
gn <- function(x) dnorm(x, 0, 1)
n <- 50;
x <- rnorm(n, mu, sig); e <- rnorm(n); y <- x + e;
# deconvolution fit; the candidate degree range c(5, 50) is an assumed choice
decn <- mable.decon(y, gn, interval = c(a, b), M = c(5, 50))
op <- par(mfrow = c(2,2), lwd = 2)
plot(decn, which = "likelihood")
plot(decn, which = "change-point", lgd.x = "right")
plot(xx<-seq(a, b, length = 100), yy<-dnorm(xx, mu, sig), type = "l", xlab = "x",
ylab = "Density", ylim = c(0, max(yy)*1.1))
plot(decn, which = "density", add = TRUE, lty = 2, col = 2)
# kernel density based on pure data
lines(density(x), lty = 5, col = 4)
legend("topright", bty = "n", lty = c(1,2,5), col = c(1,2,4), c(expression(f),
expression(hat(f)), expression(tilde(f)[K])))
plot(xx, yy<-pnorm(xx, mu, sig), type = "l", xlab = "x", ylab = "Distribution Function")
plot(decn, which = "cumulative", add = TRUE, lty = 2, col = 2)
legend("bottomright", bty = "n", lty = c(1,2), col = c(1,2), c(expression(F),
expression(hat(F))))
When data are interval censored, the "interval2" type observations are of the form (l, u), the interval containing the event time. An observation is uncensored if l = u, right censored if u = Inf or u = NA, and left censored if l = 0.
Let f(t) and F(t) = 1 − S(t) be the density and cumulative distribution functions of the event time, respectively, on (0, τ), where τ ≤ ∞. If τ is unknown or τ = ∞, then f(t) on [0, τn] can be approximated by $f_m(t; p)=\tau_n^{-1}\sum_{i=0}^m p_i\beta_{mi}(t/\tau_n)$, where pi ≥ 0, i = 0, …, m, $\sum_{i=0}^mp_i=1-p_{m+1}$, and τn is the largest observation: an uncensored time, the right endpoint of an interval- or left-censored time, or the left endpoint of a right-censored time. We can then approximate S(t) on [0, τn] by $S_m(t; p)=\sum_{i=0}^{m+1} p_i \bar B_{mi}(t/\tau_n)$, where $\bar B_{mi}(u) = 1 - \int_0^u\beta_{mi}(t)dt$, i = 0, …, m, is the beta survival function with shapes (i + 1, m − i + 1), B̄m, m + 1(t) = 1, pm + 1 = 1 − π, and π = F(τn). For data without right-censored times, pm + 1 = 1 − π = 0. The search for an optimal degree m among the candidates M[1]:M[2] by the method of change-point stops when either M[2] is reached or the change-point test yields a p-value smaller than sig.level. Guan (2021b) proposed a method, as a special case of a semiparametric regression model, for estimating p with an optimal degree m; it is implemented in the R function mable.ic().
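For interval-censored observations (l_i, u_i], the approximate log-likelihood presumably combines density contributions from exact times with probability contributions from censored intervals; a sketch, under the convention $S_m(\infty; p)=0$: $$\ell(p)=\sum_{i:\,l_i=u_i}\log f_m(l_i; p)+\sum_{i:\,l_i<u_i}\log\big[S_m(l_i; p)-S_m(u_i; p)\big].$$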
Consider the breast cosmesis data, described in Finkelstein and Wolfe (1985), which were used to study the cosmetic effects of cancer therapy. The times to breast retraction in months (T) were subject to interval censoring and were measured for 94 women, of whom 46 received radiation only (X = 0) (25 right-censored, 3 left-censored, and 18 interval-censored) and 48 received radiation plus chemotherapy (X = 1) (13 right-censored, 2 left-censored, and 33 interval-censored). The right-censored event times were those of the women who did not experience cosmetic deterioration.
We fit the Breast Cosmesis Data as two-sample data separately.
## left right treat
## 1 45 NA RT
## 2 6 10 RT
## 3 0 7 RT
bc.res0 <- mable.ic(cosmesis[cosmesis$treat == "RT",1:2], M = c(1,50), IC = "none")
bc.res1 <- mable.ic(cosmesis[cosmesis$treat == "RCT",1:2], M = c(1,50), IC = "none")
As the warning message suggests, we check the p-value of the change-point test: its value when the search stopped is 0.0452.
op <- par(mfrow = c(2,2),lwd = 2)
plot(bc.res0, which = "change-point", lgd.x = "right")
plot(bc.res1, which = "change-point", lgd.x = "right")
plot(bc.res0, which = "survival", xlab = "Months", ylim = c(0,1), main = "Radiation Only")
plot(bc.res1, which = "survival", xlab = "Months", main = "Radiation and Chemotherapy")
A d-variate density f on a hyperrectangle [a, b] = [a1, b1] × ⋯ × [ad, bd] can be approximated by a mixture of d-variate beta densities on [a, b], $\beta_{mj}(x; a, b)=\prod_{i=1}^d\beta_{m_i,j_i}[(x_i-a_i)/(b_i-a_i)]/(b_i-a_i)$, with proportions p(j1, …, jd), 0 ≤ ji ≤ mi, i = 1, …, d. Because all the marginal densities can be approximated by Bernstein polynomials, we can choose an optimal degree mi based on the observations of the i-th component of x; for the i-th marginal density, the optimal degree is selected using mable(). Then the data are fitted by the EM algorithm with the selected optimal degrees m = (m1, …, md) to obtain a vector p of the mixture proportions p(j1, …, jd), arranged in the column-major order of j = (j1, …, jd) as (p0, …, pK − 1), where $K=\prod_{i=1}^d (m_i+1)$. The method proposed by Wang and Guan (2019) is implemented by the function mable.mvar().
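As an illustration of this ordering (idx() is not a package function), the linear position of p(j1, …, jd) in the vector p can be computed as follows:
# 1-based position of p(j1,...,jd) under column-major order, 0 <= ji <= mi
idx <- function(j, m) sum(j * cumprod(c(1, head(m, -1) + 1))) + 1
idx(c(0, 0), c(46, 19))    # 1, the first element p(0,0)
idx(c(46, 19), c(46, 19))  # 940 = (46+1)*(19+1) = K, the last element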
## eruptions waiting
## 1 3.600 79
## 2 1.800 54
## 3 3.333 74
#faith2 <- mable.mvar(faithful, M = c(60,30), interval = cbind(a,b))
faith2 <- mable.mvar(faithful, M = c(46,19), search =FALSE, interval = cbind(a,b))
The density surface for two-dimensional data can be plotted using the plot method (see the figure below). The summarized results are given below.
## Call: mable.mvar() for multivariate data
## Obj Class: mable
## Input Data: eruptions waiting
## Dimension of data: 2
## Optimal degrees:
## eruptions waiting
## m 46 19
## P-value of Change-point:
## Maximum loglikelihood: 507.5615
## MABLE of p: can be retrieved using name 'p'
## Note: the optimal model degrees are selected by the method of change-point.
For a multivariate density, the model degrees can also be chosen based on the marginal data. As in the previous section, we obtained mb = (78, 28) and m̂ = (122, 34).
oldfaith1<- mable.mvar(faithful, M = mb, search =FALSE, interval = cbind(a,b))
oldfaith2<- mable.mvar(faithful, M = c(m1,m2), search =FALSE, interval = cbind(a,b))
op<-par(mfrow=c(1,2), cex=0.7)
plot(oldfaith1, which="density", contour=TRUE)
plot(oldfaith2, which="density", contour=TRUE, add=TRUE, lty=2, col=2)
plot(oldfaith1, which="cumulative", contour=TRUE)
plot(oldfaith2, which="cumulative", contour=TRUE, add=TRUE, lty=2, col=2)
Let T be an event time and X be an associated d-dimensional covariate with distribution H(x) on $\cal{X}$. We denote the marginal and the conditional survival functions of T, respectively, by S(t) = F̄(t) = 1 − F(t) = Pr (T > t) and S(t|x) = F̄(t|x) = 1 − F(t|x) = Pr (T > t|X = x). Let f(t|x) denote the conditional density of a continuous T given X = x. The conditional cumulative hazard function and the conditional odds ratio are, respectively, Λ(t|x) = −log S(t|x) and O(t|x) = F(t|x)/S(t|x). We consider the general situation where the event time is subject to interval censoring. The observed data are (X, Y), where Y = (Y1, Y2], 0 ≤ Y1 ≤ Y2 ≤ ∞. The reader is referred to Huang and Wellner (1997) for a review and more references on interval censoring. Special cases are right-censoring (Y2 = ∞), left-censoring (Y1 = 0), and current status data.
Let f0(⋅) = f(⋅ ∣ x0), F0(⋅) = F(⋅ ∣ x0), and S0(⋅) = S(⋅ ∣ x0) be the baseline density, cumulative distribution, and survival functions, respectively. If the baseline density f0 has support [0, τ] or can be truncated to [0, τ], then $$f_0(t)\approx f_m(t; \bm p)=\frac{1}{\tau}\sum_{j=0}^m p_j\beta_{mj}\Big(\frac{t}{\tau}\Big),\quad t\in[0,\tau],$$ where p = p(x0) = (p0, …, pm)⊤, $\sum_{j=0}^{m}p_j=1$, pj ≥ 0, and $\beta_{mj}(t)=(m+1){m\choose j}t^j(1-t)^{m-j}$. Thus we have the approximations $$F_0(t)\approx F_m(t; \bm p)= \sum_{j=0}^{m} p_j B_{mj}\Big(\frac{t}{\tau}\Big),\quad t\in[0,\tau],$$ $$S_0(t)\approx S_m(t; \bm p)= \sum_{j=0}^{m} p_j \bar B_{mj}\Big(\frac{t}{\tau}\Big),\quad t\in[0,\tau],$$ where Bmj(t) = ∫0tβmj(u)du and B̄mj(t) = 1 − Bmj(t), j = 0, …, m.
The accelerated failure time (AFT) model can be specified as $$S(t\mid x;\gamma)=S_0(te^{\gamma^\top\tilde x}),\quad t\in[0,\infty),$$ where γ ∈ 𝔾 ⊂ ℝd, x̃ = x − x0, and x0 is any fixed covariate value chosen as the baseline. Let γ0 ∈ 𝔾 be the true value of γ. The AFT model is equivalent to S(t ∣ x; γ) = S(te^{γ⊤x} ∣ 0), t ∈ [0, ∞); thus it is actually a scale regression model. It can also be written as the linear regression log (T) = −γ⊤x + ε. One can choose any x0 in 𝒳 as the baseline via the transformation x̃ = x − x0. If f0(t) = f(t ∣ x0) has support [0, τ), τ ≤ ∞, then f(t ∣ x) has support [0, τe^{−γ0⊤x̃}). The model implies f(t ∣ x; γ) = e^{γ⊤x̃}f0(te^{γ⊤x̃}) and S(t ∣ x; γ) = S0(te^{γ⊤x̃}), where f0(t) = f(t ∣ x0) and S0(t) = S(t ∣ x0) = ∫t∞f0(u)du.
We choose τ > y(n) = max {yi1, yj2 : yj2 < ∞; i, j = 1, …, n} so that S(τ) and maxx ∈ 𝒳S(τ ∣ x) are believed to be very small. Guan (2023) proposed to approximate f0(t) and S0(t) on [0, τ] by fm(t; p) and Sm(t; p), respectively, so that f(t ∣ x; γ) and S(t ∣ x; γ) are approximated by $$f(t\mid x;\gamma)\approx e^{\gamma^\top\tilde x}f_m\big(te^{\gamma^\top\tilde x};\bm p\big),\qquad S(t\mid x;\gamma)\approx S_m\big(te^{\gamma^\top\tilde x};\bm p\big).$$ Guan (2023)'s proposal is implemented by the functions mable.aft() and maple.aft().
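A minimal sketch of evaluating the approximate AFT survival function, assuming the package helper pmixbeta() and hypothetical arguments gamma, p, and tau as produced by a fitted model:
# S(t|x) is approximated by S_m(t e^{gamma'x~}; p) on [0, tau]
S.aft <- function(t, x, x0, gamma, p, tau) {
  u <- pmin(t * exp(sum(gamma * (x - x0))), tau)  # accelerated time, capped at tau
  1 - pmixbeta(u, p, interval = c(0, tau))        # S_m(u; p)
}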
library(mable)
g <- 0.41 # Hanson and Johnson 2004, JCGS
aft.res<-mable.aft(cbind(left, right) ~ treat, data = cosmesis, M =c(1, 30), g=.41,
tau =100, x0=data.frame(treat = "RCT"))
## Call: mable.aft(cbind(left, right) ~ treat)
## Data: cosmesis
## Obj Class: mable_reg
## Dimension of response: 1
## Dimension of covariate: 1
## Optimal degree m: 6
## P-value: 0.008966527
## Maximum loglikelihood: -143.1529
## MABLE of p: can be retrieved using name 'p'
##
## Estimate Std.Error z value Pr(>|z|) Baseline x0
## treatRCT 0.57792280 0.13888899 4.16104112 0.00003168 1
##
## Note: the optimal model degree is selected by the method of change-point.
op <- par(mfrow = c(1,2), lwd = 1.5)
plot(x = aft.res, which = "likelihood")
plot(x = aft.res, y = data.frame(treat = "RT"), which = "survival",
type = "l", col = 1, main = "Survival Function")
plot(x = aft.res, y = data.frame(treat = "RCT"), which = "survival",
lty = 2, col = 1, add = TRUE)
legend("topright", bty = "n", lty = 1:2, col = 1, c("Radiation Only",
"Radiation & Chemotherapy"), cex = .7)
Alternatively, one can use maple.aft() with a given estimate of the regression coefficient γ.
Consider the Cox proportional hazards regression model (Cox 1972) $$S(t\mid x;\gamma,f_0)=S_0(t)^{\exp(\gamma^\top\tilde x)},$$ where γ ∈ 𝔾 ⊂ ℝd, x̃ = x − x0, x0 is any fixed covariate value, f0(⋅) = f(⋅|x0) is the unknown baseline density, and S0(t) = ∫t∞f0(s)ds. Define τ = inf {t : F(t|x0) = 1}. Then τ is independent of x0, 0 < τ ≤ ∞, and f(t|x) has the same support [0, τ] for all $x\in \cal{X}$. Let (xi, yi) = (xi, (yi1, yi2]), i = 1, …, n, be independent observations of (X, Y) and τn ≥ y(n) = max {yi1, yj2 : yj2 < ∞; i, j = 1, …, n}. Given any x0, denote π = π(x0) = 1 − S0(τn). For integer m ≥ 1 we define $\mathbb{S}_m\equiv \{(u_0,\ldots,u_m)^T\in \mathbb{R}^{m+1}: u_i\ge 0, \sum_{i=0}^m u_i=1\}.$ Guan (2021b) proposed to approximate f0(t) on [0, τn] by $f_m(t; p) = \tau_n^{-1}\sum_{i=0}^{m} p_i\beta_{mi}(t/\tau_n)$, where p = p(x0) = (p0, …, pm + 1) is subject to the constraints p ∈ 𝕊m + 1 and pm + 1 = 1 − π. Here the dependence of π and p on x0 is suppressed. If π < 1, although we cannot estimate the values of f0(t) on (τn, ∞), we can put an arbitrary guess on them, such as fm(t; p) = pm + 1α(t − τn), t ∈ (τn, ∞), where α(⋅) is a density on [0, ∞) such that (1 − π)α(0) = (m + 1)pm/τn so that fm(t; p) is continuous at t = τn, e.g., α(t) = α(0)exp [−α(0)t]. If τ is finite and known, we choose τn = τ and specify pm + 1 = 0. Otherwise, we choose τn = y(n). For data without right-censoring or covariates we also have to specify pm + 1 = 0 due to its unidentifiability.
The above method is implemented in the functions mable.ph() and maple.ph(), which return the maximum approximate Bernstein likelihood estimates of (γ, p) with an optimal model degree m and with a prespecified m, respectively. With an efficient estimate of γ obtained by another method, maple.ph() can be used to get an optimal degree m and a mable of p.
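A minimal sketch of the corresponding PH survival approximation for the case pm + 1 = 0, again assuming the helper pmixbeta() and hypothetical arguments:
# S(t|x) is approximated by [S_m(t; p)]^{exp(gamma'x~)}, the PH tilt of the baseline
S.ph <- function(t, x, x0, gamma, p, tau.n) {
  S0 <- 1 - pmixbeta(pmin(t, tau.n), p, interval = c(0, tau.n))  # baseline S_m(t; p)
  S0^exp(sum(gamma * (x - x0)))
}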
The plot method for class "mable_reg" objects returned by all the above functions produces graphs of the log-likelihoods at the candidate model degrees m, the likelihood ratios of change-point, and the estimated density and survival functions on the truncated support [0, τn].
The Ovarian Cancer Survival Data is included in the package survival.
library(survival)
futime2 <- ovarian$futime
futime2[ovarian$fustat==0] <- Inf
ovarian2 <- data.frame(age = ovarian$age, futime1 = ovarian$futime, futime2 = futime2)
head(ovarian2, 3)
## age futime1 futime2
## 1 72.3315 59 59
## 2 74.4932 115 115
## 3 66.4658 156 156
ova<-mable.ph(cbind(futime1, futime2) ~ age, data = ovarian2, M = c(2,35),
g = .16, x0=data.frame(age=35))
## Call: mable.ph(cbind(futime1, futime2) ~ age)
## Data: ovarian2
## Obj Class: mable_reg
## Dimension of response: 1
## Dimension of covariate: 1
## Optimal degree m: 23
## P-value: 0.008999588
## Maximum loglikelihood: -85.73586
## MABLE of p: can be retrieved using name 'p'
##
## Estimate Std.Error z value Pr(>|z|) Baseline x0
## age 1.7669e-01 1.0517e-02 1.6800e+01 2.4435e-63 35
##
## Note: the optimal model degree is selected by the method of change-point.
op <- par(mfrow = c(2,2))
plot(ova, which = "likelihood")
plot(ova, which = "change-point")
plot(ova, y=data.frame(age=60), which="survival", type="l", xlab="Days", main="Age = 60")
plot(ova, y=data.frame(age=65), which="survival", type="l", xlab="Days", main="Age = 65")
Alternatively, one can use maple.ph() with a given estimate of γ.
As an important alternative to the Cox proportional hazards model (Cox 1972), the proportional odds (PO) model (McCullagh 1980; Bennett 1983a, 1983b; Pettitt 1984) is defined by $$O(t\mid x)=e^{\gamma^\top\tilde{x}}O(t\mid x_0),$$ where $\tilde{\bm x}=\bm x-\bm x_0$, O(t|x0) is an unknown increasing baseline odds function on (0, ∞), and γ are unknown regression coefficients. If the baseline density f0 has support [0, τ] or can be truncated to [0, τ], then $$f_0(t)\approx f_m(t; \bm p)=\frac{1}{\tau}\sum_{j=0}^m p_j\beta_{mj}\Big(\frac{t}{\tau}\Big),\quad t\in[0,\tau].$$
Therefore f(t ∣ x; γ, f0), F(t ∣ x; γ, f0), and S(t ∣ x; γ, f0) can be approximated, respectively, by $$f(t\mid x;\gamma,f_0)\approx \frac{e^{\gamma^\top\tilde x}f_m(t;\bm p)}{[S_m(t;\bm p)+e^{\gamma^\top\tilde x}F_m(t;\bm p)]^2},\quad F(t\mid x;\gamma,f_0)\approx \frac{e^{\gamma^\top\tilde x}F_m(t;\bm p)}{S_m(t;\bm p)+e^{\gamma^\top\tilde x}F_m(t;\bm p)},\quad S(t\mid x;\gamma,f_0)\approx \frac{S_m(t;\bm p)}{S_m(t;\bm p)+e^{\gamma^\top\tilde x}F_m(t;\bm p)}.$$
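A minimal sketch of the PO survival approximation above, assuming the helper pmixbeta() and hypothetical arguments:
# S(t|x) is approximated by S_m/(S_m + e^{gamma'x~} F_m), from the odds relation
S.po <- function(t, x, x0, gamma, p, tau) {
  Fm <- pmixbeta(pmin(t, tau), p, interval = c(0, tau))  # baseline F_m(t; p)
  (1 - Fm)/((1 - Fm) + exp(sum(gamma * (x - x0))) * Fm)
}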
The data set hivDK is contained in the R package Epi. This is an interval-censored data set. Semiparametric methods (package icenReg) are available for PH and PO (but not AFT) models based on interval-censored data. The estimated survival functions are plotted in the figure below.
## Interval-censored data on times to HIV infection. Carstensen (1996)
## see also Lawless and Babineau (2006)
# inspection times(months): started in Dec. 31, 1980,
it<-c(0, 12, 16, 27, 45, 76, 101, Inf)
# observed frequencies d[l,r] in interval (it[l], it[r]], 0<=l<r<=7
x<-c(24, 0, 4, 2,1,4,0,10,0,3,0,3,1,0,5,0,4,0,2,1,1,0,61,8,15,22,34,92)
n<-sum(x)
d<-matrix(c(24,rep(0,6),
0, 4, rep(0,5),
2,1,4,0,0,0,0,
0,10,0,3,0,0,0,
0,3,1,0,5,0,0,
0,4,0,2,1,1,0,
0,61,8,15,22,34,92), nrow=7, ncol=7)
# See Fay's tutorial
# us: whether visited the US.
# byr: birth year
# age: age at entry
# age=hiv$bth+30,
# pyr: Annual number of sexual partners.
library(Epi)
data(hivDK)
l<-as.numeric(hivDK$well-hivDK$entry)
l.na<-l
l[is.na(l)]<-0
r<- as.numeric(hivDK$ill - hivDK$entry)
r.na<-r
r[is.na(r)]<-Inf
hiv<-data.frame(l,r, us=hivDK$us, byr=hivDK$bth+1950, age=30-hivDK$bth, pyr=hivDK$pyr)
## icenReg
library(icenReg)
#library(interval)
phfit<-ic_sp(cbind(l,r)~us+age+pyr, data=hiv, bs_samples =100, model = 'ph')
pofit<-ic_sp(cbind(l,r)~us+age+pyr, data=hiv, bs_samples =100, model = 'po')
# MABLE
require(mable)
gama<-phfit$coefficients
tau<-5500 # truncation interval [0,tau]
# Mable PH model
x0<-data.frame(us=0,age=72,pyr=0)
#mblph<-mable.ph(cbind(l,r)~us+age+pyr, data=hiv, M=c(2,40), g=gama, x0=x0, tau=tau)
mblph<-mable.ph(cbind(l,r)~us+age+pyr, data=hiv, M=22, g=gama, x0=x0, tau=tau)
# Mable PO model:
mblpo<-mable.po(cbind(l,r)~us+age+pyr, data=hiv, M=c(3,20), g=gama, x0=x0, tau=tau)
# Mable AFT model: m=12 selected from M=c(5,50)
mblaft<-mable.aft(cbind(l,r)~us+age+pyr, data=hiv, M=12, g=NULL, x0=x0, tau=tau,
controls=mable.ctrl(sig.level=0.05, eps.em=1e-7, maxit.em=20000))
newdata=data.frame(us=0:1,age=mean(hiv$age),pyr=mean(hiv$pyr))
par(mfrow=c(2,2),lwd=1.5, cex=.7, mar=c(4,4,1,1), oma=c(0,0,3,1))
# upper-left panel
plot(c(0,3057), c(.6,1), type="n", xlab="Days", ylab="Probability")
lines(phfit, y=newdata[1,], lty=3, col="brown")
lines(phfit, y=newdata[2,], lty=4, col=4)
plot(mblpo, y=newdata[1,], which="survival", add=TRUE, col=1, lty=1)
plot(mblpo, y=newdata[2,], which="survival", add=TRUE, col=2, lty=2)
legend("topright", lty=c(1:4), col=c(1:2,"brown",4), bty="n",
legend=paste0(c("mable-po: us = ", "mable-po: us = ",
"semip-ph: us = ", "semip-ph: us = "), c(rep(newdata$us,2))))
# upper-right panel
plot(c(0,3057), c(.6,1), type="n", xlab="Days", ylab="Probability")
lines(pofit, y=newdata[1,], lty=3, col="brown")
lines(pofit, y=newdata[2,], lty=4, col=4)
plot(mblpo, y=newdata[1,], which="survival", add=TRUE, col=1, lty=1)
plot(mblpo, y=newdata[2,], which="survival", add=TRUE, col=2, lty=2)
legend("topright", lty=c(1:4), col=c(1:2,"brown",4), bty="n",
legend=paste0(c("mable-po: us = ", "mable-po: us = ",
"semip-po: us = ", "semip-po: us = "), c(rep(newdata$us,2))))
# lower-left panel
#newdata2=data.frame(us=mean(hiv$us),age=c(31,50),pyr=mean(hiv$pyr))
plot(c(0,3057), c(.6,1), type="n", xlab="Days", ylab="Probability")
plot(mblpo, y=newdata[1,], which="survival", add=TRUE, col=1, lty=1)
plot(mblpo, y=newdata[2,], which="survival", add=TRUE, col=2, lty=2)
plot(mblph, y=newdata[1,], which="survival", add=TRUE, col="brown", lty=3)
plot(mblph, y=newdata[2,], which="survival", add=TRUE, col=4, lty=4)
legend("topright", lty=1:4, col=c(1,2,"brown",4), bty="n",
legend=paste0(c("mable-po: us = ", "mable-po: us = ",
"mable-ph: us = ", "mable-ph: us = "), c(rep(newdata$us,2))))
# lower-right panel
plot(c(0,3057), c(.6,1), type="n", xlab="Days", ylab="Probability")
plot(mblpo, y=newdata[1,], which="survival", add=TRUE, col=1, lty=1)
plot(mblpo, y=newdata[2,], which="survival", add=TRUE, col=2, lty=2)
plot(mblaft, y=newdata[1,], which="survival", add=TRUE, col="brown", lty=3)
plot(mblaft, y=newdata[2,], which="survival", add=TRUE, col=4, lty=4)
legend("topright", lty=1:4, col=c(1,2,"brown",4), bty="n",
legend=paste0(c("mable-po: us = ", "mable-po: us = ",
"mable-aft: us = ", "mable-aft: us = "), c(rep(newdata$us,2))))
mtext("Survival Curves: Visited US or Not", side=3, line=1, outer=TRUE, cex=1, font=2)
In addition to the AFT, PH, and PO regression models with a single binary covariate, other two-sample semiparametric models can be estimated by the maximum approximate Bernstein/Beta likelihood method.
The two-sample density ratio (DR) model (see, for example, Qin and Zhang (1997) and Qin and Zhang (2005)) is useful for fitting case-control data. Suppose that the densities f0 and f1 of X0 and X1, respectively, satisfy the density ratio model (see Cheng and Chu (2004) and Qin and Zhang (2005), for example) $$f_1(x)=f_0(x)\exp\{\alpha^\top\tilde r(x)\},$$ where α = (α0, …, αd)T ∈ 𝒜 ⊂ Rd + 1 and r̃(x) = (1, rT(x))T has linearly independent components. Guan (2021c) proposed to approximate the baseline density f0 by a Bernstein polynomial and to estimate f0 and α by the maximum approximate Bernstein/Beta likelihood method. This is implemented by the functions mable.dr() and maple.dr(); in the latter, α is a given value such as an estimate obtained by the logistic regression described in Qin and Zhang (1997) and Qin and Zhang (2005).
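As a sanity check on this parametrization (with hypothetical α and r(x) = x), exponentially tilting a baseline density yields another density when α0 absorbs the normalizing constant:
f0 <- function(x) dnorm(x)                            # illustrative baseline N(0, 1)
alpha <- c(-0.5, 1)                                   # hypothetical (alpha0, alpha1)
f1 <- function(x) f0(x) * exp(alpha[1] + alpha[2]*x)  # f1(x) = f0(x) exp{alpha'(1, x)'}
integrate(f1, -Inf, Inf)$value                        # = 1: f1 is the N(1, 1) density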
# Hosmer and Lemeshow (1989):
# ages and the status of coronary disease (CHD) of 100 subjects
x<-c(20, 23, 24, 25, 26, 26, 28, 28, 29, 30, 30, 30, 30, 30, 32,
32, 33, 33, 34, 34, 34, 34, 35, 35, 36, 36, 37, 37, 38, 38, 39,
40, 41, 41, 42, 42, 42, 43, 43, 44, 44, 45, 46, 47, 47, 48, 49,
49, 50, 51, 52, 55, 57, 57, 58, 60, 64)
y<-c(25, 30, 34, 36, 37, 39, 40, 42, 43, 44, 44, 45, 46, 47, 48,
48, 49, 50, 52, 53, 53, 54, 55, 55, 56, 56, 56, 57, 57, 57, 57,
58, 58, 59, 59, 60, 61, 62, 62, 63, 64, 65, 69)
a<-20; b<-70
regr<-function(x) cbind(1,x)
z<-seq(a,b,length=512)
# fit the DR model; the candidate degree range c(1, 15) is an assumed choice
chd.mable <- mable.dr(x, y, M = c(1, 15), regr = regr, interval = c(a, b))
f0hat <- dmixbeta(z, p = chd.mable$p, interval = c(a, b))
rf<-function(x) regr((x-a)/(b-a))
f1hat<-dtmixbeta(z, p=chd.mable$p, alpha=chd.mable$alpha,
interval=c(a, b), regr=regr)
op<-par(mfrow=c(1,2),lwd=1.2, cex=.7, mar=c(5,4,1,1))
hist(x, freq=F, col = "light grey", border = "white", xlab="Age",
ylab="Density", xlim=c(a,b), ylim=c(0,.055), main="Control")
lines(z, f0hat, lty=1, col=1)
hist(y, freq=F, col = "light grey", border = "white", xlab="Age",
ylab="Density", xlim=c(a,b), ylim=c(0,.055), main="Case")
lines(z, f1hat, lty=1, col=1)
For the logarithmic levels of CA 19-9 in the Pancreatic Cancer Data (Wieand et al. 1989), the control group is used as the baseline because the optimal model degree is smaller than the one obtained using the case group as the baseline.
## ca199 ca125 status
## 1 28.0 13.3 0
## 2 15.5 11.1 0
## 3 8.2 16.7 0
x<-log(pancreas$ca199[pancreas$status==0])
y<-log(pancreas$ca199[pancreas$status==1])
a<-min(x,y); b<-max(x,y)
M<-c(1,29)
regr<-function(x) cbind(1,x,x^2)
m=maple.dr(x, y, M, regr=regr, interval=c(a,b), controls=mable.ctrl(sig.level=.001))$m
pc.mable<-mable.dr(x, y, M=m, regr=regr, interval=c(a,b),
controls=mable.ctrl(sig.level=1/length(c(x,y))))
#pc.mable
z<-seq(a,b,length=512)
# baseline is "case"
f1hat<-dmixbeta(z, p=pc.mable$p, interval=c(a, b))
rf<-function(x) regr((x-a)/(b-a))
f0hat<-dtmixbeta(z, p=pc.mable$p, alpha=pc.mable$alpha,
interval=c(a, b), regr=regr)
op<-par(mfrow=c(1,2),lwd=1.2, cex=.7, mar=c(5,4,1,1))
hist(x, freq=F, col = "light grey", border = "white", xlab="log(CA19-9)",
ylab="Density", xlim=c(a,b), main="Control")
lines(z, f0hat, lty=1, col=1)
hist(y, freq=F, col = "light grey", border = "white", xlab="log(CA19-9)",
ylab="Density", xlim=c(a,b), ylim=c(0,.5), main="Case")
lines(z, f1hat, lty=1, col=1)