Title: | Maximum Approximate Bernstein/Beta Likelihood Estimation |
---|---|
Description: | Fit data from a continuous population with a smooth density on finite interval by an approximate Bernstein polynomial model which is a mixture of certain beta distributions and find maximum approximate Bernstein likelihood estimator of the unknown coefficients. Consequently, maximum likelihood estimates of the unknown density, distribution functions, and more can be obtained. If the support of the density is not the unit interval then transformation can be applied. This is an implementation of the methods proposed by the author of this package published in the Journal of Nonparametric Statistics: Guan (2016) <doi:10.1080/10485252.2016.1163349> and Guan (2017) <doi:10.1080/10485252.2017.1374384>. For data with covariates, under some semiparametric regression models such as Cox proportional hazards model and the accelerated failure time model, the baseline survival function can be estimated smoothly based on general interval censored data. |
Authors: | Zhong Guan [aut, cre] |
Maintainer: | Zhong Guan <[email protected]> |
License: | LGPL (>= 2.0, < 3) |
Version: | 4.1.1 |
Built: | 2024-11-01 06:28:29 UTC |
Source: | CRAN |
The chicken embryo dataset which contains day
, number of days, and nT
, the corresponding frequencies.
data(chicken.embryo)
data(chicken.embryo)
The format is: List of 2: day: int [1:21] 1 2 3 4 5 6 7 8 9 10 ...; nT : int [1:21] 6 5 11 2 2 3 0 0 0 0 ...
Jassim, E. W., Grossman, M., Koops, W. J. And Luykx, R. A. J. (1996). Multi-phasic analysis of embryonic mortality in chickens. Poultry Sci. 75, 464-71.
Kuurman, W. W., Bailey, B. A., Koops, W. J. And Grossman, M. (2003). A model for failure of a chicken embryo to survive incubation. Poultry Sci. 82, 214-22.
Guan, Z. (2017) Bernstein polynomial model for grouped continuous data. Journal of Nonparametric Statistics, 29(4):831-848.
data(chicken.embryo)
data(chicken.embryo)
Exponential change-point
chpt.exp(x)
chpt.exp(x)
x |
a vector of nondecreasing values of log-likelihood or -log of distance |
a list of exponential change-point, its p-value, and the likelihood ratios of the exponential change-point model
Parametric bivariate copulas, densities, and random number generators
d2dcop.asym(u, v, lambda, copula = "clayton", ...) p2dcop.asym(u, v, lambda, copula = "clayton", ...) r2dcop.asym(n, lambda, copula = "clayton", ...) dcopula(u, v, copula, ...) pcopula(u, v, copula, ...) rcopula(n, copula, ...)
d2dcop.asym(u, v, lambda, copula = "clayton", ...) p2dcop.asym(u, v, lambda, copula = "clayton", ...) r2dcop.asym(n, lambda, copula = "clayton", ...) dcopula(u, v, copula, ...) pcopula(u, v, copula, ...) rcopula(n, copula, ...)
u , v
|
vectors of same length at which the copula and its density is evaluated |
lambda |
a vector of three mixing proportions which sum to one |
copula |
the name of a copula to be called or a base copula for construncting asymmetric copula(see Details) |
... |
the parameter(s) of |
n |
number of random vectors to be generated |
The names of available copulas are 'amh'
(Ali-Mikhai-Haq), 'bern'
(Bernstein polynomial model),
'clayton'
(Clayton), 'exponential'
(Exponential), 'fgm'
(Farlie–Gumbel–Morgenstern),
'frank'
(Frank), 'gauss'
(Gaussian), 'gumbel'
(Gumbel),
'indep'
(Independence), 'joe'
(Joe), 'nakagami'
(Nakagami-m), 'plackett'
(Plackett),
't'
(Student's t).
d2dcop.asym
, etc, calculate the constructive assymmetric copula of Wu (2014)
using base copula
with mixing proportions
and
parameter values
:
.
If
copula='t'
or 'nakagami'
, df
or m
must be also given.
a vector of copula ot its density values evaluated at (u,v)
or an n x 2
matrix of the generated observations
Nelsen, R. B. (1999). An Introduction to Copulas. Springer Series in Statistics. New York: Springer. Wu, S. (2014). Construction of asymmetric copulas and its application in two-dimensional reliability modelling. European Journal of Operational Research 238 (2), 476–485.
Density, distribution function, quantile function and
random generation for conditional copula of
given
related to parametric bivariate copula
.
dcopula.cond(u, v, copula, ...) pcopula.cond(u, v, copula, ...) qcopula.cond(p, v, copula, ...) rcopula.cond(n, v, copula, ...)
dcopula.cond(u, v, copula, ...) pcopula.cond(u, v, copula, ...) qcopula.cond(p, v, copula, ...) rcopula.cond(n, v, copula, ...)
u |
vector of |
v |
a given value of |
copula |
the name of a copula density to be called (see Details) |
... |
the parameter(s) of |
p |
a vector of probabilities |
n |
number of observations to be generated from conditional copula |
the names of available copulas are 'amh'
(Ali-Mikhai-Haq), 'bern'
(Bernstein polynomial model),
'clayton'
(Clayton), 'exponential'
(Exponential),
'fgm'
(Farlie–Gumbel–Morgenstern), 'frank'
(Frank),
'gauss'
(Gaussian), 'gumbel'
(Gumbel), 'indep'
(Independence),
'joe'
(Joe), 'nakagami'
(Nakagami-m), 'plackett'
(Plackett),
't'
(Student's t).
a vector of copula density values evaluated at u
gvien V=v
or a vector of n
generated u
values from conditional copula .
Bhattacharyya coefficient and Hellinger correlation
corr.hellinger(dcopula, ...)
corr.hellinger(dcopula, ...)
dcopula |
a function object defining a 2d copula density function |
... |
argument(s) of copula density function |
Bhattacharyya coefficient B
and Hellinger correlation eta
Geenens, G. and Lafaye de Micheaux, P. (2022). The Hellinger correlation. Journal of the American Statistical Association 117(538), 639–653.
Data contain the interval-censored times to cosmetic deterioration for breast cancer patients undergoing radiation or radiation plus chemotherapy.
data(cosmesis)
data(cosmesis)
A data frame with 94 observations on the following 3 variables.
left left endpoint of the censoring interval in months
right right endpoint of the censoring interval in months
treat a factor with levels RT
and RCT
representing radiotherapy-only
and radiation plus chemotherapy treatments, respectively
Finkelstein, D. M. and Wolfe, R. A. (1985) A semiparametric model for regression analysis of interval-censored failure time data. Biometrics 41, 933–945.
Finkelstein, D. M. (1986) A proportional hazards model for interval-censored failure time data. Biometrics 42, 845–854.
data(cosmesis)
data(cosmesis)
Density, distribution function, quantile function and
pseudorandom number generation for the Bernstein polynomial model,
mixture of beta distributions, with shapes ,
,
given mixture proportions
and support
interval
.
dmixbeta(x, p, interval = c(0, 1)) pmixbeta(x, p, interval = c(0, 1)) qmixbeta(u, p, interval = c(0, 1)) rmixbeta(n, p, interval = c(0, 1))
dmixbeta(x, p, interval = c(0, 1)) pmixbeta(x, p, interval = c(0, 1)) qmixbeta(u, p, interval = c(0, 1)) rmixbeta(n, p, interval = c(0, 1))
x |
a vector of quantiles |
p |
a vector of |
interval |
support/truncation interval |
u |
a vector of probabilities |
n |
sample size |
The density of the mixture beta distribution on an interval can be written as a
Bernstein polynomial
,
where
,
,
and
,
,
is the beta density with shapes
. The cumulative distribution
function is
, where
,
, is the beta cumulative distribution function
with shapes
. If
, then
is a truncated desity on
with cumulative distribution function
. The argument
p
may be any numeric vector of m+1
values when pmixbeta()
and and qmixbeta()
return the integral
function and its inverse, respectively, and
dmixbeta()
returns a Bernstein polynomial . If components of
p
are not
all nonnegative or do not sum to one, warning message will be returned.
A vector of or
values at
.
dmixbeta
returns the density, pmixbeta
returns the cumulative
distribution function, qmixbeta
returns the quantile function, and
rmixbeta
generates pseudo random numbers.
Zhong Guan <[email protected]>
Bernstein, S.N. (1912), Demonstration du theoreme de Weierstrass fondee sur le calcul des probabilities, Communications of the Kharkov Mathematical Society, 13, 1–2.
Guan, Z. (2016) Efficient and robust density estimation using Bernstein type polynomials. Journal of Nonparametric Statistics, 28(2):250-271.
Guan, Z. (2017) Bernstein polynomial model for grouped continuous data. Journal of Nonparametric Statistics, 29(4):831-848.
# classical Bernstein polynomial approximation a<--4; b<-4; m<-200 x<-seq(a,b,len=512) u<-(0:m)/m p<-dnorm(a+(b-a)*u) plot(x, dnorm(x), type="l") lines(x, (b-a)*dmixbeta(x, p, c(a, b))/(m+1), lty=2, col=2) legend(a, dnorm(0), lty=1:2, col=1:2, c(expression(f(x)==phi(x)), expression(B^{f}*(x))))
# classical Bernstein polynomial approximation a<--4; b<-4; m<-200 x<-seq(a,b,len=512) u<-(0:m)/m p<-dnorm(a+(b-a)*u) plot(x, dnorm(x), type="l") lines(x, (b-a)*dmixbeta(x, p, c(a, b))/(m+1), lty=2, col=2) legend(a, dnorm(0), lty=1:2, col=1:2, c(expression(f(x)==phi(x)), expression(B^{f}*(x))))
Density, distribution function, and
pseudorandom number generation for the multivariate Bernstein polynomial model,
mixture of multivariate beta distributions, with given mixture proportions
, given degrees
,
and support
interval
.
dmixmvbeta(x, p, m, interval = NULL) pmixmvbeta(x, p, m, interval = NULL) rmixmvbeta(n, p, m, interval = NULL)
dmixmvbeta(x, p, m, interval = NULL) pmixmvbeta(x, p, m, interval = NULL) rmixmvbeta(n, p, m, interval = NULL)
x |
a matrix with |
p |
a vector of |
m |
a vector of degrees, |
interval |
a vector of two endpoints or a |
n |
sample size |
dmixmvbeta()
returns a linear combination of
-variate beta densities
on
,
,
with coefficients
,
, where
is a hyperrectangle, and the
coefficients are arranged in the column-major order of
,
, where
.
pmixmvbeta()
returns a linear combination of the distribution
functions of
-variate beta distribution.
If all 's are nonnegative and sum to one, then
p
are the mixture proportions of the mixture multivariate beta distribution.
Density, distribution function, quantile function and
pseudorandom number generation for the exponentially tilted mixture of
beta distributions, with shapes ,
,
given mixture proportions
and support
interval
.
dtmixbeta(x, p, alpha, interval = c(0, 1), regr, ...) ptmixbeta(x, p, alpha, interval = c(0, 1), regr, ...) qtmixbeta(u, p, alpha, interval = c(0, 1), regr, ...) rtmixbeta(n, p, alpha, interval = c(0, 1), regr, ...)
dtmixbeta(x, p, alpha, interval = c(0, 1), regr, ...) ptmixbeta(x, p, alpha, interval = c(0, 1), regr, ...) qtmixbeta(u, p, alpha, interval = c(0, 1), regr, ...) rtmixbeta(n, p, alpha, interval = c(0, 1), regr, ...)
x |
a vector of quantiles |
p |
a vector of |
alpha |
regression coefficients |
interval |
support/truncation interval |
regr |
regressor vector function |
... |
additional arguments to be passed to regr |
u |
a vector of probabilities |
n |
sample size |
The density of the mixture exponentially tilted beta distribution on an
interval can be written
,
where
,
,
and
,
,
is the beta density with shapes
. The cumulative distribution
function is
, where
,
, is the exponentially tilted
beta cumulative distribution function with shapes
.
A vector of or
values at
.
dmixbeta
returns the density, pmixbeta
returns the cumulative
distribution function, qmixbeta
returns the quantile function, and
rmixbeta
generates pseudo random numbers.
Zhong Guan <[email protected]>
Guan, Z., Application of Bernstein Polynomial Model to Density and ROC Estimation in a Semiparametric Density Ratio Model
# classical Bernstein polynomial approximation a<--4; b<-4; m<-200 x<-seq(a,b,len=512) u<-(0:m)/m p<-dnorm(a+(b-a)*u) plot(x, dnorm(x), type="l") lines(x, (b-a)*dmixbeta(x, p, c(a, b))/(m+1), lty=2, col=2) legend(a, dnorm(0), lty=1:2, col=1:2, c(expression(f(x)==phi(x)), expression(B^{f}*(x))))
# classical Bernstein polynomial approximation a<--4; b<-4; m<-200 x<-seq(a,b,len=512) u<-(0:m)/m p<-dnorm(a+(b-a)*u) plot(x, dnorm(x), type="l") lines(x, (b-a)*dmixbeta(x, p, c(a, b))/(m+1), lty=2, col=2) legend(a, dnorm(0), lty=1:2, col=1:2, c(expression(f(x)==phi(x)), expression(B^{f}*(x))))
Maximum approximate Bernstein/Beta likelihood estimation based on
one-sample raw data with an optimal selected by the change-point method among m0:m1
or a preselected model degree m
.
mable( x, M, interval = 0:1, IC = c("none", "aic", "hqic", "all"), vb = 0, controls = mable.ctrl(), progress = TRUE )
mable( x, M, interval = 0:1, IC = c("none", "aic", "hqic", "all"), vb = 0, controls = mable.ctrl(), progress = TRUE )
x |
a (non-empty) numeric |
M |
a positive integer or a vector |
interval |
a vector containing the endpoints of supporting/truncation interval |
IC |
information criterion(s) in addition to Bayesian information criterion (BIC). Current choices are "aic" (Akaike information criterion) and/or "qhic" (Hannan–Quinn information criterion). |
vb |
code for vanishing boundary constraints, -1: f0(a)=0 only, 1: f0(b)=0 only, 2: both, 0: none (default). |
controls |
Object of class |
progress |
if TRUE a text progressbar is displayed |
Any continuous density function on a known closed supporting interval
can be
estimated by Bernstein polynomial
,
where
,
,
and
,
,
is the beta density with shapes
.
For each
m
, the MABLE of the coefficients p
, the mixture proportions, are
obtained using EM algorithm. The EM iteration for each candidate m
stops if either
the total absolute change of the log likelihood and the coefficients of Bernstein polynomial
is smaller than eps
or the maximum number of iterations maxit
is reached.
If m0<m1
, an optimal model degree is selected as the change-point of the increments of
log-likelihood, log likelihood ratios, for . Alternatively,
one can choose an optimal degree based on the BIC (Schwarz, 1978) which are evaluated at
. The search for optimal degree
m
is stoped if either
m1
is reached with a warning or the test for change-point results in a p-value pval
smaller than sig.level
. The BIC for a given degree m
is calculated as in
Schwarz (1978) where the dimension of the model is and a default
is chosen as
.Machine$double.eps
.
If data show a clearly multimodal distribution by plotting the histogram for example,
the model degree is usually large. The range M
should be large enough to cover the
optimal degree and the computation is time-consuming. In this case the iterative method
of moment with an initial selected by a method of mode which is implemented
by optimable
can be used to reduce the computation time.
A list with components
m
the given or a selected degree by method of change-point
p
the estimated vector of mixture proportions
with the selected/given optimal degree
m
mloglik
the maximum log-likelihood at degree m
interval
support/truncation interval (a,b)
convergence
An integer code. 0 indicates successful completion (all the EM iterations are convergent and an optimal degree
is successfully selected in M
). Possible error codes are
1, indicates that the iteration limit maxit
had been reached in at least one EM iteration;
2, the search did not finish before m1
.
delta
the convergence criterion delta
value
and, if m0<m1
,
M
the vector (m0, m1)
, where m1
, if greater than m0
, is the
largest candidate when the search stoped
lk
log-likelihoods evaluated at
lr
likelihood ratios for change-points evaluated at
ic
a list containing the selected information criterion(s)
pval
the p-values of the change-point tests for choosing optimal model degree
chpts
the change-points chosen with the given candidate model degrees
Since the Bernstein polynomial model of degree is nested in the model of
degree
, the maximum likelihood is increasing in
. The change-point method
is used to choose an optimal degree
. The degree can also be chosen by a method of
moment and a method of mode which are implemented by function
optimal()
.
Zhong Guan <[email protected]>
Guan, Z. (2016) Efficient and robust density estimation using Bernstein type polynomials. Journal of Nonparametric Statistics, 28(2):250-271. Wang, T. and Guan, Z.,(2019) Bernstein Polynomial Model for Nonparametric Multivariate Density, Statistics, Vol. 53, no. 2, 321-338
# Vaal Rive Flow Data data(Vaal.Flow) x<-Vaal.Flow$Flow res<-mable(x, M = c(2,100), interval = c(0, 3000), controls = mable.ctrl(sig.level = 1e-8, maxit = 2000, eps = 1.0e-9)) op<-par(mfrow = c(1,2),lwd = 2) layout(rbind(c(1, 2), c(3, 3))) plot(res, which = "likelihood", cex = .5) plot(res, which = c("change-point"), lgd.x = "topright") hist(x, prob = TRUE, xlim = c(0,3000), ylim = c(0,.0022), breaks = 100*(0:30), main = "Histogram and Densities of the Annual Flow of Vaal River", border = "dark grey",lwd = 1,xlab = "x", ylab = "f(x)", col = "light grey") lines(density(x, bw = "nrd0", adjust = 1), lty = 4, col = 4) lines(y<-seq(0, 3000, length = 100), dlnorm(y, mean(log(x)), sqrt(var(log(x)))), lty = 2, col = 2) plot(res, which = "density", add = TRUE) legend("top", lty = c(1, 2, 4), col = c(1, 2, 4), bty = "n", c(expression(paste("MABLE: ",hat(f)[B])), expression(paste("Log-Normal: ",hat(f)[P])), expression(paste("KDE: ",hat(f)[K])))) par(op) # Old Faithful Data library(mixtools) x<-faithful$eruptions a<-0; b<-7 v<-seq(a, b,len = 512) mu<-c(2,4.5); sig<-c(1,1) pmix<-normalmixEM(x,.5, mu, sig) lam<-pmix$lambda; mu<-pmix$mu; sig<-pmix$sigma y1<-lam[1]*dnorm(v,mu[1], sig[1])+lam[2]*dnorm(v, mu[2], sig[2]) res<-mable(x, M = c(2,300), interval = c(a,b), controls = mable.ctrl(sig.level = 1e-8, maxit = 2000L, eps = 1.0e-7)) op<-par(mfrow = c(1,2),lwd = 2) layout(rbind(c(1, 2), c(3, 3))) plot(res, which = "likelihood") plot(res, which = "change-point") hist(x, breaks = seq(0,7.5,len = 20), xlim = c(0,7), ylim = c(0,.7), prob = TRUE,xlab = "t", ylab = "f(t)", col = "light grey", main = "Histogram and Density of Duration of Eruptions of Old Faithful") lines(density(x, bw = "nrd0", adjust = 1), lty = 4, col = 4, lwd = 2) plot(res, which = "density", add = TRUE) lines(v, y1, lty = 2, col = 2, lwd = 2) legend("topright", lty = c(1,2,4), col = c(1,2,4), lwd = 2, bty = "n", c(expression(paste("MABLE: ",hat(f)[B](x))), expression(paste("Mixture: ",hat(f)[P](t))), expression(paste("KDE: ",hat(f)[K](t))))) par(op)
# Vaal Rive Flow Data data(Vaal.Flow) x<-Vaal.Flow$Flow res<-mable(x, M = c(2,100), interval = c(0, 3000), controls = mable.ctrl(sig.level = 1e-8, maxit = 2000, eps = 1.0e-9)) op<-par(mfrow = c(1,2),lwd = 2) layout(rbind(c(1, 2), c(3, 3))) plot(res, which = "likelihood", cex = .5) plot(res, which = c("change-point"), lgd.x = "topright") hist(x, prob = TRUE, xlim = c(0,3000), ylim = c(0,.0022), breaks = 100*(0:30), main = "Histogram and Densities of the Annual Flow of Vaal River", border = "dark grey",lwd = 1,xlab = "x", ylab = "f(x)", col = "light grey") lines(density(x, bw = "nrd0", adjust = 1), lty = 4, col = 4) lines(y<-seq(0, 3000, length = 100), dlnorm(y, mean(log(x)), sqrt(var(log(x)))), lty = 2, col = 2) plot(res, which = "density", add = TRUE) legend("top", lty = c(1, 2, 4), col = c(1, 2, 4), bty = "n", c(expression(paste("MABLE: ",hat(f)[B])), expression(paste("Log-Normal: ",hat(f)[P])), expression(paste("KDE: ",hat(f)[K])))) par(op) # Old Faithful Data library(mixtools) x<-faithful$eruptions a<-0; b<-7 v<-seq(a, b,len = 512) mu<-c(2,4.5); sig<-c(1,1) pmix<-normalmixEM(x,.5, mu, sig) lam<-pmix$lambda; mu<-pmix$mu; sig<-pmix$sigma y1<-lam[1]*dnorm(v,mu[1], sig[1])+lam[2]*dnorm(v, mu[2], sig[2]) res<-mable(x, M = c(2,300), interval = c(a,b), controls = mable.ctrl(sig.level = 1e-8, maxit = 2000L, eps = 1.0e-7)) op<-par(mfrow = c(1,2),lwd = 2) layout(rbind(c(1, 2), c(3, 3))) plot(res, which = "likelihood") plot(res, which = "change-point") hist(x, breaks = seq(0,7.5,len = 20), xlim = c(0,7), ylim = c(0,.7), prob = TRUE,xlab = "t", ylab = "f(t)", col = "light grey", main = "Histogram and Density of Duration of Eruptions of Old Faithful") lines(density(x, bw = "nrd0", adjust = 1), lty = 4, col = 4, lwd = 2) plot(res, which = "density", add = TRUE) lines(v, y1, lty = 2, col = 2, lwd = 2) legend("topright", lty = c(1,2,4), col = c(1,2,4), lwd = 2, bty = "n", c(expression(paste("MABLE: ",hat(f)[B](x))), expression(paste("Mixture: ",hat(f)[P](t))), expression(paste("KDE: ",hat(f)[K](t))))) par(op)
Maximum approximate Bernstein/Beta likelihood estimation for accelerated failure time model based on interval censored data.
mable.aft( formula, data, M, g = NULL, p = NULL, tau = NULL, x0 = NULL, controls = mable.ctrl(), progress = TRUE )
mable.aft( formula, data, M, g = NULL, p = NULL, tau = NULL, x0 = NULL, controls = mable.ctrl(), progress = TRUE )
formula |
regression formula. Response must be |
data |
a data frame containing variables in |
M |
a positive integer or a vector |
g |
a |
p |
an initial coefficients of Bernstein polynomial of degree |
tau |
the right endpoint of the support or truncation interval |
x0 |
a data frame specifying working baseline covariates on the right-hand-side of |
controls |
Object of class |
progress |
if |
Consider the accelerated failure time model with covariate for interval-censored failure time data:
, where
and
may
contain dummy variables and interaction terms. The working baseline
x0
in arguments
contains only the values of terms excluding dummy variables and interaction terms
in the right-hand-side of formula
. Thus g
is the initial guess of
the coefficients of
and could be longer than
x0
.
Let and
be the density and cumulative distribution
functions of the event time given
, respectively.
Then
on a truncation interval
can be approximated by
,
where
,
,
,
is the beta denity with shapes
and
, and
is larger than the largest observed time, either uncensored time, or right endpoint of interval/left censored,
or left endpoint of right censored time. So we can approximate
on
by
, where
is
the beta survival function with shapes
and
.
Response variable should be of the form cbind(l, u)
, where (l,u)
is the interval
containing the event time. Data is uncensored if l = u
, right censored
if u = Inf
or u = NA
, and left censored data if l = 0
.
The truncation time tau
and the baseline x0
should be chosen so that
on
is negligible for
all the observed
.
The search for optimal degree m
stops if either m1
is reached or the test
for change-point results in a p-value pval
smaller than sig.level
.
A list with components
m
the given or selected optimal degree m
p
the estimate of p = (p_0, ..., p_m)
, the coefficients of Bernstein polynomial of degree m
coefficients
the estimated regression coefficients of the AFT model
SE
the standard errors of the estimated regression coefficients
z
the z-scores of the estimated regression coefficients
mloglik
the maximum log-likelihood at an optimal degree m
tau.n
maximum observed time
tau
right endpoint of trucation interval
x0
the working baseline covariates
egx0
the value of
convergence
an integer code: 0 indicates a successful completion;
1 indicates that the search of an optimal degree using change-point method reached
the maximum candidate degree; 2 indicates that the matimum iterations was reached for
calculating and
with the selected degree
,
or the divergence of the last EM-like iteration for
or the divergence of
the last (quasi) Newton iteration for
; 3 indicates 1 and 2.
delta
the final delta
if m0 = m1
or the final pval
of the change-point
for searching the optimal degree m
;
and, if m0<m1
,
M
the vector (m0, m1)
, where m1
is the last candidate when the search stoped
lk
log-likelihoods evaluated at
lr
likelihood ratios for change-points evaluated at
pval
the p-values of the change-point tests for choosing optimal model degree
chpts
the change-points chosen with the given candidate model degrees
Zhong Guan <[email protected]>
Guan, Z. (2019) Maximum Approximate Likelihood Estimation in Accelerated Failure Time Model for Interval-Censored Data, arXiv:1911.07087.
## Breast Cosmesis Data g <- 0.41 #Hanson and Johnson 2004, JCGS aft.res<-mable.aft(cbind(left, right)~treat, data=cosmesis, M=c(1, 30), g=g, tau=100, x0=data.frame(treat="RCT")) op<-par(mfrow=c(1,2), lwd=1.5) plot(x=aft.res, which="likelihood") plot(x=aft.res, y=data.frame(treat="RT"), which="survival", model='aft', type="l", col=1, add=FALSE, main="Survival Function") plot(x=aft.res, y=data.frame(treat="RCT"), which="survival", model='aft', lty=2, col=1) legend("bottomleft", bty="n", lty=1:2, col=1, c("Radiation Only", "Radiation and Chemotherapy")) par(op)
## Breast Cosmesis Data g <- 0.41 #Hanson and Johnson 2004, JCGS aft.res<-mable.aft(cbind(left, right)~treat, data=cosmesis, M=c(1, 30), g=g, tau=100, x0=data.frame(treat="RCT")) op<-par(mfrow=c(1,2), lwd=1.5) plot(x=aft.res, which="likelihood") plot(x=aft.res, y=data.frame(treat="RT"), which="survival", model='aft', type="l", col=1, add=FALSE, main="Survival Function") plot(x=aft.res, y=data.frame(treat="RCT"), which="survival", model='aft', lty=2, col=1) legend("bottomleft", bty="n", lty=1:2, col=1, c("Radiation Only", "Radiation and Chemotherapy")) par(op)
Maximum Approximate Bernstein Likelihood Estimate of Copula Density Function
mable.copula( x, M0 = 1, M, unif.mar = TRUE, pseudo.obs = c("empirical", "mable"), interval = NULL, search = TRUE, mar.deg = FALSE, high.dim = FALSE, controls = mable.ctrl(sig.level = 0.05), progress = TRUE )
mable.copula( x, M0 = 1, M, unif.mar = TRUE, pseudo.obs = c("empirical", "mable"), interval = NULL, search = TRUE, mar.deg = FALSE, high.dim = FALSE, controls = mable.ctrl(sig.level = 0.05), progress = TRUE )
x |
an |
M0 |
a nonnegative integer or a vector of |
M |
a positive integer or a vector of |
unif.mar |
logical, whether all the marginals distributions are uniform or not.
If not the pseudo observations will be created using |
pseudo.obs |
|
interval |
a vector of two endpoints or a |
search |
logical, whether to search optimal degrees between |
mar.deg |
logical, if TRUE (default), the optimal degrees are selected based on marginal data, otherwise, the optimal degrees are chosen by the method of change-point. See details. |
high.dim |
logical, data are high dimensional/large sample or not if TRUE, run a slower version procedure which requires less memory |
controls |
Object of class |
progress |
if TRUE a text progressbar is displayed |
A -variate copula density
on
can be approximated
by a mixture of
-variate beta densities on
,
,
with proportion
,
,
which satisfy the uniform marginal constraints, the copula (density) has
uniform marginal cdf (pdf). If
search=TRUE
and mar.deg=TRUE
, then the
optimal degrees are , where
is
chosen based on marginal data of
, $
. If
search=TRUE
and mar.deg=FALSE
, then the optimal degrees
are chosen using a change-point method based on the joint data.
For large data and high dimensional density, the search for the model degrees might be time-consuming. Thus patience is needed.
A list with components
m
a vector of the selected optimal degrees by the method of change-point
p
a vector of the mixture proportions , arranged in the
column-major order of
,
.
mloglik
the maximum log-likelihood at an optimal degree m
pval
the p-values of change-points for choosing the optimal degrees for the
marginal densities
M
the vector (m1, m2, ..., md)
at which the search of model degrees stopped.
If mar.deg=TRUE
mi
is the largest candidate degree when the search stoped for
the i
-th marginal density
convergence
An integer code. 0 indicates successful completion(the EM iteration is
convergent). 1 indicates that the iteration limit maxit
had been reached in the EM iteration;
if unif.mar=FALSE
, margin
contains objects of the results of mable fit
to the marginal data
Zhong Guan <[email protected]>
Wang, T. and Guan, Z. (2019). Bernstein polynomial model for nonparametric multivariate density. Statistics 53(2), 321–338. Guan, Z., Nonparametric Maximum Likelihood Estimation of Copula
## Simulated bivariate data from Gaussian copula set.seed(1) rho<-0.4; n<-1000 x<-rnorm(n) u<-pnorm(cbind(rnorm(n, mean=rho*x, sd=sqrt(1-rho^2)),x)) res<- mable.copula(u, M = c(3,3), search =FALSE, mar.deg=FALSE, progress=FALSE) plot(res, which="density")
## Simulated bivariate data from Gaussian copula set.seed(1) rho<-0.4; n<-1000 x<-rnorm(n) u<-pnorm(cbind(rnorm(n, mean=rho*x, sd=sqrt(1-rho^2)),x)) res<- mable.copula(u, M = c(3,3), search =FALSE, mar.deg=FALSE, progress=FALSE) plot(res, which="density")
Control parameters for mable fit
mable.ctrl( sig.level = 0.01, eps = 1e-07, maxit = 5000L, eps.em = 1e-07, maxit.em = 5000L, eps.nt = 1e-07, maxit.nt = 100L, tini = 1e-04 )
mable.ctrl( sig.level = 0.01, eps = 1e-07, maxit = 5000L, eps.em = 1e-07, maxit.em = 5000L, eps.nt = 1e-07, maxit.nt = 100L, tini = 1e-04 )
sig.level |
the sigificance level for change-point method of choosing optimal model degree |
eps |
convergence criterion for iteration involves EM like and Newton-Raphson iterations |
maxit |
maximum number of iterations involve EM like and Newton-Raphson iterations |
eps.em |
convergence criterion for EM like iteration |
maxit.em |
maximum number of EM like iterations |
eps.nt |
convergence criterion for Newton-Raphson iteration |
maxit.nt |
maximum number of Newton-Raphson iterations |
tini |
a small positive number used to make sure initial |
a list of the arguments' values
Zhong Guan <[email protected]>
Maximum approximate Bernstein/Beta likelihood estimation in additive density deconvolution model with a known error density.
mable.decon( y, gn = NULL, ..., M, interval = c(0, 1), IC = c("none", "aic", "hqic", "all"), vanished = TRUE, controls = mable.ctrl(maxit.em = 1e+05, eps.em = 1e-05, maxit.nt = 100, eps.nt = 1e-10), progress = TRUE )
mable.decon( y, gn = NULL, ..., M, interval = c(0, 1), IC = c("none", "aic", "hqic", "all"), vanished = TRUE, controls = mable.ctrl(maxit.em = 1e+05, eps.em = 1e-05, maxit.nt = 100, eps.nt = 1e-10), progress = TRUE )
y |
vector of observed data values |
gn |
error density function if known, default is NULL if unknown |
... |
additional arguments to be passed to gn |
M |
a vector |
interval |
a finite vector |
IC |
information criterion(s) in addition to Bayesian information criterion (BIC). Current choices are "aic" (Akaike information criterion) and/or "qhic" (Hannan–Quinn information criterion). |
vanished |
logical whether the unknown error density vanishes at both end-points of |
controls |
Object of class |
progress |
if |
Consider the additive measurement error model , where
has an unknown distribution
on a known support
[a,b]
, has a known or unknown distribution
,
and
and
are independent. We want to estimate density
based on independent observations,
,
, of
.
We approximate
by a Bernstein polynomial model on
[a,b]
. If is unknown on
a known support
[a1,b1]
, then we approximate by a Bernstein polynomial model on
[a1,b1]
, . We assume
. AIC and BIC methods are used to
select model degrees
(m,k)
.
A mable
class object with components, if is known,
M
the vector (m0, m1)
, where m1
is the last candidate degree when the search stoped
m
the selected optimal degree m
p
the estimate of p = (p_0, ..., p_m)
, the coefficients of Bernstein polynomial of degree m
lk
log-likelihoods evaluated at
lr
likelihood ratios for change-points evaluated at
convergence
An integer code. 0 indicates an optimal degree
is successfully selected in M
. 1 indicates that the search stoped at m1
.
ic
a list containing the selected information criterion(s)
pval
the p-values of the change-point tests for choosing optimal model degree
chpts
the change-points chosen with the given candidate model degrees
if is unknown,
M
the 2 x 2 matrix with rows (m0, m1)
and (k0,k1)
nu_aic
the selected optimal degrees (m,k)
using AIC method
p_aic
the estimate of p = (p_0, ..., p_m)
, the coefficients
of Bernstein polynomial model for of degree
m
as in nu_aic
q_aic
the estimate of q = (q_0, ..., q_k)
, the coefficients
of Bernstein polynomial model for of degree
k
as in nu_aic
nu_bic
the selected optimal degrees (m,k)
using BIC method
p_bic
the estimate of p = (p_0, ..., p_m)
, the coefficients
of Bernstein polynomial model for of degree
m
as in nu_bic
q_bic
the estimate of q = (q_0, ..., q_k)
, the coefficients
of Bernstein polynomial model for of degree
k
as in nu_bic
lk
matrix of log-likelihoods evaluated at
and
aic
a matrix containing the Akaike information criterion(s) at
and
bic
a matrix containing the Bayesian information criterion(s) at
and
Zhong Guan <[email protected]>
Guan, Z., (2019) Fast Nonparametric Maximum Likelihood Density Deconvolution Using Bernstein Polynomials, Statistica Sinica, doi:10.5705/ss.202018.0173
# A simulated normal dataset set.seed(123) mu<-1; sig<-2; a<-mu-sig*5; b<-mu+sig*5; gn<-function(x) dnorm(x, 0, 1) n<-50; x<-rnorm(n, mu, sig); e<-rnorm(n); y<-x+e; res<-mable.decon(y, gn, interval = c(a, b), M = c(5, 50)) op<-par(mfrow = c(2, 2),lwd = 2) plot(res, which="likelihood") plot(res, which="change-point", lgd.x="topright") plot(xx<-seq(a, b, length=100), yy<-dnorm(xx, mu, sig), type="l", xlab="x", ylab="Density", ylim=c(0, max(yy)*1.1)) plot(res, which="density", types=c(2,3), colors=c(2,3)) # kernel density based on pure data lines(density(x), lty=4, col=4) legend("topright", bty="n", lty=1:4, col=1:4, c(expression(f), expression(hat(f)[cp]), expression(hat(f)[bic]), expression(tilde(f)[K]))) plot(xx, yy<-pnorm(xx, mu, sig), type="l", xlab="x", ylab="Distribution Function") plot(res, which="cumulative", types=c(2,3), colors=c(2,3)) legend("bottomright", bty="n", lty=1:3, col=1:3, c(expression(F), expression(hat(F)[cp]), expression(hat(F)[bic]))) par(op)
# A simulated normal dataset set.seed(123) mu<-1; sig<-2; a<-mu-sig*5; b<-mu+sig*5; gn<-function(x) dnorm(x, 0, 1) n<-50; x<-rnorm(n, mu, sig); e<-rnorm(n); y<-x+e; res<-mable.decon(y, gn, interval = c(a, b), M = c(5, 50)) op<-par(mfrow = c(2, 2),lwd = 2) plot(res, which="likelihood") plot(res, which="change-point", lgd.x="topright") plot(xx<-seq(a, b, length=100), yy<-dnorm(xx, mu, sig), type="l", xlab="x", ylab="Density", ylim=c(0, max(yy)*1.1)) plot(res, which="density", types=c(2,3), colors=c(2,3)) # kernel density based on pure data lines(density(x), lty=4, col=4) legend("topright", bty="n", lty=1:4, col=1:4, c(expression(f), expression(hat(f)[cp]), expression(hat(f)[bic]), expression(tilde(f)[K]))) plot(xx, yy<-pnorm(xx, mu, sig), type="l", xlab="x", ylab="Distribution Function") plot(res, which="cumulative", types=c(2,3), colors=c(2,3)) legend("bottomright", bty="n", lty=1:3, col=1:3, c(expression(F), expression(hat(F)[cp]), expression(hat(F)[bic]))) par(op)
Maximum approximate Bernstein/Beta likelihood estimation in a density ratio model based on two-sample raw data.
mable.dr( x, y, M, regr, ..., interval = c(0, 1), alpha = NULL, vb = 0, baseline = NULL, controls = mable.ctrl(), progress = TRUE, message = FALSE )
mable.dr( x, y, M, regr, ..., interval = c(0, 1), alpha = NULL, vb = 0, baseline = NULL, controls = mable.ctrl(), progress = TRUE, message = FALSE )
x , y
|
original two sample raw data, |
M |
a positive integer or a vector |
regr |
regressor vector function |
... |
additional arguments to be passed to regr |
interval |
a vector |
alpha |
initial regression coefficient, missing value is imputed by logistic regression |
vb |
code for vanishing boundary constraints, -1: f0(a)=0 only, 1: f0(b)=0 only, 2: both, 0: none (default). |
baseline |
the working baseline, "Control" or "Case", if |
controls |
Object of class |
progress |
logical: should a text progressbar be displayed |
message |
logical: should warning messages be displayed |
Suppose that x
("control") and y
("case") are independent
samples from f0 and f1 which samples
satisfy f1(x)=f0(x)exp[alpha0+alpha'r(x)] with r(x)=(r1(x),...,r_d(x)). Maximum
approximate Bernstein/Beta likelihood estimates of (alpha0,alpha), f0 and f1
are calculated. If support is (a,b) then replace r(x) by r[a+(b-a)x].
For a fixed m
, using the Bernstein polynomial model for baseline ,
MABLEs of
and parameters alpha can be estimated by EM algorithm and Newton
iteration. If estimated lower bound
for
m
based on y
is smaller that that based on x
, then switch x
and y
and
is used as baseline. If
M=m
or m0=m1=m
, then m
is a
preselected degree. If m0<m1
it specifies the set of consective
candidate model degrees m0:m1
for searching an optimal degree by
the change-point method, where m1-m0>3
.
A list with components
m
the given or a selected degree by method of change-point
p
the estimated vector of mixture proportions
with the given or selected degree
m
alpha
the estimated regression coefficients
mloglik
the maximum log-likelihood at degree m
interval
support/truncation interval (a,b)
baseline
="control" if is used as baseline,
or ="case" if
is used as baseline.
M
the vector (m0, m1)
, where m1
, if greater than m0
, is the
largest candidate when the search stoped
lk
log-likelihoods evaluated at
lr
likelihood ratios for change-points evaluated at
pval
the p-values of the change-point tests for choosing optimal model degree
chpts
the change-points chosen with the given candidate model degrees
Zhong Guan <[email protected]>
Guan, Z., Maximum Approximate Bernstein Likelihood Estimation of Densities in a Two-sample Semiparametric Model
# Hosmer and Lemeshow (1989): # ages and the status of coronary disease (CHD) of 100 subjects x<-c(20, 23, 24, 25, 26, 26, 28, 28, 29, 30, 30, 30, 30, 30, 32, 32, 33, 33, 34, 34, 34, 34, 35, 35, 36, 36, 37, 37, 38, 38, 39, 40, 41, 41, 42, 42, 42, 43, 43, 44, 44, 45, 46, 47, 47, 48, 49, 49, 50, 51, 52, 55, 57, 57, 58, 60, 64) y<-c(25, 30, 34, 36, 37, 39, 40, 42, 43, 44, 44, 45, 46, 47, 48, 48, 49, 50, 52, 53, 53, 54, 55, 55, 56, 56, 56, 57, 57, 57, 57, 58, 58, 59, 59, 60, 61, 62, 62, 63, 64, 65, 69) regr<-function(x) cbind(1,x) chd.mable<-mable.dr(x, y, M=c(1, 15), regr, interval = c(20, 70)) chd.mable
# Hosmer and Lemeshow (1989): # ages and the status of coronary disease (CHD) of 100 subjects x<-c(20, 23, 24, 25, 26, 26, 28, 28, 29, 30, 30, 30, 30, 30, 32, 32, 33, 33, 34, 34, 34, 34, 35, 35, 36, 36, 37, 37, 38, 38, 39, 40, 41, 41, 42, 42, 42, 43, 43, 44, 44, 45, 46, 47, 47, 48, 49, 49, 50, 51, 52, 55, 57, 57, 58, 60, 64) y<-c(25, 30, 34, 36, 37, 39, 40, 42, 43, 44, 44, 45, 46, 47, 48, 48, 49, 50, 52, 53, 53, 54, 55, 55, 56, 56, 56, 57, 57, 57, 57, 58, 58, 59, 59, 60, 61, 62, 62, 63, 64, 65, 69) regr<-function(x) cbind(1,x) chd.mable<-mable.dr(x, y, M=c(1, 15), regr, interval = c(20, 70)) chd.mable
Maximum approximate Bernstein/Beta likelihood estimation in a density ratio model based on two-sample grouped data.
mable.dr.group( t, n0, n1, M, regr, ..., interval = c(0, 1), alpha = NULL, vb = 0, controls = mable.ctrl(), progress = TRUE, message = TRUE )
mable.dr.group( t, n0, n1, M, regr, ..., interval = c(0, 1), alpha = NULL, vb = 0, controls = mable.ctrl(), progress = TRUE, message = TRUE )
t |
cutpoints of class intervals |
n0 , n1
|
frequencies of two sample data grouped by the classes
specified by |
M |
a positive integer or a vector |
regr |
regressor vector function |
... |
additional arguments to be passed to regr |
interval |
a vector |
alpha |
a given regression coefficient, missing value is imputed by logistic regression |
vb |
code for vanishing boundary constraints, -1: f0(a)=0 only, 1: f0(b)=0 only, 2: both, 0: none (default). |
controls |
Object of class |
progress |
logical: should a text progressbar be displayed |
message |
logical: should warning messages be displayed |
Suppose that n0
("control") and n1
("case") are frequencies of
independent samples grouped by the classes t
from f0 and f1 which
satisfy f1(x)=f0(x)exp[alpha0+alpha'r(x)] with r(x)=(r1(x),...,r_d(x)). Maximum
approximate Bernstein/Beta likelihood estimates of (alpha0,alpha), f0 and f1
are calculated. If support is (a,b) then replace r(x) by r[a+(b-a)x].
For a fixed m
, using the Bernstein polynomial model for baseline ,
MABLEs of
and parameters alpha can be estimated by EM algorithm and Newton
iteration. If estimated lower bound
for
m
based on n1
is smaller that that based on n0
, then switch n0
and n1
and
use as baseline. If
M=m
or m0=m1=m
, then m
is a
preselected degree. If m0<m1
it specifies the set of consective
candidate model degrees m0:m1
for searching an optimal degree by
the change-point method, where m1-m0>3
.
Maximum approximate Bernstein/Beta likelihood estimation based on
one-sample grouped data with an optimal selected by the change-point method among m0:m1
or a preselected model degree m
.
mable.group( x, breaks, M, interval = c(0, 1), IC = c("none", "aic", "hqic", "all"), vb = 0, controls = mable.ctrl(), progress = TRUE )
mable.group( x, breaks, M, interval = c(0, 1), IC = c("none", "aic", "hqic", "all"), vb = 0, controls = mable.ctrl(), progress = TRUE )
x |
vector of frequencies |
breaks |
class interval end points |
M |
a positive integer or a vector |
interval |
a vector containing the endpoints of support/truncation interval |
IC |
information criterion(s) in addition to Bayesian information criterion (BIC). Current choices are "aic" (Akaike information criterion) and/or "qhic" (Hannan–Quinn information criterion). |
vb |
code for vanishing boundary constraints, -1: f0(a)=0 only, 1: f0(b)=0 only, 2: both, 0: none (default). |
controls |
Object of class |
progress |
if TRUE a text progressbar is displayed |
Any continuous density function on a known closed supporting interval
can be
estimated by Bernstein polynomial
,
where
,
,
and
,
,
is the beta density with shapes
.
For each
m
, the MABLE of the coefficients p
, the mixture proportions, are
obtained using EM algorithm. The EM iteration for each candidate m
stops if either
the total absolute change of the log likelihood and the coefficients of Bernstein polynomial
is smaller than eps
or the maximum number of iterations maxit
is reached.
If m0<m1
, an optimal model degree is selected as the change-point of the increments of
log-likelihood, log likelihood ratios, for . Alternatively,
one can choose an optimal degree based on the BIC (Schwarz, 1978) which are evaluated at
. The search for optimal degree
m
is stoped if either
m1
is reached with a warning or the test for change-point results in a p-value pval
smaller than sig.level
. The BIC for a given degree m
is calculated as in
Schwarz (1978) where the dimension of the model is and a default
is chosen as
.Machine$double.eps
.
A list with components
m
the given or a selected degree by method of change-point
p
the estimated p
with degree m
mloglik
the maximum log-likelihood at degree m
interval
supporting interval (a, b)
convergence
An integer code. 0 indicates successful completion
(all the EM iterations are convergent and an optimal degree
is successfully selected in M
). Possible error codes are
1, indicates that the iteration limit maxit
had been
reached in at least one EM iteration;
2, the search did not finish before m1
.
delta
the convergence criterion delta
value
and, if m0<m1
,
M
the vector (m0, m1)
, where m1
, if greater than m0
, is the
largest candidate when the search stoped
lk
log-likelihoods evaluated at
lr
likelihood ratios for change-points evaluated at
ic
a list containing the selected information criterion(s)
pval
the p-values of the change-point tests for choosing optimal model degree
chpts
the change-points chosen with the given candidate model degrees
Zhong Guan <[email protected]>
Guan, Z. (2017) Bernstein polynomial model for grouped continuous data. Journal of Nonparametric Statistics, 29(4):831-848.
## Chicken Embryo Data data(chicken.embryo) a<-0; b<-21 day<-chicken.embryo$day nT<-chicken.embryo$nT Day<-rep(day,nT) res<-mable.group(x=nT, breaks=a:b, M=c(2,100), interval=c(a, b), IC="aic", controls=mable.ctrl(sig.level=1e-6, maxit=2000, eps=1.0e-7)) op<-par(mfrow=c(1,2), lwd=2) layout(rbind(c(1, 2), c(3, 3))) plot(res, which="likelihood") plot(res, which="change-point") fk<-density(x=rep((0:20)+.5, nT), bw="sj", n=101, from=a, to=b) hist(Day, breaks=seq(a,b, length=12), freq=FALSE, col="grey", border="white", main="Histogram and Density Estimates") plot(res, which="density",types=1:2, colors=1:2) lines(fk, lty=2, col=2) legend("topright", lty=c(1:2), c("MABLE", "Kernel"), bty="n", col=c(1:2)) par(op)
## Chicken Embryo Data data(chicken.embryo) a<-0; b<-21 day<-chicken.embryo$day nT<-chicken.embryo$nT Day<-rep(day,nT) res<-mable.group(x=nT, breaks=a:b, M=c(2,100), interval=c(a, b), IC="aic", controls=mable.ctrl(sig.level=1e-6, maxit=2000, eps=1.0e-7)) op<-par(mfrow=c(1,2), lwd=2) layout(rbind(c(1, 2), c(3, 3))) plot(res, which="likelihood") plot(res, which="change-point") fk<-density(x=rep((0:20)+.5, nT), bw="sj", n=101, from=a, to=b) hist(Day, breaks=seq(a,b, length=12), freq=FALSE, col="grey", border="white", main="Histogram and Density Estimates") plot(res, which="density",types=1:2, colors=1:2) lines(fk, lty=2, col=2) legend("topright", lty=c(1:2), c("MABLE", "Kernel"), bty="n", col=c(1:2)) par(op)
Estimate of Hellinger Correlation between two random variables and Bootstrap
mable.hellcorr( x, unif.mar = FALSE, pseudo.obs = c("empirical", "mable"), M0 = c(1, 1), M = c(30, 30), search = TRUE, mar.deg = TRUE, high.dim = FALSE, interval = cbind(0:1, 0:1), B = 200L, conf.level = 0.95, integral = TRUE, controls = mable.ctrl(sig.level = 0.05), progress = FALSE ) hellcorr( x, unif.mar = FALSE, pseudo.obs = c("empirical", "mable"), M0 = c(1, 1), M = c(30, 30), search = TRUE, mar.deg = TRUE, high.dim = FALSE, interval = cbind(0:1, 0:1), B = 200L, conf.level = 0.95, integral = TRUE, controls = mable.ctrl(sig.level = 0.05), progress = FALSE )
mable.hellcorr( x, unif.mar = FALSE, pseudo.obs = c("empirical", "mable"), M0 = c(1, 1), M = c(30, 30), search = TRUE, mar.deg = TRUE, high.dim = FALSE, interval = cbind(0:1, 0:1), B = 200L, conf.level = 0.95, integral = TRUE, controls = mable.ctrl(sig.level = 0.05), progress = FALSE ) hellcorr( x, unif.mar = FALSE, pseudo.obs = c("empirical", "mable"), M0 = c(1, 1), M = c(30, 30), search = TRUE, mar.deg = TRUE, high.dim = FALSE, interval = cbind(0:1, 0:1), B = 200L, conf.level = 0.95, integral = TRUE, controls = mable.ctrl(sig.level = 0.05), progress = FALSE )
x |
an |
unif.mar |
logical, whether all the marginals distributions are uniform or not.
If not the pseudo observations will be created using |
pseudo.obs |
|
M0 |
a nonnegative integer or a vector of |
M |
a positive integer or a vector of |
search |
logical, whether to search optimal degrees between |
mar.deg |
logical, if TRUE (default), the optimal degrees are selected based on marginal data, otherwise, the optimal degrees are chosen by the method of change-point. See details. |
high.dim |
logical, data are high dimensional/large sample or not if TRUE, run a slower version procedure which requires less memory |
interval |
a 2 by 2 matrix, columns are the marginal supports |
B |
the number of bootstrap samples and number of Monte Carlo runs for
estimating |
conf.level |
confidence level |
integral |
logical, using "integrate()" or not (Riemann sum) |
controls |
Object of class |
progress |
if TRUE a text progressbar is displayed |
This function calls mable.copula()
for estimation of the copula density.
eta
Hellinger correlation
CI.eta
Bootstrap confidence interval for
Hellinger correlation if B
>0.
Zhong Guan <[email protected]>
Guan, Z., Nonparametric Maximum Likelihood Estimation of Copula
mable
, mable.mvar
, mable.copula
Maximum approximate Bernstein/Beta likelihood estimation of density and cumulative/survival distributions functions based on interal censored event time data.
mable.ic( data, M, pi0 = NULL, tau = Inf, IC = c("none", "aic", "hqic", "all"), controls = mable.ctrl(), progress = TRUE )
mable.ic( data, M, pi0 = NULL, tau = Inf, IC = c("none", "aic", "hqic", "all"), controls = mable.ctrl(), progress = TRUE )
data |
a dataset either |
M |
an positive integer or a vector |
pi0 |
Initial guess of |
tau |
right endpoint of support |
IC |
information criterion(s) in addition to Bayesian information criterion (BIC). Current choices are "aic" (Akaike information criterion) and/or "qhic" (Hannan–Quinn information criterion). |
controls |
Object of class |
progress |
if |
Let and
be the density and cumulative distribution
functions of the event time, respectively. Then
on
can be
approximated by
,
where
,
,
,
is the beta denity with shapes
and
, and
is the largest observed time, either uncensored time, or right endpoint of
interval/left censored, or left endpoint of right censored time. We can approximate
on
by
,
where
,
, is the beta survival function with shapes
and
,
,
, and
. For data without right-censored time,
.
The search for optimal degree
m
is stoped if either m1
is reached or the test
for change-point results in a p-value pval
smaller than sig.level
.
Each row of data
, (l, u)
, is the interval containing the event time.
Data is uncensored if l = u
, right censored if u = Inf
or u = NA
,
and left censored data if l = 0
.
a class 'mable
' object with components
p
the estimated p
with degree m
selected by the change-point method
mloglik
the maximum log-likelihood at an optimal degree m
interval
support/truncation interval (0, b)
M
the vector (m0,m1)
, where m1
is the last candidate when the search stoped
m
the selected optimal degree by the method of change-point
lk
log-likelihoods evaluated at
lr
likelihood ratios for change-points evaluated at
tau.n
maximum observed time
tau
right endpoint of support
ic
a list containing the selected information criterion(s)
pval
the p-values of the change-point tests for choosing optimal model degree
chpts
the change-points chosen with the given candidate model degrees
convergence
an integer code. 0 indicates successful completion(the iteration is
convergent). 1 indicates that the maximum candidate degree had been reached in the calculation;
delta
the final pval
of the change-point for selecting the optimal degree m
;
Zhong Guan <[email protected]>
Guan, Z. (2019) Maximum Approximate Bernstein Likelihood Estimation in Proportional Hazard Model for Interval-Censored Data, arXiv:1906.08882 .
library(mable) bcos=cosmesis bc.res0<-mable.ic(bcos[bcos$treat=="RT",1:2], M=c(1,50), IC="none") bc.res1<-mable.ic(bcos[bcos$treat=="RCT",1:2], M=c(1,50), IC="none") op<-par(mfrow=c(2,2),lwd=2) plot(bc.res0, which="change-point", lgd.x="right") plot(bc.res1, which="change-point", lgd.x="right") plot(bc.res0, which="survival", add=FALSE, xlab="Months", ylim=c(0,1), main="Radiation Only") legend("topright", bty="n", lty=1:2, col=1:2, c(expression(hat(S)[CP]), expression(hat(S)[BIC]))) plot(bc.res1, which="survival", add=FALSE, xlab="Months", main="Radiation and Chemotherapy") legend("topright", bty="n", lty=1:2, col=1:2, c(expression(hat(S)[CP]), expression(hat(S)[BIC]))) par(op)
library(mable) bcos=cosmesis bc.res0<-mable.ic(bcos[bcos$treat=="RT",1:2], M=c(1,50), IC="none") bc.res1<-mable.ic(bcos[bcos$treat=="RCT",1:2], M=c(1,50), IC="none") op<-par(mfrow=c(2,2),lwd=2) plot(bc.res0, which="change-point", lgd.x="right") plot(bc.res1, which="change-point", lgd.x="right") plot(bc.res0, which="survival", add=FALSE, xlab="Months", ylim=c(0,1), main="Radiation Only") legend("topright", bty="n", lty=1:2, col=1:2, c(expression(hat(S)[CP]), expression(hat(S)[BIC]))) plot(bc.res1, which="survival", add=FALSE, xlab="Months", main="Radiation and Chemotherapy") legend("topright", bty="n", lty=1:2, col=1:2, c(expression(hat(S)[CP]), expression(hat(S)[BIC]))) par(op)
Maximum Approximate Bernstein Likelihood Estimate of Multivariate Density Function
mable.mvar( x, M0 = 1L, M, search = TRUE, interval = NULL, mar.deg = TRUE, method = c("cd", "em", "lmem"), controls = mable.ctrl(), progress = TRUE )
mable.mvar( x, M0 = 1L, M, search = TRUE, interval = NULL, mar.deg = TRUE, method = c("cd", "em", "lmem"), controls = mable.ctrl(), progress = TRUE )
x |
an |
M0 |
a positive integer or a vector of |
M |
a positive integer or a vector of |
search |
logical, whether to search optimal degrees between |
interval |
a vector of two endpoints or a |
mar.deg |
logical, if TRUE, the optimal degrees are selected based on marginal data, otherwise, the optimal degrees are chosen the joint data. See details. |
method |
method for finding maximum likelihood estimate. "cd": coordinate-descent; less memory for data that are high dimensional/large sample. |
controls |
Object of class |
progress |
if TRUE a text progressbar is displayed |
A -variate density
on a hyperrectangle
can be approximated
by a mixture of
-variate beta densities on
,
,
with proportion
,
.
If
search=TRUE
then the model degrees are chosen using a method of change-point based on
the marginal data if mar.deg=TRUE
or the joint data if mar.deg=FALSE
.
If search=FALSE
, then the model degree is specified by .
For large data and multimodal density, the search for the model degrees is
very time-consuming. In this case, it is suggested that use
method="cd"
and select the degrees based on marginal data using mable
or
optimable
.
A list with components
m
a vector of the selected optimal degrees by the method of change-point
p
a vector of the mixture proportions , arranged in the
column-major order of
,
.
mloglik
the maximum log-likelihood at an optimal degree m
pval
the p-values of change-points for choosing the optimal degrees for the
marginal densities
M
the vector (m1, m2, ... , md)
, where mi
is the largest candidate
degree when the search stoped for the i
-th marginal density
interval
support hyperrectangle
convergence
An integer code. 0 indicates successful completion(the EM iteration is
convergent). 1 indicates that the iteration limit maxit
had been reached in the EM iteration;
Zhong Guan <[email protected]>
Guan, Z. (2016) Efficient and robust density estimation using Bernstein type polynomials. Journal of Nonparametric Statistics, 28(2):250-271. Wang, T. and Guan, Z.,(2019) Bernstein Polynomial Model for Nonparametric Multivariate Density, Statistics, Vol. 53, no. 2, 321-338
## Old Faithful Data a<-c(0, 40); b<-c(7, 110) ans<- mable.mvar(faithful, M = c(46,19), search =FALSE, method="em", interval = rbind(a,b), progress=FALSE) plot(ans, which="density") plot(ans, which="cumulative")
## Old Faithful Data a<-c(0, 40); b<-c(7, 110) ans<- mable.mvar(faithful, M = c(46,19), search =FALSE, method="em", interval = rbind(a,b), progress=FALSE) plot(ans, which="density") plot(ans, which="cumulative")
Maximum approximate Bernstein/Beta likelihood estimation in Cox's proportional hazards regression model based on interal censored event time data.
mable.ph( formula, data, M, g = NULL, p = NULL, pi0 = NULL, tau = Inf, x0 = NULL, controls = mable.ctrl(), progress = TRUE )
mable.ph( formula, data, M, g = NULL, p = NULL, pi0 = NULL, tau = Inf, x0 = NULL, controls = mable.ctrl(), progress = TRUE )
formula |
regression formula. Response must be |
data |
a data frame containing variables in |
M |
a positive integer or a vector |
g |
initial guess of |
p |
an initial coefficients of Bernstein polynomial model of degree |
pi0 |
Initial guess of |
tau |
right endpoint of support |
x0 |
a data frame specifying working baseline covariates on the right-hand-side of |
controls |
Object of class |
progress |
if |
Consider Cox's PH model with covariate for interval-censored failure time data:
, where
satisfies
,
where
and
may
contain dummy variables and interaction terms. The working baseline
x0
in arguments
contains only the values of terms excluding dummy variables and interaction terms
in the right-hand-side of formula
. Thus g
is the initial guess of
the coefficients of
and could be longer than
x0
.
Let and
be the density and cumulative distribution
functions of the event time given
, respectively.
Then
on
can be approximated by
,
where
,
,
,
is the beta denity with shapes
and
, and
is the largest observed time, either uncensored time, or right endpoint of interval/left censored,
or left endpoint of right censored time. So we can approximate
on
by
, where
,
, is the beta survival function with shapes
and
,
,
, and
. For data without right-censored time,
.
Response variable should be of the form cbind(l, u)
, where (l, u)
is the interval
containing the event time. Data is uncensored if l = u
, right censored
if u = Inf
or u = NA
, and left censored data if l = 0
.
The associated covariate contains columns. The baseline
x0
should chosen so that
is nonnegative for all the observed
and
all
in a neighborhood of its true value.
A missing initial value of g
is imputed by ic_sp()
of package icenReg
.
The search for optimal degree m
stops if either m1
is reached or the test
for change-point results in a p-value pval
smaller than sig.level
.
This process takes longer than maple.ph
to select an optimal degree.
A list with components
m
the selected/preselected optimal degree m
p
the estimate of , the coefficients of Bernstein polynomial of degree
m
coefficients
the estimated regression coefficients of the PH model
SE
the standard errors of the estimated regression coefficients
z
the z-scores of the estimated regression coefficients
mloglik
the maximum log-likelihood at an optimal degree m
tau.n
maximum observed time
tau
right endpoint of support
x0
the working baseline covariates
egx0
the value of
convergence
an integer code, 1 indicates either the EM-like
iteration for finding maximum likelihood reached the maximum iteration for at least one m
or the search of an optimal degree using change-point method reached the maximum candidate degree,
2 indicates both occured, and 0 indicates a successful completion.
delta
the final delta
if m0 = m1
or the final pval
of the change-point
for searching the optimal degree m
;
and, if m0<m1
,
M
the vector (m0, m1)
, where m1
is the last candidate degree when the search stoped
lk
log-likelihoods evaluated at
lr
likelihood ratios for change-points evaluated at
pval
the p-values of the change-point tests for choosing optimal model degree
chpts
the change-points chosen with the given candidate model degrees
Zhong Guan <[email protected]>
Guan, Z. Maximum Approximate Bernstein Likelihood Estimation in Proportional Hazard Model for Interval-Censored Data, Statistics in Medicine. 2020; 1–21. https://doi.org/10.1002/sim.8801.
# Ovarian Cancer Survival Data require(survival) futime2<-ovarian$futime futime2[ovarian$fustat==0]<-Inf ovarian2<-data.frame(age=ovarian$age, futime1=ovarian$futime, futime2=futime2) ova<-mable.ph(cbind(futime1, futime2) ~ age, data = ovarian2, M=c(2,35), g=.16, x0=data.frame(age=35)) op<-par(mfrow=c(2,2)) plot(ova, which = "likelihood") plot(ova, which = "change-point") plot(ova, y=data.frame(age=60), which="survival", add=FALSE, type="l", xlab="Days", main="Age = 60") plot(ova, y=data.frame(age=65), which="survival", add=FALSE, type="l", xlab="Days", main="Age = 65") par(op)
# Ovarian Cancer Survival Data require(survival) futime2<-ovarian$futime futime2[ovarian$fustat==0]<-Inf ovarian2<-data.frame(age=ovarian$age, futime1=ovarian$futime, futime2=futime2) ova<-mable.ph(cbind(futime1, futime2) ~ age, data = ovarian2, M=c(2,35), g=.16, x0=data.frame(age=35)) op<-par(mfrow=c(2,2)) plot(ova, which = "likelihood") plot(ova, which = "change-point") plot(ova, y=data.frame(age=60), which="survival", add=FALSE, type="l", xlab="Days", main="Age = 60") plot(ova, y=data.frame(age=65), which="survival", add=FALSE, type="l", xlab="Days", main="Age = 65") par(op)
Maximum approximate Bernstein/Beta likelihood estimation in general proportional odds regression model based on interal censored event time data.
mable.po( formula, data, M, g = NULL, tau, x0 = NULL, controls = mable.ctrl(), progress = TRUE )
mable.po( formula, data, M, g = NULL, tau, x0 = NULL, controls = mable.ctrl(), progress = TRUE )
formula |
regression formula. Response must be |
data |
a data frame containing variables in |
M |
a positive integer or a vector |
g |
an initial guess of |
tau |
right endpoint of support |
x0 |
a data frame specifying working baseline covariates on the right-hand-side of |
controls |
Object of class |
progress |
if |
Consider PO model with covariate for interval-censored failure time data:
,
where
satisfies
, where
and
may
contain dummy variables and interaction terms. The working baseline
x0
in arguments
contains only the values of terms excluding dummy variables and interaction terms
in the right-hand-side of formula
. Thus g
is the initial guess of
the coefficients of
and could be longer than
x0
.
Let and
be the density and cumulative distribution
functions of the event time given
, respectively.
Then
on
can be approximated by
,
where
,
,
,
is the beta denity with shapes
and
, and
is the right endpoint of support interval of the baseline density.
We can approximate
on
by
, where
,
, is the beta survival function with shapes
and
.
Response variable should be of the form cbind(l,u)
, where (l,u)
is the interval
containing the event time. Data is uncensored if l = u
, right censored
if u = Inf
or u = NA
, and left censored if l = 0
.
The associated covariate contains columns. The baseline
x0
should chosen so
that is nonnegative for all the observed
and
all
in a neighborhood of its true value.
A missing initial value of g
is imputed by ic_sp()
of package icenReg
with model="po"
.
The search for optimal degree m
stops if either m1
is reached or the test
for change-point results in a p-value pval
smaller than sig.level
.
This process takes longer than maple.po
to select an optimal degree.
A list with components
m
the selected/preselected optimal degree m
p
the estimate of , the coefficients of
Bernstein polynomial of degree
m
coefficients
the estimated regression coefficients of the PO model
SE
the standard errors of the estimated regression coefficients
z
the z-scores of the estimated regression coefficients
mloglik
the maximum log-likelihood at an optimal degree m
tau.n
maximum observed time
tau
right endpoint of support
x0
the working baseline covariates
egx0
the value of
convergence
an integer code, 1 indicates either the EM-like iteration
for finding maximum likelihood reached the maximum iteration for at least one m
or the search of an optimal degree using change-point method reached the maximum
candidate degree, 2 indicates both occured, and 0 indicates a successful completion.
delta
the final delta
if m0 = m1
or the final pval
of
the change-point for searching the optimal degree m
;
and, if m0<m1
,
M
the vector (m0, m1)
, where m1
is the last candidate degree
when the search stoped
lk
log-likelihoods evaluated at
lr
likelihood ratios for change-points evaluated at
pval
the p-values of the change-point tests for choosing optimal model degree
chpts
the change-points chosen with the given candidate model degrees
Zhong Guan <[email protected]>
Guan, Z. Maximum Likelihood Estimation in Proportional Odds Regression Model Based on Interval-Censored Event-time Data
# Veteran's Administration Lung Cancer Data require(survival) require(icenReg) require(mable) l<-veteran$time->u u[veteran$status==0]<-Inf veteran1<-data.frame(l=l, u=u, karno=veteran$karno, celltype=veteran$celltype, trt=veteran$trt, age=veteran$age, prior=veteran$prior>0) fit.sp<-ic_sp(cbind(l,u) ~ karno+celltype, data = veteran1, model="po") x0<-data.frame(karno=100, celltype="squamous") tau<-2000 res<-mable.po(cbind(l,u) ~ karno+celltype, data = veteran1, M=c(1,35), g=-fit.sp$coefficients, x0=x0, tau=tau) op<-par(mfrow=c(2,2)) plot(res, which = "likelihood") plot(res, which = "change-point") plot(res, y=data.frame(karno=20, celltype="squamous"), which="survival", add=FALSE, type="l", xlab="Days", main=expression(paste("Survival: ", bold(x)==0))) plot(res, y=data.frame(karno=80, celltype="smallcell"), which="survival", add=FALSE, type="l", xlab="Days", main=expression(paste("Survival: ", bold(x)==bold(x)[0]))) par(op)
# Veteran's Administration Lung Cancer Data require(survival) require(icenReg) require(mable) l<-veteran$time->u u[veteran$status==0]<-Inf veteran1<-data.frame(l=l, u=u, karno=veteran$karno, celltype=veteran$celltype, trt=veteran$trt, age=veteran$age, prior=veteran$prior>0) fit.sp<-ic_sp(cbind(l,u) ~ karno+celltype, data = veteran1, model="po") x0<-data.frame(karno=100, celltype="squamous") tau<-2000 res<-mable.po(cbind(l,u) ~ karno+celltype, data = veteran1, M=c(1,35), g=-fit.sp$coefficients, x0=x0, tau=tau) op<-par(mfrow=c(2,2)) plot(res, which = "likelihood") plot(res, which = "change-point") plot(res, y=data.frame(karno=20, celltype="squamous"), which="survival", add=FALSE, type="l", xlab="Days", main=expression(paste("Survival: ", bold(x)==0))) plot(res, y=data.frame(karno=80, celltype="smallcell"), which="survival", add=FALSE, type="l", xlab="Days", main=expression(paste("Survival: ", bold(x)==bold(x)[0]))) par(op)
Wrapping all mable
fit of regression models in one function.
Using maximum approximate Bernstein/Beta likelihood
estimation to fit semiparametric regression models: Cox ph model,
proportional odds(po) model, accelerated failure time model, and so on.
mable.reg( formula, data, model = c("ph", "aft", "po"), M, g = NULL, pi0 = NULL, tau = Inf, x0 = NULL, controls = mable.ctrl(), progress = TRUE )
mable.reg( formula, data, model = c("ph", "aft", "po"), M, g = NULL, pi0 = NULL, tau = Inf, x0 = NULL, controls = mable.ctrl(), progress = TRUE )
formula |
regression formula. Response must be of the form |
data |
a data frame containing variables in |
model |
the model to fit. Current options are " |
M |
a vector |
g |
an initial guess of the regression coefficients |
pi0 |
Initial guess of |
tau |
right endpoint of support |
x0 |
a data frame containing working baseline covariates on the right-hand-side of |
controls |
Object of class |
progress |
if |
For "ph
" model a missing initial guess of the regression coefficients
g
is obtained by ic_sp()
of package icenReg
. For "aft
" model a
missing g
is imputed by the rank estimate aftsrr()
of package aftgee
for right-censored data. For general interval censored observations, we keep the
right-censored but replace the finite interval with its midpoint and fit the data by
aftsrr()
as a right-censored data.
A 'mable_reg' class object
Zhong Guan <[email protected]>
Minimum Approximate Distance Estimate of Copula Density
made.copula( x, unif.mar = FALSE, M = 30, search = TRUE, interval = NULL, pseudo.obs = c("empirical", "mable"), sig.level = 0.01 )
made.copula( x, unif.mar = FALSE, M = 30, search = TRUE, interval = NULL, pseudo.obs = c("empirical", "mable"), sig.level = 0.01 )
x |
an |
unif.mar |
marginals are all uniform ( |
M |
d-vector of preselected or maximum model degrees |
search |
logical, whether to search optimal degrees between |
interval |
a 2 by d matrix specifying the support/truncate interval of |
pseudo.obs |
When |
sig.level |
significance level for p-value of change-point |
With given model degrees m
, the parameters p
, the mixing
proportions of the beta distribution, are calculated as the minimizer of the
approximate distance between the empirical distribution and
the Bernstein polynomial model. The optimal model degrees
m
are chosen by
a change-point method. The quadratic programming with linear constraints is
used to solve the problem.
An invisible mable
object with components
m
the given degree
p
the estimated vector of mixture proportions
with the given degree
m
D
the minimum distance at degree m
Minimum Approximate Distance Estimate of Density Function with an optimal model degree
made.density( x, M0 = 1L, M, search = TRUE, interval = NULL, mar.deg = TRUE, method = c("qp", "em"), controls = mable.ctrl(), progress = TRUE )
made.density( x, M0 = 1L, M, search = TRUE, interval = NULL, mar.deg = TRUE, method = c("qp", "em"), controls = mable.ctrl(), progress = TRUE )
x |
an |
M0 |
a positive integer or a vector of |
M |
a positive integer or a vector of |
search |
logical, whether to search optimal degrees between |
interval |
a vector of two endpoints or a |
mar.deg |
logical, if TRUE, the optimal degrees are selected based on marginal data, otherwise, the optimal degrees are chosen the joint data. See details. |
method |
method for finding minimum distance estimate. "em": EM like method; |
controls |
Object of class |
progress |
if TRUE a text progressbar is displayed |
A -variate cdf
on a hyperrectangle
can be approximated
by a mixture of
-variate beta cdfs on
,
,
with proportion
,
.
With a given model degree
m
, the parameters p
, the mixing
proportions of the beta distribution, are calculated as the minimizer of the
approximate distance between the empirical distribution and
the Bernstein polynomial model. The quadratic programming with linear constraints
is used to solve the problem.
If
search=TRUE
then the model degrees are chosen using a method of change-point based on
the marginal data if mar.deg=TRUE
or the joint data if mar.deg=FALSE
.
If search=FALSE
, then the model degree is specified by .
An invisible mable
object with components
m
the given model degree(s)
p
the estimated vector of mixture proportions
with the given optimal degree(s) m
interval
support/truncation interval [a, b]
D
the minimum distance at degree m
convergence
An integer code. 0 indicates successful completion(the EM iteration is
convergent). 1 indicates that the iteration limit maxit
had been reached in the EM iteration;
Minimum Approximate Distance Estimate of Multivariate Density Function
made.mvar( x, M0 = 1L, M, search = TRUE, interval = NULL, mar.deg = TRUE, method = c("cd", "quadprog"), controls = mable.ctrl(), progress = TRUE )
made.mvar( x, M0 = 1L, M, search = TRUE, interval = NULL, mar.deg = TRUE, method = c("cd", "quadprog"), controls = mable.ctrl(), progress = TRUE )
x |
an |
M0 |
a positive integer or a vector of |
M |
a positive integer or a vector of |
search |
logical, whether to search optimal degrees between |
interval |
a vector of two endpoints or a |
mar.deg |
logical, if TRUE, the optimal degrees are selected based on marginal data, otherwise, the optimal degrees are chosen the joint data. See details. |
method |
method for finding minimum distance estimate. "cd": coordinate-descent; |
controls |
Object of class |
progress |
if TRUE a text progressbar is displayed |
A -variate density
on a hyperrectangle
can be approximated
by a mixture of
-variate beta densities on
,
,
with proportion
,
.
If
search=TRUE
then the model degrees are chosen using a method of change-point based on
the marginal data if mar.deg=TRUE
or the joint data if mar.deg=FALSE
.
If search=FALSE
, then the model degree is specified by .
For large data and multimodal density, the search for the model degrees is
very time-consuming. In this case, it is suggested that use
method="cd"
and select the degrees based on marginal data using mable
or
optimable
.
A list with components
m
a vector of the selected optimal degrees by the method of change-point
p
a vector of the mixture proportions , arranged in the
column-major order of
,
.
minD
the minimum distance at an optimal degree m
pval
the p-values of change-points for choosing the optimal degrees for the
marginal densities
M
the vector (m1, m2, ... , md)
, where mi
is the largest candidate
degree when the search stoped for the i
-th marginal density
interval
support hyperrectangle
convergence
An integer code. 0 indicates successful completion(the EM iteration is
convergent). 1 indicates that the iteration limit maxit
had been reached in the EM iteration;
Zhong Guan <[email protected]>
Guan, Z. (2016) Efficient and robust density estimation using Bernstein type polynomials. Journal of Nonparametric Statistics, 28(2):250-271.
Wang, T. and Guan, Z.,(2019) Bernstein Polynomial Model for Nonparametric Multivariate Density, Statistics, Vol. 53, no. 2, 321-338
## Old Faithful Data library(mable) a<-c(0, 40); b<-c(7, 110) ans<- made.mvar(faithful, M = c(46,19), search =FALSE, method="quadprog", interval = rbind(a,b), progress=FALSE) plot(ans, which="density") plot(ans, which="cumulative")
## Old Faithful Data library(mable) a<-c(0, 40); b<-c(7, 110) ans<- made.mvar(faithful, M = c(46,19), search =FALSE, method="quadprog", interval = rbind(a,b), progress=FALSE) plot(ans, which="density") plot(ans, which="cumulative")
Minimum Approximate Distance Estimate of Copula with given model degrees
madem.copula(u, m)
madem.copula(u, m)
u |
an |
m |
|
With given model degrees m
, the parameters p
, the mixing
proportions of the beta distribution, are calculated as the minimizer of the
approximate distance between the empirical distribution and
the Bernstein polynomial model.
An invisible mable
object with components
m
the given degree
p
the estimated vector of mixture proportions
with the given degree
m
D
the minimum distance at degree m
Minimum Approximate Distance Estimate of univariate Density Function with given model degree(s)
madem.density( x, m, p = rep(1, prod(m + 1))/prod(m + 1), interval = NULL, method = c("qp", "em"), maxit = 10000, eps = 1e-07 )
madem.density( x, m, p = rep(1, prod(m + 1))/prod(m + 1), interval = NULL, method = c("qp", "em"), maxit = 10000, eps = 1e-07 )
x |
an |
m |
a positive integer or a vector of |
p |
initial guess of |
interval |
a vector of two endpoints or a |
method |
method for finding minimum distance estimate. "em": EM like method; |
maxit |
the maximum iterations |
eps |
the criterion for convergence |
A -variate cdf
on a hyperrectangle
can be approximated
by a mixture of
-variate beta cdfs on
,
,
with proportion
,
.
With a given model degree
m
, the parameters p
, the mixing
proportions of the beta distribution, are calculated as the minimizer of the
approximate distance between the empirical distribution and
the Bernstein polynomial model. The quadratic programming with linear constraints
is used to solve the problem.
An invisible mable
object with components
m
the given model degree(s)
p
the estimated vector of mixture proportions
with the given optimal degree(s) m
interval
support/truncation interval [a, b]
D
the minimum distance at degree m
Maximum approximate profile likelihood estimation of Bernstein polynomial model in accelerated failure time based on interal censored event time data with given regression coefficients which are efficient estimates provided by other semiparametric methods.
maple.aft( formula, data, M, g, tau = NULL, p = NULL, x0 = NULL, controls = mable.ctrl(), progress = TRUE )
maple.aft( formula, data, M, g, tau = NULL, p = NULL, x0 = NULL, controls = mable.ctrl(), progress = TRUE )
formula |
regression formula. Response must be |
data |
a data frame containing variables in |
M |
a positive integer or a vector |
g |
the given |
tau |
the right endpoint of the support or truncation interval |
p |
an initial coefficients of Bernstein polynomial of degree |
x0 |
a data frame specifying working baseline covariates on the right-hand-side of |
controls |
Object of class |
progress |
if |
Consider the accelerated failure time model with covariate for interval-censored failure time data:
, where
and
may
contain dummy variables and interaction terms. The working baseline
x0
in arguments
contains only the values of terms excluding dummy variables and interaction terms
in the right-hand-side of formula
. Thus g
is the initial guess of
the coefficients of
and could be longer than
x0
.
Let and
be the density and cumulative distribution
functions of the event time given
, respectively.
Then
on a support or truncation interval
can be approximated by
,
where
,
,
,
is the beta denity with shapes
and
, and
is larger than the largest observed time, either uncensored time, or right endpoint of interval/left censored,
or left endpoint of right censored time. We can approximate
on
by
, where
is
the beta survival function with shapes
and
.
Response variable should be of the form cbind(l, u)
, where (l,u)
is the interval
containing the event time. Data is uncensored if l = u
, right censored
if u = Inf
or u = NA
, and left censored data if l = 0
.
The truncation time tau
and the baseline x0
should be chosen so that
on
is negligible for
all the observed
.
The search for optimal degree m
stops if either m1
is reached or the test
for change-point results in a p-value pval
smaller than sig.level
.
A list with components
m
the selected optimal degree m
p
the estimate of , the coefficients of Bernstein polynomial of degree
m
coefficients
the given regression coefficients of the AFT model
SE
the standard errors of the estimated regression coefficients
z
the z-scores of the estimated regression coefficients
mloglik
the maximum log-likelihood at an optimal degree m
tau.n
maximum observed time
tau
right endpoint of trucation interval
x0
the working baseline covariates
egx0
the value of
convergence
an integer code, 1 indicates either the EM-like
iteration for finding maximum likelihood reached the maximum iteration for at least one m
or the search of an optimal degree using change-point method reached the maximum candidate degree,
2 indicates both occured, and 0 indicates a successful completion.
delta
the final delta
if m0 = m1
or the final pval
of the change-point
for searching the optimal degree m
;
and, if m0<m1
,
M
the vector (m0, m1)
, where m1
is the last candidate when the search stoped
lk
log-likelihoods evaluated at
lr
likelihood ratios for change-points evaluated at
pval
the p-values of the change-point tests for choosing optimal model degree
chpts
the change-points chosen with the given candidate model degrees
Zhong Guan <[email protected]>
Guan, Z. (2019) Maximum Approximate Likelihood Estimation in Accelerated Failure Time Model for Interval-Censored Data, arXiv:1911.07087.
## Breast Cosmesis Data g<-0.41 #Hanson and Johnson 2004, JCGS, res1<-maple.aft(cbind(left, right)~treat, data=cosmesis, M=c(1,30), g=g, tau=100, x0=data.frame(treat="RCT")) op<-par(mfrow=c(1,2), lwd=1.5) plot(x=res1, which="likelihood") plot(x=res1, y=data.frame(treat="RT"), which="survival", model='aft', type="l", col=1, add=FALSE, main="Survival Function") plot(x=res1, y=data.frame(treat="RCT"), which="survival", model='aft', lty=2, col=1) legend("bottomleft", bty="n", lty=1:2, col=1, c("Radiation Only", "Radiation and Chemotherapy")) par(op)
## Breast Cosmesis Data g<-0.41 #Hanson and Johnson 2004, JCGS, res1<-maple.aft(cbind(left, right)~treat, data=cosmesis, M=c(1,30), g=g, tau=100, x0=data.frame(treat="RCT")) op<-par(mfrow=c(1,2), lwd=1.5) plot(x=res1, which="likelihood") plot(x=res1, y=data.frame(treat="RT"), which="survival", model='aft', type="l", col=1, add=FALSE, main="Survival Function") plot(x=res1, y=data.frame(treat="RCT"), which="survival", model='aft', lty=2, col=1) legend("bottomleft", bty="n", lty=1:2, col=1, c("Radiation Only", "Radiation and Chemotherapy")) par(op)
Select optimal degree with a given regression coefficients.
maple.dr( x, y, M, regr, ..., interval = c(0, 1), alpha = NULL, vb = 0, baseline = NULL, controls = mable.ctrl(), progress = TRUE, message = TRUE )
maple.dr( x, y, M, regr, ..., interval = c(0, 1), alpha = NULL, vb = 0, baseline = NULL, controls = mable.ctrl(), progress = TRUE, message = TRUE )
x , y
|
original two sample raw data, |
M |
a positive integer or a vector |
regr |
regressor vector function |
... |
additional arguments to be passed to regr |
interval |
a vector |
alpha |
a given regression coefficient, missing value is imputed by logistic regression |
vb |
code for vanishing boundary constraints, -1: f0(a)=0 only, 1: f0(b)=0 only, 2: both, 0: none (default). |
baseline |
the working baseline, "Control" or "Case", if |
controls |
Object of class |
progress |
logical: should a text progressbar be displayed |
message |
logical: should warning messages be displayed |
Suppose that ("control") and y
("case") are independent samples from
f0 and f1 which satisfy f1(x)=f0(x)exp[alpha0+alpha'r(x)]
with r(x)=(r1(x),...,r_d(x)). Maximum
approximate Bernstein/Beta likelihood estimates of f0 and f1 are calculated
with a given regression coefficients which are efficient estimates provided
by other semiparametric methods such as logistic regression.
If support is (a,b) then replace r(x) by r[a+(b-a)x].
For a fixed m
, using the Bernstein polynomial model for baseline ,
MABLEs of
and parameters alpha can be estimated by EM algorithm and Newton
iteration. If estimated lower bound
for
m
based on y
is smaller that that based on x
, then switch x
and y
and
is used as baseline. If
M=m
or m0=m1=m
, then m
is a
preselected degree. If m0<m1
it specifies the set of consective
candidate model degrees m0:m1
for searching an optimal degree by
the change-point method, where m1-m0>3
.
A list with components
m
the given or a selected degree by method of change-point
p
the estimated vector of mixture proportions
with the given or selected degree
m
alpha
the given regression coefficients
mloglik
the maximum log-likelihood at degree m
interval
support/truncation interval (a,b)
baseline
="control" if is used as baseline,
or ="case" if
is used as baseline.
M
the vector (m0, m1)
, where m1
, if greater than m0
, is the
largest candidate when the search stoped
lk
log-likelihoods evaluated at
lr
likelihood ratios for change-points evaluated at
pval
the p-values of the change-point tests for choosing optimal model degree
chpts
the change-points chosen with the given candidate model degrees
Zhong Guan <[email protected]>
Guan, Z., Maximum Approximate Bernstein Likelihood Estimation of Densities in a Two-sample Semiparametric Model
Select optimal degree of Bernstein polynomial model for grouped data with a given regression coefficients.
maple.dr.group( t, n0, n1, M, regr, ..., interval = c(0, 1), alpha = NULL, vb = 0, controls = mable.ctrl(), progress = TRUE, message = TRUE )
maple.dr.group( t, n0, n1, M, regr, ..., interval = c(0, 1), alpha = NULL, vb = 0, controls = mable.ctrl(), progress = TRUE, message = TRUE )
t |
cutpoints of class intervals |
n0 , n1
|
frequencies of two sample data grouped by the classes
specified by |
M |
a positive integer or a vector |
regr |
regressor vector function |
... |
additional arguments to be passed to regr |
interval |
a vector |
alpha |
a given regression coefficient, missing value is imputed by logistic regression |
vb |
code for vanishing boundary constraints, -1: f0(a)=0 only, 1: f0(b)=0 only, 2: both, 0: none (default). |
controls |
Object of class |
progress |
logical: should a text progressbar be displayed |
message |
logical: should warning messages be displayed |
Suppose that n0
("control") and n1
("case") are frequencies of
independent samples grouped by the classes t
from f0 and f1 which
satisfy f1(x)=f0(x)exp[alpha0+alpha'r(x)] with r(x)=(r1(x),...,r_d(x)). Maximum
approximate Bernstein/Beta likelihood estimates of f0 and f1 are calculated
with a given regression coefficients which are efficient estimates provided
by other semiparametric methods such as logistic regression.
If support is (a,b) then replace r(x) by r[a+(b-a)x].
For a fixed m
, using the Bernstein polynomial model for baseline ,
MABLEs of
and parameters alpha can be estimated by EM algorithm and Newton
iteration. If estimated lower bound
for
m
based on n1
is smaller that that based on n0
, then switch n0
and n1
and
use as baseline. If
M=m
or m0=m1=m
, then m
is a
preselected degree. If m0<m1
it specifies the set of consective
candidate model degrees m0:m1
for searching an optimal degree by
the change-point method, where m1-m0>3
.
A list with components
m
the given or a selected degree by method of change-point
p
the estimated vector of mixture proportions
with the given or selected degree
m
alpha
the given regression coefficients
mloglik
the maximum log-likelihood at degree m
interval
support/truncation interval (a,b)
baseline
="control" if is used as baseline,
or ="case" if
is used as baseline.
M
the vector (m0, m1)
, where m1
, if greater than m0
, is the
largest candidate when the search stoped
lk
log-likelihoods evaluated at
lr
likelihood ratios for change-points evaluated at
pval
the p-values of the change-point tests for choosing optimal model degree
chpts
the change-points chosen with the given candidate model degrees
Zhong Guan <[email protected]>
Guan, Z., Application of Bernstein Polynomial Model to Density and ROC Estimation in a Semiparametric Density Ratio Model
Maximum approximate profile likelihood estimation of Bernstein polynomial model in Cox's proportional hazards regression based on interal censored event time data with given regression coefficients which are efficient estimates provided by other semiparametric methods.
maple.ph( formula, data, M, g, pi0 = NULL, p = NULL, tau = Inf, x0 = NULL, controls = mable.ctrl(), progress = TRUE )
maple.ph( formula, data, M, g, pi0 = NULL, p = NULL, tau = Inf, x0 = NULL, controls = mable.ctrl(), progress = TRUE )
formula |
regression formula. Response must be |
data |
a data frame containing variables in |
M |
a positive integer or a vector |
g |
the given |
pi0 |
Initial guess of |
p |
an initial coefficients of Bernstein polynomial model of degree |
tau |
right endpoint of support |
x0 |
a data frame specifying working baseline covariates on the right-hand-side of |
controls |
Object of class |
progress |
if |
Consider Cox's PH model with covariate for interval-censored failure time data:
, where
satisfies
,
where
and
may
contain dummy variables and interaction terms. The working baseline
x0
in arguments
contains only the values of terms excluding dummy variables and interaction terms
in the right-hand-side of formula
. Thus g
is the initial guess of
the coefficients of
and could be longer than
x0
.
Let and
be the density and cumulative distribution
functions of the event time given
, respectively.
Then
on
can be approximated by
,
where
,
,
,
is the beta denity with shapes
and
, and
is the largest observed time, either uncensored time, or right endpoint of interval/left censored,
or left endpoint of right censored time. So we can approximate
on
by
, where
,
, is the beta survival function with shapes
and
,
,
, and
. For data without right-censored time,
Response variable should be of the form cbind(l, u)
, where (l, u)
is the interval
containing the event time. Data is uncensored if l = u
, right censored
if u = Inf
or u = NA
, and left censored data if l = 0
.
The associated covariate contains columns. The baseline
x0
should chosen so that
is nonnegative for all the observed
.
The search for optimal degree m
stops if either m1
is reached or the test
for change-point results in a p-value pval
smaller than sig.level
.
a class 'mable_reg
' object, a list with components
M
the vector (m0, m1)
, where m1
is the last candidate degree when the search stoped
m
the selected optimal degree m
p
the estimate of , the coefficients of Bernstein polynomial of degree
m
coefficients
the given regression coefficients of the PH model
mloglik
the maximum log-likelihood at an optimal degree m
lk
log-likelihoods evaluated at
lr
likelihood ratios for change-points evaluated at
tau.n
maximum observed time
tau
right endpoint of support
x0
the working baseline covariates
egx0
the value of
convergence
an integer code. 0 indicates successful completion(the iteration is
convergent). 1 indicates that the maximum candidate degree had been reached in the calculation;
delta
the final convergence criterion for EM iteration;
chpts
the change-points among the candidate degrees;
pom
the p-value of the selected optimal degree m
as a change-point;
Zhong Guan <[email protected]>
Guan, Z. (2019) Maximum Approximate Bernstein Likelihood Estimation in Proportional Hazard Model for Interval-Censored Data, arXiv:1906.08882 .
## Simulated Weibull data require(icenReg) set.seed(123) simdata<-simIC_weib(70, inspections = 5, inspectLength = 1) sp<-ic_sp(cbind(l, u) ~ x1 + x2, data = simdata) res0<-maple.ph(cbind(l, u) ~ x1 + x2, data = simdata, M=c(2,20), g=sp$coefficients, tau=7) op<-par(mfrow=c(1,2)) plot(res0, which=c("likelihood","change-point")) par(op) res1<-mable.ph(cbind(l, u) ~ x1 + x2, data = simdata, M=res0$m, g=c(.5,-.5), tau=7) op<-par(mfrow=c(1,2)) plot(res1, y=data.frame(x1=0, x2=0), which="density", add=FALSE, type="l", xlab="Time", main="Desnity Function") lines(xx<-seq(0, 7, len=512), dweibull(xx, 2,2), lty=2, col=2) legend("topright", bty="n", lty=1:2, col=1:2, c("Estimated","True")) plot(res1, y=data.frame(x1=0, x2=0), which="survival", add=FALSE, type="l", xlab="Time", main="Survival Function") lines(xx, 1-pweibull(xx, 2, 2), lty=2, col=2) legend("topright", bty="n", lty=1:2, col=1:2, c("Estimated","True")) par(op)
## Simulated Weibull data require(icenReg) set.seed(123) simdata<-simIC_weib(70, inspections = 5, inspectLength = 1) sp<-ic_sp(cbind(l, u) ~ x1 + x2, data = simdata) res0<-maple.ph(cbind(l, u) ~ x1 + x2, data = simdata, M=c(2,20), g=sp$coefficients, tau=7) op<-par(mfrow=c(1,2)) plot(res0, which=c("likelihood","change-point")) par(op) res1<-mable.ph(cbind(l, u) ~ x1 + x2, data = simdata, M=res0$m, g=c(.5,-.5), tau=7) op<-par(mfrow=c(1,2)) plot(res1, y=data.frame(x1=0, x2=0), which="density", add=FALSE, type="l", xlab="Time", main="Desnity Function") lines(xx<-seq(0, 7, len=512), dweibull(xx, 2,2), lty=2, col=2) legend("topright", bty="n", lty=1:2, col=1:2, c("Estimated","True")) plot(res1, y=data.frame(x1=0, x2=0), which="survival", add=FALSE, type="l", xlab="Time", main="Survival Function") lines(xx, 1-pweibull(xx, 2, 2), lty=2, col=2) legend("topright", bty="n", lty=1:2, col=1:2, c("Estimated","True")) par(op)
Maximum approximate profile likelihood estimation of Bernstein polynomial model in proportional odds rate regression based on interal censored event time data with given regression coefficients and select an optimal degree m if coefficients are efficient estimates provided by other semiparametric methods.
maple.po( formula, data, M, g, tau, x0 = NULL, controls = mable.ctrl(), progress = TRUE )
maple.po( formula, data, M, g, tau, x0 = NULL, controls = mable.ctrl(), progress = TRUE )
formula |
regression formula. Response must be |
data |
a data frame containing variables in |
M |
a positive integer or a vector |
g |
the given |
tau |
right endpoint of support |
x0 |
a data frame specifying working baseline covariates on the right-hand-side of |
controls |
Object of class |
progress |
if |
Consider Generalized PO model with covariate for interval-censored failure time data:
, where
satisfies
, where
and
may
contain dummy variables and interaction terms. The working baseline
x0
in arguments
contains only the values of terms excluding dummy variables and interaction terms
in the right-hand-side of formula
. Thus g
is the initial guess of
the coefficients of
and could be longer than
x0
.
Let and
be the density and cumulative distribution
functions of the event time given
, respectively.
Then
on
can be approximated by
,
where
,
,
,
is the beta denity with shapes
and
,
and
is the right endpoint of support interval. So we can approximate
on
by
, where
,
, is the beta survival function
with shapes
and
.
Response variable should be of the form cbind(l, u)
, where
(l, u)
is the interval containing the event time. Data are
uncensored if l = u
, right censored if u = Inf
or
u = NA
, and left censored data if l = 0
. The associated
covariate contains columns. The baseline
x0
should chosen
so that is nonnegative for all the observed
.
The search for optimal degree m
stops if either m1
is reached or the test for change-point results in a p-value pval
smaller than sig.level
.
a class 'mable_reg
' object, a list with components
M
the vector (m0, m1)
, where m1
is the last
candidate degree when the search stoped
m
the selected optimal degree m
p
the estimate of ,
the coefficients of Bernstein polynomial of degree
m
coefficients
the given regression coefficients of the PH model
mloglik
the maximum log-likelihood at an optimal degree m
lk
log-likelihoods evaluated at
lr
likelihood ratios for change-points evaluated at
tau.n
maximum observed time
tau
right endpoint of support
x0
the working baseline covariates
egx0
the value of
convergence
an integer code, 0 indicates successful
completion(the iteration is convergent), 1 indicates that
the maximum candidate degree had been reached in the calculation;
delta
the final convergence criterion for EM iteration;
chpts
the change-points among the candidate degrees;
pom
the p-value of the selected optimal degree m
as a change-point;
Zhong Guan <[email protected]>
Guan, Z. et al. (???) Maximum Approximate Bernstein Likelihood Estimation in Generalized Proportional Odds Regression Model for Interval-Censored Data
## Simulated Weibull data require(icenReg) set.seed(111) simdata<-simIC_weib(100, model = "po", inspections = 2, inspectLength = 2.5, prob_cen=1) sp<-ic_sp(cbind(l, u) ~ x1 + x2, data = simdata, model="po") gt<--sp$coefficients res0<-maple.po(cbind(l, u) ~ x1 + x2, data = simdata, M=c(1,20), g=gt, tau=6) op<-par(mfrow=c(1,2)) plot(res0, which=c("likelihood","change-point")) par(op) res1<-mable.po(cbind(l, u) ~ x1 + x2, data = simdata, M=c(1,20), g=gt, tau=6, x0=data.frame(x1=max(simdata$x1),x2=-1)) op<-par(mfrow=c(2,2)) plot(res1, which=c("likelihood","change-point")) plot(res0, y=data.frame(x1=0,x2=0), which="density", add=FALSE, type="l", xlab="Time", main="Desnity Function") plot(res1, y=data.frame(x1=0,x2=0), which="density", add=TRUE, lty=2, col=4) lines(xx<-seq(0, 7, len=512), dweibull(xx, 2,2), lty=3, col=2, lwd=1.5) legend("topright", bty="n", lty=1:3, col=c(1,4,2), c(expression(hat(f)[0]), expression(tilde(f)[0]), expression(f[0]))) plot(res0, y=data.frame(x1=0,x2=0), which="survival", add=FALSE, type="l", xlab="Time", main="Survival Function") plot(res1, y=data.frame(x1=0,x2=0), which="survival", add=TRUE, lty=2, col=4) lines(xx, 1-pweibull(xx, 2, 2), lty=2, col=2) legend("topright", bty="n", lty=1:3, col=c(1,4,2), c(expression(hat(S)[0]), expression(tilde(S)[0]), expression(S[0]))) par(op)
## Simulated Weibull data require(icenReg) set.seed(111) simdata<-simIC_weib(100, model = "po", inspections = 2, inspectLength = 2.5, prob_cen=1) sp<-ic_sp(cbind(l, u) ~ x1 + x2, data = simdata, model="po") gt<--sp$coefficients res0<-maple.po(cbind(l, u) ~ x1 + x2, data = simdata, M=c(1,20), g=gt, tau=6) op<-par(mfrow=c(1,2)) plot(res0, which=c("likelihood","change-point")) par(op) res1<-mable.po(cbind(l, u) ~ x1 + x2, data = simdata, M=c(1,20), g=gt, tau=6, x0=data.frame(x1=max(simdata$x1),x2=-1)) op<-par(mfrow=c(2,2)) plot(res1, which=c("likelihood","change-point")) plot(res0, y=data.frame(x1=0,x2=0), which="density", add=FALSE, type="l", xlab="Time", main="Desnity Function") plot(res1, y=data.frame(x1=0,x2=0), which="density", add=TRUE, lty=2, col=4) lines(xx<-seq(0, 7, len=512), dweibull(xx, 2,2), lty=3, col=2, lwd=1.5) legend("topright", bty="n", lty=1:3, col=c(1,4,2), c(expression(hat(f)[0]), expression(tilde(f)[0]), expression(f[0]))) plot(res0, y=data.frame(x1=0,x2=0), which="survival", add=FALSE, type="l", xlab="Time", main="Survival Function") plot(res1, y=data.frame(x1=0,x2=0), which="survival", add=TRUE, lty=2, col=4) lines(xx, 1-pweibull(xx, 2, 2), lty=2, col=2) legend("topright", bty="n", lty=1:3, col=c(1,4,2), c(expression(hat(S)[0]), expression(tilde(S)[0]), expression(S[0]))) par(op)
The mixing proportions of marginal distribution from the mixture of multivariate beta distribution
marginal.p(p, m)
marginal.p(p, m)
p |
the mixing proportions of the mixture of multivariate beta distribution |
m |
the model degrees |
a list of mixing proportions of all the marginal distributions
Multivariate empirical cumulative distribution evaluated at sample data
mvecdf(x)
mvecdf(x)
x |
an |
a vector of n
values
Component Beta cumulative distribution functions of the Bernstein polynomial model
mvpbeta(x, m)
mvpbeta(x, m)
x |
|
m |
vector of d nonnegative integers |
an n x K
matrix, K=(m[1]+1)...(m[d]+1)
.
Choose an optimal degree using gamma change-point model with two changing shape and scale parameters.
optim.gcp(obj)
optim.gcp(obj)
obj |
a class "mable" or 'mable_reg' object containig a vector |
a list with components
m
the selected optimal degree m
M
the vector (m0, m1)
, where m1
is the last candidate when the search stoped
mloglik
the maximum log-likelihood at degree m
interval
support/truncation interval (a, b)
lk
log-likelihoods evaluated at
lr
likelihood ratios for change-points evaluated at
pval
the p-values of the change-point tests for choosing optimal model degree
chpts
the change-points chosen with the given candidate model degrees
# simulated data p<-c(1:5,5:1) p<-p/sum(p) x<-rmixbeta(100, p) res1<-mable(x, M=c(2, 50), IC="none") m1<-res1$m[1] res2<-optim.gcp(res1) m2<-res2$m op<-par(mfrow=c(1,2)) plot(res1, which="likelihood", add=FALSE) plot(res2, which="likelihood") #segments(m2, min(res1$lk), m2, res2$mloglik, col=4) plot(res1, which="change-point", add=FALSE) plot(res2, which="change-point") par(op)
# simulated data p<-c(1:5,5:1) p<-p/sum(p) x<-rmixbeta(100, p) res1<-mable(x, M=c(2, 50), IC="none") m1<-res1$m[1] res2<-optim.gcp(res1) m2<-res2$m op<-par(mfrow=c(1,2)) plot(res1, which="likelihood", add=FALSE) plot(res2, which="likelihood") #segments(m2, min(res1$lk), m2, res2$mloglik, col=4) plot(res1, which="change-point", add=FALSE) plot(res2, which="change-point") par(op)
Maximum Approximate Bernstein/Beta Likelihood Estimation with an optimal model degree estimated by the Method of Moment
optimable( x, interval, m = NULL, mu = NULL, lam = NULL, modes = NULL, nmod = 1, ushaped = FALSE, maxit = 50L )
optimable( x, interval, m = NULL, mu = NULL, lam = NULL, modes = NULL, nmod = 1, ushaped = FALSE, maxit = 50L )
x |
a univariate sample data in |
interval |
a closed interval |
m |
initial degree, default is 2 times the number of modes |
mu |
a vector of component means of multimodal mixture density, default is NULL for unimodal or unknown |
lam |
a vector of mixture proportions of same length of |
modes |
a vector of the locations of modes, if it is NULL (default) and
|
nmod |
the number of modes, if |
ushaped |
logical, whether or not the density is clearly U-shaped including J- and L-shaped with mode occurs at the endpoint of the support. |
maxit |
maximum iterations |
If the data show a clear uni- or multi-modal distribution, then give
the value of nmod
as the number of modes. Otherwise nmod
=0.
The degree is estimated by the iterative method of moment with an initial
degree estimated by the method of mode. For multimodal density,
if useful estimates of the component means mu
and proportions
lam
are available then they can be used to give an initial degree.
If the distribution is clearly U-, J-, or L-shaped, i.e., the mode occurs
at the endpoint of interval
, then set ushaped
=TRUE.
In this case the degree is estimated by the method of mode.
A class "mable" object with components
m
the given or a selected degree by method of change-point
p
the estimated vector of mixture proportions
with the selected/given optimal degree
m
mloglik
the maximum log-likelihood at degree m
interval
support/truncation interval (a,b)
convergence
An integer code. 0 indicates successful completion
(all the EM iterations are convergent and an optimal degree
is successfully selected in M
). Possible error codes are
1, indicates that the iteration limit maxit
had been
reached in at least one EM iteration;
2, the search did not finish before m1
.
delta
the convergence criterion delta
value
Zhong Guan <[email protected]>
## Old Faithful Data x<-faithful x1<-faithful[,1] x2<-faithful[,2] a<-c(0, 40); b<-c(7, 110) mu<-(apply(x,2,mean)-a)/(b-a) s2<-apply(x,2,var)/(b-a)^2 # mixing proportions lambda<-c(mean(x1<3),mean(x2<65)) # guess component mean mu1<-(c(mean(x1[x1<3]), mean(x2[x2<65]))-a)/(b-a) mu2<-(c(mean(x1[x1>=3]), mean(x2[x2>=65]))-a)/(b-a) # estimate lower bound for m mb<-ceiling((mu*(1-mu)-s2)/(s2-lambda*(1-lambda)*(mu1-mu2)^2)-2) mb m1<-optimable(x1, interval=c(a[1],b[1]), nmod=2, modes=c(2,4.5))$m m2<-optimable(x2, interval=c(a[2],b[2]), nmod=2, modes=c(52.5,80))$m m1;m2 erupt1<-mable(x1, M=mb[1], interval=c(a[1],b[1])) erupt2<-mable(x1, M=m1, interval=c(a[1],b[1])) wait1<-mable(x2, M=mb[2],interval=c(a[2],b[2])) wait2<-mable(x2, M=m2,interval=c(a[2],b[2])) ans1<- mable.mvar(faithful, M = mb, search =FALSE, method="em", interval = cbind(a,b)) ans2<- mable.mvar(faithful, M = c(m1,m2), search =FALSE, method="em", interval = cbind(a,b)) op<-par(mfrow=c(1,2), cex=0.8) hist(x1, probability = TRUE, col="grey", border="white", main="", xlab="Eruptions", ylim=c(0,.65), las=1) plot(erupt1, add=TRUE,"density") plot(erupt2, add=TRUE,"density",lty=2,col=2) legend("topleft", lty=c(1,2),col=1:2, bty="n", cex=.7, c(expression(paste("m = ", m[b])),expression(paste("m = ", hat(m))))) hist(x2, probability = TRUE, col="grey", border="white", main="", xlab="Waiting", las=1) plot(wait1, add=TRUE,"density") plot(wait2, add=TRUE,"density",lty=2,col=2) legend("topleft", lty=c(1,2),col=1:2, bty="n", cex=.7, c(expression(paste("m = ", m[b])),expression(paste("m = ", hat(m))))) par(op) op<-par(mfrow=c(1,2), cex=0.7) plot(ans1, which="density", contour=TRUE) plot(ans2, which="density", contour=TRUE, add=TRUE, lty=2, col=2) plot(ans1, which="cumulative", contour=TRUE) plot(ans2, which="cumulative", contour=TRUE, add=TRUE, lty=2, col=2) par(op)
## Old Faithful Data x<-faithful x1<-faithful[,1] x2<-faithful[,2] a<-c(0, 40); b<-c(7, 110) mu<-(apply(x,2,mean)-a)/(b-a) s2<-apply(x,2,var)/(b-a)^2 # mixing proportions lambda<-c(mean(x1<3),mean(x2<65)) # guess component mean mu1<-(c(mean(x1[x1<3]), mean(x2[x2<65]))-a)/(b-a) mu2<-(c(mean(x1[x1>=3]), mean(x2[x2>=65]))-a)/(b-a) # estimate lower bound for m mb<-ceiling((mu*(1-mu)-s2)/(s2-lambda*(1-lambda)*(mu1-mu2)^2)-2) mb m1<-optimable(x1, interval=c(a[1],b[1]), nmod=2, modes=c(2,4.5))$m m2<-optimable(x2, interval=c(a[2],b[2]), nmod=2, modes=c(52.5,80))$m m1;m2 erupt1<-mable(x1, M=mb[1], interval=c(a[1],b[1])) erupt2<-mable(x1, M=m1, interval=c(a[1],b[1])) wait1<-mable(x2, M=mb[2],interval=c(a[2],b[2])) wait2<-mable(x2, M=m2,interval=c(a[2],b[2])) ans1<- mable.mvar(faithful, M = mb, search =FALSE, method="em", interval = cbind(a,b)) ans2<- mable.mvar(faithful, M = c(m1,m2), search =FALSE, method="em", interval = cbind(a,b)) op<-par(mfrow=c(1,2), cex=0.8) hist(x1, probability = TRUE, col="grey", border="white", main="", xlab="Eruptions", ylim=c(0,.65), las=1) plot(erupt1, add=TRUE,"density") plot(erupt2, add=TRUE,"density",lty=2,col=2) legend("topleft", lty=c(1,2),col=1:2, bty="n", cex=.7, c(expression(paste("m = ", m[b])),expression(paste("m = ", hat(m))))) hist(x2, probability = TRUE, col="grey", border="white", main="", xlab="Waiting", las=1) plot(wait1, add=TRUE,"density") plot(wait2, add=TRUE,"density",lty=2,col=2) legend("topleft", lty=c(1,2),col=1:2, bty="n", cex=.7, c(expression(paste("m = ", m[b])),expression(paste("m = ", hat(m))))) par(op) op<-par(mfrow=c(1,2), cex=0.7) plot(ans1, which="density", contour=TRUE) plot(ans2, which="density", contour=TRUE, add=TRUE, lty=2, col=2) plot(ans1, which="cumulative", contour=TRUE) plot(ans2, which="cumulative", contour=TRUE, add=TRUE, lty=2, col=2) par(op)
Contain sera measurements from 51 control patients with pancreatitis and 90 case patients with pancreatic cancer at the Mayo Clinic with a cancer antigen, CA125, and with a carbohydrate antigen, CA19-9 (Wieand, et al, 1989)
data(pancreas)
data(pancreas)
A data frame with 141 rows and 3 variables.
ca199. CA19-9 levels
ca125. CA125 levels
status. 0 = controls (non-cancer) and 1 = cases (cancer).
Wieand, S., Gail, M. H., James, B. R., and James, K.L. (1989). A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika, 76, 585–592.
Wieand, S., Gail, M. H., James, B. R., and James, K.L. (1989). A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika, 76, 585–592.
data(pancreas)
data(pancreas)
Plot mathod for class 'mable'
## S3 method for class 'mable' plot( x, which = c("density", "cumulative", "survival", "likelihood", "change-point", "all"), add = FALSE, contour = FALSE, lgd.x = NULL, lgd.y = NULL, nx = 512, ... )
## S3 method for class 'mable' plot( x, which = c("density", "cumulative", "survival", "likelihood", "change-point", "all"), add = FALSE, contour = FALSE, lgd.x = NULL, lgd.y = NULL, nx = 512, ... )
x |
Class "mable" object return by |
which |
indicates which graphs to plot, options are
"density", "cumulative", "likelihood", "change-point", "all". If not "all",
|
add |
logical add to an existing plot or not |
contour |
logical plot contour or not for two-dimensional data |
lgd.x , lgd.y
|
coordinates of position where the legend is displayed |
nx |
number of evaluations of density, or cumulative distribution curve to be plotted. |
... |
additional arguments to be passed to the base plot function |
The data used for 'plot()', 'lines()', or 'persp()' are returned invisibly.
Plot mathod for class 'mable_reg'
## S3 method for class 'mable_reg' plot( x, y, newdata = NULL, ntime = 512, xlab = "Time", which = c("survival", "likelihood", "change-point", "density", "all"), add = FALSE, ... )
## S3 method for class 'mable_reg' plot( x, y, newdata = NULL, ntime = 512, xlab = "Time", which = c("survival", "likelihood", "change-point", "density", "all"), add = FALSE, ... )
x |
a class 'mable_reg' object return by functions such as |
y |
a new data.frame of covariate value(s) as row(s), whose columns are
arranged in the same order as in the |
newdata |
a new data.frame (ignored if |
ntime |
number of evaluations of density, survival or cumulative distribution curve to be plotted. |
xlab |
x-axis label |
which |
indicates which graphs to plot, options are
"survival", "likelihood", "change-point", "density", or "all". If not "all",
|
add |
logical add to an existing plot or not |
... |
additional arguments to be passed to the base plot function |
Zhong Guan <[email protected]>
Bootstrap estimates of standard errors for the regression coefficients which are estimated by maximum approximate Bernstein/Beta likelihood estimation method in a density ratio model based on two-sample raw data.
se.coef.dr( obj, grouped = FALSE, B = 500L, parallel = FALSE, ncore = NULL, controls = mable.ctrl() )
se.coef.dr( obj, grouped = FALSE, B = 500L, parallel = FALSE, ncore = NULL, controls = mable.ctrl() )
obj |
Class 'mable_dr' object return by |
grouped |
logical: are data grouped or not. |
B |
number of bootstrap runs. |
parallel |
logical: do parallel or not. |
ncore |
number of cores used for parallel computing. Default is half of availables. |
controls |
Object of class |
Bootstrap method is used based on bootstrap samples generated from
the MABLE's of the densities f0 and f1. The bootstrap samples are fitted by
the Bernstein polynomial model and the glm()
to obtain bootstrap
versions of coefficient estimates.
the estimated standard errors
Produces a summary of a mable fit.
## S3 method for class 'mable' summary(object, ...) ## S3 method for class 'mable_reg' summary(object, ...)
## S3 method for class 'mable' summary(object, ...) ## S3 method for class 'mable_reg' summary(object, ...)
object |
Class "mable" or 'mable_reg' object return by |
... |
for future methods |
Invisibly returns its argument, object
.
## Breast Cosmesis Data aft.res<-mable.aft(cbind(left, right)~treat, data=cosmesis, M=c(1, 30), g=.41, tau=100, x0=data.frame(treat="RCT")) summary(aft.res)
## Breast Cosmesis Data aft.res<-mable.aft(cbind(left, right)~treat, data=cosmesis, M=c(1, 30), g=.41, tau=100, x0=data.frame(treat="RCT")) summary(aft.res)
Matrix of the uniform marginal constraints
umc.mat(m)
umc.mat(m)
m |
vector of d nonnegative integers |
the matrix of the uniform marginal constraints is used to form the
linear equality constraints on parameter
:
.
an |m| x K
matrix, |m|=m[1]+...+m[d])
, K=(m[1]+1)...(m[d]+1)
.
The annual flow data of Vaal River at Standerton as given by Table 1.1 of Linhart and Zucchini (1986) give the flow in millions of cubic metres.
data(Vaal.Flow)
data(Vaal.Flow)
The format is: int [1:65] 222 1094 452 1298 882 988 276 216 103 490 ...
Linhart, H., and Zucchini, W., Model Selection, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, New York: John Wiley and Sons Inc, 1986.
data(Vaal.Flow)
data(Vaal.Flow)
Maximum likelihood estimation in generalized proportional odds rate regression model with Weibull baseline based on interal censored event time data
weib.gpo( formula, data, g, scale, shape, eta = 1, eta.known = TRUE, controls = mable.ctrl(), progress = TRUE )
weib.gpo( formula, data, g, scale, shape, eta = 1, eta.known = TRUE, controls = mable.ctrl(), progress = TRUE )
formula |
regression formula. Response must be |
data |
a dataset |
g |
initial |
scale |
initial guess of the scale parameter for Weibull baseline |
shape |
initial guess of the shape parameter for Weibull baseline |
eta |
the given positive value of |
eta.known |
logical. If |
controls |
Object of class |
progress |
if |
???
a class 'mable_reg
' object, a list with components
convergence
an integer code, 0 indicates successful
completion(the iteration is convergent), 1 indicates that
the maximum iteration had been reached in the calculation;
delta
the final convergence criterion for Newton iteration;
## Simulated Weibull data require(icenReg) set.seed(111) simdata<-simIC_weib(100, model = "po", inspections = 2, inspectLength = 2.5, prob_cen=1) sp<-ic_sp(cbind(l, u) ~ x1 + x2, data = simdata, model="po") gt<--sp$coefficients res0<-maple.po(cbind(l, u) ~ x1 + x2, data = simdata, M=c(1,20), g=gt, tau=6) op<-par(mfrow=c(1,2)) plot(res0, which=c("likelihood","change-point")) par(op) res1<-mable.po(cbind(l, u) ~ x1 + x2, data = simdata, M=c(1,20), g=gt, tau=6, x0=data.frame(x1=max(simdata$x1),x2=-1)) res2<-weib.gpo(cbind(l, u) ~ x1 + x2, data = simdata, g=gt, scale=2, shape=2) op<-par(mfrow=c(2,2)) plot(res1, which=c("likelihood","change-point")) plot(res0, y=data.frame(x1=0,x2=0), which="density", add=FALSE, type="l", xlab="Time", main="Desnity Function") plot(res1, y=data.frame(x1=0,x2=0), which="density", add=TRUE, lty=2, col=4) lines(xx<-seq(0, 7, len=512), dweibull(xx, 2,2), lty=3, col=2, lwd=1.5) lines(xx, dweibull(xx, res2$shape, res2$scale), lty=5, col=5, lwd=1.5) legend("topright", bty="n", lty=1:3, col=c(1,4,2), c(expression(hat(f)[0]), expression(tilde(f)[0]), expression(f[0]))) plot(res0, y=data.frame(x1=0,x2=0), which="survival", add=FALSE, type="l", xlab="Time", main="Survival Function") plot(res1, y=data.frame(x1=0,x2=0), which="survival", add=TRUE, lty=2, col=4) lines(xx, 1-pweibull(xx, 2, 2), lty=2, col=2) lines(xx, 1-pweibull(xx, res2$shape, res2$scale), lty=5, col=5, lwd=1.5) legend("topright", bty="n", lty=1:3, col=c(1,4,2), c(expression(hat(S)[0]), expression(tilde(S)[0]), expression(S[0]))) par(op)
## Simulated Weibull data require(icenReg) set.seed(111) simdata<-simIC_weib(100, model = "po", inspections = 2, inspectLength = 2.5, prob_cen=1) sp<-ic_sp(cbind(l, u) ~ x1 + x2, data = simdata, model="po") gt<--sp$coefficients res0<-maple.po(cbind(l, u) ~ x1 + x2, data = simdata, M=c(1,20), g=gt, tau=6) op<-par(mfrow=c(1,2)) plot(res0, which=c("likelihood","change-point")) par(op) res1<-mable.po(cbind(l, u) ~ x1 + x2, data = simdata, M=c(1,20), g=gt, tau=6, x0=data.frame(x1=max(simdata$x1),x2=-1)) res2<-weib.gpo(cbind(l, u) ~ x1 + x2, data = simdata, g=gt, scale=2, shape=2) op<-par(mfrow=c(2,2)) plot(res1, which=c("likelihood","change-point")) plot(res0, y=data.frame(x1=0,x2=0), which="density", add=FALSE, type="l", xlab="Time", main="Desnity Function") plot(res1, y=data.frame(x1=0,x2=0), which="density", add=TRUE, lty=2, col=4) lines(xx<-seq(0, 7, len=512), dweibull(xx, 2,2), lty=3, col=2, lwd=1.5) lines(xx, dweibull(xx, res2$shape, res2$scale), lty=5, col=5, lwd=1.5) legend("topright", bty="n", lty=1:3, col=c(1,4,2), c(expression(hat(f)[0]), expression(tilde(f)[0]), expression(f[0]))) plot(res0, y=data.frame(x1=0,x2=0), which="survival", add=FALSE, type="l", xlab="Time", main="Survival Function") plot(res1, y=data.frame(x1=0,x2=0), which="survival", add=TRUE, lty=2, col=4) lines(xx, 1-pweibull(xx, 2, 2), lty=2, col=2) lines(xx, 1-pweibull(xx, res2$shape, res2$scale), lty=5, col=5, lwd=1.5) legend("topright", bty="n", lty=1:3, col=c(1,4,2), c(expression(hat(S)[0]), expression(tilde(S)[0]), expression(S[0]))) par(op)