Title: Nonparametric Hazard Rate Estimation
Description: Provides functions and examples for histogram, kernel (classical, variable bandwidth and transformation based), discrete and semiparametric hazard rate estimators.
Authors: Dimitrios Bagkavos [aut, cre]
Maintainer: Dimitrios Bagkavos <[email protected]>
License: GPL (>= 2)
Version: 0.1
Built: 2024-11-20 06:28:46 UTC
Source: CRAN
Implements the cross validation function for determining the optimal number of bins for the histogram hazard rate estimator of Patil and Bagkavos (2012). It is used as input to HazardHistogram.
cvfunction(h, xin, xout, cens)
h: Target number of bins.
xin: A vector of data points. Missing values not allowed.
xout: A vector of grid points at which the histogram will be calculated.
cens: A vector of 1s and 0s: 1's indicate uncensored observations, 0's correspond to censored observations.
The least squares cross validation criterion is defined in (12) of Patil and Bagkavos (2012). Optimization of the criterion is carried out through a nonlinear optimization function such as nlminb, as illustrated in the example of HazardHistogram and in the sketch below.
Returns the optimal number of bins.
Patil and Bagkavos (2012), Histogram for hazard rate estimation, Sankhya B, pp. 286-301.
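A minimal usage sketch, adapted from the HazardHistogram example further below; the simulated Weibull data and the optimizer settings are illustrative choices rather than package defaults:

SampleSize <- 400
ti <- rweibull(SampleSize, 0.5, 0.8)            # simulated lifetimes
xout <- seq(0.02, 3.5, length = 80)             # grid points for the histogram
cen <- rep.int(1, SampleSize)                   # censoring indicators
cen[sample(1:SampleSize, SampleSize/10)] <- 0   # mark roughly 10% as censored
band <- nlminb(start = 2, obj = cvfunction,     # minimize the CV criterion over h
               control = list(iter.max = 100, x.tol = .001),
               xin = ti, xout = xout, cens = cen, lower = .01, upper = max(xout))
unlist(band[1])                                 # the selected number of bins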
Implements an adaptive variable bandwidth rule for use with the VarBandHazEst hazard rate estimator, based on the Weibull distribution with parameters estimated by maximum likelihood.
DefVarBandRule(xin, cens)
xin: A vector of data points. Missing values not allowed.
cens: A vector of censoring indicators: 1's indicate uncensored observations, 0's correspond to censored observations.
The adaptive AMISE optimal bandwidth for the variable bandwidth hazard rate estimator VarBandHazEst is computed under a Weibull reference model with parameters estimated by maximum likelihood; see Bagkavos and Patil (2009) for the exact expression.
Returns the value of the adaptive bandwidth.
HazardRateEst, TransHazRateEst, PlugInBand
library(survival)
x <- seq(0, 5, length=100)         # design points where the estimate will be calculated
SampleSize <- 100
ti <- rweibull(SampleSize, .6, 1)  # draw a random sample from the actual distribution
ui <- rexp(SampleSize, .05)        # draw a random sample from the censoring distribution
cat("\n AMOUNT OF CENSORING: ", length(which(ti>ui))/length(ti)*100, "\n")
x1 <- pmin(ti, ui)                 # this is the observed sample
cen <- rep.int(1, SampleSize)      # censoring indicators
cen[which(ti>ui)] <- 0             # censored values correspond to zero
h2 <- DefVarBandRule(ti, cen)      # Default bandwidth rule - Weibull reference
Defines equispaced disjoint intervals based on the range of the sample and calculates empirical hazard rate estimates at each interval center
DiscretizeData(xin, xout)
xin: A vector of input values.
xout: Grid points where the function will be evaluated.
The function defines equispaced subintervals (bins) whose length depends on the sample size. At each bin center the empirical hazard rate estimate is then calculated from the frequency of observations in the ith bin and the empirical cumulative distribution estimate.
A list with components BinCenters (the bin centers), ci (the empirical hazard rate estimates at the bin centers) and Delta (the bin length).
x <- seq(0, 5, length=100)         # design points where the estimate will be calculated
SampleSize <- 100                  # amount of data to be generated
ti <- rweibull(SampleSize, .6, 1)  # draw a random sample
ui <- rexp(SampleSize, .2)         # censoring sample
cat("\n AMOUNT OF CENSORING: ", length(which(ti>ui))/length(ti)*100, "\n")
x1 <- pmin(ti, ui)                 # observed data
cen <- rep.int(1, SampleSize)      # initialize censoring indicators
cen[which(ti>ui)] <- 0             # 0's correspond to censored indicators
a.use <- DiscretizeData(ti, x)     # discretize the data
BinCenters <- a.use$BinCenters     # get the bin centers
ci <- a.use$ci                     # get empirical hazard rate estimates
Delta <- a.use$Delta               # binning range
Implements the histogram hazard rate estimator of Patil and Bagkavos (2012)
HazardHistogram(xin, xout, cens, bin)
xin: A vector of data points. Missing values not allowed.
xout: A vector of grid points at which the histogram will be calculated.
cens: A vector of 1s and 0s: 1's indicate uncensored observations, 0's correspond to censored observations.
bin: Number of bins to use in construction of the histogram.
The histogram hazard rate estimator is defined in (1) of Patil and Bagkavos (2012).
A vector with the values of the histogram estimate at each bin.
Patil and Bagkavos (2012), Histogram for hazard rate estimation, Sankhya B, pp. 286-301.
SampleSize <- 400
ti <- rweibull(SampleSize, 0.5, 0.8)
xout <- seq(0.02, 3.5, length=80)
true.hazard <- dweibull(xout, 0.5, 0.8)/(1 - pweibull(xout, 0.5, 0.8))
cen <- rep.int(1, SampleSize)
cen[sample(1:SampleSize, SampleSize/10)] <- 0
band <- nlminb(start=2, obj=cvfunction, control=list(iter.max=100, x.tol=.001),
               xin=ti, xout=xout, cens=cen, lower=.01, upper=max(xout))
bin <- 3.49 * sd(ti)^2 * SampleSize^(-1/3) / 50   # Scott 1979 Biometrika default rule
bin <- unlist(band[1])
histest <- HazardHistogram(ti, xout, cen, bin + 0.013)
plot(xout, true.hazard, type="l")
lines(histest[,1], histest[,2], col=2, type="s")
barplot(histest[,2], rep(bin, times=length(histest[,2])))
lines(xout, true.hazard, type="l", lwd=2, col=2)
Implements the (classical) kernel hazard rate estimator for right censored data defined in Tanner and Wong (1983).
HazardRateEst(xin, xout, kfun, h, ci)
xin: A vector of data points. Missing values not allowed.
xout: A vector of grid points at which the estimate will be calculated.
kfun: Kernel function to use. Supported kernels: Epanechnikov, Biweight, Gaussian, Rectangular, Triangular, HigherOrder.
h: A scalar, the bandwidth to use in the estimate.
ci: A vector of censoring indicators: 1's indicate uncensored observations, 0's correspond to censored observations.
The kernel hazard rate estimator of Tanner and Wong (1983) is given by

$$\hat\lambda(x) = \sum_{i=1}^{n} \frac{\delta_{(i)}}{n-i+1} \frac{1}{h} K\!\left(\frac{x - X_{(i)}}{h}\right),$$

where $X_{(1)} \le \dots \le X_{(n)}$ are the ordered observations, $\delta_{(i)}$ is the censoring indicator attached to $X_{(i)}$ and $K$ is the kernel. The bandwidth $h$ is determined by a bandwidth rule such as PlugInBand. HazardRateEst is also used as a pilot estimate in the implementation of both the variable bandwidth estimate VarBandHazEst and the transformed hazard rate estimate TransHazRateEst.
A vector with the hazard rate estimates at the designated points xout.
VarBandHazEst, TransHazRateEst, PlugInBand
x <- seq(0, 5, length=100)         # design points where the estimate will be calculated
plot(x, HazardRate(x, "weibull", .6, 1), type="l",
     xlab="x", ylab="Hazard rate") # plot the true hazard rate function
SampleSize <- 100
ti <- rweibull(SampleSize, .6, 1)  # draw a random sample from the actual distribution
ui <- rexp(SampleSize, .2)         # draw a random sample from the censoring distribution
cat("\n AMOUNT OF CENSORING: ", length(which(ti>ui))/length(ti)*100, "\n")
x1 <- pmin(ti, ui)                 # this is the observed sample
cen <- rep.int(1, SampleSize)      # censoring indicators
cen[which(ti>ui)] <- 0             # censored values correspond to zero
huse <- PlugInBand(x1, x, cen, Biweight)
arg2 <- HazardRateEst(x1, x, Epanechnikov, huse, cen)   # calculate the estimate
lines(x, arg2, lty=2)              # draw the result on the graphics device
Calculation of the integrand of the constant term in the AMISE plug-in bandwidth rule implemented in PlugInBand.
HRSurv(x, xin, cens, h, kfun)
xin: A vector of data points.
x: The point at which the estimates should be calculated.
cens: Censoring indicators.
h: Bandwidth to use.
kfun: The kernel function to use.
Calculates the ratio of a kernel hazard rate estimate to a Kaplan-Meier survival function estimate, i.e. the integrand of the constant term in the AMISE optimal bandwidth, which is then passed as an argument to the function NP.M.Estimate for numerical integration. Currently the numerator is implemented by HazardRateEst using bandwidth bw.nrd(xin), and for the Kaplan-Meier estimate in the denominator KMest is used.
A vector with the value of the fraction.
x <- seq(0, 5, length=100)         # design points where the estimate will be calculated
SampleSize <- 100                  # amount of data to be generated
ti <- rweibull(SampleSize, .6, 1)  # draw a random sample
ui <- rexp(SampleSize, .2)         # censoring sample
cat("\n AMOUNT OF CENSORING: ", length(which(ti>ui))/length(ti)*100, "\n")
x1 <- pmin(ti, ui)                 # observed data
cen <- rep.int(1, SampleSize)      # initialize censoring indicators
cen[which(ti>ui)] <- 0             # 0's correspond to censored indicators
HRSurv(x, x1, cen, bw.nrd(x1), Biweight)
Implements the integrated kernel hazard rate estimator for right censored data, i.e. a kernel estimate of the cumulative hazard function.
iHazardRateEst(xin, xout, ikfun, h, ci)
xin: A vector of data points. Missing values not allowed.
xout: A vector of grid points at which the estimates will be calculated.
ikfun: Integrated kernel function to use.
h: A scalar, the bandwidth to use in the estimate.
ci: A vector of censoring indicators: 1's indicate uncensored observations, 0's correspond to censored observations.
The function iHazardRateEst implements the cumulative hazard rate estimator given by

$$\hat\Lambda(x) = \sum_{i=1}^{n} \frac{\delta_{(i)}}{n-i+1} \bar K\!\left(\frac{x - X_{(i)}}{h}\right), \qquad \bar K(x) = \int_{-\infty}^{x} K(u)\,du,$$

i.e. the integrated version of the Tanner and Wong (1983) kernel hazard rate estimate. Note that iHazardRateEst is used in the implementation of the transformed hazard rate estimate TransHazRateEst.
A vector with the cumulative hazard rate estimates at the designated points xout.
VarBandHazEst, TransHazRateEst, PlugInBand
x <- seq(0, 5, length=100)         # design points where the estimate will be calculated
SampleSize <- 100
ti <- rweibull(SampleSize, .6, 1)  # draw a random sample from the actual distribution
ui <- rexp(SampleSize, .2)         # draw a random sample from the censoring distribution
cat("\n AMOUNT OF CENSORING: ", length(which(ti>ui))/length(ti)*100, "\n")
x1 <- pmin(ti, ui)                 # this is the observed sample
cen <- rep.int(1, SampleSize)      # censoring indicators
cen[which(ti>ui)] <- 0             # censored values correspond to zero
huse <- PlugInBand(x1, x, cen, Biweight)
arg2 <- iHazardRateEst(x1, x, IntEpanechnikov, huse, cen)  # calculate the estimate
Implements various kernel functions, including boundary, integrated and discrete kernels for use in the definition of the nonparametric estimates
Biweight(x, ...)
Epanechnikov(x, ...)
Triangular(x, ...)
Gaussian(x, ...)
HigherOrder(x, ...)
Rectangular(x, ...)
IntBiweight(x)
IntEpanechnikov(x)
IntRectangular(x)
IntTriangular(x)
IntGaussian(x)
SDBiweight(x)
a0(x, h)
a1(x, h)
a2(x, h)
BoundaryBiweight(x, h)
b0(x, h)
b1(x, h)
b2(x, h)
BoundaryEpanechnikov(x, h)
Habbema(xin, x)
x: A vector of data points where the kernel will be evaluated.
h: A scalar.
xin: Discrete data inputs, especially for the Habbema discrete kernel.
...: Further arguments.
Implements the Biweight, second derivative Biweight, Epanechnikov, Triangular, Gaussian and Rectangular kernels, as well as the boundary adjusted Biweight and Epanechnikov kernels. It also provides the kernel distribution functions for the Biweight, Epanechnikov, Rectangular, Triangular and Gaussian kernels. Additionally it implements the discrete Habbema kernel.
The value of the kernel at the supplied points x.
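A small illustrative sketch, not part of the original examples; it assumes each kernel function returns its value at the supplied points (and is zero outside its support for the compactly supported kernels):

u <- seq(-1.5, 1.5, length = 301)      # evaluation grid
plot(u, Epanechnikov(u), type = "l", xlab = "u", ylab = "K(u)")
lines(u, Biweight(u), lty = 2)         # Biweight kernel
lines(u, Gaussian(u), lty = 3)         # Gaussian kernel
legend("topright", c("Epanechnikov", "Biweight", "Gaussian"), lty = 1:3)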
Custom implementation of the Kaplan-Meier estimate. The major difference from existing implementations is that the user can specify exactly the grid points where the estimate is calculated. The implementation corresponds to the Kaplan-Meier estimate used in Hua, Patil and Bagkavos (2018) and is used mainly for estimation of the censoring distribution.
KMest(xin, cens, xout)
xin: A vector of data points.
xout: The points at which the estimates should be calculated.
cens: Censoring indicators.
Calculates the well known Kaplan-Meier estimate

$$\hat S(x) = \prod_{i:\,X_{(i)} \le x} \left(1 - \frac{\delta_{(i)}}{n-i+1}\right) = \prod_{i:\,X_{(i)} \le x} \left(\frac{n-i}{n-i+1}\right)^{\delta_{(i)}},$$

where $X_{(1)} \le \dots \le X_{(n)}$ are the ordered observations and $\delta_{(i)}$ the corresponding censoring indicators; the two products are equivalent forms of the same estimate. The implementation is mainly for estimating the censoring distribution of the available sample.
A vector with the Kaplan-Meier estimate at xout.
x <- seq(0, 5, length=100)         # design points where the estimate will be calculated
SampleSize <- 100                  # amount of data to be generated
ti <- rweibull(SampleSize, .6, 1)  # draw a random sample
ui <- rexp(SampleSize, .2)         # censoring sample
cat("\n AMOUNT OF CENSORING: ", length(which(ti>ui))/length(ti)*100, "\n")
x1 <- pmin(ti, ui)                 # observed data
cen <- rep.int(1, SampleSize)      # initialize censoring indicators
cen[which(ti>ui)] <- 0             # 0's correspond to censored indicators
arg1 <- KMest(x1, cen, x)
plot(x, arg1, type="l")
Provides the hazard rate function derivatives and related functionals with reference to the Weibull distribution.
l1(x, p, l)
l2(x, p, l)
l3(x, p, l)
l4(x, p, l)
lw(x, p, l)
lwF(x, p, l)
gx(x, p, l)
x: A vector of points at which the hazard rate function will be estimated.
p: MLE estimate of the shape parameter.
l: MLE estimate of the scale parameter.
Implements the necessary functions for calculating the squared bias term of the variable bandwidth estimate.
A vector with the values of the function at the designated points x.
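A hypothetical usage sketch, not taken from the package documentation: it assumes that lw(x, p, l) evaluates the Weibull reference hazard rate at x for shape p and scale l, which is inferred from the argument descriptions rather than stated explicitly.

p <- 0.6                              # illustrative MLE value for the shape parameter
l <- 1                                # illustrative MLE value for the scale parameter
x <- seq(0.1, 5, length = 100)        # evaluation grid
plot(x, lw(x, p, l), type = "l",      # assumed: Weibull reference hazard rate at x
     xlab = "x", ylab = "Reference hazard")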
Implementation of the purely nonparametric discrete hazard rate estimator lambdahat, discussed among others in Patil and Bagkavos (2012). lambdahat is also used as the nonparametric component in the implementation of SemiparamEst.
lambdahat(xin, cens, xout)
xin: A vector of data points. Missing values not allowed.
cens: Censoring indicators as a vector of 1s and 0s: 1's indicate uncensored observations, 0's correspond to censored observations.
xout: The grid points where the estimates will be calculated.
The discrete (crude) hazard rate estimator (NPMLE) in Patil and Bagkavos (2012) is given by

$$\hat\lambda(k) = \frac{d_k}{n_k},$$

where $d_k$ is the number of uncensored observations equal to $k$ and $n_k$ is the number of observations greater than or equal to $k$ (the size of the risk set at $k$).
Returns a vector with the values of the hazard rate estimates at the grid points xout.
options(echo=FALSE)
xin <- c(7,34,42,63,64, 74, 83, 84, 91, 108, 112,129, 133,133,139,140,140,146,
         149,154,157,160,160,165,173,176,185, 218,225,241, 248,273,277,279,297,
         319,405,417,420,440, 523,523,583, 594, 1101, 1116, 1146, 1226, 1349, 1412, 1417)
cens <- c(1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,
          0,1,0,1,1,1,1,1,0,1,1,1,0,1)
xin <- xin/30.438                  # adjust the data
storage.mode(xin) <- "integer"     # turn the data to integers
xout <- seq(1, 47, by=1)           # define the grid points to evaluate the estimate
arg <- TutzPritscher(xin, cens, xout)   # discrete kernel estimate
plot(xout, arg, type="l", ylim=c(0, .35), lty=2, col=6)   # plot the estimate
argSM <- lambdahat(xin, cens, xout)     # crude nonparametric estimate
lines(xout, argSM, lty=3, col=5)        # plot the crude estimate
Provides the asymptotic MISE optimal plug-in bandwidth for the local linear hazard rate estimator LocLinEst, defined in (4) of Bagkavos (2011). This is the binned data version of the PlugInBand AMISE optimal bandwidth rule.
LLHRPlugInBand(BinCenters, h, kfun, Delta, xin, xout, IntKfun, ci, cens)
BinCenters: A vector of data points, the centers of the bins resulting from the discretization of the data.
h: Bandwidth for the estimate of the distribution function.
kfun: A kernel function.
Delta: A scalar, the length of the bins.
xin: A vector of data points.
xout: The points at which the estimates should be calculated.
IntKfun: The integrated kernel function.
ci: Crude hazard rate estimates.
cens: Censoring indicators.
The bandwidth selector requires binned data, i.e. pairs consisting of the bin centers and the empirical hazard rate estimates at each bin center; these are obtained via the DiscretizeData function. As can be seen from (4) in Bagkavos (2011), the bandwidth selector also requires an estimate of a functional of the hazard rate, which is readily implemented in PlugInBand, as well as an estimate of a constant term; for the latter the bw.nrd distribution function default bandwidth rule of Swanepoel and Van Graan (2005) is additionally used. The remaining kernel dependent constants are deterministic and specific to the kernel used in the implementation, hence they can be calculated exactly.
A scalar with the value of the suggested bandwidth.
Bagkavos (2011), Annals of the Institute of Statistical Mathematics, 63(5), 1019-1046.
x <- seq(0, 5, length=100)         # design points where the estimate will be calculated
SampleSize <- 100                  # amount of data to be generated
ti <- rweibull(SampleSize, .6, 1)  # draw a random sample
ui <- rexp(SampleSize, .2)         # censoring sample
cat("\n AMOUNT OF CENSORING: ", length(which(ti>ui))/length(ti)*100, "\n")
x1 <- pmin(ti, ui)                 # observed data
cen <- rep.int(1, SampleSize)      # initialize censoring indicators
cen[which(ti>ui)] <- 0             # 0's correspond to censored indicators
a.use <- DiscretizeData(ti, x)     # discretize the data
BinCenters <- a.use$BinCenters     # get the bin centers
ci <- a.use$ci                     # get empirical hazard rate estimates
Delta <- a.use$Delta               # binning range
h2 <- bw.nrd(ti)                   # bandwidth to use in the constant estimate of the plug-in rule
h.use <- h2                        # the first element is the band to use
huse1 <- LLHRPlugInBand(BinCenters, h.use, Epanechnikov, Delta, ti, x, IntEpanechnikov, ci, cen)
huse1
Implements the local linear kernel hazard rate estimate of Bagkavos and Patil (2008) and Bagkavos (2011). The estimate assumes binned data (fixed design), i.e. pairs consisting of the bin centers and the empirical hazard rate estimates at each bin center; these are calculated via the DiscretizeData function. The estimate then smooths the empirical hazard rate estimates and achieves automatic boundary adjustments through appropriately defined kernel weights. The user can supply their own bandwidth value through the h argument.
Currently only the LLHRPlugInBand bandwidth selector is provided, which itself depends on the bw.nrd distribution function default bandwidth rule of Swanepoel and Van Graan (2005) for the constant estimate.
TO DO: In future implementations the EBBS (empirical bias bandwidth selection) and AIC based bandwidth methods (see Bagkavos (2011)) will be added to the package.
LocLinEst(BinCenters, xout, h, kfun, ci)
BinCenters: A vector with the bin centers of the discretized data.
xout: A vector of points at which the hazard rate function will be estimated.
h: A scalar, the bandwidth to use in the estimate.
kfun: Kernel function to use. Supported kernels: Epanechnikov, Biweight, Gaussian, Rectangular, Triangular.
ci: Empirical hazard rate estimates.
The estimate in both Bagkavos and Patil (2008) and Bagkavos (2011) is the local linear (weighted least squares) smoother of the empirical hazard rate estimates on the bin centers; see those references for its exact form. The difference between the censored and the uncensored cases lies only in the calculation of the empirical hazard rate estimates.
A vector with the values of the function at the designated points xout.
x <- seq(0.05, 5, length=80)       # grid points to calculate the estimates
plot(x, HazardRate(x, "weibull", .6, 1), type="l", xlab="x", ylab="Hazard rate")
SampleSize <- 100                  # select sample size
ti <- rweibull(SampleSize, .6, 1)  # draw a random sample
ui <- rexp(SampleSize, .2)         # censoring sample
cat("\n AMOUNT OF CENSORING: ", length(which(ti>ui))/length(ti)*100, "\n")
x1 <- pmin(ti, ui)                 # observed data
cen <- rep.int(1, SampleSize)      # initialize censoring indicators
cen[which(ti>ui)] <- 0             # 0's correspond to censored indicators
a.use <- DiscretizeData(ti, x)     # discretize the data
BinCenters <- a.use$BinCenters     # get the bin centers
ci <- a.use$ci                     # get empirical hazard rate estimates
Delta <- a.use$Delta               # binning range
h2 <- bw.nrd(ti)                   # bandwidth to use in the constant estimate of the plug-in rule
h.use <- h2                        # the first element is the band to use
# Calculate the plug-in bandwidth:
huse1 <- LLHRPlugInBand(BinCenters, h.use, Epanechnikov, Delta, ti, x, IntEpanechnikov, ci, cen)
arg2 <- HazardRateEst(x1, x, Epanechnikov, huse1, cen)      # Tanner-Wong estimate
lines(x, arg2, lty=2)              # draw the Tanner-Wong estimate
arg5 <- HazardRateEst(x1, x, BoundaryBiweight, huse1, cen)  # boundary adjusted TW estimate
lines(x, arg5, lty=2, col=4)       # draw the boundary adjusted estimate
arg6 <- LocLinEst(BinCenters, x, huse1, Epanechnikov, ci)   # local linear estimate
lines(x, arg6, lty=5, col=5)       # draw the local linear estimate
legend("topright", c("Tanner-Wong", "TW - Boundary Corrected", "Local Linear"),
       lty=c(2, 2, 5), col=c(1, 4, 5))   # add legend
Calculation of the constant term in the AMISE plug-in bandwidth rule PlugInBand.
NP.M.Estimate(xin, cens, xout)
xin: A vector of data points.
xout: The points at which the estimates should be calculated.
cens: Censoring indicators.
Approximates the constant term needed in the optimal AMISE bandwidth expression of PlugInBand. The integrand is calculated by HRSurv and integration is performed via the extended Simpson's numerical integration rule (SimpsonInt).
A scalar with the value of the constant.
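A minimal usage sketch, mirroring the simulated data set-up used throughout this documentation (it is not one of the package's own examples):

x <- seq(0, 5, length = 100)          # grid points
SampleSize <- 100
ti <- rweibull(SampleSize, .6, 1)     # lifetimes
ui <- rexp(SampleSize, .2)            # censoring times
x1 <- pmin(ti, ui)                    # observed data
cen <- rep.int(1, SampleSize)         # censoring indicators
cen[which(ti > ui)] <- 0              # 0's correspond to censored observations
NP.M.Estimate(x1, cen, x)             # constant term of the plug-in bandwidth rule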
Auxiliary functions for discrete semiparametric and kernel smooth hazard rate estimation
nsf(xin, cens, xout)
Tm(tk, xout, distribution, par1, par2)
CparamCalculation(gamparam, VehHazard)
power.matrix(M, n)
base(m, b)
SmoothedEstimate(NonParEst, VehHazard, gammapar, SCproduct, Cpar)
xin: A vector of data points. Missing values not allowed.
cens: A vector of 1s and 0s: 1's indicate uncensored observations, 0's correspond to censored observations.
xout: The points where the estimate should be calculated.
tk: Design points for the NPMLE estimate.
distribution: The distribution to use.
par1: Distribution parameter 1.
par2: Distribution parameter 2.
gamparam: Gamma parameter.
M: A matrix to be raised to a power.
n: The power to which the matrix will be raised.
m: The number to express as a power of b.
b: The base used to express m.
NonParEst: The crude nonparametric hazard rate estimate.
VehHazard: Vehicle hazard rate.
gammapar: Gamma parameter.
SCproduct: SC product, the result of DetermineSCprod.
Cpar: C parameter, the result of CparamCalculation.
Auxiliary functions for the discrete hazard rate estimators. The function nsf is used for the kernel smooth estimate TutzPritscher. Tm is used in the implementation of the semiparametric estimate SemiparamEst. CparamCalculation returns the C smoothing parameter. DetermineSCprod computes the SC product from n, the number of observations, and gammapar, the sum of the vehicle hazard at xout (computed elsewhere). power.matrix raises a matrix to a given power, base expresses m as a power of b, and SmoothedEstimate combines the nonparametric estimate with the vehicle hazard to produce the smoothed semiparametric estimate.
A vector with the values of the hazard rate estimates.
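A hypothetical sketch for the matrix helper only: it assumes power.matrix(M, n) returns the n-th matrix power of M, which is inferred from the argument descriptions rather than documented behaviour.

M <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE)  # a transition-type matrix
power.matrix(M, 5)                                # assumed: the fifth matrix power of M
M %*% M %*% M %*% M %*% M                         # direct check of the same product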
Provides the asymptotic MISE optimal plug-in bandwidth for the hazard rate estimator HazardRateEst; see Hua, Patil and Bagkavos (2018). The bandwidth is also suitable for use as a pilot bandwidth in TransHazRateEst and VarBandHazEst.
PlugInBand(xin, xout, cens, kfun )
xin: A vector of data points.
xout: The points at which the estimates should be calculated.
cens: Censoring indicators.
kfun: A kernel function.
The asymptotic MISE optimal plug-in bandwidth selector for HazardRateEst is defined in (9) of Hua, Patil and Bagkavos (2018). The constant term of that expression is estimated by applying the extended Simpson's numerical integration rule, SimpsonInt, to the ratio of a kernel hazard rate estimate and a Kaplan-Meier survival function estimate (KMest); this estimation is implemented in the NP.M.Estimate function. The remaining unknown quantity in the rule is currently estimated via bw.nrd. However, according to (11) in Hua, Patil and Bagkavos (2018), future versions of this package will support a fully data driven choice whose constant is already provided by NP.M.Estimate as explained above; this is expected to be much more stable than using a Weibull reference model.
A scalar with the value of the suggested bandwidth.
x <- seq(0, 5, length=100)         # design points where the estimate will be calculated
SampleSize <- 100                  # amount of data to be generated
ti <- rweibull(SampleSize, .6, 1)  # draw a random sample
ui <- rexp(SampleSize, .2)         # censoring sample
cat("\n AMOUNT OF CENSORING: ", length(which(ti>ui))/length(ti)*100, "\n")
x1 <- pmin(ti, ui)                 # observed data
cen <- rep.int(1, SampleSize)      # initialize censoring indicators
cen[which(ti>ui)] <- 0             # 0's correspond to censored indicators
huse1 <- PlugInBand(x1, x, cen, Biweight)
huse1
Auxiliary functions that automate random number generation and evaluation of the pdf, survival function and hazard rate of several distributions.
RdistSwitch(dist, SampleSize, par1, par2)
PdfSwitch(xout, dist, par1, par2)
CdfSwitch(xout, dist, par1, par2)
HazardRate(xout, dist, par1, par2)
dist: A string. Corresponds to one of weibull, lognorm, chisquare, exponential, binomial, geometric, poisson, negativebinomial, uniform.
SampleSize: The size of the random sample to be drawn.
xout: Grid points where the function will be evaluated.
par1: Parameter 1 of the distribution.
par2: Parameter 2 of the distribution.
Implements random number generation and density, survival and hazard rate evaluation for several distributions. These functions are mainly used when simulating the mean squared error, etc., from known distributions.
A vector with the values of the function at the designated points xout or the random numbers drawn.
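A short illustrative sketch, not one of the package's own examples; the Weibull parameters mirror those used elsewhere in this documentation and it is assumed that CdfSwitch returns the distribution function:

xout <- seq(0.05, 5, length = 100)                  # evaluation grid
ti <- RdistSwitch("weibull", 100, .6, 1)            # draw a random sample
plot(xout, HazardRate(xout, "weibull", .6, 1),
     type = "l", xlab = "x", ylab = "Hazard rate")  # true hazard rate
lines(xout, PdfSwitch(xout, "weibull", .6, 1) /
            (1 - CdfSwitch(xout, "weibull", .6, 1)),
      lty = 2)                                      # pdf/(1 - cdf), assuming CdfSwitch is the cdf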
Implements the kernel estimate of the second derivative of the hazard rate for right censored data, based on the estimate of Tanner and Wong (1983). The implementation uses the second derivative of the Biweight kernel.
SDHazardRateEst(xin, xout, h, ci)
xin: A vector of data points. Missing values not allowed.
xout: A vector of grid points at which the estimates will be calculated.
h: A scalar, the bandwidth to use in the estimate.
ci: A vector of censoring indicators: 1's indicate uncensored observations, 0's correspond to censored observations.
The function SDHazardRateEst implements the kernel estimate of the second derivative of the hazard rate, given by

$$\hat\lambda''(x) = \sum_{i=1}^{n} \frac{\delta_{(i)}}{n-i+1} \frac{1}{h^{3}} K''\!\left(\frac{x - X_{(i)}}{h}\right),$$

where $K''$ is taken to be the second derivative of the Biweight kernel. The function is used for estimation of the hazard rate functional needed in PlugInBand, and for this purpose a default bandwidth rule, given in (16) of Hua, Patil and Bagkavos (2018), is used.
A vector with the second derivative of the hazard rate at the designated points xout.
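A minimal usage sketch (illustrative only); the bandwidth bw.nrd(x1) is simply a convenient choice here, not the default rule of (16) mentioned above:

x <- seq(0.1, 5, length = 100)        # grid points
SampleSize <- 100
ti <- rweibull(SampleSize, .6, 1)     # lifetimes
ui <- rexp(SampleSize, .2)            # censoring times
x1 <- pmin(ti, ui)                    # observed data
cen <- rep.int(1, SampleSize)         # censoring indicators
cen[which(ti > ui)] <- 0              # 0's correspond to censored observations
sd.est <- SDHazardRateEst(x1, x, bw.nrd(x1), cen)   # second derivative estimate
plot(x, sd.est, type = "l")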
Implements the semiparametric hazard rate estimator for discrete data developed in Patil and Bagkavos (2012). The estimate is obtained by semiparametric smoothing of the (nonsmooth) nonparametric maximum likelihood estimator, achieved by repeated multiplication of a Markov chain transition-type matrix. This matrix is constructed on the basis of a parametric discrete hazard rate model (the vehicle model).
SemiparamEst(xin, cens, xout, Xdistr, Udistr, vehicledistr, Xpar1=1, Xpar2=0.5, Upar1=1, Upar2=0.5, vdparam1=1, vdparam2=0.5)
xin: A vector of data points. Missing values not allowed.
cens: Censoring indicators as a vector of 1s and 0s: 1's indicate uncensored observations, 0's correspond to censored observations.
xout: Design points where the estimate will be calculated.
Xdistr: The distribution the data come from, currently ignored.
Udistr: Censoring distribution, currently ignored.
vehicledistr: String specifying the vehicle hazard rate (the assumed parametric model).
Xpar1: Parameter 1 for the X distribution, currently ignored.
Xpar2: Parameter 2 for the X distribution, currently ignored.
Upar1: Parameter 1 for the censoring distribution, currently ignored.
Upar2: Parameter 2 for the censoring distribution, currently ignored.
vdparam1: Parameter 1 for the vehicle hazard rate.
vdparam2: Parameter 2 for the vehicle hazard rate.
The semiparametric estimator implemented here is defined in (1) of Patil and Bagkavos (2012). A smoothing parameter determines the number of repetitions of the transition matrix multiplication and hence the amount of smoothing applied to the estimate. When no smoothing is applied the semiparametric estimate equals the nonparametric estimate lambdahat. On the other hand, if the true (unknown) underlying probability model is known up to an unknown constant or constants, then the larger the smoothing parameter, the closer the semiparametric estimate is to the vehicle hazard rate model.
TO DO: The extension to hazard rate estimation with covariates will be added in a future release.
TO DO: Data driven estimation of the smoothing parameter will also be added in a future release; this will include the SC product and C parameter calculations.
A vector with the values of the discrete hazard rate estimate, calculated at the grid points xout.
options(echo=FALSE)
xin <- c(7,34,42,63,64, 74, 83, 84, 91, 108, 112,129, 133,133,139,140,140,146,
         149,154,157,160,160,165,173,176,185, 218,225,241, 248,273,277,279,297,
         319,405,417,420,440, 523,523,583, 594, 1101, 1116, 1146, 1226, 1349, 1412, 1417)   # head and neck data set
cens <- c(1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
          1,0,1,1,1,1,1,1,0,1,0,1,1,1,1,1,0,1,1,1,0,1)   # censoring indicators
xin <- xin/30.438                  # mean adjust the data
storage.mode(xin) <- "integer"     # turn the data to integers
xout <- seq(1, 47, by=1)           # design points where to calculate the estimate
arg <- TutzPritscher(xin, cens, xout)   # kernel smooth estimate
plot(xout, arg, type="l", ylim=c(0, .35), lty=2, col=6)
argSM <- SemiparamEst(xin, cens, xout, "geometric", "uniform", "geometric",
                      0.2, .6, 0, 90, .25, .9)   # semiparametric estimate
lines(xout, argSM[,2], lty=3, col=5)    # add tilde lambda to the plot
Implements Simpson's extended numerical integration rule
SimpsonInt(xin, h)
xin: A vector of data points.
h: Grid length.
The extended numerical integration rule is given by

$$\int_{x_1}^{x_n} f(x)\,dx \approx \frac{h}{3}\Bigl[f(x_1) + 4f(x_2) + 2f(x_3) + 4f(x_4) + \dots + 4f(x_{n-1}) + f(x_n)\Bigr],$$

where the $f(x_i)$ are the values of the integrand on an equispaced grid with spacing $h$.
Returns the approximate value of the integral.
Weisstein, Eric W. "Simpson's Rule." From MathWorld–A Wolfram Web Resource
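A quick illustrative check, assuming xin holds the integrand values on an equispaced grid with spacing h: integrating sin over [0, pi], whose exact value is 2.

h <- 0.01
grid <- seq(0, pi, by = h)      # equispaced grid
SimpsonInt(sin(grid), h)        # should be close to the exact value 2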
Implements the local kernel weights which are used in the implementation of LocLinEst and in the second derivative estimate used in PlugInBand.
sn.0(xin, xout, h, kfun)
sn.1(xin, xout, h, kfun)
sn.2(xin, xout, h, kfun)
sn.3(xin, xout, h, kfun)
sn.4(xin, xout, h, kfun)
sn.5(xin, xout, h, kfun)
sn.6(xin, xout, h, kfun)
tn.0(xin, xout, h, kfun, Y)
tn.1(xin, xout, h, kfun, Y)
tn.2(xin, xout, h, kfun, Y)
tn.3(xin, xout, h, kfun, Y)
xin: A vector of data points, typically the bin centers. Missing values not allowed.
xout: A vector of data points where the estimate will be evaluated.
h: A scalar, the bandwidth to use.
kfun: The kernel function to use.
Y: Empirical hazard rate estimates.
The functions calculate, in standard local polynomial notation, the kernel weighted moment sums

$$s_{n,j}(x) = \sum_{i} (x_i - x)^{j}\, K_h(x_i - x), \qquad t_{n,j}(x) = \sum_{i} Y_i\, (x_i - x)^{j}\, K_h(x_i - x), \qquad j = 0, 1, \dots,$$

where the $x_i$ are typically the bin centers and the $Y_i$ the empirical hazard rate estimates. These quantities are used to adjust the hazard rate estimate and its second derivative at the boundary.
The value of the requested weight at the designated points xout.
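A minimal usage sketch, not from the package's own examples, reusing the binned data objects produced by DiscretizeData as in the LocLinEst example; bw.nrd supplies an illustrative bandwidth:

x <- seq(0.05, 5, length = 80)            # evaluation grid
ti <- rweibull(100, .6, 1)                # simulated lifetimes
a.use <- DiscretizeData(ti, x)            # bin the data
BinCenters <- a.use$BinCenters            # bin centers
ci <- a.use$ci                            # empirical hazard rate estimates
h <- bw.nrd(ti)                           # illustrative bandwidth
s0 <- sn.0(BinCenters, x, h, Epanechnikov)        # zeroth order kernel moment sum
t0 <- tn.0(BinCenters, x, h, Epanechnikov, ci)    # response weighted counterpart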
Implements the transformed kernel hazard rate estimator of Bagkavos (2008). The estimate is expected to have less bias than the ordinary kernel estimate HazardRateEst. It is obtained by first transforming the data towards a sample from the exponential distribution through the integrated hazard rate function, estimated by iHazardRateEst, and then applying the classical kernel hazard rate estimate HazardRateEst to the transformed sample. An inverse transform turns the result back into a hazard rate estimate for the original sample. See the Details section below.
TransHazRateEst(xin, xout, kfun, ikfun, h1, h2, ci)
xin: A vector of data points. Missing values not allowed.
xout: A vector of points at which the hazard rate function will be estimated.
kfun: Kernel function to use. Supported kernels: Epanechnikov, Biweight, Gaussian, Rectangular, Triangular, HigherOrder.
ikfun: Integrated kernel function to use. Supported kernels: Epanechnikov, Biweight, Gaussian, Rectangular, Triangular, HigherOrder.
h1: A scalar, pilot bandwidth.
h2: A scalar, transformed kernel bandwidth.
ci: A vector of censoring indicators: 1's indicate uncensored observations, 0's correspond to censored observations.
The transformed kernel hazard rate estimate of Bagkavos (2008) first transforms the data through the integrated hazard rate function and then applies kernel smoothing on the transformed scale; see Bagkavos (2008) for its exact form. The estimate uses the classical kernel hazard rate estimate implemented in HazardRateEst and its integrated version implemented in iHazardRateEst. The pilot bandwidth h1 is determined by an optimal bandwidth rule such as PlugInBand.
TO DO: Insert a rule for the bandwidth h2 used on the transformed scale.
A vector with the values of the function at the designated points xout.
Bagkavos (2008), Transformations in hazard rate estimation, J. Nonparam. Statist., 20, 721-738
VarBandHazEst, HazardRateEst, PlugInBand
x <- seq(0, 5, length=100)         # design points where the estimate will be calculated
plot(x, HazardRate(x, "weibull", .6, 1), type="l",
     xlab="x", ylab="Hazard rate") # plot the true hazard rate function
SampleSize <- 100
mat <- matrix(nrow=SampleSize, ncol=20)
for(i in 1:20) {                   # calculate the average of 20 estimates and draw on the screen
  ti <- rweibull(SampleSize, .6, 1)  # draw a random sample from the actual distribution
  ui <- rexp(SampleSize, .05)        # draw a random sample from the censoring distribution
  cat("\n AMOUNT OF CENSORING: ", length(which(ti>ui))/length(ti)*100, "\n")
  x1 <- pmin(ti, ui)                 # this is the observed sample
  cen <- rep.int(1, SampleSize)      # censoring indicators
  cen[which(ti>ui)] <- 0             # censored values correspond to zero
  h2 <- DefVarBandRule(ti, cen)      # default bandwidth rule - Weibull reference
  huse1 <- PlugInBand(x1, x, cen, Biweight)
  mat[,i] <- TransHazRateEst(x1, x, Epanechnikov, IntEpanechnikov, huse1, h2, cen)
}
lines(x, rowMeans(mat), lty=2)     # draw the average transformed estimate
Implementation of the discrete kernel hazard rate estimator of Tutz and Pritscher (1996), based on the discrete Habbema kernel. The estimate is used for comparison with the semiparametric estimate SemiparamEst.
TutzPritscher(xin, cens, xout)
xin: A vector of data points. Missing values not allowed.
cens: Censoring indicators as a vector of 1s and 0s: 1's indicate uncensored observations, 0's correspond to censored observations.
xout: The grid points where the estimates will be calculated.
The discrete kernel estimate of Tutz and Pritscher (1996) smooths the crude discrete hazard rate estimates with the discrete Habbema kernel; see Tutz and Pritscher (1996) for its exact definition.
Returns a vector with the values of the hazard rate estimates at the grid points xout.
options(echo=FALSE)
xin <- c(7,34,42,63,64, 74, 83, 84, 91, 108, 112,129, 133,133,139,140,140,146,
         149,154,157,160,160,165,173,176,185, 218,225,241, 248,273,277,279,297,
         319,405,417,420,440, 523,523,583, 594, 1101, 1116, 1146, 1226, 1349, 1412, 1417)
cens <- c(1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,
          0,1,0,1,1,1,1,1,0,1,1,1,0,1)
xin <- xin/30.438                  # adjust the data
storage.mode(xin) <- "integer"     # turn the data to integers
xout <- seq(1, 47, by=1)           # define the grid points to evaluate the estimate
arg <- TutzPritscher(xin, cens, xout)   # discrete kernel estimate
plot(xout, arg, type="l", ylim=c(0, .35), lty=2, col=6)   # plot the estimate
argSM <- lambdahat(xin, cens, xout)     # crude nonparametric estimate
lines(xout, argSM, lty=3, col=5)        # plot the crude estimate
Implements the adaptive variable bandwidth hazard rate estimator of Bagkavos and Patil (2009). The estimate is an extension of the classical kernel hazard rate estimator of Tanner and Wong (1983), implemented in HazardRateEst. The difference is that, instead of a fixed bandwidth, the variable bandwidth estimate uses a bandwidth that adapts to each observation through a pilot estimate. This particular choice cancels the second order term in the bias expansion of the hazard rate estimate and is thus expected to yield more precise estimation than HazardRateEst.
VarBandHazEst(xin, xout, kfun, h1, h2, ci)
xin: A vector of data points. Missing values not allowed.
xout: A vector of points at which the hazard rate function will be estimated.
kfun: Kernel function to use. Supported kernels: Epanechnikov, Biweight, Gaussian, Rectangular, Triangular, HigherOrder.
h1: A scalar, pilot bandwidth.
h2: A scalar, variable kernel (adaptive) bandwidth.
ci: A vector of censoring indicators: 1's indicate uncensored observations, 0's correspond to censored observations.
Implements the adaptive variable bandwidth hazard rate estimator of Bagkavos and Patil (2009), Comm. Statist. Theory and Methods. The pilot bandwidth h1 is determined by an optimal bandwidth rule such as PlugInBand and is used as input to the pilot kernel estimate, implemented by HazardRateEst.
TO DO: Insert a rule for the adaptive bandwidth h2.
A vector with the values of the function at the designated points xout.
HazardRateEst, TransHazRateEst, PlugInBand
x <- seq(0, 5, length=100)         # design points where the estimate will be calculated
plot(x, HazardRate(x, "weibull", .6, 1), type="l",
     xlab="x", ylab="Hazard rate") # plot the true hazard rate function
SampleSize <- 100
mat <- matrix(nrow=SampleSize, ncol=20)
for(i in 1:20) {
  ti <- rweibull(SampleSize, .6, 1)  # draw a random sample from the actual distribution
  ui <- rexp(SampleSize, .05)        # draw a random sample from the censoring distribution
  cat("\n AMOUNT OF CENSORING: ", length(which(ti>ui))/length(ti)*100, "\n")
  x1 <- pmin(ti, ui)                 # this is the observed sample
  cen <- rep.int(1, SampleSize)      # censoring indicators
  cen[which(ti>ui)] <- 0             # censored values correspond to zero
  h2 <- DefVarBandRule(ti, cen)      # default bandwidth rule - Weibull reference
  huse1 <- PlugInBand(x1, x, cen, Biweight)
  mat[,i] <- VarBandHazEst(x1, x, Epanechnikov, huse1, h2, cen)   # variable bandwidth estimate
}
lines(x, rowMeans(mat), lty=2)     # draw the average vb estimate