The purpose of this vignette is to illustrate the use of
{icpack
}, using the right-censored Ova
data,
and to contrast the smooth modeling using {icpack
} with the
more standard non- and semi-parametric approaches of the
survival
package.
The ovarian cancer data are available after loading the package.
There are three covariates (see the help file). Here is a summary of their frequencies and distribution.
##
## 0 1 2 3 4
## 29 67 49 68 145
##
## 0 1
## 262 96
##
## 6 7 8 9 10
## 20 46 47 108 137
Diameter
is the diameter of the tumor, FIGO
is the FIGO classification, where in fact FIGO=0
corresponds to a FIGO III and FIGO=1
to a FIGO IV
classification, and Karnofsky
is the often used Karnofsky
score divided by 10, so for instance Karnofsky=9
corresponds to a Karnofksy score of 90.
The time-to-event data are right-censored. Both Diameter
and Karnofsky
are ordinal, so we could use them both as
categorical and as continuous variables. Let’s first use them both as
categorical, and see whether it makes sense to use them as continuous
variables by inspecting their (log) hazard ratios.
## Object of class 'icfit'
## Call:
## icfit(formula = Surv(time, d) ~ factor(Diameter) + FIGO + factor(Karnofsky),
## data = Ova)
##
## Bins summary: tmin = 0, tmax = 2756, number of bins = 100, bin width = 27.56
## P-splines summary: number of segments = 20, degree = 3, penalty order = 2
##
## Parameter estimates:
## coef SE HR lower upper pvalue
## factor(Diameter)1 0.3446 0.3204 1.4114 0.7532 2.645 0.28217
## factor(Diameter)2 0.8383 0.3237 2.3123 1.2261 4.361 0.00961 **
## factor(Diameter)3 0.9130 0.3135 2.4919 1.3479 4.607 0.00359 **
## factor(Diameter)4 0.9389 0.3011 2.5573 1.4173 4.614 0.00182 **
## FIGO 0.5397 0.1368 1.7156 1.3121 2.243 7.95e-05 ***
## factor(Karnofsky)7 -0.3239 0.2814 0.7233 0.4167 1.256 0.24973
## factor(Karnofsky)8 -0.7132 0.2909 0.4901 0.2771 0.867 0.01422 *
## factor(Karnofsky)9 -0.8391 0.2694 0.4321 0.2549 0.733 0.00184 **
## factor(Karnofsky)10 -0.8242 0.2646 0.4386 0.2611 0.737 0.00184 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Effective baseline dimension: 2.779, log-likelihood: -1251, AIC: 2507
## Smoothness parameter lambda: 236.6
## Number of iterations: 24
## n = 358, number of events = 266
It looks like both for Diameter
and
Karnofsky
the increments of the log hazard ratios are
reasonably equal for neighboring factor levels, which could argue for
using both as continuous covariates in the future. Before we do that,
however, let’s compare the result with that from coxph
in
the survival
package.
## Call:
## coxph(formula = Surv(time, d) ~ factor(Diameter) + FIGO + factor(Karnofsky),
## data = Ova)
##
## coef exp(coef) se(coef) z p
## factor(Diameter)1 0.3452 1.4123 0.3205 1.077 0.28147
## factor(Diameter)2 0.8494 2.3383 0.3237 2.624 0.00870
## factor(Diameter)3 0.9204 2.5103 0.3136 2.935 0.00333
## factor(Diameter)4 0.9381 2.5551 0.3014 3.113 0.00185
## FIGO 0.5471 1.7282 0.1367 4.002 6.28e-05
## factor(Karnofsky)7 -0.3662 0.6934 0.2821 -1.298 0.19424
## factor(Karnofsky)8 -0.7477 0.4735 0.2914 -2.566 0.01029
## factor(Karnofsky)9 -0.8729 0.4177 0.2700 -3.233 0.00122
## factor(Karnofsky)10 -0.8643 0.4213 0.2653 -3.258 0.00112
##
## Likelihood ratio test=70.54 on 9 df, p=1.193e-11
## n= 358, number of events= 266
The results, in terms of the hazard ratios and their standard errors, are very similar indeed.
Let’s refit the smooth and semi-parametric model, now using
Diameter
and Karnofsky
as continuous
variables.
ic_fit <- icfit(Surv(time, d) ~ Diameter + FIGO + Karnofsky, data = Ova)
cox_fit <- coxph(Surv(time, d) ~ Diameter + FIGO + Karnofsky, data = Ova)
ic_fit
## Object of class 'icfit'
## Call:
## icfit(formula = Surv(time, d) ~ Diameter + FIGO + Karnofsky,
## data = Ova)
##
## Bins summary: tmin = 0, tmax = 2756, number of bins = 100, bin width = 27.56
## P-splines summary: number of segments = 20, degree = 3, penalty order = 2
##
## Parameter estimates:
## coef SE HR lower upper pvalue
## Diameter 0.20057 0.04947 1.22209 1.10917 1.347 5.03e-05 ***
## FIGO 0.53863 0.13549 1.71366 1.31400 2.235 7.03e-05 ***
## Karnofsky -0.16843 0.05310 0.84499 0.76148 0.938 0.00151 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Effective baseline dimension: 2.739, log-likelihood: -1254, AIC: 2513
## Smoothness parameter lambda: 250.2
## Number of iterations: 25
## n = 358, number of events = 266
## Call:
## coxph(formula = Surv(time, d) ~ Diameter + FIGO + Karnofsky,
## data = Ova)
##
## coef exp(coef) se(coef) z p
## Diameter 0.20018 1.22162 0.04947 4.047 5.2e-05
## FIGO 0.54183 1.71915 0.13529 4.005 6.2e-05
## Karnofsky -0.17173 0.84221 0.05319 -3.228 0.00125
##
## Likelihood ratio test=64.15 on 3 df, p=7.64e-14
## n= 358, number of events= 266
Without claiming in any way that using Diameter
and
Karnofsky
as continuous variables is better, we continue
with these last two models, mainly for simplicity.
The main difference between the smooth and semi-parametric fits of course resides in the estimates of the baseline hazard. We illustrate this by extracting the baseline hazards and plotting the corresponding baseline survival curves.
The icpack
package uses ggplot2
for
plotting. The default plot is the (baseline) hazard; by setting
type="survival"
we get the (baseline) survival function.
Default option is to add shaded pointwise confidence intervals; they
have been switched off here.
Here is the corresponding baseline survival function from
coxph
, with the smooth baseline survival curve from
icfit
added to it in blue.
bh <- basehaz(cox_fit, centered=FALSE)
bh$surv <- exp(-bh$hazard)
plot(c(0, bh$time), c(1, bh$surv), type="s",
xlab="Time", ylab="Survival", main="Baseline survival functions from coxph and icfit")
nd_baseline <- data.frame(Diameter = 0, FIGO = 0, Karnofsky = 0)
pred_baseline <- predict(ic_fit, newdata = nd_baseline)[[1]]
lines(pred_baseline$time, pred_baseline$Surv, col = "blue")
legend("topright", c("coxph", "icfit"), lwd=1, col=c("black", "blue"), bty="n")
The plot does not show pointwise confidence intervals; it is possible
to obtain those using predict
, which is illustrated in the
next section. Obviously, the estimated baseline survival curve form
coxph
is less smooth; their values however are very
similar.
The estimate of the baseline hazard from the Cox model fit is not smooth; plotting it does not produce anything remotely useful.
plot(bh$time, diff(c(0, bh$hazard)),
xlab="Time", ylab="Hazard", main="Baseline hazard function from coxph")
The baseline hazard obtained from icfit
is smooth and
does give a useful illustration of the hazard, along with its pointwise
confidence intervals.
The cumulative hazard can also be shown using
type="cumhazard"
.
Actually the baseline hazards and survival function are not really
that interesting, in the sense that they correspond to a subject with
Diameter=0
, FIGO=0
and
Karnofsky=0
. Such persons (at least
Karnofsky=0
) are not in the data and are even biologically
impossible.
Modeled after the survival
package, the function
predict
with a newdata
object can be used to
obtain hazards and survival functions for subjects with given
characteristics. Let us illustrate this with subject A, who has
Diameter=0
, FIGO=0
and
Karnofsky=10
, the most optimistic possibility, and subject
B, who has Diameter=4
, FIGO=0
and
Karnofsky=6
, the most pessimistic possibility, apart from
FIGO. This is simply entered into a data frame with the same names as
the original data.
newd <- data.frame(subject = c("A", "B"),
Diameter = c(0, 4),
FIGO = c(0, 0),
Karnofsky = c(10, 6))
newd
## subject Diameter FIGO Karnofsky
## 1 A 0 0 10
## 2 B 4 0 6
The result is an object of class predict.icfit
, which is
a list with 2 items (one for each subject), each containing a data frame
with time points and hazards, cumulative hazards, survival functions at
these time points with associated standard errors and confidence bounds.
The summary
method can be used to extract the hazards etc
at specified time points.
## [[1]]
## time haz sehaz lowerhaz upperhaz Haz
## 1 0.00 0.0002748966 5.985453e-05 1.794045e-04 0.0004212166 0.0000000
## 2 365.25 0.0003757599 6.227895e-05 2.715383e-04 0.0005199837 0.1190020
## 3 730.50 0.0004032806 6.606705e-05 2.925231e-04 0.0005559742 0.2653842
## 4 1095.75 0.0003192450 5.448410e-05 2.284828e-04 0.0004460616 0.3990633
## 5 1461.00 0.0002341901 4.440138e-05 1.615045e-04 0.0003395881 0.4990482
## 6 1826.25 0.0001838503 4.416871e-05 1.148073e-04 0.0002944148 0.5745981
## 7 2191.50 0.0001426012 5.385327e-05 6.802511e-05 0.0002989378 0.6342641
## 8 2556.75 0.0001053165 6.677041e-05 3.039765e-05 0.0003648876 0.6792628
## seHaz lowerHaz upperHaz Surv seSurv lowerSurv upperSurv
## 1 0.00000000 0.00000000 0.0000000 1.0000000 0.00000000 1.0000000 1.0000000
## 2 0.02074635 0.07833988 0.1596641 0.8878064 0.01841863 0.8524307 0.9246503
## 3 0.04241678 0.18224886 0.3485196 0.7669117 0.03252979 0.7057328 0.8333942
## 4 0.06139357 0.27873415 0.5193925 0.6709484 0.04119187 0.5948821 0.7567412
## 5 0.07527367 0.35151452 0.6465819 0.6071083 0.04569926 0.5238333 0.7036216
## 6 0.08590000 0.40623716 0.7429590 0.5629311 0.04835577 0.4757043 0.6661522
## 7 0.09539228 0.44729862 0.8212295 0.5303257 0.05058896 0.4398905 0.6393530
## 8 0.10572956 0.47203664 0.8864889 0.5069906 0.05360389 0.4121002 0.6237307
##
## [[2]]
## time haz sehaz lowerhaz upperhaz Haz
## 1 0.00 0.0012027579 0.0002547998 0.0007940619 0.001821806 0.0000000
## 2 365.25 0.0016440659 0.0002813880 0.0011755278 0.002299352 0.5206705
## 3 730.50 0.0017644776 0.0003205709 0.0012358608 0.002519201 1.1611382
## 4 1095.75 0.0013967958 0.0002740174 0.0009509285 0.002051720 1.7460257
## 5 1461.00 0.0010246541 0.0002210092 0.0006713984 0.001563775 2.1834904
## 6 1826.25 0.0008044020 0.0002106302 0.0004814917 0.001343872 2.5140445
## 7 2191.50 0.0006239245 0.0002448002 0.0002891746 0.001346193 2.7751017
## 8 2556.75 0.0004607925 0.0002964758 0.0001305695 0.001626203 2.9719850
## seHaz lowerHaz upperHaz Surv seSurv lowerSurv upperSurv
## 1 0.00000000 0.0000000 0.0000000 1.00000000 0.00000000 1.00000000 1.0000000
## 2 0.09042499 0.3434407 0.6979002 0.59412674 0.05372243 0.49763584 0.7093282
## 3 0.19314401 0.7825828 1.5396935 0.31313329 0.06047861 0.21445125 0.4572260
## 4 0.29068118 1.1763011 2.3157503 0.17446686 0.05071392 0.09869302 0.3084182
## 5 0.36541104 1.4672979 2.8996829 0.11264771 0.04116270 0.05504072 0.2305477
## 6 0.42310149 1.6847808 3.3433081 0.08094038 0.03424594 0.03532005 0.1854852
## 7 0.47316123 1.8477227 3.7024806 0.06234323 0.02949836 0.02466235 0.1575957
## 8 0.52346939 1.9460038 3.9979661 0.05120160 0.02680245 0.01835295 0.1428438
Finally, the hazards or cumulative hazards or survival functions can be plotted.
library(gridExtra)
plot(picf, type="surv", ylim=c(0, 1), xlab="Days since randomisation",
title = c("Subject A", "Subject B"), nrow=1)
## TableGrob (1 x 2) "arrange": 2 grobs
## z cells name grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]
The results from coxph
for these patients are shown in
the plots below as black lines, in blue the results from
icpack
.
sfAB <- survfit(cox_fit, newdata=newd)
sfA <- sfAB[1]
sfB <- sfAB[2]
# Retrieve separate plots from icfit for comparison
picf1 <- picf[[1]]
class(picf1) <- c("predict.icfit", "data.frame")
plt1 <- plot(picf1, type='surv', ylim=c(0, 1),
xlab = "Days since randomisation",
title = "Subject A")
picf2 <- picf[[2]]
class(picf2) <- c("predict.icfit", "data.frame")
plt2 <- plot(picf2, type='surv', ylim=c(0, 1),
xlab = "Days since randomisation",
title = "Subject B")
oldpar <- par(mfrow=c(1, 2))
plot(sfA, xlab="Days since randomisation", ylab="Survival", main="Subject A")
lines(plt1$data$time, plt1$data$Surv, col = "blue")
lines(plt1$data$time, plt1$data$lowerSurv, col = "blue")
lines(plt1$data$time, plt1$data$upperSurv, col = "blue")
legend("topright", c("coxph", "icfit"), lwd=1, col=c("black", "blue"), bty="n")
plot(sfB, xlab="Days since randomisation", ylab="Survival", main="Subject B")
lines(plt2$data$time, plt2$data$Surv, col = "blue")
lines(plt2$data$time, plt2$data$lowerSurv, col = "blue")
lines(plt2$data$time, plt2$data$upperSurv, col = "blue")
legend("topright", c("coxph", "icfit"), lwd=1, col=c("black", "blue"), bty="n")
Again, the estimates and their standard errors are very similar.