\name{censboot}
\alias{censboot}
\alias{cens.return}
\title{
Bootstrap for Censored Data
}
\description{
This function applies types of bootstrap resampling which have
been suggested to deal with right-censored data. It can also do model-based
resampling using a Cox regression model.
}
\usage{
censboot(data, statistic, R, F.surv, G.surv, strata = matrix(1,n,2),
sim = "ordinary", cox = NULL, index = c(1, 2), \dots,
parallel = c("no", "multicore", "snow"),
ncpus = getOption("boot.ncpus", 1L), cl = NULL)
}
\arguments{
\item{data}{
The data frame or matrix containing the data. It must have at least two
columns, one of which contains the times and the other the censoring
indicators. It is allowed to have as many other columns as desired
(although efficiency is reduced for large numbers of columns) except for
\code{sim = "weird"} when it should only have two columns - the times and
censoring indicators. The columns of \code{data} referenced by the
components of \code{index} are taken to be the times and censoring
indicators.
}
\item{statistic}{
A function which operates on the data frame and returns the required
statistic. Its first argument must be the data. Any other arguments
that it requires can be passed using the \code{\dots} argument. In
the case of \code{sim = "weird"}, the data passed to \code{statistic} only
contains the times and censoring indicator regardless of the actual
number of columns in \code{data}. In all other cases the data passed to
statistic will be of the same form as the original data. When
\code{sim = "weird"}, the actual number of observations in the resampled
data sets may not be the same as the number in \code{data}. For this
reason, if \code{sim = "weird"} and \code{strata} is supplied,
\code{statistic} should also take a numeric vector indicating the
strata. This allows the statistic to depend on the strata if required.
}
\item{R}{
The number of bootstrap replicates.
}
\item{F.surv}{
An object returned from a call to \code{survfit} giving the survivor
function for the data. This argument is required unless
\code{sim = "ordinary"}, or \code{sim = "model"} and \code{cox} is missing.
}
\item{G.surv}{
Another object returned from a call to \code{survfit} but with the
censoring indicators reversed to give the product-limit estimate of the
censoring distribution. Note that for consistency the uncensored times
should be reduced by a small amount in the call to \code{survfit}. This
is a required argument whenever \code{sim = "cond"} or when
\code{sim = "model"} and \code{cox} is supplied.
}
\item{strata}{
The strata used in the calls to \code{survfit}. It can be a vector or a
matrix with 2 columns. If it is a vector then it is assumed to be the
strata for the survival distribution, and the censoring distribution is
assumed to be the same for all observations. If it is a matrix then the
first column is the strata for the survival distribution and the second
is the strata for the censoring distribution. When \code{sim = "weird"}
only the strata for the survival distribution are used since the
censoring times are considered fixed. When \code{sim = "ordinary"}, only
one set of strata is used to stratify the observations; this is taken to
be the first column of \code{strata} when it is a matrix.
}
\item{sim}{
The simulation type. Possible types are \code{"ordinary"} (case
resampling), \code{"model"} (equivalent to \code{"ordinary"} if
\code{cox} is missing, otherwise it is model-based resampling),
\code{"weird"} (the weird bootstrap - this cannot be used if \code{cox}
is supplied), and \code{"cond"} (the conditional bootstrap, in which
censoring times are resampled from the conditional censoring
distribution).
}
\item{cox}{
An object returned from \code{coxph}. If it is supplied, then
\code{F.surv} should have been generated by a call of the form
\code{survfit(cox)}.
}
\item{index}{
A vector of length two giving the positions of the columns in
\code{data} which correspond to the times and censoring indicators
respectively.
}
\item{\dots}{
Other named arguments which are passed unchanged to \code{statistic}
each time it is called. Any such arguments to \code{statistic} must
follow the arguments which \code{statistic} is required to have for
the simulation. Beware of partial matching to arguments of
\code{censboot} listed above, and that arguments named \code{X}
and \code{FUN} cause conflicts in some versions of \pkg{boot} (but
not this one).
}
\item{parallel, ncpus, cl}{
See the help for \code{\link{boot}}.
}
}
\value{
An object of class \code{"boot"} containing the following components:
\item{t0}{
The value of \code{statistic} when applied to the original data.
}
\item{t}{
A matrix of bootstrap replicates of the values of \code{statistic}.
}
\item{R}{
The number of bootstrap replicates performed.
}
\item{sim}{
The simulation type used. This will usually be the input value of
\code{sim} unless that was \code{"model"} but \code{cox} was not
supplied, in which case it will be \code{"ordinary"}.
}
\item{data}{
The data used for the bootstrap. This will generally be the input
value of \code{data} unless \code{sim = "weird"}, in which case it
will just be the columns containing the times and the censoring
indicators.
}
\item{seed}{
The value of \code{.Random.seed} when \code{censboot} started work.
}
\item{statistic}{
The input value of \code{statistic}.
}
\item{strata}{
The strata used in the resampling. When \code{sim = "ordinary"}
this will be a vector which stratifies the observations; when
\code{sim = "weird"} it is the strata for the survival distribution;
and in all other cases it is a matrix containing the strata for the
survival distribution and the censoring distribution.
}
\item{call}{
The original call to \code{censboot}.
}
}
\details{
The various types of resampling are described in Davison and Hinkley (1997)
in sections 3.5 and 7.3. The simplest is case resampling which simply
resamples with replacement from the observations.

The conditional bootstrap simulates failure times from the estimate of
the survival distribution. Then, for each observation its simulated
censoring time is equal to the observed censoring time if the
observation was censored and generated from the estimated censoring
distribution conditional on being greater than the observed failure time
if the observation was uncensored. If the largest value is censored
then it is given a nominal failure time of \code{Inf} and conversely if
it is uncensored it is given a nominal censoring time of \code{Inf}.
This is necessary to allow the largest observation to be in the
resamples.

If a Cox regression model is fitted to the data and supplied, then the
failure times are generated from the survival distribution using that
model. In this case the censoring times can either be simulated from
the estimated censoring distribution (\code{sim = "model"}) or from the
conditional censoring distribution as in the previous paragraph
(\code{sim = "cond"}).

The weird bootstrap holds the censored observations as fixed and also
the observed failure times. It then generates the number of events at
each failure time using a binomial distribution with mean 1 and
denominator the number of failures that could have occurred at that time
in the original data set. In our implementation we insist that there is
at least one simulated event in each stratum for every bootstrap dataset.

When there are strata involved and \code{sim} is either \code{"model"}
or \code{"cond"} the situation becomes more difficult. Since the strata
for the survival and censoring distributions are not the same it is
possible that for some observations both the simulated failure time and
the simulated censoring time are infinite. To see this consider an
observation in stratum 1F for the survival distribution and stratum 1G
for the censoring distribution. Now if the largest value in stratum 1F
is censored it is given a nominal failure time of \code{Inf}, also if
the largest value in stratum 1G is uncensored it is given a nominal
censoring time of \code{Inf} and so both the simulated failure and
censoring times could be infinite. When this happens the simulated
value is considered to be a failure at the time of the largest observed
failure time in the stratum for the survival distribution.

When \code{parallel = "snow"} and \code{cl} is not supplied,
\code{library(survival)} is run in each of the worker processes.
}
\references{
Andersen, P.K., Borgan, O., Gill, R.D. and Keiding,
N. (1993) \emph{Statistical Models Based on Counting
Processes}. Springer-Verlag.

Burr, D. (1994) A comparison of certain bootstrap confidence intervals
in the Cox model. \emph{Journal of the American Statistical
Association}, \bold{89}, 1290--1302.

Davison, A.C. and Hinkley, D.V. (1997)
\emph{Bootstrap Methods and Their Application}. Cambridge University Press.

Efron, B. (1981) Censored data and the bootstrap.
\emph{Journal of the American Statistical Association}, \bold{76}, 312--319.

Hjort, N.L. (1985) Bootstrapping Cox's regression model. Technical report
NSF-241, Dept. of Statistics, Stanford University.
}
\seealso{
\code{\link{boot}},
\code{\link{coxph}}, \code{\link{survfit}}
}
\examples{
library(survival)
# Example 3.9 of Davison and Hinkley (1997) does a bootstrap on some
# remission times for patients with a type of leukaemia. The patients
# were divided into those who received maintenance chemotherapy and
# those who did not. Here we are interested in the median remission
# time for the two groups.
data(aml, package = "boot") # not the version in survival.
aml.fun <- function(data) {
surv <- survfit(Surv(time, cens) ~ group, data = data)
out <- NULL
st <- 1
for (s in 1:length(surv$strata)) {
inds <- st:(st + surv$strata[s]-1)
md <- min(surv$time[inds[1-surv$surv[inds] >= 0.5]])
st <- st + surv$strata[s]
out <- c(out, md)
}
out
}
aml.case <- censboot(aml, aml.fun, R = 499, strata = aml$group)
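# One possible way to summarise the case-resampling results (a sketch,
# not part of the book example).  aml.fun returns Inf, with a warning,
# when a resampled survivor curve never falls to 0.5, so we first count
# such replicates in each group and then take order-statistic quantiles
# (type = 1), which tolerate infinite values.
colSums(!is.finite(aml.case$t))
apply(aml.case$t, 2, quantile, probs = c(0.025, 0.975), type = 1)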
# Now we will look at the same statistic using the conditional
# bootstrap and the weird bootstrap. For the conditional bootstrap
# the survival distribution is stratified but the censoring
# distribution is not.
aml.s1 <- survfit(Surv(time, cens) ~ group, data = aml)
aml.s2 <- survfit(Surv(time-0.001*cens, 1-cens) ~ 1, data = aml)
aml.cond <- censboot(aml, aml.fun, R = 499, strata = aml$group,
F.surv = aml.s1, G.surv = aml.s2, sim = "cond")
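# The returned objects record the simulation type and the strata that
# were used (see the Value section).  A quick, illustrative look at how
# these components differ between the two schemes fitted so far:
aml.case$sim
aml.cond$sim
str(aml.case$strata)
str(aml.cond$strata)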
# For the weird bootstrap we must redefine our function slightly since
# the data will not contain the group number.
aml.fun1 <- function(data, str) {
surv <- survfit(Surv(data[, 1], data[, 2]) ~ str)
out <- NULL
st <- 1
for (s in 1:length(surv$strata)) {
inds <- st:(st + surv$strata[s] - 1)
md <- min(surv$time[inds[1-surv$surv[inds] >= 0.5]])
st <- st + surv$strata[s]
out <- c(out, md)
}
out
}
aml.wei <- censboot(cbind(aml$time, aml$cens), aml.fun1, R = 499,
strata = aml$group, F.surv = aml.s1, sim = "weird")
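# A rough sketch of the event-count step of the weird bootstrap as
# described in Details, for illustration only; it is not the code used
# inside censboot.  The risk-set sizes in n.risk are made-up values for
# a single stratum, and we assume one observed failure at each failure
# time, so that each binomial count has mean 1.  The loop mimics the
# requirement of at least one simulated event in the stratum.
n.risk <- c(10, 8, 5, 3)
repeat {
    n.event <- rbinom(length(n.risk), size = n.risk, prob = 1/n.risk)
    if (sum(n.event) > 0) break
}
n.event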
# Now for an example where a Cox regression model has been fitted to
# the data, we will look at the melanoma data of Example 7.6 from
# Davison and Hinkley (1997). The fitted model assumes that there
# is a different survival distribution for the ulcerated and
# non-ulcerated groups but that the thickness of the tumour has a
# common effect. We will also assume that the censoring distribution
# is different in different age groups. The statistic of interest
# is the linear predictor. This is returned as the values at a
# number of equally spaced points in the range of interest.
data(melanoma, package = "boot")
library(splines) # for ns
mel.cox <- coxph(Surv(time, status == 1) ~ ns(thickness, df=4) + strata(ulcer),
data = melanoma)
mel.surv <- survfit(mel.cox)
agec <- cut(melanoma$age, c(0, 39, 49, 59, 69, 100))
mel.cens <- survfit(Surv(time - 0.001*(status == 1), status != 1) ~
strata(agec), data = melanoma)
mel.fun <- function(d) {
t1 <- ns(d$thickness, df=4)
cox <- coxph(Surv(d$time, d$status == 1) ~ t1+strata(d$ulcer))
ind <- !duplicated(d$thickness)
u <- d$thickness[!ind]
eta <- cox$linear.predictors[!ind]
sp <- smooth.spline(u, eta, df=20)
th <- seq(from = 0.25, to = 10, by = 0.25)
predict(sp, th)$y
}
mel.str <- cbind(melanoma$ulcer, agec)
# this is slow!
mel.mod <- censboot(melanoma, mel.fun, R = 499, F.surv = mel.surv,
G.surv = mel.cens, cox = mel.cox, strata = mel.str, sim = "model")
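# As noted in Details, with a Cox model the censoring times may instead
# be drawn from the conditional censoring distribution by using
# sim = "cond".  This is a sketch only: R = 99 is an arbitrary smaller
# number of replicates chosen to reduce the run time, and it is still slow.
mel.cond <- censboot(melanoma, mel.fun, R = 99, F.surv = mel.surv,
     G.surv = mel.cens, cox = mel.cox, strata = mel.str, sim = "cond")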
# To plot the original predictor and a 95\% pointwise envelope for it
mel.env <- envelope(mel.mod)$point
th <- seq(0.25, 10, by = 0.25)
plot(th, mel.env[1, ], ylim = c(-2, 2),
xlab = "thickness (mm)", ylab = "linear predictor", type = "n")
lines(th, mel.mod$t0, lty = 1)
matlines(th, t(mel.env), lty = 2)
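# The case resampling for the aml data can also be spread over worker
# processes.  A sketch only: it assumes at least two cores are available
# and that the default cluster created by censboot is acceptable.  As
# described in Details, library(survival) is then run on each worker,
# so aml.fun can find survfit and Surv there.
aml.par <- censboot(aml, aml.fun, R = 499, strata = aml$group,
     parallel = "snow", ncpus = 2)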
}
\author{Angelo J. Canty. Parallel extensions by Brian Ripley}
\keyword{survival}