This package implements functions to convert prevalence to incidence based on data obtained in point-prevalence studies (PPSs) along the lines of (Rhame and Sudderth 1981),(Freeman and Hutchison 1980) and is a companion to an upcoming paper (Willrich et al. 2019). It also implements methods to simulate PPS-data to benchmark different estimation methods. Notation will follow the companion paper. So a good idea is to read the paper first and afterwards the vignette.
The package has functions to simulate PPS-data based on distributions of length of infection and length of stay. Results of PPS simulations and incidence calculations are stored as tibbles. The functionality in the package can be divided in three parts:
In the following sections, we will go through these different aspects, starting with simulation of PPS-data.
The function simulate_pps_fast
can be used to generate
PPS data. This functions simulates a PPS on the basis of a given
prevalence P
using a vector of probabilities
dist.X.loi
for the values 1:length(dist.X.loi) of Xloi.
It directly samples the time of infection up to date based on
dist.X.loi
. Optionally, the length of stay is also sampled
independently using dist.X.los
which is in the same format
as dist.X.loi
. The sample is generated by treating the
marginal distributions of length of stay and length of infection as
independent by assumption. Because of this non-joint sampling rows
should not be interpreted as individual patients.
example.dist <- create_dist_vec(function(x) dpois(x-1, 7), max.dist = 70)
example.dist.los <- create_dist_vec(function(x) dpois(x-1, lambda = 12),
max.dist = 70)
data.pps.fast <- simulate_pps_fast(n.sample=5000,
P=0.05,
dist.X.loi = example.dist,
dist.X.los = example.dist.los)
head(data.pps.fast)
## # A tibble: 6 × 5
## A.loi L.loi A.los L.los patient.type
## <dbl> <dbl> <int> <int> <dbl>
## 1 0 0 11 16 1
## 2 0 0 3 16 1
## 3 0 0 16 16 1
## 4 0 0 1 5 1
## 5 0 0 3 17 1
## 6 0 0 1 15 1
Values of zero for A.loi
and L.loi
indicate
absence of a HAI.
The incidence rate per patient day I and the expected length of
infection in the whole population xloi
for a given distribution of Xloi
and given P can be calculated
with simulate_incidence_stats_fast
by supplying a
prevalence P
and a vector of probabilities
dist.X.loi
for Xloi.
Optionally one can also calculate Ipp and
xlos
if one supplies a vector of probabilites dist.X.los
for
Xlos.
data.fast.inc.theo <- simulate_incidence_stats_fast(P=0.05,
dist.X.loi = example.dist,
dist.X.los = example.dist.los)
data.fast.inc.theo
## $x.loi
## [1] 8
##
## $x.los
## [1] 13
##
## $I
## [1] 0.006578947
##
## $I.pp
## [1] 0.08125
While the above method of simulation is fast and efficient and is useful for larger simulation studies, it is useful to have a more explicit simulation technique which samples from the joint distribution of Xlos and Xloi and gives more control over subpopulations of patients.
The setup of this simulation model is described in the following.
We assume the following setup. Patients arrive sequentially at a
hospital X. A hospital
is a named list with the following
named elements:
inc.factor
which modifies the risk of
nosocomial infections for all types of patients,patient.list
characterized
below,pat.dist
.A patient.type
is a list with the following named
elements
dist.X.los
of probabilities for the values
1:length(dist.X.los)
of Xlos.,dist.X.loi
of probabilities for the values
1:length(dist.X.loi)
of Xloi.,I.p
for
this type of patient.The base-value of the length of stay is additive with the possible length of a nosocomial infection. Clustering of infections is not explicitly modelled.
As an example we define a hospital with two different patient types.
pat.1 <- list(dist.X.los =
create_dist_vec(function(x) dpois(x-1, lambda = 12),
70),
I.p = 0.008,
dist.X.loi =
create_dist_vec(function(x) dpois(x-1, lambda = 10),
70))
pat.2 <- list(dist.X.los =
create_dist_vec(function(x) dpois(x-1, lambda = 10),
70),
I.p = 0.02,
dist.X.loi =
create_dist_vec(function(x) dpois(x-1, lambda = 7),
70))
patient.list <- list(pat.1, pat.2)
# define distribution of patients
pat.1.prob <- 0.4; pat.2.prob <- 0.6
pat.dist.hosp <- c(pat.1.prob, pat.2.prob)
hospital.1 <- list(inc.factor = 1,
pat.dist = pat.dist.hosp,
patient.list = patient.list)
Using simulate_pps_data
one can generate PPS data by
simulating the evolution of n.sample
beds for
steps
days.
## # A tibble: 6 × 5
## A.loi A.los L.loi L.los patient.type
## <dbl> <dbl> <dbl> <int> <int>
## 1 2 12 13 25 1
## 2 0 4 0 6 1
## 3 0 1 0 12 1
## 4 1 10 9 21 2
## 5 0 12 0 19 1
## 6 0 15 0 16 1
To get additional theoretical quantities based on the whole
population, one can use simulate_incidence_stats
.
data.inc.theo <- simulate_incidence_stats(hospital.1, 365 * 1000)
# gives incidence rate I
data.inc.theo$I
## [1] 0.01451649
## [1] 0.1743867
## [1] 12.01301
# average length of stay of patients who had at least one HAI during their stay
data.inc.theo$x.los.only.noso
## [1] 17.75444
To use the newly proposed estimator gren presented in the
companion paper, one can use the function
calculate_I_smooth
with method="gren"
. The
data
should be supplied as a data frame with at least a
column named A.loi
giving lengths of infection up to date
of PPS. Values of zero for A.loi
indicate absence of a HAI.
Optionally, the data frame can also contain a column A.los
supplying lengths of stay up to PPS to estimate xlos
with the same method as well.
## # A tibble: 1 × 8
## n n.noso P.hat I.hat I.pp.hat x.loi.hat x.los.hat method
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 5000 564 0.113 0.0147 0.171 8.68 13.1 gren
## [1] 0.01451649
## # A tibble: 1 × 8
## n n.noso P.hat I.hat I.pp.hat x.loi.hat x.los.hat method
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 5000 255 0.051 0.00738 0.0923 7.29 13.2 gren
## [1] 0.006578947
There is another variation of this estimator specified with
method = "rear"
. This uses the rearrangement estimator
studied in (Jankowski and Wellner 2009)
instead of the Grenander estimator as an estimator for the monotonously
decreasing distribution of Aloi
and Alos.
We will denote this type of estimator by rear.
There are two helper functions to calculate confidence intervals for
the estimates of Ipp
with the gren estimator: One ( calculate_CI_I_pp
)
is based on the typical output of calculate_I_smooth
:
## # A tibble: 1 × 8
## n n.noso P.hat I.hat I.pp.hat x.loi.hat x.los.hat method
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 5000 255 0.051 0.00738 0.0923 7.29 13.2 gren
## # A tibble: 1 × 2
## CI.lower.Ipp CI.upper.Ipp
## <dbl> <dbl>
## 1 0.0606 0.124
The other (CI_np_bs
) is based on a bootstrapping
approach which resamples from the estimates of the distributions for
Aloi
and Alos
based on the Grenander estimator. It works with data as output by
simulate_pps_data*
.
## # A tibble: 1 × 2
## CI.lower.Ipp CI.upper.Ipp
## <dbl> <dbl>
## 1 0.0753 0.123
The function calculate_I_rhame
can be used to calculate
the incidence with a user-supplied value x.loi.hat
for the
estimated length of infection xloi
and an optional specification of x.los.hat
for the
estimated length of stay xlos
to get an estimate of Ipp
too. Here we take the example of an estimator where
x.loi.hat
and x.los.hat
are fixed to their
theoretical values and which depends only on the estimate of P. We will call this type of
estimator rhame.theo .
calculate_I_rhame(data.pps,
x.loi.hat = data.inc.theo$x.loi,
x.los.hat = data.inc.theo$x.los,
method = "method identifier")
## # A tibble: 1 × 8
## n n.noso P.hat I.hat I.pp.hat x.loi.hat x.los.hat method
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 5000 564 0.113 0.0147 0.176 8.67 13.5 method identifier
## [1] 0.01451649
## [1] 0.1743867
calculate_I_rhame(data.pps.fast,
x.loi.hat = data.fast.inc.theo$x.loi,
x.los.hat = data.fast.inc.theo$x.los,
method = "method identifier")
## # A tibble: 1 × 8
## n n.noso P.hat I.hat I.pp.hat x.loi.hat x.los.hat method
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 5000 255 0.051 0.00672 0.0829 8 13 method identifier
## [1] 0.006578947
## [1] 0.08125
As a convenience function, one can use calculate_I
to
get estimates of I for a range of estimators (including the ones studied
in (Willrich et al. 2019)) based on PPS
data and the accompanying theoretical data .
The estimators are the following:
## # A tibble: 8 × 8
## n n.noso P.hat I.hat I.pp.hat x.loi.hat x.los.hat method
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 5000 255 0.051 0.00738 0.0923 7.29 13.2 gren
## 2 5000 255 0.051 0.00759 0.09 7.08 12.5 rear
## 3 5000 255 0.051 0.0106 0.133 5.05 13.2 pps.mixed
## 4 5000 255 0.051 0.0107 0.0714 5 7 pps.median
## 5 5000 255 0.051 0.0111 0.0790 4.86 7.53 pps.mean
## 6 5000 255 0.051 0.00668 0.0830 8.04 13.1 L.full
## 7 5000 255 0.051 0.00672 0.0829 8 13 rhame.theo
## 8 5000 255 0.051 0.00738 0.0962 7.29 13.7 naive
If one wants to combine the (fast) simulation step with the
estimation step one can use generate_I_fast
. This is just a
wrapper for first calling simulate_pps_fast
and then
calling calculate_I
.
generate_I_fast(n.sample = 10000,
P = 0.05,
dist.X.loi = example.dist,
data.theo = data.fast.inc.theo)
## # A tibble: 8 × 8
## n n.noso P.hat I.hat I.pp.hat x.loi.hat x.los.hat method
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 10000 530 0.053 0.00771 NA 7.26 NA gren
## 2 10000 530 0.053 0.00771 NA 7.26 NA rear
## 3 10000 530 0.053 0.00902 NA 6.21 NA pps.mixed
## 4 10000 530 0.053 0.0140 NA 4 NA pps.median
## 5 10000 530 0.053 0.0117 NA 4.78 NA pps.mean
## 6 10000 530 0.053 0.00703 NA 7.96 NA L.full
## 7 10000 530 0.053 0.00700 0.0861 8 13 rhame.theo
## 8 10000 530 0.053 0.00771 NA 7.26 NA naive
The function monotone_smoother
implements the
rearrangement estimator and Grenander estimator described in (Jankowski and Wellner 2009).
A.loi.smoothed <- monotone_smoother(A.loi.sample, method = "gren")
# estimated monotonously decreasing distribution
plot(A.loi.smoothed)
For creating length-biased distributions there is
length_biased_dist
, which takes a vector of probabilities
of a discrete positive distribution as an argument.
# geometric distribution starting in 1 and cutoff at 70 with mean at about 8.
geom.dist <- create_dist_vec(geom_dist_fct, max.dist = 70)
# calculate mean
sum(1:length(geom.dist)*geom.dist)
## [1] 7.993895
To calculate the mean of the original distribution based on the
length-biased distribution one can use
length_unbiased_mean
.
## [1] 7.993895