CareDensity
package is a small R package that can be
used mainly to calculate various measures of Care Density (Pollack et
al. 2013, Engels et al. 2024) and Coordination of Care Indices (Jee
& Cabana 2006). In this vignette we will show some simple examples
for each index and give some further explanations on what they mean.
This package is currently not available on CRAN, but a developmental version can be installed from github, using the following code:
The care density was first proposed by Pollack et al. (2013). It is a measure that is supposed to measure how closely the “care team” of a patient (the providers / medical doctors he is in contact with) are connected. As an indirect measure of how closely two providers work together, the number of patients shared by the providers is used. Therefore, in order to calculate the care density, the only information we need is which patient visited which provider. This information can be considered a “patient-sharing network” and has been used in multiple studies already (DuGoff et al. 2018, Landon et al. 2012).
Formally, the care density is defined as:
$$C_p = \frac{\sum_{i = 1}^m w_{p, j}}{n_p (n_p - 1)/2},$$
with np being the number of providers a patient has visited, m defined as the number of all possible combinations of length two and wp, j being the number of patients that a pair of provider is sharing.
Suppose we have the following data about 5 patients (denoted by numbers) and the 3 providers they visited (denoted by uppercase letters):
library(CareDensity)
library(igraph)
library(data.table)
library(MatrixExtra)
set.seed(3431)
# some arbitrary patient-provider contact data
data <- data.frame(PatID=c("1", "1", "1", "2", "2", "3", "3", "4", "5"),
ArztID=c("A", "C", "D", "A", "D", "A", "D", "D", "C"))
data
#> PatID ArztID
#> 1 1 A
#> 2 1 C
#> 3 1 D
#> 4 2 A
#> 5 2 D
#> 6 3 A
#> 7 3 D
#> 8 4 D
#> 9 5 C
Since there are only connections between patients and providers and
no connections between provider and providers or patients and patients,
this can be considered a bipartite graph. We can visualize this
graph using the igraph
package:
# create graph
g <- graph_from_data_frame(data, directed=FALSE)
# add type
V(g)$type <- bipartite_mapping(g)$type
# change some things for better plots
V(g)$color <- ifelse(V(g)$type, "salmon", "lightblue")
V(g)$shape <- ifelse(V(g)$type, "square", "circle")
E(g)$color <- "lightgray"
plot(g, vertex.label.cex=0.8, vertex.label.color="black")
Consider patient number 1 in this graph. We can see that this patient visited 3 different providers (A, B and C). Therefore, np is equal to 3 for this patient. But how about the weights? To gain a better understanding, we can project this graph into a provider-centric network, where only providers are included:
# project to provider mode
mat <- project_to_one_mode(g, mode="cols")
# make it an igraph object again
g_provider <- graph_from_adjacency_matrix(mat, mode="upper", weighted=TRUE, diag=FALSE)
# plot it
plot(g_provider, edge.label=E(g_provider)$weight, vertex.shape="square",
edge.color="black", vertex.color="salmon",
vertex.label.cex=0.8)
In this graph we can see exactly which provider shared how many patients. So for patient 1 the care density would be:
$$C_1 = \frac{1 + 1 + 3}{3 (3 - 1)/2} \approx 1.6667.$$
While this is easy to calculate for a single patient in a small
network, it is unfeasible to do this by hand in a large network and for
multiple patients. The care_density()
function can be used
to automatically perform the step taken by hand above. All we need to do
is supply the original data
to it:
care_density(data)
#> PatID sum_weights n care_density
#> 1 1 5 3 1.666667
#> 2 2 3 2 3.000000
#> 3 3 3 2 3.000000
#> 4 4 NA 1 NA
#> 5 5 NA 1 NA
Here we can see that it gets the same result for patient 1 and also
outputs the care density for all other patients. Since patient 4 and 5
have only visited one provider, the care density is undefined for them
and is thus denoted by NA
.
The care density as defined by Pollack et al. (2014) has recently been criticed by Engels et al. (2024). Their main point are that:
Engels et al. (2024) instead propose a modified version of the care density, which they call fragmented care density. The main idea is to stratify the sum term of the classic care density into multiple provider-relationship specific sums. $$C_p = \sum_{j = 1}^{k} \frac{s_j}{n_p(n_p - 1) /2},$$
with k being defined as:
$$k = {l \choose 2} + l,$$
where l is the number of provider types. For example, if we have three different provider types (psychiatrist, general provider, other), we would have 6 different relationships. It is easy to see that this formulation is equivalent to the one given by Pollack et al. (2013). The new proposal is to weight the different connection types differently. The fragmented care density as proposed by Engels et al. (2024) is then given by:
$$FC_p = \sum_{j = 1}^{k} w_j \frac{s_j}{n_p(n_p - 1) /2}.$$
where wj are some connection specific weights. Engels et al. (2024) propose to use regression models to estimate these weights. This is explained in a little more detail below.
Consider again the example dataset defined above. Suppose, however,
that each provider is known to be either a general practitioner or a
psychiatrist. The following data.frame
defines the type of
each provider in our example:
d_type <- data.frame(ID=c("A", "C", "D"),
Type=c("GP", "GP", "Psychiatrist"))
d_type
#> ID Type
#> 1 A GP
#> 2 C GP
#> 3 D Psychiatrist
Since there are two different kinds of providers, there are 3 possible provider connections:
We can safely omit the “Psychiatrist - GP” connection type, because
we are always dealing with undirected connections, which means that the
order of appearance does not make a difference. Now we can use the
fragmented_care_density()
function as follows:
fragmented_care_density(data, weights=NULL, type=d_type, by_connection=TRUE)
#> PatID connection sum_weights n care_density
#> 1 1 GP - GP 1 3 0.3333333
#> 2 1 Psychiatrist - GP 4 3 1.3333333
#> 3 2 Psychiatrist - GP 3 2 3.0000000
#> 4 3 Psychiatrist - GP 3 2 3.0000000
#> 5 4 <NA> NA 1 NA
#> 6 5 <NA> NA 1 NA
By setting by_connection
to TRUE
we are
telling the function to not directly calculate the fragmented care
density for us, but to instead return the sum of the weights and care
densities per connection type and patient. From this dataset we can see
that only patient 1 has two different kinds of provider connections
among his network. If we sum up the care densities of each patient over
all his connection types, we get the classic care density.
To calculate the actual fragmented care density for each patient, we
need to supply weights
for each connection type. We can do
this by constructing a data.frame
that looks like this:
d_weights <- data.frame(from=c("GP", "GP", "Psychiatrist"),
to=c("GP", "Psychiatrist", "Psychiatrist"),
weight=c(1, 1, 1))
d_weights
#> from to weight
#> 1 GP GP 1
#> 2 GP Psychiatrist 1
#> 3 Psychiatrist Psychiatrist 1
Here, we set all weights to 1. Therefore, if we now call the
fragmented_care_density()
function again (this time without
using the by_connection
argument):
fragmented_care_density(data, weights=d_weights, type=d_type)
#> PatID fragmented_care_density
#> 1 1 1.666667
#> 2 2 3.000000
#> 3 3 3.000000
#> 4 4 NA
#> 5 5 NA
we get the same results as we did before when using the simple
care_density()
function. However, if we use different
weights for each connection type:
d_weights <- data.frame(from=c("GP", "GP", "Psychiatrist"),
to=c("GP", "Psychiatrist", "Psychiatrist"),
weight=c(1.1, 0.5, 1.3))
d_weights
#> from to weight
#> 1 GP GP 1.1
#> 2 GP Psychiatrist 0.5
#> 3 Psychiatrist Psychiatrist 1.3
and call the function again:
fragmented_care_density(data, weights=d_weights, type=d_type)
#> PatID fragmented_care_density
#> 1 1 1.033333
#> 2 2 1.500000
#> 3 3 1.500000
#> 4 4 NA
#> 5 5 NA
we obtain different results. Note that this never matters for
patients 4 and 5, because they only have one provider connection,
resulting in NA
values no matter the weights.
In the example above, we used arbitrary weights to showcase the
functionality of the fragmented_care_density()
function. In
reality we would like to use sensible weights instead. Engels et
al. (2024) recommend using regression models to obtain the relevant
weights. In their article, they use a logistic regression model to
predict a binary outcome, given the presence of a connection type and
some covariates. We will illustrate this approach here without the use
of other covariates.
Suppose in addition to data
(patient-provider
relationships) and type
(the provider types) we also have
some patient level information on a binary outcome of interest Y. Suppose this is that data:
d_outcome <- data.frame(PatID=c("1", "2", "3", "4", "5"),
Y=c(0, 0, 1, 1, 0))
d_outcome
#> PatID Y
#> 1 1 0
#> 2 2 0
#> 3 3 1
#> 4 4 1
#> 5 5 0
First, we again obtain data about the connection sums (sj) per connection type:
We then binarize these weights to 0 / 1. If at least one connection is present, it is considered a 1, otherwise it is a 0:
Next, we transform this information into the long-format using the
dcast()
function of the data.table
package:
d_consum <- dcast(as.data.table(d_consum), PatID ~ connection,
value.var="sum_weights", fill=0)
d_consum$`NA` <- NULL
Afterwards we can merge the outcome data to this dataset:
Now we can finally fit the regression model, using the outcome as the dependent variable and the connection-type dummies as independent variables:
mod <- glm(Y ~ `GP - GP` + `Psychiatrist - GP`, data=d_outcome, family="binomial")
summary(mod)
#>
#> Call:
#> glm(formula = Y ~ `GP - GP` + `Psychiatrist - GP`, family = "binomial",
#> data = d_outcome)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -1.256e-16 1.414e+00 0.000 1.000
#> `GP - GP` -1.857e+01 6.523e+03 -0.003 0.998
#> `Psychiatrist - GP` 0.000e+00 2.000e+00 0.000 1.000
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 6.7301 on 4 degrees of freedom
#> Residual deviance: 5.5452 on 2 degrees of freedom
#> AIC: 11.545
#>
#> Number of Fisher Scoring iterations: 17
The estimated coefficients associated with each connection type may now be used as weights. Note that in this example the weights don’t make any sense due to (1) the very low sample size, (2) the very small network and (3) because all data is completely made up with no underlying true relationships. The general procedure, however, should work with real data.
DuGoff, Eva H., Sara Fernandes-Taylor, Gary E. Weissman, Joseph H. Huntley, and Craig Evan Pollack. (2018). “A Scoping Review of Patient-Sharing Network Studies Using Administrative Data”. Translational Behavioral Medicine 8 (4), pp. 598-625.
Engels, Alexander, Claudia Konnopka, Espen Henken, Martin Härter, and Hans-Helmut König. (2024). “A Flexible Approach to Measure Care Coordination Based on Patient-Sharing Networks”. BMC Medical Research Methodology 24 (1), pp. 1-12.
Jee, Sandra H., and Michael D. Cabana. (2006). “Indices for Continuity of Care: A Systematic Review of the Literature”. Medical Care Research and Review 63 (2), pp. 158-188.
Landon, Bruce E., Nancy L. Keating, Michael L. Barnett, Jukka-Pekka Onnela, Sudeshna Paul, A. James O’Malley, Thomas Keegan, and Nicholas A. Christakis. (2012). “Variation in Patient-Sharing Networks of Physicians Across the United States”. JAMA 308 (3): pp. 265-273.
Pollack, Craig Evan, Gary E. Weissman, Klaus W. Lemke, Peter S. Hussey, and Jonathan P. Weiner. (2013). “Patient Sharing Among Physicians and Costs of Care: A Network Analytic Approach to Care Coordination Using Claims Data”. Journal of General Internal Medicine 28 (3), pp. 459-465.