In the introduction we have seen that a dependency
network can be built using get_dep(). While it is
theoretically possible to use get_dep() iteratively to
obtain all dependencies of all packages available on CRAN, it is not practical to do
so. This package provides two functions,
get_dep_all_packages() and
get_graph_all_packages(), for obtaining the dependencies of
all CRAN packages directly, as well as an example dataset.
The example dataset cran_dependencies contains all
dependencies as of 2020-05-09.
data(cran_dependencies)
cran_dependencies
#> # A tibble: 211,381 × 4
#> from to type reverse
#> <chr> <chr> <chr> <lgl>
#> 1 A3 xtable depends FALSE
#> 2 A3 pbapply depends FALSE
#> 3 A3 randomForest suggests FALSE
#> 4 A3 e1071 suggests FALSE
#> 5 aaSEA DT imports FALSE
#> 6 aaSEA networkD3 imports FALSE
#> 7 aaSEA shiny imports FALSE
#> 8 aaSEA shinydashboard imports FALSE
#> 9 aaSEA magrittr imports FALSE
#> 10 aaSEA Bios2cor imports FALSE
#> # ℹ 211,371 more rows
dplyr::count(cran_dependencies, type, reverse)
#> # A tibble: 8 × 3
#> type reverse n
#> <chr> <lgl> <int>
#> 1 depends FALSE 11123
#> 2 depends TRUE 9672
#> 3 imports FALSE 57617
#> 4 imports TRUE 51913
#> 5 linking to FALSE 3433
#> 6 linking to TRUE 3721
#> 7 suggests FALSE 35018
#> 8 suggests TRUE 38884
This is essentially a snapshot of CRAN. We can obtain all the current
dependencies using get_dep_all_packages(), which requires
no arguments:
df0.cran <- get_dep_all_packages()$dependencies
head(df0.cran)
#> from to type reverse
#> 1 a11yShiny shiny imports FALSE
#> 2 a11yShiny htmltools imports FALSE
#> 3 a11yShiny DT imports FALSE
#> 4 a11yShiny ggplot2 imports FALSE
#> 5 a11yShiny rlang imports FALSE
#> 6 a5R cli imports FALSE
dplyr::count(df0.cran, type, reverse) # numbers in general larger than above
#> type reverse n
#> 1 depends FALSE 9931
#> 2 depends TRUE 8626
#> 3 enhances FALSE 641
#> 4 enhances TRUE 641
#> 5 imports FALSE 125403
#> 6 imports TRUE 115737
#> 7 linking to FALSE 6629
#> 8 linking to TRUE 7107
#> 9 suggests FALSE 84255
#> 10 suggests TRUE 92201
As of 2026-06-11, no package has all 10 types of dependencies (the five types above, each forward and reverse), and 9 packages have 9 types: Matrix, bigmemory, ergm, igraph, lme4, miceadds, quanteda, rstan, xts.
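Counts of this kind can be obtained by tallying the distinct (type, reverse) combinations per package. The following is a minimal sketch on a hypothetical toy data frame with the same four columns as df0.cran; the package names pkgA to pkgE and the object names toy and n_types are made up for illustration, and dplyr is assumed to be installed.

```r
# Hypothetical toy data in the same four-column layout as df0.cran
toy <- data.frame(
  from = c("pkgA", "pkgA", "pkgA", "pkgA", "pkgB", "pkgB"),
  to = c("pkgB", "pkgC", "pkgD", "pkgE", "pkgC", "pkgD"),
  type = c("imports", "imports", "suggests", "imports", "depends", "depends"),
  reverse = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Keep one row per (package, type, direction), then count rows per package
n_types <- toy |>
  dplyr::distinct(from, type, reverse) |>
  dplyr::count(from, name = "n_types")
n_types
```

Here pkgA has three kinds of dependencies (forward imports, forward suggests, reverse imports) and pkgB has one.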
We can build a dependency network using
get_graph_all_packages(). Furthermore, we can verify that
the forward and reverse dependency networks are (almost) the same, by
looking at their size (number of edges) and order (number of nodes).
g0.depends <- get_graph_all_packages(type = "depends")
g0.depends
#> IGRAPH 17bb0b7 DN-- 4360 7016 --
#> + attr: name (v/c)
#> + edges from 17bb0b7 (vertex names):
#> [1] abc ->abc.data abc ->locfit abc ->MASS
#> [4] abc ->nnet abc ->quantreg abctools ->abc
#> [7] abctools ->abind abctools ->Hmisc abctools ->plyr
#> [10] abd ->lattice abd ->mosaic abd ->nlme
#> [13] abodOutlier ->cluster absorber ->fda absorber ->Matrix
#> [16] absorber ->sparsegl abundant ->glasso Ac3net ->data.table
#> [19] acc ->mhsmm accelmissing->mice accelmissing->pscl
#> [22] accessrmd ->ggplot2 accrual ->tcltk2 accrualPlot ->lubridate
#> + ... omitted several edges
We could obtain essentially the same graph, but with the direction of
the edges reversed, by using the argument reverse.
The dependency words accepted by the argument type are
the same as in get_dep(). The two networks’ size and order
should be very close, if not identical, to each other. Because of the
dependency direction, their edge lists should be the same but with the
column names from and to swapped.
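The relationship between a forward and a reverse edge list can be illustrated on a toy example. This is only a sketch: the package names and the objects fwd_edges, rev_edges, g_fwd and g_rev are hypothetical, and the graphs are built with igraph::graph_from_data_frame() rather than with this package's own functions.

```r
# Hypothetical forward edge list: "from" depends on "to"
fwd_edges <- data.frame(
  from = c("pkgA", "pkgA", "pkgB"),
  to = c("pkgB", "pkgC", "pkgC"),
  stringsAsFactors = FALSE
)
# The matching reverse edge list: "from" is depended on by "to"
rev_edges <- data.frame(
  from = fwd_edges$to,
  to = fwd_edges$from,
  stringsAsFactors = FALSE
)

g_fwd <- igraph::graph_from_data_frame(fwd_edges, directed = TRUE)
g_rev <- igraph::graph_from_data_frame(rev_edges, directed = TRUE)

# Order (number of nodes) and size (number of edges) of each graph
c(order = igraph::gorder(g_fwd), size = igraph::gsize(g_fwd))
c(order = igraph::gorder(g_rev), size = igraph::gsize(g_rev))

# Swapping the columns of the reverse edge list recovers the forward one
rev_swapped <- data.frame(
  from = rev_edges$to,
  to = rev_edges$from,
  stringsAsFactors = FALSE
)
identical(rev_swapped, fwd_edges)
```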
For verification, the exact same graphs can be obtained by filtering
the data frame for the required dependency and applying
df_to_graph():
g1.depends <- df0.cran |>
  dplyr::filter(type == "depends" & !reverse) |>
  df_to_graph(nodelist = dplyr::rename(df0.cran, name = from))
g1.depends # same as g0.depends
#> IGRAPH 3b71227 DN-- 4360 7016 --
#> + attr: name (v/c), type (e/c), reverse (e/l)
#> + edges from 3b71227 (vertex names):
#> [1] abctools ->abind abctools ->Hmisc
#> [3] abctools ->plyr abctools ->abc
#> [5] absorber ->Matrix absorber ->sparsegl
#> [7] absorber ->fda acc ->mhsmm
#> [9] accessrmd ->ggplot2 accrual ->tcltk2
#> [11] accrualPlot->lubridate acebayes ->lhs
#> [13] Achilles ->DatabaseConnector acid ->gamlss
#> [15] acid ->gamlss.dist acid ->Hmisc
#> + ... omitted several edges
If we extract the equivalent graph of reverse dependencies, we should obtain the same graph as before (had it been extracted above):
# Not run
g1.rev_depends <- df0.cran |>
  dplyr::filter(type == "depends" & reverse) |>
  df_to_graph(nodelist = dplyr::rename(df0.cran, name = from))
g1.rev_depends # should be same as g0.rev_depends
The networks obtained above should all be directed acyclic graphs.
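One way to check this property is igraph::is_dag(). The sketch below applies it to two hypothetical toy graphs rather than to the CRAN networks themselves; the package names and the objects g_acyclic and g_cyclic are made up for illustration.

```r
# Hypothetical toy dependency graph: pkgA -> pkgB -> pkgC and pkgA -> pkgC
g_acyclic <- igraph::graph_from_data_frame(
  data.frame(
    from = c("pkgA", "pkgB", "pkgA"),
    to = c("pkgB", "pkgC", "pkgC")
  ),
  directed = TRUE
)

# The same graph with an extra edge pkgC -> pkgA, which closes a cycle
g_cyclic <- igraph::graph_from_data_frame(
  data.frame(
    from = c("pkgA", "pkgB", "pkgA", "pkgC"),
    to = c("pkgB", "pkgC", "pkgC", "pkgA")
  ),
  directed = TRUE
)

igraph::is_dag(g_acyclic) # no chain of dependencies leads back to its start
igraph::is_dag(g_cyclic)  # the cycle pkgA -> pkgB -> pkgC -> pkgA breaks this
```

The same call could be applied to g0.depends or g1.depends once they have been built.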
One may notice that there are external reverse dependencies which
won’t appear in the forward dependencies if the scraping is limited
to CRAN packages. We can find these external reverse dependencies by
setting nodelist = NULL in df_to_graph():
df1.rev_depends <- df0.cran |>
  dplyr::filter(type == "depends" & reverse) |>
  df_to_graph(nodelist = NULL, gc = FALSE) |>
  igraph::as_data_frame() # to obtain the edge list
df1.depends <- df0.cran |>
  dplyr::filter(type == "depends" & !reverse) |>
  df_to_graph(nodelist = NULL, gc = FALSE) |>
  igraph::as_data_frame()
dfa.diff.depends <- dplyr::anti_join(
  df1.rev_depends,
  df1.depends,
  by = c("from" = "to", "to" = "from")
)
head(dfa.diff.depends)
#> from to type reverse
#> 1 abind CNORdt depends TRUE
#> 2 abind FISHalyseR depends TRUE
#> 3 abind riboSeqR depends TRUE
#> 4 abind S4Arrays depends TRUE
#> 5 adabag m6Aboost depends TRUE
#> 6 ade4 covRNA depends TRUE
This means we are extracting the reverse dependencies whose
forward equivalents are not listed. The column to shows the
packages external to CRAN. On the other hand, if we apply
dplyr::anti_join() with the order of the two edge
lists switched,
dfb.diff.depends <- dplyr::anti_join(
  df1.depends,
  df1.rev_depends,
  by = c("from" = "to", "to" = "from")
)
head(dfb.diff.depends)
#> from to type reverse
#> 1 abctools parallel depends FALSE
#> 2 abd grid depends FALSE
#> 3 absorber parallel depends FALSE
#> 4 AcceptanceSampling methods depends FALSE
#> 5 AcceptanceSampling stats depends FALSE
#> 6 acdcR stats depends FALSE
the column to lists packages that are not (or no longer) on the page of
available packages on CRAN. These are either defunct packages or core
packages such as parallel, grid, methods and stats.
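The crossed matching used in the two dplyr::anti_join() calls can be illustrated on a pair of toy edge lists. This is a minimal sketch with hypothetical package names (pkgA, pkgB, extPkg, corePkg) and hypothetical object names; it only assumes that dplyr is installed.

```r
# Hypothetical forward edges: "from" depends on "to"
fwd_edges <- data.frame(
  from = c("pkgA", "pkgA"),
  to = c("pkgB", "corePkg"),
  stringsAsFactors = FALSE
)
# Hypothetical reverse edges: "from" is depended on by "to"
rev_edges <- data.frame(
  from = c("pkgB", "pkgB"),
  to = c("pkgA", "extPkg"),
  stringsAsFactors = FALSE
)

# A reverse edge (x, y) matches a forward edge (y, x), hence the crossed keys
crossed <- c("from" = "to", "to" = "from")

# Reverse edges with no forward counterpart: the dependent is not in our list
only_rev <- dplyr::anti_join(rev_edges, fwd_edges, by = crossed)
# Forward edges with no reverse counterpart: the dependency has no listed page
only_fwd <- dplyr::anti_join(fwd_edges, rev_edges, by = crossed)

only_rev
only_fwd
```

In this toy example only_rev keeps the edge from pkgB to extPkg, and only_fwd keeps the edge from pkgA to corePkg, mirroring the two cases described above.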
Using the data frame df0.cran, we can also obtain the
degree for each package and each type:
df0.summary <- dplyr::count(df0.cran, from, type, reverse)
head(df0.summary)
#> from type reverse n
#> 1 AATtools imports FALSE 4
#> 2 ABACUS imports FALSE 2
#> 3 ABACUS suggests FALSE 2
#> 4 ABC.RAP imports FALSE 3
#> 5 ABC.RAP suggests FALSE 2
#> 6 ABCDscores imports FALSE 15
We can look at the “winner” in each of the reverse dependencies:
df0.summary |>
  dplyr::filter(reverse) |>
  dplyr::group_by(type) |>
  dplyr::top_n(1, n)
#> # A tibble: 5 × 4
#> # Groups: type [5]
#> from type reverse n
#> <chr> <chr> <lgl> <int>
#> 1 Rcpp linking to TRUE 3411
#> 2 dplyr imports TRUE 5227
#> 3 ggplot2 depends TRUE 472
#> 4 shiny enhances TRUE 12
#> 5 testthat suggests TRUE 12601
This is not surprising given the nature of each package. To take the summarisation one step further, we can obtain the frequencies of the degrees, and visualise the empirical degree distribution neatly on the log-log scale:
df1.summary <- df0.summary |>
  dplyr::count(type, reverse, n)
#> Storing counts in `nn`, as `n` already present in input
#> ℹ Use `name = "new_name"` to pick a new name.
gg0.summary <- df1.summary |>
  dplyr::mutate(reverse = ifelse(reverse, "reverse", "forward")) |>
  ggplot2::ggplot() +
  ggplot2::geom_point(ggplot2::aes(n, nn)) +
  ggplot2::facet_grid(type ~ reverse) +
  ggplot2::scale_x_log10() +
  ggplot2::scale_y_log10() +
  ggplot2::labs(x = "Degree", y = "Number of packages") +
  ggplot2::theme_bw(20)
gg0.summary
This suggests that the reverse dependencies, in particular
the reverse depends and reverse imports, approximately follow
a power law (a roughly straight line on the log-log scale),
a pattern observed empirically in many academic fields.
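A power law means the number of packages with degree k is proportional to k^(-alpha), so the points fall on a straight line with slope -alpha on the log-log scale. The sketch below illustrates this on hypothetical degree counts constructed with alpha = 2; the least-squares fit on the logged values is only a crude visual diagnostic, not a formal test, and none of the numbers come from CRAN.

```r
# Hypothetical degree counts: the number of packages with degree k
# is (up to rounding) proportional to k^(-2)
k <- 1:10
n_pkgs <- round(1000 * k^(-2))
degrees <- rep(k, times = n_pkgs)

# Empirical degree frequencies, analogous to df1.summary
tab <- table(degrees)
freq <- data.frame(
  degree = as.integer(names(tab)),
  n = as.vector(tab)
)

# On the log-log scale the relationship is close to linear,
# and the fitted slope approximates -alpha
fit <- lm(log(n) ~ log(degree), data = freq)
slope <- unname(coef(fit)[2])
slope
```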
We can now visualise (the giant component of) the CRAN network of
Depends, using functions in the package
visNetwork. To do this, we will need to convert the
igraph object g0.depends to the node list
and edge list as data frames.
prefix <- "https://CRAN.R-project.org/package=" # canonical form
degrees <- igraph::degree(g0.depends)
df0.nodes <- data.frame(id = names(degrees), value = degrees) |>
dplyr::mutate(title = paste0('<a href=\"', prefix, id, '\">', id, '</a>'))
df0.edges <- igraph::as_data_frame(g0.depends, what = "edges")
We could use igraph::membership() and
igraph::cluster_*() for community detection, and
visualise the clusters using different colours. However, this
would take too much computing time and is therefore not shown here.
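For reference, a minimal sketch of that workflow on a small graph is given below. It uses Zachary's karate club network, which ships with igraph, rather than the CRAN network, and picks igraph::cluster_walktrap() as just one of the available igraph::cluster_*() algorithms. The object names and the idea of storing the labels in a group column (which visNetwork uses to colour nodes) are assumptions for illustration.

```r
# Zachary's karate club network, a small example graph shipped with igraph
g_toy <- igraph::make_graph("Zachary")

# Walktrap is one of several igraph::cluster_*() algorithms
comm <- igraph::cluster_walktrap(g_toy)
memb <- igraph::membership(comm)

# One community label per node, stored as a group column in a node list
df_nodes <- data.frame(id = seq_along(memb), group = as.integer(memb))
head(df_nodes)
```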
By adding the column title in df0.nodes, we
make each node clickable, linking to its CRAN page, in the
interactive visualisation below:
set.seed(2345L)
vis0 <- visNetwork::visNetwork(
  df0.nodes, df0.edges,
  width = "100%", height = "720px"
) |>
  visNetwork::visOptions(highlightNearest = TRUE) |>
  visNetwork::visEdges(arrows = "to", color = list(opacity = 0.5)) |>
  visNetwork::visNodes(fixed = TRUE) |>
  visNetwork::visIgraphLayout(layout = "layout_with_drl")
vis0
Methods in social network analysis, such as stochastic block models, can be applied to study the properties of the dependency network. Ideally, by analysing the dependencies of all CRAN packages, we can obtain a bird’s-eye view of the ecosystem. The number of reverse dependencies is modelled in this other vignette.