This vignette provides an introduction to the functions facilitating the
analysis of the dependencies of CRAN packages, specifically get_dep(),
df_to_graph() and topo_sort_kahn().
To obtain information about the various kinds of dependencies of a
package, we can use the function get_dep(), which takes the package name
and the type of dependencies as the first and second arguments,
respectively. Currently, the second argument accepts a character vector
of one or more of the following words: Depends, Imports, LinkingTo,
Suggests, Enhances. Matching is case-insensitive, and LinkingTo may also
be written as Linking_To or Linking To.
get_dep("dplyr", "Imports")
#> from to type reverse
#> 1 dplyr cli imports FALSE
#> 2 dplyr generics imports FALSE
#> 3 dplyr glue imports FALSE
#> 4 dplyr lifecycle imports FALSE
#> 5 dplyr magrittr imports FALSE
#> 6 dplyr methods imports FALSE
#> 7 dplyr pillar imports FALSE
#> 8 dplyr R6 imports FALSE
#> 9 dplyr rlang imports FALSE
#> 10 dplyr tibble imports FALSE
#> 11 dplyr tidyselect imports FALSE
#> 12 dplyr utils imports FALSE
#> 13 dplyr vctrs imports FALSE
get_dep("MASS", c("depends", "suggests"))
#> from to type reverse
#> 1 MASS grDevices depends FALSE
#> 2 MASS graphics depends FALSE
#> 3 MASS stats depends FALSE
#> 4 MASS utils depends FALSE
#> 5 MASS lattice suggests FALSE
#> 6 MASS nlme suggests FALSE
#> 7 MASS nnet suggests FALSE
#> 8 MASS survival suggests FALSE
For more information on different types of dependencies, see the official guidelines and https://r-pkgs.org/description.html.
In the output, the column type is the type of the dependency converted
to lower case. Also, LinkingTo is now converted to linking to for
consistency.
get_dep("xts", "LinkingTo")
#> from to type reverse
#> 1 xts zoo linking to FALSE
get_dep("xts", "linking to")
#> from to type reverse
#> 1 xts zoo linking to FALSE
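The accepted spellings described above can be illustrated with a small base R sketch. The helper normalise_type() is hypothetical and is not part of the package; it only mimics the documented behaviour (lower-casing, and treating LinkingTo, Linking_To and Linking To alike), and the package may implement this differently.

```r
# Hypothetical helper (an assumption, not the package's code): map
# user-supplied spellings of a dependency type to the lower-case form
# shown in the `type` column.
normalise_type <- function(type) {
  x <- tolower(type)                   # ignore letter case
  x <- gsub("[_ ]", "", x)             # drop underscores and spaces
  x[x == "linkingto"] <- "linking to"  # one canonical spelling
  x
}

normalise_type(c("Imports", "LinkingTo", "Linking_To", "linking to"))
#> [1] "imports"    "linking to" "linking to" "linking to"
```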
For the reverse dependencies, instead of including the prefix “Reverse”
in type, we use the argument reverse:
get_dep("abc", "depends", reverse = TRUE)
#> from to type reverse
#> 1 abc abctools depends TRUE
#> 2 abc EasyABC depends TRUE
get_dep("xts", "linking to", reverse = TRUE)
#> from to type reverse
#> 1 xts RcppXts linking to TRUE
#> 2 xts TTR linking to TRUE
Theoretically, for each forward dependency
#> from to type reverse
#> 1 A B c FALSE
there should be an equivalent reverse dependency
#> from to type reverse
#> 1 B A c TRUE
Aligning the type in the forward and reverse dependencies enables this
to be checked easily.
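As a minimal sketch of such a check, the two toy data frames below reproduce the A/B example above by hand (they are not output from get_dep()). The helpers flip() and edge_key() are hypothetical names introduced only for this illustration.

```r
# Toy forward and reverse records, typed in by hand to mirror the example.
forward <- data.frame(from = "A", to = "B", type = "c", reverse = FALSE,
                      stringsAsFactors = FALSE)
reverse <- data.frame(from = "B", to = "A", type = "c", reverse = TRUE,
                      stringsAsFactors = FALSE)

# Hypothetical helper: turn a reverse record into its forward equivalent.
flip <- function(df) {
  data.frame(from = df$to, to = df$from, type = df$type,
             reverse = !df$reverse, stringsAsFactors = FALSE)
}

# Hypothetical helper: one string per edge, so rows can be compared.
edge_key <- function(df) paste(df$from, df$to, df$type, df$reverse, sep = "|")

# Every forward dependency should have a matching flipped reverse one.
all(edge_key(forward) %in% edge_key(flip(reverse)))
#> [1] TRUE
```

Because type is spelled identically in both directions, the comparison needs no extra handling of a “Reverse” prefix.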
To obtain all types of dependencies, we can use "all" in the second
argument, instead of typing a character vector of all five words:
df0.rstan <- get_dep("rstan", "all")
dplyr::count(df0.rstan, type)
#> type n
#> 1 depends 1
#> 2 imports 10
#> 3 linking to 5
#> 4 suggests 12
df1.rstan <- get_dep("rstan", "all", reverse = TRUE) # too many rows to display
dplyr::count(df1.rstan, type) # hence the summary using count()
#> type n
#> 1 depends 20
#> 2 enhances 3
#> 3 imports 142
#> 4 linking to 123
#> 5 suggests 36
To build a dependency network, we have to obtain the dependencies for
multiple packages. For illustration, we choose the core packages of the
tidyverse, and find out what each package Imports. We put all the
dependencies into one data frame, in which the package in the from
column imports the package in the to column. This is essentially the
edge list of the dependency network.
df0.imports <- rbind(
get_dep("ggplot2", "Imports"),
get_dep("dplyr", "Imports"),
get_dep("tidyr", "Imports"),
get_dep("readr", "Imports"),
get_dep("purrr", "Imports"),
get_dep("tibble", "Imports"),
get_dep("stringr", "Imports"),
get_dep("forcats", "Imports")
)
head(df0.imports)
#> from to type reverse
#> 1 ggplot2 cli imports FALSE
#> 2 ggplot2 glue imports FALSE
#> 3 ggplot2 grDevices imports FALSE
#> 4 ggplot2 grid imports FALSE
#> 5 ggplot2 gtable imports FALSE
#> 6 ggplot2 isoband imports FALSE
tail(df0.imports)
#> from to type reverse
#> 73 forcats cli imports FALSE
#> 74 forcats glue imports FALSE
#> 75 forcats lifecycle imports FALSE
#> 76 forcats magrittr imports FALSE
#> 77 forcats rlang imports FALSE
#> 78 forcats tibble imports FALSE
With the help of the ‘igraph’ package, we can use this data frame to build a graph object that represents the dependency network.
g0.imports <- igraph::graph_from_data_frame(df0.imports)
set.seed(1457L)
old.par <- par(mar = rep(0.0, 4))
plot(g0.imports, vertex.label.cex = 1.5)
par(old.par)
The nature of a dependency network makes it a directed acyclic graph
(DAG). We can use the ‘igraph’ function is_dag() to check.
Note that this applies to Imports (and Depends) only, due to their
nature. This acyclic nature does not apply to a network of, for example,
Suggests.
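The difference can be seen on two toy edge lists, sketched below with made-up package names A, B and C rather than real CRAN data. The first resembles an Imports network; the second assumes two packages that suggest each other, which creates a cycle.

```r
library(igraph)

# Toy "Imports"-like edge list: A imports B and C, B imports C.
imports_edges <- data.frame(from = c("A", "A", "B"),
                            to   = c("B", "C", "C"))

# Toy "Suggests"-like edge list: A and B suggest each other (assumed).
suggests_edges <- data.frame(from = c("A", "B"),
                             to   = c("B", "A"))

is_dag(graph_from_data_frame(imports_edges))
#> [1] TRUE
is_dag(graph_from_data_frame(suggests_edges))
#> [1] FALSE
```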
It is possible to set a boundary on the nodes to which the edges are
directed, using the function df_to_graph(). The second argument takes in
a data frame that contains the list of such nodes in the column name.
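The idea of a boundary can be sketched in base R on a toy edge list. This is only an illustration of the filtering described above, under the assumption that edges pointing outside the node list are dropped; it is not the implementation of df_to_graph(), which may apply further steps.

```r
# Toy edge list: X and Y lie outside the set of nodes we care about.
edges <- data.frame(from = c("A", "A", "B", "B"),
                    to   = c("B", "X", "C", "Y"),
                    stringsAsFactors = FALSE)

# Node list with the column `name`, as described in the text.
nodes <- data.frame(name = c("A", "B", "C"), stringsAsFactors = FALSE)

# Keep only the edges directed to a node inside the boundary.
bounded <- edges[edges$to %in% nodes$name, ]
bounded
#>   from to
#> 1    A  B
#> 3    B  C
```

The object g0.core used in the remainder of this vignette is presumably the result of applying df_to_graph() to df0.imports, with the eight core tidyverse packages in the name column of the second argument.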
Since networks according to Imports or Depends are DAGs, we can obtain
the topological ordering using, for example, Kahn’s (1962) sorting
algorithm.
topo_sort_kahn(g0.core)
#> id id_num
#> 1 forcats 1
#> 2 ggplot2 2
#> 3 readr 3
#> 4 tidyr 4
#> 5 dplyr 5
#> 6 purrr 6
#> 7 stringr 7
#> 8 tibble 8
In the topological ordering, represented by the column id_num, a low
(high) number represents being at the front (back) of the ordering. If
package A Imports package B, i.e. there is a directed edge from A to B,
then A will be topologically before B. As the package ‘tibble’ doesn’t
import any of the other core packages but is imported by most of them,
it naturally goes to the back of the ordering. This ordering may not be
unique for a DAG, and other admissible orderings can be obtained by
setting random = TRUE in the function:
set.seed(387L); topo_sort_kahn(g0.core, random = TRUE)
#> id id_num
#> 1 ggplot2 1
#> 2 readr 2
#> 3 forcats 3
#> 4 tidyr 4
#> 5 stringr 5
#> 6 purrr 6
#> 7 dplyr 7
#> 8 tibble 8
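For readers unfamiliar with Kahn’s algorithm, the following is a minimal base R sketch on a toy edge list. The function kahn_sort() is a hypothetical name and is not the implementation of topo_sort_kahn(); it only mirrors the shape of the output (columns id and id_num) shown above.

```r
# Hypothetical, simplified Kahn's algorithm on an edge list with columns
# `from` and `to`; an illustration only, not the package's code.
kahn_sort <- function(edges) {
  from <- as.character(edges$from)
  to <- as.character(edges$to)
  nodes <- unique(c(from, to))
  # In-degree of each node: number of edges pointing to it.
  indeg <- setNames(integer(length(nodes)), nodes)
  tab <- table(to)
  indeg[names(tab)] <- as.integer(tab)
  queue <- nodes[indeg == 0L]  # nodes with no incoming edges
  ordering <- character(0)
  while (length(queue) > 0L) {
    node <- queue[1L]
    queue <- queue[-1L]
    ordering <- c(ordering, node)
    # Remove the outgoing edges of `node`; enqueue newly freed nodes.
    for (child in to[from == node]) {
      indeg[child] <- indeg[child] - 1L
      if (indeg[child] == 0L) queue <- c(queue, child)
    }
  }
  if (length(ordering) < length(nodes)) stop("the graph contains a cycle")
  data.frame(id = ordering, id_num = seq_along(ordering),
             stringsAsFactors = FALSE)
}

# Toy DAG: A -> B, A -> C, B -> D, C -> D.
edges <- data.frame(from = c("A", "A", "B", "C"),
                    to   = c("B", "C", "D", "D"),
                    stringsAsFactors = FALSE)
ordering <- kahn_sort(edges)
ordering$id
#> [1] "A" "B" "C" "D"
```

Here B and C could be swapped without violating any edge, which is the kind of non-uniqueness that random = TRUE explores.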
We can also apply the topological sorting to the larger dependency network.
In a separate vignette, we show how to obtain the dependency network of all CRAN packages using other functions in the package. The number of reverse dependencies can then be modelled.