Title: | Network Analysis of Dependencies of CRAN Packages |
---|---|
Description: | The dependencies of CRAN packages can be analysed in a network fashion. For each package we can obtain the packages that it depends on, imports, suggests, etc. By iterating this procedure over a number of packages, we can build, visualise, and analyse the dependency network, enabling us to have a bird's-eye view of the CRAN ecosystem. One aspect of interest is the number of reverse dependencies of the packages, or equivalently the in-degree distribution of the dependency network. This can be fitted by the power law and/or an extreme value mixture distribution <doi:10.1111/stan.12355>, for which functions are provided. |
Authors: | Clement Lee [aut, cre] |
Maintainer: | Clement Lee |
License: | GPL (>= 2) |
Version: | 0.3.10 |
Built: | 2024-11-03 06:48:28 UTC |
Source: | CRAN |
A dataset containing the citations of conference papers of the ACM Conference on Human Factors in Computing Systems (CHI) from 1981 to 2019, obtained from the ACM digital library. The resulting citation network can be compared to the dependency network of CRAN packages in terms of network-related characteristics, such as degree distribution and sparsity.
chi_citations
A data frame with 21951 rows and 4 variables:
the unique identifier (in the digital library) of the paper that cites other papers
the unique identifier of the paper that is being cited
the publication year of the citing paper
the publication year of the cited paper
https://dl.acm.org/conference/chi
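One simple measure of sparsity for comparing the two networks is the edge density of a directed graph, m / (n(n - 1)) for m distinct edges among n nodes. The base-R sketch below uses a made-up edge list; the column names from and to are assumptions, since the variable names of chi_citations are not shown here.

```r
# Made-up citation edges: paper `from` cites paper `to` (hypothetical data).
cites <- data.frame(from = c("p1", "p1", "p2", "p3"),
                    to   = c("p2", "p3", "p3", "p4"),
                    stringsAsFactors = FALSE)

# Number of distinct directed edges and distinct nodes.
m <- nrow(unique(cites))
n <- length(unique(c(cites$from, cites$to)))

# Edge density of a simple directed graph without self-loops.
dens <- m / (n * (n - 1))
dens  # 4 / 12
```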
A dataset containing the dependencies of various types (Imports, Depends, Suggests, LinkingTo, and their reverse counterparts) of more than 14600 packages available on CRAN as of 2020-05-09.
cran_dependencies
A data frame with 211408 rows and 4 variables:
the name of the package that introduced the dependencies
the name of the package that the dependency is directed towards
the type of dependency, which can take the following values (all in lowercase): "depends", "imports", "linking to", "suggests"
a boolean representing whether the dependency is a reverse one (TRUE) or a forward one (FALSE)
The CRAN pages of all the packages available on https://cran.r-project.org
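The number of reverse dependencies of a package is its in-degree in the forward dependency network, so it can be tabulated directly from an edge list. The base-R sketch below uses a toy edge list with columns from and to (the column layout required by df_to_graph); the package names are made up. It also turns the positive in-degrees into the x & count data frame format that the fitting functions in this manual take as input.

```r
# Toy forward-dependency edge list (hypothetical packages "a".."e"):
# an edge from -> to means package `from` depends on package `to`.
edges <- data.frame(
  from = c("a", "b", "c", "d", "d"),
  to   = c("e", "e", "e", "a", "b"),
  stringsAsFactors = FALSE
)
all_pkgs <- sort(unique(c(edges$from, edges$to)))

# In-degree = number of incoming edges; packages never depended upon get 0.
in_deg <- as.integer(table(factor(edges$to, levels = all_pkgs)))

# Degree distribution in x & count format, keeping positive degrees only
# because the distributions here are defined on positive integers.
deg_tab <- table(in_deg[in_deg > 0])
df_deg <- data.frame(x = as.integer(names(deg_tab)),
                     count = as.integer(deg_tab))
df_deg
```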
Construct the giant component of the network from two data frames
df_to_graph(edgelist, nodelist = NULL, gc = TRUE)
edgelist |
A data frame with (at least) two columns: from and to |
nodelist |
NULL, or a data frame with (at least) one column: name, that contains the nodes to include |
gc |
Boolean, if 'TRUE' (default) then the giant component is extracted, if 'FALSE' then the whole graph is returned |
An igraph object, which is a connected graph if gc is 'TRUE'
from <- c("1", "2", "4")
to <- c("2", "3", "5")
edges <- data.frame(from = from, to = to, stringsAsFactors = FALSE)
nodes <- data.frame(name = c("1", "2", "3", "4", "5"), stringsAsFactors = FALSE)
df_to_graph(edges, nodes)
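To illustrate what extracting the giant component means, the base-R sketch below labels connected components by breadth-first search, ignoring edge direction (an assumption that matches weak connectivity), and keeps the largest one. component_labels and giant_component are hypothetical helpers written for this illustration; df_to_graph itself returns an igraph object and may differ in details such as tie-breaking.

```r
# Hypothetical helper: label weakly connected components by breadth-first search.
component_labels <- function(from, to, ids) {
  # Undirected adjacency list: for each node, the nodes it shares an edge with.
  adj <- split(c(to, from), factor(c(from, to), levels = ids))
  label <- setNames(rep(NA_integer_, length(ids)), ids)
  k <- 0L
  for (start in ids) {
    if (!is.na(label[[start]])) next
    k <- k + 1L
    label[[start]] <- k
    queue <- start
    while (length(queue) > 0) {
      nb <- adj[[queue[1]]]
      queue <- queue[-1]
      nb <- unique(nb[is.na(label[nb])])  # unvisited neighbours only
      label[nb] <- k
      queue <- c(queue, nb)
    }
  }
  label
}

# Hypothetical helper: keep the largest component (ties go to the first found).
giant_component <- function(edges, nodes = NULL) {
  ids <- unique(c(nodes$name, edges$from, edges$to))
  lab <- component_labels(edges$from, edges$to, ids)
  biggest <- as.integer(names(which.max(table(lab))))
  keep <- names(lab)[lab == biggest]
  list(nodes = keep,
       edges = edges[edges$from %in% keep & edges$to %in% keep, ])
}

edges <- data.frame(from = c("1", "2", "4"), to = c("2", "3", "5"),
                    stringsAsFactors = FALSE)
nodes <- data.frame(name = c("1", "2", "3", "4", "5"), stringsAsFactors = FALSE)
gc_part <- giant_component(edges, nodes)
gc_part$nodes  # "1" "2" "3"
```

With the same toy input as the df_to_graph example, nodes 4 and 5 form a smaller component and are dropped.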
dmix2
returns the PMF at x for the 2-component discrete extreme value mixture distribution. The components below and above the threshold u are the (truncated) Zipf-polylog(alpha,theta) and the generalised Pareto(shape, sigma) distributions, respectively.
dmix2(x, u, alpha, theta, shape, sigma, phiu)
x |
Vector of positive integers |
u |
Positive integer representing the threshold |
alpha |
Real number, first parameter of the Zipf-polylog component |
theta |
Real number in (0, 1], second parameter of the Zipf-polylog component |
shape |
Real number, shape parameter of the generalised Pareto component |
sigma |
Real number, scale parameter of the generalised Pareto component |
phiu |
Real number in (0, 1), exceedance rate of the threshold u |
A numeric vector of the same length as x
Smix2 for the corresponding survival function; dpol and dmix3 for the PMFs of the Zipf-polylog and 3-component discrete extreme value mixture distributions, respectively.
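The structure of the 2-component mixture can be sketched in base R. This assumes, for illustration only, that the body is the Zipf-polylog PMF renormalised to 1, ..., u and weighted by 1 - phiu, and that the tail is the continuous generalised Pareto survival function differenced at integer points above u and weighted by phiu, with shape > 0; the exact parameterisation used by dmix2 may differ. dmix2_sketch and zp_weight are hypothetical names.

```r
# Unnormalised Zipf-polylog weights x^(-alpha) * theta^x.
zp_weight <- function(x, alpha, theta) x^(-alpha) * theta^x

# Hypothetical sketch of a 2-component discrete mixture PMF (assumed form).
dmix2_sketch <- function(x, u, alpha, theta, shape, sigma, phiu) {
  out <- numeric(length(x))
  below <- x <= u
  # Body: Zipf-polylog renormalised to the support 1, ..., u.
  body_const <- sum(zp_weight(seq_len(u), alpha, theta))
  out[below] <- (1 - phiu) * zp_weight(x[below], alpha, theta) / body_const
  # Tail: GP survival function differenced at integers above u (shape > 0 assumed).
  gp_surv <- function(t) (1 + shape * t / sigma)^(-1 / shape)
  xa <- x[!below]
  out[!below] <- phiu * (gp_surv(xa - 1 - u) - gp_surv(xa - u))
  out
}

p <- dmix2_sketch(seq_len(1e6), u = 10, alpha = 1.5, theta = 0.9,
                  shape = 0.5, sigma = 2, phiu = 0.2)
sum(p)  # close to 1; the tail mass beyond 1e6 is negligible here
```

Because the tail terms telescope, the mass above u is exactly phiu times the GP survival probability at 0, which is why the total is close to 1 without any extra normalisation.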
dmix3
returns the PMF at x for the 3-component discrete extreme value mixture distribution. The component below v is the (truncated) Zipf-polylog(alpha1,theta1) distribution, between v & u the (truncated) Zipf-polylog(alpha2,theta2) distribution, and above u the generalised Pareto(shape, sigma) distribution.
dmix3(x, v, u, alpha1, theta1, alpha2, theta2, shape, sigma, phi1, phi2, phiu)
x |
Vector of positive integers |
v |
Positive integer representing the lower threshold |
u |
Positive integer representing the upper threshold |
alpha1 |
Real number, first parameter of the Zipf-polylog component below v |
theta1 |
Real number in (0, 1], second parameter of the Zipf-polylog component below v |
alpha2 |
Real number, first parameter of the Zipf-polylog component between v & u |
theta2 |
Real number in (0, 1], second parameter of the Zipf-polylog component between v & u |
shape |
Real number, shape parameter of the generalised Pareto component |
sigma |
Real number, scale parameter of the generalised Pareto component |
phi1 |
Real number in (0, 1), proportion of values below v |
phi2 |
Real number in (0, 1), proportion of values between v & u |
phiu |
Real number in (0, 1), exceedance rate of the threshold u |
A numeric vector of the same length as x
Smix3 for the corresponding survival function; dpol and dmix2 for the PMFs of the Zipf-polylog and 2-component discrete extreme value mixture distributions, respectively.
dpol
returns the PMF at x for the Zipf-polylog distribution with parameters (alpha, theta). The distribution reduces to the discrete power law when theta = 1.
dpol(x, alpha, theta, x_max = 100000L)
x |
Vector of positive integers |
alpha |
Real number greater than 1 |
theta |
Real number in (0, 1] |
x_max |
Scalar (default 100000), positive integer limit for computing the normalising constant |
The PMF is proportional to x^(-alpha) * theta^x. It is normalised in order to be a proper PMF.
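A minimal base-R sketch of this normalisation, assuming the normalising constant is the sum of x^(-alpha) * theta^x over 1, ..., x_max (mirroring the role of the x_max argument). dpol_sketch is a hypothetical name, not the package's code; it works on the log scale so that theta^x does not cause trouble for large x.

```r
# Hypothetical sketch of a Zipf-polylog PMF normalised by a truncated sum.
dpol_sketch <- function(x, alpha, theta, x_max = 100000L) {
  support <- seq_len(x_max)
  log_w <- -alpha * log(support) + support * log(theta)
  # log of the normalising constant via log-sum-exp
  log_const <- max(log_w) + log(sum(exp(log_w - max(log_w))))
  exp(-alpha * log(x) + x * log(theta) - log_const)
}

p <- dpol_sketch(c(1, 2, 3, 4, 5), 1.2, 0.5)
p
```

For theta = 1 the truncated sum only approximates the Riemann zeta function, and the approximation is poor when alpha is close to 1, which is one reason the upper limit is exposed as an argument.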
A numeric vector of the same length as x
Spol for the corresponding survival function; dmix2 and dmix3 for the PMFs of the 2-component and 3-component discrete extreme value mixture distributions, respectively.
dpol(c(1,2,3,4,5), 1.2, 0.5)
get_dep
returns a data frame of multiple types of dependencies of a package
get_dep(name, type, reverse = FALSE)
name |
String, name of the package |
type |
A character vector that contains one or more of the following dependency words: "Depends", "Imports", "LinkingTo", "Suggests", "Enhances", up to letter case and space replaced by underscore. Alternatively, if 'type = "all"', all five dependencies will be obtained. |
reverse |
Boolean, whether forward (FALSE, default) or reverse (TRUE) dependencies are requested. |
A data frame of dependencies
get_dep_all_packages for the dependencies of all CRAN packages, and get_graph_all_packages for obtaining a network of dependencies directly as an igraph object
get_dep("dplyr", c("Imports", "Depends"))
get_dep("MASS", c("Suggests", "Depends", "Imports"), TRUE)
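Forward dependencies come from DESCRIPTION fields such as Imports, which are comma-separated package names with optional version constraints in parentheses. The offline sketch below only illustrates how such a raw field maps to package names; parse_dep_field is a hypothetical helper and not part of crandep, which retrieves the dependencies from CRAN itself. The example field is made up.

```r
# Hypothetical helper: split one DESCRIPTION dependency field into package names.
parse_dep_field <- function(field) {
  if (is.na(field) || !nzchar(field)) return(character(0))
  pkgs <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  pkgs <- trimws(sub("\\(.*\\)", "", pkgs))  # drop version constraints
  pkgs[nzchar(pkgs) & pkgs != "R"]           # "R" in Depends is not a package
}

imports <- "cli (>= 3.4.0), generics,\n    glue (>= 1.3.2), lifecycle"
parse_dep_field(imports)  # "cli" "generics" "glue" "lifecycle"
```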
get_dep_all_packages
returns the data frame of dependencies of all packages currently available on CRAN.
get_dep_all_packages()
A data frame of dependencies of all CRAN packages
get_dep for multiple types of dependencies, and get_graph_all_packages for obtaining a network of dependencies directly as an igraph object
## Not run:
df.cran <- get_dep_all_packages()
## End(Not run)
get_graph_all_packages
returns an igraph object representing the network of one or more types of dependencies of all CRAN packages.
get_graph_all_packages(type, gc = TRUE, reverse = FALSE)
type |
A character vector that contains one or more of the following dependency words: "Depends", "Imports", "LinkingTo", "Suggests", "Enhances", up to letter case and space replaced by underscore. Alternatively, if 'type = "all"', all five dependencies will be obtained. |
gc |
Boolean, if 'TRUE' (default) then the giant component is extracted, if 'FALSE' then the whole graph is returned |
reverse |
Boolean, whether forward (FALSE, default) or reverse (TRUE) dependencies are requested. |
An igraph object, which is a connected graph if gc is 'TRUE'
get_dep_all_packages for the dependencies of all CRAN packages in a data frame, and df_to_graph for constructing the giant component of the network from two data frames
## Not run:
g0.cran.depends <- get_graph_all_packages("depends")
g1.cran.imports <- get_graph_all_packages("imports", reverse = TRUE)
## End(Not run)
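Conceptually, choosing the dependency types and direction amounts to filtering an edge list before building the graph. The base-R sketch below uses a toy data frame laid out like cran_dependencies; the column names from, to, type and reverse are assumptions, the rows are made up, and this is not how get_graph_all_packages is implemented.

```r
# Toy data in the layout described for cran_dependencies (column names assumed).
deps <- data.frame(
  from    = c("pkgA", "pkgA", "pkgB", "pkgC"),
  to      = c("pkgB", "pkgC", "pkgC", "pkgA"),
  type    = c("imports", "suggests", "imports", "imports"),
  reverse = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep forward "Imports" edges only; lowercase to match the stored type values.
wanted <- tolower(c("Imports"))
imports_fwd <- deps[deps$type %in% wanted & !deps$reverse, c("from", "to")]
imports_fwd
```

The resulting two-column data frame has the from and to columns that df_to_graph expects.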
Marginal log-likelihood and posterior density of the discrete power law via numerical integration
marg_pow(df, lower, upper, m_alpha = 0, s_alpha = 10, by = 0.001)
df |
A data frame with at least two columns, x & count |
lower |
Real number greater than 1, lower limit for numerical integration |
upper |
Real number greater than lower, upper limit for numerical integration |
m_alpha |
Real number (default 0.0), mean of the prior normal distribution for alpha |
s_alpha |
Positive real number (default 10.0), standard deviation of the prior normal distribution for alpha |
by |
Positive real number, the width of subintervals between lower and upper, for numerical integration and posterior density evaluation |
A list: log_marginal is the marginal log-likelihood, and posterior is a data frame of non-zero posterior densities
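The idea behind this computation can be sketched in base R: evaluate the log-likelihood plus the Normal(m_alpha, s_alpha) log prior on a grid of alpha values from lower to upper with spacing by, then approximate the integral by a Riemann sum on the log scale. marg_pow_sketch and pow_loglik are hypothetical names, and approximating the power-law normalising constant by a sum up to x_max is an assumption of this sketch, not necessarily what marg_pow does.

```r
# Discrete power-law log-likelihood for data in x & count format
# (normalising constant approximated by a truncated sum; an assumption).
pow_loglik <- function(alpha, df, x_max = 100000) {
  log_const <- log(sum(seq_len(x_max)^(-alpha)))
  -alpha * sum(df$count * log(df$x)) - sum(df$count) * log_const
}

# Hypothetical sketch of grid-based marginal likelihood and posterior density.
marg_pow_sketch <- function(df, lower, upper, m_alpha = 0, s_alpha = 10, by = 0.001) {
  alpha <- seq(lower, upper, by = by)
  log_joint <- vapply(alpha, pow_loglik, numeric(1), df = df) +
    dnorm(alpha, mean = m_alpha, sd = s_alpha, log = TRUE)
  mx <- max(log_joint)
  log_marginal <- mx + log(sum(exp(log_joint - mx)) * by)  # Riemann sum
  posterior <- data.frame(alpha = alpha, density = exp(log_joint - log_marginal))
  list(log_marginal = log_marginal, posterior = posterior[posterior$density > 0, ])
}

df <- data.frame(x = c(1, 2, 3, 5, 10), count = c(50, 20, 10, 5, 2))
res <- marg_pow_sketch(df, lower = 1.5, upper = 3.5, by = 0.01)
res$log_marginal
```

By construction the posterior densities multiplied by by sum to 1 over the grid, so lower and upper should be wide enough to cover essentially all of the posterior mass.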
mcmc_mix1
returns the posterior samples of the parameters, for fitting the TZP-power-law mixture distribution. The samples are obtained using Markov chain Monte Carlo (MCMC).
mcmc_mix1( x, count, u_set, u, alpha1, theta1, alpha2, a_psiu, b_psiu, a_alpha1, b_alpha1, a_theta1, b_theta1, a_alpha2, b_alpha2, positive, iter, thin, burn, freq, invt, mc3_or_marg, x_max )
x |
Vector of the unique values (positive integers) of the data |
count |
Vector of the same length as x that contains the counts of each unique value in the full data, which is essentially rep(x, count) |
u_set |
Positive integer vector of the values u will be sampled from |
u |
Positive integer, initial value of the threshold |
alpha1 |
Real number, initial value of the parameter |
theta1 |
Real number in (0, 1], initial value of the parameter |
alpha2 |
Real number greater than 1, initial value of the parameter |
a_psiu, b_psiu, a_alpha1, b_alpha1, a_theta1, b_theta1, a_alpha2, b_alpha2 |
Scalars, real numbers representing the hyperparameters of the prior distributions for the respective parameters. See details for the specification of the priors. |
positive |
Boolean, is alpha1 positive (TRUE) or unbounded (FALSE)?
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invt |
Vector of the inverse temperatures for Metropolis-coupled MCMC |
mc3_or_marg |
Boolean, is invt for parallel tempering / Metropolis-coupled MCMC (TRUE, default) or marginal likelihood via power posterior (FALSE)? |
x_max |
Scalar, positive integer limit for computing the normalising constant |
In the MCMC, a componentwise Metropolis-Hastings algorithm is used. The threshold u is treated as a parameter and therefore sampled. The hyperparameters are used in the following priors: u is such that the implied unique exceedance probability psiu ~ Uniform(a_psiu, b_psiu); alpha1 ~ Normal(mean = a_alpha1, sd = b_alpha1); theta1 ~ Beta(a_theta1, b_theta1); alpha2 ~ Normal(mean = a_alpha2, sd = b_alpha2).
A list: $pars is a data frame of iter rows of the MCMC samples, $fitted is a data frame of length(x) rows with the fitted values, amongst other quantities related to the MCMC
mcmc_pol, mcmc_mix2 and mcmc_mix3 for MCMC for the Zipf-polylog, 2-component and 3-component discrete extreme value mixture distributions, respectively.
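In Metropolis-coupled MCMC, each chain targets the prior times the likelihood raised to its inverse temperature in invt, and chains occasionally propose to exchange their states. The sketch below shows the standard swap acceptance rule for two chains; mc3_swap is a hypothetical helper, and how often and between which chains the package proposes swaps is not specified here.

```r
# Hypothetical sketch of one swap proposal in Metropolis-coupled MCMC (MC3).
# Chain k targets prior(theta) * likelihood(theta)^invt[k]; loglik[k] is the
# log-likelihood at chain k's current state. The swap is accepted with
# probability min(1, exp((invt[i] - invt[j]) * (loglik[j] - loglik[i]))).
mc3_swap <- function(states, loglik, invt, i, j) {
  log_ratio <- (invt[i] - invt[j]) * (loglik[j] - loglik[i])
  if (log(runif(1)) < log_ratio) {
    states[c(i, j)] <- states[c(j, i)]
    loglik[c(i, j)] <- loglik[c(j, i)]
  }
  list(states = states, loglik = loglik)
}

set.seed(1)
res <- mc3_swap(states = c(1.0, 2.0), loglik = c(-10, -5),
                invt = c(1, 0.5), i = 1, j = 2)
res$states  # 2 1: the log ratio is 2.5 > 0, so this swap is always accepted
```

The prior terms cancel in the ratio, which is why only the log-likelihoods and the inverse temperatures appear.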
Wrapper of mcmc_mix1
mcmc_mix1_wrapper( df, seed, u_max = 2000L, log_diff_max = 11, a_psiu = 0.1, b_psiu = 0.9, m_alpha1 = 0, s_alpha1 = 10, a_theta1 = 1, b_theta1 = 1, m_alpha2 = 0, s_alpha2 = 10, positive = FALSE, iter = 20000L, thin = 1L, burn = 10000L, freq = 100L, invts = 1, mc3_or_marg = TRUE, x_max = 1e+05 )
df |
A data frame with at least two columns, x & count |
seed |
Integer for |
u_max |
Scalar (default 2000), positive integer for the maximum threshold to be passed to |
log_diff_max |
Positive real number, the value such that thresholds with profile posterior density not less than the maximum posterior density - |
a_psiu, b_psiu, m_alpha1, s_alpha1, a_theta1, b_theta1, m_alpha2, s_alpha2 |
Scalars, real numbers representing the hyperparameters of the prior distributions for the respective parameters. See details for the specification of the priors. |
positive |
Boolean, is alpha1 positive (TRUE) or unbounded (FALSE)? |
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invts |
Vector of the inverse temperatures for Metropolis-coupled MCMC (if mc3_or_marg = TRUE) or power posterior (if mc3_or_marg = FALSE) |
mc3_or_marg |
Boolean, is Metropolis-coupled MCMC to be used? Ignored if invts = c(1.0) |
x_max |
Scalar (default 100000), positive integer limit for computing the normalising constant |
A list returned by mcmc_mix1
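When mc3_or_marg = FALSE, the inverse temperatures in invts are used for the power posterior, from which the marginal likelihood can be estimated by thermodynamic integration: the log marginal likelihood equals the integral over t in [0, 1] of the posterior mean of the log-likelihood under likelihood^t. The sketch below assumes those posterior means have already been estimated at each inverse temperature (the numbers are made up) and applies the trapezoidal rule; ti_log_marginal is a hypothetical helper, and the package's own quadrature may differ.

```r
# Hypothetical sketch: thermodynamic integration with the trapezoidal rule.
ti_log_marginal <- function(invts, mean_loglik) {
  o <- order(invts)
  tt <- invts[o]
  m <- mean_loglik[o]
  stopifnot(tt[1] == 0, tt[length(tt)] == 1)  # grid must span [0, 1]
  sum(diff(tt) * (head(m, -1) + tail(m, -1)) / 2)
}

# Illustrative (made-up) posterior means of the log-likelihood.
ti_log_marginal(c(0, 0.25, 0.5, 1), c(-250, -120, -110, -104))  # -128.5
```

In practice the grid of inverse temperatures is usually denser near 0, where the mean log-likelihood changes fastest.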
mcmc_mix2
returns the posterior samples of the parameters, for fitting the 2-component discrete extreme value mixture distribution. The samples are obtained using Markov chain Monte Carlo (MCMC).
mcmc_mix2( x, count, u_set, u, alpha, theta, shape, sigma, a_psiu, b_psiu, a_alpha, b_alpha, a_theta, b_theta, m_shape, s_shape, a_sigma, b_sigma, positive, a_pseudo, b_pseudo, pr_power, iter, thin, burn, freq, invt, mc3_or_marg = TRUE, constrained = FALSE )
x |
Vector of the unique values (positive integers) of the data |
count |
Vector of the same length as x that contains the counts of each unique value in the full data, which is essentially rep(x, count) |
u_set |
Positive integer vector of the values u will be sampled from |
u |
Positive integer, initial value of the threshold |
alpha |
Real number greater than 1, initial value of the parameter |
theta |
Real number in (0, 1], initial value of the parameter |
shape |
Real number, initial value of the parameter |
sigma |
Positive real number, initial value of the parameter |
a_psiu, b_psiu, a_alpha, b_alpha, a_theta, b_theta, m_shape, s_shape, a_sigma, b_sigma |
Scalars, real numbers representing the hyperparameters of the prior distributions for the respective parameters. See details for the specification of the priors. |
positive |
Boolean, is alpha positive (TRUE) or unbounded (FALSE)? Ignored if constrained is TRUE |
a_pseudo |
Positive real number, first parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
b_pseudo |
Positive real number, second parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
pr_power |
Real number in [0, 1], prior probability of the discrete power law (below u). Overridden if constrained is TRUE |
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invt |
Vector of the inverse temperatures for Metropolis-coupled MCMC |
mc3_or_marg |
Boolean, is invt for parallel tempering / Metropolis-coupled MCMC (TRUE, default) or marginal likelihood via power posterior (FALSE)? |
constrained |
Boolean, are alpha & shape constrained such that 1/shape + 1 > alpha > 1, with the power law assumed in the body and "continuity" at the threshold u (TRUE), or is there no constraint between alpha & shape, with the former governed by positive, and neither the power law nor continuity enforced (FALSE, default)? |
In the MCMC, a componentwise Metropolis-Hastings algorithm is used. The threshold u is treated as a parameter and therefore sampled. The hyperparameters are used in the following priors: u is such that the implied unique exceedance probability psiu ~ Uniform(a_psiu, b_psiu); alpha ~ Normal(mean = a_alpha, sd = b_alpha); theta ~ Beta(a_theta, b_theta); shape ~ Normal(mean = m_shape, sd = s_shape); sigma ~ Gamma(a_sigma, scale = b_sigma). If pr_power = 1.0, the discrete power law (below u) is assumed, and the samples of theta will all be 1.0. If pr_power is in (0.0, 1.0), model selection between the polylog distribution and the discrete power law will be performed within the MCMC.
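The prior specification for the continuous parameters can be written directly as a joint log density with base-R distribution functions. The sketch below follows the distributions listed in these details, omits the threshold prior and the model-selection step, and uses illustrative hyperparameter values; log_prior_mix2 is a hypothetical helper, not the package's code.

```r
# Hypothetical sketch: joint log prior of (alpha, theta, shape, sigma).
log_prior_mix2 <- function(alpha, theta, shape, sigma,
                           a_alpha, b_alpha, a_theta, b_theta,
                           m_shape, s_shape, a_sigma, b_sigma) {
  dnorm(alpha, mean = a_alpha, sd = b_alpha, log = TRUE) +
    dbeta(theta, a_theta, b_theta, log = TRUE) +
    dnorm(shape, mean = m_shape, sd = s_shape, log = TRUE) +
    dgamma(sigma, shape = a_sigma, scale = b_sigma, log = TRUE)
}

lp_val <- log_prior_mix2(alpha = 1.5, theta = 0.5, shape = 0.3, sigma = 2,
                         a_alpha = 0, b_alpha = 10, a_theta = 1, b_theta = 1,
                         m_shape = 0, s_shape = 10, a_sigma = 1, b_sigma = 100)
lp_val
```

A value of theta outside [0, 1] or a negative sigma gives a log prior of -Inf, so a Metropolis-Hastings step would reject such proposals automatically.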
A list: $pars is a data frame of iter rows of the MCMC samples, $fitted is a data frame of length(x) rows with the fitted values, amongst other quantities related to the MCMC
mcmc_pol and mcmc_mix3 for MCMC for the Zipf-polylog and 3-component discrete extreme value mixture distributions, respectively.
Wrapper of mcmc_mix2
mcmc_mix2_wrapper( df, seed, u_max = 2000L, log_diff_max = 11, a_psiu = 0.001, b_psiu = 0.9, m_alpha = 0, s_alpha = 10, a_theta = 1, b_theta = 1, m_shape = 0, s_shape = 10, a_sigma = 1, b_sigma = 0.01, a_pseudo = 10, b_pseudo = 1, pr_power = 0.5, positive = FALSE, iter = 20000L, thin = 20L, burn = 100000L, freq = 1000L, invts = 1, mc3_or_marg = TRUE, constrained = FALSE )
df |
A data frame with at least two columns, x & count |
seed |
Integer for |
u_max |
Scalar (default 2000), positive integer for the maximum threshold to be passed to |
log_diff_max |
Positive real number, the value such that thresholds with profile posterior density not less than the maximum posterior density - |
a_psiu, b_psiu, m_alpha, s_alpha, a_theta, b_theta, m_shape, s_shape, a_sigma, b_sigma |
Scalars, real numbers representing the hyperparameters of the prior distributions for the respective parameters. See details for the specification of the priors. |
a_pseudo |
Positive real number, first parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
b_pseudo |
Positive real number, second parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
pr_power |
Real number in [0, 1], prior probability of the discrete power law (below u) |
positive |
Boolean, is alpha positive (TRUE) or unbounded (FALSE)? |
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invts |
Vector of the inverse temperatures for Metropolis-coupled MCMC (if mc3_or_marg = TRUE) or power posterior (if mc3_or_marg = FALSE) |
mc3_or_marg |
Boolean, is Metropolis-coupled MCMC to be used? Ignored if invts = c(1.0) |
constrained |
Boolean, are alpha & shape constrained such that 1/shape + 1 > alpha > 1, with the power law assumed in the body and "continuity" at the threshold u (TRUE), or is there no constraint between alpha & shape, with the former governed by positive, and neither the power law nor continuity enforced (FALSE, default)? |
A list returned by mcmc_mix2
mcmc_mix3
returns the posterior samples of the parameters, for fitting the 3-component discrete extreme value mixture distribution. The samples are obtained using Markov chain Monte Carlo (MCMC).
mcmc_mix3( x, count, v_set, u_set, v, u, alpha1, theta1, alpha2, theta2, shape, sigma, a_psi1, a_psi2, a_psiu, b_psiu, a_alpha1, b_alpha1, a_theta1, b_theta1, a_alpha2, b_alpha2, a_theta2, b_theta2, m_shape, s_shape, a_sigma, b_sigma, powerlaw1, positive1, positive2, a_pseudo, b_pseudo, pr_power2, iter, thin, burn, freq, invt, mc3_or_marg = TRUE )
x |
Vector of the unique values (positive integers) of the data |
count |
Vector of the same length as x that contains the counts of each unique value in the full data, which is essentially rep(x, count) |
v_set |
Positive integer vector of the values v will be sampled from |
u_set |
Positive integer vector of the values u will be sampled from |
v |
Positive integer, initial value of the lower threshold |
u |
Positive integer, initial value of the upper threshold |
alpha1 |
Real number greater than 1, initial value of the parameter |
theta1 |
Real number in (0, 1], initial value of the parameter |
alpha2 |
Real number greater than 1, initial value of the parameter |
theta2 |
Real number in (0, 1], initial value of the parameter |
shape |
Real number, initial value of the parameter |
sigma |
Positive real number, initial value of the parameter |
a_psi1, a_psi2, a_psiu, b_psiu, a_alpha1, b_alpha1, a_theta1, b_theta1, a_alpha2, b_alpha2, a_theta2, b_theta2, m_shape, s_shape, a_sigma, b_sigma |
Scalars, real numbers representing the hyperparameters of the prior distributions for the respective parameters. See details for the specification of the priors. |
powerlaw1 |
Boolean, is the discrete power law assumed below v?
positive1 |
Boolean, is alpha1 positive (TRUE) or unbounded (FALSE)? |
positive2 |
Boolean, is alpha2 positive (TRUE) or unbounded (FALSE)? |
a_pseudo |
Positive real number, first parameter of the pseudoprior beta distribution for theta2 in model selection; ignored if pr_power2 = 1.0 |
b_pseudo |
Positive real number, second parameter of the pseudoprior beta distribution for theta2 in model selection; ignored if pr_power2 = 1.0 |
pr_power2 |
Real number in [0, 1], prior probability of the discrete power law (between v and u) |
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invt |
Vector of the inverse temperatures for Metropolis-coupled MCMC |
mc3_or_marg |
Boolean, is invt for parallel tempering / Metropolis-coupled MCMC (TRUE, default) or marginal likelihood via power posterior (FALSE)? |
In the MCMC, a componentwise Metropolis-Hastings algorithm is used. The thresholds v and u are treated as parameters and therefore sampled. The hyperparameters are used in the following priors: psi1 / (1.0 - psiu) ~ Beta(a_psi1, a_psi2); u is such that the implied unique exceedance probability psiu ~ Uniform(a_psiu, b_psiu); alpha1 ~ Normal(mean = a_alpha1, sd = b_alpha1); theta1 ~ Beta(a_theta1, b_theta1); alpha2 ~ Normal(mean = a_alpha2, sd = b_alpha2); theta2 ~ Beta(a_theta2, b_theta2); shape ~ Normal(mean = m_shape, sd = s_shape); sigma ~ Gamma(a_sigma, scale = b_sigma). If pr_power2 = 1.0, the discrete power law (between v and u) is assumed, and the samples of theta2 will all be 1.0. If pr_power2 is in (0.0, 1.0), model selection between the polylog distribution and the discrete power law will be performed within the MCMC.
A list: $pars is a data frame of iter rows of the MCMC samples, $fitted is a data frame of length(x) rows with the fitted values, amongst other quantities related to the MCMC
mcmc_pol and mcmc_mix2 for MCMC for the Zipf-polylog and 2-component discrete extreme value mixture distributions, respectively.
Wrapper of mcmc_mix3
mcmc_mix3_wrapper( df, seed, v_max = 100L, u_max = 2000L, log_diff_max = 11, a_psi1 = 1, a_psi2 = 1, a_psiu = 0.001, b_psiu = 0.9, m_alpha = 0, s_alpha = 10, a_theta = 1, b_theta = 1, m_shape = 0, s_shape = 10, a_sigma = 1, b_sigma = 0.01, a_pseudo = 10, b_pseudo = 1, pr_power2 = 0.5, powerlaw1 = FALSE, positive1 = FALSE, positive2 = TRUE, iter = 20000L, thin = 20L, burn = 100000L, freq = 1000L, invts = 1, mc3_or_marg = TRUE )
df |
A data frame with at least two columns, x & count |
seed |
Integer for |
v_max |
Scalar (default 100), positive integer for the maximum lower threshold to be passed to |
u_max |
Scalar (default 2000), positive integer for the maximum upper threshold to be passed to |
log_diff_max |
Positive real number, the value such that thresholds with profile posterior density not less than the maximum posterior density - |
a_psi1, a_psi2, a_psiu, b_psiu, m_alpha, s_alpha, a_theta, b_theta, m_shape, s_shape, a_sigma, b_sigma |
Scalars, real numbers representing the hyperparameters of the prior distributions for the respective parameters. See details for the specification of the priors. |
a_pseudo |
Positive real number, first parameter of the pseudoprior beta distribution for theta2 in model selection; ignored if pr_power2 = 1.0 |
b_pseudo |
Positive real number, second parameter of the pseudoprior beta distribution for theta2 in model selection; ignored if pr_power2 = 1.0 |
pr_power2 |
Real number in [0, 1], prior probability of the discrete power law (between v and u) |
powerlaw1 |
Boolean, is the discrete power law assumed below v?
positive1 |
Boolean, is alpha1 positive (TRUE) or unbounded (FALSE)? |
positive2 |
Boolean, is alpha2 positive (TRUE) or unbounded (FALSE)? |
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invts |
Vector of the inverse temperatures for Metropolis-coupled MCMC (if mc3_or_marg = TRUE) or power posterior (if mc3_or_marg = FALSE) |
mc3_or_marg |
Boolean, is Metropolis-coupled MCMC to be used? Ignored if invts = c(1.0) |
A list returned by mcmc_mix3
mcmc_pol
returns the samples from the posterior of alpha and theta, for fitting the Zipf-polylog distribution to the data x. The samples are obtained using Markov chain Monte Carlo (MCMC). In the MCMC, a Metropolis-Hastings algorithm is used.
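For intuition, a random-walk Metropolis-Hastings sampler for the simplest special case (theta fixed at 1, i.e. the discrete power law) fits in a few lines of base R. mh_pow_sketch is a hypothetical helper: it has no theta update, no model selection and no tempering, restricts alpha to values above 1, and approximates the normalising constant by a sum up to x_max, so it is not mcmc_pol's sampler.

```r
# Hypothetical random-walk Metropolis-Hastings sketch for the discrete power law.
mh_pow_sketch <- function(df, iter = 2000, alpha_init = 1.5, step = 0.05,
                          m_alpha = 0, s_alpha = 10, x_max = 10000) {
  support <- seq_len(x_max)
  s_log <- sum(df$count * log(df$x))
  n <- sum(df$count)
  log_post <- function(a) {
    if (a <= 1) return(-Inf)  # alpha restricted to (1, Inf)
    -a * s_log - n * log(sum(support^(-a))) +
      dnorm(a, mean = m_alpha, sd = s_alpha, log = TRUE)
  }
  draws <- numeric(iter)
  a <- alpha_init
  lp <- log_post(a)
  for (k in seq_len(iter)) {
    prop <- a + rnorm(1, 0, step)  # symmetric proposal, so no Hastings correction
    lp_prop <- log_post(prop)
    if (log(runif(1)) < lp_prop - lp) {
      a <- prop
      lp <- lp_prop
    }
    draws[k] <- a
  }
  draws
}

set.seed(1)
df <- data.frame(x = c(1, 2, 3, 5, 10), count = c(50, 20, 10, 5, 2))
draws <- mh_pow_sketch(df)
mean(draws[-(1:500)])  # posterior mean estimate after discarding burn-in
```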
mcmc_pol( x, count, alpha, theta, a_alpha, b_alpha, a_theta, b_theta, a_pseudo, b_pseudo, pr_power, iter, thin, burn, freq, invt, mc3_or_marg, x_max )
x |
Vector of the unique values (positive integers) of the data |
count |
Vector of the same length as x that contains the counts of each unique value in the full data, which is essentially rep(x, count) |
alpha |
Real number greater than 1, initial value of the parameter |
theta |
Real number in (0, 1], initial value of the parameter |
a_alpha |
Real number, mean of the prior normal distribution for alpha |
b_alpha |
Positive real number, standard deviation of the prior normal distribution for alpha |
a_theta |
Positive real number, first parameter of the prior beta distribution for theta; ignored if pr_power = 1.0 |
b_theta |
Positive real number, second parameter of the prior beta distribution for theta; ignored if pr_power = 1.0 |
a_pseudo |
Positive real number, first parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
b_pseudo |
Positive real number, second parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
pr_power |
Real number in [0, 1], prior probability of the discrete power law |
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invt |
Vector of the inverse temperatures for Metropolis-coupled MCMC |
mc3_or_marg |
Boolean, is invt for parallel tempering / Metropolis-coupled MCMC (TRUE, default) or marginal likelihood via power posterior (FALSE)? |
x_max |
Scalar, positive integer limit for computing the normalising constant |
A list: $pars is a data frame of iter rows of the MCMC samples, $fitted is a data frame of length(x) rows with the fitted values, amongst other quantities related to the MCMC
mcmc_mix2 and mcmc_mix3 for MCMC for the 2-component and 3-component discrete extreme value mixture distributions, respectively.
Wrapper of mcmc_pol
mcmc_pol_wrapper( df, seed, alpha_init = 1.5, theta_init = 0.5, m_alpha = 0, s_alpha = 10, a_theta = 1, b_theta = 1, a_pseudo = 10, b_pseudo = 1, pr_power = 0.5, iter = 20000L, thin = 20L, burn = 100000L, freq = 1000L, invts = 1, mc3_or_marg = TRUE, x_max = 1e+05 )
df |
A data frame with at least two columns, x & count |
seed |
Integer for |
alpha_init |
Real number greater than 1, initial value of the parameter |
theta_init |
Real number in (0, 1], initial value of the parameter |
m_alpha |
Real number, mean of the prior normal distribution for alpha |
s_alpha |
Positive real number, standard deviation of the prior normal distribution for alpha |
a_theta |
Positive real number, first parameter of the prior beta distribution for theta; ignored if pr_power = 1.0 |
b_theta |
Positive real number, second parameter of the prior beta distribution for theta; ignored if pr_power = 1.0 |
a_pseudo |
Positive real number, first parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
b_pseudo |
Positive real number, second parameter of the pseudoprior beta distribution for theta in model selection; ignored if pr_power = 1.0 |
pr_power |
Real number in [0, 1], prior probability of the discrete power law |
iter |
Positive integer representing the length of the MCMC output |
thin |
Positive integer representing the thinning in the MCMC |
burn |
Non-negative integer representing the burn-in of the MCMC |
freq |
Positive integer representing the frequency of the sampled values being printed |
invts |
Vector of the inverse temperatures for Metropolis-coupled MCMC (if mc3_or_marg = TRUE) or power posterior (if mc3_or_marg = FALSE) |
mc3_or_marg |
Boolean, is Metropolis-coupled MCMC to be used? Ignored if invts = c(1.0) |
x_max |
Scalar (default 100000), positive integer limit for computing the normalising constant |
A list returned by mcmc_pol
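When mc3_or_marg = FALSE, invts is the grid of inverse temperatures for a power posterior, from which the marginal likelihood can be estimated. A common estimator is thermodynamic integration with the trapezoidal rule. The sketch below illustrates that technique and assumes that the posterior mean of the log-likelihood has already been computed at each inverse temperature. The function log_marg_lik_ti and the numbers are hypothetical, and the estimator used inside mcmc_pol may differ.

```r
# Thermodynamic-integration estimate of the log marginal likelihood:
#   log p(y) = integral over t in [0, 1] of E_t[log p(y | params)] dt,
# where E_t is the expectation under the power posterior at inverse temperature t.
# The integral is approximated by the trapezoidal rule over the supplied grid,
# which should include 0 and 1 to cover the whole interval.
log_marg_lik_ti <- function(invts, mean_loglik) {
  stopifnot(length(invts) == length(mean_loglik), all(invts >= 0 & invts <= 1))
  o <- order(invts)
  t <- invts[o]
  e <- mean_loglik[o]
  # area of each trapezoid between consecutive inverse temperatures
  sum(diff(t) * (head(e, -1) + tail(e, -1)) / 2)
}

# Made-up posterior means of the log-likelihood, for illustration only
est <- log_marg_lik_ti(invts = c(0, 0.5, 1), mean_loglik = c(-10, -6, -5))  # -6.75
```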
obtain_u_set_mix1 computes the profile posterior density of the threshold u and subsets the thresholds (and other parameter values) with high profile values, i.e., those within a certain value of the maximum posterior density. The set of u can then be used for mcmc_mix1.
obtain_u_set_mix1( df, positive = FALSE, u_max = 2000L, log_diff_max = 11, alpha1_init = 0.01, theta1_init = exp(-1), alpha2_init = 2, a_psiu = 0.1, b_psiu = 0.9, m_alpha1 = 0, s_alpha1 = 10, a_theta1 = 1, b_theta1 = 1, m_alpha2 = 0, s_alpha2 = 10, x_max = 1e+05 )
df |
A data frame with at least two columns, x & count |
positive |
Boolean, is alpha1 positive (TRUE) or unbounded (FALSE, default)? |
u_max |
Positive integer for the maximum threshold |
log_diff_max |
Positive real number, the value such that thresholds with profile posterior density not less than the maximum posterior density minus log_diff_max are kept |
alpha1_init |
Scalar, initial value of alpha1 |
theta1_init |
Scalar, initial value of theta1 |
alpha2_init |
Scalar, initial value of alpha2 |
a_psiu, b_psiu, m_alpha1, s_alpha1, a_theta1, b_theta1, m_alpha2, s_alpha2 |
Scalars, hyperparameters of the priors for the parameters |
x_max |
Scalar (default 100000), positive integer limit for computing the normalising constant |
A list: u_set is the vector of thresholds with high posterior density, init is the data frame with the maximum profile posterior density and associated parameter values, profile is the data frame with all thresholds with high posterior density and associated parameter values, and scalars is the data frame with all arguments (except df).
mcmc_mix1_wrapper, which wraps obtain_u_set_mix1 and mcmc_mix1, and obtain_u_set_mix2 for the equivalent function for the 2-component mixture model
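The subsetting step of obtain_u_set_mix1 can be illustrated with a made-up profile. This sketch assumes that log_diff_max acts on the log scale, so a threshold is retained when its log profile posterior density is no more than log_diff_max below the maximum. The function select_u_set and the values in lp are hypothetical.

```r
# ASSUMPTION: the comparison is made on the log scale, i.e. a threshold is kept
# when its log profile posterior density is at least max - log_diff_max.
select_u_set <- function(u, log_profile, log_diff_max = 11) {
  keep <- log_profile >= max(log_profile) - log_diff_max
  list(
    u_set = u[keep],
    u_best = u[which.max(log_profile)],  # threshold with the highest profile value
    profile = data.frame(u = u[keep], log_profile = log_profile[keep])
  )
}

# Made-up log profile posterior values over candidate thresholds 1..6
u_grid <- 1:6
lp <- c(-130, -112, -101, -100, -108, -125)
res <- select_u_set(u_grid, lp, log_diff_max = 11)
res$u_set   # 3 4 5
res$u_best  # 4
```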
obtain_u_set_mix2 computes the profile posterior density of the threshold u and subsets the thresholds (and other parameter values) with high profile values, i.e., those within a certain value of the maximum posterior density. The set of u can then be used for mcmc_mix2.
obtain_u_set_mix2( df, powerlaw = FALSE, positive = FALSE, u_max = 2000L, log_diff_max = 11, alpha_init = 0.01, theta_init = exp(-1), shape_init = 0.1, sigma_init = 1, a_psiu = 0.001, b_psiu = 0.9, m_alpha = 0, s_alpha = 10, a_theta = 1, b_theta = 1, m_shape = 0, s_shape = 10, a_sigma = 1, b_sigma = 0.01 )
df |
A data frame with at least two columns, x & count |
powerlaw |
Boolean, is the power law (TRUE) or polylogarithm (FALSE, default) assumed? |
positive |
Boolean, is alpha positive (TRUE) or unbounded (FALSE, default)? |
u_max |
Positive integer for the maximum threshold |
log_diff_max |
Positive real number, the value such that thresholds with profile posterior density not less than the maximum posterior density minus log_diff_max are kept |
alpha_init |
Scalar, initial value of alpha |
theta_init |
Scalar, initial value of theta |
shape_init |
Scalar, initial value of shape parameter |
sigma_init |
Scalar, initial value of sigma |
a_psiu, b_psiu, m_alpha, s_alpha, a_theta, b_theta, m_shape, s_shape, a_sigma, b_sigma |
Scalars, hyperparameters of the priors for the parameters |
A list: u_set is the vector of thresholds with high posterior density, init is the data frame with the maximum profile posterior density and associated parameter values, profile is the data frame with all thresholds with high posterior density and associated parameter values, and scalars is the data frame with all arguments (except df).
mcmc_mix2_wrapper, which wraps obtain_u_set_mix2 and mcmc_mix2, and obtain_u_set_mix1 for the equivalent function for the TZP-power-law mixture model
obtain_u_set_mix2_constrained computes the profile posterior density of the threshold u and subsets the thresholds (and other parameter values) with high profile values, i.e., those within a certain value of the maximum posterior density. The set of u can then be used for mcmc_mix2. The power law is assumed for the body, and alpha is constrained to be greater than 1.0 and smaller than 1.0/shape + 1.0.
obtain_u_set_mix2_constrained( df, u_max = 2000L, log_diff_max = 11, alpha_init = 2, shape_init = 0.1, sigma_init = 1, a_psiu = 0.001, b_psiu = 0.9, m_alpha = 0, s_alpha = 10, a_theta = 1, b_theta = 1, m_shape = 0, s_shape = 10, a_sigma = 1, b_sigma = 0.01 )
df |
A data frame with at least two columns, x & count |
u_max |
Positive integer for the maximum threshold |
log_diff_max |
Positive real number, the value such that thresholds with profile posterior density not less than the maximum posterior density minus log_diff_max are kept |
alpha_init |
Scalar, initial value of alpha |
shape_init |
Scalar, initial value of shape parameter |
sigma_init |
Scalar, initial value of sigma |
a_psiu, b_psiu, m_alpha, s_alpha, a_theta, b_theta, m_shape, s_shape, a_sigma, b_sigma |
Scalars, hyperparameters of the priors for the parameters |
A list: u_set is the vector of thresholds with high posterior density, init is the data frame with the maximum profile posterior density and associated parameter values, profile is the data frame with all thresholds with high posterior density and associated parameter values, and scalars is the data frame with all arguments (except df).
obtain_u_set_mix2 for the unconstrained version
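The constraint on alpha in obtain_u_set_mix2_constrained can be checked directly. Below is a minimal sketch with hypothetical helper names (alpha_bounds, alpha_is_valid). It only encodes the interval stated above and is not the package's code.

```r
# Interval (1, 1/shape + 1) for alpha in the constrained two-component model.
# For shape = 0, R evaluates 1/0 as Inf, so only the lower bound binds;
# for shape < 0 the upper bound falls below 1 and no alpha is admissible.
alpha_bounds <- function(shape) {
  c(lower = 1, upper = 1 / shape + 1)
}

alpha_is_valid <- function(alpha, shape) {
  b <- alpha_bounds(shape)
  alpha > b[["lower"]] & alpha < b[["upper"]]
}

alpha_bounds(0.5)                            # lower 1, upper 3
alpha_is_valid(c(0.9, 2, 3.5), shape = 0.5)  # FALSE TRUE FALSE
```

For instance, the default initial values alpha_init = 2 and shape_init = 0.1 satisfy the constraint, because 2 lies in (1, 11).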
obtain_u_set_mix3 computes the profile posterior density of the thresholds v & u and subsets the thresholds (and other parameter values) with high profile values, i.e., those within a certain value of the maximum posterior density. The sets of v & u can then be used for mcmc_mix3.
obtain_u_set_mix3( df, powerlaw1 = FALSE, powerlaw2 = FALSE, positive1 = FALSE, positive2 = TRUE, log_diff_max = 11, v_max = 100L, u_max = 2000L, alpha_init = 0.01, theta_init = exp(-1), shape_init = 1, sigma_init = 1, a_psi1 = 1, a_psi2 = 1, a_psiu = 0.001, b_psiu = 0.9, m_alpha = 0, s_alpha = 10, a_theta = 1, b_theta = 1, m_shape = 0, s_shape = 10, a_sigma = 1, b_sigma = 0.01 )
df |
A data frame with at least two columns, degree & count |
powerlaw1 |
Boolean, is the power law (TRUE) or polylogarithm (FALSE, default) assumed for the left tail? |
powerlaw2 |
Boolean, is the power law (TRUE) or polylogarithm (FALSE, default) assumed for the middle bulk? |
positive1 |
Boolean, is alpha positive (TRUE) or unbounded (FALSE, default) for the left tail? |
positive2 |
Boolean, is alpha positive (TRUE, default) or unbounded (FALSE) for the middle bulk? |
log_diff_max |
Positive real number, the value such that thresholds with profile posterior density not less than the maximum posterior density minus log_diff_max are kept |
v_max |
Positive integer for the maximum lower threshold |
u_max |
Positive integer for the maximum upper threshold |
alpha_init |
Scalar, initial value of alpha |
theta_init |
Scalar, initial value of theta |
shape_init |
Scalar, initial value of shape parameter |
sigma_init |
Scalar, initial value of sigma |
a_psi1, a_psi2, a_psiu, b_psiu, m_alpha, s_alpha, a_theta, b_theta, m_shape, s_shape, a_sigma, b_sigma |
Scalars, hyperparameters of the priors for the parameters |
A list: v_set is the vector of lower thresholds with high posterior density, u_set is the vector of upper thresholds with high posterior density, init is the data frame with the maximum profile posterior density and associated parameter values, profile is the data frame with all thresholds with high posterior density and associated parameter values, and scalars is the data frame with all arguments (except df).
mcmc_mix3_wrapper, which wraps obtain_u_set_mix3 and mcmc_mix3
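With two thresholds, the search runs over pairs (v, u) with v < u. The sketch below extends the single-threshold idea to such pairs and again assumes that the comparison with log_diff_max is on the log scale. The functions candidate_grid and select_vu_set and the quadratic log profile are hypothetical stand-ins for the quantities computed by obtain_u_set_mix3.

```r
# All candidate threshold pairs with the lower threshold below the upper one
candidate_grid <- function(v_max, u_max) {
  g <- expand.grid(v = seq_len(v_max), u = seq_len(u_max))
  g[g$v < g$u, ]
}

# Keep pairs whose log profile is within log_diff_max of the maximum (ASSUMED log scale)
select_vu_set <- function(grid, log_profile, log_diff_max = 11) {
  keep <- log_profile >= max(log_profile) - log_diff_max
  kept <- grid[keep, ]
  list(v_set = sort(unique(kept$v)), u_set = sort(unique(kept$u)), pairs = kept)
}

grid <- candidate_grid(v_max = 5L, u_max = 10L)
lp <- -((grid$v - 3)^2 + (grid$u - 8)^2)   # made-up log profile, peak at (3, 8)
res3 <- select_vu_set(grid, lp, log_diff_max = 2)
res3$v_set  # 2 3 4
res3$u_set  # 7 8 9
```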
Smix2 returns the survival function at x for the 2-component discrete extreme value mixture distribution. The components below and above the threshold u are the (truncated) Zipf-polylog(alpha, theta) and the generalised Pareto(shape, sigma) distributions, respectively.
Smix2(x, u, alpha, theta, shape, sigma, phiu)
x |
Vector of positive integers |
u |
Positive integer representing the threshold |
alpha |
Real number, first parameter of the Zipf-polylog component |
theta |
Real number in (0, 1], second parameter of the Zipf-polylog component |
shape |
Real number, shape parameter of the generalised Pareto component |
sigma |
Real number, scale parameter of the generalised Pareto component |
phiu |
Real number in (0, 1), exceedance rate of the threshold u |
A numeric vector of the same length as x
dmix2 for the corresponding probability mass function, and Spol and Smix3 for the survival functions of the Zipf-polylog and 3-component discrete extreme value mixture distributions, respectively.
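A base-R sketch of the 2-component survival function follows. It rests on several assumptions that are not stated above: S(x) = Pr(X > x), the body is Zipf-polylog with mass proportional to x^(-alpha) theta^x truncated to {1, ..., u} and weighted by 1 - phiu, and for x >= u the tail gives Pr(X > x) = phiu (1 + shape (x - u)/sigma)_+^(-1/shape). The function smix2_sketch is hypothetical, and Smix2 itself may use different conventions (for example Pr(X >= x)).

```r
# Hypothetical re-implementation for illustration; not the package's code.
smix2_sketch <- function(x, u, alpha, theta, shape, sigma, phiu) {
  k <- seq_len(u)
  w <- k^(-alpha) * theta^k          # unnormalised Zipf-polylog masses on 1..u
  tail_w <- rev(cumsum(rev(w)))      # tail_w[j] = w[j] + ... + w[u]
  vapply(x, function(xi) {
    if (xi >= u) {
      z <- (xi - u) / sigma
      # generalised Pareto survival beyond u, scaled by the exceedance rate
      gp <- if (shape == 0) exp(-z) else max(0, 1 + shape * z)^(-1 / shape)
      phiu * gp
    } else {
      # exceedance mass plus the truncated body mass above xi
      phiu + (1 - phiu) * tail_w[xi + 1] / tail_w[1]
    }
  }, numeric(1))
}

s2 <- smix2_sketch(1:10, u = 5, alpha = 1.2, theta = 0.9,
                   shape = 0.3, sigma = 2, phiu = 0.2)
```

The two branches agree at x = u, where both give phiu, so the sketch is a proper non-increasing survival function.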
Smix3 returns the survival function at x for the 3-component discrete extreme value mixture distribution. The component below v is the (truncated) Zipf-polylog(alpha1, theta1) distribution, between v & u the (truncated) Zipf-polylog(alpha2, theta2) distribution, and above u the generalised Pareto(shape, sigma) distribution.
Smix3(x, v, u, alpha1, theta1, alpha2, theta2, shape, sigma, phi1, phi2, phiu)
x |
Vector of positive integers |
v |
Positive integer representing the lower threshold |
u |
Positive integer representing the upper threshold |
alpha1 |
Real number, first parameter of the Zipf-polylog component below v |
theta1 |
Real number in (0, 1], second parameter of the Zipf-polylog component below v |
alpha2 |
Real number, first parameter of the Zipf-polylog component between v & u |
theta2 |
Real number in (0, 1], second parameter of the Zipf-polylog component between v & u |
shape |
Real number, shape parameter of the generalised Pareto component |
sigma |
Real number, scale parameter of the generalised Pareto component |
phi1 |
Real number in (0, 1), proportion of values below v |
phi2 |
Real number in (0, 1), proportion of values between v & u |
phiu |
Real number in (0, 1), exceedance rate of the threshold u |
A numeric vector of the same length as x
dmix3 for the corresponding probability mass function, and Spol and Smix2 for the survival functions of the Zipf-polylog and 2-component discrete extreme value mixture distributions, respectively.
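Writing the 3-component survival function piecewise makes the roles of v, u, phi1, phi2 and phiu explicit. The formula below is a sketch under assumptions that are not stated above: S(x) = Pr(X > x), phi1 + phi2 + phiu = 1, the Zipf-polylog components are supported on {1, ..., v} and {v + 1, ..., u}, and the tail is a generalised Pareto distribution discretised above u. Here \xi denotes shape, and the package's conventions may differ.

```latex
S(x) = \Pr(X > x) =
\begin{cases}
\phi_2 + \phi_u + \phi_1 \dfrac{\sum_{k=x+1}^{v} k^{-\alpha_1}\theta_1^{k}}{\sum_{k=1}^{v} k^{-\alpha_1}\theta_1^{k}}, & 0 \le x < v,\\[2ex]
\phi_u + \phi_2 \dfrac{\sum_{k=x+1}^{u} k^{-\alpha_2}\theta_2^{k}}{\sum_{k=v+1}^{u} k^{-\alpha_2}\theta_2^{k}}, & v \le x < u,\\[2ex]
\phi_u \left(1 + \xi\,\dfrac{x-u}{\sigma}\right)_{+}^{-1/\xi}, & x \ge u.
\end{cases}
```

The pieces join continuously: the first two both equal \phi_2 + \phi_u at x = v, and the last two both equal \phi_u at x = u.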
Spol returns the survival function at x for the Zipf-polylog distribution with parameters (alpha, theta). The distribution reduces to the discrete power law when theta = 1.
Spol(x, alpha, theta, x_max = 100000L)
x |
Vector of positive integers |
alpha |
Real number greater than 1 |
theta |
Real number in (0, 1] |
x_max |
Scalar (default 100000), positive integer limit for computing the normalising constant |
A numeric vector of the same length as x
dpol for the corresponding probability mass function, and Smix2 and Smix3 for the survival functions of the 2-component and 3-component discrete extreme value mixture distributions, respectively.
Spol(c(1,2,3,4,5), 1.2, 0.5)
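Here is a minimal base-R sketch of the Zipf-polylog survival function. It assumes S(x) = Pr(X > x) and a probability mass function proportional to x^(-alpha) theta^x on {1, ..., x_max}, with the normalising constant truncated at x_max as described for the x_max argument. The function spol_sketch is hypothetical and may differ from Spol in conventions and numerical details. With theta = 1, it gives the (truncated) discrete power law.

```r
# Hypothetical illustration, not the package's Spol.
spol_sketch <- function(x, alpha, theta, x_max = 100000L) {
  k <- seq_len(x_max)
  w <- k^(-alpha) * theta^k        # unnormalised Zipf-polylog masses on 1..x_max
  tail_w <- rev(cumsum(rev(w)))    # tail_w[j] = w[j] + ... + w[x_max]
  vapply(x, function(xi) {
    if (xi >= x_max) 0 else tail_w[xi + 1] / tail_w[1]   # Pr(X > xi)
  }, numeric(1))
}

sp <- spol_sketch(c(1, 2, 3, 4, 5), 1.2, 0.5)
```

Summing the reversed masses adds the smallest terms first, which limits rounding error in the tail sums.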
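topo_sort_kahn sorts the nodes of a directed acyclic graph (DAG) topologically using Kahn's algorithm and returns their ids in topological order.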
topo_sort_kahn(g, random = FALSE)
g |
An igraph object of a DAG |
random |
Boolean, whether the order of selected nodes is randomised in the process |
A data frame with two columns: "id" contains the names of the nodes in g, and "id_num" gives their topological ordering
df0 <- data.frame(from = c("a", "b"), to = c("b", "c"), stringsAsFactors = FALSE)
g0 <- igraph::graph_from_data_frame(df0, directed = TRUE)
topo_sort_kahn(g0)
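Kahn's algorithm repeatedly removes a node with no remaining incoming edges and appends it to the ordering. The base-R sketch below works on an edge list like df0 rather than an igraph object, so it needs no packages. The function topo_sort_kahn_sketch is hypothetical, and its tie-breaking is an assumption that need not match topo_sort_kahn: it takes nodes first in, first out, or samples uniformly when random = TRUE.

```r
# Hypothetical base-R sketch of Kahn's algorithm; not the package's implementation.
topo_sort_kahn_sketch <- function(edges, random = FALSE) {
  from <- as.character(edges$from)
  to <- as.character(edges$to)
  nodes <- unique(c(from, to))
  # in-degree of every node (number of incoming edges)
  indeg <- setNames(integer(length(nodes)), nodes)
  tab <- table(to)
  indeg[names(tab)] <- as.integer(tab)
  queue <- nodes[indeg == 0L]        # nodes with no incoming edges
  ordered <- character(0)
  while (length(queue) > 0) {
    # take the first queued node, or a random one if requested
    i <- if (random && length(queue) > 1) sample.int(length(queue), 1) else 1L
    n <- queue[i]
    queue <- queue[-i]
    ordered <- c(ordered, n)
    # "remove" the outgoing edges of n; enqueue nodes whose in-degree drops to 0
    for (m in to[from == n]) {
      indeg[[m]] <- indeg[[m]] - 1L
      if (indeg[[m]] == 0L) queue <- c(queue, m)
    }
  }
  if (length(ordered) < length(nodes)) stop("the graph contains a cycle, so it is not a DAG")
  data.frame(id = ordered, id_num = seq_along(ordered), stringsAsFactors = FALSE)
}

df0 <- data.frame(from = c("a", "b"), to = c("b", "c"), stringsAsFactors = FALSE)
res_topo <- topo_sort_kahn_sketch(df0)   # id: "a" "b" "c"; id_num: 1 2 3
```

If nodes remain unprocessed when the queue empties, the graph must contain a cycle. The sketch therefore stops with an error instead of returning a partial ordering.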