Title: | Bayesian Fertility Projection |
---|---|
Description: | Making probabilistic projections of total fertility rate for all countries of the world, using a Bayesian hierarchical model <doi:10.1007/s13524-011-0040-5> <doi:10.18637/jss.v106.i08>. Subnational probabilistic projections are also supported <doi:10.4054/DemRes.2018.38.60>. |
Authors: | Hana Sevcikova [cre, aut], Leontine Alkema [aut], Peiran Liu [aut], Adrian Raftery [aut], Bailey Fosdick [aut], Patrick Gerland [aut] |
Maintainer: | Hana Sevcikova <[email protected]> |
License: | GPL-3 | file LICENSE |
Version: | 7.4-4 |
Built: | 2024-12-11 06:44:19 UTC |
Source: | CRAN |
Collection of functions for making probabilistic projections of total fertility rate (TFR) for all countries of the world, using a Bayesian hierarchical model (BHM) and the United Nations demographic time series. Functions for subnational projections are also available.
The projection follows a method developed by Alkema et al. (2011) and Raftery et al (2014). It uses historical data provided by the United Nations to simulate a posterior distribution of total fertility rates for all countries in the world simultaneously.
The estimation is split into two parts:
BHM for fertility in a transition phase (Phase II), as described in Alkema et al. (2011),
BHM for fertility in a post-transition phase (Phase III), as described in Raftery et al (2013).
The second part is optional and can be replaced by a simple AR(1) process.
In addition, the package allows to assess uncertainty about the past (Liu and Rafftery 2020). Estimation and projection can be performed either for 5-year or 1-year time intervals.
The main functions of the package are:
run.tfr.mcmc: Evokes running a Markov Chain Monte Carlo (MCMC) simulation for TFR in Phase II using one or more chains, possibly in parallel. It results in a posterior sample of the mcmc parameters. Existing simulation runs can be resumed using continue.tfr.mcmc.
run.tfr3.mcmc: Starts MCMCs for TFR in Phase III. Existing simulation runs can be resumed using continue.tfr3.mcmc.
tfr.predict: Using the posterior parameter samples it derives posterior trajectories of the total fertility rate for all countries.
run.tfr.mcmc.extra: Runs MCMC for extra countries or regions, i.e. for countries not included in the Bayesian hierarchical model. It can be also used for aggregations.
tfr.predict.extra: Generates predictions for extra countries or aggregated regions.
The order of the functions above roughly corresponds to a typical workflow when using the package: 1. run a Phase II MCMC simulation, 2. run a Phase III MCMC simulation (optional but recommended), 3. generate predictions, 4. analyze results (using the functions below). If there are countries that were not included in steps 1.-3., or if there are aggregated regions for which a prediction is desired, one proceeds with the two functions at the bottom of the list above, followed by the analyzing functions below.
A number of functions analyzing results are included in the package:
tfr.trajectories.plot: Shows the posterior trajectories for a given country, including their median and given probability intervals.
tfr.trajectories.table: Shows the posterior trajectories for a given country in a tabular form.
tfr.map: Shows a TFR world map for a given projection period.
DLcurve.plot: Shows the posterior curves of the double logistic function used in the simulation of PhaseII, including their median and given probability intervals.
tfr.partraces.plot and tfr.partraces.cs.plot: Plot the Phase II MCMC traces of country-independent parameters and country-specific parameters, respectively. tfr3.partraces.plot and tfr3.partraces.cs.plot do the same for Phase III MCMCs.
tfr.pardensity.plot and tfr.pardensity.cs.plot: Plot the posterior density of the Phase II MCMCs for country-independent parameters and country-specific parameters, respectively. tfr3.pardensity.plot and tfr3.pardensity.cs.plot do the same for Phase III MCMCs.
summary.bayesTFR.mcmc.set: Summary function for the MCMC results.
summary.bayesTFR.prediction: Summary function for the prediction results.
For MCMC diagnostics, functions coda.list.mcmc and coda.list.mcmc3 create an object of type “mcmc.list” that can be used with the coda package. Furthermore, function tfr.diagnose and tfr3.diagnose analyze the MCMCs using the Raftery diagnostics implemented in the coda package and gives information about parameters that did not converge.
Existing simulation results can be accessed using the get.tfr.mcmc (Phase II) and get.tfr3.mcmc (Phase III) functions. An existing prediction can be accessed via get.tfr.prediction. Existing convergence diagnostics can be accessed using the get.tfr.convergence, get.tfr.convergence.all, get.tfr3.convergence and get.tfr3.convergence.all functions.
The historical national TFR data are taken from one of the packages wpp2019 (default), wpp2017, wpp2015, wpp2012 or wpp2010, depending on users settings.
Subnational TFR projections can be generated using tfr.predict.subnat. In this case, historical data must be provided by the user. Existing projections can be accessed from disk via get.regtfr.prediction.
There is a directory ex-data
shipped with the package which contains results from an example simulation, containing one chain with 60 iterations. The Example section below shows how these results were created. These data are used in Example sections throughout the manual. The user can either reproduce the data in her/his local directory, or use the ones from the package.
Hana Sevcikova <[email protected]>, Leontine Alkema <[email protected]>, Peiran Liu ([email protected]), Adrian Raftery <[email protected]>, Bailey Fosdick <[email protected]>, Patrick Gerland ([email protected])
Maintainer: Hana Sevcikova <[email protected]>
Hana Sevcikova, Leontine Alkema, Adrian E. Raftery (2011). bayesTFR: An R Package for Probabilistic Projections of the Total Fertility Rate. Journal of Statistical Software, 43(1), 1-29. doi:10.18637/jss.v043.i01.
Peiran Liu, Hana Sevcikova, Adrian E. Raftery (2023): Probabilistic Estimation and Projection of the Annual Total Fertility Rate Accounting for Past Uncertainty: A Major Update of the bayesTFR R Package. Journal of Statistical Software, 106(8), 1-36. doi:10.18637/jss.v106.i08.
L. Alkema, A. E. Raftery, P. Gerland, S. J. Clark, F. Pelletier, Buettner, T., Heilig, G.K. (2011). Probabilistic Projections of the Total Fertility Rate for All Countries. Demography, Vol. 48, 815-839. doi:10.1007/s13524-011-0040-5.
P. Liu, and A. E. Raftery (2020). Accounting for Uncertainty About Past Values In Probabilistic Projections of the Total Fertility Rate for All Countries. Annals of Applied Statistics, Vol 14, no. 2, 685-705. doi:10.1214/19-AOAS1294.
Raftery, A.E., Alkema, L. and Gerland, P. (2014). Bayesian Population Projections for the United Nations. Statistical Science, Vol. 29, 58-68. doi:10.1214/13-STS419.
Hana Sevcikova, Adrian E. Raftery, Patrick Gerland (2018). Probabilistic Projection of Subnational Total Fertility Rates. Demographic Research, Vol. 38(60), 1843-1884. doi:10.4054/DemRes.2018.38.60.
## Not run: # This command produces output data such as in the directory ex-data sim.dir <- tempfile() # Phase II MCMCs m <- run.tfr.mcmc(nr.chains=1, iter=60, output.dir=sim.dir, seed=1, verbose=TRUE) # Phase III MCMCs (not included in the package) m3 <- run.tfr3.mcmc(sim.dir=sim.dir, nr.chains=2, iter=100, thin=1, seed=1, verbose=TRUE) # Prediction pred <- tfr.predict(m, burnin=30, burnin3=50, verbose=TRUE) summary(pred, country='Ghana') unlink(sim.dir, recursive=TRUE) ## End(Not run)
## Not run: # This command produces output data such as in the directory ex-data sim.dir <- tempfile() # Phase II MCMCs m <- run.tfr.mcmc(nr.chains=1, iter=60, output.dir=sim.dir, seed=1, verbose=TRUE) # Phase III MCMCs (not included in the package) m3 <- run.tfr3.mcmc(sim.dir=sim.dir, nr.chains=2, iter=100, thin=1, seed=1, verbose=TRUE) # Prediction pred <- tfr.predict(m, burnin=30, burnin3=50, verbose=TRUE) summary(pred, country='Ghana') unlink(sim.dir, recursive=TRUE) ## End(Not run)
MCMC simulation object bayesTFR.mcmc
containing information about one MCMC chain, either from Phase II or Phase III simulation. A set of such objects belonging to the same simulation together with a bayesTFR.mcmc.meta
object constitute a bayesTFR.mcmc.set
object.
An object bayesTFR.mcmc
points to a place on disk (element output.dir
) where MCMC results from all iterations are stored. They can be retrieved to the memory using get.tfr.mcmc(...)
(Phase II) or get.tfr3.mcmc(...)
(Phase III), and tfr.mcmc(...)
.
The object is in standard cases not to be manipulated by itself, but rather as part of a bayesTFR.mcmc.set
object.
A bayesTFR.mcmc
object contains parameters of the Bayesian hierarchical model, more specifically, their values from the last iteration. If it is a Phase II object these parameters are: psi, chi, a_sd, b_sd, const_sd, S_sd, sigma0, mean_eps_tau, sd_eps_tau, Triangle4
- non-country specific parameters, containing one value each.alpha, delta
- non-country specific parameters, containing three values each.U_c, d_c, Triangle_c4
- country-specific parameters (1d array).gamma_ci
- country-specific parameter with three values for each country, i.e. an matrix where
is the number of countries.
Phase III MCMC objects contain single-value parameters mu
, rho
, sigma.mu
, sigma.rho
, sigma.eps
and -size vectors
mu.c
and rho.c
.
Furthermore, the object (independent of Phase) contains components:
iter |
Total number of iterations the simulation was started with. |
finished.iter |
Number of iterations that were finished. Results from the last finished iteration are stored in the parameters above. |
length |
Length of the MCMC stored on disk. It differs from |
thin |
Thinning interval used when simulating the MCMCs. |
id |
Identifier of this chain. |
output.dir |
Subdirectory (relative to |
traces |
This is a placeholder for keeping whole parameter traces in the memory. If the processing operates in a low memory mode, it will be 0. It can be filled in using the function |
traces.burnin |
Burnin used to retrieve the traces, i.e. how many stored iterations are missing from the beginning in the |
rng.state |
State of the random number generator at the end of the last finished interation. |
compression.type |
Type of compression of the underlying files. |
meta |
Object of class |
Hana Sevcikova
run.tfr.mcmc
, get.tfr.mcmc
, run.tfr3.mcmc
, get.tfr3.mcmc
, bayesTFR.mcmc.set
, bayesTFR.mcmc.meta
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") # loads traces from one chain m <- get.tfr.mcmc(sim.dir, low.memory=FALSE, burnin=35, chain.ids=1) # should have 25 rows, since 60 iterations in total minus 35 burnin dim(tfr.mcmc(m, 1)$traces) summary(m, chain.id=1) ## End(Not run)
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") # loads traces from one chain m <- get.tfr.mcmc(sim.dir, low.memory=FALSE, burnin=35, chain.ids=1) # should have 25 rows, since 60 iterations in total minus 35 burnin dim(tfr.mcmc(m, 1)$traces) summary(m, chain.id=1) ## End(Not run)
Simulation meta object bayesTFR.mcmc.meta
used by all chains of the same MCMC simulation.
It contains information that is common to all chains. It is part of a bayesTFR.mcmc.set
object.
The object is in standard cases not to be manipulated by itself, but rather as part of a bayesTFR.mcmc.set
object.
A bayesTFR.mcmc.meta
object contains various components that correspond to the input arguments
of the run.tfr.mcmc
and run.tfr3.mcmc
functions. Furthermore, it contains components:
nr.chains |
Number of MCMC chains. |
phase |
Value 2 or 3, depending which Phase the object belongs to. |
output.dir |
Directory for storing simulation output. |
Furthermore, Phase II meta objects contain components:
A matrix with the United Nations TFR estimates.
is number of years (see
T_end
below), is number of countries (see
nr_countries
below). The first columns correspond to countries included in the MCMC estimation (see
nr_countries_estimation
below), where .
Like tfr_matrix_all
, but it has NA
values for years where no historical data is available (i.e. after the last observed time period).
Like tfr_matrix_observed
, but it has NA
values before and after country”s fertility transition.
Number of countries included in the tfr matrices.
Number of countries included in the MCMC estimation. It must be smaller or equal to nr_countries
.
Estimated start year of the fertility decline for each country (as a row index within the tfr matrices). -1 means that the decline started before start.year
.
Index of countries for which present.year
is equal to tau_c
.
Index of countries for which the projection is made using the double logistic function, i.e. high fertility countries.
Index of countries with early decline, i.e. countries for which tau_c=-1
.
Index of countries with not early decline.
Start period of the recovery phase for each country (as an index of the tfr_matrix
).
Maximum of tau_c
and 1 for each country. Thus, it is a row index of the tfr_matrix
where the fertility decline starts.
Proposal covariance matrices of for each country.
Number of years for which United Nations historical data are available (i.e. number of rows of tfr_matrix
).
Like T_end
but country specific.
List of arrays of length nr_countries
. These are: name
- Region name for each country.code
- Region code for each country. area_name
- Area name for each country. area_code
- Area code for each country. country_name
- Array of country names. country_code
- Array of country codes.
Any country indices in the bayesTFR.mcmc.meta
object are derived from this component.
Phase III meta objects contain additional components:
Indices of countries included in the Phase III estimation. It is relative to the order of countries in the region
object in the parent meta object.
Number of countries included in the estimation.
Link to the Phase II meta object.
Hana Sevcikova, Leontine Alkema
run.tfr.mcmc
, get.tfr.mcmc
, run.tfr3.mcmc
, get.tfr3.mcmc
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") m <- get.tfr.mcmc(sim.dir) summary(m, meta.only = TRUE) names(m$meta)
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") m <- get.tfr.mcmc(sim.dir) summary(m, meta.only = TRUE) names(m$meta)
The functions convert MCMC traces (simulated using run.tfr.mcmc
and run.tfr3.mcmc
) into objects that can be used with the coda package.
coda.list.mcmc(mcmc = NULL, country = NULL, chain.ids = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), par.names = tfr.parameter.names(), par.names.cs = tfr.parameter.names.cs(), rm.const.pars = FALSE, burnin = 0, low.memory = FALSE, ...) coda.list.mcmc3(mcmc = NULL, country = NULL, chain.ids = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), par.names = tfr3.parameter.names(), par.names.cs = tfr3.parameter.names.cs(), burnin = 0, low.memory = FALSE, ...) ## S3 method for class 'bayesTFR.mcmc' coda.mcmc(mcmc, country = NULL, par.names = NULL, par.names.cs = NULL, burnin = 0, thin = 1, ...)
coda.list.mcmc(mcmc = NULL, country = NULL, chain.ids = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), par.names = tfr.parameter.names(), par.names.cs = tfr.parameter.names.cs(), rm.const.pars = FALSE, burnin = 0, low.memory = FALSE, ...) coda.list.mcmc3(mcmc = NULL, country = NULL, chain.ids = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), par.names = tfr3.parameter.names(), par.names.cs = tfr3.parameter.names.cs(), burnin = 0, low.memory = FALSE, ...) ## S3 method for class 'bayesTFR.mcmc' coda.mcmc(mcmc, country = NULL, par.names = NULL, par.names.cs = NULL, burnin = 0, thin = 1, ...)
mcmc |
In |
country |
Country name or code. It is used in connection with the |
chain.ids |
Vector of chain identifiers. By default, all chains available in the |
sim.dir |
Directory with the MCMC simulation results. Only used if |
par.names |
Names of country-independent parameters to be included. In |
par.names.cs |
Names of country-specific parameters to be included. The argument |
rm.const.pars |
Logical indicating if parameters with constant values should be removed. |
burnin |
Burnin indicating how many iterations should be removed from the beginning of each chain. |
low.memory |
Logical indicating if the function should run in a memory-efficient mode. |
thin |
Thinning interval. |
... |
Additional arguments passed to the coda's |
Function coda.list.mcmc
is for accessing all chains of phase II MCMCs; Function coda.list.mcmc3
is for accessing all chains of phase III MCMCs.
The function coda.list.mcmc
and coda.list.mcmc3
return an object of class “mcmc.list”. The function coda.mcmc
returns an object of class “mcmc”, both defined in the coda package.
Hana Sevcikova
Converts TFR trajectories stored in a binary format into two CSV files of a UN-specific format.
convert.tfr.trajectories(dir = file.path(getwd(), 'bayesTFR.output'), n = 1000, subdir = "predictions", output.dir = NULL, verbose = FALSE)
convert.tfr.trajectories(dir = file.path(getwd(), 'bayesTFR.output'), n = 1000, subdir = "predictions", output.dir = NULL, verbose = FALSE)
dir |
Directory containing the prediction object. It should correspond to the |
n |
Number of trajectories to be stored. It can be either a single number or the word “all” in which case all trajectories are stored. |
subdir |
Name of subdirectory of |
output.dir |
Directory in which the resulting files will be stored. If |
verbose |
Logical switching log messages on and off. |
The function creates two files. One is called “ascii_trajectories.csv”, it is a comma-separated table with the following columns:
country code
prediction interval, e.g. 2015-2020
middle year of the prediction interval
identifier of the trajectory
total fertility rate
The second file is called “ascii_trajectories_wide.csv”, it is also a comma-separated table and it contains the same information as above but in a ‘transposed’ format. I.e. the data for one country are ordered in columns, thus, there is one column per country. The country columns are ordered alphabetically.
If n
is smaller than the total number of trajectories, the trajectories are selected using equal spacing.
This function is automatically called from the tfr.predict
function, therefore in standard cases it will not be needed to call it directly. However, it can be useful for example, if different number of trajectories are to be converted, without having to re-run the prediction.
Hana Sevcikova
write.projection.summary
, tfr.predict
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") pred.dir <- file.path(getwd(), "exampleTFRpred") # stores 10 trajectories out of 35 (1x(60-25)) into # exampleTFRpred/predictions/ascii_trajectories.csv tfr.predict(sim.dir=sim.dir, output.dir=pred.dir, use.tfr3=FALSE, burnin=25, save.as.ascii=10, verbose=TRUE) # stores all 35 trajectories into the current directory convert.tfr.trajectories(dir=pred.dir, n="all", output.dir=".", verbose=TRUE) # Note: If the output.dir argument in tfr.predict is omitted, # call convert.tfr.trajectories with dir=sim.dir ## End(Not run)
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") pred.dir <- file.path(getwd(), "exampleTFRpred") # stores 10 trajectories out of 35 (1x(60-25)) into # exampleTFRpred/predictions/ascii_trajectories.csv tfr.predict(sim.dir=sim.dir, output.dir=pred.dir, use.tfr3=FALSE, burnin=25, save.as.ascii=10, verbose=TRUE) # stores all 35 trajectories into the current directory convert.tfr.trajectories(dir=pred.dir, n="all", output.dir=".", verbose=TRUE) # Note: If the output.dir argument in tfr.predict is omitted, # call convert.tfr.trajectories with dir=sim.dir ## End(Not run)
The function returns country names for countries given either by their codes or by index.
country.names(meta, countries = NULL, index = FALSE)
country.names(meta, countries = NULL, index = FALSE)
meta |
Object of class |
countries |
Vector of country codes or indices. If it is not given, names of all countries are returned. |
index |
Logical indicating if the argument |
The function considers countries that are included in the simulations and predictions. If the argument countries
is not given, all countries are returned in the same order as they are stored in the meta object.
Vector of country names.
Hana Sevcikova
get.countries.table
, get.country.object
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") m <- get.tfr.mcmc(sim.dir) country.names(m) # these two calls should give the same answer country.names(m, c(800, 120)) country.names(m, c(15, 20), index=TRUE)
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") m <- get.tfr.mcmc(sim.dir) country.names(m) # these two calls should give the same answer country.names(m, c(800, 120)) country.names(m, c(15, 20), index=TRUE)
The functions for plotting and retrieving the posterior distribution of the double logistic function used in the simulation of Phase II. Plots include the median and given probability intervals of the distribution.
DLcurve.plot(mcmc.list, country, burnin = NULL, pi = 80, tfr.max = 10, nr.curves = NULL, predictive.distr = FALSE, ylim = NULL, xlab = 'TFR (reversed)', ylab = 'TFR decrement', main = NULL, show.legend = TRUE, col=c('black', 'red', "#00000020"), ...) DLcurve.plot.all(mcmc.list = NULL, sim.dir = NULL, output.dir = file.path(getwd(), 'DLcurves'), output.type = "png", burnin = NULL, verbose = FALSE, ...) tfr.world.dlcurves(x, mcmc.list, burnin=NULL, countryUc=NULL, ...) tfr.country.dlcurves(x, mcmc.list, country, burnin=NULL, ...)
DLcurve.plot(mcmc.list, country, burnin = NULL, pi = 80, tfr.max = 10, nr.curves = NULL, predictive.distr = FALSE, ylim = NULL, xlab = 'TFR (reversed)', ylab = 'TFR decrement', main = NULL, show.legend = TRUE, col=c('black', 'red', "#00000020"), ...) DLcurve.plot.all(mcmc.list = NULL, sim.dir = NULL, output.dir = file.path(getwd(), 'DLcurves'), output.type = "png", burnin = NULL, verbose = FALSE, ...) tfr.world.dlcurves(x, mcmc.list, burnin=NULL, countryUc=NULL, ...) tfr.country.dlcurves(x, mcmc.list, country, burnin=NULL, ...)
mcmc.list |
List of |
country |
Name or code of a country. The code can be either numeric or ISO-2 or ISO-3 characters. |
burnin |
Number of iterations to be discarded from the beginning of parameter traces. |
pi |
Probability interval. It can be a single number or an array. |
tfr.max |
Maximum TFR to be shown in the plot. |
nr.curves |
Number of curves to be plotted. If |
predictive.distr |
Logical. If |
ylim , xlab , ylab , main
|
Graphical parameters passed to the |
show.legend |
Logical determining if the legend should be shown. |
col |
Vector of colors in this order: 1. observed data points, 2. quantiles, 3. trajectories |
... |
For the plotting functions, there are additional graphical parameters. For |
sim.dir |
Directory with the simulation results. Only relevant, if |
output.dir |
Directory into which resulting graphs are stored. |
output.type |
Type of the resulting files. It can be “png”, “pdf”, “jpeg”, “bmp”, “tiff”, or “postscript”. |
verbose |
Logical switching log messages on and off. |
x |
TFR values for which the double logistic should be computed. |
countryUc |
Country to use the parameter U_c from (start of the fertility transition). If it is not given, the middle point of the prior distribution is used. |
DLcurve.plot
plots double logistic curves for the given country. DLcurve.plot.all
creates such plots for all countries and stores them in output.dir
. Parameters inputting the double logistic function are either thinned traces created by the tfr.predict
function (if mcmc.list
is an object of class bayesTFR.prediction
), or they are selected by equal spacing from the MCMC traces. In the former case, burnin
is set automatically; in the latter case, burnin
defaults to 0 since such object has already been “burned”. If nr.curves
is smaller than 2000, the median and probability intervals are computed on a sample of 2000 equally spaced data points, otherwise on all plotted curves.
Function tfr.world.dlcurves
returns the DL curves of the hierarchical distribution, conditioned on the starting point of the fertility transition in a given country (given by the countryUc
argument). Function tfr.country.dlcurves
returns DL curves for a given country. If mcmc.list
is a prediction object, burnin
should not be given, as such object has already been “burned”.
tfr.world.dlcurves
and tfr.country.dlcurves
return a matrix of size where
is the number of trajectories and
is the number of values of
. If the argument
return.sigma
is set to TRUE
, the return value is a list with the first element being the DL values and the second element being a matrix of the standard deviation of the DL error term sigma_eps
.
Hana Sevcikova, Leontine Alkema
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") mcmc.set <- get.tfr.mcmc(sim.dir=sim.dir) DLcurve.plot(country="Burkina Faso", mcmc.set, burnin=15) # add the median of the hierarchical DL curves x <- seq(0, 10, length=100) world <- tfr.world.dlcurves(x, mcmc.set, burnin=15, countryUc="Burkina Faso") qw <- apply(world, 2, median) lines(x, qw, col='blue') ## End(Not run)
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") mcmc.set <- get.tfr.mcmc(sim.dir=sim.dir) DLcurve.plot(country="Burkina Faso", mcmc.set, burnin=15) # add the median of the hierarchical DL curves x <- seq(0, 10, length=100) world <- tfr.world.dlcurves(x, mcmc.set, burnin=15, countryUc="Burkina Faso") qw <- apply(world, 2, median) lines(x, qw, col='blue') ## End(Not run)
Function get.country.object
returns an object containing country name, code and index. Functions get.countries.table
return a data frame containing codes, names and optionally ISO character codes of all countries. Functions get.countries.phase
return countries table with the TFR phase they are currently in (1, 2, or 3).
get.country.object(country, meta = NULL, country.table = NULL, index = FALSE) ## S3 method for class 'bayesTFR.mcmc.set' get.countries.table(object, iso = FALSE, ...) ## S3 method for class 'bayesTFR.prediction' get.countries.table(object, iso = FALSE, ...) ## S3 method for class 'bayesTFR.mcmc.set' get.countries.phase(object, ...) ## S3 method for class 'bayesTFR.prediction' get.countries.phase(object, ...)
get.country.object(country, meta = NULL, country.table = NULL, index = FALSE) ## S3 method for class 'bayesTFR.mcmc.set' get.countries.table(object, iso = FALSE, ...) ## S3 method for class 'bayesTFR.prediction' get.countries.table(object, iso = FALSE, ...) ## S3 method for class 'bayesTFR.mcmc.set' get.countries.phase(object, ...) ## S3 method for class 'bayesTFR.prediction' get.countries.phase(object, ...)
country |
Country name, code or index. If it is an index, the argument |
meta |
Object of class |
country.table |
A table containing columns “name” and “code” from which the country info can be extracted. Only relevant, if |
index |
Logical determining if the argument |
object |
Object of class |
iso |
Logical. If |
... |
Not used. |
Given partial information about a country (i.e. having either name or code or index), the function get.country.object
returns an object containing all three pieces of information. Only countries are considered that are included in the simulations and predictions. Country index is an internal index used in various components of a bayesTFR.mcmc.meta
object.
Function get.country.object
returns a list with components:
name |
Country name |
code |
Country numeric code |
index |
Country index |
Function get.countries.table
returns a data frame with columns code
, name
, and optionally (if iso
is TRUE
) iso2
and iso3
.
Function get.countries.phase
returns a data frame with columns code
, name
and phase
.
Hana Sevcikova
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") m <- get.tfr.mcmc(sim.dir) # all five calls should give the same answer get.country.object('China', m$meta) get.country.object('CN', m$meta) get.country.object(156, m$meta) get.country.object(56, m$meta, index=TRUE) get.country.object(156, NULL, country.table=get.countries.table(m)) # phase 3 countries subset(get.countries.phase(m), phase == 3)
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") m <- get.tfr.mcmc(sim.dir) # all five calls should give the same answer get.country.object('China', m$meta) get.country.object('CN', m$meta) get.country.object(156, m$meta) get.country.object(56, m$meta, index=TRUE) get.country.object(156, NULL, country.table=get.countries.table(m)) # phase 3 countries subset(get.countries.phase(m), phase == 3)
From a given MCMC, obtain a covariance matrix of the (
) parameters for each country
.
get.cov.gammas(mcmc.set = NULL, sim.dir = NULL, burnin = 200, chain.id = 1)
get.cov.gammas(mcmc.set = NULL, sim.dir = NULL, burnin = 200, chain.id = 1)
mcmc.set |
Object of class |
sim.dir |
Directory with existing MCMC simulation results. It is only used if |
burnin |
Number of burn-in iterations to be discarded from the begining of the chain. |
chain.id |
Identifier of the MCMC to be used. By default the first chain is used. |
In order to speed-up MCMC convergence, the package contains default values of gamma covariance that were obtained from a previous run (they can be loaded using data(proposal_cov_gammas_cii)
). However, this function allows to obtain new values and overwrite the default values by
passing the resulting object to the run.tfr.mcmc
function as the proposal_cov_gammas
argument.
A list with components:
values |
An array of size nr_countries |
country_codes |
A vector of size nr_countries. A covariance matrix |
Leontine Alkema, Hana Sevcikova
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") cov.gammas <- get.cov.gammas(sim.dir=sim.dir, burnin=0) m <- run.tfr.mcmc(nr.chains=1, iter=10, proposal_cov_gammas=cov.gammas, verbose=TRUE) ## End(Not run)
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") cov.gammas <- get.cov.gammas(sim.dir=sim.dir, burnin=0) m <- run.tfr.mcmc(nr.chains=1, iter=10, proposal_cov_gammas=cov.gammas, verbose=TRUE) ## End(Not run)
Functions for obtaining bias and standard deviation of the estimated models as well as the model fits.
tfr.bias.sd(mcmc.list = NULL, country = NULL, sim.dir = NULL, ...) get.bias.model(mcmc.list = NULL, country = NULL, sim.dir = NULL, ...) get.std.model(mcmc.list = NULL, country = NULL, sim.dir = NULL, ...)
tfr.bias.sd(mcmc.list = NULL, country = NULL, sim.dir = NULL, ...) get.bias.model(mcmc.list = NULL, country = NULL, sim.dir = NULL, ...) get.std.model(mcmc.list = NULL, country = NULL, sim.dir = NULL, ...)
mcmc.list |
Object of class |
country |
Name or numerical code of a country. It can also be given as ISO-2 or ISO-3 characters. |
sim.dir |
Directory with the MCMC simulation results. Only used if |
... |
Not used. |
Functions get.bias.model
and get.std.model
are used to obtain the model fit for estimated bias and standard deviation, respectively, when uncertainty about input data is taken into account. These are used in the MCMC steps stored in mcmc.list
. Function tfr.bias.sd
combines both infos into one object.
Functions get.bias.model
and get.std.model
return a list with
model |
|
table |
|
Function tfr.bias.sd
consolidates these items into a single list where the elements are model_bias
, model_sd
and table
.
Peiran Liu, Hana Sevcikova
## Not run: sim.dir <- tempfile() mcmc.set <- run.tfr.mcmc(nr.chains = 1, iter = 10, output.dir = sim.dir, uncertainty = TRUE) tfr.bias.sd(mcmc.set, "Nigeria") unlink(sim.dir, recursive = TRUE) ## End(Not run)
## Not run: sim.dir <- tempfile() mcmc.set <- run.tfr.mcmc(nr.chains = 1, iter = 10, output.dir = sim.dir, uncertainty = TRUE) tfr.bias.sd(mcmc.set, "Nigeria") unlink(sim.dir, recursive = TRUE) ## End(Not run)
Retrieve subnational (regional) prediction results produced by tfr.predict.subnat
, either for one country or for all available countries.
get.regtfr.prediction(sim.dir, country = NULL)
get.regtfr.prediction(sim.dir, country = NULL)
sim.dir |
Simulation directory of the subnational predictions. It corresponds to the argument |
country |
Numerical country code. If it is not given, all available subnational predictions are retrieved. |
Predictions for country are assumed to be stored in “
sim.dir
/subnat/c”.
If argument country
is given, the function returns an object of class bayesTFR.prediction
. If it is NULL
, it returns a list of such objects. Names of the list items are the country codes.
# Subnational example data my.subtfr.file <- file.path(find.package("bayesTFR"), 'extdata', 'subnational_tfr_template.txt') subtfr <- read.delim(my.subtfr.file, check.names=FALSE, stringsAsFactor=FALSE) countries <- unique(subtfr[, c("country_code", "country")]) # Directory with national projections (contains 30 trajectories for each country) nat.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") # Subnational projections for all three countries () subnat.dir <- tempfile() tfr.predict.subnat(countries$country_code, my.tfr.file=my.subtfr.file, sim.dir=nat.dir, output.dir=subnat.dir, start.year=2013) # Retrieve results for all countries preds <- get.regtfr.prediction(subnat.dir) names(preds) # View tables of subregions for each country for(i in 1:nrow(countries)) { cat("\n\n", countries$country[i], "\n") print(get.countries.table(preds[[as.character(countries$country_code[i])]])) } # Quantiles for individual subregions tfr.trajectories.table(preds[["218"]], "Bolivar") # Retrieve results for one country pred <- get.regtfr.prediction(subnat.dir, 218) tfr.trajectories.plot(pred, "Loja") # cleanup unlink(subnat.dir) # See more examples in ?tfr.predict.subnat
# Subnational example data my.subtfr.file <- file.path(find.package("bayesTFR"), 'extdata', 'subnational_tfr_template.txt') subtfr <- read.delim(my.subtfr.file, check.names=FALSE, stringsAsFactor=FALSE) countries <- unique(subtfr[, c("country_code", "country")]) # Directory with national projections (contains 30 trajectories for each country) nat.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") # Subnational projections for all three countries () subnat.dir <- tempfile() tfr.predict.subnat(countries$country_code, my.tfr.file=my.subtfr.file, sim.dir=nat.dir, output.dir=subnat.dir, start.year=2013) # Retrieve results for all countries preds <- get.regtfr.prediction(subnat.dir) names(preds) # View tables of subregions for each country for(i in 1:nrow(countries)) { cat("\n\n", countries$country[i], "\n") print(get.countries.table(preds[[as.character(countries$country_code[i])]])) } # Quantiles for individual subregions tfr.trajectories.table(preds[["218"]], "Bolivar") # Retrieve results for one country pred <- get.regtfr.prediction(subnat.dir, 218) tfr.trajectories.plot(pred, "Loja") # cleanup unlink(subnat.dir) # See more examples in ?tfr.predict.subnat
The function loads objects of class bayesTFR.convergence
from disk.
get.tfr.convergence(sim.dir = file.path(getwd(), "bayesTFR.output"), thin=80, burnin = 2000) get.tfr.convergence.all(sim.dir = file.path(getwd(), "bayesTFR.output")) get.tfr3.convergence(sim.dir = file.path(getwd(), "bayesTFR.output"), thin=60, burnin = 10000) get.tfr3.convergence.all(sim.dir = file.path(getwd(), "bayesTFR.output"))
get.tfr.convergence(sim.dir = file.path(getwd(), "bayesTFR.output"), thin=80, burnin = 2000) get.tfr.convergence.all(sim.dir = file.path(getwd(), "bayesTFR.output")) get.tfr3.convergence(sim.dir = file.path(getwd(), "bayesTFR.output"), thin=60, burnin = 10000) get.tfr3.convergence.all(sim.dir = file.path(getwd(), "bayesTFR.output"))
sim.dir |
Simulation directory used for computing the diagnostics. |
thin |
Thinning interval used with this diagnostics. |
burnin |
Burnin used for computing the diagnostics. |
Function get.tfr.convergence
loads an object of class bayesTFR.convergence
for the specific thin
and burnin
generated for Phase II MCMCs. Function get.tfr.convergence.all
loads all Phase II bayesTFR.convergence
objects available for sim.dir
. Functions get.tfr3.convergence
and get.tfr3.convergence.all
do the same thing for Phase III MCMCs.
get.tfr.convergence
and get.tfr3.convergence
return an object of class bayesTFR.convergence
; get.tfr.convergence.all
and get.tfr3.convergence.all
return a list of objects of class bayesTFR.convergence
.
Hana Sevcikova
bayesTFR.convergence
, summary.bayesTFR.convergence
.
Get past TFR estimation, including trajectories and quantiles if required.
get.tfr.estimation(mcmc.list = NULL, country = NULL, sim.dir = NULL, burnin = 0, thin = 1, probs = NULL, adjust = TRUE, country.code = deprecated(), ISO.code = deprecated())
get.tfr.estimation(mcmc.list = NULL, country = NULL, sim.dir = NULL, burnin = 0, thin = 1, probs = NULL, adjust = TRUE, country.code = deprecated(), ISO.code = deprecated())
mcmc.list |
Object of class |
country |
Name or numerical code of a country. It can also be given as ISO-2 or ISO-3 characters. |
sim.dir |
Directory with the MCMC simulation results. Only used if |
burnin |
Burn-in for getting trajectories and quantiles. A positive burn-in |
thin |
Thin for getting trajectories and quantiles. Thinning level |
probs |
A vector of numbers between |
adjust |
Logical indicating whether the adjusted median and trajectories should be returned. |
country.code , ISO.code
|
Deprecated arguments. Use argument |
This function is used to obtain the TFR estimation trajectories as well as corresponding quantiles if the mcmc.list
has been obtained while taking account for uncertainty about the past, i.e. uncertainty=TRUE
in run.tfr.mcmc
. Quantiles or the mean are included in the results if probs
is not NULL
.
tfr_table |
Table storing the trajectories. It is a matrix with rows equal to number of trajectories, and column equal to number of time periods. |
country.obj |
A list storing information about the country which the trajectories and quantiles correspond to. It corresponds to the output of |
tfr_quantile |
Optional. A data.table object, storing the quantiles or the mean of estimates for each time period as specified by the |
Peiran Liu, Hana Sevcikova
## Not run: sim.dir <- tempfile() m <- run.tfr.mcmc(nr.chains = 1, iter = 10, output.dir = sim.dir, uncertainty = TRUE) get.tfr.estimation(m, "Nigeria", probs = c(0.1, 0.5, 0.9)) unlink(sim.dir, recursive = TRUE) ## End(Not run)
## Not run: sim.dir <- tempfile() m <- run.tfr.mcmc(nr.chains = 1, iter = 10, output.dir = sim.dir, uncertainty = TRUE) get.tfr.estimation(m, "Nigeria", probs = c(0.1, 0.5, 0.9)) unlink(sim.dir, recursive = TRUE) ## End(Not run)
The function get.tfr.mcmc
retrieves results of an MCMC simulation of Phase II and creates an object of class bayesTFR.mcmc.set
. Function has.tfr.mcmc
checks the existence of such results. Functions get.tfr3.mcmc
and has.tfr3.mcmc
do the same for Phase III MCMCs. Function tfr.mcmc
extracts a single chain and tfr.mcmc.list
extracts several or all chains from the simulation results.
get.tfr.mcmc(sim.dir = file.path(getwd(), "bayesTFR.output"), chain.ids = NULL, low.memory = TRUE, burnin = 0, verbose = FALSE) has.tfr.mcmc(sim.dir) get.tfr3.mcmc(sim.dir = file.path(getwd(), "bayesTFR.output"), ...) has.tfr3.mcmc(sim.dir) tfr.mcmc(mcmc.set, chain.id) tfr.mcmc.list(mcmc.set, chain.ids=NULL)
get.tfr.mcmc(sim.dir = file.path(getwd(), "bayesTFR.output"), chain.ids = NULL, low.memory = TRUE, burnin = 0, verbose = FALSE) has.tfr.mcmc(sim.dir) get.tfr3.mcmc(sim.dir = file.path(getwd(), "bayesTFR.output"), ...) has.tfr3.mcmc(sim.dir) tfr.mcmc(mcmc.set, chain.id) tfr.mcmc.list(mcmc.set, chain.ids=NULL)
sim.dir |
Directory where the simulation results are stored. |
chain.ids |
Chain identifiers in case only specific chains should be included in the resulting object. By default, all available chains are included. |
low.memory |
If |
burnin |
Burnin used for loading traces. Only relevant, if |
verbose |
Logical switching log messages on and off. |
chain.id |
Chain identifier. |
mcmc.set |
Object of class |
... |
Arguments passed to |
Function get.tfr.mcmc
is an accessor of results generated using run.tfr.mcmc
and continue.tfr.mcmc
. Function get.tfr3.mcmc
can be used to access results generated using run.tfr3.mcmc
and continue.tfr3.mcmc
.
get.tfr.mcmc
and get.tfr3.mcmc
return an object of class bayesTFR.mcmc.set
. has.tfr.mcmc
and has.tfr3.mcmc
return a logical value. tfr.mcmc
returns an object of class bayesTFR.mcmc
, and tfr.mcmc.list
returns a list of bayesTFR.mcmc
objects.
Hana Sevcikova
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") m <- get.tfr.mcmc(sim.dir) summary(m) # summary of the single chains for(mc in tfr.mcmc.list(m)) print(summary(mc))
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") m <- get.tfr.mcmc(sim.dir) summary(m) # summary of the single chains for(mc in tfr.mcmc.list(m)) print(summary(mc))
Functions for accessing traces of the MCMC parameters, either country-independent or country-specific. Functions get.tfr.parameter.traces
and get.tfr.parameter.traces.cs
access Phase II MCMCs; Functions get.tfr3.parameter.traces
and get.tfr3.parameter.traces.cs
access Phase III MCMCs.
get.tfr.parameter.traces(mcmc.list, par.names = tfr.parameter.names(), burnin = 0, thinning.index = NULL, thin = NULL) get.tfr.parameter.traces.cs(mcmc.list, country.obj, par.names=tfr.parameter.names.cs(), burnin=0, thinning.index=NULL, thin=NULL) get.tfr3.parameter.traces(mcmc.list, par.names = tfr3.parameter.names(), ...) get.tfr3.parameter.traces.cs(mcmc.list, country.obj, par.names=tfr3.parameter.names.cs(), ...)
get.tfr.parameter.traces(mcmc.list, par.names = tfr.parameter.names(), burnin = 0, thinning.index = NULL, thin = NULL) get.tfr.parameter.traces.cs(mcmc.list, country.obj, par.names=tfr.parameter.names.cs(), burnin=0, thinning.index=NULL, thin=NULL) get.tfr3.parameter.traces(mcmc.list, par.names = tfr3.parameter.names(), ...) get.tfr3.parameter.traces.cs(mcmc.list, country.obj, par.names=tfr3.parameter.names.cs(), ...)
mcmc.list |
List of |
country.obj |
Country object list (see |
par.names |
Names of country-independent parameters (in case of |
burnin |
Burnin indicating how many iterations should be removed from the beginning of each chain. |
thinning.index |
Index of the traces for thinning. If it is |
thin |
Alternative to |
... |
Arguments passed to underlying functions (i.e. to |
All functions return a matrix with columns being the parameters and rows being the MCMC values, attached to one another in case of multiple chains. get.tfr.parameter.traces
and get.tfr3.parameter.traces
return country-independent parameters, get.tfr.parameter.traces.cs
and get.tfr3.parameter.traces.cs
return country-specific parameters.
Hana Sevcikova
coda.list.mcmc
for another way of retrieving parameter traces.
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") m <- get.tfr.mcmc(sim.dir) tfr.values <- get.tfr.parameter.traces(m$mcmc.list, burnin=10, par.names="sigma0") ## Not run: hist(tfr.values, main=colnames(tfr.values)) ## End(Not run) tfr.values.cs <- get.tfr.parameter.traces.cs(m$mcmc.list, get.country.object("Canada", meta=m$meta), burnin=10, par.names="Triangle_c4") ## Not run: hist(tfr.values.cs, main=colnames(tfr.values.cs)) ## End(Not run)
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") m <- get.tfr.mcmc(sim.dir) tfr.values <- get.tfr.parameter.traces(m$mcmc.list, burnin=10, par.names="sigma0") ## Not run: hist(tfr.values, main=colnames(tfr.values)) ## End(Not run) tfr.values.cs <- get.tfr.parameter.traces.cs(m$mcmc.list, get.country.object("Canada", meta=m$meta), burnin=10, par.names="Triangle_c4") ## Not run: hist(tfr.values.cs, main=colnames(tfr.values.cs)) ## End(Not run)
Function get.tfr.prediction
retrieves results of a prediction and creates an object of class bayesTFR.prediction
. Function has.tfr.prediction
checks an existence of such results. Function available.tfr.predictions
lists predictions available in the given simulation directory.
get.tfr.prediction(mcmc = NULL, sim.dir = NULL, mcmc.dir = NULL, subdir = "predictions") has.tfr.prediction(mcmc = NULL, sim.dir = NULL, subdir = "predictions") available.tfr.predictions(mcmc = NULL, sim.dir = NULL, full.names = FALSE)
get.tfr.prediction(mcmc = NULL, sim.dir = NULL, mcmc.dir = NULL, subdir = "predictions") has.tfr.prediction(mcmc = NULL, sim.dir = NULL, subdir = "predictions") available.tfr.predictions(mcmc = NULL, sim.dir = NULL, full.names = FALSE)
mcmc |
Object of class |
sim.dir |
Directory where the prediction is stored. It should correspond to the value of the |
mcmc.dir |
Optional argument to be used only in a special case when the mcmc object contained in the prediction object was estimated in different directory than in the one to which it points to (for example due to moving or renaming the original directory). The argument causes that the mcmc is redirected to the given directory. It can be set to |
subdir |
Subdirectory of |
full.names |
Logical. If |
If mcmc
is not NULL
, the search directory is set to mcmc$meta$output.dir
. This approach assumes that the prediction was stored in the same directory as the MCMC simulation, i.e. the output.dir
argument of the tfr.predict
function was set to NULL
. If it is not the case, the argument mcmc.dir
should be used.
Usually, all predictions are stored in the subdirectory “predictions” of the simulation directory. If the subdirectory has a different name, the argument subdir
should be used. This allows to keep multiple predictions in one (MCMC) simulation directory.
The function available.tfr.predictions
can be used to view all available predictions in the simulation directory.
Function has.tfr.prediction
returns a logical indicating if a prediction exists for the given mcmc
.
Function available.tfr.predictions
returns a vector of directory names containing TFR predictions.
Function get.tfr.prediction
returns an
object of class bayesTFR.prediction
.
Hana Sevcikova
bayesTFR.prediction
, tfr.predict
, summary.bayesTFR.prediction
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") pred <- get.tfr.prediction(sim.dir=sim.dir) summary(pred, country="Canada")
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") pred <- get.tfr.prediction(sim.dir=sim.dir) summary(pred, country="Canada")
Function for accessing TFR trajectories.
get.tfr.trajectories(tfr.pred, country)
get.tfr.trajectories(tfr.pred, country)
tfr.pred |
Object of class |
country |
Name or code of a country. The code can be either numeric or ISO-2 or ISO-3 characters. |
The function loads TFR trajectories for the given country from disk, offsets it if needed (see tfr.median.shift
) and returns it as a matrix.
Array of size number of projection periods (including the present year) times the number of trajectories. The row names correspond to the mid-years of the prediction periods.
Hana Sevcikova
bayesTFR.prediction
, get.tfr.prediction
, tfr.trajectories.table
, tfr.median.shift
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") pred <- get.tfr.prediction(sim.dir=sim.dir) get.tfr.trajectories(pred, "Germany")
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") pred <- get.tfr.prediction(sim.dir=sim.dir) get.tfr.trajectories(pred, "Germany")
The function get.thinned.tfr.mcmc
accesses
a thinned and burned version of the given Phase II MCMC set. create.thinned.tfr.mcmc
creates or updates such a set.
get.thinned.tfr.mcmc(mcmc.set, thin = 1, burnin = 0) create.thinned.tfr.mcmc(mcmc.set, thin = 1, burnin = 0, output.dir = NULL, verbose = TRUE, uncertainty = FALSE, update.with.countries = NULL)
get.thinned.tfr.mcmc(mcmc.set, thin = 1, burnin = 0) create.thinned.tfr.mcmc(mcmc.set, thin = 1, burnin = 0, output.dir = NULL, verbose = TRUE, uncertainty = FALSE, update.with.countries = NULL)
mcmc.set |
Object of class |
thin , burnin
|
Thinning interval and burnin used for creating or identifying the thinned object. |
output.dir |
Output directory. It is only used if the output goes to a non-standard directory. |
verbose |
Logical switching log messages on and off. |
uncertainty |
If users want to save the thinned estimated TFR in the new mcmc object, this parameter should be set |
update.with.countries |
If an existing set is to be updated, this should be a vector of country indices for the update. |
The function create.thinned.tfr.mcmc
is called from tfr.predict
and thus, the resulting object contains exactly the same MCMCs used for generating projections. In addition, it can be also called from tfr.diagnose
if desired, so that the projection process can re-use such a set that leads to a convergence.
The thinning is done as follows: The given burnin
is removed from the beginning of each chain in the original MCMC set. Then each chain is thinned by thin
using equal spacing and all chains are collapsed into one single chain per parameter. They are stored in the main simulation directory under the name ‘thinned_mcmc_t_b’ where t is the value of thin
and b the value of burnin
.
If uncertainty=TRUE
, the estimated TFR is thinned and saved as well.
Both functions return an object of class bayesTFR.mcmc.set
. get.thinned.tfr.mcmc
returns NULL
if such object does not exist.
Hana Sevcikova
bayesTFR.mcmc.set
, tfr.predict
, tfr.diagnose
## Not run: sim.dir <- tempfile() m <- run.tfr.mcmc(nr.chains=2, iter=30, seed=1, output.dir=sim.dir, verbose=TRUE) tfr.predict(m, burnin=15, use.tfr3=FALSE) # creates thinned MCMCs mb <- get.thinned.tfr.mcmc(m, thin=1, burnin=15) summary(mb, meta.only=TRUE) # length 30 = 2chains x (30-15)iters. unlink(sim.dir, recursive=TRUE) ## End(Not run)
## Not run: sim.dir <- tempfile() m <- run.tfr.mcmc(nr.chains=2, iter=30, seed=1, output.dir=sim.dir, verbose=TRUE) tfr.predict(m, burnin=15, use.tfr3=FALSE) # creates thinned MCMCs mb <- get.thinned.tfr.mcmc(m, thin=1, burnin=15) summary(mb, meta.only=TRUE) # length 30 = 2chains x (30-15)iters. unlink(sim.dir, recursive=TRUE) ## End(Not run)
Function get.total.iterations
gives the total number of iterations of MCMCs summed over chains whith burnin being subtracted from each chain. Function get.stored.mcmc.length
gives the total length of the MCMCs stored on disk minus those iterations that correspond to burnin. Result of the latter will be different from the former only if the MCMCs were run with value of thin
larger than one.
get.total.iterations(mcmc.list, burnin = 0) get.stored.mcmc.length(mcmc.list, burnin = 0)
get.total.iterations(mcmc.list, burnin = 0) get.stored.mcmc.length(mcmc.list, burnin = 0)
mcmc.list |
List of |
burnin |
Number of iterations to be subtracted from each chain. |
A single number.
Hana Sevcikova
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") mcmc.set <- get.tfr.mcmc(sim.dir=sim.dir) get.total.iterations(mcmc.set$mcmc.list) # 60=1chain x 60iters get.total.iterations(mcmc.set$mcmc.list, burnin=20) # 40=1x(60-20) ## Not run: sim.dir <- tempfile() m <- run.tfr.mcmc(iter=10, nr.chains=2, output.dir=sim.dir, thin=5, verbose=TRUE) get.total.iterations(m$mcmc.list) # 20=2x10 get.stored.mcmc.length(m$mcmc.list) # 6=2x3 unlink(sim.dir, recursive=TRUE) ## End(Not run)
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") mcmc.set <- get.tfr.mcmc(sim.dir=sim.dir) get.total.iterations(mcmc.set$mcmc.list) # 60=1chain x 60iters get.total.iterations(mcmc.set$mcmc.list, burnin=20) # 40=1x(60-20) ## Not run: sim.dir <- tempfile() m <- run.tfr.mcmc(iter=10, nr.chains=2, output.dir=sim.dir, thin=5, verbose=TRUE) get.total.iterations(m$mcmc.list) # 20=2x10 get.stored.mcmc.length(m$mcmc.list) # 6=2x3 unlink(sim.dir, recursive=TRUE) ## End(Not run)
Data sets containing codes that determine which countries are to be included into a simulation or/and projections.
data(include_2024) data(include_2022) data(include_2019) data(include_2017) data(include_2015) data(include_2012) data(include_2010)
data(include_2024) data(include_2022) data(include_2019) data(include_2017) data(include_2015) data(include_2012) data(include_2010)
Data frames containing one record per country or region. It has the following variables:
country_code
Numerical Location Code (3-digit codes following ISO 3166-1 numeric standard) - see https://en.wikipedia.org/wiki/ISO_3166-1_numeric.
Entries for which include_code=2
are included in the MCMC estimation of the hyperparameters of the model. Entries for which include_code
is 1 or 2 are included in the prediction. Entries with 0 are excluded from both.
In a simulation, an include_*
dataset is selected that corresponds to the given wpp.year
passed to the function run.tfr.mcmc
. It is merged with a tfr
dataset from the corresponding wpp package using the country_code
column. Thus, the country entries in this dataset should correspond to entries in the tfr
dataset.
The package contains also a dataset called ‘my_tfr_template’ (in ‘extdata’ directory) which is a template for user-specified TFR time series. It has the same structure as the tfr
dataset, except that most of the columns are optional. The only required column is country_code
(see description of the argument my.tfr.file
in run.tfr.mcmc
).
Data provided by the United Nations Population Division.
data(include_2019) head(include_2019)
data(include_2019) head(include_2019)
Runs (or continues running) MCMCs for simulating the total fertility rate of all countries of the world (phase II), using a Bayesian hierarchical model.
run.tfr.mcmc(nr.chains = 3, iter = 62000, output.dir = file.path(getwd(), "bayesTFR.output"), thin = 1, replace.output = FALSE, annual = FALSE, uncertainty = FALSE, start.year = 1950, present.year = 2020, wpp.year = 2019, my.tfr.file = NULL, my.locations.file = NULL, my.tfr.raw.file = NULL, use.wpp.data = TRUE, ar.phase2 = FALSE, buffer.size = 100, raw.outliers = c(-2, 1), U.c.low = 5.5, U.up = 8.8, U.width = 3, mean.eps.tau0 = -0.25, sd.eps.tau0 = 0.4, nu.tau0 = 2, Triangle_c4.low = 1, Triangle_c4.up = 2.5, Triangle_c4.trans.width = 2, Triangle4.0 = 0.3, delta4.0 = 0.8, nu4 = 2, S.low = 3.5, S.up = 6.5, S.width = 0.5, a.low = 0, a.up = 0.2, a.width = 0.02, b.low = a.low, b.up = a.up, b.width = 0.05, sigma0.low = if (annual) 0.0045 else 0.01, sigma0.up = 0.6, sigma0.width = 0.1, sigma0.min = 0.04, const.low = 0.8, const.up = 2, const.width = 0.3, d.low = 0.05, d.up = 0.5, d.trans.width = 1, chi0 = -1.5, psi0 = 0.6, nu.psi0 = 2, alpha0.p = c(-1, 0.5, 1.5), delta0 = 1, nu.delta0 = 2, dl.p1 = 9, dl.p2 = 9, phase3.parameter=NULL, S.ini = NULL, a.ini = NULL, b.ini = NULL, sigma0.ini = NULL, Triangle_c4.ini = NULL, const.ini = NULL, gamma.ini = 1, phase3.starting.values = NULL, proposal_cov_gammas = NULL, iso.unbiased = NULL, covariates = c("source", "method"), cont_covariates = NULL, source.col.name="source", seed = NULL, parallel = FALSE, nr.nodes = nr.chains, save.all.parameters = FALSE, compression.type = 'None', auto.conf = list(max.loops = 5, iter = 62000, iter.incr = 10000, nr.chains = 3, thin = 80, burnin = 2000), verbose = FALSE, verbose.iter = 10, ...) continue.tfr.mcmc(iter, chain.ids = NULL, output.dir = file.path(getwd(), "bayesTFR.output"), parallel = FALSE, nr.nodes = NULL, auto.conf = NULL, verbose = FALSE, verbose.iter = 10, ...)
run.tfr.mcmc(nr.chains = 3, iter = 62000, output.dir = file.path(getwd(), "bayesTFR.output"), thin = 1, replace.output = FALSE, annual = FALSE, uncertainty = FALSE, start.year = 1950, present.year = 2020, wpp.year = 2019, my.tfr.file = NULL, my.locations.file = NULL, my.tfr.raw.file = NULL, use.wpp.data = TRUE, ar.phase2 = FALSE, buffer.size = 100, raw.outliers = c(-2, 1), U.c.low = 5.5, U.up = 8.8, U.width = 3, mean.eps.tau0 = -0.25, sd.eps.tau0 = 0.4, nu.tau0 = 2, Triangle_c4.low = 1, Triangle_c4.up = 2.5, Triangle_c4.trans.width = 2, Triangle4.0 = 0.3, delta4.0 = 0.8, nu4 = 2, S.low = 3.5, S.up = 6.5, S.width = 0.5, a.low = 0, a.up = 0.2, a.width = 0.02, b.low = a.low, b.up = a.up, b.width = 0.05, sigma0.low = if (annual) 0.0045 else 0.01, sigma0.up = 0.6, sigma0.width = 0.1, sigma0.min = 0.04, const.low = 0.8, const.up = 2, const.width = 0.3, d.low = 0.05, d.up = 0.5, d.trans.width = 1, chi0 = -1.5, psi0 = 0.6, nu.psi0 = 2, alpha0.p = c(-1, 0.5, 1.5), delta0 = 1, nu.delta0 = 2, dl.p1 = 9, dl.p2 = 9, phase3.parameter=NULL, S.ini = NULL, a.ini = NULL, b.ini = NULL, sigma0.ini = NULL, Triangle_c4.ini = NULL, const.ini = NULL, gamma.ini = 1, phase3.starting.values = NULL, proposal_cov_gammas = NULL, iso.unbiased = NULL, covariates = c("source", "method"), cont_covariates = NULL, source.col.name="source", seed = NULL, parallel = FALSE, nr.nodes = nr.chains, save.all.parameters = FALSE, compression.type = 'None', auto.conf = list(max.loops = 5, iter = 62000, iter.incr = 10000, nr.chains = 3, thin = 80, burnin = 2000), verbose = FALSE, verbose.iter = 10, ...) continue.tfr.mcmc(iter, chain.ids = NULL, output.dir = file.path(getwd(), "bayesTFR.output"), parallel = FALSE, nr.nodes = NULL, auto.conf = NULL, verbose = FALSE, verbose.iter = 10, ...)
nr.chains |
Number of MCMC chains to run. |
iter |
Number of iterations to run in each chain. In addition to a single value, it can have the value ‘auto’ in which case the function runs for the number of iterations given in the |
output.dir |
Directory which the simulation output should be written into. |
thin |
Thinning interval between consecutive observations to be stored on disk. |
replace.output |
If |
annual |
If |
uncertainty |
Logical. If |
use.wpp.data |
Logical indicating if default WPP data should be used, i.e. if |
ar.phase2 |
Logical where |
start.year |
Start year for using historical data. |
present.year |
End year for using historical data. |
wpp.year |
Year for which WPP data is used. The functions loads a package called wpp |
my.tfr.file |
File name containing user-specified TFR time series for one or more countries. See Details below. |
my.locations.file |
File name containing user-specified locations. See Details below. |
my.tfr.raw.file |
File name of the raw TFR, used when |
buffer.size |
Buffer size (in number of iterations) for keeping data in the memory. The smaller the |
raw.outliers |
Vector of size two giving the maximum annual decrease and increase of raw TFR change, respectively. The default values mean that any raw TFR data that decrease more than 2 or increase more than 1 in one year are considered as outliers. |
U.c.low , U.up
|
Lower and upper bound of the parameter |
U.width |
Width for slice sampling used when updating the |
mean.eps.tau0 , sd.eps.tau0
|
Mean and standard deviation of the prior distribution of |
nu.tau0 |
The shape parameter of the prior Gamma distribution of |
Triangle_c4.low , Triangle_c4.up
|
Lower and upper bound of the |
Triangle_c4.trans.width |
Width for slice sampling used when updating the logit-transformed |
Triangle4.0 , delta4.0
|
Mean and standard deviation of the prior distribution of the |
nu4 |
The shape parameter of the prior Gamma distribution of |
S.low , S.up
|
Lower and upper bound of the uniform prior distribution of the |
S.width |
Width for slice sampling used when updating the |
a.low , a.up
|
Lower and upper bound of the uniform prior distribution of the |
a.width |
Width for slice sampling used when updating the |
b.low , b.up
|
Lower and upper bound of the uniform prior distribution of the |
b.width |
Width for slice sampling used when updating the |
sigma0.low , sigma0.up
|
Lower and upper bound of the uniform prior distribution of the |
sigma0.width |
Width for slice sampling used when updating the |
sigma0.min |
Minimum value that |
const.low , const.up
|
Lower and upper bound of the uniform prior distribution of the |
const.width |
Width for slice sampling used when updating the |
d.low , d.up
|
Lower and upper bound of the parameter |
d.trans.width |
Width for slice sampling used when updating the logit-transformed |
chi0 , psi0
|
Prior mean and standard deviation of the |
nu.psi0 |
The shape parameter of the prior Gamma distribution of |
alpha0.p |
Vector of prior means of the |
delta0 |
Prior standard deviation of the |
nu.delta0 |
The shape parameter of the prior Gamma distribution of |
dl.p1 , dl.p2
|
Values of the parameters |
phase3.parameter |
When |
S.ini |
Initial value for the |
a.ini |
Initial value for the |
b.ini |
Initial value for the |
sigma0.ini |
Initial value for the |
Triangle_c4.ini |
Initial value for the |
const.ini |
Initial value for the |
gamma.ini |
Initial value for the |
phase3.starting.values |
This parameter is used to provide a list of Phase 3 initial values, such as |
proposal_cov_gammas |
Proposal for the gamma covariance matrices for each country. It should be a list with two values: |
iso.unbiased |
Codes of countries for which the vital registration TFR estimates are considered unbiased. Only used if |
covariates , cont_covariates
|
Categorical and continuous features used in estimating bias and standard deviation if |
source.col.name |
If |
seed |
Seed of the random number generator. If |
parallel |
Logical determining if the simulation should run multiple chains in parallel. If it is |
nr.nodes |
Relevant only if |
save.all.parameters |
If |
compression.type |
One of ‘None’, ‘gz’, ‘xz’, ‘bz’, determining type of a compression of the MCMC files. Important: Do not use this option for a long MCMC simulation as this tends to cause very long run times due to slow reading! |
auto.conf |
List containing a configuration for an ‘automatic’ run (see description of argument |
verbose |
Logical switching log messages on and off. |
verbose.iter |
Integer determining how often (in number of iterations) log messages are outputted during the estimation. |
... |
Additional parameters to be passed to the function |
chain.ids |
Array of chain identifiers that should be resumed. If it is |
The function run.tfr.mcmc
creates an object of class bayesTFR.mcmc.meta
and stores it in output.dir
. It launches nr.chains
MCMCs, either sequentially or in parallel. Parameter traces of each chain are stored as (possibly compressed) ASCII files in a subdirectory of output.dir
, called mc
x where x is the identifier of that chain. There is one file per parameter, named after the parameter with the suffix “.txt”, possibly followed by a compression suffix if compression.type
is given. Country-specific parameters () have the suffix
_c
y where y is the country code. In addition to the trace files, each mc
x directory contains the object bayesTFR.mcmc
in binary format. All chain-specific files are written into disk after the first, last and each buffer.size
-th iteration.
Using the function continue.tfr.mcmc
one can continue simulating an existing MCMCs by iter
iterations for either all or selected chains.
The function loads observed data (further denoted as WPP dataset) from the tfr
and tfr_supplemental
datasets in a wpp package where
is the
wpp.year
. It is then merged with the include
dataset that corresponds to the same wpp.year
. The argument my.tfr.file
can be used to overwrite those default data. If use.wpp.data
is FALSE
, it fully replaces the default dataset. Otherwise (by default), such a file can include a subset of countries contained in the WPP dataset, as well as a set of new countries. In the former case,
the function replaces the corresponding country data from the WPP dataset by values in this file. Only columns are replaced that match column names of the WPP dataset, and in addition, columns ‘last.observed’ and ‘include_code’ are used, if present. Countries are merged with WPP using the column ‘country_code’. In addition, in order the countries to be included in the simulation, in both cases (whether they are included in the WPP dataset or not), they must be contained in the table of locations (UNlocations
). In addition, their corresponding include_code
must be set to 2. If the column ‘include_code’ is present in my.tfr.file
, its value overwrites the default include code, unless it is -1.
The default UN table of locations mentioned above can be overwritten/extended by using a file passed as the my.locations.file
argument. Such a file must have the same structure as the UNlocations
dataset. Entries in this file will overwrite corresponding entries in UNlocations
matched by the column ‘country_code’. If there is no such entry in the default dataset, it will be appended. This option of appending new locations is especially useful in cases when my.tfr.file
contains new countries/regions that are not included in UNlocations
. In such a case, one must provide a my.locations.file
with a definition of those countries/regions.
For simulation of the hyperparameters of the Bayesian hierarchical model, all countries are used that are included in the WPP dataset, possibly complemented by the my.tfr.file
, that have include_code
equal to 2. The hyperparameters are used to simulate country-specific parameters, which is done for all countries with include_code
equal 1 or 2. The following values of include_code
in my.tfr.file
are recognized: -1 (do not overwrite the default include code), 0 (ignore), 1 (include in prediction but not estimation), 2 (include in both, estimation and prediction). Thus, the set of countries included in the estimation and prediction can be fully user-specific.
Optionally, my.tfr.file
can contain a column called last.observed
containing the year of the last observation for each country. In such a case, the code would ignore any data after that time point. Furthermore, the function tfr.predict
fills in the missing values using the median of the BHM procedure (stored in tfr_matrix_reconstructed
of the bayesTFR.prediction
object). For last.observed
values that are below a middle year of a time interval (computed as
) the last valid data point is the time interval
, whereas for values larger equal a middle year, the data point in
is valid.
The package contains a dataset called ‘my_tfr_template’ (in ‘extdata’ directory) which is a template for user-specified my.tfr.file
.
The parameter uncertainty
is set to control whether past TFR is considered to be precise (FALSE
), or need to be estimated from the raw data (TRUE
). In the latter case, the raw TFR observations are taken either from the rawTFR
dataset (default) or from a file given by the my.tfr.raw.file
argument. The Bayesian hierarchical model considers the past TFR as unknown, estimates it and stores in output.dir
. Details can be found in Liu and Raftery (2020). The covariates
, cont_covariates
arguments are for listing categorical and continuous features for estimating bias and standard deviation of past TFR observations. If a country is known to have unbiased vital registration (VR) records, one can include it in the iso.unbiased
argument as those countries will estimate their past VR records to have 0 bias and 0.0161 standard deviation. The VR records are identified as having “VR” in the column given by source.col.name
(“source” by default).
If annual=TRUE
, which implies using annual data for training the model, the parameter ar.phase2
will be activated. If ar.phase2
is set to TRUE
, then the model of Phase II will change from to
.
is considered as country-independent and is called
rho_phase2
.
Furthermore, if annual
is TRUE
and my.tfr.file
is given, the data in the file must be on annual basis and no matching with the WPP dataset takes place.
An object of class bayesTFR.mcmc.set
which is a list with two components:
meta |
An object of class |
mcmc.list |
A list of objects of class |
Hana Sevcikova, Leontine Alkema, Peiran Liu
Peiran Liu, Hana Sevcikova, Adrian E. Raftery (2023): Probabilistic Estimation and Projection of the Annual Total Fertility Rate Accounting for Past Uncertainty: A Major Update of the bayesTFR R Package. Journal of Statistical Software, 106(8), 1-36. doi:10.18637/jss.v106.i08.
L. Alkema, A. E. Raftery, P. Gerland, S. J. Clark, F. Pelletier, Buettner, T., Heilig, G.K. (2011). Probabilistic Projections of the Total Fertility Rate for All Countries. Demography, Vol. 48, 815-839. doi:10.1007/s13524-011-0040-5.
P. Liu, and A. E. Raftery (2020). Accounting for Uncertainty About Past Values In Probabilistic Projections of the Total Fertility Rate for All Countries. Annals of Applied Statistics, Vol 14, no. 2, 685-705. doi:10.1214/19-AOAS1294.
get.tfr.mcmc
, summary.bayesTFR.mcmc.set
.
## Not run: sim.dir <- tempfile() m <- run.tfr.mcmc(nr.chains = 1, iter = 5, output.dir = sim.dir, verbose = TRUE) summary(m) m <- continue.tfr.mcmc(iter = 5, verbose = TRUE) summary(m) unlink(sim.dir, recursive = TRUE) ## End(Not run)
## Not run: sim.dir <- tempfile() m <- run.tfr.mcmc(nr.chains = 1, iter = 5, output.dir = sim.dir, verbose = TRUE) summary(m) m <- continue.tfr.mcmc(iter = 5, verbose = TRUE) summary(m) unlink(sim.dir, recursive = TRUE) ## End(Not run)
Run MCMC for extra countries, areas or regions. It uses the posterior distribution of model hyperparameters from an existing simulation to generate country-specific parameters.
run.tfr.mcmc.extra(sim.dir = file.path(getwd(), "bayesTFR.output"), countries = NULL, my.tfr.file = NULL, iter = NULL, thin = 1, thin.extra = 1, burnin = 2000, parallel = FALSE, nr.nodes = NULL, my.locations.file = NULL, uncertainty = FALSE, my.tfr.raw.file = NULL, use.wpp.data = TRUE, iso.unbiased = NULL, covariates = c('source', 'method'), cont_covariates = NULL, source.col.name = "source", average.gammas.cov = TRUE, verbose = FALSE, verbose.iter = 100, ...)
run.tfr.mcmc.extra(sim.dir = file.path(getwd(), "bayesTFR.output"), countries = NULL, my.tfr.file = NULL, iter = NULL, thin = 1, thin.extra = 1, burnin = 2000, parallel = FALSE, nr.nodes = NULL, my.locations.file = NULL, uncertainty = FALSE, my.tfr.raw.file = NULL, use.wpp.data = TRUE, iso.unbiased = NULL, covariates = c('source', 'method'), cont_covariates = NULL, source.col.name = "source", average.gammas.cov = TRUE, verbose = FALSE, verbose.iter = 100, ...)
sim.dir |
Directory with an existing simulation. |
countries |
Vector of country codes. These include codes of areas and regions (see column |
my.tfr.file |
File name containing user-specified TFR time series for countries for which the simulation should run (see Details below). |
iter |
Number of iterations to be used for sampling from the posterior distribution of the hyperparameters. By default, the number of iterations used in the existing simulation is taken. |
thin |
Thinning interval for sampling from the posterior distribution of the hyperparameters. |
thin.extra |
Thinning interval for the MCMC run for extra countries. |
burnin |
Number of iterations discarded before sampling from the posterior distribution of the hyperparameters. It is also used when computing proposal of gamma covariance matrices (see |
parallel |
Logical determining if the simulation should run multiple chains in parallel. |
nr.nodes |
Relevant only if |
my.locations.file |
File name containing user-specified locations. See Details below. |
uncertainty |
Whether past TFR uncertainty is considered. If |
my.tfr.raw.file |
File name of the raw TFR used when uncertainty is |
use.wpp.data |
Logical indicating if default WPP data should be used, i.e. if |
iso.unbiased |
Codes of countries for which the vital registration TFR estimates are considered unbiased. See details in |
covariates , cont_covariates
|
Categorical and continuous features used in estimating bias and standard deviation if |
source.col.name |
If |
average.gammas.cov |
Set this to |
verbose |
Logical switching log messages on and off. |
verbose.iter |
Integer determining how often (in number of iterations) log messages are outputted during the estimation. |
... |
Additional parameters to be passed to the function |
The function can be used to make predictions for countries, areas or regions (further denoted as ‘countries’) that were not included in the MCMC estimation (invoked by run.tfr.mcmc
). It creates MCMC traces for country-specific parameters. The purpose of this function is to have country-specific parameters available in order to be able to generate projections for additional countries or their aggregations, without having to re-run the often time-expensive MCMC simulation.
The set of countries to be considered by this function can be given either by their codes, using the argument countries
, in which case the countries must be included in the UN WPP tfr
dataset. Or, it can be given by a user-specific TFR file, using the argument my.tfr.file
. The countries
argument haas a priority over my.tfr.file
.
In the default case of uncertainty = FALSE
, the function will ignore all countries that were used in the existing MCMC simulation for estimating the hyperparameters. However, countries that already own country-specific parameters (e.g. because they were included in my.tfr.file
passed to run.tfr.mcmc
with include_code = 1
, or from a previous pass of the run.tfr.mcmc.extra
function) get their parameters recomputed. In case of uncertainty = TRUE
, all specified countries, regardless if they were included in the existing world simulation or not, get their parameters recomputed. It is therefore advisable to make a backup copy of the exisiting MCMC simulation, as there is a no easy way to revert the parameters to their original values.
Note that all affected countries should be included in the UNlocations
dataset, but unlike in run.tfr.mcmc
, their include_code
is ignored. As in the case of run.tfr.mcmc
, the default dataset of locations UNlocations
can be overwritten using a file of the same structure as UNlocations
passed via the my.locations.file
argument. This file should be especially used, if TFR is simulated for new locations that are not included in UNlocations
.
An object of class bayesTFR.mcmc.set
.
If there is an existing projection for the directory sim.dir
, use tfr.predict.extra
to obtain projections for the extra countries used in this function.
Hana Sevcikova, Leontine Alkema, Peiran Liu
run.tfr.mcmc
, tfr.predict.extra
## Not run: sim.dir <- tempfile() m <- run.tfr.mcmc(nr.chains = 1, iter = 20, output.dir = sim.dir, verbose = TRUE) m <- run.tfr.mcmc.extra(sim.dir = sim.dir, countries = c(908, 924), burnin = 10, verbose = TRUE) summary(m, country = 924) pred <- tfr.predict(m, burnin = 10, use.tfr3 = FALSE, verbose = TRUE) summary(pred, country = 908) unlink(sim.dir, recursive = TRUE) ## End(Not run)
## Not run: sim.dir <- tempfile() m <- run.tfr.mcmc(nr.chains = 1, iter = 20, output.dir = sim.dir, verbose = TRUE) m <- run.tfr.mcmc.extra(sim.dir = sim.dir, countries = c(908, 924), burnin = 10, verbose = TRUE) summary(m, country = 924) pred <- tfr.predict(m, burnin = 10, use.tfr3 = FALSE, verbose = TRUE) summary(pred, country = 908) unlink(sim.dir, recursive = TRUE) ## End(Not run)
Runs (or continues running) MCMCs for simulating Phase III total fertility rate, using a Bayesian hierarchical version of an AR(1) model.
run.tfr3.mcmc(sim.dir, nr.chains = 3, iter = 50000, thin = 10, replace.output = FALSE, my.tfr.file = NULL, buffer.size = 100, use.extra.countries = FALSE, mu.prior.range = c(0, 2.1), rho.prior.range = c(0, 1 - .Machine$double.xmin), sigma.mu.prior.range = c(1e-05, 0.318), sigma.rho.prior.range = c(1e-05, 0.289), sigma.eps.prior.range = c(1e-05, 0.5), mu.ini = NULL, mu.ini.range = mu.prior.range, rho.ini = NULL, rho.ini.range = rho.prior.range, sigma.mu.ini = NULL, sigma.mu.ini.range = sigma.mu.prior.range, sigma.rho.ini = NULL, sigma.rho.ini.range = sigma.rho.prior.range, sigma.eps.ini = NULL, sigma.eps.ini.range = sigma.eps.prior.range, seed = NULL, parallel = FALSE, nr.nodes = nr.chains, compression.type = "None", auto.conf = list(max.loops = 5, iter = 50000, iter.incr = 20000, nr.chains = 3, thin = 60, burnin = 10000), verbose = FALSE, verbose.iter = 1000, ...) continue.tfr3.mcmc(sim.dir, iter, chain.ids=NULL, parallel = FALSE, nr.nodes = NULL, auto.conf = NULL, verbose=FALSE, verbose.iter = 1000, ...)
run.tfr3.mcmc(sim.dir, nr.chains = 3, iter = 50000, thin = 10, replace.output = FALSE, my.tfr.file = NULL, buffer.size = 100, use.extra.countries = FALSE, mu.prior.range = c(0, 2.1), rho.prior.range = c(0, 1 - .Machine$double.xmin), sigma.mu.prior.range = c(1e-05, 0.318), sigma.rho.prior.range = c(1e-05, 0.289), sigma.eps.prior.range = c(1e-05, 0.5), mu.ini = NULL, mu.ini.range = mu.prior.range, rho.ini = NULL, rho.ini.range = rho.prior.range, sigma.mu.ini = NULL, sigma.mu.ini.range = sigma.mu.prior.range, sigma.rho.ini = NULL, sigma.rho.ini.range = sigma.rho.prior.range, sigma.eps.ini = NULL, sigma.eps.ini.range = sigma.eps.prior.range, seed = NULL, parallel = FALSE, nr.nodes = nr.chains, compression.type = "None", auto.conf = list(max.loops = 5, iter = 50000, iter.incr = 20000, nr.chains = 3, thin = 60, burnin = 10000), verbose = FALSE, verbose.iter = 1000, ...) continue.tfr3.mcmc(sim.dir, iter, chain.ids=NULL, parallel = FALSE, nr.nodes = NULL, auto.conf = NULL, verbose=FALSE, verbose.iter = 1000, ...)
sim.dir |
Directory with an existing simulation of phase II TFR (see |
nr.chains |
Number of MCMC chains to run. |
iter |
Number of iterations to run in each chain. In addition to a single value, it can have the value ‘auto’ in which case the function runs for the number of iterations given in the |
thin |
Thinning interval between consecutive observations to be stored on disk. |
replace.output |
If |
my.tfr.file |
File name containing user-specified TFR time series for one or more countries. See description of this argument in |
buffer.size |
Buffer size (in number of iterations) for keeping data in the memory. |
use.extra.countries |
By default, only countries are used in the MCMCs that were assigned for estimation (i.e. their ‘include_code’ is 2 in the include) dataset and are in phase III at present time (argument |
mu.prior.range , rho.prior.range , sigma.mu.prior.range , sigma.rho.prior.range , sigma.eps.prior.range
|
Min and max for the prior (uniform) distribution of these paraemters. |
mu.ini , rho.ini , sigma.mu.ini , sigma.rho.ini , sigma.eps.ini
|
Initial value(s) of the parameters. It can be a single value or an array of the size |
mu.ini.range , rho.ini.range , sigma.mu.ini.range , sigma.rho.ini.range , sigma.eps.ini.range
|
Min and max for the initial values. |
seed |
Seed of the random number generator. If |
parallel |
Logical determining if the simulation should run multiple chains in parallel. If it is |
nr.nodes |
Relevant only if |
compression.type |
One of ‘None’, ‘gz’, ‘xz’, ‘bz’, determining type of a compression of the MCMC files. Important: Do not use this option for a long MCMC simulation as this tends to cause very long run times due to slow reading! |
auto.conf |
List containing a configuration for an ‘automatic’ run (see description of argument |
verbose |
Logical switching log messages on and off. |
verbose.iter |
Integer determining how often (in number of iterations) messages are outputted during the estimation. |
... |
Additional parameters to be passed to the function |
chain.ids |
Array of chain identifiers that should be resumed. If it is |
The MCMCs are stored in sim.dir
in a subdirectory called “phaseIII”. It has exactly the same structure as phase II MCMCs described in run.tfr.mcmc
.
An object of class bayesTFR.mcmc.set
which is a list with two components:
meta |
An object of class |
mcmc.list |
A list of objects of class |
Hana Sevcikova
Raftery, A.E., Alkema, L. and Gerland, P. (2014). Bayesian Population Projections for the United Nations. Statistical Science, Vol. 29, 58-68. doi:10.1214/13-STS419.
## Not run: sim.dir <- tempfile() # Runs Phase II MCMCs (must be run before Phase III) m <- run.tfr.mcmc(nr.chains=1, iter=5, output.dir=sim.dir, verbose=TRUE) # Runs Phase III MCMCs m3 <- run.tfr3.mcmc(sim.dir=sim.dir, nr.chains=2, iter=50, thin=1, verbose=TRUE) m3 <- continue.tfr3.mcmc(sim.dir=sim.dir, iter=10, verbose=TRUE) summary(m3, burnin=10) unlink(sim.dir, recursive=TRUE) ## End(Not run)
## Not run: sim.dir <- tempfile() # Runs Phase II MCMCs (must be run before Phase III) m <- run.tfr.mcmc(nr.chains=1, iter=5, output.dir=sim.dir, verbose=TRUE) # Runs Phase III MCMCs m3 <- run.tfr3.mcmc(sim.dir=sim.dir, nr.chains=2, iter=50, thin=1, verbose=TRUE) m3 <- continue.tfr3.mcmc(sim.dir=sim.dir, iter=10, verbose=TRUE) summary(m3, burnin=10) unlink(sim.dir, recursive=TRUE) ## End(Not run)
Summary of an object of class bayesTFR.convergence
created using the tfr.diagnose
or tfr3.diagnose
functions. It gives an overview about parameters that did not converge.
## S3 method for class 'bayesTFR.convergence' summary(object, expand = FALSE, ...)
## S3 method for class 'bayesTFR.convergence' summary(object, expand = FALSE, ...)
object |
Object of class |
expand |
By default, the function does not show parameters for each country for which there was no convergence, if the status is ‘red’. This argument can switch that option on. |
... |
Not used. |
Hana Sevcikova
Summary of an object bayesTFR.mcmc.set
or bayesTFR.mcmc
, computed via run.tfr.mcmc
or run.tfr3.mcmc
. It can be obtained either for all countries or for a specific country, and either for all parameters or for specific parameters. The function uses the summary.mcmc
function of the coda package.
## S3 method for class 'bayesTFR.mcmc.set' summary(object, country = NULL, chain.id = NULL, par.names = NULL, par.names.cs = NULL, meta.only = FALSE, thin = 1, burnin = 0, ...) ## S3 method for class 'bayesTFR.mcmc' summary(object, country = NULL, par.names = NULL, par.names.cs = NULL, thin = 1, burnin = 0, ...)
## S3 method for class 'bayesTFR.mcmc.set' summary(object, country = NULL, chain.id = NULL, par.names = NULL, par.names.cs = NULL, meta.only = FALSE, thin = 1, burnin = 0, ...) ## S3 method for class 'bayesTFR.mcmc' summary(object, country = NULL, par.names = NULL, par.names.cs = NULL, thin = 1, burnin = 0, ...)
object |
Object of class |
country |
Country name or code if a country-specific summary is desired. The code can be either numeric or ISO-2 or ISO-3 characters. By default, summary for all countries is generated. |
chain.id |
Identifiers of MCMC chains. By default, all chains are considered. |
par.names |
Country independent parameters to be included in the summary. If the underlying object is an MCMC of phase II, the default names are given by |
par.names.cs |
Country-specific parameters to be included in the summary. If the underlying object is an MCMC of phase II, the default names are given by |
meta.only |
If it is |
thin |
Thinning interval. Only used if larger than the |
burnin |
Number of iterations to be discarded from the beginning of each chain before computing the summary. |
... |
Additional arguments passed to the |
Hana Sevcikova
bayesTFR.mcmc.set
, summary.mcmc
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") m <- get.tfr.mcmc(sim.dir) summary(m, country="CZE", burnin=15)
sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") m <- get.tfr.mcmc(sim.dir) summary(m, country="CZE", burnin=15)
Country-specific summary of an object of class bayesTFR.prediction
, created using the function tfr.predict
. The summary contains the mean, standard deviation and several commonly used quantiles of the simulated trajectories.
## S3 method for class 'bayesTFR.prediction' summary(object, country = NULL, compact = TRUE, ...)
## S3 method for class 'bayesTFR.prediction' summary(object, country = NULL, compact = TRUE, ...)
object |
Object of class |
country |
Country name or code. The code can be either numeric or ISO-2 or ISO-3 characters. If it is |
compact |
Logical switching between a smaller and larger number of displayed quantiles. |
... |
A list of further arguments. |
Hana Sevcikova
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") pred <- tfr.predict(sim.dir=sim.dir, output.dir=file.path(getwd(), "exampleTFRpred"), use.tfr3=FALSE, burnin=15, verbose=TRUE) # If the above function was run previously, do # pred <- get.tfr.prediction(sim.dir=file.path(getwd(), "exampleTFRpred")) summary(pred, country = "Ireland") ## End(Not run)
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") pred <- tfr.predict(sim.dir=sim.dir, output.dir=file.path(getwd(), "exampleTFRpred"), use.tfr3=FALSE, burnin=15, verbose=TRUE) # If the above function was run previously, do # pred <- get.tfr.prediction(sim.dir=file.path(getwd(), "exampleTFRpred")) summary(pred, country = "Ireland") ## End(Not run)
Data set containing the raw TFR estimates for all countries and the data quality covariates.
data("rawTFR")
data("rawTFR")
A data frame with 12709 observations on the following 5 variables.
country_code
Three-digit UN ISO-3166 code for the country of that observation is for.
year
a numeric vector for the year of the observation data.
tfr
TFR value.
method
Estimation method to obtain this value. One of the categorical data quality indicator.
source
Source of the data. One of the categorical data quality indicator.
It is used as the default raw TFR data in a run.tfr.mcmc
simulation. It can be used as a template for a user-defined data which can be provided via the my.tfr.raw.file
argument of run.tfr.mcmc
. The “method” and “source” columns are used as the default data quality covariates.
Data provided by the United Nations Population Division.
data(rawTFR) head(rawTFR)
data(rawTFR) head(rawTFR)
Functions tfr.diagnose
and tfr3.diagnose
run convergence diagnostics of existing TFR MCMCs for phase II and phase III, respectively, using the raftery.diag
function from the coda package. has.mcmc.converged
checks if the existing diagnostics converged.
tfr.diagnose(sim.dir, thin = 80, burnin = 2000, express = FALSE, country.sampling.prop = NULL, keep.thin.mcmc=FALSE, verbose = TRUE) tfr3.diagnose(sim.dir, thin = 60, burnin = 10000, express = TRUE, country.sampling.prop = NULL, verbose = TRUE, ...) has.mcmc.converged(diag)
tfr.diagnose(sim.dir, thin = 80, burnin = 2000, express = FALSE, country.sampling.prop = NULL, keep.thin.mcmc=FALSE, verbose = TRUE) tfr3.diagnose(sim.dir, thin = 60, burnin = 10000, express = TRUE, country.sampling.prop = NULL, verbose = TRUE, ...) has.mcmc.converged(diag)
sim.dir |
Directory with the MCMC simulation results. |
thin |
Thinning interval. |
burnin |
Number of iterations to be discarded from the beginning of the parameter traces. |
express |
Logical. If |
country.sampling.prop |
Proportion of countries that are included in the diagnostics. If it is |
keep.thin.mcmc |
Logical. If |
verbose |
Logical switching log messages on and off. |
diag |
Object of class |
... |
Not used. |
The diagnose functions invoke the tfr.raftery.diag
(or tfr3.raftery.diag
) function separately for country-independent parameters and for country-specific parameters. It results in two possible states: red, i.e. it did not converge, and green, i.e. it converged.
The resulting object from tfr.diagnose
is stored in
‘{sim.dir}/diagnostics/bayesTFR.convergence_{thin}_{burnin}.rda’ and can be accessed using the function get.tfr.convergence
. Function tfr3.diagnose
stores its result into
‘{sim.dir}/phaseIII/diagnostics/bayesTFR.convergence_{thin}_{burnin}.rda’ which can be accessed via get.tfr3.convergence
.
has.mcmc.converged
returns a logical value determining if there is convergence or not.
tfr.diagnose
and tfr3.diagnose
return an object of class bayesTFR.convergence
with components:
result |
Table containing all not-converged parameters. Its columns include ‘Total iterations needed’ and ‘Remaining iterations’. |
lresult.country.independent |
Number of rows in |
country.independent |
Result of |
country.specific |
Result of |
iter.needed |
Number of additional iterations suggested in order to achieve convergence. |
iter.total |
Total number of iterations of the original unthinned set of chains. |
use.nr.traj |
Suggestion for number of trajectories in generating predictions. |
burnin |
Burnin used. |
thin |
Thinning interval used. |
status |
Vector of character strings containing the result status. Possible values: ‘green’, ‘red’. |
mcmc.set |
Object of class |
thin.mcmc |
If |
express |
Value of the input argument |
nr.countries |
Vector with elements |
Hana Sevcikova, Leontine Alkema, Adrian Raftery
tfr.raftery.diag
, raftery.diag
, summary.bayesTFR.convergence
, get.tfr.convergence
, create.thinned.tfr.mcmc
The function computes coverage, i.e. the ratio of observed data fitted within the given probability intervals of the predictive posterior distribution of the double logistic function, as well as the root mean square error and mean absolute error of the simulation.
tfr.dl.coverage(sim.dir, pi = c(80, 90, 95), burnin = 2000, verbose = TRUE)
tfr.dl.coverage(sim.dir, pi = c(80, 90, 95), burnin = 2000, verbose = TRUE)
sim.dir |
Directory with the MCMC simulation results. If a prediction and its corresponding thinned MCMCs are available in the simulation directory, those are taken for assessing the goodness of fit. |
pi |
Probability interval. It can be a single number or an array. |
burnin |
Burnin. Only relevant if |
verbose |
Logical switching log messages on and off. |
List with the following components:
total.coverage |
Vector of the coverage, one element per probability interval. For each |
time.coverage |
Matrix corresponding to the coverage computed per time period. (Rows correspond to probability intervals, columns correspond to time.) It is derived like |
country.coverage |
Matrix corresponding to the coverage computed per country. (Rows correspond to probability intervals, columns correspond to countries.) It is derived like |
total.rmse |
Root mean square error as |
time.rmse |
Like |
country.rmse |
Like |
total.mae |
Mean absolute error as |
time.mae |
Like |
country.mae |
Like |
pred.cdf |
|
n |
0-1 |
To see the fit visually per country, use DLcurve.plot(..., predictive.distr=TRUE,...)
.
Hana Sevcikova
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") tfr <- get.tfr.mcmc(sim.dir) # Note that this simulation is a toy example and thus has not converged. gof <- tfr.dl.coverage(sim.dir) gof$time.coverage DLcurve.plot(tfr, country=608, predictive.distr=TRUE, pi=c(80, 90, 95)) ## End(Not run)
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") tfr <- get.tfr.mcmc(sim.dir) # Note that this simulation is a toy example and thus has not converged. gof <- tfr.dl.coverage(sim.dir) gof$time.coverage DLcurve.plot(tfr, country=608, predictive.distr=TRUE, pi=c(80, 90, 95)) ## End(Not run)
Plot past TFR estimation results from a simulation that accounted for past TFR uncertainty.
tfr.estimation.plot(mcmc.list = NULL, country = NULL, sim.dir = NULL, burnin = 0, thin = 1, pis = c(80, 95), plot.raw = TRUE, grouping = "source", save.image = TRUE, plot.dir = "Estimation.plot", adjust = TRUE, country.code = deprecated(), ISO.code = deprecated())
tfr.estimation.plot(mcmc.list = NULL, country = NULL, sim.dir = NULL, burnin = 0, thin = 1, pis = c(80, 95), plot.raw = TRUE, grouping = "source", save.image = TRUE, plot.dir = "Estimation.plot", adjust = TRUE, country.code = deprecated(), ISO.code = deprecated())
mcmc.list |
Object of class |
country |
Name or numerical code of a country. It can also be given as ISO-2 or ISO-3 characters. |
sim.dir |
Directory with the MCMC simulation results. |
burnin |
Burn-in for getting trajectories and quantiles. A positive burn-in |
thin |
Thin for getting trajectories and quantiles. Thinning level |
pis |
Probability interval. It can be a single number or an array of two numbers. |
plot.raw |
Whether raw data used for the estimation should be plotted. |
grouping |
If raw data is plotted, then grouping should be one of the categorical feature in the data, so that the color and shape of the raw data will differ for different groups. |
save.image |
Logical. Whether the resulting plot will be saved. |
plot.dir |
If |
adjust |
Logical. By default, if the estimation median is adjusted using e.g. |
country.code , ISO.code
|
Deprecated arguments. Use argument |
tfr.estimation.plot
plots posterior distribution of past TFR estimations for a given country. It only works if uncertainty is considered in the MCMC process.
Peiran Liu, Hana Sevcikova
## Not run: sim.dir <- tempfile() mcmc.set <- run.tfr.mcmc(nr.chains = 1, iter = 10, output.dir = sim.dir, replace.output = TRUE, uncertainty = TRUE) tfr.estimation.plot(mcmc.set, "Nigeria", save.image = FALSE) unlink(sim.dir, recursive = TRUE) ## End(Not run)
## Not run: sim.dir <- tempfile() mcmc.set <- run.tfr.mcmc(nr.chains = 1, iter = 10, output.dir = sim.dir, replace.output = TRUE, uncertainty = TRUE) tfr.estimation.plot(mcmc.set, "Nigeria", save.image = FALSE) unlink(sim.dir, recursive = TRUE) ## End(Not run)
Generate world maps of the total fertility rate for given projection period and quantile, using different techniques: tfr.map
and tfr.map.all
use rworldmap, tfr.ggmap
uses ggplot2, and tfr.map.gvis
creates an interactive map via GoogleVis. In addition to TFR, all these functions allow to project country specific Phase II MCMC parameters into the world maps.
tfr.map(pred, quantile = 0.5, year = NULL, par.name = NULL, adjusted = FALSE, projection.index = 1, device = "dev.new", main = NULL, resolution=c("coarse","low","less islands","li","high"), device.args = NULL, data.args = NULL, ...) tfr.ggmap(pred, quantile = 0.5, year = NULL, par.name = NULL, adjusted = FALSE, projection.index = 1, main = NULL, data.args = NULL, viridis.option = "B", nr.cats = 10, same.scale = FALSE, plot = TRUE, file.name = NULL, plot.size = 4, ...) tfr.map.gvis(pred, year = NULL, quantile = 0.5, pi = 80, par.name = NULL, adjusted = FALSE, ...) tfr.map.all(pred, output.dir, output.type = "png", tfr.range = NULL, nr.cats = 50, same.scale = TRUE, quantile = 0.5, file.prefix='TFRwrldmap_', ...) get.tfr.map.parameters(pred, tfr.range = NULL, nr.cats = 50, same.scale = TRUE, quantile = 0.5, ...)
tfr.map(pred, quantile = 0.5, year = NULL, par.name = NULL, adjusted = FALSE, projection.index = 1, device = "dev.new", main = NULL, resolution=c("coarse","low","less islands","li","high"), device.args = NULL, data.args = NULL, ...) tfr.ggmap(pred, quantile = 0.5, year = NULL, par.name = NULL, adjusted = FALSE, projection.index = 1, main = NULL, data.args = NULL, viridis.option = "B", nr.cats = 10, same.scale = FALSE, plot = TRUE, file.name = NULL, plot.size = 4, ...) tfr.map.gvis(pred, year = NULL, quantile = 0.5, pi = 80, par.name = NULL, adjusted = FALSE, ...) tfr.map.all(pred, output.dir, output.type = "png", tfr.range = NULL, nr.cats = 50, same.scale = TRUE, quantile = 0.5, file.prefix='TFRwrldmap_', ...) get.tfr.map.parameters(pred, tfr.range = NULL, nr.cats = 50, same.scale = TRUE, quantile = 0.5, ...)
pred |
Object of class |
quantile |
Quantile for which the map should be generated. It must be equal to one of the values in |
year |
Year to be plotted. It can be a year within a projection period or a year within an estimation period. In the latter case, the observed data are plotted. If not given, |
par.name |
Name of a country-specific parameter to be plotted. If |
adjusted |
Logical indicating if the measure to be plotted is based on adjusted TFRs. |
projection.index |
Index of the projection to be displayed. It is only relevant if |
device |
Device for displaying the map. It is passed to the |
main |
Title for the map. If it is |
resolution |
Map resolution as implemented in |
device.args |
List of
additional arguments to be passed to the |
data.args |
List of additional arguments to be passed to the underlying data retrieving function. |
viridis.option |
Argument |
nr.cats |
Number of color categories. |
same.scale |
Logical controlling if maps for all projection years of this prediction object should be on the same color scale. |
plot |
Logical indicating if a plot should be shown. If |
file.name |
Name of a file to save the plot. If |
plot.size |
Height of the plotting device in inches. The width is automatically set using the aspect ratio of 2.36. Only used if |
output.dir |
Directory into which resulting maps are stored. |
output.type |
Type of the resulting files. It can be “png”, “pdf”, “jpeg”, “bmp”, “tiff”, or “postscript”. |
tfr.range |
Range of the total fertility rate to be displayed. It is of the form |
file.prefix |
Prefix for file names. |
... |
Arguments passed to the |
pi |
Probability interval to be shown when a country is selected in an interactive map. The corresponding quantiles must be available (see argument |
tfr.map
creates a single map for a given projection period and quantile using the rworldmap package. tfr.map.all
generates a sequence of such maps, namely one for each projection period. If the package fields is installed, a color bar legend at the botom of the map is created.
Function get.tfr.map.parameters
can be used in combination with tfr.map
. (Note that get.tfr.map.parameters
is called from inside of tfr.map.all
.) It sets breakpoints for the color scheme using quantiles of a fitted gamma distribution.
Function tfr.ggmap
is similar to tfr.map
, but used the ggplot2 package in combination with the geom_sf
function.
Function tfr.map.gvis
creates an interactive map using the googleVis package and opens it in an internet browser. It also generates a table of TFRs that can be sorted by columns interactively in the browser.
By default, tfr.map
, tfr.ggmap
and tfr.map.gvis
produce maps of TFRs. Alternatively, the functions can be used to plot country-specific Phase II MCMC parameters into a world map. They are given by the argument par.name
. In addition to the MCMC parameters, if par.name='lambda'
, the period of the end of TFR decline (i.e. start of Phase III) is computed for each country and projected into the map. In such a case, for tfr.map
we recommend to adjust the color scale in tfr.map
e.g. using the arguments catMethod='pretty'
and numCats=20
(see mapCountryData
).
get.tfr.map.parameters
returns a list with elements:
pred |
The object of class |
quantile |
Value of the argument |
catMethod |
If the argument |
numCats |
Number of categories. |
coulourPalette |
Subset of the rainbow palette, starting from dark blue and ending at red. |
... |
Additional arguments passed to the function. |
Hana Sevcikova, Patrick Gerland, Adrian Raftery
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") pred <- get.tfr.prediction(sim.dir=sim.dir) # Using ggplot2 tfr.ggmap(pred) tfr.ggmap(pred, year = 2100) # Using rworldmap # Uses heat colors and seven categories by default tfr.map(pred) # Uses more colors with more suitable categorization params <- get.tfr.map.parameters(pred) do.call("tfr.map", params) # Another projection year on the same scale do.call("tfr.map", c(list(year=2043), params)) # Using Google Vizualization tool tfr.map.gvis(pred) ## End(Not run)
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") pred <- get.tfr.prediction(sim.dir=sim.dir) # Using ggplot2 tfr.ggmap(pred) tfr.ggmap(pred, year = 2100) # Using rworldmap # Uses heat colors and seven categories by default tfr.map(pred) # Uses more colors with more suitable categorization params <- get.tfr.map.parameters(pred) do.call("tfr.map", params) # Another projection year on the same scale do.call("tfr.map", c(list(year=2043), params)) # Using Google Vizualization tool tfr.map.gvis(pred) ## End(Not run)
These functions are to be used by expert analysts. They allow to change the projection medians either to specific values, including the WPP values, or shift the medians by a given constant, or by a specific adjusting procedure.
tfr.median.set(sim.dir, country, values, years = NULL, ...) tfr.median.shift(sim.dir, country, reset = FALSE, shift = 0, from = NULL, to = NULL, ...) tfr.median.adjust(sim.dir, countries, factor1 = 2/3, factor2 = 1/3, forceAR1 = FALSE, subdir = "predictions") tfr.median.reset(sim.dir, countries = NULL, ...) tfr.shift.prediction.to.wpp(sim.dir, subdir = "predictions", ...)
tfr.median.set(sim.dir, country, values, years = NULL, ...) tfr.median.shift(sim.dir, country, reset = FALSE, shift = 0, from = NULL, to = NULL, ...) tfr.median.adjust(sim.dir, countries, factor1 = 2/3, factor2 = 1/3, forceAR1 = FALSE, subdir = "predictions") tfr.median.reset(sim.dir, countries = NULL, ...) tfr.shift.prediction.to.wpp(sim.dir, subdir = "predictions", ...)
sim.dir |
Directory containing the prediction object. |
country |
Name or numerical code of a country. |
countries |
Vector of country names or codes. If |
values |
Array of the new median values. |
years |
Numeric vector giving years which |
reset |
Logical. If |
shift |
Constant by which the medians should be offset. It is not used if |
from |
Year from which the offset/reset should start. By default, it starts at the first prediction period. |
to |
Year until which the offset/reset should be done. By default, it is set to the last prediction period. |
factor1 , factor2
|
Adjusting factors for the first and second projection period, respectively (see below). |
forceAR1 |
Logical. If |
subdir |
Subdirectory of |
... |
Additional arguments passed to the underlying adjustment function. For |
The function tfr.median.set
can be used to set the medians of the given country to specific values. Function tfr.median.shift
can be used to offset the medians by a specific constant, or to reset the medians to their original BHM values.
Function tfr.median.adjust
runs the prediction procedure for the given countries with an additional decrement in the model in the first two projection periods. In the first projection period it is computed as factor1*S
where S
is a difference between observed decrement and the expected decrement (by the double logistic function) in the last observed period. In the second projection period, in the formula factor1
is replaced by factor2
. If forceAR1
is set to TRUE
, we recommend to set factor1
and factor2
to 0. The function then calls tfr.median.set
in order to store the new median for each country.
Functiontfr.shift.prediction.to.wpp
shifts the projected medians or means (if stat
is “mean”), so that they correspond to the values found in the tfrprojMed
datasets of the wpp package that either corresponds to the package used for the simulation itself or is given by the wpp.year
argument. If using wpp2022 or higher, the dataset name is automatically adjusted depending if it is an annual or a 5-year simulation. Note that regardless if it is an adjustment of the median or mean, the corresponding offset is always converted to a shift of the median.
Function tfr.median.reset
resets medians of the given countries to the original values. By default it deletes adjustments for all countries.
In all five functions, if a median is modified, the corresponding offset is stored in the prediction object (element median.shift
) and the updated prediction object is written back to disk. All functions in the package that use trajectories and trajectory statistics use the median.shift
values to offset the results correspondingly, i.e. trajectories are shifted the same way as the medians.
All functions return an updated object of class bayesTFR.prediction
.
Hana Sevcikova, Leontine Alkema
tfr.median.set.all
for shifting estimation medians.
These functions are to be used by expert analysts. They allow to change the estimation and projection medians to specific values or to the WPP values.
tfr.median.set.all(sim.dir, country, values, years = NULL, burnin = 0, thin = 1, subdir = "predictions") tfr.median.reset.estimation(sim.dir, countries = NULL) tfr.shift.estimation.to.wpp(sim.dir, ..., verbose = TRUE)
tfr.median.set.all(sim.dir, country, values, years = NULL, burnin = 0, thin = 1, subdir = "predictions") tfr.median.reset.estimation(sim.dir, countries = NULL) tfr.shift.estimation.to.wpp(sim.dir, ..., verbose = TRUE)
sim.dir |
Directory containing the prediction object. |
country |
Name or numerical code of a country. |
countries |
Vector of country names or codes. If |
values |
Array of the new median values. |
years |
Numeric vector giving years which |
burnin |
Burnin to use when computing the estimation median. |
thin |
Thinning interval to use when computing the estimation median. |
subdir |
Subdirectory of |
... |
Can be used to pass |
verbose |
Logical. If |
Expert analysts can use these functions to adjust both prediction and estimation medians. Estimation medians can only be adjusted if the simulation was performed with uncertainty = TRUE
. In such a case years
can include past time periods. By default a union of estimation and projection time periods is considered when matched to values
.
Function tfr.shift.estimation.to.wpp
shifts the median estimation of all countries so that they match the values in the tfr
dataset of the corresponding WPP package. Argument burnin
and thin
should be passed to compute the estimation medians.
Function tfr.median.reset.estimation
resets previous adjustments obtained using tfr.median.set.all
. By default it resets adjustments for all countries.
Output is a list. If there are time periods matched to estimation, an object of class bayesTFR.mcmc.meta
is included in the element meta
. If there are time periods matched to years in prediction, then an object of class bayesTFR.prediction
is included in the element pred
.
Function tfr.shift.estimation.to.wpp
returns the updated mcmc.set
object.
Peiran Liu
tfr.shift.prediction.to.wpp
for shifting prediction medians to WPP values.
Functions for accessing names of the MCMC parameters, either country-independent or country-specific.
tfr.parameter.names(trans = NULL, meta = NULL) tfr.parameter.names.cs(country.code = NULL, trans = NULL, back.trans = TRUE) tfr.parameter.names.extended() tfr.parameter.names.cs.extended(country.code = NULL) tfr3.parameter.names() tfr3.parameter.names.cs(country.code = NULL)
tfr.parameter.names(trans = NULL, meta = NULL) tfr.parameter.names.cs(country.code = NULL, trans = NULL, back.trans = TRUE) tfr.parameter.names.extended() tfr.parameter.names.cs.extended(country.code = NULL) tfr3.parameter.names() tfr3.parameter.names.cs(country.code = NULL)
trans |
It can be |
meta |
It can be |
country.code |
Country code. If it is given, the country-specific parameter names contain the postfix ‘_c |
back.trans |
Logical indicating if back-transformable parameter names (i.e. ‘Triangle_c1’, ..., ‘Triangle_c3’) should be returned. |
tfr.parameter.names
returns names of the country-independent Phase II parameters.tfr.parameter.names.cs
returns names of the country-specific Phase II parameters.tfr.parameter.names.extended
returns names of all country-independent Phase II parameters, including the transformed parameters. Parameters ‘alpha’, ‘delta’, ‘alphat’, and‘deltat’ are in their extended format with the postfix ‘_1’, ‘_2’ and ‘_3’. tfr.parameter.names.cs.extended
returns names of all country-specific Phase II parameters, including the transformed parameters. Parameters ‘gamma’ and‘gammat’ are in their extended format with the postfix ‘_1’, ‘_2’ and ‘_3’.tfr3.parameter.names
returns names of the country-independent Phase III parameters.tfr3.parameter.names.cs
returns names of the country-specific Phase III parameters.
Hana Sevcikova
tfr.parameter.names() tfr.parameter.names.extended() tfr.parameter.names.cs() tfr.parameter.names.cs.extended() tfr3.parameter.names() tfr3.parameter.names.cs()
tfr.parameter.names() tfr.parameter.names.extended() tfr.parameter.names.cs() tfr.parameter.names.cs.extended() tfr3.parameter.names() tfr3.parameter.names.cs()
Functions for plotting density of the posterior distribution of the MCMC parameters.
tfr.pardensity.plot(mcmc.list = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), chain.ids = NULL, par.names = tfr.parameter.names(trans = TRUE), burnin = NULL, dev.ncol=5, low.memory = TRUE, ...) tfr.pardensity.cs.plot(country, mcmc.list=NULL, sim.dir=file.path(getwd(), "bayesTFR.output"), chain.ids=NULL, par.names=tfr.parameter.names.cs(trans=TRUE), burnin=NULL, dev.ncol=3, low.memory=TRUE, ...) tfr3.pardensity.plot(mcmc.list = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), chain.ids = NULL, par.names = tfr3.parameter.names(), burnin = NULL, dev.ncol=3, low.memory = TRUE, ...) tfr3.pardensity.cs.plot(country, mcmc.list=NULL, sim.dir=file.path(getwd(), "bayesTFR.output"), chain.ids=NULL, par.names=tfr3.parameter.names.cs(), burnin=NULL, dev.ncol=2, low.memory=TRUE, ...)
tfr.pardensity.plot(mcmc.list = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), chain.ids = NULL, par.names = tfr.parameter.names(trans = TRUE), burnin = NULL, dev.ncol=5, low.memory = TRUE, ...) tfr.pardensity.cs.plot(country, mcmc.list=NULL, sim.dir=file.path(getwd(), "bayesTFR.output"), chain.ids=NULL, par.names=tfr.parameter.names.cs(trans=TRUE), burnin=NULL, dev.ncol=3, low.memory=TRUE, ...) tfr3.pardensity.plot(mcmc.list = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), chain.ids = NULL, par.names = tfr3.parameter.names(), burnin = NULL, dev.ncol=3, low.memory = TRUE, ...) tfr3.pardensity.cs.plot(country, mcmc.list=NULL, sim.dir=file.path(getwd(), "bayesTFR.output"), chain.ids=NULL, par.names=tfr3.parameter.names.cs(), burnin=NULL, dev.ncol=2, low.memory=TRUE, ...)
country |
Name or code of a country. The code can be either numeric or ISO-2 or ISO-3 characters. |
mcmc.list |
List of |
sim.dir |
Directory with the MCMC simulation results. It is only used if |
chain.ids |
List of MCMC identifiers to be plotted. If it is |
par.names |
Names of parameters for which density should be plotted. By default all (possibly transformed) country-independent parameters are plotted if used within |
burnin |
Number of iterations to be discarded from the beginning of each chain. |
dev.ncol |
Number of column for the graphics device. If the number of parameters is smaller than |
low.memory |
Logical indicating if the processing should run in a low-memory mode. If it is |
... |
Further arguments passed to the |
The functions plot the density of the posterior distribution either for country-independent parameters (tfr.pardensity.plot
for phase II MCMCs and tfr3.pardensity.plot
for phase III MCMCs) or for country-specific parameters (tfr.pardensity.cs.plot
for phase II and tfr3.pardensity.cs.plot
for phase III), one graph per parameter. One can restrict it to specific chains by setting the chain.ids
argument and to specific parameters by setting the par.names
argument.
If mcmc.list
is an object of class bayesTFR.prediction
(which is allowed in tfr.pardensity.plot
and tfr.pardensity.cs.plot
only) and if this object contains thinned traces, they are used instead of the full chains. In such a case, burnin
and chain.ids
cannot be modified - their value is set to the one used when the thinned traces were created, namely when running tfr.predict
. In a situation with long MCMC chains, this approach can significantly speed-up creation of the density plots.
Hana Sevcikova
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") tfr.pardensity.plot(sim.dir=sim.dir) tfr.pardensity.cs.plot(country="Ireland", sim.dir=sim.dir, bw=0.2) ## End(Not run)
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") tfr.pardensity.plot(sim.dir=sim.dir) tfr.pardensity.cs.plot(country="Ireland", sim.dir=sim.dir, bw=0.2) ## End(Not run)
Functions for plotting the MCMC parameter traces.
tfr.partraces.plot(mcmc.list = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), chain.ids = NULL, par.names = tfr.parameter.names(trans = TRUE), nr.points = NULL, dev.ncol=5, low.memory = TRUE, ...) tfr.partraces.cs.plot(country, mcmc.list = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), chain.ids = NULL, par.names = tfr.parameter.names.cs(trans = TRUE), nr.points = NULL, dev.ncol=3, low.memory = TRUE, ...) tfr3.partraces.plot(mcmc.list = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), chain.ids = NULL, par.names = tfr3.parameter.names(), nr.points = NULL, dev.ncol=3, low.memory = TRUE, ...) tfr3.partraces.cs.plot(country, mcmc.list = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), chain.ids = NULL, par.names = tfr3.parameter.names.cs(), nr.points = NULL, dev.ncol=2, low.memory = TRUE, ...)
tfr.partraces.plot(mcmc.list = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), chain.ids = NULL, par.names = tfr.parameter.names(trans = TRUE), nr.points = NULL, dev.ncol=5, low.memory = TRUE, ...) tfr.partraces.cs.plot(country, mcmc.list = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), chain.ids = NULL, par.names = tfr.parameter.names.cs(trans = TRUE), nr.points = NULL, dev.ncol=3, low.memory = TRUE, ...) tfr3.partraces.plot(mcmc.list = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), chain.ids = NULL, par.names = tfr3.parameter.names(), nr.points = NULL, dev.ncol=3, low.memory = TRUE, ...) tfr3.partraces.cs.plot(country, mcmc.list = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), chain.ids = NULL, par.names = tfr3.parameter.names.cs(), nr.points = NULL, dev.ncol=2, low.memory = TRUE, ...)
country |
Name or numerical code of a country. The code can be either numeric or ISO-2 or ISO-3 characters. |
mcmc.list |
List of |
sim.dir |
Directory with the MCMC simulation results. It is only used if |
chain.ids |
List of MCMC identifiers to be plotted. If it is |
par.names |
Names of parameters for which traces should be plotted. By default all (possibly transformed) country-independent parameters are plotted if used within |
nr.points |
Number of points to be plotted. If |
dev.ncol |
Number of column for the graphics device. If the number of parameters is smaller than |
low.memory |
Logical indicating if the processing should run in a low-memory mode. If it is |
... |
Additional graphical arguments. |
The functions plot MCMC traces either for country-independent parameters (tfr.partraces.plot
for phase II MCMCs and tfr3.partraces.plot
for phase III MCMCs) or for country-specific parameters (tfr.partraces.cs.plot
for phase II MCMCs and tfr3.partraces.cs.plot
for phase III MCMCs), one graph per parameter. One can restrict it to specific chains by setting the chain.ids
argument, and to specific parameters by setting the par.names
argument.
Hana Sevcikova
coda.list.mcmc
for retrieving raw values of the traces.
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") tfr.partraces.plot(sim.dir=sim.dir) tfr.partraces.cs.plot(country="Netherlands", sim.dir=sim.dir) ## End(Not run)
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") tfr.partraces.plot(sim.dir=sim.dir) tfr.partraces.cs.plot(country="Netherlands", sim.dir=sim.dir) ## End(Not run)
Using the posterior parameter samples simulated by run.tfr.mcmc
(and possibly run.tfr3.mcmc
) the function generates posterior trajectories for the total fertility rate for all countries of the world.
tfr.predict(mcmc.set = NULL, end.year = 2100, sim.dir = file.path(getwd(), "bayesTFR.output"), replace.output = FALSE, start.year = NULL, nr.traj = NULL, thin = NULL, burnin = 2000, use.diagnostics = FALSE, use.tfr3 = TRUE, burnin3 = 2000, mu = 2.1, rho = 0.8859, sigmaAR1 = 0.1016, min.tfr = 0.5, use.correlation = FALSE, save.as.ascii = 0, output.dir = NULL, subdir = "predictions", low.memory = TRUE, seed = NULL, verbose = TRUE, uncertainty = FALSE, ...)
tfr.predict(mcmc.set = NULL, end.year = 2100, sim.dir = file.path(getwd(), "bayesTFR.output"), replace.output = FALSE, start.year = NULL, nr.traj = NULL, thin = NULL, burnin = 2000, use.diagnostics = FALSE, use.tfr3 = TRUE, burnin3 = 2000, mu = 2.1, rho = 0.8859, sigmaAR1 = 0.1016, min.tfr = 0.5, use.correlation = FALSE, save.as.ascii = 0, output.dir = NULL, subdir = "predictions", low.memory = TRUE, seed = NULL, verbose = TRUE, uncertainty = FALSE, ...)
mcmc.set |
Object of class |
end.year |
End year of the prediction. |
sim.dir |
Directory with the MCMC simulation results. It should equal to the |
replace.output |
Logical. If |
start.year |
Start year of the prediction. By default the prediction is started at the next time period after |
nr.traj |
Number of trajectories to be generated. If |
thin |
Thinning interval used for determining the number of trajectories. Only relevant, if |
burnin |
Number of iterations to be discarded from the beginning of the parameter traces. |
use.diagnostics |
Logical determining if an existing convergence diagnostics for phase II MCMCs should be used for choosing the values of |
use.tfr3 |
Logical determining if phase III should be predicted via MCMCs (simulated via |
burnin3 |
Burnin used for phase III MCMCs. Only relevant if |
save.as.ascii |
Either a number determining how many trajectories should be converted into an ASCII file, or “all” in which case all trajectories are converted. It should be set to 0, if no conversion is desired. |
output.dir |
Directory into which the resulting prediction object and the trajectories are stored. If it is |
subdir |
Subdirectory of |
low.memory |
Logical indicating if the prediction should run in a low-memory mode. If it is |
mu |
Long-term mean |
rho |
Autoregressive parameter |
sigmaAR1 |
Standard deviation |
min.tfr |
Smallest TFR value allowed. |
use.correlation |
Logical. If |
seed |
Seed of the random number generator. If |
verbose |
Logical switching log messages on and off. |
uncertainty |
Logical. If the MCMC steps has considered uncertainty of past TFR and |
... |
Further arguments passed to the underlying functions. |
The trajectories are generated using a distribution of country-specific decline curves (Alkema et al 2011) and either a classic AR(1) process or a country-specific AR(1) process (Raftery et al 2013). Phase II parameter samples simulated using run.tfr.mcmc
are used from all chains, from which the given burnin was discarded. They are evenly thinned to match nr.traj
or using the thin
argument. Such thinned parameter traces, collapsed into one chain, if they do not already exist, are stored on disk into the sub-directory ‘{thinned_mcmc_t_b’ where t is the value of thin
and b the value of burnin
(see create.thinned.tfr.mcmc
).
If Phase III is projected using a BHM (i.e. if use.tfr3
is TRUE
), parameter samples simulated via run.tfr3.mcmc
are used from which burnin (given by burnin3
) is discarded and the chains are evenly thinned in a way that the total size corresponds to the final size of the Phase II parameter samples. Countries for which there are no simulated country-specific Phase III parameters (e.g. because their TFR is still in Phase II or it is an aggregated region) use samples of the “world” AR(1) parameters.
The projection is run for all missing values before the present year, if any. Medians over the trajectories are used as imputed values and the trajectories are discarded. The process then continues by projecting the future values where all generated trajectories are kept.
The resulting prediction object is saved into ‘{output.dir}/{subdir}’. Trajectories for all countries are saved into the same directory in a binary format, one file per country. At the end of the projection, if save.as.ascii
is larger than 0, the function converts the given number of trajectories into a CSV file of a UN-specific format. They are selected by equal spacing (see function convert.tfr.trajectories
for more details on the conversion). In addition, two summary files are created: one in a user-friendly format, the other using a UN-specific coding of the variants and time (see write.projection.summary
for more details).
Object of class bayesTFR.prediction
which is a list containing components:
quantiles |
A |
traj.mean.sd |
A |
nr.traj |
Number of trajectories. |
trf_matrix_reconstructed |
Matrix containing imputed TFR values on spots where the original TFR matrix has missing values, i.e. between the last observed data point and the present year. |
output.directory |
Directory where trajectories corresponding to this prediction are stored. |
nr.projections |
Number of projections. |
burnin , thin , burnin3 , thin3
|
Burnin and thin used for this prediction for Phase II and Phase III, respectively. |
end.year |
The end year of this prediction. |
mu , rho , sigma_t , sigmaAR1
|
Parameters of the AR(1) process. |
min.tfr |
Input value of minimum threshold for TFR. |
na.index |
Index of trajectories for which at least one country got |
mcmc.set |
Object of class |
Hana Sevcikova, Leontine Alkema, Peiran Liu, Bailey Fosdick
Peiran Liu, Hana Sevcikova, Adrian E. Raftery (2023): Probabilistic Estimation and Projection of the Annual Total Fertility Rate Accounting for Past Uncertainty: A Major Update of the bayesTFR R Package. Journal of Statistical Software, 106(8), 1-36. doi:10.18637/jss.v106.i08.
L. Alkema, A. E. Raftery, P. Gerland, S. J. Clark, F. Pelletier, Buettner, T., Heilig, G.K. (2011). Probabilistic Projections of the Total Fertility Rate for All Countries. Demography, Vol. 48, 815-839. doi:10.1007/s13524-011-0040-5.
Raftery, A.E., Alkema, L. and Gerland, P. (2014). Bayesian Population Projections for the United Nations. Statistical Science, Vol. 29, 58-68. doi:10.1214/13-STS419.
Fosdick, B., Raftery, A.E. (2014). Regional Probabilistic Fertility Forecasting by Modeling Between-Country Correlations. Demographic Research, Vol. 30, 1011-1034. doi:10.4054/demres.2014.30.35
run.tfr.mcmc
, run.tfr3.mcmc
, create.thinned.tfr.mcmc
, convert.tfr.trajectories
, write.projection.summary
,
get.tfr.prediction
, summary.bayesTFR.prediction
## Not run: sim.dir <- tempfile() m <- run.tfr.mcmc(nr.chains=1, iter=10, output.dir=sim.dir, verbose=TRUE) m3 <- run.tfr3.mcmc(sim.dir=sim.dir, nr.chains=2, iter=40, thin=1, verbose=TRUE) pred <- tfr.predict(m, burnin=0, burnin3=10, verbose=TRUE) summary(pred, country="Iceland") unlink(sim.dir, recursive=TRUE) ## End(Not run)
## Not run: sim.dir <- tempfile() m <- run.tfr.mcmc(nr.chains=1, iter=10, output.dir=sim.dir, verbose=TRUE) m3 <- run.tfr3.mcmc(sim.dir=sim.dir, nr.chains=2, iter=40, thin=1, verbose=TRUE) pred <- tfr.predict(m, burnin=0, burnin3=10, verbose=TRUE) summary(pred, country="Iceland") unlink(sim.dir, recursive=TRUE) ## End(Not run)
Using the posterior parameter samples the function generates posterior trajectories of the total fertility rate for given countries or regions. It is intended to be used after running run.tfr.mcmc.extra
, but it can be also used for purposes of testing specific settings on one or a few countries.
tfr.predict.extra(sim.dir = file.path(getwd(), 'bayesTFR.output'), prediction.dir = sim.dir, subdir = "predictions", countries = NULL, save.as.ascii = 0, verbose = TRUE, uncertainty=FALSE, all.countries.required = TRUE, use.correlation = NULL)
tfr.predict.extra(sim.dir = file.path(getwd(), 'bayesTFR.output'), prediction.dir = sim.dir, subdir = "predictions", countries = NULL, save.as.ascii = 0, verbose = TRUE, uncertainty=FALSE, all.countries.required = TRUE, use.correlation = NULL)
sim.dir |
Directory with the MCMC simulation results. |
prediction.dir |
Directory where the prediction object and the trajectories are stored. |
subdir |
Subdirectory of |
countries |
Vector of country codes for which the prediction should be made. If it is |
save.as.ascii |
Either a number determining how many trajectories should be converted into an ascii file, or “all” in which case all trajectories are converted. It should be set to 0, if no conversion is desired. Note that the conversion is done on all countries. |
verbose |
Logical switching log messages on and off. |
uncertainty |
Logical. If the MCMC steps considered uncertainty of past TFR and |
all.countries.required |
If |
use.correlation |
If missing and if the number of countries is larger than one, it takes the same value as was used in the main simulation. For one country the default is |
In order to use this function, a prediction object must exist, i.e. the function tfr.predict
must have been processed prior to using this function.
Trajectories for given countries or regions are generated and stored in binary format along with other countries (in prediction_dir
). The existing prediction object is updated and stored in the same directory. If save.as.ascii
is larger than zero, trajectories of ALL countries are converted to an ascii format.
Updated object of class bayesTFR.prediction
.
Hana Sevcikova
Generates posterior trajectories of the total fertility rate for subregions of given countries, using the Scale-AR(1) method.
tfr.predict.subnat(countries, my.tfr.file, sim.dir = file.path(getwd(), "bayesTFR.output"), end.year = 2100, start.year = NULL, subdir = "predictions", output.dir = NULL, annual = NULL, nr.traj = NULL, seed = NULL, min.tfr = 0.5, ar.pars = NULL, save.as.ascii = 0, verbose = TRUE)
tfr.predict.subnat(countries, my.tfr.file, sim.dir = file.path(getwd(), "bayesTFR.output"), end.year = 2100, start.year = NULL, subdir = "predictions", output.dir = NULL, annual = NULL, nr.traj = NULL, seed = NULL, min.tfr = 0.5, ar.pars = NULL, save.as.ascii = 0, verbose = TRUE)
countries |
Vector of numerical country codes or country names. |
my.tfr.file |
Tab-separated ASCII file containing the subnational TFR data. See Details for more information on its format. |
sim.dir |
Simulation directory with the national projections generated using |
end.year |
End year of the projections. |
start.year |
Start year of the projections. By default, projections start at the same time point as the national projections. |
subdir |
Subdirectory of |
output.dir |
Directory into which the resulting prediction objects and the trajectories are stored. See below for details. |
annual |
Logical indicating if the subnational projection should be on an annual scale or a 5-year scale. By default,
the scale is matched to the national simulation. If the subnational and national scales are not the same, the national trajectories are either interpolated (if |
nr.traj |
Number of trajectories to be generated. If |
seed |
Seed of the random number generator. If NULL no seed is set. It can be used to generate reproducible projections. |
min.tfr |
Lower bound on TFR. |
ar.pars |
Named vector containing the parameter estimates of the AR(1) process. If given, it must have elements called |
save.as.ascii |
Either a number determining how many trajectories should be converted into an ASCII file, or “all” in which case all trajectories are converted. By default no conversion is performed. |
verbose |
Logical switching log messages on and off. |
The function implements the methodology described in Sevcikova et al (2017). Given a set of national bayesTFR projections, it applies the Scale-AR(1) model to each national trajectory and each subregion of given countries which yield subnational TFR projections.
The file on subnational data passed in my.tfr.file
has to have a column “country_code” with numerical values corresponding to countries given in the argument countries
, and column “reg_code” giving the numerical identifier of each subregion. Column “name” should be used for subregion name, and column “country” for country name. An optional column “include_code” can be used to eliminate entries from processing. Entries with values of 1 or 2 will be included, all others will be ignored. Column “last.observed” can be used to define which time period contains the last observed data point (given as integer, e.g. year in the middle of the time period). Remaining columns define the time periods, e.g. “2000-2005”, “2005-2010” for a 5-year simulation, or “2020”, “2021” for an annual simulation. The package contains an example of such dataset, see Example below.
The default AR(1) parameters were designed for a 5-year simulation, see Sevcikova et al (2017) for more detail. These are . We use the following conversion for the autoregressive parameter
and the standard deviation
if an annual AR(1) process is desired:
. The long-term mean
stays the same for both processes. Thus, the annual parameters are
c(mu = 1, rho = 0.98445, sigma = 0.02086)
. Note that if the ar.pars
argument is specified by the user, it is assumed that the parameters have been scaled appropriately and thus, no conversion takes place.
Argument output.dir
gives a location on disk where results of the function should be stored. If it is NULL
(default), results are stored in the same directory as the national projections. In both cases a subdirectory called “subnat” is created in which each country has its own subfolder with the country code in its name. Each such subfolder contains the same type of outputs as in the national case generated using tfr.predict
, most importantly a directory “predictions” with trajectories for each region.
A list of objects of class bayesTFR.prediction
. The name of each element includes its country code. Not all elements of the class bayesTFR.prediction
are available. For example, no mcmc.set
is attached to these objects. Thus, not all functions that work with bayesTFR.prediction
can be applied to these results.
Even though the resulting object contains subnational results, the names of its elements are the same as in the national case. This allows to apply the same functions on both objects (subnational and national). However, it means that sometimes the meaning of the elements or function arguments does not match the subnational context. For example, various functions expect the argument country
. When a subnational object is passed to such function, country
means a subregion.
Hana Sevcikova
Hana Sevcikova, Adrian E. Raftery, Patrick Gerland (2018). Probabilistic Projection of Subnational Total Fertility Rates. Demographic Research, Vol. 38(60), 1843-1884. doi:10.4054/DemRes.2018.38.60.
get.regtfr.prediction
, tfr.predict
# View the example data my.subtfr.file <- file.path(find.package("bayesTFR"), 'extdata', 'subnational_tfr_template.txt') subtfr <- read.delim(my.subtfr.file, check.names=FALSE) head(subtfr) # Directory with national projections (contains 30 trajectories for each country) nat.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") # Subnational projections for Australia and Canada subnat.dir <- tempfile() preds <- tfr.predict.subnat(c(36, 124), my.tfr.file=my.subtfr.file, sim.dir=nat.dir, output.dir=subnat.dir, start.year=2013) names(preds) get.countries.table(preds[["36"]]) summary(preds[["36"]], "Queensland") tfr.trajectories.plot(preds[["36"]], "Queensland") # plot subnational and national TFR in one plot nat.pred <- get.tfr.prediction(nat.dir) tfr.trajectories.plot(preds[["36"]], 186, pi=80, half.child.variant=FALSE) tfr.trajectories.plot(nat.pred, "Australia", half.child.variant=FALSE, add=TRUE, col=rep("darkgreen", 5), nr.traj=0, show.legend=FALSE) legend("topright", c("regional TFR", "national TFR"), col=c("red", "darkgreen"), lty=1, bty='n') # Retrieve trajectories trajs.Alberta <- get.tfr.trajectories(preds[["124"]], "Alberta") summary(t(trajs.Alberta)) # cleanup unlink(subnat.dir) # See more examples in ?get.regtfr.prediction
# View the example data my.subtfr.file <- file.path(find.package("bayesTFR"), 'extdata', 'subnational_tfr_template.txt') subtfr <- read.delim(my.subtfr.file, check.names=FALSE) head(subtfr) # Directory with national projections (contains 30 trajectories for each country) nat.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") # Subnational projections for Australia and Canada subnat.dir <- tempfile() preds <- tfr.predict.subnat(c(36, 124), my.tfr.file=my.subtfr.file, sim.dir=nat.dir, output.dir=subnat.dir, start.year=2013) names(preds) get.countries.table(preds[["36"]]) summary(preds[["36"]], "Queensland") tfr.trajectories.plot(preds[["36"]], "Queensland") # plot subnational and national TFR in one plot nat.pred <- get.tfr.prediction(nat.dir) tfr.trajectories.plot(preds[["36"]], 186, pi=80, half.child.variant=FALSE) tfr.trajectories.plot(nat.pred, "Australia", half.child.variant=FALSE, add=TRUE, col=rep("darkgreen", 5), nr.traj=0, show.legend=FALSE) legend("topright", c("regional TFR", "national TFR"), col=c("red", "darkgreen"), lty=1, bty='n') # Retrieve trajectories trajs.Alberta <- get.tfr.trajectories(preds[["124"]], "Alberta") summary(t(trajs.Alberta)) # cleanup unlink(subnat.dir) # See more examples in ?get.regtfr.prediction
The functions compute the Raftery diagnostics for each parameter of MCMCs of phase II (tfr.raftery.diag
) and phase III (tfr3.raftery.diag
), taking median over all chains.
tfr.raftery.diag(mcmc = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), burnin = 0, country = NULL, par.names = NA, par.names.cs = NA, country.sampling.prop = 1, verbose=TRUE, ...) tfr3.raftery.diag(mcmc = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), burnin = 0, country = NULL, par.names = NA, par.names.cs = NA, country.sampling.prop = 1, verbose=TRUE, ...)
tfr.raftery.diag(mcmc = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), burnin = 0, country = NULL, par.names = NA, par.names.cs = NA, country.sampling.prop = 1, verbose=TRUE, ...) tfr3.raftery.diag(mcmc = NULL, sim.dir = file.path(getwd(), "bayesTFR.output"), burnin = 0, country = NULL, par.names = NA, par.names.cs = NA, country.sampling.prop = 1, verbose=TRUE, ...)
mcmc |
A |
sim.dir |
Directory with the MCMC simulation results. Only used if |
burnin |
Burnin. |
country |
Name or code of a country. If it is given, country-specific parameters are reduced to parameters of that country. |
par.names |
Names of country-independent parameters for which the Raftery diagnostics should be computed. By default all parameters are used.
If it is |
par.names.cs |
Names of country-specific parameters for which the Raftery diagnostics should be computed. By default all parameters are used.
If it is |
country.sampling.prop |
Proportion of countries that are included in the diagnostics. It should be between 0 and 1. If it is smaller than 1, the countries are randomly sampled. It is only relevant if |
verbose |
Logical switching log messages on and off. |
... |
Additional arguments passed to the |
The Raftery diagnostics is computed for each parameter, using coda's raftery.diag
with r=0.0125, q=0.025 and q=0.975. Values of and burnin are taken as the median over chains. For each country-specific parameter, the maximum over all included countries of such medians is taken.
List with the components:
Nmedian |
2-d array of |
burnin |
2-d array of the same structure as |
not.converged.parameters |
List with two elements, each of which is a data frame containing columns “parameter.name”, “chain.id”, and “N”. These are parameters for which the computed value of Raftery diagnostics |
not.converged.inchain.parameters |
List of the same structure as |
N.country.indep |
Data frame containing columns “parameter.name”, “chain.id”, “N0.025”, and “N0.975”. Each row gives |
N.country.spec |
The same as |
Nmedian.country.spec |
2-d array of |
thin.ind |
List with elements '0.025', '0.975' and |
nr.countries |
Vector with elements |
Hana Sevcikova, Adrian Raftery
The functions plot/tabulate the posterior distribution of TFR trajectories for a given country, or for all countries, including their median and given probability intervals.
tfr.trajectories.plot(tfr.pred, country, pi = c(80, 95), half.child.variant = TRUE, nr.traj = NULL, adjusted.only = TRUE, typical.trajectory = FALSE, mark.estimation.points = FALSE, traj.index = NULL, show.mean = FALSE, show.median = TRUE, xlim = NULL, ylim = NULL, type = 'b', xlab = 'Year', ylab = 'TFR', main = NULL, lwd = c(2, 2, 2, 2, 2, 1), col=c('black', 'green', 'red', 'red', 'blue', '#00000020'), show.legend = TRUE, add = FALSE, uncertainty = FALSE, col_unc = "purple", ...) tfr.trajectories.plot.all(tfr.pred, output.dir = file.path(getwd(), 'TFRtrajectories'), output.type = "png", main = NULL, verbose = FALSE, ...) tfr.trajectories.table(tfr.pred, country, pi = c(80, 95), half.child.variant = TRUE, adjusted = TRUE)
tfr.trajectories.plot(tfr.pred, country, pi = c(80, 95), half.child.variant = TRUE, nr.traj = NULL, adjusted.only = TRUE, typical.trajectory = FALSE, mark.estimation.points = FALSE, traj.index = NULL, show.mean = FALSE, show.median = TRUE, xlim = NULL, ylim = NULL, type = 'b', xlab = 'Year', ylab = 'TFR', main = NULL, lwd = c(2, 2, 2, 2, 2, 1), col=c('black', 'green', 'red', 'red', 'blue', '#00000020'), show.legend = TRUE, add = FALSE, uncertainty = FALSE, col_unc = "purple", ...) tfr.trajectories.plot.all(tfr.pred, output.dir = file.path(getwd(), 'TFRtrajectories'), output.type = "png", main = NULL, verbose = FALSE, ...) tfr.trajectories.table(tfr.pred, country, pi = c(80, 95), half.child.variant = TRUE, adjusted = TRUE)
tfr.pred |
Object of class |
country |
Name or numerical code of a country. It can also be given as ISO-2 or ISO-3 characters. |
pi |
Probability interval. It can be a single number or an array. |
half.child.variant |
If |
nr.traj |
Number of trajectories to be plotted. If |
adjusted.only |
Logical. By default, if the projection or estimation median is adjusted using e.g. |
typical.trajectory |
Logical. If |
mark.estimation.points |
Logical. If |
traj.index |
Vector of trajectory indices to show. If not given, the trajectories are selected using equidistant spacing. |
show.mean , show.median
|
Logical indicating if the mean or/and the median of the distribution should be shown. |
xlim , ylim , type , xlab , ylab
|
Graphical parameters passed to the |
lwd , col
|
Vector of six elements giving the line width and color for: 1. observed data, 2. imputed missing data, 3. median, 4. quantiles, 5. half-child variant, 6. trajectories. |
show.legend |
Logical controlling whether the legend should be drawn. |
add |
Logical controlling whether the trajectories should be plotted into a new graphic device ( |
... |
Additional graphical parameters. In addition, for |
output.dir |
Directory into which resulting graphs are stored. |
output.type |
Type of the resulting files. It can be “png”, “pdf”, “jpeg”, “bmp”, “tiff”, or “postscript”. |
main |
Main title for the plot(s). In |
verbose |
Logical switching log messages on and off. |
uncertainty |
Logical: |
col_unc |
Color of past TFR estimation uncertainty plot. |
adjusted |
Logical. If |
tfr.trajectories.plot
plots posterior distribution of TFR trajectories for a given country. tfr.trajectories.table
gives the same output as a table. tfr.trajectories.plot.all
creates a set of graphs (one per country) that are stored in output.dir
.
The median and given probability intervals are computed using all available trajectories. Thus, nr.traj
does not influence those values - it is used only to control the number of trajectories plotted.
Hana Sevcikova, Leontine Alkema, Peiran Liu
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") pred <- get.tfr.prediction(sim.dir) tfr.trajectories.plot(pred, country="Burkina Faso", pi=c(80, 95)) tfr.trajectories.table(pred, country="Burkina Faso", pi=c(80, 95)) ## End(Not run)
## Not run: sim.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output") pred <- get.tfr.prediction(sim.dir) tfr.trajectories.plot(pred, country="Burkina Faso", pi=c(80, 95)) tfr.trajectories.table(pred, country="Burkina Faso", pi=c(80, 95)) ## End(Not run)
Dataset used by the UN for coding time. It is an TAB-separated ASCII file called “UN_time.txt”.
data(UN_time)
data(UN_time)
A data frame with 1034 observations on the following 4 variables.
TimeID
Time identifier.
TLabel
Label of the time, with minimum values of 1950
and 1950-1955
, and maximum values of 2399
, 2400
and 2400-2405
.
TDate
Equal to TLabel
if it is a single year, or the starting year of TLabel
, if it is an interval.
Tinterval
Length of the time interval, or zero, if it is a single year.
For 5-year period data, fertility rates are defined from 1 July of year (t) to 1 July of year (t+5), with 1 January of year (t+3) as exact mid-date. This means for example that data for 2000-2005, refer to the period between 2000.5 and 2005.5, with 2003.0 as exact mid-point.
Data provided by the United Nations Population Division
data(UN_time) str(UN_time)
data(UN_time) str(UN_time)
Dataset used by the UN for coding variants. It also includes variants for the lower and upper bounds of the 80 and 95% probability intervals, respectively, resulting from the Bayesian hierarchical model. The dataset is stored in a TAB-separated ASCII file called “UN_variants.txt”.
data(UN_variants)
data(UN_variants)
A data frame with 23 observations on the following 5 variables.
RevID
Revision identifier.
VarID
Identifier of the variant.
Vshort
Short name of the variant.
VName
Full name of the variant.
VariantDomain
Domain of the variant.
Data provided by the United Nations Population Division
data(UN_variants) str(UN_variants)
data(UN_variants) str(UN_variants)
The function creates two files containing projection summaries, such as the median, mean, the lower and upper bound of the 80 and 90% probability intervals, respectively, the +/- 0.5 child variant and the constant variant. One file is in a user-friendly format, whereas the other is in a UN-specific format with internal coding of the time and the variants. In addition, a file containing some of the model parameters is created.
write.projection.summary(dir = file.path(getwd(), "bayesTFR.output"), subdir = "predictions", output.dir = NULL, revision = NULL, adjusted = FALSE, est.uncertainty = FALSE, ...)
write.projection.summary(dir = file.path(getwd(), "bayesTFR.output"), subdir = "predictions", output.dir = NULL, revision = NULL, adjusted = FALSE, est.uncertainty = FALSE, ...)
dir |
Directory containing the prediction object. It should correspond to the |
subdir |
Subdirectory of |
output.dir |
Directory in which the resulting file will be stored. If |
revision |
UN WPP revision number. If |
adjusted |
Logical. By default the function writes summary using the original BHM projections. If the projection medians are adjusted (using e.g. |
est.uncertainty |
Logical. If |
... |
Additional arguments passed to the underlying functions. Here, argument |
The first file that the function creates is called ‘projection_summary_user_friendly.csv’ (or ‘projection_summary_user_friendly_adjusted.csv’ if adjusted=TRUE
), it is a comma-separated table with the following columns:
country name
country code
name of the variant, such as “median”, “lower 80”, “upper 80”, “lower 95”, “upper 95”, “mean”, “-0.5child”, “+0.5child”, “constant”
e.g. “2005-2010”: TFR for the first time period. If est.uncertainty
is TRUE
, the first time period is the first observed time period. Otherwise it is the last observed one.
e.g. “2010-2015”: TFR for the second time period
further columns with TFR projections
The second file, called ‘projection_summary.csv’ (or ‘projection_summary_adjusted.csv’ if adjusted=TRUE
), also comma-separated table, contains the same information as above in a UN-specific format:
revision number, passed to the function as an argument
variant identifier, extracted from the UN_variants
dataset
country code
time identifier, extracted from the UN_time
dataset
the total fertility rate for this variant, location and time period
The third comma-separated file, called ‘projection_summary_parameters.csv’ contains the following columns:
country name
country code
start period of TFR decline
TFR at the onset of the fertitility transition (median of the parameter)
maximum decrement of TFR decline (median of the parameter)
median of the end level of the fertility transition ( parameter)
2.5 percentile of the distribution
97.5 percentile of the distribution
period of the end decline, measured using the prediction median
Note that this file is not created if adjusted=TRUE
.
This function is automatically called from the tfr.predict
function, therefore in standard cases it will not be needed to call it directly.
Hana Sevcikova
convert.tfr.trajectories
, tfr.predict