Title: | Applied Techniques to Demographic and Time Series Analysis |
---|---|
Description: | Proposes overparameterization combined with combinatorial analysis to test a broader spectrum of possible ARIMA models. When selecting ARIMA models, traditional methods such as correlograms usually consider few alternatives for the number of coefficients to estimate, which can lead to a suboptimal model. The popstudy package also contains several tools for statistical analysis in demography and time series based on Shryock's research (Shryock et al. (1980) <https://books.google.co.cr/books?id=8Oo6AQAAMAAJ>). |
Authors: | Cesar Gamboa-Sanabria [aut, mdc, cph, cre] |
Maintainer: | Cesar Gamboa-Sanabria |
License: | GPL-3 |
Version: | 1.0.1 |
Built: | 2024-11-11 07:33:37 UTC |
Source: | CRAN |
Anonymize a data frame while avoiding vulnerability to a rainbow table attack.
anonymous(data, ID, string_length = 15, SEED = NULL)
data | data.frame. A dataset containing the variable whose values will be changed. |
ID | character. The name of the variable whose values will be changed. |
string_length | numeric. The string length of the new identification variable. |
SEED | to be passed to |
The function anonymous returns a list with two data frames:
data | original data with the new variable |
dictionary | data frame with the original variable and the new one |
Cesar Gamboa-Sanabria
Oechslin P (2003). “Making a Faster Cryptanalytic Time-Memory Trade-Off.” In Boneh D (ed.), Advances in Cryptology - CRYPTO 2003, 617–630. ISBN 978-3-540-45146-4.
library(dplyr)
df <- select(mutate(mtcars, id=rownames(mtcars)), id, !contains("id"))
anonymous(df, ID="id", string_length = 5, SEED=160589)
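The dictionary idea can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name `anonymize_sketch`, its `seed` argument, and the way random strings are drawn are all hypothetical. Because the new identifiers are random rather than a hash of the original values, they cannot be reversed with a precomputed hash lookup.

```r
# Hypothetical helper (not part of popstudy): replace an ID column with
# random alphanumeric strings and return the lookup dictionary as well.
anonymize_sketch <- function(data, ID, string_length = 15, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  originals <- unique(as.character(data[[ID]]))
  alphabet <- c(letters, LETTERS, 0:9)
  new_ids <- character(0)
  # Draw strings until there is one distinct string per original value.
  while (length(new_ids) < length(originals)) {
    candidate <- paste(sample(alphabet, string_length, replace = TRUE),
                       collapse = "")
    new_ids <- unique(c(new_ids, candidate))
  }
  dictionary <- data.frame(original = originals, new = new_ids,
                           stringsAsFactors = FALSE)
  # Swap each original value for its random replacement.
  data[[ID]] <- dictionary$new[match(as.character(data[[ID]]),
                                     dictionary$original)]
  list(data = data, dictionary = dictionary)
}

df <- data.frame(id = rownames(mtcars), mtcars,
                 row.names = NULL, stringsAsFactors = FALSE)
res <- anonymize_sketch(df, ID = "id", string_length = 5, seed = 160589)
head(res$dictionary)
```

Keeping the dictionary separate from the anonymized data is what allows the original identifiers to be restored later by whoever holds it.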
Method to split five-year grouped ages into single-year ages.
Beers(data, ...)
data | data.frame. It contains at least two variables: five-year grouped ages and population. |
... | Arguments to be passed to |
Beers returns a data.frame with single-year ages and populations.
Cesar Gamboa-Sanabria
Shryock HS, Siegel JS, Larmon EA, United States Bureau of the Census (1980). The Methods and Materials of Demography, volume 1. U.S. Department of Commerce, Bureau of the Census. https://books.google.co.cr/books?id=8Oo6AQAAMAAJ.
Beers(Ecuador1990, age, population)
Simulated data for Lexis Diagram examples.
data("births_deaths")
The format is: List of 2
 $ births: tibble [32 x 3] (S3: tbl_df/tbl/data.frame)
  ..$ sex     : chr [1:32] "male" "male" "male" "male" ...
  ..$ date_reg: Date[1:32], format: ...
  ..$ births  : num [1:32] 121558 126446 130839 130911 127524 ...
 $ deaths: tibble [112 x 4] (S3: tbl_df/tbl/data.frame)
  ..$ sex     : chr [1:112] "male" "male" "male" "male" ...
  ..$ date_reg: Date[1:112], format: ...
  ..$ age     : num [1:112] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ deaths  : num [1:112] 11411 10494 10814 9872 9457 ...
data(births_deaths)
summary(births_deaths)
Children Ever Born data from Bolivia's 2001 census.
data("CEB")
A data frame with 27 observations on 8 variables for each five-year grouped age.
data(CEB)
summary(CEB)
Compute correlations in a data frame.
correlate_df(data, keep_class = NULL)
data | data.frame. A dataset with the variables to correlate. |
keep_class | list. A list that contains the desired classes for specific variables. |
correlate_df takes data.frame objects and works only with numeric, factor, and ordered variables, so prior data cleaning is needed for optimal results. A variable is considered nominal when it is an unordered factor with more than two levels. A numeric variable with only two distinct values, or a factor with only two levels, is considered binary. The correlation computed depends on the classes of the paired variables: Pearson when both variables are numeric; Kendall for a numeric and an ordinal variable; point-biserial for a numeric and a binary variable; polychoric for two ordinal variables; tetrachoric when both are binary; rank-biserial when one is ordinal and the other is binary; and Kruskal's Lambda for one binary and one nominal variable, or for two nominal variables. When one variable is nominal and the other is numeric or ordered, a Gaussian linear model is fitted to estimate the multiple correlation coefficient, so the user should interpret that result with care.
The correlate_df function returns a list with three objects: a data frame with the correlation matrix and two correlation plots.
Cesar Gamboa-Sanabria
Khamis H (2008). “Measures of Association: How to Choose?” Journal of Diagnostic Medical Sonography, 24(3), 155-162. doi:10.1177/8756479308317006.
df <- data.frame(cont1=rnorm(100), cont2=rnorm(100),
                 ordi1=factor(sample(1:5, 100, replace = TRUE), ordered = TRUE),
                 ordi2=factor(sample(1:7, 100, replace = TRUE), ordered = TRUE),
                 bin1=rbinom(100, 1, .4), bin2=rbinom(100, 1, .6),
                 nomi1=factor(sample(letters[1:8], 100, replace = TRUE)),
                 nomi2=factor(sample(LETTERS[1:8], 100, replace = TRUE)))
correlate_df(df)
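The classification rules in the description can be written down as a small lookup. The sketch below only mirrors the rules as stated in the text; the helper names `classify_variable` and `choose_method` are hypothetical and are not functions exported by popstudy, and the package may resolve edge cases differently.

```r
# Hypothetical helper: label a variable using the rules in the description.
classify_variable <- function(x) {
  if (is.factor(x)) {
    if (nlevels(x) == 2) return("binary")   # two-level factor
    if (is.ordered(x)) return("ordinal")    # ordered factor
    if (nlevels(x) > 2) return("nominal")   # unordered, more than two levels
    return("unsupported")
  }
  if (is.numeric(x)) {
    if (length(unique(x[!is.na(x)])) == 2) return("binary")
    return("numeric")
  }
  "unsupported"
}

# Hypothetical helper: map a pair of labels to the measure named in the text.
choose_method <- function(class_a, class_b) {
  pair <- paste(sort(c(class_a, class_b)), collapse = "-")
  switch(pair,
    "numeric-numeric" = "Pearson",
    "numeric-ordinal" = "Kendall",
    "binary-numeric"  = "point-biserial",
    "ordinal-ordinal" = "polychoric",
    "binary-binary"   = "tetrachoric",
    "binary-ordinal"  = "rank-biserial",
    "binary-nominal"  = "lambda",
    "nominal-nominal" = "lambda",
    "nominal-numeric" = "multiple correlation (linear model)",
    "nominal-ordinal" = "multiple correlation (linear model)",
    NA_character_
  )
}

toy <- data.frame(
  cont1 = seq(0, 1, length.out = 100),
  ordi1 = factor(rep(1:5, 20), ordered = TRUE),
  bin1  = rep(c(0, 1), 50),
  nomi1 = factor(rep(letters[1:4], 25))
)
classes <- vapply(toy, classify_variable, character(1))
classes
choose_method(classes[["ordi1"]], classes[["bin1"]])
```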
Births registers in Costa Rica.
data("CR_births")
A data frame with 8434 observations on the following 2 variables.
date_reg | a Date |
births | a numeric vector |
data(CR_births)
summary(CR_births)
Deaths registers in Costa Rica.
data("CR_deaths")
A data frame with 229462 observations on the following 3 variables.
date_reg | a Date |
age | a numeric vector |
deaths | a numeric vector |
data(CR_deaths)
summary(CR_deaths)
Fertility rates for Costa Rica 1950-2011.
data("CR_fertility_rates_1950_2011")
A data frame with 2170 observations on the following 3 variables.
Year | a numeric vector |
Age | a numeric vector |
Female | a numeric vector with fertility rates |
data(CR_fertility_rates_1950_2011)
summary(CR_fertility_rates_1950_2011)
Mortality rates for Costa Rica 1950-2011.
data("CR_mortality_rates_1950_2011")
A data frame with 2170 observations on the following 5 variables.
Year | a numeric vector |
Age | a numeric vector |
Female | a numeric vector with female mortality rates |
Male | a numeric vector with male mortality rates |
Total | a numeric vector with total mortality rates |
data(CR_mortality_rates_1950_2011)
summary(CR_mortality_rates_1950_2011)
Mortality rates for Costa Rica in 2010-2015
data("CR_mortality_rates_2010_2015")
A data frame with 7656 observations on the following 4 variables.
Year | a numeric vector |
Age | a numeric vector |
Female | a numeric vector with female mortality rates |
Male | a numeric vector with male mortality rates |
data(CR_mortality_rates_2010_2015)
summary(CR_mortality_rates_2010_2015)
Estimated and projected populations for Costa Rica 1950-2011.
data("CR_populations_1950_2011")
A data frame with 7656 observations on the following 5 variables.
Year | a numeric vector |
Age | a numeric vector |
Female | a numeric vector with female population |
Male | a numeric vector with male population |
Total | a numeric vector with total population |
data(CR_populations_1950_2011)
summary(CR_populations_1950_2011)
Estimated and projected populations for Costa Rica 1950-2015.
data("CR_populations_1950_2015")
A data frame with 7656 observations on the following 4 variables.
Year | a numeric vector |
Age | a numeric vector |
Female | a numeric vector with female population |
Male | a numeric vector with male population |
data(CR_populations_1950_2015)
summary(CR_populations_1950_2015)
Estimated and projected populations of women of reproductive age for Costa Rica 1950-2011.
data("CR_women_childbearing_age_1950_2011")
A data frame with 7656 observations on the following 3 variables.
Year | a numeric vector |
Age | a numeric vector |
Female | a numeric vector with the population of women of reproductive age |
data(CR_women_childbearing_age_1950_2011)
summary(CR_women_childbearing_age_1950_2011)
Plot density with descriptive statistics for numerical values.
descriptive_plot(data, ..., labels = NULL, ylab = "Density")
data | data.frame. |
... | additional arguments to be passed to |
labels | A vector with x-axis labels. |
ylab | y-axis label. |
The descriptive_plot function returns a plot with density and descriptive statistics.
Cesar Gamboa-Sanabria
df <- data.frame(var1=rpois(50, 6), var2=rgamma(50, shape=5,rate=.4), var3=rnorm(50, 10))
descriptive_plot(df, var1, var3)
Ecuador census data in 1990 by grouped ages.
data("Ecuador1990")
A data frame with 21 observations on the following 4 variables.
age | a factor with levels 0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85-89, 90-94, 95-99, 100+ |
male | a numeric vector with male population |
female | a numeric vector with female population |
population | a numeric vector with the population of Ecuador |
https://microdata.worldbank.org/index.php/catalog/499
data(Ecuador1990)
summary(Ecuador1990)
The method corrects the zero parity omission error.
El_Badry(data, age, CEB, childs, req_ages = NULL)
data | data.frame. It contains at least three variables: five-year grouped ages, number of children and Children Ever Born (CEB). |
age | variable name in |
CEB | variable name in |
childs | variable name in |
req_ages | optional character string that specifies the five-year grouped ages used to estimate the intercept. |
El_Badry returns a list with two elements: a data.frame with corrected children for each number of Children Ever Born and five-year grouped age, and a data.frame with the combinations of five-year grouped ages used to estimate the intercept, slope, and R-squared. By default, the method uses the best value of R-squared to apply the El Badry correction.
Cesar Gamboa-Sanabria
Moultrie TA, Dorrington RE, Hill AG, Hill K, Timæus IM, Zaba B (2013). Tools for demographic estimation. International Union for the Scientific Study of Population.
CEB_data <- tidyr::gather(CEB, ages, childs, -Children_Ever_Born)
results <- Moultrie(CEB_data, ages, childs, Children_Ever_Born)
CEB_data <- tidyr::pivot_wider(results, names_from=age, values_from=childs)
CEB_data <- tidyr::gather(CEB_data, ages, children, -CEB)
El_Badry(CEB_data,ages, CEB, children)
Costa Rica population by five-year age groups in 2011.
data("grouped_age_CR_pop")
A data frame with 16 observations on the following 2 variables.
age | an ordered factor with levels 0 - 4 < 5 - 9 < 10 - 14 < 15 - 19 < 20 - 24 < 25 - 29 < 30 - 34 < 35 - 39 < 40 - 44 < 45 - 49 < 50 - 54 < 55 - 59 < 60 - 64 < 65 - 69 < 70 - 74 < 75 and more |
pop | a numeric vector with the population |
data(grouped_age_CR_pop)
str(grouped_age_CR_pop)
Assuming exponential behavior, estimates the population size at time t, the growth rate, or the population at time 0.
growth_exp(Nt = NULL, N0 = NULL, r = NULL, t0, t, time_interval, date = FALSE)
Nt | numeric. The population at time t. If NULL and date = FALSE, the population at time t is estimated. |
N0 | numeric. The population at time 0. If NULL and date = FALSE, the population at time 0 is estimated. |
r | numeric. The growth rate. If NULL and date = FALSE, the growth rate for the time period [t0,t] is estimated. |
t0 | character. The date for the first population. |
t | character. The date for the second population. |
time_interval | character. A string with the time interval used to calculate Delta_t. |
date | logical. If TRUE, then estimates the moment t when Nt reaches a specific value. |
growth_exp returns a data frame with N0, Nt, r, t0, t, delta, and time_interval for the desired parameters.
Cesar Gamboa-Sanabria
Shryock HS, Siegel JS (2013). The Methods and Materials of Demography, Studies in Population. Elsevier Science. ISBN 9781483289106, https://books.google.co.cr/books?id=HVW0BQAAQBAJ.
growth_linear, growth_logistic
# According to the Panama census of 2000-05-14,
# the population was 2,839,177. On 2010-05-16, the census
# counted a population of 3,405,813.
# To get r:
growth_exp(N0=2839177, Nt=3405813, t0="2000-05-14", t="2010-05-16", time_interval = "years")
# To get Nt at 2000-06-30:
growth_exp(N0=2839177, r=0.0182, t0="2000-05-14", t="2000-06-30", time_interval = "years")
# The time when the population will be 5,000,000.
growth_exp(N0=2839177, Nt=5000000, r=0.0182, t0="2000-05-14", date=TRUE)
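The arithmetic behind the exponential model, Nt = N0 * exp(r * Delta_t), can be checked by hand in base R. The sketch below is an illustration rather than the package's code: the helper names are hypothetical, and measuring Delta_t in years of 365.25 days is an assumption, so results may differ slightly from those of growth_exp.

```r
# Assumption: Delta_t is measured in years of 365.25 days.
years_between <- function(t0, t) {
  as.numeric(as.Date(t) - as.Date(t0)) / 365.25
}

N0 <- 2839177   # Panama census, 2000-05-14
Nt <- 3405813   # Panama census, 2010-05-16
delta_t <- years_between("2000-05-14", "2010-05-16")

# Growth rate implied by Nt = N0 * exp(r * delta_t)
r <- log(Nt / N0) / delta_t

# Population after delta_t years at rate r (hypothetical helper)
project_exp <- function(N0, r, delta_t) N0 * exp(r * delta_t)

# Years needed to reach a target population (hypothetical helper)
time_to_reach_exp <- function(N0, target, r) log(target / N0) / r

round(r, 4)
project_exp(N0, r = 0.0182, delta_t = years_between("2000-05-14", "2000-06-30"))
time_to_reach_exp(N0, target = 5e6, r = 0.0182)
```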
Assuming linear behavior, estimates the population size at time t, the growth rate, or the population at time 0.
growth_linear(Nt = NULL, N0 = NULL, r = NULL, t0, t, time_interval, date = FALSE)
Nt | numeric. The population at time t. If NULL and date = FALSE, the population at time t is estimated. |
N0 | numeric. The population at time 0. If NULL and date = FALSE, the population at time 0 is estimated. |
r | numeric. The growth rate. If NULL and date = FALSE, the growth rate for the time period [t0,t] is estimated. |
t0 | character. The date for the first population. |
t | character. The date for the second population. |
time_interval | character. A string with the time interval used to calculate Delta_t. |
date | logical. If TRUE, then estimates the moment t when Nt reaches a specific value. |
growth_linear returns a data frame with N0, Nt, r, t0, t, delta, and time_interval for the desired parameters.
Cesar Gamboa-Sanabria
Shryock HS, Siegel JS (2013). The Methods and Materials of Demography, Studies in Population. Elsevier Science. ISBN 9781483289106, https://books.google.co.cr/books?id=HVW0BQAAQBAJ.
# According to the Panama census of 2000-05-14,
# the population was 2,839,177. On 2010-05-16, the census
# counted a population of 3,405,813.
# To get r:
growth_linear(N0=2839177, Nt=3405813, t0="2000-05-14", t="2010-05-16", time_interval = "years")
# To get Nt at 2000-06-30:
growth_linear(N0=2839177, r=0.0182, t0="2000-05-14", t="2000-06-30", time_interval = "years")
# The time when the population will be 5,000,000.
growth_linear(N0=2839177, Nt=5000000, r=0.0182, t0="2000-05-14", date=TRUE)
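Under the linear model, Nt = N0 * (1 + r * Delta_t), the same three quantities follow from simple algebra. The base R sketch below is illustrative only: the helper names are hypothetical, and a 365.25-day year is assumed for Delta_t, which may not match how growth_linear measures the interval.

```r
N0 <- 2839177   # Panama census, 2000-05-14
Nt <- 3405813   # Panama census, 2010-05-16

# Assumption: Delta_t is measured in years of 365.25 days.
delta_t <- as.numeric(as.Date("2010-05-16") - as.Date("2000-05-14")) / 365.25

# Growth rate implied by Nt = N0 * (1 + r * delta_t)
r_linear <- (Nt / N0 - 1) / delta_t

# Population after delta_t years at rate r (hypothetical helper)
project_linear <- function(N0, r, delta_t) N0 * (1 + r * delta_t)

# Years needed to reach a target population (hypothetical helper)
time_to_reach_linear <- function(N0, target, r) (target / N0 - 1) / r

r_linear
time_to_reach_linear(N0, target = 5e6, r = 0.0182)
```

For a growing population over the same interval, the linear rate is larger than the exponential one because it is not compounded.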
Given two pivots and limits, estimates the growth assuming a logistic behavior.
growth_logistic(pivot_values, pivot_years, upper, lower, t)
pivot_values | numeric. Reference values to estimate, like TFR for two specific years. |
pivot_years | numeric. Reference years to estimate for both values in |
upper | numeric. Upper asymptotic value. |
lower | numeric. Lower asymptotic value. |
t | numeric. Year for which to get the logistic value. |
growth_logistic returns the logistic estimation for the specified year.
Cesar Gamboa-Sanabria
Shryock HS, Siegel JS (2013). The Methods and Materials of Demography, Studies in Population. Elsevier Science. ISBN 9781483289106, https://books.google.co.cr/books?id=HVW0BQAAQBAJ.
# Given TFR values 3.32 and 2.85 for the years 1986 and 1991, respectively,
# estimate the TFR in 1987 assuming 1.5 as lower limit and 8 as upper limit.
growth_logistic(pivot_values = c(3.32, 2.85), pivot_years = c(1986, 1991), upper = 8, lower=1.5, t=1987)
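One common way to pass a logistic curve through two pivots is to linearize it with a logit-type transform and solve for the two parameters. The sketch below follows that approach; the parameterization and the helper name `logistic_sketch` are assumptions, so it may not reproduce the exact values returned by growth_logistic.

```r
# Hypothetical helper: logistic curve bounded by `lower` and `upper` that
# passes through two pivot points. The parameterization is an assumption.
logistic_sketch <- function(pivot_values, pivot_years, upper, lower, t) {
  # Linearize: z = log((upper - y) / (y - lower)) = a + b * year
  z <- log((upper - pivot_values) / (pivot_values - lower))
  b <- diff(z) / diff(pivot_years)
  a <- z[1] - b * pivot_years[1]
  # Back-transform to the original scale for the requested year(s)
  lower + (upper - lower) / (1 + exp(a + b * t))
}

tfr_1987 <- logistic_sketch(pivot_values = c(3.32, 2.85),
                            pivot_years = c(1986, 1991),
                            upper = 8, lower = 1.5, t = 1987)
tfr_1987
```

By construction the curve returns the pivot values at the pivot years and stays strictly between the two asymptotes for any other year.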
Separate grouped-age data into single-year ages using Karup-King separation factors.
karup_king(data)
data | data.frame. A dataset with two variables: |
The karup_king function returns a data frame with separated single-year ages.
Cesar Gamboa-Sanabria
Shryock HS, Siegel JS, Larmon EA, United States Bureau of the Census (1980). The Methods and Materials of Demography, volume 2. U.S. Department of Commerce, Bureau of the Census. https://books.google.co.cr/books?id=SuXrAAAAMAAJ.
karup_king(grouped_age_CR_pop)
Karup-King separation factors.
data("karup_king_factors")
A data frame with 76 observations on the following 7 variables.
age | a character vector with single-year ages |
f1 | a numeric vector, Karup-King factor |
f2 | a numeric vector, Karup-King factor |
f3 | a numeric vector, Karup-King factor |
d1 | a numeric vector, used in the karup_king function, do not edit by hand |
d2 | a numeric vector, used in the karup_king function, do not edit by hand |
d3 | a numeric vector, used in the karup_king function, do not edit by hand |
Shryock HS, Siegel JS, Larmon EA, United States Bureau of the Census (1980). The Methods and Materials of Demography, volume 2. U.S. Department of Commerce, Bureau of the Census. https://books.google.co.cr/books?id=SuXrAAAAMAAJ.
data(karup_king_factors)
str(karup_king_factors)
Plot a Lexis Diagram from births and deaths data for a given year, month, and day, for specific single-year ages.
Lexis(
  deaths_data,
  births_data,
  first.date = NULL,
  choose_year,
  choose_month,
  choose_day,
  ages,
  factors = NULL
)
deaths_data | data.frame. A dataset with three variables: date_reg, the registered death date; age, the age at death; and deaths, the number of deaths for that date. See |
births_data | data.frame. A dataset with two variables: date_reg, the registered birth date; and births, the number of births for that date. See |
first.date | character. Optional argument that specifies the first date of interest. |
choose_year | numeric. The year from which the countdown begins until the desired minimum age is reached. |
choose_month | numeric. The month from which the countdown begins until the desired minimum age is reached. |
choose_day | numeric. The day from which the countdown begins until the desired minimum age is reached. |
ages | numeric. A vector of ages to plot in the diagram. |
factors | numeric. Optional argument that sets specific factors for the alpha and delta sections in the Lexis Diagram. |
The Lexis function returns a list with two objects: diagram, the Lexis diagram; and deaths, the estimated number of deaths.
Cesar Gamboa-Sanabria
Rau R, Bohk-Ewald C, Muszynska MM, Vaupel JW (2017). Visualizing Mortality Dynamics in the Lexis Diagram, The Springer Series on Demographic Methods and Population Analysis. Springer International Publishing. ISBN 9783319648200, https://books.google.co.cr/books?id=ttpCDwAAQBAJ.
Lexis(CR_deaths, CR_births, choose_year=2011, choose_month=1, choose_day=1, ages=0:9)$diagram
## Lexis diagram with specific factors
data("births_deaths")
Births <- dplyr::filter(births_deaths$births, sex=="male")
Deaths <- dplyr::filter(births_deaths$deaths, sex=="male")
Lexis(deaths_data=Deaths, births_data=Births, first.date = "1999-01-01", choose_year=2007,
      choose_month=1, choose_day=1, ages=0:4, factors = c(.2,.41,.47,.48,.48))$diagram
Estimates a lifetable from mortality rates and population data.
Lifetable(
  rates,
  pops,
  sex,
  max_age = NULL,
  first_year,
  threshold,
  jump,
  element = c("mx", "qx", "lx", "dx", "Lx", "Tx", "ex", "rx"),
  ...
)
rates | character. A character string that specifies the path to the mortality data. The dataset is a .txt file like |
pops | character. A character string that specifies the path to the population data. The dataset is a .txt file like |
sex | character. "female" or "male". |
max_age | numeric. Desired omega age. If |
first_year | numeric. First year to start estimation. |
threshold | numeric. Maximum forecast year. |
jump | character. Same purpose as |
element | character. The element to estimate, one of "mx", "qx", "lx", "dx", "Lx", "Tx", "ex" or "rx". |
... | additional arguments to be passed to |
The Lifetable function returns a list with two data frames, in wide and long format, for the element specified in the element argument for the desired years.
Cesar Gamboa-Sanabria
Wunsch G, Mouchart M, Duchêne J (2002). The Life Table: Modelling Survival and Death, European Studies of Population. Springer Netherlands. ISBN 9781402006388, https://books.google.co.cr/books?id=ySex55d4nlsC.
## Not run: 
write.table(CR_mortality_rates_2010_2015, file = "CR_mortality_rates_2010_2015.txt", sep = "\t", row.names = FALSE, quote = FALSE)
write.table(CR_populations_1950_2015, file = "CR_populations_1950_2015.txt", sep = "\t", row.names = FALSE, quote = FALSE)
Lifetable("CR_mortality_rates_2010_2015.txt", "CR_populations_1950_2015.txt", sex="female", first_year=2011, threshold=2150, jump="actual", max_age = 100, element="ex", label="CR")
## End(Not run)
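To show how most of the elements listed in the element argument relate to one another, the following base R sketch builds a single-year period life table from a vector of death rates. It is a textbook-style illustration, not the procedure used by Lifetable: the helper name `life_table_sketch`, the assumption that deaths occur on average at mid-interval, and the treatment of the last age as open-ended are all assumptions, and the "rx" element is not covered.

```r
# Hypothetical helper: period life table from single-year death rates mx.
# Assumes deaths occur on average halfway through each age interval and
# that the last age is an open-ended interval with mx > 0.
life_table_sketch <- function(mx, radix = 100000) {
  n <- length(mx)
  qx <- mx / (1 + 0.5 * mx)              # probability of dying in the interval
  qx[n] <- 1                             # everyone dies in the open interval
  lx <- radix * c(1, cumprod(1 - qx[-n]))  # survivors at exact age x
  dx <- lx * qx                          # deaths in the interval
  Lx <- lx - 0.5 * dx                    # person-years lived in the interval
  Lx[n] <- lx[n] / mx[n]                 # open-ended interval
  Tx <- rev(cumsum(rev(Lx)))             # person-years lived above age x
  ex <- Tx / lx                          # life expectancy at age x
  data.frame(age = seq_len(n) - 1, mx = mx, qx = qx, lx = lx,
             dx = dx, Lx = Lx, Tx = Tx, ex = ex)
}

# Toy death rates for ages 0, 1, 2 and an open-ended last age.
lt <- life_table_sketch(c(0.01, 0.001, 0.002, 0.5))
lt
```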
Forecasting mortality rates.
mortality_projection(
  mortality_rates_path,
  total_population_path,
  omega_age,
  horizon,
  first_year_projection,
  ...
)
mortality_rates_path | character. Path to mortality rates in a .txt file. |
total_population_path | character. Path to populations in a .txt file. |
omega_age | numeric. Maximum age. |
horizon | numeric. The forecast horizon. |
first_year_projection | numeric. Year for the base population. |
... | additional arguments to be passed to |
mortality_projection returns an object of class fmforecast with both female and male mortality projections and the components of demography::forecast.lca().
Cesar Gamboa-Sanabria
## Not run: 
library(dplyr)
data(CR_mortality_rates_1950_2011)
#CR_mortality_rates_1950_2011 %>%
#write.table(.,
#file = "CR_mortality_rates_1950_2011.txt",
#sep = "\t",
#row.names = FALSE,
#col.names = TRUE,
#quote = FALSE)
data(CR_populations_1950_2011)
#CR_populations_1950_2011 %>%
#write.table(.,
#file = "CR_populations_1950_2011.txt",
#sep = "\t",
#row.names = FALSE,
#col.names = TRUE,
#quote = FALSE)
#result <- mortality_projection(mortality_rates_path = "CR_mortality_rates_1950_2011.txt",
#total_population_path = "CR_populations_1950_2011.txt",
#omega_age = 115, first_year_projection = 2011, horizon = 2150)
## End(Not run)
Moultrie's proposal for correction of Children Ever Born in five-year grouped ages.
Moultrie(data, ...)
data | data.frame. It contains at least three variables: five-year grouped ages, number of children and Children Ever Born (CEB). |
... | Arguments to be passed to |
Moultrie returns a data.frame with corrected children for each number of Children Ever Born and five-year grouped age.
Cesar Gamboa-Sanabria
Moultrie TA, Dorrington RE, Hill AG, Hill K, Timæus IM, Zaba B (2013). Tools for demographic estimation. International Union for the Scientific Study of Population.
CEB_data <- tidyr::gather(CEB, ages, childs, -Children_Ever_Born)
results <- Moultrie(CEB_data, ages, childs, Children_Ever_Born)
tidyr::pivot_wider(results, names_from=age, values_from=childs)
An improvement over the Whipple index that allows analyzing the attraction (or repulsion) of each terminal digit from 0 to 9.
Myers(data, ...)
data | data.frame. It contains at least two variables: single-year ages and population. |
... | Arguments to be passed to |
Myers returns a list with two objects:
Mmat | a data.frame with the index for each digit |
MI | the Myers' Blended Index. |
Cesar Gamboa-Sanabria
Shryock HS, Siegel JS, Larmon EA, United States Bureau of the Census (1980). The Methods and Materials of Demography, volume 1. U.S. Department of Commerce, Bureau of the Census. https://books.google.co.cr/books?id=8Oo6AQAAMAAJ.
results <- Myers(Panama1990, age, pop)
results$Mmat
results$MI
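The blended method can be sketched in base R for single-year ages 10 to 99. This is an illustration under assumptions, not the package's code: the helper name `myers_sketch` is hypothetical, the age ranges follow the usual textbook presentation, and the summary index is taken here as the sum of absolute deviations from 10 percent (some references report half of that sum, and the scaling used by Myers is not stated in this document).

```r
# Hypothetical helper: Myers' blended method on single-year ages 10 to 99.
# `age` must be numeric and `pop` the matching population counts.
myers_sketch <- function(age, pop) {
  digits <- 0:9
  blended <- vapply(digits, function(d) {
    # Population at ages ending in d, starting at age 10 and at age 20.
    s1 <- sum(pop[age %in% seq(10 + d, 80 + d, by = 10)])
    s2 <- sum(pop[age %in% seq(20 + d, 90 + d, by = 10)])
    # Weights d + 1 and 9 - d blend the ten possible starting ages.
    (d + 1) * s1 + (9 - d) * s2
  }, numeric(1))
  share <- 100 * blended / sum(blended)   # percent of blended total
  deviation <- share - 10                 # 10 percent expected without heaping
  list(Mmat = data.frame(digit = digits, share = share,
                         deviation = deviation),
       MI = sum(abs(deviation)))
}

# Synthetic data: a flat population with extra reports at ages ending in 0.
age <- 0:99
pop <- rep(1000, 100)
pop[age %% 10 == 0] <- 1500
heaped <- myers_sketch(age, pop)
heaped$Mmat
heaped$MI
```

A population with no digit preference gives an index of zero, while heaping on a digit produces a positive deviation for that digit. Note that the age variable in Panama1990 is documented as a character vector, so it would need converting with as.numeric before being passed to a helper like this one.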
Forecasting net migration.
netmigration_projection(
  mortality_rates_path,
  TFR_path,
  total_population_path,
  WRA_path,
  omega_age,
  horizon,
  first_year_projection
)
mortality_rates_path | character. Path to mortality rates in a .txt file. |
TFR_path | character. Path to fertility rates in a .txt file. |
total_population_path | character. Path to populations in a .txt file. |
WRA_path | character. Path to Women of Reproductive Age in a .txt file. |
omega_age | numeric. Maximum age. |
horizon | numeric. The forecast horizon. |
first_year_projection | numeric. Year for the base population. |
netmigration_projection returns an object of class fmforecast with the forecast net migration models and the components of demography::forecast.fdmpr().
Cesar Gamboa-Sanabria
## Not run: 
library(dplyr)
data(CR_mortality_rates_1950_2011)
#CR_mortality_rates_1950_2011 %>%
#write.table(.,
#file = "CR_mortality_rates_1950_2011.txt",
#sep = "\t",
#row.names = FALSE,
#col.names = TRUE,
#quote = FALSE)
data(CR_populations_1950_2011)
#CR_populations_1950_2011 %>%
#write.table(.,
#file = "CR_populations_1950_2011.txt",
#sep = "\t",
#row.names = FALSE,
#col.names = TRUE,
#quote = FALSE)
data(CR_fertility_rates_1950_2011)
#CR_fertility_rates_1950_2011 %>%
#write.table(.,
#file = "CR_fertility_rates_1950_2011.txt",
#sep = "\t",
#row.names = FALSE,
#col.names = TRUE,
#quote = FALSE)
data(CR_women_childbearing_age_1950_2011)
#CR_women_childbearing_age_1950_2011 %>%
#write.table(.,
#file = "CR_women_childbearing_age_1950_2011.txt",
#sep = "\t",
#row.names = FALSE,
#col.names = TRUE,
#quote = FALSE)
#result <- netmigration_projection(mortality_rates_path = "CR_mortality_rates_1950_2011.txt",
#total_population_path = "CR_populations_1950_2011.txt",
#TFR_path = "CR_fertility_rates_1950_2011.txt",
#WRA_path = "CR_women_childbearing_age_1950_2011.txt",
#omega_age = 115, first_year_projection = 2011, horizon = 2150)
## End(Not run)
Estimates the best predictive ARIMA model using overparameterization.
op.arima( arima_process = c(p = 1, d = 1, q = 1, P = 1, D = 1, Q = 1), seasonal_periodicity, time_serie, reg = NULL, horiz = 12, prop = 0.8, training_weight = 0.2, testing_weight = 0.8, parallelize = FALSE, clusters = detectCores(logical = FALSE), LAMBDA = NULL, ISP = 100, ... )
op.arima( arima_process = c(p = 1, d = 1, q = 1, P = 1, D = 1, Q = 1), seasonal_periodicity, time_serie, reg = NULL, horiz = 12, prop = 0.8, training_weight = 0.2, testing_weight = 0.8, parallelize = FALSE, clusters = detectCores(logical = FALSE), LAMBDA = NULL, ISP = 100, ... )
arima_process |
numeric. The ARIMA(p,d,q)(P,D,Q) process. |
seasonal_periodicity |
numeric. The seasonal periodicity, 12 for monthly data. |
time_serie |
ts. The univariate time series object to estimate the models. |
reg |
Optionally, a vector or matrix of external regressors, which must have the same number of rows as time_serie. |
horiz |
numeric. The forecast horizon. |
prop |
numeric. Proportion of the data used for the training dataset. |
training_weight |
numeric. Importance weight for the goodness of fit and precision measures in the training dataset. |
testing_weight |
numeric. Importance weight for the goodness of fit and precision measures in the testing dataset. |
parallelize |
logical. If TRUE, then use parallel processing. |
clusters |
numeric. The number of clusters for the parallel process. |
LAMBDA |
Optionally. See |
ISP |
numeric. Overparameterization indicator to filter the estimated models in the (0,100] interval. |
... |
additional arguments to be passed to |
op.arima returns an object of class list with the following components:
arima_models |
all models defined by the |
final_measures |
goodness of fit and precision measures for each model. |
bests |
a sorted list with the best ARIMA models. |
best_model |
a list of "Arima", see |
Cesar Gamboa-Sanabria
Gamboa-Sanabria C (2022). La Sobreparametrizacion en el ARIMA: una aplicacion a datos costarricenses. Master's thesis, Universidad de Costa Rica.
op.arima(arima_process = c(2,1,2,2,1,2), time_serie = AirPassengers, seasonal_periodicity = 12, parallelize=FALSE)
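The idea behind the arguments above can be sketched in base R: enumerate every combination of orders up to the supplied maxima, fit each candidate on the first prop share of the series, and rank the candidates by a weighted mix of training and testing error. This is only a minimal illustration built on stats::arima with RMSE as the single measure and non-seasonal orders only. It makes no claim about how op.arima computes its own measures or applies the ISP filter, and the helper name rank_arima_candidates is hypothetical.

```r
# Hypothetical helper: rank non-seasonal ARIMA(p, d, q) candidates by a
# weighted combination of training and testing RMSE (base R only).
rank_arima_candidates <- function(y, max_p = 2, d = 1, max_q = 2,
                                  prop = 0.8,
                                  training_weight = 0.2,
                                  testing_weight = 0.8) {
  n <- length(y)
  n_train <- floor(prop * n)
  train <- y[seq_len(n_train)]
  test <- y[(n_train + 1):n]

  # Every (p, q) combination from 0 up to the stated maxima.
  grid <- expand.grid(p = 0:max_p, q = 0:max_q)

  rmse <- function(e) sqrt(mean(e^2))

  scores <- lapply(seq_len(nrow(grid)), function(i) {
    ord <- c(grid$p[i], d, grid$q[i])
    fit <- tryCatch(stats::arima(train, order = ord, method = "ML"),
                    error = function(e) NULL)
    if (is.null(fit)) return(NULL)  # skip candidates whose fit raises an error
    fc <- stats::predict(fit, n.ahead = length(test))$pred
    data.frame(p = ord[1], d = ord[2], q = ord[3],
               train_rmse = rmse(stats::residuals(fit)),
               test_rmse = rmse(test - as.numeric(fc)))
  })
  out <- do.call(rbind, scores)

  # Lower score is better; the weights mirror training_weight/testing_weight.
  out$score <- training_weight * out$train_rmse +
    testing_weight * out$test_rmse
  out[order(out$score), ]
}

ranking <- rank_arima_candidates(as.numeric(log(AirPassengers)))
head(ranking, 3)
```

Weighting the testing error more heavily, as the defaults do, favors models that forecast well over models that merely fit the training data closely.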
Panama census data in 1990 by specific ages.
data("Panama1990")
A data frame with 100 observations on the following 2 variables.
age
a character vector with specific ages
pop
a numeric vector with population for each age
data(Panama1990)
summary(Panama1990)
Applied techniques to demographic and time series analysis.
Cesar Gamboa-Sanabria
Forecasting population using the components method.
population_projection(...)
... |
required arguments for |
population_projection returns an object of class list with the following components:
mort |
mortality projections from |
fert |
fertility projections from |
mig |
netmigration projections from |
pop |
the national projections by sex and year. |
Cesar Gamboa-Sanabria
mortality_projection
TFR_projection
netmigration_projection
## Not run:
library(dplyr)

data(CR_mortality_rates_1950_2011)

#CR_mortality_rates_1950_2011 %>%
#write.table(.,
#file = "CR_mortality_rates_1950_2011.txt",
#sep = "\t",
#row.names = FALSE,
#col.names = TRUE,
#quote = FALSE)

data(CR_populations_1950_2011)

#CR_populations_1950_2011 %>%
#write.table(.,
#file = "CR_populations_1950_2011.txt",
#sep = "\t",
#row.names = FALSE,
#col.names = TRUE,
#quote = FALSE)

data(CR_fertility_rates_1950_2011)

#CR_fertility_rates_1950_2011 %>%
#write.table(.,
#file = "CR_fertility_rates_1950_2011.txt",
#sep = "\t",
#row.names = FALSE,
#col.names = TRUE,
#quote = FALSE)

data(CR_women_childbearing_age_1950_2011)

#CR_women_childbearing_age_1950_2011 %>%
#write.table(.,
#file = "CR_women_childbearing_age_1950_2011.txt",
#sep = "\t",
#row.names = FALSE,
#col.names = TRUE,
#quote = FALSE)

#result <- population_projection(mortality_rates_path = "CR_mortality_rates_1950_2011.txt",
#total_population_path = "CR_populations_1950_2011.txt",
#TFR_path = "CR_fertility_rates_1950_2011.txt",
#WRA_path = "CR_women_childbearing_age_1950_2011.txt",
#omega_age = 115, first_year_projection = 2011, horizon = 2020)

## End(Not run)
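The components method combines mortality, fertility and net migration through the demographic balancing equation: next year's population is the surviving population aged by one year, plus surviving births, plus net migration. A toy single-sex, one-year step in base R may make the bookkeeping concrete. All inputs are hypothetical, the function name project_one_year is invented for illustration, and nothing here describes how population_projection organizes the calculation internally.

```r
# Hypothetical one-step cohort-component projection for a single sex.
# pop:         population by single age (last element is the open age group)
# survival:    probability of surviving one more year, by age (same length)
# births:      births during the year
# infant_surv: probability that a newborn survives to the end of the year
# net_mig:     net migrants by age at the end of the year (same length)
project_one_year <- function(pop, survival, births, infant_surv, net_mig) {
  stopifnot(length(pop) == length(survival),
            length(pop) == length(net_mig))
  k <- length(pop)
  nxt <- numeric(k)
  nxt[1] <- births * infant_surv                    # new age-0 cohort
  nxt[2:k] <- pop[1:(k - 1)] * survival[1:(k - 1)]  # survivors age by one year
  nxt[k] <- nxt[k] + pop[k] * survival[k]           # survivors staying in the open group
  nxt + net_mig
}

# Made-up inputs for four age groups.
pop <- c(100, 98, 97, 95)
survival <- c(0.99, 0.995, 0.99, 0.5)
net_mig <- c(1, 0, -1, 0)

next_pop <- project_one_year(pop, survival, births = 105,
                             infant_surv = 0.98, net_mig = net_mig)
next_pop
```

Repeating this step year by year, with projected rates in place of the fixed toy values, gives a projection over the whole horizon.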
Create a basic structure for a project repo.
project_structure()
project_structure does not return a value; it only creates basic directories and files in the current working directory/repository.
Cesar Gamboa-Sanabria
## Not run:
project_structure()

## End(Not run)
Get the full path of a file.
read_from_dir(file, path = NULL)
file |
The file name. |
path |
The file location. |
read_from_dir returns an object of class character with the normalized path for a file.
Cesar Gamboa-Sanabria
## Not run:
file.create("test_file.txt")
read_from_dir("test_file.txt")

## End(Not run)
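What a "normalized path" means can be shown with base R alone. The helper below is a hypothetical stand-in, not the package's code: it assumes that a NULL path means the current working directory, joins the pieces with file.path, and resolves the result with normalizePath.

```r
# Hypothetical stand-in for resolving a file name to an absolute path.
full_path <- function(file, path = NULL) {
  if (is.null(path)) path <- getwd()  # assumption: NULL means "look here"
  # mustWork = TRUE raises an error if the file does not exist.
  normalizePath(file.path(path, file), mustWork = TRUE)
}

# Create a file in the session's temporary directory and resolve it.
tmp_dir <- tempdir()
file.create(file.path(tmp_dir, "test_file.txt"))
resolved <- full_path("test_file.txt", path = tmp_dir)
resolved
```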
Install/load the required packages from CRAN.
required_packages(...)
... |
package names. |
required_packages does not return a value; it only installs and loads the requested packages.
Cesar Gamboa-Sanabria
## Not run:
#If you need to install and load the tidyr, dplyr and ggplot2 packages, run the following line:
#required_packages(tidyr, dplyr, ggplot2)

## End(Not run)
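The install-if-missing-then-load pattern can be sketched in base R. The helper load_or_install is hypothetical and differs from required_packages in one visible way: it takes a character vector of names rather than unquoted names passed through `...`.

```r
# Hypothetical helper: attach each package, installing it first if missing.
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)  # only reached when the package is not installed
    }
    library(pkg, character.only = TRUE)
  }
  # Report, invisibly, which of the requested packages are now attached.
  invisible(pkgs %in% .packages())
}

# Base packages are always installed, so nothing is downloaded here.
loaded <- load_or_install(c("stats", "utils"))
```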
Method to open five-year grouped ages into specific ages.
Sprague(data, ...)
data |
data.frame. It contains at least two variables: five-year grouped ages and population. |
... |
Arguments to be passed to |
Sprague returns an object of class data.frame with population for specific ages.
Cesar Gamboa-Sanabria
Shryock HS, Siegel JS, Larmon EA, U.S. Bureau of the Census (1980). The Methods and Materials of Demography, volume 1. Department of Commerce, Bureau of the Census. https://books.google.co.cr/books?id=8Oo6AQAAMAAJ.
Sprague(Ecuador1990, age, population)
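Sprague expects five-year grouped input. When only single-age counts are at hand, for instance a table shaped like Panama1990, the grouped form can be built in base R before opening it again. The counts and column names below are made up for illustration, and the "0-4" label style is an assumption, since the label format that Sprague accepts for age groups is not stated here.

```r
# Toy single-age counts for ages 0..19 (hypothetical values).
single <- data.frame(age = 0:19, pop = 1000 - 5 * (0:19))

# Collapse into five-year groups: 0-4, 5-9, 10-14, 15-19.
lower <- 5 * (single$age %/% 5)
single$group <- paste0(lower, "-", lower + 4)
grouped <- aggregate(pop ~ group, data = single, FUN = sum)

# aggregate() sorts the labels alphabetically; restore age order.
grouped <- grouped[order(as.numeric(sub("-.*", "", grouped$group))), ]
grouped
```

Grouping and then reopening is a common way to smooth digit preference in reported single ages, because the five-year totals are preserved while the split within each group is re-estimated.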
Forecasting total fertility rates.
TFR_projection(TFR_path, WRA_path, horizon, first_year_projection, ...)
TFR_path |
character. Path to Fertility rates in a .txt file. |
WRA_path |
character. Path to Women of Reproductive Age in a .txt file. |
horizon |
numeric. The forecast horizon. |
first_year_projection |
numeric. Year for the base population. |
... |
additional arguments to be passed to |
TFR_projection returns an object of class fmforecast with the forecast fertility rates and the components of demography::forecast.fdm().
Cesar Gamboa-Sanabria
library(dplyr)

data(CR_fertility_rates_1950_2011)

#CR_fertility_rates_1950_2011 %>%
#write.table(.,
#file = "CR_fertility_rates_1950_2011.txt",
#sep = "\t",
#row.names = FALSE,
#col.names = TRUE,
#quote = FALSE)

data(CR_women_childbearing_age_1950_2011)

#CR_women_childbearing_age_1950_2011 %>%
#write.table(.,
#file = "CR_women_childbearing_age_1950_2011.txt",
#sep = "\t",
#row.names = FALSE,
#col.names = TRUE,
#quote = FALSE)

#result <- TFR_projection(TFR_path = "CR_fertility_rates_1950_2011.txt",
#WRA_path = "CR_women_childbearing_age_1950_2011.txt",
#omega_age = 115, first_year_projection = 2011, horizon = 2150)