Title: | Read Human Mortality Database and Human Fertility Database Data from the Web |
---|---|
Description: | Utilities for reading data from the Human Mortality Database (<https://www.mortality.org>), Human Fertility Database (<https://www.humanfertility.org>), and similar databases from the web or locally into an R session as data.frame objects. These are the two most widely used sources of demographic data to study basic demographic change, trends, and develop new demographic methods. Other supported databases at this time include the Human Fertility Collection (<https://www.fertilitydata.org>), The Japanese Mortality Database (<https://www.ipss.go.jp/p-toukei/JMD/index-en.html>), and the Canadian Human Mortality Database (<http://www.bdlc.umontreal.ca/chmd/>). Arguments and data are standardized. |
Authors: | Tim Riffe [aut, cre], Carl Boe [aut], Jason Hilton [aut], Josh Goldstein [ctb], Stephen Holzman [ctb] |
Maintainer: | Tim Riffe <[email protected]> |
License: | GPL-2 |
Version: | 2.0.3 |
Built: | 2024-11-06 06:17:52 UTC |
Source: | CRAN |
age2int() convert the Age column from standard HMD or HFD tables to integer
Long the bane of many an HMD/HFD user is that the age column must be read into R as a factor or character vector, yet we'd like to use it as integer or numeric. This function strips symbols that are used to indicate the open age groups ("12-", "55+", "110+"), and coerces to integer format. This function is called by HFDparse() and HMDparse(), and so forth.
age2int(Age)
Age |
a vector of the Age column from an HMD or HFD data object that has been read directly into R. This may be a factor or character vector. |
This function is written for the sake of various parse functions.
the same age vector as a clean integer.
original function submitted by Josh Goldstein, modified by Tim Riffe.
AgeTest <- c("12-","13","14","55+")
(AgeNew <- age2int(AgeTest))
AgeNew + .5 # sort of mid-interval
# also handles abridged ages properly:
AgeAbridged <- c("0","1-4","5-9","10-14")
age2int(AgeAbridged)
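The cleaning step can be pictured with a few lines of base R. This is a minimal sketch, not the package's actual source: the helper name `strip_age()` is hypothetical, and it assumes that an abridged label such as "1-4" should map to the lower bound of its interval.

```r
# Hypothetical helper, for illustration only (not the package implementation).
strip_age <- function(Age) {
  Age <- as.character(Age)            # also covers factor input
  lower <- sub("-.*$", "", Age)       # "12-" -> "12", "1-4" -> "1"
  as.integer(sub("\\+$", "", lower))  # "55+" -> "55"
}

strip_age(c("12-", "13", "14", "55+"))
strip_age(factor(c("0", "1-4", "5-9", "10-14")))
```

Converting through `as.character()` first matters for factor input, since `as.integer()` applied directly to a factor returns the internal level codes rather than the ages.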
This is a helper function to get a vector of 3-character province codes.
getCHMDprovinces()
a character vector of 3 character province codes.
## Not run:
(provs <- getCHMDprovinces())
## End(Not run)
The function returns a list of population codes used in the Human Fertility Collection (HFC). Optionally, it also can return a data.frame with both the full population name and short code.
getHFCcountries(names = FALSE)
names |
logical. Default |
either a character vector of short codes (default) or a data.frame of country names and codes.
## Not run:
getHFCcountries()
getHFCcountries(names = TRUE)
## End(Not run)
This function is called by readHFDweb() and is separated here for modularity. We include both main and provisional countries in the grab.
getHFDcountries()
a 'tibble' with three columns 'Country', 'link' and 'CNTRY' (the country short code)
Called by readHFDweb(). This assumes that CNTRY is actually available in the HFD.
getHFDdate(CNTRY)
CNTRY |
HFD country short code. |
character string of eight digits representing the date as "yyyymmdd".
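An eight-digit date string of this kind can be checked and converted with base R. The sketch below is only illustrative: the value "20240115" is a made-up update code, not a real HFD revision date.

```r
# "20240115" is a made-up update code used only for illustration.
update_code <- "20240115"

# Exactly eight digits, nothing else.
is_valid <- grepl("^[0-9]{8}$", update_code)

# Parse the "yyyymmdd" layout into a Date object.
update_date <- as.Date(update_code, format = "%Y%m%d")
format(update_date, "%Y-%m-%d")
```

Keeping the original character string alongside the parsed date is convenient, since the string form is what gets passed back when requesting a specific update.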
Called by readHFDweb(). This assumes that CNTRY is actually available in the HFD.
getHFDitemavail(CNTRY)
CNTRY |
HFD country short code. |
a tibble of all available data files for the selected country. There are several useful identifiers that can help determine the appropriate file, including the 'measure' and 'subtype' as detected from the html table properties, and 'lexis' and 'parity' as detected either from the file names or the table properties.
This function is called by readHMDweb() and is separated here for modularity. Assumes you have an internet connection.
getHMDcountries()
a vector of HMD country short codes.
Called by readHMDweb() to find file urls. This assumes that CNTRY is actually available in the HMD.
getHMDitemavail(CNTRY)
CNTRY |
character. HMD country short code. |
a tibble of all available data items for the selected country. There are several useful identifiers that can help determine the appropriate file, including the 'measure', 'lexis', 'sex' and interval information, as detected from the item names.
This is a helper function for those familiar with prefecture names but not with prefecture codes (and vice versa). It is also useful for looped downloading of data.
getJMDprefectures()
a character vector of 2-digit prefecture codes. Names correspond to the proper names given in the English version of the JMD webpage.
## Not run:
(prefectures <- getJMDprefectures())
## End(Not run)
Called by readHFC() and readHFCweb(). We assume there are no factors in the given data.frame and that it has been read in from the raw text files using something like: read.csv(file = filepath, stringsAsFactors = FALSE, na.strings = ".", strip.white = TRUE). This function is visible to users, but is not likely needed directly.
HFCparse(DF)
DF |
a data.frame of HFC data, freshly read in. |
This parse routine is based on the subjective opinions of the author...
DF same data.frame, modified so that columns are of a useful class. If there were open age categories, such as "-" or "+", this information is stored in a new dummy column called OpenInterval. Values of 99 or -99 in the AgeInterval column are replaced with "+" and "-", respectively. Year is taken from Year1, and YearInterval is given, rather than Year2. Users wishing for a central time point should bear this in mind. The column Country is renamed CNTRY. Otherwise, columns in this database are kept in the data.frame, in case they may be useful.
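The recoding described above can be sketched in base R on a toy table. This is not the package's code. The rows are made up, only the columns named in the text are included, and the `YearInterval` formula (`Year2 - Year1 + 1`) is an assumption made for illustration.

```r
# Toy rows loosely shaped like HFC input; all values are made up.
DF <- data.frame(
  Country     = c("XXX", "XXX", "XXX"),
  Year1       = c(2000L, 2000L, 2000L),
  Year2       = c(2000L, 2000L, 2000L),
  Age         = c(14L, 15L, 50L),
  AgeInterval = c(-99L, 1L, 99L),
  stringsAsFactors = FALSE
)

# Flag open intervals before the numeric codes are overwritten.
DF$OpenInterval <- DF$AgeInterval %in% c(-99L, 99L)

# Replace the 99 / -99 codes with "+" and "-".
interval <- as.character(DF$AgeInterval)
interval[DF$AgeInterval == 99L]  <- "+"
interval[DF$AgeInterval == -99L] <- "-"
DF$AgeInterval <- interval

# Year from Year1; interval width is an assumed formula (see lead-in).
DF$Year <- DF$Year1
DF$YearInterval <- DF$Year2 - DF$Year1 + 1L

# Rename Country to CNTRY.
names(DF)[names(DF) == "Country"] <- "CNTRY"
DF
```

Note that `AgeInterval` ends up as a character column once "+" and "-" are introduced, so the logical `OpenInterval` column is the easier one to filter on.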
Called by readHFD() and readHFDweb(). We assume there are no factors in the given data.frame and that it has been read in from the raw text files using something like: read.table(file = filepath, header = TRUE, skip = 2, na.strings = ".", as.is = TRUE). This function is visible to users, but is not likely needed directly.
HFDparse(DF)
DF |
a data.frame of HFD data, freshly read in. |
This parse routine is based on the subjective opinions of the author...
DF same data.frame, modified so that columns are of a useful class. If there were open age categories, such as "-" or "+", this information is stored in a new dummy column called OpenInterval.
Called by readHMD() and readHMDweb(). We assume there are no factors in the given data.frame and that it has been read in from the raw text files using something like: read.table(file = filepath, header = TRUE, skip = 2, na.strings = ".", as.is = TRUE). This function is visible to users, but is not likely needed directly.
HMDparse(DF, filepath)
DF |
a data.frame of HMD data, freshly read in. |
filepath |
just to check if these are population counts from the name. |
This parse routine is based on the subjective opinions of the author...
DF same data.frame, modified so that columns are of a useful class. If there were open age categories, such as "-" or "+", this information is stored in a new dummy column called OpenInterval.
CHMD data are formatted exactly as HMD data. This function simply parses the necessary url together given a province code and data item (same nomenclature as HMD). Data is parsed using HMDparse(), which converts columns into useful and intuitive classes, for ready-use. See ?HMDparse for more information on type conversions. No authentication is required for this database. Only a single item/province is downloaded. Loop for more complex calls (see examples). The provID is not appended as a column, so be mindful of this if appending several items together into a single data.frame. Note that at the time of this writing, the finest Lexis resolution for provincial lifetables is 5x5 (5-year, 5-year age groups). Raw data are, however, provided in 1x1 format, and deaths are also available in triangles. Note that cohort data are not produced for Canada at this time (but you could produce such data by starting with the Deaths_Lexis file...).
readCHMDweb(provID = "can", item = "Deaths_1x1", fixup = TRUE, ...)
provID |
a single provID 3 character string, as returned by |
item |
the statistical product you want, e.g., |
fixup |
logical. Should columns be made more user-friendly, e.g., forcing Age to be integer? |
... |
extra arguments ultimately passed to |
This database is curated independently from the HMD/HFD family, and so file types and locations may be subject to change. If this happens, please notify the package maintainer.
data.frame of the data item is invisibly returned
## Not run:
library(HMDHFDplus)
# grab province codes (including All Canada)
provs <- getCHMDprovinces()
# grab all mltper_5x5
# and stick into long data.frame:
mltper <- do.call(rbind, lapply(provs, function(provID){
  Dat <- readCHMDweb(provID = provID, item = "mltper_5x5", fixup = TRUE)
  Dat$provID <- provID
  Dat
}))
## End(Not run)
readHFD() reads a standard HFD .txt table as a data.frame
This calls read.table() with all the necessary defaults to avoid annoying surprises. The Age column is also stripped of "-" and "+" and converted to integer, and a logical indicator column called OpenInterval is added to show where these were located. Output is invisibly returned, so you must assign it to take a look. This is to avoid lengthy console printouts.
readHFD(filepath, fixup = TRUE, ...)
filepath |
path or connection to the HFD text file, including .txt suffix. |
fixup |
logical. Should columns be made more user-friendly, e.g., forcing Age to be integer? |
... |
other arguments passed to |
No details of note.
data.frame of standard HFD output, except the Age column has been cleaned, and a new open age indicator column has been added.
original function submitted by Josh Goldstein, modified by Tim Riffe.
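To see what the read.table() defaults quoted in this manual do, here is a small offline sketch. The text below is a made-up stand-in for an HFD-style file (two lines of preamble, then a header row), not real data, and the cleanup shown is a simplified approximation of what fixup does rather than the package's own code.

```r
# Made-up stand-in for an HFD-style text file; values are not real data.
raw <- c(
  "Toy population, illustrative rates (made-up values)",
  "Second preamble line, skipped together with the first",
  "Year  Age  ASFR",
  "2000  12-  0.00010",
  "2000  13   0.00050",
  "2000  55+  ."
)

# skip = 2 drops the preamble, "." becomes NA, and as.is = TRUE keeps
# Age as character instead of turning it into a factor.
DF <- read.table(text = raw, header = TRUE, skip = 2,
                 na.strings = ".", as.is = TRUE)

# Simplified cleanup: record open age groups, then strip the symbols.
DF$OpenInterval <- grepl("[-+]$", DF$Age)
DF$Age <- as.integer(gsub("[-+]", "", DF$Age))
str(DF)
```

Without `na.strings = "."`, the rate column would be read as character because of the "." placeholder, which is the kind of surprise these defaults are meant to avoid.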
Read HFD data directly from the web. This function is useful for short reproducible examples, or to make code guaranteed to always use the most up to date version of a particular HFD data file. For working with the entire HFD for a comparative study, it may be more efficient to download the full HFD zip files and read in the elements using readHFD(). This function returns data formatted in the same way as readHFD(), that is, with Age columns (and others) converted to integer, and with open age group identifiers stored in a new logical column called OpenInterval. It is faster to specify CNTRY and item as arguments than to make the function figure out what's available. For repeated calls to this function, you can pass your username and password in as variables without having to include these in your R script by using userInput(); see example. The user also has the option of querying particular updates from the HFD revision history. If you wish to specify a particular update, you must know the date that a particular country was updated, in the format "YYYYMMDD". These dates differ between countries, so keep a good record if you wish your work to be reproducible to that extent (as well as lightweight)!
readHFDweb( CNTRY = NULL, item = NULL, username = NULL, password = NULL, fixup = TRUE, Update = NULL )
CNTRY |
character string of the HFD short code. Only one! |
item |
character string of the data product code, which is the base file name, but excluding the country code and file extension |
username |
your HFD username, which is usually the email address you registered with |
password |
your HFD password. Don't make this a sensitive password, as things aren't encrypted. |
fixup |
logical. Should columns be made more user-friendly, e.g., forcing Age to be integer? |
Update |
character string of 8-digit date code of the format |
You need to register for HFD to use this function: https://www.humanfertility.org. It is advised to pass in your credentials as named vectors rather than directly as character strings, so that they are not saved directly in your code. See examples. One option is to just save them in your Rprofile file.
data.frame of the given HFD data file, modified in some friendly ways.
### # this will ask you to enter your login details in the R console
### DAT <- readHFDweb("JPN","tfrRR")
###
### # ----------------------------------------
### # this is a good way to reuse your login credentials without
### # having to reveal them in your R script.
### # if you want to do this in batch then I'm
### # afraid you'll have to find a clever way to
### # pass in your credentials without an interactive
### # session, such as reading them in from a system file of your own.
### myusername <- userInput()
### mypassword <- userInput()
### DAT <- readHFDweb("JPN", "tfrRR", username = myusername, password = mypassword)
###
### #-----------------------------------------
### # this also works, but you'll need to make two selections,
### # plus enter data in the console twice:
### DAT <- readHFDweb()
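For the batch case raised in the example comments, one possible approach is to read credentials from environment variables rather than typing them. This is only a sketch of that idea: the helper `get_credential()` and the variable names `HFD_USER` and `HFD_PASSWORD` are hypothetical, not something the package defines.

```r
# Hypothetical helper and variable names, for illustration only.
# Set the variables outside the script (e.g. in ~/.Renviron) so the
# credentials never appear in the code itself.
get_credential <- function(var) {
  value <- Sys.getenv(var, unset = NA_character_)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", var, " is not set.")
  }
  value
}

# Usage, left commented out because it needs a registered HFD account:
# DAT <- readHFDweb("JPN", "tfrRR",
#                   username = get_credential("HFD_USER"),
#                   password = get_credential("HFD_PASSWORD"))
```

Failing loudly when a variable is missing avoids falling through to an interactive prompt that a batch job could not answer.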
readHMD() reads a standard HMD .txt table as a data.frame
This calls read.table() with all the necessary defaults to avoid annoying surprises. The Age column is also stripped of "+" and converted to integer, and a logical indicator column called OpenInterval is added to show where these were located. If the file contains population counts, values are split into two columns for Jan 1 and Dec 31 of the year. Output is invisibly returned, so you must assign it to take a look. This is to avoid lengthy console printouts.
readHMD(filepath, fixup = TRUE, ...)
filepath |
path or connection to the HMD text file, including .txt suffix. |
fixup |
logical. Should columns be made more user-friendly, e.g., forcing Age to be integer? |
... |
other arguments passed to |
Population counts in the HMD typically refer to Jan 1st. The exception is years in which a territorial adjustment has been accounted for in estimates. For such years, 'YYYY-' refers to Dec 31 of the year before the adjustment, and 'YYYY+' refers to Jan 1 directly after the adjustment (adjustments are always made Jan 1st). In the data, it will just look like two different estimates for the same year, but in fact it is a definition change or similar. In order to remove headaches from potential territorial adjustments in the data, we simply create two columns, one for January 1st (e.g., "Female1") and another for Dec 31st (e.g., "Female2"). One can recover the adjustment coefficient for each year by taking the ratio $$V_x = P_1(t+1) / P_2(t)$$ where $P_1$ is the January 1st column and $P_2$ is the December 31st column. In most years this will be 1, but in adjustment years there is a difference. This must always be accounted for when calculating rates and exposures. Argument fixup is outsourced to HMDparse().
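The ratio can be computed directly from the two population columns. The sketch below uses a made-up table of yearly totals for one sex, with a single artificial adjustment. Real HMD population files are by age, so the same ratio would be taken within each age.

```r
# Toy totals for one sex; all numbers are made up.
Pop <- data.frame(
  Year    = 2000:2003,
  Female1 = c(1000, 1010, 1122, 1133),  # Jan 1 of Year
  Female2 = c(1010, 1020, 1133, 1144)   # Dec 31 of Year
)
n <- nrow(Pop)

# Ratio P1(t + 1) / P2(t): Jan 1 of the next year over Dec 31 of this year.
V <- Pop$Female1[-1] / Pop$Female2[-n]
names(V) <- Pop$Year[-n]
V
```

In this toy table the coefficient differs from 1 only for the entry labelled 2001, which marks the artificial adjustment made between the end of 2001 and the start of 2002.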
data.frame of standard HMD output, except the Age column has been cleaned, and a new open age indicator column has been added. If the file is Population.txt or Population5.txt, there will be two columns each for males and females.
function written by Tim Riffe.
This is a basic HMD data grabber, based on Carl Boe's original HMD2R(). It will only grab a single HMD statistical product from a single country. Some typical R pitfalls are removed: The Age column is coerced to integer, while an AgeInterval column is created. Also Population counts are placed into two columns, for Jan. 1st and Dec. 31 of the same year, so as to remove headaches from population universe adjustments, such as territorial changes. Fewer options means less to break. To do more sophisticated data extraction, iterate over country codes or statistical items. Reformatting can be done outside this function using, e.g., long2mat(). Argument fixup is outsourced to HMDparse().
readHMDweb(CNTRY, item, username, password, fixup = TRUE)
CNTRY |
character. HMD population letter code. If not spelled right, or not specified, the function provides a selection list. Only 1. |
item |
character. The statistical product you want, e.g., |
username |
character. Your HMD user id, usually the email address you registered with the HMD under. If left blank, you'll be prompted. Do that if you don't mind the typing and prefer not to save your username in your code. |
password |
character. Your HMD password. If left blank, you'll be prompted. Do that if you don't mind the typing and prefer not to save your password in your code. |
fixup |
logical. Should columns be made more user-friendly, e.g., forcing Age to be integer? |
This function points to the new HMD website (from June 2022) rather than the mirror of the old site that it temporarily pointed to. If your credentials fail, a likely reason is that you need to re-register at the new HMD website https://www.mortality.org/Account/UserAgreement. As soon as you register, your new credentials should work.
data.frame of the HMD product, read as readHMD() would read it.
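Since the function grabs one product for one country at a time, iterating over country codes means calling it repeatedly and stacking the results. A minimal sketch of that pattern follows. `stack_by_id()` and `fake_reader()` are hypothetical names, and the stand-in reader returns made-up values so the example runs without credentials or a network connection.

```r
# Hypothetical helper: call `reader` once per id and stack the results,
# tagging each chunk with the id it came from.
stack_by_id <- function(ids, reader, id_col = "CNTRY") {
  pieces <- lapply(ids, function(id) {
    Dat <- reader(id)
    Dat[[id_col]] <- id
    Dat
  })
  do.call(rbind, pieces)
}

# Stand-in reader with made-up values so the sketch runs offline.
fake_reader <- function(id) {
  data.frame(Year = 2000:2001, Age = 0L, mx = c(0.010, 0.009))
}
LT <- stack_by_id(c("AAA", "BBB"), fake_reader)

# With real credentials, the reader could wrap readHMDweb(), e.g.:
# LT <- stack_by_id(c("SWE", "JPN"), function(cntry) {
#   readHMDweb(cntry, "mltper_1x1", username = myusername, password = mypassword)
# })
```

Adding the identifier column inside the loop is what keeps the stacked rows distinguishable afterwards.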
JMD data are formatted exactly as HMD data. This function simply parses the necessary url together given a prefecture code and data item (same nomenclature as HMD). Data is parsed using HMDparse(), which converts columns into useful and intuitive classes, for ready-use. See ?HMDparse for more information on type conversions. No authentication is required for this database. Only a single item/prefecture is downloaded. Loop for more complex calls (see examples). The prefID is not appended as a column, so be mindful of this if appending several items together into a single data.frame. Note that at the time of this writing, the finest Lexis resolution for prefectural lifetables is 5x5 (5-year, 5-year age groups). Raw data are, however, provided in 1x1 format, and deaths are also available in triangles.
readJMDweb(prefID = "01", item = "Deaths_5x5", fixup = TRUE, ...)
prefID |
a single prefID 2-digit character string, ranging from |
item |
the statistical product you want, e.g., |
fixup |
logical. Should columns be made more user-friendly, e.g., forcing Age to be integer? |
... |
extra arguments ultimately passed to |
No details of note. This database is independently maintained, so file types/locations are subject to change. If this happens, please notify the package maintainer.
data.frame of the data item is invisibly returned
## Not run:
library(HMDHFDplus)
# grab prefecture codes (including All Japan)
prefectures <- getJMDprefectures()
# grab all mltper_5x5
# and stick into long data.frame:
mltper <- do.call(rbind, lapply(prefectures, function(prefID){
  Dat <- readJMDweb(prefID = prefID, item = "mltper_5x5", fixup = TRUE)
  Dat$PrefID <- prefID
  Dat
}))
## End(Not run)
This is useful for asking the user for a username or password, so that it goes directly to a variable and doesn't get inadvertently saved into an R script. This will only return a character string. This is low key, so don't bother using it for data entry. Just type characters, with no need to put them in quotes; pressing enter will cause the function to return. Output will not be printed to the console, but it can be assigned directly. This is useful to have as an auxiliary function in case multiple calls to functions such as readHMDweb() are desired.
userInput(silent = FALSE)
silent |
logical. Should a little prompt be given, telling the user to enter text in the console? |
a character string, as given by the user.
### mypassword <- userInput()
### myusername <- userInput()
### DAT <- readHMDweb("USA", "mltper_1x1", username = myusername, password = mypassword)