Title: | Death Registration Coverage Estimation |
---|---|
Description: | A set of three two-census methods to estimate the degree of death registration coverage for a population. Implemented methods include the Generalized Growth Balance method (GGB), the Synthetic Extinct Generation method (SEG), and a hybrid of the two, GGB-SEG. Each method offers automatic estimation, but users may also specify exact parameters or use a graphical interface to guess parameters in the traditional way if desired. |
Authors: | Tim Riffe, Everton Lima, Bernardo Queiroz |
Maintainer: | Tim Riffe <[email protected]> |
License: | GPL-2 |
Version: | 1.0-0 |
Built: | 2024-10-31 21:17:52 UTC |
Source: | CRAN |
$cod column if missing
Only handles the case of a missing $cod splitting variable for data of a single year/region. This is not very robust: if you have many regions or other subgroups, add the column yourself. This function was written only to make ggb() robust to the case of a user specifying data that have no territorial or other subgroups, aside from sex.
addcod(X)
X |
a |
X with a new column, $cod
appended.
A utility function called by ggbChooseAges(). After clicking a point, this function readjusts the age range.
adjustages(a, age, agesfit)
a |
an age specified by the user, as returned by |
age |
ages present in dataset |
agesfit |
the former age range used for calculating the coverage coefficient |
the adjusted set of ages used for calculating the coverage coefficient
NoteCode to
One property of the LexisDB scripts that might be useful for downstream checks is the ability to trace which functions have modified a given data object. These can go into NoteCode slots. This function writes code to the first unoccupied NoteCode column. If all three NoteCode columns are occupied, it concatenates code to the end of the third column. This way we preserve a full history, although it gets split between columns. It is good for eyeballing. This function was written for the sake of modularity and copied directly from the Human Mortality Database collection as-is.
assignNoteCode(X, code)
X |
the HMD data object that presumably has three |
code |
character string to assign to the column, typically the name of the function operating on |
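The bookkeeping described above can be sketched in base R as follows. This is an illustration only, not the HMD code: the function name assign_note_code_sketch, the column names NoteCode1 to NoteCode3, and the rule that a column counts as unoccupied when it is entirely NA are all assumptions.

```r
# Hypothetical sketch: write `code` into the first unoccupied NoteCode column.
# Assumed column names: NoteCode1, NoteCode2, NoteCode3.
# Assumed meaning of "unoccupied": every value in the column is NA.
assign_note_code_sketch <- function(X, code) {
  cols  <- c("NoteCode1", "NoteCode2", "NoteCode3")
  empty <- vapply(cols, function(nm) all(is.na(X[[nm]])), logical(1))
  if (any(empty)) {
    # fill the first free column
    X[[cols[which(empty)[1]]]] <- code
  } else {
    # all three occupied: append to the end of the third column
    X[["NoteCode3"]] <- paste(X[["NoteCode3"]], code)
  }
  X
}

X <- data.frame(Age = 0:2,
                NoteCode1 = NA_character_,
                NoteCode2 = NA_character_,
                NoteCode3 = NA_character_,
                stringsAsFactors = FALSE)
for (code in c("a", "b", "c", "d")) X <- assign_note_code_sketch(X, code)
X
```

After four calls the fourth code is appended to the third column, so the full history is kept even though it is split across columns.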
Ideally, deaths is the average annual number of deaths in the intercensal period, but it is also common to give it as the sum over the period. If that is the case, set deaths.summed to TRUE and the conversion is handled automatically.
avgDeaths(codi, deaths.summed = FALSE)
codi |
the standard object as described in e.g. |
deaths.summed |
logical. If |
codi, with a new column, deathsAvg, appended.
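A minimal sketch of the adjustment described above, assuming that summed deaths are turned into annual averages by dividing by the length of the intercensal period in years. The helper avg_deaths_sketch and its interval_years argument are hypothetical; avgDeaths() itself works on the codi object and may compute the interval differently.

```r
# Hypothetical helper: convert intercensal death totals to annual averages.
avg_deaths_sketch <- function(deaths, interval_years, deaths.summed = FALSE) {
  if (deaths.summed) {
    # totals over the whole period: divide by the period length in years
    deaths / interval_years
  } else {
    # already annual averages: return unchanged
    deaths
  }
}

total_deaths <- c(1200, 300, 450)   # made-up totals over a 10-year period
avg_deaths_sketch(total_deaths, interval_years = 10, deaths.summed = TRUE)
```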
A dataset containing 486 rows and 7 variables: Population counts for 1991 and 2000 in abridged ages 0, 1, 5, ... 75, with an open age of 80. Deaths are given as the average death count per age group over the intercensal period. In total there are 53 states in this dataset.
BrasilFemales
A data frame with 53940 rows and 10 variables:
integer an id number for each state
integer the census population count in 1991
integer the census population count in 2000
numeric average deaths between censuses
integer 1991
integer 2000
integer lower age bound for each age group
character, 'f'
data downloaded from DATASUS http://www.datasus.gov.br
A dataset containing 486 rows and 7 variables: Population counts for 1980 and 1991 in abridged ages 0, 1, 5, ... 75, with an open age of 80. Deaths are given as the average death count per age group over the intercensal period. In total there are 53 states in this dataset.
BrasilMales
A data frame with 53940 rows and 10 variables:
integer an id number for each state
integer the census population count in 1991
integer the census population count in 2000
numeric average deaths between censuses
integer 1991
integer 2000
integer lower age bound for each age group
character, 'm'
data downloaded from DATASUS http://www.datasus.gov.br
function from now-deprecated demogR
package. Originally written by Ken Wachter, modified by James Jones, and again by the current maintainer, Tim Riffe. Only minor edits to margin naming in the current version.
cdmltw(sex = "F")
sex |
|
Tons of lifetable output in matrices. Age in columns, lifetable levels in rows.
Estimate the generalized growth balance method, and the two Bennett-Horiuchi methods of estimating death registration coverage. This requires two censuses and an estimate of the deaths in each 5-year age group between censuses. This might be the arithmetic average of deaths in each age class, or simply the average of deaths around the time of the two censuses. All methods use some stable population assumptions.
ddm(X, minA = 15, maxA = 75, minAges = 8, exact.ages = NULL, eOpen = NULL, deaths.summed = FALSE)
X |
|
minA |
the lowest age to be included in search |
maxA |
the highest age to be included in search (the lower bound thereof) |
minAges |
the minimum number of adjacent ages to be used in estimating |
exact.ages |
optional. A user-specified vector of exact ages to use for coverage estimation |
eOpen |
optional. A user-specified value for remaining life-expectancy in the open age group. |
deaths.summed |
logical. is the deaths column given as the total per age in the intercensal period ( |
All methods require some specification about which age range to base results on. If not given, an optimal age range will be estimated automatically, and this information is returned to the user. To identify an age range visually, see plot.ggb()
, when working with a single year/sex/region of data. The automatic age-range determination feature of this function tries to implement an intuitive way of picking ages that follows the advice typically given for doing so visually. We minimize the root mean squared error (RMSE), that is, the square root of the average squared residual between the fitted line and right term. If you want coverage estimates for a variety of partitions (intercensal periods/regions/by sex), then stack them, and use a variable called $cod
with unique values for each data partition. If data is partitioned using the variable $cod
, then the age range automatically determined might not be the same for each partition. If user-specified (using a vector of exact.ages), the age ranges will be the same for all partitions. If you want to specify particular age ranges for each data partition, then you will need to loop over the partitions yourself.
All three methods require time points of the two censuses. Census dates can be given in a variety of ways: 1) (preferred) using Date
classes, and column names $date1
and $date2
(or an unambiguous character string of the date, like, "1981-05-13"
) or 2) by giving column names "day1","month1","year1","day2","month2","year2"
containing respective integers. If only year1
and year2
are given, then we assume January 1 dates. If year and month are given, then we assume dates on the first of the month. Different values of $cod
could indicate sexes, regions, intercensal periods, etc. The $deaths
column should refer to the average annual deaths for each age class in the intercensal period. Sometimes one uses the arithmetic average of recorded deaths in each age, or simply the average of the deaths around the time of census 1 and census 2.
The synthetic extinct generation methods require an estimate of remaining life expectancy in the open age group of the data provided. This is produced using a standard reference to the Coale-Demeny West model life tables. That is a place where things can be improved.
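The stacking of partitions described above can be sketched in base R. The data values and the column names pop1 and pop2 below are placeholders chosen for illustration, not a statement of the column names the package requires; only the use of a cod column with one unique value per partition is taken from the text.

```r
# Two made-up partitions with identical columns (values are illustrative only).
males <- data.frame(age    = c(0, 5, 10),
                    pop1   = c(100, 90, 80),    # placeholder name: census 1 counts
                    pop2   = c(110, 95, 85),    # placeholder name: census 2 counts
                    deaths = c(5, 1, 1),
                    year1  = 1997,
                    year2  = 2007,
                    sex    = "m",
                    stringsAsFactors = FALSE)
females <- transform(males, sex = "f")

# One unique cod value per partition, then stack.
males$cod   <- 1L
females$cod <- 2L
stacked <- rbind(males, females)

# Estimates are then produced per value of cod,
# e.g. one row per partition in the returned data.frame.
table(stacked$cod)
```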
data.frame with columns $cod
, $ggb
, $bh1
, $bh2
, $lower
, and $upper
.
Bennett Neil G, Shiro Horiuchi. Estimating the completeness of death registration in a closed population. Population Index. 1981; 1:207-221.
Hill K. Estimating census and death registration completeness. Asian and Pacific Population Forum. 1987; 1:1-13.
Hill K, You D, Choi Y. Death distribution methods for estimating adult mortality: sensitivity analysis with simulated data errors. Demographic Research. 2009; 21:235-254.
Brass, William, 1975. Methods for Estimating Fertility and Mortality from Limited and Defective Data, Carolina Population Center, Laboratory for Population Studies, University of North Carolina, Chapel Hill.
Preston SH, Coale AJ, Trussell J, Weinstein M. Estimating the completeness of reporting of adult deaths in populations that are approximately stable. Population Index. 1980; 46(2):179-202.
# The Mozambique data
res <- ddm(Moz)
head(res)
# The Brasil data
BM <- ddm(BrasilMales)
BF <- ddm(BrasilFemales)
head(BM)
head(BF)
produce a dot plot, where each x position is a unique value of $cod
, and points indicate the GGB, SEG, GGB-SEG, and harmonic mean of these. Feed this function the output of ddm()
.
ddmplot(X, ...)
X |
output of |
... |
other arguments passed to |
called for its graphical device side-effects.
# just a rough sketch of the results!
res <- ddm(Moz)
ddmplot(res)
Since death distribution methods are primarily used in adult ages, it's OK to chop off the irregular infant and child age intervals [0,1), [1,5). Further, if high ages are in different intervals this might also be a non-issue. In principle, the user should set MinAge
and MaxAge
to the same values used in the death distribution methods. Here we have some defaults that should almost always return the result 5
for standard abridged data, or 1
for single age data. Really there are not any other common age-specifications, but it is best to identify these and be explicit about them. We return a warning and NA
if more than one age interval is used. It is assumed that ages refer to the lower bounds of age intervals, as is the standard in demography.
detectAgeInterval(Dat, MinAge = 5, MaxAge = 70, ageColumn = "Age")
Dat |
a |
MinAge |
integer ignore ages below this age. |
MaxAge |
integer ignore ages above this age. |
ageColumn |
character string giving the name of the Age column |
integer the age interval. NA
if this is not unique.
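The detection rule described above can be sketched as follows: keep the ages between MinAge and MaxAge, take the differences between successive lower bounds, and return the width only if it is unique. The function detect_age_interval_sketch is a hypothetical re-implementation for illustration and may differ from the package's own logic.

```r
# Hypothetical re-implementation for illustration only.
detect_age_interval_sketch <- function(Dat, MinAge = 5, MaxAge = 70, ageColumn = "Age") {
  ages   <- sort(unique(Dat[[ageColumn]]))
  ages   <- ages[ages >= MinAge & ages <= MaxAge]   # drop irregular young and old ages
  widths <- unique(diff(ages))                      # widths between lower bounds
  if (length(widths) != 1) {
    warning("more than one age interval detected")
    return(NA_integer_)
  }
  as.integer(widths)
}

abridged <- data.frame(Age = c(0, 1, seq(5, 80, by = 5)))
single   <- data.frame(Age = 0:90)
detect_age_interval_sketch(abridged)   # 5
detect_age_interval_sketch(single)     # 1
```

Lowering MinAge to 0 for the abridged data brings the intervals of width 1 and 4 into play, so the sketch warns and returns NA.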
The column name can be "sex"
or "Sex"
and nothing else. If coded with integers, the number 1 is recognized as male and the numbers 0, 2, or 6 are assumed to be female. Any other integer will throw an error. If character, then if the first letter is "f"
, then we assume female, and if the first letter is "m"
we assume male. Case does not matter. Anything else will throw an error. This function allows for just a little flexibility.
detectSex(Dat, sexColumn = "Sex")
Dat |
a |
sexColumn |
character string giving the name of the Sex column |
either "f"
or "m"
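The coding rules above can be sketched in base R. Unlike detectSex(), which takes a data.frame and a column name, the hypothetical helper detect_sex_sketch below works on a plain vector and recodes each element; it is an illustration of the rules, not the package's code.

```r
# Hypothetical helper: recode a sex vector to "m" / "f" following the rules above.
detect_sex_sketch <- function(x) {
  if (is.numeric(x)) {
    # 1 is male; 0, 2, or 6 are female; anything else is unrecognized
    out <- ifelse(x == 1, "m", ifelse(x %in% c(0, 2, 6), "f", NA_character_))
  } else {
    # character: look only at the first letter, ignoring case
    first <- tolower(substr(as.character(x), 1, 1))
    out   <- ifelse(first %in% c("f", "m"), first, NA_character_)
  }
  if (anyNA(out)) stop("sex coding not recognized")
  out
}

detect_sex_sketch(c(1, 2, 6, 0))          # "m" "f" "f" "f"
detect_sex_sketch(c("Male", "FEMALE"))    # "m" "f"
```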
This calculation is based on an indirect method to reference the Coale-Demeny West model life table. First one makes a pseudo life table deaths column using some stable pop assumptions (different in SEG vs GGB-SEG). Then take the ratio of the sum of ages 10-39 to 40-59. These ratios have been worked out for each model life table level, so we can pick the level based on the ratio we produce from the data. From there, we just pick out the remaining life expectancy that corresponds to the top age in our data, which for now hopefully is not higher than 95. The model life tables do not go higher than 95 for now, but that's well beyond the range for this method family. If your data go beyond 85 or so, then just group down to 85, say, and estimate using that instead of keeping a high open age. Called by segMakeColumns()
and ggbsegMakeColumns()
, and not intended for direct user interface, because you need to produce the $deathsLT
column. You can skip calling this function by specifying eOpen
in the top call to seg()
or ggbseg()
.
eOpenCD(codiaugmented)
codiaugmented |
the standard codi object being passed through the chain, but having been preprocessed in the course of |
numeric an estimate of remaining life expectancy in the open age group
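The first step of the lookup described above, the ratio of pseudo life table deaths at ages 10-39 to those at ages 40-59, can be sketched as below. The helper death_ratio_sketch is hypothetical, ages are assumed to be lower bounds of 5-year groups, and the later step of matching the ratio to a Coale-Demeny West level is not shown.

```r
# Hypothetical helper: ratio of deaths in ages 10-39 to deaths in ages 40-59.
# `age` holds lower bounds of age groups, `deathsLT` the pseudo life table deaths.
death_ratio_sketch <- function(age, deathsLT) {
  young <- age >= 10 & age <= 39   # groups starting at 10, 15, ..., 35
  old   <- age >= 40 & age <= 59   # groups starting at 40, 45, 50, 55
  sum(deathsLT[young]) / sum(deathsLT[old])
}

age      <- seq(0, 80, by = 5)
deathsLT <- rep(1, length(age))    # made-up flat deaths, for illustration only
death_ratio_sketch(age, deathsLT)  # 6 groups over 4 groups = 1.5
```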
We still require two year columns, year1
and year2
, at a minimum. If this function is called, and if month and day columns are missing, we add these columns, with values of 1. If date columns are given, then these must be either of class Date or in an unambiguous character format ("YYYY-MM-DD"
, e.g. "2016-05-30"
is unambiguous). Date columns will override the presence of other year, month, day columns.
fakeDates(X)
X |
a |
X the same data.frame
, possibly with columns for year, month, or day added.
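A minimal sketch of the defaulting rule described above, assuming only that missing month and day columns are filled with 1. The helper fill_dates_sketch is hypothetical and does not reproduce the handling of date1 and date2 columns.

```r
# Hypothetical helper: add month/day columns with value 1 when they are missing.
fill_dates_sketch <- function(X) {
  stopifnot(all(c("year1", "year2") %in% names(X)))
  for (nm in c("month1", "day1", "month2", "day2")) {
    if (!nm %in% names(X)) X[[nm]] <- 1L   # default to the first of the month/year
  }
  X
}

X <- data.frame(year1 = 1997, year2 = 2007, age = c(0, 5, 10))
X <- fill_dates_sketch(X)

# With all three parts present, a Date can be built for each census.
date1 <- as.Date(sprintf("%04d-%02d-%02d", X$year1, X$month1, X$day1))
date2 <- as.Date(sprintf("%04d-%02d-%02d", X$year2, X$month2, X$day2))
as.numeric(date2[1] - date1[1]) / 365.25   # approximate interval length in years
```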
Given two censuses and an average annual number of deaths in each age class between censuses, we can use stable population assumptions to estimate the degree of underregistration of deaths. The method is based on finding a best-fitting linear relationship between two modeled parameters (right term and left term), but the fit, and resulting coverage estimate, depend on exactly which age range is taken. This function either finds a nice age range for you automatically, or you can specify an exact vector of ages.
ggb(X, minA = 15, maxA = 75, minAges = 8, exact.ages = NULL, deaths.summed = FALSE)
X |
|
minA |
the lowest age to be included in search |
maxA |
the highest age to be included in search (the lower bound thereof) |
minAges |
the minimum number of adjacent ages to be used in estimating |
exact.ages |
optional. A user-specified vector of exact ages to use for coverage estimation |
deaths.summed |
logical. is the deaths column given as the total per age in the intercensal period ( |
Census dates can be given in a variety of ways: 1) using Date classes, and column names $date1
and $date2
(or an unambiguous character string of the date, like, "1981-05-13"
) or 2) by giving column names "day1","month1","year1","day2","month2","year2"
containing integers. If only year1
and year2
are given, then we assume January 1 dates. If year and month are given, then we assume dates on the first of the month. If you want coverage estimates for a variety of intercensal periods/regions/by sex, then stack them, and use a variable called $cod
with unique values for each data chunk. Different values of $cod
could indicate sexes, regions, intercensal periods, etc. The $deaths
column should refer to the average annual deaths for each age class in the intercensal period. Sometimes one uses the arithmetic average of recorded deaths in each age, or simply the average of the deaths around the time of census 1 and census 2. To identify an age-range in the traditional visual way, see ggbChooseAges()
, when working with a single year/sex/region of data. The automatic age-range determination feature of this function tries to implement an intuitive way of picking ages that follows the advice typically given for doing so visually. We minimize the root mean squared error (RMSE), that is, the square root of the average squared residual between the fitted line and right term.
a data.frame
with columns for the coverage coefficient $coverage
, the minimum $lower
and maximum $upper
of the age range on which it is based. $a
and $b
give the intercept and slope of the line on which the coverage estimate is based. $delta
, $k1
, and $k2
are further derived quantities that may be interesting for advanced users. Rows indicate data partitions, as indicated by the optional $cod
variable.
Hill K. Estimating census and death registration completeness. Asian and Pacific Population Forum. 1987; 1:1-13.
Brass, William, 1975. Methods for Estimating Fertility and Mortality from Limited and Defective Data, Carolina Population Center, Laboratory for Population Studies, University of North Carolina, Chapel Hill.
# The Mozambique data
res <- ggb(Moz)
res
# The Brasil data
BM <- ggb(BrasilMales)
BF <- ggb(BrasilFemales)
head(BM)
head(BF)
In a spreadsheet one would typically set up the GGB method to produce a plot that updates as the user changes the age range. This function implements that kind of work flow. This will be intuitive for spreadsheet users, but it does not scale well. Imagine you have 200 territorial units, then you would not want to repeat this task. ggb()
does the same thing automatically. You can compare the age range you select manually with the one given back by ggb()
as a diagnostic, for instance. To set up the plot device, just give a single year/region/sex of data. By default it will give the RMSE-optimized age range to start with, but you can specify a vector of exact ages to use as well. All points are plotted, with a fitted line that has been set to a subset of the points, which is plotted in a different color. You can click any point to change the age range, and the plot updates accordingly, up to a maximum of 15 clicks so you don't waste your time. You can stop the plot by either clicking on the graphics device outside the plot area or clicking out the 15 tries (or more if you increase maxit
).
ggbChooseAges(codi, minA = 15, maxA = 75, minAges = 8, exact.ages = NULL, maxit = 15, deaths.summed = FALSE)
codi |
|
minA |
the lowest age to be included in search |
maxA |
the highest age to be included in search (the lower bound thereof) |
minAges |
the minimum number of adjacent ages to be used in estimating |
exact.ages |
optional. A user-specified vector of exact ages to use for coverage estimation. |
maxit |
the maximum number of clicks you can take. Default 15. |
deaths.summed |
logical. is the deaths column given as the total per age in the intercensal period ( |
If you want to send the results of this into ggb()
, you can do so by setting exact.ages
to seq(lower,upper,by=5)
, where $lower
, and $upper
are the results returned from ggbChooseAges()
after you're done manually determining the age range.
data.frame
containing elements $coverage
, $lower
, $upper
, and ages
.
## Not run:
# for interactive sessions only
# *click points to adjust age range used (yellow)
# *click in margin to stop and return coverage results
ggbChooseAges(Moz)
## End(Not run)
For a single year/sex/region of data (formatted as required by ggb()
), what is the registration coverage implied by a given age range? Called by ggbcoverageFromYear()
and ggbChooseAges()
.
ggbcoverageFromAges(codi, agesfit, deaths.summed = FALSE)
codi |
a chunk of data (single sex, year, region, etc) with all columns required by |
agesfit |
an integer vector of ages, either returned from |
deaths.summed |
logical. is the deaths column given as the total per age in the intercensal period ( |
numeric. the estimated level of coverage.
Given two censuses and an average annual number of deaths in each age class between censuses, we can use stable population assumptions to estimate the degree of underregistration of deaths. The method is based on finding a best-fitting linear relationship between two modeled parameters (right term and left term), but the fit, and resulting coverage estimate, depend on exactly which age range is taken. This function either finds a nice age range for you automatically, or you can specify an exact vector of ages. Called by ggb()
. Users probably don't need to call this directly. Just use ggb()
instead.
Census dates can be given in a variety of ways: 1) using Date classes, and column names $date1
and $date2
(or an unambiguous character string of the date, like, "1981-05-13"
) or 2) by giving column names "day1","month1","year1","day2","month2","year2"
containing integers. If only year1
and year2
are given, then we assume January 1 dates. If year and month are given, then we assume dates on the first of the month.
ggbcoverageFromYear(codi, exact.ages = NULL, minA = 15, maxA = 75, minAges = 8, deaths.summed = FALSE)
codi |
|
exact.ages |
optional. use an exact set of ages to estimate coverage. |
minA |
the minimum of the age range searched. Default 15 |
maxA |
the maximum of the age range searched. Default 75 |
minAges |
the minimum number of adjacent ages needed as points for fitting. Default 8 |
deaths.summed |
logical. is the deaths column given as the total per age in the intercensal period ( |
a data.frame
with columns for the coverage coefficient, and the min and max of the age range on which it is based.
Called by ggbChooseAges()
and ggbcoverageFromYear()
. This simply modularizes some code that would otherwise be repeated. Users probably don't need to call this function directly. If columns produced by ggbMakeColumns()
are not present, then we call it here just to keep things from breaking.
ggbFittedFromAges(codi, agesfit, deaths.summed = FALSE)
codi |
a chunk of data (single sex, year, region, etc) with all columns required by |
agesfit |
an a priori set of ages for which to calculate the fit |
deaths.summed |
logical. is the deaths column given as the total per age in the intercensal period ( |
codi, with many columns added, most importantly $rightterm
, $leftterm
, and $exclude
.
Called by ggbcoverageFromYear()
whenever exact.ages
are not given. This automates what one typically does visually.
ggbgetAgesFit(codi, minA = 15, maxA = 75, minAges = 8, deaths.summed = FALSE)
codi |
a chunk of data (single sex, year, region, etc) with all columns required by |
minA |
the lowest age to be included in search |
maxA |
the highest age to be included in search (the lower bound thereof) |
minAges |
the minimum number of adjacent ages to be used in estimating |
deaths.summed |
logical. is the deaths column given as the total per age in the intercensal period ( |
a vector of ages that minimizes the RMSE
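The automated search described above can be sketched as an exhaustive scan over contiguous age ranges. This is an illustration under stated assumptions, not the package's algorithm: the helper names are hypothetical, the points x and y stand in for the right and left terms, and an ordinary least squares line via lm() is swapped in for whatever line fit the package uses.

```r
# RMSE of an OLS line through the points (stand-in for the package's line fit).
rmse_of_range <- function(x, y) sqrt(mean(residuals(lm(y ~ x))^2))

# Hypothetical search: try every contiguous run of at least minAges ages
# between minA and maxA and keep the run with the smallest RMSE.
best_age_range_sketch <- function(age, x, y, minA = 15, maxA = 75, minAges = 8) {
  keep <- which(age >= minA & age <= maxA)
  best <- list(rmse = Inf, ages = NULL)
  for (i in seq_along(keep)) {
    for (j in seq_along(keep)) {
      if (j - i + 1 < minAges) next          # too few points (or j before i)
      idx <- keep[i:j]
      r   <- rmse_of_range(x[idx], y[idx])
      if (r < best$rmse) best <- list(rmse = r, ages = age[idx])
    }
  }
  best$ages
}

# Made-up points: exactly linear up to age 50, perturbed from age 55 onward.
age  <- seq(0, 80, by = 5)
x    <- age / 100
y    <- 0.1 + 1.2 * x
bump <- age >= 55
y[bump] <- y[bump] + c(0.05, -0.08, 0.12, -0.03, 0.09, 0.02)

best <- best_age_range_sketch(age, x, y)
best
```

With these made-up points the only run of eight or more ages that avoids the perturbed values is 15 to 50, so that is the range the scan settles on.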
Called by ggbgetAgesFit()
whenever the user does not want to manually determine the age range used to determine registration coverage. Probably no need to be called by top-level users. If a user would rather determine the optimal age range some other way, then look to ggbcoverageFromYear()
where ggbgetRMS
is called and add another condition or make it call something else.
ggbgetRMS(agesi, codi)
agesi |
the set of ages used for this iteration |
codi |
|
the RMSE
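For reference, the quantity returned can be sketched as the square root of the mean squared residual around a fitted line. The helper rmse_sketch is hypothetical, and the ordinary least squares fit from lm() is an assumption used for illustration; the package may fit its line differently.

```r
# Hypothetical helper: RMSE of the points around an OLS line.
rmse_sketch <- function(rightterm, leftterm) {
  fit <- lm(leftterm ~ rightterm)     # OLS used as a stand-in line fit
  sqrt(mean(residuals(fit)^2))        # root of the average squared residual
}

x <- 1:10
rmse_sketch(x, 2 + 0.5 * x)                      # essentially zero: points lie on a line
rmse_sketch(x, 2 + 0.5 * x + rep(c(-1, 1), 5))   # larger: points scatter around the line
```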
Called by ggbChooseAges()
and ggbcoverageFromYear()
. This simply modularizes some code that would otherwise be repeated. Users probably don't need to call this function directly.
ggbMakeColumns(codi, minA = 15, maxA = 75, deaths.summed = FALSE)
codi |
a chunk of data (single sex, year, region, etc) with all columns required by |
minA |
the minimum of the age range searched. Default 15 |
maxA |
the maximum of the age range searched. Default 75 |
deaths.summed |
logical. is the deaths column given as the total per age in the intercensal period ( |
codi, with many columns added, most importantly $rightterm
, $leftterm
, and $exclude
.
Given two censuses and an average annual number of deaths in each age class between censuses, we can use stable population assumptions to estimate the degree of underregistration of deaths. The method estimates age-specific degrees of coverage. The age pattern of these is assumed to be noisy, so we take the arithmetic mean over some range of ages. One may either specify a particular age-range, or let the age range be determined automatically. If the age-range is found automatically, this is done using the method developed for the generalized growth-balance method. Part of this method relies on a prior value for remaining life expectancy in the open age group. By default, this is estimated using a standard reference to the Coale-Demeny West model life table, although the user may also supply a value. The difference between this method and seg()
is that here we adjust census 1 part way through processing, based on some calculations similar to GGB.
ggbseg(X, minA = 15, maxA = 75, minAges = 8, exact.ages = NULL, eOpen = NULL, deaths.summed = FALSE)
X |
|
minA |
the lowest age to be included in search |
maxA |
the highest age to be included in search (the lower bound thereof) |
minAges |
the minimum number of adjacent ages to be used in estimating |
exact.ages |
optional. A user-specified vector of exact ages to use for coverage estimation |
eOpen |
optional. A user-specified value for remaining life-expectancy in the open age group. |
deaths.summed |
logical. Is the deaths column given as the total per age in the intercensal period ( |
Census dates can be given in a variety of ways: 1) using Date classes, and column names $date1
and $date2
(or an unambiguous character string of the date, like, "1981-05-13"
) or 2) by giving column names "day1","month1","year1","day2","month2","year2"
containing integers. If only year1
and year2
columns are given, then we assume January 1 dates. If year and month are given, then we assume dates on the first of the month. If you want coverage estimates for a variety of intercensal periods/regions/by sex, then stack them, and use a variable called $cod
with unique values for each data chunk. Different values of $cod
could indicate sexes, regions, intercensal periods, etc. The $deaths
column should refer to the average annual deaths in each age class in the intercensal period. Sometimes one uses the arithmetic average of recorded deaths in each age, or simply the average of the deaths around the time of census 1 and census 2. To identify an age-range in the traditional visual way, see plot.ggb()
, when working with a single year/sex/region of data. The automatic age-range determination feature of this function tries to implement an intuitive way of picking ages that follows the advice typically given for doing so visually. We minimize the root mean squared error (RMSE), that is, the square root of the average squared residual between the fitted line and right term. Finally, only specify eOpen
when working with a single region/sex/period of data, otherwise the same value will be passed in irrespective of mortality and sex.
a data.frame
with columns for the coverage coefficient $coverage
, and the minimum $lower
and maximum $upper
of the age range on which it is based. Rows indicate data partitions, as indicated by the optional $cod
variable.
Hill K. Methods for measuring adult mortality in developing countries: a comparative review. The global burden of disease 2000 in aging populations. Research paper; No. 01.13; 2001.
Hill K, You D, Choi Y. Death distribution methods for estimating adult mortality: sensitivity analysis with simulated data errors. Demographic Research. 2009; 21:235-254.
Preston SH, Coale AJ, Trussell J, Weinstein M. Estimating the completeness of reporting of adult deaths in populations that are approximately stable. Population Index. 1980; 46(2):179-202.
# The Mozambique data
res <- ggbseg(Moz)
res
# The Brasil data
BM <- ggbseg(BrasilMales)
BF <- ggbseg(BrasilFemales)
head(BM)
head(BF)
Given two censuses and an average annual number of deaths in each age class between censuses, we can use stable population assumptions to estimate the degree of underregistration of deaths. The method estimates age-specific degrees of coverage. The age pattern of these is assumed to be noisy, so we take the arithmetic mean over some range of ages. One may either specify a particular age-range, or let the age range be determined automatically. If the age-range is found automatically, this is done using the method developed for the generalized growth-balance method. Part of this method relies on a prior value for remaining life expectancy in the open age group. By default, this is estimated using a standard reference to the Coale-Demeny West model life table, although the user may also supply a value. The difference between this method and seg()
is that here we adjust census 1 part way through processing, based on some calculations similar to GGB. Called by ggbseg()
. Users probably do not need to use this function directly.
ggbsegCoverageFromYear(codi, minA = 15, maxA = 75, minAges = 8, exact.ages = NULL, eOpen = NULL, deaths.summed = FALSE)
codi |
|
minA |
the minimum of the age range searched. Default 15 |
maxA |
the maximum of the age range searched. Default 75 |
minAges |
the minimum number of adjacent ages needed as points for fitting. Default 8 |
exact.ages |
optional. use an exact set of ages to estimate coverage. |
eOpen |
optional. A user-specified value for remaining life-expectancy in the open age group. |
deaths.summed |
logical. is the deaths column given as the total per age in the intercensal period ( |
Census dates can be given in a variety of ways: 1) using Date classes, and column names $date1
and $date2
(or an unambiguous character string of the date, like, "1981-05-13"
) or 2) by giving column names "day1","month1","year1","day2","month2","year2"
containing integers. If only year1
and year2
are given, then we assume January 1 dates. If year and month are given, then we assume dates on the first of the month.
a data.frame
with columns for the coverage coefficient, and the min and max of the age range on which it is based.
Called by ggbsegCoverageFromYear()
. This simply modularizes some code that would otherwise be repeated. Users probably don't need to call this function directly.
ggbsegMakeColumns(codi, minA = 15, maxA = 75, agesFit, eOpen = NULL, deaths.summed = FALSE)
codi |
a chunk of data (single sex, year, region, etc) with all columns required by |
minA |
the minimum of the age range searched. Default 15 |
maxA |
the maximum of the age range searched. Default 75 |
agesFit |
vector of ages as passed in by |
eOpen |
optional. A value for remaining life expectancy in the open age group. |
deaths.summed |
logical. is the deaths column given as the total per age in the intercensal period ( |
agesFit
is a vector passed in from ggbsegCoverageFromYear()
, and it was either estimated using the GGB automatic method there, or simply came from the argument exact.ages
specified in ggbseg()
. By default we just automatically estimate these. eOpen
can be either user-specified, or it will be estimated automatically using eOpenCD()
.
codi, with many columns added, most importantly $Cx
.
We want 5-year age groups starting from 0. Standard abridged data has ages 0, 1, 5, and so on, so we need to group together ages 0 and 1, just for the sake of getting comparable results.
group01(X)
X |
standard input as required by |
X, with child ages grouped as necessary (or not)
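The grouping described above can be sketched for a single data partition as follows. The helper group01_sketch is hypothetical, and the column names age, pop1, pop2, and deaths are placeholders for illustration rather than the names the package requires.

```r
# Hypothetical helper: merge ages 0 and 1 into a single 0-4 group.
# Assumes one partition, with ages given as lower bounds.
group01_sketch <- function(X, ageColumn = "age",
                           countColumns = c("pop1", "pop2", "deaths")) {
  if (!all(c(0, 1) %in% X[[ageColumn]])) return(X)   # already 5-year groups
  i0 <- which(X[[ageColumn]] == 0)
  i1 <- which(X[[ageColumn]] == 1)
  for (nm in countColumns) {
    X[[nm]][i0] <- X[[nm]][i0] + X[[nm]][i1]          # add age 1 counts to age 0
  }
  X[-i1, , drop = FALSE]                              # drop the age 1 row
}

X <- data.frame(age    = c(0, 1, 5, 10),
                pop1   = c(10, 40, 50, 45),
                pop2   = c(12, 44, 52, 47),
                deaths = c(3, 1, 0.5, 0.4))
out <- group01_sketch(X)
out
```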
a utility function called by ggbChooseAges()
.
guessage(xvec, yvec, click, age)
xvec |
|
yvec |
|
click |
a point given by |
age |
ages present in dataset |
the age corresponding to the x,y pair of $rightterm
, $leftterm
closest to the point clicked.
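A minimal sketch of the nearest-point lookup, assuming plain Euclidean distance in data units and a click given as a list with elements x and y. The helper guess_age_sketch is hypothetical; the package may scale the axes differently before comparing distances.

```r
# Hypothetical helper: return the age whose plotted point is nearest the click.
guess_age_sketch <- function(xvec, yvec, click, age) {
  d2 <- (xvec - click$x)^2 + (yvec - click$y)^2   # squared distance to each point
  age[which.min(d2)]
}

xvec  <- c(0.01, 0.02, 0.03)
yvec  <- c(0.02, 0.03, 0.05)
age   <- c(15, 20, 25)
click <- list(x = 0.021, y = 0.029)   # made-up click near the second point
guess_age_sketch(xvec, yvec, click, age)   # 20
```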
This function will pick up "death"
, "deaths"
, "Death"
, or "Deaths"
(and maybe some others?) and rename it "deaths"
for easier internal usage.
guessDeathsColumn(X)
X | |
The same data.frame, returned, with the deaths column renamed as "deaths".
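One way to implement this kind of renaming is a case-insensitive match on the column names. The sketch below is an assumption about the approach, not the package's code, and guess_deaths_column_sketch is a hypothetical name.

```r
# Sketch only: rename a column called death/deaths (any case) to "deaths".
guess_deaths_column_sketch <- function(X) {
  hit <- grepl("^deaths?$", names(X), ignore.case = TRUE)
  # rename only when the match is unambiguous
  if (sum(hit) == 1) names(X)[hit] <- "deaths"
  X
}

X <- data.frame(age = c(0, 5), Deaths = c(10, 2))
names(guess_deaths_column_sketch(X))
```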
This is an internal utility function, to save on redundant lines of code. Not so useful for hand-processing.
headerPrep(X)
X | this is any codi-style |
a list of codi chunks (by intercensal period, region, etc), with standardized names, dates, etc.
These logical functions are like the usual ones, but NA values are treated as FALSE by default. This is not an exhaustive list, but these are the ones that speed our coding, and reduce code clutter. Functions copied from HMD collection directly as-is.
x %==% y
x %!=% y
x %>% y
x %<% y
x %>=% y
x %<=% y
x, y | any two vectors that can be logically compared. |
Note that one of these, %>%, has the same name as the pipe operator of the magrittr package, which makes this package incompatible with magrittr.
## Not run:
c(1,2,NA,4,5) == c(1,NA,3,4,NA)
# compare
c(1,2,NA,4,5) %==% c(1,NA,3,4,NA)
## End(Not run)
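For readers who want the NA-as-FALSE behavior without loading these operators, the idea can be sketched in a few lines. The operator names %eq% and %gt% and the helper na_false are hypothetical, chosen so that nothing from this package or magrittr is masked.

```r
# Sketch only: comparison operators that return FALSE where the usual
# result would be NA. Names are hypothetical, not the package's own.
na_false <- function(z) {
  z[is.na(z)] <- FALSE
  z
}
`%eq%` <- function(x, y) na_false(x == y)
`%gt%` <- function(x, y) na_false(x > y)

c(1, 2, NA, 4, 5) == c(1, NA, 3, 4, NA)    # TRUE NA NA TRUE NA
c(1, 2, NA, 4, 5) %eq% c(1, NA, 3, 4, NA)  # TRUE FALSE FALSE TRUE FALSE
```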
Check to see if a point clicked falls in the plot or outside it. This function is used by ggbChooseAges().
inUSR(USR, click)
USR | as given by |
click | a pairlist with elements |
logical. TRUE if in the plot region.
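A bounds check of this kind can be sketched as below, assuming USR has the layout of par("usr"), that is c(x1, x2, y1, y2), and that click is a list with elements x and y as returned by locator(1). Both shapes are assumptions, since the argument descriptions above are incomplete; in_usr_sketch is a hypothetical name.

```r
# Sketch only: is a clicked point inside the plotting region?
# Assumes USR = c(x1, x2, y1, y2) as in par("usr"),
# and click = list(x = ..., y = ...) as in locator(1).
in_usr_sketch <- function(USR, click) {
  click$x >= USR[1] & click$x <= USR[2] &
    click$y >= USR[3] & click$y <= USR[4]
}

in_usr_sketch(c(0, 1, 0, 1), list(x = 0.5, y = 0.5))
in_usr_sketch(c(0, 1, 0, 1), list(x = 1.5, y = 0.5))
```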
In order to remove the lubridate dependency, we self-detect leap years and adjust February accordingly.
isLeapYear(Year)
Year | integer of year to query |
logical. Is the Year a leap year or not?
Carl Boe
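Leap-year detection without a date library follows the Gregorian rule: divisible by 4, except century years not divisible by 400. The sketch below shows that rule and the February adjustment it implies; the function names are hypothetical and this is not the package's own code.

```r
# Sketch only: Gregorian leap-year rule, vectorized over Year.
is_leap_year_sketch <- function(Year) {
  Year <- as.integer(Year)
  (Year %% 4 == 0 & Year %% 100 != 0) | (Year %% 400 == 0)
}

# February length implied by the rule above
days_in_february_sketch <- function(Year) {
  ifelse(is_leap_year_sketch(Year), 29L, 28L)
}

is_leap_year_sketch(c(1900, 2000, 2004, 2007))  # FALSE TRUE TRUE FALSE
```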
A dataset containing 17 rows and 8 variables: Population counts for 1997 and 2007 in quinquennial age groups 0, 5, ... 75, with an open age of 80. Deaths are given as the average of the age-specific deaths in 1997 and 2007.
Moz
A data frame with 17 rows and 8 variables:
integer a column of 1s
integer the census population count in 1997
integer the census population count in 2007
integer average of 1997 and 2007 deaths
integer lower age bound for each age group
character "f" for female
integer 1997
integer 2007
Data courtesy of Bernardo Queiroz.
These methods are not intended to be applied to ages greater than, say, 90 or 95. Usually, we'd top out in the range 75 to 85. In any case, the Coale-Demeny life table implementation that we have only goes up to age 95, so there is a practical limitation to deriving a remaining life expectancy for the open age group. If a user tries to apply the Bennett-Horiuchi methods to data with higher open ages, the functions currently fail. So this function truncates the data at min(maxA,95), after having (optionally) grouped data down. This function needs to work with a single partition of data (intercensal period, sex, region, etc).
reduceOpen(X, maxA = 75, group = TRUE)
X | data formatted per the requirements of |
maxA | integer. Ignore ages above this age. |
group | logical. If |
X, with the open age having been reduced either with or without aggregation.
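The two ways of lowering the open age group (with or without aggregation) can be sketched as follows. The column names age, pop1, pop2, and deaths, the function name reduce_open_sketch, and the reading of group = TRUE as "sum everything at or above the new open age" are assumptions made for illustration.

```r
# Sketch only: lower the open age group to min(maxA, 95).
# Column names are assumed; X must be a single partition of data.
reduce_open_sketch <- function(X, maxA = 75, group = TRUE) {
  maxA <- min(maxA, 95)
  if (!group) {
    # simply drop ages above the new ceiling
    return(X[X$age <= maxA, , drop = FALSE])
  }
  # relabel higher ages as maxA, then sum counts within each age
  X$age <- pmin(X$age, maxA)
  aggregate(cbind(pop1, pop2, deaths) ~ age, data = X, FUN = sum)
}

X <- data.frame(age = seq(0, 85, by = 5), pop1 = 1, pop2 = 2, deaths = 3)
tail(reduce_open_sketch(X, maxA = 75, group = TRUE), 2)
```

With group = TRUE the new open group carries the summed counts of ages 75, 80, and 85; with group = FALSE those higher ages are discarded.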
Given two censuses and an average annual number of deaths in each age class between censuses, we can use stable population assumptions to estimate the degree of underregistration of deaths. The method estimates age-specific degrees of coverage. The age pattern of these is assumed to be noisy, so we take the arithmetic mean over some range of ages. One may either specify a particular age-range, or let the age range be determined automatically. If the age-range is found automatically, this is done using the method developed for the generalized growth-balance method. Part of this method relies on a prior value for remaining life expectancy in the open age group. By default, this is estimated using a standard reference to the Coale-Demeny West model life table, although the user may also supply a value.
seg(X, minA = 15, maxA = 75, minAges = 8, exact.ages = NULL, eOpen = NULL, deaths.summed = FALSE)
X | |
minA | the lowest age to be included in search |
maxA | the highest age to be included in search (the lower bound thereof) |
minAges | the minimum number of adjacent ages to be used in estimating |
exact.ages | optional. A user-specified vector of exact ages to use for coverage estimation |
eOpen | optional. A user-specified value for remaining life-expectancy in the open age group. |
deaths.summed | logical. is the deaths column given as the total per age in the intercensal period ( |
Census dates can be given in a variety of ways: 1) using Date classes, and column names $date1 and $date2 (or an unambiguous character string of the date, like "1981-05-13"), or 2) by giving column names "day1","month1","year1","day2","month2","year2" containing integers. If only year1 and year2 columns are given, then we assume January 1 dates. If year and month are given, then we assume dates on the first of the month. If you want coverage estimates for a variety of intercensal periods/regions/by sex, then stack them, and use a variable called $cod with a unique value for each data chunk. Different values of $cod could indicate sexes, regions, intercensal periods, etc. The $deaths column should refer to the average annual deaths in each age class in the intercensal period. Sometimes one uses the arithmetic average of recorded deaths in each age, or simply the average of the deaths around the time of census 1 and census 2. To identify an age-range in the traditional visual way, see plot.ggb(), when working with a single year/sex/region of data. The automatic age-range determination feature of this function tries to implement an intuitive way of picking ages that follows the advice typically given for doing so visually. We minimize the square of the average squared residual between the fitted line and right term. Finally, only specify eOpen when working with a single region/sex/period of data, otherwise the same value will be passed in irrespective of mortality and sex.
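As an illustration of the stacked input layout described above, the sketch below builds two data chunks and distinguishes them with $cod. All counts are made-up placeholders, the population column names pop1 and pop2 are assumptions, and make_chunk is a hypothetical helper; only $cod, $deaths, $date1, $date2, year1, and year2 are named in the text.

```r
# Sketch only: stack two data chunks and mark them with a unique $cod.
# Counts are made-up placeholders; pop1/pop2 are assumed column names.
make_chunk <- function(cod, sex) {
  data.frame(cod = cod, sex = sex,
             age = seq(0, 80, by = 5),
             pop1 = 1000, pop2 = 1100, deaths = 10,
             year1 = 1997, year2 = 2007,
             stringsAsFactors = FALSE)
}
stacked <- rbind(make_chunk(1, "f"), make_chunk(2, "m"))

# Alternatively, give explicit Date columns; with only year1 and year2
# the dates would be taken as January 1 anyway.
stacked$date1 <- as.Date(paste0(stacked$year1, "-01-01"))
stacked$date2 <- as.Date(paste0(stacked$year2, "-01-01"))

table(stacked$cod)
```

A data.frame shaped like this would then be passed as X, one coverage estimate being returned per value of $cod.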
a data.frame with columns for the coverage coefficient $coverage, and the minimum $lower and maximum $upper of the age range on which it is based. Rows indicate data partitions, as indicated by the optional $cod variable. $l25 ($u25) give the mean of the lower (upper) quartile of the distribution of age-specific coverage estimates.
Bennett, N. G. & Horiuchi, S. Estimating the completeness of death registration in a closed population. Population Index, 1981; 47(2): 207-221.
Preston, S. H., Coale, A. J., Trussell, J. & Weinstein, M. Estimating the completeness of reporting of adult deaths in populations that are approximately stable. Population Index, 1980; 46(2): 179-202.
# The Mozambique data
res <- seg(Moz)
res
# The Brasil data
BM <- seg(BrasilMales)
BF <- seg(BrasilFemales)
head(BM)
head(BF)
For a single year/sex/region of data (formatted as required by seg(), ggbseg()), what is the registration coverage implied by a given age range? Called by segCoverageFromYear() and ggbsegCoverageFromYear(). Here, the function simply takes the arithmetic mean of a given age range of $Cx, as returned by segMakeColumns() or ggbsegMakeColumns(). Not intended for top-level use.
segCoverageFromAges(codi, agesFit)
codi | a chunk of data (single sex, year, region, etc) with all columns required by |
agesFit | an integer vector of ages, either returned from |
numeric. the estimated level of coverage.
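The averaging step described above is short enough to sketch directly. The example assumes codi has an $age column of lower age bounds next to $Cx; the $Cx values are made up, and seg_coverage_from_ages_sketch is a hypothetical name.

```r
# Sketch only: coverage as the arithmetic mean of $Cx over agesFit.
# Assumes codi has columns age and Cx; the Cx values here are made up.
seg_coverage_from_ages_sketch <- function(codi, agesFit) {
  mean(codi$Cx[codi$age %in% agesFit])
}

codi <- data.frame(age = seq(0, 20, by = 5),
                   Cx  = c(0.5, 0.6, 0.8, 0.9, 1.0))
seg_coverage_from_ages_sketch(codi, agesFit = c(10, 15))
```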
Given two censuses and an average annual number of deaths in each age class between censuses, we can use stable population assumptions to estimate the degree of underregistration of deaths. The method estimates age-specific degrees of coverage. The age pattern of these is assumed to be noisy, so we take the arithmetic mean over some range of ages. One may either specify a particular age-range, or let the age range be determined automatically. If the age-range is found automatically, this is done using the method developed for the generalized growth-balance method. Part of this method relies on a prior value for remaining life expectancy in the open age group. By default, this is estimated using a standard reference to the Coale-Demeny West model life table, although the user may also supply a value. Called by seg(). Users probably do not need to use this function directly.
segCoverageFromYear(codi, minA = 15, maxA = 75, minAges = 8, exact.ages = NULL, eOpen = NULL, deaths.summed = FALSE)
codi | |
minA | the minimum of the age range searched. Default 15 |
maxA | the maximum of the age range searched. Default 75 |
minAges | the minimum number of adjacent ages needed as points for fitting. Default 8 |
exact.ages | optional. use an exact set of ages to estimate coverage. |
eOpen | optional. A user-specified value for remaining life-expectancy in the open age group. |
deaths.summed | logical. is the deaths column given as the total per age in the intercensal period ( |
Census dates can be given in a variety of ways: 1) using Date classes, and column names $date1 and $date2 (or an unambiguous character string of the date, like "1981-05-13"), or 2) by giving column names "day1","month1","year1","day2","month2","year2" containing integers. If only year1 and year2 are given, then we assume January 1 dates. If year and month are given, then we assume dates on the first of the month.
a data.frame with columns for the coverage coefficient, and the min and max of the age range on which it is based.
Called by segCoverageFromYear(). This simply modularizes some code that would otherwise be repeated. Users probably don't need to call this function directly.
segMakeColumns(codi, minA = 15, maxA = 75, eOpen = NULL, deaths.summed = FALSE)
codi | a chunk of data (single sex, year, region, etc) with all columns required by |
minA | the minimum of the age range searched. Default 15 |
maxA | the maximum of the age range searched. Default 75 |
eOpen | optional. A value for remaining life expectancy in the open age group. |
deaths.summed | logical. is the deaths column given as the total per age in the intercensal period ( |
codi, with many columns added, most importantly $Cx.
The SEG method works by averaging the coverage estimates over a range of ages. Users may wish to see the age pattern for diagnostic purposes.
segplot(X, minA = 15, maxA = 75, minAges = 8, exact.ages = NULL, eOpen = NULL, deaths.summed = FALSE, log = FALSE)
X | |
minA | the lowest age to be included in search |
maxA | the highest age to be included in search (the lower bound thereof) |
minAges | the minimum number of adjacent ages to be used in estimating |
exact.ages | optional. A user-specified vector of exact ages to use for coverage estimation |
eOpen | optional. A user-specified value for remaining life-expectancy in the open age group. |
deaths.summed | logical. is the deaths column given as the total per age in the intercensal period ( |
log | logical. should we log the y axis? |
All arguments are essentially the same as those given to seg().
Function called for its graphical side effects.
## Not run:
segplot(Moz)
## End(Not run)
Convert ages of the form 0,1,2,3,4,5,... into 0,1,1,1,1,5,...
single2abr(x)
x | vector of single ages (lower bound) a.k.a. completed age. |
vector of the same length indicating which abridged age group each single age belongs to (lower bound)
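The mapping from single ages to abridged age groups can be sketched with integer arithmetic: round each age down to a multiple of 5, then assign ages 1 to 4 to the group starting at 1. This is an illustrative reimplementation under that reading of the description, with the hypothetical name single2abr_sketch.

```r
# Sketch only: map completed single ages to abridged lower bounds
# (0, 1, 5, 10, ...).
single2abr_sketch <- function(x) {
  out <- x - x %% 5            # lower bound of the 5-year group
  out[x >= 1 & x <= 4] <- 1    # ages 1-4 form their own group
  out
}

single2abr_sketch(0:10)  # 0 1 1 1 1 5 5 5 5 5 10
```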
Called by ggbFittedFromAges() and ggbChooseAges().
slopeint(codi, agesfit, deaths.summed = FALSE)
codi | |
agesfit | a set of ages to estimate coverage from |
deaths.summed | logical. is the deaths column given as the total per age in the intercensal period ( |
a pairlist with elements $a for the intercept and $b for the slope
Either assume 365 days in the year, or get the precise duration.
yint(Day1, Month1, Year1, Day2, Month2, Year2, reproduce.matlab = FALSE, detect.mid.year = TRUE, detect.start.end = TRUE)
Day1 | Day of first date |
Month1 | Month of first date |
Year1 | Year of first date |
Day2 | Day of second date |
Month2 | Month of second date |
Year2 | Year of second date |
reproduce.matlab | logical. default |
detect.mid.year | logical. default |
detect.start.end | logical. default |
decimal value of year fraction (can be greater than 1)
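The two options mentioned above (assume 365 days, or use the precise year length) can be illustrated with base R dates. This sketch is not the package's algorithm: the function names and the assume365 argument are hypothetical, and the mid-year and start/end detection options are not reproduced.

```r
# Sketch only: decimal years between two dates, base R only.
is_leap <- function(y) (y %% 4 == 0 & y %% 100 != 0) | (y %% 400 == 0)

# decimal year of a date: year + fraction of the year already elapsed
decimal_year <- function(d, assume365 = FALSE) {
  d    <- as.Date(d)
  y    <- as.integer(format(d, "%Y"))
  doy  <- as.integer(format(d, "%j"))   # day of year, 1-based
  ndays <- if (assume365) 365 else ifelse(is_leap(y), 366, 365)
  y + (doy - 1) / ndays
}

interval_years <- function(d1, d2, assume365 = FALSE) {
  decimal_year(d2, assume365) - decimal_year(d1, assume365)
}

interval_years("1997-08-01", "2007-08-01")
```

The result can exceed 1, as with the ten-year interval between the 1997 and 2007 censuses.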
We accept dates, and fake them otherwise. Dates must be unique. Iterate over data if necessary for multiple intervals.
yint2(X)
X | |
a decimal year value of the time between two dates.
The fraction returned by this is used e.g. for intercensal estimates. Function uses 'lubridate' package to handle dates elegantly.
ypart(Year, Month, Day, reproduce.matlab = TRUE, detect.mid.year = FALSE, detect.start.end = TRUE)
Year | 4-digit year (string or integer) |
Month | month digits (string or integer, 1 or 2 characters) |
Day | Day of month digits (string or integer, 1 or 2 characters) |
reproduce.matlab | logical. Default TRUE. Assume 365 days in a year. |
detect.mid.year | logical. if |
detect.start.end | logical. default |