Title: | Inbreeding-Purging Estimation in Pedigreed Populations |
---|---|
Description: | Inbreeding-purging analysis of pedigreed populations, including the computation of the inbreeding coefficient, partial, ancestral and purged inbreeding coefficients, and measures of the opportunity of purging related to the individual reduction of inbreeding load. In addition, functions to calculate the effective population size and other parameters relevant to population genetics are included. See López-Cortegano E. (2021) <doi:10.1093/bioinformatics/btab599>. |
Authors: | Eugenio López-Cortegano [aut, cre] |
Maintainer: | Eugenio López-Cortegano |
License: | GPL-2 |
Version: | 1.8.2 |
Built: | 2024-10-24 06:59:12 UTC |
Source: | CRAN |
Returns a boolean vector indicating which individuals are suitable for purging analyses, given a measure of fitness. Individuals with NA fitness values and no descendants with non-NA fitness values are excluded.
ancestors(ped, reference, rp_idx, nboot = 10000L, seed = NULL, skip_Ng = FALSE)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
reference |
A string naming a column indicating whether individuals belong to the reference population or not. Column must be boolean or coercible to boolean type. |
rp_idx |
Vector containing the indexes of individuals in the reference population (RP). |
nboot |
Number of bootstrap iterations (for computing Ng). |
seed |
Sets a seed for the random number generator. |
skip_Ng |
Skip Ng computation or not (FALSE by default). |
Boolean vector indicating which individuals will be evaluated.
This data set contains the pedigree of the arrui (Ammotragus lervia), also known as Barbary sheep. A total of 380 individuals is included, as well as measurements of biological fitness and other factors (see reference below for details).
arrui
A data frame with records from 380 individuals (in rows), and 10 variables:
id - Individual identity.
dam - Maternal identity.
sire - Paternal identity.
survival15 - 15-day survival.
prod - Female productivity.
sex - Individual sex.
yob - Year of birth.
pom - Period of management.
target - Individual in the target population.
eeza_id - Individual identity (as recorded in the original studbook)
The original studbook containing the complete and updated pedigree can be found at: http://www.eeza.csic.es/en/programadecria.aspx.
López-Cortegano E et al. 2021. Genetic purging in captive endangered ungulates with extremely low effective population sizes. *Heredity*, https://www.nature.com/articles/s41437-021-00473-2.
This data set contains the pedigree of Cuvier's gazelle (Gazella atlas). A total of 948 individuals is included, as well as measurements of biological fitness and other factors (see reference below for details).
atlas
A data frame with records from 948 individuals (in rows), and 10 variables:
id - Individual identity.
dam - Maternal identity.
sire - Paternal identity.
survival15 - 15-day survival.
prod - Female productivity.
sex - Individual sex.
yob - Year of birth.
pom - Period of management.
target - Individual in the target population.
eeza_id - Individual identity (as recorded in the original studbook)
The original studbook containing the complete and updated pedigree can be found at: http://www.eeza.csic.es/en/programadecria.aspx.
López-Cortegano E et al. 2021. Genetic purging in captive endangered ungulates with extremely low effective population sizes. *Heredity*, https://www.nature.com/articles/s41437-021-00473-2.
Checks the vector of ancestor identities (ancestors) against the vector of individual identities (id).
check_ancestors(id, ancestors)
id |
Vector of individual ids. |
ancestors |
Vector of ancestor ids. |
No return value. Will print an error message if checks fail.
This function groups several other checking functions that should be run every time a function in this package is used, to avoid unexpected errors.
check_basic( ped, id_name = "id", dam_name = "dam", sire_name = "sire", when_rename = FALSE, when_sort = FALSE )
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
id_name |
Column name for individual id. |
dam_name |
Column name for dam. |
sire_name |
Column name for sire. |
when_rename |
TRUE when called from the ped_rename function. It relaxes checks on the individual ID column name and types. |
when_sort |
TRUE when called from the ped_sort function. It relaxes checks on pedigree sorting. |
No return value. Will print an error message if checks fail.
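The grouped checks described above can be illustrated with a minimal base-R sketch. `validate_pedigree()` is a hypothetical helper, not the package's check_basic(). It assumes integer id, dam and sire columns with 0 coding unknown parents, and it stops at the first failed check.

```r
# Hypothetical helper (not purgeR::check_basic): basic pedigree validation.
# Assumes integer id/dam/sire columns where 0 codes an unknown parent.
validate_pedigree <- function(ped, id_name = "id", dam_name = "dam",
                              sire_name = "sire") {
  if (!is.data.frame(ped)) stop("The pedigree must be a data.frame")
  missing_cols <- setdiff(c(id_name, dam_name, sire_name), names(ped))
  if (length(missing_cols) > 0) {
    stop("Missing mandatory columns: ", paste(missing_cols, collapse = ", "))
  }
  id <- ped[[id_name]]
  dam <- ped[[dam_name]]
  sire <- ped[[sire_name]]
  if (!all(vapply(list(id, dam, sire), is.integer, logical(1)))) {
    stop("id, dam and sire columns must be of integer type")
  }
  if (anyDuplicated(id) > 0) stop("Individual ids must be unique")
  if (any(id == 0L)) stop("Individual ids cannot be 0 (reserved for unknown parents)")
  # Parents must appear in earlier rows than their offspring
  pos <- seq_along(id)
  dam_pos <- match(dam, id)    # NA for unknown (0) parents
  sire_pos <- match(sire, id)
  if (any(!is.na(dam_pos) & dam_pos >= pos) ||
      any(!is.na(sire_pos) & sire_pos >= pos)) {
    stop("Individuals must be sorted so that parents precede their offspring")
  }
  invisible(TRUE)
}

ped <- data.frame(id = 1:5, dam = c(0L, 0L, 1L, 1L, 3L), sire = c(0L, 0L, 2L, 2L, 4L))
validate_pedigree(ped)  # passes silently
```

Stopping on the first failure mirrors the "print an error message" behaviour documented for the individual checks. The sketch does not reproduce the softened checks controlled by when_rename and when_sort.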
Can be used to test arguments that need to be of logical (boolean) type
check_bool(variable)
variable |
Variable to test |
No return value. Will print an error message if checks fail.
Some functions require additional columns. Check that they are named in the pedigree.
check_col(names, name)
names |
Column names (all) |
name |
Column name to check. |
No return value. Will print an error message if checks fail.
The purging coefficient must be a number between 0 and 0.5
check_d(d)
d |
Purging coefficient (taking values between 0.0 and 0.5). |
No return value. Will print an error message if checks fail.
The pedigree must be of object class 'data.frame'.
check_df(obj)
obj |
Object to test |
No return value. Will print an error message if checks fail.
Takes a column name, and checks its use as inbreeding coefficient. It should name a numeric vector, with values in the range [0,1]
check_Fcol(ped, Fcol, compute = TRUE)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
Fcol |
Name of column with inbreeding coefficient values. If none is used, inbreeding will be computed. |
compute |
Compute inbreeding if Fcol is NULL |
Vector of inbreeding values (if checks are successful)
Renamed individuals must be named by their index (from 1 to N)
check_index(id)
id |
Column of individual ids. |
No return value. Will print an error message if checks fail.
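The following minimal sketch recodes identities as indexes 1 to N, assuming character identities with "0" for unknown parents. `rename_to_index()` and `is_indexed()` are hypothetical helpers for illustration only. They are not purgeR's ped_rename() or check_index().

```r
# Hypothetical helpers (not purgeR::ped_rename / check_index).
# Recode identities as indexes 1..N; unknown parents ("0") become 0.
rename_to_index <- function(id, dam, sire, unknown = "0") {
  id <- as.character(id)
  recode <- function(x) {
    x <- as.character(x)
    out <- match(x, id)      # row position of each parent in the id column
    out[x == unknown] <- 0L  # unknown parents coded as 0
    out                      # parents absent from 'id' remain NA
  }
  data.frame(id = seq_along(id), dam = recode(dam), sire = recode(sire))
}

# TRUE if ids are exactly 1, 2, ..., N in row order
is_indexed <- function(id) identical(as.integer(id), seq_along(id))

# Pedigree from Gulisija & Crow (2007), as used in later examples
ped <- rename_to_index(
  id   = c("M", "K", "J", "a", "c", "b", "e", "d", "I"),
  dam  = c("0", "0", "0", "K", "M", "a", "c", "c", "e"),
  sire = c("0", "0", "0", "J", "a", "J", "b", "b", "d")
)
is_indexed(ped$id)
```

In this pedigree, individual "a" becomes 4, with dam 2 ("K") and sire 3 ("J").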
Can be used to test arguments that need to be integers
check_int(variable)
variable |
Variable to test |
No return value. Will print an error message if checks fail.
Used to test arguments that need to be of length 1
check_length(variable, message = "Expected value of length 1")
variable |
Variable to test |
message |
Error message to display |
No return value. Will print an error message if checks fail.
Returns a warning when NA values are present.
check_na(variable)
variable |
Variable to test |
No return value. Will print an error message if checks fail.
Columns for id, dam and sire are mandatory. This function checks that they are named in the pedigree. It works with arbitrary column names (not only 'id', 'dam' and 'sire') so that it can be used with ped_rename().
check_names(ped, id_name = "id", dam_name = "dam", sire_name = "sire")
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
id_name |
Column name for individual id. |
dam_name |
Column name for dam. |
sire_name |
Column name for sire. |
No return value. Will print an error message if checks fail.
The effective population size (Ne) must be a number higher than 0
check_Ne(Ne)
Ne |
Effective population size |
No return value. Will print an error message if checks fail.
Some functions require additional columns. Checks whether they are already named in the pedigree.
check_not_col(names, name)
names |
Column names (all) |
name |
Column name to check. |
No return value. Will print an error message if checks fail.
Expected and observed number of rows must be equal.
check_nrows(df, exp, message = "Expected value of length 1")
df |
Dataframe to test |
exp |
Expected number of rows |
message |
Error message to display |
No return value. Will print an error message if checks fail.
Individuals must be sorted from oldest to youngest.
check_order(id, dam, sire, soft_sorting = FALSE)
id |
Vector of individual ids. |
dam |
Vector of dam ids. |
sire |
Vector of sire ids. |
soft_sorting |
If TRUE checking is relaxed, allowing descendants to be declared before ancestors |
No return value. Will print an error message if checks fail.
Takes a column name, and checks its use as reference. It should name a boolean vector (or coercible to it), with at least one TRUE value.
Takes a column name, and checks its use as target. It should name a boolean vector (or coercible to it), with at least one TRUE value.
check_reference(ped, reference)
check_target(ped, reference, target, variable)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
reference |
A string naming a column indicating whether individuals belong to the reference population or not. Column must be boolean or coercible to boolean type. |
target |
Target column |
variable |
To be used in printed messages |
Vector of reference numbers (if checks are successful)
Vector of target numbers (if checks are successful)
Individual ids must be unique and cannot be repeated.
check_repeat_id(id)
id |
Vector of individual ids. |
No return value. Will print an error message if checks fail.
Takes a column name, and checks its use as generation numbers. It should name a numeric vector, with values >= 0.
check_tcol(ped, tcol, compute = TRUE, force_int = FALSE)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
tcol |
Name of column with individual generation times. If none is used, the number of equivalent complete generations is computed. |
compute |
Compute generation numbers if tcol is NULL |
force_int |
Generation numbers must be integers (disabled by default) |
Vector of generation numbers (if checks are successful)
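When no generation column is given, the number of equivalent complete generations is computed. The sketch below assumes one common definition: each known parent p contributes (1 + t_p)/2 to its offspring's t. This is equivalent to summing (1/2)^n over all known ancestors n generations back. `equiv_generations()` is a hypothetical helper and may differ from purgeR's internals. It assumes ids are indexes 1..N, parents precede offspring, and 0 marks an unknown parent.

```r
# Hypothetical helper: equivalent complete generations.
# Assumes ids are indexes 1..N, parents precede offspring, 0 = unknown parent.
equiv_generations <- function(dam, sire) {
  gen <- numeric(length(dam))        # founders keep gen = 0
  for (i in seq_along(dam)) {
    for (p in c(dam[i], sire[i])) {
      # each known parent contributes half of (1 + its own generation number)
      if (p != 0) gen[i] <- gen[i] + 0.5 * (1 + gen[p])
    }
  }
  gen
}

# Two founders, two full sibs, and one offspring of the full sibs
equiv_generations(dam = c(0, 0, 1, 1, 3), sire = c(0, 0, 2, 2, 4))  # 0 0 1 1 2
```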
Columns for id, dam and sire are mandatory, and required to be of type integer
check_types(id, dam, sire)
id |
Vector of individual ids. |
dam |
Vector of dam ids. |
sire |
Vector of sire ids. |
No return value. Will print an error message if checks fail.
Individual ids cannot equal zero (0). This value is reserved for dams and sires, where it denotes an unknown parent.
check_zero_id(id)
id |
Vector of individual ids. |
No return value. Will print an error message if checks fail.
This data set contains the pedigree of the dama gazelle (Nanger dama). A total of 1316 individuals is included, as well as measurements of biological fitness and other factors (see reference below for details).
dama
A data frame with records from 1316 individuals (in rows), and 10 variables:
id - Individual identity.
dam - Maternal identity.
sire - Paternal identity.
survival15 - 15-day survival.
prod - Female productivity.
sex - Individual sex.
yob - Year of birth.
pom - Period of management.
target - Individual in the target population.
eeza_id - Individual identity (as recorded in the original studbook)
The original studbook containing the complete and updated pedigree can be found at: http://www.eeza.csic.es/en/programadecria.aspx.
López-Cortegano E et al. 2021. Genetic purging in captive endangered ungulates with extremely low effective population sizes. *Heredity*, https://www.nature.com/articles/s41437-021-00473-2.
This data set contains the pedigree of the Darwin/Wedgwood dynasty. It comprises a total of 63 individuals, including Charles R. Darwin and Francis Galton.
darwin
A data frame with records from 63 individuals (in rows), and 3 variables:
Individual - Individual identity.
Mother - Mother's identity.
Father - Father's identity.
The pedigree is adapted from Berra et al. (2010).
Berra TM et al. 2010. Was the Darwin/Wedgwood dynasty adversely affected by consanguinity? BioScience 60(5): 376-383.
Computes the increase in the inbreeding coefficient for a given individual.
delta_Fi(Fi, t)
Fi |
Individual inbreeding coefficient. |
t |
Individual generation number. |
Individual variation in inbreeding.
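The source does not state the formula used by delta_Fi(). The sketch below assumes the widely used individual increase in inbreeding of Gutiérrez et al. (2008): ΔFi = 1 - (1 - Fi)^(1/(t - 1)), where t is the individual's (equivalent complete) generation number. `individual_delta_F()` is a hypothetical stand-in, not the package function.

```r
# Hypothetical stand-in for delta_Fi(), assuming dF_i = 1 - (1 - F_i)^(1 / (t - 1)).
# Only defined for t > 1; NA is returned otherwise.
individual_delta_F <- function(Fi, t) {
  ifelse(t > 1, 1 - (1 - Fi)^(1 / (t - 1)), NA_real_)
}

individual_delta_F(Fi = 0.25, t = 3)  # 1 - sqrt(0.75)
```

Under this definition, compounding ΔFi over t - 1 generations recovers Fi, since (1 - ΔFi)^(t - 1) = 1 - Fi.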
This data set contains the pedigree of dorcas gazelle (Gazella dorcas). A total of 1279 individuals is included, as well as measurements of biological fitness and other factors (see reference below for details).
dorcas
A data frame with records from 1279 individuals (in rows), and 10 variables:
id - Individual identity.
dam - Maternal identity.
sire - Paternal identity.
survival15 - 15-day survival.
prod - Female productivity.
sex - Individual sex.
yob - Year of birth.
pom - Period of management.
target - Individual in the target population.
eeza_id - Individual identity (as recorded in the original studbook)
The original studbook containing the complete and updated pedigree can be found at: http://www.eeza.csic.es/en/programadecria.aspx.
López-Cortegano E et al. 2021. Genetic purging in captive endangered ungulates with extremely low effective population sizes. *Heredity*, https://www.nature.com/articles/s41437-021-00473-2.
Estimates the expected inbreeding coefficient (F) as a function of the effective population size and generation number
exp_F(Ne, t)
Ne |
Effective population size |
t |
Generation number |
Computation of the inbreeding coefficient uses the classical formula:
F(t) = 1 - (1 - 1/(2N))^t
where N is the effective population size (Ne) and t is the generation number.
The inbreeding coefficient
Falconer DS, Mackay TFC. 1996. Introduction to Quantitative Genetics. 4th edition. Longman, Essex, U.K.
exp_F(Ne = 50, t = 0)
exp_F(Ne = 50, t = 50)
exp_F(Ne = 10, t = 50)
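As a check on the classical formula F(t) = 1 - (1 - 1/(2Ne))^t, the sketch below compares it with the per-generation recursion F(t) = 1/(2Ne) + (1 - 1/(2Ne)) F(t-1), starting from F(0) = 0. Both `expected_F()` and `expected_F_recursive()` are hypothetical base-R helpers, not the package's exp_F().

```r
# Closed form: F(t) = 1 - (1 - 1/(2Ne))^t
expected_F <- function(Ne, t) {
  1 - (1 - 1 / (2 * Ne))^t
}

# Equivalent recursion: F(t) = 1/(2Ne) + (1 - 1/(2Ne)) * F(t - 1), F(0) = 0
expected_F_recursive <- function(Ne, t) {
  Fx <- 0
  for (k in seq_len(t)) Fx <- 1 / (2 * Ne) + (1 - 1 / (2 * Ne)) * Fx
  Fx
}

expected_F(Ne = 50, t = 0)   # 0
expected_F(Ne = 50, t = 50)  # approx. 0.395
expected_F(Ne = 10, t = 50)  # approx. 0.923
```

Each generation multiplies the remaining non-identity 1 - F by 1 - 1/(2Ne), which is why the two forms agree.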
Estimates the expected ancestral inbreeding coefficient (Fa) as a function of the effective population size and generation number
exp_Fa(Ne, t)
Ne |
Effective population size |
t |
Generation number |
Computation of the ancestral inbreeding coefficient uses the adaptation from Ballou's (1997) formula, as in López-Cortegano et al. (2018):
Fa(t) = 1 - (1 - 1/(2N))^(t(t-1)/2)
The ancestral inbreeding coefficient
Ballou JD. 1997. Ancestral inbreeding only minimally affects inbreeding depression in mammalian populations. J Hered. 88:169–178.
López-Cortegano E et al. 2018. Detection of genetic purging and predictive value of purging parameters estimated in pedigreed populations. Heredity 121(1): 38-51.
exp_Fa(Ne = 50, t = 0)
exp_Fa(Ne = 50, t = 50)
exp_Fa(Ne = 10, t = 50)
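The closed form Fa(t) = 1 - (1 - 1/(2Ne))^(t(t-1)/2) can be checked against a per-generation recursion. The sketch assumes that, in an idealized population, Ballou's pedigree rule reduces to Fa(t) = Fa(t-1) + (1 - Fa(t-1)) F(t-1), with F(t) = 1 - (1 - 1/(2Ne))^t. Multiplying the factors (1 - 1/(2Ne))^(k-1) for k = 1..t gives the exponent t(t-1)/2. `expected_Fa()` and `expected_Fa_recursive()` are hypothetical helpers, not the package's exp_Fa().

```r
# Closed form given for exp_Fa(): Fa(t) = 1 - (1 - 1/(2Ne))^(t(t - 1)/2)
expected_Fa <- function(Ne, t) {
  1 - (1 - 1 / (2 * Ne))^(t * (t - 1) / 2)
}

# Assumed population-level form of Ballou's recursion:
# Fa(t) = Fa(t - 1) + (1 - Fa(t - 1)) * F(t - 1), with Fa(0) = 0
expected_Fa_recursive <- function(Ne, t) {
  Fa <- 0
  for (k in seq_len(t)) {
    F_prev <- 1 - (1 - 1 / (2 * Ne))^(k - 1)  # F(k - 1)
    Fa <- Fa + (1 - Fa) * F_prev
  }
  Fa
}

expected_Fa(Ne = 50, t = 0)
expected_Fa(Ne = 50, t = 50)
expected_Fa(Ne = 10, t = 50)
```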
Estimates the expected purged inbreeding coefficient (g) as a function of the effective population size, generation number, and purging coefficient
exp_g(Ne, t, d)
Ne |
Effective population size |
t |
Generation number |
d |
Purging coefficient (taking values between 0.0 and 0.5). |
The purged inbreeding coefficient is calculated as in García-Dorado (2012):
g(t) = [(1 - 1/(2N)) g(t-1) + 1/(2N)] * [1 - 2d F(t-1)]
When convergence is reached, the asymptotic value g(a) is returned:
g(a) = (1 - 2d) / (1 + 2d (2N-1))
The purged inbreeding coefficient
García-Dorado A. 2012. Understanding and predicting the fitness decline of shrunk populations: Inbreeding, purging, mutation, and standard selection. Genetics 190: 1-16.
exp_g(Ne = 50, t = 0, d = 0.15)
exp_g(Ne = 50, t = 50, d = 0.15)
exp_g(Ne = 10, t = 50, d = 0.15)
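The recursion above can be iterated directly. The sketch starts from g(0) = 0 and takes F(t) from the classical formula 1 - (1 - 1/(2Ne))^t. It always iterates t generations rather than switching to the asymptote at convergence. With d = 0 it reduces to F(t). Once F is close to 1, it approaches g(a) = (1 - 2d) / (1 + 2d(2N - 1)). `expected_g()` and `asymptotic_g()` are hypothetical helpers, not the package's exp_g().

```r
# Sketch: iterate g(t) = [(1 - 1/(2Ne)) g(t-1) + 1/(2Ne)] * [1 - 2d F(t-1)]
expected_g <- function(Ne, t, d) {
  stopifnot(d >= 0, d <= 0.5)                 # purging coefficient range
  g <- 0                                      # g(0) = 0
  for (k in seq_len(t)) {
    F_prev <- 1 - (1 - 1 / (2 * Ne))^(k - 1)  # F(k - 1)
    g <- ((1 - 1 / (2 * Ne)) * g + 1 / (2 * Ne)) * (1 - 2 * d * F_prev)
  }
  g
}

# Asymptotic value reached when F approaches 1
asymptotic_g <- function(Ne, d) (1 - 2 * d) / (1 + 2 * d * (2 * Ne - 1))

expected_g(Ne = 50, t = 0, d = 0.15)
expected_g(Ne = 50, t = 50, d = 0.15)
expected_g(Ne = 10, t = 50, d = 0.15)
asymptotic_g(Ne = 10, d = 0.15)
```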
Computes the standard inbreeding coefficient (F). This is the probability that two alleles on a locus are identical by descent (Falconer and Mackay 1996, Wright 1922), calculated from the genealogical coancestry matrix (Malécot 1948).
F(ped, name_to)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
name_to |
A string naming the new output column. |
The input dataframe, plus an additional column named "F" with individual inbreeding coefficient values.
Falconer DS, Mackay TFC. 1996. Introduction to Quantitative Genetics. 4th edition. Longman, Essex, U.K.
Malécot G. 1948. Les Mathématiques de l’hérédité. Masson & Cie., Paris.
Wright S. 1922. Coefficients of inbreeding and relationship. The American Naturalist 56: 330-338.
Computes the ancestral inbreeding coefficient (Fa). This is the probability that an allele has been in homozygosity in at least one ancestor (Ballou 1997). A genedrop approach is included to compute unbiased estimates of Fa (Baumung et al. 2015).
Fa(ped, Fi, name_to, genedrop = 0L, seed = NULL)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
Fi |
Vector of inbreeding coefficient values |
name_to |
A string naming the new output column. |
genedrop |
Number of genedrop iterations to run. If set to zero (as default), Ballou's Fa is computed. |
seed |
Sets a seed for the random number generator. |
The input dataframe, plus an additional column named "Fa" with individual ancestral inbreeding coefficient values.
Ballou JD. 1997. Ancestral inbreeding only minimally affects inbreeding depression in mammalian populations. J Hered. 88:169–178.
Baumung et al. 2015. GRAIN: A computer program to calculate ancestral and partial inbreeding coefficients using a gene dropping approach. Journal of Animal Breeding and Genetics 132: 100-108.
Computes partial inbreeding coefficients, Fi(j). A coefficient Fi(j) can be read as the probability of individual i being homozygous for alleles derived from ancestor j
Fij_core(ped, ancestors, ancestors_idx, Fi, mapa, ncores = 1, genedrop, seed)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
ancestors |
Vector of the identities to be assumed as founder ancestors. |
ancestors_idx |
Index of ancestors. |
Fi |
Vector of inbreeding coefficients. |
mapa |
Map of ancestors |
ncores |
Number of cores to use for parallel computing (default = 1) |
genedrop |
Enable genedrop simulation |
seed |
Sets a seed for the random number generator. |
A matrix of partial inbreeding coefficients. Fi(j) values can thus be read from row i and column j.
Computes partial inbreeding coefficients, Fi(j). A coefficient Fi(j) can be read as the probability of individual i being homozygous for alleles derived from ancestor j
Fij_core_i_cpp(dam, sire, anc_idx, mapa, Fi, genedrop = 0L, seed = NULL)
dam |
Vector of dam ids. |
sire |
Vector of sire ids. |
anc_idx |
Index of ancestors. |
mapa |
Map of ancestors |
Fi |
Vector of inbreeding coefficients. |
genedrop |
Enable genedrop simulation |
seed |
Sets a seed for the random number generator. |
A matrix of partial inbreeding coefficients. Fi(j) values can thus be read from row i and column j.
Computes the purged inbreeding coefficient (g). This is the probability that two alleles on a locus are identical by descent, but relative to deleterious recessive alleles (García-Dorado 2012). The reduction in g relative to standard inbreeding (F) is given by an effective purging coefficient (d), that measures the strength of the deleterious recessive component in the genome. The coefficient g is computed following the methods for pedigrees in García-Dorado (2012) and García-Dorado et al. (2016).
g(ped, d, Fi, name_to)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
d |
Purging coefficient (taking values between 0.0 and 0.5). |
Fi |
Vector of inbreeding coefficient values |
name_to |
A string naming the new output column. |
The input dataframe, plus an additional column named "g" followed by the purging coefficient, containing purged inbreeding coefficient values.
García-Dorado A. 2012. Understanding and predicting the fitness decline of shrunk populations: Inbreeding, purging, mutation, and standard selection. Genetics 190: 1-16.
García-Dorado et al. 2016. Predictive model and software for inbreeding-purging analysis of pedigreed populations. G3 6: 3593-3601.
Computes the deviation from Hardy-Weinberg equilibrium following Caballero and Toro (2000).
hwd(ped, reference = NULL)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
reference |
A string naming a column indicating whether individuals belong to the reference population or not. Column must be boolean or coercible to boolean type. |
A numeric value indicating the deviation from Hardy-Weinberg equilibrium.
Caballero A, Toro M. 2000. Interrelations between effective population size and other pedigree tools for the management of conserved populations. Genet. Res. 75: 331-343.
Creates a vector of length N (the number of individuals). Only coordinates for valid ancestors will be given.
idx_ancestors(ids, N)
ids |
Ancestor identities |
N |
Total number of individuals |
A logical matrix.
Computes the standard inbreeding coefficient (F). This is the probability that two alleles on a locus are identical by descent (Falconer and Mackay 1996, Wright 1922), calculated from the genealogical coancestry matrix (Malécot 1948).
ip_F(ped, name_to = "Fi")
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
name_to |
A string naming the new output column. |
The input dataframe, plus an additional column with individual inbreeding coefficient values (named "Fi" by default).
Falconer DS, Mackay TFC. 1996. Introduction to Quantitative Genetics. 4th edition. Longman, Essex, U.K.
Malécot G. 1948. Les Mathématiques de l’hérédité. Masson & Cie., Paris.
Wright S. 1922. Coefficients of inbreeding and relationship. The American Naturalist 56: 330-338.
data(dama)
dama <- ip_F(dama)
tail(dama)
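The coancestry-based computation can be illustrated with the classic tabular method. The coancestry of individual i with an older individual j is the mean of j's coancestry with i's parents. F_i equals the coancestry between i's parents, so the self-coancestry is f_ii = (1 + F_i)/2. The sketch below is a hypothetical base-R helper and is not presented as purgeR's implementation. It assumes ids already renamed to indexes 1..N, sorted so that parents precede offspring, with 0 for unknown parents. It stores the full N x N matrix, so it is only practical for small pedigrees.

```r
# Hypothetical helper: tabular coancestry matrix for a sorted, indexed pedigree.
coancestry_matrix <- function(dam, sire) {
  n <- length(dam)
  f <- matrix(0, n, n)
  for (i in seq_len(n)) {
    dm <- dam[i]
    sr <- sire[i]
    # F_i is the coancestry between the parents (0 if either is unknown)
    Fi <- if (dm > 0 && sr > 0) f[dm, sr] else 0
    f[i, i] <- (1 + Fi) / 2            # self-coancestry
    for (j in seq_len(i - 1)) {
      # coancestry with an older individual j: mean over i's known parents
      f_dam  <- if (dm > 0) f[j, dm] else 0
      f_sire <- if (sr > 0) f[j, sr] else 0
      f[i, j] <- (f_dam + f_sire) / 2
      f[j, i] <- f[i, j]
    }
  }
  f
}

# Inbreeding coefficients recovered from the diagonal: F_i = 2 f_ii - 1
inbreeding_tabular <- function(dam, sire) 2 * diag(coancestry_matrix(dam, sire)) - 1

# Individual 5 is the offspring of two full sibs (3 and 4)
inbreeding_tabular(dam = c(0, 0, 1, 1, 3), sire = c(0, 0, 2, 2, 4))  # 0 0 0 0 0.25
```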
Computes the ancestral inbreeding coefficient (Fa). This is the probability that an allele has been in homozygosity in at least one ancestor (Ballou 1997). A genedrop approach is included to compute unbiased estimates of Fa (Baumung et al. 2015).
ip_Fa(ped, name_to = "Fa", genedrop = 0, seed = NULL, Fcol = NULL)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
name_to |
A string naming the new output column. |
genedrop |
Number of genedrop iterations to run. If set to zero (as default), Ballou's Fa is computed. |
seed |
Sets a seed for the random number generator. |
Fcol |
Name of column with inbreeding coefficient values. If none is used, inbreeding will be computed. |
The input dataframe, plus an additional column with individual ancestral inbreeding coefficient values (named "Fa" by default).
Ballou JD. 1997. Ancestral inbreeding only minimally affects inbreeding depression in mammalian populations. J Hered. 88:169–178.
Baumung et al. 2015. GRAIN: A computer program to calculate ancestral and partial inbreeding coefficients using a gene dropping approach. Journal of Animal Breeding and Genetics 132: 100-108.
data(dama)
# dama <- ip_Fa(dama)  # Compute F on the go (won't be kept in the pedigree).
dama <- ip_F(dama)
dama <- ip_Fa(dama, Fcol = 'Fi')  # If F is computed in advance.
tail(dama)
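Ballou's (1997) ancestral inbreeding can be computed recursively from the parents: Fa_i = [Fa_dam + (1 - Fa_dam) F_dam + Fa_sire + (1 - Fa_sire) F_sire] / 2. The minimal sketch below mirrors the Fcol workflow by taking precomputed inbreeding values as input, and it does not implement the genedrop alternative. `ancestral_inbreeding()` is hypothetical. It assumes ids are indexes 1..N, parents precede offspring, and 0 marks an unknown parent.

```r
# Hypothetical helper: Ballou's (1997) ancestral inbreeding from a pedigree.
# Fi holds each individual's (precomputed) inbreeding coefficient.
ancestral_inbreeding <- function(dam, sire, Fi) {
  Fa <- numeric(length(dam))
  for (i in seq_along(dam)) {
    # each parent passes on its own Fa plus the part of its F not yet counted
    from_dam  <- if (dam[i]  > 0) Fa[dam[i]]  + (1 - Fa[dam[i]])  * Fi[dam[i]]  else 0
    from_sire <- if (sire[i] > 0) Fa[sire[i]] + (1 - Fa[sire[i]]) * Fi[sire[i]] else 0
    Fa[i] <- (from_dam + from_sire) / 2
  }
  Fa
}

# Full-sib mating (5 = 3 x 4) followed by an outcross (7 = 5 x 6)
dam  <- c(0, 0, 1, 1, 3, 0, 5)
sire <- c(0, 0, 2, 2, 4, 0, 6)
Fi   <- c(0, 0, 0, 0, 0.25, 0, 0)
ancestral_inbreeding(dam, sire, Fi)  # individual 7 gets Fa = 0.125
```

Individual 7 is not inbred itself (F = 0), but half of its alleles descend from an inbred dam, which is what Fa records.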
Computes partial inbreeding coefficients, Fi(j). A coefficient Fi(j) can be read as the probability of individual i being homozygous for alleles derived from ancestor j. It is calculated following the tabular method described by Gulisija & Crow (2007). Optionally, it can be estimated via genedrop simulation.
ip_Fij( ped, mode = "founders", ancestors = NULL, Fcol = NULL, genedrop = 0, seed = NULL, ncores = 1L )
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
mode |
Defines the set of ancestors considered when computing partial inbreeding. It can be set as: "founders" for inbreeding conditional on founders only (default), "all" for all individuals in the pedigree (this may take a long time to compute in large pedigrees), and "custom" for individual identities given in an integer vector (see the 'ancestors' argument). |
ancestors |
Under the "custom" run mode, it defines a vector of ancestors that will be considered when computing partial inbreeding values. |
Fcol |
Name of column with inbreeding coefficient values. If none is used, inbreeding will be computed. |
genedrop |
Number of genedrop iterations to run. If set to zero (as default), exact coefficients are computed. |
seed |
Sets a seed for the random number generator (only if genedrop is enabled). |
ncores |
Number of cores to use for parallel computing (default = 1) |
A matrix of partial inbreeding coefficients. Fi(j) values can thus be read from row i and column j. In the resultant matrix, there are as many rows as individuals in the pedigree, and as many columns as ancestors used. Columns will be named and sorted by ancestor identity.
Gulisija D, Crow JF. 2007. Inferring purging from pedigree data. Evolution 61(5): 1043-1051.
# Original pedigree file in Gulisija & Crow (2007)
pedigree <- tibble::tibble(
  id = c("M", "K", "J", "a", "c", "b", "e", "d", "I"),
  dam = c("0", "0", "0", "K", "M", "a", "c", "c", "e"),
  sire = c("0", "0", "0", "J", "a", "J", "b", "b", "d")
)
pedigree <- purgeR::ped_rename(pedigree, keep_names = TRUE)
# Partial inbreeding relative to founder ancestors
m <- ip_Fij(pedigree)
# Note that in the example above, the sum of the values in
# rows will equal the vector of inbreeding coefficients
# i.e. base::rowSums(m) equals purgeR::ip_F(pedigree)$Fi
# Compute partial inbreeding relative to an arbitrary ancestor
# with id = 3 (i.e. individual named "J")
anc <- as.integer(c(3))
m <- ip_Fij(pedigree, mode = "custom", ancestors = anc)
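The genedrop option can be illustrated with a plain Monte Carlo sketch of partial inbreeding relative to founders. Each founder carries two unique alleles, and each parent transmits one of its two alleles with probability 1/2. Fi(j) is estimated as the fraction of iterations in which i is homozygous for an allele from founder j. `genedrop_Fij()` is a hypothetical helper, not purgeR's implementation. It assumes ids are indexes 1..N, parents precede offspring, and every individual has either both parents known or none.

```r
# Hypothetical helper: genedrop estimate of partial inbreeding Fi(j) for founders j.
genedrop_Fij <- function(dam, sire, iter = 10000L, seed = NULL) {
  stopifnot(all((dam == 0) == (sire == 0)))  # assumption: no half-founders
  if (!is.null(seed)) set.seed(seed)
  n <- length(dam)
  founders <- which(dam == 0)
  counts <- matrix(0, nrow = n, ncol = length(founders),
                   dimnames = list(NULL, as.character(founders)))
  a1 <- integer(n)  # first (maternal) allele
  a2 <- integer(n)  # second (paternal) allele
  for (it in seq_len(iter)) {
    u <- runif(2 * n)
    for (i in seq_len(n)) {
      if (dam[i] == 0) {
        # founder i carries two unique alleles, labelled 2i - 1 and 2i
        a1[i] <- 2L * i - 1L
        a2[i] <- 2L * i
      } else {
        # each parent transmits one of its two alleles with probability 1/2
        a1[i] <- if (u[i] < 0.5) a1[dam[i]] else a2[dam[i]]
        a2[i] <- if (u[n + i] < 0.5) a1[sire[i]] else a2[sire[i]]
      }
    }
    hom <- which(a1 == a2)  # homozygous for an allele identical by descent
    if (length(hom) > 0) {
      owner <- (a1[hom] + 1L) %/% 2L  # founder that carried that allele
      idx <- cbind(hom, match(owner, founders))
      counts[idx] <- counts[idx] + 1
    }
  }
  counts / iter  # rows: individuals; columns: founders
}

m <- genedrop_Fij(dam = c(0, 0, 1, 1, 3), sire = c(0, 0, 2, 2, 4), iter = 20000L, seed = 1)
rowSums(m)  # close to F = c(0, 0, 0, 0, 0.25)
```

As in the exact method, row sums approximate the inbreeding coefficients. Accuracy improves with the number of iterations.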
Computes the purged inbreeding coefficient (g). This is the probability that two alleles on a locus are identical by descent, but relative to deleterious recessive alleles (García-Dorado 2012). The reduction in g relative to standard inbreeding (F) is given by an effective purging coefficient (d), that measures the strength of the deleterious recessive component in the genome. The coefficient g is computed following the methods for pedigrees in García-Dorado (2012) and García-Dorado et al. (2016).
ip_g(ped, d, name_to = "g<d>", Fcol = NULL)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
d |
Purging coefficient (taking values between 0.0 and 0.5). |
name_to |
A string naming the new output column. |
Fcol |
Name of column with inbreeding coefficient values. If none is used, inbreeding will be computed. |
The input dataframe, plus an additional column containing purged inbreeding coefficient values (named "g" and followed by the purging coefficient value by default).
García-Dorado A. 2012. Understanding and predicting the fitness decline of shrunk populations: Inbreeding, purging, mutation, and standard selection. Genetics 190: 1-16.
García-Dorado et al. 2016. Predictive model and software for inbreeding-purging analysis of pedigreed populations. G3 6: 3593-3601.
data(dama)
dama <- ip_g(dama, d = 0.23)
tail(dama)
The potential reduction in individual inbreeding load can be estimated by means of the opportunity of purging (O) and expressed opportunity of purging (Oe) parameters described by Gulisija and Crow (2007). Whereas O relates to the total potential reduction of the inbreeding load in an individual, as a consequence of it having inbred ancestors, Oe relates to the expressed potential reduction of the inbreeding load. Only Oe is computed by default. Estimates of O and Oe need to be corrected in complex pedigrees (see Details below). Both corrected estimates (named "O" and "Oe" by default) and non-corrected estimates (suffixed with "_raw") are returned.
ip_op( ped, name_Oe = "Oe", compute_O = FALSE, name_O = "O", Fcol = NULL, ncores = 1L, genedrop = 0, seed = NULL, complex = NULL )
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
name_Oe |
A string naming the new output column for the expressed opportunity of purging (defaults to "Oe") |
compute_O |
Enable computation of total opportunity of purging (disabled by default) |
name_O |
A string naming the new output column for total opportunity of purging (defaults to "O") |
Fcol |
Name of column with inbreeding coefficient values. If none is used, inbreeding will be computed. |
ncores |
Number of cores to use for parallel computing (default = 1) |
genedrop |
Number of genedrop iterations run to compute partial inbreeding. If set to zero (as default), exact coefficients are computed. |
seed |
Sets a seed for the random number generator (only if genedrop is enabled). |
complex |
Enable correction for complex pedigrees (deprecated in v1.3, both raw and corrected measures of "Oe" are returned now). |
The model used here assumes fully recessive alleles of large effect (Gulisija and Crow, 2007).
In simple pedigrees, the opportunity of purging (O) and the expressed opportunity of purging (Oe) are estimated as in Gulisija and Crow (2007). For complex pedigrees involving more than one autozygous individual per path from an individual to an ancestor, O and Oe in the closer ancestors need to be discounted for what was already accounted for in their predecessors. To solve this problem, Gulisija and Crow (2007) provide expressions to correct O and Oe (see equations 21 and 22 in the manuscript).
Here, a heuristic approach is used to prevent the inflation of O and Oe, and to avoid additional looped expressions that may result in excessive computational cost. To do so, only the contribution of the most recent ancestors in a path is considered. Specifically, the method skips contributions from "far" ancestors k such that Fj(k) > 0, where j is an intermediate ancestor, both referred to an individual i of interest. Fj(k) refers to the partial inbreeding of j for alleles derived from k (see ip_Fij).
This may not provide exact values of O and Oe, but little bias is expected, since more distant ancestors also contribute less to O and Oe.
Both types of estimates (corrected and non-corrected) are returned (non-corrected estimates, prefixed with "_raw").
The input dataframe, plus an additional column containing Oe and Oe_raw estimates (additional columns for O can appended by enabling compute_O = TRUE
).
Gulisija D, Crow JF. 2007. Inferring purging from pedigree data. Evolution 61(5): 1043-1051.
# Original pedigree file in Gulisija & Crow (2007)
pedigree <- tibble::tibble(
  id = c("M", "K", "J", "a", "c", "b", "e", "d", "I"),
  dam = c("0", "0", "0", "K", "M", "a", "c", "c", "e"),
  sire = c("0", "0", "0", "J", "a", "J", "b", "b", "d")
)
pedigree <- purgeR::ped_rename(pedigree, keep_names = TRUE)
ip_op(pedigree, compute_O = TRUE)
Creates a logical matrix that indicates whether an individual i (in columns) is an ancestor of another individual j (in rows). For example, matrix[, 1] indicates the descendants of id = 1, and matrix[1, ] indicates the ancestors of id = 1.
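The indexing convention above can be illustrated with a small base-R sketch. It assumes a renamed and sorted pedigree (ids 1 to N, parents listed before offspring, 0 for unknown parents); the helper ancestor_matrix is hypothetical and is not the purgeR implementation of map_ancestors.

```r
# Hypothetical helper (not purgeR's map_ancestors): m[j, i] is TRUE when
# individual i is an ancestor of individual j.
# Assumes ids 1..N, parents listed before offspring, and 0 for unknown parents.
ancestor_matrix <- function(dam, sire) {
  n <- length(dam)
  m <- matrix(FALSE, nrow = n, ncol = n)
  for (j in seq_len(n)) {
    for (p in c(dam[j], sire[j])) {
      if (p > 0) {
        m[j, p] <- TRUE            # the parent itself
        m[j, ] <- m[j, ] | m[p, ]  # plus every ancestor of that parent
      }
    }
  }
  m
}

# Toy pedigree: 1 and 2 are founders, 3 = 2 x 1, 4 = 2 x 3
dam  <- c(0, 0, 2, 2)
sire <- c(0, 0, 1, 3)
m <- ancestor_matrix(dam, sire)
m[, 1]  # descendants of id 1: FALSE FALSE TRUE TRUE
m[4, ]  # ancestors of id 4: TRUE TRUE TRUE FALSE
```

Because parents always precede their offspring, each row can be filled by merging the already completed rows of the two parents, so a single pass over the pedigree suffices.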
map_ancestors(ped, idx)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
idx |
Index of ancestors to map |
A logical matrix.
Computes the mean realized effective population size. Note that this function expects a mean delta_F value for all individuals in the reference population.
Computes the standard error of the realized effective population size. Note that this function expects the mean and standard deviation of delta F, as well as the total number of individuals in the reference population.
Ne_delta(delta) se_Ne_delta(delta)
delta |
Vector of individual variations in inbreeding. |
Mean effective population size.
Standard error of the effective population size.
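Assuming the usual formulas of Gutiérrez et al. (2008), namely Ne = 1 / (2 * mean(delta)) and SE(Ne) = 2 * Ne^2 * sd(delta) / sqrt(N), with N the number of individuals in the reference population, the two quantities can be sketched in base R as follows. The helper names are hypothetical, and purgeR's Ne_delta and se_Ne_delta may differ in details such as how the inputs are supplied.

```r
# Hypothetical helpers illustrating the formulas; not the purgeR source.
ne_from_delta <- function(delta) {
  1 / (2 * mean(delta))
}

se_ne_from_delta <- function(delta) {
  ne <- ne_from_delta(delta)
  # standard error of Ne from the standard deviation of delta F and N
  2 * ne^2 * sd(delta) / sqrt(length(delta))
}

delta <- c(0.01, 0.02, 0.03)
ne_from_delta(delta)     # 25
se_ne_from_delta(delta)  # about 7.22
```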
The potential reduction in individual inbreeding load can be estimated by means of the opportunity of purging (O) and expressed opportunity of purging (Oe) parameters described by Gulisija and Crow (2007). Whereas O relates to the total potential reduction of the inbreeding load in an individual as a consequence of it having inbred ancestors, Oe relates to the expressed potential reduction of the inbreeding load. In both cases, these measures refer to fully recessive alleles of large effect size (e.g. lethals). For complex pedigrees, involving more than one autozygous individual per path from a reference individual to an ancestor, these measures are estimated following a heuristic approach (see details below).
op(ped, pi, Fi, name_O, name_Oe, sufix, compute_O = FALSE)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
pi |
Partial inbreeding matrix |
Fi |
Vector of inbreeding coefficient values |
name_O |
A string naming the new output column for total opportunity of purging (defaults to "O") |
name_Oe |
A string naming the new output column for the expressed opportunity of purging (defaults to "Oe") |
sufix |
A string giving the suffix for non-corrected O and Oe measures |
compute_O |
Enable computation of total opportunity of purging (FALSE by default) |
In simple pedigrees, the opportunity of purging (O) and the expressed opportunity of purging (Oe) are estimated as in Gulisija and Crow (2007). For complex pedigrees involving more than one autozygous individual per path from an individual to an ancestor, O and Oe in the closer ancestors need to be discounted for what was already accounted for in their predecessors. To solve this problem, Gulisija and Crow (2007) provide expressions to correct O and Oe (see equations 21 and 22 in the manuscript).
Here, a heuristic approach is used to prevent the inflation of O and Oe,
and avoid the use of additional looped expressions that may result in an
excessive computational cost. To do so, when using ip_op(complex = TRUE),
only the contribution of the most recent ancestors in a path will be considered.
This may not provide exact values of O and Oe, but we expect little bias, since
more distant ancestors also contribute less to O and Oe.
The input dataframe, plus two additional columns named "O" and "Oe", containing total and expressed opportunity of purging measures.
Gulisija D, Crow JF. 2007. Inferring purging from pedigree data. Evolution 61(5): 1043-1051.
Remove individuals that are not necessary for purging analyses involving fitness.
This will reduce the size of the pedigree, and speed up the computation of inbreeding
parameters.
Individuals removed include those with unknown (NA)
values of a given parameter, as long as they do not have any descendant in the
pedigree with known values of that parameter.
Cleaned pedigrees will automatically have individual identities
renamed (see ped_rename).
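The filtering rule described above (keep an individual if its value is known or if any descendant has a known value) can be sketched in base R as a single backward pass over a sorted pedigree. This is a hypothetical helper, not the purgeR source; it assumes ids 1 to N with parents before offspring and 0 for unknown parents, and it does not perform the renaming step.

```r
# Hypothetical helper (not purgeR's ped_clean): TRUE for individuals that
# have a known value or at least one descendant with a known value.
# Assumes ids 1..N, parents listed before offspring, and 0 for unknown parents.
informative <- function(dam, sire, value) {
  keep <- !is.na(value)
  # Walk from the youngest to the oldest individual so that a kept
  # individual marks its parents before the parents are visited.
  for (i in rev(seq_along(value))) {
    if (keep[i]) {
      if (dam[i] > 0) keep[dam[i]] <- TRUE
      if (sire[i] > 0) keep[sire[i]] <- TRUE
    }
  }
  keep
}

dam   <- c(0, 0, 2, 2, 0)
sire  <- c(0, 0, 1, 1, 0)
value <- c(NA, NA, 1, NA, NA)
informative(dam, sire, value)  # TRUE TRUE TRUE FALSE FALSE
```

Individuals 1 and 2 have unknown values but are kept because their offspring 3 has a known value; 4 and 5 have neither.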
ped_clean(ped, value_from)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
value_from |
Name of the column of interest. |
A dataframe with the pedigree cleaned for the specified parameter (column).
data(arrui)
nrow(arrui)
arrui <- ped_clean(arrui, "survival15")
nrow(arrui)
Processes a pedigree into a list with two objects, one dataframe of edges, and a dataframe of vertices, which can be used as input for functions of the igraph package.
ped_graph(ped)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
A list with one dataframe 'edges' and another 'vertices', each following igraph format.
The 'edges' dataframe will contain two columns in addition to the defaults "from" and "to": 1) 'from_parent' indicates whether the vertex from which the edge originates represents a mother ("dam") or a father ("sire"). 2) 'to_parent' indicates whether the vertex to which the edge is directed represents a mother ("dam"), father ("sire") or none ("NA").
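As a rough illustration of the edge format described above, the sketch below builds a comparable table with base R for a toy pedigree (ids 1 to N, 0 for unknown parents). It is a sketch under those assumptions, not the purgeR source, and it omits the 'vertices' dataframe.

```r
# Hypothetical base-R sketch of the edge table (not the purgeR source):
# one edge per known parent, directed from parent to offspring.
ped <- data.frame(id = 1:4, dam = c(0, 0, 2, 2), sire = c(0, 0, 1, 3))
has_dam  <- ped$dam > 0
has_sire <- ped$sire > 0
edges <- rbind(
  data.frame(from = ped$dam[has_dam], to = ped$id[has_dam], from_parent = "dam"),
  data.frame(from = ped$sire[has_sire], to = ped$id[has_sire], from_parent = "sire")
)
# Role of the target vertex: is the offspring itself a dam, a sire, or neither?
edges$to_parent <- ifelse(edges$to %in% ped$dam, "dam",
                          ifelse(edges$to %in% ped$sire, "sire", NA))
edges
```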
ped_rename, graph_from_data_frame
data(atlas)
atlas_graph <- ped_graph(atlas)
G <- igraph::graph_from_data_frame(d = atlas_graph$edges, vertices = atlas_graph$vertices, directed = TRUE)
Assigns to every individual in the pedigree the maternal (or paternal) value of an observed variable of interest.
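The lookup behind this function can be sketched in base R by matching each individual's dam (or sire) to the id column. The helper parent_value and the toy pedigree are hypothetical; this is not the purgeR implementation.

```r
# Hypothetical base-R equivalent of the idea behind ped_maternal()
# (not the purgeR source).
parent_value <- function(ped, value_from, use_dam = TRUE, set_na = NULL) {
  parent <- if (use_dam) ped$dam else ped$sire
  x <- ped[[value_from]][match(parent, ped$id)]  # unknown parent (0) gives NA
  if (!is.null(set_na)) x[is.na(x)] <- set_na
  x
}

ped <- data.frame(id = 1:4, dam = c(0, 0, 2, 2), sire = c(0, 0, 1, 3),
                  Fi = c(0, 0.1, 0, 0.25))
ped$Fdam  <- parent_value(ped, "Fi")                               # NA NA 0.1 0.1
ped$Fsire <- parent_value(ped, "Fi", use_dam = FALSE, set_na = 0)  # 0 0 0 0
```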
ped_maternal(ped, value_from, name_to, use_dam = TRUE, set_na = NULL)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
value_from |
Name of the column of interest. |
name_to |
A string naming the new output column. |
use_dam |
Extract maternal values. If FALSE, paternal values are returned. |
set_na |
When maternal values are unknown, NA values are generated by default. This option allows setting a different value. |
The input dataframe, plus an additional column with maternal (or paternal) values of a variable of interest.
# To assign maternal inbreeding as a new variable, we can do as follows:
data(dama)
dama <- ip_F(dama)
dama <- ped_maternal(dama, value_from = "Fi", name_to = "Fdam")
tail(dama)
Functions in purgeR require individuals to be named with integers from 1 to N. This function takes a dataframe containing a pedigree and renames individuals with names in any format to the format required by other functions in purgeR. The process also checks that the pedigree format is suitable for other functions in the package.
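A minimal base-R sketch of the renaming step, assuming that unknown parents are coded with a name that does not appear among the individuals (here "0"). The helper rename_ids is hypothetical; unlike ped_rename, it performs no format checks and does not sort the pedigree.

```r
# Hypothetical sketch of the renaming step (not the purgeR source): each
# individual gets its row number as new id, and parents are looked up by
# their original names. Names not found among the ids (e.g. "0") become 0.
rename_ids <- function(id, dam, sire) {
  to_index <- function(x) {
    m <- match(x, id)
    m[is.na(m)] <- 0L
    m
  }
  data.frame(id = seq_along(id), dam = to_index(dam), sire = to_index(sire),
             names = id)
}

r <- rename_ids(id = c("M", "K", "J", "a"),
                dam = c("0", "0", "0", "K"),
                sire = c("0", "0", "0", "J"))
r  # "a" becomes id 4, with dam 2 ("K") and sire 3 ("J")
```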
ped_rename(ped, id = "id", dam = "dam", sire = "sire", keep_names = FALSE)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
id |
A string naming the column with individual identities. It will be renamed to its default value 'id'. |
dam |
A string naming the column with maternal identities. It will be renamed to its default value 'dam'. |
sire |
A string naming the column with paternal identities. It will be renamed to its default value 'sire'. |
keep_names |
A boolean value indicating whether the original identity values should be kept on a separate column (named 'names'), or not. |
A dataframe with the pedigree's identities renamed.
data(darwin)
darwin <- ped_rename(darwin, id = "Individual", dam = "Mother", sire = "Father", keep_names = TRUE)
head(darwin)
Individuals can be sorted according to the pedigree structure, without need of birth dates.
In the sorted pedigree, descendants will always be placed in rows with higher index number
than that of their ancestors. This way, individuals born first will tend to be in the top
of the pedigree. Younger individuals, and individuals with no descendants will tend to be
placed at the bottom.
This function uses the sorting algorithm developed by Zhang et al (2009).
After sorting, individuals will be renamed from 1 to N using ped_rename.
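For illustration only, the sketch below orders a pedigree so that every individual follows its known parents, using a simple depth-based topological sort rather than the Zhang et al. (2009) algorithm that ped_sort implements. The helper sort_by_depth is hypothetical and assumes unique ids.

```r
# A simple depth-based topological sort, NOT the Zhang et al. (2009) algorithm
# used by ped_sort. Returns a row order in which every individual comes after
# its known parents. Parent names not found among the ids count as unknown.
sort_by_depth <- function(id, dam, sire) {
  depth <- rep(NA_integer_, length(id))
  names(depth) <- id
  repeat {
    changed <- FALSE
    for (i in seq_along(id)) {
      if (!is.na(depth[i])) next
      parents <- c(dam[i], sire[i])
      parents <- parents[parents %in% id]
      pd <- depth[parents]
      if (!anyNA(pd)) {
        # founders get depth 0; others sit one level below their deepest parent
        depth[i] <- if (length(pd) > 0) max(pd) + 1L else 0L
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  if (anyNA(depth)) stop("pedigree contains a cycle")
  order(depth)
}

id   <- c("C", "A", "B")
dam  <- c("B", "0", "A")
sire <- c("0", "0", "0")
id[sort_by_depth(id, dam, sire)]  # "A" "B" "C"
```

The loop repeatedly assigns a depth to any individual whose known parents already have one; if a pass assigns nothing while some depths are still missing, the pedigree must contain a cycle.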
ped_sort(ped, id = "id", dam = "dam", sire = "sire", keep_names = FALSE)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
id |
A string naming the column with individual identities. It will be renamed to its default value 'id'. |
dam |
A string naming the column with maternal identities. It will be renamed to its default value 'dam'. |
sire |
A string naming the column with paternal identities. It will be renamed to its default value 'sire'. |
keep_names |
A boolean value indicating whether the original identity values should be kept on a separate column (named 'names'), or not. |
A sorted pedigree dataframe (with ancestors on top of descendants).
Zhang Z, Li C, Todhunter RJ, Lust G, Goonewardene L, Wang Z. 2009. An algorithm to sort complex pedigrees chronologically without birthdates. J Anim Vet Adv. 8 (1): 177-182.
data(darwin)
# Here we reshuffle rows in the pedigree. It won't be usable for other functions in the package
darwin <- darwin[sample(1:nrow(darwin)), ]
# Below, we sort the pedigree again. The order might not be the same as before.
# But ancestors will always be placed on top of descendants,
# making the pedigree usable for other functions in the package.
darwin <- ped_sort(darwin, id = "Individual", dam = "Mother", sire = "Father", keep_names = TRUE)
Recursive function that computes steps for sorting algorithm described by Zhang et al (2009).
sort_step(p, id, dam, sire, t, S, G, t_G)
p |
Pedigree to sort (used as template) |
id |
A string naming the column with individual identities. It will be renamed to its default value 'id'. |
dam |
A string naming the column with maternal identities. It will be renamed to its default value 'dam'. |
sire |
A string naming the column with paternal identities. It will be renamed to its default value 'sire'. |
t |
Template for the new sorted pedigree |
S |
Vector of assumed parent individuals |
G |
Vector of generation numbers (0 identifies the youngest) |
t_G |
Vector G for the new sorted pedigree |
No return value. Will print an error message if checking fails.
Filled template for the sorted pedigree. Once recursion ends, it returns the sorted pedigree
Zhang Z, Li C, Todhunter RJ, Lust G, Goonewardene L, Wang Z. 2009. An algorithm to sort complex pedigrees chronologically without birthdates. J Anim Vet Adv. 8 (1): 177-182.
Computes the deviation from Hardy-Weinberg equilibrium following Caballero and Toro (2000).
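Caballero and Toro (2000) express the deviation from Hardy-Weinberg proportions as alpha = (F - f) / (1 - f), where F is the mean inbreeding coefficient and f the mean coancestry of the reference population. Assuming that this is the quantity returned, and that the mean coancestry includes self-coancestries, a base-R sketch based on the tabular method for coancestry looks as follows. The helpers are hypothetical, take the reference population as a logical vector rather than a column name, and assume ids 1 to N with parents before offspring and 0 for unknown parents.

```r
# Hypothetical helpers, not the purgeR source. Assumes ids 1..N, parents
# listed before offspring, and 0 for unknown parents.
kinship_matrix <- function(dam, sire) {
  n <- length(dam)
  k <- matrix(0, n, n)
  for (i in seq_len(n)) {
    d <- dam[i]
    s <- sire[i]
    # inbreeding of i equals the coancestry of its parents (0 if one is unknown)
    Fi <- if (d > 0 && s > 0) k[d, s] else 0
    k[i, i] <- 0.5 * (1 + Fi)
    for (j in seq_len(i - 1)) {
      # coancestry with an older individual j: mean over the two parents
      kd <- if (d > 0) k[d, j] else 0
      ks <- if (s > 0) k[s, j] else 0
      k[i, j] <- k[j, i] <- 0.5 * (kd + ks)
    }
  }
  k
}

# alpha = (F - f) / (1 - f), with F the mean inbreeding and f the mean
# coancestry (self-coancestries included) of the reference individuals
hw_deviation <- function(dam, sire, ref) {
  k <- kinship_matrix(dam, sire)
  Fi <- 2 * diag(k) - 1
  F_bar <- mean(Fi[ref])
  f_bar <- mean(k[ref, ref])
  (F_bar - f_bar) / (1 - f_bar)
}

# Full-sib mating: 3 and 4 are offspring of 2 x 1, and 5 = 3 x 4
dam  <- c(0, 0, 2, 2, 3)
sire <- c(0, 0, 1, 1, 4)
ref  <- c(FALSE, FALSE, TRUE, TRUE, FALSE)
hw_deviation(dam, sire, ref)  # -0.6 (mean F = 0, mean f = 0.375)
```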
pop_hwd(ped, reference = NULL)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
reference |
A string naming a column indicating whether individuals belong to the reference population or not. Column must be boolean or coercible to boolean type. |
A numeric value indicating the deviation from Hardy-Weinberg equilibrium.
Caballero A, Toro M. 2000. Interrelations between effective population size and other pedigree tools for the management of conserved populations. Genet. Res. 75: 331-343.
data(atlas)
pop_hwd(atlas)
Estimate the total and effective number of founders and ancestors in a pedigree, as well as the number of founder genome equivalents (see details on these parameters below). Note that a reference population (RP) must be defined, so that founders and ancestors are referred to the set of individuals belonging to that RP. This is set by means of a boolean column whose name is passed as the reference argument.
pop_Nancestors(ped, reference, nboot = 10000L, seed = NULL, skip_Ng = FALSE)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
reference |
A string naming a column indicating whether individuals belong to the reference population or not. Column must be boolean or coercible to boolean type. |
nboot |
Number of bootstrap iterations (for computing Ng). |
seed |
Sets a seed for the random number generator. |
skip_Ng |
Skip Ng computation or not (FALSE by default). |
The total number of founders (Nf) and ancestors (Na) are calculated simply as the count of founders and ancestors of individuals belonging to the reference population (RP). Founders here are defined as individuals with both parents unknown.
The effective number of founders (Nfe) is the number of equally contributing founders, that would account for observed genetic diversity in the RP, while the effective number of ancestors (Nae) is defined as the minimum number of ancestors, founders or not, required to account for the genetic diversity observed in the RP. These parameters are computed from the probability of gene origin, following methods in Tahmoorespur and Sheikhloo (2011).
While the previous parameters account for diversity loss due to bottlenecks at the level of founders or ancestors, other sources of random loss of alleles (such as drift) can be accounted for by means of the number of founder genome equivalents (Ng, Caballero and Toro 2000). This parameter is estimated via Monte Carlo simulation of allele segregation, following Boichard et al. (1997).
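The effective number of founders is commonly computed as Nfe = 1 / sum(q_k^2), where q_k is the expected genetic contribution of founder k to the RP, obtained by averaging the founder contributions of each individual's two parents. A hypothetical base-R sketch of Nf and Nfe under that definition follows; it assumes ids 1 to N, parents before offspring, 0 for unknown parents, and that every non-founder has both parents known (individuals with a single known parent would need extra handling). Na, Nae and Ng are not covered.

```r
# Hypothetical sketch, not the purgeR source. Assumes ids 1..N, parents
# listed before offspring, 0 for unknown parents, and that every
# non-founder has both parents known.
founder_summary <- function(dam, sire, ref) {
  n <- length(dam)
  founders <- which(dam == 0 & sire == 0)
  # p[i, k]: probability that a gene of individual i comes from founder k
  p <- matrix(0, n, length(founders))
  for (i in seq_len(n)) {
    if (dam[i] == 0 && sire[i] == 0) {
      p[i, match(i, founders)] <- 1
    } else {
      p[i, ] <- (p[dam[i], ] + p[sire[i], ]) / 2
    }
  }
  q <- colMeans(p[ref, , drop = FALSE])  # founder contributions to the RP
  list(Nf = sum(q > 0), Nfe = 1 / sum(q^2))
}

# Founders 1, 2 and 5; 3 and 4 are offspring of 2 x 1; 6 = 3 x 5 is the RP
dam  <- c(0, 0, 2, 2, 0, 3)
sire <- c(0, 0, 1, 1, 0, 5)
ref  <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
founder_summary(dam, sire, ref)  # Nf = 3, Nfe = 8/3
```

Founder 5 contributes half of the genes of individual 6 and founders 1 and 2 a quarter each, so Nfe falls below Nf, reflecting the unequal contributions.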
A dataframe containing population size estimates for founders and ancestors:
Nr - Total number of individuals in the RP
Nf - Total number of founders
Nfe - Effective number of founders
Na - Total number of ancestors
Nae - Effective number of ancestors
Ng - Number of founder genome equivalents
se_Ng - Standard error of Ng
If one of the auxiliary functions is used (e.g. pop_Nr), only the corresponding numerical estimate is returned. In the case of pop_Ng, a list object is returned, with the number of founder genome equivalents (Ng) and its standard error (se_Ng).
Boichard D, Maignel L, Verrier E. 1997. The value of using probabilities of gene origin to measure genetic variability in a population. Genet. Sel. Evol. 29: 5-23.
Caballero A, Toro M. 2000. Interrelations between effective population size and other pedigree tools for the management of conserved populations. Genet. Res. 75: 331-343.
Tahmoorespur M, Sheikhloo M. 2011. Pedigree analysis of the closed nucleus of Iranian Baluchi sheep. Small Rumin. Res. 99: 1-6.
data(arrui)
pop_Nancestors(arrui, reference = "target", skip_Ng = TRUE)
Estimate the effective population size (Ne). This is computed from the increase in individual inbreeding, following the method described by Gutiérrez et al (2008, 2009).
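The method relies on the individual increase in inbreeding, delta F_i = 1 - (1 - F_i)^(1 / (t_i - 1)), where t_i is the number of equivalent complete generations of individual i (Gutiérrez et al. 2008), and Ne is then obtained as 1 / (2 * mean(delta F)). A hypothetical base-R sketch of these two steps follows; it does not reproduce any filtering of individuals or standard error computation that pop_Ne may apply.

```r
# Hypothetical helpers, not the purgeR source.
# Individual increase in inbreeding (Gutiérrez et al. 2008); needs t > 1.
individual_delta_F <- function(Fi, t) {
  1 - (1 - Fi)^(1 / (t - 1))
}

# Realized effective size from the mean individual increase in inbreeding
realized_ne <- function(Fi, t) {
  1 / (2 * mean(individual_delta_F(Fi, t)))
}

individual_delta_F(Fi = 0.19, t = 3)         # 0.1, since (1 - 0.19)^(1/2) = 0.9
realized_ne(Fi = c(0.19, 0.1), t = c(3, 2))  # about 5
```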
pop_Ne(ped, Fcol, tcol)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
Fcol |
Name of column with inbreeding coefficient values. |
tcol |
Name of column with generation numbers. |
A list with the effective population size (Ne) and its standard error (se_Ne).
Gutiérrez JP, Cervantes I, Molina A, Valera M, Goyache F. 2008. Individual increase in inbreeding allows estimating effective sizes from pedigrees. Genet. Sel. Evol. 40: 359-378.
Gutiérrez JP, Cervantes I, Goyache F. 2009. Improving the estimation of realized effective population sizes in farm animals. J. Anim. Breed. Genet. 126: 327-332.
data(atlas)
atlas <- ip_F(atlas) # compute inbreeding, appending column "Fi"
atlas <- pop_t(atlas) # compute generations, appending column "t"
pop_Ne(atlas, Fcol = "Fi", tcol = "t")
Computes the number of equivalent complete generations (t), as defined by Boichard et al (1997).
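Following Boichard et al. (1997), t_i sums (1/2)^n over the known ancestors of i, where n is the number of generations separating i from each ancestor along a given path. In a sorted pedigree this reduces to a recursion over parents, sketched below with a hypothetical base-R helper (ids 1 to N, parents before offspring, 0 for unknown parents).

```r
# Hypothetical helper, not the purgeR source. Assumes ids 1..N, parents
# listed before offspring, and 0 for unknown parents.
equiv_generations <- function(dam, sire) {
  eqg <- numeric(length(dam))
  for (i in seq_along(dam)) {
    for (p in c(dam[i], sire[i])) {
      # a known parent contributes 1/2 for itself plus half of its own value
      if (p > 0) eqg[i] <- eqg[i] + (1 + eqg[p]) / 2
    }
  }
  eqg
}

# 3 = 2 x 1, 4 = 2 x 3, and 5 has only a known sire (4)
equiv_generations(dam = c(0, 0, 2, 2, 0), sire = c(0, 0, 1, 3, 4))
# 0.00 0.00 1.00 1.50 1.25
```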
pop_t(ped, name_to = "t")
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
name_to |
A string naming the new output column. |
The input dataframe, plus an additional column corresponding to the number of equivalent complete generations of every individual (named "t" by default).
Boichard D, Maignel L, Verrier E. 1997. The value of using probabilities of gene origin to measure genetic variability in a population. Genet. Sel. Evol., 29: 5-23.
data(dama)
dama <- pop_t(dama)
tail(dama)
The purgeR package includes functions for the computation of parameters related to inbreeding and genetic purging in pedigreed populations, including standard, ancestral and purged inbreeding coefficients, among other measures of inbreeding and purging. In addition, functions to compute the effective population size and other parameters relevant to population genetics and structure are included.
A complete user's guide is provided as vignettes, introducing the functions in this package with examples of use. Navigate these vignettes from R with:
browseVignettes("purgeR")
There are currently two vignettes available:
purgeR-tutorial: A complete overview of all functions in the package, including easy to follow examples.
ip: A more advanced guide showing examples of inbreeding purging analyses.
Preprocessing
ped_rename: Rename individuals in a pedigree from 1 to N
ped_sort: Sort individuals (with ancestors on top of descendants)
ped_clean: Remove individuals not used for purging analyses
ped_maternal: Maternal effects
ped_graph: Input for igraph
Inbreeding and purging
ip_F: Inbreeding coefficient
ip_Fa: Ancestral inbreeding coefficient
ip_Fij: Partial inbreeding coefficient
ip_g: Purged inbreeding coefficient
ip_op: Opportunity of purging
exp_F: Expected inbreeding coefficient
exp_Fa: Expected ancestral inbreeding coefficient
exp_g: Expected purged inbreeding coefficient
Population parameters
pop_hwd: Deviation from Hardy-Weinberg equilibrium
pop_t: Number of equivalent complete generations
pop_Ne: Effective population size
pop_Nancestors: Population founders and ancestors
pop_Na: Total number of ancestors
pop_Nae: Effective number of ancestors
pop_Nf: Total number of founders
pop_Nfe: Effective number of founders
pop_Ng: Number of founder genome equivalents
Fitness
w_grandoffspring: Grandoffspring
w_offspring: Offspring
w_reproductive_value: Reproductive value
Eugenio López-Cortegano <[email protected]> (ORCID)
López-Cortegano E. 2022. purgeR: Inbreeding and purging in pedigreed populations. Bioinformatics, https://doi.org/10.1093/bioinformatics/btab599.
Source code is available via the GitLab repository at https://gitlab.com/elcortegano/purgeR/. Users are encouraged to report bugs, request features, and contribute code to this project.
Some users might find the C++ software PURGd useful; it computes inbreeding-purging parameters and follow-up statistical analyses: https://gitlab.com/elcortegano/PURGd/.
Computes the reproductive value.
reproductive_value( ped, reference, name_to, target = NULL, enable_correction = TRUE )
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
reference |
A string naming a column indicating whether individuals belong to the reference population or not. Column must be boolean or coercible to boolean type. |
name_to |
A string naming the new output column. |
target |
A string naming a column indicating whether individuals belong to the target population or not. Column must be boolean or coercible to boolean type. By default, all descendants of the reference population are used. |
enable_correction |
Correct reproductive values. |
The input dataframe, plus an additional column with reproductive values for the reference and target populations assumed.
Hunter DC et al. 2019. Pedigree-based estimation of reproductive value. Journal of Heredity 110 (4): 433-444
Given two alleles (one from dam, the other from sire), it samples one at random.
dam_al |
Dam allele. |
sire_al |
Sire allele. |
The sampled allele.
Recursive function that gathers all founders and ancestors for a given individual.
dam |
Vector of dams. |
sire |
Vector of sires. |
i |
Reference individual (its index, not id). |
fnd |
Vector of founders (to be returned as reference). |
anc |
Vector of ancestors (to be returned as reference). |
The sampled allele.
Counts the number of grandoffspring for individuals in the pedigree.
w_grandoffspring(ped, name_to)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
name_to |
A string naming the new output column. |
The input dataframe, plus an additional column indicating the total number of grandoffspring.
data(arrui)
arrui <- w_grandoffspring(arrui, name_to = "G")
head(arrui)
Counts the number of offspring for individuals in the pedigree.
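Offspring counts amount to tabulating how often each individual appears as a dam or a sire. A minimal base-R sketch, with a hypothetical helper whose dam_offspring and sire_offspring switches mirror the arguments below (not the purgeR source):

```r
# Hypothetical helper, not the purgeR source. Parents are matched to ids,
# so unknown parents (e.g. 0) are simply not counted.
count_offspring <- function(id, dam, sire, dam_offspring = TRUE, sire_offspring = TRUE) {
  n <- length(id)
  counts <- integer(n)
  if (dam_offspring) {
    d <- match(dam, id)
    counts <- counts + tabulate(d[!is.na(d)], nbins = n)
  }
  if (sire_offspring) {
    s <- match(sire, id)
    counts <- counts + tabulate(s[!is.na(s)], nbins = n)
  }
  counts
}

# 3 and 4 are offspring of 2 x 1; 5 is offspring of 3 x 4
count_offspring(id = 1:5, dam = c(0, 0, 2, 2, 3), sire = c(0, 0, 1, 1, 4))
# 2 2 1 1 0
```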
w_offspring(ped, name_to, dam_offspring = TRUE, sire_offspring = TRUE)
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
name_to |
A string naming the new output column. |
dam_offspring |
Compute dam offspring (TRUE by default). |
sire_offspring |
Compute sire offspring (TRUE by default). |
The input dataframe, plus an additional column indicating the total number of offspring.
data(arrui)
arrui <- w_offspring(arrui, name_to = "P")
head(arrui)
Computes the reproductive value following the method by Hunter et al. (2019). This is a measure of how well a gene originating in a set of 'reference' individuals is represented in a different set of 'target' individuals. By default, fitness is computed for individuals in the reference population, using all of their descendants as target. A generation-wise mode can also be enabled, to compute fitness contributions consecutively from one generation to the next.
w_reproductive_value( ped, reference, name_to, target = NULL, enable_correction = TRUE, generation_wise = FALSE )
ped |
A dataframe containing the pedigree. Individual (id), maternal (dam), and paternal (sire) identities are mandatory columns. |
reference |
A string naming a column indicating whether individuals belong to the reference population or not. Column must be boolean or coercible to boolean type. |
name_to |
A string naming the new output column. |
target |
A string naming a column indicating whether individuals belong to the target population or not. Column must be boolean or coercible to boolean type. By default, all descendants of the reference population are used. |
enable_correction |
Correct reproductive values (enabled by default). |
generation_wise |
Assume that the reference population is a vector of integers indicating generation numbers. Reproductive values will be computed generation by generation independently (except for the last one). |
A reference population must be defined, representing the set of individuals whose reproductive value is to be calculated. By default, genetic contributions to all other individuals in the pedigree are assumed, but a target population can also be defined, restricting the set of individuals accounted for when computing the reproductive value. This could represent, for example, a cohort of living individuals.
The input dataframe, plus an additional column with reproductive values for the reference and target populations assumed.
Hunter DC et al. 2019. Pedigree-based estimation of reproductive value. Journal of Heredity 110(4): 433-444.
library(dplyr)
library(magrittr)
# Pedigree used in Hunter et al. (2019)
id <- c("A1", "A2", "A3", "A4", "A5", "A6", "B1", "B2", "B3", "B4", "C1", "C2", "C3", "C4")
dam <- c("0", "0", "0", "0", "0", "0", "A2", "A2", "A2", "A4", "B2", "B2", "A4", "A6")
sire <- c("0", "0", "0", "0", "0", "0", "A1", "A1", "A1", "A5", "B1", "B3", "B3", "A5")
t <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2)
ped <- tibble::tibble(id, dam, sire, t)
ped <- purgeR::ped_rename(ped, keep_names = TRUE) %>%
  dplyr::mutate(reference = ifelse(t == 1, TRUE, FALSE))
purgeR::w_reproductive_value(ped, reference = "reference", name_to = "R")