purgeR is a package for the estimation of inbreeding-purging genetic parameters in pedigreed populations. These parameters include the inbreeding coefficient (F), the partial (Fi(j)), ancestral (Fa) and purged (g) inbreeding coefficients, as well as the total and expressed opportunity of purging (O and Oe, respectively). Only genealogical records are required to estimate them, and all individual estimates are stored in a dataframe. Thus, purgeR provides the raw material for subsequent analyses of inbreeding depression and genetic purging (see the ‘ip’ vignette for more detailed examples).
The package also includes functions for pre-processing pedigrees, for analyzing population diversity (e.g. effective population size), and for inferring time (e.g. number of equivalent complete generations), bottlenecks (e.g. effective number of founders and ancestors, and founder genome equivalents) and fitness (e.g. breeding success and reproductive value), among others. These functions help to contextualize the demographic circumstances under which inbreeding and purging occur, as well as their consequences.
The next sections give a practical introduction to all functions contained in the purgeR package. The tidyverse R dialect is used throughout the tutorial, including the pipe operator `%>%`. Users unfamiliar with it are encouraged to read the introductory book R for Data Science.
Most functions have a mandatory argument ‘ped’, used to input a dataframe with pedigree information. Pedigree dataframes need to follow some rules:
To facilitate the usage of pedigrees and improve reproducibility, most functions also require that the columns with the identities of individuals, mothers and fathers are named ‘id’, ‘dam’ and ‘sire’ respectively, all of type integer, with unknown parents coded as 0. Individuals should also be numbered in order, from 1 to N.
There is no restriction on adding more columns, e.g. with measures of individual genetic or environmental factors.
Example pedigrees in this package are already sorted, which means that ancestors are always placed above their descendants. This is a requirement for all functions in the package, except for `ped_sort`, a function dedicated to sorting individuals following the algorithm of Zhang et al. (2009). See `?ped_sort` for an example of use.
The function `ped_rename` is the most important pre-processing function in the package. It checks that all input requirements are met and makes the changes needed for the remaining functions to work properly. Consider the example below, using the pedigree of the Darwin/Wedgwood family:
Individual | Mother | Father |
---|---|---|
William Darwin I | Unknown | unknown |
Mary Healey | unk | UNK |
Gilbert Wedgwood | 0 | NA |
Margaret Burslem | 0 | 0 |
Anne Earle | 0 | 0 |
William Darwin II | Mary Healey | William Darwin I |
After using `ped_rename`, the pedigree is checked, and individuals are renamed in the proper format:
```r
darwin <- purgeR::ped_rename(
  ped = darwin,
  id = "Individual",
  dam = "Mother",
  sire = "Father",
  keep_names = TRUE
)
pander::pandoc.table(head(darwin))
```
id | dam | sire | names |
---|---|---|---|
1 | 0 | 0 | William Darwin I |
2 | 0 | 0 | Mary Healey |
3 | 0 | 0 | Gilbert Wedgwood |
4 | 0 | 0 | Margaret Burslem |
5 | 0 | 0 | Anne Earle |
6 | 2 | 1 | William Darwin II |
Note that only the first 6 rows are shown in the example. In the renamed dataframe, Charles R. Darwin appears with id = 52. Note also the use of the option `keep_names = TRUE`, which stores the original individual identities in a separate column, ‘names’.
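To make the renaming step concrete, here is a minimal base-R sketch of the idea. It is not the implementation of `ped_rename`: the helper `rename_pedigree()` is hypothetical, assumes the input is already sorted (parents before offspring), and treats NA, "0" and any capitalisation of "unknown" or "unk" as unknown parents, as in the table above.

```r
# Hypothetical helper illustrating a renaming step (not purgeR code).
# Assumes a sorted pedigree; unknown parents coded as NA, "0", "unknown" or "unk".
rename_pedigree <- function(ped, id, dam, sire) {
  is_unknown <- function(x) is.na(x) | tolower(x) %in% c("0", "unknown", "unk")
  new_id <- seq_len(nrow(ped))             # individuals renamed 1..N, in order
  lookup <- setNames(new_id, ped[[id]])    # original name -> new integer id
  recode <- function(x) {
    as.integer(ifelse(is_unknown(x), 0L, lookup[as.character(x)]))
  }
  data.frame(id = new_id, dam = recode(ped[[dam]]), sire = recode(ped[[sire]]),
             names = ped[[id]], stringsAsFactors = FALSE)
}

darwin_raw <- data.frame(
  Individual = c("William Darwin I", "Mary Healey", "Gilbert Wedgwood",
                 "Margaret Burslem", "Anne Earle", "William Darwin II"),
  Mother = c("Unknown", "unk", "0", "0", "0", "Mary Healey"),
  Father = c("unknown", "UNK", NA, "0", "0", "William Darwin I"),
  stringsAsFactors = FALSE
)
renamed <- rename_pedigree(darwin_raw, "Individual", "Mother", "Father")
renamed
```

Parents that do not match any individual would simply be returned as NA by this sketch, which performs none of the additional checks done by `ped_rename`.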
Downstream analyses may require at least one additional variable (column) with some measure of biological fitness (or any other value). Individuals with no data available for it (i.e. with NA values) can be filtered out, as long as they are not ancestors of any other individual with available data. This is the job of the function `ped_clean`, which reduces the size of the pedigree and may improve the performance of inbreeding and purging functions in large pedigrees.
Taking as example the Dama gazelle pedigree (1316 individuals), `ped_clean` reduces the pedigree size to 1025 individuals for the analysis of 15-day survival, and to only 375 when analyzing female productivity.
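```r
library(purgeR)  # example pedigrees and analysis functions
library(dplyr)   # pipe operator %>%

data(dama)
dama %>% nrow()
#> [1] 1316
dama %>%
  purgeR::ped_clean(value_from = "survival15") %>%
  nrow()
#> [1] 1025
dama %>%
  purgeR::ped_clean(value_from = "prod") %>%
  nrow()
#> [1] 375
```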
Note that `ped_clean` requires a renamed input pedigree. After the filtering step, it automatically renames the output pedigree again.
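The filtering idea can be sketched in a few lines of base R. The helper `clean_pedigree()` is hypothetical (not the code of `ped_clean`): it assumes a sorted pedigree with ids 1 to N and unknown parents coded as 0, keeps every individual with data plus all of its ancestors, and, unlike `ped_clean`, does not rename the result.

```r
# Hypothetical sketch of the filtering idea behind ped_clean (not purgeR code).
# Assumes a sorted pedigree with ids 1..N and unknown parents coded as 0.
clean_pedigree <- function(ped, value_from) {
  keep <- !is.na(ped[[value_from]])          # individuals with data
  # Visit youngest to oldest, so marking parents propagates to all ancestors
  for (i in rev(seq_len(nrow(ped)))) {
    if (keep[i]) {
      parents <- c(ped$dam[i], ped$sire[i])
      keep[parents[parents > 0]] <- TRUE
    }
  }
  ped[keep, ]                                # no renaming in this sketch
}

toy <- data.frame(id = 1:6, dam = c(0, 0, 1, 0, 3, 3), sire = c(0, 0, 2, 0, 4, 4),
                  survival = c(NA, NA, NA, NA, 1, NA))
clean_pedigree(toy, "survival")$id  # 1 2 3 4 5: individual 6 has no data or descendants
```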
Several measures of inbreeding and purging can be computed, based on the probability that alleles of individuals in the pedigree are identical by descent. All functions related to inbreeding and purging are prefixed with `ip_`.
The inbreeding coefficient (F, Wright 1922), here also referred to as standard inbreeding, is defined as the probability that an individual inherits two alleles derived from the same ancestor (i.e. identical by descent, IBD). In pedigreed populations, the inbreeding coefficient of an individual i equals the kinship coefficient of its parents j and k ($F_{i} = f_{j,k}$), which can be calculated recursively as:
$$f_{j,j}=\frac{1}{2}(1+F_{j})$$
$$f_{j,k}=\frac{1}{2}(f_{j,k_{d}}+f_{j,k_{s}}) \quad (j\neq k)$$ where $k_{d}$ and $k_{s}$ refer to k’s dam and sire, and k is not an ancestor of j (see Falconer & Mackay 1996).
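The recursion above (the tabular method) can be sketched in base R as follows. This is an illustration under stated assumptions, not the code behind `ip_F`: the helper `kinship_matrix()` is hypothetical, ids are assumed to run from 1 to N with parents listed before offspring, and unknown parents are coded as 0. In the code, i is the older and j the younger individual, and inbreeding is recovered from self-kinship, since $f_{i,i} = \frac{1}{2}(1+F_{i})$.

```r
# Tabular method for the kinship matrix (hypothetical helper, not purgeR code).
# Assumes ids 1..N sorted so that parents precede offspring; unknown parent = 0.
kinship_matrix <- function(dam, sire) {
  n <- length(dam)
  f <- matrix(0, n, n)
  for (j in seq_len(n)) {
    d <- dam[j]; s <- sire[j]
    Fj <- if (d > 0 && s > 0) f[d, s] else 0   # F_j = kinship of j's parents
    f[j, j] <- 0.5 * (1 + Fj)                  # self-kinship
    for (i in seq_len(j - 1)) {                # every older individual i
      fd <- if (d > 0) f[i, d] else 0
      fs <- if (s > 0) f[i, s] else 0
      f[i, j] <- f[j, i] <- 0.5 * (fd + fs)   # average over j's dam and sire
    }
  }
  f
}

# Toy pedigree: founders 1 and 2, full sibs 3 and 4, and their offspring 5
dam  <- c(0, 0, 1, 1, 3)
sire <- c(0, 0, 2, 2, 4)
f  <- kinship_matrix(dam, sire)
Fi <- 2 * diag(f) - 1   # 0 0 0 0 0.25
Fi
```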
The function `ip_F` computes the inbreeding coefficient of every individual in an input pedigree. The value of F is saved in a new column of the dataframe (named ‘Fi’), as it is usually convenient to keep it this way to simplify the computation of further inbreeding and purging parameters, as well as for later analyses.
The example below shows the inbreeding coefficient of William E. Darwin (son of Charles R. Darwin and Emma Wedgwood).
```r
darwin <- darwin %>% purgeR::ip_F()
darwin %>% dplyr::filter(names == "William Erasmus Darwin")
#> id dam sire names Fi
#> 1 60 54 52 William Erasmus Darwin 0.06298828
```
F can also be predicted from the effective population size (Ne) and the number of generations (t), using the classical expression (Falconer and Mackay 1996):
$$F_{t} = 1 - (1-\frac{1}{2N_{e}})^{t}$$ This can be computed with the function `exp_F` (e.g. `exp_F(Ne = 50, t = 50)`).
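The closed form follows from the one-generation recursion $F_{t} = \frac{1}{2N_{e}} + (1-\frac{1}{2N_{e}})F_{t-1}$ with $F_{0} = 0$. A short base-R check with hypothetical helpers (independent of `exp_F`, whose internals may differ) shows that both give the same value:

```r
# Closed form of expected inbreeding after t generations
expected_F <- function(Ne, t) 1 - (1 - 1 / (2 * Ne))^t

# Same quantity from the one-generation recursion, starting at F_0 = 0
expected_F_recursive <- function(Ne, t) {
  Ft <- 0
  for (k in seq_len(t)) Ft <- 1 / (2 * Ne) + (1 - 1 / (2 * Ne)) * Ft
  Ft
}

expected_F(Ne = 50, t = 50)
expected_F_recursive(Ne = 50, t = 50)
```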
As mentioned above, IBD occurs when alleles inherited from the same ancestor appear in homozygosis. Thus, Fi can be partitioned into the additive contributions of i’s ancestors. The partial inbreeding coefficient Fi(j) is defined as i’s probability of IBD for alleles coming from ancestor j. It can be computed from partial kinship coefficients, $f_{p_{1},p_{2}(j)}$ (where $p_{1}$ and $p_{2}$ are i’s parents), so that $F_{i(j)} = f_{p_{1},p_{2}(j)}$, using the tabular method described by Gulisija & Crow (2007). Given an ancestor j:
The function `ip_Fij` returns a matrix with all possible values of the partial inbreeding coefficient. In that matrix, the value in row i and column j is the probability of IBD of individual i for alleles coming from ancestor j. Values in the upper triangle of the matrix are always zero. The sum of Fi(j) over all columns j equals Fi when the ancestors j considered are the founders.
```r
m <- ip_Fij(arrui, mode = "founders") # ancestors considered are founders (by default)
base::rowSums(m) # this equals ip_F(arrui) %>% .$Fi
```
By default, `ip_Fij` only considers partial inbreeding conditional on founders, but it can be extended to any ancestor using the argument `mode = "all"`. A custom selection of ancestors can also be used (see `?ip_Fij`). Note, however, that for a large number of individuals the computation of this matrix may take a substantial amount of time. In every case, the columns of the returned matrix are sorted by ancestor identity.
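For founder ancestors, one way to set up the partial kinship recursion is to run the tabular method once per founder j, letting only j contribute to self-kinship: $f_{j,j(j)} = \frac{1}{2}$, other founders contribute 0, and a non-founder i gets $f_{i,i(j)} = \frac{1}{2}F_{i(j)}$. The helper `partial_inbreeding()` below is hypothetical, not the code of `ip_Fij`, and assumes every individual has either both or no parents known. The toy pedigree is one generation of full-sib mating, so each founder should explain half of the offspring’s F = 0.25.

```r
# Hypothetical sketch: partial inbreeding F_i(j) conditional on each founder j.
# Assumes ids 1..N sorted, unknown parents = 0, both or no parents known.
partial_inbreeding <- function(dam, sire) {
  n <- length(dam)
  founders <- which(dam == 0 & sire == 0)
  Fij <- matrix(0, n, length(founders), dimnames = list(NULL, founders))
  for (k in seq_along(founders)) {
    fk <- matrix(0, n, n)                       # partial kinships given founder k
    for (j in seq_len(n)) {
      d <- dam[j]; s <- sire[j]
      Fij[j, k] <- if (d > 0 && s > 0) fk[d, s] else 0
      # Founder k carries 1/2, other founders 0, non-founders half their partial F
      fk[j, j] <- if (j == founders[k]) 0.5 else 0.5 * Fij[j, k]
      for (i in seq_len(j - 1)) {
        fd <- if (d > 0) fk[i, d] else 0
        fs <- if (s > 0) fk[i, s] else 0
        fk[i, j] <- fk[j, i] <- 0.5 * (fd + fs)
      }
    }
  }
  Fij
}

dam  <- c(0, 0, 1, 1, 3)
sire <- c(0, 0, 2, 2, 4)
P <- partial_inbreeding(dam, sire)
P            # individual 5 gets 0.125 from each founder
rowSums(P)   # equals the standard F: 0 0 0 0 0.25
```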
The figure below shows the relative contribution of the two founders of the Barbary sheep pedigree (`arrui`) to inbreeding values F > 0.35.
```r
library(ggplot2)  # ggplot(), geom_bar(), theme() and related functions

arrui <- arrui %>% purgeR::ip_F()
tibble::tibble(founder1 = m[, 1], founder2 = m[, 2], Fi = plyr::round_any(arrui$Fi, 0.025)) %>%
  tidyr::pivot_longer(cols = c(founder1, founder2), names_to = "Founder", values_to = "Fij") %>%
  dplyr::group_by(Fi, Founder) %>%
  dplyr::summarise(Fij = sum(Fij)) %>%
  ggplot() +
  geom_bar(aes(x = Fi, y = Fij, fill = Founder), stat = "identity", position = "fill") +
  scale_x_continuous("Inbreeding coefficient (F)", limits = c(0.35, 0.625)) +
  scale_y_continuous("Partial contribution to F (in %)", labels = scales::percent_format()) +
  scale_fill_manual(values = c("darkgrey", "black")) +
  theme(
    panel.background = element_blank(),
    legend.position = "bottom"
  )
```
Alternatively, partial inbreeding can be estimated via gene-drop simulation (using the option `genedrop`). This gives less precise estimates of Fi(j), and is only worth using, for performance reasons, in very large and complex pedigrees. In these cases, a value of `genedrop = 100` may give results that are well correlated with exact estimates (r > 0.9 for the example pedigrees provided).
The ancestral inbreeding coefficient (Fa, Ballou 1997) measures the probability that an allele of an individual has been in homozygosis (IBD) in at least one of its ancestors.
This parameter provides information not only about inbreeding, but can also be used to detect purging: among individuals with the same inbreeding F, those with higher ancestral inbreeding Fa are expected to be fitter than those with lower Fa, given that their ancestors have survived and reproduced despite being more inbred (see Boakes & Wang 2005 and López-Cortegano et al. 2018 for analyses using this parameter).
Ancestral inbreeding can be estimated for an individual i with dam d and sire s as:
$$F_{a_{i}} = \frac{1}{2}[F_{a_{d}} + (1-F_{a_{d}})F_{d} + F_{a_{s}} + (1-F_{a_{s}})F_{s}]$$ This expression assumes that F and Fa are uncorrelated, which is not true in general. Alternatively, a gene-dropping simulation approach can be used, following Baumung et al. (2015), which provides unbiased estimates of Fa.
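Ballou’s recursion can be sketched in base R as below. The helper `ancestral_inbreeding()` is hypothetical and implements only the analytical expression (not the gene-dropping approach). It assumes a sorted pedigree with ids 1 to N, unknown parents coded as 0, and F values computed beforehand (here typed in for a toy pedigree with two consecutive generations of full-sib mating).

```r
# Ballou's (1997) recursion for ancestral inbreeding (hypothetical helper).
# Assumes ids 1..N sorted, unknown parents = 0 and F already computed.
ancestral_inbreeding <- function(dam, sire, Fi) {
  Fa <- numeric(length(dam))
  # Contribution of parent p: IBD in an earlier ancestor, or in p itself
  parent_term <- function(p) if (p > 0) Fa[p] + (1 - Fa[p]) * Fi[p] else 0
  for (i in seq_along(dam)) {
    Fa[i] <- 0.5 * (parent_term(dam[i]) + parent_term(sire[i]))
  }
  Fa
}

# Founders 1 and 2; full sibs 3 and 4; their full-sib offspring 5 and 6; then 7
dam  <- c(0, 0, 1, 1, 3, 3, 5)
sire <- c(0, 0, 2, 2, 4, 4, 6)
Fi   <- c(0, 0, 0, 0, 0.25, 0.25, 0.375)   # standard inbreeding of this pedigree
ancestral_inbreeding(dam, sire, Fi)        # only individual 7 has Fa > 0 (0.25)
```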
Both approaches are available in the function `ip_Fa`. Note that computing Fa requires estimating F in advance. Use the argument `Fcol` to declare a column with F values if they have already been computed and saved (this saves time), or leave it blank to compute F on the go.
```r
# F was pre-computed above
darwin %>%
  purgeR::ip_Fa(Fcol = "Fi") %>%
  dplyr::filter(names == "William Erasmus Darwin")
#> id dam sire names Fi Fa
#> 1 60 54 52 William Erasmus Darwin 0.06298828 0.001953125
# Compute F on the go (it won't be saved in the output)
# And enable genedropping
atlas %>%
  purgeR::ip_Fa(genedrop = 1000, seed = 1234) %>%
  dplyr::select(id, dam, sire, Fa) %>%
  tail()
#> id dam sire Fa
#> 943 943 882 737 0.6475
#> 944 944 653 822 0.6000
#> 945 945 653 822 0.6000
#> 946 946 740 822 0.6050
#> 947 947 740 822 0.6075
#> 948 948 897 737 0.6340
```
Fa can also be predicted from Ne and the number of generations, using the expression from López-Cortegano et al. (2018):
$$F_{a(t)} = 1 - (1-\frac{1}{2N_{e}})^{\frac{1}{2}t(t-1)}$$ This can be computed with the function `exp_Fa` (e.g. `exp_Fa(Ne = 50, t = 50)`).
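The exponent $\frac{1}{2}t(t-1)$ equals $0 + 1 + \dots + (t-1)$, so the expression can also be written as $1 - \prod_{k=0}^{t-1}(1-F_{k})$, with $F_{k}$ the expected standard inbreeding at generation k. This reading is offered here as an aid rather than taken from the original reference. A base-R check with hypothetical helpers (not `exp_Fa`):

```r
# Expected ancestral inbreeding, closed form given above
expected_Fa <- function(Ne, t) 1 - (1 - 1 / (2 * Ne))^(0.5 * t * (t - 1))

# Product form over generations k = 0..t-1, with F_k = 1 - (1 - 1/(2Ne))^k
# (valid for t >= 1)
expected_Fa_product <- function(Ne, t) {
  Fk <- 1 - (1 - 1 / (2 * Ne))^(0:(t - 1))
  1 - prod(1 - Fk)
}

expected_Fa(Ne = 50, t = 50)
expected_Fa_product(Ne = 50, t = 50)
```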
The purged inbreeding coefficient (g) gives the probability of IBD for deleterious recessive alleles. The reduction of g relative to standard inbreeding depends on the magnitude of a purging coefficient (d), which measures the strength of the effective deleterious recessive component of the genome (García-Dorado 2012): d = 0 implies g = F, and higher d (up to 0.5) means lower g in more inbred individuals. It can be calculated in pedigreed populations from the purged kinship coefficient (γ), in a similar way to standard inbreeding, following the methods described in García-Dorado (2012) and García-Dorado et al. (2016):
$$\gamma_{i,i} = \frac{1}{2}(1+g_{i})(1-2dF_{i})$$
$$\gamma_{i,j} = \frac{1}{2}(\gamma_{i,j_{d}}+\gamma_{i,j_{s}})(1-dF_{j})$$ where $j_{d}$ and $j_{s}$ are j’s mother and father respectively, i is older than j, and $g_{i}$ is the purged kinship coefficient of i’s parents.
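The purged kinship recursion has the same structure as the tabular method for standard kinship, with the extra $(1-dF)$ factors. A minimal base-R sketch follows; `purged_inbreeding()` is hypothetical (not the implementation of `ip_g`) and assumes a sorted pedigree with ids 1 to N (so a smaller id means an older individual), unknown parents coded as 0, F values computed beforehand, and $g_{i}$ taken as the purged kinship of i’s parents. With d = 0 it returns F, as stated above.

```r
# Hypothetical sketch of the purged kinship recursion (not purgeR code).
# Assumes ids 1..N sorted, unknown parents = 0, Fi precomputed, purging coefficient d.
purged_inbreeding <- function(dam, sire, Fi, d) {
  n <- length(dam)
  gam <- matrix(0, n, n)   # purged kinship coefficients
  g <- numeric(n)
  for (j in seq_len(n)) {
    dj <- dam[j]; sj <- sire[j]
    g[j] <- if (dj > 0 && sj > 0) gam[dj, sj] else 0   # purged kinship of parents
    gam[j, j] <- 0.5 * (1 + g[j]) * (1 - 2 * d * Fi[j])
    for (i in seq_len(j - 1)) {                        # i is older than j
      gd <- if (dj > 0) gam[i, dj] else 0
      gs <- if (sj > 0) gam[i, sj] else 0
      gam[i, j] <- gam[j, i] <- 0.5 * (gd + gs) * (1 - d * Fi[j])
    }
  }
  g
}

dam  <- c(0, 0, 1, 1, 3, 3, 5)
sire <- c(0, 0, 2, 2, 4, 4, 6)
Fi   <- c(0, 0, 0, 0, 0.25, 0.25, 0.375)
purged_inbreeding(dam, sire, Fi, d = 0)      # d = 0 gives back F
purged_inbreeding(dam, sire, Fi, d = 0.25)   # g < F for individual 7 (inbred parents)
```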
The function `ip_g` computes the purged inbreeding coefficient, given a value of d. Choosing a proper value of d can, however, be complex. A separate vignette titled “Inbreeding and Purging Estimates” describes methods to help estimate the inbreeding load as well as the purging coefficient.
```r
atlas %>%
  ip_F() %>%
  ip_g(d = 0.48, Fcol = "Fi") %>%
  dplyr::select(id, dam, sire, Fi, tidyselect::starts_with("g")) %>%
  tail()
#> id dam sire Fi g0.48
#> 943 943 882 737 0.2350380 0.06066578
#> 944 944 653 822 0.2452226 0.08464775
#> 945 945 653 822 0.2452226 0.08464775
#> 946 946 740 822 0.2409467 0.07522799
#> 947 947 740 822 0.2409467 0.07522799
#> 948 948 897 737 0.2345642 0.06343757
```
g can also be predicted from Ne and the number of generations, given a value of d, using the expression from García-Dorado (2012):
$$g_{t} = [(1-\frac{1}{2N_{e}})g_{t-1}+\frac{1}{2N_{e}}](1-2dF_{t-1})$$
This can be computed with the function `exp_g` (e.g. `exp_g(Ne = 50, t = 50, d = 0.2)`).
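Iterating this expression is straightforward. The hypothetical helper below (not the code of `exp_g`, which may use different starting values) starts from $g_{0} = F_{0} = 0$ and takes $F_{t-1}$ from the closed-form expression for F given earlier; with d = 0 it reproduces expected F.

```r
# Iterates g_t = [(1 - 1/(2Ne)) g_{t-1} + 1/(2Ne)] (1 - 2 d F_{t-1})
expected_g <- function(Ne, t, d) {
  g  <- 0   # g_0
  Fp <- 0   # F_{t-1}, starting from F_0 = 0
  for (k in seq_len(t)) {
    g  <- ((1 - 1 / (2 * Ne)) * g + 1 / (2 * Ne)) * (1 - 2 * d * Fp)
    Fp <- 1 - (1 - 1 / (2 * Ne))^k   # F_k, used in the next generation
  }
  g
}

expected_g(Ne = 50, t = 50, d = 0)     # same as expected F
expected_g(Ne = 50, t = 50, d = 0.2)   # lower, due to purging
```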
This is the last of the functions related to inbreeding coefficients. Plotting the expected values of F, Fa and g together (assuming Ne = 50 and an intermediate value d = 0.25) makes the differences between the three coefficients apparent.
```r
data.frame(t = 0:50) %>%
  dplyr::rowwise() %>%
  dplyr::mutate(Fi = exp_F(Ne = 50, t),
                Fa = exp_Fa(Ne = 50, t),
                g = exp_g(Ne = 50, t, d = 0.25)) %>%
  tidyr::pivot_longer(cols = c(Fi, Fa, g), names_to = "Type", values_to = "Inbreeding") %>%
  ggplot(aes(x = t, y = Inbreeding, color = Type)) +
  geom_line(size = 2) +
  scale_x_continuous("Generations (t)") +
  theme(legend.position = "bottom")
```
Whereas the previous methods focus on measures of inbreeding, the opportunity of purging parameters estimate the potential reduction in an individual’s inbreeding load as a consequence of having inbred ancestors (Gulisija and Crow 2007). The (total) opportunity of purging for an individual i ($O_{i}$) can be computed as:
$$O_{i} = \sum_{j}\sum_{k}\left(\frac{1}{2}\right)^{n-1}F_{j}$$ where j is every inbred ancestor of i, k is every path from i to j, and n is the number of individuals in the path (including i and j).
The expressed opportunity of purging ($O_{e}$) depends on the probability that a given allele is transmitted from an inbred ancestor j to i, and thus on Fi(j). It is computed as:
$$O_{e_{i}} = \sum_{j}2F_{i(j)}F_{j}$$
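The path sum for O does not require enumerating paths explicitly: every path from i to an inbred ancestor passes through one of i’s parents p and adds one individual (a factor 1/2), so $O_{i} = \sum_{p}\frac{1}{2}(F_{p} + O_{p})$. The hypothetical helper below implements only this raw quantity, without the corrections for complex pedigrees, and assumes a sorted pedigree with ids 1 to N and unknown parents coded as 0.

```r
# Raw (uncorrected) total opportunity of purging, via recursion over parents.
# Hypothetical helper; assumes ids 1..N sorted and unknown parents coded as 0.
total_opportunity <- function(dam, sire, Fi) {
  O <- numeric(length(dam))
  for (i in seq_along(dam)) {
    for (p in c(dam[i], sire[i])) {
      # Path to p itself (n = 2) plus all longer paths through p
      if (p > 0) O[i] <- O[i] + 0.5 * (Fi[p] + O[p])
    }
  }
  O
}

dam  <- c(0, 0, 1, 1, 3, 3, 5)
sire <- c(0, 0, 2, 2, 4, 4, 6)
Fi   <- c(0, 0, 0, 0, 0.25, 0.25, 0.375)
total_opportunity(dam, sire, Fi)   # only individual 7 has inbred ancestors: O = 0.25
```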
In complex pedigrees (with more than one inbred ancestor per path), these measures need to be corrected to discount the opportunity of purging measured in a close ancestor from that already counted in a more distant ancestor. The function `ip_op(complex = TRUE)` does not perform the Gulisija and Crow (2007) corrections; instead, it applies a heuristic approach that only accounts for close ancestors j and ignores contributions from distant ancestors k such that Fj(k) > 0.
The function `ip_op` can be used as follows:
```r
arrui %>%
  ip_op(Fcol = "Fi") %>%
  dplyr::filter(target == 1) %>%
  tidyr::pivot_longer(cols = c(Oe, Oe_raw)) %>%
  ggplot() +
  geom_point(aes(x = Fi, y = value, fill = name), pch = 21, size = 3, alpha = 0.5) +
  scale_y_continuous(expression(paste("Expressed opportunity of purging (", O[e], ")", sep=""))) +
  scale_x_continuous("Inbreeding coefficient (F)") +
  scale_fill_discrete("")
#> Computing partial kinship matrix. This may take a while.
```
The plot shows how the expressed opportunity of purging increases with the inbreeding coefficient for individuals in the reference population (last two cohorts). For the least inbred individuals, the corrected (Oe) and uncorrected (Oe_raw) estimates have the same value, but for more inbred individuals Oe_raw becomes larger than Oe. Both values are useful when determining the potential reduction in individual inbreeding load (see more exhaustive examples in López-Cortegano 2022, and the ‘Inbreeding and Purging Estimates’ vignette).
The purgeR package is mainly focused on estimating inbreeding and purging parameters, but accessory functions are included to compute other population parameters that might be useful when interpreting inbreeding and purging results. All functions for computing population parameters are prefixed with `pop_`.
The effective population size (Ne) can be computed from the individual increase in inbreeding (ΔF) as defined by Gutiérrez et al. (2008, 2009):
$$N_{e} = \frac{1}{2\Delta F}$$ where the individual ΔF can be computed as:
$$\Delta F_{i} = 1 - \sqrt[t_{i}-1]{1-F_{i}}$$ where $F_{i}$ is the inbreeding coefficient of individual i, and $t_{i}$ the generation number it belongs to.
Individual values are averaged to obtain the mean ΔF, which is then used to estimate Ne. Thus, all that is needed to compute Ne in a pedigree are the individual values of inbreeding and generation number. The function `pop_Ne` reads a pedigree dataframe and calculates Ne using accessory columns with inbreeding and time information (named ‘Fi’ and ‘t’ here). Note that the generation number is estimated here with `pop_t` as the number of equivalent complete generations (see below).
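The two expressions above can be sketched in base R with hypothetical helpers (these are not the code of `pop_Ne`, which also reports a standard error that is not reproduced here). Individuals with $t_{i} \le 1$ are dropped, since the $(t_{i}-1)$-th root is then undefined; this filtering rule is an assumption of the sketch. As a check, individuals whose F increased by exactly 1/(2 × 20) per generation should give Ne = 20.

```r
# Individual increase in inbreeding and the Ne derived from its mean
individual_deltaF <- function(Fi, gen) 1 - (1 - Fi)^(1 / (gen - 1))

ne_from_deltaF <- function(Fi, gen) {
  ok <- gen > 1                        # the (t - 1)-th root needs t > 1
  1 / (2 * mean(individual_deltaF(Fi[ok], gen[ok])))
}

# Check: F grew at a constant rate of 1/(2 * 20) per generation
gen <- c(1, 2, 3, 4, 5)
Fi  <- 1 - (1 - 1 / (2 * 20))^(gen - 1)
ne_from_deltaF(Fi, gen)                # recovers Ne = 20
```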
```r
atlas %>%
  purgeR::ip_F() %>%
  purgeR::pop_t() %>%
  purgeR::pop_Ne(Fcol = "Fi", tcol = "t")
#> $Ne
#> [1] 8.184803
#>
#> $se_Ne
#> [1] 0.2498695
```
However, caution is needed when estimating Ne this way. The previous estimate includes all individuals in the pedigree, but only the most recent individuals should be used, as they already account for the inbreeding of their ancestors. In the following data set, the column `target` indicates the individuals that belonged to the reference population used to estimate Ne (see details in López-Cortegano et al. 2021). Thus, Ne should be estimated in this case as:
```r
atlas %>%
  purgeR::ip_F() %>%
  purgeR::pop_t() %>%
  dplyr::filter(target == 1) %>%
  purgeR::pop_Ne(Fcol = "Fi", tcol = "t")
#> $Ne
#> [1] 14.01041
#>
#> $se_Ne
#> [1] 0.1707653
```
It is worth noting that this method of estimating Ne is equivalent to using the classical formula $F=1-(1-\frac{1}{2N_{e}})^t$.
Generation times are easily computed for populations with discrete generations, but overlapping generations are the rule in most real-world populations, and methods are required to estimate generation times in such circumstances. The function `pop_t` computes the number of equivalent complete generations (teq) following Boichard et al. (1997). For an individual i, this is calculated as:
$$t_{eq} = \sum^{J}_{j=1} (\frac{1}{2})^{n}$$ Where the sum is over all known ancestors, and n is the number of discrete generations that separate individual i from its ancestor j.
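The sum over ancestors can be computed recursively: each known parent contributes $(\frac{1}{2})^{1}$ and passes on half of its own teq. The hypothetical helper below (not the code of `pop_t`) counts an ancestor once for every path linking it to the individual, an assumption of this sketch, and expects ids 1 to N sorted with unknown parents coded as 0. Individual 6 has only a known sire, so its teq is not an integer.

```r
# Equivalent complete generations: teq_i = sum over known parents of (1 + teq_p) / 2
equivalent_generations <- function(dam, sire) {
  teq <- numeric(length(dam))
  for (i in seq_along(dam)) {
    for (p in c(dam[i], sire[i])) {
      if (p > 0) teq[i] <- teq[i] + 0.5 * (1 + teq[p])
    }
  }
  teq
}

dam  <- c(0, 0, 1, 1, 3, 0)
sire <- c(0, 0, 2, 2, 4, 5)
equivalent_generations(dam, sire)   # 0 0 1 1 2 1.5
```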
In populations with discrete generations and complete pedigrees, teq = t, while in those with overlapping generations teq correlates strongly with time. Consider as an example the plot below, showing the increase of teq with the year of birth (yob) in Gazella cuvieri:
```r
atlas %>%
  purgeR::pop_t() %>%
  dplyr::mutate(t = plyr::round_any(t, 0.5)) %>%
  ggplot() +
  geom_boxplot(aes(x = yob, y = t, group = yob)) +
  scale_y_continuous(expression(t[eq]))
```
Not only is the correlation strong, but the total number of generations estimated (~10) matches the expectation given the total number of years of management (45) and the mean age for breeding females (4.31 years, Moreno and Espeso 2008).
Functions are also included to compute the total and effective number of founders and ancestors, as well as the number of founder genome equivalents (Ng). These parameters can provide information on early population bottlenecks due to unbalanced founder or ancestor contributions, as well as on drift. Their estimation is based on probabilities of gene origin, following Boichard et al. (1997); Caballero and Toro (2000) and Tahmoorespur and Sheikhloo (2011) are also recommended reading in this regard. All these parameters refer to a reference population (RP) of interest that must be defined, e.g. the latest cohort, or even the entire population.
The total number of founders (Nf) is simply the count of founders of the RP, while the effective number of founders (Nfe) is the number of equally contributing founders that would account for the observed genetic diversity in the RP. Founders are defined as individuals with unknown parents (i.e. dam = 0 and sire = 0).
The total number of ancestors (Na) is the count of all ancestors, founders or not, that contribute descendants to the RP, while the effective number of ancestors (Nae) is the minimum number of ancestors, founders or not, required to account for the genetic diversity observed in the RP.
The number of founder genome equivalents (Ng) is defined in a similar way to Nfe, but it is estimated via Monte Carlo simulation of allele segregation, accounting not only for reductions in genetic diversity due to bottlenecks in founder or ancestor contributions, but also for random sources of diversity loss, such as drift (Boichard et al. 1997, Caballero and Toro 2000).
Nae is never larger than Nfe (and hence Nf), and their ratio can inform on the diversity loss due to bottlenecks between the base population and the RP (Tahmoorespur and Sheikhloo 2011). On the other hand, Ng is always the smallest of these parameters, since it accounts not only for diversity loss due to unbalanced founder or ancestor contributions, but also for genetic drift.
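Nf and Nfe can be sketched from probabilities of gene origin: each founder’s genes are tracked down the pedigree (each parent passing on half), the expected contributions $q_{k}$ to the RP are averaged, and $N_{fe} = 1/\sum_{k} q_{k}^{2}$. The helper `founder_numbers()` is hypothetical (not `pop_Nancestors`), assumes every individual has either both or no parents known, and does not cover Na, Nae or Ng, which need the additional steps described above.

```r
# Hypothetical sketch: founder contributions and the effective number of founders.
# Assumes ids 1..N sorted and every individual with both or no parents known.
founder_numbers <- function(dam, sire, reference) {
  n <- length(dam)
  founders <- which(dam == 0 & sire == 0)
  Q <- matrix(0, n, length(founders))   # Q[i, k]: share of i's genes from founder k
  for (i in seq_len(n)) {
    if (dam[i] == 0 && sire[i] == 0) {
      Q[i, match(i, founders)] <- 1
    } else {
      Q[i, ] <- 0.5 * (Q[dam[i], ] + Q[sire[i], ])
    }
  }
  q <- colMeans(Q[reference, , drop = FALSE])   # expected contributions to the RP
  c(Nf = sum(q > 0), Nfe = 1 / sum(q^2))
}

# Founders 1, 2 and 3; the RP (individuals 4 and 5) shares founder 1
dam  <- c(0, 0, 0, 1, 1)
sire <- c(0, 0, 0, 2, 3)
res <- founder_numbers(dam, sire, reference = c(4, 5))
res   # Nf = 3, Nfe = 8/3
```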
The function `pop_Nancestors` computes all these parameters and returns them in a dataframe:
```r
list("A. lervia" = arrui,
     "G. cuvieri" = atlas,
     "G. dorcas" = dorcas,
     "N. dama" = dama) %>%
  purrr::map_dfr(~ pop_Nancestors(., reference = "target", seed = 1234), .id = "Species")
#> Species Nr Nf Nfe Na Nae Ng se_Ng
#> 1 A. lervia 80 2 1.769424 63 1.769424 1.034102 0.2468827
#> 2 G. cuvieri 176 4 3.583086 249 3.583086 2.002087 0.4601456
#> 3 G. dorcas 283 20 13.386468 400 12.990604 5.817682 1.0154111
#> 4 N. dama 251 4 2.614750 349 2.614750 1.868039 0.4506204
```
Convenience functions named after the parameters they estimate are also available. For example, the function `pop_Ng` only estimates the number of founder genome equivalents and returns it as a numeric value. Similarly, `pop_Nae` only estimates the effective number of ancestors, and so on. See more examples in `?pop_Nancestors`.
In some cases it might be of interest to measure the degree of non-random mating in the population. This is given by the deviation from Hardy-Weinberg equilibrium (α, Caballero and Toro 2000), which can be calculated as:
$$\alpha = \frac{F-f}{1.0-f}$$ Where F is the mean inbreeding coefficient of the population, and f the mean coancestry coefficient.
The function `pop_hwd` estimates this coefficient for the entire population or, preferably, for a RP:
Note that in the example above α is negative, as is usually the case in managed populations. A value of zero would indicate random mating, and a positive value assortative mating among relatives.
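The expression for α is simple to evaluate once a kinship matrix is available. In the hypothetical helper below (not `pop_hwd`), the mean coancestry f is taken over all pairs of RP members, self-coancestries included; this choice is an assumption and may differ from the package. For a RP of two unrelated, non-inbred individuals, f = 0.25 and α = -1/3.

```r
# Deviation from Hardy-Weinberg proportions, alpha = (F - f) / (1 - f)
# Hypothetical helper; f averaged over all pairs, self-coancestries included.
hw_deviation <- function(Fi, kinship) {
  f_mean <- mean(kinship)
  (mean(Fi) - f_mean) / (1 - f_mean)
}

# RP of two unrelated, non-inbred individuals: f = (0.5 + 0 + 0 + 0.5) / 4 = 0.25
kinship <- matrix(c(0.5, 0, 0, 0.5), nrow = 2)
hw_deviation(Fi = c(0, 0), kinship = kinship)   # -1/3
```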
Purging analyses may benefit from an interpretation in terms of fitness change. Fitness measurements are expected to be provided by users, and could be, for example, ‘early survival’ or any other trait known to be related to fitness in the studied species. A small set of functions is nevertheless provided to help users infer fitness measurements from the pedigree structure itself.
These functions should, however, be used with caution, as they might not always reflect true fitness. First, measures of fitness based on contributions to the offspring (usually named ‘breeding success’ or ‘productivity’) are limited to individuals present in the pedigree, and that information could be incomplete. Second, if the population is under active management, offspring contributions may not represent actual biological fitness. Third, ‘reproductive values’ give expectations based on additive genetic relatedness and do not account for selective effects, so they may be inappropriate for downstream analyses considering purging effects. Finally, younger individuals in the pedigree might have lower fitness than older ones simply because they have not had time to produce offspring!
Fitness functions are prefixed with `w_`. A first measure of fitness given by the pedigree is individual breeding success, measured as the number of offspring present in the pedigree. The function `w_offspring` can be used for this:
```r
# Maximum overall breeding success
arrui %>%
  purgeR::w_offspring(name_to = "P") %>%
  .$P %>%
  max()
#> [1] 69
# Maximum female breeding success
arrui %>%
  purgeR::w_offspring(name_to = "P", sire_offspring = FALSE) %>%
  .$P %>%
  max()
#> [1] 21
```
Similarly, the number of grandoffspring can be used as a proxy for fitness, with the function `w_grandoffspring`:
```r
# Maximum overall grandoffspring productivity
arrui %>%
  purgeR::w_grandoffspring(name_to = "GP") %>%
  .$GP %>%
  max()
#> [1] 198
```
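Both counts can be sketched in base R with hypothetical helpers (not the code of `w_offspring` or `w_grandoffspring`), assuming ids 1 to N and unknown parents coded as 0. The argument `sire_offspring` mimics the option used above: when FALSE, offspring are only counted through dams.

```r
# Hypothetical helpers counting offspring and grandoffspring present in a pedigree.
# Assumes ids 1..N and unknown parents coded as 0.
count_offspring <- function(ped, sire_offspring = TRUE) {
  parents <- if (sire_offspring) c(ped$dam, ped$sire) else ped$dam
  tabulate(parents[parents > 0], nbins = nrow(ped))
}

count_grandoffspring <- function(ped) {
  P <- count_offspring(ped)
  # Grandoffspring of i = sum of the offspring counts of i's offspring
  vapply(ped$id, function(i) sum(P[ped$dam == i | ped$sire == i]), numeric(1))
}

toy <- data.frame(id = 1:5, dam = c(0, 0, 1, 1, 3), sire = c(0, 0, 2, 2, 4))
count_offspring(toy)                          # 2 2 1 1 0
count_offspring(toy, sire_offspring = FALSE)  # 2 0 1 0 0
count_grandoffspring(toy)                     # 2 2 0 0 0
```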
Finally, fitness can also be estimated as ‘reproductive values’, following the method developed by Hunter et al. (2019). Under this model, fitness is based on how well genes originating in a set of reference individuals are represented in their descendants. We do not go into the details of the algorithm here, but it allows genetic contributions to be corrected for changes in population size and for migration (which is used by default). This method can be used with the function `w_reproductive_value`:
Sometimes it can be useful to assign to a given individual i its maternal inbreeding coefficient, or any other maternal value, for example to evaluate maternal effects. The function `ped_maternal` reads one of the columns present in the pedigree dataframe and assigns to every individual the value observed in its mother (or father, if the option `use_dam = FALSE` is used). For individuals with unknown parents, NA values are returned by default, but this can be overridden with the option `set_na`.
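The assignment described above amounts to indexing a column by parent ids. A minimal sketch with a hypothetical helper (`parental_value()`, not the code of `ped_maternal`; the argument names `use_dam` and `set_na` simply mirror the options mentioned above), assuming ids 1 to N and unknown parents coded as 0:

```r
# Hypothetical helper assigning each individual a value observed in its mother
# (or father). Assumes ids 1..N and unknown parents coded as 0.
parental_value <- function(ped, value_from, use_dam = TRUE, set_na = NA) {
  parent <- if (use_dam) ped$dam else ped$sire
  out <- rep(set_na, nrow(ped))                   # value for unknown parents
  known <- parent > 0
  out[known] <- ped[[value_from]][parent[known]]  # look up the parent's row
  out
}

toy <- data.frame(id = 1:5, dam = c(0, 0, 1, 1, 3), sire = c(0, 0, 2, 2, 4),
                  value = c(10, 20, 30, 40, 50))
parental_value(toy, "value")                               # NA NA 10 10 30
parental_value(toy, "value", use_dam = FALSE, set_na = 0)  # 0 0 20 20 40
```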
Some users may be interested in analyzing or visualizing pedigrees using methods designed for graphs. The igraph R package is a popular and powerful tool for the analysis and visualization of complex networks and graphs. The function `ped_graph` provides an easy way to convert pedigrees in the format used by purgeR into the dataframes of edges and vertices that igraph requires to create “igraph” objects.
```r
library("igraph")
#>
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:purrr':
#>
#> compose, simplify
#> The following objects are masked from 'package:stats':
#>
#> decompose, spectrum
#> The following object is masked from 'package:base':
#>
#> union
atlas_VE <- purgeR::ped_graph(purgeR::atlas) # we use :: on atlas because igraph has a function named atlas
G_atlas <- igraph::graph_from_data_frame(d = atlas_VE$edges, vertices = atlas_VE$vertices, directed = TRUE)
```
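For reference, the edge list of a pedigree graph can be sketched as one parent-to-offspring edge per known parent. The helper `pedigree_edges()` is hypothetical and its columns need not match those returned by `ped_graph`; `igraph::graph_from_data_frame()` reads the first two columns of `d` as the edge list.

```r
# Hypothetical sketch of a parent -> offspring edge list from a pedigree.
# Assumes unknown parents coded as 0.
pedigree_edges <- function(ped) {
  edges <- rbind(data.frame(from = ped$dam, to = ped$id),
                 data.frame(from = ped$sire, to = ped$id))
  edges <- edges[edges$from > 0, ]   # unknown parents produce no edge
  rownames(edges) <- NULL
  edges
}

toy <- data.frame(id = 1:5, dam = c(0, 0, 1, 1, 3), sire = c(0, 0, 2, 2, 4))
edges <- pedigree_edges(toy)
edges   # six edges, each from an older parent to a younger offspring
```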
Both the igraph and ggraph R packages provide ways to visualize networks (and pedigrees!). Check the example below, which uses a circular dendrogram layout to show the pedigree structure of the atlas gazelle; applying the same code to the `dorcas` pedigree reveals the substantial differences in pedigree structure between atlas and dorcas gazelles.
```r
library("ggraph")
set.seed(1234)
atlas_VE <- purgeR::atlas %>% purgeR::pop_t() %>% purgeR::ped_graph()
G_atlas <- igraph::graph_from_data_frame(d = atlas_VE$edges, vertices = atlas_VE$vertices, directed = TRUE)
ggraph(G_atlas, layout = 'dendrogram', circular = TRUE) +
  geom_edge_diagonal(colour = "#222222", alpha = 0.05) +
  geom_node_point(alpha = 0.5, size = 0.1, pch = 1) +
  theme(panel.background = element_blank())
```
Of course, there are other ways of representing pedigrees. For more traditional pedigree plots, check the kinship2 R package.