The aim of optimum contribution selection (OCS) is to find the optimum number of offspring for each breeding animal and to determine if a young animal (a selection candidate) should be selected for breeding or not. This is done in an optimal way, i.e. in a way that ensures that genetic gain is achieved, and that genetic diversity and genetic originality of the population are maintained or recovered. It can be based either on pedigree data or on marker data, whereby the latter approach is recommended. It requires that data is available from a sample of the population which includes the selection candidates.
OCS can be done for populations with overlapping and non-overlapping generations. For OCS with overlapping generations, the percentage which each age class represents in the population must be defined, and the data set should contain individuals from all age classes with non-zero contribution.
Even if the frequency of use of breeding animals is not regulated by the breeding organization, running the optimization still provides valuable information for a breeder, as the animals with highest optimum contributions are most valuable for a breeding program.
This vignette is organized as follows:
All evaluations using marker data are demonstrated using the cattle data included in the package as an example. This multi-breed data has already been described in the companion vignette for basic marker-based evaluations.
Data frame Cattle includes simulated phenotypic information and has columns Indiv (individual IDs), Born (years of birth), Breed (breed names), BV (breeding values), Sex (sexes), and herd (herds).
## Indiv Born Breed BV Sex herd
## Angler1 Angler1 1991 Angler -1.0706066 male <NA>
## Angler2 Angler2 1994 Angler -0.3362574 female 2
## Angler3 Angler3 1986 Angler -2.0735649 female 1
## Angler4 Angler4 1987 Angler 1.5968307 male <NA>
## Angler5 Angler5 1994 Angler 1.0023969 male <NA>
## Angler6 Angler6 1998 Angler -0.2426676 male <NA>
The data frame contains information on the four breeds Angler, Fleckvieh, Holstein, and Rotbunt. The “Angler” is an endangered German cattle breed, which had been upgraded with Red Holstein (also called “Rotbunt”). The Rotbunt cattle are a subpopulation of the “Holstein” breed. The “Fleckvieh” or Simmental breed is unrelated to the Angler. The Angler cattle are the selection candidates.
This small example data set contains only genotypes from the first parts of the first two chromosomes. Vector GTfiles defined below contains the names of the genotype files. There is one file for each chromosome. Data frame map contains the marker map and has columns Name (marker name), Chr (chromosome number), Position, Mb (position in Mega base pairs), and cM (position in centiMorgan):
data(map)
dir <- system.file("extdata", package="optiSel")
GTfiles <- file.path(dir, paste("Chr", unique(map$Chr), ".phased", sep=""))
head(map)
## Name Chr Position cM Mb
## ARS-BFGL-NGS-16466 ARS-BFGL-NGS-16466 1 267940 0 0.267940
## ARS-BFGL-NGS-98142 ARS-BFGL-NGS-98142 1 471078 0 0.471078
## ARS-BFGL-NGS-114208 ARS-BFGL-NGS-114208 1 533815 0 0.533815
## ARS-BFGL-NGS-65067 ARS-BFGL-NGS-65067 1 883895 0 0.883895
## ARS-BFGL-BAC-32722 ARS-BFGL-BAC-32722 1 929617 0 0.929617
## ARS-BFGL-BAC-34682 ARS-BFGL-BAC-34682 1 950841 0 0.950841
As an introductory example consider traditional OCS with marker based kinship matrices. All alternative approaches involve the same steps, so it is recommended to read this section even if you want to minimize inbreeding instead of maximizing genetic gain. The following steps are involved:
For populations with overlapping generations you need to define the percentage which each age class represents in the population. One possibility is to assume that the percentage represented by a class is proportional to the percentage of offspring that is not yet born. Moreover, males and females (excluding newborn individuals) should be equally represented. These percentages can be estimated with function agecont from pedigree data. Since pedigree data is not available for this data set, the percentages are entered by hand:
cont <- data.frame(
age = c( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
male = c(0.14, 0.14, 0.09, 0.04, 0.03, 0.03, 0.02, 0.02, 0.01, 0.01),
female= c(0.08, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.02, 0.02))
One age class spans one year and the individuals born in the current year are in age class 1. Note that in this example young male classes have higher percentages than young female classes because males were used for breeding at an earlier age. The generation interval in years is approximately
## [1] 4.910714
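The value printed above is consistent with adding the reciprocals of four times the percentages of the youngest male and female age classes. The lines below are a minimal sketch of that calculation in base R. The expression is inferred from the percentages entered above, so treat it as an assumption rather than as the package's definition.

```r
# Age-class percentages as entered above (age class 1 = born in the current year).
cont <- data.frame(
  age    = 1:10,
  male   = c(0.14, 0.14, 0.09, 0.04, 0.03, 0.03, 0.02, 0.02, 0.01, 0.01),
  female = c(0.08, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.02, 0.02)
)

# Assumed approximation of the generation interval in years.
L <- 1 / (4 * cont$male[1]) + 1 / (4 * cont$female[1])
L
```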
Define a data frame containing the phenotypes of a sample from the population which includes the selection candidates. Make sure that there is one column for each trait that should be improved. Logical column isCandidate is appended indicating the individuals available as selection candidates.
Compute the kinships that are to be managed. Below, the kinship is named sKin, which is a shorthand for segment based kinship.
## Using skip = 2
## Using cskip = 2
## Reading chromosome 1 ...M=400
## Reading chromosome 2 ...M=400
Create variable cand with function candes, containing all information required to describe the selection candidates, which are the phenotypes and the kinships. The current values of the parameters in the population and the available objective functions and constraints for OCS are shown. Generations are defined to be non-overlapping if argument cont is omitted.
## The population is evaluated at time 2014
##
## Mean values of the parameters are: Value
## for trait 'BV' in Angler: -0.0530
## for kinship 'sKin' in Angler: 0.0552
##
## Available objective functions and constraints:
## for trait 'BV' in Angler: min.BV, max.BV, lb.BV, eq.BV, ub.BV
## for kinship 'sKin' in Angler: min.sKin, ub.sKin
##
## ub lb uniform
For numeric columns in data frame phen the possibility is provided to define an upper bound (prefix ub), a lower bound (prefix lb), an equality constraint (prefix eq), or to minimize (prefix min) or to maximize (prefix max) the weighted sum of the values, whereby the weights are the contributions of the selection candidates. If the column contains breeding values, then this is the expected mean breeding value in the offspring.
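As a minimal sketch of the weighted sum described above, the following base R lines compute the expected mean breeding value in the offspring for a set of contributions. The vectors bv and oc are hypothetical and are not part of the example data.

```r
# Hypothetical breeding values of four selection candidates.
bv <- c(1.2, 0.4, -0.3, 0.8)

# Hypothetical contributions of the same candidates; they must sum to one.
oc <- c(0.4, 0.1, 0.2, 0.3)
stopifnot(isTRUE(all.equal(sum(oc), 1)))

# Weighted sum of the values = expected mean breeding value in the offspring.
expected_bv <- sum(oc * bv)
expected_bv
```

A lower bound such as lb.BV would then require this weighted sum to be at least the chosen threshold.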
For each kinship and native kinship included in the call of function candes, the possibility is provided to define an upper bound for the expected mean value of the kinship in the offspring (prefix ub), or to minimize the value (prefix min).
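For intuition, with non-overlapping generations the expected mean kinship in the offspring is commonly written as the quadratic form c'Kc, where c holds the contributions and K is the kinship matrix. The sketch below uses a hypothetical 3 x 3 kinship matrix. With overlapping generations the calculation also involves the age-class percentages, which this sketch ignores.

```r
# Hypothetical kinship matrix of three candidates (symmetric, self-kinships on the diagonal).
K <- matrix(c(0.50, 0.10, 0.05,
              0.10, 0.50, 0.20,
              0.05, 0.20, 0.50), nrow = 3, byrow = TRUE)

# Two hypothetical sets of contributions, each summing to one.
c_equal  <- c(1, 1, 1) / 3
c_skewed <- c(0.8, 0.1, 0.1)

# Expected mean kinship in the offspring for a given contribution vector.
mean_kinship <- function(cvec, K) drop(t(cvec) %*% K %*% cvec)

mean_kinship(c_equal, K)
mean_kinship(c_skewed, K)
```

Concentrating the contributions on one individual raises the expected kinship, which is why an upper bound on the kinship limits how heavily a few candidates can be used.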
Constraints ub and lb allow upper and lower bounds to be defined for the contributions of the selection candidates. Constraint uniform allows individuals belonging to specified groups to be assigned equal contributions.
Now choose the parameters you want to restrict and the parameters you want to optimize. For traditional OCS the objective is to maximize genetic gain with method max.BV, and to restrict the mean kinship in the offspring by defining constraint ub.sKin.
The upper bound for the mean kinship in the offspring should depend on the current mean kinship in the population, which is included in component mean of cand.
## BV sKin
## 1 -0.0529632 0.05515035
In general, if an upper bound for a kinship K should be defined, it is recommended to derive the threshold value from the desired effective size Ne of the population. If the OCS program started at time t0, then the upper bound for the mean kinship at time t should be $$ub.K=1-(1-\overline{K}_{t_0})\left(1-\frac{1}{2 N_e}\right)^{\frac{t-t_0}{L}},$$ where $\overline{K}_{t_0}$ is the mean kinship in the population at time t0, and L is the generation interval. The critical effective size, i.e. the size below which the fitness of the population steadily decreases, depends on the population and is usually between 50 and 100. But there seems to be a consensus that 50-100 is a long-term viable effective size. To be on the safe side, an effective size of Ne = 100 should be envisaged (T. H. E. Meuwissen 2009).
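The bound is easy to evaluate directly. The helper below is a sketch in base R. The function name ub_kinship and its argument names are made up for illustration, and the inputs are the mean kinship and the generation interval reported in this vignette.

```r
# Upper bound for the mean kinship 'years' after the start of the OCS program.
# K0: mean kinship at the start, Ne: desired effective size,
# L: generation interval in years.
ub_kinship <- function(K0, Ne, L, years = 1) {
  1 - (1 - K0) * (1 - 1 / (2 * Ne))^(years / L)
}

K0 <- 0.05515035   # current mean of sKin in the Angler
L  <- 4.910714     # generation interval in years
Ne <- 100          # desired effective size

ub_kinship(K0, Ne, L)               # bound for the next year
ub_kinship(K0, Ne, L, years = 1:5)  # bounds for the next five years
```

With these inputs the one-year bound is about 0.0561, which agrees with the bound shown for sKin in the optimization report further below.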
A list containing the constraints is defined below. The component named ub.sKin defines the upper bound for the mean kinship sKin in the population in the next year. Component uniform = "female" states that all females from a particular age cohort should have equal contributions, so only the contributions of male selection candidates will be optimized. Upper and lower bounds for the contributions of selection candidates could be defined with lb and ub (see the help page of function opticont).
Now the optimum contributions of the selection candidates can be calculated:
##
## Using solver 'cccp2' with parameters:
## Value
## trace 0
## abstol 1e-05
## feastol 1e-05
## stepadj 0.9
## maxiters 100
## reltol 1e-06
## beta 0.5
##
## valid solver status
## TRUE cccp2 optimal
##
## Variable Value Bound OK?
## --------------------------------------------------
## BV Angler 0.26 max :
## --------------------------------------------------
## lower bounds all x >= lb : TRUE
## upper bounds all x <= ub : TRUE
## breed contribution Angler 1 == 1 : TRUE
## sex contrib. diff. Angler 0 == 0 : TRUE
## BV Angler 0.26 :
## sKin Angler 0.0561 <= 0.0561 : TRUE
## --------------------------------------------------
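The objects used in this introductory example can be assembled roughly as follows. This is a sketch pieced together from calls that appear elsewhere in this vignette (segIBD, candes, opticont). The call data(Cattle), the value Ne = 100, the expression for L, and the component names info and mean of the result are assumptions and should be checked against the package documentation.

```r
library(optiSel)

# Example data shipped with the package (data(Cattle) is assumed to load the phenotypes).
data(map)
data(Cattle)
dir     <- system.file("extdata", package = "optiSel")
GTfiles <- file.path(dir, paste("Chr", unique(map$Chr), ".phased", sep = ""))

# Age-class percentages entered by hand.
cont <- data.frame(
  age    = 1:10,
  male   = c(0.14, 0.14, 0.09, 0.04, 0.03, 0.03, 0.02, 0.02, 0.01, 0.01),
  female = c(0.08, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.02, 0.02)
)

# Assumed settings: generation interval in years and desired effective size.
L  <- 1 / (4 * cont$male[1]) + 1 / (4 * cont$female[1])
Ne <- 100

# Phenotypes of the Angler; animals born up to 2013 are selection candidates.
phen <- Cattle[Cattle$Breed == "Angler", ]
phen$isCandidate <- phen$Born <= 2013

# Segment based kinship and description of the candidates.
sKin <- segIBD(GTfiles, map, minL = 1.0)
cand <- candes(phen = phen, sKin = sKin, cont = cont)

# Equal contributions for females; kinship bound derived from Ne.
con <- list(
  uniform = "female",
  ub.sKin = 1 - (1 - cand$mean$sKin) * (1 - 1 / (2 * Ne))^(1 / L)
)

# Traditional OCS: maximize the mean breeding value in the offspring.
Offspring <- opticont("max.BV", cand, con)
Offspring$info   # assumed component: validity, solver, and status
Offspring$mean   # assumed component: expected means in the offspring
```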
The report above states that the solution is considered optimal by the solver and that all constraints are fulfilled since valid=TRUE. This information can also be obtained as
## valid solver status
## 1 TRUE cccp2 optimal
The value of the objective function can be accessed as
## BV
## 0.2599955
and the expected average values of the kinships and traits in the offspring are
## BV sKin
## 1 0.2599955 0.05611434
The results are OK. If they are not, then try to use another solver. The solver can be specified in parameter solver of function opticont. Available solvers are "alabama", "cccp", "cccp2", and "slsqp". By default the solver is chosen automatically. Alternatively, the same solver may be used but with different tuning parameters. The available parameters are displayed if the function opticont is called (as shown above).
The optimum contributions of the selection candidates are in component parent:
Candidate <- Offspring$parent[, c("Indiv", "Sex", "oc", "herd")]
head(Candidate[Candidate$Sex=="male" & Candidate$oc>0.001,])
## Indiv Sex oc herd
## Angler147 Angler147 male 0.28747376 <NA>
## Angler166 Angler166 male 0.13575008 <NA>
## Angler171 Angler171 male 0.07675416 <NA>
In general, since the number of females with offspring is large, we expect that about $\frac{N_e}{4L}\approx 5$ new males are selected each year. In the first year of OCS with overlapping generations, often only very few individuals are selected for breeding. This changes in subsequent years and can be avoided by imposing upper bounds for the optimum contributions.
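The figure of about five new males per year follows directly from the values used in this example. A short check in base R, assuming Ne = 100 and the generation interval of roughly 4.91 years reported earlier:

```r
Ne <- 100       # desired effective size
L  <- 4.910714  # generation interval in years

# Expected number of new males selected per year.
new_males_per_year <- Ne / (4 * L)
new_males_per_year
```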
The numbers of offspring can be obtained from the optimum contributions and the size N of the offspring population with function noffspring. These values are rounded up or down to whole numbers.
Candidate$n <- noffspring(Candidate, N=20)$nOff
head(Candidate[Candidate$Sex=="male" & Candidate$oc>0.001,])
## Indiv Sex oc herd n
## Angler147 Angler147 male 0.28747376 <NA> 12
## Angler166 Angler166 male 0.13575008 <NA> 5
## Angler171 Angler171 male 0.07675416 <NA> 3
Males and females can be allocated for mating with function matings such that all breeding animals have the desired number of matings. In the example below the mean marker-based inbreeding coefficient in the offspring is minimized.
## Optimal branch and bound solution found
## Sire Dam n
## 2 Angler166 Angler66 1
## 4 Angler147 Angler81 1
## 7 Angler147 Angler83 1
## 10 Angler147 Angler104 1
## 13 Angler147 Angler106 1
## 16 Angler147 Angler111 1
The mean inbreeding in the offspring (which is equal to the mean kinship of the parents) is:
## [1] 0.02846455
The objective of a breeding program depends on several factors. These are the intended use of the breed, the presence of historic bottlenecks, and the importance being placed on the maintenance of genetic originality. In most livestock breeds the focus is on increasing the economic merit, so the objective of the breeding program is to maximize genetic gain. In contrast, companion animals often suffer from historic bottlenecks due to an overuse of popular sires. Hence, in these breeds the objective is to minimize inbreeding. In endangered breeds, which get subsidies for conservation, the focus may be on increasing their conservation values by recovering the native genetic background or by increasing the genetic distance to other breeds.
However, these are conflicting objectives: For maximizing genetic gain, the animals with highest breeding values would be used for breeding, which may create a new bottleneck and contribute to inbreeding depression. Maximizing genetic gain would also favor the use of animals with high genetic contributions from commercial breeds because these animals often have the highest breeding values. But this would reduce the genetic originality of the breed. Minimizing inbreeding in the offspring favors the use of animals with high contributions from other breeds because they have low kinship with the population and it may require the use of outcross animals with breeding values below average.
Thus, focussing on only one aspect automatically worsens the other ones. This can be avoided by imposing constraints on the aspects that are not optimized.
In general, best practice is to genotype all selection candidates to enable marker based evaluations. A breeding program based on marker information is more efficient than a breeding program based only on pedigree information, provided that the animals are genotyped for a sufficient number of markers. For several species, however, genotyping is still too expensive, so the breeding programs rely only on pedigree information.
Depending on what the objective of the breeding program is, you may continue reading at the appropriate section:
The required genotype file format, the marker map, the parameters minSNP, minL, unitL, unitP, and ubFreq, which are used for estimating the segment based kinship, the kinships at native haplotype segments, and the breed composition, have been described in the companion vignette for basic marker-based evaluations.
The breed composition of individuals can be estimated with function segBreedComp. Since native contributions NC of the Angler cattle should be considered, they are computed and added as an additional column to data frame Cattle.
wdir <- file.path(tempdir(), "HaplotypeEval")
wfile <- haplofreq(GTfiles, Cattle, map, thisBreed="Angler", minL=1.0, w.dir=wdir)
Comp <- segBreedComp(wfile$match, map)
Cattle[rownames(Comp), "NC"] <- Comp$native
## Born Breed BV Sex herd NC
## Angler1 1991 Angler -1.0706066 male <NA> 0.5805690
## Angler2 1994 Angler -0.3362574 female 2 0.5687157
## Angler3 1986 Angler -2.0735649 female 1 0.7418098
## Angler4 1987 Angler 1.5968307 male <NA> 0.4045441
## Angler5 1994 Angler 1.0023969 male <NA> 0.2239983
## Angler6 1998 Angler -0.2426676 male <NA> 0.2458519
A matrix containing the segment based kinship between all pairs of individuals can be computed with function segIBD, whereas the kinships at native haplotype segments can be calculated from the results of function segIBDatN. Both kinships are computed below. These kinships and the phenotypes of the selection candidates are combined into a single R-object with function candes. This function also computes the current values of the parameters in the population and displays the available objective functions and constraints. Below, the kinship at native haplotype segments is named sKinatN:
phen <- Cattle[Cattle$Breed=="Angler",]
phen$isCandidate <- phen$Born<=2013
sKin <- segIBD(GTfiles, map, minL=1.0)
sKinatN <- segIBDatN(GTfiles, Cattle, map, thisBreed="Angler", minL=1.0)
## The population is evaluated at time 2014
##
## Mean values of the parameters are: Value
## for trait 'BV' in Angler: -0.0530
## for trait 'NC' in Angler: 0.4091
## for kinship 'sKin' in Angler: 0.0552
## for nat. kin. 'sKinatN' in Angler: 0.0770
##
## Available objective functions and constraints:
## for trait 'BV' in Angler: min.BV, max.BV, lb.BV, eq.BV, ub.BV
## for trait 'NC' in Angler: min.NC, max.NC, lb.NC, eq.NC, ub.NC
## for kinship 'sKin' in Angler: min.sKin, ub.sKin
## for nat. kin. 'sKinatN' in Angler: min.sKinatN, ub.sKinatN
##
## ub lb uniform
Compared to the introductory example the possibility to restrict or to maximize native contributions became available because column NC was added to data frame Cattle. Additionally, the possibility to minimize or to restrict the kinship at native segments sKinatN became available since this kinship was used as an argument to function candes. The current mean values in the population are
## BV NC sKin sKinatN
## 1 -0.0529632 0.4090566 0.05515035 0.07701049
Function opticont can now be used to perform the optimization.
Depending on what the objective of the breeding program is, you may continue reading at the appropriate section:
First we create a list of constraints:
Again, equal contributions are assumed for the females and only the contributions of males are to be optimized. The upper bound for the mean segment based kinship was derived from the effective population size as explained above. Now the optimum contributions of the selection candidates can be calculated:
## valid solver status
## 1 TRUE cccp2 optimal
The expected values of the parameters in the next generation are
## BV NC sKin sKinatN
## 1 0.2599955 0.4013489 0.05611434 0.08003351
The results are the same as in the introductory example (as expected). This approach may be appropriate for a population without introgression, but for populations with historic introgression, the kinship at native alleles should be restricted as well in accordance with the desired effective size, and the native contributions should be constrained so that they do not decrease. Otherwise the genetic originality of the breed may get lost in the long term.
con <- list(
uniform ="female",
ub.sKin = 1-(1-cand$mean$sKin)*(1-1/(2*Ne))^(1/L),
ub.sKinatN = 1-(1-cand$mean$sKinatN)*(1-1/(2*Ne))^(1/L),
lb.NC = cand$mean$NC
)
Offspring2 <- opticont("max.BV", cand, con)
For comparison, the summaries of both scenarios are combined into a single data frame:
## BV NC sKin sKinatN
## Ref -0.0529632 0.4090566 0.05515035 0.07701049
## maxBV 0.2599955 0.4013489 0.05611434 0.08003351
## maxBV2 0.2383370 0.4090567 0.05546100 0.07795243
Since native contributions and breeding values are negatively correlated, the genetic gain decreases slightly when native contributions are constrained not to decrease.
Minimizing inbreeding means minimizing the average kinship of the population in order to enable breeders to avoid inbreeding. This is the appropriate approach e.g. for companion animals suffering from a historic bottleneck. It can be done with or without accounting for breeding values. In the example below no breeding values are considered since accurate breeding values are not available for most of these breeds.
First we create a list of constraints:
Again, equal contributions are assumed for the females and only the contributions of males are to be optimized. The segment based kinship is not constrained in this example because it should be minimized.
## BV NC sKin sKinatN
## 1 -0.0529632 0.4090566 0.05515035 0.07701049
## 2 -0.1413072 0.4320944 0.04878925 0.06388093
Minimizing kinship without constraining the mean breeding value decreases the mean breeding value in the offspring slightly because the individuals with high breeding values are related. For this breed, it also increases the native contribution because the individuals from other breeds that were used for upgrading were related.
While in livestock breeds the native contributions should be restricted in order to maintain the genetic originality of the breeds, in several companion breeds the opposite is true. Several companion breeds have high inbreeding coefficients and descend from only very few (e.g. 3) founders (Wellmann and Pfeiffer 2009), and purging does not seem to be feasible. Hence, sufficient genetic diversity cannot be achieved in the population even if marker data is used to minimize inbreeding. For these breeds it may be appropriate to use unrelated individuals from a variety of other breeds in order to increase the genetic diversity. However, only a small contribution from other breeds is needed, so the native contributions should be restricted also for these breeds in order to preserve their genetic originality. Hence, the difference between a breed with high diversity and a breed with low diversity suffering from inbreeding depression is that the optimum value for the native contribution is smaller than 1 for the latter.
For such a breed it is advisable to allow the use of unrelated individuals from other breeds but to restrict the admissible mean contribution from other breeds in the population. The mean kinship at native alleles should be restricted as well to require only a small amount of introgression:
con <- list(
uniform = "female",
lb.NC = cand$mean$NC + 0.04,
ub.sKinatN = 1-(1-cand$mean$sKinatN)*(1-1/(2*Ne))^(1/L)
)
Offspring2 <- opticont("min.sKin", cand, con)
For comparison, the estimates for both scenarios are combined into a single data frame:
## BV NC sKin sKinatN
## Ref -0.0529632 0.4090566 0.05515035 0.07701049
## minKin -0.1413072 0.4320944 0.04878925 0.06388093
## minKin2 -0.1855771 0.4490567 0.05099124 0.07268120
Of course, for a companion breed, the lower bound for the native contribution should be much higher.
For endangered breeds the priority of a breeding program could be to recover the original genetic background by maximizing native contributions. However, since the individuals with highest native contributions are related, this may considerably increase the inbreeding coefficients if the diversity at native alleles is not preserved. Hence, constraints are defined below not only for the segment based kinship but also for the kinship at native segments in accordance with the desired effective size:
con <- list(
uniform = "female",
ub.sKin = 1-(1-cand$mean$sKin)*(1-1/(2*Ne))^(1/L),
ub.sKinatN = 1-(1-cand$mean$sKinatN)*(1-1/(2*Ne))^(1/L)
)
Offspring <- opticont("max.NC", cand, con)
## valid solver status
## 1 TRUE cccp2 optimal
## BV NC sKin sKinatN
## 1 -0.148815 0.4508486 0.0524655 0.07796019
For this breed, maximizing native contributions results in negative genetic gain because native contributions and breeding values are negatively correlated. This can be avoided by adding an additional constraint for the breeding values:
con <- list(
uniform = "female",
ub.sKin = 1-(1-cand$mean$sKin)*(1-1/(2*Ne))^(1/L),
ub.sKinatN = 1-(1-cand$mean$sKinatN)*(1-1/(2*Ne))^(1/L),
lb.BV = cand$mean$BV
)
Offspring2 <- opticont("max.NC", cand, con)
For comparison, the estimated parameters of both scenarios are combined into a single data frame:
## BV NC sKin sKinatN
## Ref -0.05296320 0.4090566 0.05515035 0.07701049
## maxNC -0.14881503 0.4508486 0.05246550 0.07796019
## maxNC2 -0.05295336 0.4489632 0.05236741 0.07795278
While removing introgressed genetic material from the population is one possibility to increase the conservation value of an endangered breed, an alternative approach is to increase the genetic distance between the endangered breed and commercial breeds. In this case we do not care about whether alleles are native or not. We just want to accumulate haplotype segments which are rare in commercial breeds. This can be done with a core set approach.
In the core set approach, a hypothetical population is considered, consisting of individuals from various breeds. This population is called the core set. The contributions of each breed to the core set are such that the genetic diversity of the core set is maximized.
In the following example the parameter to be minimized is the mean kinship of individuals from the core set in the next generation. Constraint uniform defined below states that the contributions of the male selection candidates from the breed of interest are to be optimized, whereas individuals from all other breeds have uniform contributions.
Since the average kinship in a multi-breed population should be managed, argument phen of function candes contains individuals from all genotyped breeds. This was not the case in the above examples, where argument phen contained only the selection candidates.
Cattle$isCandidate <- Cattle$Born<=2013
cand <- candes(phen=Cattle, sKin=sKin, sKinatN.Angler=sKinatN, bc="sKin", cont=cont)
## The population is evaluated at time 2014
##
## Mean values of the parameters are: Value
## for trait 'NC' in Angler : 0.4091
## for trait 'BV' across breeds: -0.0684
## for trait 'BV.Angler' in Angler : -0.0530
## for trait 'BV.Holstein' in Holstein : 0.0130
## for trait 'BV.Fleckvieh' in Fleckvieh : -0.1136
## for trait 'BV.Rotbunt' in Rotbunt : 0.0886
## for kinship 'sKin' across breeds: 0.0350
## for nat. kin. 'sKinatN.Angler' in Angler : 0.0770
## for kinship 'sKin.Angler' in Angler : 0.0552
## for kinship 'sKin.Holstein' in Holstein : 0.1073
## for kinship 'sKin.Fleckvieh' in Fleckvieh : 0.0686
## for kinship 'sKin.Rotbunt' in Rotbunt : 0.1071
##
## Available objective functions and constraints:
## for trait 'NC' in Angler : min.NC, max.NC, lb.NC, eq.NC, ub.NC
## for trait 'BV' across breeds: min.BV, max.BV, lb.BV, eq.BV, ub.BV
## for trait 'BV.Angler' in Angler : min.BV.Angler, max.BV.Angler, lb.BV.Angler, eq.BV.Angler, ub.BV.Angler
## for trait 'BV.Holstein' in Holstein : min.BV.Holstein, max.BV.Holstein, lb.BV.Holstein, eq.BV.Holstein, ub.BV.Holstein
## for trait 'BV.Fleckvieh' in Fleckvieh : min.BV.Fleckvieh, max.BV.Fleckvieh, lb.BV.Fleckvieh, eq.BV.Fleckvieh, ub.BV.Fleckvieh
## for trait 'BV.Rotbunt' in Rotbunt : min.BV.Rotbunt, max.BV.Rotbunt, lb.BV.Rotbunt, eq.BV.Rotbunt, ub.BV.Rotbunt
## for kinship 'sKin' across breeds: min.sKin, ub.sKin
## for nat. kin. 'sKinatN.Angler' in Angler : min.sKinatN.Angler, ub.sKinatN.Angler
## for kinship 'sKin.Angler' in Angler : min.sKin.Angler, ub.sKin.Angler
## for kinship 'sKin.Holstein' in Holstein : min.sKin.Holstein, ub.sKin.Holstein
## for kinship 'sKin.Fleckvieh' in Fleckvieh : min.sKin.Fleckvieh, ub.sKin.Fleckvieh
## for kinship 'sKin.Rotbunt' in Rotbunt : min.sKin.Rotbunt, ub.sKin.Rotbunt
##
## ub lb uniform
mKin <- cand$mean$sKinatN.Angler
con <- list(
uniform = c("Angler.female", "Fleckvieh", "Holstein", "Rotbunt"),
ub.sKinatN.Angler = 1-(1-mKin)*(1-1/(2*Ne))^(1/L)
)
The upper bound for the mean native kinship was derived from the effective population size as explained above. Now the optimum contributions of the selection candidates can be calculated:
##
## Using solver 'cccp' with parameters:
## Value
## trace 0
## abstol 1e-05
## feastol 1e-05
## stepadj 0.9
## maxiters 100
## reltol 1e-06
## beta 0.5
##
## A square matrix was approximated by a positive
## definite matrix with relative distance 6e-04.
## valid solver status
## TRUE cccp optimal
##
## Variable Value Bound OK?
## ---------------------------------------------------------
## sKin across breeds 0.0332 min :
## ---------------------------------------------------------
## lower bounds all x >= lb : TRUE
## upper bounds all x <= ub : TRUE
## breed contribution Angler 0.4791 == 0.4791 : TRUE
## breed contribution Holstein 0.0443 == 0.0443 : TRUE
## breed contribution Fleckvieh 0.4243 == 0.4243 : TRUE
## breed contribution Rotbunt 0.0523 == 0.0523 : TRUE
## sex contrib. diff. Angler 0 == 0 : TRUE
## sex contrib. diff. Holstein 0 == 0 : TRUE
## sex contrib. diff. Fleckvieh 0 == 0 : TRUE
## sex contrib. diff. Rotbunt 0 == 0 : TRUE
## NC Angler 0.4333 :
## BV across breeds -0.1166 :
## BV Angler -0.1453 :
## BV Holstein 0.029 :
## BV Fleckvieh -0.1244 :
## BV Rotbunt 0.0861 :
## sKin across breeds 0.0332 :
## sKin Angler 0.0488 :
## sKin Holstein 0.1076 :
## sKin Fleckvieh 0.0688 :
## sKin Rotbunt 0.1079 :
## sKinatN Angler 0.0639 <= 0.078 : TRUE
## ---------------------------------------------------------
## valid solver status
## 1 TRUE cccp optimal
## NC BV BV.Angler BV.Holstein BV.Fleckvieh BV.Rotbunt sKin
## 1 0.4333205 -0.1165914 -0.1453215 0.02899812 -0.124364 0.08612751 0.0331582
## sKin.Angler sKin.Holstein sKin.Fleckvieh sKin.Rotbunt sKinatN.Angler
## 1 0.04883195 0.1075815 0.06882723 0.107866 0.06394063
Since the contributions of the selection candidates minimize the mean kinship sKin in the core set, they maximize the genetic diversity of the core set. This is achieved by increasing the genetic diversity within the breed or by increasing the genetic distance between the breed of interest and the other breeds. The optimum contributions are standardized so that their sum is equal to one within each breed:
## Breed lb oc ub
## Angler8 Angler 0 0.02384619 NA
## Angler14 Angler 0 0.13924780 NA
## Angler16 Angler 0 0.07642795 NA
## Angler25 Angler 0 0.05765760 NA
## Angler30 Angler 0 0.02744606 NA
## Angler53 Angler 0 0.07553520 NA
All evaluations using pedigree data are demonstrated using the Hinterwald cattle as an example. A pedigree is contained in the package. The pedigree and the functions dealing with pedigree data have already been described in the companion vignette for basic pedigree-based evaluations.
Pedigree completeness is an important factor for obtaining reliable results. If an animal has many missing ancestors, then it would falsely be considered unrelated to other animals, so it would falsely obtain high optimum contributions. There are several approaches to overcome this problem:
Calculate the pedigree completeness for all selection candidates and exclude individuals with a small number of equivalent complete generations from the evaluations. The number of equivalent complete generations can be computed with function summary.
Classify the breed of founders born after some fixed date as unknown, and restrict the genetic contribution from these founders in the offspring. The breed names of the founders can be classified by using appropriate values for parameters lastNative and thisBreed in function prePed.
Classify the breed of founders born after some fixed date as unknown, so that these founders are considered non-native, and restrict or minimize the kinship at native alleles which is less affected by incomplete pedigrees than the classical pedigree-based kinship.
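To illustrate the first approach, the number of equivalent complete generations can be thought of as the sum of (1/2)^n over all known ancestors, where n is the number of generations between the ancestor and the individual. The base R sketch below computes it recursively for a small hypothetical pedigree. The function name equi_gen and the pedigree are made up for illustration, the pedigree must list parents before their offspring, and the result is not guaranteed to match the package's own implementation in every detail.

```r
# Hypothetical pedigree; parents are listed before their offspring, NA = unknown parent.
ped <- data.frame(
  Indiv = c("A", "B", "C", "D", "E", "F"),
  Sire  = c(NA,  NA,  "A", "A", "C", "C"),
  Dam   = c(NA,  NA,  "B", NA,  "D", NA),
  stringsAsFactors = FALSE
)

equi_gen <- function(ped) {
  g <- setNames(numeric(nrow(ped)), ped$Indiv)
  for (i in seq_len(nrow(ped))) {
    parents <- c(ped$Sire[i], ped$Dam[i])
    parents <- parents[!is.na(parents)]
    # Each known parent adds 1/2 for itself plus half of its own completeness.
    g[i] <- sum(0.5 * (1 + g[parents]))
  }
  g
}

equi_gen(ped)
```

Animal D has only one known parent and therefore obtains a lower value than C. A threshold on this number excludes animals whose pedigree is too shallow to give reliable kinships.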
Of course, all 3 approaches can be followed simultaneously. First, we prepare the pedigree and classify the breed of founders born after 1970 to be unknown:
data("PedigWithErrors")
Pedig <- prePed(PedigWithErrors, thisBreed="Hinterwaelder", lastNative=1970)
## Sire Dam Sex Breed Born I
## 276000813798399 276000813550094 276000813155754 male Hinterwaelder 2008 2.5
## 276000814047526 276000813550094 276000812922177 female Hinterwaelder 2009 3.5
## 276000814094461 276000813550094 276000811287249 male Hinterwaelder 2008 5.5
## 276000813852592 276000813255923 276000812922670 female Hinterwaelder 2007 2.5
## 276000813617234 276000813270780 276000811823362 female Hinterwaelder 2008 4.0
## 276000891978029 S276000891978029 276000891978027 female Hinterwaelder 2008 2.0
## BV Offspring
## 276000813798399 0.01632677 FALSE
## 276000814047526 0.61601695 FALSE
## 276000814094461 0.27304978 FALSE
## 276000813852592 0.99055123 FALSE
## 276000813617234 1.25277591 FALSE
## 276000891978029 1.33973331 FALSE
The breed composition of individuals can be estimated with function pedBreedComp. Since the native contributions should be considered in some scenarios, they are added as an additional column NC to the pedigree.
## native unknown unbek0 Fleckvieh
## 276000813798399 0.54309845 0.1633301 0.042541504 0.10948944
## 276000814047526 0.54775238 0.1770020 0.036956787 0.10919952
## 276000814094461 0.50883484 0.2580566 0.034835815 0.10763550
## 276000813852592 0.55436325 0.1279297 0.040756226 0.09793472
## 276000813617234 0.25216293 0.4960938 0.140098572 0.04468918
## 276000891978029 0.08495712 0.8837891 0.004339218 0.01391411
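The call that produces such a table is not shown here. A minimal sketch, assuming that pedBreedComp takes the prepared pedigree together with thisBreed and returns a data frame whose column native holds the native contributions (the object name cont is a placeholder chosen for this illustration):

```r
library(optiSel)

data("PedigWithErrors")
Pedig <- prePed(PedigWithErrors, thisBreed="Hinterwaelder", lastNative=1970)

# Estimated breed composition of every individual in the pedigree
cont <- pedBreedComp(Pedig, thisBreed="Hinterwaelder")

# Store the native contribution as column NC
# (assumes the result has a column named 'native')
Pedig$NC <- cont$native

# Inspect the composition of the last individuals in the pedigree
tail(cont)
```

Keeping the full composition in a separate object and copying only the native column keeps the pedigree compact.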
Below, the Hinterwald cattle born between 1980 and 1990 with at least 4 complete equivalent generations in the pedigree are selected to describe the population, and the individuals that are at least 1 year old are chosen as selection candidates. The aim is to compute the optimum contributions of the selection candidates to the birth cohort 1991. Data frame phen is defined below, which contains the individual IDs in Column 1 (Indiv), sexes in Column 2 (Sex), breed names (Breed), years of birth (Born), breeding values (BV), and the native contributions (NC) of the individuals. Finally, the logical column isCandidate indicating the selection candidates is appended.
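# Hinterwald cattle born 1980-1990 with at least 4 complete equivalent generations
use <- Pedig$Born %in% (1980:1990) & Pedig$Breed=="Hinterwaelder"
use <- use & summary(Pedig)$equiGen>=4
phen <- Pedig[use, c("Indiv", "Sex", "Breed", "Born", "BV", "NC")]
# Candidates are at least 1 year old when the birth cohort 1991 is produced
phen$isCandidate <- phen$Born<=1990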
Since cattle have overlapping generations, the percentage which each age class represents in the population must be defined. One possibility is to assume that the percentage represented by a class is proportional to the percentage of offspring that is not yet born. Moreover, males and females (excluding newborn individuals) should be equally represented. Percentages fulfilling these assumptions can be obtained with function agecont:
## age male female
## 1 1 0.16547917 0.07255799
## 2 2 0.16500502 0.07255799
## 3 3 0.12162008 0.06704857
## 4 4 0.05500167 0.05675740
## 5 5 0.01849194 0.04740178
## 6 6 0.01090550 0.03804617
The generation interval is approximately
## [1] 4.956284
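The reported generation interval can be reproduced from the contributions of the youngest age class in the table above. The sketch below assumes the relation L = 1/(4*r_male) + 1/(4*r_female), where r_male and r_female are the contributions of 1-year-old males and females. This relation is not stated in the text and is used here only because it matches the reported value. The variable names are placeholders.

```r
# Contributions of age class 1, copied from the table above
r_male1   <- 0.16547917
r_female1 <- 0.07255799

# Assumed relation between first-class contributions and generation interval
L <- 1/(4*r_male1) + 1/(4*r_female1)
print(L)
```

The result is approximately 4.956 years, in line with the value printed above.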
The breeding values were simulated such that breeding values and native contributions are negatively correlated. This mimics historic introgression from a high-yielding commercial breed.
A matrix containing the pedigree based kinship between all pairs of individuals can be computed with function pedIBD. It is half the additive relationship matrix. The pedigree based kinship at native alleles can be calculated from the results of function pedIBDatN.
The data frame containing phenotypes and the kinships are combined below into a single R-object with function candes. This function also estimates the current values of the parameters in the population and displays the available objective functions and constraints. Below, the pedigree based kinship is named pKin, and the kinship at native alleles is named pKinatN:
pKin <- pedIBD(Pedig, keep.only=phen$Indiv)
pKinatN <- pedIBDatN(Pedig, thisBreed="Hinterwaelder", keep.only=phen$Indiv)
## The population is evaluated at time 1990
##
## Mean values of the parameters are: Value
## for trait 'BV' in Hinterwaelder: -0.6212
## for trait 'NC' in Hinterwaelder: 0.6236
## for kinship 'pKin' in Hinterwaelder: 0.0254
## for nat. kin. 'pKinatN' in Hinterwaelder: 0.0452
##
## Available objective functions and constraints:
## for trait 'BV' in Hinterwaelder: min.BV, max.BV, lb.BV, eq.BV, ub.BV
## for trait 'NC' in Hinterwaelder: min.NC, max.NC, lb.NC, eq.NC, ub.NC
## for kinship 'pKin' in Hinterwaelder: min.pKin, ub.pKin
## for nat. kin. 'pKinatN' in Hinterwaelder: min.pKinatN, ub.pKinatN
##
## ub lb uniform
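The candes call itself is not shown above. The following sketch puts the steps of this section together. It assumes that agecont accepts the pedigree and the IDs of the individuals describing the population, and that candes accepts the kinship matrices as named arguments together with an argument cont holding the age class contributions. Both assumptions should be checked against the package documentation.

```r
library(optiSel)

# Rebuild the inputs used in this section
data("PedigWithErrors")
Pedig    <- prePed(PedigWithErrors, thisBreed="Hinterwaelder", lastNative=1970)
Pedig$NC <- pedBreedComp(Pedig, thisBreed="Hinterwaelder")$native

use  <- Pedig$Born %in% (1980:1990) & Pedig$Breed=="Hinterwaelder"
use  <- use & summary(Pedig)$equiGen>=4
phen <- Pedig[use, c("Indiv", "Sex", "Breed", "Born", "BV", "NC")]
phen$isCandidate <- phen$Born<=1990

# Age class contributions (assumed call signature)
cont <- agecont(Pedig, phen$Indiv)

# Pedigree based kinship and kinship at native alleles
pKin    <- pedIBD(Pedig, keep.only=phen$Indiv)
pKinatN <- pedIBDatN(Pedig, thisBreed="Hinterwaelder", keep.only=phen$Indiv)

# Combine phenotypes, kinships and age class contributions
cand <- candes(phen=phen, pKin=pKin, pKinatN=pKinatN, cont=cont)
```

The argument names pKin and pKinatN determine the names under which the kinships appear in the objective functions and constraints, e.g. min.pKin or ub.pKinatN.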
Compared to the introductory example, the possibility to restrict or to maximize native contributions becomes available because column NC is now included in data frame phen. Additionally, there is the possibility to minimize or to restrict the kinship at native alleles pKinatN and the pedigree based kinship pKin.
For defining appropriate threshold values for the constraints, the current mean kinships, the mean native contribution, and the mean breeding value in the population need to be known. The values can be obtained as
## BV NC pKin pKinatN
## 1 -0.6212472 0.6235631 0.02540349 0.04524766
Depending on what the objective of the breeding program is, you may continue reading at the appropriate section:
This is the traditional approach proposed by T. H. E. Meuwissen (1997). First we create a list of constraints:
Here, equal numbers of offspring are assumed for the females and only the contributions of males are to be optimized. The upper bound for the mean pedigree based kinship was derived from the effective population size as explained above. Now the optimum contributions of the selection candidates can be calculated:
## BV NC pKin pKinatN
## 1 -0.6212472 0.6235631 0.02540349 0.04524766
## 2 -0.5374799 0.6152733 0.02638868 0.04724255
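The upper bound for the mean pedigree based kinship can be checked by hand from the values printed above. The sketch below assumes Ne = 100 as the desired effective size. This value is an assumption made for illustration and is chosen because it reproduces the kinship reached in row 2. The expression for the bound is the one used in the constraint lists of this section.

```r
Ne   <- 100          # assumed effective population size (illustrative)
L    <- 4.956284     # generation interval in years, as reported above
pKin <- 0.02540349   # current mean pedigree based kinship, as reported above

# Kinship may increase by 1/(2*Ne) per generation, i.e. over L years;
# the exponent 1/L converts this into the admissible increase per year
ub.pKin <- 1 - (1 - pKin)*(1 - 1/(2*Ne))^(1/L)
print(ub.pKin)
```

Under this assumption the bound is approximately 0.02639, which agrees with the mean kinship of the optimized offspring in row 2. This suggests that the kinship constraint is active in the solution.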
This approach may be appropriate for a population without introgression and with complete pedigrees. For populations with historic introgression, however, the kinship at native alleles should be restricted as well in accordance with the desired effective size, and the native contributions should be restricted so that they do not decrease. Otherwise the genetic originality of the breed may get lost in the long term.
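con <- list(
  uniform    = "female",                                            # equal contributions for females
  ub.pKin    = 1-(1-cand$mean$pKin)*(1-1/(2*Ne))^(1/L),             # bound on pedigree based kinship
  ub.pKinatN = 1-(1-cand$mean$pKinatN)*(1-1/(2*Ne))^(1/L),          # bound on kinship at native alleles
  lb.NC      = cand$mean$NC                                         # native contribution must not decrease
)
Offspring2 <- opticont("max.BV", cand, con)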
For comparison, the parameters of both scenarios are combined into a single data frame with rbind:
## BV NC pKin pKinatN
## Ref -0.6212472 0.6235631 0.02540349 0.04524766
## maxBV -0.5374799 0.6152733 0.02638868 0.04724255
## maxBV2 -0.5483020 0.6235632 0.02600296 0.04621304
Thus, the genetic gain in the second scenario (maxBV2) is only slightly below the genetic gain in the first scenario (maxBV), but the native contributions do not decrease and the kinship at native alleles increases at a lower rate.
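The comparison can be made explicit by expressing both scenarios as changes relative to the reference. The sketch below uses only the values printed in the table above. The object names res and delta are placeholders.

```r
# Parameter means copied from the comparison table
res <- rbind(
  Ref    = data.frame(BV=-0.6212472, NC=0.6235631, pKin=0.02540349, pKinatN=0.04524766),
  maxBV  = data.frame(BV=-0.5374799, NC=0.6152733, pKin=0.02638868, pKinatN=0.04724255),
  maxBV2 = data.frame(BV=-0.5483020, NC=0.6235632, pKin=0.02600296, pKinatN=0.04621304)
)

# Change of each parameter relative to the reference row
delta <- sweep(as.matrix(res[-1, ]), 2, as.matrix(res)[1, ])
print(round(delta, 5))
```

The breeding value increases by about 0.084 in scenario maxBV and by about 0.073 in scenario maxBV2, while the native contribution drops by about 0.008 only in scenario maxBV.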
Minimizing inbreeding means minimizing the average kinship of the population in order to enable breeders to avoid inbreeding. This is the appropriate approach e.g. for companion animals suffering from a historic bottleneck. It can be done with or without accounting for breeding values. In the example below no breeding values are considered, since accurate breeding values are not available for most of these breeds.
First we create a list of constraints:
Again, equal numbers of offspring are assumed for all females and only the contributions of males are to be optimized. The pedigree based kinship is not constrained in this example because it should be minimized.
## BV NC pKin pKinatN
## 1 -0.6212472 0.6235631 0.02540349 0.04524766
## 2 -0.6740305 0.6293595 0.02202135 0.03889580
The approach shown above has the disadvantage that kinships between individuals are less reliable if ancestors are missing in the pedigree. The alternative approach, shown below, is to minimize the kinship at native alleles and to restrict pedigree based kinship.
While in livestock breeds the native contributions should be preserved in order to maintain the genetic originality of the breeds, in several companion breeds the opposite is true. Several companion breeds have high inbreeding coefficients and descend from only very few (e.g. 3) founders. Hence, sufficient genetic diversity cannot be achieved in the population. For these breeds it may be appropriate to use unrelated individuals from a variety of other breeds in order to increase the genetic diversity. However, only a small contribution from other breeds is needed, so the native contributions should be restricted also for these breeds in order to preserve their genetic originality. The difference between a breed with high diversity and a breed with low diversity suffering from inbreeding depression is that the optimum value for the native contribution is smaller than 1 for the latter. For such a breed it is advisable to allow the use of individuals from other breeds but to restrict the admissible mean contribution from other breeds.
In summary, the alternative approach is to minimize the kinship at native alleles and to restrict pedigree based kinship and native contributions:
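con <- list(
  uniform = "female",                                      # equal contributions for females
  lb.NC   = 1.02*cand$mean$NC,                             # raise native contribution by at least 2%
  ub.pKin = 1-(1-cand$mean$pKin)*(1-1/(2*Ne))^(1/L)        # bound on pedigree based kinship
)
Offspring2 <- opticont("min.pKinatN", cand, con)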
## Warning in nloptr::slsqp(x0 = X, fn = op$f$f0, gr = op$f$g0, lower = op$lb$val,
## : The old behavior for hin >= 0 has been deprecated. Please restate the
## inequality to be <=0. The ability to use the old behavior will be removed in a
## future release.
## Warning in nloptr::slsqp(x0 = X, fn = op$f$f0, gr = op$f$g0, lower = op$lb$val,
## : The old behavior for hinjac >= 0 has been deprecated. Please restate the
## inequality to be <=0. The ability to use the old behavior will be removed in a
## future release.
For comparison, the parameter estimates are combined into a single data frame:
## BV NC pKin pKinatN
## Ref -0.6212472 0.6235631 0.02540349 0.04524766
## minKin -0.6740305 0.6293595 0.02202135 0.03889580
## minKin2 -0.6813346 0.6360344 0.02220580 0.03864749
The pedigree based kinship is slightly higher in the second approach, but the kinship at native alleles is lower. Since pedigree based kinships are less reliable due to missing ancestors in the pedigree, the second approach is recommended. However, the use of pedigree data has the disadvantage that only the expected kinships can be minimized. The expected kinships deviate from the realized kinships due to Mendelian segregation. Hence, for breeds with serious inbreeding problems it is recommended to genotype the selection candidates and to perform marker-based optimum contribution selection.
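The effect of the constraint lb.NC = 1.02*cand$mean$NC can be read off the comparison table. The following check uses only values printed above. The variable names are placeholders.

```r
NC.ref <- 0.6235631     # current mean native contribution (row Ref)
NC.sol <- 0.6360344     # native contribution reached in scenario minKin2

# Lower bound used in the constraint list: 2% above the current mean
lb.NC <- 1.02*NC.ref
print(c(lb.NC=lb.NC, achieved=NC.sol))

# TRUE if the achieved value coincides with the bound up to rounding
print(abs(lb.NC - NC.sol) < 1e-6)
```

The achieved native contribution coincides with the bound up to rounding, so the constraint appears to be active. A higher factor than 1.02 would therefore be expected to increase the kinship at native alleles.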
For endangered breeds the priority of a breeding program could be to recover the original genetic background by maximizing native contributions. However, since the individuals with highest native contributions are related, this may considerably increase the inbreeding coefficients if the diversity at native alleles is not preserved. Hence, constraints are defined below not only for the pedigree based kinship, but also for the kinship at native alleles in accordance with the desired effective size:
con <- list(
  uniform    = "female",
  ub.pKin    = 1-(1-cand$mean$pKin)*(1-1/(2*Ne))^(1/L),
  ub.pKinatN = 1-(1-cand$mean$pKinatN)*(1-1/(2*Ne))^(1/L)
)
Offspring <- opticont("max.NC", cand, con)
## BV NC pKin pKinatN
## 1 -0.6944268 0.6534981 0.02638974 0.04596307
For some breeds, native contributions and breeding values are negatively correlated, so maximizing native contributions results in negative genetic gain. This can be avoided by adding a constraint for the breeding values:
con <- list(
  uniform    = "female",
  ub.pKin    = 1-(1-cand$mean$pKin)*(1-1/(2*Ne))^(1/L),
  ub.pKinatN = 1-(1-cand$mean$pKinatN)*(1-1/(2*Ne))^(1/L),
  lb.BV      = cand$mean$BV
)
Offspring2 <- opticont("max.NC", cand, con)
For comparison, the estimates for both scenarios are combined into a single data frame:
## BV NC pKin pKinatN
## Ref -0.6212472 0.6235631 0.02540349 0.04524766
## maxNC -0.6944268 0.6534981 0.02638974 0.04596307
## maxNC2 -0.6212452 0.6478674 0.02639071 0.04608244
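The price of the additional constraint lb.BV can be quantified from the table. The sketch below uses only the printed values. The variable names are placeholders.

```r
# Values copied from the comparison table above
BV.ref    <- -0.6212472   # mean breeding value of the reference, used as lb.BV
BV.maxNC2 <- -0.6212452   # mean breeding value reached in scenario maxNC2
NC.maxNC  <-  0.6534981   # native contribution without the BV constraint
NC.maxNC2 <-  0.6478674   # native contribution with the BV constraint

# Remaining slack of the breeding value constraint
slack.BV <- BV.maxNC2 - BV.ref

# Native contribution given up to avoid negative genetic gain
cost.NC <- NC.maxNC - NC.maxNC2
print(c(slack.BV=slack.BV, cost.NC=cost.NC))
```

The slack is close to zero, which suggests that the constraint is active, and the native contribution is reduced by roughly 0.006 compared with the unconstrained scenario maxNC.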