Title: | Optimum Contribution Selection and Population Genetics |
---|---|
Description: | A framework for the optimization of breeding programs via optimum contribution selection and mate allocation. An easy to use set of function for computation of optimum contributions of selection candidates, and of the population genetic parameters to be optimized. These parameters can be estimated using pedigree or genotype information, and include kinships, kinships at native haplotype segments, and breed composition of crossbred individuals. They are suitable for managing genetic diversity, removing introgressed genetic material, and accelerating genetic gain. Additionally, functions are provided for computing genetic contributions from ancestors, inbreeding coefficients, the native effective size, the native genome equivalent, pedigree completeness, and for preparing and plotting pedigrees. The methods are described in:\n Wellmann, R., and Pfeiffer, I. (2009) <doi:10.1017/S0016672309000202>.\n Wellmann, R., and Bennewitz, J. (2011) <doi:10.2527/jas.2010-3709>.\n Wellmann, R., Hartwig, S., Bennewitz, J. (2012) <doi:10.1186/1297-9686-44-34>.\n de Cara, M. A. R., Villanueva, B., Toro, M. A., Fernandez, J. (2013) <doi:10.1111/mec.12560>.\n Wellmann, R., Bennewitz, J., Meuwissen, T.H.E. (2014) <doi:10.1017/S0016672314000196>.\n Wellmann, R. (2019) <doi:10.1186/s12859-018-2450-5>. |
Authors: | Robin Wellmann [aut, cre] |
Maintainer: | Robin Wellmann <[email protected]> |
License: | GPL-2 |
Version: | 2.0.9 |
Built: | 2024-11-06 06:26:30 UTC |
Source: | CRAN |
A framework for the optimization of breeding programs via optimum contribution selection and mate allocation. An easy to use set of function for computation of optimum contributions of selection candidates, and of the population genetic parameters to be optimized. These parameters can be estimated using pedigree or genotype information, and include kinships, kinships at native haplotype segments, and breed composition of crossbred individuals. They are suitable for managing genetic diversity, removing introgressed genetic material, and accelerating genetic gain. Additionally, functions are provided for computing genetic contributions from ancestors, inbreeding coefficients, the native effective size, the native genome equivalent, pedigree completeness, and for preparing and plotting pedigrees. The methods are described in:\n Wellmann, R., and Pfeiffer, I. (2009) <doi:10.1017/S0016672309000202>.\n Wellmann, R., and Bennewitz, J. (2011) <doi:10.2527/jas.2010-3709>.\n Wellmann, R., Hartwig, S., Bennewitz, J. (2012) <doi:10.1186/1297-9686-44-34>.\n de Cara, M. A. R., Villanueva, B., Toro, M. A., Fernandez, J. (2013) <doi:10.1111/mec.12560>.\n Wellmann, R., Bennewitz, J., Meuwissen, T.H.E. (2014) <doi:10.1017/S0016672314000196>.\n Wellmann, R. (2019) <doi:10.1186/s12859-018-2450-5>.
Optimum Contribution Selection
After kinships, breeding values and/or native contributions of the selection candidates have been computed, function candes can be used to create an R-object containing all this information. The current average kinships and trait values are estimated by this function, and the available objective functions and constraints for optimum contribution selection are reported. The following function can then be used to compute optimum contributions:
opticont | Calculates optimum genetic contributions of selection candidates to the next generation, |
and checks if all constraints are fulfilled. | |
Function noffspring can be used to compute the optimum numbers of offspring of selection candidates from their optimum contributions. Function matings can be used for mate allocation.
Kinships
For pairs of individuals the following kinships can be computed:
pedIBD | Calculates pedigree based probability of alleles to be IBD ("pedigree based kinship""), |
segIBD | Calculates segment based probability of alleles to be IBD ("segment based kinship"), |
pedIBDatN | Calculates pedigree based probability of alleles to be IBD at segments with Native origin, |
segIBDatN | Calculates segment based probability of alleles to be IBD at segments with Native origin, |
pedIBDorM | Calculates pedigree based probability of alleles to be IBD or Migrant alleles, |
segIBDandN | Calculates segment based probability of alleles to be IBD and have Native origin, |
segN | Calculates segment based probability of alleles to have Native origin, |
makeA | Calculates the pedigree-based additive relationship matrix. |
Phenotypes and results from these functions can be combined with function candes into a single R object, which can then be used as an argument to function opticont.
The segment based kinship can be used to calculate the optimum contributions of different breeds to a hypothetical multi-breed population with maximum genetic diversity by using function opticomp.
Function sim2dis can be used to convert a similarity matrix (e.g. a kinship matrix) into a dissimilarity matrix which is suitable for multidimensional scaling.
Breed Composition
The breed composition of crossbred individuals can be accessed with
pedBreedComp | Calculates pedigree based the Breed Composition, which is the genetic contribution |
of each individual from other breeds and from native founders. The native contribution | |
is the proportion of the genome not originating from other breeds. | |
segBreedComp | Calculates segment based the Breed Composition. The native contribution is the |
proportion of the genome belonging to segments that have low frequency in | |
other breeds. | |
The native contributions obtained by the above functions can be constrained or maximized with function opticont to remove introgressed genetic material, or alternatively, the segment-based native contribution can be considered a quantitative trait and included in a selection index.
Haplotype frequencies
Frequencies of haplotype segments in particular breeds can be computed and plotted with
haplofreq | Calculates the maximum frequency each segment has in a set of reference breeds, |
and the name of the breed in which the segment has maximum frequency. | |
Identification of native segments. | |
freqlist | Combines results obtained with function haplofreq for different reference breeds |
into a single R object which is suitable for plotting. | |
plot.HaploFreq | Plots frequencies of haplotype segments in particular reference breeds. |
Inbreeding Coefficients and Genetic Contributions
The inbreeding coefficients and genetic contributions from ancestors can be computed with:
pedInbreeding | Calculates pedigree based Inbreeding. |
segInbreeding | Calculates segment based Inbreeding, i.e. inbreeding based on |
runs of homozygosity (ROH). | |
genecont | Calculates genetic contributions each individual has from all it's ancestors in |
the pedigree. | |
Preparing and plotting pedigree data
There are some functions for preparing and plotting pedigree data
prePed | prepares a Pedigree by sorting, adding founders and pruning the pedigree, |
completeness | Calculates pedigree completeness in all ancestral generations, |
summary.Pedig | Calculates number of equivalent complete generations, number of fully |
traced generations, number of maximum generations traced, index of | |
pedigree completeness, inbreeding coefficients, | |
subPed | Creates a subset of a large Pedigree, |
pedplot | Plots a pedigree, |
sampleIndiv | Samples individuals from a pedigree. |
Population Parameters
Finally, there are some functions for estimating population parameters:
conttac | Calculates genetic contributions of breeds to age cohorts, |
summary.candes | Calculates for every age cohort several genetic parameters. These may |
include average kinships, kinships at native loci, | |
the native effective size, and the native genome equivalent. | |
Genotype File Format
All functions reading genotype data assume that the files are in the following format:
Genotypes are phased and missing genotypes have been imputed. Each file has a header and no row names. Cells are separated by blank spaces. The number of rows is equal to the number of markers from the respective chromosome and the markers are in the same order as in the map
. There can be some extra columns on the left hand side containing no genotype data. The remaining columns contain genotypes of individuals written as two alleles separated by a character, e.g. A/B, 0/1, A|B, A B, or 0 1. The same two symbols must be used for all markers. Column names are the IDs of the individuals. If the blank space is used as separator then the ID of each individual should be repeated in the header to get a regular delimited file. The columns to be skipped and the individual IDs must have no white spaces.
Use function read.indiv to extract the IDs of the individuals from a genotype file.
Robin Wellmann [aut, cre]
Maintainer: Robin Wellmann <[email protected]>
de Cara MAR, Villanueva B, Toro MA, Fernandez J (2013). Using genomic tools to maintain diversity and fitness in conservation programmes. Molecular Ecology. 22: 6091-6099
Wellmann, R., and Pfeiffer, I. (2009). Pedigree analysis for conservation of genetic diversity and purging. Genet. Res. 91: 209-219
Wellmann, R., and Bennewitz, J. (2011). Identification and characterization of hierarchical structures in dog breeding schemes, a novel method applied to the Norfolk Terrier. Journal of Animal Science. 89: 3846-3858
Wellmann, R., Hartwig, S., Bennewitz, J. (2012). Optimum contribution selection for conserved populations with historic migration; with application to rare cattle breeds. Genetics Selection Evolution. 44: 34
Wellmann, R., Bennewitz, J., Meuwissen, T.H.E. (2014) A unified approach to characterize and conserve adaptive and neutral genetic diversity in subdivided populations. Genet Res (Camb). 69: e16
Wellmann, R. (2019). Optimum Contribution Selection and Mate Allocation for Breeding: The R Package optiSel. BMC Bioinformatics 20:25
#See ?opticont for optimum contribution selection #These examples demonstrate computation of some population genetic parameters. data(ExamplePed) Pedig <- prePed(ExamplePed, thisBreed="Hinterwaelder", lastNative=1970) head(Pedig) ############################################ # Evaluation of # # - kinships # # - genetic diversities # # - native effective size # # - native genome equivalent # ############################################ phen <- Pedig[Pedig$Breed=="Hinterwaelder",] pKin <- pedIBD(Pedig) pKinatN <- pedIBDatN(Pedig, thisBreed="Hinterwaelder") pop <- candes(phen=phen, pKin=pKin, pKinatN=pKinatN, quiet=TRUE, reduce.data=FALSE) Param <- summary(pop, tlim=c(1970,2005), histNe=150, base=1800, df=4) plot(Param$t, Param$Ne, type="l", ylim=c(0,150), main="Native Effective Size", ylab="Ne", xlab="") matplot(Param$t, Param[,c("pKin", "pKinatN")], type="l",ylim=c(0,1),main="Kinships", xlab="Year", ylab="mean Kinship") abline(0,0) legend("topleft", legend = c("pKin", "pKinatN"), lty=1:2, col=1:2, cex=0.6) info <- paste("Base Year =", attributes(Param)$base, " historic Ne =", attributes(Param)$histNe) plot(Param$t,Param$NGE,type="l",main="Native Genome Equivalents", ylab="NGE",xlab="",ylim=c(0,7)) mtext(info, cex=0.7) ############################################ # Genetic contributions from other breeds # ############################################ cont <- pedBreedComp(Pedig, thisBreed='Hinterwaelder') contByYear <- conttac(cont, Pedig$Born, use=Pedig$Breed=="Hinterwaelder", mincont=0.04, long=FALSE) round(contByYear,2) barplot(contByYear, ylim=c(0,1), col=1:10, ylab="genetic contribution", legend=TRUE, args.legend=list(x="topleft",cex=0.6)) ###################################################### # Frequencies of haplotype segments in other breeds # ###################################################### data(map) data(Cattle) dir <- system.file("extdata", package="optiSel") files <- file.path(dir, paste("Chr", 1:2, ".phased", sep="")) Freq <- freqlist( haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Rotbunt", minSNP=20), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Holstein", minSNP=20), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Fleckvieh", minSNP=20) ) plot(Freq, ID=1, hap=2, refBreed="Rotbunt")
#See ?opticont for optimum contribution selection #These examples demonstrate computation of some population genetic parameters. data(ExamplePed) Pedig <- prePed(ExamplePed, thisBreed="Hinterwaelder", lastNative=1970) head(Pedig) ############################################ # Evaluation of # # - kinships # # - genetic diversities # # - native effective size # # - native genome equivalent # ############################################ phen <- Pedig[Pedig$Breed=="Hinterwaelder",] pKin <- pedIBD(Pedig) pKinatN <- pedIBDatN(Pedig, thisBreed="Hinterwaelder") pop <- candes(phen=phen, pKin=pKin, pKinatN=pKinatN, quiet=TRUE, reduce.data=FALSE) Param <- summary(pop, tlim=c(1970,2005), histNe=150, base=1800, df=4) plot(Param$t, Param$Ne, type="l", ylim=c(0,150), main="Native Effective Size", ylab="Ne", xlab="") matplot(Param$t, Param[,c("pKin", "pKinatN")], type="l",ylim=c(0,1),main="Kinships", xlab="Year", ylab="mean Kinship") abline(0,0) legend("topleft", legend = c("pKin", "pKinatN"), lty=1:2, col=1:2, cex=0.6) info <- paste("Base Year =", attributes(Param)$base, " historic Ne =", attributes(Param)$histNe) plot(Param$t,Param$NGE,type="l",main="Native Genome Equivalents", ylab="NGE",xlab="",ylim=c(0,7)) mtext(info, cex=0.7) ############################################ # Genetic contributions from other breeds # ############################################ cont <- pedBreedComp(Pedig, thisBreed='Hinterwaelder') contByYear <- conttac(cont, Pedig$Born, use=Pedig$Breed=="Hinterwaelder", mincont=0.04, long=FALSE) round(contByYear,2) barplot(contByYear, ylim=c(0,1), col=1:10, ylab="genetic contribution", legend=TRUE, args.legend=list(x="topleft",cex=0.6)) ###################################################### # Frequencies of haplotype segments in other breeds # ###################################################### data(map) data(Cattle) dir <- system.file("extdata", package="optiSel") files <- file.path(dir, paste("Chr", 1:2, ".phased", sep="")) Freq <- freqlist( haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Rotbunt", minSNP=20), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Holstein", minSNP=20), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Fleckvieh", minSNP=20) ) plot(Freq, ID=1, hap=2, refBreed="Rotbunt")
Contributions of age classes to the population are calculated such that the contribution of each age class to the population is proportional to the expected proportion of offspring that is not yet born.
Note that the contribution of a class to the population is not equal to the proportion of individuals belonging to the class.
agecont(Pedig, use=Pedig$Born >= quantile(Pedig$Born, 0.75), maxAge=NA)
agecont(Pedig, use=Pedig$Born >= quantile(Pedig$Born, 0.75), maxAge=NA)
Pedig |
Pedigree with colums |
use |
Logical vector or character vector with IDs indicating the individuals from the current population. |
maxAge |
Parents that are more than |
Contributions of age classes to the population are calculated such that the contribution of each age class to the population is proportional to the expected proportion of offspring that is not yet born.
More precisely:
Individuals born in the current year are in age class k=1
. Typically, each age class spans one year. No individual can have offspring in the same age class. Males and females that are not born in the current year are assumed to have equal contributions to the population. Moreover, as stated above, it is assumed that the contribution of each class to the population is proportional to the proportion of offspring from this class that is not yet born when the individuals leaves the class.
This approach to define contributions has the advantage that it does not need to be known which individuals are still alive and which are removed from the breeding pool. Moreover, it causes old age classes to have a smaller contribution to the population than young age classes.
The contributions are estimated from the ages of the parents when the individuals in vector use
were born. Obviously, the contributions of age classes to the offspring in the next year do not coincide with the contributions of the age classes to the population.
Data frame containing the contributions of all age cohorts to the current population.
data(PedigWithErrors) Pedig <- prePed(PedigWithErrors) use <- Pedig$Breed=="Hinterwaelder" & !is.na(Pedig$Born) use <- use & Pedig$Born>=2000 & Pedig$Born<=2004 # Calculate the contribution of each age class ## cont <- agecont(Pedig, use) # Contribution of each age class to # the current population: head(cont) # Note: In this case, young males have a higher contribution to the # population than young females because they are used for breeding # for a shorter time span, i.e. they are culled earlier. # Males and females (excluding the newborn individuals) # have equal contributions to the current population: sum(cont$male[-1]) #[1] 0.3925894 sum(cont$female[-1]) #[1] 0.3925894 # The total contribution of classes to the curent population is equal to 1 sum(cont$female) + sum(cont$male) #[1] 1 # When used for OCS, the contribution of the offspring to the # population in the next year is equal to the contribution of the individuals # born in this year to the current population: cont$male[1]+cont$female[1] #[1] 0.2148212 # This is approximately 1/L, where L is the generation interval.
data(PedigWithErrors) Pedig <- prePed(PedigWithErrors) use <- Pedig$Breed=="Hinterwaelder" & !is.na(Pedig$Born) use <- use & Pedig$Born>=2000 & Pedig$Born<=2004 # Calculate the contribution of each age class ## cont <- agecont(Pedig, use) # Contribution of each age class to # the current population: head(cont) # Note: In this case, young males have a higher contribution to the # population than young females because they are used for breeding # for a shorter time span, i.e. they are culled earlier. # Males and females (excluding the newborn individuals) # have equal contributions to the current population: sum(cont$male[-1]) #[1] 0.3925894 sum(cont$female[-1]) #[1] 0.3925894 # The total contribution of classes to the curent population is equal to 1 sum(cont$female) + sum(cont$male) #[1] 1 # When used for OCS, the contribution of the offspring to the # population in the next year is equal to the contribution of the individuals # born in this year to the current population: cont$male[1]+cont$female[1] #[1] 0.2148212 # This is approximately 1/L, where L is the generation interval.
An R-Object is created containing all information describing the individuals, which is usually a sample from the current population and includes the selection candidates. Average kinships and trait values, and the available objective functions and constraints for optimum contribution selection (OCS) are reported.
candes(phen, cont=NULL, N=1000, quiet=FALSE, t=NA, bc=NULL, reduce.data=TRUE, ...)
candes(phen, cont=NULL, N=1000, quiet=FALSE, t=NA, bc=NULL, reduce.data=TRUE, ...)
phen |
Data frame with column |
cont |
Data frame frame with column |
N |
The population size. A small value accelerates the increase in kinship due to genetic drift. For overlapping generations it can be calculated as |
quiet |
Should the report be suppressed? |
t |
The time at which the population should be evaluated. The default means that |
bc |
Only needed if multi-breed data is provided. Named vector with breed contributions, with component names being the names of the breeds in |
reduce.data |
Logical. Should data from individuals not contributing to the population at time |
... |
One or more objects of class |
An R-Object is created containing all information describing the individuals, which is usually the current population and includes the selection candidates. Average kinships and trait values are estimated and reported. The weights of Age x Sex classes are in accordance with argument cont
. The available objective functions and constraints for optimum contribution selection are reported.
List of class candes with the following components:
kinship |
Objects of class |
phen |
Supplied data frame * Column * Column * Column * Column * Column |
mean |
Data frame containing estimates of the current mean values (at time |
current |
Data frame containing the same values as component |
bc |
Character vector with optimum breed contributions (see above). |
classes |
Data frame containing the number of individuals in each class (column |
breed |
List describing the breeds included in the data set. |
Robin Wellmann
data(PedigWithErrors) Pedig <- prePed(PedigWithErrors, thisBreed="Hinterwaelder", lastNative=1970, keep=PedigWithErrors$Born%in%1992) use <- Pedig$Born %in% (1980:1990) & Pedig$Breed=="Hinterwaelder" Population <- Pedig$Indiv[use] Pedig$NC <- pedBreedComp(Pedig, thisBreed="Hinterwaelder")$native pKin <- pedIBD(Pedig, keep.only=Population) pKinatN <- pedIBDatN(Pedig, thisBreed="Hinterwaelder", keep.only=Population) Phen <- Pedig[Population, ] ### Example 1: Overlapping Generations ### Old individuals contribute only little to the means: cont <- agecont(Pedig, Population, maxAge=10) cand <- candes(phen=Phen, pKin=pKin, pKinatN=pKinatN, cont=cont) cand$current[,c("Name", "Type", "Breed", "Val", "Var")] # Name Type Breed Val Var #1 BV trait Hinterwaelder -0.55979308 BV #2 NC trait Hinterwaelder 0.56695077 NC #3 pKin kinship Hinterwaelder 0.02230896 pKin #4 pKinatN nat. kin. Hinterwaelder 0.04678453 pKinatN # BV: simulated breeding values # NC: native genetic contribution computed from pedigree # pKin: pedigree-based kinship # pKinatN: pedigree-based native kinship ### Example 2: Discrete Generations (cont=NULL). ### Old individuals and young individuals contribute equally to the means: Phen$Born <- 1 cand <- candes(phen=Phen, pKin=pKin, pKinatN=pKinatN, cont=NULL) cand$current[,c("Name", "Type", "Breed", "Val", "Var")] # Name Type Breed Val Var #1 BV trait Hinterwaelder -0.71910508 BV #2 NC trait Hinterwaelder 0.58226604 NC #3 pKin kinship Hinterwaelder 0.01979228 pKin #4 pKinatN nat. kin. Hinterwaelder 0.04053012 pKinatN ### Shorthand: cand$mean # BV NC pKin pKinatN #1 -0.7191051 0.582266 0.01979228 0.04053012 cand$mean$pKin #[1] 0.01979228
data(PedigWithErrors) Pedig <- prePed(PedigWithErrors, thisBreed="Hinterwaelder", lastNative=1970, keep=PedigWithErrors$Born%in%1992) use <- Pedig$Born %in% (1980:1990) & Pedig$Breed=="Hinterwaelder" Population <- Pedig$Indiv[use] Pedig$NC <- pedBreedComp(Pedig, thisBreed="Hinterwaelder")$native pKin <- pedIBD(Pedig, keep.only=Population) pKinatN <- pedIBDatN(Pedig, thisBreed="Hinterwaelder", keep.only=Population) Phen <- Pedig[Population, ] ### Example 1: Overlapping Generations ### Old individuals contribute only little to the means: cont <- agecont(Pedig, Population, maxAge=10) cand <- candes(phen=Phen, pKin=pKin, pKinatN=pKinatN, cont=cont) cand$current[,c("Name", "Type", "Breed", "Val", "Var")] # Name Type Breed Val Var #1 BV trait Hinterwaelder -0.55979308 BV #2 NC trait Hinterwaelder 0.56695077 NC #3 pKin kinship Hinterwaelder 0.02230896 pKin #4 pKinatN nat. kin. Hinterwaelder 0.04678453 pKinatN # BV: simulated breeding values # NC: native genetic contribution computed from pedigree # pKin: pedigree-based kinship # pKinatN: pedigree-based native kinship ### Example 2: Discrete Generations (cont=NULL). ### Old individuals and young individuals contribute equally to the means: Phen$Born <- 1 cand <- candes(phen=Phen, pKin=pKin, pKinatN=pKinatN, cont=NULL) cand$current[,c("Name", "Type", "Breed", "Val", "Var")] # Name Type Breed Val Var #1 BV trait Hinterwaelder -0.71910508 BV #2 NC trait Hinterwaelder 0.58226604 NC #3 pKin kinship Hinterwaelder 0.01979228 pKin #4 pKinatN nat. kin. Hinterwaelder 0.04053012 pKinatN ### Shorthand: cand$mean # BV NC pKin pKinatN #1 -0.7191051 0.582266 0.01979228 0.04053012 cand$mean$pKin #[1] 0.01979228
Simulated phenotypes of cattle whose genotypes are included in files Chr1.phased, and Chr2.phased.
data(Cattle)
data(Cattle)
Data frame containing information on genotyped cattle.
The columns contain the ID of the individual (Indiv
), the year of birth (Born
), the breed name (Breed
), a breeding value (BV
), the sex (Sex
), and the herd (herd
).
Phased genotypes of cattle from chromosome 1 (only the first part of the chromosome). Further information on these animals is included in data frame Cattle.
All functions reading phased genotype data assume that the files are in the following format:
Each file has a header and no row names. Cells are separated by blank spaces. The number of rows is equal to the number of markers from the respective chromosome and the markers are in the same order as in the map
. There can be some extra columns on the left hand side containing no genotype data. The remaining columns contain genotypes of individuals written as two alleles separated by a character, e.g. A/B, 0/1, A|B, A B, or 0 1. The same two symbols must be used for all markers. Column names are the IDs of the individuals. If the blank space is used as separator then the ID of each individual should be repeated in the header to get a regular delimited file. The columns to be skipped and the individual IDs must have no white spaces.
Use function read.indiv to extract the IDs of the individuals from a genotype file.
GTfile <- system.file("extdata/Chr1.phased", package="optiSel") file.show(GTfile) GT <- read.table(GTfile, header=TRUE, skip=2, check.names=FALSE) GT[1:10,1:5]
GTfile <- system.file("extdata/Chr1.phased", package="optiSel") file.show(GTfile) GT <- read.table(GTfile, header=TRUE, skip=2, check.names=FALSE) GT[1:10,1:5]
Phased genotypes from Chromosome 2 (only the first part of the chromosome). Further information on these animals is included in data frame Cattle.
All functions reading phased genotype data assume that the files are in the following format:
Each file has a header and no row names. Cells are separated by blank spaces. The number of rows is equal to the number of markers from the respective chromosome and the markers are in the same order as in the map
. There can be some extra columns on the left hand side containing no genotype data. The remaining columns contain genotypes of individuals written as two alleles separated by a character, e.g. A/B, 0/1, A|B, A B, or 0 1. The same two symbols must be used for all markers. Column names are the IDs of the individuals. If the blank space is used as separator then the ID of each individual should be repeated in the header to get a regular delimited file. The columns to be skipped and the individual IDs must have no white spaces.
Use function read.indiv to extract the IDs of the individuals from a genotype file.
GTfile <- system.file("extdata/Chr2.phased", package="optiSel") file.show(GTfile) GT <- read.table(GTfile, header=TRUE, skip=2, check.names=FALSE) GT[1:10,1:5]
GTfile <- system.file("extdata/Chr2.phased", package="optiSel") file.show(GTfile) GT <- read.table(GTfile, header=TRUE, skip=2, check.names=FALSE) GT[1:10,1:5]
Calculates completeness of the pedigree for individuals and for groups of individuals in each ancestral generation.
completeness(Pedig, keep=NULL, maxd=50, by="Indiv")
completeness(Pedig, keep=NULL, maxd=50, by="Indiv")
Pedig |
Data frame containing the pedigree, where the first columns are |
keep |
Vector with IDs of the individuals for which the completeness will be calculated, or a logical vector indicating the individuals. By default, all individuals are used. |
maxd |
Number of generations for which completeness should be calculated. |
by |
Name of a column in data frame |
The function computes the completeness of the pedigree for the specified individuals and for groups of individuals. It is the proportion of known ancestors in each generation. Generation 0 corresponds to the individual itself, so the completeness is always 1 in generation 0.
Data frame with the following columns
Indiv (or 'by') |
ID of the individual or level of the grouping factor, |
Generation |
Generation number, |
Completeness |
Completeness of the pedigree in the respective generation. |
Robin Wellmann
Cazes P, Cazes MH. (1996) Comment mesurer la profondeur genealogique d'une ascendance? Population (French Ed) 51:117-140.
Another function for characterizing pedigree completeness is summary.Pedig
.
#Computes the pedigree completeness of Hinterwald cattle #born between 2006 and 2007 in each ancestral generation. data(PedigWithErrors) Pedig <- prePed(PedigWithErrors) compl <- completeness(Pedig, keep=Pedig$Born %in% (2006:2007), maxd=50, by="Indiv") head(compl) #Summary statistics can be computed directly from the pedigree: Summary <- summary(Pedig, keep=Pedig$Born %in% (2006:2007)) head(Summary) hist(Summary$PCI, xlim=c(0,1), main="Pedigree Completeness") hist(Summary$Inbreeding, xlim=c(0,1), main="Inbreeding") hist(Summary$equiGen, xlim=c(0,20), main="Number of Equivalent Complete Generations") hist(Summary$fullGen, xlim=c(0,20), main="Number of Fully Traced Generations") hist(Summary$maxGen, xlim=c(0,20), main="Number of Maximum Generations Traced") compl <- completeness(Pedig, keep=Pedig$Born %in% (2006:2007), maxd=50, by="Sex") head(compl) library("ggplot2") ggplot(compl, aes(Generation, Completeness, col=Sex))+geom_line()
#Computes the pedigree completeness of Hinterwald cattle #born between 2006 and 2007 in each ancestral generation. data(PedigWithErrors) Pedig <- prePed(PedigWithErrors) compl <- completeness(Pedig, keep=Pedig$Born %in% (2006:2007), maxd=50, by="Indiv") head(compl) #Summary statistics can be computed directly from the pedigree: Summary <- summary(Pedig, keep=Pedig$Born %in% (2006:2007)) head(Summary) hist(Summary$PCI, xlim=c(0,1), main="Pedigree Completeness") hist(Summary$Inbreeding, xlim=c(0,1), main="Inbreeding") hist(Summary$equiGen, xlim=c(0,20), main="Number of Equivalent Complete Generations") hist(Summary$fullGen, xlim=c(0,20), main="Number of Fully Traced Generations") hist(Summary$maxGen, xlim=c(0,20), main="Number of Maximum Generations Traced") compl <- completeness(Pedig, keep=Pedig$Born %in% (2006:2007), maxd=50, by="Sex") head(compl) library("ggplot2") ggplot(compl, aes(Generation, Completeness, col=Sex))+geom_line()
Calculates genetic contributions of other breeds to age cohorts
conttac(cont, cohort, use=rep(TRUE,length(cohort)), mincont=0.05, long=TRUE)
conttac(cont, cohort, use=rep(TRUE,length(cohort)), mincont=0.05, long=TRUE)
cont |
Data frame containing the genetic contributions of several ancestors or breeds to all individuals. This is typically the output of function pedBreedComp. |
cohort |
Numeric vector indicating for every individual the age cohort to which it belongs (typically year of birth). |
use |
Logical vector indicating for every individual whether it should be included in an age cohort (typically |
mincont |
Contributions of breeeds with average contribution smaller than |
long |
Should the resutling data frame be melted for easy plotting? |
The genetic contributions from other breeds to all age cohorts are computed. The genetic contribution from a breed is the fraction of genes in the gene pool originating from the respective breed.
Data frame containing the genetic contribution from every breed to every age cohort.
Robin Wellmann
data(ExamplePed) Pedig <- prePed(ExamplePed, thisBreed="Hinterwaelder", lastNative=1970) cont <- pedBreedComp(Pedig, thisBreed="Hinterwaelder") contByYear <- conttac(cont, Pedig$Born, use=Pedig$Breed=="Hinterwaelder", mincont=0.04, long=FALSE) round(contByYear,2) barplot(contByYear, ylim=c(0,1), col=1:10, ylab="genetic contribution", legend=TRUE, args.legend=list(x="bottomleft",cex=0.5))
data(ExamplePed) Pedig <- prePed(ExamplePed, thisBreed="Hinterwaelder", lastNative=1970) cont <- pedBreedComp(Pedig, thisBreed="Hinterwaelder") contByYear <- conttac(cont, Pedig$Born, use=Pedig$Breed=="Hinterwaelder", mincont=0.04, long=FALSE) round(contByYear,2) barplot(contByYear, ylim=c(0,1), col=1:10, ylab="genetic contribution", legend=TRUE, args.legend=list(x="bottomleft",cex=0.5))
This data set gives a small subset of the pedigree of Hinterwald cattle suitable for demonstration purposes.
data(ExamplePed)
data(ExamplePed)
Data frame with columns Indiv
(individual ID), Sire
, Dam
, Sex
, Breed
, Born
with year of birth, and simulated breeding value BV
.
The function combines objects computed with function haplofreq into a list with class HaploFreq
and adds some attributes.
freqlist(...)
freqlist(...)
... |
R-objects computed with function haplofreq. |
The function combines objects computed with function haplofreq into a list with class HaploFreq
.
A list with class HaploFreq
Robin Wellmann
data(map) data(Cattle) dir <- system.file("extdata", package="optiSel") files <- paste(dir, "/Chr", 1:2, ".phased", sep="") Freq <- freqlist( haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Rotbunt", minL=2.0), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Holstein", minL=2.0), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Fleckvieh", minL=2.0) ) #The component names are the reference breeds by default: names(Freq) plot(Freq, ID=1, hap=2, refBreed="Rotbunt") plot(Freq, ID=1, hap=2, refBreed="Holstein", Chr=1)
data(map) data(Cattle) dir <- system.file("extdata", package="optiSel") files <- paste(dir, "/Chr", 1:2, ".phased", sep="") Freq <- freqlist( haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Rotbunt", minL=2.0), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Holstein", minL=2.0), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Fleckvieh", minL=2.0) ) #The component names are the reference breeds by default: names(Freq) plot(Freq, ID=1, hap=2, refBreed="Rotbunt") plot(Freq, ID=1, hap=2, refBreed="Holstein", Chr=1)
Calculates the genetic contributions each individual has from specified ancestors.
genecont(Pedig, from=NULL, to=NULL)
genecont(Pedig, from=NULL, to=NULL)
Pedig |
Data frame containing the pedigree, where the first columns are |
from |
Vector with ancestors whose contributions to the individuals should be calculated. By default, the contributions from all individuals will be calculated. |
to |
Vector with individuals for which the contributions from ancestors should be calculated. By default, the contributions are calculated for all individuals. |
This function calculates genetic contributions of specified ancestors to each individual.
Lower triangular matrix with genetic contributions for each pair of individuals. Column i contains the genetic contribution of ancestor i to all individuals.
Robin Wellmann
data(ExamplePed) Pedig <- prePed(ExamplePed) cont <- genecont(Pedig) plot(Pedig$Born, cont[,"276000803611144"], pch=18, ylim=c(0,1)) Pedig["276000803611144",] #faster: cont <- genecont(Pedig, from="276000803611144") head(cont) plot(Pedig$Born, cont[,"276000803611144"], pch=18, ylim=c(0,1))
data(ExamplePed) Pedig <- prePed(ExamplePed) cont <- genecont(Pedig) plot(Pedig$Born, cont[,"276000803611144"], pch=18, ylim=c(0,1)) Pedig["276000803611144",] #faster: cont <- genecont(Pedig, from="276000803611144") head(cont) plot(Pedig$Born, cont[,"276000803611144"], pch=18, ylim=c(0,1))
For each haplotype from thisBreed
and every SNP the occurence of the haplotype segment containing the SNP in a set of reference breeds is evaluated. The maximum frequency each segment has in one of these reference breeds is computed, and the breed in which the segment has maximum frequency is identified. Results are either returned in a list or saved to files.
haplofreq(files, phen, map, thisBreed, refBreeds="others", minSNP=20, minL=1.0, unitL="Mb", ubFreq=0.01, keep=NULL, skip=NA, cskip=NA, w.dir=NA, what=c("freq", "match"), cores=1, quiet=FALSE)
haplofreq(files, phen, map, thisBreed, refBreeds="others", minSNP=20, minL=1.0, unitL="Mb", ubFreq=0.01, keep=NULL, skip=NA, cskip=NA, w.dir=NA, what=c("freq", "match"), cores=1, quiet=FALSE)
files |
Either a character vector with file names, or a list containing character vectors with file names. The files contain phased genotypes, one file for each chromosome. File names must contain the chromosome name as specified in the If
|
phen |
Data frame containing the ID (column |
map |
Data frame providing the marker map with columns including marker name |
thisBreed |
Name of a breed from column |
refBreeds |
Vector with names of breeds from column |
minSNP |
Minimum number of marker SNPs included in a segment. |
minL |
Minimum length of a segment in |
unitL |
The unit for measuring the length of a segment. Possible units are the number of marker SNPs included in the segment ( |
ubFreq |
If a haplotype segment has frequency smaller than |
keep |
Subset of the IDs of the individuals from data frame |
skip |
Take line |
cskip |
Take column |
w.dir |
Output file directory. Writing results to files has the advantage that much less working memory is required. By default, no files are created. The function returns only the file names if files are created. |
what |
For |
cores |
Number of cores to be used for parallel processing of chromosomes. By default one core is used. For |
quiet |
Should console output be suppressed? |
For each haplotype from thisBreed
and every SNP the occurence of the haplotype segment containing the SNP in a set of reference breeds is evaluated. The maximum frequency each segment has in one of these reference breeds is computed, and the breed in which the segment has maximum frequency is identified. Results are either returned in a list or saved to files.
Marker file format: Each marker file containing phased genotypes has a header and no row names. Cells are separated by blank spaces. The number of rows is equal to the number of markers from the respective chromosome and the markers are in the same order as in the map
. The first cskip
columns are ignored. The remaining columns contain genotypes of individuals written as two alleles separated by a character, e.g. A/B, 0/1, A|B, A B, or 0 1. The same two symbols must be used for all markers. Column names are the IDs of the individuals. If the blank space is used as separator then the ID of each individual should repeated in the header to get a regular delimited file. The columns to be skipped and the individual IDs must have no white spaces.
If w.dir=NA
then a list is returned. The list may have the following components:
freq |
Mx(2N) - matrix containing for every SNP and for each of the 2N haplotypes from |
match |
Mx(2N) - matrix containing for every SNP and for each of the 2N haplotypes from |
The list has attributes thisBreed
, and map
.
If w.dir
is the name of a directory, then results are written to files, whereby each file corresponds to one chromosome, and a data frame with file names is returned.
Robin Wellmann
data(map) data(Cattle) dir <- system.file("extdata", package="optiSel") files <- file.path(dir, paste("Chr", 1:2, ".phased", sep="")) Freq <- freqlist( haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Rotbunt", minL=2.0), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Holstein", minL=2.0), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Fleckvieh", minL=2.0) ) plot(Freq, ID=1, hap=2, refBreed="Rotbunt") plot(Freq, ID=1, hap=2, refBreed="Holstein", Chr=1) ## Test for using multiple cores: Freq1 <- haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Rotbunt", minL=2.0, cores=NA)$freq range(Freq[[1]]-Freq1) #[1] 0 0 ## Creating output files with allele frequencies and allele origins: rdir <- system.file("extdata", package = "optiSel") wdir <- file.path(tempdir(), "HaplotypeEval") chr <- unique(map$Chr) files <- file.path(rdir, paste("Chr", chr, ".phased", sep="")) wfile <- haplofreq(files, Cattle, map, thisBreed="Angler", minL=2.0, w.dir=wdir) View(read.table(wfile$match[1],skip=1)) #unlink(wdir, recursive = TRUE)
data(map) data(Cattle) dir <- system.file("extdata", package="optiSel") files <- file.path(dir, paste("Chr", 1:2, ".phased", sep="")) Freq <- freqlist( haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Rotbunt", minL=2.0), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Holstein", minL=2.0), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Fleckvieh", minL=2.0) ) plot(Freq, ID=1, hap=2, refBreed="Rotbunt") plot(Freq, ID=1, hap=2, refBreed="Holstein", Chr=1) ## Test for using multiple cores: Freq1 <- haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Rotbunt", minL=2.0, cores=NA)$freq range(Freq[[1]]-Freq1) #[1] 0 0 ## Creating output files with allele frequencies and allele origins: rdir <- system.file("extdata", package = "optiSel") wdir <- file.path(tempdir(), "HaplotypeEval") chr <- unique(map$Chr) files <- file.path(rdir, paste("Chr", chr, ".phased", sep="")) wfile <- haplofreq(files, Cattle, map, thisBreed="Angler", minL=2.0, w.dir=wdir) View(read.table(wfile$match[1],skip=1)) #unlink(wdir, recursive = TRUE)
Calculates the the Pedigree-based Additive Relationship Matrix. This is twice the pedigree based kinship matrix.
makeA(Pedig, keep.only=NULL, keep=keep.only, AFounder=NULL)
makeA(Pedig, keep.only=NULL, keep=keep.only, AFounder=NULL)
Pedig |
Data frame containing the Pedigree. The data frame has columns (1) Individual, (2) Sire, (3) Dam. Missing parents are coded as NA. Both parents must either be missing or present. If this is not the case use prePed. |
keep |
If |
keep.only |
If |
AFounder |
Additive relationship matrix of the founders. The row names are the ids of the founders. By default, founders are assumed to be unrelated. Founders not included in this matrix are also assumed to be unrelated. |
Computation of pedigree based additive relationship matrix A which is twice the kinship matrix. For individuals i and j it is defined as
Aij = 2*(Probability that two alleles chosen from individuals i and j are IBD). |
Additive relationship matrix.
Robin Wellmann
data(PedigWithErrors) data(Phen) Pedig <- prePed(PedigWithErrors) keep <- Pedig$Indiv[summary(Pedig)$equiGen>5 & Pedig$Indiv %in% Phen$Indiv] A <- makeA(Pedig, keep.only=keep) A[1:3,1:3]
data(PedigWithErrors) data(Phen) Pedig <- prePed(PedigWithErrors) keep <- Pedig$Indiv[summary(Pedig)$equiGen>5 & Pedig$Indiv %in% Phen$Indiv] A <- makeA(Pedig, keep.only=keep) A[1:3,1:3]
Marker map for SNPs from cattle chromosomes 1 - 2 (only the first parts of the chromosomes). The corresponding genotypes are included in Chr1.phased and Chr2.phased.
data(map)
data(map)
Data frame containing the marker map including marker name (Name
), chromosome number (Chr
), position in base pairs (Position
), position in centiMorgan (cM
), and position in mega base pairs (Mb
).
Males and females are allocated for mating such that all breeding animals have the desired number of matings. The mean inbreeding coefficient in the offspring is minimized if matrix Kin
contains pairwise kinships of the selection candidates.
matings(phen, Kin, alpha=1, ub.n=NA, max=FALSE, solver="default", ...)
matings(phen, Kin, alpha=1, ub.n=NA, max=FALSE, solver="default", ...)
phen |
Data frame with desired number of matings (column |
Kin |
Kinship matrix (or an other similarity matrix) for selection candidates. |
alpha |
If |
ub.n |
Maximum number of matings of the same individuals. Without this constraint (i.e. |
max |
The default |
solver |
Either |
... |
Further optimization parameters. By default, they are passed to function ecos.control |
.
Males and females are allocated for mating such that all breeding animals have the desired number of matings. If Kin
is a kinship matrix, then the mean inbreeding coefficient in the offspring is minimized. In general, the mean similarity of the parents is minimized.
The maximum number of matings of the same individuals can be constrained. For each herd, the proportion alpha
of matings with the same male can be constrained as well, but this increases computation time.
Data frame with columns Sire
, Dam
, n
, and possibly herd
, whereby column n
contains the desired number of matings, and column herd
contains the herd of the dam.
The data frame has attributes objval
with the value of the objective function (usually the mean inbreeding coefficient), and attribute info
describing the solution as reported by the solver.
Robin Wellmann
data("map") data("Cattle") dir <- system.file("extdata", package = "optiSel") files <- paste(dir, "/Chr", 1:2, ".phased", sep="") sKin <- segIBD(files, map, minSNP=20, minL=2.0) Phen <- Cattle[Cattle$Breed=="Angler", ] cont <- data.frame( age = c( 1, 2, 3, 4, 5, 6), male = c(0.11, 0.11, 0.10, 0.08, 0.06, 0.04), female= c(0.11, 0.11, 0.10, 0.08, 0.06, 0.04)) cand <- candes(phen=Phen, sKin = sKin, cont=cont) con <- list(uniform="female", ub.sKin = 0.047) Offspring <- opticont("max.BV", cand, con, trace=FALSE) ##### Minimize inbreeding ##### Candidate <- Offspring$parent Candidate$n <- noffspring(Candidate, N=20)$nOff Mating <- matings(Candidate, sKin) Mating attributes(Mating)$objval library("Rsymphony") Mating <- matings(Candidate, sKin, alpha=0.30, solver=Rsymphony_solve_LP) Mating attributes(Mating)$objval attributes(Mating)$info #[1] "Optimum solution found"
data("map") data("Cattle") dir <- system.file("extdata", package = "optiSel") files <- paste(dir, "/Chr", 1:2, ".phased", sep="") sKin <- segIBD(files, map, minSNP=20, minL=2.0) Phen <- Cattle[Cattle$Breed=="Angler", ] cont <- data.frame( age = c( 1, 2, 3, 4, 5, 6), male = c(0.11, 0.11, 0.10, 0.08, 0.06, 0.04), female= c(0.11, 0.11, 0.10, 0.08, 0.06, 0.04)) cand <- candes(phen=Phen, sKin = sKin, cont=cont) con <- list(uniform="female", ub.sKin = 0.047) Offspring <- opticont("max.BV", cand, con, trace=FALSE) ##### Minimize inbreeding ##### Candidate <- Offspring$parent Candidate$n <- noffspring(Candidate, N=20)$nOff Mating <- matings(Candidate, sKin) Mating attributes(Mating)$objval library("Rsymphony") Mating <- matings(Candidate, sKin, alpha=0.30, solver=Rsymphony_solve_LP) Mating attributes(Mating)$objval attributes(Mating)$info #[1] "Optimum solution found"
Calculates the optimum numbers of offspring from optimum contributions of selection candidates.
noffspring(cand, N, random=TRUE)
noffspring(cand, N, random=TRUE)
cand |
Data frame with optimum contributions (column |
N |
Desired number of individuals in the offspring population. |
random |
Logical. If |
The function calculates the optimum numbers of offspring of the selection candidates from the optimum contributions cand$oc
and the size N
of the offspring population.
Data frame with column Indiv
containing the individual IDs and column nOff
containing the optimum numbers of offspring.
Column nOff
is approximately 2*N*cand$oc
with sum(noff[cand$Sex=="male"])=N
and sum(noff[cand$Sex=="female"])=N
.
Robin Wellmann
set.seed(1) data(PedigWithErrors) Pedig <- prePed(PedigWithErrors, thisBreed="Hinterwaelder") use <- Pedig$Born %in% (1998:2008) & Pedig$Breed=="Hinterwaelder" Population <- sampleIndiv(Pedig[use, ], each=50) pKin <- pedIBD(Pedig, keep.only=Population) Phen <- Pedig[Population, ] Phen$isCandidate <- Phen$Born %in% (2003:2008) cont <- agecont(Pedig, Population) cand <- candes(phen=Phen, fA=pedIBD(Pedig, keep.only=Phen$Indiv), cont=cont) con <- list(ub.fA=0.0175, uniform="female") Offspring <- opticont("max.BV", cand, con, trace = FALSE) N <- 250 Candidate <- Offspring$parent Candidate$nOff <- noffspring(Candidate, N)$nOff sum(Candidate$nOff[Candidate$Sex=="male"]) #[1] 250 sum(Candidate$nOff[Candidate$Sex=="female"]) #[1] 250 round(2*N*Candidate$oc-Candidate$nOff, 2)
set.seed(1) data(PedigWithErrors) Pedig <- prePed(PedigWithErrors, thisBreed="Hinterwaelder") use <- Pedig$Born %in% (1998:2008) & Pedig$Breed=="Hinterwaelder" Population <- sampleIndiv(Pedig[use, ], each=50) pKin <- pedIBD(Pedig, keep.only=Population) Phen <- Pedig[Population, ] Phen$isCandidate <- Phen$Born %in% (2003:2008) cont <- agecont(Pedig, Population) cand <- candes(phen=Phen, fA=pedIBD(Pedig, keep.only=Phen$Indiv), cont=cont) con <- list(ub.fA=0.0175, uniform="female") Offspring <- opticont("max.BV", cand, con, trace = FALSE) N <- 250 Candidate <- Offspring$parent Candidate$nOff <- noffspring(Candidate, N)$nOff sum(Candidate$nOff[Candidate$Sex=="male"]) #[1] 250 sum(Candidate$nOff[Candidate$Sex=="female"]) #[1] 250 round(2*N*Candidate$oc-Candidate$nOff, 2)
Calculates optimum contributions of breeds to a hypothetical multi-breed population with maximum diversity. Additionally the average kinship within and between breeds and the genetic distances between breeds are computed.
opticomp(f, phen, obj.fun="NGD", lb=NULL, ub=NULL, ...)
opticomp(f, phen, obj.fun="NGD", lb=NULL, ub=NULL, ...)
f |
Kinship matrix (e.g. a segment based kinship matrix). |
phen |
Data frame with column |
obj.fun |
The objective function to be maximized. For For |
lb |
Named vector providing lower bounds for the contributions of the breeds can be provided. The names of the components are the breed names. The default |
ub |
Named vector providing upper bounds for the contributions of the breeds can be provided. The names of the components are the breed names. The default |
... |
Further parameters passed to the solver solve.QP of R package |
Calculates optimum contributions of breeds to a hypothetical multi-breed population with maximum diversity. Additionally the average kinship within and between breeds and the genetic distances between breeds are computed.
A list with the following components:
bc |
Vector with optimum contributions of breeds to a hypothetical multi-breed population with maximum diversity |
value |
The value of the objective function, i.e. the maximum diversity that can be achieved. |
f |
Matrix containing the mean kinships within and between breeds. |
Dist |
Genetic distances between breeds. |
Robin Wellmann
Wellmann, R., Bennewitz, J., Meuwissen, T.H.E. (2014) A unified approach to characterize and conserve adaptive and neutral genetic diversity in subdivided populations. Genetics Selection Evolution. 69, e16
library(optiSel) data(map) data(Cattle) dir <- system.file("extdata", package = "optiSel") files <- paste(dir, "/Chr", 1:2, ".phased", sep="") ##################################################################### # Find the optimum breed composition using segment based kinship # ##################################################################### IBD <- segIBD(files, minSNP=20, map=map, minL=2.0) mb <- opticomp(IBD, Cattle, obj.fun="NGD") #### Optimum breed composition: ### round(mb$bc,3) # Angler Fleckvieh Holstein Rotbunt # 0.469 0.444 0.041 0.046 #### Average kinships within and between breeds: ### round(mb$f,4) # Angler Fleckvieh Holstein Rotbunt #Angler 0.0523 0.0032 0.0414 0.0417 #Fleckvieh 0.0032 0.0625 0.0036 0.0032 #Holstein 0.0414 0.0036 0.1074 0.0894 #Rotbunt 0.0417 0.0032 0.0894 0.1057 #### Genetic distances between breeds: ### round(mb$Dist,4) # Angler Fleckvieh Holstein Rotbunt #Angler 0.0000 0.2329 0.1960 0.1930 #Fleckvieh 0.2329 0.0000 0.2853 0.2844 #Holstein 0.1960 0.2853 0.0000 0.1309 #Rotbunt 0.1930 0.2844 0.1309 0.0000 ##################################################################### # The optimum breed composition depends on the kinship matrix # # and the objective function: # ##################################################################### bc <- opticomp(IBD, Cattle, obj.fun="NTD")$bc round(bc,3) # Angler Fleckvieh Holstein Rotbunt # 0.264 0.447 0.148 0.141
library(optiSel) data(map) data(Cattle) dir <- system.file("extdata", package = "optiSel") files <- paste(dir, "/Chr", 1:2, ".phased", sep="") ##################################################################### # Find the optimum breed composition using segment based kinship # ##################################################################### IBD <- segIBD(files, minSNP=20, map=map, minL=2.0) mb <- opticomp(IBD, Cattle, obj.fun="NGD") #### Optimum breed composition: ### round(mb$bc,3) # Angler Fleckvieh Holstein Rotbunt # 0.469 0.444 0.041 0.046 #### Average kinships within and between breeds: ### round(mb$f,4) # Angler Fleckvieh Holstein Rotbunt #Angler 0.0523 0.0032 0.0414 0.0417 #Fleckvieh 0.0032 0.0625 0.0036 0.0032 #Holstein 0.0414 0.0036 0.1074 0.0894 #Rotbunt 0.0417 0.0032 0.0894 0.1057 #### Genetic distances between breeds: ### round(mb$Dist,4) # Angler Fleckvieh Holstein Rotbunt #Angler 0.0000 0.2329 0.1960 0.1930 #Fleckvieh 0.2329 0.0000 0.2853 0.2844 #Holstein 0.1960 0.2853 0.0000 0.1309 #Rotbunt 0.1930 0.2844 0.1309 0.0000 ##################################################################### # The optimum breed composition depends on the kinship matrix # # and the objective function: # ##################################################################### bc <- opticomp(IBD, Cattle, obj.fun="NTD")$bc round(bc,3) # Angler Fleckvieh Holstein Rotbunt # 0.264 0.447 0.148 0.141
The optimum contributions of selection candidates to the offspring are calculated. The optimization procedure can take into account conflicting breeding goals, which are to achieve genetic gain, to reduce the rate of inbreeding, and to recover the original genetic background of a breed.
It can be used for overlapping as well as for non-overlapping generations. In the case of overlapping generations, average values of the parameters for the population in the next year will be optimized, whereas for non-overlapping generations, average values of the parameters in the next generation will be optimized. Below, the "next evaluation time" means the next year for populations with overlapping generations, but the next generation for populations with non-overlapping generations.
Optimization can be done for several breeds or breeding lines simultaneously, which is adviseable if the aim is to increase diversity or genetic distance between them.
opticont(method, cand, con, bc=NULL, solver="default", quiet=FALSE, make.definite=FALSE, ...)
opticont(method, cand, con, bc=NULL, solver="default", quiet=FALSE, make.definite=FALSE, ...)
method |
Character string |
cand |
An R-Object containing all information describing the individuals (phenotypes and kinships). These indivdiuals are a sample from the population that includes the selection candidates. It can be created with function candes. This object also defines whether generations are overlapping or non-overlapping. * If the aim is to increase genetic distance between breeds, then samples from several breeds are needed. * If column |
con |
List defining threshold values for constraints. The components are described in the Details section. If one is missing, then the respective constraint is not applied. Permitted constraint names are reported by function candes. |
bc |
Named numeric vector with breed contributions, which is only needed if |
solver |
Name of the solver used for optimization. Available solvers are |
quiet |
If |
make.definite |
Logical variable indicating whether non-positive-semidefinite matrices should be approximated by positive-definite matrices. This is always done for solvers that are known not to convergue otherwise. |
... |
Tuning parameters of the solver. The available parameters depend on the solver and will be printed when function |
The optimum contributions of selection candidates to the offspring are calculated. The proportion of offspring that should have a particular selection candidate as parent is twice its optimum contribution.
Constraints
Argument con
is a list defining the constraints. Permitted names for the components are displayed by function candes. Their meaning is as follows:
uniform: Character vector specifying the breeds or sexes for which the contributions are not to be optimized. Within each of these groups it is assumed that all individuals have equal (uniform) contributions. Character string "BREED.female"
means that all females from breed BREED
have equal contributions and thus equal numbers of offspring. Column 'isCandidate' of cand$phen
is ignored for these individuals.
lb: Named numeric vector containing lower bounds for the contributions of the selection candidates. The component names are their IDs. By default the lower bound is 0 for all individuals.
ub: Named numeric vector containing upper bounds for the contributions of the selection candidates. Their component names are the IDs. By default no upper bound is specified.
ub.VAR: Upper bound for the expected mean value of kinship or trait VAR in the population at the next evaluation time. Upper bounds for an arbitrary number of different kinships and traits may be provided. If data frame cand$phen
contains individuals from several breeds, the bound refers to the mean value of the kinship or trait in the multi-breed population.
ub.VAR.BREED: Upper bound for the expected mean value of kinship or trait VAR in the breed BREED at the next evaluation time. Upper bounds for an arbitrary number of different kinships and traits may be provided.
Note that VAR must be replaced by the name of the variable and BREED by the name of the breed. For traits, lower bounds can be defined as lb.VAR or lb.VAR.BREED. Equality constraints can be defined as eq.VAR or eq.VAR.BREED.
Application to multi-breed data
Optimization can be done for several breeds or breeding lines simultaneously, which is adviseable if the aim is to increase genetic diversity in a multi-breed population, or to increase the genetic distances between breeds or breeding lines. However, for computing the kinship of individuals from different breeds, marker data is needed.
The multi-breed population referred above is a hypothetical subdivided population consisting of purebred animals from the breeds included in column Breed
of cand$phen
. The proportion of individuals from a given breed in this population is its breed contribution specified in argument bc
. It is not the proportion of individuals of this breed in data frame cand$phen
.
The aim is to minimize or to constrain the average genomic kinship in this multi-breed population. This causes the genetic distance between the breeds to increase, and thus may increase the conservation value of the breeds, or the heterosis effects in crossbred animals.
Remark
If the function does not provide a valid result due to numerical problems then try to use another solver, use other optimization parameters, define upper or lower bounds instead of equality constraints, or relax the constraints to ensure that the optimization problem is solvable.
A list with the following components
parent |
Data frame |
info |
Data frame with component |
mean |
Data frame containing the expected mean value of each kinship and trait in the population at the next evaluation time. |
bc |
Data frame with breed contributions in the hypothetical multi-breed population used for computing the average kinship across breeds. |
obj.fun |
Named numeric value with value and name of the objective function. |
summary |
Data frame containing one row for each constraint with the value of the constraint in column |
Robin Wellmann
Wellmann, R. (2018). Optimum Contribution Selection and Mate Allocation for Breeding: The R Package optiSel. submitted
## For other objective functions and constraints see the vignettes ###################################################### # Example 1: Advanced OCS with overlapping # # generations using pedigree data # # - maximize genetic gain (BV) # # - restrict increase of mean kinship (pKin) # # - restrict increase of native kinship (pKinatN)# # - avoid decrease of native contribution (NC) # ###################################################### ### Define object cand containing all required ### information on the individuals data(PedigWithErrors) Pedig <- prePed(PedigWithErrors, thisBreed="Hinterwaelder", lastNative=1970, keep=PedigWithErrors$Born%in%1992) Pedig$NC <- pedBreedComp(Pedig, thisBreed="Hinterwaelder")$native use <- Pedig$Born %in% (1980:1990) & Pedig$Breed=="Hinterwaelder" use <- use & summary(Pedig)$equiGen>=3 cont <- agecont(Pedig, use, maxAge=10) Phen <- Pedig[use, ] pKin <- pedIBD(Pedig, keep.only=Phen$Indiv) pKinatN <- pedIBDatN(Pedig, thisBreed="Hinterwaelder", keep.only=Phen$Indiv) Phen$isCandidate <- Phen$Born < 1990 cand <- candes(phen=Phen, pKin=pKin, pKinatN=pKinatN, cont=cont) ### Mean values of the parameters in the population: cand$mean # BV NC pKin pKinatN #1 -0.5648208 0.5763161 0.02305245 0.0469267 ### Define constraints for OCS ### Ne: Effective population size ### L: Generation interval Ne <- 100 L <- 1/(4*cont$male[1]) + 1/(4*cont$female[1]) con <- list(uniform = "female", ub.pKin = 1-(1-cand$mean$pKin)*(1-1/(2*Ne))^(1/L), ub.pKinatN = 1-(1-cand$mean$pKinatN)*(1-1/(2*Ne))^(1/L), lb.NC = cand$mean$NC) ### Solve the optimization problem Offspring <- opticont("max.BV", cand, con, trace=FALSE) ### Expected average values of traits and kinships ### in the population now and at the next evaluation time rbind(cand$mean, Offspring$mean) # BV NC pKin pKinatN #1 -0.5648208 0.5763161 0.02305245 0.04692670 #2 -0.4972679 0.5763177 0.02342014 0.04790944 ### Data frame with optimum contributions Candidate <- Offspring$parent Candidate[Candidate$oc>0.01, c("Indiv", "Sex", "BV", "NC", "lb", "oc", "ub")] ###################################################### # Example 2: Advanced OCS with overlapping # # generations using genotype data # # - minimize mean kinship (sKin) # # - restrict increase of native kinship (sKinatN)# # - avoid decrease of breeding values (BV) # # - cause increase of native contribution (NC) # ###################################################### ### Prepare genotype data data(map) data(Cattle) ### Compute genomic kinship and genomic kinship at native segments dir <- system.file("extdata", package = "optiSel") files <- file.path(dir, paste("Chr", 1:2, ".phased", sep="")) sKin <- segIBD(files, map, minL=1.0) sKinatN <- segIBDatN(files, Cattle, map, thisBreed="Angler", minL=1.0) ### Compute migrant contributions of selection candidates Haplo <- haplofreq(files, Cattle, map, thisBreed="Angler", minL=1.0, what="match") Comp <- segBreedComp(Haplo$match, map) Cattle[Comp$Indiv, "NC"] <- Comp$native Phen <- Cattle[Cattle$Breed=="Angler",] cand <- candes(phen=Phen, sKin=sKin, sKinatN=sKinatN, cont=cont) ### Define constraints for OCS ### Ne: Effective population size ### L: Generation interval Ne <- 100 L <- 4.7 con <- list(uniform = "female", ub.sKinatN = 1-(1-cand$mean$sKinatN)*(1-1/(2*Ne))^(1/L), lb.NC = 1.03*cand$mean$NC, lb.BV = cand$mean$BV) # Compute optimum contributions; the objective is to minimize mean kinship Offspring <- opticont("min.sKin", cand, con=con) # Check if the optimization problem is solved Offspring$info # Average values of traits and kinships rbind(cand$mean, Offspring$mean) # BV NC sKin sKinatN #1 -0.07658022 0.4117947 0.05506277 0.07783431 #2 -0.07657951 0.4308061 0.04830328 0.06395410 # Value of the objective function Offspring$obj.fun # sKin #0.04830328 ### Data frame with optimum contributions Candidate <- Offspring$parent Candidate[Candidate$oc>0.01, c("Indiv", "Sex", "BV", "NC", "lb", "oc", "ub")] ####################################################### # Example 3: Advanced OCS with overlapping # # generations using genotype data # # for multiple breeds or beeding lines # # - Maximize breeding values in all breeds # # - restrict increase of kinships within each breed # # - reduce average kinship across breeds # # - restrict increase of native kinship in Angler # # - cause increase of native contribution in Angler # # by optimizing contributions of males from all breeds# ####################################################### cand <- candes(phen=Cattle, sKin=sKin, sKinatN.Angler=sKinatN, cont=cont) L <- 5 Ne <- 100 con <- list(uniform = "female", ub.sKin = cand$mean$sKin - 0.01/L, ub.sKin.Angler = 1-(1-cand$mean$sKin.Angler)*(1-1/(2*Ne))^(1/L), ub.sKin.Holstein = 1-(1-cand$mean$sKin.Holstein)*(1-1/(2*Ne))^(1/L), ub.sKin.Rotbunt = 1-(1-cand$mean$sKin.Rotbunt)*(1-1/(2*Ne))^(1/L), ub.sKin.Fleckvieh= 1-(1-cand$mean$sKin.Fleckvieh)*(1-1/(2*Ne))^(1/L), ub.sKinatN.Angler= 1-(1-cand$mean$sKinatN.Angler)*(1-1/(2*Ne))^(1/L), lb.NC = cand$mean$NC + 0.05/L) Offspring <- opticont("max.BV", cand, con, trace=FALSE, solver="slsqp") Offspring$mean Candidate <- Offspring$parent[Offspring$parent$Sex=="male", ] Candidate[Candidate$oc>0.01, c("Indiv", "Sex", "BV", "NC", "lb", "oc", "ub")]
## For other objective functions and constraints see the vignettes ###################################################### # Example 1: Advanced OCS with overlapping # # generations using pedigree data # # - maximize genetic gain (BV) # # - restrict increase of mean kinship (pKin) # # - restrict increase of native kinship (pKinatN)# # - avoid decrease of native contribution (NC) # ###################################################### ### Define object cand containing all required ### information on the individuals data(PedigWithErrors) Pedig <- prePed(PedigWithErrors, thisBreed="Hinterwaelder", lastNative=1970, keep=PedigWithErrors$Born%in%1992) Pedig$NC <- pedBreedComp(Pedig, thisBreed="Hinterwaelder")$native use <- Pedig$Born %in% (1980:1990) & Pedig$Breed=="Hinterwaelder" use <- use & summary(Pedig)$equiGen>=3 cont <- agecont(Pedig, use, maxAge=10) Phen <- Pedig[use, ] pKin <- pedIBD(Pedig, keep.only=Phen$Indiv) pKinatN <- pedIBDatN(Pedig, thisBreed="Hinterwaelder", keep.only=Phen$Indiv) Phen$isCandidate <- Phen$Born < 1990 cand <- candes(phen=Phen, pKin=pKin, pKinatN=pKinatN, cont=cont) ### Mean values of the parameters in the population: cand$mean # BV NC pKin pKinatN #1 -0.5648208 0.5763161 0.02305245 0.0469267 ### Define constraints for OCS ### Ne: Effective population size ### L: Generation interval Ne <- 100 L <- 1/(4*cont$male[1]) + 1/(4*cont$female[1]) con <- list(uniform = "female", ub.pKin = 1-(1-cand$mean$pKin)*(1-1/(2*Ne))^(1/L), ub.pKinatN = 1-(1-cand$mean$pKinatN)*(1-1/(2*Ne))^(1/L), lb.NC = cand$mean$NC) ### Solve the optimization problem Offspring <- opticont("max.BV", cand, con, trace=FALSE) ### Expected average values of traits and kinships ### in the population now and at the next evaluation time rbind(cand$mean, Offspring$mean) # BV NC pKin pKinatN #1 -0.5648208 0.5763161 0.02305245 0.04692670 #2 -0.4972679 0.5763177 0.02342014 0.04790944 ### Data frame with optimum contributions Candidate <- Offspring$parent Candidate[Candidate$oc>0.01, c("Indiv", "Sex", "BV", "NC", "lb", "oc", "ub")] ###################################################### # Example 2: Advanced OCS with overlapping # # generations using genotype data # # - minimize mean kinship (sKin) # # - restrict increase of native kinship (sKinatN)# # - avoid decrease of breeding values (BV) # # - cause increase of native contribution (NC) # ###################################################### ### Prepare genotype data data(map) data(Cattle) ### Compute genomic kinship and genomic kinship at native segments dir <- system.file("extdata", package = "optiSel") files <- file.path(dir, paste("Chr", 1:2, ".phased", sep="")) sKin <- segIBD(files, map, minL=1.0) sKinatN <- segIBDatN(files, Cattle, map, thisBreed="Angler", minL=1.0) ### Compute migrant contributions of selection candidates Haplo <- haplofreq(files, Cattle, map, thisBreed="Angler", minL=1.0, what="match") Comp <- segBreedComp(Haplo$match, map) Cattle[Comp$Indiv, "NC"] <- Comp$native Phen <- Cattle[Cattle$Breed=="Angler",] cand <- candes(phen=Phen, sKin=sKin, sKinatN=sKinatN, cont=cont) ### Define constraints for OCS ### Ne: Effective population size ### L: Generation interval Ne <- 100 L <- 4.7 con <- list(uniform = "female", ub.sKinatN = 1-(1-cand$mean$sKinatN)*(1-1/(2*Ne))^(1/L), lb.NC = 1.03*cand$mean$NC, lb.BV = cand$mean$BV) # Compute optimum contributions; the objective is to minimize mean kinship Offspring <- opticont("min.sKin", cand, con=con) # Check if the optimization problem is solved Offspring$info # Average values of traits and kinships rbind(cand$mean, Offspring$mean) # BV NC sKin sKinatN #1 -0.07658022 0.4117947 0.05506277 0.07783431 #2 -0.07657951 0.4308061 0.04830328 0.06395410 # Value of the objective function Offspring$obj.fun # sKin #0.04830328 ### Data frame with optimum contributions Candidate <- Offspring$parent Candidate[Candidate$oc>0.01, c("Indiv", "Sex", "BV", "NC", "lb", "oc", "ub")] ####################################################### # Example 3: Advanced OCS with overlapping # # generations using genotype data # # for multiple breeds or beeding lines # # - Maximize breeding values in all breeds # # - restrict increase of kinships within each breed # # - reduce average kinship across breeds # # - restrict increase of native kinship in Angler # # - cause increase of native contribution in Angler # # by optimizing contributions of males from all breeds# ####################################################### cand <- candes(phen=Cattle, sKin=sKin, sKinatN.Angler=sKinatN, cont=cont) L <- 5 Ne <- 100 con <- list(uniform = "female", ub.sKin = cand$mean$sKin - 0.01/L, ub.sKin.Angler = 1-(1-cand$mean$sKin.Angler)*(1-1/(2*Ne))^(1/L), ub.sKin.Holstein = 1-(1-cand$mean$sKin.Holstein)*(1-1/(2*Ne))^(1/L), ub.sKin.Rotbunt = 1-(1-cand$mean$sKin.Rotbunt)*(1-1/(2*Ne))^(1/L), ub.sKin.Fleckvieh= 1-(1-cand$mean$sKin.Fleckvieh)*(1-1/(2*Ne))^(1/L), ub.sKinatN.Angler= 1-(1-cand$mean$sKinatN.Angler)*(1-1/(2*Ne))^(1/L), lb.NC = cand$mean$NC + 0.05/L) Offspring <- opticont("max.BV", cand, con, trace=FALSE, solver="slsqp") Offspring$mean Candidate <- Offspring$parent[Offspring$parent$Sex=="male", ] Candidate[Candidate$oc>0.01, c("Indiv", "Sex", "BV", "NC", "lb", "oc", "ub")]
Computes for every individual the genetic contribution from native founders and from other breeds according to the pedigree.
pedBreedComp(Pedig, thisBreed)
pedBreedComp(Pedig, thisBreed)
Pedig |
Data frame containing the pedigree with the first 3 columns being |
thisBreed |
Name of the breed of interest as denoted in column |
For every individual the genetic contribution from native founders and from other breeds is computed. It is the fraction of genes that originate from the respective breed.
Data frame with one row for each individual and the following columns
Indiv |
IDs of the individuals |
native |
Native Contribution: The genetic contribution from native founders. |
... |
Genetic contributions from other breeds, one column for each breed. The columns are ordered, so that the most influential breeds come first. |
Robin Wellmann
data(ExamplePed) Pedig <- prePed(ExamplePed, thisBreed="Hinterwaelder", lastNative=1970) cont <- pedBreedComp(Pedig, thisBreed="Hinterwaelder") cont[1000:1010,2:5] contByYear <- conttac(cont, Pedig$Born, use=Pedig$Breed=="Hinterwaelder", mincont=0.04, long=FALSE) round(contByYear,2) barplot(contByYear,ylim=c(0,1), col=1:10, ylab="genetic contribution", legend=TRUE, args.legend=list(x="bottomleft",cex=0.6))
data(ExamplePed) Pedig <- prePed(ExamplePed, thisBreed="Hinterwaelder", lastNative=1970) cont <- pedBreedComp(Pedig, thisBreed="Hinterwaelder") cont[1000:1010,2:5] contByYear <- conttac(cont, Pedig$Born, use=Pedig$Breed=="Hinterwaelder", mincont=0.04, long=FALSE) round(contByYear,2) barplot(contByYear,ylim=c(0,1), col=1:10, ylab="genetic contribution", legend=TRUE, args.legend=list(x="bottomleft",cex=0.6))
Calculates the pedigree based probability of alleles to be IBD. This pedigree based kinship matrix is also called coancestry matrix and is half the additive relationship matrix.
pedIBD(Pedig, keep.only=NULL, keep=keep.only, kinFounder=NULL)
pedIBD(Pedig, keep.only=NULL, keep=keep.only, kinFounder=NULL)
Pedig |
Data frame containing the pedigree with |
keep |
If |
keep.only |
If |
kinFounder |
Kinship matrix for the founders. The row names are the ids of the founders. By default, founders are assumed to be unrelated. Founders not included in this matrix are also assumed to be unrelated. |
Computation of pedigree based kinship matrix f which is half the additive relationship matrix. For individuals i and j it is defined as
fij = Probability that two alleles chosen from individuals i and j are IBD. |
Kinship matrix.
Robin Wellmann
data(PedigWithErrors) data(Phen) keep <- Phen$Indiv Pedig <- prePed(PedigWithErrors, keep=keep, thisBreed="Hinterwaelder", lastNative=1970) pedA <- pedIBD(Pedig, keep.only=keep)
data(PedigWithErrors) data(Phen) keep <- Phen$Indiv Pedig <- prePed(PedigWithErrors, keep=keep, thisBreed="Hinterwaelder", lastNative=1970) pedA <- pedIBD(Pedig, keep.only=keep)
Calculates the kinship at native alleles, which is the pedigree based probability of native alleles to be IBD.
pedIBDatN(Pedig, thisBreed=NA, keep.only=NULL, keep=keep.only, nGen=NA, quiet=FALSE)
pedIBDatN(Pedig, thisBreed=NA, keep.only=NULL, keep=keep.only, nGen=NA, quiet=FALSE)
Pedig |
Data frame containing the pedigree with |
thisBreed |
Name of the breed for which the kinships are to be computed. |
keep |
If |
keep.only |
If |
nGen |
Number of generations taken into account for estimating the native effective size. The default means that the native effective size is not estimated, which requires less memory. |
quiet |
Should console output be suppressed? |
Calculates a list containing matrices needed to compute pedigree based kinships at native alleles, defined as the conditional probability that two randomly chosen alleles are IBD, given that both originate from native founders.
A native founder is an individual with unkown parents belonging to thisBreed
.
The kinship at native alleles between individuals i and j is Q1[i,j]/Q2[i,j]
.
The mean kinship at native alleles in the offspring is (x'Q1x+d1)/(x'Q2x+d2)
, where x
is the vector with genetic contributions of the selection candidates.
The native effective size is estimated from nGen
generations only if nGen
is not NA
.
A list of class ratioFun
including components:
Q1 |
matrix with |
Q2 |
matrix with |
d1 |
The value by which the probability that two alleles chosen from the offspring are IBD and native increases due to genetic drift. |
d2 |
The value by which the probability that two alleles chosen from the offspring are native increases due to genetic drift. |
id |
IDs of the individuals for which the probabilites have been computed. |
mean |
Mean kinship at native alleles of the individuals specified in argument |
Robin Wellmann
data(PedigWithErrors) data(Phen) keep <- Phen$Indiv Pedig <- prePed(PedigWithErrors, keep=keep, thisBreed="Hinterwaelder", lastNative=1970) pKinatN <- pedIBDatN(Pedig, thisBreed="Hinterwaelder", keep.only=keep, nGen=6) #Number of Migrant Founders: 237 #Number of Native Founders: 150 #Individuals in Pedigree : 1658 #Native effective size : 49.5 ## Mean kinship at native segments: pKinatN$mean #[1] 0.0776925 ## Note that this can not be computed as mean(pKinatN$of). ## Results for individuals: pKinatN$of <- pKinatN$Q1/pKinatN$Q2 pKinatN$of["276000812497583","276000812496823"] #[1] 0.05941229
data(PedigWithErrors) data(Phen) keep <- Phen$Indiv Pedig <- prePed(PedigWithErrors, keep=keep, thisBreed="Hinterwaelder", lastNative=1970) pKinatN <- pedIBDatN(Pedig, thisBreed="Hinterwaelder", keep.only=keep, nGen=6) #Number of Migrant Founders: 237 #Number of Native Founders: 150 #Individuals in Pedigree : 1658 #Native effective size : 49.5 ## Mean kinship at native segments: pKinatN$mean #[1] 0.0776925 ## Note that this can not be computed as mean(pKinatN$of). ## Results for individuals: pKinatN$of <- pKinatN$Q1/pKinatN$Q2 pKinatN$of["276000812497583","276000812496823"] #[1] 0.05941229
Calculates the pedigree based probability of alleles to be IBD (identical by descent) or Migrant alleles: For each pair of individuals the probability is computed that two alleles taken at random are IBD or are migrant alleles.
pedIBDorM(Pedig, thisBreed=NA, keep.only=NULL, keep=keep.only)
pedIBDorM(Pedig, thisBreed=NA, keep.only=NULL, keep=keep.only)
Pedig |
Data frame containing the Pedigree. The data frame has columns (1) Individual, (2) Sire, (3) Dam, (4) Sex, and (5) Breed. Missing parents are coded as NA. Both parents must either be missing or present. If this is not the case use prePed. |
thisBreed |
Name of the breed in column (5) of the pedigree for which the kinships are to be computed. |
keep |
If |
keep.only |
If |
Computation of modified pedigree based kinship matrices taking allele origin into account.
A native founder is an individual with unkown parents belonging to thisBreed
. A migrant is an individual with unkown parents not belonging to thisBreed
.
A list with the following components:
pedIBDorM |
Matrix containing for individuals i and j the probability that two alleles chosen from the individuals are IBD or at least one of them is a migrant allele (only computed if 1 is in |
pedIBDorMM |
Matrix containing for individuals i and j the probability that two alleles chosen from the individuals are IBD or both are migrant alleles (only computed if 2 is in |
Robin Wellmann
data(PedigWithErrors) data(Phen) keep <- Phen$Indiv Pedig <- prePed(PedigWithErrors, keep=keep, thisBreed="Hinterwaelder", lastNative=1970) Kin <- pedIBDorM(Pedig, thisBreed="Hinterwaelder", keep.only=keep) mean(Kin$pedIBDorM) #[1] 0.8201792 mean(Kin$pedIBDorMM) #[1] 0.335358
data(PedigWithErrors) data(Phen) keep <- Phen$Indiv Pedig <- prePed(PedigWithErrors, keep=keep, thisBreed="Hinterwaelder", lastNative=1970) Kin <- pedIBDorM(Pedig, thisBreed="Hinterwaelder", keep.only=keep) mean(Kin$pedIBDorM) #[1] 0.8201792 mean(Kin$pedIBDorMM) #[1] 0.335358
This data set gives the pedigree of Hinterwald cattle with some artificially introduced errors and simulated breeding values.
data(PedigWithErrors)
data(PedigWithErrors)
A data frame with columns Indiv
(individual ID), Sire
, Dam
, Sex
, Breed
, Born
with year of birth, and column BV
with simulated breeding values.
Calculates Pedigree Based Inbreeding
pedInbreeding(Pedig)
pedInbreeding(Pedig)
Pedig |
Data frame containing the Pedigree with the first 3 columns being |
Computation of pedigree based inbreeding.
This function is a wrapper function for pedigree
from package pedigree
.
A data frame with column Indiv
containing the individual IDs and column Inbr
containing the inbreeding coefficients.
Robin Wellmann
data(PedigWithErrors) data(Phen) keep <- Phen$Indiv Pedig <- prePed(PedigWithErrors, keep=keep) Res <- pedInbreeding(Pedig) mean(Res$Inbr[Res$Indiv %in% keep]) #[1] 0.01943394
data(PedigWithErrors) data(Phen) keep <- Phen$Indiv Pedig <- prePed(PedigWithErrors, keep=keep) Res <- pedInbreeding(Pedig) mean(Res$Inbr[Res$Indiv %in% keep]) #[1] 0.01943394
Plots a pedigree
pedplot(Pedig, affected=NULL, status=NULL, label="Indiv", ...)
pedplot(Pedig, affected=NULL, status=NULL, label="Indiv", ...)
Pedig |
Data frame containing the pedigree with columns |
affected |
Logical vector indicating for each individual if its symbol should be plotted in colour. The default |
status |
Logical vector indicating for each individual if its symbol in the plot should be crossed out. The default |
label |
Character vector containing the columns of data frame |
... |
Options passed to the underlying function plot.pedigree from package |
This function plots a pedigree. If data frame Pedig
has logical column keep
then the default values mean that the symbols of these animals are plotted in color and for animals from other breeds the symbol is crossed out.
An invisible list returned by the underlying function plot.pedigree from package kinship2
.
Robin Wellmann
data(PedigWithErrors) sPed <- subPed(PedigWithErrors, keep="276000810087543", prevGen=3, succGen=2) pedplot(sPed, mar=c(2,4,2,4), label=c("Indiv", "Born", "Breed"), cex=0.4)
data(PedigWithErrors) sPed <- subPed(PedigWithErrors, keep="276000810087543", prevGen=3, succGen=2) pedplot(sPed, mar=c(2,4,2,4), label=c("Indiv", "Born", "Breed"), cex=0.4)
A data frame simulated breeding values of some Hinterwald cattle with offspring born in 2006 or 2007.
data(Phen)
data(Phen)
A data frame with individual IDs (Indiv
), sexes (Sex
), breeding values (BV
), and native contribution (NC
).
For a particular haplotype from thisBreed
and each marker m
the frequency of the segment containing marker m
in a specified reference breed is plotted.
## S3 method for class 'HaploFreq' plot(x, ID=1, hap=1, refBreed=NULL, Chr=NULL, show.maxFreq=FALSE, ...)
## S3 method for class 'HaploFreq' plot(x, ID=1, hap=1, refBreed=NULL, Chr=NULL, show.maxFreq=FALSE, ...)
x |
This is either an R-Object obtained with function haplofreq or a list obtained with function freqlist. |
ID |
Either the ID of the animal from this breed to be plotted, or the position of the animal in R-Object |
hap |
Number of the haplotype to be plotted ( |
refBreed |
Breed name. The frequencies the haplotype segments have in this reference breed will be plotted. Parameter |
Chr |
Vector with chromosomes to be plotted. The default means that all chromosomes will be plotted. |
show.maxFreq |
If |
... |
Arguments to be passed to methods, such as graphical parameters. |
For a particular haplotype from thisBreed
and each marker m
from chromosomes Chr
the frequency of the segment containing marker m
in reference breed refBreed
is plotted (red line), as well as the maximum frequency the segment has in one of the evaluated breeds (black line), and the maximum frequency a segment from thisBreed
has in one of the evaluated breeds (grey area, if show.maxFreq=TRUE
).
No return value, called for plotting.
Robin Wellmann
data(map) data(Cattle) dir <- system.file("extdata", package="optiSel") files <- paste(dir, "/Chr", 1:2, ".phased", sep="") Freq <- freqlist( haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Rotbunt", minSNP=20), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Holstein", minSNP=20), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Fleckvieh", minSNP=20) ) names(Freq) plot(Freq, ID=1, hap=2, refBreed="Rotbunt") Freq <- haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="others", minSNP=20) plot(Freq, ID=1, hap=2) plot(Freq, ID=1, hap=2, show.maxFreq=TRUE) Freq <- haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Angler", minSNP=20) plot(Freq, ID=1, hap=2)
data(map) data(Cattle) dir <- system.file("extdata", package="optiSel") files <- paste(dir, "/Chr", 1:2, ".phased", sep="") Freq <- freqlist( haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Rotbunt", minSNP=20), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Holstein", minSNP=20), haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Fleckvieh", minSNP=20) ) names(Freq) plot(Freq, ID=1, hap=2, refBreed="Rotbunt") Freq <- haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="others", minSNP=20) plot(Freq, ID=1, hap=2) plot(Freq, ID=1, hap=2, show.maxFreq=TRUE) Freq <- haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Angler", minSNP=20) plot(Freq, ID=1, hap=2)
Prepares a pedigree by sorting and adding founders and pruning the pedigree.
prePed(Pedig, keep=NULL, thisBreed=NA, lastNative=NA, addNum=FALSE)
prePed(Pedig, keep=NULL, thisBreed=NA, lastNative=NA, addNum=FALSE)
Pedig |
Data frame containing the pedigree where the first 3 columns correspond to: Individual ID, Sire, and Dam. More columns can be passed in the |
keep |
Vector with IDs of individuals, or |
thisBreed |
Name of the breed. |
lastNative |
Last year of birth for which individuals with unknown pedigree are considered native. |
addNum |
If |
This function takes a pedigree, adds missing founders, and sorts the pedigree. If parameter keep
contains IDs of individuals then only these individuals and their ancestors will be kept in the pedigree.
If the pedigree contains loops, then the loops will be broken by setting the parents of one animal in each loop to NA
.
If the pedigree contains columnn Sex
then the sexes will be recoded as 'male'
and 'female'
. Missing sexes will be determined from pedigree structure if possible.
If the pedigree contains column Breed
then for ancestors with missing breed the breed name is estimated. If parameter lastNative
is not NA
then for each animal with one missing parent an imaginary founder is added to the pedigree in order to enable classifying the breed names of all founders as follows:
In general animals with missing breed are assumed to have the same breed as most of their offspring. But there is one exception: For founders belonging to thisBreed
who are born after lastNative
the breed name will be set to "unknown"
. Moreover for founders from thisBreed
with unknown year of birth the breed name will be set to "unknown"
if all their descendants are born after lastNative+I
.
Data frame containing the pedigree with columns:
Indiv |
Character column with IDs of the individuals |
Sire |
Character column with IDs of the sires |
Dam |
Character column with IDs of the dams |
Sex |
Character column with sexes of the individuals denoted as |
Breed |
Character column with adjusted breed names of the individuals (only if |
Born |
Numeric column with Year-of-Birth of the individuals (only if |
I |
Numeric column with the average age of the parents when the respective individual was born (only if |
numIndiv |
Numeric IDs of the individuals, which are equal to the row numbers (only if |
numSire |
Numeric IDs of the sires (only if |
numDam |
Numeric IDs of the dams (only if |
Offspring |
Logical column indicating the individuals with offspring. |
Robin Wellmann
data(PedigWithErrors) Pedig <- prePed(PedigWithErrors) tail(Pedig) hist(Pedig$I, freq=FALSE, ylim=c(0,0.2))
data(PedigWithErrors) Pedig <- prePed(PedigWithErrors) tail(Pedig) hist(Pedig$I, freq=FALSE, ylim=c(0,0.2))
Reads individual IDs from a genotype file.
read.indiv(file, skip=NA, cskip=NA)
read.indiv(file, skip=NA, cskip=NA)
file |
Name of the genotype file. |
skip |
Take line |
cskip |
Take column |
Reading individual IDs from phased marker files.
Marker file format: Each marker file containing phased genotypes has a header and no row names. Cells are separated by blank spaces. The number of rows is equal to the number of markers from the respective chromosome and the markers are in the same order as in the map
. The first cskip
columns are ignored. The remaining columns contain genotypes of individuals written as two alleles separated by a character, e.g. A/B, 0/1, A|B, A B, or 0 1. The same two symbols must be used for all markers. Column names are the IDs of the individuals. If the blank space is used as separator then the ID of each individual should repeated in the header to get a regular delimited file. The columns to be skipped and the individual IDs must have no white spaces. The name of each file must contain the chromosome name as specified in the map
in the form "ChrNAME."
, e.g. "Breed2.Chr1.phased"
.
Vector with the IDs of the individuals.
Robin Wellmann
data(Cattle) dir <- system.file("extdata", package = "optiSel") file <- file.path(dir, "Chr1.phased") ID <- read.indiv(file) identical(Cattle$Indiv, ID) #[1] TRUE
data(Cattle) dir <- system.file("extdata", package = "optiSel") file <- file.path(dir, "Chr1.phased") ID <- read.indiv(file) identical(Cattle$Indiv, ID) #[1] TRUE
Sampling Individuals from a Pedigree.
sampleIndiv(Pedig, from="Born", each=100)
sampleIndiv(Pedig, from="Born", each=100)
Pedig |
Pedigree with column |
from |
Column name. From each cohort specified in this column (e.g. year of birth), the number of individuals specified in parameter |
each |
Number of individuals to be sampled from each cohort. |
From each cohort, a specified number of individuals will be sampled. If a cohort contains less individuals, then all individuals are sampled. This may be needed for estimating population specific parameters from a subset of a large pedigree to reduce computation time.
Character vector containing the IDs of the individuals.
Robin Wellmann
data("PedigWithErrors") set.seed(1) Pedig <- prePed(PedigWithErrors) use <- Pedig$Breed=="Hinterwaelder" keep <- sampleIndiv(Pedig[use, ], from="Born", each=5) keep
data("PedigWithErrors") set.seed(1) Pedig <- prePed(PedigWithErrors) use <- Pedig$Breed=="Hinterwaelder" keep <- sampleIndiv(Pedig[use, ], from="Born", each=5) keep
Calculates the segment based Breed Composition: For every individual the breed composition is estimated, including the genetic contribution from native ancestors.
segBreedComp(Native, map, unitP="Mb")
segBreedComp(Native, map, unitP="Mb")
Native |
This parameter is either (1) Mx(2N) logical matrix, with or (2) Mx(2N) character matrix, with components being the first characters of the names of the breeds in which the respective segment has maximum frequency. Segments considered native are coded as or (3) Vector with file names. The files contain for every SNP and for each haplotype 1 if the segment containing the SNP is considered native. Otherwise it is the first letter of the name of the breed in which the segment has maximum frequency. These files are typically created by function haplofreq.
There is one file per chromosome and file names must contain the chromosome name as specified in the |
map |
Data frame providing the marker map with columns including marker name |
unitP |
The unit for measuring the proportion of the genome included in native segments.
Possible units are the number of marker SNPs included in shared segments ( |
For every individual the breed composition is computed, including the genetic contribution from native ancestors (native contribution). The native contribution is the proportion of the genome belonging to segments whose frequency is smaller than a predefined value in all other breeds.
Additionally, for each introgressed breed, the proportion of the genome of each individual is computed that is non-native and has maximum frequency in the respective breed (not if option (1) is used).
Data frame with the number of rows being the number of individuals. The columns are
Indiv |
IDs of the individuals, |
native |
Genetic contributions from native ancestors, |
... |
Contributions from other breeds. |
Robin Wellmann
data(map) data(Cattle) dir <- system.file("extdata", package = "optiSel") GTfiles <- file.path(dir, paste("Chr", unique(map$Chr), ".phased", sep="")) Haplo <- haplofreq(GTfiles, Cattle, map, thisBreed="Angler", minSNP=20, minL=1.0) Comp <- segBreedComp(Haplo$freq<0.01, map) mean(Comp$native) #[1] 0.3853432 Comp <- segBreedComp(Haplo$match, map) apply(Comp[, -1], 2, mean) ## Reading native segments from files: wdir <- file.path(tempdir(), "HaplotypeEval") file <- haplofreq(GTfiles, Cattle, map, thisBreed="Angler", minSNP=20, minL=1.0, ubFreq=0.01, what="match", w.dir=wdir) Comp <- segBreedComp(file$match, map) head(Comp) apply(Comp[, -1], 2, mean) # native F H R #0.38534317 0.05503451 0.25986508 0.29975724 #unlink(wdir, recursive = TRUE)
data(map) data(Cattle) dir <- system.file("extdata", package = "optiSel") GTfiles <- file.path(dir, paste("Chr", unique(map$Chr), ".phased", sep="")) Haplo <- haplofreq(GTfiles, Cattle, map, thisBreed="Angler", minSNP=20, minL=1.0) Comp <- segBreedComp(Haplo$freq<0.01, map) mean(Comp$native) #[1] 0.3853432 Comp <- segBreedComp(Haplo$match, map) apply(Comp[, -1], 2, mean) ## Reading native segments from files: wdir <- file.path(tempdir(), "HaplotypeEval") file <- haplofreq(GTfiles, Cattle, map, thisBreed="Angler", minSNP=20, minL=1.0, ubFreq=0.01, what="match", w.dir=wdir) Comp <- segBreedComp(file$match, map) head(Comp) apply(Comp[, -1], 2, mean) # native F H R #0.38534317 0.05503451 0.25986508 0.29975724 #unlink(wdir, recursive = TRUE)
Segment based probability of alleles to be IBD (identical by descent): For each pair of individuals the probability is computed that two alleles taken at random position from randomly chosen haplotypes belong to a shared segment.
segIBD(files, map, minSNP=20, minL=1.0, unitP="Mb", unitL="Mb", a=0.0, keep=NULL, skip=NA, cskip=NA, cores=1, quiet=FALSE)
segIBD(files, map, minSNP=20, minL=1.0, unitP="Mb", unitL="Mb", a=0.0, keep=NULL, skip=NA, cskip=NA, cores=1, quiet=FALSE)
files |
This parameter is either (1) A vector with names of phased marker files, one file for each chromosome, or (2) A list with two components. Each component is a vector with names of phased marker files, one file for each chromosome. Each components corresponds to a different set of individuals. This enables to compute the kinship between individuals stored in two different files. File names must contain the chromosome name as specified in the |
map |
Data frame providing the marker map with columns including marker name |
minSNP |
Minimum number of marker SNPs included in a segment. |
minL |
Minimum length of a segment in |
unitP |
The unit for measuring the proportion of the genome included in shared segments.
Possible units are the number of marker SNPs included in shared segments ( |
unitL |
The unit for measuring the length of a segment. Possible units are the number of marker SNPs included in the segment ( |
a |
The Function providing the weighting factor for each segment is w(x)=x*x/(a+x*x). The parameter of the function is the length of the segment in |
keep |
If |
skip |
Take line |
cskip |
Take column |
cores |
Number of cores to be used for parallel processing of chromosomes. By default one core is used. For |
quiet |
Should console output be suppressed? |
For each pair of individuals the probability is computed that two SNPs taken at random position from randomly chosen haplotypes belong to a shared segment.
Genotype file format: Each file containing phased genotypes has a header and no row names. Cells are separated by blank spaces. The number of rows is equal to the number of markers from the respective chromosome and the markers are in the same order as in the map
. The first cskip
columns are ignored. The remaining columns contain genotypes of individuals written as two alleles separated by a character, e.g. A/B, 0/1, A|B, A B, or 0 1. The same two symbols must be used for all markers. Column names are the IDs of the individuals. If the blank space is used as separator then the ID of each individual should repeated in the header to get a regular delimited file. The columns to be skipped and the individual IDs must have no white spaces.
NxN
segment-based kinship matrix with N
being the number of individuals.
Robin Wellmann
de Cara MAR, Villanueva B, Toro MA, Fernandez J (2013). Using genomic tools to maintain diversity and fitness in conservation programmes. Molecular Ecology. 22: 6091-6099
data(map) dir <- system.file("extdata", package = "optiSel") files <- file.path(dir, paste("Chr", unique(map$Chr), ".phased", sep="")) f <- segIBD(files, map, minSNP=15, minL=1.0) mean(f) #[1] 0.05677993 f <- segIBD(files, map, minSNP=15, minL=1.0, cores=NA) mean(f) #[1] 0.05677993 ## Multidimensional scaling of animals: ## (note that only few markers are used) data(Cattle) library("smacof") D <- sim2dis(f, 4) color <- c(Angler="red", Rotbunt="green", Fleckvieh="blue", Holstein="black") col <- color[as.character(Cattle$Breed)] Res <- smacofSym(D, itmax = 5000, eps = 1e-08) plot(Res$conf, pch=18, col=col, main="Multidimensional Scaling", cex=0.5) mtext(paste("segIBD Stress1 = ", round(Res$stress,3)))
data(map) dir <- system.file("extdata", package = "optiSel") files <- file.path(dir, paste("Chr", unique(map$Chr), ".phased", sep="")) f <- segIBD(files, map, minSNP=15, minL=1.0) mean(f) #[1] 0.05677993 f <- segIBD(files, map, minSNP=15, minL=1.0, cores=NA) mean(f) #[1] 0.05677993 ## Multidimensional scaling of animals: ## (note that only few markers are used) data(Cattle) library("smacof") D <- sim2dis(f, 4) color <- c(Angler="red", Rotbunt="green", Fleckvieh="blue", Holstein="black") col <- color[as.character(Cattle$Breed)] Res <- smacofSym(D, itmax = 5000, eps = 1e-08) plot(Res$conf, pch=18, col=col, main="Multidimensional Scaling", cex=0.5) mtext(paste("segIBD Stress1 = ", round(Res$stress,3)))
Calculates the segment based probability of alleles to be IBD (identical by descent) and Native: For each pair of individuals the probability is computed that two SNPs taken at random position from randomly chosen haplotypes belong to a shared segment and are native.
segIBDandN(files, Native, map, minSNP=20, unitP="Mb", minL=1.0, unitL="Mb", a=0.0, keep=NULL, skip=NA, cskip=NA, cores=1, quiet=FALSE)
segIBDandN(files, Native, map, minSNP=20, unitP="Mb", minL=1.0, unitL="Mb", a=0.0, keep=NULL, skip=NA, cskip=NA, cores=1, quiet=FALSE)
files |
Vector with names of the phased marker files, one file for each chromosome. The required format is described under |
Native |
This parameter is either (1) Mx(2N) indicator matrix, with 1, if the segment containing the SNP is considered native, and 0 otherwise. The row names are the marker names, and the non-unique column names are the IDs of the individuals. The matrix is typically computed from the output of function haplofreq. or (2) Vector with file names. The files contain for every SNP and for each haplotype from this breed 1 if the segment containing the SNP is considered native. These files are typically created by function haplofreq.
There is one file per chromosome and file names must contain the chromosome name as specified in the |
map |
Data frame providing the marker map with columns including marker name |
minSNP |
Minimum number of marker SNPs included in a segment. |
unitP |
The unit for measuring the proportion of the genome included in shared segments.
Possible units are the number of marker SNPs included in shared segments ( |
minL |
Minimum length of a segment in |
unitL |
The unit for measuring the length of a segment. Possible units are the number of marker SNPs included in the segment ( |
a |
The Function providing the weighting factor for each segment is w(x)=x*x/(a+x*x). The parameter of the function is the length of the segment in |
keep |
Vector with IDs of individuals (from this breed) for which the probabilities are to be computed. By default, they will be computed for all individuals included in |
skip |
Take line |
cskip |
Take column |
cores |
Number of cores to be used for parallel processing of chromosomes. By default one core is used. For |
quiet |
Should console output be suppressed? |
For each pair of individuals the probability is computed that two SNPs taken at random position from randomly chosen haplotypes belong to a shared segment and are native. That is, they are not introgressed from other breeds.
Genotype file format: Each file containing phased genotypes has a header and no row names. Cells are separated by blank spaces. The number of rows is equal to the number of markers from the respective chromosome and the markers are in the same order as in the map
. The first cskip
columns are ignored. The remaining columns contain genotypes of individuals written as two alleles separated by a character, e.g. A/B, 0/1, A|B, A B, or 0 1. The same two symbols must be used for all markers. Column names are the IDs of the individuals. If the blank space is used as separator then the ID of each individual should repeated in the header to get a regular delimited file. The columns to be skipped and the individual IDs must have no white spaces. The name of each file must contain the chromosome name as specified in the map
in the form "ChrNAME."
, e.g. "Breed2.Chr1.phased"
.
NxN
matrix with N
being the number of individuals from this breed included in all files (and in parameter keep
).
Robin Wellmann
data(map) data(Cattle) dir <- system.file("extdata", package = "optiSel") GTfile <- file.path(dir, paste("Chr", unique(map$Chr), ".phased", sep="")) Freq <- haplofreq(GTfile, Cattle, map, thisBreed="Angler", refBreeds="others", minSNP=20)$freq fIBDN <- segIBDandN(GTfile, Freq<0.01, map=map, minSNP=20) mean(fIBDN) #[1] 0.01032261 fIBDN <- segIBDandN(GTfile, Freq<0.01, map=map, minSNP=20, cores=NA) mean(fIBDN) #[1] 0.01032261 ## using files: wdir <- file.path(tempdir(),"HaplotypeEval") chr <- unique(map$Chr) GTfile <- file.path( dir, paste("Chr", chr, ".phased", sep="")) file <- haplofreq(GTfile, Cattle, map, thisBreed="Angler", minSNP=20, ubFreq=0.01, w.dir=wdir) fIBDN <- segIBDandN(GTfile, file$match, map=map, minSNP=20) mean(fIBDN) #[1] 0.01032261 fIBDN <- segIBDandN(GTfile, file$match, map=map, minSNP=20, cores=NA) mean(fIBDN) #[1] 0.01032261 #unlink(wdir, recursive = TRUE)
data(map) data(Cattle) dir <- system.file("extdata", package = "optiSel") GTfile <- file.path(dir, paste("Chr", unique(map$Chr), ".phased", sep="")) Freq <- haplofreq(GTfile, Cattle, map, thisBreed="Angler", refBreeds="others", minSNP=20)$freq fIBDN <- segIBDandN(GTfile, Freq<0.01, map=map, minSNP=20) mean(fIBDN) #[1] 0.01032261 fIBDN <- segIBDandN(GTfile, Freq<0.01, map=map, minSNP=20, cores=NA) mean(fIBDN) #[1] 0.01032261 ## using files: wdir <- file.path(tempdir(),"HaplotypeEval") chr <- unique(map$Chr) GTfile <- file.path( dir, paste("Chr", chr, ".phased", sep="")) file <- haplofreq(GTfile, Cattle, map, thisBreed="Angler", minSNP=20, ubFreq=0.01, w.dir=wdir) fIBDN <- segIBDandN(GTfile, file$match, map=map, minSNP=20) mean(fIBDN) #[1] 0.01032261 fIBDN <- segIBDandN(GTfile, file$match, map=map, minSNP=20, cores=NA) mean(fIBDN) #[1] 0.01032261 #unlink(wdir, recursive = TRUE)
Calculates the kinship at native alleles, which is the segment based probability of native alleles to be IBD.
segIBDatN(files, phen, map, thisBreed, refBreeds="others", ubFreq=0.01, minSNP=20, unitP="Mb", minL=1.0, unitL="Mb", a=0.0, keep=NULL, lowMem=TRUE, skip=NA, cskip=NA, cores=1, quiet=FALSE)
segIBDatN(files, phen, map, thisBreed, refBreeds="others", ubFreq=0.01, minSNP=20, unitP="Mb", minL=1.0, unitL="Mb", a=0.0, keep=NULL, lowMem=TRUE, skip=NA, cskip=NA, cores=1, quiet=FALSE)
files |
This can be a character vector with names of the phased marker files, one file for each chromosome.
Alternatively a) b) c) File names must contain the chromosome name as specified in the |
phen |
Data frame containing the ID (column |
map |
Data frame providing the marker map with columns including marker name |
thisBreed |
Breed name: Results will be computed for individuals from |
refBreeds |
Vector containing names of genotyped breeds. A segment is considered native if its frequency is smaller than |
ubFreq |
A segment is considered native if its frequency is smaller than |
minSNP |
Minimum number of marker SNPs included in a segment. |
unitP |
The unit for measuring the proportion of the genome included in native segments.
Possible units are the number of marker SNPs included in shared segments ( |
minL |
Minimum length of a segment in |
unitL |
The unit for measuring the length of a segment. Possible units are the number of marker SNPs included in the segment ( |
a |
The function providing the weighting factor for each segment is w(x)=x*x/(a+x*x). The parameter of the function is the length of the segment in |
keep |
Subset of the IDs of the individuals from data frame |
lowMem |
If |
skip |
Take line |
cskip |
Take column |
cores |
Number of cores to be used for parallel processing of chromosomes. By default one core is used. For |
quiet |
Should console output be suppressed? |
Calculates a list containing matrices needed to compute segment based kinships at native alleles, defined as the conditional probability that two randomly chosen alleles are IBD, given that both originate from native ancestors. An allele is considered to originate from a native ancesor if the segment containing the allele has low frequency in all reference breeds.
The kinship at native alleles between individuals i and j is Q1[i,j]/Q2[i,j]
.
The mean kinship at native alleles in the offspring is (x'Q1x+d1)/(x'Q2x+d2)
, where x
is the vector with genetic contributions of the selection candidates.
Genotype file format: Each file containing phased genotypes has a header and no row names. Cells are separated by blank spaces. The number of rows is equal to the number of markers from the respective chromosome and the markers are in the same order as in the map
. The first cskip
columns are ignored. The remaining columns contain genotypes of individuals written as two alleles separated by a character, e.g. A/B, 0/1, A|B, A B, or 0 1. The same two symbols must be used for all markers. Column names are the IDs of the individuals. If the blank space is used as separator then the ID of each individual should repeated in the header to get a regular delimited file. The columns to be skipped and the individual IDs must have no white spaces. The name of each file must contain the chromosome name as specified in the map
in the form "ChrNAME."
, e.g. "Breed2.Chr1.phased"
.
A list of class ratioFun
including components:
Q1 |
matrix with |
Q2 |
matrix with |
d1 |
The value by which the probability that two alleles chosen from the offspring are IBD and native increases due to genetic drift. |
d2 |
The value by which the probability that two alleles chosen from the offspring are native increases due to genetic drift. |
id |
IDs of the individuals for which the probabilites have been computed. |
mean |
Mean kinship at native alleles of the specified individuals. Note that |
Robin Wellmann
data(map) data(Cattle) dir <- system.file("extdata", package = "optiSel") files <- paste(dir, "/Chr", 1:2, ".phased", sep="") sKinatN <- segIBDatN(files, Cattle, map, thisBreed="Angler", ubFreq=0.01, minL=1.0, lowMem=FALSE) ## Mean kinship at native segments: sKinatN$mean #[1] 0.06695171 ## Note that this can not be computed as mean(sKinatN$of). ## Results for individuals: sKinatN$of <- sKinatN$Q1/sKinatN$Q2 sKinatN$of["Angler1","Angler5"] #[1] 0.4394066 ## Use temporary files to reduce working memory: sKinatN <- segIBDatN(files, Cattle, map, thisBreed="Angler", ubFreq=0.01, minL=1.0) ## Mean kinship at native segments: sKinatN$mean #[1] 0.06695171
data(map) data(Cattle) dir <- system.file("extdata", package = "optiSel") files <- paste(dir, "/Chr", 1:2, ".phased", sep="") sKinatN <- segIBDatN(files, Cattle, map, thisBreed="Angler", ubFreq=0.01, minL=1.0, lowMem=FALSE) ## Mean kinship at native segments: sKinatN$mean #[1] 0.06695171 ## Note that this can not be computed as mean(sKinatN$of). ## Results for individuals: sKinatN$of <- sKinatN$Q1/sKinatN$Q2 sKinatN$of["Angler1","Angler5"] #[1] 0.4394066 ## Use temporary files to reduce working memory: sKinatN <- segIBDatN(files, Cattle, map, thisBreed="Angler", ubFreq=0.01, minL=1.0) ## Mean kinship at native segments: sKinatN$mean #[1] 0.06695171
Segment based Inbreeding: For each individual the probability is computed that the paternal allele and the maternal allele, sampled from random position, belong to a shared segment (i.e. a run of homozygosity, ROH). The arguments are the same as for function segIBD.
segInbreeding(files, map, minSNP=20, minL=1.0, unitP="Mb", unitL="Mb", a=0.0, keep=NULL, skip=NA, cskip=NA, quiet=FALSE)
segInbreeding(files, map, minSNP=20, minL=1.0, unitP="Mb", unitL="Mb", a=0.0, keep=NULL, skip=NA, cskip=NA, quiet=FALSE)
files |
This parameter is either (1) A vector with names of phased marker files, one file for each chromosome, or (2) A list with two components. Each component is a vector with names of phased marker files, one file for each chromosome. Each components corresponds to a different set of individuals. File names must contain the chromosome name as specified in the |
map |
Data frame providing the marker map with columns including marker name |
minSNP |
Minimum number of marker SNPs included in a segment. |
minL |
Minimum length of a segment in |
unitP |
The unit for measuring the proportion of the genome included in shared segments.
Possible units are the number of marker SNPs included in shared segments ( |
unitL |
The unit for measuring the length of a segment. Possible units are the number of marker SNPs included in the segment ( |
a |
The Function providing the weighting factor for each segment is w(x)=x*x/(a+x*x). The parameter of the function is the length of the segment in |
keep |
If |
skip |
Take line |
cskip |
Take column |
quiet |
Should console output be suppressed? |
For each pair of individuals the probability is computed that two SNPs taken at random position from randomly chosen haplotypes belong to a shared segment.
Genotype file format: Each file containing phased genotypes has a header and no row names. Cells are separated by blank spaces. The number of rows is equal to the number of markers from the respective chromosome and the markers are in the same order as in the map
. The first cskip
columns are ignored. The remaining columns contain genotypes of individuals written as two alleles separated by a character, e.g. A/B, 0/1, A|B, A B, or 0 1. The same two symbols must be used for all markers. Column names are the IDs of the individuals. If the blank space is used as separator then the ID of each individual should repeated in the header to get a regular delimited file. The columns to be skipped and the individual IDs must have no white spaces.
A data frame with column Indiv
containing the individual IDs and column Inbr
containing the inbreeding coefficients.
Robin Wellmann
de Cara MAR, Villanueva B, Toro MA, Fernandez J (2013). Using genomic tools to maintain diversity and fitness in conservation programmes. Molecular Ecology. 22: 6091-6099
data(map) data(Cattle) dir <- system.file("extdata", package = "optiSel") files <- file.path(dir, paste("Chr", 1:2, ".phased", sep="")) f <- segInbreeding(files, map, minSNP=20, minL=2.0) Cattle2 <- merge(Cattle, f, by="Indiv") tapply(Cattle2$Inbr, Cattle2$Breed, mean) # Angler Fleckvieh Holstein Rotbunt #0.03842552 0.05169508 0.12431393 0.08386849 boxplot(Inbr~Breed, data=Cattle2, ylim=c(0,1), main="Segment Based Inbreeding") fIBD <- segIBD(files, map, minSNP=20, minL=2.0) identical(rownames(fIBD), f$Indiv) #[1] TRUE range(2*diag(fIBD)-1-f$Inbr) #[1] -2.220446e-16 2.220446e-16
data(map) data(Cattle) dir <- system.file("extdata", package = "optiSel") files <- file.path(dir, paste("Chr", 1:2, ".phased", sep="")) f <- segInbreeding(files, map, minSNP=20, minL=2.0) Cattle2 <- merge(Cattle, f, by="Indiv") tapply(Cattle2$Inbr, Cattle2$Breed, mean) # Angler Fleckvieh Holstein Rotbunt #0.03842552 0.05169508 0.12431393 0.08386849 boxplot(Inbr~Breed, data=Cattle2, ylim=c(0,1), main="Segment Based Inbreeding") fIBD <- segIBD(files, map, minSNP=20, minL=2.0) identical(rownames(fIBD), f$Indiv) #[1] TRUE range(2*diag(fIBD)-1-f$Inbr) #[1] -2.220446e-16 2.220446e-16
Segment based probability of alleles to be Native: For each pair of individuals the probability is computed that two SNPs taken at random position from randomly chosen haplotypes both belong to native segments.
segN(Native, map, unitP="Mb", keep=NULL, cores=1, quiet=FALSE)
segN(Native, map, unitP="Mb", keep=NULL, cores=1, quiet=FALSE)
Native |
This parameter is either (1) Mx(2N) indicator matrix, with 1, if the segment containing the SNP is considered native, and 0 otherwise. The row names are the marker names, and the non-unique column names are the IDs of the individuals. The matrix is typically computed from the output of function haplofreq. or (2) Vector with file names. The files contain for every SNP and for each haplotype from this breed 1 if the segment containing the SNP is considered native. These files are typically created by function haplofreq.
There is one file per chromosome and file names must contain the chromosome name as specified in the |
map |
Data frame providing the marker map with columns including marker name |
unitP |
The unit for measuring the proportion of the genome included in native segments.
Possible units are the number of marker SNPs included in shared segments ( |
keep |
Vector with IDs of individuals (from this breed) for which the probabilities are to be computed. By default, they will be computed for all individuals included in |
cores |
Number of cores to be used for parallel processing of chromosomes. By default one core is used. For |
quiet |
Should console output be suppressed? |
For each pair of individuals the probability is computed that two SNPs taken at random position from randomly chosen haplotypes both belong to native segments. That is, they are not introgressed from other breeds.
NxN
matrix with N
being the number of genotyped individuals from this breed (which are also included in vector keep
).
Robin Wellmann
data(map) data(Cattle) dir <- system.file("extdata", package = "optiSel") files <- file.path(dir, paste("Chr", unique(map$Chr), ".phased", sep="")) Freq <- haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="others", minSNP=20)$freq fN <- segN(Freq<0.01, map) mean(fN) #[1] 0.15418 fN <- segN(Freq<0.01, map, cores=NA) mean(fN) #[1] 0.15418 ## using files: wdir <- file.path(tempdir(),"HaplotypeEval") chr <- unique(map$Chr) GTfile <- file.path( dir, paste("Chr", chr, ".phased", sep="")) files <- haplofreq(GTfile, Cattle, map, thisBreed="Angler", w.dir=wdir) fN <- segN(files$match, map) mean(fN) #[1] 0.15418 fN <- segN(files$match, map, cores=NA) mean(fN) #[1] 0.15418 #unlink(wdir, recursive = TRUE)
data(map) data(Cattle) dir <- system.file("extdata", package = "optiSel") files <- file.path(dir, paste("Chr", unique(map$Chr), ".phased", sep="")) Freq <- haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="others", minSNP=20)$freq fN <- segN(Freq<0.01, map) mean(fN) #[1] 0.15418 fN <- segN(Freq<0.01, map, cores=NA) mean(fN) #[1] 0.15418 ## using files: wdir <- file.path(tempdir(),"HaplotypeEval") chr <- unique(map$Chr) GTfile <- file.path( dir, paste("Chr", chr, ".phased", sep="")) files <- haplofreq(GTfile, Cattle, map, thisBreed="Angler", w.dir=wdir) fN <- segN(files$match, map) mean(fN) #[1] 0.15418 fN <- segN(files$match, map, cores=NA) mean(fN) #[1] 0.15418 #unlink(wdir, recursive = TRUE)
Converts a similarity matrix (e.g. a kinship matrix) into a dissimilarity matrix.
sim2dis(f, a=4.0, baseF=0.03, method=1)
sim2dis(f, a=4.0, baseF=0.03, method=1)
f |
Similarity matrix. |
a |
Exponent |
baseF |
Old inbreeding not measured by |
method |
Either |
This function converts a similarity matrix f
with values between 0 and 1 (e.g. a kinship matrix) into a dissimilarity matrix.
At first, the similarity is adjusted as
f <- baseF + (1-baseF)*f
.
Then, for Method 1, the dissimilarity between individuals i
and j
is computed as
Dij = (-log(fij))^a
,
whereas for Method 2, the dissimilarity is computed as
Dij = sqrt((fii+fjj)/2-fij)^a
.
Although Method 2 may provide lower stress values in some cases, Method 1 has the advantage that the area reflects the diversity of a population more reasonable.
Dissimilarity matrix D.
Robin Wellmann
data(map) dir <- system.file("extdata", package = "optiSel") files <- file.path(dir, paste("Chr", unique(map$Chr), ".phased", sep="")) f <- segIBD(files, map, minSNP=15, minL=1.0) D <- sim2dis(f, 4) ## Multidimensional scaling of animals: data(Cattle) library("smacof") color <- c(Angler="red", Rotbunt="green", Fleckvieh="blue", Holstein="black") col <- color[as.character(Cattle$Breed)] Res <- smacofSym(D, itmax = 5000, eps = 1e-08) plot(Res$conf, pch=18, col=col, main="Multidimensional Scaling", cex=0.5) mtext(paste("segIBD Stress1 = ", round(Res$stress,3)))
data(map) dir <- system.file("extdata", package = "optiSel") files <- file.path(dir, paste("Chr", unique(map$Chr), ".phased", sep="")) f <- segIBD(files, map, minSNP=15, minL=1.0) D <- sim2dis(f, 4) ## Multidimensional scaling of animals: data(Cattle) library("smacof") color <- c(Angler="red", Rotbunt="green", Fleckvieh="blue", Holstein="black") col <- color[as.character(Cattle$Breed)] Res <- smacofSym(D, itmax = 5000, eps = 1e-08) plot(Res$conf, pch=18, col=col, main="Multidimensional Scaling", cex=0.5) mtext(paste("segIBD Stress1 = ", round(Res$stress,3)))
Creates a subset of a large pedigree that includes only individuals related with specified individuals in a predefined way.
subPed(Pedig, keep, prevGen=3, succGen=0)
subPed(Pedig, keep, prevGen=3, succGen=0)
Pedig |
Data frame containing the pedigree where the first 3 columns correspond to: Individual ID, Sire, and Dam. More columns can be passed in the |
keep |
Vector with IDs of individuals. Only these individuals and individuals related with them in a predefined way will be kept in the pedigree. |
prevGen |
Number of previous (ancestral) generations to be included in the pedigree. |
succGen |
Number of succeeding (descendant) generations to be included in the pedigree. |
This function creates a subset of a large pedigree that includes only individuals related with the individuals specified in the vector keep
in a predefined way.
A data frame containing the reduced pedigree. A column keep
is appended indicating which individuals were included in parameter keep
.
Robin Wellmann
data(PedigWithErrors) sPed <- subPed(PedigWithErrors, keep="276000891974272", prevGen=3, succGen=2) sPed label <- c("Indiv", "Born", "Breed") pedplot(sPed, mar=c(2,4,2,4), label=label, cex=0.7)
data(PedigWithErrors) sPed <- subPed(PedigWithErrors, keep="276000891974272", prevGen=3, succGen=2) sPed label <- c("Indiv", "Born", "Breed") pedplot(sPed, mar=c(2,4,2,4), label=label, cex=0.7)
For every time point (age cohort), several population genetic parameters are estimated. These may include the generation interval, the average kinship, the average native kinship, the native effective size (native Ne), and the native genome equivalent (NGE) of the population at that point in time.
## S3 method for class 'candes' summary(object, tlim=range(object$phen$Born, na.rm=TRUE), histNe=NA, base=tlim[1], df=4, ...)
## S3 method for class 'candes' summary(object, tlim=range(object$phen$Born, na.rm=TRUE), histNe=NA, base=tlim[1], df=4, ...)
object |
R-Object created with function candes containing phenotypes and kinship information on individuals from the same breed. Data frame |
tlim |
Numeric vector with 2 components giving the time span for which genetic parameters are to be computed. |
histNe |
The historic effective size of the population assumed for the time between year |
base |
The base year in which individuals are assumed to be unrelated. The base year affects the NGE. The default is |
df |
Smoothing parameter used for computing the native effective size. The default is |
... |
further arguments passed to or from other methods |
For every time point (age cohort), several population genetic parameters are estimated. These may include the generation interval, the average kinship, the average native kinship, the native effective size (native Ne), and the native genome equivalent (NGE) of the population at that point in time. The population at a time t consists of all individuals born between t-I and t, where I is the generation interval. The population genetic parameters are described below.
A data frame providing for each time point (age cohort) several population genetic parameters. These may include
t |
The age cohort, containing e.g. year-of-birth or the generation number. These are the levels of column |
I |
The estimated generation interval at the time when the individuals were born. |
KIN |
The average kinship |
NATKIN |
The average native kinship |
Ne |
The native effective size of the population at the time when the individuals were born. The native effective size, quantifies how fast the smoothed native kinship is increasing. The native kinship may decrease for a short time span, in which case the estimate would be NA. Use a smaller value for parameter |
NGE |
The native genome equivalents of the population at the time when the individuals were born. The NGE estimates the number of unrelated individuals that would be needed to establish a hypothetical new population that has the same genetic diversity at native alleles as the population under study, whereby the individuals born in the base-year are assumed to be unrelated. |
Robin Wellmann
data(ExamplePed) Pedig <- prePed(ExamplePed, thisBreed="Hinterwaelder", lastNative=1970) phen <- Pedig[Pedig$Breed=="Hinterwaelder",] pKin <- pedIBD(Pedig) pKinatN <- pedIBDatN(Pedig, thisBreed="Hinterwaelder") pop <- candes(phen=phen, pKin=pKin, pKinatN=pKinatN, quiet=TRUE, reduce.data=FALSE) Param <- summary(pop, tlim=c(1970,1995), histNe=150, base=1800, df=4) plot(Param$t, Param$pKinatN,type="l", ylim=c(0,0.1)) lines(Param$t,Param$pKin, col="red") plot(Param$t, Param$Ne, type="l", ylim=c(0,100)) plot(Param$t, Param$NGE, type="l", ylim=c(0,10)) plot(Param$t, Param$I, type="l", ylim=c(0,10))
data(ExamplePed) Pedig <- prePed(ExamplePed, thisBreed="Hinterwaelder", lastNative=1970) phen <- Pedig[Pedig$Breed=="Hinterwaelder",] pKin <- pedIBD(Pedig) pKinatN <- pedIBDatN(Pedig, thisBreed="Hinterwaelder") pop <- candes(phen=phen, pKin=pKin, pKinatN=pKinatN, quiet=TRUE, reduce.data=FALSE) Param <- summary(pop, tlim=c(1970,1995), histNe=150, base=1800, df=4) plot(Param$t, Param$pKinatN,type="l", ylim=c(0,0.1)) lines(Param$t,Param$pKin, col="red") plot(Param$t, Param$Ne, type="l", ylim=c(0,100)) plot(Param$t, Param$NGE, type="l", ylim=c(0,10)) plot(Param$t, Param$I, type="l", ylim=c(0,10))
Calculates summary statistics for pedigrees.
## S3 method for class 'Pedig' summary(object, keep.only=NULL, maxd=50, d=4, ...)
## S3 method for class 'Pedig' summary(object, keep.only=NULL, maxd=50, d=4, ...)
object |
An object from class |
keep.only |
The individuals to be included in the summary. |
maxd |
Maximum pedigree depth. |
d |
Number of generations taken into account for computing the PCI. |
... |
further arguments passed to or from other methods |
Computes summary statistics for pedigrees, including the numbers of equivalent complete generations, numbers of fully traced generations, numbers of maximum generations traced, indexes of pedigree completeness (MacCluer et al, 1983), and the inbreeding coefficients.
A data frame with the following columns:
Indiv |
IDs of the individuals, |
equiGen |
Number of equivalent complete generations, |
fullGen |
Number of fully traced generations, |
maxGen |
Number of maximum generations traced, |
PCI |
Index of pedigree completeness (MacCluer et al, 1983) in generation d . |
Inbreeding |
Inbreeding coefficient. |
Robin Wellmann
MacCluer J W, Boyce A J, Dyke B, Weitkamp L R, Pfenning D W, Parsons C J (1983). Inbreeding and pedigree structure in Standardbred horses. J Hered 74 (6): 394-399.
data(PedigWithErrors) Pedig <- prePed(PedigWithErrors) Summary <- summary(Pedig, keep.only=Pedig$Born %in% (2006:2007)) head(Summary) hist(Summary$PCI, xlim=c(0,1), main="Pedigree Completeness") hist(Summary$Inbreeding, xlim=c(0,1), main="Inbreeding") hist(Summary$equiGen, xlim=c(0,20), main="Number of Equivalent Complete Generations") hist(Summary$fullGen, xlim=c(0,20), main="Number of Fully Traced Generations") hist(Summary$maxGen, xlim=c(0,20), main="Number of Maximum Generations Traced")
data(PedigWithErrors) Pedig <- prePed(PedigWithErrors) Summary <- summary(Pedig, keep.only=Pedig$Born %in% (2006:2007)) head(Summary) hist(Summary$PCI, xlim=c(0,1), main="Pedigree Completeness") hist(Summary$Inbreeding, xlim=c(0,1), main="Inbreeding") hist(Summary$equiGen, xlim=c(0,20), main="Number of Equivalent Complete Generations") hist(Summary$fullGen, xlim=c(0,20), main="Number of Fully Traced Generations") hist(Summary$maxGen, xlim=c(0,20), main="Number of Maximum Generations Traced")