Title: | Parentage Assignment using Bi-Allelic Genetic Markers |
---|---|
Description: | Can be used for paternity and maternity assignment and outperforms conventional methods where closely related individuals occur in the pool of possible parents. The method compares the genotypes of offspring with any combination of potential parents and scores the number of mismatches of these individuals at bi-allelic genetic markers (e.g. Single Nucleotide Polymorphisms). It elaborates on a prior exclusion method based on the Homozygous Opposite Test (HOT; Huisman 2017 <doi:10.1111/1755-0998.12665>) by introducing the additional exclusion criterion HIPHOP (Homozygous Identical Parents, Heterozygous Offspring are Precluded; Cockburn et al., in revision). Potential parents are excluded if they have more mismatches than can be expected from genotyping error and mutation, so that one can identify the true genetic parents and detect situations where one (or both) of the true parents is not sampled. Package 'hiphop' can deal with (a) the case where there is contextual information about the identity of the mother (i.e. a female has been seen to be involved in reproductive tasks such as nest building), but paternity is unknown (e.g. due to promiscuity), and (b) the case where both parents need to be assigned, because there is no contextual information on which female laid eggs and which male fertilized them (e.g. a polygynandrous mating system where multiple females and males deposit young in a common nest, or organisms with external fertilisation that breed in aggregations). For details: Cockburn, A., Penalba, J.V., Jaccoud, D., Kilian, A., Brouwer, L., Double, M.C., Margraf, N., Osmond, H.L., van de Pol, M. and Kruuk, L.E.B. (in revision). HIPHOP: improved paternity assignment among close relatives using a simple exclusion method for bi-allelic markers. Molecular Ecology Resources, DOI to be added upon acceptance. |
Authors: | Martijn van de Pol [aut, cre] , Lyanne Brouwer [aut] , Andrew Cockburn [aut] |
Maintainer: | Martijn van de Pol <[email protected]> |
License: | GPL-2 |
Version: | 0.0.1 |
Built: | 2024-11-16 06:46:07 UTC |
Source: | CRAN |
An example of what the output of the hothiphop function looks like. Listed are all possible triad combinations for 8 offspring from the 2018 cohort. This data frame is also used to generate the vignette. See the help file of the hothiphop function for an explanation of the columns.
combinations
An object of class data.frame
with 24192 rows and 24 columns.
A dataset containing genotypes of 1407 superb fairy wrens (Malurus cyaneus) from five cohorts (breeding seasons 2014-2018) from the Australian National Botanic Garden. Each individual (row) is scored at 1376 loci (columns), with the scores meaning: 0: homozygous for the common allele; 1: homozygous for the rare allele; 2: heterozygous; NA: locus could not be scored.
genotypes
A data frame with 1407 rows (individuals) and 1376 variables (loci)
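The layout described above (individuals as rows, loci as columns, codes 0, 1, 2 or NA) can be illustrated with a small toy table. This is only a sketch in base R: the individual and locus names are made up and are not taken from the packaged dataset.

```r
# Toy genotype table in the same layout as 'genotypes':
# rows are individuals, columns are loci (all names here are hypothetical).
gen <- data.frame(
  locus1 = c(0, 2, 1),
  locus2 = c(2, NA, 0),
  locus3 = c(1, 1, 2),
  row.names = c("ind_A", "ind_B", "ind_C")
)

# Check that every entry uses the documented coding: 0, 1, 2 or NA.
m <- as.matrix(gen)
valid_codes <- all(is.na(m) | m %in% c(0, 1, 2))
valid_codes
```

A check of this kind may be worth running on your own genotype file before passing it to the package functions, since they expect only these four values.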
Andrew Cockburn, [email protected]
Cockburn et al. (2020) HIPHOP: improved paternity assignment among close relatives using a simple exclusion method for biallelic markers. Molecular Ecology Resources, in revision.
This function calculates the number of genetic mismatches according to the hiphop and hot tests for any combination of offspring-potential.dam-potential.sire. The HOT test (Homozygous Opposite Test; Huisman 2017) compares the genotype of an offspring with a potential parent: a mismatch is scored when both the offspring and the parent are homozygous, but for different alleles. The HIPHOP test (Homozygous Identical Parents, Heterozygous Offspring are Precluded; Cockburn et al. in revision) compares the genotype of an offspring with both potential parents: a mismatch is scored when the offspring is heterozygous and both parents are homozygous for the same allele. The resulting output can next be summarized using the 'topmatch()' function.
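The two mismatch rules can be sketched in a few lines of base R, assuming the genotype coding of the 'genotypes' dataset (0 and 1 are the two homozygotes, 2 is heterozygous, NA is unscored). The function names hot_mismatch and hiphop_mismatch are hypothetical; this is an illustration of the rules as described here, not the package's internal implementation.

```r
# HOT: offspring and one parent are both homozygous, but for different alleles.
# Loci with a missing value in either individual are ignored.
hot_mismatch <- function(offspring, parent) {
  scored <- !is.na(offspring) & !is.na(parent)
  sum(scored & offspring != 2 & parent != 2 & offspring != parent)
}

# HIPHOP: offspring is heterozygous while both parents are homozygous
# for the same allele. Loci with any missing value are ignored.
hiphop_mismatch <- function(offspring, dam, sire) {
  scored <- !is.na(offspring) & !is.na(dam) & !is.na(sire)
  sum(scored & offspring == 2 & dam != 2 & dam == sire)
}

# Toy genotypes at six loci (made-up values).
offspring <- c(0, 2, 1, 2, NA, 0)
dam       <- c(1, 0, 1, 1, 0, 2)
sire      <- c(0, 0, 2, 0, 1, NA)

hot_mismatch(offspring, dam)           # locus 1 is a HOT mismatch with the dam
hot_mismatch(offspring, sire)          # no HOT mismatch with the sire
hiphop_mismatch(offspring, dam, sire)  # locus 2 is a HIPHOP mismatch
```

In this toy example locus 4 is not a HIPHOP mismatch, because the two parents are homozygous for different alleles and a heterozygous offspring is therefore expected.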
hothiphop(ind, gen)
ind |
The input file with individuals, which should contain at least the columns brood, individual, type, social.parent, year. |
gen |
The input file with genotypes, which should contain the loci-names as column headers and the individual names as row-header and should only contain the values 0, 1, 2 or NA. |
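Before calling the function it can help to confirm that the individuals file has the required columns and to see which individuals have a row in the genotypes file. The sketch below uses base R on toy data; the identifiers and the labels in the type column are placeholders (the actual coding used by the package data should be checked in the 'individuals' dataset), and gen_ids stands in for the row names of a genotypes table.

```r
# Columns that the individuals file should contain, as listed above.
required <- c("brood", "individual", "type", "social.parent", "year")

# Toy individuals file; all values are hypothetical placeholders.
ind <- data.frame(
  brood = c("b1", "b1", "b1"),
  individual = c("ind_A", "ind_B", "ind_D"),
  type = c("offspring", "adult female", "adult male"),  # placeholder labels
  social.parent = c(0, 1, 1),
  year = 2018,
  stringsAsFactors = FALSE
)

# Stand-in for rownames(gen): individuals that have genotype data.
gen_ids <- c("ind_A", "ind_B", "ind_C")

# Required columns that are absent (empty if the file is complete).
missing_cols <- setdiff(required, names(ind))

# 1 if the individual's genotype is available, else 0.
genotyped <- as.integer(ind$individual %in% gen_ids)
```

Here ind_D has no genotype record, so it would not contribute mismatch scores as a potential sire.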
a dataframe with all possible offspring-potential.dam-potential.sire combinations and their mismatch scores according to the HOT and HIPHOP tests, the number of loci this was based on, and some additional relevant information about the social parents and the potential dams and sires
the year or cohort that is being considered; adults can be potential dam or sire in some years, but not in others
an identifier of the brood to which the offspring and adults belong/are associated with
an identifier of the offspring
an identifier of the potential dam
an identifier of the potential sire
the sum of the hiphop and hot.parents mismatch score
the hiphop mismatch score of the offspring with the potential dam and potential sire, expressed as the number of loci giving mismatches
the hot score of the offspring with both the potential dam and sire, expressed as the number of loci giving mismatches
the hot score of the offspring with the potential dam, expressed as the number of loci giving mismatches
the hot mismatch score of the offspring with the potential sire, expressed as the number of loci giving mismatches
the sum of the hot.dam and hiphop mismatch score
the sum of the hot.sire and hiphop mismatch score
the number of loci at which both the offspring and dam were not NA
the number of loci at which both the offspring and sire were not NA
the number of loci at which the offspring, dam and sire were not NA
proportion of loci at which the offspring was heterozygous
if the social.mother genotypic data is in the genotypes file then equal to 1, else 0
if the social.father genotypic data is in the genotypes file then equal to 1, else 0
if the potential dam is the social mother then equal to 1, else 0
if the potential sire is the social father then equal to 1, else 0
if the potential dam is part of the same group (i.e. associated with the same brood) as the offspring, then equal to 1, else 0
if the potential sire is part of the same group (i.e. associated with the same brood) as the offspring, then equal to 1, else 0
identity of the social mother of the offspring
identity of the social father of the offspring
Martijn van de Pol, [email protected]
Cockburn et al. (2020) HIPHOP: improved paternity assignment among close relatives using a simple exclusion method for biallelic markers. Molecular Ecology Resources, in revision.
Huisman, J. (2017). Pedigree reconstruction from SNP data: parentage assignment, sibship clustering and beyond. Molecular Ecology Resources, 17(5), 1009-1024.
results <- hothiphop(ind = individuals[1:22, ], gen = genotypes)
head(results)
best <- topmatch(x = results, ranking = "hothiphop.parents")
head(best)
A list of 2527 individuals (superb fairy wrens; Malurus cyaneus) whose genotypes are to be compared to determine parentage. The dataset consists of 1153 offspring, 469 adult females that are potential dams and 905 adult males that are potential sires. Data are from five cohorts (breeding seasons 2014-2018) from the Australian National Botanic Garden. Note that individuals can occur multiple times in the dataset, as adults can have parentage in multiple years and offspring can become adults in future years. In addition, an individual that is associated with multiple broods as social parent can have multiple records per year. Note that the columns social parent and brood are only used to determine whether a potential dam or sire is the social parent, an extra-group (technically extra-brood) parent, or a within-group parent that is not the social parent (a subordinate).
individuals
A data frame with 2527 rows and 6 variables:
an identifier of the brood to which the offspring and adults belong/are associated with
an identifier of the individual
denotes whether the individual is an offspring, adult female (potential dam) or adult male (potential sire)
if the individual is the social parent of the brood then equal to 1, else 0
the year or cohort that is being considered; adults can be potential dam or sire in some years, but not in others
Andrew Cockburn, [email protected]
Cockburn et al. (2020) HIPHOP: improved paternity assignment among close relatives using a simple exclusion method for biallelic markers. Molecular Ecology Resources, in revision.
This function allows one to inspect the genotypes of all individuals in your input file to see whether genetic data is available and, if so, what the proportions of homozygotes, heterozygotes and missing values are.
inspect(ind, gen)
ind |
The input file with individuals, which should contain at least the columns brood, individual, type, social.parent, year. |
gen |
The input file with genotypes, which should contain the loci-names as column headers and the individual names as row-header and should only contain the values 0, 1, 2 or NA. |
the individuals file with a summary of the genotypes attached.
an identifier of the brood to which the offspring and adults belong/are associated with
an identifier of the individual
denotes whether the individual is an offspring, adult female (potential dam) or adult male (potential sire)
if the individual is the social parent of the brood then equal to 1, else 0
the year or cohort that is being considered; adults can be potential dam or sire in some years, but not in others
if the individual's genotypic data is in the genotypes file then equal to 1, else 0
the proportion of loci that was scored as 0 (homozygotes at common allele)
the proportion of loci that was scored as 1 (homozygotes at rare allele)
the proportion of loci that was scored as 2 (heterozygote)
the proportion of loci that has missing values (NA)
the number of loci that has genotypic information (i.e. not NA)
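The per-individual summaries listed above can be reproduced on a toy genotype matrix with base R. This is a sketch only: the column names of summary_tab are hypothetical, and it assumes that each proportion is taken over all loci (including those scored NA), which may differ from what the package does.

```r
# Toy genotype matrix: 3 individuals (rows) by 4 loci (columns), made-up values.
m <- matrix(c(0,  2,  1, NA,
              0,  0,  2,  2,
              NA, NA, 1,  0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("ind_A", "ind_B", "ind_C"),
                            paste0("locus", 1:4)))

# Proportions are computed over all loci; NA entries count only as missing.
summary_tab <- data.frame(
  prop.common = rowMeans(!is.na(m) & m == 0),  # scored 0
  prop.rare   = rowMeans(!is.na(m) & m == 1),  # scored 1
  prop.het    = rowMeans(!is.na(m) & m == 2),  # scored 2
  prop.na     = rowMeans(is.na(m)),            # missing
  n.scored    = rowSums(!is.na(m))             # loci with genotypic information
)
summary_tab
```

Individuals with a high proportion of missing values have fewer loci available for the HOT and HIPHOP comparisons, which is worth keeping in mind when interpreting their mismatch scores.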
Martijn van de Pol, [email protected]
overview <- inspect(ind = individuals[1:22, ], gen = genotypes)
head(overview)
This function summarizes, per offspring, the top combinations of dam and sire that have the fewest genetic mismatches according to the hot (Huisman 2017) and/or hiphop (Cockburn et al. in revision) test criteria. In addition to the top matched combinations, the summary always lists the social parents, even if they are not among the top X. The user can choose whether to look for the most likely dam and sire with or without assuming that the social mother (or father) is the genetic parent. Furthermore, one can choose on which test score to rank individuals. For more information and worked examples, see the vignette and Cockburn et al. (in revision).
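The core idea of ranking triads per offspring and keeping the best few can be sketched in base R as follows. This is a simplified illustration, not the package's code: the helper top_n and the toy data are hypothetical, the column names offspring, dam and sire are assumed, and the sketch leaves out the conditioning on social parents, the threshold flag and the always-listed social parents described above.

```r
# Toy mismatch table: every dam-sire combination for two offspring.
results <- data.frame(
  offspring = rep(c("o1", "o2"), each = 4),
  dam  = rep(c("d1", "d1", "d2", "d2"), times = 2),
  sire = rep(c("s1", "s2"), times = 4),
  hothiphop.parents = c(3, 0, 12, 9, 7, 8, 1, 5),
  stringsAsFactors = FALSE
)

# Keep the 'top' combinations with the fewest mismatches for each offspring.
top_n <- function(x, ranking, top = 3) {
  x <- x[order(x$offspring, x[[ranking]]), ]
  # Position of each row within its offspring after sorting.
  rank_within <- ave(seq_len(nrow(x)), x$offspring, FUN = seq_along)
  x[rank_within <= top, ]
}

best <- top_n(results, ranking = "hothiphop.parents", top = 2)
best
```

In practice a large gap in mismatch score between the first and second ranked combination is what suggests that the top combination contains the true parents.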
topmatch(x, ranking, condition = "none", thres = 99999, top = 3, unique = "pair")
x |
The file with hot and hiphop scores that is generated by the 'hothiphop()' function. |
ranking |
This sets the mismatch criterion by which dams and sires are ranked. Possibilities include ranking="hothiphop.parents", "hiphop", "hot.parents", "hot.dam", "hot.sire", "hothiphop.dam", "hothiphop.sire". In some situations it can be useful to supply two ranking criteria, for example to avoid ties (e.g. ranking=c("hot.dam", "hiphop")). |
condition |
Whether or not one wants to condition on either the social mother (condition="mother") or social father (condition="father"). The default is condition="none" in which case all possible dams and sires are considered as genetic parents. |
thres |
Sets a threshold value for the number of mismatches: if parents have fewer mismatches than this threshold, the variable below.threshold in the output will have a value of 1. If 0 <= thres < 1, the number of mismatches is re-expressed as a proportion of all loci sampled (not NA). If -1 < thres < 0, the mismatch scores are expressed as the number of mismatches divided by the number of loci at which the offspring is heterozygous (for the HIPHOP test) or divided by the number of loci at which the offspring is homozygous (for the HOT test). This standardization can be useful when recalculating the HIPHOP score, to account for the fact that some offspring can be scored at more loci than others for this test. By default thres is set to a large integer (99999). |
top |
Sets the number of top matches shown in the output (default: 3). |
unique |
When using the ranking criteria hot.dam or hot.sire, the best-ranked dam or sire will occupy the entire top 3, which does not allow comparison among dams (or sires). By setting unique to "dam" (or "sire") the output shows the top 3 with the best record for each unique dam (or sire). |
a dataframe with, for each offspring, the top X (default top 3, giving 3 rows for each offspring) dam and sire combinations and their mismatch scores according to the HOT and HIPHOP tests, the number of loci this was based on, and some additional relevant information about the social parents and the potential dams and sires. In addition to the top X offspring-dam-sire combinations with the fewest mismatches, the scores of the social parents are also always listed (if they were not already in the top X ranking).
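The rescaling described for the thres argument can be illustrated with a little arithmetic in base R. The numbers and the variable names (loci.triad, het.loci) are made up for illustration, and the sketch only mirrors the description given here rather than the package's internal code.

```r
# Toy HIPHOP mismatch counts for three offspring-dam-sire combinations.
hiphop     <- c(2, 15, 40)
loci.triad <- c(1200, 1250, 1000)  # loci scored in offspring, dam and sire
het.loci   <- c(300, 500, 400)     # loci at which the offspring is heterozygous

# Proportional threshold: mismatches as a proportion of all loci sampled.
thres <- 0.01
prop.all <- hiphop / loci.triad
below.threshold <- as.integer(prop.all < thres)

# Alternative scaling for the HIPHOP test: mismatches relative to the number
# of loci at which the offspring is heterozygous.
prop.het <- hiphop / het.loci
```

With these toy numbers only the first combination falls below the proportional threshold of 0.01.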
Martijn van de Pol, [email protected]
Cockburn et al. (2020) HIPHOP: improved paternity assignment among close relatives using a simple exclusion method for biallelic markers. Molecular Ecology Resources, in revision.
Huisman, J. (2017). Pedigree reconstruction from SNP data: parentage assignment, sibship clustering and beyond. Molecular Ecology Resources, 17(5), 1009-1024.
results <- hothiphop(ind = individuals[1:22, ], gen = genotypes)
best <- topmatch(x = results, ranking = "hothiphop.parents")
head(best)