An example of what the output of the hothiphop function looks like. Listed are all possible triad combinations for 8 offspring from the 2018 cohort. This data frame is also used to generate the vignette. See the help file of the hothiphop function for an explanation of the columns.
combinations
An object of class data.frame
with 24192 rows and 24 columns.
A dataset containing the genotypes of 1407 superb fairy wrens (Malurus cyaneus) from five cohorts (breeding seasons 2014-2018) from the Australian National Botanic Garden. Each individual (row) is scored at 1376 loci (columns), with the scores meaning: 0: homozygous for the common allele; 1: homozygous for the rare allele; 2: heterozygous; NA: locus could not be scored.
genotypes
A data frame with 1407 rows (individuals) and 1376 variables (loci)
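Many genotyping pipelines export biallelic SNPs as allele dosages (0, 1 or 2 copies of the rare allele), in which heterozygotes are coded 1. The coding above differs: 1 marks the rare homozygote and 2 the heterozygote. The following is a minimal base-R sketch, assuming dosage-coded input, of how such data could be recoded before use; the helper `recode_dosage()` and the toy data are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): recode allele dosages
# (0 = homozygous common, 1 = heterozygous, 2 = homozygous rare)
# into this dataset's coding (0 = homozygous common,
# 1 = homozygous rare, 2 = heterozygous). NA stays NA.
recode_dosage <- function(dosage) {
  coded <- dosage
  # guard against NA: logical indices containing NA are not allowed
  # when assigning into a data frame
  coded[!is.na(dosage) & dosage == 1] <- 2  # heterozygotes
  coded[!is.na(dosage) & dosage == 2] <- 1  # rare homozygotes
  coded
}

# made-up toy data: 4 individuals (rows) by 2 loci (columns)
dosage <- data.frame(locus1 = c(0, 1, 2, NA),
                     locus2 = c(2, 2, 1, 0),
                     row.names = c("ind1", "ind2", "ind3", "ind4"))
recoded <- recode_dosage(dosage)
print(recoded)
```

Both replacements test the original `dosage` values, so a locus recoded to 2 in the first step is not flipped again by the second.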
Andrew Cockburn, andrew.cockburn@anu.edu.au
Cockburn et al. (2020) HIPHOP: improved paternity assignment among close relatives using a simple exclusion method for biallelic markers. Molecular Ecology Resources, in revision.
This function calculates the number of genetic mismatches according to the HIPHOP and HOT tests for every combination of offspring, potential dam and potential sire. The HOT test (Homozygous Opposite Test; Huisman 2017) compares the genotype of an offspring with a potential parent: a mismatch is scored when both the offspring and the parent are homozygous, but for different alleles. The HIPHOP test (Homozygous Identical Parents, Heterozygous Offspring are Precluded; Cockburn et al. in revision) compares the genotype of an offspring with both potential parents: a mismatch is scored when the offspring is heterozygous and both parents are homozygous for the same allele. The resulting output can then be summarized using the 'topmatch()' function.
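The two tests can be illustrated one locus at a time. Below is a minimal base-R sketch, assuming the 0/1/2/NA coding of the genotypes dataset (0 and 1 homozygous, 2 heterozygous); `hot_mismatch()` and `hiphop_mismatch()` are hypothetical helpers for illustration, not the package's internal implementation. In this sketch, a locus that is missing in any of the genotypes compared is simply not counted as a mismatch.

```r
# Coding: 0 = homozygous common, 1 = homozygous rare,
#         2 = heterozygous, NA = not scored.

# HOT: offspring and parent are both homozygous, but for different alleles.
hot_mismatch <- function(offspring, parent) {
  both_hom <- offspring %in% c(0, 1) & parent %in% c(0, 1)
  sum(both_hom & offspring != parent)
}

# HIPHOP: offspring heterozygous while both parents are homozygous
# for the same allele.
hiphop_mismatch <- function(offspring, dam, sire) {
  het_off <- offspring %in% 2
  same_hom <- dam %in% c(0, 1) & sire %in% c(0, 1) & dam == sire
  sum(het_off & same_hom)
}

# made-up genotypes of one triad at six loci
offspring <- c(2, 0, 1, 2, NA, 0)
dam       <- c(0, 1, 1, 1, 0, 0)
sire      <- c(0, 0, 1, 0, 0, NA)

hot_mismatch(offspring, dam)           # 1 (locus 2: 0 vs 1)
hot_mismatch(offspring, sire)          # 0
hiphop_mismatch(offspring, dam, sire)  # 1 (locus 1: 2 with parents 0 and 0)
```

`%in%` returns FALSE for NA, and `FALSE & NA` is FALSE in R, so missing genotypes never produce an NA count.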
hothiphop(ind, gen)
ind: The input data frame of individuals, which should contain at least the columns brood, individual, type, social.parent and year.
gen: The input data frame of genotypes, which should have the locus names as column headers and the individual names as row names, and should contain only the values 0, 1, 2 or NA.
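Before calling hothiphop(), it can help to check that the genotype table matches this description. A minimal sketch, assuming `gen` is a data frame with individuals as row names and `ind` has an `individual` column; `check_genotypes()` is a hypothetical helper, not a package function (the package's inspect() reports genotype availability, but this sketch also checks the coding).

```r
# Hypothetical helper (not part of the package): check that a genotype
# table uses only 0, 1, 2 or NA, and list individuals in 'ind' that have
# no row in 'gen'.
check_genotypes <- function(gen, ind) {
  values <- unlist(gen, use.names = FALSE)
  list(
    valid_values = all(is.na(values) | values %in% c(0, 1, 2)),
    not_genotyped = setdiff(as.character(ind$individual), rownames(gen))
  )
}

# made-up toy inputs: locus2 contains an invalid value (3), and
# individual "D" has no genotypes
gen <- data.frame(locus1 = c(0, 2, NA), locus2 = c(1, 3, 0),
                  row.names = c("A", "B", "C"))
ind <- data.frame(individual = c("A", "B", "D"))
res <- check_genotypes(gen, ind)
str(res)
```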
A data frame with all possible offspring-potential.dam-potential.sire combinations and their mismatch scores according to the HOT and HIPHOP tests, the number of loci these scores are based on, and some additional information about the social parents and the potential dams and sires. Its columns are:
the year or cohort being considered; adults can be potential dams or sires in some years but not in others
an identifier of the brood to which the offspring and adults belong or are associated with
an identifier of the offspring
an identifier of the potential dam
an identifier of the potential sire
the sum of the hiphop and hot.parents mismatch scores
the hiphop mismatch score of the offspring with the potential dam and potential sire, expressed as the number of loci giving mismatches
the hot score of the offspring with both the potential dam and sire, expressed as the number of loci giving mismatches
the hot score of the offspring with the potential dam, expressed as the number of loci giving mismatches
the hot mismatch score of the offspring with the potential sire, expressed as the number of loci giving mismatches
the sum of the hot.dam and hiphop mismatch scores
the sum of the hot.sire and hiphop mismatch scores
the number of loci at which both the offspring and dam were not NA
the number of loci at which both the offspring and sire were not NA
the number of loci at which the offspring, dam and sire were not NA
the proportion of loci at which the offspring was heterozygous
if genotypic data of the social mother are in the genotypes file then equal to 1, else 0
if genotypic data of the social father are in the genotypes file then equal to 1, else 0
if the potential dam is the social mother then equal to 1, else 0
if the potential sire is the social father then equal to 1, else 0
if the potential dam is part of the same group (i.e. associated with the same brood) as the offspring, then equal to 1, else 0
if the potential sire is part of the same group (i.e. associated with the same brood) as the offspring, then equal to 1, else 0
identity of the social mother of the offspring
identity of the social father of the offspring
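Several of the columns above are simple bookkeeping on the genotypes of a triad. A minimal base-R sketch for one offspring-dam-sire triad, assuming the same 0/1/2/NA coding; `triad_summary()` and the labels `loci.dam`, `loci.sire`, `loci.triad` and `het.offspring` are hypothetical, not the column names used by hothiphop(). Taking heterozygosity over the offspring's scored (non-NA) loci is also an assumption.

```r
# Hypothetical helper: bookkeeping columns for one triad.
# hot.dam, hot.sire and hiphop are mismatch counts supplied by the caller.
triad_summary <- function(offspring, dam, sire, hot.dam, hot.sire, hiphop) {
  off_ok <- !is.na(offspring)
  c(
    hothiphop.dam  = hot.dam + hiphop,   # sum of hot.dam and hiphop
    hothiphop.sire = hot.sire + hiphop,  # sum of hot.sire and hiphop
    loci.dam   = sum(off_ok & !is.na(dam)),                 # offspring and dam scored
    loci.sire  = sum(off_ok & !is.na(sire)),                # offspring and sire scored
    loci.triad = sum(off_ok & !is.na(dam) & !is.na(sire)),  # all three scored
    # assumption: heterozygosity over the offspring's non-NA loci
    het.offspring = mean(offspring[off_ok] == 2)
  )
}

# made-up genotypes and made-up mismatch counts
offspring <- c(2, 0, 1, 2, NA, 0)
dam       <- c(0, 1, 1, 1, 0, 0)
sire      <- c(0, 0, 1, 0, 0, NA)
s <- triad_summary(offspring, dam, sire, hot.dam = 1, hot.sire = 0, hiphop = 1)
print(s)
```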
Martijn van de Pol, martijn@myscience.eu
Cockburn et al. (2020) HIPHOP: improved paternity assignment among close relatives using a simple exclusion method for biallelic markers. Molecular Ecology Resources, in revision.
Huisman, J. (2017). Pedigree reconstruction from SNP data: parentage assignment, sibship clustering and beyond. Molecular Ecology Resources, 17(5), 1009-1024.
# compute HOT and HIPHOP mismatch scores for the first 22 records
results <- hothiphop(ind = individuals[1:22, ], gen = genotypes)
head(results)
# summarize the best-ranked dam-sire combinations per offspring
best <- topmatch(x = results, ranking = "hothiphop.parents")
head(best)
A list of 2527 records of individuals (superb fairy wrens; Malurus cyaneus) whose genotypes are to be compared to determine parentage. The dataset consists of 1153 offspring, 469 adult females that are potential dams and 905 adult males that are potential sires. Data are from five cohorts (breeding seasons 2014-2018) from the Australian National Botanic Garden. Note that individuals can occur multiple times in the dataset, as adults can have parentage in multiple years, and offspring can become adults in later years. Likewise, an individual that is associated with multiple broods as a social parent has multiple records in that year. Note that the columns social.parent and brood are only used to determine whether a potential dam or sire is the social parent, an extra-group (technically extra-brood) parent, or a within-group parent that is not the social parent (a subordinate).
individuals
A data frame with 2527 rows and 6 variables:
an identifier of the brood to which the offspring and adults belong or are associated with
an identifier of the individual
denotes whether the individual is an offspring, adult female (potential dam) or adult male (potential sire)
if the individual is the social parent of the brood then equal to 1, else 0
the year or cohort being considered; adults can be potential dams or sires in some years but not in others
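The role of the brood and social.parent columns described above can be made concrete. A minimal sketch, assuming a single year of data and the type labels "offspring", "adult female" and "adult male" (the labels used in the actual dataset may differ); the toy table and `classify_parent()` are hypothetical and do not reproduce the package's code.

```r
# Made-up toy version of the individuals table (one year only).
individuals_toy <- data.frame(
  brood = c("b1", "b1", "b1", "b1", "b2"),
  individual = c("chick1", "F1", "M1", "M2", "M3"),
  type = c("offspring", "adult female", "adult male", "adult male", "adult male"),
  social.parent = c(0, 1, 1, 0, 1),
  year = 2018,
  stringsAsFactors = FALSE
)

# Hypothetical helper: classify a candidate parent relative to an offspring.
classify_parent <- function(ind, offspring, candidate) {
  off_broods <- ind$brood[ind$individual == offspring]
  rec <- ind[ind$individual == candidate, ]
  # a candidate may have several records (several broods) in one year
  in_group <- rec$brood %in% off_broods
  if (any(in_group & rec$social.parent == 1)) {
    "social parent"
  } else if (any(in_group)) {
    "within-group (subordinate)"
  } else {
    "extra-group"
  }
}

roles <- sapply(c("F1", "M1", "M2", "M3"),
                function(id) classify_parent(individuals_toy, "chick1", id))
print(roles)
```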
Andrew Cockburn, andrew.cockburn@anu.edu.au
Cockburn et al. (2020) HIPHOP: improved paternity assignment among close relatives using a simple exclusion method for biallelic markers. Molecular Ecology Resources, in revision.
This function allows one to inspect the genotypes of all individuals in the input file to see whether genetic data are available and, if so, what proportions of the loci are homozygous, heterozygous or missing.
inspect(ind, gen)
ind: The input data frame of individuals, which should contain at least the columns brood, individual, type, social.parent and year.
gen: The input data frame of genotypes, which should have the locus names as column headers and the individual names as row names, and should contain only the values 0, 1, 2 or NA.
The individuals data frame with a summary of the genotypes attached. Its columns are:
an identifier of the brood to which the offspring and adults belong or are associated with
an identifier of the individual
denotes whether the individual is an offspring, adult female (potential dam) or adult male (potential sire)
if the individual is the social parent of the brood then equal to 1, else 0
the year or cohort being considered; adults can be potential dams or sires in some years but not in others
if the individual's genotypic data is in the genotypes file then equal to 1, else 0
the proportion of loci scored as 0 (homozygous for the common allele)
the proportion of loci scored as 1 (homozygous for the rare allele)
the proportion of loci scored as 2 (heterozygous)
the proportion of loci with missing values (NA)
the number of loci with genotypic information (i.e. not NA)
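The proportions above can be computed directly from the genotype table. A minimal base-R sketch, assuming the proportions are taken over all loci (so that the four proportions sum to 1 for each individual); `summarize_genotypes()` and its column names are hypothetical, not the output format of inspect().

```r
# Hypothetical helper: per-individual genotype summary.
summarize_genotypes <- function(gen) {
  g <- as.matrix(gen)
  n_loci <- ncol(g)
  data.frame(
    individual  = rownames(g),
    prop.0      = rowSums(g == 0, na.rm = TRUE) / n_loci,  # homozygous common
    prop.1      = rowSums(g == 1, na.rm = TRUE) / n_loci,  # homozygous rare
    prop.2      = rowSums(g == 2, na.rm = TRUE) / n_loci,  # heterozygous
    prop.NA     = rowSums(is.na(g)) / n_loci,              # missing
    loci.scored = rowSums(!is.na(g)),                      # not NA
    row.names = NULL
  )
}

# made-up genotypes: 2 individuals at 4 loci
gen <- data.frame(l1 = c(0, 2), l2 = c(1, NA), l3 = c(2, 2), l4 = c(0, 0),
                  row.names = c("A", "B"))
summary_tab <- summarize_genotypes(gen)
print(summary_tab)
```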
Martijn van de Pol, martijn@myscience.eu
# summarize genotype availability for the first 22 records
overview <- inspect(ind = individuals[1:22, ], gen = genotypes)
head(overview)
This function summarizes, per offspring, the dam and sire combinations with the fewest genetic mismatches according to the HOT (Huisman 2017) and/or HIPHOP (Cockburn et al. in revision) test criteria. In addition to the top matched combinations, the summary always lists the social parents if they are not among the top X. The user can choose whether to look for the most likely dam and sire with or without assuming that the social mother (or father) is the genetic parent. Furthermore, one can choose which test score to rank individuals on. For more information and worked examples, see the vignette and Cockburn et al. (in revision).
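The core ranking step can be sketched in base R. This is a simplified illustration on a small made-up table with a hypothetical `social` flag marking the social pair: it ranks by one or more score columns (later columns break ties), keeps the top rows per offspring and appends the social pair if it was not already included. It ignores the condition, thres and unique arguments and is not the package's implementation.

```r
# Made-up miniature of hothiphop() output; 'social' is a hypothetical flag.
scores <- data.frame(
  offspring = c("c1", "c1", "c1", "c1", "c2", "c2"),
  dam  = c("F1", "F1", "F2", "F2", "F1", "F2"),
  sire = c("M1", "M2", "M1", "M2", "M1", "M2"),
  hothiphop.parents = c(12, 3, 7, 3, 0, 9),
  hot.dam = c(2, 2, 1, 1, 0, 4),
  social = c(1, 0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: top rows per offspring plus the social pair.
top_per_offspring <- function(x, ranking, top = 3) {
  pieces <- lapply(split(x, x$offspring), function(d) {
    # order by the first ranking column; later columns break ties
    d <- d[do.call(order, unname(as.list(d[ranking]))), ]
    best <- head(d, top)
    # always keep the social pair, even when it is not in the top rows
    extra <- d[d$social == 1 & !(rownames(d) %in% rownames(best)), ]
    rbind(best, extra)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

ranked <- top_per_offspring(scores, ranking = c("hothiphop.parents", "hot.dam"),
                            top = 2)
print(ranked)
```

For offspring c1, the two combinations tied at 3 mismatches are separated by hot.dam, and the social pair (12 mismatches) is appended as a third row.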
topmatch(x, ranking, condition = "none", thres = 99999, top = 3, unique = "pair")
x: The data frame with HOT and HIPHOP scores that is generated by the 'hothiphop()' function.
ranking: Sets the mismatch criterion by which dams and sires are ranked; possibilities include ranking="hothiphop.parents", "hiphop", "hot.parents", "hot.dam", "hot.sire", "hothiphop.dam" and "hothiphop.sire". In some situations it can be useful to supply two ranking criteria, for example to break ties (e.g. ranking=c("hot.dam", "hiphop")).
condition: Whether or not to condition on either the social mother (condition="mother") or the social father (condition="father"). The default is condition="none", in which case all possible dams and sires are considered as genetic parents.
thres: Sets a threshold value for the number of mismatches; if parents have fewer mismatches than this value, the variable below.threshold in the output will have a value of 1. If 0 <= thres < 1, the number of mismatches is re-expressed as a proportion of all loci sampled (not NA). If -1 < thres < 0, the mismatch scores are expressed as the number of mismatches divided by the number of loci at which the offspring is heterozygous (for the HIPHOP test) or by the number of loci at which the offspring is homozygous (for the HOT test). This standardization can sometimes be useful for the HIPHOP score, to account for the fact that some offspring can be scored at more loci than others for this test. By default, thres is set to a large integer (99999).
top: Sets the number of top matches that are shown in the output.
unique: When using the ranking criterion hot.dam (or hot.sire), the best-ranked dam (or sire) will occupy the entire top 3, which does not allow comparison among dams (or sires). Setting unique to "dam" (or "sire") makes the output show the top 3 with the best record for each unique dam (or sire).
A data frame with, for each offspring, the top X (default top 3, giving 3 rows per offspring) dam and sire combinations and their mismatch scores according to the HOT and HIPHOP tests, the number of loci these scores are based on, and some additional information about the social parents and the potential dams and sires. In addition to the top X offspring-dam-sire combinations with the fewest mismatches, the scores of the social parents are always listed (if they are not already in the top X ranking).
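The standardizations offered by thres can be sketched for a HIPHOP score. This is a minimal illustration with made-up numbers, assuming that in the negative case the standardized score is compared with the absolute value of thres (the description above does not say how that comparison is made); `flag_hiphop()` and its arguments are hypothetical.

```r
# Hypothetical helper: standardize a HIPHOP mismatch count following the
# rules described for 'thres' and flag whether it is below the threshold.
flag_hiphop <- function(hiphop, loci.tested, n.het.offspring, thres = 99999) {
  if (thres >= 0 && thres < 1) {
    score <- hiphop / loci.tested       # proportion of scored (non-NA) loci
  } else if (thres > -1 && thres < 0) {
    score <- hiphop / n.het.offspring   # per heterozygous offspring locus
    thres <- abs(thres)                 # assumption: compare against |thres|
  } else {
    score <- hiphop                     # raw number of mismatching loci
  }
  list(score = score, below.threshold = as.integer(score < thres))
}

# made-up example: 6 mismatches, 1200 loci scored, 300 heterozygous loci
raw  <- flag_hiphop(6, loci.tested = 1200, n.het.offspring = 300, thres = 10)
prop <- flag_hiphop(6, loci.tested = 1200, n.het.offspring = 300, thres = 0.004)
het  <- flag_hiphop(6, loci.tested = 1200, n.het.offspring = 300, thres = -0.01)
str(list(raw = raw, prop = prop, het = het))
```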
Martijn van de Pol, martijn@myscience.eu
Cockburn et al. (2020) HIPHOP: improved paternity assignment among close relatives using a simple exclusion method for biallelic markers. Molecular Ecology Resources, in revision.
Huisman, J. (2017). Pedigree reconstruction from SNP data: parentage assignment, sibship clustering and beyond. Molecular Ecology Resources, 17(5), 1009-1024.
# compute mismatch scores, then keep the best-ranked combinations
results <- hothiphop(ind = individuals[1:22, ], gen = genotypes)
best <- topmatch(x = results, ranking = "hothiphop.parents")
head(best)