Title: | Phasing, Pedigree Reconstruction, Sire Imputation and Recombination Events Identification of Half-sib Families Using SNP Data |
---|---|
Description: | Identification of recombination events, haplotype reconstruction, sire imputation and pedigree reconstruction using half-sib family SNP data. |
Authors: | Mohammad Ferdosi <[email protected]>, Cedric Gondro <[email protected]> |
Maintainer: | Mohammad Ferdosi <[email protected]> |
License: | GPL-3 |
Version: | 2.0.3 |
Built: | 2024-12-13 06:34:17 UTC |
Source: | CRAN |
Identification of recombination events, haplotype reconstruction and sire imputation using half-sib family SNP data.
Package: | hsphase |
Type: | Package |
Version: | 2.0.1 |
Date: | 2014-6-17 |
License: | GPL 3 |
Main Functions:
bmh: Block partitioning
ssp: Sire inference
aio: Phasing
imageplot: Image plot of the block structure
rpoh: Reconstruct pedigree based on opposing homozygote
Auxiliary Functions:
hss: Half-sib family splitter
cs: Chromosome splitter
para: Parallel data analysis
Note: These functions can be used to analyse large datasets.
Mohammad H. Ferdosi <[email protected]>, Cedric Gondro <[email protected]> Maintainer: Mohammad H. Ferdosi <[email protected]>
Ferdosi, M. H., Kinghorn, B. P., van der Werf, J. H., & Gondro, C (2013). Effect of genotype and pedigree error on detection of recombination events, sire imputation and haplotype inference using the hsphase algorithm. In Proc. Assoc. Advmt. Anim. Breed. Genet (Vol. 20, pp. 546-549). AAABG; Napier, New Zealand.
Ferdosi, M. H., Kinghorn, B. P., van der Werf, J. H., & Gondro, C. (2014). Detection of recombination events, haplotype reconstruction and imputation of sires using half-sib SNP genotypes. Genetics, selection, evolution: GSE, 46(1), 11.
Ferdosi, M. H., Kinghorn, B. P., van der Werf, J. H., Lee, S. H., & Gondro, C. (2014). hsphase: an R package for pedigree reconstruction, detection of recombination events, phasing and imputation of half-sib family groups. BMC Bioinformatics, 15(1), 172.
Ferdosi, M. H., & Boerner, V. (2014). A fast method for evaluating opposing homozygosity in large SNP data sets. Livestock Science.
genotype <- matrix(c(
  0,0,0,0,1,2,2,2,0,0,2,0,0,0,
  2,2,2,2,1,0,0,0,2,2,2,2,2,2,
  2,2,2,2,1,2,2,2,0,0,2,2,2,2,
  2,2,2,2,0,0,0,0,2,2,2,2,2,2,
  0,0,0,0,0,2,2,2,2,2,2,0,0,0
), ncol = 14, byrow = TRUE)
ssp(bmh(genotype), genotype)
aio(genotype)
imageplot(bmh(genotype), title = "ImagePlot example")
rplot(genotype, c(1:14))
Calculates a symmetric matrix of distances between genotypes, based on a given genotype matrix. Each row in the 'GenotypeMatrix' represents a genotype, and each column represents a marker. The genotype is coded as 0 for AA, 1 for AB, and 2 for BB. Use 9 to represent missing data.
.fastdist(GenotypeMatrix)
GenotypeMatrix |
A matrix where each row represents a genotype and each column represents a marker. Genotypes should be coded as 0 for AA, 1 for AB, and 2 for BB, with 9 representing missing data. |
Returns a symmetric matrix of distances between the genotypes specified in the 'GenotypeMatrix'. Row and column names of the returned matrix correspond to the row names of the 'GenotypeMatrix'.
# Simulate genotype data for 40 individuals across 1000 SNPs
# genotypes <- .simulateHalfsib(numInd = 40, numSNP = 1000, recbound = 0:6, type = "genotype")
# Calculate the distance matrix
# dist_matrix <- .fastdist(genotypes)
# Display the distance matrix
# print(dist_matrix)
This function calculates the minor allele frequency (MAF) for a given single nucleotide polymorphism (SNP) data. The SNP data should be coded numerically: 0 for homozygous for the first allele (AA), 1 for heterozygous (AB), and 2 for homozygous for the second allele (BB). Missing data should be coded as 9.
.maf(snp)
snp |
A numeric vector representing the genotype of individuals for a single SNP. The genotype should be coded as 0 for AA, 1 for AB, and 2 for BB. Use 9 to represent missing data. |
A numeric value representing the minor allele frequency (MAF) for the SNP data provided.
snp_data <- c(0, 0, 1, 2, 2, 9)
maf_value <- .maf(snp_data)
print(maf_value)
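As a rough illustration of the calculation described above, the following base-R sketch computes a minor allele frequency from 0/1/2 genotypes with 9 as missing. The name `maf_sketch` is hypothetical and this is not the package's internal `.maf` code. It only assumes that the genotype code counts copies of the B allele.

```r
# Hypothetical helper, not part of hsphase: MAF for one SNP coded 0/1/2, 9 = missing
maf_sketch <- function(snp) {
  snp <- snp[snp != 9]                      # drop missing genotypes
  if (length(snp) == 0) return(NA_real_)    # no information left
  freq_b <- sum(snp) / (2 * length(snp))    # B-allele frequency (two alleles per animal)
  min(freq_b, 1 - freq_b)                   # the rarer allele defines the MAF
}

maf_sketch(c(0, 0, 0, 1, 2, 9))
```

In this example five animals are genotyped, so three B alleles out of ten give a frequency of 0.3.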
This function simulates genotypes for a set of half-siblings based on specified parameters, including the number of individuals, the number of SNPs, recombination boundaries, and the type of data to return. It generates a sire genotype, maternal half-sib genotypes, and combines these to simulate offspring genotypes, optionally returning phased genotypes based on recombination events.
.simulateHalfsib( numInd = 40, numSNP = 10000, recbound = 0:6, type = "genotype" )
numInd |
Integer, the number of half-siblings to simulate. |
numSNP |
Integer, the number of SNPs to simulate for each individual. |
recbound |
Numeric vector, specifying the range of possible recombination events to simulate. |
type |
Character string, specifying the type of data to return: "genotype" for genotypic data or any other string for phased genotypic data. |
Depending on the type parameter, this function returns a matrix of simulated genotypic data for half-siblings. If type is "genotype", it returns unphased genotypic data; otherwise, it returns phased genotypic data.
sim_genotypes <- .simulateHalfsib(numInd = 40, numSNP = 10000, recbound = 0:6, type = "genotype")
dim(sim_genotypes)  # Should return 40 rows (individuals) and 10000 columns (SNPs)
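The simulation idea described above (one sire, recombinant paternal strands, random maternal alleles) can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's `.simulateHalfsib` code: the name `simulate_halfsib_sketch`, the uniform allele frequencies and the uniformly placed crossovers are all assumptions made for the example.

```r
# Hypothetical sketch, not hsphase code: simulate unphased half-sib genotypes (0/1/2)
simulate_halfsib_sketch <- function(num_ind = 5, num_snp = 20, recbound = 0:2) {
  # Two sire haplotypes, alleles coded 0 (A) and 1 (B)
  sire <- matrix(sample(0:1, 2 * num_snp, replace = TRUE), nrow = 2)
  pos <- seq_len(num_snp)
  geno <- matrix(0, nrow = num_ind, ncol = num_snp)
  for (i in seq_len(num_ind)) {
    # Number of crossovers for this offspring, drawn from recbound
    n_rec <- min(recbound[sample.int(length(recbound), 1)], num_snp - 1)
    breaks <- sample.int(num_snp - 1, n_rec)   # crossover after SNP 'break'
    # Count how many crossovers lie to the left of each SNP
    n_switch <- vapply(pos, function(p) sum(breaks < p), integer(1))
    # Start on a random sire strand and switch at every crossover
    strand <- (sample(0:1, 1) + n_switch) %% 2 + 1
    paternal <- sire[cbind(strand, pos)]
    maternal <- sample(0:1, num_snp, replace = TRUE)  # unrelated dams
    geno[i, ] <- paternal + maternal
  }
  geno
}

set.seed(1)
dim(simulate_halfsib_sketch(num_ind = 5, num_snp = 20))
```

Each row is one simulated half-sib and each column one SNP, matching the layout the package functions expect.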
Phasing of a half-sib family group.
aio(genotypeMatrix, bmh_forwardVectorSize = 30, bmh_excludeFP = TRUE, bmh_nsap = 3, output = "phase")
genotypeMatrix |
|
bmh_forwardVectorSize |
|
bmh_excludeFP |
|
bmh_nsap |
|
output |
|
This function calls the bmh, ssp and phf functions.
Returns a list of matrices. The first element (phasedHalfsibs) is a matrix with two rows (phased haplotypes) per individual (first paternal and second maternal). Data in format 0 (A), 1 (B) and 9 (unphased or missing).
The second (sireHaplotype) and third (blockStructure) elements are the same as the output of ssp and bmh.
Only this function needs to be called to phase a half-sib family. The genotype matrix must contain individuals from only one half-sib family and one ordered chromosome.
genotype <- matrix(c(  # Define a half-sib genotype matrix
  2,1,0,  # Individual 1
  2,0,0,  # Individual 2
  0,0,2   # Individual 3
), byrow = TRUE, ncol = 3)  # There are 3 individuals with three SNPs
aio(genotype)  # The genotypes must include only one half-sib family and one chromosome
Identifies the block structure (chromosome segments) in the half-sib family that each individual inherited from its sire.
bmh(GenotypeMatrix, forwardVectorSize = 30, excludeFP = TRUE, nsap = 3)
GenotypeMatrix |
|
forwardVectorSize |
|
excludeFP |
|
nsap |
|
Returns a matrix of the blocking structure that contains 1s, 2s and 0s. 1s and 2s are the two sire strands. The choice of strand is arbitrary for each chromosome and not consistent across chromosomes. 0s indicate regions of unknown origin.
The genotype matrix must contain individuals from only one half-sib family and one ordered chromosome.
genotype <- matrix(c(
  0,2,1,1,1,
  2,0,1,2,2,
  2,2,1,0,2,
  2,2,1,1,1,
  0,0,2,1,0
), ncol = 5, byrow = TRUE)
(result <- bmh(genotype))
Detect all possible crossover events.
co(genotypeMatrix)
genotypeMatrix |
|
Returns a matrix with the number of crossover events for each site.
genotype <- matrix(c(  # Define a half-sib genotype matrix
  2,1,0,  # Individual 1
  2,0,2,  # Individual 2
  0,0,2   # Individual 3
), byrow = TRUE, ncol = 3)  # There are 3 individuals with three SNPs
co(genotype)
This function splits the genotypes list generated by hss into the different chromosomes based on a map file and orders SNPs by chromosomal position.
cs(halfsib, mapPath, separator = " ")
halfsib |
|
mapPath |
|
separator |
|
The map file should include only the chromosomes that will be analyzed. For example, the Y and X chromosomes should be excluded (and others optionally). Names of each element in the list can be used for further categorization. The header must be "Name Chr Position".
Returns a list of matrices, the number of elements in this list is the number of half-sib families multiplied by the number of chromosomes.
# Please run demo(hsphase)
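To make the splitting and ordering step concrete, here is a small base-R sketch that orders SNP columns by map position and splits one genotype matrix by chromosome. It is only an illustration of the idea: the function `split_by_chr` is hypothetical, it works on an in-memory map data frame rather than a map file, and it assumes the genotype columns are named with the SNP names from the map.

```r
# Hypothetical helper, not part of hsphase: split one genotype matrix by chromosome
split_by_chr <- function(genotype, map) {
  # Keep only SNPs present in the genotype matrix
  map <- map[map$Name %in% colnames(genotype), ]
  # Order by chromosome, then by position within chromosome
  map <- map[order(map$Chr, map$Position), ]
  # One matrix per chromosome, columns in positional order
  lapply(split(as.character(map$Name), map$Chr),
         function(snps) genotype[, snps, drop = FALSE])
}

map <- data.frame(Name = c("snp1", "snp2", "snp3", "snp4"),
                  Chr = c(1, 2, 1, 2),
                  Position = c(200, 50, 100, 10))
genotype <- matrix(c(0, 2, 1, 1,
                     2, 0, 1, 2),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("a1", "a2"), c("snp1", "snp2", "snp3", "snp4")))
split_by_chr(genotype, map)
```

Applying such a function to every element of a list of half-sib families would give one matrix per family and chromosome, which is the shape of result described for cs.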
This data set serves as an example of a genotype matrix intended for use with the hsphase package.
data(genotypes)
The data set is a genotype matrix with specific structure, including:
Columns: Represent Single Nucleotide Polymorphisms (SNPs). Each column corresponds to a specific SNP.
Rows: Represent individual animals. Each row corresponds to the genotypic data for a single animal across various SNPs.
Creates a blocking structure matrix of the half-sib family based on phased data of the sire and half-sib family.
hbp(PhasedGenotypeMatrix, PhasedSireGenotype, strand = "auto")
PhasedGenotypeMatrix |
|
PhasedSireGenotype |
|
strand |
|
Returns a matrix where 3 or 4 stands for the SNP originating in, respectively, strands 1 and 2. 0 indicates that the source strand for the SNP is unknown.
The input matrices must only contain individuals from one half-sib family and one ordered chromosome.
The strand option should be set to "auto" (default value).
sire <- matrix(c(
  0,0,0,0,0,1,  # Haplotype one of the sire
  0,1,1,1,1,0   # Haplotype two of the sire
), byrow = TRUE, ncol = 6)
haplotypeHalfsib <- matrix(c(
  1,0,1,1,1,1,  # Individual one, haplotype one
  0,1,0,0,0,0,  # Individual one, haplotype two
  0,1,1,0,1,1,  # Individual two, haplotype one
  1,0,0,1,0,0   # Individual two, haplotype two
), byrow = TRUE, ncol = 6)  # 0s and 1s are alleles A and B
hbp(haplotypeHalfsib, sire)
The hh function creates a heatmap of the half-sib families using the matrix of opposing homozygotes.
hh(oh, inferredPedigree, realPedigree, pedOnly = TRUE)
oh |
|
inferredPedigree |
|
realPedigree |
|
pedOnly |
|
Returns the heatmap of the matrix of opposing homozygotes with sidebars colour coded by sires from the inferred and original pedigree.
The function uses the colours generated by the getcol function in the made4 package (Aedin Culhane).
c1h1 <- .simulateHalfsib(numInd = 62, numSNP = 5000)
c1h2 <- .simulateHalfsib(numInd = 38, numSNP = 5000)
Genotype <- rbind(c1h1, c1h2)
oh <- ohg(Genotype)  # creating the Opposing Homozygote matrix
hh(oh)
Splits the dataset into half-sib family groups based on a pedigree.
hss(pedigree, genotype, check = TRUE)
pedigree |
|
genotype |
|
check |
|
Only half-sib groups that have more than 3 individuals will be returned.
Returns a list of numeric matrices, each matrix is a half-sib family.
Pedigree must have at least two columns with sample ids (Column 1) and sire ids (Column 2).
# Please run demo(hsphase)
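The grouping rule described above (sample IDs in column 1, sire IDs in column 2, groups of more than 3 individuals kept) can be illustrated with a short base-R sketch. The function `split_halfsibs` and its `min_size` argument are hypothetical and do not reproduce the checks performed by hss. The sketch assumes the genotype matrix has sample IDs as row names.

```r
# Hypothetical helper, not part of hsphase: split genotypes into half-sib groups by sire
split_halfsibs <- function(pedigree, genotype, min_size = 4) {
  # Use only pedigree records that have a genotype
  ped <- pedigree[pedigree[, 1] %in% rownames(genotype), , drop = FALSE]
  # Sample IDs grouped by sire ID
  groups <- split(as.character(ped[, 1]), ped[, 2])
  # Keep groups with more than 3 individuals
  groups <- groups[lengths(groups) >= min_size]
  lapply(groups, function(ids) genotype[ids, , drop = FALSE])
}

pedigree <- data.frame(id = paste0("a", 1:7),
                       sire = c(rep("s1", 4), rep("s2", 3)))
genotype <- matrix(sample(0:2, 7 * 3, replace = TRUE), nrow = 7,
                   dimnames = list(paste0("a", 1:7), NULL))
families <- split_halfsibs(pedigree, genotype)
names(families)
```

In this toy pedigree only sire s1 has four offspring, so only that family is kept.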
Create an imageplot of the blocking structure.
imageplot(x, title, rv = FALSE, ...)
x |
|
title |
|
rv |
|
... |
Can be used to set xLabels and yLabels axis. |
White indicates regions of unknown origin, red and blue correspond to the two sire strands.
This is a modified version of a function written by Chris Seidel.
http://www.phaget4.org/R/image_matrix.html
genotype <- matrix(c(
  0,2,1,1,1,
  2,0,1,2,2,
  2,2,1,0,2,
  2,2,1,1,1,
  0,0,2,1,0
), ncol = 5, byrow = TRUE)  # each row contains the SNPs of one individual
imageplot(bmh(genotype))
Impute the paternal strand from low density to high density utilising high density sire haplotype.
impute(halfsib_genotype_ld, sire_hd, bmh_forwardVectorSize = 30, bmh_excludeFP = TRUE, bmh_nsap = 3)
halfsib_genotype_ld |
|
sire_hd |
|
bmh_forwardVectorSize |
|
bmh_excludeFP |
|
bmh_nsap |
|
Returns an imputed half-sib matrix.
This data set is an example of a map file used within the hsphase package to demonstrate the mapping of SNPs to their respective locations on chromosomes.
data(map)
The data set is formatted as a data frame with the following columns, providing essential information about each SNP:
Name: The unique identifier or name of the SNP.
Chr: The chromosome on which the SNP is located.
Position: The position of the SNP on the chromosome, expressed in base pairs.
Counts, for each animal, the number of opposing homozygotes that caused a heterozygous site in the sire.
ohd(genotypeMatrix, unique_check = FALSE, SNPs = 6000)
genotypeMatrix |
|
unique_check |
|
SNPs |
|
Returns a vector with the number of heterozygous sites that each sample caused.
This function can be used to identify pedigree errors; i.e., the outliers.
This method is suggested by Bruce Tier <[email protected]> to identify pedigree errors.
genotype <- matrix(c(
  2,1,0,
  2,0,0,
  0,0,2
), byrow = TRUE, ncol = 3)
ohd(genotype)
Creates a matrix of opposing homozygotes from the genotype matrix.
ohg(genotypeMatrix)
genotypeMatrix |
|
Returns a square matrix (sample X sample) with the pairwise counts of opposing homozygotes.
This function can be slow with a large data set. A faster version will be made available after publication of the related manuscript.
Ferdosi, M. H., & Boerner, V. (2014). A fast method for evaluating opposing homozygosity in large SNP data sets. Livestock Science.
genotype <- matrix(c(
  2,1,0,
  2,0,0,
  0,0,2
), byrow = TRUE, ncol = 3)
ohg(genotype)
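For readers who want to see what a pairwise count of opposing homozygotes looks like, the base-R sketch below counts, for every pair of samples, the SNPs where one sample is 0 (AA) and the other is 2 (BB). The function `oh_sketch` is hypothetical and is not the implementation used by ohg. Heterozygotes (1) and missing values (9) simply do not contribute.

```r
# Hypothetical helper, not part of hsphase: pairwise opposing homozygote counts
oh_sketch <- function(genotype) {
  is_aa <- (genotype == 0) * 1   # indicator matrix for AA
  is_bb <- (genotype == 2) * 1   # indicator matrix for BB
  # AA in sample i with BB in sample j, plus the reverse combination
  oh <- is_aa %*% t(is_bb) + is_bb %*% t(is_aa)
  dimnames(oh) <- list(rownames(genotype), rownames(genotype))
  oh
}

genotype <- matrix(c(
  2,1,0,
  2,0,0,
  0,0,2
), byrow = TRUE, ncol = 3)
oh_sketch(genotype)
```

For this matrix, individuals 1 and 2 share no opposing homozygotes, while individual 3 opposes each of the other two at the first and third SNP, giving a count of 2 for those pairs.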
Plot the sorted vectorized matrix of Opposing Homozygotes.
ohplot(oh, genotype, pedigree, check = FALSE)
oh |
|
genotype |
|
pedigree |
|
check |
|
The cut-off line marks the boundary between the most distinct groups.
set.seed(1)
chr <- list()
sire <- list()
for (i in 1:5) {
  chr[[i]] <- .simulateHalfsib(numInd = 20, numSNP = 5000, recbound = 1:10)
  sire[[i]] <- ssp(bmh(chr[[i]]), chr[[i]])
  sire[[i]] <- sire[[i]][1,] + sire[[i]][2,]
  sire[[i]][sire[[i]] == 18] <- 9  # both haplotypes unknown (9 + 9) -> missing
}
Genotype <- do.call(rbind, chr)
rownames(Genotype) <- 6:(nrow(Genotype) + 5)
sire <- do.call(rbind, sire)
rownames(sire) <- 1:5
Genotype <- rbind(sire, Genotype)
oh <- ohg(Genotype)  # creating the Opposing Homozygote matrix
pedigree <- as.matrix(data.frame(c(1:5, 6:(nrow(Genotype))), rep = c(rep(0,5), rep(1:5, rep(20,5)))))
ohplot(oh, Genotype, pedigree, check = TRUE)
This function uses the list of matrices (the output of cs) and runs one of the options, on each element of the list, in parallel.
para(halfsibs, cpus = 1, option = "bmh", type = "SOCK", bmh_forwardVectorSize = 30, bmh_excludeFP = TRUE, bmh_nsap = 3, pmMethod = "constant")
halfsibs |
|
cpus |
|
option |
|
type |
|
bmh_forwardVectorSize |
|
bmh_excludeFP |
|
bmh_nsap |
|
pmMethod |
|
The type of analysis (option) can be bmh, ssp, aio, pm or rec (refer to pm, rplot and the vignette for more information about rec).
Returns a list of matrices with the results (formats specific to the option selected).
# Please run demo(hsphase)
This dataset provides an example of a pedigree, specifically designed for use with the hsphase package.
data(pedigree)
The dataset is structured as a data frame with detailed familial relationships, including:
First Column: Identifiers for half-sibs.
Second Column: Identifiers for sires.
Tries to link the inferred pedigree from rpoh with the sire IDs in the original pedigree and fix pedigree errors.
pedigreeNaming(inferredPedigree, realPedigree)
inferredPedigree |
|
realPedigree |
|
This function calls the bmh and recombinations functions to count the number of recombinations in each half-sib group.
Returns the inferred pedigree with the best fit to the sire names used in the original pedigree file.
# Please run demo(hsphase)
Phases the half-sib family by using the blocking structure and imputed sire matrices.
phf(GenotypeMatrix, blockMatrix, sirePhasedMatrix)
GenotypeMatrix |
|
blockMatrix |
|
sirePhasedMatrix |
|
Returns a matrix that contains the phased parental haplotypes of the half-sibs. It uses 0, 1 and 9 for A, B and missing, the same coding as the phased output of aio.
The genotype matrix must only contain individuals from one half-sib family and one ordered chromosome.
This function is used by the aio function for complete phasing of a half-sib group.
genotype <- matrix(c(
  2,1,0,
  2,0,0,
  0,0,2
), byrow = TRUE, ncol = 3)
block <- bmh(genotype)
phf(genotype, block, ssp(block, genotype))
Creates a recombination matrix based on the blocking structure.
pm(blockMatrix, method = "constant")
blockMatrix |
|
method |
|
This function finds recombinations between two consecutive sites and marks each recombination site with a 1. If there are unknown sites between two blocks, it marks these sites either with a 1 (constant method) or with 1 divided by the number of unknown sites (relative method).
genotype <- matrix(c(
  0,2,0,1,0,
  2,0,1,2,2,
  2,2,1,0,2,
  2,2,1,1,1,
  0,0,2,1,0
), ncol = 5, byrow = TRUE)  # each row contains the SNPs of one individual
(result <- bmh(genotype))
pm(result)
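The difference between the constant and relative methods can be illustrated on a single row of a block matrix (1 and 2 for the sire strands, 0 for unknown). The sketch below is an assumption-laden illustration, not the pm implementation: the function `mark_row` is hypothetical, and placing the mark on the left-hand site when two differing blocks are directly adjacent is a choice made only for this example.

```r
# Hypothetical helper, not part of hsphase: mark recombination sites in one block row
mark_row <- function(row, method = c("constant", "relative")) {
  method <- match.arg(method)
  out <- numeric(length(row))
  known <- which(row != 0)                 # sites with a known sire strand
  if (length(known) < 2) return(out)
  for (k in seq_len(length(known) - 1)) {
    i <- known[k]
    j <- known[k + 1]
    if (row[i] == row[j]) next             # same strand, no recombination
    gap <- j - i - 1                       # unknown sites between the two blocks
    if (gap == 0) {
      out[i] <- 1                          # assumption: mark the left-hand site
    } else if (method == "constant") {
      out[(i + 1):(j - 1)] <- 1            # every unknown site gets 1
    } else {
      out[(i + 1):(j - 1)] <- 1 / gap      # the event is shared across unknown sites
    }
  }
  out
}

mark_row(c(1, 0, 0, 2), method = "constant")
mark_row(c(1, 0, 0, 2), method = "relative")
```

With two unknown sites between the blocks, the constant method marks both with 1, whereas the relative method marks each with 0.5 so that the row still sums to one event.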
Assigns offspring to parents.
pogc(oh, genotypeError)
oh |
|
genotypeError |
|
Returns a data frame with two columns. The first column is the animal ID and the second column is the parent ID.
set.seed(1)
chr <- list()
sire <- list()
for (i in 1:5) {
  chr[[i]] <- .simulateHalfsib(numInd = 20, numSNP = 5000, recbound = 1:10)
  sire[[i]] <- ssp(bmh(chr[[i]]), chr[[i]])
  sire[[i]] <- sire[[i]][1,] + sire[[i]][2,]
  sire[[i]][sire[[i]] == 18] <- 9  # both haplotypes unknown (9 + 9) -> missing
}
Genotype <- do.call(rbind, chr)
rownames(Genotype) <- 6:(nrow(Genotype) + 5)
sire <- do.call(rbind, sire)
rownames(sire) <- 1:5
Genotype <- rbind(sire, Genotype)
oh <- ohg(Genotype)  # creating the Opposing Homozygote matrix
pogc(oh, 5)
This function reads and checks genotype files.
readGenotype(genotypePath, separatorGenotype = " ", check = TRUE)
genotypePath |
|
separatorGenotype |
|
check |
|
Returns the genotype matrix.
Please refer to vignette for more information.
# A comprehensive demo and example dataset is available from
# http://www-personal.une.edu.au/~cgondro2/hsphase.html
Counts the number of recombinations for each individual.
recombinations(blockMatrix)
blockMatrix |
|
Returns a vector of recombinations. The number of elements in this vector is equal to the number of individuals, i.e. each element holds the number of recombinations identified for each sample.
genotype <- matrix(c(
  2,1,0,0,
  2,0,2,2,
  0,0,2,2,
  0,2,0,0
), byrow = TRUE, ncol = 4)
recombinations(bmh(genotype))
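As an illustration of what counting recombinations per individual means for a block matrix (1 and 2 for the sire strands, 0 for unknown), the base-R sketch below counts strand switches in each row while skipping unknown sites. The function `count_switches` is hypothetical and the block matrix is typed by hand. It is not the package's recombinations code and may differ from it in how unknown regions are treated.

```r
# Hypothetical helper, not part of hsphase: strand switches per individual
count_switches <- function(block) {
  apply(block, 1, function(row) {
    known <- row[row != 0]                 # ignore sites of unknown origin
    if (length(known) < 2) return(0L)
    sum(diff(known) != 0)                  # each change 1 <-> 2 is one switch
  })
}

block <- rbind(
  c(1, 1, 0, 2, 2, 1),  # two switches: 1 -> 2 and 2 -> 1
  c(0, 0, 0, 0, 0, 0),  # nothing known, no switches
  c(2, 2, 2, 2, 2, 2)   # a single strand throughout
)
count_switches(block)
```

The result has one element per row, mirroring the per-sample vector described above.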
This function creates a plot which shows the sum of all recombination events across a half-sib family.
rplot(x, distance, start = 1, end = ncol(x), maximum = 100, overwrite = FALSE, method = "constant")
x |
|
distance |
|
start |
|
end |
|
maximum |
|
overwrite |
|
method |
|
genotype <- matrix(c(
  0,2,0,1,0,
  2,0,1,2,2,
  2,2,1,0,2,
  2,2,1,1,1,
  0,0,2,1,0
), ncol = 5, byrow = TRUE)  # each row contains the SNPs of one individual
rplot(genotype, c(1,2,3,4,8))
Reconstructs a half-sib pedigree based on a matrix of opposing homozygotes.
rpoh(genotypeMatrix, oh, forwardVectorSize = 30, excludeFP = TRUE, nsap = 3, maxRec = 15, intercept = 26.3415, coefficient = 77.3171, snpnooh, method, maxsnpnooh)
genotypeMatrix |
|
oh |
|
forwardVectorSize |
|
excludeFP |
|
nsap |
|
maxRec |
|
intercept |
|
coefficient |
|
snpnooh |
|
method |
|
maxsnpnooh |
|
Four methods (simple, recombinations, calus and manual) can be used to reconstruct the pedigree.
The following examples show the arguments required for each method.
pedigree1 <- rpoh(oh = oh, snpnooh = 732, method = "simple")
pedigree2 <- rpoh(genotypeMatrix = genotypeChr1, oh = ohg(genotype), maxRec = 10 , method = "recombinations")
pedigree3 <- rpoh(genotypeMatrix = genotype, oh = oh, method = "calus")
pedigree4 <- rpoh(oh = oh, maxsnpnooh = 31662, method = "manual")
Returns a data frame with two columns, the first column is animals' ID and the second column is sire identifiers (randomly generated).
Method can be recombinations, simple, calus or manual. Please refer to vignette for more information.
Sire genotypes should be removed, using the pogc function, before calling this function.
bmh and recombinations
# Please run demo(hsphase)
Infer (impute) and phase sire's genotype based on the block structure matrix (recombination blocks) and homozygous sites of the half-sib genotype matrix.
ssp(blockMatrix, genotypeMatrix)
blockMatrix |
|
genotypeMatrix |
|
Returns a matrix (Imputed Sire) with two rows one for each haplotype of the sire (columns are SNP in the order of the genotype matrix). Alleles are coded as 0 (A) and 1 (B). Alleles that could not be imputed are coded as 9.
genotype <- matrix(c(
  0,2,1,1,1,
  2,0,1,2,2,
  2,2,1,0,2,
  2,2,1,1,1,
  0,0,2,1,0
), ncol = 5, byrow = TRUE)  # each row contains the SNPs of one individual
(result <- ssp(bmh(genotype), genotype))