| Title: | Phasing, Pedigree Reconstruction, Sire Imputation and Recombination Events Identification of Half-sib Families Using SNP Data |
|---|---|
| Description: | Identification of recombination events, haplotype reconstruction, sire imputation and pedigree reconstruction using half-sib family SNP data. |
| Authors: | Mohammad Ferdosi [aut, cre], Cedric Gondro [aut] |
| Maintainer: | Mohammad Ferdosi <[email protected]> |
| License: | GPL-3 |
| Version: | 3.0.0 |
| Built: | 2026-05-18 09:25:43 UTC |
| Source: | https://github.com/cran/hsphase |
Identification of recombination events, haplotype reconstruction and sire imputation using half-sib family SNP data.
| Package: | hsphase |
| Type: | Package |
| Version: | 3.0.0 |
| Date: | 2026-02-15 |
| License: | GPL-3 |
Main functions:
- bmh: Block partitioning
- ssp: Sire inference
- aio: Phasing
- imageplot: Image plot of the block structure
- rpoh: Reconstruct pedigree based on opposing homozygotes
Auxiliary functions:
- hss: Half-sib family splitter
- cs: Chromosome splitter
- para: Parallel data analysis
Mohammad H. Ferdosi [email protected], Cedric Gondro [email protected]
Maintainer: Mohammad H. Ferdosi [email protected]
Ferdosi, M. H., Kinghorn, B. P., van der Werf, J. H., & Gondro, C. (2013).
Effect of genotype and pedigree error on detection of recombination events, sire imputation and haplotype inference using the hsphase algorithm.
In Proc. Assoc. Advmt. Anim. Breed. Genet (Vol. 20, pp. 546–549). AAABG; Napier, New Zealand.
Ferdosi, M. H., Kinghorn, B. P., van der Werf, J. H. J., & Gondro, C. (2014).
Detection of recombination events, haplotype reconstruction and imputation of sires using half-sib SNP genotypes.
Genetics Selection Evolution, 46(1), 11.
Ferdosi, M. H., Kinghorn, B. P., van der Werf, J. H. J., Lee, S. H., & Gondro, C. (2014).
hsphase: an R package for pedigree reconstruction, detection of recombination events, phasing and imputation of half-sib family groups.
BMC Bioinformatics, 15(1), 172.
Ferdosi, M. H., & Boerner, V. (2014).
A fast method for evaluating opposing homozygosity in large SNP data sets.
Livestock Science.
Sahoo S., Ferdosi M.H., van der Werf J.H.J., and de las Heras-Saldana S., et al. (2025) Proc. Assoc. Advmt. Anim. Breed. Genet. 26: 323
genotype <- matrix(c(
  0,0,0,0,1,2,2,2,0,0,2,0,0,0,
  2,2,2,2,1,0,0,0,2,2,2,2,2,2,
  2,2,2,2,1,2,2,2,0,0,2,2,2,2,
  2,2,2,2,0,0,0,0,2,2,2,2,2,2,
  0,0,0,0,0,2,2,2,2,2,2,0,0,0
), ncol = 14, byrow = TRUE)
ssp(bmh(genotype), genotype)
aio(genotype)
imageplot(bmh(genotype), title = "ImagePlot example")
rplot(genotype, 1:14)
Calculates a symmetric matrix of distances (canberra) between genotypes, based on a given genotype matrix. Each row in the 'GenotypeMatrix' represents a genotype, and each column represents a marker.
.fastdist(GenotypeMatrix)
GenotypeMatrix |
A matrix where each row represents a genotype and each column represents a marker. Genotypes should be coded as 0 for AA, 1 for AB, and 2 for BB, with 9 representing missing data. |
Returns a symmetric matrix of distances (canberra) between the genotypes specified in the 'GenotypeMatrix'. Row and column names of the returned matrix correspond to the row names of the 'GenotypeMatrix'.
# Simulate genotype data for 5 individuals across 1000 SNPs
genotypes <- .simulateHalfsib(numInd = 5, numSNP = 1000, recbound = 0:6, type = "genotype")
# Calculate the distance matrix
dist_matrix <- hsphase::.fastdist(genotypes)
print(dist_matrix)
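For readers without the package at hand, the kind of matrix described above can be approximated with base R. This is a minimal sketch, not the package's implementation: it uses `stats::dist()` with `method = "canberra"`, and it assumes the missing code 9 should be treated as `NA`, which may differ from how `.fastdist` handles missing data. The toy matrix `G` is hypothetical.

```r
# Toy genotype matrix: 3 individuals (rows) x 4 markers (columns); 9 = missing
G <- matrix(c(0, 1, 2, 9,
              0, 1, 1, 2,
              2, 0, 2, 2),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("ind1", "ind2", "ind3"), NULL))

# Assumption: recode the missing value 9 as NA before computing distances
G_na <- G
G_na[G_na == 9] <- NA

# stats::dist() supports the Canberra metric; as.matrix() gives the
# symmetric form with row and column names taken from the input row names
d <- as.matrix(dist(G_na, method = "canberra"))
print(d)
```

The result is a symmetric matrix with a zero diagonal, labeled by the row names of the input.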
Internal helper to enforce a consistent strand-label orientation across adjacent columns of a block-structure matrix.
.fixRotation(blockStructure)
blockStructure |
A numeric matrix (typically individuals in rows, SNPs in columns) representing a block/strand structure. Values are expected to be small integers (commonly including '0', '1', '2' and possibly other internal codes). |
The input typically encodes sire strand-of-origin labels per individual (rows) and marker/SNP (columns), where '0' indicates unknown and non-zero values indicate an assigned strand/state. The native algorithm compares each column to the previous one and, when a "contrast" (swap of strand labels) increases agreement, it relabels the next column to reduce apparent strand-rotation between columns.
This function is a thin R wrapper around the native routine
fixRotation implemented in C++ and called via .Call().
At each step, the C++ code computes an agreement score between column
i and column i+1 using only positions where both columns are
non-zero. It also computes the score after applying a contrast mapping to
column i+1 (conceptually swapping strand labels '1' and '2', leaving
'0' unchanged). If the contrasted version agrees more with column i,
the function relabels column i+1.
The relabeling performed by the native code is:
'1 -> 3'
'2 -> 1'
'3 -> 2'
leaving other values unchanged. (These codes are part of hsphase's internal block/strand encoding.)
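The column-by-column comparison described above can be illustrated in plain R. This is a conceptual sketch only: it swaps labels 1 and 2 directly rather than using the native 1 -> 3, 2 -> 1, 3 -> 2 relabeling, and the helper names (`swap_labels`, `agreement`, `fix_rotation_sketch`) are hypothetical and not part of hsphase.

```r
# Conceptual swap of strand labels 1 and 2; 0 (unknown) is left unchanged
swap_labels <- function(x) {
  out <- x
  out[x == 1] <- 2
  out[x == 2] <- 1
  out
}

# Agreement score: matches counted only where both columns are non-zero
agreement <- function(a, b) {
  both <- a != 0 & b != 0
  sum(a[both] == b[both])
}

# Walk left to right; relabel column i+1 when the swapped version agrees
# more with column i than the original does
fix_rotation_sketch <- function(block) {
  if (ncol(block) < 2) return(block)
  for (i in seq_len(ncol(block) - 1)) {
    nxt <- block[, i + 1]
    if (agreement(block[, i], swap_labels(nxt)) > agreement(block[, i], nxt)) {
      block[, i + 1] <- swap_labels(nxt)
    }
  }
  block
}

block <- matrix(c(1, 2, 2,
                  1, 2, 2,
                  2, 1, 0,
                  2, 1, 1), nrow = 4, byrow = TRUE)
fixed <- fix_rotation_sketch(block)
print(fixed)
```

Because each column is compared with the already corrected previous column, a single mislabeled column does not propagate its rotation to the columns after it.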
A numeric matrix with the same dimensions as blockStructure,
where some entries in column i+1 may be relabeled to improve
consistency with column i. The transformation is applied iteratively
from left to right across columns.
bmh, ssp, aio for creation
and downstream usage of block structures.
Internal wrapper around the native C routine hblock. It transforms a
BMH result matrix into a block representation, with an optional maximum block
size constraint.
.hblock(bmhResult, MaxBlock = 400)
bmhResult |
A numeric/integer matrix containing BMH results (block matching/haplotype-block intermediate output). Must be a matrix. |
MaxBlock |
Integer scalar. Maximum block size (default: 400). |
This function transposes and flattens bmhResult before passing it to
compiled code via .C:
.C("hblock", ...).
A matrix (same general shape as bmhResult) containing inferred
block structure. Row and column names are propagated from bmhResult
where available.
Calculates the minor allele frequency (MAF) for a single SNP coded as: 0 = AA, 1 = AB, 2 = BB, and 9 = missing.
.maf(snp)
snp |
A numeric vector of genotypes for one SNP. Values must be 0, 1, 2, or 9 (missing). |
A single numeric value: the minor allele frequency (MAF).
snp_data <- c(0, 0, 1, 2, 2, 9)
.maf(snp_data)
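The calculation can be written out in base R to make the coding explicit. This is a sketch under the stated 0/1/2/9 coding, not the package's own code; `maf_sketch` is a hypothetical name, and returning `NA` when every genotype is missing is an assumption.

```r
maf_sketch <- function(snp) {
  snp <- snp[snp != 9]                  # drop missing genotypes
  if (length(snp) == 0) return(NA_real_)
  p_b <- sum(snp) / (2 * length(snp))   # each genotype value counts B alleles
  min(p_b, 1 - p_b)                     # the rarer of the two alleles
}

maf_sketch(c(0, 0, 1, 2, 2, 9))
maf_sketch(c(0, 0, 0, 1))
```

In the first call the five non-missing genotypes carry 5 B alleles out of 10, so both alleles have frequency 0.5. In the second call there is 1 B allele out of 8, giving 0.125.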
Converts a haplotype matrix where each individual is represented by one row and alleles are stored in alternating columns (1st allele, 2nd allele, ...) into a two-row-per-individual representation.
.o2tH(haplotype)
haplotype |
A haplotype object:
|
Internally, any allele code of '2' is converted to '0' before conversion.
An integer matrix in a two-row-per-individual format with
2 * nrow(haplotype) rows and ncol(haplotype) / 2 columns.
Row names are interleaved using the original individual names.
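The layout change can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: `one_to_two_rows` is a hypothetical name, odd columns are assumed to hold the first allele and even columns the second, and each individual's name is simply repeated for its two output rows.

```r
one_to_two_rows <- function(h) {
  stopifnot(is.matrix(h), ncol(h) %% 2 == 0)
  h[h == 2] <- 0                                         # allele code 2 is recoded to 0
  first  <- h[, seq(1, ncol(h), by = 2), drop = FALSE]   # 1st allele of each SNP
  second <- h[, seq(2, ncol(h), by = 2), drop = FALSE]   # 2nd allele of each SNP
  out <- matrix(0L, nrow = 2 * nrow(h), ncol = ncol(h) / 2)
  out[seq(1, nrow(out), by = 2), ] <- first              # odd rows: first allele
  out[seq(2, nrow(out), by = 2), ] <- second             # even rows: second allele
  storage.mode(out) <- "integer"
  # Assumption: repeat each individual's name for its two rows
  if (!is.null(rownames(h))) rownames(out) <- rep(rownames(h), each = 2)
  out
}

h <- matrix(c(0, 1, 1, 1, 2, 0,
              1, 0, 0, 0, 1, 1), nrow = 2, byrow = TRUE,
            dimnames = list(c("a", "b"), NULL))
two <- one_to_two_rows(h)
print(two)
```

The output has `2 * nrow(h)` rows and `ncol(h) / 2` columns, matching the dimensions stated above.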
Converts a haplotype matrix where each individual is represented by two rows (allele 1 and allele 2) into a single-row-per-individual representation.
.ptr2por(haplotype)
haplotype |
A matrix containing haplotypes with two rows per individual. |
A matrix with one row per individual (allele 1 and allele 2 combined).
This function simulates genotypes for a set of half-siblings based on specified parameters, including the number of individuals, the number of SNPs, recombination boundaries, and the type of data to return. It generates a sire genotype, maternal half-sib genotypes, and combines these to simulate offspring genotypes, optionally returning phased genotypes based on recombination events.
.simulateHalfsib(numInd = 40, numSNP = 10000, recbound = 0:6, type = "genotype")
numInd |
Integer, the number of half-siblings to simulate. |
numSNP |
Integer, the number of SNPs to simulate for each individual. |
recbound |
Numeric vector, specifying the range of possible recombination events to simulate. |
type |
Character string, specifying the type of data to return: "genotype" for genotypic data or any other string for phased genotypic data. |
Depending on the type parameter, this function returns a matrix of simulated genotypic data
for half-siblings. If type is "genotype", it returns unphased genotypic data; otherwise, it returns phased genotypic data.
sim_genotypes <- .simulateHalfsib(numInd = 40, numSNP = 10000, recbound = 0:6, type = "genotype")
dim(sim_genotypes) # Should return 40 rows (individuals) and 10000 columns (SNPs)
Add switch points to haplotypes by swapping the two haplotype rows for an individual from each switch point to the end of the chromosome.
addSwitch(haplotypeMatrix, switchPoints, minLength)
haplotypeMatrix |
|
switchPoints |
|
minLength |
|
Important: Each switch point causes a swap of the two haplotype rows for that individual from the switch position to the end.
The switchPoints list must have one element per individual (not per haplotype row).
If an element is 0, no switch is applied for that individual.
If you rely on minLength to ignore nearby switches, verify your installed version enforces this rule.
A matrix of the same dimension as haplotypeMatrix with switches applied.
groupMatSingle and fixSW
haplotype <- matrix(c(0, 0, 0, 0,
                      1, 1, 1, 1,
                      0, 0, 1, 1,
                      1, 1, 0, 0,
                      1, 1, 1, 1,
                      0, 0, 0, 0), byrow = TRUE, nrow = 6)
switchPoints <- list(firstInd = c(2), secondInd = c(1, 3), lastInd = 0)
addSwitch(haplotype, switchPoints, 0)
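The row swap that a switch point causes can be sketched in base R. This is an illustration, not the package's implementation: `apply_switches_sketch` is a hypothetical name, switch positions are assumed to be 1-based and inclusive, and `minLength` is not modeled.

```r
apply_switches_sketch <- function(hap, switch_points) {
  # One list element per individual, two haplotype rows per individual
  stopifnot(nrow(hap) == 2 * length(switch_points))
  for (ind in seq_along(switch_points)) {
    rows <- c(2 * ind - 1, 2 * ind)
    for (sp in switch_points[[ind]]) {
      if (sp == 0) next                        # 0 means no switch for this individual
      cols <- sp:ncol(hap)
      hap[rows, cols] <- hap[rev(rows), cols]  # swap the two rows from sp to the end
    }
  }
  hap
}

hap <- matrix(c(0, 0, 0, 0,
                1, 1, 1, 1,
                0, 0, 1, 1,
                1, 1, 0, 0), nrow = 4, byrow = TRUE)
switched <- apply_switches_sketch(hap, list(2, 0))
print(switched)
```

Since every switch swaps to the end of the chromosome, two switch points for the same individual leave only the segment between them swapped.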
Phasing of a single half-sib family group (single ordered chromosome).
aio(genotypeMatrix, bmh_forwardVectorSize = 30, bmh_excludeFP = TRUE, bmh_nsap = 3, bmh_fillMissing = FALSE, output = "phase")
genotypeMatrix |
|
bmh_forwardVectorSize |
|
bmh_excludeFP |
|
bmh_nsap |
|
bmh_fillMissing |
|
output |
|
This function calls bmh, ssp, and phf.
If output = "phase", returns a haplotype matrix with two rows per individual (first paternal, second maternal),
coded as 0 (allele A), 1 (allele B), and 9 (missing/unphased).
Otherwise returns a list with elements:
phasedHalfsibs
sireHaplotype
blockStructure
Only this function needs to be called to phase a half-sib family. The genotype matrix must contain individuals from a single family and a single ordered chromosome.
genotype <- matrix(c(  # Define a half-sib genotype matrix
  2,1,0,  # Individual 1
  2,0,0,  # Individual 2
  0,0,2   # Individual 3
), byrow = TRUE, ncol = 3)  # There are 3 individuals with three SNPs
aio(genotype)  # The genotypes must include only one half-sib family and one chromosome
Identifies the block structure (chromosome segments) in a half-sib family that each individual inherited from its sire.
bmh(GenotypeMatrix, forwardVectorSize = 30, excludeFP = TRUE, nsap = 3, fillMissing = FALSE)
GenotypeMatrix |
|
forwardVectorSize |
|
excludeFP |
|
nsap |
|
fillMissing |
|
A matrix of block structure containing 1, 2, and 0.
Values 1 and 2 represent the two sire strands (arbitrary labeling within a chromosome),
and 0 indicates unknown origin.
The genotype matrix must contain individuals from only one half-sib family and one ordered chromosome.
genotype <- matrix(c(
  0,2,1,1,1,
  2,0,1,2,2,
  2,2,1,0,2,
  2,2,1,1,1,
  0,0,2,1,0
), ncol = 5, byrow = TRUE)
bmh(genotype)
Detects possible crossover segments by comparing pairs of individuals in a half-sib family.
co(genotypeMatrix)
genotypeMatrix |
|
Returns a matrix with the number of crossover events for each site.
genotype <- matrix(c(
  2,1,0,
  2,0,2,
  0,0,2
), byrow = TRUE, ncol = 3)
co(genotype)
Splits the genotype list generated by hss into chromosomes based on a map file/data.frame and orders SNPs by chromosomal position.
cs(halfsib, mapPath, separator = " ")
halfsib |
|
mapPath |
|
separator |
|
The map file should include only the chromosomes that will be analyzed. For example, the Y and X chromosomes should be excluded (and others optionally). Names of each element in the list can be used for further categorization. The header must be "Name Chr Position".
Returns a list of matrices. The number of elements in this list is the number of half-sib families multiplied by the number of chromosomes.
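The splitting and ordering step can be sketched in base R for a single family. This is an illustration rather than the package's code: the map values, SNP names, and the helper `split_by_chr` are hypothetical; only the `Name`, `Chr`, `Position` header comes from the description above.

```r
# Hypothetical map using the documented header: Name, Chr, Position
map <- data.frame(Name = c("s1", "s2", "s3", "s4"),
                  Chr = c(1, 2, 1, 2),
                  Position = c(300, 50, 100, 10),
                  stringsAsFactors = FALSE)

# Hypothetical genotype matrix for one family, SNP names as column names
geno <- matrix(c(0, 1, 2, 0,
                 2, 1, 0, 1), nrow = 2, byrow = TRUE,
               dimnames = list(c("a1", "a2"), c("s1", "s2", "s3", "s4")))

split_by_chr <- function(geno, map) {
  map <- map[map$Name %in% colnames(geno), ]     # keep SNPs present in the data
  map <- map[order(map$Chr, map$Position), ]     # order by chromosome, then position
  # One matrix per chromosome, columns in positional order
  lapply(split(map$Name, map$Chr), function(snps) geno[, snps, drop = FALSE])
}

by_chr <- split_by_chr(geno, map)
print(by_chr)
```

Applying the same function to every family in a list yields families multiplied by chromosomes elements, which matches the list length described above.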
# Please run demo(hsphase)
Fix switch errors in haplotypes for a half-sib family.
fixSW(haplotype, ohMax = 0, windowsSize = 100, minLength = 100, cpus = 2)
haplotype |
|
ohMax |
|
windowsSize |
|
minLength |
|
cpus |
|
A haplotype matrix with switch errors corrected.
haplotype <- .simulateHalfsib(7, 2500, type = "haplotype")$phased
switches <- list(500, 0, 0, 1200, c(1000, 2000), 500, 1200)
haplotype2 <- addSwitch(haplotype, switches, 0)
gMat <- groupMatSingle(haplotype2, 100, 2, "haplotype")
imageplot(gMat, title = "Before fixing switches")
haplotype3 <- fixSW(haplotype2, 0, 100, 100)
gMat2 <- groupMatSingle(haplotype3, 100, 2, "haplotype")
imageplot(gMat2, title = "After fixing switches")
An example genotype matrix for the hsphase package.
data(genotypes)
A genotype matrix with:
Columns: SNPs
Rows: animals/individuals
Genotype coding follows the package conventions (typically 0, 1, 2 and 9 for missing).
Sahoo S., Ferdosi M.H., van der Werf J.H.J., and de las Heras-Saldana S., et al. (2025) Proc. Assoc. Advmt. Anim. Breed. Genet. 26: 323
Group the genotype or haplotype of a half-sib family into partitions using opposing homozygotes.
groupMatSingle(haplotype, windowsSize, cpus = 2, input = "haplotype", oh = 0)
haplotype |
|
windowsSize |
|
cpus |
|
input |
|
oh |
|
A grouping matrix.
haplotype <- .simulateHalfsib(10, 5000, type = "haplotype")$phased
gMat <- groupMatSingle(haplotype, 100, 2, "haplotype")
imageplot(gMat)
Creates a block-structure matrix for a half-sib family based on phased data of the sire and the half-sib family.
hbp(PhasedGenotypeMatrix, PhasedSireGenotype, strand = "auto")
PhasedGenotypeMatrix |
|
PhasedSireGenotype |
|
strand |
|
A matrix in which 3 or 4 indicates the SNP originates from, respectively, sire strand 1 or strand 2.
0 indicates the origin is unknown.
The input matrices must contain individuals from a single half-sib family and a single ordered chromosome. The SNP order must match between inputs.
sire <- matrix(c(
  0,0,0,0,0,1,  # Haplotype one of the sire
  0,1,1,1,1,0   # Haplotype two of the sire
), byrow = TRUE, ncol = 6)
haplotypeHalfsib <- matrix(c(
  1,0,1,1,1,1,  # Individual one, haplotype one
  0,1,0,0,0,0,  # Individual one, haplotype two
  0,1,1,0,1,1,  # Individual two, haplotype one
  1,0,0,1,0,0   # Individual two, haplotype two
), byrow = TRUE, ncol = 6)  # 0s and 1s are alleles A and B
hbp(haplotypeHalfsib, sire)
Creates a heatmap of a half-sib dataset using an opposing-homozygotes (OH) matrix, with optional sidebars showing inferred and/or real pedigree groupings.
hh(oh, inferredPedigree, realPedigree, pedOnly = TRUE)
oh |
|
inferredPedigree |
|
realPedigree |
|
pedOnly |
|
Returns a heatmap of the OH matrix with sidebars color-coded by sire groups from the inferred and original pedigrees (where provided).
The function uses colors generated by the getcol function in the made4 package (Aedin Culhane).
c1h1 <- .simulateHalfsib(numInd = 62, numSNP = 5000)
c1h2 <- .simulateHalfsib(numInd = 38, numSNP = 5000)
Genotype <- rbind(c1h1, c1h2)
oh <- ohg(Genotype)
hh(oh)
Splits the dataset into half-sib family groups based on a pedigree.
hss(pedigree, genotype, minHS = 4, check = TRUE)
pedigree |
|
genotype |
|
minHS |
|
check |
|
Only half-sib groups that have more than 3 individuals will be returned.
Returns a list of numeric matrices, where each matrix is one half-sib family.
Pedigree must have at least two columns with sample ids (Column 1) and sire ids (Column 2).
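The grouping logic can be sketched in base R. This is a minimal illustration, not the package's implementation: the pedigree, genotype values, and the helper `split_halfsibs` are hypothetical, and matching samples through the row names of the genotype matrix is an assumption.

```r
# Hypothetical pedigree: column 1 = sample IDs, column 2 = sire IDs
pedigree <- data.frame(id = paste0("a", 1:7),
                       sire = c("s1", "s1", "s1", "s1", "s2", "s2", "s1"),
                       stringsAsFactors = FALSE)

# Hypothetical genotype matrix: 7 individuals x 5 SNPs, sample IDs as row names
genotype <- matrix(rep(c(0, 1, 2), length.out = 35), nrow = 7,
                   dimnames = list(pedigree$id, NULL))

split_halfsibs <- function(pedigree, genotype, minHS = 4) {
  groups <- split(pedigree[[1]], pedigree[[2]])   # sample IDs grouped by sire
  groups <- groups[lengths(groups) >= minHS]      # drop families smaller than minHS
  lapply(groups, function(ids) {
    genotype[rownames(genotype) %in% ids, , drop = FALSE]
  })
}

families <- split_halfsibs(pedigree, genotype, minHS = 4)
print(names(families))
```

With `minHS = 4`, sire `s1` (five offspring) is kept and sire `s2` (two offspring) is dropped, which mirrors the rule that only groups with more than 3 individuals are returned.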
# Please run demo(hsphase)
Create an image plot of the blocking structure.
imageplot(x, title = c(), rv = FALSE, ...)
x |
|
title |
|
rv |
|
... |
Optional graphical parameters. |
White indicates regions of unknown origin; red and blue correspond to the two sire strands.
This is a modified version of a function written by Chris Seidel, available at http://www.phaget4.org/R/image_matrix.html.
genotype <- matrix(c(
  0,2,1,1,1,
  2,0,1,2,2,
  2,2,1,0,2,
  2,2,1,1,1,
  0,0,2,1,0
), ncol = 5, byrow = TRUE)
imageplot(bmh(genotype))
Impute the paternal strand from low density to high density utilising high density sire haplotype.
impute(halfsib_genotype_ld, sire_hd, bmh_forwardVectorSize = 30, bmh_excludeFP = TRUE, bmh_nsap = 3)
halfsib_genotype_ld |
|
sire_hd |
|
bmh_forwardVectorSize |
|
bmh_excludeFP |
|
bmh_nsap |
|
Returns an imputed half-sib matrix.
An example map dataset for the hsphase package.
data(map)
A data.frame with the following columns:
SNP identifier
Chromosome
SNP position in base pairs
Sahoo S., Ferdosi M.H., van der Werf J.H.J., and de las Heras-Saldana S., et al. (2025) Proc. Assoc. Advmt. Anim. Breed. Genet. 26: 323
Counts, for each animal, the number of loci where it contributes opposing homozygotes in sites that imply heterozygosity in the sire.
ohd(genotypeMatrix, unique_check = FALSE, SNPs = 6000)
genotypeMatrix |
|
unique_check |
|
SNPs |
|
A numeric vector with the number of heterozygous sites that each sample caused.
This function can be used to identify pedigree errors; i.e., outliers with unusually high values.
This method was suggested by Bruce Tier <[email protected]> to identify pedigree errors.
genotype <- matrix(c(
  2,1,0,
  2,0,0,
  0,0,2
), byrow = TRUE, ncol = 3)
ohd(genotype)
Creates a matrix of pairwise opposing-homozygote (OH) counts from a genotype matrix.
ohg(genotypeMatrix)
genotypeMatrix |
|
Returns a square matrix (sample × sample) of pairwise counts of opposing homozygotes.
(Some versions may return this matrix inside a named list element.)
This function can be slow for large datasets.
Ferdosi, M. H., & Boerner, V. (2014). A fast method for evaluating opposing homozygosity in large SNP data sets. Livestock Science.
genotype <- matrix(c(
  2,1,0,
  2,0,0,
  0,0,2
), byrow = TRUE, ncol = 3)
ohg(genotype)
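To make the definition of an opposing homozygote concrete, the pairwise counts can be computed in base R with two indicator matrices. This is a sketch of the concept, not the package's algorithm; `oh_sketch` is a hypothetical name. A pair of individuals contributes one count at each SNP where one is coded 0 (AA) and the other 2 (BB).

```r
oh_sketch <- function(G) {
  is0 <- (G == 0) * 1   # indicator matrix: homozygous AA
  is2 <- (G == 2) * 1   # indicator matrix: homozygous BB
  # Entry [i, j] counts SNPs where i is AA and j is BB, plus the reverse;
  # heterozygous (1) and missing (9) genotypes contribute nothing
  is0 %*% t(is2) + is2 %*% t(is0)
}

genotype <- matrix(c(
  2,1,0,
  2,0,0,
  0,0,2
), byrow = TRUE, ncol = 3)
oh <- oh_sketch(genotype)
print(oh)
```

For this matrix, individuals 1 and 2 share no opposing homozygotes, while individual 3 opposes each of them at the first and third SNPs, giving counts of 2.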
Plot the sorted vectorized matrix of Opposing Homozygotes.
ohplot(oh, genotype, pedigree, check = FALSE)
oh |
|
genotype |
|
pedigree |
|
check |
|
The cut-off line marks the boundary between the most distinct groups.
set.seed(1)
chr <- list()
sire <- list()
for (i in 1:5) {
  chr[[i]] <- .simulateHalfsib(numInd = 20, numSNP = 5000, recbound = 1:10)
  sire[[i]] <- ssp(bmh(chr[[i]]), chr[[i]])
  sire[[i]] <- sire[[i]][1,] + sire[[i]][2,]
  sire[[i]][sire[[i]] == 18] <- 9
}
Genotype <- do.call(rbind, chr)
rownames(Genotype) <- 6:(nrow(Genotype) + 5)
sire <- do.call(rbind, sire)
rownames(sire) <- 1:5
Genotype <- rbind(sire, Genotype)
oh <- ohg(Genotype) # creating the Opposing Homozygote matrix
pedigree <- as.matrix(data.frame(c(1:5, 6:(nrow(Genotype))), rep = c(rep(0,5), rep(1:5, rep(20,5)))))
ohplot(oh, Genotype, pedigree, check = TRUE)
This function uses the list of matrices (the output of cs) and runs one of the options, on each element of the list, in parallel.
para(halfsibs, cpus = 1, option = "bmh", type = "SOCK", bmh_forwardVectorSize = 30, bmh_excludeFP = TRUE, bmh_nsap = 3, bmh_fillMissing = FALSE, pmMethod = "constant")
halfsibs |
|
cpus |
|
option |
|
type |
|
bmh_forwardVectorSize |
|
bmh_excludeFP |
|
bmh_nsap |
|
bmh_fillMissing |
|
pmMethod |
|
Type of analysis can be bmh, ssp, aio, pm, or rec (refer to pm, rplot and vignette for more information about rec).
Returns a list of matrices with the results (formats specific to the option selected).
# Please run demo(hsphase)
An example pedigree dataset for the hsphase package.
data(pedigree)
A data.frame with the following columns:
Half-sib (offspring) IDs
Sire IDs
Sahoo S., Ferdosi M.H., van der Werf J.H.J., and de las Heras-Saldana S., et al. (2025) Proc. Assoc. Advmt. Anim. Breed. Genet. 26: 323
Tries to link the inferred pedigree from rpoh with sire IDs in the original pedigree and fix pedigree errors.
pedigreeNaming(inferredPedigree, realPedigree)
inferredPedigree |
|
realPedigree |
|
This function calls bmh and recombinations to count the number of recombinations in each half-sib group.
Returns the inferred pedigree with the best match to sire names used in the original pedigree file.
# Please run demo(hsphase)
Phases a half-sib family using the block structure and an imputed sire haplotype matrix.
phf(GenotypeMatrix, blockMatrix, sirePhasedMatrix)
GenotypeMatrix |
|
blockMatrix |
|
sirePhasedMatrix |
|
Returns a matrix containing the phased parental haplotypes of the half-sibs (two rows per individual).
Alleles are coded as 0 (A), 1 (B), and 9 (missing/unphased).
The genotype matrix must contain individuals from one half-sib family and one ordered chromosome.
This function is used by aio for complete phasing of a half-sib group.
genotype <- matrix(c(
  2,1,0,
  2,0,0,
  0,0,2
), byrow = TRUE, ncol = 3)
block <- bmh(genotype)
phf(genotype, block, ssp(block, genotype))
Creates a recombination (probability) matrix based on the blocking structure.
pm(blockMatrix, method = "constant")
blockMatrix |
|
method |
|
This function identifies recombination between two consecutive sites and marks recombination sites with 1.
If there are unknown sites between two blocks, it marks these sites with:
1 for the "constant" method, or
1/n for the "relative" method, where n is the number of unknown sites.
genotype <- matrix(c(
  0,2,0,1,0,
  2,0,1,2,2,
  2,2,1,0,2,
  2,2,1,1,1,
  0,0,2,1,0
), ncol = 5, byrow = TRUE)
block <- bmh(genotype)
pm(block)
Assign offspring to parents based on an opposing-homozygotes (OH) matrix.
pogc(oh, genotypeError)
oh |
|
genotypeError |
|
A data.frame with two columns:
animal ID
assigned parent ID
set.seed(1)
chr <- list()
sire <- list()
for (i in 1:5) {
  chr[[i]] <- .simulateHalfsib(numInd = 20, numSNP = 5000, recbound = 1:10)
  sire[[i]] <- ssp(bmh(chr[[i]]), chr[[i]])
  sire[[i]] <- sire[[i]][1,] + sire[[i]][2,]
  sire[[i]][sire[[i]] == 18] <- 9
}
Genotype <- do.call(rbind, chr)
rownames(Genotype) <- 6:(nrow(Genotype) + 5)
sire <- do.call(rbind, sire)
rownames(sire) <- 1:5
Genotype <- rbind(sire, Genotype)
oh <- ohg(Genotype)
pogc(oh, 5)
Reads a genotype file and optionally checks it for common formatting/data issues.
readGenotype(genotypePath, separatorGenotype = " ", check = TRUE)
genotypePath |
|
separatorGenotype |
|
check |
|
A genotype matrix.
Please refer to the vignette for more information.
Counts the number of recombinations for each individual based on the block structure.
recombinations(blockMatrix)
blockMatrix |
|
A numeric vector of recombination counts with length equal to the number of individuals (rows) in blockMatrix.
genotype <- matrix(c(
  2,1,0,0,
  2,0,2,2,
  0,0,2,2,
  0,2,0,0
), byrow = TRUE, ncol = 4)
recombinations(bmh(genotype))
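The counting idea can be illustrated on a hand-written block matrix in base R. This is a sketch under an assumption, not the package's code: a recombination is taken to be any change of strand label (1 to 2 or 2 to 1) between consecutive known sites, with unknown sites (0) skipped. `count_switches` and the toy matrix are hypothetical.

```r
count_switches <- function(block) {
  apply(block, 1, function(x) {
    x <- x[x != 0]                 # ignore sites of unknown origin
    if (length(x) < 2) return(0L)  # nothing to compare
    sum(diff(x) != 0)              # number of strand changes along the row
  })
}

# Hypothetical block structure: 4 individuals x 5 SNPs
block <- matrix(c(1, 1, 0, 2, 2,
                  2, 2, 2, 2, 2,
                  1, 0, 2, 0, 1,
                  0, 0, 0, 0, 0), nrow = 4, byrow = TRUE)
count_switches(block)
```

The result has one entry per row of the block matrix, matching the length described above.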
Creates a plot showing the sum of recombination events across a half-sib family.
rplot(x, distance, start = 1, end = ncol(x), maximum = 100, overwrite = FALSE, method = "constant")
x |
|
distance |
|
start |
|
end |
|
maximum |
|
overwrite |
|
method |
|
genotype <- matrix(c(
  0,2,0,1,0,
  2,0,1,2,2,
  2,2,1,0,2,
  2,2,1,1,1,
  0,0,2,1,0
), ncol = 5, byrow = TRUE)
rplot(genotype, c(1,2,3,4,8))
Reconstructs a half-sib pedigree based on a matrix of opposing homozygotes.
rpoh(genotypeMatrix, oh, forwardVectorSize = 30, excludeFP = TRUE, nsap = 3, maxRec = 15, intercept = 26.3415, coefficient = 77.3171, snpnooh, method, maxsnpnooh)
genotypeMatrix |
|
oh |
|
forwardVectorSize |
|
excludeFP |
|
nsap |
|
maxRec |
|
intercept |
|
coefficient |
|
snpnooh |
|
method |
|
maxsnpnooh |
|
Four methods (simple, recombinations, calus and manual) can be
used to reconstruct the pedigree.
The following examples show the arguments required for each method.
pedigree1 <- rpoh(oh = oh, snpnooh = 732, method = "simple")
pedigree2 <- rpoh(genotypeMatrix = genotypeChr1, oh = ohg(genotype), maxRec = 10 , method = "recombinations")
pedigree3 <- rpoh(genotypeMatrix = genotype, oh = oh, method = "calus")
pedigree4 <- rpoh(oh = oh, maxsnpnooh = 31662, method = "manual")
Returns a data frame with two columns: the first column contains the animal IDs and the second contains sire identifiers (randomly generated).
Method can be recombinations, simple, calus or manual. Please refer to the vignette for more information.
The sire genotypes should be removed, using the pogc function, before calling this function.
bmh and recombinations
# Please run demo(hsphase)
Infers (imputes) and phases the sire haplotypes based on the block structure matrix and homozygous sites of the half-sib genotype matrix.
ssp(blockMatrix, genotypeMatrix)
blockMatrix |
|
genotypeMatrix |
|
A matrix with two rows (one per sire haplotype) and columns corresponding to SNPs in genotype order.
Alleles are coded as 0 (A) and 1 (B). Alleles that could not be imputed are coded as 9.
genotype <- matrix(c(
  0,2,1,1,1,
  2,0,1,2,2,
  2,2,1,0,2,
  2,2,1,1,1,
  0,0,2,1,0
), ncol = 5, byrow = TRUE)
ssp(bmh(genotype), genotype)
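The reasoning behind sire inference from homozygous sites can be sketched in base R. This is a simplified illustration, not the ssp algorithm: a homozygous offspring (0 = AA or 2 = BB) must have received that allele from the sire, so if the block matrix assigns it to strand 1 or 2, the allele on that sire strand is known. The helper `infer_sire_sketch`, the toy inputs, and the rule of leaving conflicting or uninformative sites as 9 are all assumptions.

```r
infer_sire_sketch <- function(block, genotype) {
  sire <- matrix(9L, nrow = 2, ncol = ncol(genotype))  # 9 = not imputed
  for (s in 1:2) {
    for (j in seq_len(ncol(genotype))) {
      g <- genotype[block[, j] == s, j]   # offspring carrying sire strand s at SNP j
      g <- g[g %in% c(0, 2)]              # only homozygotes are informative
      # Assumption: impute only when all informative offspring agree
      if (length(g) > 0 && length(unique(g)) == 1) {
        sire[s, j] <- as.integer(g[1] / 2)  # AA -> allele 0, BB -> allele 1
      }
    }
  }
  sire
}

# Hypothetical inputs: 3 offspring x 2 SNPs
block <- matrix(c(1, 1,
                  2, 2,
                  1, 0), nrow = 3, byrow = TRUE)
genotype <- matrix(c(0, 2,
                     2, 1,
                     0, 2), nrow = 3, byrow = TRUE)
sire <- infer_sire_sketch(block, genotype)
print(sire)
```

At the second SNP the only offspring on strand 2 is heterozygous, so that sire allele stays coded as 9, in line with the coding described above.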
Detect switch errors in the haplotypes of a half-sib family.
switchDetector(groupMatrix)
groupMatrix |
|
A list of integer vectors. The list length equals the number of individuals.
Each vector contains the locations of detected switch errors for that individual.
haplotype <- .simulateHalfsib(8, 3000, type = "haplotype")$phased
switches <- list(2500, 0, 0, 1200, c(1000, 2000), 500, 2000, 0)
haplotype2 <- addSwitch(haplotype, switches, 0)
gMat <- groupMatSingle(haplotype2, 100, 2, "haplotype")
switchDetector(gMat)