| Title: | KIR Genotype Imputation and Model Training from SNP Array Data |
|---|---|
| Description: | A scalable and accurate tool for Killer-cell Immunoglobulin-like Receptor (KIR) genotype imputation directly from SNP array data using supervised machine learning models trained across five continental ancestry groups. Uses attribute bagging and an ensemble classifier method with haplotype inference for SNPs and KIR types. Models are built from global populations in the 1000 Genomes Project and validated across diverse biobank cohorts. Methods are based on Zheng et al. (2014) <doi:10.1016/j.ajhg.2013.12.015> and Sadeeq et al. (2026) <https://github.com/NormanLabUCD/PONG2>. |
| Authors: | Suraju A. Sadeeq [aut, cre], Laura A. Leaton [aut], Katherine M. Kichula [aut], Paul J. Norman [aut], Xiuwen Zheng [ctb, cph] (Original HIBAG C++ code adapted in src/PONG.cpp and src/LibKIR.cpp) |
| Maintainer: | Suraju A. Sadeeq <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.1 |
| Built: | 2026-06-24 10:27:30 UTC |
| Source: | https://github.com/cran/PONG2 |
Train KIR genotype prediction models using parallel attribute bagging
across multiple CPU cores. This is the core training function used
by the pong2 train CLI command.
    kirParallelAttrBagging(
      cl,
      hla,
      snp,
      auto.save = "",
      nclassifier = 100,
      mtry = c("sqrt", "all", "one"),
      prune = TRUE,
      rm.na = TRUE,
      stop.cluster = FALSE,
      verbose = TRUE
    )
cl |
a cluster object created by |
hla |
a KIR allele table object of class |
snp |
a SNP genotype object of class |
auto.save |
character string; file path prefix for auto-saving
classifiers during training. Use |
nclassifier |
integer; number of individual ensemble classifiers to train (default: 100) |
mtry |
character; number of SNPs randomly selected at each node.
One of |
prune |
logical; if |
rm.na |
logical; if |
stop.cluster |
logical; if |
verbose |
logical; if |
An object of class hlaAttrBagClass representing the trained
PONG2 KIR prediction model. The object contains:
integer; number of training samples
integer; number of SNP predictors used
character; the KIR locus name
character vector; KIR alleles in the model
list; individual ensemble classifiers
numeric; out-of-bag accuracy estimate
Use kirPredict() to apply the model to new samples.
    # Load example data
    data(PONG2_example)

    # Set up parallel cluster
    cl <- parallel::makeCluster(2)

    # Train a small model
    model <- kirParallelAttrBagging(
      cl = cl,
      hla = example_kir,
      snp = example_snp,
      nclassifier = 20,
      verbose = FALSE
    )
    parallel::stopCluster(cl)

    # View model summary
    print(model)

    # Clean up
    hlaClose(model)
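To make the idea of attribute bagging more concrete, the base R sketch below shows how a single ensemble member might draw its training data: a bootstrap sample of individuals, the complementary out-of-bag set (the kind of held-out data an out-of-bag accuracy estimate relies on), and a random subset of candidate SNPs sized according to `mtry`. The helpers `resolve_mtry()` and `draw_bag()` and the toy genotype matrix are hypothetical and not part of PONG2. The mapping of `"sqrt"`, `"all"`, and `"one"` to subset sizes is an assumption inferred from the option names, and the package's compiled code may differ in detail.

```r
# Hypothetical helper: resolve an mtry option to a number of candidate SNPs.
# The mapping (including the rounding) is an assumption, not PONG2's code.
resolve_mtry <- function(mtry = c("sqrt", "all", "one"), n_snp) {
  mtry <- match.arg(mtry)
  switch(mtry,
    sqrt = max(1L, as.integer(ceiling(sqrt(n_snp)))),
    all  = as.integer(n_snp),
    one  = 1L
  )
}

# Hypothetical helper: draw one "bag" (bootstrap samples + candidate SNPs).
draw_bag <- function(geno, mtry = "sqrt") {
  n_snp  <- nrow(geno)   # rows are SNPs
  n_samp <- ncol(geno)   # columns are samples
  in_bag <- sample.int(n_samp, n_samp, replace = TRUE)
  list(
    in_bag     = in_bag,
    out_of_bag = setdiff(seq_len(n_samp), in_bag),
    snp_subset = sort(sample.int(n_snp, resolve_mtry(mtry, n_snp)))
  )
}

set.seed(1)
# Toy genotype matrix: 200 SNPs x 50 samples, coded 0/1/2
geno <- matrix(sample(0:2, 200 * 50, replace = TRUE), nrow = 200)
bag <- draw_bag(geno, mtry = "sqrt")
length(bag$snp_subset)   # 15, since ceiling(sqrt(200)) = 15
```

Repeating `draw_bag()` once per classifier gives each ensemble member a different view of the samples and SNPs, which is what lets the members be trained independently across cluster workers.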
Predict KIR genotypes for a set of samples using a trained PONG2
attribute bagging model. This is the core prediction function used
by the pong2 impute CLI command.
    kirPredict(
      object,
      snp,
      cl = FALSE,
      type = c("response+dosage", "response", "prob", "response+prob"),
      vote = c("prob", "majority"),
      allele.check = TRUE,
      match.type = c("Position", "Pos+Allele", "RefSNP+Position", "RefSNP"),
      same.strand = FALSE,
      verbose = TRUE,
      verbose.match = TRUE
    )
object |
a PONG2 model object of class |
snp |
a SNP genotype object of class |
cl |
a cluster object for parallel computation, or |
type |
character; type of prediction output. One of:
|
vote |
character; voting method for ensemble classifiers.
One of |
allele.check |
logical; if |
match.type |
character; SNP matching method. One of
|
same.strand |
logical; if |
verbose |
logical; if |
verbose.match |
logical; if |
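The two `vote` options can give different calls on the same sample. The base R sketch below illustrates this with made-up numbers, assuming the options carry their usual ensemble meaning: `"prob"` averages the posterior probabilities across classifiers before choosing, while `"majority"` lets each classifier vote for its own best genotype. This is an illustration of the general technique, not PONG2's internal code; the matrix `probs` and the genotype labels are hypothetical.

```r
# Toy posterior probabilities from 3 classifiers over 3 candidate genotypes
# (rows = classifiers, columns = genotypes); all values are made up.
probs <- rbind(
  c(A = 0.40, B = 0.35, C = 0.25),
  c(A = 0.45, B = 0.40, C = 0.15),
  c(A = 0.05, B = 0.90, C = 0.05)
)

# "prob"-style voting: average the probabilities, then pick the maximum
avg <- colMeans(probs)
call_prob <- names(which.max(avg))

# "majority"-style voting: each classifier votes for its own best genotype
votes <- colnames(probs)[apply(probs, 1, which.max)]
call_majority <- names(which.max(table(votes)))

call_prob       # "B": one confident classifier dominates the average
call_majority   # "A": two of three classifiers narrowly prefer A
```

Averaging lets a single high-confidence classifier outweigh several uncertain ones, whereas majority voting ignores how confident each vote is.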
An object of class hlaAlleleClass containing KIR imputation
results. The object includes:
data frame with columns sample.id,
allele1, allele2, and prob (posterior
probability of the best call)
numeric matrix of allele dosage scores
(samples x alleles); NULL if type = "response"
numeric matrix of posterior probabilities
(alleles x samples); NULL unless
type = "response+prob" or "prob"
Samples with posterior probability below the call threshold
(CT) are assigned NA for both alleles.
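As a rough illustration of what call-threshold filtering does to the result table, the base R sketch below blanks out both alleles for any sample whose `prob` falls below a threshold. The helper `apply_call_threshold()`, the threshold of 0.5, and the toy allele names are all hypothetical; how PONG2 sets or exposes CT is not shown in this page.

```r
# Hypothetical helper: set both alleles to NA when prob is below `ct`
apply_call_threshold <- function(value, ct = 0.5) {
  low <- is.na(value$prob) | value$prob < ct
  value$allele1[low] <- NA_character_
  value$allele2[low] <- NA_character_
  value
}

# Toy result table with the documented columns; allele names are made up
value <- data.frame(
  sample.id = c("S1", "S2", "S3"),
  allele1   = c("001", "002", "001"),
  allele2   = c("005", "001", "015"),
  prob      = c(0.98, 0.31, 0.50),
  stringsAsFactors = FALSE
)

filtered <- apply_call_threshold(value, ct = 0.5)
filtered$allele1                 # "001" NA "001"
mean(!is.na(filtered$allele1))   # call rate: fraction of samples retained
```

Raising the threshold trades call rate for confidence, so it can be useful to report both the threshold and the resulting call rate alongside any accuracy figure.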
    # Load example data
    data(PONG2_example)

    # Load model from object
    model <- hlaModelFromObj(example_mobj)

    # Predict KIR genotypes
    pred <- kirPredict(
      object = model,
      snp = example_snp,
      type = "response+prob",
      verbose = FALSE
    )

    # View results
    head(pred$value)

    # Clean up
    hlaClose(model)
A small example dataset for demonstrating PONG2 functions. Contains 50 samples and 200 SNPs in the KIR region (chr19), along with a pre-trained KIR3DL1 model with 10 classifiers.
    data(PONG2_example)
Three objects are loaded:
A hlaSNPGenoClass object with 50 samples
and 200 SNPs in the KIR region (chr19, hg19 assembly)
A hlaAlleleClass object with KIR3DL1
allele calls for 50 samples
A hlaAttrBagObj object — a pre-trained
KIR3DL1 model with 10 ensemble classifiers
    data(PONG2_example)

    # SNP data
    cat("Samples:", ncol(example_snp$genotype), "\n")
    cat("SNPs: ", nrow(example_snp$genotype), "\n")

    # KIR allele table
    cat("Locus: ", example_kir$locus, "\n")

    # Model
    cat("Classifiers:", length(example_mobj$classifiers), "\n")