This vignette provides a complete, step-by-step guide to performing KIR allele imputation using the `impute` command in PONG2.
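The workflow covers input preparation, basic imputation, the KIR-region SNP overlap check, pre-phasing, pre-imputation (locally with `--fill-missing` or on a public server), and the output files.

| Requirement | Version | Notes |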
|---|---|---|
| PLINK2 | ≥ 2.0 | Must be in PATH |
| R | ≥ 4.0 | With PONG2 installed |
| minimac4 | ≥ 4.1.6 | Only for --fill-missing |
| Eagle2 | ≥ 2.4 | Only for pre-phasing |
| bgzip & tabix | HTSlib | Only for --fill-missing |
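
A quick way to confirm that the tools in the table are available is sketched below. The executable names (`plink2`, `Rscript`, `eagle`, `minimac4`, `bgzip`, `tabix`) are assumptions and may differ on your system; the helper `check_tools` is hypothetical and not part of PONG2.

```shell
# Report which of the given executables are on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found    $tool"
    else
      echo "missing  $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Needed for every run (executable names are assumptions)
check_tools plink2 Rscript || echo "install the missing tools before running pong2 impute"

# Needed only for pre-phasing and --fill-missing
check_tools eagle minimac4 bgzip tabix || echo "pre-phasing / --fill-missing is not available yet"
```

This only checks that the programs exist; it does not verify the minimum versions listed above.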
PONG2 works best when input files are restricted to chromosome 19 (covering the KIR locus). Extract chr19 from your full-genome PLINK files:
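
A minimal sketch of this extraction follows, assuming the full-genome fileset has the hypothetical prefix `full_genome`. `--bfile`, `--chr`, `--make-bed`, and `--out` are standard PLINK2 options; the `run` and `extract_chr19` helpers are illustrative only.

```shell
# Print the command when DRY_RUN=1, otherwise execute it
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

extract_chr19() {
  # $1: prefix of the full-genome PLINK fileset (hypothetical name)
  # $2: prefix for the chr19-only output fileset
  run plink2 --bfile "$1" --chr 19 --make-bed --out "$2"
}

# Dry run: shows the command; drop DRY_RUN=1 to execute it
DRY_RUN=1 extract_chr19 full_genome chr19_only
```
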
This creates chr19_only.bed,
chr19_only.bim, and chr19_only.fam.

Run imputation on the extracted chr19 fileset:

    # --filter can be 0.005 or 0.01
    # 0.005 allows more rare KIR alleles in the output
    pong2 impute \
      -i chr19_only \
      -o results/basic \
      -l KIR3DL1 \
      -a hg38 \
      -t 16 \
      --filter 0.005

PONG2 automatically checks the SNP overlap between your data and the 1KGP reference panel in the KIR region and reports the match rate.

NOTE: KIR Region SNP Overlap between input data and 1KGP

The overlap rate is computed between your input data and the 1000 Genomes Project (1KGP) reference panel in the KIR region:

| Assembly | KIR Region Coordinates |
|---|---|
| hg19 | chr19:55,000,000–55,400,000 |
| hg38 | chr19:54,000,000–55,000,000 |

| Overlap Rate | Status | Action |
|---|---|---|
| ≥ 50% | Pass | Proceed with PONG2 directly |
| < 50% | Fail | Run Eagle2 + pre-imputation first |

If your match rate is sufficient (≥ 50%), PONG2 will proceed automatically. If not, use one of the pre-imputation strategies below.
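
PONG2 performs the overlap check internally, but a rough estimate can be made beforehand. The sketch below assumes a plain-text list of reference positions in the KIR region (one base-pair position per line; the file name `kir_ref_positions.txt` is hypothetical) and defines the overlap as shared positions divided by reference positions. PONG2's own calculation may use a different definition.

```shell
# Hypothetical helper: percentage of reference positions in [start, end]
# that also appear on chromosome 19 in a PLINK .bim file.
# Assumption: overlap = shared positions / reference positions in the region.
kir_overlap() {
  bim="$1"; ref_pos="$2"; start="$3"; end="$4"
  awk -v s="$start" -v e="$end" '
    # First file: collect reference positions inside the region
    NR == FNR { if ($1 >= s && $1 <= e) ref[$1] = 1; next }
    # Second file (.bim): column 1 is the chromosome, column 4 the bp position
    ($1 == "19" || $1 == "chr19") && ($4 in ref) && !seen[$4]++ { hit++ }
    END {
      n = 0
      for (p in ref) n++
      if (n == 0) { print "NA"; exit }
      printf "%.1f\n", 100 * hit / n
    }' "$ref_pos" "$bim"
}
```

For hg38 data this could be called as `kir_overlap chr19_only.bim kir_ref_positions.txt 54000000 55000000`, using the coordinates from the table above.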
Pre-phasing the KIR region is required before any pre-imputation strategy.
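
A minimal sketch of the pre-phasing step with Eagle2 follows, assuming phasing without a reference panel, an unphased bgzipped VCF named `chr19_only.vcf.gz`, and the hg19 genetic map shipped with Eagle. The file names and the helper functions are hypothetical; check the option names against your installed Eagle version.

```shell
# Print the command when DRY_RUN=1, otherwise execute it
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

phase_chr19() {
  # $1: unphased chr19 VCF (bgzipped)
  # $2: Eagle genetic map matching the genome build
  # $3: number of threads (defaults to 8)
  run eagle \
    --vcf="$1" \
    --geneticMapFile="$2" \
    --chrom=19 \
    --outPrefix=chr19.phased \
    --numThreads="${3:-8}"
}

# Dry run with hypothetical input names; drop DRY_RUN=1 to execute.
# Eagle is expected to write chr19.phased.vcf.gz.
DRY_RUN=1 phase_chr19 chr19_only.vcf.gz genetic_map_hg19_withX.txt.gz 16
```
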
Pass the pre-phased VCF directly to PONG2 using `--vcf` and `--fill-missing`.

Important: `--vcf` is the only input required with `--fill-missing`. PLINK files cannot hold phased haplotype data, so the pipeline derives everything from the VCF internally. Do not supply `-i` together with `--fill-missing`.

    pong2 impute \
      --vcf chr19.phased.vcf.gz \
      -o results/local_impute \
      -l KIR3DL1 \
      -a hg19 \
      -t 20 \
      --filter 0.005 \
      --fill-missing

Alternatively, pre-impute your chr19 data using a public server before running PONG2. This is the approach used in the PONG2 manuscript.
The phased VCF from Eagle2 (chr19.phased.vcf.gz) is
ready for upload. If you need to export from PLINK first:
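
One way to do this export with PLINK2 is sketched below, assuming the `chr19_only` fileset from the extraction step. `--export vcf bgz` writes a bgzipped VCF; the helper functions are illustrative only.

```shell
# Print the command when DRY_RUN=1, otherwise execute it
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

export_vcf() {
  # $1: PLINK fileset prefix; $2: output prefix (PLINK2 appends .vcf.gz)
  run plink2 --bfile "$1" --export vcf bgz --out "$2"
}

# Dry run: shows the command; drop DRY_RUN=1 to execute it
DRY_RUN=1 export_vcf chr19_only chr19_only
```

The exported VCF is unphased, so it still needs to be phased before or during server-side imputation.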
For phasing on the server, select Eagle v2.4 if you upload unphased data; skip phasing if the data are already phased.

After `pong2 impute` completes, results are saved in `<output>/KIR/`:

| File | Description |
|---|---|
| `KIR/<locus>.csv` | Predicted KIR alleles per sample (main results) |
| `KIR/<locus>.RData` | Full prediction object including allele probabilities |

The CSV has one row per sample, with the two predicted alleles and a probability for each:

    sample.id, KIR3DL1.1, KIR3DL1.2, prob.KIR3DL1.1, prob.KIR3DL1.2
    HG00096, KIR3DL1*001, KIR3DL1*002, 0.98, 0.95
    HG00097, KIR3DL1*005, KIR3DL1*015, 0.87, 0.91

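
To review uncertain calls, the CSV can be filtered on the probability columns. The sketch below assumes the five-column layout shown above with unquoted fields. The helper `low_confidence` is hypothetical, and the default cutoff of 0.9 is an arbitrary illustration, not a PONG2 recommendation.

```shell
# Hypothetical helper: print samples where either allele probability is below a cutoff.
# Assumes columns: sample.id, allele 1, allele 2, prob 1, prob 2 (unquoted).
low_confidence() {
  csv="$1"; cutoff="${2:-0.9}"
  awk -F' *, *' -v c="$cutoff" '
    NR == 1 { next }                                   # skip the header row
    ($4 + 0 < c) || ($5 + 0 < c) { print $1, $2, $3, $4, $5 }
  ' "$csv"
}
```

For example, `low_confidence results/basic/KIR/KIR3DL1.csv 0.9` would list the samples worth a closer look; the path follows the `<output>/KIR/<locus>.csv` pattern described above.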
For datasets with more than 2,000 samples, PONG2 automatically splits prediction into chunks of 2,000 samples to prevent memory issues. Results are combined and saved as a single output file, so no action is required from the user.
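
As a rough illustration of the chunking arithmetic, the sketch below computes how many chunks a dataset would need. It assumes fixed-size chunks with a smaller final chunk; how PONG2 divides samples internally is not specified here, and the helper `n_chunks` is hypothetical.

```shell
# Number of prediction chunks for n samples, assuming chunks of at most `size` samples
n_chunks() {
  n="$1"; size="${2:-2000}"
  echo $(( (n + size - 1) / size ))   # ceiling division
}

n_chunks 5000   # 2,000 + 2,000 + 1,000 samples
```

The sample count can be taken from the `.fam` file, for example `n_chunks "$(wc -l < chr19_only.fam)"`.
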
| Scenario | Recommended approach |
|---|---|
| SNP overlap ≥ 50% | Run pong2 impute -i directly |
| SNP overlap < 50%, quick run needed | Eagle2 → pong2 impute --vcf --fill-missing |
| SNP overlap < 50%, highest accuracy | Eagle2 → Michigan Server → pong2 impute -i |
| Low overlap, understand risks | pong2 impute -i --force |
Happy KIR imputation! 🧬