Title: | 'DEploid' Data Analysis and Results Interpretation |
Description: | 'DEploid' (Zhu et al. 2018 <doi:10.1093/bioinformatics/btx530>) is designed for deconvoluting mixed genomes with unknown proportions. Traditional phasing programs are limited to diploid organisms. Our method modifies Li and Stephens' algorithm with Markov chain Monte Carlo (MCMC) approaches, and builds a generic framework that allows haplotype searches in a multiple infection setting. This package provides R functions to support data analysis and results interpretation. |
Authors: | Joe Zhu [aut, cre], Jacob Almagro-Garcia [aut], Gil McVean [aut], University of Oxford [cph], Yinghan Liu [ctb], CodeCogs Zyba Ltd [com, cph], Deepak Bandyopadhyay [com, cph], Lutz Kettner [com, cph] |
Maintainer: | Joe Zhu <[email protected]> |
License: | Apache License (>= 2) |
Version: | 0.0.1 |
Built: | 2024-12-19 18:35:56 UTC |
Source: | CRAN |
Compute observed allele frequency within sample from the allele counts.
computeObsWSAF(alt, ref)
alt |
Numeric array of alternative allele count. |
ref |
Numeric array of reference allele count. |
Numeric array of observed allele frequency within sample.
See histWSAF for a histogram of the observed allele frequency within sample.
# Example 1
refFile <- system.file("extdata", "PG0390-C.test.ref", package = "DEploid.utils")
altFile <- system.file("extdata", "PG0390-C.test.alt", package = "DEploid.utils")
PG0390CoverageTxt <- extractCoverageFromTxt(refFile, altFile)
obsWSAF <- computeObsWSAF(PG0390CoverageTxt$altCount, PG0390CoverageTxt$refCount)

# Example 2
vcfFile <- system.file("extdata", "PG0390-C.test.vcf.gz", package = "DEploid.utils")
PG0390CoverageVcf <- extractCoverageFromVcf(vcfFile, "PG0390-C")
obsWSAF <- computeObsWSAF(PG0390CoverageVcf$altCount, PG0390CoverageVcf$refCount)
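To make the quantity concrete, the following is a minimal sketch of an observed within-sample allele frequency, assuming it is the alternative count divided by the total count at each site. The helper name obs_wsaf_sketch and the NA handling at zero-coverage sites are illustrative assumptions, not the package's implementation; computeObsWSAF may treat such sites differently.

```r
# Hypothetical helper (not part of the package): plain alt / (alt + ref).
obs_wsaf_sketch <- function(alt, ref) {
  stopifnot(length(alt) == length(ref))
  total <- alt + ref
  # Sites with zero coverage have no defined frequency; return NA there.
  ifelse(total > 0, alt / total, NA_real_)
}

# Made-up counts for four sites.
alt <- c(0, 5, 10, 0)
ref <- c(20, 15, 0, 0)
wsaf <- obs_wsaf_sketch(alt, ref)
print(wsaf)
```

The last site has no reads at all, so the sketch reports NA rather than a frequency.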
Extract read counts from tab-delimited text files of a single sample.
extractCoverageFromTxt(refFileName, altFileName)
refFileName |
Path of the reference allele count file. |
altFileName |
Path of the alternative allele count file. |
A data.frame containing four columns: chromosomes, positions, reference allele count, alternative allele count.
The allele count files must be tab-delimited, and each must contain three columns: chromosomes, positions and allele count.
refFile <- system.file("extdata", "PG0390-C.test.ref", package = "DEploid.utils")
altFile <- system.file("extdata", "PG0390-C.test.alt", package = "DEploid.utils")
PG0390 <- extractCoverageFromTxt(refFile, altFile)
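As an illustration of the expected input layout, the sketch below writes two small tab-delimited count files with base R and combines them into a four-column data.frame. The header row, the column names CHROM, POS and count, and the toy counts are assumptions made for this example; they are not taken from the package's test files, and the code does not call extractCoverageFromTxt.

```r
# Toy inputs with made-up counts; column names are illustrative assumptions.
sites <- data.frame(CHROM = c("chr1", "chr1", "chr2"),
                    POS = c(100, 250, 75),
                    stringsAsFactors = FALSE)
ref_tab <- cbind(sites, count = c(12, 0, 30))
alt_tab <- cbind(sites, count = c(3, 18, 0))

# Write each table as a tab-delimited text file.
ref_path <- tempfile(fileext = ".ref")
alt_path <- tempfile(fileext = ".alt")
write.table(ref_tab, ref_path, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(alt_tab, alt_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read them back with base R.
ref_in <- read.table(ref_path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
alt_in <- read.table(alt_path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# Both files must describe the same sites in the same order.
stopifnot(identical(ref_in$CHROM, alt_in$CHROM),
          identical(ref_in$POS, alt_in$POS))

# Combine into one table: chromosomes, positions, ref count, alt count.
coverage <- data.frame(CHROM = ref_in$CHROM,
                       POS = ref_in$POS,
                       refCount = ref_in$count,
                       altCount = alt_in$count,
                       stringsAsFactors = FALSE)
print(coverage)
```

Checking that the two files list identical sites before combining them guards against silently pairing counts from different positions.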
Extract VCF information
extractCoverageFromVcf(filename, samplename)
filename |
VCF file name. |
samplename |
Sample name |
A data.frame with the following columns:
SNP chromosomes.
SNP positions.
Reference allele count (refCount).
Alternative allele count (altCount).
vcfFile <- system.file("extdata", "PG0390-C.test.vcf.gz", package = "DEploid.utils")
vcf <- extractCoverageFromVcf(vcfFile, "PG0390-C")
Extract population level allele frequency (PLAF) from text file.
extractPLAF(plafFileName)
plafFileName |
Path of the PLAF text file. |
A numeric array of PLAF.
The text file must have a header, with the population level allele frequency recorded in the "PLAF" field.
plafFile <- system.file("extdata", "labStrains.test.PLAF.txt", package = "DEploid.utils")
plaf <- extractPLAF(plafFile)
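The header requirement can be illustrated with base R alone. The sketch below writes a small file containing a "PLAF" column and reads that column back as a numeric vector. The tab delimiter, the extra CHROM and POS columns, and the frequency values are assumptions for this example; it does not call extractPLAF.

```r
# Toy PLAF table with made-up frequencies; only the "PLAF" header is required
# by the description above, the other columns are illustrative assumptions.
plaf_tab <- data.frame(CHROM = c("chr1", "chr1", "chr2"),
                       POS = c(100, 250, 75),
                       PLAF = c(0.10, 0.55, 0.02),
                       stringsAsFactors = FALSE)
plaf_path <- tempfile(fileext = ".txt")
write.table(plaf_tab, plaf_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file with its header and pull out the PLAF field by name.
plaf_in <- read.table(plaf_path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
if (!"PLAF" %in% names(plaf_in)) {
  stop("input file has no 'PLAF' column")
}
plaf_sketch <- plaf_in$PLAF
print(plaf_sketch)
```

Selecting the column by name rather than by position is why the header matters: the file may carry other columns in any order.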
Plot the posterior probabilities of a haplotype given the reference panel.
haplotypePainter( posteriorProbabilities, title = "", labelScaling, numberOfInbreeding = 0 )
posteriorProbabilities |
Posterior probabilities matrix of size number of loci by number of reference strains. |
title |
Figure title. |
labelScaling |
Scaling parameter for plotting. |
numberOfInbreeding |
Number of inbreeding strains. |
No return value; called for side effects.
Produce histogram of the allele frequency within sample.
histWSAF( obsWSAF, exclusive = TRUE, title = "Histogram 0<WSAF<1", cex.lab = 1, cex.main = 1, cex.axis = 1 )
obsWSAF |
Observed allele frequency within sample |
exclusive |
When TRUE 0 < WSAF < 1; otherwise 0 <= WSAF <= 1. |
title |
Histogram title |
cex.lab |
Label size. |
cex.main |
Title size. |
cex.axis |
Axis text size. |
# Example 1
refFile <- system.file("extdata", "PG0390-C.test.ref", package = "DEploid.utils")
altFile <- system.file("extdata", "PG0390-C.test.alt", package = "DEploid.utils")
PG0390CoverageTxt <- extractCoverageFromTxt(refFile, altFile)
obsWSAF <- computeObsWSAF(PG0390CoverageTxt$altCount, PG0390CoverageTxt$refCount)
histWSAF(obsWSAF)
myhist <- histWSAF(obsWSAF, FALSE)

# Example 2
vcfFile <- system.file("extdata", "PG0390-C.test.vcf.gz", package = "DEploid.utils")
PG0390CoverageVcf <- extractCoverageFromVcf(vcfFile, "PG0390-C")
obsWSAF <- computeObsWSAF(PG0390CoverageVcf$altCount, PG0390CoverageVcf$refCount)
histWSAF(obsWSAF)
myhist <- histWSAF(obsWSAF, FALSE)
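The effect of the exclusive argument can be sketched with base R's hist(). The example below assumes that exclusive = TRUE simply drops sites whose WSAF is exactly 0 or 1 before binning; the toy frequencies and the bin width of 0.1 are choices made for this illustration and are not taken from histWSAF itself.

```r
# Made-up observed WSAF values, including sites fixed at 0 and at 1.
wsaf <- c(0, 0.05, 0.3, 0.5, 0.95, 1, 1)

# exclusive = TRUE keeps 0 < WSAF < 1; exclusive = FALSE keeps 0 <= WSAF <= 1.
keep_exclusive <- wsaf > 0 & wsaf < 1
keep_inclusive <- wsaf >= 0 & wsaf <= 1

# plot = FALSE returns the histogram object without drawing it.
bins <- seq(0, 1, by = 0.1)
h_exclusive <- hist(wsaf[keep_exclusive], breaks = bins, plot = FALSE)
h_inclusive <- hist(wsaf[keep_inclusive], breaks = bins, plot = FALSE)

n_exclusive <- sum(h_exclusive$counts)
n_inclusive <- sum(h_inclusive$counts)
print(c(exclusive = n_exclusive, inclusive = n_inclusive))
```

Excluding the fixed sites is useful because they often dominate the histogram and hide the intermediate frequencies that indicate a mixed sample.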
Plot alternative allele count vs reference allele count at each site.
plotAltVsRef( ref, alt, title = "Alt vs Ref", exclude.ref = c(), exclude.alt = c(), potentialOutliers = c(), cex.lab = 1, cex.main = 1, cex.axis = 1 )
ref |
Numeric array of reference allele count. |
alt |
Numeric array of alternative allele count. |
title |
Figure title, "Alt vs Ref" by default |
exclude.ref |
Numeric array of reference allele count at sites that are not deconvoluted. |
exclude.alt |
Numeric array of alternative allele count at sites that are not deconvoluted |
potentialOutliers |
Potential outliers |
cex.lab |
Label size. |
cex.main |
Title size. |
cex.axis |
Axis text size. |
No return value; called for side effects.
# Example 1
refFile <- system.file("extdata", "PG0390-C.test.ref", package = "DEploid.utils")
altFile <- system.file("extdata", "PG0390-C.test.alt", package = "DEploid.utils")
PG0390CoverageTxt <- extractCoverageFromTxt(refFile, altFile)
plotAltVsRef(PG0390CoverageTxt$refCount, PG0390CoverageTxt$altCount)

# Example 2
vcfFile <- system.file("extdata", "PG0390-C.test.vcf.gz", package = "DEploid.utils")
PG0390CoverageVcf <- extractCoverageFromVcf(vcfFile, "PG0390-C")
plotAltVsRef(PG0390CoverageVcf$refCount, PG0390CoverageVcf$altCount)
Plot observed alternative allele frequency within sample against expected WSAF.
plotObsExpWSAF( obsWSAF, expWSAF, title = "WSAF(observed vs expected)", cex.lab = 1, cex.main = 1, cex.axis = 1 )
obsWSAF |
Numeric array of observed WSAF. |
expWSAF |
Numeric array of expected WSAF. |
title |
Figure title. |
cex.lab |
Label size. |
cex.main |
Title size. |
cex.axis |
Axis text size. |
No return value; called for side effects.
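For orientation, the expected WSAF compared against the observed values can be thought of as a proportion-weighted sum of strain haplotypes. The sketch below assumes haplotypes coded 0 (reference) and 1 (alternative) with one column per strain; the object names and toy values are illustrative, and the package does not necessarily expose inputs in this form.

```r
# Toy haplotypes: rows are sites, columns are strains; 1 = alternative allele.
haplotypes <- matrix(c(0, 1,
                       1, 1,
                       0, 0,
                       1, 0),
                     ncol = 2, byrow = TRUE)

# Made-up strain proportions that sum to 1.
proportions <- c(0.7, 0.3)

# Expected WSAF at each site is the total proportion of strains
# carrying the alternative allele there.
exp_wsaf <- as.vector(haplotypes %*% proportions)
print(exp_wsaf)
```

A site where only the minor strain carries the alternative allele is expected to show a WSAF close to that strain's proportion.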
Plot the MCMC samples of the proportion, indexed by the MCMC chain.
plotProportions( proportions, title = "Components", cex.lab = 1, cex.main = 1, cex.axis = 1 )
proportions |
Matrix of the MCMC proportion samples, with one row per MCMC sample and one column per strain. |
title |
Figure title. |
cex.lab |
Label size. |
cex.main |
Title size. |
cex.axis |
Axis text size. |
No return value; called for side effects.
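The expected shape of the proportions argument can be sketched with simulated values. The example below builds a matrix with one row per MCMC sample and one column per strain, where each row sums to 1. The values are random draws made up for illustration, not output from DEploid, and the sample and strain counts are arbitrary.

```r
set.seed(1)
n_samples <- 200  # number of MCMC samples (rows)
n_strains <- 3    # number of strains (columns)

# Draw positive weights column by column, then normalise each row to sum to 1.
raw <- matrix(rgamma(n_samples * n_strains,
                     shape = rep(c(6, 3, 1), each = n_samples)),
              nrow = n_samples, ncol = n_strains)
proportions <- raw / rowSums(raw)

# A simple summary: the average proportion of each strain across samples.
mean_proportions <- colMeans(proportions)
print(dim(proportions))
print(round(mean_proportions, 3))
```

A matrix in this layout is what the description of the proportions argument above calls for; averaging columns is one common way to summarise the chain.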
Plot allele frequencies within sample against population level.
plotWSAFvsPLAF( plaf, obsWSAF, expWSAF = c(), potentialOutliers = c(), title = "WSAF vs PLAF", cex.lab = 1, cex.main = 1, cex.axis = 1 )
plaf |
Numeric array of population level allele frequency. |
obsWSAF |
Numeric array of observed alternative allele frequencies within sample. |
expWSAF |
Numeric array of expected WSAF from model. |
potentialOutliers |
Potential outliers |
title |
Figure title, "WSAF vs PLAF" by default |
cex.lab |
Label size. |
cex.main |
Title size. |
cex.axis |
Axis text size. |
No return value; called for side effects.
# Example 1
refFile <- system.file("extdata", "PG0390-C.test.ref", package = "DEploid.utils")
altFile <- system.file("extdata", "PG0390-C.test.alt", package = "DEploid.utils")
PG0390CoverageTxt <- extractCoverageFromTxt(refFile, altFile)
obsWSAF <- computeObsWSAF(PG0390CoverageTxt$altCount, PG0390CoverageTxt$refCount)
plafFile <- system.file("extdata", "labStrains.test.PLAF.txt", package = "DEploid.utils")
plaf <- extractPLAF(plafFile)
plotWSAFvsPLAF(plaf, obsWSAF)

# Example 2
vcfFile <- system.file("extdata", "PG0390-C.test.vcf.gz", package = "DEploid.utils")
PG0390CoverageVcf <- extractCoverageFromVcf(vcfFile, "PG0390-C")
obsWSAF <- computeObsWSAF(PG0390CoverageVcf$altCount, PG0390CoverageVcf$refCount)
plafFile <- system.file("extdata", "labStrains.test.PLAF.txt", package = "DEploid.utils")
plaf <- extractPLAF(plafFile)
plotWSAFvsPLAF(plaf, obsWSAF)