Title: | Finding Allele-Specific Copy Number in Whole-Exome Sequencing Data |
---|---|
Description: | This is a method for Allele-specific DNA Copy Number profiling for whole-Exome sequencing data. Given the allele-specific coverage and site biases at the variant loci, this program segments the genome into regions of homogeneous allele-specific copy number. It requires, as input, the read counts for each variant allele in a pair of case and control samples, as well as the site biases. For detection of somatic mutations, the case and control samples can be the tumor and normal sample from the same individual. The implemented method is based on the paper: Chen, H., Jiang, Y., Maxwell, K., Nathanson, K. and Zhang, N. (under review). Allele-specific copy number estimation by whole Exome sequencing. |
Authors: | Hao Chen and Nancy R. Zhang |
Maintainer: | Hao Chen <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.2 |
Built: | 2024-11-12 06:41:19 UTC |
Source: | CRAN |
This is a data frame with two columns and the column names are "sN", "sT". They are the site-specific bias in total coverage for normal (control) sample and tumor (case) sample, respectively.
This library contains a set of tools for allele-specific DNA copy number profiling using whole exome sequencing. Given the allele-specific coverage and site biases at the variant loci, this program segments the genome into regions of homogeneous allele-specific copy number. It requires, as input, the read counts for each variant allele in a pair of case and control samples, as well as the site biases. For detection of somatic mutations, the case and control samples can be the tumor and normal sample from the same individual.
Hao Chen and Nancy R. Zhang
Maintainer: Hao Chen ([email protected])
getChangepoints.x, getASCN.x, view
data(Example)
# tauhat = getChangepoints.x(readMatrix, biasMatrix) # uncomment this to run the function.
cn = getASCN.x(readMatrix, biasMatrix, tauhat=tauhat)
# cn$tauhat would give the indices of change-points.
# cn$ascn would give the estimated allele-specific copy numbers for each segment.
# cn$Haplotype[[i]] would give the estimated haplotype for the major chromosome in segment i
# if this segment has different copy numbers on the two homologous chromosomes.
view(cn)
Given a set of break points where the parent-specific copy number changes, this function estimates the parent-specific copy number for each segment, and the haplotype of the major chromosome on segments where the two homologous chromosomes have different copy numbers. It is recommended to specify the parameter "rdep", the case-control genome-wide average coverage ratio. Usually, a good estimate of rdep is (total mapped reads in tumor)/(total mapped reads in normal).
getASCN.x(readMatrix, biasMatrix, tauhat=NULL, threshold=0.15, COri=c(0.95,1.05), error=1e-5, maxIter=1000, independence=TRUE, pos=NULL, readlength=NULL)
readMatrix |
A data frame with four columns named "AN", "BN", "AT" and "BT". They are the A-allele coverage in the normal (control) sample, the B-allele coverage in the normal (control) sample, the A-allele coverage in the tumor (case) sample, and the B-allele coverage in the tumor (case) sample, respectively. |
biasMatrix |
A data frame with two columns and the column names are "sN", "sT". They are the site-specific bias in total coverage for normal (control) sample and tumor (case) sample, respectively. |
tauhat |
The estimated break points. If it is not specified (NULL), then this function will first estimate the break points by calling the function "getChangepoints.x", and then estimate the parent-specific DNA copy number for each segment. |
threshold |
An estimated copy number is set to 1 if it differs from 1 by less than this threshold. |
COri, error, maxIter |
Parameters used in estimating the success probabilities of the mixed binomial distribution. See the manuscript by Chen and Zhang for more details. "COri" provides the initial values for the estimation; its two values need to be different. "error" provides the stopping criterion. "maxIter" is the maximum number of iterations if the stopping criterion is not met. |
independence |
If TRUE (the default), the model assumes that reads are conditionally independent. If FALSE, the pruning approach is applied. |
pos |
The locations (in base pair) of the heterozygous sites. This information is needed when "independence=FALSE". |
readlength |
The read length if the data are from single-end sequencing, or the maximum span of read pairs if the data are from paired-end sequencing. This information is needed when "independence=FALSE". |
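The "threshold" rule in the table above can be illustrated with a small base R sketch. `snap_to_one` is a hypothetical helper written for this illustration, not a function of the package; it only applies the rule as described to a vector of made-up estimates.

```r
# Hypothetical helper illustrating the "threshold" rule described above:
# estimates that differ from 1 by less than `threshold` are set to exactly 1.
snap_to_one <- function(cn, threshold = 0.15) {
  cn[abs(cn - 1) < threshold] <- 1
  cn
}

# Made-up copy number estimates, for illustration only.
est <- c(0.92, 1.10, 1.48, 0.50, 1.30)
snapped <- snap_to_one(est)
print(snapped)
```

With the default threshold of 0.15, the first two estimates are set to 1 and the remaining three are left unchanged.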
tauhat |
A vector holding the estimated break points in terms of the index in the coverage vectors. |
ascn |
The estimated parent-specific copy numbers in the segments between the break points in tauhat. |
Haplotype |
The estimated haplotype for the major chromosome (the chromosome that has a higher copy number than its homologous chromosome) on segments where the two homologous chromosomes have different copy numbers. |
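Since "tauhat" is given as indices into the coverage vectors, it can help to convert it into explicit segment boundaries. The sketch below does this in base R. `segments_from_breaks` is a hypothetical helper, and the convention that each break point is the last site of the segment to its left is an assumption; the documentation does not state which side a break point belongs to.

```r
# Hypothetical helper: turn break-point indices into a table of segments.
# Assumption (not stated in the documentation): each break point is the
# index of the last site of the segment to its left.
segments_from_breaks <- function(tauhat, n_sites) {
  tauhat <- sort(unique(tauhat))
  tauhat <- tauhat[tauhat >= 1 & tauhat < n_sites]
  data.frame(
    segment = seq_len(length(tauhat) + 1),
    start   = c(1, tauhat + 1),
    end     = c(tauhat, n_sites)
  )
}

# Made-up break points over 100 sites, for illustration only.
segs <- segments_from_breaks(tauhat = c(40, 75), n_sites = 100)
print(segs)
```

Each row of the result could then be matched to the corresponding entry of the estimated copy numbers, under the same indexing assumption.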
data(Example)
cn = getASCN.x(readMatrix, biasMatrix, tauhat=tauhat)
# cn$tauhat would give the indices of change-points.
# cn$ascn would give the estimated allele-specific copy numbers for each segment.
# cn$Haplotype[[i]] would give the estimated haplotype for the major chromosome in segment i
# if this segment has different copy numbers on the two homologous chromosomes.
This function estimates the change-points where one or both parent-specific copy numbers change. It uses a circular binary segmentation approach to find change-points in a binomial mixture process. The output of the function is the set of locations of the break points. If the whole genome is analyzed, it is recommended to run this function chromosome by chromosome, and the runs on different chromosomes can be done in parallel to shorten the running time.
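The chromosome-by-chromosome recommendation can be sketched as follows. The per-site chromosome label `chr`, the wrapper `segment_by_chromosome`, and the stand-in function `stand_in` are all assumptions introduced for this illustration; the stand-in is not the package's segmentation method and exists only so that the sketch runs without the package installed. In practice, `segment_fun` would be getChangepoints.x.

```r
# Sketch: run a segmentation function separately on each chromosome.
# `chr` (a per-site chromosome label) and `segment_fun` are assumptions made
# for illustration; in practice `segment_fun` would be getChangepoints.x.
segment_by_chromosome <- function(readMatrix, biasMatrix, chr, segment_fun) {
  idx_by_chr <- split(seq_len(nrow(readMatrix)), chr)
  lapply(idx_by_chr, function(idx) {
    segment_fun(readMatrix[idx, , drop = FALSE],
                biasMatrix[idx, , drop = FALSE])
  })
}

# Toy input and a stand-in segmentation function (NOT the package's method).
set.seed(1)
n <- 12
readMatrix <- data.frame(AN = rpois(n, 30), BN = rpois(n, 30),
                         AT = rpois(n, 30), BT = rpois(n, 30))
biasMatrix <- data.frame(sN = rep(1, n), sT = rep(1, n))
chr <- rep(c("chr1", "chr2", "chr3"), times = c(5, 4, 3))
stand_in <- function(reads, bias) nrow(reads)  # returns the site count only

res <- segment_by_chromosome(readMatrix, biasMatrix, chr, stand_in)
print(res)
```

To run the chromosomes in parallel, `lapply` could be swapped for `parallel::mclapply` on platforms that support forking. Because each call sees only one chromosome's sites, any indices it returns are relative to that chromosome's subset.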
getChangepoints.x(readMatrix, biasMatrix, verbose=TRUE, COri=c(0.95,1.05), error=1e-5, maxIter=1000, independence=TRUE, pos=NULL, readlength=NULL)
readMatrix |
A data frame with four columns named "AN", "BN", "AT" and "BT". They are the A-allele coverage in the normal (control) sample, the B-allele coverage in the normal (control) sample, the A-allele coverage in the tumor (case) sample, and the B-allele coverage in the tumor (case) sample, respectively. |
biasMatrix |
A data frame with two columns and the column names are "sN", "sT". They are the site-specific bias in total coverage for normal (control) sample and tumor (case) sample, respectively. |
verbose |
Provide progress messages if it is TRUE. This argument is TRUE by default. Set it to be FALSE if you want to turn off the progress messages. |
COri, error, maxIter |
Parameters used in estimating the success probabilities of the mixed binomial distribution. See the manuscript by Chen and Zhang for more details. "COri" provides the initial values for the estimation; its two values need to be different. "error" provides the stopping criterion. "maxIter" is the maximum number of iterations if the stopping criterion is not met. |
independence |
If TRUE (the default), the model assumes that reads are conditionally independent. If FALSE, the pruning approach is applied. |
pos |
The locations (in base pair) of the heterozygous sites. This information is needed when "independence=FALSE". |
readlength |
The read length if the data are from single-end sequencing, or the maximum span of read pairs if the data are from paired-end sequencing. This information is needed when "independence=FALSE". |
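The documentation does not describe the pruning procedure used when "independence=FALSE". One plausible reading, offered purely as an assumption, is that sites are thinned so that no two retained sites can be covered by the same read (or read pair). The sketch below shows such a greedy thinning in base R. `prune_sites` is a hypothetical helper and should not be taken as the package's implementation.

```r
# Hypothetical greedy thinning (an assumption, NOT the package's method):
# keep a site only if it lies at least `readlength` base pairs after the
# previously kept site, so no two kept sites can share a read.
prune_sites <- function(pos, readlength) {
  stopifnot(!is.unsorted(pos))
  keep <- logical(length(pos))
  last_kept <- -Inf
  for (i in seq_along(pos)) {
    if (pos[i] - last_kept >= readlength) {
      keep[i] <- TRUE
      last_kept <- pos[i]
    }
  }
  which(keep)  # indices of the retained sites
}

# Made-up positions (in base pairs), for illustration only.
pos <- c(100, 150, 260, 300, 420, 900)
kept <- prune_sites(pos, readlength = 100)
print(kept)
```

The returned indices could be used to subset the read count and bias data frames consistently, which is why both "pos" and "readlength" would be needed in this setting.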
data(Example)
# tauhat = getChangepoints.x(readMatrix, biasMatrix) # uncomment this to run the function.
This is a vector containing position (in base pair) of each heterozygous site for the read count data in "Example.rda".
This is a data frame with four columns named "AN", "BN", "AT" and "BT". They are the A-allele coverage in the normal (control) sample, the B-allele coverage in the normal (control) sample, the A-allele coverage in the tumor (case) sample, and the B-allele coverage in the tumor (case) sample, respectively.
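As a minimal sketch of how such a data frame might be assembled from per-site allele counts, consider the base R example below. It assumes, as the column names suggest, that "N" denotes the normal (control) sample and "T" the tumor (case) sample. The counts are made up, and `check_read_matrix` is a hypothetical helper, not part of the package.

```r
# Toy per-site allele counts; in practice these come from a variant caller.
normal_A <- c(22, 31, 18, 27)
normal_B <- c(25, 29, 20, 24)
tumor_A  <- c(40, 52, 12, 15)
tumor_B  <- c(21, 27, 30, 38)

# Column order follows the documented names "AN", "BN", "AT", "BT".
readMatrix <- data.frame(AN = normal_A, BN = normal_B,
                         AT = tumor_A,  BT = tumor_B)

# Hypothetical check (not part of the package) for the expected layout:
# a data frame with exactly these column names and non-negative counts.
check_read_matrix <- function(x) {
  expected <- c("AN", "BN", "AT", "BT")
  is.data.frame(x) &&
    identical(colnames(x), expected) &&
    all(vapply(x, is.numeric, logical(1))) &&
    all(as.matrix(x) >= 0)
}

print(check_read_matrix(readMatrix))
```

A check like this is a cheap way to catch swapped or misnamed columns before running the segmentation.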
This is a vector containing the estimated break points that one would get by calling the function getChangepoints.x(readMatrix, biasMatrix) with "readMatrix" and "biasMatrix" from "Example.rda".
This function generates three plots: the first plots the A-allele frequencies of the case (black) sample overlaid onto those of the control (gray) sample; the second plots the relative depth of the case over the control, adjusted by the ratio of total mapped reads, i.e. P*(read count in tumor)/(read count in normal), where P=(total reads mapped in normal)/(total reads mapped in tumor); the third plots the estimated parent-specific DNA copy numbers.
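The quantities behind the first two plots can be computed by hand, as in the base R sketch below. The counts and library sizes are made-up values, and the quantities are computed per site here; how the plotting function itself aggregates them is not specified in this documentation.

```r
# Toy read counts (made-up values, for illustration only).
readMatrix <- data.frame(AN = c(20, 30, 25), BN = c(20, 30, 25),
                         AT = c(30, 20, 60), BT = c(10, 40, 20))

# A-allele frequency at each site, for case and control.
afreq_tumor  <- readMatrix$AT / (readMatrix$AT + readMatrix$BT)
afreq_normal <- readMatrix$AN / (readMatrix$AN + readMatrix$BN)

# Toy library sizes (assumed values, for illustration only).
total_mapped_normal <- 5e7
total_mapped_tumor  <- 1e8
P <- total_mapped_normal / total_mapped_tumor

# Relative depth: P * (read count in tumor) / (read count in normal).
rel_depth <- P * (readMatrix$AT + readMatrix$BT) /
  (readMatrix$AN + readMatrix$BN)

print(afreq_tumor)
print(rel_depth)
```

In this toy data the control frequencies are all 0.5, so departures of the case frequencies from 0.5 stand out, and a relative depth away from 1 suggests a change in total copy number.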
view(output, pos=NULL, rdep=NULL, plot="all", independence=TRUE, ...)
output |
The output from calling function "getASCN.x". |
pos |
A vector of the base positions for the SNPs. If this information is not provided, the x-axis of the plots will simply be the SNP ordering. If this information is provided, the x-axis of the plots will be the position information. |
rdep |
The relative depth of the case sample over the control sample. If it is not specified (NULL), then the value median(AT+BT)/median(AN+BN) will be used. |
plot |
This argument determines what to plot. By default ("all"), this function gives all three plots described above. To plot each one individually, set this argument to one of "Afreq", "RelativeCoverage" or "ASCN". |
independence |
If "pos" is specified and "independence=FALSE", the pruned positions will be used. |
... |
Arguments from plot can be passed along. |
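The default value of "rdep" described above, median(AT+BT)/median(AN+BN), can be reproduced by hand to sanity-check it against an independent estimate such as the ratio of total mapped reads. The sketch below uses made-up counts in a data frame with the documented column names.

```r
# Toy read counts (made-up values, for illustration only).
readMatrix <- data.frame(AN = c(20, 30, 25, 40), BN = c(20, 30, 25, 40),
                         AT = c(45, 55, 50, 90), BT = c(35, 65, 50, 70))

# Default relative depth as described: median total coverage in the case
# sample divided by median total coverage in the control sample.
rdep_default <- median(readMatrix$AT + readMatrix$BT) /
  median(readMatrix$AN + readMatrix$BN)
print(rdep_default)
```

If this value differs substantially from the ratio of total mapped reads, it may be worth passing "rdep" explicitly rather than relying on the default.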
data(Example)
cn = getASCN.x(readMatrix, biasMatrix, tauhat=tauhat)
view(cn)
# to view with position as the x-axis
view(cn, pos=pos)
# to view the plot for only showing A-allele frequency of the case (black) sample overlayed
# onto those of the control (gray) sample
par(mfrow=c(1,1))
view(cn, plot="Afreq")
# to view the relative depth of the case over control adjusted by the ratio of total mapped
# reads in fixed size bins
par(mfrow=c(1,1))
view(cn, plot="RelativeCoverage")
# to view the estimated allele-specific DNA copy numbers
par(mfrow=c(1,1))
view(cn, plot="ASCN")