| Title: | Identify Differential Selection |
|---|---|
| Description: | Tests for context-dependent selection on cancer driver genes using somatic mutation data. The package implements the DiffDriver statistical framework to assess whether the strength of positive selection on mutations in a driver gene is associated with tumor- or individual-level context variables, such as clinical traits, genomic features, or immune microenvironment subtypes. DiffDriver estimates individual- and position-specific background mutation rates, models selection as a deviation from the background rate using functional annotations, and tests context effects through a latent-variable logistic model. It provides utilities for preparing mutation and annotation data, fitting differential-selection models, running gene-level association tests, summarizing candidate genes, and visualizing mutation patterns. The method is described in Zhou et al. (2026) "Detecting context-dependent selection on cancer driver genes with DiffDriver" <doi:10.64898/2026.04.06.716771>. |
| Authors: | Siming Zhao [aut, cre], Jie Zhou [aut], Qirui Zhang [aut] |
| Maintainer: | Siming Zhao <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.7 |
| Built: | 2026-06-26 18:39:43 UTC |
| Source: | https://github.com/cran/diffdriver |
Add intercept, if have functtypecode, then code and move to the front. different from driverMAPS! only allows functtypecode =7 ||8 when functypecode is included in selectvars.
ddmcode(matrixlist, selectvars, functypecodelevel = NULL)ddmcode(matrixlist, selectvars, functypecodelevel = NULL)
matrixlist |
A list of mutation or annotation matrices. |
selectvars |
Variables selected for the model. |
functypecodelevel |
Functional annotation code level. |
This model is applied on data of a single gene. It will infer effect size for both sample-level variable and positional level functional annotations. We used an EM algorithm to infer parameters.
ddmodel(mut, e, mr, fe, label, ...)ddmodel(mut, e, mr, fe, label, ...)
mut |
a matrix of mutation status 0 or 1, rows positions, columns are samples. |
e |
a vector,phenotype of each sample,
should match the columns of |
mr |
a matrix, mutation rate of each sample at each mutation (log scale) that is not dependent on sample level factor |
fe |
a vector, increased mutation rate at each position, depending on e (log scale),
should match the rows of |
label |
A vector or factor of length |
... |
Additional arguments passed to internal functions. |
A list with the following elements:
pvalueThe likelihood-ratio test p-value comparing the null model without a context effect to the alternative model with a context effect.
res.nullA list returned by the null-model fit. It contains
loglikelihood, beta0, and alpha.
res.altA list returned by the alternative-model fit. It contains
loglikelihood, beta0, and alpha.
# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix(0, nrow = 6, ncol = 20) mut[1, 1] <- 1 mut[2, 5] <- 1 mut[c(1, 2), 11] <- 1 mut[c(2, 3), 12] <- 1 mut[c(3, 4), 13] <- 1 mut[c(4, 5), 14] <- 1 mut[c(5, 6), 15] <- 1 mut[c(1, 6), 16] <- 1 mut[c(1, 3), 17] <- 1 mut[c(2, 4), 18] <- 1 mut[c(3, 5), 19] <- 1 mut[c(4, 6), 20] <- 1 e <- c(rep(0, 10), rep(1, 10)) # Mutation rates and functional effects are on the log scale. mr <- matrix(log(0.002), nrow = nrow(mut), ncol = ncol(mut)) fe <- rep(log(3), nrow(mut)) # Each row represents a separate mutation position. label <- factor(seq_len(nrow(mut))) result <- ddmodel(mut, e, mr, fe, label = label) result$pvalue# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix(0, nrow = 6, ncol = 20) mut[1, 1] <- 1 mut[2, 5] <- 1 mut[c(1, 2), 11] <- 1 mut[c(2, 3), 12] <- 1 mut[c(3, 4), 13] <- 1 mut[c(4, 5), 14] <- 1 mut[c(5, 6), 15] <- 1 mut[c(1, 6), 16] <- 1 mut[c(1, 3), 17] <- 1 mut[c(2, 4), 18] <- 1 mut[c(3, 5), 19] <- 1 mut[c(4, 6), 20] <- 1 e <- c(rep(0, 10), rep(1, 10)) # Mutation rates and functional effects are on the log scale. mr <- matrix(log(0.002), nrow = nrow(mut), ncol = ncol(mut)) fe <- rep(log(3), nrow(mut)) # Each row represents a separate mutation position. label <- factor(seq_len(nrow(mut))) result <- ddmodel(mut, e, mr, fe, label = label) result$pvalue
This function uses the model as cmodel.frac, but generalizes to take more than 1 functional categories. This model is applied on data of a single gene
ddmodel_binary(mut, e, bmr, fe)ddmodel_binary(mut, e, bmr, fe)
mut |
a matrix of mutation status 0 or 1 |
e |
a vector,phenotype of each sample,
should match the columns of |
bmr |
a matrix, background mutation rate of each sample at each mutation (log scale) |
fe |
a vector, increased mutation rate at each mutation, due to functional effect (log scale),
should match the rows of |
A list with the following elements:
pvalueThe likelihood-ratio test p-value comparing the null and alternative binary-phenotype models.
null.eta0The fitted null-model parameter.
alt.etaThe fitted alternative-model parameters for the two phenotype groups.
null.llThe maximized log-likelihood under the null model.
alt.llThe maximized log-likelihood under the alternative model.
allA named numeric vector containing pvalue,
null_para, e=0, e=1, null_likelihood, and
alt_likelihood.
# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix( c( 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1 ), nrow = 3, byrow = TRUE ) e <- c(0, 0, 0, 0, 1, 1, 1, 1) # Background mutation rates and functional effects are on the log scale. bmr <- matrix(log(0.001), nrow = nrow(mut), ncol = ncol(mut)) fe <- log(c(2, 2, 2)) result <- ddmodel_binary(mut, e, bmr, fe) result$pvalue# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix( c( 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1 ), nrow = 3, byrow = TRUE ) e <- c(0, 0, 0, 0, 1, 1, 1, 1) # Background mutation rates and functional effects are on the log scale. bmr <- matrix(log(0.001), nrow = nrow(mut), ncol = ncol(mut)) fe <- log(c(2, 2, 2)) result <- ddmodel_binary(mut, e, bmr, fe) result$pvalue
This function uses the model as cmodel.frac, but generalizes to take more than 1 functional categories. This model is applied on data of a single gene. This should give the same results as ddmodel_binary defined above.
ddmodel_binary_simple(mut, e, bmr, fe)ddmodel_binary_simple(mut, e, bmr, fe)
mut |
a matrix of mutation status 0 or 1 |
e |
a vector,phenotype of each sample,
should match the columns of |
bmr |
a matrix, background mutation rate of each sample at each mutation (log scale), as we assume bmr the same across samples, only the first column will be used. |
fe |
a vector, increased mutation rate at each mutation, due to functional effect (log scale),
should match the rows of |
A list with the following elements:
pvalueThe likelihood-ratio test p-value comparing the null and alternative binary-phenotype models.
null.eta0The fitted null-model parameter.
alt.etaThe fitted alternative-model parameters for the two phenotype groups.
null.llThe maximized log-likelihood under the null model.
alt.llThe maximized log-likelihood under the alternative model.
# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix( c( 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1 ), nrow = 3, byrow = TRUE ) e <- c(0, 0, 0, 0, 1, 1, 1, 1) # Background mutation rates and functional effects are on the log scale. bmr <- matrix(log(0.001), nrow = nrow(mut), ncol = ncol(mut)) fe <- log(c(2, 2, 2)) result <- ddmodel_binary_simple(mut, e, bmr, fe) result$pvalue# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix( c( 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1 ), nrow = 3, byrow = TRUE ) e <- c(0, 0, 0, 0, 1, 1, 1, 1) # Background mutation rates and functional effects are on the log scale. bmr <- matrix(log(0.001), nrow = nrow(mut), ncol = ncol(mut)) fe <- log(c(2, 2, 2)) result <- ddmodel_binary_simple(mut, e, bmr, fe) result$pvalue
This function runs diffDriver.
diffdriver( gene, mut, pheno, anno_dir = ".", k = 6, totalnttype = 96, BMRmode = c("signature", "regular"), output_dir = tempdir(), output_prefix = "diffdriver_results" )diffdriver( gene, mut, pheno, anno_dir = ".", k = 6, totalnttype = 96, BMRmode = c("signature", "regular"), output_dir = tempdir(), output_prefix = "diffdriver_results" )
gene |
A vector of genes to be included in the analysis. |
mut |
A data frame containing all somatic mutations from the cohort. The format is:
Example: Chromosome Position Ref Alt SampleID 1 19 55653236 C T TCGA-N6-A4VE-01A-11D-A28R-08 |
pheno |
A data frame containing sample phenotypes. The format is: #'
Example: SampleID SmokingCessation BMI TCGA-N5-A4R8-01A-11D-A28R-08 0.5319630 20.0 TCGA-N5-A4RD-01A-11D-A28R-08 0.0448991 24.4 |
anno_dir |
The path to the directory with all the annotation files. Please download from Zenodo.The default is current folder |
k |
The number of topics used in modeling background mutation rate. The default is 6. |
totalnttype |
either 9 or 96. Will look for annotation files anno9_ntypexxx_annodata.txt when totalnttype is 9 or anno96_ntypexxx_annodata when totalnttype is 96. |
BMRmode |
There are two modes to run diffdriver. One is "signature", this will model individual level BMR, this is the default. The second one is "regular", this assumes BMR is the same across individuals, only models position-level difference. |
output_dir |
The path to the output directory. Defaults to |
output_prefix |
The prefix being added to the output file names. |
A data frame with one row per tested gene. The returned columns include
dd.p, mlr.p, mlr.v2.p, fisher.p, binom.p,
and lr.p, which are p-values from DiffDriver and comparator methods;
the corresponding .fdr columns, which contain Benjamini-Hochberg adjusted
p-values; and mut.E1, mut.E0, E1, and E0, which
summarize mutation and sample counts in the two phenotype groups used by the
Fisher test.
phenof <- system.file( "extdata", "example_phenotypes.txt", package = "diffdriver" ) genef <- system.file( "extdata", "example_gene.txt", package = "diffdriver" ) mutf <- system.file( "extdata", "example_mutations.txt", package = "diffdriver" ) pheno <- read.table(phenof, header = TRUE) gene <- read.table(genef, header = FALSE) mut <- read.table(mutf, header = TRUE) ## Not run: # Download the annotation files and replace this with their directory. annodir <- "/path/to/annodir96" result <- diffdriver( gene = gene, mut = mut, pheno = pheno, anno_dir = annodir, k = 6, totalnttype = 96, BMRmode = "signature", output_dir = tempdir(), output_prefix = "diffdriver_example" ) result ## End(Not run)phenof <- system.file( "extdata", "example_phenotypes.txt", package = "diffdriver" ) genef <- system.file( "extdata", "example_gene.txt", package = "diffdriver" ) mutf <- system.file( "extdata", "example_mutations.txt", package = "diffdriver" ) pheno <- read.table(phenof, header = TRUE) gene <- read.table(genef, header = FALSE) mut <- read.table(mutf, header = TRUE) ## Not run: # Download the annotation files and replace this with their directory. annodir <- "/path/to/annodir96" result <- diffdriver( gene = gene, mut = mut, pheno = pheno, anno_dir = annodir, k = 6, totalnttype = 96, BMRmode = "signature", output_dir = tempdir(), output_prefix = "diffdriver_example" ) result ## End(Not run)
gene level binomial test
genebinom(mut, e)genebinom(mut, e)
mut |
Mutation data. |
e |
Phenotype or context variable. |
A list with the following element:
pvalueThe p-value from a gene-level binomial test comparing mutation counts between the two phenotype groups.
# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix( c( 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0 ), nrow = 2, byrow = TRUE ) e <- c(0, 0, 0, 1, 1, 1) result <- genebinom(mut, e) result$pvalue# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix( c( 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0 ), nrow = 2, byrow = TRUE ) e <- c(0, 0, 0, 1, 1, 1) result <- genebinom(mut, e) result$pvalue
gene level fisher's exact test
genefisher(mut, e)genefisher(mut, e)
mut |
Mutation data. |
e |
Phenotype or context variable. |
A list with the following elements:
pvalueThe p-value from a gene-level Fisher's exact test.
countA numeric vector c(gmutc, gmutn, mu.c, mu.n),
where gmutc and gmutn are mutation counts in the two phenotype
groups, and mu.c and mu.n are the corresponding numbers of
samples.
# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix( c( 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0 ), nrow = 2, byrow = TRUE ) e <- c(0, 0, 0, 1, 1, 1) result <- genefisher(mut, e) result$pvalue result$count# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix( c( 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0 ), nrow = 2, byrow = TRUE ) e <- c(0, 0, 0, 1, 1, 1) result <- genefisher(mut, e) result$pvalue result$count
gene level logistic regression
genelr(mut, e, covariates = rep(1, length(e)))genelr(mut, e, covariates = rep(1, length(e)))
mut |
Mutation data. |
e |
Phenotype or context variable. |
covariates |
Optional covariates included in the model. |
A list with the following elements:
resThe summary object from the fitted logistic regression model
glm(e ~ dstatus + covariates, family = binomial(link = "logit")).
pvalueThe p-value for the mutation-status coefficient
dstatus.
# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix( c( 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0 ), nrow = 2, byrow = TRUE ) e <- c(0, 0, 0, 1, 1, 1) result <- genelr(mut, e) result$pvalue# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix( c( 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0 ), nrow = 2, byrow = TRUE ) e <- c(0, 0, 0, 1, 1, 1) result <- genelr(mut, e) result$pvalue
gene level multiple linear regression
mlr(mut, e, covariates = rep(1, length(e)))mlr(mut, e, covariates = rep(1, length(e)))
mut |
Mutation data. |
e |
Phenotype or context variable. |
covariates |
Optional covariates included in the model. |
A list with the following elements:
resThe summary object from the fitted linear model
lm(e ~ dstatus + covariates).
pvalueThe p-value for the mutation-status coefficient
dstatus.
# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix( c( 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0 ), nrow = 2, byrow = TRUE ) e <- c(0, 0, 0, 1, 1, 1) result <- mlr(mut, e) result$pvalue# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix( c( 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0 ), nrow = 2, byrow = TRUE ) e <- c(0, 0, 0, 1, 1, 1) result <- mlr(mut, e) result$pvalue
gene level multiple linear regression, correcting for total number of mutations
mlr.v2(mut, e, nmut, covariates = 1)mlr.v2(mut, e, nmut, covariates = 1)
mut |
Mutation data. |
e |
Phenotype or context variable. |
nmut |
Number of mutations. |
covariates |
Optional covariates included in the model. |
A list with the following elements:
resThe summary object from the fitted linear model
lm(e ~ dstatus + nmut + covariates).
pvalueThe p-value for the mutation-status coefficient
dstatus, after adjusting for total mutation count and covariates.
# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix( c( 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0 ), nrow = 2, byrow = TRUE ) e <- c(0, 0, 0, 1, 1, 1) # Synthetic per-sample mutation counts. nmut <- c(3, 4, 5, 6, 7, 8) result <- mlr.v2(mut, e, nmut) result$pvalue# Synthetic mutation matrix: rows are positions and columns are samples. mut <- matrix( c( 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0 ), nrow = 2, byrow = TRUE ) e <- c(0, 0, 0, 1, 1, 1) # Synthetic per-sample mutation counts. nmut <- c(3, 4, 5, 6, 7, 8) result <- mlr.v2(mut, e, nmut) result$pvalue
its like optim, but with fixed parameters.
optifix( par, fixed, fn, gr = NULL, ..., method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN"), lower = -Inf, upper = Inf, control = list(), hessian = FALSE )optifix( par, fixed, fn, gr = NULL, ..., method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN"), lower = -Inf, upper = Inf, control = list(), hessian = FALSE )
par |
Initial values for the parameters to be optimized. |
fixed |
Logical or index vector indicating fixed parameters. |
fn |
Function to be minimized. |
gr |
Optional gradient function. |
... |
Additional arguments passed to |
method |
Optimization method. |
lower |
Lower bounds for parameters. |
upper |
Upper bounds for parameters. |
control |
Control parameters passed to optimization. |
hessian |
Logical; whether to return the Hessian matrix. |
specify a second argument 'fixed', a vector of TRUE/FALSE values. If TRUE, the corresponding parameter in fn() is fixed. Otherwise its variable and optimised over.
The return thing is the return thing from optim() but with a couple of extra bits - a vector of all the parameters and a vector copy of the 'fixed' argument.
Written by Barry Rowlingson <[email protected]> October 2011
This file released under a CC By-SA license: http://creativecommons.org/licenses/by-sa/3.0/
and must retain the text: "Originally written by Barry Rowlingson" in comments.
plot phenotype, mutation and annotation for a gene across samples
plot_mut( gene_name, mut, pheno, totalnttype = 96, anno_dir = ".", output_prefix = "plot", output_dir = tempdir() )plot_mut( gene_name, mut, pheno, totalnttype = 96, anno_dir = ".", output_prefix = "plot", output_dir = tempdir() )
gene_name |
Gene name to plot. |
mut |
Mutation data. |
pheno |
Phenotype or context data. |
totalnttype |
Total number of nucleotide context types. |
anno_dir |
Directory containing annotation files. |
output_prefix |
Prefix for output files. |
output_dir |
Directory for output files. Defaults to |
A numeric vector returned by the final barplot() call, giving the
coordinates of the plotted bars. The function is primarily called for its side
effect of drawing mutation and phenotype summary plots for the selected gene.
phenof <- system.file( "extdata", "example_phenotypes.txt", package = "diffdriver" ) mutf <- system.file( "extdata", "example_mutations.txt", package = "diffdriver" ) pheno <- read.table(phenof, header = TRUE) mut <- read.table(mutf, header = TRUE) ## Not run: # Download the annotation files and replace this with their directory. annodir <- "/path/to/annodir9" plot_mut( gene_name = "PIK3CA", mut = mut, pheno = pheno, totalnttype = 9, anno_dir = annodir, output_dir = tempdir() ) ## End(Not run)phenof <- system.file( "extdata", "example_phenotypes.txt", package = "diffdriver" ) mutf <- system.file( "extdata", "example_mutations.txt", package = "diffdriver" ) pheno <- read.table(phenof, header = TRUE) mut <- read.table(mutf, header = TRUE) ## Not run: # Download the annotation files and replace this with their directory. annodir <- "/path/to/annodir9" plot_mut( gene_name = "PIK3CA", mut = mut, pheno = pheno, totalnttype = 9, anno_dir = annodir, output_dir = tempdir() ) ## End(Not run)