Coding-variant Allelic Series Test

Data

To run an allelic series test, there are 4 key inputs:

  • A numeric annotation vector, of the same length as the number of variants, coded as 0 for benign missense variants (BMVs), 1 for deleterious missense variants (DMVs), and 2 for protein truncating variants (PTVs).

  • A covariates matrix, with as many rows as subjects and including columns such as age and sex. If omitted, defaults to an intercept only.

  • A genotype matrix, with subjects as rows and variants as columns. The number of columns should correspond to the length of the annotation vector.

  • A numeric phenotype vector, either continuous or binary.

The example data used below were generated using the DGP function provided with the package. The data set includes 100 subjects, 300 variants, and a continuous phenotype. The true effect sizes follow an allelic series, with magnitudes proportional to c(1, 2, 3) for BMVs, DMVs, and PTVs respectively.

set.seed(101)
n <- 100
data <- AllelicSeries::DGP(
  n = n,
  snps = 300,
  beta = c(1, 2, 3) / sqrt(n),
)

# Annotations.
anno <- data$anno
head(anno)
## [1] 2 0 0 0 0 1
# Covariates.
covar <- data$covar
head(covar)
##      int         age sex         pc1        pc2        pc3
## [1,]   1  0.06340401   1  1.54356127 -2.3816890  0.3378446
## [2,]   1 -0.26084149   1 -0.34178454 -1.4227627 -0.2141389
## [3,]   1  0.10148711   1  1.25528541  0.4608104 -0.6585620
## [4,]   1  0.48607024   1 -0.87800487  0.4316516  0.3552285
## [5,]   1 -0.52207993   1 -0.03553949 -0.1124594 -0.4243847
## [6,]   1 -1.41018843   0  0.37024952  1.3373260 -0.4423871
# Genotypes.
geno <- data$geno
head(geno[,1:5])
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    0    0    0    0
## [2,]    0    0    0    0    0
## [3,]    0    0    0    0    0
## [4,]    0    0    0    0    0
## [5,]    0    0    0    0    0
## [6,]    0    0    0    0    0
# Phenotype.
pheno <- data$pheno
head(pheno)
## [1]  3.1059934  1.1354490  0.4306503  0.7834604  1.0421679 -0.4964317

The example data generated by the preceding are available under vignettes/vignette_data.

Running the alleic series test

The COding-variant Allelic Series Test (COAST) is run using the COAST function. By default, the output of COAST includes a data.frame of counts showing the number of alleles, variants, and carriers in each class that contributed to the test, and a data.frame of p-values, with the omni test denoting the final, overall p-value. Inspection of the component p-values may be useful for determining which model(s) detected evidence of an association. In the present case, the baseline count model provided the greatest power.

results <- AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar
)
show(results)
## Counts:
##   anno alleles variants carriers
## 1    0     287      163       96
## 2    1     162      109       82
## 3    2      61       28       45
## 
## 
## P-values:
##           test   type     pval
## 1     baseline burden 3.11e-26
## 2          ind burden 1.32e-09
## 3    max_count burden 3.08e-10
## 4      max_ind burden 5.37e-09
## 5    sum_count burden 1.66e-20
## 6      sum_ind burden 2.55e-11
## 7 allelic_skat   skat 2.66e-07
## 8         omni   omni 3.74e-25

The counts data.frame is accessed via:

results@Counts
##   anno alleles variants carriers
## 1    0     287      163       96
## 2    1     162      109       82
## 3    2      61       28       45

and the p-values data.frame via:

results@Pvals
##           test   type         pval
## 1     baseline burden 3.112702e-26
## 2          ind burden 1.322084e-09
## 3    max_count burden 3.076876e-10
## 4      max_ind burden 5.374363e-09
## 5    sum_count burden 1.661854e-20
## 6      sum_ind burden 2.554417e-11
## 7 allelic_skat   skat 2.656037e-07
## 8         omni   omni 3.735235e-25

Test options

  • apply_int = TRUE applies the rank-based inverse normal transformation from the RNOmni package. This transformation is expected to improve power for phenotypes that have a skewed or kurtotic (e.g. long-tailed) distribution. It is applied by default in the case of continuous phenotype, and is ignored in the case of a binary phenotype.
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  apply_int = TRUE
)
  • include_orig_skato_all = TRUE includes standard SKAT-O applied to all variants as a component of the omnibus test, while include_orig_skato_ptv = TRUE includes standard SKAT-O applied to PTVs only. Including standard SKAT-O as a component of the omnibus test can improve power to detect associations between the phenotype and genes that may not be allelic series.
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  include_orig_skato_all = TRUE,
  include_orig_skato_ptv = TRUE,
)
  • is_pheno_binary = TRUE is required to indicate that the supplied phenotype is binary, and should be analyzed using a logistic regression model.
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = 1 * (pheno > 0),
  covar = covar,
  is_pheno_binary = TRUE
)
  • min_mac is used to filter the variant set to those containing a minimum minor allele count (MAC). The following example filters to only those variants with a MAC of at least 2:
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  min_mac = 2
)
  • return_omni_only = TRUE is used to return p_omni only when the component p-values are not of interest:
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  return_omni_only = TRUE
)
  • score_test = TRUE specifies the use of a score-type allelic series burden test. The default of score_test = FALSE specifies a Wald-type allelic series burden test.
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  score_test = TRUE
)
  • weights specifies the relative phenotypic effects of BMVs, DMVs, and PTVs. An increasing pattern such as the default setting of weights = c(1, 2, 3) targets allelic series. Setting weights = c(1, 1, 1) would target a genetic architecture where all variants have equivalent expected magnitudes.
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  weights = c(1, 2, 3)
)