Package 'KnockoffTrio'

Title: GWAS with Trio and Duo Data using Knockoff Statistics for FDR Control
Description: Identification of putative causal variants in genome-wide association studies with trio and duo families. The package calculates the W feature statistics from KnockoffTrio and p-values from the family-based association test (FBAT) using trio and/or duo data. Compared to previous versions, a significant improvement has been made in Version 1.1.0 to allow the package to be applied not only to trio families but also to duo families. The package implements the methods in the paper: "Yang, Y., Wang, C., Liu, L., Buxbaum, J., He, Z., & Ionita-Laza, I. (2022). KnockoffTrio: A knockoff framework for the identification of putative causal variants in genome-wide association studies with trio design. The American Journal of Human Genetics, 109(10), 1761-1776."
Authors: Yi Yang [aut, cre]
Maintainer: Yi Yang <[email protected]>
License: GPL-3
Version: 1.1.0
Built: 2025-02-22 06:52:29 UTC
Source: CRAN


Identification of putative causal loci

Description

Identification of putative causal loci using KnockoffTrio's feature statistics

Usage

causal_loci(window, M = 10, fdr = 0.1)

Arguments

window

The result window from function KnockoffTrio. If there are multiple result windows (e.g., when you analyze multiple regions in the genome), please use rbind to combine all the windows before running causal_loci.

M

A positive integer for the number of knockoffs. The default is 10.

fdr

A real number in the range (0,1) indicating the target FDR level. The default is 0.1. A larger value such as 0.2 gives more lenient FDR control.

Value

A list that contains the following elements for claiming significance using knockoff statistics. The result window also contains FBAT p-values and ACAT-combined p-values, which can be used for claiming significance in addition to knockoff statistics. If p-values are used, a Bonferroni correction is usually necessary to adjust for multiple testing and control the family-wise error rate; see the examples below.

window

A data frame for an updated result window that includes an extra column for KnockoffTrio's Q-values. A locus with a Q-value <= the target FDR level, i.e., window$q<=fdr, is considered putatively causal at the target FDR.

thr.w

A positive real number indicating the significance threshold for KnockoffTrio's feature statistics. A locus with a feature statistic >= thr.w, i.e., window$w>=thr.w, is considered putatively causal at the target FDR. The loci selected by window$w>=thr.w are the same as those selected by window$q<=fdr. No loci are selected at the target FDR level if thr.w=Inf.
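
To give some intuition for how a threshold such as thr.w can end up as Inf, the sketch below implements the classical single-knockoff "knockoff+" rule from the general knockoff literature. It is an illustration only and is not the multiple-knockoff procedure that causal_loci applies using the kappa and tau columns; the function name knockoff_plus_threshold and the toy statistics are hypothetical.

```r
# Single-knockoff "knockoff+" threshold (illustration only; NOT the
# multiple-knockoff rule used by causal_loci).
knockoff_plus_threshold <- function(w, fdr = 0.1) {
  # Candidate thresholds are the distinct nonzero magnitudes of w.
  candidates <- sort(unique(abs(w[w != 0])))
  for (thr_candidate in candidates) {
    # Estimated false discovery proportion at this candidate threshold.
    fdp_hat <- (1 + sum(w <= -thr_candidate)) / max(1, sum(w >= thr_candidate))
    if (fdp_hat <= fdr) return(thr_candidate)
  }
  Inf  # no candidate reaches the target FDR, so nothing is selected
}

# Many large positive statistics: a finite threshold exists.
w_many <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.5, -0.2)
thr_many <- knockoff_plus_threshold(w_many, fdr = 0.1)

# Too few signals: the threshold is Inf and no locus is selected.
w_few <- c(3, 2, -1)
thr_few <- knockoff_plus_threshold(w_few, fdr = 0.1)
```

With only a handful of loci, the "+1" in the numerator keeps the estimated false discovery proportion above a strict target, which is why a threshold of Inf is a legitimate outcome rather than an error.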

Examples

data(KnockoffTrio.example)
knockoff<-create_knockoff(trio.hap=KnockoffTrio.example$trio.hap,
          duo.hap=KnockoffTrio.example$duo.hap, pos=KnockoffTrio.example$pos, M=10)
window<-KnockoffTrio(trio=KnockoffTrio.example$trio, trio.ko=knockoff$trio.ko,
        duo=knockoff$duo, duo.ko=knockoff$duo.ko, pos=KnockoffTrio.example$pos)

#Identification of significant loci using KnockoffTrio's feature statistics (W or Q) 
#at a target FDR
target_fdr<-0.1
result<-causal_loci(window,M=10,fdr=target_fdr)
sig_loci_by_w_index<-which(result$window$w>=result$thr.w)
sig_loci_by_q_index<-which(result$window$q<=target_fdr)

#Identification of significant loci using FBAT p-values with Bonferroni correction
#for controlling the family-wise error rate at 0.05
sig_loci_by_p_fbat_index<-which(window$p.burden<0.05/nrow(window))

#Identification of significant loci using ACAT p-values with Bonferroni correction
#for controlling the family-wise error rate at 0.05
sig_loci_by_p_acat_index<-which(window$p<0.05/nrow(window))
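
When several regions are analysed separately, the Bonferroni denominator should count every window in the combined result. The base-R sketch below uses made-up data frames as stand-ins for KnockoffTrio output; the object names window_region1 and window_region2 and all values are hypothetical, and only the column names chr, start, end and p follow the result window described in this manual.

```r
# Toy stand-ins for two result windows; real ones come from KnockoffTrio().
window_region1 <- data.frame(chr = "1", start = c(100, 200), end = c(199, 299),
                             p = c(0.004, 0.30))
window_region2 <- data.frame(chr = "2", start = c(500, 600), end = c(599, 699),
                             p = c(0.02, 0.60))

# Combine all windows first so the correction covers every test performed.
window_all <- rbind(window_region1, window_region2)

# Bonferroni threshold for a family-wise error rate of 0.05.
bonferroni_thr <- 0.05 / nrow(window_all)
sig_index <- which(window_all$p < bonferroni_thr)
```

The same combined data frame is what would be passed to causal_loci, so that the knockoff threshold and the p-value threshold refer to the same set of windows.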

Create knockoff genotype data

Description

Create knockoff genotype data using phased haplotype data.

Usage

create_knockoff(
  trio.hap = NULL,
  duo.hap = NULL,
  pos,
  M = 10,
  maxcor = 0.7,
  maxbp = 80000,
  phasing.dad = NULL,
  phasing.mom = NULL,
  seed = 100
)

Arguments

trio.hap

A 6n*p matrix for trio haplotype data, in which n is the number of trios and p is the number of variants. Each trio must consist of father, mother, and offspring (in this order). The haplotypes must be coded as 0 or 1. Missing haplotypes are not allowed.

duo.hap

A 4m*p matrix for duo haplotype data, in which m is the number of duos and p is the number of variants. Each duo must consist of a single parent and offspring (in this order). The haplotypes must be coded as 0 or 1. Missing haplotypes are not allowed.

pos

A numeric vector of length p for the position of p variants.

M

A positive integer for the number of knockoffs. The default is 10.

maxcor

A real number in the range [0,1] used for hierarchical clustering of neighboring variants when generating knockoff parents. The default is 0.7.

maxbp

A positive integer for the size, in base pairs, of the neighboring region used to generate knockoff parents. The default is 80000.

phasing.dad

A numeric vector of length n that contains 1 or 2 to indicate which paternal haplotype was transmitted to offspring in each trio. If NULL, the function will calculate the phasing information based on the input trio haplotype matrix.

phasing.mom

A numeric vector of length n that contains 1 or 2 to indicate which maternal haplotype was transmitted to offspring in each trio. If NULL, the function will calculate the phasing information based on the input trio haplotype matrix.

seed

An integer for the random seed used for knockoff generation. The default is 100.
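
The input requirements above (row count a multiple of 6 or 4, 0/1 coding, no missing values, pos of length p) can be checked before calling create_knockoff. The helper check_hap_input below is a hypothetical convenience function, not part of the package, and the toy matrix is made up.

```r
# Hypothetical pre-flight check for a haplotype matrix (not a package function).
# rows_per_family is 6 for trios and 4 for duos.
check_hap_input <- function(hap, pos, rows_per_family) {
  stopifnot(is.matrix(hap), is.numeric(hap))
  if (anyNA(hap)) stop("missing haplotypes are not allowed")
  if (!all(hap %in% c(0, 1))) stop("haplotypes must be coded as 0 or 1")
  if (nrow(hap) %% rows_per_family != 0) {
    stop("number of rows must be a multiple of ", rows_per_family)
  }
  if (length(pos) != ncol(hap)) stop("length(pos) must equal ncol(hap)")
  invisible(nrow(hap) / rows_per_family)  # number of families
}

# One toy trio (6 haplotype rows) with 3 variants.
toy_trio_hap <- matrix(c(0, 1, 1,
                         1, 1, 0,
                         0, 0, 1,
                         1, 0, 0,
                         0, 1, 1,
                         1, 0, 0), nrow = 6, byrow = TRUE)
toy_pos <- c(1000, 1500, 2300)
n_trios <- check_hap_input(toy_trio_hap, toy_pos, rows_per_family = 6)
```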

Value

A list that contains:

trio.ko

A 3n*p*M array for knockoff trio genotype data if trio.hap is provided.

duo.ko

A 3m*p*M array for knockoff duo genotype data if duo.hap is provided.

duo

A 3m*p matrix for duo genotype data if duo.hap is provided.

Examples

data(KnockoffTrio.example)
knockoff<-create_knockoff(trio.hap=KnockoffTrio.example$trio.hap,
          duo.hap=KnockoffTrio.example$duo.hap, pos=KnockoffTrio.example$pos, M=10)
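
The relationship between a 6n*p haplotype matrix and a 3n*p genotype matrix can be sketched as a sum over each individual's two haplotype rows. This assumes that the two haplotypes of an individual occupy adjacent rows, which this manual does not state explicitly (it only gives the father, mother, offspring order); comparing KnockoffTrio.example$trio.hap with KnockoffTrio.example$trio is one way to check the layout. The function hap_to_geno is hypothetical and not part of the package.

```r
# Hypothetical helper: collapse pairs of haplotype rows into genotype rows.
# ASSUMPTION: rows 1-2 belong to the first individual, rows 3-4 to the next, etc.
hap_to_geno <- function(hap) {
  stopifnot(is.matrix(hap), nrow(hap) %% 2 == 0)
  first <- seq(1, nrow(hap), by = 2)
  hap[first, , drop = FALSE] + hap[first + 1, , drop = FALSE]
}

# One toy trio with 2 variants: father, mother, offspring (2 rows each).
toy_hap <- matrix(c(0, 1,   # father, haplotype 1
                    1, 1,   # father, haplotype 2
                    0, 0,   # mother, haplotype 1
                    1, 0,   # mother, haplotype 2
                    0, 1,   # offspring, paternal haplotype
                    1, 0),  # offspring, maternal haplotype
                  nrow = 6, byrow = TRUE)
toy_geno <- hap_to_geno(toy_hap)  # 3 x 2 matrix coded 0, 1, or 2
```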

Calculate KnockoffTrio's feature statistics

Description

Calculate KnockoffTrio's feature statistics and FBAT statistics using original and knockoff genotype data.

Usage

KnockoffTrio(
  trio,
  trio.ko = NULL,
  duo = NULL,
  duo.ko = NULL,
  pos,
  start = NULL,
  end = NULL,
  size = c(1, 1000, 5000, 10000, 20000, 50000),
  p_value_only = FALSE,
  adjust_for_cov = FALSE,
  y = NULL,
  chr = "1",
  xchr = FALSE,
  sex = NULL
)

Arguments

trio

A 3n*p matrix for the trio genotype data, in which n is the number of trios and p is the number of variants. Each trio must consist of father, mother, and offspring (in this order). The genotypes must be coded as 0, 1, or 2. Missing genotypes are not allowed.

trio.ko

A 3n*p*M array for the knockoff trio genotype data created by function create_knockoff. M is the number of knockoffs.

duo

A 3m*p matrix for the duo genotype data created by function create_knockoff, in which m is the number of duos and p is the number of variants. Please do not use the original 2m*p duo genotype matrix.

duo.ko

A 3m*p*M array for the knockoff duo genotype data created by function create_knockoff. M is the number of knockoffs.

pos

A numeric vector of length p for the position of p variants.

start

An integer for the first position of sliding windows. If NULL, start=min(pos). Set this only if you would like to use the same starting position for different cohorts/analyses.

end

An integer for the last position of sliding windows. If NULL, end=max(pos). Set this only if you would like to use the same ending position for different cohorts/analyses.

size

A numeric vector for the size(s) of sliding windows when scanning the genome.

p_value_only

A logical value indicating whether to skip the knockoff analysis. When p_value_only is TRUE, only the ACAT-combined p-values are calculated for each window. When p_value_only is FALSE, trio.ko or duo.ko is required, and KnockoffTrio's feature statistics are calculated for each window in addition to the p-values. The default is FALSE.

adjust_for_cov

A logical value indicating whether to adjust for covariates. When adjust_for_cov is TRUE, y is required.

y

A numeric vector of length n for the residual Y-Y_hat. Y_hat is the predicted value from the regression model in which the quantitative trait Y is regressed on the covariates. If Y is dichotomous, you may treat Y as quantitative when applying the regression model.

chr

A character for the name of the chromosome, e.g., "1", "2", ..., "22", and "X".

xchr

A logical value indicating whether the analysis is for the X chromosome. When xchr is TRUE, the analysis is for the X chromosome and sex is required. When xchr is FALSE, the analysis is for the autosomes. The default is FALSE.

sex

A numeric vector of length n for the sex of offspring. 0s indicate females and 1s indicate males. Sex is required when xchr is TRUE.

Value

A data frame for analysis results from KnockoffTrio and FBAT. The data frame contains the following columns if p_value_only is FALSE:

chr

The chromosome name.

start, end

The start and end position of a window.

actual_start, actual_end

The position of the first and last variant in a window.

n

The number of variants in a window.

dir

The direction of effect of the most significant variant in a window.

w

The W knockoff feature statistic for a window. Please use function causal_loci to obtain the significance threshold for w at target FDRs.

p

The ACAT-combined p-value for a window. If a window contains multiple variants (i.e., n>1), ACAT combines FBAT p-values for each variant and a burden FBAT p-value for all variants in the window. If a window contains only one variant (i.e., n=1), the ACAT-combined p-value is equivalent to the FBAT p-value for this variant.

z

The FBAT z-score for a window. If a window contains multiple variants (i.e., n>1), z is the burden FBAT z-score for all variants in the window. If a window contains only one variant (i.e., n=1), z is the FBAT z-score for this variant.

p.burden

The FBAT p-value for a window. If a window contains multiple variants (i.e., n>1), p.burden is the burden FBAT p-value for all variants in the window. If a window contains only one variant (i.e., n=1), p.burden is the FBAT p-value for this variant.

kappa, tau

The two columns are used by function causal_loci for knockoff inference.

p_1, ..., p_M

The ACAT-combined p-values for M knockoffs.

z_1, ..., z_M

The FBAT z-scores for M knockoffs.
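
For readers unfamiliar with ACAT, the Cauchy combination behind the p column can be sketched in a few lines of base R. This is the generic ACAT formula with equal weights; the weights used inside the package are not stated in this manual, and the function name acat_combine is hypothetical.

```r
# Generic ACAT (Cauchy combination) of p-values; equal weights by default.
# ASSUMPTION: the package's internal weighting may differ from this sketch.
acat_combine <- function(p, weights = rep(1 / length(p), length(p))) {
  stopifnot(all(p > 0 & p < 1), length(p) == length(weights))
  weights <- weights / sum(weights)
  # Each p-value is mapped to a standard Cauchy variate, then averaged.
  t_stat <- sum(weights * tan((0.5 - p) * pi))
  0.5 - atan(t_stat) / pi
}

p_single <- acat_combine(0.03)           # one p-value is returned unchanged
p_multi  <- acat_combine(c(0.01, 0.5))   # dominated by the smallest p-value
```

The single-input case matches the statement above that, for a window with n=1, the ACAT-combined p-value is equivalent to the FBAT p-value of that variant.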

Examples

data(KnockoffTrio.example)
knockoff<-create_knockoff(trio.hap=KnockoffTrio.example$trio.hap,
          duo.hap=KnockoffTrio.example$duo.hap, pos=KnockoffTrio.example$pos, M=10)

#Analysis for both trios and duos
window<-KnockoffTrio(trio=KnockoffTrio.example$trio, trio.ko=knockoff$trio.ko,
        duo=knockoff$duo, duo.ko=knockoff$duo.ko, pos=KnockoffTrio.example$pos)

#Analysis for trios only
window<-KnockoffTrio(trio=KnockoffTrio.example$trio, trio.ko=knockoff$trio.ko,
        duo=NULL, duo.ko=NULL, pos=KnockoffTrio.example$pos)

#Analysis for duos only
window<-KnockoffTrio(trio=NULL, trio.ko=NULL,
        duo=knockoff$duo, duo.ko=knockoff$duo.ko, pos=KnockoffTrio.example$pos)
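
The y argument for covariate adjustment is a vector of regression residuals, which can be prepared with lm in base R. The sketch below uses simulated data; the variable names trait, age and pc1 are hypothetical, and it assumes one trait value per family, since y must have length n.

```r
# Simulated trait and covariates for n families (all values are made up).
set.seed(1)
n <- 50
covariates <- data.frame(age = rnorm(n, mean = 40, sd = 10), pc1 = rnorm(n))
trait <- 0.05 * covariates$age + rnorm(n)

# Regress the trait on the covariates and keep the residuals Y - Y_hat.
fit <- lm(trait ~ age + pc1, data = covariates)
y <- as.numeric(residuals(fit))

# y would then be supplied as KnockoffTrio(..., adjust_for_cov = TRUE, y = y).
```

For a dichotomous trait the same linear model can be used, in line with the description of y above.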

Example data for KnockoffTrio

Description

A toy example of haplotype and genotype data for trios and duos.

Usage

KnockoffTrio.example

Format

KnockoffTrio.example contains the following items:

trio

A 9*5 numeric genotype matrix of 3 trios and 5 variants. Each trio contains 3 rows in the order of father, mother and offspring. Each column represents a variant.

trio.hap

An 18*5 numeric haplotype matrix of 3 trios and 5 variants. Each trio contains 6 rows in the order of father, mother and offspring. Each column represents a variant.

duo.hap

A 12*5 numeric haplotype matrix of 3 duos and 5 variants. Each duo contains 4 rows in the order of a single parent and offspring. Each column represents a variant.

pos

A numeric vector of length 5 for the position of 5 variants.


Meta-analysis for KnockoffTrio

Description

Meta-analysis for KnockoffTrio

Usage

meta_analysis(window, n = NA, M = 10)

Arguments

window

A list of result windows, one per cohort/study, each holding the analysis results for that cohort/study.

n

A positive integer vector for the number of families in each cohort/study. For weighted meta-analysis, a study's weight is based on the number of families. The default is NA for unweighted meta-analysis.

M

A positive integer for the number of knockoffs. The default is 10.

Value

A data frame for the meta-analysis results.

Examples

data(KnockoffTrio.example)
knockoff<-create_knockoff(trio.hap=KnockoffTrio.example$trio.hap,
          duo.hap=KnockoffTrio.example$duo.hap, pos=KnockoffTrio.example$pos, M=10)
window<-KnockoffTrio(trio=KnockoffTrio.example$trio, trio.ko=knockoff$trio.ko,
        duo=knockoff$duo, duo.ko=knockoff$duo.ko, pos=KnockoffTrio.example$pos)
window.list<-list(window,window)
window.meta<-meta_analysis(window.list,M=10)
result<-causal_loci(window.meta,M=10,fdr=0.1)
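
To illustrate what weighting studies by the number of families can look like, the sketch below combines per-study z-scores with a sample-size-weighted Stouffer rule, a common convention in meta-analysis. Whether meta_analysis uses exactly this weighting is not stated in this manual, so treat it as an assumption; the function name combine_z and the numbers are hypothetical.

```r
# Sample-size-weighted Stouffer combination of z-scores for one window.
# ASSUMPTION: weights are sqrt(number of families); the package may differ.
combine_z <- function(z, n = NA) {
  if (all(is.na(n))) n <- rep(1, length(z))  # unweighted when n is NA
  stopifnot(length(n) == length(z), all(n > 0))
  w <- sqrt(n)
  sum(w * z) / sqrt(sum(w^2))
}

z_unweighted <- combine_z(c(2, 2))                   # two equally weighted studies
z_weighted   <- combine_z(c(2, 0), n = c(300, 100))  # larger study dominates
```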