Package 'IFP' reference manual

Title:	Identifying Functional Polymorphisms
Description:	A suite for identifying causal models from relative concordances, and for identifying causal polymorphisms in case-control genetic association data, especially with large re-sequenced control data sets.
Authors:	Park L
Maintainer:	Leeyoung Park <[email protected]>
License:	GPL (>= 2)
Version:	0.2.4
Built:	2025-02-28 07:52:03 UTC
Source:	CRAN

Allele Frequency Computation from Genotype Data

Description

Computes allele frequencies from genotype data.

Usage

allele.freq(geno)

Arguments

geno

matrix of alleles, such that each locus has a pair of adjacent columns of alleles, and the order of columns corresponds to the order of loci on a chromosome. If there are K loci, then ncol(geno) = 2*K. Rows represent the alleles for each subject. Each allele should be coded as a number (A=1, C=2, G=3, T=4).

Value

array of allele frequencies, one per SNP. For each SNP, the allele whose frequency is reported is selected according to the order "A", "C", "G", "T".
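As an illustration of the paired-column layout, the base-R sketch below computes one frequency per locus. It assumes that columns 2k-1 and 2k hold the two alleles of locus k and that the reported allele is the first of A, C, G, T (codes 1 to 4) observed at that locus. The helper allele_freq_sketch is hypothetical and is not the package's implementation.

```r
# Hypothetical helper (not part of IFP): frequency of the first allele, in
# A=1, C=2, G=3, T=4 order, observed at each locus of a paired-column matrix
allele_freq_sketch <- function(geno) {
  stopifnot(ncol(geno) %% 2 == 0)
  n_loci <- ncol(geno) / 2
  freqs <- numeric(n_loci)
  for (k in seq_len(n_loci)) {
    alleles <- c(geno[, 2 * k - 1], geno[, 2 * k])  # pool both allele columns of locus k
    alleles <- alleles[!is.na(alleles)]             # drop missing calls
    freqs[k] <- mean(alleles == min(alleles))       # smallest code = first allele in A, C, G, T order
  }
  freqs
}

# Toy data: 3 subjects, 2 loci (locus 1 is A/G, locus 2 is C/T)
geno <- matrix(c(1, 3, 2, 4,
                 1, 1, 2, 2,
                 3, 3, 4, 2), nrow = 3, byrow = TRUE)
allele_freq_sketch(geno)  # returns c(0.5, 2/3)
```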

Examples

data(apoe)
allele.freq(apoe7)
allele.freq(apoe)

Allele Frequency Computation from VCF-style Sequencing Data of the 1000 Genomes Project

Description

Computes allele frequencies from VCF-style sequencing data of the 1000 Genomes Project.

Usage

allele.freq.G(genoG)

Arguments

genoG

matrix of haplotypes. Each row represents a variant and each column a haplotype of an individual; alleles are coded as 0 and 1.

Value

array of allele frequencies of each variant.
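With a 0/1 haplotype matrix, an allele frequency per variant is simply a row mean. The sketch below assumes the reported allele is the one coded 1; if the package reports the allele coded 0 instead, the result is 1 minus this value. The matrix is made-up toy data.

```r
# Hypothetical 0/1 haplotype matrix: rows = variants, columns = haplotypes
genoG <- matrix(c(0, 1, 0, 0,
                  1, 1, 0, 1,
                  0, 0, 0, 0), nrow = 3, byrow = TRUE)

# Frequency of the allele coded 1 at each variant (assumed to be the reported allele)
alt_freq <- rowMeans(genoG)
alt_freq  # returns c(0.25, 0.75, 0)
```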

Examples

data(apoeG)
allele.freq.G(apoeG)

Genetic data of APOE gene region

Description

This data set comes from re-sequencing data of the APOE gene region in the Molecular Diversity and Epidemiology of Common Disease (MDECODE) database and includes sixteen polymorphic sites. The "apoe7" data set contains the seven single nucleotide polymorphisms from apoe with allele frequencies higher than 0.1.

Usage

data(apoe)

Format

A matrix with 48 rows and 32 columns

Source

http://droog.gs.washington.edu/mdecode/

References

Nickerson, D. A., S. L. Taylor, S. M. Fullerton, K. M. Weiss, A. G. Clark et al. (2000) Sequence diversity and large-scale typing of SNPs in the human apolipoprotein E gene. Genome Res 10: 1532-1545.

Sequencing data of APOE gene region from the 1000 Genomes Project

Description

This data set comes from re-sequencing data of the APOE gene region from the 1000 Genomes Project. The original data set, apoeG, includes thirty-three polymorphic sites with allele frequencies higher than 0.001. The test data sets, apoeT and apoeC, contain data for 100 controls and 100 cases, respectively, generated with the 15th variant as a dominant disease variant with an odds ratio of 3.

Usage

data(apoeG)

Format

A matrix with 33 rows and 2184 columns

Source

ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/

References

Abecasis, G. R. et al. (2010) A map of human genome variation from population-scale sequencing. Nature 467, 1061-1073.

causal models with all possible causal factors: G, GG, GE and E

Description

provides concordance probabilities of relative pairs for a causal model with G, G*G, G*E and E components

Usage

drgegggne(fdg,frg,fdgg,frgg,fdge,frge,eg,e)

Arguments

`fdg`	an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G component
`frg`	an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G component
`fdgg`	an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G*G component
`frgg`	an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G*G component
`fdge`	an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G*E component
`frge`	an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G*E component
`eg`	a proportion of the population exposed, during their entire life, to the environmental cause of G*E that interacts with the genetic cause of G*E
`e`	a proportion of the population exposed to the environmental cause during their entire life

Value

matrix of NN, ND, and DD probabilities of 9 relative pairs: 1: mzt (monozygotic twins), 2: parent-offspring, 3: dzt (dizygotic twins), 4: sibling, 5: 2-direct (grandparent-grandchild), 6: 3rd (uncle-niece), 7: 3-direct (great-grandparent-great-grandchild), 8: 4th (cousin), 9: 4d (great-great-grandparent-great-great-grandchild)

Examples

### PLI=0.01.
ppt<-0.01



### for a model with one or more causal factors missing,
### set the relevant parameters to zero.

pg<-0.002  # the proportion of G component in total populations
pgg<-0.002  # the proportion of G*G component in total populations
pge<-0.003  # the proportion of G*E component in total populations
e<-1-(1-ppt)/(1-pg)/(1-pgg)/(1-pge)   
   # the proportion of E component in total populations

fd<-0.001  # one dominant gene
tt<-3      # the number of recessive genes

temp<-sqrt(1-((1-pg)/(1-fd)^2)^(1/tt))
fr<-c(array(0,length(fd)),array(temp,tt))
fd<-c(fd,array(0,tt))

ppd<-sqrt(pgg)
fdg<-array(1-sqrt(1-ppd^(1/2)),2)
ttg<-1
temp<-(pgg/ppd)^(1/2/ttg)
frg<-c(array(0,length(fdg)),array(temp,ttg))
fdg<-c(fdg,array(0,ttg))

ppe<-0.5
ppg<-pge/ppe

fdge<-0.002
ttge<-2      # the number of recessive genes

temp<-sqrt(1-((1-ppg)/(1-fdge)^2)^(1/ttge))
frge<-c(array(0,length(fdge)),array(temp,ttge))
fdge<-c(fdge,array(0,ttge))


drgegggne(fd,fr,fdg,frg,fdge,frge,ppe,e)
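The E proportion in the example is obtained by treating G, G*G, G*E and E as independent causes, so that a person is unaffected only if no component affects them and 1 - PLI is the product of the per-component probabilities of being unaffected. This reading is inferred from the example's formula rather than stated in this manual. A short R check of that bookkeeping with the example's values:

```r
# Values taken from the example above
ppt <- 0.01   # PLI
pg  <- 0.002  # G component
pgg <- 0.002  # G*G component
pge <- 0.003  # G*E component

# Solve for the E component, as in the example
e <- 1 - (1 - ppt) / (1 - pg) / (1 - pgg) / (1 - pge)

# Assumed independence: 1 - PLI is the product of per-component
# probabilities of being unaffected
pli <- 1 - (1 - pg) * (1 - pgg) * (1 - pge) * (1 - e)
c(e = e, pli = pli)
```

Setting any of pg, pgg or pge to zero drops that factor from the product, which is how the reduced models in this manual derive e.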

causal models with three possible causal factors: G, G*E and E

Description

provides concordance probabilities of relative pairs for a causal model with G, G*E and E components

Usage

drgegne(fdg,frg,fdge,frge,eg,e)

Arguments

`fdg`	an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G component
`frg`	an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G component
`fdge`	an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G*E component
`frge`	an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G*E component
`eg`	a proportion of the population exposed, during their entire life, to the environmental cause of G*E that interacts with the genetic cause of G*E
`e`	a proportion of the population exposed to the environmental cause during their entire life

Value

Examples

### PLI=0.01.
ppt<-0.01



pg<-0.002  # the proportion of G component in total populations
pge<-0.005  # the proportion of G*E component in total populations
e<-1-(1-ppt)/(1-pg)/(1-pge)   
  # the proportion of E component in total populations

fd<-0.001  # one dominant gene
tt<-2      # the number of recessive genes

temp<-sqrt(1-((1-pg)/(1-fd)^2)^(1/tt))
fr<-c(array(0,length(fd)),array(temp,tt))
fd<-c(fd,array(0,tt))

ppe<-0.5
ppg<-pge/ppe

fdge<-0.002
ttge<-2      # the number of recessive genes

temp<-sqrt(1-((1-ppg)/(1-fdge)^2)^(1/ttge))
frge<-c(array(0,length(fdge)),array(temp,ttge))
fdge<-c(fdge,array(0,ttge))


drgegne(fd,fr,fdge,frge,ppe,e)

causal models with G*E

Description

provides concordance probabilities of relative pairs for a causal model with G*E component

Usage

drgen(fd,fr,e)

Arguments

`fd`	an array (size=number of dominant genes+recessive genes) of dominant gene frequencies, including 0 values for the recessive genes, of the genetic component of G*E, which interacts with the environmental component of G*E
`fr`	an array (size=number of dominant genes+recessive genes) of recessive gene frequencies, including 0 values for the dominant genes, of the genetic component of G*E, which interacts with the environmental component of G*E
`e`	a proportion of the population exposed, during their entire life, to the environmental cause of G*E that interacts with the genetic cause of G*E

Value

a list of the G*E proportion in the population and a matrix of NN, ND, and DD probabilities of 9 relative pairs: 1: mzt (monozygotic twins), 2: parent-offspring, 3: dzt (dizygotic twins), 4: sibling, 5: 2-direct (grandparent-grandchild), 6: 3rd (uncle-niece), 7: 3-direct (great-grandparent-great-grandchild), 8: 4th (cousin), 9: 4d (great-great-grandparent-great-great-grandchild)

Examples

### PLI=0.01.
ppt<-0.01



### g*e model

pge<-ppt  # the proportion of G*E component in total populations

ppe<-0.5
ppg<-pge/ppe

fd<-0.0005  # one dominant gene
tt<-3      # the number of recessive genes

temp<-sqrt(1-((1-ppg)/(1-fd)^2)^(1/tt))
fr<-c(array(0,length(fd)),array(temp,tt))
fd<-c(fd,array(0,tt))

drgen(fd,fr,ppe)
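In the G*E example, the G*E proportion is split into an exposure proportion ppe and a genetic-susceptibility proportion ppg = pge/ppe. This is consistent with assuming (an inference from the formulas, not a documented statement) that exposure and genotype are independent, that G*E affects a person only when both are present, and that the genetic part follows the same dominant-or-recessive model as the G component. A check under those assumptions; fr_val and p_susceptible are illustrative names:

```r
pge <- 0.01    # target G*E proportion (equal to the PLI in the example)
ppe <- 0.5     # proportion exposed to the interacting environment
ppg <- pge / ppe
fd  <- 0.0005  # dominant gene frequency
tt  <- 3       # number of recessive genes
fr_val <- sqrt(1 - ((1 - ppg) / (1 - fd)^2)^(1 / tt))

# Assumed genetic model: susceptible if >= 1 dominant allele or homozygous at any recessive gene
p_susceptible <- 1 - (1 - fd)^2 * (1 - fr_val^2)^tt
# Assumed independence of exposure and genotype
p_ge <- ppe * p_susceptible
c(ppg = ppg, p_susceptible = p_susceptible, p_ge = p_ge)
```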

causal models with G*E and E

Description

provides concordance probabilities of relative pairs for a causal model with G*E and E components

Usage

drgene(fdg,frg,eg,e)

Arguments

`fdg`	an array (size=number of dominant genes+recessive genes) of dominant gene frequencies, including 0 values for the recessive genes, of the genetic component of G*E, which interacts with the environmental component of G*E
`frg`	an array (size=number of dominant genes+recessive genes) of recessive gene frequencies, including 0 values for the dominant genes, of the genetic component of G*E, which interacts with the environmental component of G*E
`eg`	a proportion of the population exposed, during their entire life, to the environmental cause of G*E that interacts with the genetic cause of G*E
`e`	a proportion of the population exposed to the environmental cause during their entire life

Value

Examples

### PLI=0.01.
ppt<-0.01



### g*e+e model

pge<-0.007  # the proportion of G*E component in total populations
e<-1-(1-ppt)/(1-pge)   # the proportion of E component in total populations

ppe<-0.5
ppg<-pge/ppe

fd<-0.0005  # one dominant gene
tt<-3      # the number of recessive genes

temp<-sqrt(1-((1-ppg)/(1-fd)^2)^(1/tt))
fr<-c(array(0,length(fd)),array(temp,tt))
fd<-c(fd,array(0,tt))

drgene(fd,fr,ppe,e)

causal models with G*G

Description

provides concordance probabilities of relative pairs for a causal model with G*G component

Usage

drggn(fd,fr)

Arguments

`fd`	an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G*G component
`fr`	an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G*G component

Value

a list of the PLI and a matrix of NN, ND, and DD probabilities of 9 relative pairs: 1: mzt (monozygotic twins), 2: parent-offspring, 3: dzt (dizygotic twins), 4: sibling, 5: 2-direct (grandparent-grandchild), 6: 3rd (uncle-niece), 7: 3-direct (great-grandparent-great-grandchild), 8: 4th (cousin), 9: 4d (great-great-grandparent-great-great-grandchild)

Examples

### PLI=0.01.
ppt<-0.01



### g*g model

pp<-ppt  # the proportion of G*G component in total populations

gd<-sqrt(pp) # dominant gene proportion = recessive gene proportion
fd<-array(1-sqrt(1-gd^(1/2)),2)  # two dominant genes
tt<-2      # the number of recessive genes: 2

temp<-(pp/gd)^(1/2/tt)
fr<-c(array(0,length(fd)),array(temp,tt))
fd<-c(fd,array(0,tt))

drggn(fd,fr)
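The G*G example splits the target proportion equally between the dominant and recessive parts (gd = sqrt(pp)). Its formulas are consistent with an assumed model, inferred here rather than documented, in which a person carries the G*G cause only if they carry at least one copy of every dominant gene and are homozygous at every recessive gene. A round-trip check under that assumption:

```r
pp <- 0.01                     # target G*G proportion
gd <- sqrt(pp)                 # share assigned to the dominant genes
fd <- 1 - sqrt(1 - gd^(1/2))   # frequency of each of the two dominant genes
tt <- 2                        # number of recessive genes
fr_val <- (pp / gd)^(1 / 2 / tt)

p_dominant  <- (1 - (1 - fd)^2)^2  # carries >= 1 copy of both dominant genes
p_recessive <- (fr_val^2)^tt       # homozygous at every recessive gene
p_gg <- p_dominant * p_recessive
c(p_dominant = p_dominant, p_recessive = p_recessive, p_gg = p_gg)
```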

causal models with G

Description

provides concordance probabilities of relative pairs for a causal model with G component

Usage

drgn(fd,fr)

Arguments

`fd`	an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G component
`fr`	an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G component

Value

a list of the PLI value and a matrix of NN, ND, and DD probabilities of 9 relative pairs: 1: mzt (monozygotic twins), 2: parent-offspring, 3: dzt (dizygotic twins), 4: sibling, 5: 2-direct (grandparent-grandchild), 6: 3rd (uncle-niece), 7: 3-direct (great-grandparent-great-grandchild), 8: 4th (cousin), 9: 4d (great-great-grandparent-great-great-grandchild)

Examples

### PLI=0.01.
ppt<-0.01



### g model

pp<-ppt  # the proportion of G component in total populations

fdt<-0.001 # one dominant gene with frequency of 0.001
tt<-5      # the number of recessive genes: 5

fd<-c(fdt,array(0,tt))
temp<-sqrt(1-((1-pp)/(1-fdt)^2)^(1/tt))
fr<-c(0,array(temp,tt))

drgn(fd,fr)
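The expression for the recessive gene frequency in the example is consistent with an assumed model (inferred from the formula, not documented here) in which a person carries the G cause if they have at least one dominant allele or are homozygous at any recessive gene, with all genes independent. Under that model P(unaffected) = (1 - fdt)^2 (1 - fr^2)^tt, and solving for fr gives the expression used for temp. The sketch checks the round trip; fr_val is an illustrative name.

```r
pp  <- 0.01   # target proportion of the G component
fdt <- 0.001  # dominant gene frequency
tt  <- 5      # number of recessive genes

# Recessive gene frequency, computed as in the example
fr_val <- sqrt(1 - ((1 - pp) / (1 - fdt)^2)^(1 / tt))

# Assumed model: unaffected = no dominant allele AND not homozygous at any recessive gene
p_unaffected <- (1 - fdt)^2 * (1 - fr_val^2)^tt
p_g <- 1 - p_unaffected
c(fr = fr_val, p_g = p_g)
```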

Error Rates Estimation for Likelihood Ratio Tests Designed for Identifying Number of Functional Polymorphisms

Description

Compute error rates for a given model.

Usage

error.rates(H0,Z, pMc, geno, no.ca, no.con=nrow(geno), sim.no = 1000)

Arguments

`H0`	the index number for a given model for functional SNPs
`Z`	number of functional SNPs for the given model
`pMc`	array of allele frequencies of case samples
`geno`	matrix of alleles, such that each locus has a pair of adjacent columns of alleles, and the order of columns corresponds to the order of loci on a chromosome. If there are K loci, then ncol(geno) = 2*K. Rows represent the alleles for each subject. Each allele should be coded as a number (A=1, C=2, G=3, T=4).
`no.ca`	number of case chromosomes
`no.con`	number of control chromosomes
`sim.no`	number of simulations for error rates estimation

Value

array of results consisting of: Type I error rate (alpha=0.05), Type I error rate (alpha=0.01), Type II error rate (beta=0.05), Type II error rate (beta=0.01), and the percentage of simulations in which the target model has the lowest corrected -2 log likelihood ratio.
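A simulated Type I error rate is, in general terms, the fraction of data sets generated under the null model in which the test rejects at the chosen alpha. The sketch below illustrates that idea for a single SNP. Pearson's chi-square test from prop.test stands in for the package's multi-SNP likelihood ratio test, and the allele frequency and sample sizes are arbitrary assumptions.

```r
set.seed(1)
sim_no <- 2000   # number of simulated data sets
n_ca   <- 2000   # case chromosomes
n_con  <- 2000   # control chromosomes
p0     <- 0.3    # common allele frequency under the null hypothesis

p_values <- replicate(sim_no, {
  x_ca  <- rbinom(1, n_ca, p0)    # allele count in cases
  x_con <- rbinom(1, n_con, p0)   # allele count in controls
  prop.test(c(x_ca, x_con), c(n_ca, n_con), correct = FALSE)$p.value
})

# Empirical Type I error rates at alpha = 0.05 and 0.01
type1 <- c(alpha_0.05 = mean(p_values < 0.05), alpha_0.01 = mean(p_values < 0.01))
type1
```

As with sim.no in error.rates, too few simulations give unstable estimates; with 2000 data sets the standard error at alpha = 0.05 is about 0.005.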

Examples

## LRT tests when SNP1 & SNP6 are the functional polymorphisms.


data(apoe)

n<-c(2000, 2000, 2000, 2000, 2000, 2000, 2000) #case sample size = 1000
x<-c(1707, 281,1341, 435, 772, 416, 1797) #allele numbers in case samples 

Z<-2 	#number of functional SNPs for tests
n.poly<-ncol(apoe7)/2 	#total number of SNPs

#the index number of the model with SNP1 and SNP6 as functional SNPs is 5.
#apoe7 is considered to represent the true control allele and haplotype frequencies.
#Control sample size = 1000.

error.rates(5, 2, x/n, apoe7, 2000, 2000, sim.no=2)

# to obtain valid rates, use sim.no=1000.

Genotype Frequency Computation from VCF-style Sequencing Data of the 1000 Genomes Project

Description

Computes genotype frequencies from VCF-style sequencing data of the 1000 Genomes Project.

Usage

geno.freq(genoG)

Arguments

genoG

matrix of haplotypes. Each row represents a variant and each column a haplotype of an individual; alleles are coded as 0 and 1.

Value

matrix of genotype frequencies of each variant.
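A minimal sketch of genotype frequencies from a 0/1 haplotype matrix. It assumes, as in phased 1000 Genomes VCF output, that columns 2i-1 and 2i are the two haplotypes of individual i, and it codes genotypes as the number of copies of allele 1. The layout of the package's returned matrix may differ.

```r
# Hypothetical 0/1 haplotype matrix: 2 variants x 6 haplotypes (3 individuals)
genoG <- matrix(c(0, 0, 0, 1, 1, 1,
                  0, 1, 1, 0, 0, 0), nrow = 2, byrow = TRUE)

# Assumption: columns 2i-1 and 2i are the two haplotypes of individual i
hap1 <- genoG[, seq(1, ncol(genoG), by = 2), drop = FALSE]
hap2 <- genoG[, seq(2, ncol(genoG), by = 2), drop = FALSE]
dosage <- hap1 + hap2   # copies of allele 1 per individual (0, 1 or 2)

# Genotype frequencies per variant (rows) for 0, 1 and 2 copies (columns)
gfreq <- t(apply(dosage, 1, function(d) tabulate(d + 1, nbins = 3) / length(d)))
colnames(gfreq) <- c("0", "1", "2")
gfreq
```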

Examples

data(apoeG)
geno.freq(apoeG)

Conversion from Alleles to Genotypes Using VCF-style Sequencing Data of the 1000 Genomes Project

Description

Converts sequencing data to genotypes.

Usage

genotype(genoG)

Arguments

genoG

matrix of haplotypes. Each row represents a variant and each column a haplotype of an individual; alleles are coded as 0 and 1.

Value

matrix of genotypes with rows of variants and with columns of individuals.
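A sketch of the haplotype-to-genotype conversion, assuming that columns 2i-1 and 2i hold the two haplotypes of individual i and that a genotype is coded as the number of copies of allele 1. Under that assumption the 2184 haplotype columns of apoeG would map to 1092 individuals. The package's own genotype coding is not specified in this manual.

```r
# Hypothetical 0/1 haplotype matrix: 2 variants x 4 haplotypes (2 individuals)
genoG <- matrix(c(0, 1, 1, 1,
                  0, 0, 1, 0), nrow = 2, byrow = TRUE)

# Assumption: adjacent column pairs belong to the same individual
odd <- seq(1, ncol(genoG), by = 2)
geno_dosage <- genoG[, odd, drop = FALSE] + genoG[, odd + 1, drop = FALSE]
geno_dosage  # rows = variants, columns = individuals
```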

Examples

data(apoeG)
genotype(apoeG)

Estimation of Haplotype Frequencies with Two SNPs

Description

EM computation of haplotype frequencies with two SNPs. The computation relies on the package "haplo.stats".

Usage

hap.freq(geno)

Arguments

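geno

matrix of alleles in the same format as for allele.freq: each locus has a pair of adjacent columns of alleles, rows represent subjects, and alleles are coded as numbers (A=1, C=2, G=3, T=4).
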
Value

matrix of haplotype frequencies, where each haplotype consists of one allele from each of the two SNPs. These alleles are the same ones whose frequencies are computed by the function "allele.freq".
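The package delegates this step to haplo.stats. To show what an EM for two SNPs does, here is a self-contained sketch of the classic two-locus EM for unphased genotypes. It is a swapped-in textbook algorithm, not the haplo.stats implementation, and it assumes biallelic SNPs with genotypes coded as 0, 1 or 2 copies of the allele labelled A (first SNP) or B (second SNP). Only double heterozygotes have unknown phase, so the E-step splits them between AB/ab and Ab/aB.

```r
# Sketch of the two-locus EM for unphased genotypes (not haplo.stats).
# g1, g2: genotypes at SNP 1 and SNP 2, coded as copies (0, 1, 2) of allele A and B
hap_freq_em_sketch <- function(g1, g2, max_iter = 1000, tol = 1e-10) {
  # Map (has A, has B) to a haplotype index: 1 = AB, 2 = Ab, 3 = aB, 4 = ab
  hap_index <- function(a, b) ifelse(a, ifelse(b, 1L, 2L), ifelse(b, 3L, 4L))
  dh <- g1 == 1 & g2 == 1                 # double heterozygotes: phase unknown
  # Phase-known individuals: first haplotype takes the first copy of each allele
  h1 <- hap_index(g1[!dh] >= 1, g2[!dh] >= 1)
  h2 <- hap_index(g1[!dh] == 2, g2[!dh] == 2)
  known <- tabulate(as.integer(c(h1, h2)), nbins = 4)
  n_dh  <- sum(dh)
  n_hap <- 2 * length(g1)
  p <- if (n_dh == 0) known / n_hap else rep(0.25, 4)
  if (n_dh > 0) {
    for (iter in seq_len(max_iter)) {
      # E-step: posterior probability that a double heterozygote is AB/ab
      w <- p[1] * p[4] / (p[1] * p[4] + p[2] * p[3])
      # M-step: expected haplotype counts divided by the number of haplotypes
      p_new <- (known + n_dh * c(w, 1 - w, 1 - w, w)) / n_hap
      converged <- max(abs(p_new - p)) < tol
      p <- p_new
      if (converged) break
    }
  }
  setNames(p, c("AB", "Ab", "aB", "ab"))
}

# Toy data: two AB/AB, two ab/ab, and two double heterozygotes
hap_freq_em_sketch(g1 = c(2, 2, 0, 0, 1, 1), g2 = c(2, 2, 0, 0, 1, 1))
```

Because the phase-known individuals carry only AB and ab, the EM assigns the double heterozygotes to AB/ab and converges to frequencies of 0.5 for AB and ab.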

Examples

data(apoe)
hap.freq(apoe7)
hap.freq(apoe)

mcmc inference of causal models with all possible causal factors: G, GG, GE and E

Description

provides proportions of each causal factor of G, G*G, G*E and E based on relative concordance data

Usage

iter.mcmc(ppt,aj=2,n.iter,n.chains,thinning=5,init.cut,darray,x,n,model,mcmcrg=0.01)

Arguments

`ppt`	population lifetime incidence
`aj`	a constant for the stage of data collection
`n.iter`	number of mcmc iterations
`n.chains`	number of mcmc chains
`thinning`	mcmc thinning parameter (default=5)
`init.cut`	mcmc data cut
`darray`	array positions of the relative pairs with available data, among 9 relative pairs: 1: mzt (monozygotic twins), 2: parent-offspring, 3: dzt (dizygotic twins), 4: sibling, 5: 2-direct (grandparent-grandchild), 6: 3rd (uncle-niece), 7: 3-direct (great-grandparent-great-grandchild), 8: 4th (cousin), 9: 4d (great-great-grandparent-great-great-grandchild)
`x`	number of disease-concordant relative pairs
`n`	total number of relative pairs
`model`	an array of size 4 (1: E component; 2: G component; 3: GE component; 4: GG component) indicating whether each causal component is included: 0: excluded; 1: included.
`mcmcrg`	parameter of the data collection stage (default=0.01)

Value

a list of rejectionRate, result summary, and Gelman-Rubin diagnostics (point est. & upper C.I.) for the output variables:
e[1]: proportion of environmental factor (E)
g[2]: proportion of genetic factor (G)
ge[3]: proportion of gene-environment interaction (G*E)
gg[4]: proportion of gene interactions (G*G)
gn[5]: number of recessive genes in G
ppe[6]: population proportion of interacting environment in G*E
ppg[7]: population proportion of interacting genetic factor in G*E
fd[8]: frequency of dominant genes in G
fdge[9]: frequency of dominant genes in G*E
gnge[10]: number of recessive genes in G*E
ppd[11]: population proportion of dominant genes in G*G
ppr[12]: population proportion of recessive genes in G*G
kd[13]: number of dominant genes in G*G
kr[14]: number of recessive genes in G*G
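The Gelman-Rubin diagnostic compares between-chain and within-chain variance; values near 1 suggest the chains agree. The sketch below computes the basic potential scale reduction factor for one parameter. It omits the degrees-of-freedom correction and the upper confidence limit that coda's gelman.diag reports, and this manual does not state which implementation iter.mcmc uses. The chains are simulated toy data.

```r
# Basic potential scale reduction factor for one parameter.
# chains: matrix with one column per chain (rows = retained iterations)
psrf_sketch <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))   # mean within-chain variance
  B <- n * var(colMeans(chains))     # between-chain variance
  V_hat <- (n - 1) / n * W + B / n   # pooled estimate of the posterior variance
  sqrt(V_hat / W)
}

set.seed(42)
mixed <- matrix(rnorm(2000), ncol = 2)                        # two chains, same target
stuck <- cbind(rnorm(1000, mean = 0), rnorm(1000, mean = 5))  # chains disagree
c(mixed = psrf_sketch(mixed), stuck = psrf_sketch(stuck))
```

A value well above 1, as for stuck, suggests that n.iter or init.cut should be increased before the estimates are used.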

References

L. Park, J. Kim, A novel approach for identifying causal models of complex disease from family data, Genetics, 2015 Apr; 199, 1007-1016.

Examples

### PLI=0.01.
ppt<-0.01

### a simple causal model with G and E components

pg<-0.007  # the proportion of G component in total populations
pgg<-0  # the proportion of G*G component in total populations
pge<-0  # the proportion of G*E component in total populations
e<-1-(1-ppt)/(1-pg)   # the proportion of E component in total populations

fd<-0.001  # one dominant gene
tt<-3      # the number of recessive genes

temp<-sqrt(1-((1-pg)/(1-fd)^2)^(1/tt))
fr<-c(array(0,length(fd)),array(temp,tt))
fd<-c(fd,array(0,tt))

rp<-drgegggne(fd,fr,c(0,0),c(0,0),c(0,0),c(0,0),0,e)

sdata<-rp[,3]/(rp[,2]+rp[,3])
#sdata<-round(sdata*500)

darray<-c(1:2,4:6)  
  ## available data= MZT, P-O, sibs, grandparent-grandchild, avuncular pair
n<-array(1000,length(darray))
x<-rbinom(length(darray),n,sdata[darray])  # simulated numbers of concordant pairs
model<-c(1,1,0,0)

## remove # from the following lines to test examples.
#iter.mcmc(ppt,2,15,2,1,1,darray,x,n,model) # provide a running test
#iter.mcmc(ppt,2,2000,2,10,500,darray,x,n,model) # provide a proper result

Likelihood Ratio Tests for Identifying Number of Functional Polymorphisms

Description

Compute p-values and likelihoods of all possible models for a given number of functional SNP(s).

Usage

lrt(n.fp, n, x, geno, no.con=nrow(geno))

Arguments

`n.fp`	number of functional SNPs for tests.
`n`	array of each total number of case sample chromosomes for SNPs
`x`	array of each total allele number in case samples
`geno`	matrix of alleles, such that each locus has a pair of adjacent columns of alleles, and the order of columns corresponds to the order of loci on a chromosome. If there are K loci, then ncol(geno) = 2*K. Rows represent the alleles for each subject. Each allele should be coded as a number (A=1, C=2, G=3, T=4).
`no.con`	number of control chromosomes.

Value

matrix of likelihood ratio test results. The first n.fp rows indicate the model for each set of disease polymorphisms, followed by p-values, -2 log(likelihood ratio) with corrections for variances, maximum likelihood ratio estimates, and likelihoods.
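To make the -2 log(likelihood ratio) column concrete, here is a generic single-SNP version comparing allele counts in cases and controls: the null model has one shared allele frequency, the alternative has separate frequencies, and the statistic is referred to a chi-square distribution with 1 degree of freedom. This is an illustrative sketch only; the package's statistic also uses control haplotype structure and variance corrections that are not reproduced here, and allelic_lrt is a hypothetical name.

```r
# Binomial log-likelihood, treating 0 * log(0) as 0
binom_loglik <- function(x, n, p) {
  ll <- 0
  if (x > 0) ll <- ll + x * log(p)
  if (n - x > 0) ll <- ll + (n - x) * log(1 - p)
  ll
}

# Hypothetical helper: single-SNP allelic likelihood ratio test
allelic_lrt <- function(x_ca, n_ca, x_con, n_con) {
  p_pool <- (x_ca + x_con) / (n_ca + n_con)               # H0: one shared frequency
  ll0 <- binom_loglik(x_ca, n_ca, p_pool) + binom_loglik(x_con, n_con, p_pool)
  ll1 <- binom_loglik(x_ca, n_ca, x_ca / n_ca) +          # H1: separate frequencies
         binom_loglik(x_con, n_con, x_con / n_con)
  stat <- -2 * (ll0 - ll1)
  c(stat = stat, p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

# Case count for SNP1 in the example below, with a hypothetical control count
allelic_lrt(x_ca = 1707, n_ca = 2000, x_con = 1600, n_con = 2000)
```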

References

L. Park, Identifying disease polymorphisms from case-control genetic association data, Genetica, 2010 138 (11-12), 1147-1159.

Examples

## LRT tests when SNP1 & SNP6 are the functional polymorphisms.

data(apoe)

n<-c(2000, 2000, 2000, 2000, 2000, 2000, 2000) #case sample size = 1000
x<-c(1707, 281,1341, 435, 772, 416, 1797) #allele numbers in case samples 


Z<-2 	#number of functional SNPs for tests
n.poly<-ncol(apoe7)/2 	#total number of SNPs

#control sample generation (sample size = 1000)
con.samp<-sample(nrow(apoe7),1000,replace=TRUE)
con.data<-apoe7[con.samp,]  # resampled control rows

lrt(1,n,x,con.data)
lrt(2,n,x,con.data)

Likelihood Ratio Tests for Identifying Disease Polymorphisms with Same Effects

Description

Compute p-values and likelihoods of all possible models for a given number of disease SNP(s).

Usage

lrtG(n.fp, genoT, genoC)

Arguments

`n.fp`	number of disease SNPs for tests.
`genoT`	matrix of control genotypes. Each row indicates a variant, and each column indicates a haplotype of an individual. Two alleles of 0 and 1 are allowed.
`genoC`	matrix of case genotypes. Each row indicates a variant, and each column indicates a haplotype of an individual. Two alleles of 0 and 1 are allowed.

Value

matrix of likelihood ratio test results. The first row gives the index, the following n.fp rows give the model for each set of disease polymorphisms, and the remaining rows give p-values, -2 log(likelihood ratio) with corrections for variances, and the degrees of freedom.

References

L. Park, J. Kim, Rare high-impact disease variants: properties and identification, Genetics Research, 2016 Mar; 98, e6.

Examples

## LRT tests for a dominant variant (15th variant)
## the odds ratio: 3, control: 100, case: 100.

data(apoeG)
lrtG(1,genoT[,1:20],genoC[,1:20])

# use "lrtG(1,genoT,genoC)" for the actual test.
