Title: | Identifying Functional Polymorphisms |
---|---|
Description: | A suite for identifying causal models using relative concordances and identifying causal polymorphisms in case-control genetic association data, especially with large controls re-sequenced data. |
Authors: | Park L |
Maintainer: | Leeyoung Park <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.2.4 |
Built: | 2024-10-31 19:58:27 UTC |
Source: | CRAN |
Computes allele frequencies from genotype data.
allele.freq(geno)
allele.freq(geno)
geno |
matrix of alleles, such that each locus has a pair of adjacent columns of alleles, and the order of columns corresponds to the order of loci on a chromosome. If there are K loci, then ncol(geno) = 2*K. Rows represent the alleles for each subject. Each allele shoud be represented as numbers (A=1,C=2,G=3,T=4). |
array of allele frequencies of each SNP. The computed allele is targeted as an order of alleles, "A", "C", "G", and "T".
data(apoe) allele.freq(apoe7) allele.freq(apoe)
data(apoe) allele.freq(apoe7) allele.freq(apoe)
Computes allele frequencies from the sequencing data with a vcf type of the 1000 Genomes Project.
allele.freq.G(genoG)
allele.freq.G(genoG)
genoG |
matrix of haplotypes. Each row indicates a variant, and each column ind icates a haplotype of an individual. Two alleles of 0 and 1 are available. |
array of allele frequencies of each variant.
data(apoeG) allele.freq.G(apoeG)
data(apoeG) allele.freq.G(apoeG)
This data set came from a re-sequenced data of APOE gene region in the Molecular Diversity and Epidemiology of Common Disease (MDECODE) database. Sixteen polymorphic sites were included. "apoe7" data contains the genetic data of seven single nucleotide polymorphisms with allele frequencies higher than 0.1 from the apoe data.
data(apoe)
data(apoe)
A matrix with 48 rows and 32 columns
http://droog.gs.washington.edu/mdecode/
Nickerson, D. A., S. L. Taylor, S. M. Fullerton, K. M. Weiss, A. G. Clark et al. (2000) Sequence diversity and large-scale typing of SNPs in the human apolipoprotein E gene. Genome Res 10: 1532-1545.
This data set came from a re-sequenced data of APOE gene region from the 1000 Genomes Project. Thirty three polymorphic sites with allele frequencies higher than 0.001 were included for the original data set, apoeG. The test data sets, apoeT and apoeC, indicate the data of 100 controls and 100 cases respectively when the dominant variant is 15th variant with the odds ratio of 3.
data(apoeG)
data(apoeG)
A matrix with 33 rows and 2184 columns
ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/
Abecasis, G. R. et al. (2010) A map of human genome variation from population-scale sequencing. Nature 467, 1061-1073.
provides concordance probabilities of relative pairs for a causal model with G, G*G, G*E and E components
drgegggne(fdg,frg,fdgg,frgg,fdge,frge,eg,e)
drgegggne(fdg,frg,fdgg,frgg,fdge,frge,eg,e)
fdg |
an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G component |
frg |
an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G component |
fdgg |
an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G*G component |
frgg |
an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G*G component |
fdge |
an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G*E component |
frge |
an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G*E component |
eg |
a proportion of population who are exposed to environmental cause of G*E interactiong the genetic cause of G*E during their entire life |
e |
a proportion of population who are exposed to environmental cause during their entire life |
matrix of NN, ND, and DD probabilities of 9 relative pairs: 1:mzt,2:parent-offspring,3:dzt,4:sibling,5:2-direct(grandparent-grandchild),6:3rd(uncle-niece),7:3-direct(great-grandparent-great-grandchild),8:4th (causin),9:4d(great-great-grandparent-great-great-grandchild)
drggn drgegne
### PLI=0.01. ppt<-0.01 ### for a model without one or more missing causal factors, ### set the relevant parameters as zero. pg<-0.002 # the proportion of G component in total populations pgg<-0.002 # the proportion of G*G component in total populations pge<-0.003 # the proportion of G*E component in total populations e<-1-(1-ppt)/(1-pg)/(1-pgg)/(1-pge) # the proportion of E component in total populations fd<-0.001 # one dominant gene tt<-3 # the number of recessive genes temp<-sqrt(1-((1-pg)/(1-fd)^2)^(1/tt)) fr<-c(array(0,length(fd)),array(temp,tt)) fd<-c(fd,array(0,tt)) ppd<-sqrt(pgg) fdg<-array(1-sqrt(1-ppd^(1/2)),2) ttg<-1 temp<-(pgg/ppd)^(1/2/ttg) frg<-c(array(0,length(fdg)),array(temp,ttg)) fdg<-c(fdg,array(0,ttg)) ppe<-0.5 ppg<-pge/ppe fdge<-0.002 ttge<-2 # the number of recessive genes temp<-sqrt(1-((1-ppg)/(1-fdge)^2)^(1/ttge)) frge<-c(array(0,length(fdge)),array(temp,ttge)) fdge<-c(fdge,array(0,ttge)) drgegggne(fd,fr,fdg,frg,fdge,frge,ppe,e)
### PLI=0.01. ppt<-0.01 ### for a model without one or more missing causal factors, ### set the relevant parameters as zero. pg<-0.002 # the proportion of G component in total populations pgg<-0.002 # the proportion of G*G component in total populations pge<-0.003 # the proportion of G*E component in total populations e<-1-(1-ppt)/(1-pg)/(1-pgg)/(1-pge) # the proportion of E component in total populations fd<-0.001 # one dominant gene tt<-3 # the number of recessive genes temp<-sqrt(1-((1-pg)/(1-fd)^2)^(1/tt)) fr<-c(array(0,length(fd)),array(temp,tt)) fd<-c(fd,array(0,tt)) ppd<-sqrt(pgg) fdg<-array(1-sqrt(1-ppd^(1/2)),2) ttg<-1 temp<-(pgg/ppd)^(1/2/ttg) frg<-c(array(0,length(fdg)),array(temp,ttg)) fdg<-c(fdg,array(0,ttg)) ppe<-0.5 ppg<-pge/ppe fdge<-0.002 ttge<-2 # the number of recessive genes temp<-sqrt(1-((1-ppg)/(1-fdge)^2)^(1/ttge)) frge<-c(array(0,length(fdge)),array(temp,ttge)) fdge<-c(fdge,array(0,ttge)) drgegggne(fd,fr,fdg,frg,fdge,frge,ppe,e)
provides concordance probabilities of relative pairs for a causal model with G, G*E and E components
drgegne(fdg,frg,fdge,frge,eg,e)
drgegne(fdg,frg,fdge,frge,eg,e)
fdg |
an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G component |
frg |
an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G component |
fdge |
an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G*E component |
frge |
an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G*E component |
eg |
a proportion of population who are exposed to environmental cause of G*E interactiong the genetic cause of G*E during their entire life |
e |
a proportion of population who are exposed to environmental cause during their entire life |
matrix of NN, ND, and DD probabilities of 9 relative pairs: 1:mzt,2:parent-offspring,3:dzt,4:sibling,5:2-direct(grandparent-grandchild),6:3rd(uncle-niece),7:3-direct(great-grandparent-great-grandchild),8:4th (causin),9:4d(great-great-grandparent-great-great-grandchild)
drgn drgene
### PLI=0.01. ppt<-0.01 pg<-0.002 # the proportion of G component in total populations pge<-0.005 # the proportion of G*E component in total populations e<-1-(1-ppt)/(1-pg)/(1-pge) # the proportion of E component in total populations fd<-0.001 # one dominant gene tt<-2 # the number of recessive genes temp<-sqrt(1-((1-pg)/(1-fd)^2)^(1/tt)) fr<-c(array(0,length(fd)),array(temp,tt)) fd<-c(fd,array(0,tt)) ppe<-0.5 ppg<-pge/ppe fdge<-0.002 ttge<-2 # the number of recessive genes temp<-sqrt(1-((1-ppg)/(1-fdge)^2)^(1/ttge)) frge<-c(array(0,length(fdge)),array(temp,ttge)) fdge<-c(fdge,array(0,ttge)) drgegne(fd,fr,fdge,frge,ppe,e)
### PLI=0.01. ppt<-0.01 pg<-0.002 # the proportion of G component in total populations pge<-0.005 # the proportion of G*E component in total populations e<-1-(1-ppt)/(1-pg)/(1-pge) # the proportion of E component in total populations fd<-0.001 # one dominant gene tt<-2 # the number of recessive genes temp<-sqrt(1-((1-pg)/(1-fd)^2)^(1/tt)) fr<-c(array(0,length(fd)),array(temp,tt)) fd<-c(fd,array(0,tt)) ppe<-0.5 ppg<-pge/ppe fdge<-0.002 ttge<-2 # the number of recessive genes temp<-sqrt(1-((1-ppg)/(1-fdge)^2)^(1/ttge)) frge<-c(array(0,length(fdge)),array(temp,ttge)) fdge<-c(fdge,array(0,ttge)) drgegne(fd,fr,fdge,frge,ppe,e)
provides concordance probabilities of relative pairs for a causal model with G*E component
drgen(fd,fr,e)
drgen(fd,fr,e)
fd |
an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G component of G*E interacting with E of G*E |
fr |
an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G component of G*E interacting with E of G*E |
e |
a proportion of population who are exposed to environmental cause of G*E interacting with genetic cause of G*E during their entire life |
a list of the g*e proportion in population and a matrix of NN, ND, and DD probabilities of 9 relative pairs: 1:mzt,2:parent-offspring,3:dzt,4:sibling,5:2-direct(grandparent-grandchild),6:3rd(uncle-niece),7:3-direct(great-grandparent-great-grandchild),8:4th (causin),9:4d(great-great-grandparent-great-great-grandchild)
drgene.gm
### PLI=0.01. ppt<-0.01 ### g*e model pge<-ppt # the proportion of G*E component in total populations ppe<-0.5 ppg<-pge/ppe fd<-0.0005 # one dominant gene tt<-3 # the number of recessive genes temp<-sqrt(1-((1-ppg)/(1-fd)^2)^(1/tt)) fr<-c(array(0,length(fd)),array(temp,tt)) fd<-c(fd,array(0,tt)) drgen(fd,fr,ppe)
### PLI=0.01. ppt<-0.01 ### g*e model pge<-ppt # the proportion of G*E component in total populations ppe<-0.5 ppg<-pge/ppe fd<-0.0005 # one dominant gene tt<-3 # the number of recessive genes temp<-sqrt(1-((1-ppg)/(1-fd)^2)^(1/tt)) fr<-c(array(0,length(fd)),array(temp,tt)) fd<-c(fd,array(0,tt)) drgen(fd,fr,ppe)
provides concordance probabilities of relative pairs for a causal model with G*E and E components
drgene(fdg,frg,eg,e)
drgene(fdg,frg,eg,e)
fdg |
an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G component of G*E interacting with E of G*E |
frg |
an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G component of G*E interacting with E of G*E |
eg |
a proportion of population who are exposed to environmental cause of G*E interacting with genetic cause of G*E during their entire life |
e |
a proportion of population who are exposed to environmental cause during their entire life |
matrix of NN, ND, and DD probabilities of 9 relative pairs: 1:mzt,2:parent-offspring,3:dzt,4:sibling,5:2-direct(grandparent-grandchild),6:3rd(uncle-niece),7:3-direct(great-grandparent-great-grandchild),8:4th (causin),9:4d(great-great-grandparent-great-great-grandchild)
drgen.gm
### PLI=0.01. ppt<-0.01 ### g*e+e model pge<-0.007 # the proportion of G*E component in total populations e<-1-(1-ppt)/(1-pge) # the proportion of E component in total populations ppe<-0.5 ppg<-pge/ppe fd<-0.0005 # one dominant gene tt<-3 # the number of recessive genes temp<-sqrt(1-((1-ppg)/(1-fd)^2)^(1/tt)) fr<-c(array(0,length(fd)),array(temp,tt)) fd<-c(fd,array(0,tt)) drgene(fd,fr,ppe,e)
### PLI=0.01. ppt<-0.01 ### g*e+e model pge<-0.007 # the proportion of G*E component in total populations e<-1-(1-ppt)/(1-pge) # the proportion of E component in total populations ppe<-0.5 ppg<-pge/ppe fd<-0.0005 # one dominant gene tt<-3 # the number of recessive genes temp<-sqrt(1-((1-ppg)/(1-fd)^2)^(1/tt)) fr<-c(array(0,length(fd)),array(temp,tt)) fd<-c(fd,array(0,tt)) drgene(fd,fr,ppe,e)
provides concordance probabilities of relative pairs for a causal model with G*G component
drggn(fd,fr)
drggn(fd,fr)
fd |
an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G*G component |
fr |
an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G*G component |
a list of PLI and a matrix of NN, ND, and DD probabilities of 9 relative pairs: 1:mzt,2:parent-offspring,3:dzt,4:sibling,5:2-direct(grandparent-grandchild),6:3rd(uncle-niece),7:3-direct(great-grandparent-great-grandchild),8:4th (causin),9:4d(great-great-grandparent-great-great-grandchild)
drgegggne
### PLI=0.01. ppt<-0.01 ### g*g model pp<-ppt # the proportion of G*G component in total populations gd<-sqrt(pp) # dominant gene proportion = recessive gene proportion fd<-array(1-sqrt(1-gd^(1/2)),2) # two dominant genes tt<-2 # the number of recessive genes: 2 temp<-(pp/gd)^(1/2/tt) fr<-c(array(0,length(fd)),array(temp,tt)) fd<-c(fd,array(0,tt)) drggn(fd,fr)
### PLI=0.01. ppt<-0.01 ### g*g model pp<-ppt # the proportion of G*G component in total populations gd<-sqrt(pp) # dominant gene proportion = recessive gene proportion fd<-array(1-sqrt(1-gd^(1/2)),2) # two dominant genes tt<-2 # the number of recessive genes: 2 temp<-(pp/gd)^(1/2/tt) fr<-c(array(0,length(fd)),array(temp,tt)) fd<-c(fd,array(0,tt)) drggn(fd,fr)
provides concordance probabilities of relative pairs for a causal model with G component
drgn(fd,fr)
drgn(fd,fr)
fd |
an array (size=number of dominant genes+recessive genes) of dominant gene frequencies including 0 values of recessive genes of G component |
fr |
an array (size=number of dominant genes+recessive genes) of recessive gene frequencies including 0 values of dominant genes of G component |
list of the value of PLI and the matrix of NN, ND, and DD probabilities of 9 relative pairs: 1:mzt,2:parent-offspring,3:dzt,4:sibling,5:2-direct(grandparent-grandchild),6:3rd(uncle-niece),7:3-direct(great-grandparent-great-grandchild),8:4th (causin),9:4d(great-great-grandparent-great-great-grandchild)
drgegne.gm
### PLI=0.01. ppt<-0.01 ### g model pp<-ppt # the proportion of G component in total populations fdt<-0.001 # one dominant gene with frequency of 0.001 tt<-5 # the number of recessive genes: 5 fd<-c(fdt,array(0,tt)) temp<-sqrt(1-((1-pp)/(1-fdt)^2)^(1/tt)) fr<-c(0,array(temp,tt)) drgn(fd,fr)
### PLI=0.01. ppt<-0.01 ### g model pp<-ppt # the proportion of G component in total populations fdt<-0.001 # one dominant gene with frequency of 0.001 tt<-5 # the number of recessive genes: 5 fd<-c(fdt,array(0,tt)) temp<-sqrt(1-((1-pp)/(1-fdt)^2)^(1/tt)) fr<-c(0,array(temp,tt)) drgn(fd,fr)
Compute error rates for a given model.
error.rates(H0,Z, pMc, geno, no.ca, no.con=nrow(geno), sim.no = 1000)
error.rates(H0,Z, pMc, geno, no.ca, no.con=nrow(geno), sim.no = 1000)
H0 |
the index number for a given model for functional SNPs |
Z |
number of functional SNPs for the given model |
pMc |
array of allele frequencies of case samples |
geno |
matrix of alleles, such that each locus has a pair of adjacent columns of alleles, and the order of columns corresponds to the order of loci on a chromosome. If there are K loci, then ncol(geno) = 2*K. Rows represent the alleles for each subject. Each allele shoud be represented as numbers (A=1,C=2,G=3,T=4). |
no.ca |
number of case chromosomes |
no.con |
number of control chromosomes |
sim.no |
number of simulations for error rates estimation |
array of results consisted of Type I error rate (alpha=0.05), Type I error rate (alpha=0.01), Type II error rate (beta=0.05), Type II error rate (beta=0.01), percent when the target model has the lowest corrected -2 log likelihood ratio.
allele.freq hap.freq lrtB
## LRT tests when SNP1 & SNP6 are the functional polymorphisms. data(apoe) n<-c(2000, 2000, 2000, 2000, 2000, 2000, 2000) #case sample size = 1000 x<-c(1707, 281,1341, 435, 772, 416, 1797) #allele numbers in case samples Z<-2 #number of functional SNPs for tests n.poly<-ncol(apoe7)/2 #total number of SNPs #index number for the model in this case is 5 for SNP1 and 6. #apoe7 is considered to represent the true control allele and haplotype frequencies. #Control sample size = 1000. error.rates(5, 2, x/n, apoe7, 2000, 2000, sim.no=2) # to obtain valid rates, use sim.no=1000.
## LRT tests when SNP1 & SNP6 are the functional polymorphisms. data(apoe) n<-c(2000, 2000, 2000, 2000, 2000, 2000, 2000) #case sample size = 1000 x<-c(1707, 281,1341, 435, 772, 416, 1797) #allele numbers in case samples Z<-2 #number of functional SNPs for tests n.poly<-ncol(apoe7)/2 #total number of SNPs #index number for the model in this case is 5 for SNP1 and 6. #apoe7 is considered to represent the true control allele and haplotype frequencies. #Control sample size = 1000. error.rates(5, 2, x/n, apoe7, 2000, 2000, sim.no=2) # to obtain valid rates, use sim.no=1000.
Computes genotype frequencies from the sequencing data with a vcf type of the 1000 Genomes Project.
geno.freq(genoG)
geno.freq(genoG)
genoG |
matrix of haplotypes. Each row indicates a variant, and each column ind icates a haplotype of an individual. Two alleles of 0 and 1 are available. |
matrix of genotype frequencies of each variant.
data(apoeG) geno.freq(apoeG)
data(apoeG) geno.freq(apoeG)
Convert sequencing data to genotypes.
genotype(genoG)
genotype(genoG)
genoG |
matrix of haplotypes. Each row indicates a variant, and each column ind icates a haplotype of an individual. Two alleles of 0 and 1 are available. |
matrix of genotypes with rows of variants and with columns of individuals.
data(apoeG) genotype(apoeG)
data(apoeG) genotype(apoeG)
EM computation of haplotype frequencies with two SNPs. The computation is relied on the package"haplo.stats".
hap.freq(geno)
hap.freq(geno)
geno |
matrix of alleles, such that each locus has a pair of adjacent columns of alleles, and the order of columns corresponds to the order of loci on a chromosome. If there are K loci, then ncol(geno) = 2*K. Rows represent the alleles for each subject. Each allele shoud be represented as numbers (A=1,C=2,G=3,T=4). |
matrix of haplotype frequencies consisted of two alleles from each SNP. These alleles are the same ones computed for frequency using the function "allele.freq".
allele.freq
data(apoe) hap.freq(apoe7) hap.freq(apoe)
data(apoe) hap.freq(apoe7) hap.freq(apoe)
provides proportions of each causal factor of G, G*G, G*E and E based on relative concordance data
iter.mcmc(ppt,aj=2,n.iter,n.chains,thinning=5,init.cut,darray,x,n,model,mcmcrg=0.01)
iter.mcmc(ppt,aj=2,n.iter,n.chains,thinning=5,init.cut,darray,x,n,model,mcmcrg=0.01)
ppt |
population lifetime incidence |
aj |
a constant for the stage of data collection |
n.iter |
number of mcmc iterations |
n.chains |
number of mcmc chain |
thinning |
mcmc thinning parameter (default=5) |
init.cut |
mcmc data cut |
darray |
indicating the array positions of available data among 9 relative pairs: 1:mzt,2:parent-offspring,3:dzt,4:sibling,5:2-direct(grandparent-grandchild),6:3rd(uncle-niece),7:3-direct(great-grandparent-great-grandchild),8:4th (causin),9:4d(great-great-grandparent-great-great-grandchild) |
x |
number of disease concordance of relative pairs |
n |
total number of relative pairs |
model |
an array, size of 4 (1: E component; 2: G component; 3: G*E component; 4: G*G component), indicating the existance of the causal component: 0: excluded; 1: included. |
mcmcrg |
parameter of the data collection stage (default=0.01) |
a list of rejectionRate, result summary, Gelman-Rubin diagnostics (point est. & upper C.I.) for output variables: e[1]: proportion of environmental factor (E) g[2]: proportion of genetic factor (G) ge[3]: proportion of gene-environment interaction (G*E) gg[4]: proportion of gene interactions (G*G) gn[5]: number of recessive genes in G ppe[6]: population proportion of interacting environment in G*E ppg[7]: population proportion of interacting genetic factor in G*E fd[8]: frequency of dominant genes in G fdge[9]: frequency of dominant genes in G*E gnge[10]: number of recessive genes in G*E ppd[11]: population proportion of dominant genes in G*G ppr[12]: population proportion of recessive genes in G*G kd[13]: number of dominant genes in G*G kr[14]: number of recessive genes in G*G
L. Park, J. Kim, A novel approach for identifying causal models of complex disease from family data, Genetics, 2015 Apr; 199, 1007-1016.
### PLI=0.01. ppt<-0.01 ### a simple causal model with G and E components pg<-0.007 # the proportion of G component in total populations pgg<-0 # the proportion of G*G component in total populations pge<-0 # the proportion of G*E component in total populations e<-1-(1-ppt)/(1-pg) # the proportion of E component in total populations fd<-0.001 # one dominant gene tt<-3 # the number of recessive genes temp<-sqrt(1-((1-pg)/(1-fd)^2)^(1/tt)) fr<-c(array(0,length(fd)),array(temp,tt)) fd<-c(fd,array(0,tt)) rp<-drgegggne(fd,fr,c(0,0),c(0,0),c(0,0),c(0,0),0,e) sdata<-rp[,3]/(rp[,2]+rp[,3]) #sdata<-round(sdata*500) darray<-c(1:2,4:6) ## available data= MZT, P-O, sibs, grandparent-grandchild, avuncular pair n<-array(1000,length(darray)) x<-array() for(i in 1:length(darray)){ x[i]<-rbinom(1,n[i],sdata[darray[i]]) } model<-c(1,1,0,0) ## remove # from the following lines to test examples. #iter.mcmc(ppt,2,15,2,1,1,darray,x,n,model) # provide a running test #iter.mcmc(ppt,2,2000,2,10,500,darray,x,n,model) # provide a proper result
### PLI=0.01. ppt<-0.01 ### a simple causal model with G and E components pg<-0.007 # the proportion of G component in total populations pgg<-0 # the proportion of G*G component in total populations pge<-0 # the proportion of G*E component in total populations e<-1-(1-ppt)/(1-pg) # the proportion of E component in total populations fd<-0.001 # one dominant gene tt<-3 # the number of recessive genes temp<-sqrt(1-((1-pg)/(1-fd)^2)^(1/tt)) fr<-c(array(0,length(fd)),array(temp,tt)) fd<-c(fd,array(0,tt)) rp<-drgegggne(fd,fr,c(0,0),c(0,0),c(0,0),c(0,0),0,e) sdata<-rp[,3]/(rp[,2]+rp[,3]) #sdata<-round(sdata*500) darray<-c(1:2,4:6) ## available data= MZT, P-O, sibs, grandparent-grandchild, avuncular pair n<-array(1000,length(darray)) x<-array() for(i in 1:length(darray)){ x[i]<-rbinom(1,n[i],sdata[darray[i]]) } model<-c(1,1,0,0) ## remove # from the following lines to test examples. #iter.mcmc(ppt,2,15,2,1,1,darray,x,n,model) # provide a running test #iter.mcmc(ppt,2,2000,2,10,500,darray,x,n,model) # provide a proper result
Compute p-values and likelihoods of all possible models for a given number of functional SNP(s).
lrt(n.fp, n, x, geno, no.con=nrow(geno))
lrt(n.fp, n, x, geno, no.con=nrow(geno))
n.fp |
number of functional SNPs for tests. |
n |
array of each total number of case sample chromosomes for SNPs |
x |
array of each total allele number in case samples |
geno |
matrix of alleles, such that each locus has a pair of adjacent columns of alleles, and the order of columns corresponds to the order of loci on a chromosome. If there are K loci, then ncol(geno) = 2*K. Rows represent the alleles for each subject. Each allele shoud be represented as numbers (A=1,C=2,G=3,T=4). |
no.con |
number of control chromosomes. |
matrix of likelihood ratio test results. First n.fp rows indicate the model for each set of disease polymorphisms, and followed by p-values, -2 log(likelihood ratio) with corrections for variances, maximum likelihood ratio estimates, and likelihood.
L. Park, Identifying disease polymorphisms from case-control genetic association data, Genetica, 2010 138 (11-12), 1147-1159.
allele.freq hap.freq
## LRT tests when SNP1 & SNP6 are the functional polymorphisms. data(apoe) n<-c(2000, 2000, 2000, 2000, 2000, 2000, 2000) #case sample size = 1000 x<-c(1707, 281,1341, 435, 772, 416, 1797) #allele numbers in case samples Z<-2 #number of functional SNPs for tests n.poly<-ncol(apoe7)/2 #total number of SNPs #control sample generation( sample size = 1000 ) con.samp<-sample(nrow(apoe7),1000,replace=TRUE) con.data<-array() for (i in con.samp){ con.data<-rbind(con.data,apoe7[i,]) } con.data<-con.data[2:1001,] lrt(1,n,x,con.data) lrt(2,n,x,con.data)
## LRT tests when SNP1 & SNP6 are the functional polymorphisms. data(apoe) n<-c(2000, 2000, 2000, 2000, 2000, 2000, 2000) #case sample size = 1000 x<-c(1707, 281,1341, 435, 772, 416, 1797) #allele numbers in case samples Z<-2 #number of functional SNPs for tests n.poly<-ncol(apoe7)/2 #total number of SNPs #control sample generation( sample size = 1000 ) con.samp<-sample(nrow(apoe7),1000,replace=TRUE) con.data<-array() for (i in con.samp){ con.data<-rbind(con.data,apoe7[i,]) } con.data<-con.data[2:1001,] lrt(1,n,x,con.data) lrt(2,n,x,con.data)
Compute p-values and likelihoods of all possible models for a given number of disease SNP(s).
lrtG(n.fp, genoT, genoC)
lrtG(n.fp, genoT, genoC)
n.fp |
number of disease SNPs for tests. |
genoT |
matrix of control genotypes. Each row indicates a variant, and each column indicates a haplotype of an individual. Two alleles of 0 and 1 are allowed. |
genoC |
matrix of case genotypes. Each row indicates a variant, and each column indicates a haplotype of an individual. Two alleles of 0 and 1 are allowed. |
matrix of likelihood ratio test results. First row indicates the index, and following n.fp rows indicate the model for each set of disease polymorphisms, and followed by p-values, -2 log(likelihood ratio) with corrections for variances, and the degree of freedom.
L. Park, J. Kim, Rare high-impact disease variants: properties and identification, Genetics Research, 2016 Mar; 98, e6.
allele.freq.G
## LRT tests for a dominant variant (15th variant) ## the odds ratio: 3, control: 100, case: 100. data(apoeG) lrtG(1,genoT[,1:20],genoC[,1:20]) # use "lrtG(1,genoT,genoC)" for the actual test.
## LRT tests for a dominant variant (15th variant) ## the odds ratio: 3, control: 100, case: 100. data(apoeG) lrtG(1,genoT[,1:20],genoC[,1:20]) # use "lrtG(1,genoT,genoC)" for the actual test.