Title: | Causal Inference of QTL Networks |
---|---|
Description: | Functions to Simultaneously Infer Causal Graphs and Genetic Architecture. Includes acyclic and cyclic graphs for data from an experimental cross with a modest number (<10) of phenotypes driven by a few genetic loci (QTL). Chaibub Neto E, Keller MP, Attie AD, Yandell BS (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Annals of Applied Statistics 4: 320-339. <doi:10.1214/09-AOAS288>. |
Authors: | Elias Chaibub Neto <[email protected]> and Brian S. yandell <[email protected]> |
Maintainer: | Brian S. Yandell <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.5.4 |
Built: | 2024-12-07 06:58:41 UTC |
Source: | CRAN |
We generate synthetic data (sample size 300) according to a DAG composed by 100 nodes and 107 edges (exactly as in Figure 1). Each phenotype node is affected by three QTLs, and we allow only additive genetic effects. The QTLs for each phenotype are randomly selected among 200 markers, with 10 markers unevenly distributed on each of 20 autosomes. We allowed different phenotypes to potentially share common QTLs. For each phenotype, the regression coefficients with other phenotypes are chosen uniformly between 0.5 and 1; QTL effects are chosen between 0.2 to 0.6; and residual standard deviations are chosen from 0.1 to 0.5. For each realization we apply the QDG algorithm to infer causal directions for the edges of the skeleton obtained by the PC-skeleton algorithm.
data(acyclic)
data(acyclic)
For cyclic graphs, the output of the qdg function computes the log-likelihood up to the normalization constant (un-normalized log-likelihood). We can use the un-normalized log-likelihood to compare cyclic graphs with reversed directions (since they have the same normalization constant). However we cannot compare cyclic and acyclic graphs.
Chaibub Neto et al. (2008) Inferring causal phenotype networks from segregating populations. Genetics 179: 1089-1100.
sim.cross
,
sim.geno
,
sim.map
,
skeleton
,
qdg
,
graph.qdg
,
generate.qtl.pheno
## Not run: ## This reproduces Figure 1 exactly. set.seed(3456789) tmp <- options(warn=-1) acyclic.DG <- randomDAG(n = 100, prob = 2 / 99) options(tmp) ## Simulate cross object using R/qtl routines. n.ind <- 300 mymap <- sim.map(len=rep(100,20), n.mar=10, eq.spacing=FALSE, include.x=FALSE) mycross <- sim.cross(map=mymap, n.ind=n.ind, type="f2") summary(mycross) mycross <- sim.geno(mycross,n.draws=1) ## Produce 100 QTL at three markers apiece. acyclic.qtl <- generate.qtl.markers(cross=mycross,n.phe=100) ## Generate data from directed graph. bp <- runif(100,0.5,1) stdev <- runif(100,0.1,0.5) bq <- matrix(0,100,3) bq[,1] <- runif(100,0.2,0.4) bq[,2] <- bq[,1]+0.1 bq[,3] <- bq[,2]+0.1 ## Generate phenotypes. acyclic.data <- generate.qtl.pheno("acyclic", cross = mycross, bp = bp, bq = bq, stdev = stdev, allqtl = acyclic.qtl$allqtl) acyclic.qdg <- qdg(cross=acyclic.data, phenotype.names=paste("y",1:100,sep=""), marker.names=acyclic.qtl$markers, QTL=acyclic.qtl$allqtl, alpha=0.005, n.qdg.random.starts=1, skel.method="pcskel") save(acyclic.DG, acyclic.qtl, acyclic.data, acyclic.qdg, file = "acyclic.RData", compress = TRUE) data(acyclic) dims <- dim(acyclic.data$pheno) SuffStat <- list(C = cor(acyclic.data$pheno), n = dims[1]) pc <- skeleton(SuffStat, gaussCItest, p = dims[2], alpha = 0.005) summary(pc) summary(graph.qdg(acyclic.qdg)) gr <- graph.qdg(acyclic.qdg, include.qtl = FALSE) plot(gr) ## End(Not run)
## Not run: ## This reproduces Figure 1 exactly. set.seed(3456789) tmp <- options(warn=-1) acyclic.DG <- randomDAG(n = 100, prob = 2 / 99) options(tmp) ## Simulate cross object using R/qtl routines. n.ind <- 300 mymap <- sim.map(len=rep(100,20), n.mar=10, eq.spacing=FALSE, include.x=FALSE) mycross <- sim.cross(map=mymap, n.ind=n.ind, type="f2") summary(mycross) mycross <- sim.geno(mycross,n.draws=1) ## Produce 100 QTL at three markers apiece. acyclic.qtl <- generate.qtl.markers(cross=mycross,n.phe=100) ## Generate data from directed graph. bp <- runif(100,0.5,1) stdev <- runif(100,0.1,0.5) bq <- matrix(0,100,3) bq[,1] <- runif(100,0.2,0.4) bq[,2] <- bq[,1]+0.1 bq[,3] <- bq[,2]+0.1 ## Generate phenotypes. acyclic.data <- generate.qtl.pheno("acyclic", cross = mycross, bp = bp, bq = bq, stdev = stdev, allqtl = acyclic.qtl$allqtl) acyclic.qdg <- qdg(cross=acyclic.data, phenotype.names=paste("y",1:100,sep=""), marker.names=acyclic.qtl$markers, QTL=acyclic.qtl$allqtl, alpha=0.005, n.qdg.random.starts=1, skel.method="pcskel") save(acyclic.DG, acyclic.qtl, acyclic.data, acyclic.qdg, file = "acyclic.RData", compress = TRUE) data(acyclic) dims <- dim(acyclic.data$pheno) SuffStat <- list(C = cor(acyclic.data$pheno), n = dims[1]) pc <- skeleton(SuffStat, gaussCItest, p = dims[2], alpha = 0.005) summary(pc) summary(graph.qdg(acyclic.qdg)) gr <- graph.qdg(acyclic.qdg, include.qtl = FALSE) plot(gr) ## End(Not run)
Pre-compute BIC values for qtlnet sampling to speed up MCMC sampling.
bic.qtlnet(cross, pheno.col, threshold, addcov = NULL, intcov = NULL, max.parents = 3, parents, verbose = TRUE, ...) bic.join(cross, pheno.col, ..., max.parents = 3) data(Pscdbp.bic)
bic.qtlnet(cross, pheno.col, threshold, addcov = NULL, intcov = NULL, max.parents = 3, parents, verbose = TRUE, ...) bic.join(cross, pheno.col, ..., max.parents = 3) data(Pscdbp.bic)
cross |
Object of class |
pheno.col |
Phenotype identifiers from |
threshold |
Scalar or list of thresholds, one per each node. |
addcov |
Additive covariates for each phenotype ( |
intcov |
Interactive covariates, entered in the same manner as |
max.parents |
Maximum number of parents per node. This reduces the complexity of graphs and shortens run time. Probably best to consider values of 3-5. |
parents |
List containing all possible parents up to |
verbose |
Print iteration and number of models fit. |
... |
Additional arguments passed to internal routines. In the case of
|
The most expensive part of calculations is running
scanone
on each phenotype with parent phenotypes as
covariates. One strategy is to pre-compute the BIC contributions using a
cluster and save them for later use.
We divide the job into three steps: 1) determine parents and divide into reasonable sized groups; 2) compute BIC scores using scanone on a grid of computers; 3) compute multiple MCMC runs on a grid of computers. See the example for details.
Matrix with columns:
code |
Binary code as decimal for the parents of a phenotype
node, excluding the phenotype. Value is between 0 (no parents) and
|
pheno.col |
Phenotype column in reduced set, in range
|
bic |
BIC score for phenotype conditional on parents (and covariates). |
Brian S. Yandell and Elias Chaibub Neto
Chaibub Neto E, Keller MP, Attie AD, Yandell BS (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Ann Appl Statist 4: 320-339. http://dx.doi.org/10.1214/09-AOAS288
pheno.col <- 1:13 max.parents <- 12 size.qtlnet(pheno.col, max.parents) ## Not run: ## Compute all phenotype/parent combinations. ## This shows how to break up into many smaller jobs. ######################################### ## STEP 1: Preparation. Fast. Needed in steps 2 and 3. pheno.col <- 1:13 max.parents <- 12 threshold <- 3.83 ## Load cross object. Here we use internal object. data(Pscdbp) ## or: load("Pscdbp.RData") cross <- Pscdbp ## or: cross <- read.cross("Pscdbp.csv", "csv") ## Break up into groups to run on several machines. ## ~53 groups of ~1000, for a total of 53248 scanone runs. parents <- parents.qtlnet(pheno.col, max.parents) groups <- group.qtlnet(parents = parents, group.size = 1000) ## Save all relevant objects for later steps. save(cross, pheno.col, max.parents, threshold, parents, groups, file = "Step1.RData", compress = TRUE) ######################################### ## STEP 2: Compute BIC scores. Parallelize. ## NB: Configuration of parallelization determined using Step 1 results. ## Load Step 1 computations. load("Step1.RData") ## Parallelize this: for(i in seq(nrow(groups))) { ## Pre-compute BIC scores for selected parents. bic <- bic.qtlnet(cross, pheno.col, threshold, max.parents = max.parents, parents = parents[seq(groups[i,1], groups[i,2])]) save(bic, file = paste("bic", i, ".RData", sep = ""), compress = TRUE) } ######################################### ## STEP 3: Sample Markov chain (MCMC). Parallelize. ## NB: n.runs sets the number of parallel runs. n.runs <- 100 ## Load Step 1 computations. load("Step1.RData") ## Read in saved BIC scores and combine into one object. bic.group <- list() for(i in seq(nrow(groups))) { load(paste("bic", i, ".RData", sep = "")) bic.group[[i]] <- bic } saved.scores <- bic.join(cross, pheno.col, bic.group) ## Parallelize this: for(i in seq(n.runs)) { ## Run MCMC with randomized initial network. mcmc <- mcmc.qtlnet(cross, pheno.col, threshold = threshold, max.parents = max.parents, saved.scores = saved.scores, init.edges = NULL) save(mcmc, file = paste("mcmc", i, ".RData", sep = ""), compress = TRUE) } ######################################### ## STEP 4: Combine results for post-processing. ## NB: n.runs needed from Step 3. n.runs <- 100 ## Combine outputs together. outs.qtlnet <- list() for(i in seq(n.runs)) { load(paste("mcmc", i, ".RData", sep = "")) outs.qtlnet[[i]] <- mcmc } out.qtlnet <- c.qtlnet(outs.qtlnet) summary(out.qtlnet) print(out.qtlnet) ## End of parallel example. ######################################### ## End(Not run) dim(Pscdbp.bic)
pheno.col <- 1:13 max.parents <- 12 size.qtlnet(pheno.col, max.parents) ## Not run: ## Compute all phenotype/parent combinations. ## This shows how to break up into many smaller jobs. ######################################### ## STEP 1: Preparation. Fast. Needed in steps 2 and 3. pheno.col <- 1:13 max.parents <- 12 threshold <- 3.83 ## Load cross object. Here we use internal object. data(Pscdbp) ## or: load("Pscdbp.RData") cross <- Pscdbp ## or: cross <- read.cross("Pscdbp.csv", "csv") ## Break up into groups to run on several machines. ## ~53 groups of ~1000, for a total of 53248 scanone runs. parents <- parents.qtlnet(pheno.col, max.parents) groups <- group.qtlnet(parents = parents, group.size = 1000) ## Save all relevant objects for later steps. save(cross, pheno.col, max.parents, threshold, parents, groups, file = "Step1.RData", compress = TRUE) ######################################### ## STEP 2: Compute BIC scores. Parallelize. ## NB: Configuration of parallelization determined using Step 1 results. ## Load Step 1 computations. load("Step1.RData") ## Parallelize this: for(i in seq(nrow(groups))) { ## Pre-compute BIC scores for selected parents. bic <- bic.qtlnet(cross, pheno.col, threshold, max.parents = max.parents, parents = parents[seq(groups[i,1], groups[i,2])]) save(bic, file = paste("bic", i, ".RData", sep = ""), compress = TRUE) } ######################################### ## STEP 3: Sample Markov chain (MCMC). Parallelize. ## NB: n.runs sets the number of parallel runs. n.runs <- 100 ## Load Step 1 computations. load("Step1.RData") ## Read in saved BIC scores and combine into one object. bic.group <- list() for(i in seq(nrow(groups))) { load(paste("bic", i, ".RData", sep = "")) bic.group[[i]] <- bic } saved.scores <- bic.join(cross, pheno.col, bic.group) ## Parallelize this: for(i in seq(n.runs)) { ## Run MCMC with randomized initial network. mcmc <- mcmc.qtlnet(cross, pheno.col, threshold = threshold, max.parents = max.parents, saved.scores = saved.scores, init.edges = NULL) save(mcmc, file = paste("mcmc", i, ".RData", sep = ""), compress = TRUE) } ######################################### ## STEP 4: Combine results for post-processing. ## NB: n.runs needed from Step 3. n.runs <- 100 ## Combine outputs together. outs.qtlnet <- list() for(i in seq(n.runs)) { load(paste("mcmc", i, ".RData", sep = "")) outs.qtlnet[[i]] <- mcmc } out.qtlnet <- c.qtlnet(outs.qtlnet) summary(out.qtlnet) print(out.qtlnet) ## End of parallel example. ######################################### ## End(Not run) dim(Pscdbp.bic)
We use a Gibbs sampling scheme to generate a data-set with 200 individuals (according with cyclic graph (a)). Each phenotype is affected by 3 QTLs. We fixed the regression coefficients at 0.5, error variances at 0.025 and the QTL effects at 0.2, 0.3 and 0.4 for the three F2 genotypes. We used a burn-in of 2000 for the Gibbs sampler.
data(cyclica)
data(cyclica)
For cyclic graphs, the output of the qdg function computes the log-likelihood up to the normalization constant (un-normalized log-likelihood). We can use the un-normalized log-likelihood to compare cyclic graphs with reversed directions (they have the same normalization constant). However we cannot compare cyclic and acyclic graphs.
Chaibub Neto et al. (2008) Inferring causal phenotype networks from segregating populations. Genetics 179: 1089-1100.
sim.cross
,
sim.geno
,
sim.map
,
skeleton
,
qdg
,
graph.qdg
,
generate.qtl.pheno
## Not run: bp <- matrix(0, 6, 6) bp[2,1] <- bp[4,2] <- bp[4,3] <- bp[5,4] <- bp[2,5] <- bp[6,5] <- 0.5 stdev <- rep(0.025, 6) ## Use R/qtl routines to simulate. set.seed(3456789) mymap <- sim.map(len = rep(100,20), n.mar = 10, eq.spacing = FALSE, include.x = FALSE) mycross <- sim.cross(map = mymap, n.ind = 200, type = "f2") mycross <- sim.geno(mycross, n.draws = 1) cyclica.qtl <- generate.qtl.markers(cross = mycross, n.phe = 6) mygeno <- pull.geno(mycross)[, unlist(cyclica.qtl$markers)] cyclica.data <- generate.qtl.pheno("cyclica", cross = mycross, burnin = 2000, bq = c(0.2,0.3,0.4), bp = bp, stdev = stdev, geno = mygeno) save(cyclica.qtl, cyclica.data, file = "cyclica.RData", compress = TRUE) data(cyclica) out <- qdg(cross=cyclica.data, phenotype.names=paste("y",1:6,sep=""), marker.names=cyclica.qtl$markers, QTL=cyclica.qtl$allqtl, alpha=0.005, n.qdg.random.starts=10, skel.method="pcskel") gr <- graph.qdg(out) gr plot(gr) ## End(Not run)
## Not run: bp <- matrix(0, 6, 6) bp[2,1] <- bp[4,2] <- bp[4,3] <- bp[5,4] <- bp[2,5] <- bp[6,5] <- 0.5 stdev <- rep(0.025, 6) ## Use R/qtl routines to simulate. set.seed(3456789) mymap <- sim.map(len = rep(100,20), n.mar = 10, eq.spacing = FALSE, include.x = FALSE) mycross <- sim.cross(map = mymap, n.ind = 200, type = "f2") mycross <- sim.geno(mycross, n.draws = 1) cyclica.qtl <- generate.qtl.markers(cross = mycross, n.phe = 6) mygeno <- pull.geno(mycross)[, unlist(cyclica.qtl$markers)] cyclica.data <- generate.qtl.pheno("cyclica", cross = mycross, burnin = 2000, bq = c(0.2,0.3,0.4), bp = bp, stdev = stdev, geno = mygeno) save(cyclica.qtl, cyclica.data, file = "cyclica.RData", compress = TRUE) data(cyclica) out <- qdg(cross=cyclica.data, phenotype.names=paste("y",1:6,sep=""), marker.names=cyclica.qtl$markers, QTL=cyclica.qtl$allqtl, alpha=0.005, n.qdg.random.starts=10, skel.method="pcskel") gr <- graph.qdg(out) gr plot(gr) ## End(Not run)
We use a Gibbs sampling scheme to generate a data-set with 200 individuals (according with cyclic graph (b)). Each phenotype is affected by 3 QTLs. We fixed the regression coefficients at 0.5, error variances at 0.025 and the QTL effects at 0.2, 0.3 and 0.4 for the three F2 genotypes. We used a burn-in of 2000 for the Gibbs sampler.
data(cyclicb)
data(cyclicb)
For cyclic graphs, the output of the qdg function computes the log-likelihood up to the normalization constant (un-normalized log-likelihood). We can use the un-normalized log-likelihood to compare cyclic graphs with reversed directions (since they have the same normalization constant). However we cannot compare cyclic and acyclic graphs.
Chaibub Neto et al. (2008) Inferring causal phenotype networks from segregating populations. Genetics 179: 1089-1100.
sim.cross
,
sim.geno
,
sim.map
,
skeleton
,
qdg
,
graph.qdg
,
generate.qtl.pheno
## Not run: bp <- matrix(0, 6, 6) bp[2,1] <- bp[1,5] <- bp[3,1] <- bp[4,2] <- bp[5,4] <- bp[5,6] <- bp[6,3] <- 0.5 stdev <- rep(0.025, 6) ## Use R/qtl routines to simulate. set.seed(3456789) mymap <- sim.map(len = rep(100,20), n.mar = 10, eq.spacing = FALSE, include.x = FALSE) mycross <- sim.cross(map = mymap, n.ind = 200, type = "f2") mycross <- sim.geno(mycross, n.draws = 1) cyclicb.qtl <- generate.qtl.markers(cross = mycross, n.phe = 6) mygeno <- pull.geno(mycross)[, unlist(cyclicb.qtl$markers)] cyclicb.data <- generate.qtl.pheno("cyclicb", cross = mycross, burnin = 2000, bq = c(0.2,0.3,0.4), bp = bp, stdev = stdev, geno = mygeno) save(cyclicb.qtl, cyclicb.data, file = "cyclicb.RData", compress = TRUE) data(cyclicb) out <- qdg(cross=cyclicb.data, phenotype.names=paste("y",1:6,sep=""), marker.names=cyclicb.qtl$markers, QTL=cyclicb.qtl$allqtl, alpha=0.005, n.qdg.random.starts=10, skel.method="pcskel") gr <- graph.qdg(out) gr plot(gr) ## End(Not run)
## Not run: bp <- matrix(0, 6, 6) bp[2,1] <- bp[1,5] <- bp[3,1] <- bp[4,2] <- bp[5,4] <- bp[5,6] <- bp[6,3] <- 0.5 stdev <- rep(0.025, 6) ## Use R/qtl routines to simulate. set.seed(3456789) mymap <- sim.map(len = rep(100,20), n.mar = 10, eq.spacing = FALSE, include.x = FALSE) mycross <- sim.cross(map = mymap, n.ind = 200, type = "f2") mycross <- sim.geno(mycross, n.draws = 1) cyclicb.qtl <- generate.qtl.markers(cross = mycross, n.phe = 6) mygeno <- pull.geno(mycross)[, unlist(cyclicb.qtl$markers)] cyclicb.data <- generate.qtl.pheno("cyclicb", cross = mycross, burnin = 2000, bq = c(0.2,0.3,0.4), bp = bp, stdev = stdev, geno = mygeno) save(cyclicb.qtl, cyclicb.data, file = "cyclicb.RData", compress = TRUE) data(cyclicb) out <- qdg(cross=cyclicb.data, phenotype.names=paste("y",1:6,sep=""), marker.names=cyclicb.qtl$markers, QTL=cyclicb.qtl$allqtl, alpha=0.005, n.qdg.random.starts=10, skel.method="pcskel") gr <- graph.qdg(out) gr plot(gr) ## End(Not run)
We use a Gibbs sampling scheme to generate a data-set with 200 individuals (according with cyclic graph (c)). Each phenotype is affected by 3 QTLs. We fixed the regression coefficients at 0.5, (except for beta[5,2]=0.8) error variances at 0.025 and the QTL effects at 0.2, 0.3 and 0.4 for the three F2 genotypes. We used a burn-in of 2000 for the Gibbs sampler. This example illustrates that even though our method cannot detect reciprocal interactions (e.g. between phenotypes 2 and 5 in cyclic graph (c)), it can still infer the stronger direction, that is, the direction corresponding to the higher regression coefficient. Since beta[5,2] is greater than beta[2,5], the QDG method should infer the direction from 2 to 5.
data(cyclicc)
data(cyclicc)
For cyclic graphs, the output of the qdg function computes the log-likelihood up to the normalization constant (un-normalized log-likelihood). We can use the un-normalized log-likelihood to compare cyclic graphs with reversed directions (since they have the same normalization constant). However we cannot compare cyclic and acyclic graphs.
Chaibub Neto et al. (2008) Inferring causal phenotype networks from segregating populations. Genetics 179: 1089-1100.
sim.cross
,
sim.geno
,
sim.map
,
skeleton
,
qdg
,
graph.qdg
,
generate.qtl.pheno
## Not run: bp <- matrix(0, 6, 6) bp[2,5] <- 0.5 bp[5,2] <- 0.8 bp[2,1] <- bp[3,2] <- bp[5,4] <- bp[6,5] <- 0.5 stdev <- rep(0.025, 6) ## Use R/qtl routines to simulate map and genotypes. set.seed(34567899) mymap <- sim.map(len = rep(100,20), n.mar = 10, eq.spacing = FALSE, include.x = FALSE) mycross <- sim.cross(map = mymap, n.ind = 200, type = "f2") mycross <- sim.geno(mycross, n.draws = 1) ## Use R/qdg routines to produce QTL sample and generate phenotypes. cyclicc.qtl <- generate.qtl.markers(cross = mycross, n.phe = 6) mygeno <- pull.geno(mycross)[, unlist(cyclicc.qtl$markers)] cyclicc.data <- generate.qtl.pheno("cyclicc", cross = mycross, burnin = 2000, bq = c(0.2,0.3,0.4), bp = bp, stdev = stdev, geno = mygeno) save(cyclicc.qtl, cyclicc.data, file = "cyclicc.RData", compress = TRUE) data(cyclicc) out <- qdg(cross=cyclicc.data, phenotype.names=paste("y",1:6,sep=""), marker.names=cyclicc.qtl$markers, QTL=cyclicc.qtl$allqtl, alpha=0.005, n.qdg.random.starts=1, skel.method="pcskel") gr <- graph.qdg(out) plot(gr) ## End(Not run)
## Not run: bp <- matrix(0, 6, 6) bp[2,5] <- 0.5 bp[5,2] <- 0.8 bp[2,1] <- bp[3,2] <- bp[5,4] <- bp[6,5] <- 0.5 stdev <- rep(0.025, 6) ## Use R/qtl routines to simulate map and genotypes. set.seed(34567899) mymap <- sim.map(len = rep(100,20), n.mar = 10, eq.spacing = FALSE, include.x = FALSE) mycross <- sim.cross(map = mymap, n.ind = 200, type = "f2") mycross <- sim.geno(mycross, n.draws = 1) ## Use R/qdg routines to produce QTL sample and generate phenotypes. cyclicc.qtl <- generate.qtl.markers(cross = mycross, n.phe = 6) mygeno <- pull.geno(mycross)[, unlist(cyclicc.qtl$markers)] cyclicc.data <- generate.qtl.pheno("cyclicc", cross = mycross, burnin = 2000, bq = c(0.2,0.3,0.4), bp = bp, stdev = stdev, geno = mygeno) save(cyclicc.qtl, cyclicc.data, file = "cyclicc.RData", compress = TRUE) data(cyclicc) out <- qdg(cross=cyclicc.data, phenotype.names=paste("y",1:6,sep=""), marker.names=cyclicc.qtl$markers, QTL=cyclicc.qtl$allqtl, alpha=0.005, n.qdg.random.starts=1, skel.method="pcskel") gr <- graph.qdg(out) plot(gr) ## End(Not run)
Various QTLnet diagnostic routines.
dist.qtlnet(qtlnet.object, min.prob = 0.9, method = "manhattan", cex = 5) edgematch.qtlnet(qtlnet.object, min.prob = 0.9, method = "manhattan", cex = 5) mds.qtlnet(qtlnet.object, min.prob = 0.9, method = "manhattan", cex = 5) plotbic.qtlnet(x, ..., smooth = TRUE)
dist.qtlnet(qtlnet.object, min.prob = 0.9, method = "manhattan", cex = 5) edgematch.qtlnet(qtlnet.object, min.prob = 0.9, method = "manhattan", cex = 5) mds.qtlnet(qtlnet.object, min.prob = 0.9, method = "manhattan", cex = 5) plotbic.qtlnet(x, ..., smooth = TRUE)
qtlnet.object , x
|
Object of class |
min.prob |
Minimum probability to include edge in network. |
method |
Distance method to be used between columns of connection
matrix. Used by |
cex |
Character expansion. (Only used for |
smooth |
Use |
... |
Additional unused arguments. |
List containing, for each phenotype in the network, a character vector
of the QTL names as chr@pos
, or pseudomarker name if
chr.pos
is FALSE
.
Brian S. Yandell and Elias Chaibub Neto
Chaibub Neto E, Keller MP, Attie AD, Yandell BS (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Ann Appl Statist 4: 320-339. http://www.stat.wisc.edu/~yandell/doc/2010/92.AnnApplStat.pdf
loci.qtlnet(Pscdbp.qtlnet)
loci.qtlnet(Pscdbp.qtlnet)
Generate QTLs ane phenotype data for individual examples from cross. These are utility routines to illustrate the examples. They are not meant for users per se.
generate.qtl.markers(cross, n.phe, nqtl = 3) generate.qtl.pheno(name, cross, bp, bq, stdev, allqtl, burnin = 2000, geno)
generate.qtl.markers(cross, n.phe, nqtl = 3) generate.qtl.pheno(name, cross, bp, bq, stdev, allqtl, burnin = 2000, geno)
cross |
object of class |
name |
character string for example name |
bp |
vector or matrix of coefficients for dependencies between phenotypes; see cyclic and acyclic examples |
bq |
vector or matrix of coefficients for QTL effects on phenotypes; see cyclic and acyclic examples |
stdev |
vector of standard deviations per phenotype |
allqtl |
list of objects of class |
burnin |
number of burnin cycles for MCMC; default is 2000 |
geno |
genotypes at markers, typically extracted with
|
n.phe |
number of phenotypes |
nqtl |
number of QTL |
acyclic
,
cyclica
,
cyclicb
,
cyclicc
## Not run: example(acyclic) example(cyclica) example(cyclicb) example(cyclicc) ## End(Not run)
## Not run: example(acyclic) example(cyclica) example(cyclicb) example(cyclicc) ## End(Not run)
This is the Glx network reported in Chaibub Neto et al 2008 and in Ferrara et al 2008. Age was used as an additive covariate and we allowed for sex by genotype interaction. The network differs slightly from the published network due to improved code.
Chaibub Neto et al. 2008 Inferring causal phenotype networks from segregating populations. Genetics 179: 1089-1100.
Ferrara et al. 2008 Genetic networks of liver metabolism revealed by integration of metabolomic and transcriptomic profiling. PLoS Genetics 4: e1000034.
data(glxnet) glxnet.cross <- calc.genoprob(glxnet.cross) set.seed(1234) glxnet.cross <- sim.geno(glxnet.cross) n.node <- nphe(glxnet.cross) - 2 ## Last two are age and sex. markers <- glxnet.qtl <- vector("list", n.node) for(i in 1:n.node) { ac <- model.matrix(~ age + sex, glxnet.cross$pheno)[, -1] ss <- summary(scanone(glxnet.cross, pheno.col = i, addcovar = ac, intcovar = ac[,2]), threshold = 2.999) glxnet.qtl[[i]] <- makeqtl(glxnet.cross, chr = ss$chr, pos = ss$pos) markers[[i]] <- find.marker(glxnet.cross, chr = ss$chr, pos = ss$pos) } names(glxnet.qtl) <- names(markers) <- names(glxnet.cross$pheno)[seq(n.node)] glxnet.qdg <- qdg(cross=glxnet.cross, phenotype.names = names(glxnet.cross$pheno[,seq(n.node)]), marker.names = markers, QTL = glxnet.qtl, alpha = 0.05, n.qdg.random.starts=10, addcov="age", intcov="sex", skel.method="udgskel", udg.order=6) glxnet.qdg ## Not run: gr <- graph.qdg(glxnet.qdg) plot(gr) ## Or use tkplot(). glxnet.cross <- clean(glxnet.cross) save(glxnet.cross, glxnet.qdg, glxnet.qtl, file = "glxnet.RData", compress = TRUE) ## End(Not run)
data(glxnet) glxnet.cross <- calc.genoprob(glxnet.cross) set.seed(1234) glxnet.cross <- sim.geno(glxnet.cross) n.node <- nphe(glxnet.cross) - 2 ## Last two are age and sex. markers <- glxnet.qtl <- vector("list", n.node) for(i in 1:n.node) { ac <- model.matrix(~ age + sex, glxnet.cross$pheno)[, -1] ss <- summary(scanone(glxnet.cross, pheno.col = i, addcovar = ac, intcovar = ac[,2]), threshold = 2.999) glxnet.qtl[[i]] <- makeqtl(glxnet.cross, chr = ss$chr, pos = ss$pos) markers[[i]] <- find.marker(glxnet.cross, chr = ss$chr, pos = ss$pos) } names(glxnet.qtl) <- names(markers) <- names(glxnet.cross$pheno)[seq(n.node)] glxnet.qdg <- qdg(cross=glxnet.cross, phenotype.names = names(glxnet.cross$pheno[,seq(n.node)]), marker.names = markers, QTL = glxnet.qtl, alpha = 0.05, n.qdg.random.starts=10, addcov="age", intcov="sex", skel.method="udgskel", udg.order=6) glxnet.qdg ## Not run: gr <- graph.qdg(glxnet.qdg) plot(gr) ## Or use tkplot(). glxnet.cross <- clean(glxnet.cross) save(glxnet.cross, glxnet.qdg, glxnet.qtl, file = "glxnet.RData", compress = TRUE) ## End(Not run)
Plot inferred causal network using igraph package.
igraph.qtlnet(x, edges, loci.list, pheno.color = "green", qtl.color = "red", vertex.color, include.qtl = TRUE, ...) ## S3 method for class 'qtlnet' plot(x, ...)
igraph.qtlnet(x, edges, loci.list, pheno.color = "green", qtl.color = "red", vertex.color, include.qtl = TRUE, ...) ## S3 method for class 'qtlnet' plot(x, ...)
x |
Object of class |
edges |
Data frame with first two columns being |
loci.list |
List of character names of loci by phenotype. Typically determined by
call to |
pheno.color , qtl.color
|
Name of color to use for phenotypes and QTLs, respectively. |
vertex.color |
Vertex colors in order of pheno-pheno edged augmented by
|
include.qtl |
Include QTL in graph if |
... |
Additional arguments passed to called routines. |
Uses the igraph package to create graph objects. These can be exported
to a variety of other modern graphics packages. graph.qtlnet
is
synonymous with igraph.qtlnet
.
Object of class graph
created using
graph.data.frame
.
Brian S. Yandell and Elias Chaibub Neto
Chaibub Neto E, Keller MP, Attie AD, Yandell BS (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Ann Appl Statist 4: 320-339. http://www.stat.wisc.edu/~yandell/doc/2010/92.AnnApplStat.pdf
summary.qtlnet
,
loci.qtlnet
,
graph.data.frame
,
tkplot
Pscdbp.graph <- igraph.qtlnet(Pscdbp.qtlnet) Pscdbp.graph ## Not run: tkplot(Pscdbp.graph) ## End(Not run)
Pscdbp.graph <- igraph.qtlnet(Pscdbp.qtlnet) Pscdbp.graph ## Not run: tkplot(Pscdbp.graph) ## End(Not run)
Determines QTL that affect each phenotype conditional on the model-averaged network and on covariates.
loci.qtlnet(qtlnet.object, chr.pos = TRUE, merge.qtl = 10, ...) est.qtlnet(qtlnet.object, ..., verbose = TRUE)
loci.qtlnet(qtlnet.object, chr.pos = TRUE, merge.qtl = 10, ...) est.qtlnet(qtlnet.object, ..., verbose = TRUE)
qtlnet.object |
Object of class |
chr.pos |
Include chromsome and position if |
merge.qtl |
Merge QTL within |
... |
Additional unused arguments. |
verbose |
verbose output if |
List containing, for each phenotype in the network, a character vector
of the QTL names as chr@pos
, or pseudomarker name if
chr.pos
is FALSE
.
Brian S. Yandell and Elias Chaibub Neto
Chaibub Neto E, Keller MP, Attie AD, Yandell BS (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Ann Appl Statist 4: 320-339. http://www.stat.wisc.edu/~yandell/doc/2010/92.AnnApplStat.pdf
loci.qtlnet(Pscdbp.qtlnet)
loci.qtlnet(Pscdbp.qtlnet)
Use MCMC to alternatively sample genetic architecture and QTL network as directed acyclic graphs (DAGs).
mcmc.qtlnet(cross, pheno.col, threshold, addcov = NULL, intcov = NULL, nSamples = 1000, thinning = 1, max.parents = 3, M0 = NULL, burnin = 0.1, method = "hk", random.seed = NULL, init.edges = 0, saved.scores = NULL, rev.method = c("nbhd", "node.edge", "single"), verbose = FALSE, ...) init.qtlnet(pheno.col, max.parents, init.edges)
mcmc.qtlnet(cross, pheno.col, threshold, addcov = NULL, intcov = NULL, nSamples = 1000, thinning = 1, max.parents = 3, M0 = NULL, burnin = 0.1, method = "hk", random.seed = NULL, init.edges = 0, saved.scores = NULL, rev.method = c("nbhd", "node.edge", "single"), verbose = FALSE, ...) init.qtlnet(pheno.col, max.parents, init.edges)
cross |
Object of class |
pheno.col |
Phenotype identifiers from |
threshold |
Scalar or list of thresholds, one per each node. |
addcov |
Additive covariates for each phenotype ( |
intcov |
Interactive covariates, entered in the same manner as |
nSamples |
Number of samples to record. |
thinning |
Thinning rate. Number of MCMC samples is |
max.parents |
Maximum number of parents to a node. This reduces the complexity of graphs and shortens run time. Probably best to consider values of 3-5. |
M0 |
Matrix of 0s and 1s with initial directed graph of row->col if (row,col)
entry is 1. Cycles are forbidden (e.g. 1s on diagonal or symmetric 1s
across diagonal). Default (if |
burnin |
Proportion of MCMC samples to use as burnin. Default is 0.1 if burnin
is |
method |
Model fitting method for |
random.seed |
Initialization seed for random number generator. Must be |
init.edges |
Initial number of edges for |
saved.scores |
Updated scores, typically pre-computed by
|
rev.method |
Method to use for reversing edges. See details. |
verbose |
Print iteration and number of models fit. |
... |
Additional arguments. Advanced users may want to supply pre-computed
|
Models are coded compactly as (1)(2|1)(3|1,2,4,5)(4|2)(5|2)
. Each
parenthetical entry is a of form (node|parents)
; these each
require a model fit, for now with scanone
.
The scanone
routine is run on multiple
phenotypes in the network that could all have the same parents. For
instance, for 5 phenotypes, if (1|2,4)
is sampled, then do
scanone of this model as well as (3|2,4)
and (5|2,4)
.
Setting the hidden parameter scan.parents
to a value smaller
than length(pheno.col) - 1
(default) disallows multiple
trait scanning with more than that number of parents.
The saved.scores
parameter can greatly reduce MCMC run time,
by supplying pre-computed BIC scores. See
bic.qtlnet
. Another option is to capture
saved.scores
from a previous mcmc.qtlnet
run with the
same phenotypes (and covariates). Caution is advised as only a modest
amount of checking can be done.
The init.qtlnet
routine can be used to randomly find an initial
causal network M0
with up to init.edges
edges.
MCMC updates include delete, add or reverse edge direction. The early
version of this method only considered the edge on its own
(rev.method = "single"
), while the neighborhood method
(rev.method = "nbhd"
) uses the update
List of class qtlnet
post.model |
Model code (see details). |
post.bic |
Posterior BIC |
Mav |
Model average of |
freq.accept |
Frequency of acceptance M-H proposals. |
saved.scores |
Saved LOD score for each phenotype and all possible sets of the other phenotypes as parent nodes. |
all.bic |
|
cross |
The |
In addition, a number of attributes are recorded:
M0 |
Initial network matrix. |
threshold |
threshold list |
nSamples |
Number of samples saved |
thinning |
Thinning rate |
pheno.col |
Phenotype columns. |
pheno.names |
Phenotype names |
addcov |
Additive covariate columns. |
intcov |
Interactive covariate columns. |
burnin |
Burnin proportion |
method |
Method used for |
random.seed |
Initial random number generator seed. |
random.kind |
Random number generator kind from |
Brian S. Yandell and Elias Chaibub Neto
Chiabub Neto E, Keller MP, Attie AD, Yandell BS (2010) Causal graphical models in systems genetics: a unified framework for joint inference of causal network and genetic archicecture for correlated phenotypes. Ann Appl Statist 4: 320-339. http://dx.doi.org/10.1214/09-AOAS288
Grzegorczyk and Husmeier (2008) Improving the structure MCMC sampler for Bayesian networks by introducing a new edge reversal move. Mach Learn 71: 265-305. http://dx.doi.org/10.1007/s10994-008-5057-7
read.cross
, scanone
,
Random
,
bic.qtlnet
.
data(Pscdbp) ## Not run: ## Run of subset of traits. Still takes some time. Pscdbp.qtlnet <- mcmc.qtlnet(Pscdbp, pheno.col = c(1,2,4,5,6), threshold = 3.83, nSamples = 1000, thinning = 20, random.seed = 92387475, verbose = TRUE) save(Pscdbp.qtlnet, file = "Pscdbp.qtlnet.RData", compress = TRUE) ## End(Not run) data(Pscdbp.qtlnet) ## Not run: out.qtlnet <- mcmc.qtlnet(Pscdbp, pheno.col = 1:13, threshold = 3.83, nSamples = 1000, thinning = 20, random.seed = 92387475, verbose = TRUE, saved.scores = Pscdbp.bic) ## End(Not run)
data(Pscdbp) ## Not run: ## Run of subset of traits. Still takes some time. Pscdbp.qtlnet <- mcmc.qtlnet(Pscdbp, pheno.col = c(1,2,4,5,6), threshold = 3.83, nSamples = 1000, thinning = 20, random.seed = 92387475, verbose = TRUE) save(Pscdbp.qtlnet, file = "Pscdbp.qtlnet.RData", compress = TRUE) ## End(Not run) data(Pscdbp.qtlnet) ## Not run: out.qtlnet <- mcmc.qtlnet(Pscdbp, pheno.col = 1:13, threshold = 3.83, nSamples = 1000, thinning = 20, random.seed = 92387475, verbose = TRUE, saved.scores = Pscdbp.bic) ## End(Not run)
This routine calls one of four phases in a parallelized version of qtlnet.
parallel.qtlnet(phase, index = 1, ..., dirpath = ".")
parallel.qtlnet(phase, index = 1, ..., dirpath = ".")
phase |
Phase of parallelization as number 1 through 4. See details. |
index |
Index for phase. Used in phases 2 and 4, and for error codes saved
in |
... |
Additional arguments for phases. See details. |
dirpath |
Character string for directory were user can read and write files. When submitting to a cluster, this should remain the default. |
See http://www.stat.wisc.edu/~yandell/sysgen/qtlnet for details of
implementation in progress. The plan is to run qtlnet via Condor
(https://research.cs.wisc.edu/htcondor/) to scale up to larger networks, say up to
100 nodes. Most important information is passed in files. Phase 1
imports arguments from the params.txt
file, which must have
parse-able assignments to the arguments of
qtlnet:::qtlnet.phase1
. This first phase produces file
Phase1.RData
, which included objects used by all other phases.
Phase 1 also creates file groups.txt
, which for each line has
begin and end indices for the parents
that would result from a
call to parents.qtlnet
. Phase 2 should be run the same
number of times as the number of lines in file groups.txt
. Each
run produces a bicN.RData
file containing BIC computations. These
computations are aggregated in Phase 3 to create Phase3.RData
,
which contains the saved.scores
used for
mcmc.qtlnet
runs in Phase 4, which each produce an
mcmcN.RData
file. The number of runs of Phase 4
is an argument nruns
stored in the params.txt
file
processed in Phase 1. Finally, Phase 5 aggregates the MCMC results from
multiple independent runs into one qtlnet
object.
Brian S. Yandell and Elias Chaibub Neto
Chaibub Neto E, Keller MP, Attie AD, Yandell BS (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Ann Appl Statist 4: 320-339. http://www.stat.wisc.edu/~yandell/doc/2010/92.AnnApplStat.pdf
http://www.stat.wisc.edu/~yandell/sysgen/qtlnet
## Not run: parallel.qtlnet("/u/y/a/yandell/public/html/sysgen/qtlnet/condor", 1) ## End(Not run)
## Not run: parallel.qtlnet("/u/y/a/yandell/public/html/sysgen/qtlnet/condor", 1) ## End(Not run)
Routines useful for examining the size of node-parent combinations.
parents.qtlnet(pheno.col, max.parents = 3, codes.only = FALSE) ## S3 method for class 'parents.qtlnet' summary(object, ...) size.qtlnet(pheno.col, max.parents = 3) group.qtlnet(pheno.col, max.parents = 3, n.groups = NULL, group.size = 50000, parents = parents.qtlnet(pheno.col, max.parents))
parents.qtlnet(pheno.col, max.parents = 3, codes.only = FALSE) ## S3 method for class 'parents.qtlnet' summary(object, ...) size.qtlnet(pheno.col, max.parents = 3) group.qtlnet(pheno.col, max.parents = 3, n.groups = NULL, group.size = 50000, parents = parents.qtlnet(pheno.col, max.parents))
pheno.col |
Phenotype identifiers from |
max.parents |
Maximum number of parents per node. This reduces the complexity of graphs and shortens run time. Probably best to consider values of 3-5. |
parents |
List containing all possible parents up to |
codes.only |
Return only codes of parents if |
n.groups |
Number of groups for parallel computation. Determined from
|
group.size |
Size of groups for parallel computation. See details. |
object |
Object of class |
... |
Additional arguments ignored. |
The most expensive part of calculations is running
scanone
on each phenotype with parent phenotypes as
covariates. One strategy is to pre-compute the BIC contributions using a
cluster and save them for later use. The parents.qtlnet
routine
creates a list of all possible parent sets (up to max.parents
in
size). The size.qtlnet
determines the number of
scanone
calculations possible for a network with
nodes pheno.col
and maximum parent size max.parents
. The
group.qtlnet
groups the parent sets into roughly equal size
groups for parallel computations. See bic.qtlnet
for
further details.
The size.qtlnet
returns the number of possible
scanone
computations needed for BIC scores.
The group.qtlnet
produces and index into the parents list
created by parents.qtlnet
. See details.
The parents.qtlnet
creates a list object with names being the
code
.
The summary
method for such an object is a data
frame with row.names being the code
, a binary code as decimal
for the parents of a phenotype node, excluding the phenotype. Value is
between 0 (no parents) and 2 ^ (length(pheno.col) - 1)
. The
columns are
parents |
Comma-separated string of parents to potential child node. |
n.child |
Number of possible child nodes to this parent set. |
Brian S. Yandell and Elias Chaibub Neto
Chaibub Neto E, Keller MP, Attie AD, Yandell BS (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Ann Appl Statist 4: 320-339. http://dx.doi.org/10.1214/09-AOAS288
## Restrict to at most 3 parents per node. pheno.col <- 1:6 max.parents <- 3 size.qtlnet(pheno.col, max.parents) parents <- parents.qtlnet(pheno.col, max.parents) summary(parents) ## Allow an arbitrary number (up to 12) of parents per node. pheno.col <- 1:13 max.parents <- 12 size.qtlnet(pheno.col, max.parents) ## Make ~53 groups of ~1000, for a total of 53248 scanone runs. parents <- parents.qtlnet(pheno.col, max.parents) n.child <- summary(parents)$n.child table(n.child) groups <- group.qtlnet(parents = parents, group.size = 1000) apply(groups, 1, function(group, parents) sapply(parents[seq(group[1], group[2])], length), parents)
## Restrict to at most 3 parents per node. pheno.col <- 1:6 max.parents <- 3 size.qtlnet(pheno.col, max.parents) parents <- parents.qtlnet(pheno.col, max.parents) summary(parents) ## Allow an arbitrary number (up to 12) of parents per node. pheno.col <- 1:13 max.parents <- 12 size.qtlnet(pheno.col, max.parents) ## Make ~53 groups of ~1000, for a total of 53248 scanone runs. parents <- parents.qtlnet(pheno.col, max.parents) n.child <- summary(parents)$n.child table(n.child) groups <- group.qtlnet(parents = parents, group.size = 1000) apply(groups, 1, function(group, parents) sapply(parents[seq(group[1], group[2])], length), parents)
The R/qtl cross
object was created from data at source. The
qtlnet
object was created using mcmc.qtlnet
.
data(Pscdbp) data(Pscdbp.qtlnet)
data(Pscdbp) data(Pscdbp.qtlnet)
https://horvath.genetics.ucla.edu/coexpressionnetwork/
Ghazalpour A, Doss S, Zhang B, Wang S, Plaisier C, Castellanos R, Brozell A, Schadt EE, Drake TA, Lusis AJ, Horvath S (2006) Integrating genetic and network analysis to characterize genes related to mouse weight. PLoS Genetics 2: e130-NA. http://dx.doi.org/10.1371/journal.pgen.0020130
Chaibub Neto E, Keller MP, Attie AD, Yandell BS (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Ann Appl Statist 4: 320-339. http://dx.doi.org/10.1214/09-AOAS288
summary(Pscdbp) ## Not run: summary(Pscdbp.qtlnet) ## End(Not run)
summary(Pscdbp) ## Not run: summary(Pscdbp.qtlnet) ## End(Not run)
This function implements the QDG algorithm described in Chaibub Neto et al 2008. It creates and scores QDGs. The computed scores (log-likelihood and BIC) are only valid for acyclic graphs. For cyclic networks qdgSEM should be used to compute the scores.
qdg(cross, phenotype.names, marker.names, QTL, alpha, n.qdg.random.starts, addcov = NULL, intcov = NULL, skel.method = c("pcskel","udgskel"), udg.order = 2) graph.qdg(x, ...) ## S3 method for class 'qdg' print(x, ...) ## S3 method for class 'qdg' summary(object, ...)
qdg(cross, phenotype.names, marker.names, QTL, alpha, n.qdg.random.starts, addcov = NULL, intcov = NULL, skel.method = c("pcskel","udgskel"), udg.order = 2) graph.qdg(x, ...) ## S3 method for class 'qdg' print(x, ...) ## S3 method for class 'qdg' summary(object, ...)
cross |
object of class |
phenotype.names |
character string with names of phenotype nodes
corresponding to phenotypes in |
marker.names |
list of character strings, one for each of
|
QTL |
object of class |
alpha |
significance level threshold for PC or UDG algorithms (for the inference of the graph skeleton. See step 1 of the QDG algorithm). Must be between 0 and 1. |
n.qdg.random.starts |
number of random starts for the QDG algorithm (see step 3 of the QDG algorithm). |
addcov |
names of additive covariates. Must be valid phenotype names in
|
intcov |
names of additive covariates. Must be valid phenotype names in
|
skel.method |
Either "pcskel" for the PC skeleton algorithm
( |
udg.order |
maximum allowed order of the UDG algorithm. Must be between zero and the number of variables minus 2. |
x , object
|
object of class |
... |
additional arguments (ignored). |
The log-likelihood and BIC scores are computed based in the factorization of the joint distribution, and hence are only valid for acyclic networks. For cyclic networks these scores are relative to the unnormalized likelihoods. Models include phenotypes and QTLs. The 'udgskel' method for the computation of the skeleton of the causal model should be used for small networks only (the UDG algorithm quickly becomes computationally infeasible as the number of nodes increases).
List object that inherits class "qdg" and "qdg" with components:
UDG |
Undirected dependency graph from PC skeleton or UDG algorithms. |
DG |
Directed dependency graph before recheck step (output of the step 2 of the QDG algorithm). |
best.lm |
Solution with lowest BIC (best fit to the data). |
Solutions |
Solutions of dependency graph after recheck step (output of steps 3, 4 and 5 of the QDG algorithm.) |
marker.names |
List of character strings, one for each of
|
phenotype.names |
Character string with names of phenotype nodes
corresponding to phenotypes in |
Chaibub Neto et al. (2008) Inferring causal phenotype networks from segregating populations. Genetics 179: 1089-1100.
## simulate a genetic map (20 autosomes, 10 not equaly spaced markers per ## chromosome) mymap <- sim.map(len=rep(100,20), n.mar=10, eq.spacing=FALSE, include.x=FALSE) ## simulate an F2 cross object with n.ind (number of individuals) n.ind <- 200 mycross <- sim.cross(map=mymap, n.ind=n.ind, type="f2") ## produce multiple imputations of genotypes using the ## sim.geno function. The makeqtl function requires it, ## even though we are doing only one imputation (since ## we don't have missing data and we are using the ## genotypes in the markers, one imputation is enough) mycross <- sim.geno(mycross,n.draws=1) ## sample markers (2 per phenotype) genotypes <- pull.geno(mycross) geno.names <- dimnames(genotypes)[[2]] m1 <- sample(geno.names,2,replace=FALSE) m2 <- sample(geno.names,2,replace=FALSE) m3 <- sample(geno.names,2,replace=FALSE) m4 <- sample(geno.names,2,replace=FALSE) ## get marker genotypes g11 <- genotypes[,m1[1]]; g12 <- genotypes[,m1[2]] g21 <- genotypes[,m2[1]]; g22 <- genotypes[,m2[2]] g31 <- genotypes[,m3[1]]; g32 <- genotypes[,m3[2]] g41 <- genotypes[,m4[1]]; g42 <- genotypes[,m4[2]] ## generate phenotypes y1 <- runif(3,0.5,1)[g11] + runif(3,0.5,1)[g12] + rnorm(n.ind) y2 <- runif(3,0.5,1)[g21] + runif(3,0.5,1)[g22] + rnorm(n.ind) y3 <- runif(1,0.5,1) * y1 + runif(1,0.5,1) * y2 + runif(3,0.5,1)[g31] + runif(3,0.5,1)[g32] + rnorm(n.ind) y4 <- runif(1,0.5,1) * y3 + runif(3,0.5,1)[g41] + runif(3,0.5,1)[g42] + rnorm(n.ind) ## incorporate phenotypes to cross object mycross$pheno <- data.frame(y1,y2,y3,y4) ## create markers list markers <- list(m1,m2,m3,m4) names(markers) <- c("y1","y2","y3","y4") ## create qtl object allqtls <- list() m1.pos <- find.markerpos(mycross, m1) allqtls[[1]] <- makeqtl(mycross, chr = m1.pos[,"chr"], pos = m1.pos[,"pos"]) m2.pos <- find.markerpos(mycross, m2) allqtls[[2]] <- makeqtl(mycross, chr = m2.pos[,"chr"], pos = m2.pos[,"pos"]) m3.pos <- find.markerpos(mycross, m3) allqtls[[3]] <- makeqtl(mycross, chr = m3.pos[,"chr"], pos = m3.pos[,"pos"]) m4.pos <- find.markerpos(mycross, m4) allqtls[[4]] <- makeqtl(mycross, chr = m4.pos[,"chr"], pos = m4.pos[,"pos"]) names(allqtls) <- c("y1","y2","y3","y4") ## infer QDG out <- qdg(cross=mycross, phenotype.names = c("y1","y2","y3","y4"), marker.names = markers, QTL = allqtls, alpha = 0.005, n.qdg.random.starts=10, skel.method="pcskel") out ## Not run: gr <- graph.qdg(out) gr plot(gr) ## End(Not run)
## simulate a genetic map (20 autosomes, 10 not equaly spaced markers per ## chromosome) mymap <- sim.map(len=rep(100,20), n.mar=10, eq.spacing=FALSE, include.x=FALSE) ## simulate an F2 cross object with n.ind (number of individuals) n.ind <- 200 mycross <- sim.cross(map=mymap, n.ind=n.ind, type="f2") ## produce multiple imputations of genotypes using the ## sim.geno function. The makeqtl function requires it, ## even though we are doing only one imputation (since ## we don't have missing data and we are using the ## genotypes in the markers, one imputation is enough) mycross <- sim.geno(mycross,n.draws=1) ## sample markers (2 per phenotype) genotypes <- pull.geno(mycross) geno.names <- dimnames(genotypes)[[2]] m1 <- sample(geno.names,2,replace=FALSE) m2 <- sample(geno.names,2,replace=FALSE) m3 <- sample(geno.names,2,replace=FALSE) m4 <- sample(geno.names,2,replace=FALSE) ## get marker genotypes g11 <- genotypes[,m1[1]]; g12 <- genotypes[,m1[2]] g21 <- genotypes[,m2[1]]; g22 <- genotypes[,m2[2]] g31 <- genotypes[,m3[1]]; g32 <- genotypes[,m3[2]] g41 <- genotypes[,m4[1]]; g42 <- genotypes[,m4[2]] ## generate phenotypes y1 <- runif(3,0.5,1)[g11] + runif(3,0.5,1)[g12] + rnorm(n.ind) y2 <- runif(3,0.5,1)[g21] + runif(3,0.5,1)[g22] + rnorm(n.ind) y3 <- runif(1,0.5,1) * y1 + runif(1,0.5,1) * y2 + runif(3,0.5,1)[g31] + runif(3,0.5,1)[g32] + rnorm(n.ind) y4 <- runif(1,0.5,1) * y3 + runif(3,0.5,1)[g41] + runif(3,0.5,1)[g42] + rnorm(n.ind) ## incorporate phenotypes to cross object mycross$pheno <- data.frame(y1,y2,y3,y4) ## create markers list markers <- list(m1,m2,m3,m4) names(markers) <- c("y1","y2","y3","y4") ## create qtl object allqtls <- list() m1.pos <- find.markerpos(mycross, m1) allqtls[[1]] <- makeqtl(mycross, chr = m1.pos[,"chr"], pos = m1.pos[,"pos"]) m2.pos <- find.markerpos(mycross, m2) allqtls[[2]] <- makeqtl(mycross, chr = m2.pos[,"chr"], pos = m2.pos[,"pos"]) m3.pos <- find.markerpos(mycross, m3) allqtls[[3]] <- makeqtl(mycross, chr = m3.pos[,"chr"], pos = m3.pos[,"pos"]) m4.pos <- find.markerpos(mycross, m4) allqtls[[4]] <- makeqtl(mycross, chr = m4.pos[,"chr"], pos = m4.pos[,"pos"]) names(allqtls) <- c("y1","y2","y3","y4") ## infer QDG out <- qdg(cross=mycross, phenotype.names = c("y1","y2","y3","y4"), marker.names = markers, QTL = allqtls, alpha = 0.005, n.qdg.random.starts=10, skel.method="pcskel") out ## Not run: gr <- graph.qdg(out) gr plot(gr) ## End(Not run)
Conduct permutation test for LOD score of edge direction on directed graph.
qdg.perm.test(cross, nperm, node1, node2, common.cov = NULL, DG, QTLs, addcov = NULL, intcov = NULL) ## S3 method for class 'qdg.perm.test' summary(object, ...) ## S3 method for class 'qdg.perm.test' print(x, ...)
qdg.perm.test(cross, nperm, node1, node2, common.cov = NULL, DG, QTLs, addcov = NULL, intcov = NULL) ## S3 method for class 'qdg.perm.test' summary(object, ...) ## S3 method for class 'qdg.perm.test' print(x, ...)
cross |
Object of class |
nperm |
Number of permutations. |
node1 |
Character string with name of a phenotype nodes. |
node2 |
Character string with name of a phenotype nodes. |
common.cov |
Character string with name of common phenotype covariates. |
DG |
Directed graph of class |
QTLs |
List of objects of class |
addcov |
Names of additive covariates. Must be valid phenotype
names in |
intcov |
Names of additive covariates. Must be valid phenotype
names in |
x , object
|
Object of class |
... |
Additional arguments ignored. |
qdg.perm.test
performs nperm
permutation-based test
of LOD score for an
edge of a directed graph.
List composed by:
pvalue |
Permutation p-value. |
obs.lod |
Observed LOD score. |
PermSample |
Permutation LOD scores sample. |
node1 |
Character string with name of a phenotype nodes. |
node2 |
Character string with name of a phenotype nodes. |
Chaibub Neto et al. (2008) Inferring causal phenotype networks from segregating populations. Genetics 179: 1089-1100.
data(glxnet) glxnet.cross <- calc.genoprob(glxnet.cross) set.seed(1234) glxnet.cross <- sim.geno(glxnet.cross) ## Should really use nperm = 1000 here. qdg.perm.test(glxnet.cross, nperm = 10, "Glx", "Slc1a2", DG = glxnet.qdg$DG, QTLs = glxnet.qtl)
data(glxnet) glxnet.cross <- calc.genoprob(glxnet.cross) set.seed(1234) glxnet.cross <- sim.geno(glxnet.cross) ## Should really use nperm = 1000 here. qdg.perm.test(glxnet.cross, nperm = 10, "Glx", "Slc1a2", DG = glxnet.qdg$DG, QTLs = glxnet.qtl)
Score directed graphs (cyclic or acyclic) outputed by qdg function using the sem R package.
qdg.sem(qdgObject, cross) ## S3 method for class 'qdg.sem' print(x, ...) ## S3 method for class 'qdg.sem' summary(object, ...)
qdg.sem(qdgObject, cross) ## S3 method for class 'qdg.sem' print(x, ...) ## S3 method for class 'qdg.sem' summary(object, ...)
qdgObject |
list containing the output of |
cross |
object of class |
x , object
|
object of class |
... |
extra arguments to print or summary (ignored). |
Fits a SEM to the phenotypes network. QTLs are not included as variables in the model. When additive covariates are used in qdg, qdg.sem fits a SEM model to the residuals of the variables after adjustment of the additive covariates.
List object that inherits class "qdg.sem" and "qdg" composed by:
best.SEM |
Solution with lowest SEM BIC (best fit to the data). |
BIC.SEM |
Vector with the BIC values of all solutions from qdg. |
path.coeffs |
Path coefficients associated with the best SEM solution. |
Solutions |
Solutions of dependency graph after recheck step (output of steps 3, 4 and 5 of the QDG algorithm.) |
marker.names |
List of character strings, one for each of
|
phenotype.names |
Character string with names of phenotype nodes
corresponding to phenotypes in |
dropped |
Indexes of solutions that were dropped ( |
## simulate a genetic map (20 autosomes, 10 not equaly spaced markers per ## chromosome) mymap <- sim.map(len=rep(100,20), n.mar=10, eq.spacing=FALSE, include.x=FALSE) ## simulate an F2 cross object with n.ind (number of individuals) n.ind <- 200 mycross <- sim.cross(map=mymap, n.ind=n.ind, type="f2") ## produce multiple imputations of genotypes using the ## sim.geno function. The makeqtl function requires it, ## even though we are doing only one imputation (since ## we don't have missing data and we are using the ## genotypes in the markers, one imputation is enough) mycross <- sim.geno(mycross,n.draws=1) ## sample markers (2 per phenotype) genotypes <- pull.geno(mycross) geno.names <- dimnames(genotypes)[[2]] m1 <- sample(geno.names,2,replace=FALSE) m2 <- sample(geno.names,2,replace=FALSE) m3 <- sample(geno.names,2,replace=FALSE) m4 <- sample(geno.names,2,replace=FALSE) ## get marker genotypes g11 <- genotypes[,m1[1]]; g12 <- genotypes[,m1[2]] g21 <- genotypes[,m2[1]]; g22 <- genotypes[,m2[2]] g31 <- genotypes[,m3[1]]; g32 <- genotypes[,m3[2]] g41 <- genotypes[,m4[1]]; g42 <- genotypes[,m4[2]] ## generate phenotypes y1 <- runif(3,0.5,1)[g11] + runif(3,0.5,1)[g12] + rnorm(n.ind) y2 <- runif(3,0.5,1)[g21] + runif(3,0.5,1)[g22] + rnorm(n.ind) y3 <- runif(1,0.5,1) * y1 + runif(1,0.5,1) * y2 + runif(3,0.5,1)[g31] + runif(3,0.5,1)[g32] + rnorm(n.ind) y4 <- runif(1,0.5,1) * y3 + runif(3,0.5,1)[g41] + runif(3,0.5,1)[g42] + rnorm(n.ind) ## incorporate phenotypes to cross object mycross$pheno <- data.frame(y1,y2,y3,y4) ## create markers list markers <- list(m1,m2,m3,m4) names(markers) <- c("y1","y2","y3","y4") ## create qtl object allqtls <- list() m1.pos <- find.markerpos(mycross, m1) allqtls[[1]] <- makeqtl(mycross, chr = m1.pos[,"chr"], pos = m1.pos[,"pos"]) m2.pos <- find.markerpos(mycross, m2) allqtls[[2]] <- makeqtl(mycross, chr = m2.pos[,"chr"], pos = m2.pos[,"pos"]) m3.pos <- find.markerpos(mycross, m3) allqtls[[3]] <- makeqtl(mycross, chr = m3.pos[,"chr"], pos = m3.pos[,"pos"]) m4.pos <- find.markerpos(mycross, m4) allqtls[[4]] <- makeqtl(mycross, chr = m4.pos[,"chr"], pos = m4.pos[,"pos"]) names(allqtls) <- c("y1","y2","y3","y4") ## infer QDG out <- qdg(cross=mycross, phenotype.names = c("y1","y2","y3","y4"), marker.names = markers, QTL = allqtls, alpha = 0.005, n.qdg.random.starts=10, skel.method="pcskel") ## Not run: gr <- graph.qdg(out) plot(gr) ## Following does not work. Not sure why. out2 <- qdg.sem(out, cross=mycross) out2 gr2 <- graph.qdg(out2) plot(gr2) ## End(Not run)
## simulate a genetic map (20 autosomes, 10 not equaly spaced markers per ## chromosome) mymap <- sim.map(len=rep(100,20), n.mar=10, eq.spacing=FALSE, include.x=FALSE) ## simulate an F2 cross object with n.ind (number of individuals) n.ind <- 200 mycross <- sim.cross(map=mymap, n.ind=n.ind, type="f2") ## produce multiple imputations of genotypes using the ## sim.geno function. The makeqtl function requires it, ## even though we are doing only one imputation (since ## we don't have missing data and we are using the ## genotypes in the markers, one imputation is enough) mycross <- sim.geno(mycross,n.draws=1) ## sample markers (2 per phenotype) genotypes <- pull.geno(mycross) geno.names <- dimnames(genotypes)[[2]] m1 <- sample(geno.names,2,replace=FALSE) m2 <- sample(geno.names,2,replace=FALSE) m3 <- sample(geno.names,2,replace=FALSE) m4 <- sample(geno.names,2,replace=FALSE) ## get marker genotypes g11 <- genotypes[,m1[1]]; g12 <- genotypes[,m1[2]] g21 <- genotypes[,m2[1]]; g22 <- genotypes[,m2[2]] g31 <- genotypes[,m3[1]]; g32 <- genotypes[,m3[2]] g41 <- genotypes[,m4[1]]; g42 <- genotypes[,m4[2]] ## generate phenotypes y1 <- runif(3,0.5,1)[g11] + runif(3,0.5,1)[g12] + rnorm(n.ind) y2 <- runif(3,0.5,1)[g21] + runif(3,0.5,1)[g22] + rnorm(n.ind) y3 <- runif(1,0.5,1) * y1 + runif(1,0.5,1) * y2 + runif(3,0.5,1)[g31] + runif(3,0.5,1)[g32] + rnorm(n.ind) y4 <- runif(1,0.5,1) * y3 + runif(3,0.5,1)[g41] + runif(3,0.5,1)[g42] + rnorm(n.ind) ## incorporate phenotypes to cross object mycross$pheno <- data.frame(y1,y2,y3,y4) ## create markers list markers <- list(m1,m2,m3,m4) names(markers) <- c("y1","y2","y3","y4") ## create qtl object allqtls <- list() m1.pos <- find.markerpos(mycross, m1) allqtls[[1]] <- makeqtl(mycross, chr = m1.pos[,"chr"], pos = m1.pos[,"pos"]) m2.pos <- find.markerpos(mycross, m2) allqtls[[2]] <- makeqtl(mycross, chr = m2.pos[,"chr"], pos = m2.pos[,"pos"]) m3.pos <- find.markerpos(mycross, m3) allqtls[[3]] <- makeqtl(mycross, chr = m3.pos[,"chr"], pos = m3.pos[,"pos"]) m4.pos <- find.markerpos(mycross, m4) allqtls[[4]] <- makeqtl(mycross, chr = m4.pos[,"chr"], pos = m4.pos[,"pos"]) names(allqtls) <- c("y1","y2","y3","y4") ## infer QDG out <- qdg(cross=mycross, phenotype.names = c("y1","y2","y3","y4"), marker.names = markers, QTL = allqtls, alpha = 0.005, n.qdg.random.starts=10, skel.method="pcskel") ## Not run: gr <- graph.qdg(out) plot(gr) ## Following does not work. Not sure why. out2 <- qdg.sem(out, cross=mycross) out2 gr2 <- graph.qdg(out2) plot(gr2) ## End(Not run)
Multiple qtlnet objects can be catenated together or subsetted by run.
## S3 method for class 'qtlnet' subset(x, run, ...) ## S3 method for class 'qtlnet' c(...) best.qtlnet(x, burnin = attr(x, "burnin"), wh = which.min(meanbic(x, burnin)))
## S3 method for class 'qtlnet' subset(x, run, ...) ## S3 method for class 'qtlnet' c(...) best.qtlnet(x, burnin = attr(x, "burnin"), wh = which.min(meanbic(x, burnin)))
x |
Object of class |
run |
Numeric index to desired run. Must be between 0 and number of runs. |
burnin |
Proportion of MCMC samples to be considered as burnin. Taken from
|
wh |
Number identifying which model is best. |
... |
For |
The catenation is used by parallel.qtlnet
in phase 5 to
join together multiple independent MCMC runs. Note that the averaged
network and the frequency of acceptance for a derived subset are only
based on the saved samples, while the original qtlnet
objects
used all samples. Thus catenation and subset are not strictly reversible
functions.
The best.qtlnet
routine picks the run with the best (lowest) BIC
score on average and returns that run as a qtlnet object. It also
produces a trace plot of BIC for all the runs.
Both return an object of class qtlnet
.
Brian Yandell
## Not run: joined <- c(qtlnet1, qtlnet2) sub1 <- subset(joined, 1) best <- best.qtlnet(joined) ## qtlnet1 and sub1 should be nearly identical. ## End(Not run)
## Not run: joined <- c(qtlnet1, qtlnet2) sub1 <- subset(joined, 1) best <- best.qtlnet(joined) ## qtlnet1 and sub1 should be nearly identical. ## End(Not run)
Print and summary methods for qtlnet objects.
## S3 method for class 'qtlnet' print(x, cutoff = 0.01, digits = 3, ...) ## S3 method for class 'qtlnet' summary(object, parent.patterns = FALSE, ...) ## S3 method for class 'summary.qtlnet' print(x, ...) check.qtlnet(object, min.prob = 0.9, correct = TRUE, verbose = FALSE, ...)
## S3 method for class 'qtlnet' print(x, cutoff = 0.01, digits = 3, ...) ## S3 method for class 'qtlnet' summary(object, parent.patterns = FALSE, ...) ## S3 method for class 'summary.qtlnet' print(x, ...) check.qtlnet(object, min.prob = 0.9, correct = TRUE, verbose = FALSE, ...)
x , object
|
Object of class |
cutoff |
Frequency cutoff for model patterns to be displayed. Always shows at least the most common pattern. |
digits |
Number of digits to display for posterior probabilities on directed edges. |
parent.patterns |
Include summary of parent patterns if |
min.prob |
Set the minimum posterior probability for inclusion of an edge. |
correct |
Correct |
verbose |
Print forbidden edges in model-averaged solution if |
... |
Other hidden arguments. These include |
Brian S. Yandell and Elias Chaibub Neto
Chaibub Neto E, Keller MP, Attie AD, Yandell BS (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Ann Appl Statist 4: 320-339. http://www.stat.wisc.edu/~yandell/doc/2010/92.AnnApplStat.pdf
data(Pscdbp.qtlnet) print(Pscdbp.qtlnet) summary(Pscdbp.qtlnet)
data(Pscdbp.qtlnet) print(Pscdbp.qtlnet) summary(Pscdbp.qtlnet)
Write resulting graph as text file
write.qtlnet(x, filename, edges, loci.list, include.qtl = TRUE, est.list, include.est = TRUE, digits = 3, col.names = TRUE, ...)
write.qtlnet(x, filename, edges, loci.list, include.qtl = TRUE, est.list, include.est = TRUE, digits = 3, col.names = TRUE, ...)
x |
Object of class |
filename |
Character string with name of text file (usually ends in |
edges |
Data frame with first two columns being |
loci.list |
List of character names of loci by phenotype. Typically determined by
call to |
include.qtl |
Include QTL in graph if |
est.list |
List of estimates from internal est.qtlnet? |
include.est |
Include estimate if |
digits |
Number of significant digits for |
col.names |
Character vector of column names. |
... |
Additional arguments passed to called routines. |
Simple write of causal network, for instance to use with Cytoscape.
Invisibly returns data frame that corresponds to saved file.
Brian S. Yandell and Elias Chaibub Neto
Chaibub Neto E, Keller MP, Attie AD, Yandell BS (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Ann Appl Statist 4: 320-339. http://www.stat.wisc.edu/~yandell/doc/2010/92.AnnApplStat.pdf
## Not run: write.qtlnet(Pscdbp.qtlnet, "Pscdbp.txt") ## End(Not run)
## Not run: write.qtlnet(Pscdbp.qtlnet, "Pscdbp.txt") ## End(Not run)