Graphical Interaction Models

Graphical Interaction Models and the gRim package

Introduction

## Loading required package: gRain
## Loading required package: gRbase

The gRim package is an R package for gRaphical interaction models (hence the name). gRim implements 1) graphical log–linear models for discrete data, that is for contingency tables and 2) Gaussian graphical models for continuous data (multivariate normal data) and 3) mixed homogeneous interaction models for mixed data (data consisiting of both discrete and continuous variables).

Introductory examples

The main functions for creating models of the various types are:

  • Discrete data: The dmod() function creates a hierarchical log–linear model.
  • Continuous data: The cmod() function creates a Gaussian graphical model.
  • Mixed data: The mmod() function creates a mixed interaction model.

The arguments to the model functions are:

args(dmod)
## function (formula, data, marginal = NULL, interactions = NULL, 
##     fit = TRUE, details = 0, ...) 
## NULL
args(cmod)
## function (formula, data, marginal = NULL, fit = TRUE, maximal_only = FALSE, 
##     details = 0) 
## NULL
args(mmod)
## function (formula, data, marginal = NULL, fit = TRUE, details = 0) 
## NULL

The model objects created by these functions are of the respective classes dModel, cModel and mModel and they are also of the class iModel. We focus the presentation on models for discrete data, but most of the topics we discuss apply to all types of models.

A Discrete Model

The reinis data from  is a 26 contingency table.

data(reinis)
str(reinis)
##  'table' num [1:2, 1:2, 1:2, 1:2, 1:2, 1:2] 44 40 112 67 129 145 12 23 35 12 ...
##  - attr(*, "dimnames")=List of 6
##   ..$ smoke  : chr [1:2] "y" "n"
##   ..$ mental : chr [1:2] "y" "n"
##   ..$ phys   : chr [1:2] "y" "n"
##   ..$ systol : chr [1:2] "y" "n"
##   ..$ protein: chr [1:2] "y" "n"
##   ..$ family : chr [1:2] "y" "n"

Models are specified as generating classes. A generating class can be a list or a right–hand–sided formula. In addition, various model specification shortcuts are available.
The following two specifications of a log–linear model are equivalent:

data(reinis)
dm1 <- dmod(list(c("smoke", "systol"), c("smoke", "mental", "phys")), data=reinis)
dm1 <- dmod(~smoke:systol + smoke:mental:phys, data=reinis)
dm1 
## Model: A dModel with 4 variables
##  -2logL    :        9391.38 mdim :    9 aic :      9409.38 
##  ideviance :         730.47 idf  :    5 bic :      9459.05 
##  deviance  :           3.80 df   :    6

The output reads as follows: -2logL is minus twice the maximized log–likelihood and mdim is the number of parameters in the model (no adjustments have been made for sparsity of data). The ideviance and idf gives the deviance and degrees of freedom between the model and the independence model for the same variables and deviance and df is the deviance and degrees of freedom between the model and the saturated model for the same variables.

Notice that the generating class does not appear directly but can be retrieved using formula() and terms():

formula(dm1)
## ~smoke * systol + smoke * mental * phys
terms(dm1)
## [[1]]
## [1] "smoke"  "systol"
## 
## [[2]]
## [1] "smoke"  "mental" "phys"

Model specification shortcuts

Below we illustrate various other ways of specifying log–linear models.

The following models illustrate these abbreviations:

dm2 <- dmod(~.^2, margin=c("smo","men","phy","sys"),
            data=reinis)
formula(dm2)
## ~smoke * mental + smoke * phys + smoke * systol + mental * phys + 
##     mental * systol + phys * systol
dm3 <- dmod(list(c("smoke", "systol"), c("smoke", "mental", "phys")),
            data=reinis, interactions=2)
formula(dm3)
## ~smoke * systol + smoke * mental + smoke * phys + mental * phys

Plotting models

plot(dm1)

A Continuous Model

For Gaussian models there are at most second order interactions. Hence we may specify the saturated model in different ways:

data(carcass)
cm1 <- cmod(~Fat11:Fat12:Fat13, data=carcass)
cm1 <- cmod(~Fat11:Fat12 + Fat12:Fat13 + Fat11:Fat13, data=carcass)
cm1
## Model: A cModel with 3 variables
##  -2logL    :        2432.47 mdim :    6 aic :      2444.47 
##  ideviance :         886.10 idf  :    3 bic :      2467.51 
##  deviance  :          -0.00 df   :    0

plot(cm1)

A Mixed Model

data(milkcomp1)
mm1 <- mmod(~.^., data=milkcomp1)
## 'as(<ddiMatrix>, "dgCMatrix")' is deprecated.
## Use 'as(as(., "generalMatrix"), "CsparseMatrix")' instead.
## See help("Deprecated") and help("Matrix-deprecated").
mm1
## Model: A mModel with 5 variables
##  -2logL    :         475.92 mdim :   44 aic :       563.92 
##  ideviance :         101.57 idf  :   31 bic :       652.24 
##  deviance  :          -0.00 df   :    0
plot(mm1) ## FIXME: should use different colours for disc and cont variables.

Model editing - update()

The update() function enables  objects to be modified by the addition or deletion of interaction terms or edges, using the arguments aterm(), dterm(), aedge() or dedge(). Some examples follow:

###  Set a marginal saturated model:
ms <- dmod(~.^., marginal=c("phys","mental","systol","family"), data=reinis)
formula(ms)
## ~phys * mental * systol * family
###   Delete one edge:
ms1 <- update(ms, list(dedge=~phys:mental))
formula(ms1)
## ~phys * systol * family + mental * systol * family
###   Delete two edges:
ms2<- update(ms, list(dedge=~phys:mental+systol:family))
formula(ms2)
## ~phys * systol + phys * family + mental * systol + mental * family
###   Delete all edges in a set:
ms3 <- update(ms, list(dedge=~phys:mental:systol))
formula(ms3)
## ~systol * family + mental * family + phys * family
### Delete an interaction term
ms4 <- update(ms, list(dterm=~phys:mental:systol) )
formula(ms4)
## ~phys * mental * family + phys * systol * family + mental * systol * 
##     family
###  Set a marginal independence model:
m0 <- dmod(~.^1, marginal=c("phys","mental","systol","family"), data=reinis)
formula(m0)
## ~phys + mental + systol + family
###  Add three interaction terms:
ms5 <- update(m0, list(aterm=~phys:mental+phys:systol+mental:systol) )
formula(ms5)
## ~mental * systol + phys * systol + phys * mental + family
###  Add two edges:
ms6 <- update(m0, list(aedge=~phys:mental+systol:family))
formula(ms6)
## ~mental * phys + family * systol

A brief explanation of these operations may be helpful. To obtain a hierarchical model when we delete a term from a model, we must delete any higher-order relatives to the term. Similarly, when we add an interaction term we must also add all lower-order relatives that were not already present. Deletion of an edge is equivalent to deleting the corresponding two-factor term. Let m − e be the result of deleting edge e from a model m. Then the result of adding e is defined as the maximal model m* for which m* − e = m.

Testing for conditional independence - ciTest()

Tests of general conditional independence hypotheses of the form u ⊥ v|W can be performed using the ciTest(). function.

cit <- ciTest(reinis, set=c("systol", "smoke", "family", "phys"))
cit 
## Testing systol _|_ smoke | family phys 
## Statistic (DEV):   13.045 df: 4 p-value: 0.0111 method: CHISQ
## Slice information:
##   statistic  p.value df family phys
## 1  4.734420 0.029565  1      y    y
## 2  0.003456 0.953121  1      n    y
## 3  7.314160 0.006841  1      y    n
## 4  0.993337 0.318928  1      n    n

The general syntax of the set argument is of the form (u, v, W) where u and v are variables and W is a set of variables. The set argument can also be given as a right–hand sided formula.

In model terms, the test performed by corresponds to the test for removing the edge {u, v} from the saturated model with variables {u, v} ∪ W. If we (conceptually) form a factor S by crossing the factors in W, we see that the test can be formulated as a test of the conditional independence u ⊥ v|S in a three way table. The deviance decomposes into independent contributions from each stratum:

where the contribution Ds from the sth slice is the deviance for the independence model of u and v in that slice. For example,

cit$slice
##     statistic     p.value df family phys
## 1 4.734420192 0.029564812  1      y    y
## 2 0.003456089 0.953120558  1      n    y
## 3 7.314159809 0.006841338  1      y    n
## 4 0.993336675 0.318928227  1      n    n

The sth slice is a |u| × |v| table {nijs}i = 1…|u|, j = 1…|v|. The number of degrees of freedom corresponding to the test for independence in this slice is where ni ⋅ s and njs are the marginal totals.

So the correct number of degrees of freedom for the test in the present example is 3, as calculated by ciTtest() and testdelete().

An alternative to the asymptotic χ2 test is to determine the reference distribution using Monte Carlo methods. The marginal totals are sufficient statistics under the null hypothesis, and in a conditional test the test statistic is evaluated in the conditional distribution given the sufficient statistics. Hence one can generate all possible tables with those given margins, calculate the desired test statistic for each of these tables and then see how extreme the observed test statistic is relative to those of the calculated tables. A Monte Carlo approximation to this procedure is to randomly generate large number of tables with the given margins, evaluate the statistic for each simulated table and then see how extreme the observed test statistic is in this distribution. This is called a Monte Carlo exact test and it provides a :

ciTest(reinis, set=c("systol","smoke","family","phys"), method='MC')
## Testing systol _|_ smoke | family phys 
## Statistic (DEV):   13.045 df: NA p-value: 0.0135 method: MC
## Slice information:
##   statistic n.extreme p.value df family phys
## 1  4.734420        59  0.0295 NA      y    y
## 2  0.003456      1706  0.8530 NA      n    y
## 3  7.314160        16  0.0080 NA      y    n
## 4  0.993337       567  0.2835 NA      n    n

Fundamental methods for inference

This section describes some fundamental methods for inference in . As basis for the description consider the following model shown in Fig. @ref(fig:fundamentalfig1):

dm5 <- dmod(~ment:phys:systol + ment:systol:family + phys:systol:smoke,
            data=reinis)
Model for reinis data.

Model for reinis data.

Testing for addition and deletion of edges

Let $\cal M_0$ be a model and let e = {u, v} be an edge in $\cal M_0$. The candidate model formed by deleting e from $\cal M_0$ is $\cal M_1$. The testdelete() function can be used to test for deletion of an edge from a model:

testdelete(dm5, ~smoke:systol)
## dev:   11.698 df:  2 p.value: 0.00288 AIC(k=2.0):    7.7 edge: smoke:systol 
## host:  phys systol smoke 
## Notice: Test performed in saturated marginal model
testdelete(dm5, ~family:systol)
## dev:    1.085 df:  2 p.value: 0.58135 AIC(k=2.0):   -2.9 edge: family:systol 
## host:  mental systol family 
## Notice: Test performed in saturated marginal model

In the first case the p–value suggests that the edge can not be deleted. In the second case the p–value suggests that the edge can be deleted. The reported AIC value is the difference in AIC between the candidate model and the original model. A negative value of AIC suggest that the candidate model is to be preferred.

Next, let $\cal M_0$ be a model and let e = {u, v} be an edge not in $\cal M_0$. The candidate model formed by adding e to $\cal M_0$ is denoted $\cal M_1$. The testadd() function can be used to test for deletion of an edge from a model:

testadd(dm5, ~smoke:mental)
## dev:    7.797 df:  4 p.value: 0.09930 AIC(k=2.0):   -0.2 edge: smoke:mental 
## host:  mental phys smoke systol 
## Notice: Test performed in saturated marginal model

The p–value suggests that no significant improvedment of the model is obtained by adding the edge. The reported AIC value is the difference in AIC between the candidate model and the original model. A negative value of AIC would have suggested that the candidate model is to be preferred.

Finding edges

The getInEdges() function will return a list of all the edges in the dependency graph $\cal G$ defined by the model. If we set type='decomposable' then the edges returned are as follows: An edge e = {u, v} is returned if $\cal G$ minus the edge e is decomposable. In connection with model selection this is convenient because it is thereby possibly to restrict the search to decomposable models.

ed.in <- getInEdges(ugList(terms(dm5)), type="decomposable")

The getOutEdges() function will return a list of all the edges which are not in the dependency graph $\cal G$ defined by the model. If we set type='decomposable' then the edges returned are as follows: An edge e = {u, v} is returned if $\cal G$ plus the edge e is decomposable. In connection with model selection this is convenient because it is thereby possibly to restrict the search to decomposable models.

ed.out <- getOutEdges(ugList(terms(dm5)), type="decomposable")

Testing several edges

args(testInEdges)
## function (object, edgeMAT = NULL, criterion = "aic", k = 2, alpha = NULL, 
##     headlong = FALSE, details = 1, ...) 
## NULL
args(testOutEdges)
## function (object, edgeMAT = NULL, criterion = "aic", k = 2, alpha = NULL, 
##     headlong = FALSE, details = 1, ...) 
## NULL

The functions labelInEdges()} and \code{labelOutEdges() will test for deletion of edges and addition of edges. The default is to use AIC for evaluating each edge. It is possible to specify the penalty parameter for AIC to being other values than 2 and it is possible to base the evaluation on significance tests instead of AIC. Setting headlong=TRUE causes the function to exit once an improvement is found. For example:

testInEdges(dm5, getInEdges(ugList(terms(dm5)), type="decomposable"),
             k=log(sum(reinis)))
##    statistic df      p.value        aic     V1     V2 action
## 1 686.702943  2 0.000000e+00 671.666814 mental   phys      -
## 2   4.692559  2 9.572463e-02 -10.343569 mental family      +
## 3  28.146937  2 7.726277e-07  13.110809   phys  smoke      -
## 4   1.084805  2 5.813498e-01 -13.951323 systol family      +
## 5  11.698229  2 2.882450e-03  -3.337899 systol  smoke      +

Stepwise model selection

Two functions are currently available for model selection: backward() and forward(). These functions employ the functions in Section @ref(sec:labeledges))

Fixing edges/terms in model as part of model selection

The stepwise model selection can be controlled by fixing specific edges. For example we can specify edges which are not to be considered in a bacward selection:

fix <- list(c("smoke","phys","systol"), c("systol","protein"))
fix <- do.call(rbind, unlist(lapply(fix, names2pairs),recursive=FALSE))
fix
##      [,1]      [,2]    
## [1,] "phys"    "smoke" 
## [2,] "smoke"   "systol"
## [3,] "phys"    "systol"
## [4,] "protein" "systol"
dm.s3 <- backward(dm.sat, fixin=fix, details=1)
##   change.AIC  -19.7744 Edge deleted: mental,systol
##   change.AIC   -4.6982 Edge deleted: systol,family
##   change.AIC   -6.8301 Edge deleted: protein,family
##   change.AIC   -1.2294 Edge deleted: mental,protein
##   change.AIC   -0.9819 Edge deleted: phys,family
##   change.AIC   -1.3419 Edge deleted: smoke,family

There is an important detail here: The matrix fix specifies a set of edges. Submitting these in a call to does not mean that these edges are forced to be in the model. It means that those edges in fixin which are in the model will not be removed.

Likewise in forward selection:

dm.i3 <- forward(dm.i, fixout=fix, details=1)
##   change.AIC  683.9717 Edge added: mental,phys
##   change.AIC   15.9293 Edge added: mental,protein
##   change.AIC   15.4003 Edge added: smoke,protein
##   change.AIC    8.6638 Edge added: smoke,mental
##   change.AIC    2.7316 Edge added: mental,family
##   change.AIC    1.1727 Edge added: phys,protein

Edges in fix will not be added to the model but if they are in the starting model already, they will remain in the final model.

Further topics on models for contingency tables

Sparse Contingency Tables

data(mildew)
dm1 <- dmod(~.^., data=mildew)
dm1
## Model: A dModel with 6 variables
##  -2logL    :         365.75 mdim :   63 aic :       491.75 
##  ideviance :         209.72 idf  :   57 bic :       633.41 
##  deviance  :           0.00 df   :    0 
## Notice: Table is sparse
##   Asymptotic chi2 distribution may be questionable.
##   Degrees of freedom can not be trusted.
##   Model dimension adjusted for sparsity : 21
dm2 <- stepwise(dm1)
dm2
## Model: A dModel with 6 variables
##  -2logL    :         383.01 mdim :   11 aic :       405.01 
##  ideviance :         192.46 idf  :    5 bic :       429.74 
##  deviance  :          17.26 df   :   52 
## Notice: Table is sparse
##   Asymptotic chi2 distribution may be questionable.
##   Degrees of freedom can not be trusted.
##   Model dimension adjusted for sparsity : 10
plot(dm2)

Dimension of a log–linear model

{#sec:dimloglin}

The dim_loglin() is a general function for finding the dimension of a log–linear model. It works on the generating class of a model being represented as a list. For a decomposable model it is possible to calculate and adjusted dimension which accounts for sparsity of data with dim_loglin_decomp():

ff <- ~la10:locc:mp58:c365+mp58:c365:p53a:a367
mm <- dmod(ff, data=mildew)
plot(mm)

dim_loglin(terms(mm), mildew)
## [1] 27
dim_loglin_decomp(terms(mm), mildew)
## [1] 19

A space–efficient implementation of IPS for contingency tables

The IPS algorithm for hierarchical log–linear models is inefficient in the sense that it requires the entire table to be fitted. For example, if there are 81 variables each with 10 levels then a table with 1081 will need to be created. (Incidently, 1081 is one of the figures reported as the number of atoms in the universe. It is a large number!).

Consider a hierarchical log–linear model with generating class $\cal A = \{a_1, \dots, a_M\}$ over a set of variables Δ. The Iterative Proportional Scaling (IPS) algorithm (as described e.g. in @lauritzen:96, p. 83) as a commonly used method for fitting such models. The updating steps are of the form

The IPS algorithm is implemented in the loglin() function.

A more efficient IPS algorithm is described by @jirousek:preucil:95, and this is implemented in the effloglin() function. The implementation of effloglin() is made entirely in R and therefore the word efficient should be understood in terms of space requirement (for small problems, loglin() is much faster than effloglin()).

The algorithm goes as follows: It is assumed that $\cal A$ is minimally specified, i.e. that no element in $\cal A$ is contained in another element in $\cal A$. Form the dependency graph $\cal G(\cal A)$ induced by $\cal A$. Let $\cal G'$ denoted a triangulation of $\cal G(\cal A)$ and let $\cal C=\{C_1,\dots,C_N\}$ denote the cliques of $\cal G'$. Each $a\in \cal A$ is then contained in exactly one clique $C\in \cal C$. Let $\cal A_C=\{a\in \cal A:a\subset C\}$ so that $\cal A_{C_1}, \dots, \cal A_{C_N}$ is a disjoint partitioning of $\cal A$.

Any probability p satisfying the constraints of $\cal A$ will also factorize according to $\cal G'$ so that

Using e.g. the computation architecture of @lauritzen:spiegelhalter:88 the clique marginals

can be obtained from (@ref(eq:effloglin1)). In practice calculation of (@ref(eq:effloglin2)) is done using the gRrain package. For $C\in \cal C$ and an $a \in \cal A_C$ update ψC in (@ref(eq:effloglin1)) as

where pa is obtained by summing over variables in C \ a in pC from (@ref(eq:effloglin2)). Then find the new clique marginals in (@ref(eq:effloglin2)), move on to the next a in $\cal A_{C}$ and so on.

As an example, consider 4–cycle model for reinis data:

data(reinis)
ff    <- ~smoke:mental+mental:phys+phys:systol+systol:smoke
dmod(ff, data=reinis)
## Model: A dModel with 4 variables
##  -2logL    :        9414.99 mdim :    8 aic :      9430.99 
##  ideviance :         706.87 idf  :    4 bic :      9475.13 
##  deviance  :          27.40 df   :    7

This model can be fitted with loglin() as

glist <- rhsFormula2list(ff)
glist
## [[1]]
## [1] "smoke"  "mental"
## 
## [[2]]
## [1] "mental" "phys"  
## 
## [[3]]
## [1] "phys"   "systol"
## 
## [[4]]
## [1] "systol" "smoke"
fv1 <- loglin(reinis, glist, print=FALSE)
fv1[1:3]
## $lrt
## [1] 1233.054
## 
## $pearson
## [1] 1198.018
## 
## $df
## [1] 55

An alternative is effloglin() which uses the algorithm above on a triangulated graph:

fv2 <- effloglin(reinis, glist, print=FALSE)
## 103
## loop over margins. logL: 28.763350644279 
## 103
## loop over margins. logL: 371.795118911387 
## 103
## loop over margins. logL: 391.272776485601 
## 103
## loop over margins. logL: 396.832642464605 
## 103
## loop over margins. logL: 396.832741133973 
## 103
## loop over margins. logL: 396.832742073716 
## 103
## loop over margins. logL: 396.842881530033 
## 103
## loop over margins. logL: 396.842900258297 
## 103
fv2[c('logL','nparm','df')]
## $logL
## [1] 396.8429
## 
## $nparm
## [1] 8
## 
## $df
## [1] 7

The real virtue of effloglin() lies in that it is possible to submit data as a list of sufficient marginals:

stab <- lapply(glist, function(gg) tableMargin(reinis, gg))
fv3 <- effloglin(stab, glist, print=FALSE)
## 103
## loop over margins. logL: 28.763350644279 
## 103
## loop over margins. logL: 371.795118911387 
## 103
## loop over margins. logL: 391.272776485601 
## 103
## loop over margins. logL: 396.832642464605 
## 103
## loop over margins. logL: 396.832741133973 
## 103
## loop over margins. logL: 396.832742073716 
## 103
## loop over margins. logL: 396.842881530033 
## 103
## loop over margins. logL: 396.842900258297 
## 103

A sanity check:

m1 <- loglin(reinis, glist, print=F, fit=T)
f1 <- m1$fit
m3 <- effloglin(stab, glist, print=F, fit=T)
## 103
## loop over margins. logL: 28.763350644279 
## 103
## loop over margins. logL: 371.795118911387 
## 103
## loop over margins. logL: 391.272776485601 
## 103
## loop over margins. logL: 396.832642464605 
## 103
## loop over margins. logL: 396.832741133973 
## 103
## loop over margins. logL: 396.832742073716 
## 103
## loop over margins. logL: 396.842881530033 
## 103
## loop over margins. logL: 396.842900258297 
## 103
f3 <- m3$fit
max(abs(f1 %a-% f3))
## [1] 184.2839

Testing for addition and deletion of edges

Consider the saturated and the independence models for the carcass data:

data(carcass)
cm1 <- cmod(~.^., carcass)
cm2 <- cmod(~.^1, data=carcass)

testdelete()

Let $\cal M_0$ be a model and let e = {u, v} be an edge in $\cal M_0$. The candidate model formed by deleting e from $\cal M_0$ is $\cal M_1$. The testdelete() function can be used to test for deletion of an edge from a model:

testdelete(cm1, ~Meat11:Fat11)
## dev:    3.898 df:  1 p.value: 0.04834 AIC(k=2.0):    1.9 edge: Meat11:Fat11 
## host:  Meat12 Fat11 LeanMeat Meat13 Fat13 Fat12 Meat11 
## Notice: Test performed in saturated marginal model
testdelete(cm1, ~Meat12:Fat13)
## dev:    0.139 df:  1 p.value: 0.70889 AIC(k=2.0):   -1.9 edge: Meat12:Fat13 
## host:  Meat12 Fat11 LeanMeat Meat13 Fat13 Fat12 Meat11 
## Notice: Test performed in saturated marginal model

In the first case the p–value suggests that the edge can not be deleted. In the second case the p–value suggests that the edge can be deleted. The reported AIC value is the difference in AIC between the candidate model and the original model. A negative value of AIC suggest that the candidate model is to be preferred.

testadd()

Next, let $\cal M_0$ be a model and let e = {u, v} be an edge not in $\cal M_0$. The candidate model formed by adding e to $\cal M_0$ is denoted $\cal M_1$. The testadd() function can be used to test for deletion of an edge from a model:

testadd(cm2, ~Meat11:Fat11)
## dev:    0.507 df:  1 p.value: 0.47625 AIC(k=2.0):   -1.5 edge: Meat11:Fat11 
## host:  Meat11 Fat11 
## Notice: Test performed in saturated marginal model
testadd(cm2, ~Meat12:Fat13)
## dev:    4.168 df:  1 p.value: 0.04120 AIC(k=2.0):    2.2 edge: Meat12:Fat13 
## host:  Meat12 Fat13 
## Notice: Test performed in saturated marginal model

In the first case the p–value suggests that no significant improvedment of the model is obtained by adding the edge. In the second case a significant improvement is optained by adding the edge. The reported AIC value is the difference in AIC between the candidate model and the original model. A negative value of AIC suggest that the candidate model is to be preferred.

Finding edges

getInEdges()

Consider the following model for the  data:

data(carcass)
cm1 <- cmod(~LeanMeat:Meat12:Fat12+LeanMeat:Fat11:Fat12+Fat11:Fat12:Fat13, data=carcass)
plot(cm1)

The edges in the model are

getInEdges(cm1)
##      [,1]       [,2]    
## [1,] "LeanMeat" "Meat12"
## [2,] "LeanMeat" "Fat12" 
## [3,] "LeanMeat" "Fat11" 
## [4,] "Meat12"   "Fat12" 
## [5,] "Fat12"    "Fat11" 
## [6,] "Fat12"    "Fat13" 
## [7,] "Fat11"    "Fat13"

In connection with model selection it is sometimes convenient to get only the edges which are contained in only one clique:

getInEdges(cm1, type="decomposable")
##      [,1]       [,2]    
## [1,] "LeanMeat" "Meat12"
## [2,] "LeanMeat" "Fat11" 
## [3,] "Meat12"   "Fat12" 
## [4,] "Fat12"    "Fat13" 
## [5,] "Fat11"    "Fat13"

getOutEdges()

The edges not in the model are

getOutEdges(cm1)
##      [,1]       [,2]   
## [1,] "LeanMeat" "Fat13"
## [2,] "Meat12"   "Fat11"
## [3,] "Meat12"   "Fat13"

In connection with model selection it is sometimes convenient to get only the edges which when added will be in only one clique of the new model:

getOutEdges(cm1, type="decomposable")
##      [,1]       [,2]   
## [1,] "LeanMeat" "Fat13"
## [2,] "Meat12"   "Fat11"

Evaluating edges in the model

data(carcass)
cm1 <- cmod(~LeanMeat:Meat12:Fat12+LeanMeat:Fat11:Fat12+Fat11:Fat12:Fat13+Fat12:Meat11:Meat13, data=carcass[1:20,])
plot(cm1)

evalInEdges()

in.ed <- getInEdges(cm1)
z <- testInEdges(cm1, edgeList=in.ed)
z
##     statistic df      p.value        aic     V1       V2 action
## 1   2.3180596  1 1.278795e-01  0.3180596  Fat11    Fat12      -
## 2   2.7529164  1 9.707721e-02  0.7529164  Fat11    Fat13      -
## 3   7.1618067  1 7.447215e-03  5.1618067  Fat11 LeanMeat      -
## 4   1.8377898  1 1.752102e-01 -0.1622102 Meat11    Fat12      +
## 5  33.2906683  1 7.936256e-09 31.2906683 Meat11   Meat13      -
## 6   1.6517259  1 1.987242e-01 -0.3482741  Fat12   Meat12      +
## 7   5.1268178  1 2.355888e-02  3.1268178  Fat12    Fat13      -
## 8   1.5850908  1 2.080289e-01 -0.4149092  Fat12   Meat13      +
## 9   0.6102019  1 4.347118e-01 -1.3897981  Fat12 LeanMeat      +
## 10  3.5761623  1 5.861442e-02  1.5761623 Meat12 LeanMeat      -

Hence there are four edges which lead to a decrease in AIC. If we set headlong=T then the function exist as soon as one decrease in AIC is found:

z <-testInEdges(cm1, edgeList=in.ed, headlong=T)
z
##   statistic df    p.value        aic     V1       V2 action
## 1  3.576162  1 0.05861442  1.5761623 Meat12 LeanMeat      -
## 2  1.585091  1 0.20802892 -0.4149092  Fat12   Meat13      +

evalOutEdges()

out.ed <- getOutEdges(cm1)
z <- testOutEdges(cm1, edgeList=out.ed)

Hence there are four edges which lead to a decrease in AIC. If we set headlong=T then the function exist as soon as one decrease in AIC is found:

z <- testOutEdges(cm1, edgeList=out.ed, headlong=T)
z
##   statistic df   p.value        aic     V1     V2 action
## 1 1.6000783  1 0.2058921 -0.3999217 Meat12  Fat13      -
## 2 0.9605238  1 0.3270549 -1.0394762 Meat11  Fat13      -
## 3 2.3301143  1 0.1268926  0.3301143  Fat11 Meat12      +

Miscellaneous

The Model Object

It is worth looking at the information in the model object:

dm3 <- dmod(list(c("smoke", "systol"), c("smoke", "mental", "phys")),
            data=reinis)
names(dm3)
## [1] "modelinfo" "varNames"  "datainfo"  "fitinfo"   "isFitted"  "call"

Methods for model objects

A summary() of a model:

summary(dm1) ## FIXME
##           Length Class  Mode     
## modelinfo  4     -none- list     
## varNames   6     -none- character
## datainfo   1     -none- list     
## fitinfo   10     -none- list     
## isFitted   1     -none- logical  
## call       3     -none- call
str(fitted(dm1))
##  'table' num [1:2, 1:2, 1:2, 1:2, 1:2, 1:2] 0 0 0 0 3 0 1 0 0 1 ...
##  - attr(*, "dimnames")=List of 6
##   ..$ la10: chr [1:2] "1" "2"
##   ..$ locc: chr [1:2] "1" "2"
##   ..$ mp58: chr [1:2] "1" "2"
##   ..$ c365: chr [1:2] "1" "2"
##   ..$ p53a: chr [1:2] "1" "2"
##   ..$ a367: chr [1:2] "1" "2"
str(dm1$data)
## List of 1
##  $ data: 'table' num [1:2, 1:2, 1:2, 1:2, 1:2, 1:2] 0 0 0 0 3 0 1 0 0 1 ...
##   ..- attr(*, "dimnames")=List of 6
##   .. ..$ la10: chr [1:2] "1" "2"
##   .. ..$ locc: chr [1:2] "1" "2"
##   .. ..$ mp58: chr [1:2] "1" "2"
##   .. ..$ c365: chr [1:2] "1" "2"
##   .. ..$ p53a: chr [1:2] "1" "2"
##   .. ..$ a367: chr [1:2] "1" "2"

Hence we can make a simple diagnostic plot of Pearson residuals as FIXME