Title: Learning Bayesian Networks with Mixed Variables
Description: Bayesian networks with continuous and/or discrete variables can be learned and compared from data. The method is described in Boettcher and Dethlefsen (2003), <doi:10.18637/jss.v008.i20>.
Authors: Susanne Gammelgaard Bottcher, Claus Dethlefsen.
Maintainer: Claus Dethlefsen <[email protected]>
License: GPL (>= 2)
Version: 1.2-42
Built: 2024-12-15 07:30:22 UTC
Source: CRAN
Starting from an initial network, performs local perturbations to increase the network score.
autosearch(initnw,data,prior=jointprior(network(data)),maxiter=50,
           trylist=vector("list",size(initnw)),trace=TRUE,
           timetrace=TRUE,showban=FALSE,removecycles=FALSE)
heuristic(initnw,data,prior=jointprior(network(data)),
          maxiter=100,restart=10,degree=size(initnw),
          trylist=vector("list",size(initnw)),trace=TRUE,
          timetrace=TRUE,removecycles=FALSE)
gettable(x)
initnw |
an object of class |
data |
a data frame used for learning the network, see
|
prior |
a list containing parameter priors, generated by
|
maxiter |
an integer, which gives the maximum number of steps in the search algorithm. |
restart |
an integer, which gives the number of times to perturb
|
degree |
an integer, which gives the degree of perturbation, see
|
trylist |
a list used internally for reusing learning of nodes,
see |
trace |
a logical. If |
timetrace |
a logical. If |
showban |
a logical passed to the plot method for network
objects. If
|
removecycles |
a logical. If |
x |
an output object from a search. |
In autosearch, a list of proposal networks is created in each step, each with either one arrow added, one arrow deleted or one arrow turned (provided that no cycle is generated). The network scores of all the proposal networks are calculated and the network with the highest score is chosen for the next step in the search. If no proposed network has a higher network score than the previous network, the search is terminated. The network with the highest network score is returned, along with a list containing all tried networks (depending on the value of removecycles).
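The greedy step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: networks are plain adjacency matrices, and the helpers is_acyclic, proposals, greedy_search and the score toy_score are hypothetical names introduced here.

```r
# Hypothetical helper: TRUE if the graph (adj[from, to] == 1) has no directed cycle.
is_acyclic <- function(adj) {
  remaining <- seq_len(nrow(adj))
  while (length(remaining) > 0) {
    sub <- adj[remaining, remaining, drop = FALSE]
    sources <- which(colSums(sub) == 0)      # nodes without incoming arrows
    if (length(sources) == 0) return(FALSE)  # every node has a parent: cycle
    remaining <- remaining[-sources]
  }
  TRUE
}

# All acyclic graphs obtained by adding, deleting or turning one arrow.
proposals <- function(adj) {
  out <- list()
  n <- nrow(adj)
  for (from in seq_len(n)) for (to in seq_len(n)) {
    if (from == to) next
    if (adj[from, to] == 1) {
      deleted <- adj; deleted[from, to] <- 0          # delete the arrow
      out[[length(out) + 1]] <- deleted
      turned <- deleted; turned[to, from] <- 1        # turn the arrow
      if (is_acyclic(turned)) out[[length(out) + 1]] <- turned
    } else if (adj[to, from] == 0) {
      added <- adj; added[from, to] <- 1              # add an arrow
      if (is_acyclic(added)) out[[length(out) + 1]] <- added
    }
  }
  out
}

# Move to the best proposal until no proposal improves the score.
greedy_search <- function(adj, score, maxiter = 50) {
  best <- score(adj)
  for (iter in seq_len(maxiter)) {
    cand <- proposals(adj)
    if (length(cand) == 0) break
    scores <- vapply(cand, score, numeric(1))
    if (max(scores) <= best) break
    adj <- cand[[which.max(scores)]]
    best <- max(scores)
  }
  list(adj = adj, score = best)
}

# Toy score (an assumption, not the network score): closeness to a target graph.
target <- matrix(0, 3, 3); target[1, 2] <- 1; target[2, 3] <- 1
toy_score <- function(adj) -sum(abs(adj - target))
result <- greedy_search(matrix(0, 3, 3), toy_score)
```

With this toy score the search adds the two target arrows one at a time and then stops, since every further single-arrow change lowers the score.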
heuristic restarts by perturbing initnw degree times and calling autosearch again. The number of restarts is given by the option restart.
autosearch and heuristic return a list with three elements, which may be accessed using getnetwork, gettable and gettrylist. The elements are
nw |
an object of class |
table |
a table with all tried
networks. If removecycles is |
trylist |
an updated list used internally for reusing learning
of nodes, see |
Susanne Gammelgaard Bottcher,
Claus Dethlefsen [email protected].
data(rats)
fit       <- network(rats)
fit.prior <- jointprior(fit,12)
fit       <- getnetwork(learn(fit,rats,fit.prior))
fit       <- getnetwork(insert(fit,2,1,rats,fit.prior))
fit       <- getnetwork(insert(fit,1,3,rats,fit.prior))
hisc      <- autosearch(fit,rats,fit.prior,trace=FALSE)
hisc      <- autosearch(fit,rats,fit.prior,trace=FALSE,removecycles=TRUE) # slower
plot(getnetwork(hisc))
hisc2     <- heuristic(fit,rats,fit.prior,restart=10,trace=FALSE)
plot(getnetwork(hisc2))
print(modelstring(getnetwork(hisc2)))
plot(makenw(gettable(hisc2),fit))
drawnetwork allows the user to specify a Bayesian network through a point and click interface.
drawnetwork(nw,df,prior,trylist=vector("list",size(nw)),
            unitscale=20,cexscale=8,
            arrowlength=.25,nocalc=FALSE,
            yr=c(0,350),xr=yr,...)
nw |
an object of class |
df |
a data frame used for learning the network, see
|
prior |
a list containing parameter priors, generated by
|
trylist |
a list used internally for reusing learning of nodes,
see |
cexscale |
a numeric passed to the plot method for network objects. Measures the scaled size of text and symbols. |
arrowlength |
a numeric passed to the plot method for network objects. Measures the length of the edges of the arrowheads. |
nocalc |
a logical. If |
unitscale |
a numeric passed to the plot method for network objects. Scale parameter for chopping off arrow heads. |
xr |
a numeric vector with two components containing the range on x-axis. |
yr |
a numeric vector with two components containing the range on y-axis. |
... |
additional plot arguments, passed to the plot method for network objects. |
To insert an arrow from node 'A' to node 'B', first click node 'A' and then click node 'B'. When the graph is finished, click 'stop'.
To specify that an arrow must not be present, press 'ban' (a toggle) and draw the arrow. This is shown as a red dashed arrow. It is possible to ban both directions between nodes. The ban list is stored with the network in the property banlist. It is a matrix with two columns. Each row holds the 'from' node index and the 'to' node index of a banned arrow, where the indices are the column numbers in the data frame.
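A ban list of this shape can also be written down without the point and click interface. The sketch below uses base R only; the toy data frame and its column names are assumptions made for illustration.

```r
# Toy data frame; the column names are illustrative assumptions.
df <- data.frame(sex    = factor(c("m", "f")),
                 age    = factor(c("y", "o")),
                 yield  = c(1.2, 0.7),
                 weight = c(3.1, 2.9))

# Arrows to forbid, written by name as (from, to) pairs.
from <- c("yield", "weight")
to   <- c("weight", "yield")

# Translate names to column numbers: one row per banned arrow,
# first column 'from', second column 'to'.
ban <- cbind(match(from, names(df)), match(to, names(df)))
ban
```

Here both directions between the third and fourth column are banned. Given a network object nw built from the same data frame, such a matrix would be attached with the replacement function listed in this manual, banlist(nw) <- ban.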
Note that the network score changes as the network is re-learned whenever a change is made (unless nocalc is TRUE).
A list with two elements that may be accessed using getnetwork and gettrylist. The elements are
nw |
an object of class |
trylist |
an updated list used internally for reusing learning
of nodes, see |
Susanne Gammelgaard Bottcher,
Claus Dethlefsen [email protected].
data(rats)
rats.nw    <- network(rats)
rats.prior <- jointprior(rats.nw,12)
rats.nw    <- getnetwork(learn(rats.nw,rats,rats.prior))

## Not run: newrat <- getnetwork(drawnetwork(rats.nw,rats,rats.prior))
The networks in a network family are arranged as pictex-graphs in a LaTeX-table.
genlatex(nwl,outdir="pic/",prefix="scoretable",picdir="",picpre="pic",
         ncol=5,nrow=7,width=12/ncol,vadjust=-1.8)
genpicfile(nwl,outdir="pic/",prefix="pic",w=1.6,h=1.6,bigscale=3)
nwl |
object of class |
outdir |
character string, the directory for storing output. |
prefix |
character string, the filename (without extension) of the LaTeX file. The filenames of the picfiles begin with the given prefix. |
picdir |
character string, the directory where pic-files are stored. |
picpre |
character string, prefix for pic-files. |
ncol |
integer, the number of columns in LaTeX table. |
nrow |
integer, the number of rows in LaTeX table. |
width |
numeric, the width of each cell in the LaTeX table. |
vadjust |
numeric, the vertical adjustment in LaTeX table. |
w |
numeric, the width of pictex objects |
h |
numeric, the height of pictex objects |
bigscale |
numeric, the scaling of the best network, which is output in 'nice.tex' |
Files:
{outdir}{picpre}xx.tex |
one pictex file for each network in the network family, indexed by xx. |
{outdir}{prefix}.tex |
LaTeX file with table including all pictex files. |
{outdir}{picpre}nice.tex |
pictex file with the best network. |
Susanne Gammelgaard Bottcher,
Claus Dethlefsen [email protected].
data(rats)
allrats <- getnetwork(networkfamily(rats,network(rats)))
allrats <- nwfsort(allrats)

## Not run: dir.create("c:/temp")
## Not run: genpicfile(allrats,outdir="c:/temp/pic/")
## Not run: genlatex(allrats,outdir="c:/temp/pic/",picdir="c:/temp/pic/")

## LATEX FILE:
#\documentclass{article}
#\usepackage{array,pictex}
#\begin{document}
#\input{scoretable}
#\input{picnice}
#\end{document}

#data(ksl)
#ksl.nw <- network(ksl)
#ksl.prior <- jointprior(ksl.nw,64)
#mybanlist <- matrix(c(5,5,6,6,7,7,9,
#                      8,9,8,9,8,9,8),ncol=2)
#banlist(ksl.nw) <- mybanlist
#ksl.nw <- getnetwork(learn(ksl.nw,ksl,ksl.prior))
#ksl.search <- autosearch(ksl.nw,ksl,ksl.prior,
#                         trace=TRUE)
#ksl.searchlist <- makenw(ksl.search$table,ksl.search$nw)
#ksl.searchlist <- nwfsort(ksl.searchlist)

## Not run: genpicfile(ksl.searchlist)
## Not run: genlatex(ksl.searchlist)
Inserts/removes one arrow in a network (if legal)
insert(nw,j,i,df,prior,nocalc=FALSE,trylist=vector("list",size(nw)))
remover(nw,j,i,df,prior,nocalc=FALSE,trylist=vector("list",size(nw)))
nw |
an object of class |
j |
integer, giving the index of the 'from' node. |
i |
integer, giving the index of the 'to' node. |
df |
a data frame used for learning the network, see
|
prior |
a list describing parameter priors, generated by
|
nocalc |
a logical. If |
trylist |
a list, used internally for reusing learning of nodes,
see |
Examines whether the arrow from j to i is legal according to the following criteria:
Arrows from/to the same node are not legal.
Arrows from continuous nodes to discrete nodes are not legal.
Arrows banned in the ban list are not legal, see drawnetwork.
Arrows already existing in the network are not legal.
If the arrow is not legal, a NULL network is returned. Otherwise, the arrow is inserted/removed and the network is re-learned (if nocalc is FALSE). The trylist is updated.
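The four criteria can be expressed as a small predicate. The following base R sketch is a hypothetical helper, not a function of the package; it assumes the graph is an adjacency matrix with adj[from, to] == 1 for existing arrows, a logical vector is_discrete giving the node types, and a two-column ban matrix of (from, to) indices. It does not test for cycles.

```r
# Hypothetical helper applying the four legality criteria for an arrow j -> i.
arrow_is_legal <- function(j, i, adj, is_discrete, ban = matrix(0, 0, 2)) {
  if (j == i) return(FALSE)                               # same node
  if (!is_discrete[j] && is_discrete[i]) return(FALSE)    # continuous -> discrete
  if (any(ban[, 1] == j & ban[, 2] == i)) return(FALSE)   # banned arrow
  if (adj[j, i] == 1) return(FALSE)                       # arrow already exists
  TRUE
}

# Nodes 1 and 2 are discrete, nodes 3 and 4 continuous (illustrative setup).
is_discrete <- c(TRUE, TRUE, FALSE, FALSE)
adj <- matrix(0, 4, 4); adj[1, 3] <- 1      # existing arrow 1 -> 3
ban <- matrix(c(3, 4), ncol = 2)            # arrow 3 -> 4 is banned

legal_1_4 <- arrow_is_legal(1, 4, adj, is_discrete, ban)  # discrete -> continuous
legal_3_1 <- arrow_is_legal(3, 1, adj, is_discrete, ban)  # continuous -> discrete
legal_3_4 <- arrow_is_legal(3, 4, adj, is_discrete, ban)  # banned
legal_1_3 <- arrow_is_legal(1, 3, adj, is_discrete, ban)  # already present
```

Only the first of these four arrows passes all criteria.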
A list with two elements
nw |
an object of class |
trylist |
an updated list, used internally for reusing learning
of nodes, see |
Susanne Gammelgaard Bottcher,
Claus Dethlefsen [email protected].
data(rats)
rats.nw    <- network(rats)
rats.nw    <- getnetwork(insert(rats.nw,2,1,nocalc=TRUE))
rats.prior <- jointprior(rats.nw,12)

rats.nw2 <- network(rats)
rats.nw2 <- getnetwork(learn(rats.nw2,rats,rats.prior))
rats.nw2 <- getnetwork(insert(rats.nw2,1,2,rats,rats.prior))
rats.nw3 <- getnetwork(remover(rats.nw2,1,2,rats,rats.prior))
Given a network with a prob property for each node, derives the joint probability distribution. Then the quantities needed in the local master procedure for finding the local parameter priors are deduced.
jointprior(nw,N=NA,phiprior="bottcher",timetrace=FALSE)
nw |
an object of class |
N |
an integer, which gives the size of the imaginary data base. If
this is too small,
|
phiprior |
a string, which specifies how the prior for phi is
calculated. Either |
timetrace |
a logical. If |
For the discrete part of the network, the joint probability distribution is calculated by multiplying together the local probability distributions. Then, jointalpha is determined by multiplying each entry in the joint probability distribution by the size of the imaginary data base N.
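For two discrete nodes without arrows between them, this calculation reduces to an outer product scaled by N. The base R sketch below illustrates the arithmetic only; the probabilities and level names are assumed values, and the code does not call the package.

```r
# Local probabilities of two discrete nodes without parents (assumed values).
p_sex <- c(male = 0.4, female = 0.6)
p_age <- c(young = 0.6, old = 0.4)

# With no arrows between the nodes, the joint distribution is the product
# of the local distributions.
joint <- outer(p_sex, p_age)

# Multiply each entry by the size of the imaginary data base.
N <- 12
jointalpha <- N * joint
jointalpha
```

The entries of jointalpha sum to N, so each entry can be read as the number of imaginary observations assigned to that configuration of the discrete variables.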
For the mixed part of the network, for each configuration of the discrete variables, the joint Gaussian distribution of the continuous variables is constructed and represented by jointmu (one row for each configuration of the discrete parents) and jointsigma (a list of matrices, one for each configuration of the discrete parents). The configurations of the discrete parents are ordered according to findex. The algorithm for constructing the joint distribution of the continuous variables is described in Shachter and Kenley (1989).
Then, jointalpha, jointnu, jointrho, mu and jointphi are deduced. These quantities are later used for deriving local parameter priors.
For each configuration i of the discrete variables, the prior for phi is calculated according to phiprior: see Bottcher (2001) if phiprior="bottcher", and Heckerman, Geiger and Chickering (1995) if phiprior="heckerman".
A list with the following elements,
jointalpha |
a table used in the local master procedure for discrete variables. |
jointnu |
a table used in the local master procedure for continuous variables. |
jointrho |
a table used in the local master procedure for continuous variables. |
jointmu |
a numeric matrix used in the local master procedure for continuous variables. |
jointsigma |
a list of numeric matrices (not used in further calculations). |
jointphi |
a list of numeric matrices used in the local master procedure for continuous variables. |
Susanne Gammelgaard Bottcher,
Claus Dethlefsen [email protected].
Bottcher, S.G. (2001). Learning Bayesian Networks with Mixed Variables, Artificial Intelligence and Statistics 2001, Morgan Kaufmann, San Francisco, CA, USA, 149-156.
Heckerman, D., Geiger, D. and Chickering, D. (1995). Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20: 197-243.
Shachter, R.D. and Kenley, C.R. (1989), Gaussian influence diagrams. Management Science, 35:527-550.
data(rats)
rats.nw    <- network(rats)
rats.prior <- jointprior(rats.nw,12)

## Not run: savenet(rats.nw,file("rats.net"))
## Not run: rats.nw <- readnet(file("rats.net"))
## Not run: rats.nw <- prob(rats.nw,rats)
## Not run: rats.prior <- jointprior(rats.nw,12)
Data from a study measuring health and social characteristics of representative samples of Danish 70 year olds, taken in 1967 and 1984.
A data frame with variables of both discrete and continuous types.
Forced ejection volume
Cholesterol
Hypertension (no/yes)
Logarithm of Body Mass Index
Smoking (no/yes)
Alcohol consumption (seldom/frequently)
Working (yes/no)
Sex (male/female)
Survey year (1967/1984)
Updates the distributions of the parameters in the network, based on a prior network and data. Also, the network score is calculated.
learn(nw, df, prior=jointprior(nw),
      nodelist=1:size(nw),
      trylist=vector("list",size(nw)),
      timetrace=FALSE)
nw |
an object of class |
df |
a data frame used for learning the network, see
|
prior |
a list containing parameter priors, generated by
|
nodelist |
a numeric vector of indices of nodes to be learned. |
trylist |
a list used internally for reusing learning of nodes,
see |
timetrace |
a logical. If |
The procedure learn determines the master prior, local parameter priors and local parameter posteriors, see Bottcher (2001). It may be called on all nodes (default) or just a single node.
From the joint prior distribution, the marginal distribution of all parameters in the family consisting of the node and its parents can be determined. This is the master prior, see localmaster.
The local parameter priors are now determined by conditioning in the master prior distribution, see conditional. The hyperparameters associated with the local parameter prior distribution are attached to each node in the property condprior.
Finally, the local parameter posterior distributions are calculated (see post) and attached to each node in the property condposterior.
A so-called trylist is maintained to speed up the learning process. The trylist consists of a list of matrices, one for each node. The matrix for a given node holds previously evaluated parent configurations and the corresponding log-likelihood contributions. If a node with a certain parent configuration needs to be learned, it is first checked whether the node has already been learned with that configuration. The previously learned nodes are given as input in the trylist parameter and are updated in the learning procedure.
When one or more nodes in a network have been learned, the network score is updated and attached to the network in the property score.
The learning procedure is called from various functions using the principle that networks should always be updated with their score. Thus, e.g. drawnetwork keeps the network updated when the graph is altered.
A list with two elements that may be accessed using getnetwork and gettrylist. The elements are
nw |
an object of class |
trylist |
an updated list used internally for reusing learning
of nodes, see |
Susanne Gammelgaard Bottcher,
Claus Dethlefsen [email protected].
Bottcher, S.G. (2001). Learning Bayesian Networks with Mixed Variables, Artificial Intelligence and Statistics 2001, Morgan Kaufmann, San Francisco, CA, USA, 149-156.
networkfamily, jointprior, maketrylist, network
data(rats)
fit        <- network(rats)
fit.prior  <- jointprior(fit,12)
fit.learn  <- learn(fit,rats,fit.prior,timetrace=TRUE)
fit.nw     <- getnetwork(fit.learn)
fit.learn2 <- learn(fit,rats,fit.prior,trylist=gettrylist(fit.learn),timetrace=TRUE)
Creates local probability distributions reflecting the graph of the network. These are attached as a simprob property to each node in the network and can be edited and used for rnetwork.
makesimprob(nw,
            s2=function(idx,cf) {
                 cf <- as.vector(cf)
                 xs <- (1:length(cf))
                 log(xs%*%cf+1)
            },
            m0=function(idx,cf) {
                 cf <- as.vector(cf)
                 xs <- (1:length(cf))^2
                 .69*(xs%*%cf)
            },
            m1=function(idx,cf) {
                 cf <- as.vector(cf)
                 xs <- (1:length(cf))*10
                 idx*(cf%*%xs)
            })
nw |
an object of class |
s2 |
function that returns the variance as a function of the node index and the configuration of the discrete variables. |
m0 |
function that returns the intercept as a function of the node index and the configuration of the discrete variables. |
m1 |
function that returns the regression coefficients as a function of the node index and the configuration of the discrete variables. |
For each node, the local simprob is determined. If the node is discrete, the probability distribution is uniform (and thus does not reflect the dependence in the graph, as it should). If the node is continuous, one mean and one variance are attached per configuration of the discrete parents. The mean depends on the continuous parents through the regression coefficients determined by the functions m0 (intercept) and m1 (regression coefficients). The variance is determined by the function s2.
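To see what the default functions from the usage section produce, they can be evaluated on their own in base R. The functions below are copied from the defaults shown above; the node index 3 and the configuration vector cf are assumed inputs chosen for illustration, since the exact encoding of a configuration is not spelled out here.

```r
# Default functions, copied from the makesimprob usage.
s2 <- function(idx, cf) {
  cf <- as.vector(cf)
  xs <- (1:length(cf))
  log(xs %*% cf + 1)
}
m0 <- function(idx, cf) {
  cf <- as.vector(cf)
  xs <- (1:length(cf))^2
  .69 * (xs %*% cf)
}
m1 <- function(idx, cf) {
  cf <- as.vector(cf)
  xs <- (1:length(cf)) * 10
  idx * (cf %*% xs)
}

# Assumed inputs: node index 3 and a two-element configuration vector.
cf <- c(1, 2)
variance  <- as.numeric(s2(3, cf))   # log(1*1 + 2*2 + 1)
intercept <- as.numeric(m0(3, cf))   # 0.69 * (1*1 + 4*2)
slope     <- as.numeric(m1(3, cf))   # 3 * (1*10 + 2*20)
```

Each function returns a 1 x 1 matrix because of the matrix product, hence the as.numeric calls. A user-supplied replacement only needs to accept the same two arguments, idx and cf.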
The network object nw, where each node has attached the property simprob.
Susanne Gammelgaard Bottcher,
Claus Dethlefsen [email protected].
For faster learning, a trylist is maintained as a lookup table for a given parent configuration of a node.
maketrylist(initnw,data,prior=jointprior(network(data)),timetrace=FALSE)
initnw |
an object of class |
data |
a data frame used for learning the network, see
|
prior |
a list containing parameter priors, generated by
|
timetrace |
a logical. If |
This procedure is included for illustrative purposes. For each node in the network, all possible parent configurations are created and learned. The result is called a trylist. Creating the full trylist is very time-consuming; a better choice is to maintain a trylist while searching, which is done automatically. The trylist is given as output by all functions that call the learning procedure and can be given as an argument.
A list with one element per node in the network. In the list, element i is a matrix with two columns: a string with the indices of the parent nodes, separated by ":", and a numeric with the log-likelihood contribution of the node given the parent configuration. Whenever a node is learned given a parent configuration, the trylist is consulted to yield faster learning, which is especially useful when using autosearch or heuristic.
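The lookup idea can be sketched in base R as a per-node cache keyed by the parent string. All function names below (new_trylist, parent_key, lookup, store) are hypothetical and not part of the package, and sorting the parent indices before building the key is an assumption made here so that the order of the parents does not matter.

```r
# One (initially empty) entry per node, as in vector("list", size(nw)).
new_trylist <- function(n) vector("list", n)

# Key for a parent set: indices separated by ":" (sorted, by assumption).
parent_key <- function(parents) paste(sort(parents), collapse = ":")

# Return the stored log-likelihood contribution, or NA if not yet learned.
lookup <- function(trylist, node, parents) {
  tab <- trylist[[node]]
  if (is.null(tab)) return(NA_real_)
  hit <- which(tab[, 1] == parent_key(parents))
  if (length(hit) == 0) NA_real_ else as.numeric(tab[hit[1], 2])
}

# Append a (parent string, log-likelihood) row to the matrix of a node.
store <- function(trylist, node, parents, loglik) {
  trylist[[node]] <- rbind(trylist[[node]], c(parent_key(parents), loglik))
  trylist
}

tl <- new_trylist(3)
before <- lookup(tl, 3, c(1, 2))     # nothing stored yet
tl <- store(tl, 3, c(2, 1), -12.5)   # illustrative value
after <- lookup(tl, 3, c(1, 2))      # same parent set, different order
```

Because the two columns share one matrix, the numeric value is stored as a string and converted back on lookup; a search routine would call lookup first and fall back to learning the node only when NA is returned.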
Susanne Gammelgaard Bottcher,
Claus Dethlefsen [email protected].
networkfamily, autosearch, heuristic
data(rats)
rats.nw <- network(rats)
rats.pr <- jointprior(rats.nw,12)
rats.nw <- getnetwork(learn(rats.nw,rats,rats.pr))
rats.tr <- maketrylist(rats.nw,rats,rats.pr)
rats.hi <- getnetwork(heuristic(rats.nw,rats,rats.pr,trylist=rats.tr))
A Bayesian network is represented as an object of class network. Methods for printing and plotting are defined.
network(df,specifygraph=FALSE,inspectprob=FALSE,
        doprob=TRUE,yr=c(0,350),xr=yr)
## S3 method for class 'network'
print(x,filename=NA,condposterior=FALSE,
      condprior=FALSE,...)
## S3 method for class 'network'
plot(x,arrowlength=.25,notext=FALSE,
     sscale=7,showban=TRUE,yr=c(0,350),xr=yr,
     unitscale=20,cexscale=8,...)
df |
a data frame, where the columns define the variables. A
continuous variable should have type |
specifygraph |
a logical. If |
inspectprob |
a logical. If |
doprob |
a logical. If |
x |
an object of class |
filename |
a string or |
condprior |
a logical. If |
condposterior |
a logical. If |
sscale |
a numeric. The nodes are initially placed on a circle
with radius |
unitscale |
a numeric. Scale parameter for chopping off arrow heads. |
cexscale |
a numeric. Scale parameter to set the size of the nodes. |
arrowlength |
a numeric containing the length of the arrow heads. |
xr |
a numeric vector with two components containing the range on x-axis. |
yr |
a numeric vector with two components containing the range on y-axis. |
notext |
a logical. If |
showban |
a logical. If |
... |
additional plot arguments, passed to |
The network creator function returns an object of class network, which is a list with the following elements (properties),
nodes |
a list of objects of class |
n |
an integer containing the number of nodes in the network. |
discrete |
a numeric vector of indices of discrete nodes. |
continuous |
a numeric vector of indices of continuous nodes. |
banlist |
a numeric matrix with two columns. Each row contains the
indices |
score |
a numeric added by |
relscore |
a numeric added by |
Susanne Gammelgaard Bottcher,
Claus Dethlefsen [email protected].
networkfamily, node, rnetwork, learn, drawnetwork, jointprior, heuristic, nwequal
A <- factor(rep(c("A1","A2"),50))
B <- factor(rep(rep(c("B1","B2"),25),2))
thisnet <- network( data.frame(A,B) )

set.seed(109)
sex    <- gl(2,4,label=c("male","female"))
age    <- gl(2,2,8)
yield  <- rnorm(length(sex))
weight <- rnorm(length(sex))
mydata <- data.frame(sex,age,yield,weight)
mynw   <- network(mydata)

# adjust prior probability distribution
localprob(mynw,"sex")    <- c(0.4,0.6)
localprob(mynw,"age")    <- c(0.6,0.4)
localprob(mynw,"yield")  <- c(2,0)
localprob(mynw,"weight") <- c(1,0)

print(mynw)
plot(mynw)

prior   <- jointprior(mynw)
mynw    <- getnetwork(learn(mynw,mydata,prior))
thebest <- getnetwork(autosearch(mynw,mydata,prior))

print(mynw,condposterior=TRUE)

## Not run: savenet(mynw,file("yield.net"))
Various extraction/replacement functions for networks
modelstring(x)
makenw(tb,template)
as.network(nwstring,template)
size(x)
banlist(x)
banlist(x) <- value
getnetwork(x)
gettrylist(x)
x |
an object of class |
tb |
a table output from |
template |
an object of class |
nwstring |
a string representing the network. |
value |
a numeric matrix with two columns. Each row contains the
indices |
The string representation of a network is a minimal size representation to speed up calculations. The functions modelstring, as.network and makenw convert between the string representation and network objects.
size extracts the number of nodes in a network object.
banlist extracts the banlist from a network object.
getnetwork and gettrylist are accessor functions that extract a network object or trylist from the result of autosearch, heuristic, learn, perturb, networkfamily, drawnetwork.
Method for generating and learning all networks that are possible for a given set of variables. These may be plotted or printed. Also, functions for sorting according to the network score (see nwfsort) and for making a network family unique (see the unique method for networkfamily objects) are available.
networkfamily(data,nw=network(data),prior=jointprior(nw),
              trylist=vector("list",size(nw)),timetrace=TRUE)
## S3 method for class 'networkfamily'
print(x,...)
## S3 method for class 'networkfamily'
plot(x,layout=, cexscale=5,arrowlength=0.1,sscale=7,...)
nw |
an object of class |
data |
a data frame used for learning the network, see
|
prior |
a list containing parameter priors, generated by
|
trylist |
a list used internally for reusing learning of nodes,
see |
timetrace |
a logical. If |
x |
an object of class |
layout |
a numeric two dimensional vector with the number of plots in the rows
and columns of each plotting page. Default set to |
cexscale |
a numeric. A scaling parameter to set the size of the nodes. |
arrowlength |
a numeric, which gives the length of the arrow heads. |
sscale |
a numeric. The nodes are initially placed on a circle
with radius |
... |
additional plot arguments passed to the plot method for network objects. |
networkfamily generates and learns all possible networks with the nodes given as in the initial network nw. This is done by successively trying to generate the networks with all possible arrows to/from each node (see addarrows). If there is a ban list present in nw (see network), then this is respected, as are the restrictions described in insert.
After generation of all possible networks, a test for cycles (see cycletest) is performed and only networks with directed acyclic graphs are returned.
The function networkfamily returns a list with two components,
nw |
an object of class |
trylist |
an updated list used internally for reusing learning
of nodes, see |
Generating all possible networks can be very time consuming!
Susanne Gammelgaard Bottcher,
Claus Dethlefsen [email protected].
network, genlatex, heuristic, nwfsort, unique.networkfamily, elementin, addarrows, cycletest
data(rats)
allrats <- getnetwork(networkfamily(rats))
plot(allrats)
print(allrats)
An important part of a network is the list of nodes. The nodes summarize the local properties of a node, given the parents of the node.
node(idx,parents,type="discrete",name=paste(idx),
     levels=2,levelnames=paste(1:levels),position=c(0,0))
## S3 method for class 'node'
print(x,filename=NA,condposterior=TRUE,condprior=TRUE,...)
## S3 method for class 'node'
plot(x,cexscale=10,notext=FALSE,...)
nodes(nw)
nodes(nw) <- value
x |
an object of class |
parents |
a numeric vector with indices of the parents of the node. |
idx |
an integer, which gives the index of the node (the column number of the corresponding data frame). |
type |
a string, which gives the type of the node. Either
|
name |
a string, which gives the name used when plotting and printing. Defaults to the column name in the data frame. |
levels |
an integer. If |
levelnames |
if |
position |
a numeric vector with coordinates where the node should
appear in the
plot. Usually set by |
nw |
an object of class |
value |
a list of elements of class |
filename |
a string or |
condprior |
a logical. If |
condposterior |
a logical. If |
cexscale |
a numeric. Scale parameter to set the size of the nodes. |
notext |
a logical. If |
... |
additional plot arguments. |
The operations on a node are typically done when operating on a network, so these functions are not to be called directly.
When a network is created with network, the nodes in the nodelist are created using the node procedure. Local probability distributions are added as the property prob to each node using prob.node. If the node is continuous, this is a numeric vector with the conditional variance and the conditional regression coefficients arising from a regression on the continuous parents, using data. If the node has discrete parents, prob is a matrix with a row for each configuration of the discrete parents. If the node is discrete, prob is a multiway array which gives the conditional probability distribution for each configuration of the discrete parents. The generated prob can be replaced to match the prior information available.
nodes gives the list of nodes of a network. localprob gives the probability distribution for each node in the network.
The node creator function returns an object of class node, which is a list with the following elements (properties),
idx |
an integer. A unique index for this node. It MUST correspond to the column index of the variable in the data frame. |
name |
a string. The printed name of the node. |
type |
a string. Either |
levels |
an integer. If the node is of type |
levelnames |
if |
parents |
a vector of indices of the parents to this node. It is
best to manage this vector using the |
prob |
a numeric vector, matrix or multiway array, giving the
initial probability distribution. If the node is discrete,
|
condprior |
a list, generated by |
condposterior |
a list, which gives the parameter posteriors obtained from
|
loglik |
a numeric giving the log likelihood contribution for this node,
calculated in |
simprob |
a numeric vector, matrix or multiway array similar to |
Calculates the number of different directed acyclic graphs for a set of discrete and continuous nodes.
numbermixed(nd,nc)
nd |
an integer, which gives the number of discrete nodes. |
nc |
an integer, which gives the number of continuous nodes. |
No arrows are allowed from continuous nodes to discrete nodes. Cycles are not allowed. The number of networks is given by Bottcher (2003), using the result in Robinson (1977).
When nd+nc>15, the procedure is quite slow.
A numeric containing the number of directed acyclic graphs with the given node configuration.
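The count can be sketched in a few lines of base R. This is not the package's implementation. It is a minimal illustration that assumes the standard recurrence for labelled directed acyclic graphs and uses the restriction stated above: since no arrows go from continuous to discrete nodes, the arrows among discrete nodes, the arrows among continuous nodes, and the discrete-to-continuous arrows can be chosen independently. The names count_dags and count_mixed are hypothetical.

```r
# Number of labelled DAGs on n nodes, using the recurrence
#   a(n) = sum_{k=1}^{n} (-1)^(k+1) * choose(n, k) * 2^(k * (n - k)) * a(n - k),
# with a(0) = 1.
count_dags <- function(n) {
  a <- numeric(n + 1)
  a[1] <- 1                                  # a[m + 1] stores a(m); a(0) = 1
  for (m in seq_len(n)) {
    k <- seq_len(m)
    a[m + 1] <- sum((-1)^(k + 1) * choose(m, k) * 2^(k * (m - k)) * a[m - k + 1])
  }
  a[n + 1]
}

# Assumption of this sketch: a mixed network is a DAG on the discrete nodes,
# a DAG on the continuous nodes, and any subset of the nd * nc possible
# discrete -> continuous arrows (these can never close a cycle).
count_mixed <- function(nd, nc) {
  count_dags(nd) * count_dags(nc) * 2^(nd * nc)
}

count_dags(3)
count_mixed(2, 2)
```

The counts grow very quickly, so a double-precision result is exact only for small node sets.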
Bottcher, S.G. (2003). Learning Conditional Gaussian Networks. Aalborg University, 2003.
Robinson, R.W. (1977). Counting unlabeled acyclic digraphs, Lecture Notes in Mathematics, 622: Combinatorial Mathematics.
numbermixed(2,2)
## Not run: numbermixed(5,10)
According to the score property of the networks in a network family, the networks are sorted and the relative score, i.e. the score of a network relative to the highest score, is attached to each network as the relscore property.
nwfsort(nwf)
nwf |
an object of class |
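The sorting and the relative score can be illustrated without the package. The sketch below uses a plain list as a hypothetical stand-in for a network family, and it assumes that the score property is a log score, so that "relative to the highest score" becomes the exponential of a difference. Both the data structure and that formula are assumptions of this example, not a description of the package internals.

```r
# Hypothetical stand-in for a network family: a list of networks with log scores.
nwf <- list(list(name = "nw1", score = -120.4),
            list(name = "nw2", score = -118.1),
            list(name = "nw3", score = -125.0))

scores <- vapply(nwf, function(nw) nw$score, numeric(1))
ord <- order(scores, decreasing = TRUE)      # best network first
sorted <- nwf[ord]

# Assumption: scores are on the log scale, so the ratio to the best
# score is exp(score - best); the best network gets relscore 1.
best <- scores[ord[1]]
for (i in seq_along(sorted)) {
  sorted[[i]]$relscore <- exp(sorted[[i]]$score - best)
}

vapply(sorted, function(nw) nw$name, character(1))
vapply(sorted, function(nw) nw$relscore, numeric(1))
```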
Randomly insert/delete/turn arrows to obtain another network.
perturb(nw,data,prior,degree=size(nw),trylist=vector("list",size(nw)), nocalc=FALSE,timetrace=TRUE)
nw |
an object of class |
data |
a data frame used for learning the network, see
|
prior |
a list containing parameter priors, generated by
|
degree |
an integer, which gives the number of attempts to randomly insert/remove/turn an arrow. |
trylist |
a list used internally for reusing learning of nodes,
see |
nocalc |
a logical. If |
timetrace |
a logical. If |
Given the initial network, a new network is constructed by randomly choosing an action: remove, turn, or add. After the action is chosen, one of the possible applications of that action is chosen at random. If there are no possibilities, the unchanged network is returned.
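The procedure described above can be sketched on a plain adjacency matrix. This is an illustration only, not the package code: the matrix convention (adj[i, j] == 1 means an arrow from i to j), the helper names is_acyclic, perturb_once and perturb_sketch, and the explicit cycle check are assumptions of this example. The restriction on arrows from continuous to discrete nodes and the relearning of scores are left out.

```r
# TRUE if the graph has no directed cycle: repeatedly strip nodes
# without incoming arrows; a DAG is emptied this way.
is_acyclic <- function(adj) {
  keep <- rep(TRUE, nrow(adj))
  repeat {
    sub <- adj[keep, keep, drop = FALSE]
    if (nrow(sub) == 0) return(TRUE)
    roots <- colSums(sub) == 0
    if (!any(roots)) return(FALSE)
    keep[which(keep)[roots]] <- FALSE
  }
}

# One perturbation: pick an action, then pick one valid application of it.
perturb_once <- function(adj) {
  action <- sample(c("remove", "turn", "add"), 1)
  pairs <- which(row(adj) != col(adj), arr.ind = TRUE)   # ordered pairs i != j
  candidates <- list()
  for (k in seq_len(nrow(pairs))) {
    i <- pairs[k, 1]
    j <- pairs[k, 2]
    new <- adj
    if (action == "remove" && adj[i, j] == 1) {
      new[i, j] <- 0
    } else if (action == "turn" && adj[i, j] == 1) {
      new[i, j] <- 0
      new[j, i] <- 1
    } else if (action == "add" && adj[i, j] == 0 && adj[j, i] == 0) {
      new[i, j] <- 1
    } else {
      next
    }
    if (is_acyclic(new)) candidates[[length(candidates) + 1]] <- new
  }
  if (length(candidates) == 0) return(adj)   # no possibilities: unchanged
  candidates[[sample.int(length(candidates), 1)]]
}

# degree = number of attempts, as in the argument description.
perturb_sketch <- function(adj, degree = nrow(adj)) {
  for (s in seq_len(degree)) adj <- perturb_once(adj)
  adj
}

set.seed(200)
perturb_sketch(matrix(0, 4, 4), degree = 10)
```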
A list with two elements that may be accessed using getnetwork and gettrylist. The elements are
nw |
an object of class |
trylist |
an updated list used internally for reusing learning
of nodes, see |
set.seed(200)
data(rats)
fit <- network(rats)
fit.prior <- jointprior(fit)
fit <- getnetwork(learn(fit,rats,fit.prior))
fit.new <- getnetwork(perturb(fit,rats,fit.prior,degree=10))
data(ksl)
ksl.nw <- network(ksl)
ksl.rand <- getnetwork(perturb(ksl.nw,nocalc=TRUE,degree=10))
plot(ksl.rand)
Methods for accessing or changing the local probability distributions and for accessing the local prior and posterior distributions
prob(x,df,...)
## S3 method for class 'node'
prob(x,df,nw,...)
## S3 method for class 'network'
prob(x,df,...)
localprob(nw)
localprob(nw,name) <- value
localprior(node)
localposterior(node)
x |
an object of class |
df |
a data frame, where the columns define the variables. A
continuous variable should have type |
nw |
an object of class |
node |
an object of class |
name |
a string, which gives the node name. |
... |
additional arguments for specific methods. |
value |
If the node is continuous, this is a numeric vector with the conditional variance and the conditional regression coefficients arising from a regression on the continuous parents, using data. If the node has discrete parents, it is a matrix with a row for each configuration of the discrete parents. If the node is discrete, it is a multiway array which gives the conditional probability distribution for each configuration of the discrete parents. |
The prob methods add local probability distributions to each node. If the node is continuous, this is a numeric vector with the conditional variance and the conditional regression coefficients arising from a regression on the continuous parents, using data. If the continuous node also has discrete parents, prob is a matrix with one row for each configuration of the discrete parents. If the node is discrete, prob is a multiway array which gives the conditional probability distribution for each configuration of the discrete parents. The generated prob can be replaced to match the prior information available.
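The three shapes of prob can be pictured with plain R objects. The values, the dimension names, and the exact layout (in particular the intercept slot in the continuous case, and which array dimension belongs to the node) are assumptions made for this sketch; inspect a real network with localprob to see the layout the package actually uses.

```r
# Continuous node with continuous parents only: a numeric vector.
# Assumed layout: variance, intercept, one coefficient per continuous parent.
prob_cont <- c(s2 = 2.5, intercept = 1.0, beta_parent1 = 0.8)

# Continuous node with a discrete parent (3 levels): a matrix with
# one row per configuration of the discrete parents.
prob_cont_mixed <- rbind(D1 = c(s2 = 2.5, intercept = 1.0, beta_parent1 = 0.8),
                         D2 = c(s2 = 1.0, intercept = 0.2, beta_parent1 = 0.5),
                         D3 = c(s2 = 3.0, intercept = -0.4, beta_parent1 = 1.1))

# Discrete node (2 levels) with one discrete parent (3 levels): an array
# holding one conditional distribution per parent configuration.
counts <- array(c(4, 6, 5, 5, 9, 1), dim = c(2, 3),
                dimnames = list(node = c("M", "F"),
                                parent = c("D1", "D2", "D3")))
prob_disc <- prop.table(counts, margin = 2)   # each column sums to 1

dim(prob_cont_mixed)
colSums(prob_disc)
```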
localprob returns the probability distribution for each node in the network.
In a learned network, the local prior and posterior can be accessed for each node using localprior and localposterior.
An artificial data set. 24 rats (12 female, 12 male) have been randomized to use one of three drugs (products for losing weight). The weight loss for each rat is noted after one and two weeks.
A data frame with 4 variables.
a factor with two levels: "M" (male), "F" (female)
a factor with three levels: "D1", "D2", "D3" (three types)
a numeric: weight loss, week one.
a numeric: weight loss, week two.
Morrison, D.F. (1976). Multivariate Statistical Methods. McGraw-Hill, USA.
Edwards, D. (1995). Introduction to Graphical Modelling, Springer-Verlag. New York.
Reads/saves a Bayesian network specification in the .net language used by Hugin.
readnet(con=file("default.net"))
savenet(nw, con=file("default.net"))
con |
a connection. |
nw |
an object of class |
readnet reads only the structure of a network, i.e. the directed acyclic graph. savenet exports the prob property for each node in the network object along with the network structure defined by the parents of each node.
readnet creates an object of class network with the nodes specified as in the .net connection. The network has not been learned and the nodes do not have prob properties (see prob.network).
savenet writes the object to the connection.
The call to readnet(savenet(network)) is not the identity function, as information is thrown away in both savenet and readnet.
data(rats)
nw <- network(rats)
## Not run: savenet(nw,file("default.net"))
## Not run: nw2 <- readnet(file("default.net"))
## Not run: nw2 <- prob(nw2,rats)
Given a network with nodes having the simprob property, rnetwork simulates a data set.
rnetwork(nw, n=24, file="")
nw |
an object of class |
n |
an integer, which gives the number of cases to simulate. |
file |
a string. If non-empty, the data set is stored there. |
The variables are simulated one at a time in an order that ensures that the parents of the node have already been simulated. For discrete variables a multinomial distribution is used and for continuous variables, a Gaussian distribution is used, according to the simprob property in each node.
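The simulation order can be illustrated with base R alone. The sketch below simulates a hypothetical two-node network (a discrete node A with an arrow to a continuous node y): the parent is drawn first from a multinomial distribution, then the child from a Gaussian whose parameters depend on the simulated parent. The probabilities, means and variances are invented for the example and do not come from any simprob property.

```r
set.seed(944)
n <- 100

# Discrete parent A: multinomial probabilities over its levels.
p_A <- c(A1 = 0.3, A2 = 0.7)

# Continuous child y: one row per level of A, giving the conditional
# variance and mean used for the Gaussian draw.
par_y <- rbind(A1 = c(s2 = 1, mean = 0),
               A2 = c(s2 = 4, mean = 10))

# Step 1: simulate the parent.
A <- factor(sample(names(p_A), n, replace = TRUE, prob = p_A),
            levels = names(p_A))

# Step 2: simulate the child given the parent values already drawn.
y <- rnorm(n,
           mean = par_y[as.character(A), "mean"],
           sd = sqrt(par_y[as.character(A), "s2"]))

sim <- data.frame(A, y)
head(sim)
```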
A data frame with one row per case. If a file name is given, a file is created with the data set.
A <- factor(NA,levels=paste("A",1:2,sep=""))
B <- factor(NA,levels=paste("B",1:3,sep=""))
c1 <- NA
c2 <- NA
df <- data.frame(A,B,c1,c2)
nw <- network(df,doprob=FALSE) # doprob must be FALSE
nw <- makesimprob(nw)          # create simprob properties
set.seed(944)
sim <- rnetwork(nw,n=100)      # create simulated data frame
Accessor for the score from a node or network
score(x,...)
## S3 method for class 'node'
score(x,...)
## S3 method for class 'network'
score(x,...)
x |
an object of class |
... |
additional arguments for specific methods. |
For networks, the log network score is returned. For nodes, the contribution to the log network score is returned.
Removes networks that are equal or equivalent to networks already in the network family.
## S3 method for class 'networkfamily'
unique(x,incomparables=FALSE,equi=FALSE,timetrace=FALSE,epsilon=1e-12,...)
x |
an object of class |
incomparables |
a logical, but has no effect. |
equi |
a logical. If |
timetrace |
a logical. If |
epsilon |
a numeric, which gives how close two network scores must be for the networks to be considered 'equivalent'. |
... |
further arguments (no effect) |
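The role of epsilon can be illustrated on a plain vector of scores. This sketch only shows the tolerance comparison: an entry is kept if its score differs by at least epsilon from every score already kept. The score values are invented, and the package may apply further checks before treating two networks as equal or equivalent.

```r
# Hypothetical log scores; entries 1 and 2 differ only by rounding noise,
# entries 3 and 5 are identical.
scores <- c(-118.1, -118.1 + 1e-14, -120.4, -125.0, -120.4)
epsilon <- 1e-12

keep <- logical(length(scores))
for (i in seq_along(scores)) {
  # Compare against the scores kept so far (later entries are still FALSE).
  keep[i] <- !any(abs(scores[i] - scores[keep]) < epsilon)
}

which(keep)
scores[keep]
```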
data(rats)
rats.nwf <- networkfamily(rats)
rats.nwf2 <- unique(getnetwork(rats.nwf),equi=TRUE)