Package 'bnpa'

Title: Bayesian Networks & Path Analysis
Description: This project aims to enable the use of Path Analysis (PA) to infer causal relationships from data. To this end, we propose a hybrid approach that uses Bayesian network (BN) structure learning algorithms to create, from data, the input file for a PA model. The process is performed semi-automatically by our intermediate algorithm, allowing novice researchers to create and evaluate their own PA models from a data set. The references used for this project are: Koller, D., & Friedman, N. (2009). Probabilistic graphical models: principles and techniques. MIT press. <doi:10.1017/S0269888910000275>. Nagarajan, R., Scutari, M., & Lèbre, S. (2013). Bayesian networks in R. Springer, 122, 125-127. <doi:10.1007/978-1-4614-6446-4>. Scutari, M., & Denis, J. B. (2014). Bayesian networks: with examples in R. Chapman and Hall/CRC. <doi:10.1201/b17065>. Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1 - 36. <doi:10.18637/jss.v048.i02>.
Authors: Elias Carvalho, Joao R N Vissoci, Luciano Andrade, Wagner Machado, Emerson P Cabrera, Julio C Nievola
Maintainer: Elias Carvalho <[email protected]>
License: GPL-3
Version: 0.3.0
Built: 2024-12-04 07:30:19 UTC
Source: CRAN

Help Index


Executes a bootstrap during the learning of a BN structure

Description

This function receives a list of parameters, runs the bootstrap process to learn the Bayesian Network (BN) from the data set, then applies model averaging to extract the final BN structure and prints it.

Usage

boot.strap.bn(bn.algorithm, bn.score.test, data.to.work, black.list,
  white.list, nreplicates = 1000, type.of.algorithm, outcome.var)

Arguments

bn.algorithm

is a list of algorithms to learn the BN structure.

bn.score.test

is a list of conditional independence tests and network scores to be used.

data.to.work

is the data set from which the BN structure will be learned.

black.list

is a list of forbidden connections in the BN structure to be created.

white.list

is a list of mandatory connections of BN structure to be created.

nreplicates

is the number of replications to be done in the bootstrap process.

type.of.algorithm

is the type of algorithm used to learn the BN structure; it can be constraint-based or score-based.

outcome.var

is the variable to be used as outcome (dependent) and be highlighted in the BN.

Value

The final BN structure learned.

Author(s)

Elias Carvalho

References

Claeskens N, Hjort N (2009) Model selection and model averaging. Cambridge University Press, Cambridge, England.

Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT Press, Cambridge.

Scutari M (2017). Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package. Journal of Statistical Software, 77(2), 1-20.

Examples

## Not run: 
# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQualiN)
# Start the cluster
cl <- bnpa::create.cluster()
# Set the number of replications
nreplicates=1000
# Set the algorithm to be used
bn.algorithm="hc"
# Executes a parallel bootstrap process
data.bn.boot.strap <- bnlearn::boot.strength(data = dataQualiN, R = nreplicates,
  algorithm = bn.algorithm, cluster = cl, algorithm.args = list(score = "bic"),
  cpdag = FALSE)
# Release the cluster
parallel::stopCluster(cl)
head(data.bn.boot.strap)

## End(Not run)
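
As a rough illustration of the model-averaging step described for this function, the sketch below counts how often each arc appears across bootstrap replicates and keeps the arcs whose frequency exceeds a threshold. The arc sets, the "from->to" string encoding and the fixed 0.5 threshold are hypothetical choices made for this example; it is not the code used by bnpa or bnlearn, which handle arc direction and threshold selection in their own way.

```r
# Hypothetical arc sets learned from five bootstrap replicates,
# each arc encoded as a "from->to" string (an assumption of this sketch).
replicate.arcs <- list(
  c("A->B", "B->C"),
  c("A->B", "B->C"),
  c("A->B", "C->B"),
  c("A->B", "B->C"),
  c("A->B")
)
# Arc strength: fraction of replicates in which each arc was learned
arc.counts <- table(unlist(replicate.arcs))
arc.strength <- as.vector(arc.counts) / length(replicate.arcs)
names(arc.strength) <- names(arc.counts)
# Keep only the arcs that appear in more than half of the replicates
threshold <- 0.5
averaged.arcs <- names(arc.strength)[arc.strength > threshold]
arc.strength
averaged.arcs
```

Arcs that show up in only a few replicates are dropped, which is what makes the averaged structure more stable than a network learned from a single sample.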

Verifies the BN learning algorithms

Description

This function receives a list of algorithms from the bnlearn package and checks whether they can be used in the bnpa package.

Usage

check.algorithms(bn.learn.algorithms)

Arguments

bn.learn.algorithms

is a list of algorithms (present in the bnlearn package) to be used in the BN structure learning process in bnpa.

Author(s)

Elias Carvalho

References

Scutari M (2017). Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package. Journal of Statistical Software, 77(2), 1-20.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("~/your working directory")
# Load packages
library(bnpa)
# Set what BN learning algorithms will be used
bn.learn.algorithms <- c("gs", "hc")
# Check these algorithms
check.algorithms(bn.learn.algorithms)
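
A check of this kind can be sketched in base R by comparing the requested names against known sets. The two sets below are copied from the default values of cb.algorithms and sb.algorithms in gera.bn.structure(); treating them as the complete supported list is an assumption of this sketch, and classify.algorithms is a hypothetical helper, not a bnpa function.

```r
# Algorithm names taken from the defaults of gera.bn.structure();
# treating them as the full supported set is an assumption of this sketch.
constraint.based <- c("gs", "iamb", "fast.iamb", "inter.iamb", "mmpc", "si.hiton.pc")
score.based <- c("hc", "tabu")

# Hypothetical helper: split the requested names by family and flag unknown ones
classify.algorithms <- function(requested) {
  list(
    constraint.based = intersect(requested, constraint.based),
    score.based = intersect(requested, score.based),
    unsupported = setdiff(requested, c(constraint.based, score.based))
  )
}

classify.algorithms(c("gs", "hc", "my.algo"))
```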

Verify if one specific variable of a data set is dichotomic

Description

This function receives a data set and the name of a specific variable and verifies whether that variable is dichotomic. If it is, the function returns TRUE; otherwise it returns FALSE.

Usage

check.dichotomic.one.var(data.to.work, variable.name)

Arguments

data.to.work

is a data set containing the variables to be checked.

variable.name

is the name of a variable to be checked.

Value

TRUE or FALSE

Author(s)

Elias Carvalho

References

HAYES, A F; PREACHER, K J. Statistical mediation analysis with a multicategorical independent variable. British Journal of Mathematical and Statistical Psychology, v. 67, n. 3, p. 451-470, 2014.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQuantC)
head(dataQuantC)
# Show the structure of data set
str(dataQuantC)
# Set variable name
variable.name = "A"
# The data set has no dichotomic variables, so the function will return FALSE
check.dichotomic.one.var(dataQuantC, variable.name)
# Adding dichotomic data to dataQuantC
dataQuantC$Z <- round(runif(500, min=0, max=1),0)
# Show the new structure of data set
str(dataQuantC)
# Set variable name
variable.name = "Z"
# Now the data set has a dichotomic variable, so the function will return TRUE
check.dichotomic.one.var(dataQuantC, variable.name)
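
For readers who want to see the idea behind such a check, a minimal base R sketch follows. It assumes that "dichotomic" means exactly two distinct non-missing values; is.dichotomic is a hypothetical helper and may not match the package's internal rule.

```r
# Hypothetical helper: TRUE when the variable has exactly two distinct
# non-missing values (the definition assumed in this sketch)
is.dichotomic <- function(data, variable.name) {
  values <- unique(stats::na.omit(data[[variable.name]]))
  length(values) == 2
}

df <- data.frame(A = c(1.5, 2.3, 3.1, 4.8), Z = c(0, 1, 1, 0))
is.dichotomic(df, "A")
is.dichotomic(df, "Z")
```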

Check the levels of a categorical variable

Description

This function receives a data set and a variable name, checks the type of the variable to be sure it is categorical (factor), and then counts the number of levels it has.

Usage

check.levels.one.variable(data.to.work, variable.name)

Arguments

data.to.work

is a data set containing the variable.

variable.name

is the name of the variable to be checked.

Author(s)

Elias Carvalho

References

GUJARATI, Damodar N. Basic econometrics. Tata McGraw-Hill Education, 2009.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQualiN)
head(dataQualiN)
# Add random numeric data to dataQualiN
dataQualiN$Z <- round(runif(500, min=0, max=1000),2)
# Converting the numeric variable into factor
dataQualiN$Z <- factor(dataQualiN$Z)
# Set the variable name to the newly created factor
variable.name = "Z"
# Count the number of levels of a specific variable
number.of.levels <- check.levels.one.variable(dataQualiN, variable.name)
number.of.levels
# Set the variable name to a categorical variable
variable.name = "A"
# Count the number of levels of a specific variable
number.of.levels <- check.levels.one.variable(dataQualiN, variable.name)
number.of.levels
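
The same kind of level count can be sketched with base R alone. The helper below, count.levels, is hypothetical: it stops with an error for non-factor variables, which is an assumption about how a non-categorical input should be handled and may differ from what the package does.

```r
# Hypothetical helper: count the levels of a factor column,
# refusing variables that are not factors
count.levels <- function(data, variable.name) {
  column <- data[[variable.name]]
  if (!is.factor(column)) {
    stop("'", variable.name, "' is not a factor")
  }
  nlevels(column)
}

df <- data.frame(A = factor(c("yes", "no", "yes")), Z = c(0.1, 0.2, 0.3))
count.levels(df, "A")
```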

Verify variables with NA

Description

This function receives a data set, counts the NAs in each variable, calculates the percentage of NAs, and reports each variable together with its number and percentage of NAs.

Usage

check.na(data.to.work)

Arguments

data.to.work

is a data set containing the variables to be checked for NAs.

Value

The number and percentage of NAs.

Author(s)

Elias Carvalho

References

LITTLE, R J A; RUBIN, D B. Statistical analysis with missing data. John Wiley & Sons, 2014.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQuantC)
head(dataQuantC)
# Add NAs to dataQuantC (credit for the random NA code: https://goo.gl/Xj6caY)
dataQuantC <- as.data.frame(lapply(dataQuantC, function(cc) cc[ sample(c(TRUE, NA),
                             prob = c(0.85, 0.15), size = length(cc), replace = TRUE) ]))
# Checking the NAs
check.na(dataQuantC)
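
The count and percentage described above can be reproduced with a few lines of base R. This is only a sketch: summarise.na and its output columns are hypothetical and do not reflect the exact format printed by check.na.

```r
# Hypothetical helper: number and percentage of NAs per variable
summarise.na <- function(data) {
  n.missing <- colSums(is.na(data))
  data.frame(
    variable = names(n.missing),
    n.na = as.integer(n.missing),
    pct.na = 100 * as.numeric(n.missing) / nrow(data),
    stringsAsFactors = FALSE
  )
}

df <- data.frame(A = c(1, NA, 3, 4), B = c(NA, NA, 1, 2))
summarise.na(df)
```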

Verify if one specific variable of a data set is an ordered factor

Description

Receives a data set and the name of a specific variable and verifies whether that variable is an ordered factor. If it is, the function returns TRUE; otherwise it returns FALSE.

Usage

check.ordered.one.var(data.to.work, var.name)

Arguments

data.to.work

is a data set containing the variables to be checked.

var.name

is the name of the variable to be checked.

Value

TRUE or FALSE

Author(s)

Elias Carvalho

References

HAYES, A F; PREACHER, K J. Statistical mediation analysis with a multicategorical independent variable. British Journal of Mathematical and Statistical Psychology, v. 67, n. 3, p. 451-470, 2014.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQualiN)
head(dataQualiN)
# Transform variable A into ordered factor
dataQualiN$A <- ordered(dataQualiN$A)
# Check variable A and return TRUE
var.name <- "A"
check.ordered.one.var(dataQualiN, var.name)
# Check variable B and return FALSE
var.name <- "B"
check.ordered.one.var(dataQualiN, var.name)

Verifies if there are ordered factor variables to be declared in the PA model building process

Description

Receives a BN structure and a data set, then verifies whether there are ordered variables. If there are, it returns TRUE.

Usage

check.ordered.to.pa(bn.structure, data.to.work)

Arguments

bn.structure

is a BN structure learned from data used to identify if the variable is endogenous or exogenous when building the PA model.

data.to.work

is a data set containing the variables of the BN.

Value

a data frame with ordered variables.

Author(s)

Elias Carvalho

References

HAYES, A F; PREACHER, K J. Statistical mediation analysis with a multicategorical independent variable. British Journal of Mathematical and Statistical Psychology, v. 67, n. 3, p. 451-470, 2014.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("~/your working directory")
# Load packages
library(bnpa)
# Load the dataset
data(dataQualiN) # Pre-Loaded
# Build the BN structure
bn.structure<-bnlearn::hc(dataQualiN)
# Show the BN structure learned
bnlearn::graphviz.plot(bn.structure)
# Transform variables A and B into ordered factors
dataQualiN$A <- as.ordered(dataQualiN$A)
dataQualiN$B <- as.ordered(dataQualiN$B)
# Generates a list with variables to be ordered and exogenous variables
cat.var.to.use.in.pa <- bnpa::check.ordered.to.pa(bn.structure, dataQualiN)
# Show the variables
cat.var.to.use.in.pa

Identifies and gives an option to remove outliers

Description

This function receives a data set, scans all variables and, for each one, verifies whether there are outliers and asks whether we wish to remove them. A parameter controls whether the function removes them automatically or asks first.

Usage

check.outliers(data.to.work, ask.before)

Arguments

data.to.work

is a data set with variables to be checked.

ask.before

controls whether the process asks for confirmation before removing outliers.

Author(s)

Elias Carvalho

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Load the data set
data(dataQuantC) # Pre-Loaded
# Set whether to ask before removing outliers
ask.before = "Y" # or ask.before = "N"
# Call the procedure to check if there are outliers
dataQuantC <- check.outliers(dataQuantC, ask.before)

Verify the type of one variable

Description

Receives a specific variable and returns a number indicating its type: 1 = integer, 2 = numeric, 3 = factor, 8 = character.

Usage

check.type.one.var(data.to.work, show.message = 0, variable.name)

Arguments

data.to.work

is a data set containing the variables to be verified.

show.message

is a parameter indicating whether the function will show a message.

variable.name

is the name of the variable to be checked.

Value

A variable with the code indicating the type of variable and a message (or not).

Author(s)

Elias Carvalho

References

GUJARATI, Damodar N. Basic econometrics. Tata McGraw-Hill Education, 2009.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQuantC)
head(dataQuantC)
# Add random numeric data to dataQuantC
dataQuantC$Z <- round(runif(500, min=0, max=1000),2)
# Converting the numeric variable into factor
dataQuantC$Z <- factor(dataQuantC$Z)
# Check and return a numeric value corresponding to the variable type
# Set the variable name
variable.name = "A"
# identify the type
check.type.one.var(dataQuantC, show.message=0, variable.name)
# Set the variable name
variable.name = "Z"
# identify the type
check.type.one.var(dataQuantC, show.message=0, variable.name)

Verify types of variable

Description

This function receives a data set as a parameter and checks the type of each variable, returning a number that indicates the types of variables in the whole data set: 1=integer, 2=numeric, 3=factor, 4=integer and numeric, 5=integer and factor, 6=numeric and factor, 7=integer, numeric and factor, 8=character.

Usage

check.types(data.to.work, show.message = 0)

Arguments

data.to.work

is a data set containing the variables to be verified.

show.message

is a parameter indicating whether the function will show a message.

Value

A variable with the code indicating the type of variable and a message (or not).

Author(s)

Elias Carvalho

References

GUJARATI, Damodar N. Basic econometrics. Tata McGraw-Hill Education, 2009.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQuantC)
# Show first lines of data set
head(dataQuantC)
# Check and return a numeric value
show.message <- 1
bnpa::check.types(dataQuantC, show.message)
# Add random numeric data to dataQuantC
dataQuantC$Z <- round(runif(500, min=0, max=1000),2)
# Converting the numeric variable into factor
dataQuantC$Z <- factor(dataQuantC$Z)
# Check and return a numeric value corresponding to: 1=integer, 2=numeric, 3=factor, 4=integer and
# numeric, 5=integer and factor, 6=numeric and factor or 7=integer, numeric and factor.
show.message <- 1
bnpa::check.types(dataQuantC, show.message)
# Suppressing the message
show.message <- 0
bnpa::check.types(dataQuantC, show.message)
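
The coding scheme listed in the Description can be sketched as a lookup on which types are present in the data set. The helper below, dataset.type.code, is hypothetical; in particular, returning 8 whenever any character column is present is an assumption of this sketch, since the documentation does not say how character columns combine with other types.

```r
# Hypothetical helper mapping the mix of column types to the codes 1 to 8
dataset.type.code <- function(data) {
  # Assumption: any character column yields code 8
  if (any(vapply(data, is.character, logical(1)))) return(8L)
  has.integer <- any(vapply(data, is.integer, logical(1)))
  has.numeric <- any(vapply(data, is.double, logical(1)))
  has.factor <- any(vapply(data, is.factor, logical(1)))
  # Key is a three-digit flag string: integer, numeric, factor
  key <- paste0(as.integer(has.integer), as.integer(has.numeric),
                as.integer(has.factor))
  codes <- c("100" = 1L, "010" = 2L, "001" = 3L, "110" = 4L,
             "101" = 5L, "011" = 6L, "111" = 7L)
  unname(codes[key])
}

df <- data.frame(a = 1:3, b = c(1.5, 2.5, 3.5))
dataset.type.code(df)
df$c <- factor(c("x", "y", "x"))
dataset.type.code(df)
```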

Check if the variables need to be ordered

Description

This function receives a data set and checks the number of levels of each factor variable. If a variable has more than 2 levels, the function recommends checking whether it should be transformed into an ordered factor.

Usage

check.variables.to.be.ordered(data.to.work)

Arguments

data.to.work

is a data set with variables to check.

Value

TRUE or FALSE, indicating whether a variable needs to be transformed into an ordered factor.

Author(s)

Elias Carvalho

References

HAYES, A F; PREACHER, K J. Statistical mediation analysis with a multicategorical independent variable. British Journal of Mathematical and Statistical Psychology, v. 67, n. 3, p. 451-470, 2014.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQualiN)
# Show first lines of data set
head(dataQualiN)
# Insert categorical variables with more than 2 levels
dataQualiN$test.variable[dataQualiN$A == "yes"] <- "low"
dataQualiN$test.variable[dataQualiN$B == "yes"] <- "medium"
dataQualiN$test.variable[dataQualiN$X == "yes"] <- "high"
# Transform it to factor variable
dataQualiN$test.variable <- as.factor(dataQualiN$test.variable)
# Check whether variables need to be transformed into ordered factors
bnpa::check.variables.to.be.ordered(dataQualiN)
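
The rule described above (unordered factors with more than 2 levels are candidates for ordering) can be sketched in base R as follows. The helper needs.ordering is hypothetical and returns the candidate variable names rather than TRUE or FALSE, purely to make the result easier to inspect.

```r
# Hypothetical helper: names of unordered factors with more than 2 levels
needs.ordering <- function(data) {
  flags <- vapply(
    data,
    function(x) is.factor(x) && !is.ordered(x) && nlevels(x) > 2,
    logical(1)
  )
  names(data)[flags]
}

df <- data.frame(
  smoker = factor(c("yes", "no", "yes", "no")),
  risk = factor(c("low", "medium", "high", "low")),
  age = c(30, 45, 52, 61)
)
needs.ordering(df)
```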

Converts the position of any element of confusion matrix to TP, FP, FN, TN

Description

This function receives a confusion matrix and the positions of its values, and rearranges it to keep the order TP, FP, FN, TN (true positives, false positives, false negatives, true negatives).

Usage

convert.confusion.matrix(confusion.matrix, cm.position)

Arguments

confusion.matrix

is the confusion matrix to be converted.

cm.position

is the position of your TP, FP, FN, TN in the confusion matrix.

Value

a new confusion matrix

Author(s)

Elias Carvalho

References

STORY, Michael; CONGALTON, Russell G. Accuracy assessment: a user’s perspective. Photogrammetric Engineering and remote sensing, v. 52, n. 3, p. 397-399, 1986.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Creates a confusion matrix
confusion.matrix <-matrix(c(12395, 4, 377, 1), nrow=2, ncol=2, byrow=TRUE)
# Creates a vector with the position of TP, FP, FN, TN
cm.position <- c(4,3,2,1)
# Shows the original confusion matrix
confusion.matrix
# Converts the confusion matrix
confusion.matrix <- convert.confusion.matrix(confusion.matrix, cm.position)
# Shows the converted confusion matrix
confusion.matrix

Create a Parallel Socket Cluster

Description

This function counts the number of cores of your computer's processor and mounts a parallel socket cluster. It always creates N-1 nodes in the cluster, leaving 1 core free for other tasks.

Usage

create.cluster()

Value

an object of class "cluster"

Author(s)

Elias Carvalho

References

R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Examples

## Not run: 
## Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQualiN)
# Start the cluster
cl <- bnpa::create.cluster()
# Set the number of replications
R=1000
# Set the algorithm to be used
algorithm="hc"
# Executes a parallel bootstrap process
data.bn.boot.strap=bnlearn::boot.strength(data=dataQualiN,R=R,algorithm=algorithm,cluster=cl,
                                algorithm.args=list(score="bic"),cpdag = FALSE)
# Release the cluster
parallel::stopCluster(cl)

## End(Not run)
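
The N-1 rule described for this function can be sketched with the parallel package that ships with R. The helpers cluster.size and make.socket.cluster are hypothetical names; falling back to a single node when the core count is unknown or equal to 1 is an assumption of this sketch.

```r
# Number of worker nodes: all detected cores except one, but never fewer than one.
# parallel::detectCores() may return NA, which is treated here as "one node".
cluster.size <- function(n.cores = parallel::detectCores()) {
  if (is.na(n.cores)) return(1L)
  max(1L, as.integer(n.cores) - 1L)
}

# Create the socket cluster; call parallel::stopCluster() on the result when done.
make.socket.cluster <- function() {
  parallel::makeCluster(cluster.size())
}

cluster.size(8)
```

Keeping one core out of the cluster leaves the machine responsive while long bootstrap runs are in progress.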

Creates dummy variables in the data set and remove master variables

Description

This function receives a data set and the names of the variables to be transformed into dummies. It then creates the dummy variables, transforms them into numeric (to work with PA generation) and removes the master variables from which the dummies originated.

Usage

create.dummies(data.to.work, dummy.vars)

Arguments

data.to.work

is a data set containing the variables to transform.

dummy.vars

are the variables to be transformed.

Value

the new data set

Author(s)

Elias Carvalho

References

Yves Rosseel (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2),1-36.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQualiN)
# Show the structure before
str(dataQualiN)
# Set possible dummy variables
dummy.vars <- c("A", "B")
# Create dummies
dataQualiN <- bnpa::create.dummies(dataQualiN, dummy.vars)
# Show the structure after
str(dataQualiN)
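
To make the transformation concrete, here is a minimal base R sketch of dummy coding. The helper make.dummies, the "variable_level" column naming, and the choice to drop the first level as the reference category are all assumptions of this example; create.dummies may name columns or treat reference levels differently.

```r
# Hypothetical helper: replace each listed variable with numeric 0/1 dummies
make.dummies <- function(data, dummy.vars) {
  for (v in dummy.vars) {
    f <- factor(data[[v]])
    # Assumption: the first level is the reference and gets no dummy column
    for (lev in levels(f)[-1]) {
      data[[paste(v, lev, sep = "_")]] <- as.numeric(f == lev)
    }
    # Remove the master variable that originated the dummies
    data[[v]] <- NULL
  }
  data
}

df <- data.frame(A = c("no", "yes", "no"), B = c("low", "mid", "high"),
                 y = 1:3, stringsAsFactors = FALSE)
make.dummies(df, c("A", "B"))
```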

A qualitative data set to test functions

Description

This is a data set with qualitative nominal variables containing 500 observations extracted from the ASIA Bayesian network in the bnlearn package repository.

Usage

dataQualiN

Format

A data frame with 500 rows and 8 variables:

A

a categorical variable

S

a categorical variable

T

a categorical variable

L

a categorical variable

B

a categorical variable

E

a categorical variable

X

a categorical variable

D

a categorical variable

...

Source

http://www.bnlearn.com/bnrepository//


A quantitative data set to test functions

Description

This is a data set with quantitative continuous variables containing 500 observations extracted from ASIA Bayesian Networks of bnlearn package repository.

Usage

dataQuantC

Format

A data frame with 500 rows and 7 variables:

A

a numeric variable

B

a numeric variable

C

a numeric variable

D

a numeric variable

E

a numeric variable

F

a numeric variable

G

a numeric variable

...

Source

http://www.bnlearn.com/bnrepository//


Learn the Bayesian Network structure from data and build a PA model

Description

This function receives a data set and a list of parameters used to learn the BN structure from that data set. Once the BN is ready, it builds a PA model if required. The process then saves the BN and PA graphs and the PA parameters.

Usage

gera.bn.structure(data.to.work, white.list = "", black.list = "",
  nreplicates = 1000, cb.algorithms = c("gs", "iamb", "fast.iamb",
  "inter.iamb", "mmpc", "si.hiton.pc"), sb.algorithms = c("hc", "tabu"),
  cb.tests = "", sb.tests = "", optimized.option = "FALSE",
  outcome.var, build.pa)

Arguments

data.to.work

is the data set from which the BN structure will be learned.

white.list

is a list of mandatory connections of BN structure to be created.

black.list

is a list of forbidden connections of BN structure to be created.

nreplicates

is how many times the bootstrap will run.

cb.algorithms

the names of the constraint-based algorithms.

sb.algorithms

the names of the score-based algorithms.

cb.tests

the names of the tests for the constraint-based algorithms.

sb.tests

the names of the network scores for the score-based algorithms.

optimized.option

a parameter of the bnlearn package to optimize the BN structure learning.

outcome.var

is the outcome (dependent) variable.

build.pa

indicates whether the process will build a PA model or not.

Author(s)

Elias Carvalho

References

Scutari M (2017). Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package. Journal of Statistical Software, 77(2), 1-20.

Examples

## Not run: 
# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("To your working directory")
# Load packages
library(bnpa)
# Load Data
data(dataQualiN)
# Set variables to work
nreplicates = 1000
white.list <- NULL
black.list <- "L-T"
cb.algorithms = c("gs")
sb.algorithms = c("hc")
cb.tests = "jt"
sb.tests = "aic"
optimized.option="FALSE"
outcome.var = "E"
build.pa = 0
# Learn the BN from data and save results (data & images)
gera.bn.structure(dataQualiN, white.list, black.list, nreplicates, cb.algorithms,sb.algorithms,
                 cb.tests, sb.tests, optimized.option, outcome.var, build.pa)

## End(Not run)

Generates a PA model

Description

This function receives a learned BN structure, the data set and some parameters, and builds a PA input model string. It then runs the PA model using Structural Equation Model functions and exports a PA graph and a summary of the PA model.

Usage

gera.pa(bn.structure, data.to.work, pa.name, pa.imgname, bn.algorithm,
  bn.score.test, outcome.var)

Arguments

bn.structure

is a BN structure learned from data.

data.to.work

is a data frame containing the variables of the BN.

pa.name

is a variable to store the name of file to save PA parameters.

pa.imgname

is a variable to store the name of file to save PA graph.

bn.algorithm

is a list of algorithms to learn the BN structure.

bn.score.test

is a list of tests to be used during BN structure learning.

outcome.var

is the outcome variable.

Author(s)

Elias Carvalho

References

Yves Rosseel (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2),1-36.

Examples

## Not run: 
# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("To your working directory")
# Load packages
library(bnpa)
# Load data sets from package
data(dataQualiN)
# Show first lines
head(dataQualiN)
# Learn BN structure
bn.structure <- bnlearn::hc(dataQualiN)
bnlearn::graphviz.plot(bn.structure)
# Set variables
pa.name<-"docPAHC"
pa.imgname<-"imgPAHC"
bn.algorithm<-"hc"
bn.score.test<-"aic-g"
outcome.var<-"D"
# Generates the PA model from bn structure
gera.pa(bn.structure, dataQualiN, pa.name, pa.imgname, bn.algorithm, bn.score.test, outcome.var)

## End(Not run)

Generates PA input model

Description

This function is called from the 'gera.pa' function. It receives a BN structure and a data set, builds a PA input model string based on the BN structure and returns it.

Usage

gera.pa.model(bn.structure, data.to.work)

Arguments

bn.structure

is a BN structure learned from data.

data.to.work

is a data set containing the variables of the BN.

Value

the PA input model string

Author(s)

Elias Carvalho

References

Yves Rosseel (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2),1-36.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("To your working directory")
# Load packages
library(bnpa)
library(bnlearn)
# load data sets from package
data(dataQualiN)
# Show first lines
head(dataQualiN)
# Learn BN structure
bn.structure <- hc(dataQualiN)
bnlearn::graphviz.plot(bn.structure)
# Set variables
# Generates the PA model from bn structure
pa.model <- gera.pa.model(bn.structure, dataQualiN)
pa.model
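
To give an idea of what a PA input model string derived from a BN can look like, the sketch below turns a table of arcs into lavaan-style regression lines of the form "child ~ parent1 + parent2". The arcs data frame and the helper arcs.to.pa.model are hypothetical, and the string produced by gera.pa.model may contain additional elements not shown here.

```r
# Hypothetical helper: one regression line per node that has parents
arcs.to.pa.model <- function(arcs) {
  children <- unique(arcs$to)
  model.lines <- vapply(children, function(child) {
    parents <- arcs$from[arcs$to == child]
    paste(child, "~", paste(parents, collapse = " + "))
  }, character(1))
  paste(model.lines, collapse = "\n")
}

# Hypothetical arcs: A -> C, B -> C, C -> D
arcs <- data.frame(from = c("A", "B", "C"), to = c("C", "C", "D"),
                   stringsAsFactors = FALSE)
cat(arcs.to.pa.model(arcs), "\n")
```

Each node with at least one parent becomes the left-hand side of a regression, so the arcs of the BN map directly onto the paths of the PA model.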

Mounts a white or black list

Description

This function receives a simple list with one or more pairs of variables (for example "A-C,D-F") and mounts a new data frame in "bnlearn" syntax. The result is an object similar to the one produced by the more verbose command "data.frame(from = c('B', 'F'), to = c('F', 'B'))".

Usage

mount.wl.bl.list(black_or_white_list)

Arguments

black_or_white_list

is a list of pairs of variables.

Value

A new data frame with the 'from' and 'to' variables

Author(s)

Elias Carvalho

References

Scutari M (2017). Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package. Journal of Statistical Software, 77(2), 1-20.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("To your working directory")
# Load packages
library(bnpa)
library(bnlearn)
# Load data sets from package
data(dataQuantC)
# Show the first lines of data
head(dataQuantC)
# Learn the BN structure without black and white list
bn.structure <- hc(dataQuantC)
# Split graph panel in 2 columns
par(mfrow=c(1,2))
# Show the BN structure
bnlearn::graphviz.plot(bn.structure)
# Mounting the black list
black.list <- ("A-C,D-F")
black.list <- mount.wl.bl.list(black.list)
black.list
white.list <- ("A-B,D-G")
white.list <- mount.wl.bl.list(white.list)
white.list
# Learn the BN structure with black and white list
bn.structure <- hc(dataQuantC, whitelist = white.list, blacklist = black.list)
# Show the BN structure
bnlearn::graphviz.plot(bn.structure)
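
The conversion from a compact string to a from/to data frame can be sketched in base R. The helper parse.arc.list is hypothetical, and reading "A-C" as an arc from A to C is an assumption based on the example strings used above.

```r
# Hypothetical helper: turn "A-C,D-F" into a data frame with from/to columns
parse.arc.list <- function(arc.string) {
  pairs <- trimws(strsplit(arc.string, ",", fixed = TRUE)[[1]])
  nodes <- strsplit(pairs, "-", fixed = TRUE)
  data.frame(
    from = vapply(nodes, `[`, character(1), 1),
    to = vapply(nodes, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}

parse.arc.list("A-C,D-F")
```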

Builds a black list of predictor and/or outcome variable

Description

This function receives a data set, an outcome/predictor variable, the type of variable and a black list. If the variable is classified as an outcome, the function builds a black list from it to all other variables. If it is classified as a predictor, the function builds a list from all other variables to it. You can pass an existing black list, in which case the function appends the new list to the end of it.

Usage

outcome.predictor.var(data.to.work, var.name, type.var, black.list)

Arguments

data.to.work

is a data set containing the variables to build a list.

var.name

is the outcome/predictor variable name.

type.var

is the type of variable: <o>utcome or <p>redictor.

black.list

is a previous black list; it can be empty or already filled.

Value

a black list with from - to variables

Author(s)

Elias Carvalho

References

KATZ, M H. Multivariable analysis: a primer for readers of medical research. Annals of internal medicine, v. 138, n. 8, p. 644-650, 2003.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("To your working directory")
# Load packages
library(bnpa)
library(bnlearn)
# Load data sets from package
data(dataQuantC)
# Show first lines
head(dataQuantC)
# Create an empty list or fill it before start
black.list <- ""
# Set the type of variable to "outcome", meaning it will not point to any other variable
type.var <- "o"
# Setting variable "A" as "outcome" will create a black list from this variable to all others
var.name <- "A"
# Creating the black list
black.list <- outcome.predictor.var(dataQuantC, var.name, type.var, black.list)
black.list
# Set the type of variable to "predictor", meaning no other variable will point to it
type.var <- "p"
# Setting variable "D" as "predictor" will create a black list from all others to it
var.name <- "D"
# Creating the black list
black.list <- outcome.predictor.var(dataQuantC, var.name, type.var, black.list)
black.list
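
The two rules described above (an outcome points to no other variable, a predictor is pointed to by no other variable) can be sketched in base R as follows. The helper build.black.list is hypothetical; it takes the variable names directly and represents an empty previous list as NULL, both of which are simplifications made for this example.

```r
# Hypothetical helper: append forbidden arcs for an outcome ("o") or predictor ("p")
build.black.list <- function(var.names, var.name, type.var, black.list = NULL) {
  others <- setdiff(var.names, var.name)
  new.arcs <- if (type.var == "o") {
    # Outcome: forbid arcs from this variable to every other variable
    data.frame(from = var.name, to = others, stringsAsFactors = FALSE)
  } else if (type.var == "p") {
    # Predictor: forbid arcs from every other variable to this variable
    data.frame(from = others, to = var.name, stringsAsFactors = FALSE)
  } else {
    stop("type.var must be 'o' or 'p'")
  }
  rbind(black.list, new.arcs)
}

bl <- build.black.list(c("A", "B", "C"), "A", "o")
bl <- build.black.list(c("A", "B", "C"), "C", "p", bl)
bl
```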

Extract information of outliers

Description

This function receives a data set and the content and name of a variable, analyzes the content and extracts information about outliers, showing a boxplot and a histogram.

Usage

preprocess.outliers(data.to.work, variable.content, variable.name)

Arguments

data.to.work

is a data frame containing the variables.

variable.content

is a vector with all the values of the variable in the data set.

variable.name

is the name of the variable to be verified.

Value

a list with number of outliers and the variable content

Author(s)

Elias Carvalho

References

GUJARATI, Damodar N. Basic econometrics. Tata McGraw-Hill Education, 2009.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Load data sets from package
data(dataQuantC)
# Set parameters to function
variable.content <- dataQuantC$A
variable.name <- "A"
# Preprocess information
preprocess.information <- preprocess.outliers(dataQuantC, variable.content, variable.name)
num.outliers <- preprocess.information[[1]]
variable.content <- preprocess.information[[2]]
mean.of.outliers <- preprocess.information[[3]]
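
One common way to extract outlier information is the boxplot rule (values beyond 1.5 times the hinge spread), available in base R through grDevices::boxplot.stats(). The sketch below uses it on hypothetical values; whether preprocess.outliers applies this exact rule is an assumption of this example.

```r
# Hypothetical values with one obvious outlier
values <- c(10, 11, 12, 13, 14, 15, 100)
# Values flagged by the boxplot rule (default coefficient of 1.5)
outliers <- grDevices::boxplot.stats(values)$out
num.outliers <- length(outliers)
# Variable content with the flagged values removed
cleaned <- values[!values %in% outliers]
outliers
num.outliers
cleaned
```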

Transform categorical variables into ordinal

Description

This function receives a data set with categorical variables, scans all variables and transforms them into ordered factors.

Usage

transf.into.ordinal(data.to.work)

Arguments

data.to.work

is a data set where all variables will be transformed into ordered factors.

Value

The data set transformed

Author(s)

Elias Carvalho

References

GUJARATI, Damodar N. Basic econometrics. Tata McGraw-Hill Education, 2009.

Examples

# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Load data
data(dataQualiN)
# Transform all variables into ordinal
dataQualiN <- bnpa::transf.into.ordinal(dataQualiN)
str(dataQualiN)
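
A comparable transformation can be sketched with base R. The helper to.ordered is hypothetical; it converts factor and character columns and leaves other columns untouched, which is an assumption of this sketch. Note that as.ordered() keeps the existing level order (alphabetical for character input), so levels may need to be set explicitly when that order is not meaningful.

```r
# Hypothetical helper: convert every factor or character column to an ordered factor
to.ordered <- function(data) {
  data[] <- lapply(data, function(x) {
    if (is.factor(x) || is.character(x)) as.ordered(x) else x
  })
  data
}

df <- data.frame(A = c("no", "yes", "no"), B = factor(c("low", "high", "low")),
                 n = 1:3, stringsAsFactors = FALSE)
df <- to.ordered(df)
str(df)
```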