Title: | Bayesian Networks & Path Analysis |
---|---|
Description: | This project aims to enable the method of Path Analysis (PA) to infer causalities from data. For this we propose a hybrid approach, which uses Bayesian network (BN) structure learning algorithms to learn, from data, the input needed to build a PA model. The process is performed semi-automatically by our intermediate algorithm, allowing novice researchers to create and evaluate their own PA models from a data set. The references used for this project are: Koller, D., & Friedman, N. (2009). Probabilistic graphical models: principles and techniques. MIT Press. <doi:10.1017/S0269888910000275>. Nagarajan, R., Scutari, M., & Lèbre, S. (2013). Bayesian networks in R. Springer, 122, 125-127. <doi:10.1007/978-1-4614-6446-4>. Scutari, M., & Denis, J. B. (2014). Bayesian networks: with examples in R. Chapman and Hall/CRC. <doi:10.1201/b17065>. Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1-36. <doi:10.18637/jss.v048.i02>. |
Authors: | Elias Carvalho, Joao R N Vissoci, Luciano Andrade, Wagner Machado, Emerson P Cabrera, Julio C Nievola |
Maintainer: | Elias Carvalho <[email protected]> |
License: | GPL-3 |
Version: | 0.3.0 |
Built: | 2024-11-04 06:48:52 UTC |
Source: | CRAN |
This function receives a list of parameters, runs the bootstrap process to learn the Bayesian network (BN) structure from the data set, then performs model averaging to extract the final BN structure and prints it.
boot.strap.bn(bn.algorithm, bn.score.test, data.to.work, black.list, white.list, nreplicates = 1000, type.of.algorithm, outcome.var)
bn.algorithm |
is a list of algorithms to learn the BN structure. |
bn.score.test |
is a list of conditional independence tests and network scores to be used. |
data.to.work |
is the data set from which the BN structure will be learned. |
black.list |
is a list of forbidden connections in the BN structure to be learned. |
white.list |
is a list of mandatory connections in the BN structure to be learned. |
nreplicates |
is the number of replications performed in the bootstrap process. |
type.of.algorithm |
is the type of algorithm used to learn the BN structure: constraint-based or score-based. |
outcome.var |
is the variable to be used as the outcome (dependent) variable and highlighted in the BN. |
The final BN structure learned.
Elias Carvalho
Claeskens G, Hjort NL (2008) Model selection and model averaging. Cambridge University Press, Cambridge, England.
Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT Press, Cambridge.
Scutari M (2017). Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package. Journal of Statistical Software, 77(2), 1-20.
## Not run:
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQualiN)
# Start the cluster
cl <- bnpa::create.cluster()
# Set the number of replications
nreplicates <- 1000
# Set the algorithm to be used
bn.algorithm <- "hc"
# Execute a parallel bootstrap process
data.bn.boot.strap <- bnlearn::boot.strength(data = dataQualiN, R = nreplicates,
  algorithm = bn.algorithm, cluster = cl, algorithm.args = list(score = "bic"),
  cpdag = FALSE)
# Release the cluster
parallel::stopCluster(cl)
head(data.bn.boot.strap)
## End(Not run)
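The model-averaging step can be pictured with a short base-R sketch. It assumes bootstrap output shaped like the data frame returned by bnlearn::boot.strength() (columns from, to, strength, direction); the arc values and the helper average.arcs() are hypothetical and not part of bnpa. An arc is kept when its bootstrap strength reaches the threshold and its direction confidence exceeds 0.5, which loosely mirrors the idea behind bnlearn::averaged.network(); ties at 0.5 are simply dropped here.

```r
# Hypothetical bootstrap summary (values made up for illustration).
# Each node pair appears in both directions, as in bnlearn::boot.strength().
arc.strength <- data.frame(
  from      = c("A", "T", "S", "L", "S", "B"),
  to        = c("T", "A", "L", "S", "B", "S"),
  strength  = c(0.95, 0.95, 0.88, 0.88, 0.40, 0.40),
  direction = c(0.70, 0.30, 0.55, 0.45, 0.80, 0.20),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep strong arcs, oriented in their more frequent direction.
average.arcs <- function(strengths, threshold = 0.5) {
  keep <- strengths$strength >= threshold & strengths$direction > 0.5
  strengths[keep, c("from", "to")]
}

final.arcs <- average.arcs(arc.strength, threshold = 0.85)
final.arcs  # A -> T and S -> L survive; S - B is too weak
```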
This function receives a list of algorithms from the bnlearn package and checks whether they can be used in the bnpa package.
check.algorithms(bn.learn.algorithms)
bn.learn.algorithms |
is a list of algorithms (present in the bnlearn package) to be used in the BN structure learning process in bnpa. |
Elias Carvalho
Scutari M (2017). Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package. Journal of Statistical Software, 77(2), 1-20.
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("~/your working directory")
# Load packages
library(bnpa)
# Set which BN learning algorithms will be used
bn.learn.algorithms <- c("gs", "hc")
# Check these algorithms
check.algorithms(bn.learn.algorithms)
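A check of this kind can be sketched as a set difference against a list of supported names. The list below is an assumption taken from the default values of gera.bn.structure() (documented further on); bnpa's real list, return value and messages may differ, and check.algorithms.sketch() is a hypothetical helper.

```r
# Assumed list of supported algorithms (defaults of gera.bn.structure()).
supported.algorithms <- c("gs", "iamb", "fast.iamb", "inter.iamb", "mmpc",
                          "si.hiton.pc", "hc", "tabu")

# Hypothetical helper: TRUE if every requested algorithm is supported,
# otherwise FALSE plus a warning naming the unsupported ones.
check.algorithms.sketch <- function(bn.learn.algorithms) {
  unknown <- setdiff(bn.learn.algorithms, supported.algorithms)
  if (length(unknown) > 0) {
    warning("Unsupported algorithm(s): ", paste(unknown, collapse = ", "))
    return(FALSE)
  }
  TRUE
}

check.algorithms.sketch(c("gs", "hc"))  # TRUE
```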
This function receives a data set and the name of a specific variable and verifies whether that variable is dichotomous. If so, the function returns TRUE.
check.dichotomic.one.var(data.to.work, variable.name)
data.to.work |
is a data set containing the variables to be checked. |
variable.name |
is the name of a variable to be checked. |
TRUE or FALSE
Elias Carvalho
HAYES, A F; PREACHER, K J. Statistical mediation analysis with a multicategorical independent variable. British Journal of Mathematical and Statistical Psychology, v. 67, n. 3, p. 451-470, 2014.
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQuantC)
head(dataQuantC)
# Show the structure of the data set
str(dataQuantC)
# Set variable name
variable.name <- "A"
# Variable A is not dichotomous, so the function returns FALSE
check.dichotomic.one.var(dataQuantC, variable.name)
# Add a dichotomous (0/1) variable to dataQuantC
dataQuantC$Z <- round(runif(500, min = 0, max = 1), 0)
# Show the new structure of the data set
str(dataQuantC)
# Set variable name
variable.name <- "Z"
# Variable Z is dichotomous, so the function returns TRUE
check.dichotomic.one.var(dataQuantC, variable.name)
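One simple way to test for a dichotomous variable is to count its distinct non-missing values. This is a sketch under that assumption, not bnpa's implementation; check.dichotomic.sketch() is a hypothetical helper, and it counts observed values rather than declared factor levels.

```r
# Hypothetical helper: TRUE when the variable has exactly two distinct
# non-missing values (works for numeric vectors and factors).
check.dichotomic.sketch <- function(data.to.work, variable.name) {
  values <- unique(na.omit(data.to.work[[variable.name]]))
  length(values) == 2
}

set.seed(1)
example.data <- data.frame(A = rnorm(10), Z = rep(c(0, 1), 5))
check.dichotomic.sketch(example.data, "A")  # FALSE: ten distinct values
check.dichotomic.sketch(example.data, "Z")  # TRUE: only 0 and 1
```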
This function receives a data set and a variable name, checks that the variable is categorical (factor) and then counts the number of levels it has.
check.levels.one.variable(data.to.work, variable.name)
data.to.work |
is a data set containing the variable. |
variable.name |
is the name of variable to be checked. |
Elias Carvalho
GUJARATI, Damodar N. Basic econometrics. Tata McGraw-Hill Education, 2009.
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQualiN)
head(dataQualiN)
# Add a random numeric variable to dataQualiN
dataQualiN$Z <- round(runif(500, min = 0, max = 1000), 2)
# Convert the numeric variable into a factor (with many levels)
dataQualiN$Z <- factor(dataQualiN$Z)
# Set the variable name to the variable just created
variable.name <- "Z"
# Count the number of levels of this variable
number.of.levels <- check.levels.one.variable(dataQualiN, variable.name)
number.of.levels
# Set the variable name to one of the original categorical variables
variable.name <- "A"
# Count the number of levels of this variable
number.of.levels <- check.levels.one.variable(dataQualiN, variable.name)
number.of.levels
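The type check followed by a level count maps directly onto is.factor() and nlevels() in base R. The helper count.levels.sketch() is hypothetical; returning NA with a message for non-factor variables is an assumption about how such a case might be reported.

```r
# Hypothetical helper: number of levels of a factor variable,
# or NA (with a message) when the variable is not a factor.
count.levels.sketch <- function(data.to.work, variable.name) {
  x <- data.to.work[[variable.name]]
  if (!is.factor(x)) {
    message(variable.name, " is not a factor; no levels counted")
    return(NA_integer_)
  }
  nlevels(x)
}

example.data <- data.frame(A = factor(c("yes", "no", "yes")), N = c(1.5, 2, 3))
count.levels.sketch(example.data, "A")  # 2
count.levels.sketch(example.data, "N")  # NA, plus a message
```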
This function receives a data set, counts the number of NAs in each variable, computes the corresponding percentage of NAs and reports, for each variable, the number and percentage of NAs.
check.na(data.to.work)
data.to.work |
is a data set containing the variables to check NAs. |
the number and percent of NAs.
Elias Carvalho
LITTLE, R J A; RUBIN, D B. Statistical analysis with missing data. John Wiley & Sons, 2014.
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQuantC)
head(dataQuantC)
# Add NAs to dataQuantC (each value becomes NA with probability 0.15)
# credits for the random NA code: https://goo.gl/Xj6caY
dataQuantC <- as.data.frame(lapply(dataQuantC, function(cc)
  cc[sample(c(TRUE, NA), prob = c(0.85, 0.15), size = length(cc), replace = TRUE)]))
# Check the NAs
check.na(dataQuantC)
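The count and percentage of NAs per variable can be computed in one pass with colSums(is.na(...)). The helper check.na.sketch() and its output layout are hypothetical; bnpa may report the same information differently.

```r
# Hypothetical helper: number and percentage of NAs in each column.
check.na.sketch <- function(data.to.work) {
  n.na <- colSums(is.na(data.to.work))
  data.frame(
    variable = names(n.na),
    n.na     = as.vector(n.na),
    pct.na   = round(100 * as.vector(n.na) / nrow(data.to.work), 2),
    stringsAsFactors = FALSE
  )
}

example.data <- data.frame(A = c(1, NA, 3, 4), B = c(NA, NA, "x", "y"))
na.report <- check.na.sketch(example.data)
na.report  # A: 1 NA (25%), B: 2 NAs (50%)
```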
Receives a data set and the name of a specific variable and verifies whether it is an ordered factor. If so, the function returns TRUE.
check.ordered.one.var(data.to.work, var.name)
data.to.work |
is a data set containing the variables to be checked. |
var.name |
is the name of variable to be checked. |
TRUE or FALSE
Elias Carvalho
HAYES, A F; PREACHER, K J. Statistical mediation analysis with a multicategorical independent variable. British Journal of Mathematical and Statistical Psychology, v. 67, n. 3, p. 451-470, 2014.
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQualiN)
head(dataQualiN)
# Transform variable A into an ordered factor
dataQualiN$A <- ordered(dataQualiN$A)
# Check variable A: returns TRUE
var.name <- "A"
check.ordered.one.var(dataQualiN, var.name)
# Check variable B: returns FALSE
var.name <- "B"
check.ordered.one.var(dataQualiN, var.name)
Receives a BN structure and a data set, then verifies whether there are ordered variables. If so, it returns TRUE.
check.ordered.to.pa(bn.structure, data.to.work)
bn.structure |
is a BN structure learned from data; it is used to identify whether each variable is endogenous or exogenous when building the PA model. |
data.to.work |
is a data set containing the variables of the BN. |
a data frame with ordered variables.
Elias Carvalho
HAYES, A F; PREACHER, K J. Statistical mediation analysis with a multicategorical independent variable. British Journal of Mathematical and Statistical Psychology, v. 67, n. 3, p. 451-470, 2014.
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("~/your working directory")
# Load packages
library(bnpa)
# Load the data set
data(dataQualiN) # Pre-Loaded
# Build the BN structure
bn.structure <- bnlearn::hc(dataQualiN)
# Show the BN structure learned
bnlearn::graphviz.plot(bn.structure)
# Transform variables A and B into ordered factors
dataQualiN$A <- as.ordered(dataQualiN$A)
dataQualiN$B <- as.ordered(dataQualiN$B)
# Generate a list with the variables to be ordered and the exogenous variables
cat.var.to.use.in.pa <- bnpa::check.ordered.to.pa(bn.structure, dataQualiN)
# Show the variables
cat.var.to.use.in.pa
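The distinction matters for PA because lavaan declares ordinal variables through the `ordered` argument of sem(), while endogenous and exogenous roles come from the BN arcs. The sketch below assumes the BN is given as a hypothetical named list of parent sets (a node with no parents is exogenous) and splits the ordered variables accordingly; split.ordered.sketch() and this representation are illustrative assumptions, not bnpa's code.

```r
# Hypothetical BN: each node maps to its parents (character(0) = exogenous).
parents <- list(A = character(0), T = "A", S = character(0),
                L = "S", E = c("T", "L"))

example.data <- data.frame(
  A = factor(c("no", "yes", "no"), ordered = TRUE),
  T = factor(c("no", "no", "yes"), ordered = TRUE),
  S = factor(c("yes", "no", "no")),
  L = factor(c("no", "no", "yes")),
  E = factor(c("no", "yes", "yes"), ordered = TRUE)
)

# Hypothetical helper: ordered variables split by their role in the BN.
split.ordered.sketch <- function(parents, data.to.work) {
  ordered.vars <- names(data.to.work)[vapply(data.to.work, is.ordered, logical(1))]
  endogenous <- names(parents)[lengths(parents) > 0]
  list(ordered.endogenous = intersect(ordered.vars, endogenous),
       ordered.exogenous  = setdiff(ordered.vars, endogenous))
}

res <- split.ordered.sketch(parents, example.data)
res  # endogenous: "T" "E"; exogenous: "A"
```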
This function receives a data set, scans all variables and, for each one, verifies whether there are outliers and asks whether we wish to remove them. A parameter sets whether the function removes them automatically or asks for confirmation first.
check.outliers(data.to.work, ask.before)
data.to.work |
is a data set with variables to be checked. |
ask.before |
controls whether the process asks for confirmation before removing outliers. |
Elias Carvalho
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Load the data set
data(dataQuantC) # Pre-Loaded
# Set whether to ask before removing outliers
ask.before <- "Y" # or ask.before <- "N"
# Call the procedure to check whether there are outliers
dataQuantC <- check.outliers(dataQuantC, ask.before)
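The documentation does not state which outlier criterion is used. As an assumption, the sketch below uses the common boxplot rule (values beyond 1.5 times the interquartile range from the hinges, as flagged by grDevices::boxplot.stats()) and removes every row with an outlier in any numeric column, without the interactive confirmation. outlier.rows() and remove.outliers.sketch() are hypothetical helpers.

```r
# Hypothetical helper: row indices flagged as outliers by the boxplot rule.
outlier.rows <- function(x) {
  which(x %in% boxplot.stats(x)$out)
}

# Hypothetical helper: drop rows with an outlier in any numeric column.
remove.outliers.sketch <- function(data.to.work) {
  is.num <- vapply(data.to.work, is.numeric, logical(1))
  rows.to.drop <- unique(unlist(lapply(data.to.work[is.num], outlier.rows)))
  if (length(rows.to.drop) == 0) return(data.to.work)
  data.to.work[-rows.to.drop, , drop = FALSE]
}

example.data <- data.frame(a = c(1, 2, 3, 4, 5, 100),
                           b = c(10, 11, 12, 13, 14, 15))
cleaned <- remove.outliers.sketch(example.data)
nrow(cleaned)  # 5: the row with a = 100 is removed
```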
Receives a data set and the name of a specific variable and returns a number indicating its type: 1 = integer, 2 = numeric, 3 = factor, 8 = character.
check.type.one.var(data.to.work, show.message = 0, variable.name)
data.to.work |
is a data set containing the variables to be verified. |
show.message |
indicates whether the function shows a message. |
variable.name |
is the name of variable to be checked. |
A code indicating the type of the variable and, optionally, a message.
Elias Carvalho
GUJARATI, Damodar N. Basic econometrics. Tata McGraw-Hill Education, 2009.
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQuantC)
head(dataQuantC)
# Add a random numeric variable to dataQuantC
dataQuantC$Z <- round(runif(500, min = 0, max = 1000), 2)
# Convert the numeric variable into a factor
dataQuantC$Z <- factor(dataQuantC$Z)
# Check and return a numeric code corresponding to the variable type
# Set the variable name
variable.name <- "A"
# Identify the type
check.type.one.var(dataQuantC, show.message = 0, variable.name)
# Set the variable name
variable.name <- "Z"
# Identify the type
check.type.one.var(dataQuantC, show.message = 0, variable.name)
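A minimal base-R sketch of the type-code lookup, using the codes listed above. The helper type.code.sketch() is hypothetical; is.factor() is tested first so that ordered factors also map to code 3 (an assumption), and is.integer() is tested before is.numeric() because integers are also numeric.

```r
# Hypothetical helper: map one column to the documented type codes
# (1 = integer, 2 = numeric, 3 = factor, 8 = character).
type.code.sketch <- function(data.to.work, variable.name) {
  x <- data.to.work[[variable.name]]
  if (is.factor(x)) return(3L)     # includes ordered factors
  if (is.integer(x)) return(1L)
  if (is.numeric(x)) return(2L)    # double-precision values
  if (is.character(x)) return(8L)
  NA_integer_                      # types not covered by the codes
}

example.data <- data.frame(i = 1:3, n = c(1.5, 2, 3),
                           f = factor(c("a", "b", "a")), s = c("x", "y", "z"),
                           stringsAsFactors = FALSE)
codes <- sapply(names(example.data), function(v) type.code.sketch(example.data, v))
codes  # i = 1, n = 2, f = 3, s = 8
```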
This function receives a data set as a parameter, checks the type of each variable and returns a number indicating the types of variables in the whole data set: 1 = integer, 2 = numeric, 3 = factor, 4 = integer and numeric, 5 = integer and factor, 6 = numeric and factor, 7 = integer, numeric and factor, 8 = character.
check.types(data.to.work, show.message = 0)
data.to.work |
is a data set containing the variables to be verified. |
show.message |
indicates whether the function shows a message. |
A code indicating the types of variables in the data set and, optionally, a message.
Elias Carvalho
GUJARATI, Damodar N. Basic econometrics. Tata McGraw-Hill Education, 2009.
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQuantC)
# Show first lines of data set
head(dataQuantC)
# Check and return a numeric code
show.message <- 1
bnpa::check.types(dataQuantC, show.message)
# Add a random numeric variable to dataQuantC
dataQuantC$Z <- round(runif(500, min = 0, max = 1000), 2)
# Convert the numeric variable into a factor
dataQuantC$Z <- factor(dataQuantC$Z)
# Check and return a numeric code corresponding to: 1=integer, 2=numeric, 3=factor,
# 4=integer and numeric, 5=integer and factor, 6=numeric and factor or
# 7=integer, numeric and factor
show.message <- 1
bnpa::check.types(dataQuantC, show.message)
# Suppress the message
show.message <- 0
bnpa::check.types(dataQuantC, show.message)
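The combined codes follow from which of the three kinds (integer, numeric, factor) occur in the data set, so they can be looked up from a presence pattern. In this sketch the helper types.code.sketch() is hypothetical, and letting any character column force code 8 is an assumption, since the documentation does not say how character columns combine with the others.

```r
# Hypothetical helper: documented data-set type code from the column types.
types.code.sketch <- function(data.to.work) {
  classes <- vapply(data.to.work, function(x) {
    if (is.factor(x)) "factor"
    else if (is.integer(x)) "integer"
    else if (is.numeric(x)) "numeric"
    else if (is.character(x)) "character"
    else "other"
  }, character(1))
  if (any(classes == "character")) return(8L)  # assumption: character wins
  has <- c(any(classes == "integer"), any(classes == "numeric"),
           any(classes == "factor"))
  # Presence pattern (integer, numeric, factor) -> documented code
  key <- paste(as.integer(has), collapse = "")
  codes <- c("100" = 1L, "010" = 2L, "001" = 3L, "110" = 4L,
             "101" = 5L, "011" = 6L, "111" = 7L)
  unname(codes[key])
}

example.data <- data.frame(a = 1:3, b = c(1.5, 2, 3), f = factor(c("x", "y", "x")))
types.code.sketch(example.data)  # 7: integer, numeric and factor
```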
This function receives a data set and checks the number of levels of each factor variable; if a variable has more than 2 levels, the function recommends checking whether it should be transformed into an ordered factor.
check.variables.to.be.ordered(data.to.work)
data.to.work |
is a data set with variables to check. |
TRUE or FALSE, indicating whether the variable needs to be transformed into an ordered factor.
Elias Carvalho
HAYES, A F; PREACHER, K J. Statistical mediation analysis with a multicategorical independent variable. British Journal of Mathematical and Statistical Psychology, v. 67, n. 3, p. 451-470, 2014.
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQualiN)
# Show first lines of data set
head(dataQualiN)
# Insert a categorical variable with more than 2 levels
dataQualiN$test.variable[dataQualiN$A == "yes"] <- "low"
dataQualiN$test.variable[dataQualiN$B == "yes"] <- "medium"
dataQualiN$test.variable[dataQualiN$X == "yes"] <- "high"
# Transform it into a factor variable
dataQualiN$test.variable <- as.factor(dataQualiN$test.variable)
# Check whether variables need to be transformed into ordered factors
bnpa::check.variables.to.be.ordered(dataQualiN)
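The rule described above (factors with more than two levels are candidates for an ordered factor) can be sketched with nlevels(). Skipping variables that are already ordered is an assumption, and ordered.candidates.sketch() is a hypothetical helper that returns names instead of TRUE/FALSE.

```r
# Hypothetical helper: names of unordered factors with more than two levels,
# i.e. candidates to be reviewed for conversion into ordered factors.
ordered.candidates.sketch <- function(data.to.work) {
  is.candidate <- vapply(data.to.work, function(x) {
    is.factor(x) && !is.ordered(x) && nlevels(x) > 2
  }, logical(1))
  names(data.to.work)[is.candidate]
}

example.data <- data.frame(
  A = factor(c("no", "yes", "no")),          # 2 levels
  L = factor(c("low", "medium", "high")),    # 3 levels, unordered
  O = factor(c("low", "medium", "high"),     # already ordered
             levels = c("low", "medium", "high"), ordered = TRUE)
)
ordered.candidates.sketch(example.data)  # "L"
```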
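This function receives a confusion matrix and the positions of its values, and reorders it so that the values follow the order true positives (VP), false positives (FP), false negatives (FN) and true negatives (VN).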
convert.confusion.matrix(confusion.matrix, cm.position)
confusion.matrix |
is the confusion matrix to be converted. |
cm.position |
gives the positions of your VP, FP, FN and VN values in the confusion matrix. |
a new confusion matrix
Elias Carvalho
STORY, Michael; CONGALTON, Russell G. Accuracy assessment: a user’s perspective. Photogrammetric Engineering and remote sensing, v. 52, n. 3, p. 397-399, 1986.
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Create a confusion matrix
confusion.matrix <- matrix(c(12395, 4, 377, 1), nrow = 2, ncol = 2, byrow = TRUE)
# Create a vector with the positions of VP, FP, FN, VN
cm.position <- c(4, 3, 2, 1)
# Show the original confusion matrix
confusion.matrix
# Convert the confusion matrix
confusion.matrix <- convert.confusion.matrix(confusion.matrix, cm.position)
# Show the converted confusion matrix
confusion.matrix
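How cm.position is indexed is not documented. Assuming it uses R's column-major linear indexing, the reordering reduces to subsetting the matrix by position and refilling a 2 x 2 matrix row by row as TP FP / FN TN. The helper reorder.confusion.sketch() and the dimnames are hypothetical.

```r
# Hypothetical helper: reorder a 2 x 2 confusion matrix so that it reads
#   TP FP
#   FN TN
# cm.position gives, in this order, where TP, FP, FN and TN sit in the input,
# using R's column-major linear indexing (an assumption).
reorder.confusion.sketch <- function(confusion.matrix, cm.position) {
  values <- confusion.matrix[cm.position]
  matrix(values, nrow = 2, byrow = TRUE,
         dimnames = list(predicted = c("pos", "neg"), actual = c("pos", "neg")))
}

confusion.matrix <- matrix(c(12395, 4, 377, 1), nrow = 2, ncol = 2, byrow = TRUE)
converted <- reorder.confusion.sketch(confusion.matrix, c(4, 3, 2, 1))
converted  # TP = 1, FP = 4, FN = 377, TN = 12395
```

Under this indexing assumption the example matrix yields a rare positive class, which is consistent with its very unbalanced counts.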
This function counts the number of cores of your computer's processor and sets up a parallel socket cluster. It always creates N-1 nodes (where N is the number of cores), leaving one core free for other tasks.
create.cluster()
an object of class "cluster"
Elias Carvalho
R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
## Not run:
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQualiN)
# Start the cluster
cl <- bnpa::create.cluster()
# Set the number of replications
R <- 1000
# Set the algorithm to be used
algorithm <- "hc"
# Execute a parallel bootstrap process (arguments are named so that R and
# algorithm are not matched positionally to the wrong formal arguments)
data.bn.boot.strap <- bnlearn::boot.strength(data = dataQualiN, R = R,
  algorithm = algorithm, cluster = cl, algorithm.args = list(score = "bic"),
  cpdag = FALSE)
# Release the cluster
parallel::stopCluster(cl)
## End(Not run)
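The N-1 rule can be sketched with the base parallel package. cluster.size() and create.cluster.sketch() are hypothetical helpers, not bnpa's code; falling back to one node when detectCores() returns NA is an added safeguard.

```r
# Hypothetical helper: number of worker nodes, i.e. all detected cores but one,
# never fewer than one (detectCores() may return NA on some platforms).
cluster.size <- function(cores = parallel::detectCores()) {
  if (is.na(cores)) cores <- 1L
  max(1L, as.integer(cores) - 1L)
}

# Hypothetical helper: start a socket (PSOCK) cluster with that many nodes.
create.cluster.sketch <- function() {
  parallel::makeCluster(cluster.size(), type = "PSOCK")
}

cluster.size(8)  # 7 worker nodes on an 8-core machine
```

As with bnpa::create.cluster(), a cluster started with cl <- create.cluster.sketch() should be released with parallel::stopCluster(cl) when the bootstrap is finished.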
This function receives a data set and the names of the variables to be transformed into dummies. It then creates the dummy variables, converts them to numeric (as required for PA generation) and removes the original variables from which the dummies were created.
create.dummies(data.to.work, dummy.vars)
data.to.work |
is a data set containing the variables to transform. |
dummy.vars |
are the variables to be transformed. |
the new data set
Elias Carvalho
Yves Rosseel (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2),1-36.
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Use working data sets from package
data(dataQualiN)
# Show the structure before
str(dataQualiN)
# Set the variables to be transformed into dummies
dummy.vars <- c("A", "B")
# Create dummies
dataQualiN <- bnpa::create.dummies(dataQualiN, dummy.vars)
# Show the structure after
str(dataQualiN)
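Dummy coding of this kind can be sketched in base R: one numeric 0/1 column per level, then the original (master) variable is dropped. This sketch assumes the first level is used as the reference category and gets no dummy, to avoid perfect collinearity in the PA regressions; bnpa may instead keep every level. make.dummies.sketch() and the name_level column naming are hypothetical.

```r
# Hypothetical helper: numeric 0/1 dummies for each listed variable,
# omitting the first level (reference category), then dropping the original.
make.dummies.sketch <- function(data.to.work, dummy.vars) {
  for (v in dummy.vars) {
    x <- factor(data.to.work[[v]])
    for (lev in levels(x)[-1]) {
      data.to.work[[paste(v, lev, sep = "_")]] <- as.numeric(x == lev)
    }
    data.to.work[[v]] <- NULL  # remove the master variable
  }
  data.to.work
}

example.data <- data.frame(A = factor(c("no", "yes", "no")),
                           B = factor(c("lo", "mid", "hi")),
                           C = 1:3)
with.dummies <- make.dummies.sketch(example.data, c("A", "B"))
str(with.dummies)  # C, A_yes, B_lo, B_mid ("hi" is B's reference level)
```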
This data set contains qualitative nominal variables, with 500 observations extracted from the ASIA Bayesian network in the bnlearn package repository.
dataQualiN
A data frame with 500 rows and 7 variables:
a categorical variable
a categorical variable
a categorical variable
a categorical variable
a categorical variable
a categorical variable
a categorical variable
a categorical variable
...
http://www.bnlearn.com/bnrepository//
This data set contains quantitative continuous variables, with 500 observations extracted from the ASIA Bayesian network in the bnlearn package repository.
dataQuantC
A data frame with 500 rows and 7 variables:
a numeric variable
a numeric variable
a numeric variable
a numeric variable
a numeric variable
a numeric variable
a numeric variable
...
http://www.bnlearn.com/bnrepository//
This function receives a data set and a list of parameters, and learns the BN structure from this data set. With the BN learned, it builds a PA model if required. Finally, it saves the BN and PA graphs and the PA parameters.
gera.bn.structure(data.to.work, white.list = "", black.list = "", nreplicates = 1000, cb.algorithms = c("gs", "iamb", "fast.iamb", "inter.iamb", "mmpc", "si.hiton.pc"), sb.algorithms = c("hc", "tabu"), cb.tests = "", sb.tests = "", optimized.option = "FALSE", outcome.var, build.pa)
data.to.work |
is the data set from which the BN structure will be learned. |
white.list |
is a list of mandatory connections in the BN structure to be learned. |
black.list |
is a list of forbidden connections in the BN structure to be learned. |
nreplicates |
is the number of times the bootstrap will run. |
cb.algorithms |
the names of the constraint-based algorithms. |
sb.algorithms |
the names of the score-based algorithms. |
cb.tests |
the names of the conditional independence tests for the constraint-based algorithms. |
sb.tests |
the names of the network scores for the score-based algorithms. |
optimized.option |
a parameter of the bnlearn package to optimize BN structure learning. |
outcome.var |
is the outcome (dependent) variable. |
build.pa |
indicates whether the process will build a PA model. |
Elias Carvalho
Scutari M (2017). Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package. Journal of Statistical Software, 77(2), 1-20.
## Not run:
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("To your working directory")
# Load packages
library(bnpa)
# Load data
data(dataQualiN)
# Set variables to work
nreplicates <- 1000
white.list <- NULL
black.list <- "L-T"
cb.algorithms <- c("gs")
sb.algorithms <- c("hc")
cb.tests <- "jt"
sb.tests <- "aic"
optimized.option <- "FALSE"
outcome.var <- "E"
build.pa <- 0
# Learn the BN from data and save results (data & images)
gera.bn.structure(dataQualiN, white.list, black.list, nreplicates, cb.algorithms,
  sb.algorithms, cb.tests, sb.tests, optimized.option, outcome.var, build.pa)
## End(Not run)
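It is not documented how gera.bn.structure() combines its algorithm and test arguments. Assuming each algorithm is paired with every test (constraint-based) or score (score-based) of its own family, the planned runs form a grid like the one below; each row would then go through the bootstrap and model-averaging step. The object names are hypothetical; "mi" and "x2" are bnlearn tests and "bic" and "aic" bnlearn scores for discrete data.

```r
cb.algorithms <- c("gs", "iamb")
sb.algorithms <- c("hc", "tabu")
cb.tests <- c("mi", "x2")
sb.tests <- c("bic", "aic")

# Hypothetical run plan: every algorithm crossed with the tests/scores of its family.
runs <- rbind(
  expand.grid(algorithm = cb.algorithms, test.or.score = cb.tests,
              type = "constraint-based", stringsAsFactors = FALSE),
  expand.grid(algorithm = sb.algorithms, test.or.score = sb.tests,
              type = "score-based", stringsAsFactors = FALSE)
)
runs  # 8 combinations: 4 constraint-based and 4 score-based
```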
This function receives a learned BN structure, the data set and some parameters, and builds a PA model string. It then fits the PA model using structural equation modeling functions and exports a PA graph and a summary of the PA model.
gera.pa(bn.structure, data.to.work, pa.name, pa.imgname, bn.algorithm, bn.score.test, outcome.var)
bn.structure |
is a BN structure learned from data. |
data.to.work |
is a data frame containing the variables of the BN. |
pa.name |
is the name of the file in which the PA parameters are saved. |
pa.imgname |
is the name of the file in which the PA graph is saved. |
bn.algorithm |
is a list of algorithms to learn the BN structure. |
bn.score.test |
is a list of tests to be used during BN structure learning. |
outcome.var |
is the outcome variable. |
Elias Carvalho
Yves Rosseel (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2),1-36.
## Not run:
# Clean environment
closeAllConnections()
rm(list = ls())
# Set environment
# setwd("To your working directory")
# Load packages
library(bnpa)
# Load data sets from package
data(dataQualiN)
# Show first lines
head(dataQualiN)
# Learn BN structure
bn.structure <- bnlearn::hc(dataQualiN)
bnlearn::graphviz.plot(bn.structure)
# Set variables
pa.name <- "docPAHC"
pa.imgname <- "imgPAHC"
bn.algorithm <- "hc"
bn.score.test <- "aic-g"
outcome.var <- "D"
# Generate the PA model from the BN structure
gera.pa(bn.structure, dataQualiN, pa.name, pa.imgname, bn.algorithm, bn.score.test, outcome.var)
## End(Not run)
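gera.pa fits the PA model through lavaan's SEM functions. For a recursive path model (no feedback loops), the meaning of the path coefficients can be illustrated in base R with the classical standardized-regression approach: standardize every variable, then regress each endogenous variable on its direct causes. This is a swapped-in technique for illustration, not gera.pa's implementation; the helper `path.coefficients`, its `arcs` argument (any object with `from` and `to` columns) and the simulated data are hypothetical, and lavaan additionally reports standard errors and fit indices that this sketch does not.

```r
# Hypothetical helper (not part of bnpa): standardized path coefficients for a
# recursive path model, estimated with one OLS regression per endogenous variable.
path.coefficients <- function(arcs, data) {
  from <- as.character(arcs[, "from"])
  to <- as.character(arcs[, "to"])
  z <- as.data.frame(scale(data))  # standardize all columns (mean 0, sd 1)
  result <- list()
  for (child in unique(to)) {
    parents <- from[to == child]   # direct causes of this variable
    fit <- lm(reformulate(parents, response = child), data = z)
    result[[child]] <- coef(fit)[parents]  # keep slopes, drop the intercept
  }
  result
}

# Simulated data with arcs X -> M, M -> Y and X -> Y
set.seed(1)
n <- 500
X <- rnorm(n)
M <- 0.6 * X + rnorm(n)
Y <- 0.4 * M + 0.3 * X + rnorm(n)
sim <- data.frame(X, M, Y)
arc.list <- data.frame(from = c("X", "M", "X"), to = c("M", "Y", "Y"),
                       stringsAsFactors = FALSE)
paths <- path.coefficients(arc.list, sim)
print(paths)
```

When a variable has a single direct cause, its standardized coefficient equals the Pearson correlation between the two variables, which makes a quick sanity check.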
This function is called by the 'gera.pa' function. It receives a BN structure and a data set, builds a PA input model string from the BN structure, and returns it.
gera.pa.model(bn.structure, data.to.work)
bn.structure |
is a BN structure learned from data. |
data.to.work |
is a data set containing the variables of the BN. |
The PA input model string.
Elias Carvalho
Yves Rosseel (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2),1-36.
# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("To your working directory")
# Load packages
library(bnpa)
library(bnlearn)
# Load data sets from package
data(dataQualiN)
# Show first lines
head(dataQualiN)
# Learn BN structure
bn.structure <- hc(dataQualiN)
bnlearn::graphviz.plot(bn.structure)
# Generate the PA model from the BN structure
pa.model <- gera.pa.model(bn.structure, dataQualiN)
pa.model
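The PA input model string is consumed by lavaan, whose model syntax writes a regression as `outcome ~ predictor1 + predictor2`. Assuming the string contains one such line for every variable that has parents in the BN (an assumption about gera.pa.model's output, not confirmed here), the conversion can be sketched in base R. The helper `arcs.to.pa.model` is hypothetical; it accepts any object with `from` and `to` columns, such as a data frame or the two-column matrix returned by `bnlearn::arcs()`.

```r
# Hypothetical helper (not part of bnpa): build a lavaan-style path model
# string with one regression line per child node.
arcs.to.pa.model <- function(arcs) {
  from <- as.character(arcs[, "from"])
  to <- as.character(arcs[, "to"])
  model.lines <- vapply(unique(to), function(child) {
    # "child ~ parent1 + parent2 + ..."
    paste(child, "~", paste(from[to == child], collapse = " + "))
  }, character(1))
  paste(model.lines, collapse = "\n")
}

arc.list <- data.frame(from = c("A", "B", "A"), to = c("C", "C", "D"),
                       stringsAsFactors = FALSE)
pa.model <- arcs.to.pa.model(arc.list)
writeLines(pa.model)
# C ~ A + B
# D ~ A
```

Variables with no parents produce no line, because in a path model they are treated as exogenous.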
This function receives a simple list with one or more pairs of variables and assembles a new data frame in "bnlearn" syntax. The result is an object similar to the one produced by the more verbose bnlearn command "data.frame(from = c('B', 'F'), to = c('F', 'B'))".
mount.wl.bl.list(black_or_white_list)
black_or_white_list |
is a list of variable pairs, each written as "from-to" and separated by commas (e.g. "A-C,D-F"). |
A new data frame with the 'from' and 'to' variables
Elias Carvalho
Scutari M (2017). Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package. Journal of Statistical Software, 77(2), 1-20.
# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("To your working directory")
# Load packages
library(bnpa)
library(bnlearn)
# Load data sets from package
data(dataQuantC)
# Show the first lines of data
head(dataQuantC)
# Learn the BN structure without black and white lists
bn.structure <- hc(dataQuantC)
# Split the graph panel into 2 columns
par(mfrow=c(1,2))
# Show the BN structure
bnlearn::graphviz.plot(bn.structure)
# Build the black list
black.list <- ("A-C,D-F")
black.list <- mount.wl.bl.list(black.list)
black.list
# Build the white list
white.list <- ("A-B,D-G")
white.list <- mount.wl.bl.list(white.list)
white.list
# Learn the BN structure with black and white lists
bn.structure <- hc(dataQuantC, whitelist = white.list, blacklist = black.list)
# Show the BN structure
bnlearn::graphviz.plot(bn.structure)
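As the mount.wl.bl.list example shows, a compact string such as "A-C,D-F" is turned into the from/to data frame that bnlearn's `whitelist` and `blacklist` arguments expect. A minimal base-R sketch of that conversion follows, assuming pairs are separated by commas and each pair is written as from-to (so variable names must not themselves contain "-" or ","). The name `parse.arc.string` is hypothetical; this is not bnpa's source.

```r
# Hypothetical re-implementation for illustration (not bnpa's source).
parse.arc.string <- function(arc.string) {
  pairs <- strsplit(arc.string, ",", fixed = TRUE)[[1]]          # "A-C", "D-F"
  parts <- lapply(strsplit(pairs, "-", fixed = TRUE), trimws)    # c("A", "C"), ...
  data.frame(from = vapply(parts, `[`, character(1), 1),
             to   = vapply(parts, `[`, character(1), 2),
             stringsAsFactors = FALSE)
}

black.list <- parse.arc.string("A-C,D-F")
print(black.list)
```

The resulting data frame has the same shape as `data.frame(from = c("A", "D"), to = c("C", "F"))`, so it can be passed directly, for example as `bnlearn::hc(data, blacklist = black.list)`.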
This function receives a data set, an outcome/predictor variable, the type of variable and a black list. If the variable is classified as an outcome, the function builds a black list of arcs from it to all other variables. If it is classified as a predictor, the function builds a black list of arcs from all other variables to it. If you pass an existing black list, the new arcs are appended to the end of it.
outcome.predictor.var(data.to.work, var.name, type.var, black.list)
data.to.work |
is a data set containing the variables used to build the list. |
var.name |
is the outcome/predictor variable name. |
type.var |
is the type of variable: "o" (outcome) or "p" (predictor). |
black.list |
is a previous black list; it may be empty or already filled. |
A black list with 'from' and 'to' variables.
Elias Carvalho
KATZ, M H. Multivariable analysis: a primer for readers of medical research. Annals of internal medicine, v. 138, n. 8, p. 644-650, 2003.
# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("To your working directory")
# Load packages
library(bnpa)
library(bnlearn)
# Load data sets from package
data(dataQuantC)
# Show first lines
head(dataQuantC)
# Create an empty list or fill it before starting
black.list <- ""
# Set the type of variable to "outcome", meaning it will not point to any other variable
type.var <- "o"
# Setting variable "A" as "outcome" creates a black list from this variable to all others
var.name <- "A"
# Create the black list
black.list <- outcome.predictor.var(dataQuantC, var.name, type.var, black.list)
black.list
# Set the type of variable to "predictor", meaning no other variable will point to it
type.var <- "p"
# Setting variable "D" as "predictor" creates a black list from all others to it
var.name <- "D"
# Append to the black list
black.list <- outcome.predictor.var(dataQuantC, var.name, type.var, black.list)
black.list
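The outcome/predictor rule used by outcome.predictor.var can be sketched in base R. The helper `build.black.list` is hypothetical and simplified: it returns a data frame with `from` and `to` columns, drops repeated arcs with `unique()`, and stops on any type other than "o" or "p". The real function may represent and merge the list differently (its example starts from an empty string).

```r
# Hypothetical helper (not bnpa's source): forbid arcs leaving an outcome
# variable ("o"), or arcs entering a predictor variable ("p").
build.black.list <- function(data.to.work, var.name, type.var, black.list = NULL) {
  others <- setdiff(names(data.to.work), var.name)
  new.arcs <- switch(type.var,
    o = data.frame(from = var.name, to = others, stringsAsFactors = FALSE),
    p = data.frame(from = others, to = var.name, stringsAsFactors = FALSE),
    stop("type.var must be \"o\" (outcome) or \"p\" (predictor)"))
  unique(rbind(black.list, new.arcs))  # append to any previous list, dropping repeats
}

toy <- data.frame(A = 1:3, B = 1:3, C = 1:3, D = 1:3)
bl <- build.black.list(toy, "A", "o")      # A -> B, A -> C, A -> D
bl <- build.black.list(toy, "D", "p", bl)  # adds B -> D and C -> D (A -> D already present)
print(bl)
```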
This function receives a data set and the content and name of a variable, analyzes the content to extract information about outliers, and shows a boxplot and a histogram.
preprocess.outliers(data.to.work, variable.content, variable.name)
data.to.work |
is a data frame containing the variables. |
variable.content |
is a vector with the full content of the variable in the data set. |
variable.name |
is the name of the variable to be checked. |
A list with the number of outliers and the variable content; the example also reads a third element, which it stores as mean.of.outliers.
Elias Carvalho
GUJARATI, Damodar N. Basic econometrics. Tata McGraw-Hill Education, 2009.
# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Load data sets from package
data(dataQuantC)
# Set parameters for the function
variable.content <- dataQuantC$A
variable.name <- "A"
# Preprocess information
preprocess.information <- preprocess.outliers(dataQuantC, variable.content, variable.name)
num.outliers <- preprocess.information[[1]]
variable.content <- preprocess.information[[2]]
mean.of.outliers <- preprocess.information[[3]]
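Since preprocess.outliers displays a boxplot, one plausible (but assumed) definition of an outlier is the boxplot rule: a value lying more than 1.5 times the interquartile range beyond the hinges. The hypothetical helper `count.outliers` applies that rule with `grDevices::boxplot.stats()` and returns a list shaped like the one read in the preprocess.outliers example (count, content, mean of outliers); bnpa's own criterion is not confirmed here.

```r
# Hypothetical helper (not bnpa's source): boxplot-rule outlier summary.
count.outliers <- function(x) {
  # values beyond 1.5 * IQR from the lower/upper hinges
  out <- grDevices::boxplot.stats(x, coef = 1.5)$out
  list(num.outliers = length(out),
       variable.content = x,
       mean.of.outliers = if (length(out) > 0) mean(out) else NA_real_)
}

x <- c(10, 11, 12, 11, 10, 12, 11, 50)
info <- count.outliers(x)
info$num.outliers      # 1
info$mean.of.outliers  # 50
```

Here the hinges are 10.5 and 12, so the fences are 8.25 and 14.25 and only the value 50 is flagged.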
This function receives a data set with categorical variables, scans all variables and transforms them into ordered factors.
transf.into.ordinal(data.to.work)
data.to.work |
is a data set where all variables will be transformed into ordered factors. |
The data set transformed
Elias Carvalho
GUJARATI, Damodar N. Basic econometrics. Tata McGraw-Hill Education, 2009.
# Clean environment
closeAllConnections()
rm(list=ls())
# Set environment
# setwd("to your working directory")
# Load packages
library(bnpa)
# Load data
data(dataQualiN)
# Transform all variables into ordinal
dataQualiN <- bnpa::transf.into.ordinal(dataQualiN)
str(dataQualiN)
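A minimal base-R sketch of the transformation performed by transf.into.ordinal converts every column with `factor(..., ordered = TRUE)`. The helper `to.ordered` is hypothetical, not bnpa's source. One caveat applies to any conversion of this kind: character columns get their levels in sorted order, which may not match the substantive order of the categories, whereas columns that are already factors keep their existing level order.

```r
# Hypothetical helper (not bnpa's source): make every column an ordered factor.
to.ordered <- function(data.to.work) {
  data.to.work[] <- lapply(data.to.work, function(column) factor(column, ordered = TRUE))
  data.to.work
}

d <- data.frame(size = c("low", "high", "mid"),
                grade = factor(c("b", "a", "b"), levels = c("b", "a")),
                stringsAsFactors = FALSE)
d.ord <- to.ordered(d)
str(d.ord)
levels(d.ord$size)   # "high" "low" "mid": sorted, not substantive, order
levels(d.ord$grade)  # "b" "a": existing factor levels are kept
```

To impose a substantive order, pass the levels explicitly, e.g. `factor(d$size, levels = c("low", "mid", "high"), ordered = TRUE)`.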