Package 'fusedMGM' reference manual

Package 'fusedMGM'

Title:	Implementation of Fused MGM to Infer 2-Class Networks
Description:	Implementation of fused Markov graphical model (FMGM; Park and Won, 2022). The functions cover building mixed graphical model (MGM) objects from data, inferring networks with FMGM, stable edge-specific penalty selection (StEPS) for determining the penalization parameters, and visualization. For details, please refer to Park and Won (2022) <doi:10.48550/arXiv.2208.14959>.
Authors:	Jaehyun Park [aut, cre, cph], Sungho Won [ths]
Maintainer:	Jaehyun Park <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.2
Built:	2024-10-18 12:35:38 UTC
Source:	CRAN

Help Index

An example of 2-group mixed data

Description

A dataset containing 50 numeric and 50 categorical variables, with 250 observations in each group.

Usage

data_all

Format

'data_all': A data frame with 500 rows and 100 columns.

A toy example of 2-group mixed data

Description

A dataset containing 4 numeric and 6 categorical variables, with 250 observations in each group.

Usage

data_mini

Format

'data_mini': A data frame with 500 rows and 10 columns.

Main function of fused MGM

Description

Infers networks from 2-class mixed data

Usage

FMGM_mc(
  data,
  ind_disc,
  group,
  t = 1,
  L = NULL,
  eta = 2,
  lambda_intra,
  lambda_intra_prior = NULL,
  lambda_inter,
  with_prior = FALSE,
  prior_list = NULL,
  converge_by_edge = TRUE,
  tol_edge = 3,
  tol_mgm = 1e-05,
  tol_g = 1e-05,
  tol_fpa = 1e-12,
  maxit = 1e+06,
  polish = TRUE,
  tol_polish = 1e-12,
  cores = parallel::detectCores(),
  verbose = FALSE
)

Arguments

`data`	Data frame with rows as observations and columns as variables
`ind_disc`	Indices of discrete variables
`group`	Group indices, must be provided with the observation names
`t`	Numeric. Initial value of the coefficient that weights the 2 previous iterations in the fast proximal gradient method. Default: 1
`L`	Numeric. Initial guess of the Lipschitz constant. Default: NULL (use backtracking)
`eta`	Numeric. Multiplier for L in backtracking. Default: 2
`lambda_intra`	Numeric vector of length 3. Penalization parameters for network edge weights
`lambda_intra_prior`	Numeric vector of length 3. Penalization parameters for network edge weights, applied to the edges with prior information
`lambda_inter`	Numeric vector of length 3. Penalization parameters for network edge weight differences
`with_prior`	Logical. Is prior information provided? Default: FALSE
`prior_list`	List of prior information. Each element must be a 3-column data frame, with the 1st and 2nd columns being variable names and the 3rd column being the prior confidence (0,1)
`converge_by_edge`	Logical. If TRUE, convergence is judged by the absence of changes in the network edges between iterations. If FALSE, the root-mean-square difference (RMSD) of edge weights is used. Default: TRUE
`tol_edge`	Integer. Number of consecutive converged iterations required to stop. Default: 3
`tol_mgm`	Numeric. Cutoff of network edge RMSD for convergence. Default: 1e-05
`tol_g`	Numeric. Cutoff of iterations in prox-grad map calculation. Default: 1e-05
`tol_fpa`	Numeric. Cutoff for fixed-point approach. Default: 1e-12
`maxit`	Integer. Maximum number of iterations in fixed-point approach. Default: 1000000
`polish`	Logical. Should edges with weights below the cutoff be discarded? Default: TRUE
`tol_polish`	Numeric. Cutoff of polishing the resulting network. Default: 1e-12
`cores`	Integer. Number of cores to use for multi-core processing. Default: maximum number of available cores
`verbose`	Logical. If TRUE, progress is reported in real time. Default: FALSE
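
As a sketch of the `prior_list` layout described above, the following builds a list of 3-column data frames in base R. The variable names (`X1`, `Y2`, ...), the column names, and the confidence values are hypothetical, and (0,1) is read here as the open interval; only the 3-column layout comes from the argument description.

```r
# Two hypothetical prior sources; names and confidences are made up.
prior_A <- data.frame(
  var1 = c("X1", "X2", "Y1"),
  var2 = c("X3", "Y2", "Y3"),
  conf = c(0.9, 0.6, 0.75),
  stringsAsFactors = FALSE
)
prior_B <- data.frame(
  var1 = c("X1", "Y1"),
  var2 = c("X3", "Y4"),
  conf = c(0.8, 0.3),
  stringsAsFactors = FALSE
)
prior_list <- list(prior_A, prior_B)

# Sanity check of the layout: 2 name columns plus a confidence in (0,1)
check_prior <- function(p) {
  is.data.frame(p) && ncol(p) == 3 &&
    is.character(p[[1]]) && is.character(p[[2]]) &&
    is.numeric(p[[3]]) && all(p[[3]] > 0 & p[[3]] < 1)
}
all_ok <- all(vapply(prior_list, check_prior, logical(1)))
```

Such a list would be passed as `prior_list = prior_list` together with `with_prior = TRUE`, presumably along with `lambda_intra_prior`.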

Details

If the Lipschitz constant, L, is not provided, backtracking is performed.
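
To illustrate how `t`, `L`, and `eta` typically interact in a fast proximal gradient method with backtracking, here is a generic sketch on a small lasso problem. It is not the package's internal code: the objective, the function names, the starting guess `L = 1`, and the simulated data are all assumptions made for the example.

```r
# Generic fast proximal gradient (FISTA-style) with backtracking on a lasso
# problem: minimize 0.5 * ||A x - b||^2 + lambda * ||x||_1.
soft_threshold <- function(z, thr) sign(z) * pmax(abs(z) - thr, 0)

fista_lasso <- function(A, b, lambda, t = 1, L = NULL, eta = 2,
                        maxit = 500, tol = 1e-6) {
  f      <- function(x) 0.5 * sum((A %*% x - b)^2)       # smooth part
  grad_f <- function(x) drop(crossprod(A, A %*% x - b))  # its gradient
  backtrack <- is.null(L)
  if (backtrack) L <- 1                    # arbitrary starting guess
  x <- y <- rep(0, ncol(A))
  for (k in seq_len(maxit)) {
    g <- grad_f(y)
    repeat {
      x_new <- soft_threshold(y - g / L, lambda / L)     # prox-grad step
      d <- x_new - y
      # Accept L if the quadratic upper bound holds at x_new
      # (the small slack guards against rounding error).
      bound <- f(y) + sum(g * d) + L / 2 * sum(d^2) + 1e-12
      if (!backtrack || f(x_new) <= bound) break
      L <- eta * L                                       # otherwise enlarge L
    }
    t_new <- (1 + sqrt(1 + 4 * t^2)) / 2
    y <- x_new + ((t - 1) / t_new) * (x_new - x)  # combines 2 previous iterates
    converged <- sqrt(mean((x_new - x)^2)) < tol  # RMSD between iterations
    x <- x_new
    t <- t_new
    if (converged) break
  }
  list(x = x, L = L, iterations = k)
}

set.seed(1)
A <- matrix(rnorm(200), 40, 5)
b <- drop(A %*% c(2, 0, 0, -1, 0)) + rnorm(40, sd = 0.1)
fit <- fista_lasso(A, b, lambda = 5)
```

Supplying a value for `L` skips the inner search entirely, which saves objective evaluations but relies on the guess being large enough.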

Value

The resulting networks, in the form of a list of MGMs

Examples

chk <- tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", ""))

if (Sys.info()['sysname'] != 'Linux') {
  cores=1L
} else {
  chk = tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", ""))
  if (nzchar(chk) && (chk != "false")) {
    cores=2L
  } else {
    cores=parallel::detectCores() - 1 ;
  }
}

## Not run: 
data(data_all) ;  # Example 500-by-100 simulation data
data(ind_disc) ;

group <- rep(c(1,2), each=250) ;
names(group) <- rownames(data_all) ;

res_FMGM <- FMGM_mc(data_all, ind_disc, group, 
                    lambda_intra=c(0.2,0.15,0.1), lambda_inter=c(0.2,0.15,0.1), 
                    cores=cores, verbose=TRUE)

## End(Not run)


data(data_mini) ; # Minimal example 500-by-10 simulation data
data(ind_disc_mini) ;

group <- rep(c(1,2), each=250) ;
names(group) <- rownames(data_mini) ;

res_FMGM_mini <- FMGM_mc(data_mini, ind_disc_mini, group, 
                    lambda_intra=c(0.2,0.15,0.1), lambda_inter=c(0.2,0.15,0.1), 
                    cores=cores, verbose=TRUE)
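
The examples above load ready-made `ind_disc` and `group` objects. For other data, preparing these inputs might look like the sketch below. It assumes, which the manual does not state, that discrete variables are stored as factor or character columns; the data frame `toy` and all names ending in `_toy` are hypothetical.

```r
set.seed(42)
n_per_group <- 5
toy <- data.frame(
  expr1  = rnorm(2 * n_per_group),
  expr2  = rnorm(2 * n_per_group),
  status = factor(sample(c("a", "b"), 2 * n_per_group, replace = TRUE)),
  type   = factor(sample(c("x", "y", "z"), 2 * n_per_group, replace = TRUE))
)
rownames(toy) <- paste0("obs", seq_len(nrow(toy)))

# Column indices of discrete variables (assumed: factor or character columns)
is_disc <- vapply(toy, function(col) is.factor(col) || is.character(col),
                  logical(1))
ind_disc_toy <- unname(which(is_disc))

# Group labels, named by observation as the 'group' argument requires
group_toy <- rep(c(1, 2), each = n_per_group)
names(group_toy) <- rownames(toy)
```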

A plot function for a list of MGMs, usually the output of the main FMGM function.

Description

This function is based on the base R function 'heatmap'.

Usage

FMGM_plot(
  MGM_list,
  sortby = "diff",
  highlight = c(),
  tol_polish = 1e-12,
  tol_plot = 0.01,
  sideColor = FALSE,
  distfun = dist,
  hclustfun = hclust,
  reorderfun = function(d, w) reorder(d, w),
  margins = c(2.5, 2.5),
  cexRow = 0.1 + 0.5/log10(n),
  cexCol = cexRow,
  main = NULL,
  xlab = NULL,
  ylab = NULL,
  verbose = getOption("verbose")
)

Arguments

`MGM_list`	A list of graphs from 2 groups. Usually the result of the main FMGM function.
`sortby`	Determines the criterion used for sorting and for the dendrograms. Either 1, 2, or "diff" (default).
`highlight`	A vector of variable names or indices to highlight
`tol_polish`	A threshold for the network edge presence
`tol_plot`	Only network edges above this value will be displayed on the heatmap
`sideColor`	A named vector determining the sidebar colors. Set to NULL to base the colors on the variable types (discrete/continuous). Default: FALSE (no sidebars)
`distfun`	A function for the distances between rows/columns
`hclustfun`	A function for hierarchical clustering
`reorderfun`	A function of dendrogram and weights for reordering
`margins`	A numeric vector of 2 numbers for row & column name margins
`cexRow`	A visual parameter cex for row axis labeling
`cexCol`	A visual parameter cex for column axis labeling; defaults to the same value as cexRow
`main`	Main title; defaults to none
`xlab`	X-axis title; defaults to none
`ylab`	Y-axis title; defaults to none
`verbose`	Logical. Should plotting information be printed?
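
The default `cexRow = 0.1 + 0.5/log10(n)` shrinks the axis labels as the number of labels grows. The sketch below evaluates it for two sizes, assuming `n` is the number of variables shown on an axis (base `heatmap` uses the row count in its analogous default), and builds a hypothetical named `sideColor` vector; the variable names and colors are made up.

```r
cex_default <- function(n) 0.1 + 0.5 / log10(n)
cex_10  <- cex_default(10)    # 0.1 + 0.5/1 = 0.6
cex_100 <- cex_default(100)   # 0.1 + 0.5/2 = 0.35

# Hypothetical sidebar colors, named by variable
side_col <- c(X1 = "steelblue", X2 = "steelblue",
              Y1 = "tomato",    Y2 = "tomato")
```

A vector like `side_col` would be passed as `sideColor = side_col`; its names are expected to identify the variables in the plotted networks.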

Value

None

Examples

chk <- tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", ""))

if (Sys.info()['sysname'] != 'Linux') {
  cores=1L
} else {
  chk = tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", ""))
  if (nzchar(chk) && (chk != "false")) {
    cores=2L
  } else {
    cores=parallel::detectCores() - 1 ;
  }
}

## Not run: 
data(data_all) ;  # Example 500-by-100 simulation data
data(ind_disc) ;

group <- rep(c(1,2), each=250) ;
names(group) <- seq(500) ;

res_FMGM <- FMGM_mc(data_all, ind_disc, group, 
                    lambda_intra=c(0.2,0.15,0.1), lambda_inter=c(0.2,0.15,0.1), 
                    cores=cores, verbose=TRUE)
                    
FMGM_plot(res_FMGM)

## End(Not run)


data(data_mini) ; # Minimal example 500-by-10 simulation data
data(ind_disc_mini) ;

group <- rep(c(1,2), each=250) ;
names(group) <- rownames(data_mini) ;

res_FMGM_mini <- FMGM_mc(data_mini, ind_disc_mini, group, 
                    lambda_intra=c(0.2,0.15,0.1), lambda_inter=c(0.2,0.15,0.1), 
                    cores=cores, verbose=TRUE)
                    
FMGM_plot(res_FMGM_mini)

StEPS: train subsamples and calculate edge instabilities

Description

Going from large to small candidate values, calculates the edge inference instabilities from subsamples. The smallest values whose instabilities are under the cutoff are chosen. See Sedgewick et al. (2016) for more details.
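
To make the instability idea concrete, here is a minimal sketch in the style of stability-based selection: for each candidate penalty, the selection frequency θ of each edge across subsamples gives an instability 2θ(1-θ), and the average over edges is compared with the cutoff. This is an assumption about the general scheme, not the package's exact computation; the function names and the hand-made inputs are hypothetical.

```r
# edge_sel: logical matrix, rows = subsamples, columns = candidate edges
edge_instability <- function(edge_sel) {
  theta <- colMeans(edge_sel)          # selection frequency per edge
  mean(2 * theta * (1 - theta))        # average instability over edges
}

# Scan from large to small lambda and keep the smallest value reached
# before the instability first hits the cutoff gamma.
choose_lambda <- function(lambdas, instab, gamma = 0.05) {
  ord <- order(lambdas, decreasing = TRUE)
  chosen <- NA_real_
  for (i in ord) {
    if (instab[i] >= gamma) break
    chosen <- lambdas[i]
  }
  chosen
}

# Hand-made example: 4 subsamples, 3 candidate edges
sel <- rbind(c(TRUE, FALSE, TRUE),
             c(TRUE, FALSE, FALSE),
             c(TRUE, FALSE, TRUE),
             c(TRUE, FALSE, TRUE))
inst <- edge_instability(sel)   # theta = (1, 0, 0.75)

lambdas <- c(0.32, 0.16, 0.08)
instab  <- c(0.00, 0.03, 0.20)  # hypothetical instabilities per lambda
best <- choose_lambda(lambdas, instab)
```

Stopping at the first violation is a design choice in this sketch; it keeps the selected value on the stable side even if the instability dips again at smaller penalties.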

Usage

FMGM_StEPS(
  data,
  ind_disc,
  group,
  lambda_list,
  with_prior = FALSE,
  prior_list = NULL,
  N = 20,
  b = NULL,
  gamma = 0.05,
  perm = 10000,
  eps = 0.05,
  tol_polish = 1e-12,
  ...,
  cores = parallel::detectCores(),
  verbose = FALSE
)

Arguments

`data`	Data frame with rows as observations and columns as variables
`ind_disc`	Indices of discrete variables
`group`	Group indices, must be provided with the observation names
`lambda_list`	Numeric vector. Penalization parameter candidates
`with_prior`	Logical. Is prior information provided? Default: FALSE
`prior_list`	List of prior information. Each element must be a 3-column data frame, with the 1st and 2nd columns being variable names and the 3rd column being the prior confidence (0,1)
`N`	Integer. Number of subsamples to use. Default: 20
`b`	Integer. Number of observations in each subsample. Default: ceiling(10*sqrt(number of total observations))
`gamma`	Numeric. Instability cutoff. Default: 0.05
`perm`	Integer. Number of permutations to normalize the prior confidence. Default: 10000
`eps`	Numeric. Pseudocount to calculate the likelihood of edge detection. Default: 0.05
`tol_polish`	Numeric. Cutoff of polishing the resulting network. Default: 1e-12
`...`	Other arguments sent to fast proximal gradient method
`cores`	Integer. Number of cores to use for multi-core processing. Default: maximum number of available cores
`verbose`	Logical. If TRUE, progress is reported in real time. Default: FALSE
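
The default subsample size `b = ceiling(10*sqrt(n))` can be worked out for the example data with 500 observations. The `sample()` call only illustrates drawing one subsample of row indices without replacement; how the package actually draws subsamples (for instance, whether it stratifies by group) is not stated here, so treat that part as an assumption.

```r
n_obs <- 500
b_default <- ceiling(10 * sqrt(n_obs))   # 10 * 22.36... = 223.6..., rounded up

set.seed(1)
sub_idx <- sample(seq_len(n_obs), size = b_default)  # one subsample of rows
```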

Value

The resulting networks, in the form of a list of MGMs

Examples

chk <- tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", ""))

if (Sys.info()['sysname'] != 'Linux') {
  cores=1L
} else {
  chk = tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", ""))
  if (nzchar(chk) && (chk != "false")) {
    cores=2L
  } else {
    cores=parallel::detectCores() - 1 ;
  }
}

## Not run: 
data(data_all) ;  # Example 500-by-100 simulation data
data(ind_disc) ;

group <- rep(c(1,2), each=250) ;
names(group) <- rownames(data_all) ;

lambda_list <- 2^seq(log2(.08), log2(.32), length.out=7) ;
lambda_list <- sort(lambda_list, decreasing=TRUE) ;

res_steps <- FMGM_StEPS(data_all, ind_disc, group, 
                    lambda_list=lambda_list, 
                    cores=cores, verbose=TRUE)

data(data_mini) ; # Minimal example 500-by-10 simulation data
data(ind_disc_mini) ;

group <- rep(c(1,2), each=250) ;
names(group) <- rownames(data_mini) ;

lambda_list <- 2^seq(log2(.08), log2(.32), length.out=7) ;
lambda_list <- sort(lambda_list, decreasing=TRUE) ;

res_steps_mini <- FMGM_StEPS(data_mini, ind_disc_mini, group, 
                    lambda_list=lambda_list, 
                    cores=cores, verbose=TRUE)

## End(Not run)
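
The candidate grid used in the example is log-spaced: 7 values between 0.08 and 0.32 with a constant ratio between neighbours, sorted in decreasing order to match the large-to-small scan. A quick check in base R (the `ratios` variable is only for illustration):

```r
lambda_list <- 2^seq(log2(.08), log2(.32), length.out = 7)
lambda_list <- sort(lambda_list, decreasing = TRUE)

# Ratio between consecutive candidates; constant for a log-spaced grid
ratios <- lambda_list[-length(lambda_list)] / lambda_list[-1]
round(lambda_list, 4)
```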

An example of 2-group mixed data

Description

A vector indicating which columns in 'data_all' have categorical variables

Usage

ind_disc

Format

'ind_disc': A vector of length 50 with discrete variable indices.

A toy example of 2-group mixed data

Description

A vector indicating which columns in 'data_mini' have categorical variables

Usage

ind_disc_mini

Format

'ind_disc_mini': A vector of length 6 with discrete variable indices.

Make MGM lists from input data

Description

Make MGM lists from input data

Usage

make_MGM_list(X, Y, group)

Arguments

`X`	data frame or matrix of continuous variables (row: observation, column: variable)
`Y`	data frame or matrix of discrete variables (row: observation, column: variable)
`group`	group variable vector, with the sample names

Value

A list of MGM objects. Its length is equal to the number of unique groups.
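
`make_MGM_list` takes the continuous and discrete variables separately. A minimal sketch of that split in base R follows, using a synthetic data frame; `toy`, `ind_disc_toy`, and `group_toy` are hypothetical names. The final call is left commented out because it requires the package to be installed.

```r
set.seed(7)
toy <- data.frame(
  c1 = rnorm(6), c2 = rnorm(6),
  d1 = factor(rep(c("a", "b"), 3)),
  d2 = factor(rep(c("u", "v", "w"), 2))
)
rownames(toy) <- paste0("obs", 1:6)
ind_disc_toy <- c(3, 4)

X <- toy[, -ind_disc_toy, drop = FALSE]  # continuous variables
Y <- toy[,  ind_disc_toy, drop = FALSE]  # discrete variables

group_toy <- rep(c(1, 2), each = 3)
names(group_toy) <- rownames(toy)        # named by observation

# With the package installed, the pieces would be combined as:
# mgm_list <- fusedMGM::make_MGM_list(X, Y, group_toy)
```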

Defining S3 object "MGM"

Description

Defining S3 object "MGM"

Usage

MGM(X, Y, g)

Arguments

`X`	data frame or matrix of continuous variables (row: observation, column: variable)
`Y`	data frame or matrix of discrete variables (row: observation, column: variable)
`g`	group index, needed for temporary files

Value

An S3 'MGM' object, containing data, network parameters, and the 1st derivatives