---
title: "NCT: CPU Time"
bibliography: ../inst/REFERENCES.bib
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{NCT: CPU Time}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```


# Introduction

It is important to note that `nct` is based on the `NCT` function
in the `R` package **NetworkComparisonTest**. Key extensions
to the methodology are nonconvex regularization, the de-sparsified 
glasso estimator, additional default tests (e.g., KL-divergence), and 
user-defined test-statistics, to name but a few. 

Furthermore, `nct` should be much faster for two primary reasons:

1. **glassoFast**, introduced in [@sustik2012glassofast], is used 
instead of **glasso**. 

2. Parallel computing via **parallel**'s `parLapply`, whereas 
**NetworkComparisonTest** uses a serially executed for loop
(with `cores = 1`, **GGMncv** uses `lapply`)

The gains in speed will be most notable in larger networks, as
shown in the following comparison.

# CPU Time

In the following, time, as a function of network size, is 
investigated for several `cores` compared to `NCT`. Note that 
all settings in **GGMncv** are set to be comparable to `NCT`, 
for example, using `lasso` and the number of tuning parameters 
(i.e., `50`). One important distinction is that **GGMncv**
compute 4 default test-statistic instead of 2 
(as in `NCT`).


```r
library(GGMncv)
library(dplyr)
library(ggplot2)
library(NetworkComparisonTest)

p <- seq(10, 25, 5)

sim_fun <- function(x){
  
  main <- gen_net(p = x, edge_prob = 0.1)
  
  y1 <- MASS::mvrnorm(n = 500, 
                      mu = rep(0, x), 
                      Sigma = main$cors)
  
  y2 <- MASS::mvrnorm(n = 500, 
                      mu = rep(0, x), 
                      Sigma = main$cors)
  
  
  st_1 <- system.time({
    fit_1 <- nct(Y_g1 = y1, 
                 Y_g2 = y2, 
                 iter = 1000, 
                 desparsify = FALSE, 
                 penalty = "lasso", 
                 cores = 1, 
                 progress = FALSE)
  })
  
  st_2 <- system.time({
    fit_1 <- nct(Y_g1 = y1, 
                 Y_g2 = y2, 
                 iter = 1000, 
                 desparsify = FALSE, 
                 penalty = "lasso", 
                 cores = 2, 
                 progress = FALSE)
  })
  
  st_4 <- system.time({
    fit_1 <- nct(Y_g1 = y1, 
                 Y_g2 = y2, 
                 iter = 1000, 
                 desparsify = FALSE, 
                 penalty = "lasso", 
                 cores = 4, 
                 progress = FALSE)
  })
  
  
  st_8 <- system.time({
    fit_1 <- nct(Y_g1 = y1, 
                 Y_g2 = y2, 
                 iter = 1000, 
                 desparsify = FALSE, 
                 penalty = "lasso", 
                 cores = 8, 
                 progress = FALSE)
  })
  
  
  
  st_NCT <- system.time({
    
    fit_NCT <- NCT(data1 = y1, 
                 data2 = y2, 
                 it = 1000, 
                 progressbar = FALSE)
  })
  
  
  ret <- data.frame(
    times =  c(st_1[3], st_2[3], 
               st_4[3], st_8[3], st_NCT[3]),
    models = c("one", "two", 
               "four", "eight", "NCT"),
    nodes = x
  )
  
  return(ret)

}


sim_res <- list()

reps <- 5

for(i in seq_along(p)){
  print(paste("nodes", p[i]))
  
  sim_res[[i]] <- do.call(rbind.data.frame, 
                          replicate(reps, sim_fun(p[i]), 
                                    simplify = FALSE))
}


do.call(rbind.data.frame, sim_res) %>%
  group_by(models, nodes) %>%
  summarise(mu = mean(times), 
            std = sd(times)) %>%
  ggplot(aes(x = nodes, y = mu, group = models)) +
  theme_bw() +
  geom_ribbon(aes(ymax = mu + std, ymin = mu - std), 
              alpha = 0.1) +
  geom_line(aes(color = models), size = 2) +
  ylab("Seconds") +
  xlab("Nodes") +
  scale_color_discrete(name = "Model", 
                       labels = c("GGMncv(8)",
                                  "GGMncv(4)",
                                  "GGMncv(2)",
                                  "GGMncv(1)", 
                                   "NCT"), 
                       breaks = c("eight", 
                                  "four", 
                                  "two", 
                                  "one", 
                                  "NCT"))

```

![](../man/figures/cpu_time.png)

The performance gains are clear, 
especially when using 8 cores with **GGMncv**.

# References