```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(width=120)
```

In this vignette, we compare the computation time/memory usage of dense `matrix` and sparse `Matrix`.

## Allocation and length

We begin with an analysis of the time/memory it takes to create these objects. In the `atime` code below, we allocate a `vector` for comparison, and we specify a `result` function which computes the `length` of the object `x` created by each expression. This means `atime` will save `length` as a function of data size `N` (in addition to time and memory).

```{r}
library(Matrix)
N_seq <- as.integer(10^seq(1,7,by=0.25))
vec.mat.result <- atime::atime(
  N=N_seq,
  vector=numeric(N),
  matrix=matrix(0, N, N),
  Matrix=Matrix(0, N, N),
  result=function(x)data.frame(length=length(x)))
plot(vec.mat.result)
```

The plot above shows three panels, one for each unit.

* `kilobytes` is the amount of memory used. We see that `Matrix` and `vector` use the same amount of memory asymptotically, whereas `matrix` uses more (a larger slope on the log-log plot implies a larger asymptotic complexity class); a direct check of this memory difference appears after this list.
* `length` is the value returned by the `length` function. We see that `matrix` and `Matrix` have the same value, whereas `vector` has asymptotically smaller length (smaller slope on the log-log plot).
* `seconds` is the amount of time taken. We see that `Matrix` is slower than `vector` and `matrix` by a small constant overhead, which can be seen for small `N`. We also see that for large `N`, `Matrix` and `vector` have the same asymptotic time complexity, which is much faster than `matrix`.
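To make the memory difference concrete, we can check a single size directly, outside of `atime`. Below is a minimal sanity check using base R's `object.size`; note that the exact sparse class reported by `class` may vary between Matrix versions.

```{r}
library(Matrix)
n <- 1000
m.dense <- matrix(0, n, n)  # stores all n^2 zeros as 8-byte doubles
M.sparse <- Matrix(0, n, n) # stores structure only, not the zero entries
class(M.sparse)             # some sparse/structured Matrix class
print(object.size(m.dense), units="Mb")  # about 8 * n^2 bytes
print(object.size(M.sparse), units="Kb") # orders of magnitude smaller
```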
### Comparison with `bench::press`

An alternative way to compute asymptotic timings is via `bench::press`, which provides functionality for parameterized benchmarking (similar to `atime_grid`). Because `atime()` has special treatment of the `N` parameter, the code required for asymptotic measurement is relatively simple; compare the `atime` code above to the `bench::press` code below, which measures the same asymptotic quantities (seconds, kilobytes, length).

```{r}
seconds.limit <- 0.01
done.vec <- NULL
measure.vars <- c("seconds","kilobytes","length")
press_result <- bench::press(N = N_seq, {
  exprs <- function(...){
    as.list(match.call()[-1])
  }
  elist <- exprs(
    vector=numeric(N),
    matrix=matrix(0, N, N),
    Matrix=Matrix(0, N, N))
  elist[names(done.vec)] <- NA # Don't run exprs which already exceeded the limit.
  mark.args <- c(elist, list(iterations=10, check=FALSE))
  mark.result <- do.call(bench::mark, mark.args)
  ## Rename some columns for easier interpretation.
  desc.vec <- attr(mark.result$expression, "description")
  mark.result$description <- desc.vec
  mark.result$seconds <- as.numeric(mark.result$median)
  mark.result$kilobytes <- as.numeric(mark.result$mem_alloc/1024)
  ## Compute length column to measure in addition to time/memory.
  mark.result$length <- NA
  for(desc.i in seq_along(desc.vec)){
    description <- desc.vec[[desc.i]]
    result <- eval(elist[[description]])
    mark.result$length[desc.i] <- length(result)
  }
  ## Set NA time/memory/length for exprs which were not run.
  mark.result[desc.vec %in% names(done.vec), measure.vars] <- NA
  ## If an expr went over the time limit, mark it as done.
  over.limit <- mark.result$seconds > seconds.limit
  over.desc <- desc.vec[is.finite(mark.result$seconds) & over.limit]
  done.vec[over.desc] <<- TRUE
  mark.result
})
```

The `bench::press` code above is relatively complicated, because it re-implements two features that `atime` provides:

* If an expression takes longer than the time limit of 0.01 seconds, then it is not run for any larger `N` values. This keeps the overall computation reasonable, even when comparing expressions with different asymptotic time complexity (such as quadratic for `matrix` and linear for `Matrix` in this example).
* If you want to measure quantities other than `seconds` and `kilobytes` as a function of `N` (such as `length` in this example), `atime` makes that easy (just provide a `result` function), whereas it is more complex to implement with `bench::press` (a for loop is required).

Below we visualize the results from `bench::press`,

```{r}
library(data.table)
(press_long <- melt(
  data.table(press_result),
  measure.vars=measure.vars,
  id.vars=c("N","description"),
  na.rm=TRUE))
if(require(ggplot2)){
  gg <- ggplot()+
    ggtitle("bench::press results for comparison")+
    facet_grid(variable ~ ., labeller=label_both, scales="free")+
    geom_line(aes(
      N, value, color=description),
      data=press_long)+
    scale_x_log10(limits=c(NA, max(press_long$N*2)))+
    scale_y_log10("")
  if(requireNamespace("directlabels")){
    directlabels::direct.label(gg, "right.polygons")
  }else gg
}
```

We can see that the plots from `atime` and `bench::press` are consistent.

### Complexity class estimation with `atime`

Below we estimate the best asymptotic complexity classes:

```{r}
vec.mat.best <- atime::references_best(vec.mat.result)
plot(vec.mat.best)
```

The plot above shows that

* `matrix` has time, memory, and `length` which are all quadratic, `O(N^2)`.
* `Matrix` has linear `O(N)` time and memory, but `O(N^2)` values for `length`.
* `vector` has time, memory, and `length` which are all linear, `O(N)`.

Below we estimate the throughput for some given limits:

```{r}
vec.mat.pred <- predict(
  vec.mat.best,
  seconds=vec.mat.result$seconds.limit,
  kilobytes=1000,
  length=1e6)
plot(vec.mat.pred)
```

In the plot above we can see the throughput `N` for a given limit of `kilobytes`, `length`, or `seconds`. Below we use `Matrix` as a reference, and compute the throughput ratio of `Matrix` to each of the other expressions:

```{r}
library(data.table)
dcast(
  vec.mat.pred$prediction[, ratio := N[expr.name=="Matrix"]/N, by=unit],
  unit + unit.value ~ expr.name,
  value.var="ratio")
```

From the table above (`matrix` column), we can see that the throughput of `Matrix` is 100-1000x larger than that of `matrix`, for the given limits.

## Matrix multiplication, 90% sparsity

First we show the difference between sparse and dense matrix multiplication, when the matrix has 90% zeros (10% non-zeros).

```{r}
library(Matrix)
sparse.prop <- 0.9
dense.prop <- 1-sparse.prop
mult.result <- atime::atime(
  N=as.integer(10^seq(1,4,by=0.25)),
  setup={
    m <- matrix(0, N, N)
    set.seed(1)
    w <- rnorm(N)
    N.not.zero <- as.integer(dense.prop*N^2)
    m[sample(N^2, N.not.zero)] <- rnorm(N.not.zero)
    M <- Matrix(m)
  },
  sparse = M %*% w,
  dense = m %*% w,
  result=TRUE)
plot(mult.result)
```

Above we see that `sparse` is faster than `dense`, but only by a constant factor. Below we estimate the best asymptotic complexity classes:

```{r}
mult.best <- atime::references_best(mult.result)
plot(mult.best)
```

Above we see that both `sparse` and `dense` matrix multiplication take quadratic `O(N^2)` time (for a quadratic number of non-zero entries).
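Why is the speedup only a constant factor here? With 10% non-zeros, the sparse format still stores (and multiplies) a number of entries proportional to `N^2`. Below is a minimal sketch that counts the stored values via the `x` slot, assuming `Matrix()` converts this general dense input to a `dgCMatrix` (compressed sparse column format), as it does in current Matrix versions:

```{r}
library(Matrix)
set.seed(1)
N <- 100
m <- matrix(0, N, N)
N.not.zero <- as.integer(0.1*N^2)
m[sample(N^2, N.not.zero)] <- rnorm(N.not.zero)
M <- Matrix(m)
class(M)    # expected: dgCMatrix (general sparse, column-compressed)
length(M@x) # number of stored values, about N^2/10: still quadratic in N
```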
Finally, we verify below that both methods yield the same result:

```{r}
library(data.table)
mult.compare <- dcast(
  mult.result$measurements, N ~ expr.name, value.var="result"
)[
, equal := paste(all.equal(as.numeric(dense[[1]]), as.numeric(sparse[[1]])))
, by=N][]
tibble::tibble(mult.compare)
```

## Matrix multiplication, linear number of non-zeros

Next we show the difference between sparse and dense matrix multiplication, when the matrix has a linear number of non-zeros (asymptotically fewer than in the previous section).

```{r}
library(Matrix)
mult.result <- atime::atime(
  N=as.integer(10^seq(1,4,by=0.25)),
  setup={
    m <- matrix(0, N, N)
    set.seed(1)
    w <- rnorm(N)
    N.not.zero <- N
    m[sample(N^2, N.not.zero)] <- rnorm(N.not.zero)
    M <- Matrix(m)
  },
  sparse = M %*% w,
  dense = m %*% w,
  result=TRUE)
plot(mult.result)
```

Above we see that `sparse` is asymptotically faster than `dense` (different asymptotic slopes). Below we estimate the best asymptotic complexity classes:

```{r}
mult.best <- atime::references_best(mult.result)
plot(mult.best)
```

Above we see that `sparse` takes linear `O(N)` time whereas `dense` takes quadratic `O(N^2)` time (for a linear number of non-zero entries). Finally, we verify below that both methods yield the same result:

```{r}
library(data.table)
mult.compare <- dcast(
  mult.result$measurements, N ~ expr.name, value.var="result"
)[
, equal := paste(all.equal(as.numeric(dense[[1]]), as.numeric(sparse[[1]])))
, by=N][]
tibble::tibble(mult.compare)
```

## Matrix multiplication, linear and quadratic number of non-zeros

In this section we show how you can code both comparisons at the same time, without repetition. The trick is to first define a list of parameters to vary:

```{r}
param.list <- list(
  non_zeros=c("N","N^2/10"),
  fun=c("matrix","Matrix")
)
```

After that, we create a grid of expressions to evaluate, by expanding the parameter grid:

```{r}
(expr.list <- atime::atime_grid(
  param.list,
  Mw=L[[fun]][[non_zeros]]%*%w,
  collapse="\n"))
```

Finally we pass the list of expressions to `atime`, along with a `setup` argument which creates the required list `L` of input data, based on the parameters:

```{r}
mult.result <- atime::atime(
  N=as.integer(10^seq(1,3.5,by=0.25)),
  setup={
    L <- list()
    set.seed(1)
    w <- rnorm(N)
    for(non_zeros in param.list$non_zeros){
      N.not.zero <- as.integer(eval(str2lang(non_zeros)))
      m <- matrix(0, N, N)
      m[sample(N^2, N.not.zero)] <- rnorm(N.not.zero)
      for(fun in param.list$fun){
        L[[fun]][[non_zeros]] <- get(fun)(as.numeric(m), N, N)
      }
    }
  },
  expr.list=expr.list)
plot(mult.result)
```

Below we estimate the best asymptotic complexity classes:

```{r}
mult.best <- atime::references_best(mult.result)
plot(mult.best)
```

Below we show an alternative visualization, with one panel for each combination of parameters, showing only the `seconds` unit:

```{r}
only.seconds <- mult.best
only.seconds$measurements <- mult.best$measurements[unit=="seconds"]
only.seconds$plot.references <- mult.best$plot.references[unit=="seconds"]
if(require(ggplot2)){
  plot(only.seconds)+
    facet_grid(non_zeros ~ fun, labeller=label_both)
}
```

## Conclusion

Overall in this vignette we have shown how `atime` can be used to demonstrate when sparse matrices can be used for efficient computations.

* Sparse matrices have linear rather than quadratic time/memory requirements for creation.
* Sparse matrix-vector multiplication is asymptotically faster than dense (linear rather than quadratic time) when there is a linear number of non-zero elements.
We also showed a comparison between `atime` and `bench::press`, which highlighted two areas where `atime` is more convenient: it stops measuring an expression for larger `N` once it exceeds a time limit, and it can measure quantities other than time/memory (such as `length`) as a function of data size `N`.
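For reference, those two conveniences correspond directly to arguments of `atime::atime()`: `seconds.limit` (whose default matches the 0.01 seconds hard-coded in the `bench::press` code above) and `result`. A minimal sketch, not evaluated here because it repeats the measurements from the first section:

```{r, eval=FALSE}
atime::atime(
  N=as.integer(10^seq(1,7,by=0.25)),
  seconds.limit=0.01, # skip larger N after an expression exceeds this limit
  result=function(x)data.frame(length=length(x)), # extra unit to measure/plot
  vector=numeric(N),
  matrix=matrix(0, N, N),
  Matrix=Matrix::Matrix(0, N, N))
```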