Title: | Efficiently Impute Large Scale Incomplete Matrix |
---|---|
Description: | Efficiently impute a large-scale matrix with missing values via its unbiased low-rank matrix approximation. Our main approach is the Hard-Impute algorithm proposed in <https://www.jmlr.org/papers/v11/mazumder10a.html>, which gains a substantial computational advantage from truncated singular value decomposition. |
Authors: | Zhe Gao [aut, cre], Jin Zhu [aut], Junxian Zhu [aut], Xueqin Wang [aut], Yixuan Qiu [cph], Gael Guennebaud [cph, ctb], Jitse Niesen [cph, ctb], Ray Gardner [ctb] |
Maintainer: | Zhe Gao |
License: | GPL-3 \| file LICENSE |
Version: | 0.2.4 |
Built: | 2024-11-20 06:56:06 UTC |
Source: | CRAN |
Standardize the rows and/or columns of a matrix to have zero mean and/or unit variance
biscale(x, thresh.sd = 1e-05, maxit.sd = 100, control = list(...), ...)

Argument | Description |
---|---|
x | an m by n matrix with NAs. |
thresh.sd | convergence threshold, measured as the relative change in the Frobenius norm between two successive estimates. |
maxit.sd | maximum number of iterations. |
control | a list of parameters that control details of the standardization procedure. See biscale.control. |
... | arguments used to form the default control argument if it is not supplied directly. |

A list with the following components is returned:

Component | Description |
---|---|
x.st | the matrix after standardization. |
alpha | the row means after the iterative process. |
beta | the column means after the iterative process. |
tau | the row standard deviations after the iterative process. |
gamma | the column standard deviations after the iterative process. |

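Assuming biscale follows the centering-and-scaling model of Hastie et al. (2015), the returned components relate the input x to the standardized matrix x.st as below for every observed entry $(i, j)$. This is a reading of the cited reference, not a description of eimpute's internals:

```latex
\tilde{x}_{ij} = \frac{x_{ij} - \alpha_i - \beta_j}{\tau_i \, \gamma_j},
```

where $\tilde{x}$ denotes x.st and the parameters are chosen so that every row and every column of $\tilde{x}$ has zero mean and unit variance over its observed entries.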
Trevor Hastie, Rahul Mazumder, Jason D. Lee and Reza Zadeh (2015) Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares, Journal of Machine Learning Research 16, 3367-3402
library(eimpute)
################# Quick Start #################
m <- 100
n <- 100
r <- 10
x_na <- incomplete.generator(m, n, r)
###### Standardize both mean and variance ######
xs <- biscale(x_na)
###### Only standardize mean ######
xs_mean <- biscale(x_na, row.mean = TRUE, col.mean = TRUE)
###### Only standardize variance ######
xs_std <- biscale(x_na, row.std = TRUE, col.std = TRUE)
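To make the roles of thresh.sd and maxit.sd concrete, here is a simplified alternating standardization in base R: it repeatedly centers and scales the rows and columns using only the observed entries until the relative Frobenius-norm change between successive estimates falls below a threshold. It is a sketch under stated assumptions, not eimpute's algorithm: biscale_sketch is a hypothetical name, it always standardizes both means and variances, and it returns only the standardized matrix and the iteration count (not alpha, beta, tau or gamma).

```r
# Hypothetical sketch of alternating row/column standardization with NAs.
# Not eimpute's implementation; it only illustrates the iterate-until-converged idea.
biscale_sketch <- function(x, thresh = 1e-5, maxit = 100) {
  z <- x
  for (iter in seq_len(maxit)) {
    z_old <- z
    z <- z - rowMeans(z, na.rm = TRUE)                        # center rows (observed entries)
    z <- sweep(z, 2, colMeans(z, na.rm = TRUE))               # center columns
    z <- z / apply(z, 1, sd, na.rm = TRUE)                    # scale rows to unit sd
    z <- sweep(z, 2, apply(z, 2, sd, na.rm = TRUE), "/")      # scale columns to unit sd
    # Relative change in Frobenius norm between two successive estimates
    delta <- sqrt(sum((z - z_old)^2, na.rm = TRUE)) /
      max(sqrt(sum(z_old^2, na.rm = TRUE)), .Machine$double.eps)
    if (delta < thresh) break
  }
  list(x.st = z, iter = iter)
}

# Usage: a 20 x 15 matrix with 60 missing entries
set.seed(42)
x <- matrix(rnorm(20 * 15, mean = 3, sd = 2), 20, 15)
x[sample(length(x), 60)] <- NA
res <- biscale_sketch(x)
round(rowMeans(res$x.st, na.rm = TRUE), 4)
```

Because each pass undoes part of the previous one (scaling rows shifts column means, and so on), a single pass is not enough; iterating until the change is below the threshold is what drives all four conditions to hold at once.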
Various parameters that control aspects of the standardization procedure.
biscale.control( row.mean = FALSE, row.std = FALSE, col.mean = FALSE, col.std = FALSE )
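
Argument | Description |
---|---|
row.mean | if TRUE, each row is centered to have zero mean. Default FALSE. |
row.std | if TRUE, each row is scaled to have unit standard deviation. Default FALSE. |
col.mean | similar to row.mean, but applied to the columns. |
col.std | similar to row.std, but applied to the columns. |
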
A list with components named as the arguments.
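The control = list(...) default used by biscale, eimpute and r.search follows the same idiom as glm() in base R: extra arguments collected by ... are passed to a control constructor via do.call(), which fills in defaults and rejects unknown names. A minimal sketch with hypothetical names (my_control, my_center) that mirrors biscale.control's defaults; it is not eimpute's code:

```r
# Hypothetical control constructor with the same defaults as biscale.control
my_control <- function(row.mean = FALSE, row.std = FALSE,
                       col.mean = FALSE, col.std = FALSE) {
  list(row.mean = row.mean, row.std = row.std,
       col.mean = col.mean, col.std = col.std)
}

# Hypothetical user-facing function: `control` defaults to the ... arguments
my_center <- function(x, control = list(...), ...) {
  control <- do.call(my_control, control)  # fills defaults; unknown names raise an error
  if (control$row.mean) x <- x - rowMeans(x, na.rm = TRUE)
  if (control$col.mean) x <- sweep(x, 2, colMeans(x, na.rm = TRUE))
  list(x = x, control = control)
}

mat <- matrix(1:6, nrow = 2)
my_center(mat, row.mean = TRUE)$control$row.mean   # TRUE, taken from ...
my_center(mat, control = list(col.mean = TRUE))$x  # columns centered
```

Supplying control directly takes precedence over ..., because the default list(...) is only evaluated when control is missing.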
Fit a low-rank matrix approximation to a matrix with missing values. The algorithm iterates in an EM-like fashion: it fills the missing values with the current guess and then approximates the completed matrix by a truncated SVD.
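The EM-like iteration described above can be sketched in base R as follows. This is a minimal illustration, not eimpute's code: hard_impute_sketch is a hypothetical name, the missing entries are assumed to start at the overall observed mean, observed entries are kept fixed, and the full base svd() is computed at every step, whereas eimpute relies on truncated or randomized SVD for speed.

```r
# Hypothetical sketch of Hard-Impute (Mazumder, Hastie and Tibshirani, 2010):
# alternate between filling missing entries and a rank-r SVD approximation.
hard_impute_sketch <- function(x, r, thresh = 1e-5, maxit = 100) {
  obs <- !is.na(x)
  z <- x
  z[!obs] <- mean(x[obs])  # assumption: initialize missing entries with the observed mean
  for (iter in seq_len(maxit)) {
    # Rank-r approximation of the current completed matrix
    s <- svd(z, nu = r, nv = r)
    low <- s$u %*% (s$d[seq_len(r)] * t(s$v))
    # Keep observed values; replace missing ones by the low-rank fit
    z_new <- z
    z_new[!obs] <- low[!obs]
    # Relative change in Frobenius norm between two successive estimates
    delta <- sqrt(sum((z_new - z)^2)) / max(sqrt(sum(z^2)), .Machine$double.eps)
    z <- z_new
    if (delta < thresh) break
  }
  list(x.imp = z, iter.count = iter)
}

# Usage: an exactly rank-3 matrix with 30% of its entries removed
set.seed(1)
truth <- matrix(rnorm(40 * 3), 40, 3) %*% matrix(rnorm(3 * 30), 3, 30)
x <- truth
x[sample(length(x), 0.3 * length(x))] <- NA
fit <- hard_impute_sketch(x, r = 3, maxit = 500)
fit$iter.count
```

The stopping rule mirrors the thresh description: the loop ends once the relative change in Frobenius norm between two successive completed matrices drops below thresh, or after maxit iterations.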
eimpute( x, r, svd.method = c("tsvd", "rsvd"), noise.var = 0, thresh = 1e-05, maxit = 100, init = FALSE, init.mat = 0, override = FALSE, control = list(...), ... )

Argument | Description |
---|---|
x | an m by n matrix with NAs. |
r | the rank of the low-rank approximation. |
svd.method | a character string indicating the truncated SVD method. If svd.method = "rsvd", a randomized SVD is used; if svd.method = "tsvd", a standard truncated SVD is used. Default svd.method = "tsvd". |
noise.var | the variance of the noise. |
thresh | convergence threshold, measured as the relative change in the Frobenius norm between two successive estimates. |
maxit | maximum number of iterations. |
init | if init = FALSE (the default), the missing entries are initialized with the mean. |
init.mat | the initialization matrix. |
override | logical value indicating whether the observed elements in x should be overwritten by their low-rank approximation. |
control | a list of parameters that control details of the standardization procedure. See biscale.control. |
... | arguments used to form the default control argument if it is not supplied directly. |

A list containing the following components:

Component | Description |
---|---|
x.imp | the matrix after completion. |
rmse | the relative mean square error of the matrix completion, i.e., the training error. |
iter.count | the number of iterations. |

Rahul Mazumder, Trevor Hastie and Rob Tibshirani (2010) Spectral Regularization Algorithms for Learning Large Incomplete Matrices, Journal of Machine Learning Research 11, 2287-2322
Nathan Halko, Per-Gunnar Martinsson and Joel A. Tropp (2011) Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, SIAM Review 53(2), 217-288
library(eimpute)
################# Quick Start #################
m <- 100
n <- 100
r <- 10
x_na <- incomplete.generator(m, n, r)
head(x_na[, 1:6])
x_impute <- eimpute(x_na, r)
head(x_impute[["x.imp"]][, 1:6])
x_impute[["rmse"]]
Generate a matrix with missing values, where the indices of missing values are uniformly randomly distributed in the matrix.
incomplete.generator(m, n, r, snr = 3, prop = 0.5, seed = 1)

Argument | Description |
---|---|
m | the number of rows of the matrix. |
n | the number of columns of the matrix. |
r | the rank of the matrix. |
snr | the signal-to-noise ratio used in generating the matrix. Default snr = 3. |
prop | the proportion of missing observations. Default prop = 0.5. |
seed | the random seed. Default seed = 1. |

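We generate the matrix as $X = UV + \epsilon$, where $U$ and $V$ are $m \times r$ and $r \times n$ matrices whose entries follow the standard normal distribution, and the entries of $\epsilon$ follow a normal distribution with mean 0 and a variance determined by snr.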
A matrix with missing values.
library(eimpute)
m <- 100
n <- 100
r <- 10
x_na <- incomplete.generator(m, n, r)
head(x_na[, 1:6])
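The generative model above can be sketched in base R. generate_incomplete_sketch is a hypothetical name, not the package function, and the noise standard deviation sqrt(r / snr) is an assumption: it makes the noise variance equal to the per-entry variance of $UV$ (which is $r$, since each entry is a sum of $r$ products of independent standard normals) divided by snr. eimpute's exact scaling may differ.

```r
# Hypothetical sketch of the X = UV + eps generator with uniformly random NAs
generate_incomplete_sketch <- function(m, n, r, snr = 3, prop = 0.5, seed = 1) {
  set.seed(seed)
  u <- matrix(rnorm(m * r), m, r)  # m by r, standard normal entries
  v <- matrix(rnorm(r * n), r, n)  # r by n, standard normal entries
  # Assumption: noise variance = r / snr, since each entry of u %*% v has variance r
  eps <- matrix(rnorm(m * n, sd = sqrt(r / snr)), m, n)
  x <- u %*% v + eps
  # Missing positions drawn uniformly at random, without replacement
  x[sample(m * n, size = round(prop * m * n))] <- NA
  x
}

x_na <- generate_incomplete_sketch(100, 100, 10)
mean(is.na(x_na))  # 0.5
```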
Estimate a suitable rank for fitting a low-rank matrix approximation to a matrix with missing values. The algorithm uses GIC or CV to search for the rank within a given range, and then fills in the missing values using the estimated rank.
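The cross-validation idea behind rule.type = "cv" can be illustrated with a generic K-fold scheme over the observed entries: hide one fold, impute it with each candidate rank, and keep the rank with the smallest held-out error. This is an assumption-based sketch, not eimpute's implementation: cv_rank_sketch and its internal Hard-Impute helper are hypothetical, it uses the full base svd(), and it does not implement the GIC rule.

```r
# Hypothetical K-fold cross-validation for choosing the rank of a Hard-Impute fit
cv_rank_sketch <- function(x, ranks = 1:5, nfolds = 5, maxit = 200, seed = 1) {
  # Compact Hard-Impute helper: fill missing entries with a rank-r SVD fit
  impute <- function(y, r) {
    obs <- !is.na(y)
    z <- y
    z[!obs] <- mean(y[obs])
    for (i in seq_len(maxit)) {
      s <- svd(z, nu = r, nv = r)
      low <- s$u %*% (s$d[seq_len(r)] * t(s$v))
      change <- sqrt(sum((low[!obs] - z[!obs])^2)) /
        max(sqrt(sum(z^2)), .Machine$double.eps)
      z[!obs] <- low[!obs]
      if (change < 1e-6) break
    }
    z
  }
  set.seed(seed)
  obs_idx <- which(!is.na(x))
  # Randomly assign each observed entry to one of nfolds folds
  fold <- sample(rep_len(seq_len(nfolds), length(obs_idx)))
  cv_err <- sapply(ranks, function(r) {
    sse <- 0
    for (k in seq_len(nfolds)) {
      held <- obs_idx[fold == k]
      x_train <- x
      x_train[held] <- NA  # hide fold k
      fit <- impute(x_train, r)
      sse <- sse + sum((fit[held] - x[held])^2)
    }
    sse / length(obs_idx)  # mean squared prediction error on held-out entries
  })
  list(r.est = ranks[which.min(cv_err)], cv.err = cv_err)
}

# Usage: a rank-2 signal plus small noise, with 150 missing entries
set.seed(2)
truth <- matrix(rnorm(30 * 2), 30, 2) %*% matrix(rnorm(2 * 25), 2, 25)
x <- truth + matrix(rnorm(30 * 25, sd = 0.1), 30, 25)
x[sample(length(x), 150)] <- NA
res <- cv_rank_sketch(x, ranks = 1:5)
res$r.est
```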
r.search( x, r.min = 1, r.max = "auto", svd.method = c("tsvd", "rsvd"), rule.type = c("gic", "cv"), noise.var = 0, init = FALSE, init.mat = 0, maxit.rank = 1, nfolds = 5, thresh = 1e-05, maxit = 100, override = FALSE, control = list(...), ... )
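
Argument | Description |
---|---|
x | an m by n matrix with NAs. |
r.min | the starting rank for the search. Default r.min = 1. |
r.max | the maximum rank for the search. Default r.max = "auto". |
svd.method | a character string indicating the truncated SVD method. If svd.method = "rsvd", a randomized SVD is used; if svd.method = "tsvd", a standard truncated SVD is used. Default svd.method = "tsvd". |
rule.type | a character string indicating the rule used to select the rank. If rule.type = "gic", the generalized information criterion (GIC) is used; if rule.type = "cv", cross-validation is used. Default rule.type = "gic". |
noise.var | the variance of the noise. |
init | if init = FALSE (the default), the missing entries are initialized with the mean. |
init.mat | the initialization matrix. |
maxit.rank | maximum number of iterations in the rank search. Default maxit.rank = 1. |
nfolds | number of folds in cross-validation. Default nfolds = 5. |
thresh | convergence threshold, measured as the relative change in the Frobenius norm between two successive estimates. |
maxit | maximum number of iterations. |
override | logical value indicating whether the observed elements in x should be overwritten by their low-rank approximation. |
control | a list of parameters that control details of the standardization procedure. See biscale.control. |
... | arguments used to form the default control argument if it is not supplied directly. |
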
A list containing the following components:

Component | Description |
---|---|
x.imp | the matrix after completion with the estimated rank. |
r.est | the estimated rank. |
rmse | the relative mean square error of the matrix completion, i.e., the training error. |
iter.count | the number of iterations. |

library(eimpute)
################# Quick Start #################
m <- 100
n <- 100
r <- 10
x_na <- incomplete.generator(m, n, r)
head(x_na[, 1:6])
x_impute <- r.search(x_na, r.min = 1, r.max = 15, svd.method = "rsvd", rule.type = "gic")
x_impute[["r.est"]]