Title: | Projected Refinement for Imputation of Missing Entries in PCA |
---|---|
Description: | Implements the primePCA algorithm, developed and analysed in Zhu, Z., Wang, T. and Samworth, R. J. (2019) High-dimensional principal component analysis with heterogeneous missingness. <arXiv:1906.12125>. |
Authors: | Ziwei Zhu, Tengyao Wang, Richard J. Samworth |
Maintainer: | Ziwei Zhu <[email protected]> |
License: | GPL-3 |
Version: | 1.2 |
Built: | 2024-11-10 06:25:39 UTC |
Source: | CRAN |
Center and/or normalize each column of a matrix
col_scale(X, center = T, normalize = F)
X | a numeric matrix with NAs, or an "Incomplete" matrix object (see the softImpute package) |
center | center each column of X if TRUE (the default) |
normalize | normalize each column of X if TRUE (the default is FALSE) |
a centered and/or normalized matrix of the same dimension as X.
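To make the centering and normalizing steps concrete, here is a minimal base R sketch that handles NAs by computing column statistics over the observed entries only. The helper name `col_scale_sketch` is hypothetical, and scaling to unit sample standard deviation is an assumption; this is not the package's own implementation of col_scale.

```r
# Hypothetical helper for illustration; not the primePCA implementation.
col_scale_sketch <- function(X, center = TRUE, normalize = FALSE) {
  if (center) {
    # Subtract each column's mean, computed over observed entries only.
    X <- sweep(X, 2, colMeans(X, na.rm = TRUE), "-")
  }
  if (normalize) {
    # Assumption: normalize to unit sample standard deviation per column.
    sds <- apply(X, 2, sd, na.rm = TRUE)
    X <- sweep(X, 2, sds, "/")
  }
  X
}

set.seed(1)
X <- matrix(1:30 + 0.1 * rnorm(30), 10, 3)
X[1, 1] <- NA
X[2, 3] <- NA
Z <- col_scale_sketch(X, center = TRUE, normalize = TRUE)
colMeans(Z, na.rm = TRUE)      # numerically zero in every column
apply(Z, 2, sd, na.rm = TRUE)  # numerically one in every column
```

Missing entries stay NA in the output, so the result has the same dimension and the same missingness pattern as the input.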
Inverse probability weighted method for estimating the top K eigenspaces
inverse_prob_method(X, K, trace.it = F, center = T, normalize = F)
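X | a numeric matrix with NAs, or an "Incomplete" matrix object (see the softImpute package) |
K | the number of principal components of interest |
trace.it | report the progress if TRUE (the default is FALSE) |
center | center each column of X if TRUE (the default) |
normalize | normalize each column of X if TRUE (the default is FALSE) |
Columnwise centered matrix of the same dimension as X.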
X <- matrix(1:30 + .1 * rnorm(30), 10, 3)
X[1, 1] <- NA
X[2, 3] <- NA
v_hat <- inverse_prob_method(X, 1)
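The idea behind inverse probability weighting can be sketched in base R as follows. This sketch assumes that every entry is observed independently with one common probability, which is estimated by the overall proportion of observed entries. The function name `ipw_eigvec_sketch` is hypothetical, and the code is an illustration of the general technique rather than the package's implementation.

```r
# Hypothetical illustration of inverse probability weighting; it assumes a
# single observation probability shared by all entries.
ipw_eigvec_sketch <- function(X, K) {
  # Center columns using observed entries only.
  X <- sweep(X, 2, colMeans(X, na.rm = TRUE), "-")
  obs <- !is.na(X)
  p_hat <- mean(obs)                # estimated observation probability
  Y <- X
  Y[!obs] <- 0                      # zero-fill the missing entries
  G <- crossprod(Y) / nrow(Y)       # Gram matrix of the zero-filled data
  # Off-diagonal entries need two coordinates observed (weight 1 / p^2);
  # diagonal entries need only one (weight 1 / p).
  Sigma_hat <- G / p_hat^2
  diag(Sigma_hat) <- diag(G) / p_hat
  eigen(Sigma_hat, symmetric = TRUE)$vectors[, seq_len(K), drop = FALSE]
}

set.seed(1)
X <- matrix(1:30 + 0.1 * rnorm(30), 10, 3)
X[1, 1] <- NA
X[2, 3] <- NA
v_sketch <- ipw_eigvec_sketch(X, 1)
dim(v_sketch)  # 3 rows (one per column of X), 1 column
```

The diagonal and off-diagonal entries are reweighted differently because a squared entry survives zero-filling with probability p, whereas a product of two distinct entries survives with probability p squared.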
primePCA algorithm
primePCA(
  X,
  K,
  V_init = NULL,
  thresh_sigma = 10,
  max_iter = 1000,
  thresh_convergence = 1e-05,
  thresh_als = 1e-10,
  trace.it = F,
  prob = 1,
  save_file = "",
  center = T,
  normalize = F
)
X | an n-by-d data matrix with NAs |
K | the number of principal components of interest |
V_init | an initial estimate of the top K eigenspaces (the default is NULL) |
thresh_sigma | used to select the "good" rows of X |
max_iter | maximum number of refinement iterations |
thresh_convergence | the algorithm is halted if the Frobenius-norm sine-theta distance between two consecutive iterates is less than thresh_convergence |
thresh_als | this is fed into thresh in svd.als of the softImpute package |
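trace.it | report the progress if TRUE (the default is FALSE) |
prob | probability of reserving the "good" rows |
save_file | the location where the intermediate results are saved, including V_cur, step_cur and loss_all (the default is "") |
center | center each column of X if TRUE (the default) |
normalize | normalize each column of X if TRUE (the default is FALSE) |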
a list is returned, with components V_cur, step_cur and loss_all.
V_cur is a d-by-K matrix of the top K eigenvectors.
step_cur is the number of iterations.
loss_all is an array of the trajectory of MSE.
X <- matrix(1:30 + .1 * rnorm(30), 10, 3)
X[1, 1] <- NA
X[2, 3] <- NA
v_tilde <- primePCA(X, 1)$V_cur
Frobenius norm sin theta distance between two column spaces
sin_theta_distance(V1, V2)
V1 | a matrix with orthonormal columns |
V2 | a matrix of the same dimension as V1 with orthonormal columns |
the Frobenius norm sin theta distance between the column spaces of V1 and V2
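For matrices V1 and V2 with K orthonormal columns each, the Frobenius norm sin theta distance can be computed from the standard identity sqrt(K - ||V1' V2||_F^2). The base R sketch below uses that identity; the name `sin_theta_sketch` is hypothetical, and it is not claimed to be the package's own code.

```r
# Hypothetical helper based on the identity
#   ||sin Theta(V1, V2)||_F = sqrt(K - ||t(V1) %*% V2||_F^2),
# valid when both inputs have K orthonormal columns.
sin_theta_sketch <- function(V1, V2) {
  K <- ncol(V1)
  overlap <- sum(crossprod(V1, V2)^2)  # squared Frobenius norm of V1' V2
  sqrt(max(0, K - overlap))            # guard against tiny negative rounding
}

e1 <- matrix(c(1, 0, 0), ncol = 1)
e2 <- matrix(c(0, 1, 0), ncol = 1)
d_same <- sin_theta_sketch(e1, e1)   # identical subspaces
d_flip <- sin_theta_sketch(e1, -e1)  # sign flip spans the same subspace
d_orth <- sin_theta_sketch(e1, e2)   # orthogonal one-dimensional subspaces
c(d_same, d_flip, d_orth)
```

The distance depends only on the column spaces, so flipping the sign of a basis vector leaves it unchanged, while orthogonal one-dimensional subspaces sit at distance 1.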