| Title: | Projected Refinement for Imputation of Missing Entries in PCA |
|---|---|
| Description: | Implements the primePCA algorithm, developed and analysed in Zhu, Z., Wang, T. and Samworth, R. J. (2019) High-dimensional principal component analysis with heterogeneous missingness. <arXiv:1906.12125>. |
| Authors: | Ziwei Zhu, Tengyao Wang, Richard J. Samworth |
| Maintainer: | Ziwei Zhu <[email protected]> |
| License: | GPL-3 |
| Version: | 1.2 |
| Built: | 2026-05-23 06:55:33 UTC |
| Source: | https://github.com/cran/primePCA |
Center and/or normalize each column of a matrix
```r
col_scale(X, center = T, normalize = F)
```
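| Argument | Description |
|---|---|
| X | a numeric matrix with NAs, or an "Incomplete" matrix object (see the softImpute package) |
| center | center each column of X if TRUE (default TRUE) |
| normalize | normalize each column of X if TRUE (default FALSE) |

Returns a centered and/or normalized matrix of the same dimension as X.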
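To show what column-wise centering and normalization mean when entries are missing, here is a minimal base-R sketch. It does not call the package. The function name col_scale_sketch is hypothetical. It assumes that means and standard deviations are computed over the observed entries only. It also assumes "normalize" means dividing each column by its sample standard deviation, which the package may define differently.

```r
# Hypothetical sketch (not the package's col_scale): center and/or scale the
# columns of a numeric matrix, using only the observed (non-NA) entries.
# Assumption: "normalize" = divide by the column's sample standard deviation.
col_scale_sketch <- function(X, center = TRUE, normalize = FALSE) {
  stopifnot(is.matrix(X), is.numeric(X))
  if (center) {
    mu <- colMeans(X, na.rm = TRUE)      # column means over observed entries
    X <- sweep(X, 2, mu, FUN = "-")      # NA entries stay NA
  }
  if (normalize) {
    s <- apply(X, 2, sd, na.rm = TRUE)   # column SDs over observed entries
    X <- sweep(X, 2, s, FUN = "/")
  }
  X
}

set.seed(1)
X <- matrix(1:30 + 0.1 * rnorm(30), 10, 3)
X[1, 1] <- NA
X[2, 3] <- NA
Xc <- col_scale_sketch(X)
colMeans(Xc, na.rm = TRUE)   # approximately zero in every column
```

Leaving NAs in place, rather than filling them, keeps the missingness pattern available to downstream steps that need it.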
Inverse probability weighted method for estimating the top K eigenspaces
```r
inverse_prob_method(X, K, trace.it = F, center = T, normalize = F)
```
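| Argument | Description |
|---|---|
| X | an n-by-d numeric matrix with NAs marking the missing entries |
| K | the number of principal components of interest |
| trace.it | report progress if TRUE (default FALSE) |
| center | center each column of X if TRUE (default TRUE) |
| normalize | normalize each column of X if TRUE (default FALSE) |

Returns a d-by-K matrix whose columns estimate the top K eigenvectors.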
```r
X <- matrix(1:30 + .1 * rnorm(30), 10, 3)
X[1, 1] <- NA
X[2, 3] <- NA
v_hat <- inverse_prob_method(X, 1)
```
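The idea behind an inverse probability weighted estimate can be sketched in base R. This sketch does not reproduce the package's exact estimator. The function name ipw_top_k is hypothetical, and the weighting rule is an assumption. Each column is centered over its observed entries and missing entries are zero-filled. Each entry of the naive second-moment matrix is then divided by the empirical fraction of rows in which both of the corresponding columns are observed. Finally, the top K eigenvectors are taken.

```r
# Hypothetical IPW sketch (not the package's inverse_prob_method).
# Assumption: entry (j, k) of the zero-filled second-moment matrix is divided
# by the empirical probability that columns j and k are observed together.
ipw_top_k <- function(X, K) {
  stopifnot(is.matrix(X), K >= 1, K <= ncol(X))
  X <- sweep(X, 2, colMeans(X, na.rm = TRUE))   # center; NAs stay NA
  obs <- !is.na(X)                              # observation indicators
  X0 <- X
  X0[!obs] <- 0                                 # missing entries contribute 0
  n <- nrow(X)
  S <- crossprod(X0) / n                        # naive (biased) second moments
  P <- crossprod(obs * 1) / n                   # joint observation rates
  if (any(P == 0)) stop("some pair of columns is never observed together")
  Sigma_hat <- S / P                            # entrywise reweighting
  eig <- eigen(Sigma_hat, symmetric = TRUE)     # eigenvalues in decreasing order
  eig$vectors[, seq_len(K), drop = FALSE]       # d-by-K, orthonormal columns
}

set.seed(1)
X <- matrix(1:30 + 0.1 * rnorm(30), 10, 3)
X[1, 1] <- NA
X[2, 3] <- NA
v_hat <- ipw_top_k(X, 1)
```

Dividing by the joint observation rate turns each zero-filled average into an average over the rows where both entries are actually observed. Without this correction, off-diagonal entries would be shrunk toward zero in proportion to how often values are missing.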
primePCA algorithm
```r
primePCA(
  X,
  K,
  V_init = NULL,
  thresh_sigma = 10,
  max_iter = 1000,
  thresh_convergence = 1e-05,
  thresh_als = 1e-10,
  trace.it = F,
  prob = 1,
  save_file = "",
  center = T,
  normalize = F
)
```
| Argument | Description |
|---|---|
| X | an n-by-d numeric matrix with NAs marking the missing entries |
| K | the number of principal components of interest |
| V_init | an initial estimate of the top K eigenspace (default NULL) |
| thresh_sigma | threshold used to select the "good" rows of X (default 10) |
| max_iter | maximum number of refinement iterations (default 1000) |
| thresh_convergence | the algorithm halts once the Frobenius-norm sin-theta distance between two consecutive iterates falls below this value (default 1e-05) |
| thresh_als | this is fed into … (default 1e-10) |
| trace.it | report progress if TRUE (default FALSE) |
| prob | probability of retaining the "good" rows (default 1) |
| save_file | location where intermediate results are saved, including … (default "") |
| center | center each column of X if TRUE (default TRUE) |
| normalize | normalize each column of X if TRUE (default FALSE) |

Returns a list with components V_cur, step_cur and loss_all. V_cur is a d-by-K matrix of the top K eigenvectors, step_cur is the number of iterations performed, and loss_all is an array recording the trajectory of the MSE.
```r
X <- matrix(1:30 + .1 * rnorm(30), 10, 3)
X[1, 1] <- NA
X[2, 3] <- NA
v_tilde <- primePCA(X, 1)$V_cur
```
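The following base-R sketch shows a generic loop of the form: regress each row's observed entries on the current basis, impute the missing entries from the fit, then re-estimate the basis. It is a simplified illustration under stated assumptions, not the primePCA implementation. The names refine_sketch and min_sv are hypothetical. The row filter used here (at least K observed entries, and a smallest singular value of the observed block above min_sv) stands in for the package's thresh_sigma rule. The prob subsampling and the MSE tracking are omitted.

```r
# Hypothetical simplified refinement loop (not the package's primePCA).
# Assumptions: "usable" rows have >= K observed entries and a well-conditioned
# observed block V[obs, ]; V is updated to the top-K right singular vectors.
refine_sketch <- function(X, V, max_iter = 100, tol = 1e-5, min_sv = 1e-3) {
  K <- ncol(V)
  sin_theta <- function(A, B) sqrt(max(K - sum(crossprod(A, B)^2), 0))
  for (step in seq_len(max_iter)) {
    rows <- list()
    for (i in seq_len(nrow(X))) {
      obs <- !is.na(X[i, ])
      if (sum(obs) < K) next                            # too few observed entries
      Vo <- V[obs, , drop = FALSE]
      if (min(svd(Vo, nu = 0, nv = 0)$d) < min_sv) next # ill-conditioned row
      u <- qr.solve(Vo, X[i, obs])                      # least-squares scores
      xi <- X[i, ]
      xi[!obs] <- (V[!obs, , drop = FALSE] %*% u)[, 1]  # impute missing entries
      rows[[length(rows) + 1]] <- xi
    }
    if (length(rows) < K) stop("too few usable rows")
    Y <- do.call(rbind, rows)                           # imputed usable rows
    V_new <- svd(Y, nu = 0, nv = K)$v                   # re-estimate the basis
    done <- sin_theta(V, V_new) < tol                   # stop when iterates agree
    V <- V_new
    if (done) break
  }
  list(V = V, steps = step)
}

set.seed(1)
X <- matrix(1:30 + 0.1 * rnorm(30), 10, 3)
X[1, 1] <- NA
X[2, 3] <- NA
Xc <- sweep(X, 2, colMeans(X, na.rm = TRUE))   # center over observed entries
X0 <- Xc
X0[is.na(X0)] <- 0
V0 <- svd(X0, nu = 0, nv = 1)$v                # crude zero-fill initializer
res <- refine_sketch(Xc, V0)
```

The stopping rule measures the sin-theta distance between consecutive bases, which mirrors the role of thresh_convergence described above.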
Frobenius norm sin theta distance between two column spaces
```r
sin_theta_distance(V1, V2)
```
| Argument | Description |
|---|---|
| V1 | a matrix with orthonormal columns |
| V2 | a matrix of the same dimension as V1, with orthonormal columns |

Returns the Frobenius-norm sin-theta distance between the column spaces of V1 and V2.
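When V1 and V2 have orthonormal columns, as required above, the singular values of t(V1) %*% V2 are the cosines of the principal angles between the two column spaces. As a result, the squared distance equals K minus the squared Frobenius norm of t(V1) %*% V2. The base-R sketch below uses this identity. It does not call the package, and the name sin_theta_sketch is hypothetical.

```r
# Hypothetical sketch of the Frobenius-norm sin-theta distance.
# Assumes V1 and V2 are d-by-K with orthonormal columns, so that
#   ||sin Theta||_F^2 = K - ||t(V1) %*% V2||_F^2.
sin_theta_sketch <- function(V1, V2) {
  stopifnot(all(dim(V1) == dim(V2)))
  K <- ncol(V1)
  cos2 <- sum(crossprod(V1, V2)^2)   # sum of squared cosines of principal angles
  sqrt(max(K - cos2, 0))             # guard against tiny negative round-off
}

# Two one-dimensional subspaces of R^3 at a 45-degree angle
V1 <- cbind(c(1, 0, 0))
V2 <- cbind(c(1, 1, 0) / sqrt(2))
sin_theta_sketch(V1, V2)   # sin(pi/4), about 0.7071
```

An equivalent form is norm(V1 %*% t(V1) - V2 %*% t(V2), "F") / sqrt(2). However, it builds two d-by-d projection matrices, whereas the cross-product form above only needs a K-by-K matrix.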