Title: | Imputation of High-Dimensional Count Data using Side Information |
---|---|
Description: | Analysis, imputation, and multiple imputation of count data using covariates. LORI uses a log-linear Poisson model where main row and column effects, as well as effects of known covariates and interaction terms can be fitted. The estimation procedure is based on the convex optimization of the Poisson loss penalized by a Lasso type penalty and a nuclear norm. LORI returns estimates of main effects, covariate effects and interactions, as well as an imputed count table. The package also contains a multiple imputation procedure. The methods are described in Robin, Josse, Moulines and Sardy (2019) <arXiv:1703.02296v4>. |
Authors: | Genevieve Robin [aut, cre] |
Maintainer: | Genevieve Robin <[email protected]> |
License: | GPL-3 |
Version: | 2.2.2 |
Built: | 2024-11-06 06:40:33 UTC |
Source: | CRAN |
Originally published in Choler, P. 2005. Consistent shifts in Alpine plant traits along a mesotopographical gradient. Arctic, Antarctic, and Alpine Research 37: 444–453.
data(aravo)
A list with 4 attributes:
abundance table of 82 species in 75 environments
a matrix of 6 covariates for the 75 environments
a matrix of 8 covariates for the 82 species
a vector of 82 species names
Analysed in Dray, S., Choler, P., Dolédec, S., Peres-Neto, P.R., Thuiller, W., Pavoine, S. & ter Braak, C.J.F. 2014. Combining the fourth-corner and the RLQ methods for assessing trait responses to environmental variation. Ecology 95: 14–21.
Description from Dray et al. (2014): Community composition of vascular plants was determined in 75 5 × 5 m plots. Each site was described by six environmental variables: mean snowmelt date over the period 1997–1999, slope inclination, aspect, index of microscale landform, index of physical disturbance due to cryoturbation and solifluction, and an index of zoogenic disturbance due to trampling and burrowing activities of the Alpine marmot. All variables are quantitative except the landform and zoogenic disturbance indices that are categorical variables with five and three categories, respectively. Eight quantitative functional traits (i.e., vegetative height, lateral spread, leaf elevation angle, leaf area, leaf thickness, specific leaf area, mass-based leaf nitrogen content, and seed mass) were measured on the 82 most abundant plant species (out of a total of 132 recorded species).
http://pbil.univ-lyon1.fr/ade4/ade4-html/aravo.html
covmat
covmat(n, p, R = NULL, C = NULL, E = NULL, center = F)
n |
number of rows |
p |
number of columns |
R |
nxK1 matrix of row covariates |
C |
pxK2 matrix of column covariates |
E |
(np)xK3 matrix of row-column covariates |
center |
boolean indicating whether the returned covariate matrix should be centered (for identifiability) |
The joint product of R and C column-bound with E: an (np)x(K1+K2+K3) matrix whose rows are ordered row1col1, row2col1, ..., rowncol1, row1col2, row2col2, ..., rowncolp.
R <- matrix(rnorm(10), 5)
C <- matrix(rnorm(9), 3)
covs <- covmat(5, 3, R, C)
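The row ordering described above (rows vary fastest, columns slowest) can be reproduced in base R. The following is a minimal sketch of that layout only, assuming the "joint product" amounts to repeating each row of R and C for the matching cell; `row_idx`, `col_idx` and `U` are illustrative names, and this is not the package's implementation of covmat.

```r
set.seed(1)
n <- 5; p <- 3
R <- matrix(rnorm(n * 2), n)   # n x K1 row covariates
C <- matrix(rnorm(p * 3), p)   # p x K2 column covariates

# Cell (i, j) is stored at position i + (j - 1) * n:
# row1col1, row2col1, ..., rowncol1, row1col2, ...
row_idx <- rep(seq_len(n), times = p)
col_idx <- rep(seq_len(p), each = n)

# Stack the covariates of row i and column j side by side for every cell
U <- cbind(R[row_idx, , drop = FALSE], C[col_idx, , drop = FALSE])
dim(U)   # (n * p) x (K1 + K2)
```

With this ordering, reshaping a length-np vector with `matrix(v, n, p)` puts the value for cell (i, j) back at row i, column j.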
The cv.lori method performs automatic selection of the regularization parameters (lambda1 and lambda2) used in the lori function. These parameters are selected by cross-validation. The classical procedure is to apply cv.lori to the data to select the regularization parameters, and to then impute and analyze the data using the lori function (or mi.lori for multiple imputation).
cv.lori( Y, cov = NULL, intercept = T, reff = T, ceff = T, rank.max = 5, N = 5, len = 20, prob = 0.2, algo = c("alt", "mcgd"), thresh = 1e-05, maxit = 10, trace.it = F, parallel = F )
Y |
[matrix, data.frame] abundance table (nxp) |
cov |
[matrix, data.frame] design matrix (npxq) |
intercept |
[boolean] whether an intercept should be fitted, default value is TRUE |
reff |
[boolean] whether row effects should be fitted, default value is TRUE |
ceff |
[boolean] whether column effects should be fitted, default value is TRUE |
rank.max |
[integer] maximum rank of interaction matrix, default is 5 |
N |
[integer] number of cross-validation folds |
len |
[integer] the size of the grid |
prob |
[numeric in (0,1)] the proportion of entries to remove for cross-validation |
algo |
type of algorithm to use, either one of "mcgd" (mixed coordinate gradient descent, adapted to large dimensions) or "alt" (alternating minimization, adapted to small dimensions) |
thresh |
[positive number] convergence threshold, default is 1e-5 |
maxit |
[integer] maximum number of iterations, default is 10 |
trace.it |
[boolean] whether information about convergence should be printed |
parallel |
[boolean] whether computations should be performed in parallel on multiple cores |
A list with the following elements
lambda1 |
regularization parameter estimated by cross-validation for nuclear norm penalty (interaction matrix) |
lambda2 |
regularization parameter estimated by cross-validation for l1 norm penalty (main effects) |
errors |
a table containing the prediction errors for all pairs of parameters |
X <- matrix(rnorm(20), 10)
Y <- matrix(rpois(10, 1:10), 5)
res <- cv.lori(Y, X, N = 2, len = 2)
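The prob argument is the proportion of entries removed for cross-validation. The masking step can be sketched in base R as follows. `mask_entries` is a hypothetical helper, not a function of the package, and the sampling scheme actually used by cv.lori may differ.

```r
set.seed(42)

# Hide a proportion `prob` of the observed entries of a count table
mask_entries <- function(Y, prob = 0.2) {
  obs <- which(!is.na(Y))                       # positions of observed cells
  n_hide <- round(prob * length(obs))
  hidden <- obs[sample.int(length(obs), n_hide)]
  Y_train <- Y
  Y_train[hidden] <- NA                         # these cells are held out
  list(train = Y_train, hidden = hidden)
}

Y <- matrix(rpois(50, 3), 10)
fold <- mask_entries(Y, prob = 0.2)
sum(is.na(fold$train))   # number of held-out cells
```

A prediction error for a given pair of regularization parameters could then be computed by comparing the fitted values at `fold$hidden` with the original counts `Y[fold$hidden]`.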
The lori function analyzes and imputes incomplete count tables. It can account for main effects of rows and columns, effects of continuous or categorical covariates, and interactions. The estimation procedure minimizes a Poisson loss penalized by a Lasso type penalty (yielding a sparse vector of covariate effects) and a nuclear norm penalty that induces a low-rank interaction matrix (a few latent factors summarize the interactions).
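To make the penalized criterion concrete, here is a minimal base R sketch that evaluates a Poisson loss on the observed cells plus the two penalties. `penalized_poisson_loss` is a hypothetical name, the loss is written up to an additive constant, and applying the l1 penalty to the covariate effects only is an assumption of this sketch rather than a statement about the package internals.

```r
# X: n x p matrix of log expected counts, theta: n x p interaction matrix,
# epsilon: vector of covariate effects, Y: count table with NA for missing cells
penalized_poisson_loss <- function(Y, X, theta, epsilon, lambda1, lambda2) {
  obs <- !is.na(Y)
  # Poisson negative log-likelihood over observed cells (constant term dropped)
  nll <- sum(exp(X[obs]) - Y[obs] * X[obs])
  nuclear <- sum(svd(theta)$d)      # nuclear norm = sum of singular values
  l1 <- sum(abs(epsilon))           # Lasso type penalty
  nll + lambda1 * nuclear + lambda2 * l1
}

Y <- matrix(c(1, NA, 0, 2), 2)
X <- matrix(0, 2, 2)
loss <- penalized_poisson_loss(Y, X, theta = diag(2), epsilon = c(1, -1),
                               lambda1 = 1, lambda2 = 0.5)
```

Larger values of lambda1 push the interaction matrix toward lower rank, and larger values of lambda2 push more covariate effects to zero.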
lori( Y, cov = NULL, lambda1 = NULL, lambda2 = NULL, intercept = T, reff = T, ceff = T, rank.max = 2, algo = c("alt", "mcgd"), thresh = 1e-05, maxit = 100, trace.it = F, parallel = F )
Y |
[matrix, data.frame] count table (nxp). |
cov |
[matrix, data.frame] design matrix (npxq) with rows in the order row1xcol1, row2xcol1, ..., rownxcol1, row1xcol2, row2xcol2, ..., rownxcolp |
lambda1 |
[positive number] the regularization parameter for the interaction matrix. |
lambda2 |
[positive number] the regularization parameter for the covariate effects. |
intercept |
[boolean] whether an intercept should be fitted, default value is TRUE |
reff |
[boolean] whether row effects should be fitted, default value is TRUE |
ceff |
[boolean] whether column effects should be fitted, default value is TRUE |
rank.max |
[integer] maximum rank of interaction matrix (smaller than min(n-1,p-1)) |
algo |
type of algorithm to use, either one of "mcgd" (mixed coordinate gradient descent, adapted to large dimensions) or "alt" (alternating minimization, adapted to small dimensions) |
thresh |
[positive number] convergence tolerance of algorithm, by default 1e-05 |
maxit |
[integer] maximum allowed number of iterations. |
trace.it |
[boolean] whether convergence information should be printed |
parallel |
[boolean] whether computations should be performed in parallel on multiple cores |
A list with the following elements
X |
nxp matrix of log of expected counts |
alpha |
row effects |
beta |
column effects |
epsilon |
covariate effects |
theta |
nxp matrix of row-column interactions |
imputed |
nxp matrix of imputed counts |
means |
nxp matrix of expected counts (exp(X)) |
cov |
npxK matrix of covariates |
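The returned components fit together through the log-linear model: the log expected counts combine row effects, column effects, covariate effects and interactions, and the expected counts are their exponential. The base R sketch below rebuilds X and means from simulated components. The values are made up for illustration, and the intercept is omitted as a simplifying assumption.

```r
set.seed(7)
n <- 4; p <- 3; q <- 2
alpha   <- rnorm(n)                      # row effects
beta    <- rnorm(p)                      # column effects
epsilon <- c(0.5, -0.25)                 # covariate effects
theta   <- matrix(0, n, p)               # interactions (none in this toy case)
cov     <- matrix(rnorm(n * p * q), n * p)  # np x q design matrix

# matrix(v, n, p) fills column by column, which matches the
# row1col1, row2col1, ..., rowncolp ordering of the design matrix rows
X <- outer(alpha, beta, "+") + matrix(cov %*% epsilon, n, p) + theta
means <- exp(X)                          # expected counts
```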
The mi.lori function performs M multiple imputations using the lori method. Multiple imputation produces estimates of missing values together with intervals of variability. The classical procedure is to perform M multiple imputations using the mi.lori function, and to aggregate them using the pool.lori function.
mi.lori( Y, cov = NULL, lambda1 = NULL, lambda2 = NULL, M = 25, intercept = T, reff = T, ceff = T, rank.max = 5, algo = c("alt", "mcgd"), thresh = 1e-05, maxit = 1000, trace.it = F )
Y |
[matrix, data.frame] count table (nxp). |
cov |
[matrix, data.frame] design matrix (npxq) with rows in the order row1xcol1, row2xcol1, ..., rownxcol1, row1xcol2, row2xcol2, ..., rownxcolp |
lambda1 |
[positive number] the regularization parameter for the interaction matrix. |
lambda2 |
[positive number] the regularization parameter for the covariate effects. |
M |
[integer] the number of multiple imputations to perform |
intercept |
[boolean] whether an intercept should be fitted, default value is TRUE |
reff |
[boolean] whether row effects should be fitted, default value is TRUE |
ceff |
[boolean] whether column effects should be fitted, default value is TRUE |
rank.max |
[integer] maximum rank of interaction matrix (smaller than min(n-1,p-1)) |
algo |
type of algorithm to use, either one of "mcgd" (mixed coordinate gradient descent, adapted to large dimensions) or "alt" (alternating minimization, adapted to small dimensions) |
thresh |
[positive number] convergence tolerance of algorithm, by default 1e-05 |
maxit |
[integer] maximum allowed number of iterations. |
trace.it |
[boolean] whether convergence information should be printed |
mi.imputed |
a list of length M containing the imputed count tables |
mi.alpha |
a (Mxn) matrix containing in rows the estimated row effects (one row corresponds to one single imputation) |
mi.beta |
a (Mxp) matrix containing in rows the estimated column effects (one row corresponds to one single imputation) |
mi.epsilon |
a (Mxq) matrix containing in rows the estimated effects of covariates (one row corresponds to one single imputation) |
mi.theta |
a list of length M containing the estimated interaction matrices |
mi.mu |
a list of length M containing the estimated Poisson means |
mi.y |
list of bootstrapped count tables used for multiple imputation |
Y |
original incomplete count table |
X <- matrix(rnorm(50), 25)
Y <- matrix(rpois(25, 1:25), 5)
res <- mi.lori(Y, X, 10, 10, 2)
The pool.lori function aggregates lori multiple imputation results. Multiple imputation produces estimates of missing values together with intervals of variability. The classical procedure is to perform multiple imputation using the mi.lori function, and to aggregate the results using the pool.lori function.
pool.lori(res.mi)
res.mi |
a multiple imputation result from the function mi.lori |
pool.impute |
a list containing the pooled means (mean) and variance (var) of the imputed values |
pool.alpha |
a list containing the pooled means (mean) and variance (var) of the row effects |
pool.beta |
a list containing the pooled means (mean) and variance (var) of the column effects |
pool.epsilon |
a list containing the pooled means (mean) and variance (var) of the covariate effects |
pool.theta |
a list containing the pooled means (mean) and variance (var) of the interactions |
X <- matrix(rnorm(50), 25)
Y <- matrix(rpois(25, 1:25), 5)
res <- mi.lori(Y, X, 10, 10, 2)
poolres <- pool.lori(res)
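Each pooled element holds a mean and a variance computed across the M imputations. As a rough illustration, the base R sketch below pools a list of matrices cell by cell. `pool_list` is a hypothetical helper, and using the plain sample mean and sample variance is an assumption; the pooling formula used by pool.lori may differ.

```r
# Pool a list of M equally sized matrices cell by cell
pool_list <- function(mats) {
  arr <- simplify2array(mats)               # n x p x M array
  list(mean = apply(arr, c(1, 2), mean),    # pooled mean per cell
       var  = apply(arr, c(1, 2), var))     # between-imputation variance per cell
}

mats <- list(matrix(1:4, 2), matrix(3:6, 2), matrix(5:8, 2))
pooled <- pool_list(mats)
pooled$mean
pooled$var
```

The same helper could be applied to a list of imputed tables or of interaction matrices, since both are described as lists of length M.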
Automatic selection of the nuclear norm regularization parameter.
qut(Y, cov, lambda2 = 0, q = 0.95, N = 100, reff = T, ceff = T)
Y |
A matrix of counts (contingency table). |
cov |
A (np)xK matrix of K covariates about rows and columns |
lambda2 |
A positive number, the regularization parameter for covariates main effects |
q |
A number between 0 and 1, the quantile level (default 0.95) |
N |
An integer. The number of parametric bootstrap samples to draw. |
reff |
[boolean] whether row effects should be fitted, default value is TRUE |
ceff |
[boolean] whether column effects should be fitted, default value is TRUE |
The value of $\lambda_{QUT}$ to use in LoRI.
X = matrix(rnorm(30), 15)
Y = matrix(rpois(15, 1:15), 5)
lambda = qut(Y, X, 10, N = 10)
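The arguments q and N suggest a quantile taken over parametric bootstrap samples. The base R sketch below illustrates that general idea only, under loud assumptions: the null model is taken to be the independence model (row and column effects, no covariates and no interactions), and the bootstrap statistic is taken to be the largest singular value of the residual table. `qut_sketch` is a hypothetical name, and this is not the package's implementation of qut.

```r
set.seed(123)

qut_sketch <- function(Y, q = 0.95, N = 100) {
  # Expected counts under the assumed null (independence) model
  mu <- outer(rowSums(Y), colSums(Y)) / sum(Y)
  stats <- replicate(N, {
    # Parametric bootstrap: draw a Poisson table from the null model
    Yb <- matrix(rpois(length(mu), mu), nrow(mu))
    tot <- sum(Yb)
    # Largest singular value of the residuals after refitting the null model
    if (tot == 0) 0 else svd(Yb - outer(rowSums(Yb), colSums(Yb)) / tot)$d[1]
  })
  unname(quantile(stats, q))   # q-quantile of the bootstrap statistics
}

Y <- matrix(rpois(15, 1:15), 5)
lambda <- qut_sketch(Y, q = 0.95, N = 50)
```

A threshold chosen this way is large enough that, with probability about q, no interaction would be selected when the null model holds.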