Title: | Main Effects and Interactions in Mixed and Incomplete Data |
---|---|
Description: | Generalized low-rank models for mixed and incomplete data frames. The main function may be used for dimensionality reduction or imputation of numeric, binary and count data (simultaneously). Main effects such as column means, group effects, or effects of row-column side information (e.g. user/item attributes in recommendation systems) may also be modelled in addition to the low-rank model. Geneviève Robin, Olga Klopp, Julie Josse, Éric Moulines, Robert Tibshirani (2018) <arXiv:1806.09734>. |
Authors: | Geneviève Robin |
Maintainer: | Genevieve Robin <[email protected]> |
License: | GPL-3 |
Version: | 0.2.0 |
Built: | 2024-12-21 06:48:33 UTC |
Source: | CRAN |
A dataset containing the answers of 24614 Alabama households to 20 survey questions
acs2016
A data frame with 24614 rows and 20 columns:
Number of persons in household
Access to the internet. 1 yes 0 no.
Sales of agriculture products ($, yearly)
Bathtub or shower. 0 yes 1 no.
Number of bedrooms in household.
Cellular data plan for a smartphone or other mobile device. 1 yes 2 no.
Other computer equipment. 1 yes 2 no
Condo fee ($, monthly)
Electricity ($, monthly)
Food Stamps. 0 no 1 yes
Fuel cost ($, yearly)
Gas ($, monthly)
Mobile home costs ($, yearly)
Refrigerator, 1 yes, 2 no.
Number of rooms in household
Hot and cold running water. 1 yes 2 no
Satellite internet service. 1 yes 2 no.
Water ($, yearly)
Family income allocation flag (past 12 months) 0 No 1 yes.
https://factfinder.census.gov/faces/nav/jsf/pages/searchresults.xhtml?refresh=t
Construct a covariate matrix (predictor matrix) in the right format for input to the mimi or cv.mimi functions, from tables of attributes describing the rows or columns of a data frame.
covmat(n, p, R = NULL, C = NULL, E = NULL, center = T)
n |
number of rows |
p |
number of columns |
R |
nxK1 matrix of row covariates |
C |
pxK2 matrix of column covariates |
E |
(np)xK3 matrix of row-column covariates |
center |
boolean indicating whether the returned covariate matrix should be centered (for identifiability) |
The joint product of R and C, column-bound with E: an (np)x(K1+K2+K3) matrix whose rows are ordered row1col1, row2col1, ..., rowncol1, row1col2, row2col2, ..., rowncolp.
R <- matrix(rnorm(10), 5)
C <- matrix(rnorm(9), 3)
covs <- covmat(5,3,R,C)
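The row ordering described above (row index varying fastest within each column) can be illustrated in base R. This is only a sketch of the layout, not the package's implementation: the names row_id, col_id and X are illustrative, and no centering is applied, unlike covmat with center = T.

```r
# Base-R illustration of the documented row ordering; not the covmat source code.
set.seed(1)
n <- 5; p <- 3
R <- matrix(rnorm(n * 2), n)   # n x K1 row covariates (K1 = 2)
C <- matrix(rnorm(p * 3), p)   # p x K2 column covariates (K2 = 3)

# Entry (i, j) of the n x p data frame sits at position i + (j - 1) * n,
# i.e. row1col1, row2col1, ..., rowncol1, row1col2, ...
row_id <- rep(seq_len(n), times = p)
col_id <- rep(seq_len(p), each = n)

# Each stacked row carries the covariates of its row i and of its column j.
X <- cbind(R[row_id, , drop = FALSE], C[col_id, , drop = FALSE])
dim(X)
```

With this layout, row n + 1 of X describes the entry in row 1, column 2 of the data frame.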
Selection of the regularization parameters (lambda1 and lambda2) of the mimi function by cross-validation.
cv.mimi(y, model = c("low-rank", "covariates"), var.type, x = NULL, groups = NULL, N = 5, algo = c("mcgd", "bcgd"), thresh = 1e-05, maxit = 100, max.rank = NULL, trace.it = F, parallel = F, len = 15)
y |
[matrix, data.frame] incomplete and mixed data frame (nxp) |
model |
either one of "groups", "covariates" or "low-rank", indicating which model should be fitted |
var.type |
vector of length p indicating types of y columns (gaussian, binomial, poisson) |
x |
[matrix, data.frame] covariate matrix (npxq) |
groups |
factor of length n indicating groups (optional) |
N |
[integer] number of cross-validation folds |
algo |
type of algorithm to use, either one of "bcgd" (small dimensions, gaussian and binomial variables) or "mcgd" (large dimensions, poisson variables) |
thresh |
[positive number] convergence threshold, default is 1e-5 |
maxit |
[integer] maximum number of iterations, default is 100 |
max.rank |
[integer] maximum rank of interaction matrix, default is 2 |
trace.it |
[boolean] whether information about convergence should be printed |
parallel |
[boolean] whether the N-fold cross-validation should be parallelized, default value is FALSE |
len |
[integer] the size of the grid |
A list with the following elements
lambda1 |
regularization parameter estimated by cross-validation for nuclear norm penalty (interaction matrix) |
lambda2 |
regularization parameter estimated by cross-validation for l1 norm penalty (main effects) |
errors |
a table containing the prediction errors for all pairs of parameters |
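To make the idea of N-fold cross-validation on an incomplete matrix concrete, here is a generic base-R sketch that assigns the observed entries to N folds and masks one fold. Whether cv.mimi splits the data entry-wise in exactly this way is an assumption; the names obs, fold, held_out and y_train are illustrative.

```r
# Generic entry-wise N-fold hold-out; an assumption, not the cv.mimi source code.
set.seed(42)
n <- 6; p <- 4; N <- 3
y <- matrix(rnorm(n * p), n)
y[sample(n * p, 4)] <- NA              # some entries are missing to begin with

obs  <- which(!is.na(y))               # linear indices of the observed entries
fold <- sample(rep(seq_len(N), length.out = length(obs)))  # random fold labels

k <- 1                                 # fold currently held out
held_out <- obs[fold == k]
y_train <- y
y_train[held_out] <- NA                # hide the held-out entries from the fit

# A model fitted on y_train would be scored by comparing its predictions
# at held_out with the true values y[held_out].
```

Repeating this for k = 1, ..., N and for each candidate pair of regularization parameters gives a table of prediction errors of the kind returned in errors.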
Main function: low-rank models to analyze and impute mixed and incomplete data frames with numeric, binary and discrete variables, and missing values.
mimi(y, model = c("low-rank", "multilevel", "covariates"), x = NULL, groups = NULL, var.type = c("gaussian", "binomial", "poisson"), lambda1, lambda2, algo = c("mcgd", "bcgd"), maxit = 100, alpha0 = NULL, theta0 = NULL, thresh = 1e-05, trace.it = F, max.rank = NULL)
y |
nxp matrix of observations |
model |
one of "low-rank", "multilevel" or "covariates", indicating which model should be fitted |
x |
(np)xN matrix of covariates (optional) |
groups |
factor of length n indicating groups (optional) |
var.type |
vector of length p indicating the data types of the columns of y (gaussian, binomial or poisson) |
lambda1 |
positive number, regularization parameter for the nuclear norm penalty |
lambda2 |
positive number, regularization parameter for the l1 norm penalty |
algo |
type of algorithm to use, either one of "bcgd" (small dimensions, gaussian and binomial variables) or "mcgd" (large dimensions, poisson variables) |
maxit |
integer, maximum number of iterations |
alpha0 |
vector of length N: initial value of regression parameter (optional) |
theta0 |
matrix of size nxp: initial value of interactions (optional) |
thresh |
positive number, convergence criterion |
trace.it |
boolean indicating whether convergence information should be printed |
max.rank |
integer, maximum rank of interaction matrix theta |
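The roles of lambda1 and lambda2 can be summarized by a penalized objective of the following form. This is a sketch based on the argument descriptions above and on the cited arXiv paper, not a statement of the exact criterion coded in the package: the loss $\mathcal{L}$ (a negative log-likelihood over the observed entries) and the map $f_x$ (which arranges the main effects $x\alpha$ into an $n \times p$ matrix) are notation assumed here.

```latex
(\hat{\alpha}, \hat{\Theta}) \in \operatorname*{arg\,min}_{\alpha,\ \Theta}
  \; \mathcal{L}\bigl(y;\, f_x(\alpha) + \Theta\bigr)
  \;+\; \lambda_1 \,\lVert \Theta \rVert_{*}
  \;+\; \lambda_2 \,\lVert \alpha \rVert_{1}
```

The nuclear norm $\lVert \Theta \rVert_{*}$ encourages a low-rank interaction matrix theta, while the l1 norm $\lVert \alpha \rVert_{1}$ encourages sparse main effects alpha.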
A list with the following elements
alpha |
vector of main effects |
theta |
interaction matrix |
n = 6; p = 2
y1 <- matrix(rnorm(mean = 0, n * p), nrow = n)
y2 <- matrix(rnorm(mean = 0, n * p), nrow = n)
y3 <- matrix(rnorm(mean = 2, n * p), nrow = n)
y <- cbind(matrix(rnorm(mean = c(y1), n * p), nrow = n),
           matrix(rbinom(n * p, prob = c(exp(y2)/(1+exp(y2))), size = 1), nrow = n),
           matrix(rpois(n * p, lambda = c(exp(y3))), nrow = n))
var.type <- c(rep("gaussian", p), rep("binomial", p), rep("poisson", p))
idx_NA <- sample(1:(3 * n * p), size = round(0.01 * 3 * n * p))
y[idx_NA] <- NA
res <- mimi(y, model = "low-rank", var.type = var.type, lambda1 = 1, maxit=5)
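The example above simulates binary columns through exp(y2)/(1+exp(y2)) and count columns through exp(y3), which suggests that an estimated parameter matrix may be mapped back to the data scale with the matching inverse links before filling in missing entries. The base-R sketch below assumes those canonical links. The helper inv_link and the matrix param are hypothetical and not part of the package, and how res$alpha and res$theta combine into such a parameter matrix depends on the fitted model and is not shown.

```r
# Hypothetical helper (not part of the package): map a matrix of natural
# parameters to expected values, column by column, assuming canonical links.
inv_link <- function(param, var.type) {
  out <- param
  for (j in seq_along(var.type)) {
    out[, j] <- switch(var.type[j],
      gaussian = param[, j],          # identity link
      binomial = plogis(param[, j]),  # logistic: exp(t) / (1 + exp(t))
      poisson  = exp(param[, j]),     # log link
      stop("unknown variable type: ", var.type[j]))
  }
  out
}

var.type <- c("gaussian", "binomial", "poisson")
# Stand-in for an estimated parameter matrix (2 rows, 3 columns).
param <- matrix(c(0.5, -1, 0, 2, log(3), log(4)), nrow = 2)
# Incomplete data on the same 2 x 3 layout.
y <- matrix(c(NA, -0.8, 1, NA, 2, NA), nrow = 2)

mu <- inv_link(param, var.type)
y_imp <- y
y_imp[is.na(y)] <- mu[is.na(y)]       # fill only the missing entries
y_imp
```

Observed entries are left untouched. Binary entries are filled with a probability and count entries with an expected count, so a further rounding or thresholding step may be wanted depending on the application.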