Package 'NPBayesImputeCat' reference manual

Title:	Non-Parametric Bayesian Multiple Imputation for Categorical Data
Description:	These routines create multiple imputations of missing at random categorical data, and create multiply imputed synthesis of categorical data, with or without structural zeros. Imputations and syntheses are based on Dirichlet process mixtures of multinomial distributions, which is a non-parametric Bayesian modeling approach that allows for flexible joint modeling, described in Manrique-Vallier and Reiter (2014) <doi:10.1080/10618600.2013.844700>.
Authors:	Quanli Wang, Daniel Manrique-Vallier, Jerome P. Reiter and Jingchen Hu
Maintainer:	Jingchen Hu <jingchen.monika.hu@gmail.com>
License:	GPL (>= 3)
Version:	0.5
Built:	2025-03-07 06:59:05 UTC
Source:	CRAN

Bayesian Multiple Imputation for Large-Scale Categorical Data with Structural Zeros

Description

This package implements a fully Bayesian, joint modeling approach to multiple imputation for categorical data based on latent class models with structural zeros. The idea is to model the implied contingency table of the categorical variables as a mixture of independent multinomial distributions, estimating the mixture distributions nonparametrically with Dirichlet process prior distributions. Mixtures of multinomials can describe arbitrarily complex dependencies and are computationally expedient, so that they are effective general purpose multiple imputation engines. In contrast to other approaches based on loglinear models or chained equations, the mixture models avoid the need to specify (potentially many) models, which can be a very time-consuming task with no guarantee of a theoretically coherent set of models. The package is designed to include for structural zeros, i.e., certain combinations of variables are not possible a priori.

Details

Package:	NPBayesImputeCat
Type:	Package
Version:	0.4
Date:	2021-06-30
License:	GPL(>=3)

Author(s)

Quanli Wang, Daniel Manrique-Vallier, Jerome P. Reiter and Jingchen Hu

Maintainer: Quanli Wang<quanli@stat.duke.edu>

References

Manrique-Vallier, D. and Reiter, J.P. (2013), "Bayesian Estimation of Discrete Multivariate Latent Structure Models with Structural Zeros", JCGS.

Si, Y. and Reiter, J.P. (2013), "Nonparametric Bayesian multiple imputation for incomplete categorical variables in large-scale assessment surveys", Journal of Educational and Behavioral Statistics, 38, 499 - 521

Manrique-Vallier, D. and Reiter, J.P. (2014), "Bayesian Multiple Imputation for Large-Scale Categorical Data with Structural Zeros", Survey Methodology.

Examples

require(NPBayesImputeCat)
#Please use NYexample data set for a more realistic example
data('NYMockexample')

#create the model
model <- CreateModel(X,MCZ,10,10000,0.25,0.25,8888)

#run 1 burnins, 2 mcmc iterations and thin every 2 iterations
model$Run(1,2,2,TRUE)

#retrieve parameters from the final iteration
result <- model$snapshot

#convert ImputedX matrix to dataframe, using proper factors/names etc.
ImputedX <- GetDataFrame(result$ImputedX,X)
#View(ImputedX)

#Most exhauststic examples can be found in the demo below
#demo(example_short)
#demo(example)

require(NPBayesImputeCat)
#Please use NYexample data set for a more realistic example
data('NYMockexample')

#create the model
model <- CreateModel(X,MCZ,10,10000,0.25,0.25,8888)

#run 1 burnins, 2 mcmc iterations and thin every 2 iterations
model$Run(1,2,2,TRUE)

#retrieve parameters from the final iteration
result <- model$snapshot

#convert ImputedX matrix to dataframe, using proper factors/names etc.
ImputedX <- GetDataFrame(result$ImputedX,X)
#View(ImputedX)

#Most exhauststic examples can be found in the demo below
#demo(example_short)
#demo(example)

Estimating marginal and joint probabilities in imputed or synthetic datasets

Description

Estimating marginal and joint probabilities in imputed or synthetic datasets

Usage

compute_probs(InputData, varlist)
compute_probs(InputData, varlist)

Arguments

`InputData`	a list of imputed or synthetic datasets
`varlist`	a list of variable names (or combination of names) to evaluate (marginal or joint) probabilities for

Value

Results: a list of marginal and joint probability results after combining rules

Create and initialize the Lcm model object

Description

CreateModel creates and initializes an Lcm Lcm object for non-parametric multiple imputation of discrete multivariate categorical data with or without structural zeros.

Usage

CreateModel(X, MCZ, K, Nmax, aalpha, balpha,seed)
CreateModel(X, MCZ, K, Nmax, aalpha, balpha,seed)

Arguments

`X`	a data frame with the dataset with missing values. All variables must be unordered factors.
`MCZ`	a dataframe with the definition of the structural zeros. Placeholder components are represented with NAs. Variables in MCZ must be factors with the same levels as X. Rows do not need to define disjoint regions of the contingency table. See Manrique-Vallier and Reiter (2014) for details of the definition of structural zeros. MCZ should be set to NULL when there are no structure zeros.
`K`	the maximum number of mixture components.
`Nmax`	An upper truncation limit for the augmented sample size. This parameter will be ignored(set to 0) when there is no structural zeros.
`aalpha`	the hyper parameter 'a' for alpha in stick-breaking prior distribution.
`balpha`	the hyper parameter 'b' for alpha in stick-breaking prior distribution.
`seed`	the random seed for sampling. When setting to NULL(default), the random seed will be set randomly.

Details

This function should be the first function one should call to use the 'NPBayesImputeCat' library. The returned model is a Lcm object. See ?Lcm for more details on the fields available and their arguments.

Value

CreateModel returns an Lcm object. The returned model object will be referenced in all subsequent calls.

References

Examples

require(NPBayesImputeCat)
#Please use NYexample data set for a more realistic example
data('NYMockexample')

#create the model
model <- CreateModel(X,MCZ,10,10000,0.25,0.25,8888)

#run 1 burnins, 2 mcmc iterations and thin every 2 iterations
model$Run(1,2,2,FALSE)

#retrieve parameters from the final iteration
result <- model$snapshot

#convert ImputedX matrix to dataframe, using proper factors/names etc.
ImputedX <- GetDataFrame(result$ImputedX,X)
#View(ImputedX)
require(NPBayesImputeCat)
#Please use NYexample data set for a more realistic example
data('NYMockexample')

#create the model
model <- CreateModel(X,MCZ,10,10000,0.25,0.25,8888)

#run 1 burnins, 2 mcmc iterations and thin every 2 iterations
model$Run(1,2,2,FALSE)

#retrieve parameters from the final iteration
result <- model$snapshot

#convert ImputedX matrix to dataframe, using proper factors/names etc.
ImputedX <- GetDataFrame(result$ImputedX,X)
#View(ImputedX)

Use DPMPM models to impute missing data where there are no structural zeros

Description

Use DPMPM models to impute missing data where there are no structural zeros

Usage

DPMPM_nozeros_imp(X, nrun, burn, thin, K, aalpha, balpha, m, seed, silent)
DPMPM_nozeros_imp(X, nrun, burn, thin, K, aalpha, balpha, m, seed, silent)

Arguments

`X`	data frame for the data containing missing values
`nrun`	number of mcmc iterations
`burn`	number of burn-in iterations
`thin`	thining parameter for outputing iterations
`K`	number of latent classes
`aalpha`	the hyperparameters in stick-breaking prior distribution for alpha
`balpha`	the hyperparameters in stick-breaking prior distribution for alpha
`m`	number of imputations
`seed`	choice of random seed
`silent`	Default to TRUE. Set this parameter to FALSE if more iteration info are to be printed

Value

`impdata`	m imputed datasets
`origdata`	original data containing missing values
`alpha`	saved posterior draws of alpha, which can be used to check MCMC convergence
`kstar`	saved number of occupied mixture components, which can be used to track whether K is large enough

Use DPMPM models to synthesize data where there are no structural zeros

Description

Use DPMPM models to synthesize data where there are no structural zeros

Usage

DPMPM_nozeros_syn(X, dj, nrun, burn, thin, K, aalpha, balpha, m, vars, seed, silent)
DPMPM_nozeros_syn(X, dj, nrun, burn, thin, K, aalpha, balpha, m, vars, seed, silent)

Arguments

`X`	data frame for the original data
`dj`	a vector recording the number of categories of the variables
`nrun`	number of mcmc iterations
`burn`	number of burn-in iterations
`thin`	thining parameter for outputing iterations
`K`	number of latent classes
`aalpha`	the hyperparameters in stick-breaking prior distribution for alpha
`balpha`	the hyperparameters in stick-breaking prior distribution for alpha
`m`	number of synthetic datasets
`vars`	the names of variables to be synthesized
`seed`	choice of random seed
`silent`	Default to TRUE. Set this parameter to FALSE if more iteration info are to be printed

Value

`syndata`	m synthetic datasets
`origdata`	original data
`alpha`	saved posterior draws of alpha, which can be used to check MCMC convergence
`kstar`	saved number of occupied mixture components, which can be used to track whether K is large enough

Use DPMPM models to impute missing data where there are no structural zeros

Description

Use DPMPM models to impute missing data where there are no structural zeros

Usage

DPMPM_zeros_imp(X, MCZ, Nmax, nrun, burn, thin, K, aalpha, balpha, m, seed, silent)
DPMPM_zeros_imp(X, MCZ, Nmax, nrun, burn, thin, K, aalpha, balpha, m, seed, silent)

Arguments

`X`	data frame for the data containing missing values
`MCZ`	data frame containing the structural zeros definition
`Nmax`	an upper truncation limit for the augmented sample size
`nrun`	number of mcmc iterations
`burn`	number of burn-in iterations
`thin`	thining parameter for outputing iterations
`K`	number of latent classes
`aalpha`	the hyperparameters in stick-breaking prior distribution for alpha
`balpha`	the hyperparameters in stick-breaking prior distribution for alpha
`m`	number of imputations
`seed`	choice of random seed
`silent`	Default to TRUE. Set this parameter to FALSE if more iteration info are to be printed

Value

`impdata`	m imputed datasets
`origdata`	original data containing missing values
`alpha`	save posterior draws of alpha, which can be used to check MCMC convergence
`kstar`	saved number of occupied mixture components, which can be used to track whether K is large enough
`Nmax`	saved posterior draws of the augmented sample size, which can be used to check MCMC convergence

Fit GLM models for imputed or synthetic datasets

Description

Fit GLM models for imputed or synthetic datasets

Usage

fit_GLMs(InputData, exp)
fit_GLMs(InputData, exp)

Arguments

`InputData`	a list of imputed or synthetic datasets
`exp`	GLM expression (for polr and nnet, those libraries should be loaded first)

Value

Results: a list of GLM results

Convert imputed data to a dataframe, using the same setting from original input data.

Description

This is a utility function to convert the imputed data matrix to a dataframe. This function will be implemented as a RCPP internal function later on.

Usage

GetDataFrame(dest, from, cols = 1:NCOL(from))
GetDataFrame(dest, from, cols = 1:NCOL(from))

Arguments

`dest`	the imputed output data matrix.
`from`	the original input dataframe.
`cols`	optinal. Always use default for now.

Value

The returned dataframe object for imputed data.

Examples

require(NPBayesImputeCat)
#Please use NYexample data set for a more realistic example
data('NYMockexample')

#create the model
model <- CreateModel(X,MCZ,10,10000,0.25,0.25,8888)

#run 1 burnins, 2 mcmc iterations and thin every 2 iterations
model$Run(1,2,2,TRUE)

#retrieve parameters from the final iteration
result <- model$snapshot

#convert ImputedX matrix to dataframe, using proper factors/names etc.
ImputedX <- GetDataFrame(result$ImputedX,X)
#View(ImputedX)
require(NPBayesImputeCat)
#Please use NYexample data set for a more realistic example
data('NYMockexample')

#create the model
model <- CreateModel(X,MCZ,10,10000,0.25,0.25,8888)

#run 1 burnins, 2 mcmc iterations and thin every 2 iterations
model$Run(1,2,2,TRUE)

#retrieve parameters from the final iteration
result <- model$snapshot

#convert ImputedX matrix to dataframe, using proper factors/names etc.
ImputedX <- GetDataFrame(result$ImputedX,X)
#View(ImputedX)

Convert disjointed structrual zeros to a dataframe, using the same setting from original structrual zero data.

Description

This is a utility function to convert the disjointed structrual zero matrix to a dataframe. This function will be implemented as a RCPP internal function later on.

Usage

GetMCZ(dest, from, mcz, cols = 1:NCOL(from))
GetMCZ(dest, from, mcz, cols = 1:NCOL(from))

Arguments

`dest`	the output data matrix for disjointed structrual zeros.
`from`	the original input dataframe.
`mcz`	the original input dataframe for structrual zeros.
`cols`	optinal. Always use default for now.

Value

The returned dataframe object for disjointed structrual zeros.

References

Perform MCMC diagnostics for kstar

Description

A helper function to perform MCMC diagnostics for kstar

Usage

kstar_MCMCdiag(kstar, nrun, burn, thin)
kstar_MCMCdiag(kstar, nrun, burn, thin)

Arguments

`kstar`	the vector output of kstar from running the DPMPM model
`nrun`	number of MCMC iterations used in running the DPMPM model
`burn`	number of burn-in iterations used in running the DPMPM model
`thin`	number of thinning used in running the DPMPM model

Value

`Traceplot`	the traceplot of kstar post burn-in and thinning
`Autocorrplot`	the autocorrelation plot of kstar post burn-in and thinning

Class `"Rcpp_Lcm"`

Description

This class implements the MCMC sampler for non-parametric imputation of discrete multivariate data described in Manrique-Vallier and Reiter (2014). It provides methods for updating and monitoring the sampler.

Details

Rcpp_lcm objects should be created with CreateModel. Please see the examples in the demo folder for more detailed explanation on model fitting and parameter tracing.

Extends

Class "C++Object", directly.

All reference classes extend and inherit methods from "envRefClass".

Fields

CurrentIteration:

the total number of iterations that have been run so far.

EnableTracer:

to check tracer status or to enable/disable the tracer.

MCZ:

the disjointed structural zero matrix.

snapshot:

retrieve a list with the current state of all the parameters in the sampler, including the imputed sample. A call the the "snapshot" method returns a list with the following components:

alpha:: the concentration parameter of the stick breaking prior.
k_star:: the effective number number of latent classes (mixture components)
Nmis:: the size of the augmented sample.
nu:: a vector with the mixture weights
z:: a matrix with the current latent class assignment of each member of the sample
ImputedX:: the current raw imputed dataset. Use GetDataFrame to convert the raw data to a data frame of factors as defined in the input data set.
psi:: The conditional multinomial probabilties. A Lmax * K * J array, where Lmax is the maximum number of levels of all discrete factors in the dataset, J is the number of factors in the dataset, and K is the number of latent classes. Since variables might have different numbers of levels, unused entries in the first dimension are filled with NAs to complete Lmax.

traceable:

list of model parameters that can be traced by the tracer.

traced:

list of model parameters that are traced.

Methods

SetTrace(paralist,num_of_iterations):

set parameters to be traced.

paralist:: a list of parameters to be traced.
num_of_iterations:: the maximum number of traced iterations.

Run(burnin, iter, thinning,silent):

run MCMC iterations.

burnin:: number of burn in iterations.
iter:: number of MCMC iterations.
thinning:: thinning parameter.
silent:: boolean indication if more iteration should be printed.

Resume():

resume from an interrupted call to run method.

Parameters(paralist):

retrieve a selected list of model parameters from last MCMC iteration.

paralist:: a list of parameters to be traced.

GetTrace():

retrieve all traced iterations. Returns a list with all the parameters set using the method SetTrace(). See description of snapshotreference method for a description of the parameters.

References

Examples

require(NPBayesImputeCat)
#Please use NYexample data set for a more realistic example
data('NYMockexample')

#create the model
model <- CreateModel(X,MCZ,10,10000,0.25,0.25,8888)

#run 1 burnins, 2 mcmc iterations and thin every 2 iterations
model$Run(1,2,2,TRUE)

#retrieve parameters from the final iteration
result <- model$snapshot

#convert ImputedX matrix to dataframe, using proper factors/names etc.
ImputedX <- GetDataFrame(result$ImputedX,X)
#View(ImputedX)
require(NPBayesImputeCat)
#Please use NYexample data set for a more realistic example
data('NYMockexample')

#create the model
model <- CreateModel(X,MCZ,10,10000,0.25,0.25,8888)

#run 1 burnins, 2 mcmc iterations and thin every 2 iterations
model$Run(1,2,2,TRUE)

#retrieve parameters from the final iteration
result <- model$snapshot

#convert ImputedX matrix to dataframe, using proper factors/names etc.
ImputedX <- GetDataFrame(result$ImputedX,X)
#View(ImputedX)

Plot estimated marginal probabilities from observed data vs imputed datasets

Description

Plot estimated marginal probabilities from observed data vs imputed datasets

Usage

marginal_compare_all_imp(obsdata, impdata, vars)
marginal_compare_all_imp(obsdata, impdata, vars)

Arguments

`obsdata`	he observed data
`impdata`	the list of m imputed datasets
`vars`	the variable of interest

Value

`Plot`	the barplot
`Comparison`	a table of marginal probabilies from observed data vs imputed datasets

Plot estimated marginal probabilities from observed data vs synthetic datasets

Description

Plot estimated marginal probabilities from observed data vs synthetic datasets

Usage

marginal_compare_all_syn(obsdata, syndata, vars)
marginal_compare_all_syn(obsdata, syndata, vars)

Arguments

`obsdata`	the observed data
`syndata`	the list of m imputed datasets
`vars`	the variable of interest

Value

`Plot`	the barplot
`Comparison`	a table of marginal probabilies from observed data vs imputed datasets

Example dataframe for structrual zeros based on the NYMockexample dataset.

Description

Example dataframe for structrual zeros based on the NYMockexample dataset. It contains 8 structural zero cases with 10 variables.

[,1]	AGE = 15 and EDUC = 8
[,2]	AGE = 16 and VESTAT = 2
[,3]	OWNERSHIP = 0 and MORTGAGE = 4
[,4]	AGE = 17 and EDUC = 11
[,5]	AGE = [36, 50] and EMPSTAT = 0
[,6]	AGE > 70 and DISABWRK = 0
[,7]	AGE < 15 and EDUC = 10
[,8]	OWNERSHIP = 2 and MORTGAGE = 1

Pool probability estimates from imputed or synthetic datasets

Description

Pool probability estimates from imputed or synthetic datasets

Usage

pool_estimated_probs(ComputeProbsResults, method = 
                      c("imputation", "synthesis_full", "synthesis_partial"))
pool_estimated_probs(ComputeProbsResults, method = 
                      c("imputation", "synthesis_full", "synthesis_partial"))

Arguments

`ComputeProbsResults`	output from the compute_probs function
`method`	choose between "imputation", "synthesis_full", "synthesis_partial"

Value

Results: a list of marginal and joint probability results after combining rules

Pool estimates of fitted GLM models in imputed or synthetic datasets

Description

Pool estimates of fitted GLM models in imputed or synthetic datasets

Usage

pool_fitted_GLMs(GLMResults, method = 
                      c("imputation", "synthesis_full", "synthesis_partial"))
pool_fitted_GLMs(GLMResults, method = 
                      c("imputation", "synthesis_full", "synthesis_partial"))

Arguments

`GLMResults`	output from the fit_GLMs function
`method`	choose between "imputation", "synthesis_full", "synthesis_partial"

Value

Results: a list of GLM results after combining rules

Rcpp implemenation of the Lcm functions

Description

This is the Rcpp implementation of the model class Lcm. All exposed functions and properties are documented in Lcm.

Example dataframe for structrual zeros based on the ss16pusa_sample_zeros dataset.

Description

Example dataframe for structrual zeros based on the ss16pusa_sample_zeros dataset. It contains 8 structural zero cases with 5 variables.

[,1]	AGEP = 16 and SCHL = Bachelor's degree
[,2]	AGEP = 16 and SCHL = Doctorate degree
[,3]	AGEP = 16 and SCHL = Master's degree
[,4]	AGEP = 16 and SCHL = Professional degree
[,5]	AGEP = 17 and SCHL = Bachelor's degree
[,6]	AGEP = 17 and SCHL = Doctorate degree
[,7]	AGEP = 17 and SCHL = Master's degree
[,8]	AGEP = 17 and SCHL = Professional degree

Example dataframe for structrual zeros based on the ss16pusa_sample_zeros dataset.

Description

Example dataframe for structrual zeros based on the ss16pusa_sample_zeros dataset. It contains 8 structural zero cases with 5 variables.

[,1]	AGEP = 16 and SCHL = Bachelor's degree
[,2]	AGEP = 16 and SCHL = Doctorate degree
[,3]	AGEP = 16 and SCHL = Master's degree
[,4]	AGEP = 16 and SCHL = Professional degree
[,5]	AGEP = 17 and SCHL = Bachelor's degree
[,6]	AGEP = 17 and SCHL = Doctorate degree
[,7]	AGEP = 17 and SCHL = Master's degree
[,8]	AGEP = 17 and SCHL = Professional degree

Example dataframe for input categorical data without structural zeros (without missing values).

Description

Example dataframe for input categorical data without structural zeros (without missing values). It contains 1000 observations and 3 variables.

[,1]	MAR	marital status	5 levels: Married; Widowed; Divorced; Separated; Never married.
[,2]	SEX	sex	2 levels: Male; Female.
[,3]	WKL	When last worked	3 levels: Within the last 12 months; 1-5 years ago;
			Over 5 years ago or never worked.

Example dataframe for input categorical data without structural zeros (with missing values).

Description

Example dataframe for input categorical data without structural zeros (with missing values). It contains 1000 observations and 3 variables.

[,1]	MAR	marital status	5 levels: Married; Widowed; Divorced; Separated; Never married.
[,2]	SEX	sex	2 levels: Male; Female.
[,3]	WKL	When last worked	3 levels: Within the last 12 months; 1-5 years ago;
			Over 5 years ago or never worked.

Example dataframe for input categorical data with structural zeros (without missing values).

Description

Example dataframe for input categorical data with structural zeros (without missing values). It contains 1000 observations and 5 variables.

[,1]	AGEP	age	7 levels: 16; 17; [18, 24]; [25, 35]; [36, 50]; [51, 70]; (70, ).
[,2]	MAR	marital status	5 levels: Married; Widowed; Divorced; Separated;
			Never married.
[,3]	SCHL	educational attainment	9 levels: Up to K0; Some K12, no diploma;
			High school diploma or GED; Some college, no degree;
			Associate's degree; Bachelor's degree; Master's degree;
			Professional degree; Doctorate degree.
[,4]	SEX	sex	2 levels: Male; Female.
[,5]	WKL	When last worked	3 levels: Within the last 12 months; 1-5 years ago;
			Over 5 years ago or never worked.

Example dataframe for input categorical data with structural zeros (with missing values).

Description

Example dataframe for input categorical data with structural zeros (with missing values). It contains 1000 observations and 5 variables.

[,1]	AGEP	age	7 levels: 16; 17; [18, 24]; [25, 35]; [36, 50]; [51, 70]; (70, ).
[,2]	MAR	marital status	5 levels: Married; Widowed; Divorced; Separated;
			Never married.
[,3]	SCHL	educational attainment	9 levels: Up to K0; Some K12, no diploma;
			High school diploma or GED; Some college, no degree;
			Associate's degree; Bachelor's degree; Master's degree;
			Professional degree; Doctorate degree.
[,4]	SEX	sex	2 levels: Male; Female.
[,5]	WKL	When last worked	3 levels: Within the last 12 months; 1-5 years ago;
			Over 5 years ago or never worked.

Allow user to update the model with data matrix of same kind.

Description

Allow user to replace initial matrix with a new data matrix of same size and same number of factors. This is not intended for general use and is only useful for very specific circumstance.

Usage

UpdateX(model, X)
UpdateX(model, X)

Arguments

`model`	The Rcpp model object created by the CreateModel function.
`X`	a data frame with the dataset with missing values. All variables must be unordered factors.

Example dataframe for input categorical data with missing values based on the NYMockexample dataset.

Description

Example dataframe for input categorical data with missing values based on the NYMockexample dataset. It contains 2000 observations and 10 variables.

[,1]	OWNERSHIP	ownership of dwelling	3 levels: N/A; Owned or being bought (loan);
			Rented.
[,2]	MORTGAGE	mortgate status	4 levels: N/A; No, owned free and clear;
			Yes, mortgaged / deed of trust or similar debt;
			Yes, contract to purchase.
[,3]	AGE	age	9 levels: [0, 14]; 15; 16; 17; [18, 24]; [25, 35]; [36, 50];
			9 [51, 70]; [71, ).
[,4]	SEX	sex	2 levels: Male; Female.
[,5]	MARST	martial status	6 levels: Married, spouse present; Married, spouse absent;
			Separated; Divorced; Widowed; Never married / single.
[,6]	RACESING	single race identification	5 levels: White; Black; American Indian / Alaska Native;
			Asian and / or Pacific Islander; Other race, non-Hispanic.
[,7]	EDUC	educational attainment	11 levels: N/A or no schooling; Nursery school to grade 4;
			Grade 5, 6, 7, or 8; Grade 9; Grade 10; Grade 11;
			Grade 12; 1 year of college; 2 years of college;
			4 years of college; 5+ years of college.
[,8]	EMPSTAT	employment status	4 levels: N/A; Employed; Unemployed; Not in labor force.
[,9]	DISABWRK	work disability status	3 levels: N/A; No disability that affects work;
			Disability causes difficulty working.
[,10]	VESTAT	veteran status	3 levels: N/A; Not a veteran; Veteran.

Package 'NPBayesImputeCat'

Help Index

Bayesian Multiple Imputation for Large-Scale Categorical Data with Structural Zeros

Description

Details

Author(s)

References

Examples

Estimating marginal and joint probabilities in imputed or synthetic datasets

Description

Usage

Arguments

Value

Create and initialize the Lcm model object

Description

Usage

Arguments

Details

Value

References

Examples

Use DPMPM models to impute missing data where there are no structural zeros

Description

Usage

Arguments

Value

Use DPMPM models to synthesize data where there are no structural zeros

Description

Usage

Arguments

Value

Use DPMPM models to impute missing data where there are no structural zeros

Description

Usage

Arguments

Value

Fit GLM models for imputed or synthetic datasets

Description

Usage

Arguments

Value

Convert imputed data to a dataframe, using the same setting from original input data.

Description

Usage

Arguments

Value

Examples

Convert disjointed structrual zeros to a dataframe, using the same setting from original structrual zero data.

Description

Usage

Arguments

Value

References

Perform MCMC diagnostics for kstar

Description

Usage

Arguments

Value

Class "Rcpp_Lcm"

Description

Details

Extends

Fields

Methods

References

Examples

Plot estimated marginal probabilities from observed data vs imputed datasets

Description

Usage

Arguments

Value

Plot estimated marginal probabilities from observed data vs synthetic datasets

Description

Usage

Arguments

Value

Example dataframe for structrual zeros based on the NYMockexample dataset.

Description

Pool probability estimates from imputed or synthetic datasets

Description

Class `"Rcpp_Lcm"`