Title: | Analysis of Multivariate Normal Datasets with Missing Values |
---|---|
Description: | An integrated set of functions for the analysis of multivariate normal datasets with missing values, including implementation of the EM algorithm, data augmentation, and multiple imputation. |
Authors: | Ported to R by Alvaro A. Novo <[email protected]>. Original by Joseph L. Schafer <[email protected]>. |
Maintainer: | John Fox <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0-11.1 |
Built: | 2025-01-14 06:23:36 UTC |
Source: | CRAN |
Changes missing value code to NA. It's called from 'prelim.norm'.
.code.to.na(x, mvcode)
.code.to.na(x, mvcode)
x |
data object. |
mvcode |
internal input of 'prelim.norm'. |
Initial data object with missing values code changed to NA.
Changes NA's to single precision missing value code It's called internally by other functions in the package, e.g., 'prelim.norm'.
.na.to.snglcode(x, mvcode)
.na.to.snglcode(x, mvcode)
x |
data object. |
mvcode |
internal input of 'prelim.norm'. |
Initial data object with missing values code precision changed to sinlge.
Data augmentation under a normal-inverted Wishart prior. If no prior is specified by the user, the usual "noninformative" prior for the multivariate normal distribution is used. This function simulates one or more iterations of a single Markov chain. Each iteration consists of a random imputation of the missing data given the observed data and the current parameter value (I-step), followed by a draw from the posterior distribution of the parameter given the observed data and the imputed data (P-step).
da.norm(s, start, prior, steps=1, showits=FALSE, return.ymis=FALSE)
da.norm(s, start, prior, steps=1, showits=FALSE, return.ymis=FALSE)
s |
summary list of an incomplete normal data matrix produced by the
function |
start |
starting value of the parameter. This is a parameter vector in packed
storage, such as one created by the function |
prior |
optional prior distribution. This is a list containing the
hyperparameters of a normal-inverted Wishart distribution. In order,
the elements of the list are: tau (a scalar), m (a scalar), mu0 (a
vector of length |
steps |
number of data augmentation iterations to be simulated. |
showits |
if |
return.ymis |
if |
if return.ymis=FALSE
, returns a parameter vector, the result of the last
P-step. If the value of steps
was large enough to guarantee
approximate stationarity, then this parameter can be regarded as a
proper draw from the observed-data posterior, independent of start
.
If return.ymis=TRUE
, then this function returns a list of the following
two components:
parameter |
a parameter vector, the result of the last P-step |
ymis |
a vector of missing values, the result of the last I-step. The length
of this vector is |
Before this function may be used, the random number generator seed
must be initialized with rngseed
at least once in the current S
session.
See Chapter 5 of Schafer (1996).
rngseed
, em.norm
, prelim.norm
, and getparam.norm
.
data(mdata) s <- prelim.norm(mdata) thetahat <- em.norm(s) #find the MLE for a starting value rngseed(1234567) #set random number generator seed theta <- da.norm(s,thetahat,steps=20,showits=TRUE) # take 20 steps getparam.norm(s,theta) # look at result
data(mdata) s <- prelim.norm(mdata) thetahat <- em.norm(s) #find the MLE for a starting value rngseed(1234567) #set random number generator seed theta <- da.norm(s,thetahat,steps=20,showits=TRUE) # take 20 steps getparam.norm(s,theta) # look at result
Performs maximum-likelihood estimation on the matrix of incomplete data using the EM algorithm. Can also be used to find a posterior mode under a normal-inverted Wishart prior supplied by the user.
em.norm(s, start, showits=TRUE, maxits=1000, criterion=0.0001, prior)
em.norm(s, start, showits=TRUE, maxits=1000, criterion=0.0001, prior)
s |
summary list of an incomplete normal data matrix produced by the
function |
start |
optional starting value of the parameter. This is a parameter vector
in packed storage, such as one created by the function
|
showits |
if |
maxits |
maximum number of iterations performed. The algorithm will stop if the parameter still has not converged after this many iterations. |
criterion |
convergence criterion. The algorithm stops when the maximum relative difference in all of the estimated means, variances, or covariances from one iteration to the next is less than or equal to this value. |
prior |
optional prior distribution. This is a list containing the hyperparameters of a normal-inverted Wishart distribution. In order, the elements of the list are: tau (a scalar), m (a scalar), mu0 (a vector of length ncol(x)), and lambdainv (a matrix of dimension c(ncol(x),ncol(x))). The elements of mu0 ans lambdainv apply to the data after transformation, i.e. after the columns have been centered and scaled to have mean zero and variance one. If no prior is supplied, the default is a uniform prior, which results in maximum-likelihood estimation. |
The default starting value takes all means on the transformed scale to be equal to zero, and covariance matrix on the transformed scale equal to the identity. All important computations are carried out in double precision, using the sweep operator.
a vector representing the maximum-likelihood estimates of the normal
parameters. This vector contains means, variances, and covariances on
the transformed scale in packed storage. The parameter can be
transformed back to the original scale and put into a more
understandable format by the function getparam.norm
.
See Section 5.3 of Schafer (1994).
prelim.norm
, getparam.norm
, and makeparam.norm
.
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations thetahat <- em.norm(s) #compute mle getparam.norm(s,thetahat,corr=TRUE)$r #look at estimated correlations
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations thetahat <- em.norm(s) #compute mle getparam.norm(s,thetahat,corr=TRUE)$r #look at estimated correlations
Takes a parameter vector, such as one produced by em.norm or da.norm, and returns a list of parameters on the original scale.
getparam.norm(s, theta, corr=FALSE)
getparam.norm(s, theta, corr=FALSE)
s |
summary list of an incomplete normal data matrix created by the
function |
theta |
vector of normal parameters expressed on transformed scale in packed
storage, such as one produced by the function |
corr |
if |
if corr=FALSE
, a list containing the components mu
and sigma
; if
corr=TRUE
, a list containing the components mu
, sdv
, and r
. The
components are:
mu |
vector of means. Elements are in the same order and on the same scale as the columns of the original data matrix, and with names corresponding to the column names of the original data matrix. |
sigma |
matrix of variances and covariances. |
sdv |
vector of standard deviations. |
r |
matrix of correlations. |
prelim.norm
and makeparam.norm
.
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations thetahat <- em.norm(s) #compute MLE getparam.norm(s,thetahat,corr=TRUE)$r #look at estimated correlations
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations thetahat <- em.norm(s) #compute MLE getparam.norm(s,thetahat,corr=TRUE)$r #look at estimated correlations
Draws missing elements of a data matrix under the multivariate normal model and a user-supplied parameter
imp.norm(s, theta, x)
imp.norm(s, theta, x)
s |
summary list of an incomplete normal data matrix |
theta |
value of the normal parameter under which the missing data are to be
randomly imputed. This is a parameter vector in packed storage, such
as one created by |
x |
the original data matrix used to create the summary list |
This function simply performs one I-step of data augmentation.
a matrix of the same form as x
, but with all missing values filled in
with simulated values drawn from their predictive distribution given
the observed data and the specified parameter.
Before this function may be used, the random number generator seed
must be initialized with rngseed
at least once in the current S
session.
See Section 5.4.1 of Schafer (1996).
prelim.norm
, makeparam.norm
, and rngseed
.
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations thetahat <- em.norm(s) #find the mle rngseed(1234567) #set random number generator seed ximp <- imp.norm(s,thetahat,mdata) #impute missing data under the MLE
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations thetahat <- em.norm(s) #find the mle rngseed(1234567) #set random number generator seed ximp <- imp.norm(s,thetahat,mdata) #impute missing data under the MLE
Evaluates the observed-data loglikelihood function at a user-supplied value of the parameter. This function is useful for monitoring the progress of EM and data augmentation.
loglik.norm(s, theta)
loglik.norm(s, theta)
s |
summary list of an incomplete normal data matrix created by the
function |
theta |
vector of normal parameters expressed on transformed scale in packed
storage, such as one produced by the function |
value of the observed-data loglikelihood
See Section 5.3.5 of Schafer (1996)
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations thetahat <- em.norm(s) #compute MLE loglik.norm(s,thetahat) #loglikelihood at the MLE
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations thetahat <- em.norm(s) #compute MLE loglik.norm(s,thetahat) #loglikelihood at the MLE
Evaluates the log of the observed-data posterior density at a user-supplied value of the parameter. Assumes a normal-inverted Wishart prior. This function is useful for monitoring the progress of EM and data augmentation.
logpost.norm(s, theta, prior)
logpost.norm(s, theta, prior)
s |
summary list of an incomplete normal data matrix created by the
function |
theta |
vector of normal parameters expressed on transformed scale in packed
storage, such as one produced by the function |
prior |
optional prior distribution. This is a list containing the
hyperparameters of a normal-inverted Wishart distribution. In order,
the elements of the list are: tau (a scalar), m (a scalar), mu0 (a
vector of length |
value of the observed-data log-posterior density
See Section 5.3.5 of Schafer (1996)
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations prior <- list(0,.5,rep(0,ncol(mdata)), .5*diag(rep(1,ncol(mdata)))) #ridge prior with .5 df thetahat <- em.norm(s,prior=prior) #compute posterior mode logpost.norm(s,thetahat,prior) #log-posterior at mode
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations prior <- list(0,.5,rep(0,ncol(mdata)), .5*diag(rep(1,ncol(mdata)))) #ridge prior with .5 df thetahat <- em.norm(s,prior=prior) #compute posterior mode logpost.norm(s,thetahat,prior) #log-posterior at mode
Does the opposite of getparam.norm
.
Converts a list of user-specified parameters to a parameter vector
suitable for input to functions such as da.norm
and em.norm
.
makeparam.norm(s, thetalist)
makeparam.norm(s, thetalist)
s |
summary list of an incomplete normal data matrix created
by the function |
thetalist |
list of normal parameters of the same form as one produced by
|
normal parameter in packed storage, suitable for use as a starting
value for em.norm
, mda.norm
, or mdamet.norm
.
prelim.norm
and getparam.norm
.
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations thetahat <- em.norm(s) #compute mle thetahat <- getparam.norm(s,thetahat,corr=TRUE) #extract parameters thetahat$r #look at mle correlations thetahat$r[1,2] <- .5 #tweak a parameter thetahat <- makeparam.norm(s,thetahat) #convert to packed storage thetahat <- em.norm(s,thetahat) #run EM again from new starting value
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations thetahat <- em.norm(s) #compute mle thetahat <- getparam.norm(s,thetahat,corr=TRUE) #extract parameters thetahat$r #look at mle correlations thetahat$r[1,2] <- .5 #tweak a parameter thetahat <- makeparam.norm(s,thetahat) #convert to packed storage thetahat <- em.norm(s,thetahat) #run EM again from new starting value
Monotone data augmentation under the usual noninformative prior, as
described in Chapter 6 of Schafer (1996). This function simulates one
or more iterations of a single Markov chain. One iteration consists of
a random imputation of the missing data given the observed data and
the current parameter value (I-step), followed by a draw from the
posterior distribution of the parameter given the observed data and
the imputed data (P-step). The I-step imputes only enough data to
complete a monotone pattern, which typically makes this algorithm
converge more quickly than da.norm
, particularly when the observed
data are nearly monotone. The order of the variables in the original
data matrix determines the monotone pattern to be completed. For fast
convergence, it helps to order the variables according to their rates
of missingness, with the most observed (least missing) variable on the
left and the least observed variable on the right.
mda.norm(s, theta, steps=1, showits=FALSE)
mda.norm(s, theta, steps=1, showits=FALSE)
s |
summary list of an incomplete normal data matrix produced by the
function |
theta |
starting value of the parameter. This is a parameter vector in packed
storage, such as one created by the function |
steps |
number of monotone data augmentation iterations to be simulated. |
showits |
if |
Returns a parameter vector, the result of the last P-step. If the
value of steps
was large enough to guarantee approximate
stationarity, then this parameter can be regarded as a proper draw
from the observed-data posterior, independent of start
.
Before this function may be used, the random number generator seed
must be initialized with rngseed
at least once in the current S
session.
Chapter 6 of Schafer (1996).
rngseed
, em.norm
, prelim.norm
, and getparam.norm
.
data(mdata) s <- prelim.norm(mdata) thetahat <- em.norm(s) #find the MLE for a starting value rngseed(1234567) #set random number generator seed theta <- mda.norm(s,thetahat,steps=20,showits=TRUE) # take 20 steps getparam.norm(s,theta) # look at result
data(mdata) s <- prelim.norm(mdata) thetahat <- em.norm(s) #find the MLE for a starting value rngseed(1234567) #set random number generator seed theta <- mda.norm(s,thetahat,steps=20,showits=TRUE) # take 20 steps getparam.norm(s,theta) # look at result
Household survey with missing values. See Schafer~(1997).
Schafer, J.L.(1997) Analysis of Incomplete Multivariate Data, Chapman & Hall, London. ISBN: 0412040611
Combines estimates and standard errors from m complete-data analyses performed on m imputed datasets to produce a single inference. Uses the technique described by Rubin (1987) for multiple imputation inference for a scalar estimand.
mi.inference(est, std.err, confidence=0.95)
mi.inference(est, std.err, confidence=0.95)
est |
a list of $m$ (at least 2) vectors representing estimates (e.g., vectors of estimated regression coefficients) from complete-data analyses performed on $m$ imputed datasets. |
std.err |
a list of $m$ vectors containing standard errors from the
complete-data analyses corresponding to the estimates in |
confidence |
desired coverage of interval estimates. |
a list with the following components, each of which is a vector of the
same length as the components of est
and std.err
:
est |
the average of the complete-data estimates. |
std.err |
standard errors incorporating both the between and the within-imputation uncertainty (the square root of the "total variance"). |
df |
degrees of freedom associated with the t reference distribution used for interval estimates. |
signif |
P-values for the two-tailed hypothesis tests that the estimated quantities are equal to zero. |
lower |
lower limits of the (100*confidence)% interval estimates. |
upper |
upper limits of the (100*confidence)% interval estimates. |
r |
estimated relative increases in variance due to nonresponse. |
fminf |
estimated fractions of missing information. |
Uses the method described on pp. 76-77 of Rubin (1987) for combining the complete-data estimates from $m$ imputed datasets for a scalar estimand. Significance levels and interval estimates are approximately valid for each one-dimensional estimand, not for all of them jointly.
See Rubin (1987) or Schafer (1996), Chapter 4.
Simulates a value from a normal-inverted Wishart distribution. This function may be useful for obtaining starting values of the parameters of a multivariate normal distribution for multiple chains of data augmentation.
ninvwish(s, params)
ninvwish(s, params)
s |
summary list of an incomplete normal data matrix produced by the
function |
params |
list of parameters of a normal-inverted Wishart distribution. In order, the elements of the list are: tau (a scalar), m (a scalar), mu0 (a vector of length ncol(x)), and lambdainv (a matrix of dimension c(ncol(x),ncol(x))). When using this function to create starting values for data augmentation, mu0 and lambdainv should be chosen in relation to the data matrix after the columns have been centered and scaled to have mean zero and variance one. |
a vector in packed storage representing the simulated normal-inverted
Wishart variate. This vector has the same form as parameter vectors
produced by functions such as em.norm
and da.norm
, and may be
used directly as a starting value for these functions. This vector can
also be put into a more understandable format by getparam.norm
.
Before this function may be used, the random number generator seed
must be initialized with rngseed
at least once in the current S
session.
See Section 5.4.2 of Schafer (1996).
rngseed
, getparam.norm
, em.norm
and da.norm
.
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations params <- list(1,.5,rep(0,ncol(mdata)), .5*diag(rep(1,ncol(mdata)))) # gives widely dispersed values rngseed(1234567) start <- ninvwish(s,params) # draw a variate thetahat <- em.norm(s,start=start) # run EM from this starting value
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations params <- list(1,.5,rep(0,ncol(mdata)), .5*diag(rep(1,ncol(mdata)))) # gives widely dispersed values rngseed(1234567) start <- ninvwish(s,params) # draw a variate thetahat <- em.norm(s,start=start) # run EM from this starting value
Sorts rows of x by missingness patterns, and centers/scales
columns of x. Calculates various bookkeeping quantities needed
for input to other functions, such as em.norm
and da.norm
.
prelim.norm(x)
prelim.norm(x)
x |
data matrix containing missing values. The rows of x
correspond to observational units, and the columns to variables.
Missing values are denoted by |
a list of thirteen components that summarize various features of x after the data have been centered, scaled, and sorted by missingness patterns. Components that might be of interest to the user include:
nmis |
a vector of length ncol(x) containing the number of missing values for each variable in x. This vector has names that correspond to the column names of x, if any. |
r |
matrix of response indicators showing the missing data patterns in x. Dimension is (S,p) where S is the number of distinct missingness patterns in the rows of x, and p is the number of columns in x. Observed values are indicated by 1 and missing values by 0. The row names give the number of observations in each pattern, and the column names correspond to the column names of x. |
See Section 5.3.1 of Schafer (1996).
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations s$nmis[s$co] #look at nmis s$r #look at missing data patterns
data(mdata) s <- prelim.norm(mdata) #do preliminary manipulations s$nmis[s$co] #look at nmis s$r #look at missing data patterns
Initializes the seed value for the internal random-number generator used in missing-data programs
rngseed(seed)
rngseed(seed)
seed |
a positive number > = 1, preferably a large integer. |
NULL
.
The random number generator seed must be set at least once
by this function before the simulation or imputation functions
in this package (da.norm
, etc.) can be used.