Title: | Simple Imputation |
---|---|
Description: | Easy to use interfaces to a number of imputation methods that fit in the not-a-pipe operator of the 'magrittr' package. |
Authors: | Mark van der Loo [aut, cre] |
Maintainer: | Mark van der Loo <[email protected]> |
License: | GPL-3 |
Version: | 0.2.8 |
Built: | 2024-11-12 06:52:08 UTC |
Source: | CRAN |
deparse
A deparse replacement that always returns a length-1 vector.
deparse(...)
... |
Arguments passed on to |
The deparsed string
long_formula <- this_is_a_formula_with_long_variables ~ the_test_is_checking_if_deparse_will_return + multiple_strings_or_not
simputation:::deparse(long_formula)
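Base R's deparse() can split a long expression over several strings, because it breaks lines at width.cutoff (60 bytes by default). That is why a length-1 replacement is useful. Below is a minimal sketch of one way to get a single string, assuming the pieces are simply collapsed. deparse_one is a hypothetical name, and this is not necessarily how simputation implements its version. Since R 4.0.0, base R also provides deparse1() for the same purpose.

```r
# Hypothetical helper (not simputation's code): collapse the character
# vector returned by deparse() into a single string
deparse_one <- function(expr, ...) {
  paste(deparse(expr, ...), collapse = " ")
}

long_formula <- this_is_a_formula_with_long_variables ~
  the_test_is_checking_if_deparse_will_return + multiple_strings_or_not

parts <- deparse(long_formula)    # may contain several strings
one <- deparse_one(long_formula)  # always a single string
```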
The default predict function does not always return values of the same type as the predicted variable. For example, when a binomial model is estimated with glm, predict returns log-odds by default. foretell wraps predict while setting options so that the actual predicted value is returned.
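The scale issue described above can be reproduced with base R alone. The sketch below fits a logistic regression on the built-in mtcars data, a dataset chosen only for illustration. It then compares predict()'s default output for a binomial glm with the output for type = "response". The sketch does not call foretell, so it says nothing about exactly which scale foretell returns for a given model.

```r
# Logistic regression of transmission type (0/1) on car weight
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Default: predictions on the link scale, i.e. log-odds
log_odds <- predict(fit, newdata = mtcars)

# type = "response": predictions on the scale of the outcome (probabilities)
prob <- predict(fit, newdata = mtcars, type = "response")

# The inverse logit maps the link scale onto the response scale
same <- isTRUE(all.equal(unname(plogis(log_odds)), unname(prob)))
```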
foretell(object, ...)

## Default S3 method:
foretell(object, ...)

## S3 method for class 'glm'
foretell(object, newdata = NULL, type, ...)

## S3 method for class 'rpart'
foretell(object, newdata, type, ...)
object |
A model object,( |
... |
Further arguments passed to |
newdata |
|
type |
|
Quick indication of the amount and location of missing values.
The function uses na_status to print the missing values, but returns the original x (invisibly) and can therefore be used in an imputation pipeline to peek at the NA status.
glimpse_na(x, show_only_missing = TRUE, ...)

lhs %?>% rhs
x |
an R object carrying data (e.g. |
show_only_missing |
if |
... |
arguments passed to |
lhs |
left hand side of pipe |
rhs |
right hand side of pipe |
glimpse_na is especially helpful when interactively adding imputation methods. glimpse_na is named after glimpse in dplyr.
Operator %?>% is syntactic sugar: it inserts a glimpse_na in the pipe.
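The key design point is that glimpse_na prints a summary as a side effect but hands its input back unchanged, so the pipe can continue. Below is a minimal base-R sketch of that pattern. It uses a hypothetical peek_na() that only counts NA's per column, whereas the real glimpse_na uses na_status for its output. The native pipe |> requires R 4.1 or later.

```r
# Hypothetical stand-in for glimpse_na: report NA counts, return input invisibly
peek_na <- function(x) {
  n_na <- colSums(is.na(x))
  print(n_na[n_na > 0])  # side effect: show only columns that have NA's
  invisible(x)           # the data flows on through the pipe unchanged
}

irisNA <- iris
irisNA[1:3, 1] <- NA

# The pipeline keeps working because peek_na() returns its input
out <- irisNA |> peek_na() |> subset(!is.na(Sepal.Length))
```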
irisNA <- iris
irisNA[1:3,1] <- irisNA[3:7,2] <- NA

# How many NA's?
na_status(irisNA)

# add an imputation method one at a time
iris_imputed <-
  irisNA |>
  glimpse_na() # same as above

# ok, glimpse_na says "Sepal.Width" has NA's
# fix that:
iris_imputed <-
  irisNA |>
  impute_const(Sepal.Width ~ 7) |>
  glimpse_na() # end NA

# Sepal.Length still has NA's
iris_imputed <-
  irisNA |>
  impute_const(Sepal.Width ~ 7) |>
  impute_cart(Sepal.Length ~ .) |>
  glimpse_na() # end NA

# in an existing imputation pipeline we can peek with
# glimpse_na or %?>%
iris_imputed <-
  irisNA |>
  glimpse_na() |>                 # shows the begin NA
  impute_const(Sepal.Width ~ 7) |>
  glimpse_na() |>                 # after 1 imputation
  impute_cart(Sepal.Length ~ .) |>
  glimpse_na()                    # end NA

# or
iris_imputed <-
  irisNA %?>%
  impute_const(Sepal.Width ~ 7) %?>%
  impute_cart(Sepal.Length ~ .)

na_status(iris_imputed)
Impute one or more variables using a single R object representing a previously fitted model.
impute(dat, formula, predictor = foretell, ...)

impute_(dat, variables, model, predictor = foretell, ...)
dat |
|
formula |
|
predictor |
|
... |
Extra arguments passed to |
variables |
|
model |
A model object. |
Formulas are of the form
IMPUTED_VARIABLES ~ MODEL_OBJECT
The left-hand side of the formula object lists the variable or variables to be imputed. The right-hand side must be a model object for which an S3 predict method is implemented. Alternatively, one can specify a custom predicting function. This function must accept at least a model and a dataset, and return one predicted value for each row in the dataset.
foretell implements useful predict methods for cases where, by default, the predicted output is not of the same type as the predicted variable (e.g. when using certain link functions in glm).
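Conceptually, imputing from a previously fitted model means predicting only the missing entries of the left-hand-side variable and leaving the observed values alone. Below is a base-R sketch of that idea for a single variable and an lm fit. It is not simputation's implementation, which also handles several variables and custom predictors.

```r
irisNA <- iris
irisNA[1:3, "Sepal.Length"] <- NA

# A model fitted beforehand, here on the complete iris data
my_model <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Predict only for the rows where the target is missing
miss <- is.na(irisNA$Sepal.Length)
imputed <- irisNA
imputed$Sepal.Length[miss] <- predict(my_model, newdata = irisNA[miss, ])
```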
impute_ is an explicit version of impute that works better in programming contexts, especially in cases involving nonstandard evaluation.
Other imputation: impute_cart(), impute_hotdeck, impute_lm()
irisNA <- iris
irisNA[1:3,1] <- NA
my_model <- lm(Sepal.Length ~ Sepal.Width + Species, data=iris)
impute(irisNA, Sepal.Length ~ my_model)
Imputation based on CART models or Random Forests.
impute_cart(dat, formula, add_residual = c("none", "observed", "normal"), cp, na_action = na.rpart, ...)

impute_rf(dat, formula, add_residual = c("none", "observed", "normal"), na_action = na.omit, ...)
dat |
|
formula |
|
add_residual |
|
cp |
The complexity parameter used to |
na_action |
|
... |
further arguments passed to
|
Formulas are of the form
IMPUTED_VARIABLES ~ MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]
The left-hand-side of the formula object lists the variable or variables to be imputed. Variables on the right-hand-side are used as predictors in the CART or random forest model.
If grouping variables are specified, the data set is split according to the values of those variables, and model estimation and imputation occur independently for each group.
Grouping using dplyr::group_by is also supported. If groups are defined in both the formula and using dplyr::group_by, the data is grouped by the union of grouping variables. Any missing value in one of the grouping variables results in an error.
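Grouping amounts to split-impute-combine. Below is a base-R sketch in which group-wise mean imputation stands in for the imputation step. impute_mean is a hypothetical helper. Within each group, simputation applies whichever model the calling function fits, not necessarily a mean.

```r
# Hypothetical per-group imputation step: fill NA's with the group mean
impute_mean <- function(d) {
  d$Sepal.Length[is.na(d$Sepal.Length)] <- mean(d$Sepal.Length, na.rm = TRUE)
  d
}

irisNA <- iris
irisNA[c(1, 51), "Sepal.Length"] <- NA

g <- irisNA$Species
stopifnot(!anyNA(g))  # mirrors the rule that NA's in grouping variables are an error

parts <- split(irisNA, g)            # split
parts <- lapply(parts, impute_mean)  # impute within each group
imputed <- unsplit(parts, g)         # combine in the original row order
```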
CART imputation by impute_cart can be used for numerical, categorical, or mixed data. Missing values are estimated using a Classification and Regression Tree as specified by Breiman, Friedman, Stone and Olshen (1984). This means that prediction is fairly robust against missingness in predictors.
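The CART idea can be sketched with the rpart package, a recommended package that ships with most R installations. The sketch fits a tree on the rows where the target is observed and then predicts the missing rows, including one record whose predictor is also missing. It illustrates the technique only. It does not reproduce how impute_cart handles add_residual, cp or na_action.

```r
library(rpart)

irisNA <- iris
irisNA[c(1, 51, 101), "Sepal.Length"] <- NA
irisNA[1, "Petal.Width"] <- NA  # record 1 also lacks a predictor

miss <- is.na(irisNA$Sepal.Length)

# Regression tree on the rows where Sepal.Length is observed
tree <- rpart(Sepal.Length ~ ., data = irisNA[!miss, ])

# Predict the missing values; rpart's surrogate splits let it predict
# even for records with missing predictors (predict.rpart uses na.pass)
imputed <- irisNA
imputed$Sepal.Length[miss] <- predict(tree, newdata = irisNA[miss, ])
```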
Random Forest imputation with impute_rf can be used for numerical, categorical, or mixed data. Missing values are estimated using a Random Forest model as specified by Breiman (2001).
Breiman, L., Friedman, J., Stone, C.J. and Olshen, R.A., 1984. Classification and regression trees. CRC press.
Breiman, L., 2001. Random forests. Machine learning, 45(1), pp.5-32.
Other imputation: impute_hotdeck, impute_lm(), impute()
Hot-deck imputation methods include random and sequential hot deck, k-nearest neighbours imputation and predictive mean matching.
impute_rhd(dat, formula, pool = c("complete", "univariate", "multivariate"), prob, backend = getOption("simputation.hdbackend", default = c("simputation", "VIM")), ...)

impute_shd(dat, formula, pool = c("complete", "univariate", "multivariate"), order = c("locf", "nocb"), backend = getOption("simputation.hdbackend", default = c("simputation", "VIM")), ...)

impute_pmm(dat, formula, predictor = impute_lm, pool = c("complete", "univariate", "multivariate"), ...)

impute_knn(dat, formula, pool = c("complete", "univariate", "multivariate"), k = 5, backend = getOption("simputation.hdbackend", default = c("simputation", "VIM")), ...)
dat |
|
formula |
|
pool |
|
prob |
|
backend |
|
... |
further arguments passed to
|
order |
|
predictor |
|
k |
|
Formulas are of the form
IMPUTED_VARIABLES ~ MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]
The left-hand-side of the formula object lists the variable or variables to be imputed. The interpretation of the independent variables on the right-hand-side depends on the imputation method.
impute_rhd
Variables in MODEL_SPECIFICATION and/or GROUPING_VARIABLES are used to split the data set into groups prior to imputation. Use ~ 1 to specify that no grouping is to be applied.
impute_shd
Variables in MODEL_SPECIFICATION are used to sort the data. When multiple variables are specified, each variable after the first serves as a tie-breaker for the previous one.
impute_knn
The predictors are used to determine Gower's distance between records (see gower_topn). This may include the variables to be imputed.
impute_pmm
Predictive mean matching. The MODEL_SPECIFICATION is passed through to the predictor function.
If grouping variables are specified, the data set is split according to the values of those variables, and model estimation and imputation occur independently for each group.
Grouping using dplyr::group_by is also supported. If groups are defined in both the formula and using dplyr::group_by, the data is grouped by the union of grouping variables. Any missing value in one of the grouping variables results in an error.
Random hot deck imputation with impute_rhd can be applied to numeric, categorical or mixed data. A missing value is copied from a sampled record. Optionally, samples are taken within a group, or with non-uniform sampling probabilities. See Andridge and Little (2010) for an overview of hot deck imputation methods.
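Below is a base-R sketch of random hot deck imputation within groups with uniform sampling probabilities. rhd is a hypothetical helper, not simputation's code. It assumes every group that contains a missing value also has at least one observed donor.

```r
# Hypothetical helper: random hot deck within groups
rhd <- function(x, g) {
  for (lev in unique(g)) {
    idx <- which(g == lev)
    donors <- x[idx][!is.na(x[idx])]  # observed values in this group
    receivers <- idx[is.na(x[idx])]   # positions to fill
    if (length(receivers) == 0) next
    # sample donors with replacement; indexing via sample.int avoids
    # sample()'s special behaviour for a length-1 vector
    x[receivers] <- donors[sample.int(length(donors), length(receivers), replace = TRUE)]
  }
  x
}

set.seed(1)
x <- c(2.1, NA, 3.4, NA, 5.0, 4.2)
g <- c("a", "a", "a", "b", "b", "b")
filled <- rhd(x, g)
```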
Sequential hot deck imputation with impute_shd can be applied to numeric, categorical, or mixed data. The dataset is sorted using the ‘predictor variables’. Missing values or combinations thereof are copied from the previous record where the value(s) are available in the case of LOCF (last observation carried forward), and from the next record in the case of NOCB (next observation carried backward).
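Below is a base-R sketch of the sequential hot deck idea. It sorts by the predictor, carries values forward (LOCF) or backward (NOCB), and then restores the original order. locf and nocb are hypothetical helpers. Here, a missing value with no earlier (LOCF) or later (NOCB) donor simply stays missing. The sketch does not show how simputation handles that edge case.

```r
# Last observation carried forward: fill each NA with the previous value
locf <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x  # a leading NA has no previous value and stays NA
}
# Next observation carried backward: LOCF on the reversed vector
nocb <- function(x) rev(locf(rev(x)))

# Sort by the 'predictor' (here: key), fill, and write back in original order
dat <- data.frame(key = c(3, 1, 2, 5, 4), y = c(NA, 10, 20, NA, 40))
o <- order(dat$key)
dat$y_locf <- dat$y
dat$y_locf[o] <- locf(dat$y[o])
dat$y_nocb <- dat$y
dat$y_nocb[o] <- nocb(dat$y[o])
```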
Predictive mean matching with impute_pmm can be applied to numeric data. Missing values or combinations thereof are first imputed using a predictive model. Next, these predictions are replaced with observed (combinations of) values nearest to the prediction. The nearest value is the observed value with the smallest absolute deviation from the prediction.
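The two steps described above can be sketched in base R for one numeric variable and a single predictor, using a linear model as the predictive model. pmm_impute is a hypothetical helper and does not reflect impute_pmm's pool or grouping options. In the example, the model predicts 3.75 for the missing entry, and the nearest observed value is 4.

```r
# Hypothetical helper: predictive mean matching for y with one predictor x
pmm_impute <- function(y, x) {
  obs <- !is.na(y)
  fit <- lm(y ~ x, subset = obs)  # step 1: model fitted on observed rows
  pred <- predict(fit, newdata = data.frame(x = x[!obs]))
  donors <- y[obs]
  # step 2: replace each prediction by the observed value with the
  # smallest absolute deviation from it
  y[!obs] <- vapply(pred, function(p) donors[which.min(abs(donors - p))], numeric(1))
  y
}

x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, NA, 4, 8)
filled <- pmm_impute(y, x)
```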
K-nearest neighbour imputation with impute_knn can be applied to numeric, categorical, or mixed data. For each record containing missing values, the k most similar complete records are determined based on Gower's (1971) similarity coefficient. From these records the actual donor is sampled.
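Below is a base-R sketch of the kNN step for numeric predictors only. For numeric variables, Gower's distance is the absolute difference scaled by the variable's range, averaged over variables. Categorical variables, which Gower scores as 0/1 mismatches, are left out for brevity. gower_num and knn_impute_one are hypothetical helpers, not simputation's functions (the package itself uses gower_topn). The sketch also assumes the record's predictors are observed and no predictor has zero range.

```r
# Gower distance for numeric variables only:
# range-scaled absolute differences, averaged over the variables
gower_num <- function(a, B, ranges) {
  d <- abs(sweep(B, 2, a))            # |B[i, j] - a[j]|
  rowMeans(sweep(d, 2, ranges, "/"))  # divide column j by its range
}

# Hypothetical helper: impute dat[row, target] from one of the k nearest donors
knn_impute_one <- function(dat, target, preds, row, k = 5) {
  donors <- which(!is.na(dat[[target]]))
  B <- as.matrix(dat[donors, preds, drop = FALSE])
  a <- unlist(dat[row, preds, drop = FALSE])
  ranges <- apply(as.matrix(dat[, preds, drop = FALSE]), 2,
                  function(v) diff(range(v, na.rm = TRUE)))
  d <- gower_num(a, B, ranges)
  nearest <- donors[order(d)[seq_len(min(k, length(d)))]]
  # sample the actual donor among the k nearest records
  pick <- nearest[sample.int(length(nearest), 1)]
  dat[[target]][row] <- dat[[target]][pick]
  dat
}

set.seed(42)
dat <- data.frame(x1 = c(1, 2, 3, 10, 2.5), x2 = c(5, 6, 7, 50, NA))
out <- knn_impute_one(dat, target = "x2", preds = "x1", row = 5, k = 2)
```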
The VIM package has efficient implementations of several popular imputation methods. In particular, its random and sequential hot deck implementation is faster and more memory-efficient than that of the current package. Moreover, VIM offers more fine-grained control over the imputation process than simputation.
If you have this package installed, it can be used by setting backend="VIM" for functions supporting this option. Alternatively, one can set options(simputation.hdbackend="VIM") so it becomes the default.
Simputation will map the simputation call to a function in the VIM package. In particular:
impute_rhd is mapped to VIM::hotdeck where imputed variables are passed to the variable argument and the union of predictor and grouping variables is passed to domain_var. Extra arguments in ... are passed to VIM::hotdeck as well. Argument pool is ignored.
impute_shd is mapped to VIM::hotdeck where imputed variables are passed to the variable argument, predictor variables to ord_var and grouping variables to domain_var. Extra arguments in ... are passed to VIM::hotdeck as well. Arguments pool and order are ignored. In VIM the donor pool is determined on a per-variable basis, equivalent to setting pool="univariate" with the simputation backend. VIM is LOCF-based. Differences between simputation and VIM are likely to occur when the sorting variables contain missing values.
impute_knn is mapped to VIM::kNN where imputed variables are passed to variable, predictor variables are passed to dist_var and grouping variables are ignored with a message. Extra arguments in ... are passed to VIM::kNN as well. Argument pool is ignored.
Note that simputation adheres strictly to Gower's original definition of the distance measure, while VIM uses a generalized variant that can take ordered factors into account.
By default, VIM's imputation functions add indicator variables to the original data to trace what values have been imputed. This is switched off by default for consistency with the rest of the simputation package, but it may be turned on again by setting imp_var=TRUE.
Andridge, R.R. and Little, R.J., 2010. A review of hot deck imputation for survey non-response. International statistical review, 78(1), pp.40-64.
Gower, J.C., 1971. A general coefficient of similarity and some of its properties. Biometrics, pp.857–871.
Other imputation: impute_cart(), impute_lm(), impute()
Regression imputation methods including linear regression, robust linear regression with M-estimators, and regularized regression with lasso/elasticnet/ridge regression.
impute_lm(dat, formula, add_residual = c("none", "observed", "normal"), na_action = na.omit, ...)

impute_rlm(dat, formula, add_residual = c("none", "observed", "normal"), na_action = na.omit, ...)

impute_en(dat, formula, add_residual = c("none", "observed", "normal"), na_action = na.omit, family = c("gaussian", "poisson"), s = 0.01, ...)
dat |
|
formula |
|
add_residual |
|
na_action |
|
... |
further arguments passed to |
family |
Response type for elasticnet / lasso regression. For
|
s |
The value of |
dat, but imputed where possible.
Formulas are of the form
IMPUTED_VARIABLES ~ MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]
The left-hand side of the formula object lists the variable or variables to be imputed. The right-hand side, excluding the optional GROUPING_VARIABLES, is the model specification for the underlying predictor.
If grouping variables are specified, the data set is split according to the values of those variables, and model estimation and imputation occur independently for each group.
Grouping using dplyr::group_by is also supported. If groups are defined in both the formula and using dplyr::group_by, the data is grouped by the union of grouping variables. Any missing value in one of the grouping variables results in an error.
Grouping is ignored for impute_const.
Linear regression model imputation with impute_lm can be used to impute numerical variables based on numerical and/or categorical predictors. Several common imputation methods, including ratio and (group) mean imputation, can be expressed this way. See lm for details on possible model specification.
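Both methods named above can be checked with lm() directly. Group mean imputation corresponds to regressing on the grouping factor. Ratio imputation corresponds to a regression through the origin with weights 1/x: minimizing Σ(y_i − b x_i)²/x_i gives Σ(y_i − b x_i) = 0, i.e. b = Σy/Σx. The base-R sketch below checks these estimators only. It does not call impute_lm, and it does not address whether impute_lm passes weights through to lm.

```r
# Group mean imputation as a regression on a factor:
# the fitted value for each record is the mean of its group
fit_group <- lm(Sepal.Length ~ Species, data = iris)
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)
same_means <- isTRUE(all.equal(unname(fitted(fit_group)),
                               unname(group_means[as.character(iris$Species)])))

# Ratio imputation as weighted regression through the origin:
# with weights 1/x the slope equals sum(y) / sum(x)
x <- iris$Sepal.Width
y <- iris$Sepal.Length
fit_ratio <- lm(y ~ x - 1, weights = 1 / x)
same_ratio <- isTRUE(all.equal(unname(coef(fit_ratio)), sum(y) / sum(x)))
```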
Robust linear regression through M-estimation with impute_rlm can be used to impute numerical variables employing numerical and/or categorical predictors. In M-estimation, the minimization of the squares of residuals is replaced with an alternative convex function of the residuals that decreases the influence of outliers. Also see e.g. Huber (1981).
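The effect of M-estimation can be seen by comparing ordinary least squares with MASS::rlm on data with one outlier. rlm uses Huber's psi function by default, and MASS is a recommended package. The data are simulated for illustration. The sketch illustrates the estimator only, not impute_rlm itself.

```r
library(MASS)  # provides rlm()

set.seed(1)
x <- 1:20
y <- 2 * x + rnorm(20, sd = 0.5)
y[20] <- 100  # one gross outlier (about 40 would be expected)

ols <- lm(y ~ x)   # least squares: the outlier pulls the slope up
hub <- rlm(y ~ x)  # M-estimation with Huber's psi (rlm's default)

slope_ols <- unname(coef(ols)["x"])
slope_hub <- unname(coef(hub)["x"])
```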
Lasso/elastic net/ridge regression imputation with impute_en can be used to impute numerical variables employing numerical and/or categorical predictors. For this method, the regression coefficients are found by minimizing the sum of squared residuals augmented with a penalty term depending on the size of the coefficients. For lasso regression (Tibshirani, 1996), the penalty term is the sum of absolute values of the coefficients. For ridge regression (Hoerl and Kennard, 1970), the penalty term is the sum of squares of the coefficients. Elasticnet regression (Zou and Hastie, 2005) allows switching from lasso to ridge by penalizing by a weighted sum of the sum-of-squares and sum-of-absolute-values terms.
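One common way to write the three penalized criteria as a single objective is the elastic net parametrization used by the glmnet package. The symbols λ (penalty strength) and α (mixing weight) are notation introduced here. This section does not state whether impute_en's s argument corresponds to λ. Setting α = 1 gives the lasso penalty (sum of absolute values), and α = 0 gives the ridge penalty (sum of squares, halved).

```latex
\hat\beta = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  + \lambda\Bigl[\alpha\sum_{j=1}^{p}\lvert\beta_j\rvert
  + \frac{1-\alpha}{2}\sum_{j=1}^{p}\beta_j^{2}\Bigr]
```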
Huber, P.J., 2011. Robust statistics (pp. 1248-1251). Springer Berlin Heidelberg.
Hoerl, A.E. and Kennard, R.W., 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), pp.55-67.
Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pp.267-288.
Zou, H. and Hastie, T., 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), pp.301-320.
Getting started with simputation
Other imputation: impute_cart(), impute_hotdeck, impute()
data(iris)
irisNA <- iris
irisNA[1:4, "Sepal.Length"] <- NA
irisNA[3:7, "Sepal.Width"] <- NA

# impute a single variable (Sepal.Length)
i1 <- impute_lm(irisNA, Sepal.Length ~ Sepal.Width + Species)

# impute both Sepal.Length and Sepal.Width, using robust linear regression
i2 <- impute_rlm(irisNA, Sepal.Length + Sepal.Width ~ Species + Petal.Length)
Impute (group-wise) medians.
impute_median( dat, formula, add_residual = c("none", "observed", "normal"), type = 7, ... )
dat |
|
formula |
|
add_residual |
|
type |
|
... |
Currently not used. |
Formulas are of the form
IMPUTED_VARIABLES ~ MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]
The left-hand side of the formula object lists the variable or variables to be imputed. Variables in MODEL_SPECIFICATION and/or GROUPING_VARIABLES are used to split the data set into groups prior to imputation. Use ~ 1 to specify that no grouping is to be applied.
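Below is a base-R sketch of group-wise median imputation in which ave() repeats each group's median across that group's rows. It only illustrates the idea. It does not reproduce impute_median's add_residual or type arguments.

```r
irisNA <- iris
irisNA[1:3, "Sepal.Length"] <- NA

# Median of each Species group, repeated for every row of that group
grp_median <- ave(irisNA$Sepal.Length, irisNA$Species,
                  FUN = function(v) median(v, na.rm = TRUE))

# Fill only the missing cells
miss <- is.na(irisNA$Sepal.Length)
imputed <- irisNA
imputed$Sepal.Length[miss] <- grp_median[miss]
```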
# group-wise median imputation
irisNA <- iris
irisNA[1:3,1] <- irisNA[4:7,2] <- NA
a <- impute_median(irisNA, Sepal.Length ~ Species)
head(a)

# group-wise median imputation, all variables except species
a <- impute_median(irisNA, . - Species ~ Species)
head(a)
Models that simultaneously optimize imputation of multiple variables. Methods include imputation based on EM-estimation of multivariate normal parameters, imputation based on iterative Random Forest estimates and stochastic imputation based on bootstrapped EM-estimation of multivariate normal parameters.
impute_em(dat, formula, verbose = 0, ...)

impute_mf(dat, formula, ...)
dat |
|
formula |
|
verbose |
|
... |
Options passed to
|
Formulas are of the form
[IMPUTED_VARIABLES] ~ MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]
When IMPUTED_VARIABLES is empty, every variable in MODEL_SPECIFICATION will be imputed. When IMPUTED_VARIABLES is specified, all variables in IMPUTED_VARIABLES and MODEL_SPECIFICATION are part of the model, but only the IMPUTED_VARIABLES are imputed in the output.
GROUPING_VARIABLES specify what categorical variables are used to split-impute-combine the data. Grouping using dplyr::group_by is also supported. If groups are defined in both the formula and using dplyr::group_by, the data is grouped by the union of grouping variables. Any missing value in one of the grouping variables results in an error.
EM-based imputation with impute_em only works for numerical variables. These variables are assumed to follow a multivariate normal distribution for which the means and covariance matrix are estimated based on the EM algorithm of Dempster, Laird and Rubin (1977). The imputations are the expected values for missing values, conditional on the value of the estimated parameters.
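To make the EM idea concrete, below is a base-R sketch for the simplest case: two numeric variables, the first complete and the second partly missing. em_impute2 is a hypothetical helper, not impute_em's implementation. The E-step uses the conditional mean and variance of the second variable given the first. The M-step recomputes the maximum-likelihood mean and covariance. For this missingness pattern the EM fixed point reproduces the complete-case regression of the second variable on the first, which gives a convenient check.

```r
# EM for a bivariate normal where column 1 is complete and column 2 has NA's
em_impute2 <- function(Y, max_iter = 1000, tol = 1e-10) {
  miss <- is.na(Y[, 2])
  n <- nrow(Y)
  mu <- colMeans(Y, na.rm = TRUE)  # starting values
  S <- cov(Y, use = "complete.obs")
  for (it in seq_len(max_iter)) {
    # E-step: conditional mean and variance of y2 given y1
    beta <- S[1, 2] / S[1, 1]
    Yc <- Y
    Yc[miss, 2] <- mu[2] + beta * (Y[miss, 1] - mu[1])
    cvar <- S[2, 2] - S[1, 2]^2 / S[1, 1]
    # M-step: ML mean and covariance of the completed data, plus the
    # conditional variance contributed by each imputed entry
    mu_new <- colMeans(Yc)
    S_new <- crossprod(sweep(Yc, 2, mu_new)) / n
    S_new[2, 2] <- S_new[2, 2] + sum(miss) * cvar / n
    converged <- max(abs(c(mu_new - mu, S_new - S))) < tol
    mu <- mu_new
    S <- S_new
    if (converged) break
  }
  # imputations: conditional expectations under the final estimates
  Y[miss, 2] <- mu[2] + S[1, 2] / S[1, 1] * (Y[miss, 1] - mu[1])
  Y
}

Y <- as.matrix(iris[, c("Sepal.Length", "Petal.Length")])
Y[c(5, 60, 120), 2] <- NA
Y_imp <- em_impute2(Y)
```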
Multivariate Random Forest imputation with impute_mf works for numerical, categorical or mixed data types. It is based on the algorithm of Stekhoven and Buehlmann (2012). Missing values are imputed using a rough guess after which a predictive random forest is trained and used to re-impute the missing values. This is iterated until convergence.
Dempster, A.P., Laird, N.M. and Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), pp.1-38.
Stekhoven, D.J. and Buehlmann, P., 2012. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), pp.112-118.
Impute missing values by a constant, by copying another variable, or by computing transformations from other variables.
impute_proxy(dat, formula, add_residual = c("none", "observed", "normal"), ...)

impute_const(dat, formula, add_residual = c("none", "observed", "normal"), ...)
dat |
|
formula |
|
add_residual |
|
... |
Currently unused |
Formulas are of the form
IMPUTED_VARIABLES ~ MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]
The left-hand-side of the formula object lists the variable or variables to be imputed.
For impute_const, the MODEL_SPECIFICATION is a single value and GROUPING_VARIABLES are ignored.
For impute_proxy, the MODEL_SPECIFICATION is a variable or expression in terms of variables in the dataset that must result in either a single number or a vector of length nrow(dat).
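Below is a base-R sketch of what a proxy formula amounts to, using the ratio example from the examples. The right-hand side is evaluated within the data, and its values are copied into the missing cells only. A single number would be recycled to nrow(dat). This is an illustration, not impute_proxy's implementation, which also supports grouping and residuals.

```r
irisNA <- iris
irisNA[1:3, "Sepal.Length"] <- NA

# Evaluate the right-hand-side expression in the context of the data;
# here it yields one value per row (ratio imputation)
proxy <- with(irisNA, mean(Sepal.Length, na.rm = TRUE) /
                      mean(Sepal.Width, na.rm = TRUE) * Sepal.Width)
# a single number would be recycled to one value per row
proxy <- rep_len(proxy, nrow(irisNA))

# Copy proxy values into the missing cells only
miss <- is.na(irisNA$Sepal.Length)
imputed <- irisNA
imputed$Sepal.Length[miss] <- proxy[miss]
```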
If grouping variables are specified, the data set is split according to the values of those variables, and model estimation and imputation occur independently for each group.
Grouping using dplyr::group_by is also supported. If groups are defined in both the formula and using dplyr::group_by, the data is grouped by the union of grouping variables. Any missing value in one of the grouping variables results in an error.
irisNA <- iris
irisNA[1:3,1] <- irisNA[3:7,2] <- NA

# impute a constant
a <- impute_const(irisNA, Sepal.Width ~ 7)
head(a)
a <- impute_proxy(irisNA, Sepal.Width ~ 7)
head(a)

# copy a value from another variable (where available)
a <- impute_proxy(irisNA, Sepal.Width ~ Sepal.Length)
head(a)

# group mean imputation
a <- impute_proxy(irisNA, Sepal.Length ~ mean(Sepal.Length, na.rm=TRUE) | Species)
head(a)

# random hot deck imputation
a <- impute_proxy(irisNA, Sepal.Length ~ mean(Sepal.Length, na.rm=TRUE), add_residual = "observed")

# ratio imputation (but use impute_lm for that)
a <- impute_proxy(irisNA, Sepal.Length ~ mean(Sepal.Length, na.rm=TRUE)/mean(Sepal.Width, na.rm=TRUE) * Sepal.Width)
Quick indication of the amount and location of missing values.
na_status( x, show_only_missing = TRUE, sort_columns = show_only_missing, show_message = TRUE, ... )
x |
an R object carrying data (e.g. |
show_only_missing |
if |
sort_columns |
If |
show_message |
if |
... |
arguments to be passed to other methods. |
A data.frame with the column and number of NA's.
irisNA <- iris
irisNA[1:3,1] <- irisNA[3:7,2] <- NA
na_status(irisNA)

# impute a constant
a <- impute_const(irisNA, Sepal.Width ~ 7)
na_status(a)
This function is re-exported from randomForest::na.roughfix when available. Otherwise it will throw a warning and resort to options("na.action").
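randomForest documents na.roughfix as imputing missing values by the median (numeric columns) or the most frequent level (factor columns). Below is a base-R sketch of that rule for a data frame. rough_fix is a hypothetical name. Here, ties for the most frequent level go to the first such level, which may differ from randomForest's own choice.

```r
# Hypothetical sketch of the median/mode rule
rough_fix <- function(df) {
  for (j in seq_along(df)) {
    v <- df[[j]]
    if (!anyNA(v)) next
    if (is.numeric(v)) {
      v[is.na(v)] <- median(v, na.rm = TRUE)  # numeric: column median
    } else if (is.factor(v)) {
      tab <- table(v)                         # level counts, NA excluded
      v[is.na(v)] <- names(tab)[which.max(tab)]  # factor: most frequent level
    }
    df[[j]] <- v
  }
  df
}

df <- data.frame(a = c(1, NA, 3, 10), f = factor(c("x", "y", "y", NA)))
out <- rough_fix(df)
```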
na.roughfix(object, ...)
object |
an R object carrying data (e.g. |
... |
arguments to be passed to other methods. |
A package to make imputation simpler.
To get started, see the introductory vignette.