- control_out(eps=1e-8)
- method_nn and method_pmm
- method_glm
- extract added, which allows extracting results from the nonprob object
- coef added, which allows obtaining the coefficients of the underlying
  models (if possible)
- cloglog)
- sampling package from suggested package
- plot method
- check_balance error (closes #75)
- pop.size, controlSel, controlOut and controlInf
  were renamed to pop_size, control_sel, control_out and
  control_inf respectively.
- genSimData removed completely as it is not used anywhere
  in the package.
- maxLik_method renamed to maxlik_method in the
  control_sel function.
- control_out function:
  - predictive_match renamed to pmm_match_type to align with the
    PMM (Predictive Mean Matching) estimator naming convention,
    where all related parameters start with pmm_
- control_sel function:
  - method removed as it was not used
  - est_method_sel renamed to est_method
  - h renamed to gee_h_fun to make this more readable
    to the user
  - start_type now accepts only zero and mle (for gee models
    only).
- control_inf function:
  - bias_inf renamed to vars_combine and type changed to
    logical. TRUE if variables (and their levels) should be combined
    after the variable selection algorithm for the doubly robust
    approach.
  - pi_ij -- argument removed as it is not used.
- nonprobsvy class renamed to nonprob and all related methods
  adjusted to this change
- logit_model_nonprobsvy, probit_model_nonprobsvy and
cloglog_model_nonprobsvy removed in favour of the more readable
method_ps function that specifies the propensity score model
- control_inference=control_inf(vars_combine=TRUE) which
  allows the doubly robust estimator to combine variables prior to
  estimation, i.e. if selection=~x1+x2 and y~x1+x3 then the following
  models are fitted: selection=~x1+x2+x3 and y~x1+x2+x3. By default we set
  control_inference=control_inf(vars_combine=FALSE). Note that this
  behaviour applies independently of variable selection.
- nonprob(weights=NULL) replaced by nonprob(case_weights=NULL)
  to stress that this refers to case weights, not sampling or other
  weights, in the non-probability sample
- jvs (Job Vacancy
Survey; a probability sample survey) and admin (Central Job Offers
Database; a non-probability sample survey). The units and auxiliary
variables have been aligned in a way that allows the data to be
integrated using the methods implemented in this package.
- check_balance function was added to check the balance in the
  totals of the variables, based on the weights, between the
  non-probability and probability samples.
- na_action with default na.omit
- weights -- returns IPW weights
- update -- allows updating the nonprob class object
- method_ps -- for modelling propensity score
- method_glm -- for modelling y using glm function
- method_nn -- for the NN method
- method_pmm -- for the PMM method
- method_npar -- for the non-parametric method
- print.nonprob, summary.nonprob and print.nonprob_summary
  methods

> result_mi
A nonprob object
- estimator type: mass imputation
- method: glm (gaussian)
- auxiliary variables source: survey
- vars selection: false
- variance estimator: analytic
- population size fixed: false
- naive (uncorrected) estimators:
- variable y1: 3.1817
- variable y2: 1.8087
- selected estimators:
- variable y1: 2.9498 (se=0.0420, ci=(2.8674, 3.0322))
- variable y2: 1.5760 (se=0.0326, ci=(1.5122, 1.6399))
The number of digits can be changed using print(x, digits), as shown below:
> print(result_mi,2)
A nonprob object
- estimator type: mass imputation
- method: glm (gaussian)
- auxiliary variables source: survey
- vars selection: false
- variance estimator: analytic
- population size fixed: false
- naive (uncorrected) estimators:
- variable y1: 3.18
- variable y2: 1.81
- selected estimators:
- variable y1: 2.95 (se=0.04, ci=(2.87, 3.03))
- variable y2: 1.58 (se=0.03, ci=(1.51, 1.64))
> summary(result_mi) |> print(digits=2)
A nonprob_summary object
- call: nonprob(data = subset(population, flag_bd1 == 1), outcome = y1 +
y2 ~ x1 + x2, svydesign = sample_prob)
- estimator type: mass imputation
- nonprob sample size: 693011 (69.3%)
- prob sample size: 1000 (0.1%)
- population size: 1000000 (fixed: false)
- detailed information about models are stored in list element(s): "outcome"
----------------------------------------------------------------
- distribution of outcome residuals:
- y1: min: -4.79; mean: 0.00; median: 0.00; max: 4.54
- y2: min: -4.96; mean: -0.00; median: -0.07; max: 12.25
- distribution of outcome predictions (nonprob sample):
- y1: min: -2.72; mean: 3.18; median: 3.04; max: 16.28
- y2: min: -1.55; mean: 1.81; median: 1.58; max: 13.92
- distribution of outcome predictions (prob sample):
- y1: min: -0.46; mean: 2.95; median: 2.84; max: 10.31
- y2: min: -0.58; mean: 1.58; median: 1.39; max: 7.87
----------------------------------------------------------------
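
The effect of control_inf(vars_combine=TRUE) described in the list above can be illustrated with base R alone. The sketch below only shows how the two formulas from that example end up with the same set of auxiliary variables; combine_vars is a hypothetical helper written for this illustration, not a function of the package and not its internal implementation, and it assumes a single outcome variable.

```r
# Hypothetical helper (NOT part of nonprobsvy): builds the formulas that
# share the union of the variables from the selection and outcome models.
combine_vars <- function(selection, outcome) {
  # Variables on the right-hand side of each formula
  sel_vars <- all.vars(selection)
  out_vars <- all.vars(outcome[[3]])
  # Union keeps the order of first appearance: x1, x2, x3
  combined <- union(sel_vars, out_vars)
  # Left-hand side of the outcome model (single outcome assumed)
  response <- all.vars(outcome[[2]])
  list(
    selection = reformulate(combined),
    outcome   = reformulate(combined, response = response)
  )
}

combined <- combine_vars(selection = ~ x1 + x2, outcome = y ~ x1 + x3)
deparse(combined$selection)  # "~x1 + x2 + x3"
deparse(combined$outcome)    # "y ~ x1 + x2 + x3"
```

With vars_combine=FALSE (the default stated above), the two formulas would be left as supplied by the user.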
- formula.tools
- strata is not supported for the time being.
- maxit argument from controlSel
  function to internally used nleqslv function
- vector in model_frame when predicting
  y_hat in mass imputation glm model when X is based on one
  auxiliary variable only - fix provided converting it to data.frame
  object.
- summary about quality of estimation based on the
  difference between estimated and known total values of auxiliary
  variables
- controlOut function by switching values
  for predictive_match argument. From now on,
  predictive_match = 1 means $\hat{y}-\hat{y}$ in predictive mean
  matching imputation and predictive_match = 2 corresponds to
  $\hat{y}-y$ matching.
- div option when variable selection (more in
documentation) for doubly robust estimation.
- nonprob output such as gradient, hessian
  and jacobian derived from IPW estimation for mle and gee methods
  when IPW or DR model executed.
- nonprob output when
  IPW or DR model executed.
- model_frame matrix data from probability sample used for
  mass imputation to nonprob when MI or DR model executed.
- logit, complementary log-log and probit link functions.
- generalized linear models, nearest neighbours and
  predictive mean matching methods for Mass Imputation
- SCAD, LASSO and MCP
  penalization equations
- analytic and bootstrap (with parallel computation -
  doSNOW package) variance for described estimators
- nonprob class such as
  - nobs for sample size
  - pop.size for population size estimation
  - residuals for residuals of the inverse probability weighting
    model
  - cooks.distance for identifying influential observations that
    have a significant impact on the parameter estimates
  - hatvalues for measuring the leverage of individual
    observations
  - logLik for computing the log-likelihood of the model
  - AIC (Akaike Information Criterion) for evaluating the model
    based on the trade-off between goodness of fit and complexity,
    helping in model selection
  - BIC (Bayesian Information Criterion) for a similar purpose as
    AIC but with a stronger penalty for model complexity
  - confint for calculating confidence intervals around parameter
    estimates
  - vcov for obtaining the variance-covariance matrix of the
    parameter estimates
  - deviance for assessing the goodness of fit of the model
- R-cmd check
- nonprob function.