Package 'MSIMST' reference manual

Title:	Bayesian Monotonic Single-Index Regression Model with the Skew-T Likelihood
Description:	Incorporates a Bayesian monotonic single-index mixed-effect model with a multivariate skew-t likelihood, specifically designed to handle survey weights adjustments. Features include a simulation program and an associated Gibbs sampler for model estimation. The single-index function is constrained to be monotonic increasing, utilizing a customized Gaussian process prior for precise estimation. The model assumes random effects follow a canonical skew-t distribution, while residuals are represented by a multivariate Student-t distribution. Offers robust Bayesian adjustments to integrate survey weight information effectively.
Authors:	Qingyang Liu [aut, cre] , Debdeep Pati [aut], Dipankar Bandyopadhyay [aut]
Maintainer:	Qingyang Liu <[email protected]>
License:	GPL (>= 3)
Version:	1.1
Built:	2025-02-14 06:58:21 UTC
Source:	CRAN

The Associated Gibbs Sampler

Description

This is the Gibbs sampler associated with the proposed single-index mixed-effects model. This Gibbs sampler supports three different likelihoods, normal, skew-normal and skew-t likelihoods and two types of priors for the single-index funcion: the Gaussian process (GP) prior and the bernstein polynomial (BP) prior.

Usage

Gibbs_Sampler(
  X,
  y,
  group_info,
  beta_value,
  beta_prior_variance,
  beta_b_value,
  beta_lambdasq_value,
  beta_tausq_value,
  xi_value,
  xi_lengthscale_value,
  xi_tausq_value,
  g_func_type,
  dsq_value,
  sigmasq_value,
  delta_value,
  nu_value,
  U_value,
  S_value,
  loglik_type,
  gof_K,
  gof_L,
  iter_warmup,
  iter_sampling,
  verbatim,
  update = 10,
  incremental_output = FALSE,
  incremental_output_filename = NULL,
  incremental_output_update = 1e+06,
  n_core = 1
)
Gibbs_Sampler(
  X,
  y,
  group_info,
  beta_value,
  beta_prior_variance,
  beta_b_value,
  beta_lambdasq_value,
  beta_tausq_value,
  xi_value,
  xi_lengthscale_value,
  xi_tausq_value,
  g_func_type,
  dsq_value,
  sigmasq_value,
  delta_value,
  nu_value,
  U_value,
  S_value,
  loglik_type,
  gof_K,
  gof_L,
  iter_warmup,
  iter_sampling,
  verbatim,
  update = 10,
  incremental_output = FALSE,
  incremental_output_filename = NULL,
  incremental_output_update = 1e+06,
  n_core = 1
)

Arguments

`X`	The list of design matrix.
`y`	The list of response values.
`group_info`	The group information for the grouped horseshoe prior. Use 0 to represent the variables with the normal priors. Use 1,2,... to present the variables with the grouped horseshoe priors. For example, c(0,0,1,1,2,3) represents first two variables with the normal prior, third and fourth variables belong to the same group with one grouped horseshoe prior, and fifth and sixth variables belong to two different groups with two independent horseshoe prior.
`beta_value`	The initial value for the covariates' coefficients.
`beta_prior_variance`	The variance value of the normal prior.
`beta_b_value`	The slope parameter.
`beta_lambdasq_value`	The first hyperparameter associated with the grouped horseshoe prior.
`beta_tausq_value`	The second hyperparameter associated with the grouped horseshoe prior.
`xi_value`	The parameters associated with the single index function.
`xi_lengthscale_value`	The first hyperparameter associated with the Gaussian process kernel.
`xi_tausq_value`	The second hyperparameter associated with the Gaussian process kernel.
`g_func_type`	The type of priors on the single index function. Must be one of "GP" and "BP".
`dsq_value`	The initial value of the conditional variance of the random effects.
`sigmasq_value`	The initial value of the conditional variance of the fixed effects.
`delta_value`	The initial value of the skewness parameter.
`nu_value`	The initial value of the degree of freedom. Must be larger than 2.
`U_value`	The initial values of the latent variable U. The length of `⁠U_value⁠` must be as the same as the number of subjects.
`S_value`	The initial values of the latent variable S. The length of `⁠S_value⁠` must be as the same as the number of subjects.
`loglik_type`	The type of the log-likelihood. Must be one of "skewT","skewN", and "N".
`gof_K`	The first hyperparameter associated with the goodness of fit test. Check (Yuan and Johnson 2012) for details.
`gof_L`	The second hyperparameter associated with the goodness of fit test. Check (Yuan and Johnson 2012) for details.
`iter_warmup`	The number of warm-up iterations of the Gibb samplers.
`iter_sampling`	The number of post warm-up iterations of the Gibb samplers.
`verbatim`	TRUE/FALSE. If verbatim is TRUE, then the updating message of the Gibbs sampler will be printed.
`update`	An integer. For example, if `⁠update⁠` = 10, for each 10 iteration, one udpating message of the Gibbs sampler will be printed.
`incremental_output`	TRUE/FALSE. If `⁠incremental_output⁠` is TRUE, an incremental output will be saved. This option should not be enabled unless users anticipate the sampling process will take longer than days.
`incremental_output_filename`	The filename of the incremental output.
`incremental_output_update`	An integer. For example, if `⁠incremental_output_update⁠` = 10 then for each 10 iteration, the intermediate result will be updated once.
`n_core`	The number of cores will be used during the Gibbs sampler. For the Windows operating system, `⁠n_core⁠` must be 1.

Details

The details of the ST-GP model can be found in the vignette. Users can access the vignette using ⁠vignette(package = "MSIMST")⁠.

Value

A list of random quantitiles drawn from the posterior distribution using the Gibbs sampler.

Examples

# Set the random seed.
set.seed(100)

# Simulate the data.
simulated_data <- reg_simulation1(N = 50,
                                  ni_lambda = 8,
                                  beta = c(0.5,0.5,0.5),
                                  beta_b = 1.5,
                                  dsq = 0.1,
                                  sigmasq = 0.5,
                                  delta = 0.6,
                                  nu = 5.89)

y <- simulated_data$y
X <- simulated_data$X

group_info <- c(0,0,0)
# The number of grids (L) for approximating the single index function
L <- 50
N <- length(y)
GP_MCMC_output <- Gibbs_Sampler(X = X,
                                y = y,
                                group_info = group_info,
                                beta_value = c(0.5,0.5,0.5),
                                beta_prior_variance = 10,
                                beta_b_value = 1.5,
                                beta_lambdasq_value = 1,
                                beta_tausq_value = 1,
                                xi_value = abs(rnorm(n = L + 1)),
                                xi_lengthscale_value = 1.0,
                                xi_tausq_value = 1.0,
                                g_func_type = "GP",
                                dsq_value = 1,
                                sigmasq_value = 1,
                                delta_value = 0.6,
                                nu_value = 5.89,
                                U_value = abs(rnorm(N)),
                                S_value = abs(rnorm(N)),
                                loglik_type = "skewT",
                                gof_K = 10,
                                gof_L = 5,
                                iter_warmup = 10,
                                iter_sampling = 20,
                                verbatim = TRUE,
                                update = 10,
                                incremental_output = FALSE,
                                incremental_output_filename = NULL,
                                incremental_output_update = 1e6,
                                n_core = 1)
# Set the random seed.
set.seed(100)

# Simulate the data.
simulated_data <- reg_simulation1(N = 50,
                                  ni_lambda = 8,
                                  beta = c(0.5,0.5,0.5),
                                  beta_b = 1.5,
                                  dsq = 0.1,
                                  sigmasq = 0.5,
                                  delta = 0.6,
                                  nu = 5.89)

y <- simulated_data$y
X <- simulated_data$X

group_info <- c(0,0,0)
# The number of grids (L) for approximating the single index function
L <- 50
N <- length(y)
GP_MCMC_output <- Gibbs_Sampler(X = X,
                                y = y,
                                group_info = group_info,
                                beta_value = c(0.5,0.5,0.5),
                                beta_prior_variance = 10,
                                beta_b_value = 1.5,
                                beta_lambdasq_value = 1,
                                beta_tausq_value = 1,
                                xi_value = abs(rnorm(n = L + 1)),
                                xi_lengthscale_value = 1.0,
                                xi_tausq_value = 1.0,
                                g_func_type = "GP",
                                dsq_value = 1,
                                sigmasq_value = 1,
                                delta_value = 0.6,
                                nu_value = 5.89,
                                U_value = abs(rnorm(N)),
                                S_value = abs(rnorm(N)),
                                loglik_type = "skewT",
                                gof_K = 10,
                                gof_L = 5,
                                iter_warmup = 10,
                                iter_sampling = 20,
                                verbatim = TRUE,
                                update = 10,
                                incremental_output = FALSE,
                                incremental_output_filename = NULL,
                                incremental_output_update = 1e6,
                                n_core = 1)

The 'MSIMST' package.

Description

Incorporates a Bayesian monotonic single-index mixed-effect model with a multivariate skew-t likelihood, specifically designed to handle survey weights adjustments. Features include a simulation program and an associated Gibbs sampler for model estimation. The single-index function is constrained to be monotonic increasing, utilizing a customized Gaussian process prior for precise estimation. The model assumes random effects follow a canonical skew-t distribution, while residuals are represented by a multivariate Student-t distribution. Offers robust Bayesian adjustments to integrate survey weight information effectively.

Author(s)

Maintainer: Qingyang Liu [email protected] (ORCID)

Authors:

Debdeep Pati [email protected]
Dipankar Bandyopadhyay [email protected]

The Function to Calculate the phiX Matrix for Estimating Single-Index Function

Description

The function ⁠phiX_c⁠ is used to generate the phiX matrix associated with the Gaussian process prior.

Usage

phiX_c(Xbeta, u, L)
phiX_c(Xbeta, u, L)

Arguments

`Xbeta`	The single index values. A vector of length n.
`u`	The vector spanning from -1 to 1 with length L + 1.
`L`	An integer defining the number of nodes.

Value

A n by L + 1 matrix.

Examples

L <- 50
u <- seq(-1,1,length.out = L + 1)
phiX <- phiX_c(0.5,u,L)
print(phiX)
L <- 50
u <- seq(-1,1,length.out = L + 1)
phiX <- phiX_c(0.5,u,L)
print(phiX)

The Function for the Simulation Study without the Variable Selection

Description

This is a simply simulation study that is designed to demonstrate the correctness of the proposed Gibbs sampler, ⁠Gibbs_Sampler()⁠.

Usage

reg_simulation1(N, ni_lambda, beta, beta_b, dsq, sigmasq, delta, nu)
reg_simulation1(N, ni_lambda, beta, beta_b, dsq, sigmasq, delta, nu)

Arguments

`N`	The number of subjects.
`ni_lambda`	The mean of Poisson distribution.
`beta`	A 3 by 1 vector.
`beta_b`	The slope of PD response.
`dsq`	A part of covariance parameter.
`sigmasq`	A part of covariance parameter.
`delta`	The skewness parameter.
`nu`	The degree of freedom.

Details

More details of the design of this simulation study can be found in the vignette. Users can access the vignette by the command ⁠vignette(package = "MSIMST")⁠.

Value

A simulated dataset with the response variable ⁠y⁠ and the design matrix ⁠X⁠.

Examples

set.seed(100)
simulated_data <- reg_simulation1(N = 50,
                                  ni_lambda = 8,
                                  beta = c(0.5,0.5,0.5),
                                  beta_b = 1.5,
                                  dsq = 0.1,
                                  sigmasq = 0.5,
                                  delta = 0.6,
                                  nu = 5.89)
y <- simulated_data$y
X <- simulated_data$X
print(head(y))
print(head(X))

set.seed(100)
simulated_data <- reg_simulation1(N = 50,
                                  ni_lambda = 8,
                                  beta = c(0.5,0.5,0.5),
                                  beta_b = 1.5,
                                  dsq = 0.1,
                                  sigmasq = 0.5,
                                  delta = 0.6,
                                  nu = 5.89)
y <- simulated_data$y
X <- simulated_data$X
print(head(y))
print(head(X))

The Function for the Simulation Study with the Variable Selection

Description

This simulation study is designed to demonstrate that using the grouped horseshoe prior can successfully separate signals from noise.

Usage

reg_simulation2(N, ni_lambda, beta, beta_b, dsq, sigmasq, delta, nu)
reg_simulation2(N, ni_lambda, beta, beta_b, dsq, sigmasq, delta, nu)

Arguments

`N`	The number of subjects.
`ni_lambda`	The mean of Poisson distribution.
`beta`	The covariates' coefficients. A 10 by 1 vector.
`beta_b`	The slope of PD response.
`dsq`	A part of covariance parameter.
`sigmasq`	A part of covariance parameter.
`delta`	The skewness parameter.
`nu`	The degree of freedom.

Details

More details of the design of this simulation study can be found in the vignette. Users can access the vignette by the command ⁠vignette(package = "MSIMST")⁠.

Value

A simulated dataset with the response variable ⁠y⁠ and the design matrix ⁠X⁠.

Examples

set.seed(200)
simulated_data <- reg_simulation2(N = 50,
                                  ni_lambda = 8,
                                  beta = c(rep(1,6),rep(0,4)),
                                  beta_b = 1.5,
                                  dsq = 0.1,
                                  sigmasq = 0.5,
                                  delta = 0.6,
                                  nu = 5.89)

y <- simulated_data$y
X <- simulated_data$X

set.seed(200)
simulated_data <- reg_simulation2(N = 50,
                                  ni_lambda = 8,
                                  beta = c(rep(1,6),rep(0,4)),
                                  beta_b = 1.5,
                                  dsq = 0.1,
                                  sigmasq = 0.5,
                                  delta = 0.6,
                                  nu = 5.89)

y <- simulated_data$y
X <- simulated_data$X

The Function for the Simulation Study with the Variable Selection and Survey Weights

Description

This simulation study is designed to show the effectiveness of the grouped horseshoe prior for the variable selection and the ⁠WFPBB()⁠ function for adjusting survey weights.

Usage

reg_simulation3(
  N,
  ni_lambda,
  beta,
  beta_b,
  dsq,
  sigmasq,
  delta,
  nu,
  muz,
  rho,
  sigmasq_z,
  zeta0,
  zeta1
)
reg_simulation3(
  N,
  ni_lambda,
  beta,
  beta_b,
  dsq,
  sigmasq,
  delta,
  nu,
  muz,
  rho,
  sigmasq_z,
  zeta0,
  zeta1
)

Arguments

`N`	The number of subjects.
`ni_lambda`	The mean of Poisson distribution.
`beta`	The covariates' coefficients. A 10 by 1 vector.
`beta_b`	The slope of PD response.
`dsq`	A part of covariance parameter.
`sigmasq`	A part of covariance parameter.
`delta`	The skewness parameter.
`nu`	The degree of freedom.
`muz`	The location parameter of the latent/selection variable.
`rho`	The correlation parameter of the latent/selection variable.
`sigmasq_z`	The variance parameter of the latent/selection variable.
`zeta0`	The intercept term inside the logistic function.
`zeta1`	The slope term inside the logistic function.

Details

More details of the design of this simulation study can be found in the vignette. Users can access the vignette by the command ⁠vignette(package = "MSIMST")⁠.

Value

A simulated dataset with the response variable ⁠y⁠, the design matrix ⁠X⁠ and the survey weight ⁠survey_weight⁠.

Examples

set.seed(100)
output_data <- reg_simulation3(N = 1000,
                               ni_lambda= 8,
                               beta = c(rep(1,6),rep(0,4)),
                               beta_b = 1.5,
                               dsq = 0.1,
                               sigmasq = 0.5,
                               delta = 0.6,
                               nu = 5.89,
                               muz = 0,
                               rho = 36.0,
                               sigmasq_z = 0.6,
                               zeta0 = -1.8,
                               zeta1 = 0.1)
y <- output_data$y
X <- output_data$X
survey_weight <- output_data$survey_weight

set.seed(100)
output_data <- reg_simulation3(N = 1000,
                               ni_lambda= 8,
                               beta = c(rep(1,6),rep(0,4)),
                               beta_b = 1.5,
                               dsq = 0.1,
                               sigmasq = 0.5,
                               delta = 0.6,
                               nu = 5.89,
                               muz = 0,
                               rho = 36.0,
                               sigmasq_z = 0.6,
                               zeta0 = -1.8,
                               zeta1 = 0.1)
y <- output_data$y
X <- output_data$X
survey_weight <- output_data$survey_weight

Weighted Finite Population Bayesian Bootstrap

Description

The function is implemented based on the WFPBB algorithm from (Gunawan et al. 2020).

Usage

WFPBB(y, w, N, n, verbatim)
WFPBB(y, w, N, n, verbatim)

Arguments

`y`	The index of survey data.
`w`	Survey weights. The summation of survey weights should equal the population size
`N`	The population size.
`n`	The sample size.
`verbatim`	TRUE/FALSE. This variable decides whether print the progress information to the console.

Value

The re-sampled index of survey data.

References

Gunawan D, Panagiotelis A, Griffiths W, Chotikapanich D (2020). “Bayesian weighted inference from surveys.” Australian & New Zealand Journal of Statistics, 62(1), 71–94. ISSN 1467-842X, doi:10.1111/anzs.12284.

Examples

set.seed(100)
output_data <- reg_simulation3(N = 5000,
                               ni_lambda= 8,
                               beta = c(rep(1,6),rep(0,4)),
                               beta_b = 1.5,
                               dsq = 0.1,
                               sigmasq = 0.5,
                               delta = 0.6,
                               nu = 5.89,
                               muz = 0,
                               rho = 36.0,
                               sigmasq_z = 0.6,
                               zeta0 = -1.8,
                               zeta1 = 0.1)
y <- output_data$y
X <- output_data$X
survey_weight <- output_data$survey_weight
# set the population size
population_N <- 5000
# set the sample size
n <- length(y)
# run the WFPBB algorithm
index_WFPBB <- WFPBB(y = 1:n,
                     w = survey_weight,
                     N = population_N,
                     n = n,
                     verbatim = FALSE)
print(head(index_WFPBB))

set.seed(100)
output_data <- reg_simulation3(N = 5000,
                               ni_lambda= 8,
                               beta = c(rep(1,6),rep(0,4)),
                               beta_b = 1.5,
                               dsq = 0.1,
                               sigmasq = 0.5,
                               delta = 0.6,
                               nu = 5.89,
                               muz = 0,
                               rho = 36.0,
                               sigmasq_z = 0.6,
                               zeta0 = -1.8,
                               zeta1 = 0.1)
y <- output_data$y
X <- output_data$X
survey_weight <- output_data$survey_weight
# set the population size
population_N <- 5000
# set the sample size
n <- length(y)
# run the WFPBB algorithm
index_WFPBB <- WFPBB(y = 1:n,
                     w = survey_weight,
                     N = population_N,
                     n = n,
                     verbatim = FALSE)
print(head(index_WFPBB))

Package 'MSIMST'

Help Index

The Associated Gibbs Sampler

Description

Usage

Arguments

Details

Value

Examples

The 'MSIMST' package.

Description

Author(s)

See Also

The Function to Calculate the phiX Matrix for Estimating Single-Index Function

Description

Usage

Arguments

Value

Examples

The Function for the Simulation Study without the Variable Selection

Description

Usage

Arguments

Details

Value

Examples

The Function for the Simulation Study with the Variable Selection

Description

Usage

Arguments

Details

Value

Examples

The Function for the Simulation Study with the Variable Selection and Survey Weights

Description

Usage

Arguments

Details

Value

Examples

Weighted Finite Population Bayesian Bootstrap

Description

Usage

Arguments

Value

References

Examples