| Title: | Find Range of Cronbach Alpha with a Dataset Including Missing Data |
|---|---|
| Description: | Provides functions to calculate the minimum and maximum possible values of Cronbach's alpha when item-level missing data are present. Cronbach's alpha (Cronbach, 1951 <doi:10.1007/BF02310555>) is one of the most widely used measures of internal consistency in the social, behavioral, and medical sciences (Bland & Altman, 1997 <doi:10.1136/bmj.314.7080.572>; Tavakol & Dennick, 2011 <doi:10.5116/ijme.4dfb.8dfd>). However, conventional implementations assume complete data, and listwise deletion is often applied when missingness occurs, which can lead to biased or overly optimistic reliability estimates (Enders, 2003 <doi:10.1037/1082-989X.8.3.322>). This package implements computational strategies including enumeration, Monte Carlo sampling, and optimization algorithms (e.g., Genetic Algorithm, Differential Evolution, Sequential Least Squares Programming) to obtain sharp lower and upper bounds of Cronbach's alpha under arbitrary missing data patterns. The approach is motivated by Manski's partial identification framework and pessimistic bounding ideas from optimization literature. |
| Authors: | Feng Ji [aut], Biying Zhou [aut, cre] |
| Maintainer: | Biying Zhou <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.0 |
| Built: | 2026-05-27 07:57:22 UTC |
| Source: | https://github.com/cran/missalpha |
This function calculates the maximum possible value of Cronbach's alpha by using a binary search algorithm with optimization methods. The function iteratively narrows the bounds of alpha until the desired tolerance level is reached.
compute_alpha_max(n_item, sigma_x_info, sigma_y_info, score_max = 1, alpha_lb = 0, alpha_ub = 1, tol = 0.001, num_try = 1, method = "GA", ...)
n_item |
An integer specifying the number of items (columns) in the score matrix. |
sigma_x_info |
A list containing the quadratic function information for sigma_x. It should include:
|
sigma_y_info |
A list containing the quadratic function information for sigma_y. It should include:
|
score_max |
An integer specifying the largest possible score for any test item. Default is 1. |
alpha_lb |
A numeric value specifying the lower bound of alpha, usually set to 0.0. |
alpha_ub |
A numeric value specifying the upper bound of alpha, usually set to 1.0. |
tol |
A numeric value representing the desired accuracy for narrowing down the bounds between alpha_lb and alpha_ub. Default is 0.001. |
num_try |
An integer specifying the number of times to run the optimization algorithm in each iteration. Default is 1. |
method |
A character string specifying the optimization method to be used. Options are 'GA' (Genetic Algorithm), 'DEoptim' (Differential Evolution), and 'nloptr' (Sequential Least Squares Programming). Default is 'GA'. |
... |
Additional parameters passed to the optimization algorithm. |
This function finds the maximum possible Cronbach's alpha by using an iterative binary search algorithm. It evaluates the feasibility of each midpoint value of alpha by solving the corresponding optimization problem with the chosen method.
The optimization methods can be specified via the method parameter, and additional control parameters for the optimization methods can be passed through the ... argument. The function adjusts the upper and lower bounds of alpha until the tolerance criterion is met.
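The binary search described above can be sketched in a few lines of base R. This is only an illustration of the bisection idea, not the package's implementation: `bisect_alpha_max` and `toy_oracle` are hypothetical names, and the oracle stands in for the optimization-based feasibility check (here it simply pretends the true maximum is 0.7321).

```r
# Minimal bisection sketch (hypothetical helper, not part of missalpha).
# is_feasible(a) should return TRUE when some completion of the missing
# scores reaches an alpha of at least a.
bisect_alpha_max <- function(is_feasible, alpha_lb = 0, alpha_ub = 1, tol = 0.001) {
  while (alpha_ub - alpha_lb > tol) {
    mid <- (alpha_lb + alpha_ub) / 2
    if (is_feasible(mid)) {
      alpha_lb <- mid  # mid is attainable, so the maximum is at least mid
    } else {
      alpha_ub <- mid  # mid is not attainable, so the maximum is below mid
    }
  }
  (alpha_lb + alpha_ub) / 2
}

# Toy oracle standing in for the optimization step (assumed threshold).
toy_oracle <- function(a) a <= 0.7321
est <- bisect_alpha_max(toy_oracle)
print(est)
```

With the default interval of width 1 and `tol = 0.001`, the loop needs ten halvings, so the cost is dominated by the ten feasibility checks rather than by the search itself.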
A numeric value representing the maximum possible Cronbach's alpha.
compute_alpha_min, examine_alpha_bound
This function calculates the minimum possible value of Cronbach's alpha by using a binary search algorithm with optimization methods. The function iteratively narrows the bounds of alpha until the desired tolerance level is reached.
compute_alpha_min(n_item, sigma_x_info, sigma_y_info, score_max = 1, alpha_lb = 0, alpha_ub = 1, tol = 0.001, num_try = 1, method = "GA", ...)
n_item |
An integer specifying the number of items (columns) in the score matrix. |
sigma_x_info |
A list containing the quadratic function information for sigma_x. It should include:
|
sigma_y_info |
A list containing the quadratic function information for sigma_y. It should include:
|
score_max |
An integer specifying the largest possible score for any test item. Default is 1. |
alpha_lb |
A numeric value specifying the lower bound of alpha, usually set to 0.0. |
alpha_ub |
A numeric value specifying the upper bound of alpha, usually set to 1.0. |
tol |
A numeric value representing the desired accuracy for narrowing down the bounds between alpha_lb and alpha_ub. Default is 0.001. |
num_try |
An integer specifying the number of times to run the optimization algorithm in each iteration. Default is 1. |
method |
A character string specifying the optimization method to be used. Options are 'GA' (Genetic Algorithm), 'DEoptim' (Differential Evolution), and 'nloptr' (Sequential Least Squares Programming). Default is 'GA'. |
... |
Additional parameters passed to the optimization algorithm. |
This function finds the minimum possible Cronbach's alpha by using an iterative binary search algorithm. It evaluates the feasibility of each midpoint value of alpha by solving the corresponding optimization problem with the chosen method.
The optimization methods can be specified via the method parameter, and additional control parameters for the optimization methods can be passed through the ... argument. The function adjusts the upper and lower bounds of alpha until the tolerance criterion is met.
A numeric value representing the minimum possible Cronbach's alpha.
compute_alpha_max, examine_alpha_bound
This function computes the minimum and maximum possible values of Cronbach's alpha by enumerating all possible values for unknown entries in the score matrix.
cronbach_alpha_enum(scores_mat, score_max)
scores_mat |
A matrix where rows represent individuals and columns represent test items. It contains the performance of individuals on different test items, with NA for missing values. |
score_max |
An integer specifying the largest possible score for any test item. |
This function works by enumerating all possible combinations of values for missing entries (represented by NA) in the scores_mat.
It systematically explores all combinations of missing values from 0 to score_max using the expand.grid function.
For each combination, it calculates Cronbach's alpha using the compute_cronbach_alpha function and keeps track of the minimum and maximum alpha values encountered.
The enumeration ensures that the function finds the exact minimum and maximum possible values of Cronbach's alpha given the possible missing score combinations. However, due to the exhaustive nature of enumeration, this function may become computationally expensive for large datasets or a high number of missing values.
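The enumeration strategy can be illustrated with a small base R sketch. It is written under stated assumptions: `cronbach_alpha` and `enum_alpha_bounds` are hypothetical stand-ins (the package's own `compute_cronbach_alpha` may differ in detail), alpha is computed from the usual item-variance and total-variance formula, and completions with an undefined alpha are skipped.

```r
# Standard Cronbach's alpha for a complete score matrix (illustrative helper).
cronbach_alpha <- function(m) {
  k <- ncol(m)
  item_var <- apply(m, 2, var)
  total_var <- var(rowSums(m))
  k / (k - 1) * (1 - sum(item_var) / total_var)
}

# Enumerate every integer completion of the NA cells with expand.grid.
enum_alpha_bounds <- function(scores_mat, score_max) {
  miss <- which(is.na(scores_mat))
  if (length(miss) == 0) {
    a <- cronbach_alpha(scores_mat)
    return(c(a, a))
  }
  # One column per missing cell, one row per candidate completion.
  grid <- expand.grid(rep(list(0:score_max), length(miss)))
  alphas <- apply(grid, 1, function(fill) {
    m <- scores_mat
    m[miss] <- fill
    cronbach_alpha(m)
  })
  alphas <- alphas[is.finite(alphas)]  # drop completions with zero total variance
  c(min(alphas), max(alphas))
}

scores <- matrix(c(1, 0, 1,
                   0, 0, NA,
                   1, 1, 1,
                   NA, 1, 0,
                   0, 0, 0), ncol = 3, byrow = TRUE)
bounds <- enum_alpha_bounds(scores, score_max = 1)
print(bounds)
```

The grid has `(score_max + 1)^m` rows for `m` missing cells, which is why exhaustive enumeration is only practical when few entries are missing.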
A numeric vector of length 2, where the first element is the minimum Cronbach's alpha and the second element is the maximum Cronbach's alpha.
This function computes a rough approximation of the lower and upper bounds of Cronbach's alpha by performing random sampling or integer sampling for missing values in the score matrix.
cronbach_alpha_rough(scores_mat, score_max, num_try = 1000, int_only = FALSE)
scores_mat |
A matrix where rows represent persons and columns represent tests (or items), providing the performance of a person on a test. NA should be used for missing values. |
score_max |
An integer indicating the largest possible score of the test. |
num_try |
An integer specifying the number of random samples to generate in order to estimate the lower and upper bounds. Default is 1000. |
int_only |
A logical value indicating whether to sample only integer values for missing scores. If FALSE, floating-point values between 0 and score_max are sampled. Default is FALSE. |
This function performs random sampling to estimate the bounds of Cronbach's alpha for a given test score matrix with missing values. It first calculates the alpha assuming all missing values are either 0 or score_max. Then, it iteratively samples either integer values or continuous values (depending on the value of int_only) for the missing scores and recalculates the Cronbach's alpha. The minimum and maximum alphas observed over all iterations are returned as the estimated bounds.
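The sampling procedure described above can be sketched as follows. This is a hedged illustration in base R, not the package source: `cronbach_alpha` and `rough_alpha_bounds` are hypothetical names, and non-finite alphas (for example from a completion with zero total variance) are simply dropped.

```r
# Standard Cronbach's alpha for a complete score matrix (illustrative helper).
cronbach_alpha <- function(m) {
  k <- ncol(m)
  k / (k - 1) * (1 - sum(apply(m, 2, var)) / var(rowSums(m)))
}

rough_alpha_bounds <- function(scores_mat, score_max, num_try = 1000, int_only = FALSE) {
  miss <- which(is.na(scores_mat))
  fill_and_score <- function(values) {
    m <- scores_mat
    m[miss] <- values
    cronbach_alpha(m)
  }
  # Start from the two extreme completions: all 0 and all score_max.
  alphas <- c(fill_and_score(rep(0, length(miss))),
              fill_and_score(rep(score_max, length(miss))))
  for (i in seq_len(num_try)) {
    values <- if (int_only) {
      sample(0:score_max, length(miss), replace = TRUE)
    } else {
      runif(length(miss), min = 0, max = score_max)
    }
    alphas <- c(alphas, fill_and_score(values))
  }
  alphas <- alphas[is.finite(alphas)]
  c(min(alphas), max(alphas))
}

set.seed(42)
scores <- matrix(c(2, 1, NA,
                   0, 0, 1,
                   2, 2, 2,
                   NA, 1, 0,
                   1, 0, 0), ncol = 3, byrow = TRUE)
b <- rough_alpha_bounds(scores, score_max = 2, num_try = 200)
print(b)
```

Because only sampled completions are inspected, the returned interval can be narrower than the true range; it is an inner approximation rather than a guaranteed bound.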
A numeric vector of length 2, where the first value is the estimated minimum Cronbach's alpha and the second value is the estimated maximum Cronbach's alpha.
This function computes the lower and upper bound of Cronbach's alpha using various methods such as enumeration, random sampling, or optimization algorithms. The function also supports rough approximations and allows integer-only or floating-point scores during sampling.
cronbachs_alpha(scores_mat, score_max, tol = 0.001, num_random = 1000, enum_all = FALSE, rough = FALSE, num_opt = 1, int_only = TRUE, method = "GA", ...)
scores_mat |
A matrix where rows represent persons and columns represent tests (or items), providing the performance of a person on a test. NA should be used for missing values. |
score_max |
An integer indicating the largest possible score of the test. |
tol |
A numeric value representing the desired accuracy in computing the lower and upper bound of Cronbach's alpha. |
num_random |
An integer specifying the number of random samples used in estimating the lower and upper bound. Default is 1000. |
enum_all |
A logical value indicating whether to enumerate all possible scores for Cronbach's alpha. Default is FALSE. |
rough |
A logical value indicating whether to compute a rough approximation of Cronbach's alpha bounds. Default is FALSE. |
num_opt |
An integer specifying the number of times to run the optimization algorithm. Default is 1. |
int_only |
A logical value indicating whether the random sampling should be restricted to integer-only scores. Default is TRUE. |
method |
A character string specifying the optimization method to be used ('GA', 'DEoptim', 'nloptr'). Default is 'GA'. |
... |
Additional parameters passed to the optimization algorithm. |
A list containing:
alpha_min_opt |
The smallest possible Cronbach's alpha computed by the optimization algorithm (if used). |
alpha_max_opt |
The largest possible Cronbach's alpha computed by the optimization algorithm (if used). |
alpha_min_enum |
The smallest possible Cronbach's alpha obtained by enumerating all possible scores (if used). |
alpha_max_enum |
The largest possible Cronbach's alpha obtained by enumerating all possible scores (if used). |
alpha_min_rough |
The smallest possible Cronbach's alpha obtained by rough approximation (if used). |
alpha_max_rough |
The largest possible Cronbach's alpha obtained by rough approximation (if used). |
method |
The optimization method used. |
runtime |
The total computation time in seconds. |
compute_alpha_min, compute_alpha_max, cronbach_alpha_enum, cronbach_alpha_rough, generate_scores_mat_bernoulli, qp_solver
# Example 1: Run `cronbachs_alpha` with a sample matrix
scores_mat <- matrix(c(
  NaN, 1, 0, 0,
  0, 0, 0, 0,
  NaN, 0, 0, 0,
  2, 0, 0, 1,
  NaN, 0, 0, 0,
  0, 0, 0, 0,
  1, 0, 0, 1,
  0, 0, 0, 1,
  1, 1, 0, 1
), nrow = 9, ncol = 4, byrow = TRUE)  # 36 values: 9 persons x 4 items
result <- cronbachs_alpha(scores_mat, score_max = 4, enum_all = FALSE)
print(result$alpha_min_opt)
print(result$alpha_max_opt)

# Example 2: Generate a Bernoulli matrix and compute Cronbach's alpha
score_max <- 2
scores_mat_bernoulli <- generate_scores_mat_bernoulli(50, 10, 20, score_max)
result <- cronbachs_alpha(scores_mat_bernoulli, score_max, enum_all = FALSE)
print(result$alpha_min_opt)
print(result$alpha_max_opt)

# Example 3: Using a predefined dataset from missalpha
scores_df <- missalpha::sample
scores_mat <- as.matrix(scores_df)
result <- cronbachs_alpha(scores_mat, score_max = 4, enum_all = FALSE)
print(result$alpha_min_opt)
print(result$alpha_max_opt)
This function computes and displays all possible combinations of Cronbach's alpha bounds for various parameter settings, including combinations of optimization methods, rough approximation, random sampling, and enumeration.
display_all(scores_mat, score_max, tol = 0.001, num_random = 1000, num_opt = 1, methods = c("GA", "DEoptim", "nloptr"), enum_all = FALSE, rough = TRUE)
scores_mat |
A matrix where rows represent persons and columns represent tests (or items), providing the performance of a person on a test. NA should be used for missing values. |
score_max |
An integer indicating the largest possible score of the test. |
tol |
A numeric value representing the desired accuracy in computing the lower and upper bound of Cronbach's alpha. |
num_random |
An integer specifying the number of random samples used in estimating the lower and upper bound. Default is 1000. |
num_opt |
An integer specifying the number of times to run the optimization algorithm. Default is 1. |
methods |
A character vector specifying the optimization methods to be used (e.g., 'GA', 'DEoptim', 'nloptr'). Default is c('GA', 'DEoptim', 'nloptr'). |
enum_all |
Logical, whether to include enumeration in the parameter combinations. Default is FALSE. |
rough |
Logical, whether to include rough approximation in the parameter combinations. Default is TRUE. |
A list where each element is a result of the cronbachs_alpha function for a unique parameter combination, including computation time.
This function checks whether a given value of alpha is a feasible solution
to a min/max optimization problem using quadratic functions for sigma_x and sigma_y.
The function supports different optimization methods and iteratively attempts to solve the problem.
examine_alpha_bound(alpha, n_item, sigma_x_info, sigma_y_info, alpha_type, score_max = 1, num_try = 1, method = "GA", ...)
alpha |
A numeric value representing the |
n_item |
An integer representing the number of items or columns in the data. |
sigma_x_info |
A list containing the quadratic function information for sigma_x, including:
|
sigma_y_info |
A list containing the quadratic function information for sigma_y, including:
|
alpha_type |
A character string indicating whether the problem is to minimize or maximize alpha.
It must be either |
score_max |
An integer representing the largest possible score for any test item. Default is 1. |
num_try |
An integer specifying the number of times to run the optimization algorithm. Default is 1. |
method |
A character string specifying the optimization method to use.
Options are 'GA', 'DEoptim', and 'nloptr'. Default is 'GA'. |
... |
Additional parameters passed to the optimization algorithm. |
The function combines quadratic information from sigma_x_info and sigma_y_info
to form a new optimization problem. The optimization checks whether the value of alpha is feasible
for either a minimization or maximization problem, depending on the value of alpha_type.
The function supports multiple optimization methods, including Genetic Algorithm (GA),
Differential Evolution (DEoptim), and Sequential Least Squares Programming (nloptr).
Additional control parameters can be passed through the ... argument to fine-tune the optimization process.
For each iteration, the function calls qp_solver with the combined quadratic function
and checks whether the objective function's value is feasible (i.e., less than or equal to 0).
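The reduction to a "less than or equal to 0" test can be made explicit with a short derivation. The notation here is an assumption for illustration: k is the number of items, x collects the unknown scores, S(x) denotes the sum of item variances and T(x) the variance of the total score, both quadratic in x. How these map onto sigma_x_info and sigma_y_info is not stated in this manual, so the correspondence should be read as a sketch.

```latex
\begin{align*}
\alpha(x) &= \frac{k}{k-1}\left(1 - \frac{S(x)}{T(x)}\right),
  \qquad x \in [0, \mathtt{score\_max}]^{n},\quad T(x) > 0, \\[4pt]
\alpha(x) \ge a
  &\iff S(x) - \left(1 - \frac{(k-1)\,a}{k}\right) T(x) \le 0, \\[4pt]
\alpha(x) \le a
  &\iff \left(1 - \frac{(k-1)\,a}{k}\right) T(x) - S(x) \le 0 .
\end{align*}
```

For a fixed a, each left-hand side is a single quadratic function of x, so asking whether a is attainable amounts to minimizing that quadratic over the box and checking whether the minimum is at most 0.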
A list with the following elements:
result |
A logical value indicating whether the |
x_value |
A numeric vector representing the optimal values of the decision variables, or |
qp_solver, compute_alpha_min, compute_alpha_max
This function generates a matrix of scores for a set of people and items,
where the scores are generated using a Bernoulli distribution with person-specific probabilities.
It also allows for some scores to be missing (represented by NA).
generate_scores_mat_bernoulli(n_person, n_item, n_missing, score_max = 1)
n_person |
An integer representing the number of people (rows in the matrix). |
n_item |
An integer representing the number of items (columns in the matrix). |
n_missing |
An integer representing the number of missing scores (set to NA) in the matrix. |
score_max |
An integer representing the largest possible score for any item. Default is 1. |
The function generates a score matrix where each person's score for each item is drawn
from a Bernoulli distribution with a person-specific probability. A number of scores are set to NA
to simulate missing values.
The probability of each person scoring on the items is determined by randomly generating a
probability for each person using runif. The Bernoulli distribution is then used
(via rbinom) to generate the scores, and NA values are assigned to randomly selected positions
in the matrix based on n_missing.
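The generation steps above can be sketched in base R as follows. This is an illustrative reconstruction from the description, not the package source: `sketch_scores_bernoulli` is a hypothetical name, and the scaling of successes to `score_max` follows the stated return values (0 or score_max).

```r
sketch_scores_bernoulli <- function(n_person, n_item, n_missing, score_max = 1) {
  # One success probability per person.
  p <- runif(n_person)
  # Matrices fill column by column, so repeating p once per item
  # gives every cell in row i the probability p[i].
  draws <- rbinom(n_person * n_item, size = 1, prob = rep(p, times = n_item))
  m <- matrix(draws * score_max, nrow = n_person, ncol = n_item)
  # Blank out n_missing distinct cells chosen uniformly at random.
  m[sample(length(m), n_missing)] <- NA
  m
}

set.seed(1)
m <- sketch_scores_bernoulli(10, 5, 10, score_max = 1)
print(m)
```

Sampling cell positions without replacement guarantees exactly `n_missing` NA entries, which requires `n_missing` to be no larger than `n_person * n_item`.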
A matrix of size n_person by n_item containing generated scores (0 or score_max)
with some values replaced by NA to simulate missing data.
# Generate a 10x5 score matrix with 10 missing values and maximum score of 1
scores_mat <- generate_scores_mat_bernoulli(10, 5, 10, score_max = 1)
print(scores_mat)
This function provides a general interface to solve quadratic programming problems using different optimization methods. It supports Genetic Algorithm (GA), Differential Evolution (DEoptim), and Sequential Least Squares Programming (SLSQP).
qp_solver(n, A, b, c, x_max = 1, method = "GA", print_message = FALSE, ...)
n |
An integer representing the number of decision variables. |
A |
A matrix representing the quadratic coefficients. |
b |
A numeric vector representing the linear coefficients. |
c |
A numeric scalar representing the constant term in the objective function. |
x_max |
An integer representing the upper bound for the decision variables. Default is 1. |
method |
A character string specifying the optimization method to use. Can be 'GA', 'DEoptim', or 'nloptr'. Default is 'GA'. |
print_message |
A logical value indicating whether to print optimization details. Default is FALSE. |
... |
Additional control parameters passed to the chosen optimization method. |
A list containing:
f_cd |
The optimal objective function value. |
x_value |
The optimal values of the decision variables. |
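To make the shape of the problem concrete, the sketch below minimizes a box-constrained quadratic with base R's `optim` and the L-BFGS-B method. Several things are assumptions here: the objective is taken to be x'Ax + b'x + c over 0 <= x <= x_max (the manual does not spell out the exact form or any scaling of A), `qp_box_sketch` is a hypothetical name, and L-BFGS-B is a swapped-in local solver rather than one of the three methods the package offers.

```r
qp_box_sketch <- function(n, A, b, c, x_max = 1) {
  # Assumed objective: quadratic term + linear term + constant.
  f <- function(x) as.numeric(t(x) %*% A %*% x + sum(b * x) + c)
  fit <- optim(par = rep(x_max / 2, n), fn = f, method = "L-BFGS-B",
               lower = rep(0, n), upper = rep(x_max, n))
  # Mirror the documented return shape: objective value and minimizer.
  list(f_cd = fit$value, x_value = fit$par)
}

A <- diag(2)
b <- c(-1, -0.5)
res <- qp_box_sketch(n = 2, A = A, b = b, c = 0.5)
print(res)
```

In this convex toy problem the minimizer is x = (0.5, 0.25). When A is not positive semidefinite the quadratic can have several local minima inside the box, which is one reason global heuristics such as GA or Differential Evolution are natural choices for the general case.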
qp_solver_DEoptim, qp_solver_GA, qp_solver_nloptr
This function solves a quadratic programming problem using the Differential Evolution optimization method from the DEoptim package.
qp_solver_DEoptim(n, A, b, c, x_max = 1, print_message = FALSE, NP = 100, itermax = 100, ...)
n |
An integer representing the number of decision variables. |
A |
A matrix representing the quadratic coefficients. |
b |
A numeric vector representing the linear coefficients. |
c |
A numeric scalar representing the constant term in the objective function. |
x_max |
An integer representing the upper bound for the decision variables. Default is 1. |
print_message |
A logical value indicating whether to print optimization details. Default is FALSE. |
NP |
An integer specifying the population size for the DEoptim algorithm. Default is 100. |
itermax |
An integer specifying the maximum number of iterations. Default is 100. |
... |
Additional control parameters for |
A list containing:
f_cd |
The optimal objective function value. |
x_value |
The optimal values of the decision variables. |
qp_solver, qp_solver_GA, qp_solver_nloptr
This function solves a quadratic programming problem using the Genetic Algorithm (GA) from the GA package.
qp_solver_GA(n, A, b, c, x_max = 1, print_message = FALSE, maxiter = 1000, popSize = 50, pmutation = 0.2, elitism = 5, monitor = FALSE, seed = 123, ...)
n |
An integer representing the number of decision variables. |
A |
A matrix representing the quadratic coefficients. |
b |
A numeric vector representing the linear coefficients. |
c |
A numeric scalar representing the constant term in the objective function. |
x_max |
An integer representing the upper bound for the decision variables. Default is 1. |
print_message |
A logical value indicating whether to print optimization details. Default is FALSE. |
maxiter |
An integer specifying the maximum number of iterations. Default is 1000. |
popSize |
An integer specifying the population size for the GA. Default is 50. |
pmutation |
A numeric value for mutation probability. Default is 0.2. |
elitism |
An integer specifying the number of elite individuals to carry over to the next generation. Default is 5. |
monitor |
A logical value indicating whether to display progress. Default is FALSE. |
seed |
A numeric value used for the random number generator. Default is 123. |
... |
Additional control parameters for |
A list containing:
f_cd |
The optimal objective function value. |
x_value |
The optimal values of the decision variables. |
qp_solver, qp_solver_DEoptim, qp_solver_nloptr
This function solves a quadratic programming problem using the Sequential Least Squares Programming (SLSQP) algorithm from the nloptr package.
qp_solver_nloptr(n, A, b, c, x_max = 1, print_message = FALSE, xtol_rel = 1e-08, maxeval = 10000, print_level = 0, ...)
n |
An integer representing the number of decision variables. |
A |
A matrix representing the quadratic coefficients. |
b |
A numeric vector representing the linear coefficients. |
c |
A numeric scalar representing the constant term in the objective function. |
x_max |
An integer representing the upper bound for the decision variables. Default is 1. |
print_message |
A logical value indicating whether to print optimization details. Default is FALSE. |
xtol_rel |
A numeric value specifying the relative tolerance for convergence. Default is 1e-08. |
maxeval |
An integer specifying the maximum number of function evaluations. Default is 10000. |
print_level |
An integer controlling the verbosity of output. Default is 0. |
... |
Additional control parameters for |
A list containing:
f_cd |
The optimal objective function value. |
x_value |
The optimal values of the decision variables. |
qp_solver, qp_solver_DEoptim, qp_solver_GA
This dataset contains a matrix of scores with 50 rows and 4 columns,
representing 50 individuals and 4 test items. Some entries are NA,
indicating missing data.
sample
A 50 x 4 matrix:
Each row represents an individual (total 50 individuals).
Each column represents a test item or score (total 4 items).
Some entries are NA, representing missing data.
Generated for demonstration purposes.
# Load the sample dataset
data(sample)
# Display the first few rows of the sample dataset
head(sample)