| Title: | Aggregated Latent Space Index for Binary, Ordinal, and Continuous Data |
|---|---|
| Description: | Provides three stability-validated pipelines for computing an Aggregated Latent Space Index (ALSI): a binary MCA pipeline (alsi_workflow()), an ordinal pipeline using homals alternating least squares optimal scaling (alsi_workflow_ordinal()), and a continuous ipsatized SVD pipeline (calsi_workflow()). All three pipelines share a common bootstrap dual-criterion stability framework (principal angles and Tucker congruence phi) for determining the number of dimensions to retain before index construction. The package is designed to complement Segmented Profile Analysis (SEPA) and is intended for psychometric scale construction and dimensional reduction in survey and clinical research. |
| Authors: | Se-Kang Kim [aut, cre] (ORCID: <https://orcid.org/0000-0003-0928-3396>) |
| Maintainer: | Se-Kang Kim <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.0 |
| Built: | 2026-06-02 09:30:04 UTC |
| Source: | https://github.com/cran/alsi |
Calculates ALSI as a variance-weighted Euclidean norm of row principal coordinates within a retained K-dimensional MCA subspace.
alsi(Fmat, eig, K)
| Argument | Description |
|---|---|
| Fmat | Matrix of row principal coordinates (N x K or larger) |
| eig | Vector of eigenvalues (inertias) |
| K | Integer, number of dimensions to aggregate |
S3 object of class alsi containing:
| Component | Description |
|---|---|
| alpha | Numeric vector of ALSI values (length N), representing each individual's variance-weighted distance from the centroid in the retained MCA subspace |
| w | Variance weights (length K), computed as the proportion of retained inertia for each dimension |
| alpha_vec | Aggregated direction vector (length K), equal to sqrt(w), used for projecting category coordinates |
| K | Number of dimensions used in aggregation |
# Create example data
set.seed(123)
Fmat <- matrix(rnorm(100 * 4), nrow = 100, ncol = 4)
eig <- c(0.5, 0.3, 0.15, 0.05)
# Compute ALSI
a <- alsi(Fmat, eig, K = 3)
print(a)
hist(a$alpha, main = "Distribution of ALSI")
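The ALSI computation described above can be sketched in base R. This is a minimal illustration, not the package's alsi() source: it assumes each person's value is alpha_i = sqrt(sum_k w_k F_ik^2) with w_k = eig_k / sum(eig[1:K]), which is one reading of "variance-weighted Euclidean norm". The helper name alsi_sketch() is hypothetical.

```r
# alsi_sketch() is a hypothetical helper, not the package's alsi().
# Assumes alpha_i = sqrt(sum_k w_k * F_ik^2), with w_k = eig_k / sum(eig[1:K]).
alsi_sketch <- function(Fmat, eig, K) {
  stopifnot(K >= 1, K <= ncol(Fmat), K <= length(eig))
  idx <- seq_len(K)
  w <- eig[idx] / sum(eig[idx])             # proportion of retained inertia
  Fk <- Fmat[, idx, drop = FALSE]
  # Multiply column k of F^2 by w_k, then sum across the K dimensions
  alpha <- sqrt(rowSums(sweep(Fk^2, 2, w, `*`)))
  list(alpha = alpha, w = w, alpha_vec = sqrt(w), K = K)
}

set.seed(123)
Fmat <- matrix(rnorm(100 * 4), nrow = 100, ncol = 4)
eig <- c(0.5, 0.3, 0.15, 0.05)
s <- alsi_sketch(Fmat, eig, K = 3)
summary(s$alpha)
```

Because the weights sum to one, alpha_vec = sqrt(w) has unit length, which is what allows it to serve as a projection direction for category coordinates.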
Runs a complete ALSI analysis including parallel analysis for dimensionality assessment, bootstrap stability evaluation, ALSI computation, and visualization.
alsi_workflow(data, vars, B_pa = 2000, B_boot = 2000, q = 0.95, seed = 20260123)
| Argument | Description |
|---|---|
| data | Data frame or path to .xlsx file |
| vars | Character vector of binary variable names |
| B_pa | Number of permutations for parallel analysis (default: 2000) |
| B_boot | Number of bootstrap resamples (default: 2000) |
| q | Quantile for parallel analysis (default: 0.95) |
| seed | Random seed for reproducibility |
List (returned invisibly) containing all analysis objects:
| Component | Description |
|---|---|
| pa | Parallel analysis results (class |
| boot | Bootstrap stability results (class |
| fit | MCA fit object (class |
| alsi | ALSI values (class |
| K | Number of dimensions retained based on parallel analysis |
data(ANR2)
vars <- c("MDD", "DYS", "DEP", "PTSD", "OCD", "GAD", "ANX", "SOPH", "ADHD")
results <- alsi_workflow(
  data = ANR2, vars = vars,
  B_pa = 100, B_boot = 100
)
results$pa
results$boot
results$alsi
Runs the four-stage ordinal ALSI pipeline:
Stage 1. Permutation parallel analysis (a column-wise shuffle preserves the marginals and destroys the inter-item structure) determines K_PA.
Stage 2. A reference homals fit is followed by varimax rotation of the stacked category score matrix (the loading analogue in homogeneity analysis). The same rotation matrix is applied to the person scores.
Stage 3. Bootstrap dual-criterion stability. For each resample, homals is refitted and the category score matrix is Procrustes-aligned to the reference. The principal angle and Tucker congruence phi are computed on the same post-Procrustes matrix. K* is the largest k for which ALL dimensions 1..k satisfy BOTH criteria simultaneously.
Stage 4. An eigenvalue-weighted linear ALSI index is computed from the K* retained rotated person scores (the result can be negative; a z-standardized version is also returned).
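The Stage 3 selection rule can be written as a short function. This sketch assumes each dimension has already been reduced to a single principal angle and a single phi value (for example a bootstrap median); which summary the package uses is not stated here, and select_k_star() is a hypothetical name. A dimension is treated as passing when its angle is at most angle_threshold_deg and its phi is at least tucker_threshold.

```r
# select_k_star() is a hypothetical helper illustrating the dual-criterion rule.
select_k_star <- function(angle_deg, tucker_phi,
                          angle_threshold_deg = 20, tucker_threshold = 0.85) {
  stopifnot(length(angle_deg) == length(tucker_phi))
  # A dimension passes only if it meets BOTH criteria
  pass <- angle_deg <= angle_threshold_deg & tucker_phi >= tucker_threshold
  pass[is.na(pass)] <- FALSE
  # Count the leading run of passing dimensions; stop at the first failure
  k <- 0L
  for (p in pass) {
    if (!p) break
    k <- k + 1L
  }
  k
}

# Dimension 4 passes, but dimension 3 fails the angle criterion, so K* = 2
select_k_star(angle_deg  = c(8, 12, 25, 10),
              tucker_phi = c(0.97, 0.91, 0.93, 0.88))
```

A later dimension that passes is still excluded once an earlier one fails, which keeps the retained solution nested (dimensions 1..K*).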
alsi_workflow_ordinal(data, items, reversed_items = character(0L), scale_min = 1L, scale_max = 5L, n_permutations = 100L, pa_percentile = 95, B_boot = 1000L, angle_threshold_deg = 20, tucker_threshold = 0.85, seed = 12345L, itermax = 1000L, verbose = TRUE)
| Argument | Description |
|---|---|
| data | A |
| items | Character vector of item column names. |
| reversed_items | Character vector of items to reverse-score ( |
| scale_min | Integer. Lowest valid response value (default 1). |
| scale_max | Integer. Highest valid response value (default 5). |
| n_permutations | Integer. Permutation replicates for Stage 1 (default 100). |
| pa_percentile | Numeric. Null-distribution percentile cutoff for Stage 1 (default 95). |
| B_boot | Integer. Bootstrap replicates for Stage 3 (default 1000). |
| angle_threshold_deg | Numeric. Maximum principal angle, in degrees, for a dimension to pass the stability criterion (default 20). |
| tucker_threshold | Numeric. Minimum Tucker congruence phi for a dimension to pass the replicability criterion (default 0.85). |
| seed | Integer. Random seed (default 12345). |
| itermax | Integer. Maximum ALS iterations passed to homals (default 1000). |
| verbose | Logical. Print progress messages (default TRUE). |
An S3 object of class "alsi_ordinal" with components:
Numeric vector (n). Raw eigenvalue-weighted linear composite. Can be negative.
Numeric vector (n). Z-standardized ALSI.
Integer. Dimensions retained by parallel analysis.
Integer. Final model order after dual-criterion selection.
Matrix n x K_PA. Varimax-rotated person scores.
Matrix P x K_PA. Varimax-rotated stacked category scores.
Numeric vector (K_PA). Eigenvalues (invariant to varimax rotation).
Data frame. Per-dimension stability metrics (eigenvalue, angle, Tucker phi, pass/fail, grade).
Data frame. Parallel analysis results per dimension.
Integer. Bootstrap replicates discarded due to non-convergence or degenerate resamples.
The matched call.
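Stages 2 and 4 can be illustrated with stats::varimax(), which returns both the rotated loadings and the rotation matrix (rotmat), so the same rotation can be reused for person scores. The matrices below are random stand-ins for homals output, the weights are eigenvalues normalized to sum to one, and all variable names are hypothetical; the package's exact weighting may differ. Note that stats::varimax() applies Kaiser normalization by default (normalize = TRUE).

```r
set.seed(1)
n <- 50
P <- 12
K_PA <- 3
# Random stand-ins for homals output (hypothetical, not real person/category scores)
person <- matrix(rnorm(n * K_PA), n, K_PA)   # person (object) scores, n x K_PA
catsc  <- matrix(rnorm(P * K_PA), P, K_PA)   # stacked category scores, P x K_PA
eig    <- c(0.42, 0.25, 0.11)                # eigenvalues, unchanged by rotation

# Stage 2: varimax on the category score matrix, then reuse its rotation matrix
vm <- stats::varimax(catsc)
cat_rot    <- catsc %*% vm$rotmat
person_rot <- person %*% vm$rotmat

# Stage 4: eigenvalue-weighted linear composite of the K* retained dimensions
K_star <- 2L
w <- eig[seq_len(K_star)] / sum(eig[seq_len(K_star)])
alsi_raw <- drop(person_rot[, seq_len(K_star), drop = FALSE] %*% w)  # can be negative
alsi_z   <- drop(scale(alsi_raw))                                     # z-standardized
```

Rescaling the weights by any positive constant (for example using raw eigenvalues instead of proportions) changes the raw composite but leaves the z-standardized version unchanged.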
de Leeuw, J., & Mair, P. (2009). Gifi methods for optimal scaling in R: The package homals. Journal of Statistical Software, 31(4), 1-21.
Gifi, A. (1990). Nonlinear multivariate analysis. Wiley.
Lorenzo-Seva, U., & ten Berge, J. M. F. (2006). Tucker's congruence coefficient as a meaningful index of factor similarity. Methodology, 2, 57-64.
Takane, Y., Young, F. W., & de Leeuw, J. (1977). Nonmetric individual differences multidimensional scaling: An alternating least squares method with optimal scaling features. Psychometrika, 42, 7-67.
A dataset of binary indicators recording the presence (1) or absence (0) of
nine psychiatric diagnoses for a sample of patients, together with pre- and
post-treatment EDI and BMI scores. It is included as the primary example
dataset for the binary MCA pipeline (alsi_workflow).
data(ANR2)
A data frame with 13 columns:
Major Depressive Disorder (0/1)
Dysthymia (0/1)
Depressive Disorder NOS (0/1)
Post-Traumatic Stress Disorder (0/1)
Obsessive-Compulsive Disorder (0/1)
Generalized Anxiety Disorder (0/1)
Anxiety Disorder NOS (0/1)
Social Phobia (0/1)
Attention Deficit Hyperactivity Disorder (0/1)
Pre-treatment EDI score (numeric)
Post-treatment EDI score (numeric)
Pre-treatment BMI (numeric)
Post-treatment BMI (numeric)
data(ANR2)
vars <- c("MDD", "DYS", "DEP", "PTSD", "OCD", "GAD", "ANX", "SOPH", "ADHD")
results <- alsi_workflow(ANR2, vars = vars, B_pa = 100, B_boot = 100)
Calculates cALSI as a variance-weighted Euclidean norm of row coordinates within a retained K-dimensional ipsatized SVD subspace.
calsi(F, eig, K)
| Argument | Description |
|---|---|
| F | Matrix of row coordinates (N x K or larger) |
| eig | Vector of eigenvalues |
| K | Integer, number of dimensions to aggregate |
S3 object of class calsi containing:
| Component | Description |
|---|---|
| alpha | Numeric vector of cALSI values (length N) |
| w | Variance weights (length K) |
| alpha_vec | Aggregated direction vector (sqrt of weights) |
| K | Number of dimensions used |
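The ipsatized SVD behind cALSI can be sketched as follows. This assumes that ipsatization means subtracting each person's mean across variables (row-centering) and that row coordinates are U D from the SVD; the package's exact centering, scaling, and eigenvalue conventions are not stated here and may differ. All variable names are hypothetical.

```r
set.seed(42)
X <- matrix(rnorm(40 * 6, mean = 3), nrow = 40, ncol = 6)  # 40 persons, 6 variables

# Ipsatize: subtract each person's own mean (rowMeans recycles down the columns)
X_ips <- X - rowMeans(X)

s <- svd(X_ips)
K <- 3
F_rows <- s$u[, seq_len(K)] %*% diag(s$d[seq_len(K)])  # row (person) coordinates
eig <- s$d^2 / sum(s$d^2)                              # share of total variation

# cALSI-style aggregation over the K retained dimensions (same form as ALSI)
w <- eig[seq_len(K)] / sum(eig[seq_len(K)])
calsi_vals <- sqrt(rowSums(sweep(F_rows^2, 2, w, `*`)))
```

Because every ipsatized row sums to zero, the ipsatized matrix has rank at most p - 1, so its last singular value is numerically zero; with six variables, at most five dimensions carry information.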
Demonstrate what cALSI adds beyond SEPA
calsi_vs_sepa_demo(data, K = 4, B_boot = 2000, seed = 20260206)
| Argument | Description |
|---|---|
| data | Data matrix |
| K | Number of dimensions |
| B_boot | Number of bootstrap resamples for the stability assessment |
| seed | Random seed |
List with comparison results
Integrates parallel analysis, bootstrap stability, and cALSI computation.
calsi_workflow(data, B_pa = 2000, B_boot = 2000, q = 0.95, seed = 20260206, K_override = NULL)
| Argument | Description |
|---|---|
| data | Data frame or matrix of continuous variables |
| B_pa | Number of permutations for parallel analysis |
| B_boot | Number of bootstrap resamples |
| q | Quantile for parallel analysis |
| seed | Random seed |
| K_override | Optional integer; if supplied, it overrides the K* chosen by parallel analysis |
List containing all analysis objects
Compare SEPA plane-wise summaries with cALSI
compare_sepa_calsi(fit, K, target_ids = NULL)
| Argument | Description |
|---|---|
| fit | SVD fit object |
| K | Number of dimensions |
| target_ids | Optional vector of person IDs to highlight |
Data frame comparing SEPA and cALSI person-level indices
Performs orthogonal Procrustes rotation to align a set of category coordinates to a reference solution, then applies sign correction to maximize agreement with the reference on each dimension.
mca_align(G, Gref)
| Argument | Description |
|---|---|
| G | Matrix of category coordinates to align (M x K) |
| Gref | Reference matrix of category coordinates (M x K) |
List containing:
| Component | Description |
|---|---|
| G_aligned | Matrix of aligned category coordinates (M x K), rotated and sign-corrected to match the reference |
| R | Orthogonal rotation matrix (K x K) used for alignment |
# Create example matrices
set.seed(123)
Gref <- matrix(rnorm(20), nrow = 10, ncol = 2)
G <- Gref %*% matrix(c(0.8, 0.6, -0.6, 0.8), 2, 2)
# Align G to Gref
aligned <- mca_align(G, Gref)
print(aligned$G_aligned)
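The alignment step can be illustrated with the standard orthogonal Procrustes solution: if t(G) %*% Gref = U D V', then R = U V' minimizes the Frobenius distance between G R and Gref. The sketch below, under the hypothetical name procrustes_align(), is not the package's mca_align() source; its sign step flips any column whose inner product with the corresponding reference column is negative.

```r
# procrustes_align() is a hypothetical illustration, not mca_align() itself.
procrustes_align <- function(G, Gref) {
  stopifnot(identical(dim(G), dim(Gref)))
  # Orthogonal Procrustes: with svd(t(G) %*% Gref) = U D V', take R = U V'
  s <- svd(crossprod(G, Gref))
  R <- s$u %*% t(s$v)
  G_aligned <- G %*% R
  # Sign correction: flip columns that point away from their reference column
  flip <- sign(colSums(G_aligned * Gref))
  flip[flip == 0] <- 1
  list(G_aligned = sweep(G_aligned, 2, flip, `*`),
       R = sweep(R, 2, flip, `*`))
}

set.seed(123)
Gref <- matrix(rnorm(20), nrow = 10, ncol = 2)
G <- Gref %*% matrix(c(0.8, 0.6, -0.6, 0.8), 2, 2)  # rotated copy of Gref
aligned <- procrustes_align(G, Gref)
max(abs(aligned$G_aligned - Gref))                  # close to zero
```

The unrestricted Procrustes solution maximizes the sum of column inner products, so at the optimum no aligned column can disagree in sign with its reference (flipping it would increase the sum); in this sketch the sign step therefore acts as a safeguard against ties and rounding.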
Evaluates reproducibility of retained MCA dimensions via bootstrap resampling. Quantifies stability using Procrustes principal angles (subspace-level) and Tucker's congruence coefficients (dimension-level).
mca_bootstrap(data, vars, K, B = 2000, seed = 20260123, verbose = TRUE)
| Argument | Description |
|---|---|
| data | Data frame or path to .xlsx file |
| vars | Character vector of binary variable names |
| K | Integer, number of dimensions to retain and assess |
| B | Integer, number of bootstrap resamples (default: 2000) |
| seed | Integer, random seed for reproducibility |
| verbose | Logical, print progress messages |
S3 object of class mca_bootstrap containing:
| Component | Description |
|---|---|
| ref | Reference MCA fit object (class |
| K | Number of dimensions assessed |
| B | Number of bootstrap resamples performed |
| angles | Matrix of principal angles in degrees (B x K), measuring subspace similarity between bootstrap and reference solutions |
| tucker | Matrix of Tucker congruence coefficients (B x K), measuring dimension-level similarity after Procrustes alignment |
| angles_summary | Summary statistics (median, 5th and 95th percentiles) for principal angles |
| tucker_summary | Summary statistics (median, 5th and 95th percentiles) for Tucker congruence coefficients |
# Using included ANR2 dataset
data(ANR2)
vars <- c("MDD", "DYS", "DEP", "PTSD", "OCD", "GAD", "ANX", "SOPH", "ADHD")
boot <- mca_bootstrap(ANR2, vars = vars, K = 3, B = 100)
print(boot)
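The two stability measures can be computed in base R. In this sketch (helper names hypothetical) principal angles come from the singular values of t(Qa) %*% Qb, where Qa and Qb are orthonormal bases of the two column spaces, and Tucker's phi for each column pair is sum(x * y) / sqrt(sum(x^2) * sum(y^2)). Whether the package orthonormalizes in exactly this way is an assumption.

```r
principal_angles_deg <- function(A, B) {
  Qa <- qr.Q(qr(A))                        # orthonormal basis of span(A)
  Qb <- qr.Q(qr(B))                        # orthonormal basis of span(B)
  cosines <- svd(crossprod(Qa, Qb))$d      # cosines of the principal angles
  acos(pmin(1, cosines)) * 180 / pi        # clamp rounding error above 1
}

tucker_phi <- function(A, B) {
  # Column-wise congruence; uncentered, unlike a correlation
  colSums(A * B) / sqrt(colSums(A^2) * colSums(B^2))
}

set.seed(1)
Gref <- matrix(rnorm(30), nrow = 10, ncol = 3)
Q <- qr.Q(qr(matrix(rnorm(9), 3, 3)))      # random orthogonal 3 x 3 matrix
G <- Gref %*% Q                            # same subspace, rotated axes

principal_angles_deg(G, Gref)              # all near 0: the subspaces coincide
tucker_phi(G, Gref)                        # typically below 1 before alignment
```

Principal angles depend only on the column spaces, so they are unchanged by any rotation of G; Tucker's phi compares individual columns and is therefore only meaningful after Procrustes alignment.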
Compares observed MCA eigenvalues against reference distributions from permuted data to identify statistically meaningful dimensions.
mca_pa(data, vars, B = 2000, q = 0.95, seed = 20260123, max_dims = 20, verbose = TRUE)
| Argument | Description |
|---|---|
| data | Data frame or path to .xlsx file |
| vars | Character vector of binary variable names |
| B | Integer, number of permutations (default: 2000) |
| q | Numeric, reference quantile for retention (default: 0.95) |
| seed | Integer, random seed for reproducibility |
| max_dims | Integer, maximum number of dimensions to display in the plot |
| verbose | Logical, print progress messages |
S3 object of class mca_pa containing:
| Component | Description |
|---|---|
| eig_obs | Observed eigenvalues from the MCA of the original data |
| eig_q | Reference quantiles from the permutation distribution |
| eig_perm | Matrix of permutation eigenvalues (B x dimensions) |
| K_star | Suggested number of dimensions to retain (where observed > reference) |
| fit | MCA fit object (class |
| q | Quantile threshold used for comparison |
| B | Number of permutations performed |
# Using included ANR2 dataset
data(ANR2)
vars <- c("MDD", "DYS", "DEP", "PTSD", "OCD", "GAD", "ANX", "SOPH", "ADHD")
pa <- mca_pa(ANR2, vars = vars, B = 100)
print(pa$K_star)
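The permutation logic of parallel analysis can be sketched in base R. To keep the example short and self-contained, it uses eigenvalues of the correlation matrix as a stand-in for MCA eigenvalues (an assumption; mca_pa() works with MCA inertias), and perm_pa_sketch() is a hypothetical name. Shuffling each column independently preserves every variable's marginal distribution while destroying the associations between variables.

```r
perm_pa_sketch <- function(X, B = 200, q = 0.95, seed = 1) {
  set.seed(seed)
  # Stand-in eigen-decomposition (correlation PCA), not MCA
  eig_fun <- function(M) eigen(cor(M), symmetric = TRUE, only.values = TRUE)$values
  eig_obs <- eig_fun(X)
  # B x p matrix of eigenvalues from column-wise permuted data
  eig_perm <- t(replicate(B, eig_fun(apply(X, 2, sample))))
  eig_q <- apply(eig_perm, 2, quantile, probs = q)
  # Retain leading dimensions while the observed value exceeds the reference
  K_star <- 0L
  for (above in eig_obs > eig_q) {
    if (!above) break
    K_star <- K_star + 1L
  }
  list(eig_obs = eig_obs, eig_q = eig_q, K_star = K_star)
}

set.seed(7)
f <- rnorm(300)                                    # one shared latent factor
X <- cbind(sapply(1:3, function(j) f + rnorm(300, sd = 0.5)),
           matrix(rnorm(300 * 3), 300, 3))         # plus three pure-noise columns
pa_sketch <- perm_pa_sketch(X, B = 200)
pa_sketch$K_star
```

Counting only the leading run of dimensions that exceed their reference quantile is a design choice in this sketch; counting every exceedance is an alternative rule.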
Visualizes category coordinates in a 2D MCA subspace and optionally displays projections onto the aggregated ALSI direction.
plot_category_projections(fit, K, alpha_vec = NULL, dim_pair = c(1, 2), cex = 0.8, top_n = 15)
| Argument | Description |
|---|---|
| fit | MCA fit object (class |
| K | Number of dimensions in the retained subspace |
| alpha_vec | Optional aggregated direction vector (from |
| dim_pair | Integer vector of length 2, dimensions to plot (default: c(1,2)) |
| cex | Character expansion factor for labels |
| top_n | Number of top categories to display by projection (default: 15) |
No return value, called for side effects. The function creates
a scatter plot of category coordinates in the specified 2D subspace,
with category labels displayed. If alpha_vec is provided, it also
prints the top categories ranked by their absolute projection onto the
ALSI direction to the console.
data(ANR2)
vars <- c("MDD", "DYS", "DEP", "PTSD", "OCD", "GAD", "ANX", "SOPH", "ADHD")
pa <- mca_pa(ANR2, vars = vars, B = 100, verbose = FALSE)
fit <- pa$fit
plot_category_projections(fit, K = pa$K_star)
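The console ranking printed when alpha_vec is supplied can be approximated by projecting each category's first K coordinates onto alpha_vec and sorting by absolute value. The coordinate matrix, category labels, and helper name top_projections() below are hypothetical, and how plot_category_projections() extracts coordinates from fit is not shown here.

```r
top_projections <- function(G, alpha_vec, top_n = 15) {
  K <- length(alpha_vec)
  # Scalar projection of each category onto the aggregated direction
  proj <- drop(G[, seq_len(K), drop = FALSE] %*% alpha_vec)
  head(proj[order(abs(proj), decreasing = TRUE)], top_n)
}

# Hypothetical 2-D category coordinates with illustrative labels
G <- matrix(c( 0.1,  0.2,
              -0.9,  0.3,
               0.4, -0.1,
               0.2,  0.8),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("A_0", "A_1", "B_0", "B_1"), NULL))
alpha_vec <- sqrt(c(0.6, 0.4))      # sqrt of variance weights, unit length
top_projections(G, alpha_vec, top_n = 2)   # B_1 and A_1 rank highest
```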
Visualizes domain loadings in a 2D subspace (biplot-style).
plot_domain_loadings(fit, dim_pair = c(1, 2), cex = 1)
| Argument | Description |
|---|---|
| fit | SVD fit object (class |
| dim_pair | Integer vector of length 2, dimensions to plot |
| cex | Character expansion factor for labels |
Creates diagnostic plots showing distributions of principal angles and Tucker congruence coefficients across bootstrap resamples.
plot_subspace_stability(boot_obj)
| Argument | Description |
|---|---|
| boot_obj | Object of class |
No return value, called for side effects. The function creates a two-panel figure with: (1) boxplots of principal angles (left panel), showing the distribution of subspace similarity across bootstrap resamples for each dimension; and (2) boxplots of Tucker congruence coefficients (right panel), showing dimension-level replicability with reference lines at phi = 0.85 (good) and phi = 0.95 (excellent).
data(ANR2)
vars <- c("MDD", "DYS", "DEP", "PTSD", "OCD", "GAD", "ANX", "SOPH", "ADHD")
boot <- mca_bootstrap(ANR2, vars = vars, K = 3, B = 100)
plot_subspace_stability(boot)
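The reference lines in the right panel correspond to two phi cut-offs, 0.85 (good) and 0.95 (excellent). A per-dimension summary of a B x K Tucker matrix, such as the tucker component returned by mca_bootstrap(), can be sketched as below; grading on the median is an assumption, and summarize_tucker() is a hypothetical name.

```r
summarize_tucker <- function(tucker) {
  # 5th, 50th and 95th percentiles per dimension (columns)
  qs <- apply(tucker, 2, quantile, probs = c(0.05, 0.5, 0.95))
  med <- qs["50%", ]
  grade <- ifelse(med >= 0.95, "excellent",
                  ifelse(med >= 0.85, "good", "below 0.85"))
  data.frame(dim = seq_len(ncol(tucker)),
             p05 = qs["5%", ], median = med, p95 = qs["95%", ],
             grade = grade, row.names = NULL)
}

set.seed(3)
# Simulated phi values for three dimensions (illustrative, not real results)
tucker <- cbind(runif(100, 0.96, 0.99),
                runif(100, 0.87, 0.93),
                runif(100, 0.50, 0.80))
summarize_tucker(tucker)
```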
Plot Subspace Stability Diagnostics for Continuous Data
plot_subspace_stability_cont(boot_obj)
| Argument | Description |
|---|---|
| boot_obj | Object of class |
Align SVD solution via Procrustes rotation with sign anchoring
svd_align(B, Bref)
| Argument | Description |
|---|---|
| B | Matrix of domain loadings to align |
| Bref | Reference matrix of domain loadings |
List with aligned coordinates and rotation matrix
Evaluates reproducibility of retained dimensions via bootstrap resampling. Uses Procrustes principal angles (subspace-level) and Tucker's congruence coefficients (dimension-level).
svd_bootstrap(data, K, B = 2000, seed = 20260206, verbose = TRUE)
| Argument | Description |
|---|---|
| data | Data frame or matrix of continuous variables |
| K | Integer, number of dimensions to assess |
| B | Integer, number of bootstrap resamples (default: 2000) |
| seed | Integer, random seed for reproducibility |
| verbose | Logical, print progress messages |
S3 object of class svd_bootstrap
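The resampling loop behind a bootstrap stability check for continuous data can be sketched as follows: resample persons with replacement, refit the SVD, Procrustes-align the new loadings to the reference, and record Tucker's phi per dimension. This assumes loadings are the first K right singular vectors of the column-centered data; svd_bootstrap() may define loadings, centering, and ipsatization differently, and svd_boot_sketch() is a hypothetical name.

```r
svd_boot_sketch <- function(X, K, B = 200, seed = 1) {
  set.seed(seed)
  # Loadings: first K right singular vectors of the column-centered data
  loadings <- function(M) {
    svd(scale(M, center = TRUE, scale = FALSE))$v[, seq_len(K), drop = FALSE]
  }
  Vref <- loadings(X)
  tucker <- matrix(NA_real_, nrow = B, ncol = K)
  for (b in seq_len(B)) {
    idx <- sample.int(nrow(X), replace = TRUE)   # resample persons
    Vb <- loadings(X[idx, , drop = FALSE])
    s <- svd(crossprod(Vb, Vref))                # orthogonal Procrustes
    Va <- Vb %*% s$u %*% t(s$v)
    tucker[b, ] <- colSums(Va * Vref) /
      sqrt(colSums(Va^2) * colSums(Vref^2))
  }
  tucker
}

set.seed(10)
n <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
X <- cbind(sapply(1:3, function(j) f1 + rnorm(n, sd = 0.3)),
           sapply(1:3, function(j) f2 + rnorm(n, sd = 0.3)))
tb <- svd_boot_sketch(X, K = 2, B = 200)
apply(tb, 2, median)
```

Because the first two eigenvalues are nearly equal in this simulated example, the individual axes can swap or rotate between resamples; the Procrustes step absorbs that, which is why alignment precedes the Tucker comparison.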
Uses the paran package (Horn's parallel analysis with Longman-Allen-Chabassol bias adjustment) for dimensionality assessment, ensuring compatibility with SEPA methodology. Falls back to a built-in method if paran is unavailable.
svd_pa(data, B = 2000, q = 0.95, seed = 20260206, graph = TRUE, verbose = TRUE)
| Argument | Description |
|---|---|
| data | Data frame or matrix of continuous variables |
| B | Integer, number of iterations for paran (default: 2000) |
| q | Numeric, centile for retention threshold (default: 0.95) |
| seed | Integer, random seed for reproducibility |
| graph | Logical, whether to display the scree plot (default: TRUE) |
| verbose | Logical, print progress messages |
This function primarily uses the paran package, which implements Horn's parallel analysis with the bias adjustment described in Longman, Cota, Holden, & Fekken (1989). This is the same method used in SEPA.
The paran package should be installed: install.packages("paran")
S3 object of class svd_pa containing:
| Component | Description |
|---|---|
| eig_obs | Observed eigenvalues |
| eig_adj | Adjusted eigenvalues (from paran) |
| eig_rand | Random eigenvalues (threshold) |
| K_star | Number of dimensions to retain |
| fit | SVD fit object for downstream cALSI computation |
| method | Method used ("paran" or "fallback") |