| Title: | Segment Profile Extraction via Pattern Analysis |
|---|---|
| Description: | Implements the Segment Profile Extraction via Pattern Analysis method for row-mean-centered multivariate data. Core capabilities include SVD-based row-isometric biplot construction, bias-corrected and accelerated (BCa) and percentile bootstrap confidence intervals for domain coordinates and per-person direction cosines, Procrustes alignment of bootstrap replicates across planes, parallel analysis for dimensionality selection, and segment profile reconstruction in planes defined by pairs of singular dimensions. A synthetic Woodcock-Johnson IV look-alike dataset is provided for examples and testing. The method is described in Kim and Grochowalski (2019) <doi:10.1007/s00357-018-9277-7>. |
| Authors: | Se-Kang Kim [aut, cre] (ORCID: <https://orcid.org/0000-0003-0928-3396>) |
| Maintainer: | Se-Kang Kim <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-25 07:16:54 UTC |
| Source: | https://github.com/cran/SEPA |
Loops over columns of a boot object and calls
boot.ci for each, returning a tidy data frame. Falls
back to percentile intervals if the BCa calculation fails.
boot_cis_all(boot_obj, type = c("bca", "perc"), level = 0.95, idx_vec = NULL)
| Argument | Description |
|---|---|
| boot_obj | An object of class boot. |
| type | Character vector passed to boot.ci. |
| level | Numeric confidence level. Default 0.95. |
| idx_vec | Integer vector of column indices to process. Defaults to all columns of the boot object. |
A data frame with columns index, lwr, upr,
and method (one row per element of idx_vec).
## Not run:
# See run_sepa() for an end-to-end example
## End(Not run)
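The loop-with-fallback behaviour described above can be sketched with the boot package alone. This is a minimal illustration, not the package's implementation: the helper name ci_per_column, the toy data, and the statistic function are all hypothetical.

```r
library(boot)  # recommended package, shipped with most R installations

# Hypothetical helper: one interval per column of boot_obj$t,
# trying BCa first and falling back to percentile if BCa errors.
ci_per_column <- function(boot_obj, level = 0.95,
                          idx_vec = seq_len(ncol(boot_obj$t))) {
  rows <- lapply(idx_vec, function(j) {
    ci <- tryCatch(
      boot.ci(boot_obj, conf = level, type = "bca", index = j),
      error = function(e) NULL
    )
    if (!is.null(ci) && !is.null(ci$bca)) {
      # Columns 4 and 5 of the bca matrix hold the lower and upper limits
      data.frame(index = j, lwr = ci$bca[4], upr = ci$bca[5], method = "bca")
    } else {
      ci <- boot.ci(boot_obj, conf = level, type = "perc", index = j)
      data.frame(index = j, lwr = ci$percent[4], upr = ci$percent[5],
                 method = "perc")
    }
  })
  do.call(rbind, rows)
}

# Toy data: bootstrap the two column means of a 40 x 2 matrix
set.seed(1)
x <- matrix(rnorm(80), ncol = 2)
stat_fun <- function(d, i) colMeans(d[i, , drop = FALSE])
b <- boot(x, stat_fun, R = 500)
ci_tab <- ci_per_column(b)
ci_tab
```

Wrapping only the BCa call in tryCatch keeps the fallback local to the column that failed, so the remaining columns still report BCa intervals.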
Produces a base-R row-isometric biplot for a specified pair of dimensions
(p1, p2). All persons are plotted as grey dots; a subset
specified by ids_highlight is overlaid in red and labelled. Domain
loading vectors are drawn as arrows. The plot is optionally saved to a PDF.
draw_sepa_biplot(
  svd_fit,
  id_vec,
  domain_names,
  p1 = 1L,
  p2 = 2L,
  ids_highlight = NULL,
  out_file = NULL,
  a_scale = 35,
  t_scale = 40,
  arrow_col = "#1F4E79",
  hi_col = "red3",
  others_alpha = 0.3
)
| Argument | Description |
|---|---|
| svd_fit | List with components U, d, and V. |
| id_vec | Vector of length n (one ID per person). |
| domain_names | Character vector of length p (one name per domain). |
| p1 | Integer. First dimension (x-axis). Default 1. |
| p2 | Integer. Second dimension (y-axis). Default 2. |
| ids_highlight | Optional vector of IDs to emphasise. Matched against id_vec. |
| out_file | Character or NULL. Default NULL. |
| a_scale | Numeric. Arrow scaling factor. Default 35. |
| t_scale | Numeric. Label scaling factor. Default 40. |
| arrow_col | Colour string for domain arrows and labels. Default "#1F4E79". |
| hi_col | Colour string for highlighted persons. Default "red3". |
| others_alpha | Alpha transparency for background persons. Default 0.3. |
Invisibly returns a list with the plotting coordinates:
Fx, Fy (person scores), end_x, end_y
(arrow tips), lab_x, lab_y (domain labels).
X <- as.matrix(fake_wj[, c("LT","ST","CP","AP","VP","CK","FR")])
Xs <- X - rowMeans(X)
sv <- svd(Xs)
draw_sepa_biplot(
  svd_fit = list(U = sv$u, d = sv$d, V = sv$v),
  id_vec = fake_wj$ID,
  domain_names = c("LT","ST","CP","AP","VP","CK","FR"),
  p1 = 1L, p2 = 2L,
  ids_highlight = c(724, 944)
)
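For readers who want to see what the plotted coordinates are, the following base-R sketch builds row-isometric coordinates from an ipsatized matrix. It assumes the usual convention in which person scores are U D and domain loadings are V; the toy data and variable names are illustrative and the package is not required.

```r
set.seed(1)
X  <- matrix(rnorm(50 * 7, mean = 100, sd = 15), nrow = 50)  # toy scores
Xs <- X - rowMeans(X)        # ipsatize: remove each person's mean level
sv <- svd(Xs)

k  <- 2
Fk <- sv$u[, 1:k] %*% diag(sv$d[1:k])   # person scores in the first plane
Bk <- sv$v[, 1:k]                       # domain loadings (orthonormal columns)

# Row sums of ipsatized data are zero up to rounding
max(abs(rowSums(Xs)))

# "Row-isometric": with all dimensions retained, distances between
# persons in score space equal distances between rows of Xs
F_all <- sv$u %*% diag(sv$d)
max(abs(as.vector(dist(F_all)) - as.vector(dist(Xs))))
```

Because the loadings are orthonormal, all of the singular-value scaling sits on the person side, which is why inter-person distances are preserved.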
A synthetic dataset generated by simulate_sepa_fake_wj that
approximates the observed marginal distributions (means, SDs, and ranges)
of seven WJ-IV broad ability scores while respecting the qualitative
level-elevation / pattern-elevation structure assumed by SEPA. The original
WJ-IV norming data are proprietary; this object provides a fully
reproducible, publicly shareable substitute.
fake_wj
A data frame with 5,127 rows and 8 columns:
| Column | Description |
|---|---|
| ID | Integer person identifier (1–5127). |
| LT | Long-term retrieval broad ability score. |
| ST | Short-term working memory score. |
| CP | Cognitive processing speed score. |
| AP | Auditory processing score. |
| VP | Visual processing score. |
| CK | Comprehension-knowledge score. |
| FR | Fluid reasoning score. |
All domain scores are in a standard score metric (mean approximately 100,
SD approximately 15) and clipped to the reported empirical range.
Three attributes capture the generative parameters:
B_loadings (7 × 4 orthonormal loading matrix),
lambda (PE dimension variances), and sigma_LE (LE SD).
Generated by simulate_sepa_fake_wj(n = 5127, seed = 20251127).
See data-raw/generate_fake_wj.R for the exact code.
dim(fake_wj)
summary(fake_wj[, c("LT","ST","CP","AP","VP","CK","FR")])
Determines the number of statistically significant singular dimensions in an
ipsatized score matrix by comparing observed squared singular values to the
conf-quantile of the null distribution obtained by column-permuting
and re-ipsatizing the data B times.
parallel_analysis_ipsatized(
  Xstar,
  B = 2000L,
  Kmax = 10L,
  conf = 0.95,
  seed = 123L
)
| Argument | Description |
|---|---|
| Xstar | Numeric matrix. Ipsatized (row-mean-centered) data, n × p. |
| B | Integer. Number of permutation replicates. Default 2000. |
| Kmax | Integer. Maximum number of dimensions to evaluate; capped internally. Default 10. |
| conf | Numeric in (0, 1). Default 0.95. |
| seed | Integer random seed. Default 123. |
A named list with three elements:
| Element | Description |
|---|---|
| sig_dims | Integer vector of dimension indices (1-based) whose observed eigenvalue exceeds the null threshold. |
| eig_obs | Numeric vector of length Kmax: observed squared singular values. |
| thr | Numeric vector of length Kmax: permutation null thresholds at level conf. |
X <- simulate_sepa_fake_wj(n = 300, seed = 1)
Xs <- X[, c("LT","ST","CP","AP","VP","CK","FR")]
Xs <- as.matrix(Xs) - rowMeans(as.matrix(Xs))  # ipsatize
pa <- parallel_analysis_ipsatized(Xs, B = 100, Kmax = 6, seed = 42)
pa$sig_dims
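The permutation scheme described for this function can be approximated in a few lines of base R. The sketch below is an assumption-laden illustration: the function name pa_sketch is hypothetical, each column is permuted independently, and Kmax is capped at p - 1 here only because ipsatized data have at most that rank (the package's own cap may differ).

```r
pa_sketch <- function(Xstar, B = 200, Kmax = 5, conf = 0.95, seed = 1) {
  set.seed(seed)
  Kmax <- min(Kmax, ncol(Xstar) - 1)        # assumed cap, see lead-in
  eig_obs <- svd(Xstar, nu = 0, nv = 0)$d[seq_len(Kmax)]^2

  null_eig <- replicate(B, {
    Xp <- apply(Xstar, 2, sample)           # permute each column independently
    Xp <- Xp - rowMeans(Xp)                 # re-ipsatize the permuted data
    svd(Xp, nu = 0, nv = 0)$d[seq_len(Kmax)]^2
  })
  null_eig <- matrix(null_eig, nrow = Kmax) # keep a matrix even if Kmax is 1

  # Null threshold per dimension: the conf-quantile across replicates
  thr <- unname(apply(null_eig, 1, quantile, probs = conf))
  list(sig_dims = which(eig_obs > thr), eig_obs = eig_obs, thr = thr)
}

# Toy data with one strong contrast pattern plus noise
set.seed(42)
n <- 200
scores <- outer(rnorm(n), c(1, 1, 1, -1, -1, -1, 0)) +
  matrix(rnorm(n * 7, sd = 0.5), n)
Xs <- scores - rowMeans(scores)
pa <- pa_sketch(Xs, B = 100)
pa$sig_dims
```

Permuting columns independently breaks the within-person association between domains while keeping each domain's marginal distribution, which is what makes the permuted eigenvalues a usable null reference.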
Percentile confidence intervals from a matrix of bootstrap draws
percentile_ci_mat(M, level = 0.95)
| Argument | Description |
|---|---|
| M | Numeric matrix with bootstrap replicates in rows and statistics in columns. |
| level | Numeric confidence level. Default 0.95. |
A two-column matrix with columns qlo and qhi,
one row per column of M.
set.seed(1)
M <- matrix(rnorm(1000 * 5), 1000, 5)
percentile_ci_mat(M, level = 0.95)
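A percentile interval of this kind is simply a pair of column-wise quantiles. The base-R sketch below shows one way to compute it; the function name pct_ci is hypothetical, and the default quantile type is an assumption that may not match the package.

```r
pct_ci <- function(M, level = 0.95) {
  a <- (1 - level) / 2                       # tail probability on each side
  out <- t(apply(M, 2, quantile, probs = c(a, 1 - a), names = FALSE))
  colnames(out) <- c("qlo", "qhi")
  out                                        # one row per column of M
}

set.seed(1)
M <- matrix(rnorm(1000 * 5), 1000, 5)
ci <- pct_ci(M, level = 0.95)
ci
```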
Print method for sepa_result objects
## S3 method for class 'sepa_result'
print(x, ...)
| Argument | Description |
|---|---|
| x | A sepa_result object. |
| ... | Ignored. |
Invisibly returns x, the sepa_result object passed in.
Called primarily for its side effect of printing a compact summary to the
console, including sample size, number of domains, number of dimensions,
parallel-analysis significant dimensions, and marker domains.
Convenience wrapper that executes the full Subprofile Extraction via Pattern
Analysis (SEPA) pipeline on a matrix of domain scores. The function
ipsatizes the data, fits a rank-K row-isometric SVD biplot, computes
SEPA statistics (plane-fit rho and direction cosines), runs parallel
analysis, bootstraps domain coordinates with BCa confidence intervals, and
bootstraps per-person cosines with percentile confidence intervals.
run_sepa(
  data,
  K = 4L,
  target_ids = NULL,
  B_dom = 2000L,
  B_cos = 2000L,
  alpha_ci = 0.95,
  seed = 20251003L,
  pa_B = 2000L,
  use_parallel = FALSE,
  ncores = NULL,
  run_pa = TRUE,
  run_boot_dom = TRUE,
  run_boot_cos = TRUE,
  verbose = TRUE
)
| Argument | Description |
|---|---|
| data | A numeric matrix or data frame of domain scores. Rows are persons; columns are domains. An optional column named ID supplies the person identifiers. |
| K | Integer. Number of SVD dimensions to retain. Default 4. |
| target_ids | Optional vector of person IDs (matched against the ID column). Default NULL. |
| B_dom | Integer. Bootstrap replicates for domain-coordinate CIs. Default 2000. |
| B_cos | Integer. Bootstrap replicates for per-person cosine CIs. Default 2000. |
| alpha_ci | Numeric confidence level. Default 0.95. |
| seed | Integer random seed. Default 20251003. |
| pa_B | Integer. Permutation replicates for parallel analysis. Default 2000. |
| use_parallel | Logical. Use parallel processing for the bootstrap? Default FALSE. |
| ncores | Integer or NULL. Default NULL. |
| run_pa | Logical. Run parallel analysis? Default TRUE. |
| run_boot_dom | Logical. Run domain-coordinate bootstrap? Default TRUE. |
| run_boot_cos | Logical. Run per-person cosine bootstrap? May be ignored depending on other arguments. Default TRUE. |
| verbose | Logical. Print progress messages? Default TRUE. |
A named list of class "sepa_result" containing:
| Element | Description |
|---|---|
| call | The matched call. |
| domains | Character vector of domain names. |
| pid | Person ID vector. |
| n, p, K | Dimensions used. |
| ref_fit | List with F (n × K), B (p × K), d (singular values), U, V: the reference row-isometric SVD. |
| Xstar | Ipsatized data matrix. |
| sepa_stats | Output of sepa_stats_all: rho, C_all, C_plane. |
| pa | Output of parallel_analysis_ipsatized, or NULL. |
| boot_dom | Raw boot object for domain coordinates, or NULL. |
| dom_coords | Data frame of domain coordinates with BCa CIs, or NULL. |
| len2 | Data frame of squared vector lengths with BCa CIs and marker flag, or NULL. |
| boot_cos | Raw boot object for per-person cosines, or NULL. |
| cosine_tables | Named list of data frames (one per plane plus "all") with point estimates and percentile CIs for the persons in target_ids, or NULL. |
| dom_dom_cosines | List with plane12 and plane34 data frames of domain–domain cosines, or NULL. |
| norms | Data frame with vector norms for exemplar persons, or NULL. |
| rho_exemplar | Data frame with plane-fit rho for exemplar persons, or NULL. |
res <- run_sepa(
  data = fake_wj,
  K = 4L,
  target_ids = c(724, 944),
  B_dom = 200L,
  B_cos = 200L,
  seed = 1L,
  pa_B = 100L,
  run_boot_cos = TRUE,
  verbose = TRUE
)
head(res$sepa_stats$rho)
res$pa$sig_dims
Given reference loading vectors B_ref and person score matrix
F_ref from a row-isometric SVD biplot, computes for every person:
the plane-fit correlation in each plane,
direction cosines with every domain in the full K-dimensional space,
direction cosines within each plane.
sepa_stats_all(B_ref, F_ref, planes = list(c(1L, 2L), c(3L, 4L)), pid = NULL)
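| Argument | Description |
|---|---|
| B_ref | Numeric matrix (p × K) of reference domain loadings. |
| F_ref | Numeric matrix (n × K) of reference person scores. |
| planes | List of integer vectors, each of length 2, specifying which pair of dimensions defines a plane. Default list(c(1, 2), c(3, 4)). |
| pid | Optional integer or character vector of length n (person IDs). Default NULL. |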
A named list with three elements, each a tidy data frame:
| Element | Description |
|---|---|
| rho | Columns: id, plane, rho. |
| C_all | Columns: id, domain, C_all. Direction cosines across all dimensions. |
| C_plane | Columns: id, domain, C_plane, plane. Per-plane cosines. |
X <- as.matrix(fake_wj[1:200, c("LT","ST","CP","AP","VP","CK","FR")])
Xs <- X - rowMeans(X)
sv <- svd(Xs)
B <- sv$v[, 1:4]; F <- sv$u[, 1:4] %*% diag(sv$d[1:4])
rownames(B) <- c("LT","ST","CP","AP","VP","CK","FR")
res <- sepa_stats_all(B, F)
head(res$rho)
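The direction cosines can be illustrated without the package. The sketch below takes the cosine between a person's score vector and a domain's loading vector to be their normalised inner product, which is one plausible reading of the description rather than a confirmed definition. The helper name cosines and the toy data are hypothetical, and the plane-fit rho is not reproduced because its formula is not given here.

```r
# Hypothetical helper: n x p matrix of cosines between rows of Fm and rows of Bm
cosines <- function(Fm, Bm) {
  num <- Fm %*% t(Bm)                                   # inner products
  den <- sqrt(rowSums(Fm^2)) %o% sqrt(rowSums(Bm^2))    # products of norms
  num / den
}

set.seed(1)
X  <- matrix(rnorm(30 * 7, mean = 100, sd = 15), nrow = 30)  # toy scores
Xs <- X - rowMeans(X)
sv <- svd(Xs)
Bm <- sv$v[, 1:4]
Fm <- sv$u[, 1:4] %*% diag(sv$d[1:4])

C_all    <- cosines(Fm, Bm)                 # all four dimensions
C_plane1 <- cosines(Fm[, 1:2], Bm[, 1:2])   # plane spanned by dimensions 1 and 2
round(C_all[1:3, ], 2)
```

Restricting both matrices to the same pair of columns gives the within-plane cosine, so the same helper serves the full-space and per-plane cases.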
Generates a data frame that approximates the observed marginal distributions
(means, SDs, and ranges) of the seven WJ-IV broad ability scores while
respecting the qualitative level-elevation (LE) / pattern-elevation (PE)
structure assumed by SEPA. The data are produced from an additive model
comprising a strong person-level elevation component (LE), a
K-dimensional orthonormal pattern component (PE), and residual noise;
columns are then linearly calibrated to the target statistics and clipped to
the observed ranges. Because the original norming data are proprietary,
this function provides a fully reproducible, publicly shareable substitute.
simulate_sepa_fake_wj(
  n = 5127L,
  domains = c("LT", "ST", "CP", "AP", "VP", "CK", "FR"),
  seed = 20251127L,
  K = 4L,
  sigma_LE = sqrt(0.25),
  lambda = c(0.3, 0.18, 0.11, 0.06),
  sigma_eps = sqrt(0.1),
  target = data.frame(
    domain = c("LT", "ST", "CP", "AP", "VP", "CK", "FR"),
    mean = c(100.2, 100.93, 99.64, 101.01, 100.79, 100.92, 99.99),
    sd = c(15.55, 15.72, 16.01, 15.61, 15.91, 15.75, 15.58),
    min = c(37.04, 35.77, 12.26, 36.55, 31.76, 38.34, 32.74),
    max = c(148.37, 159.3, 150, 151.35, 160.44, 153.93, 148.04),
    stringsAsFactors = FALSE
  ),
  do_calibrate = TRUE,
  do_clip = TRUE
)
| Argument | Description |
|---|---|
| n | Integer. Number of simulated cases. Default 5127. |
| domains | Character vector of length 7. Domain abbreviations used as column names. Default c("LT", "ST", "CP", "AP", "VP", "CK", "FR"). |
| seed | Integer random seed passed to set.seed. Default 20251127. |
| K | Integer. Number of orthogonal PE dimensions. Must be 4. |
| sigma_LE | Numeric. Standard deviation of the level-elevation component. Default sqrt(0.25). |
| lambda | Numeric vector of length 4. PE dimension variances. Default c(0.3, 0.18, 0.11, 0.06). |
| sigma_eps | Numeric. Residual noise SD. Default sqrt(0.1). |
| target | Data frame with columns domain, mean, sd, min, and max. |
| do_calibrate | Logical. Linearly re-scale each column to match the target statistics? Default TRUE. |
| do_clip | Logical. Clip each column to its target range? Default TRUE. |
A data frame with n rows and columns ID,
LT, ST, CP, AP, VP, CK,
FR (or as specified by domains). Three attributes are
attached: B_loadings (the orthonormal loading
matrix), lambda (PE variances), and sigma_LE.
fake <- simulate_sepa_fake_wj(n = 200, seed = 1)
dim(fake)  # 200 x 8
colMeans(fake[, -1])
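The additive model described above (level elevation plus an orthonormal pattern component plus noise, followed by calibration and clipping) can be sketched in base R as follows. This is a hedged illustration and not the package's generator: the loading matrix is drawn at random, a single target mean of 100, SD of 15, and clipping range of 40 to 160 are assumed for all columns, and forcing the loadings to be orthogonal to the constant vector is an extra assumption made here so that the pattern component carries no level information.

```r
set.seed(1)
n <- 500; p <- 7; K <- 4
sigma_LE  <- sqrt(0.25)
lambda    <- c(0.30, 0.18, 0.11, 0.06)
sigma_eps <- sqrt(0.10)

# Orthonormal loadings, orthogonal to the constant vector (assumption)
Q  <- qr.Q(qr(cbind(1, matrix(rnorm(p * K), p, K))))
Bl <- Q[, 2:(K + 1)]

LE <- rnorm(n, sd = sigma_LE)                       # one level per person
PE <- sweep(matrix(rnorm(n * K), n, K), 2, sqrt(lambda), `*`) %*% t(Bl)
Z  <- LE + PE + matrix(rnorm(n * p, sd = sigma_eps), n, p)

# Linear calibration to an assumed target, then clipping to an assumed range
X <- 100 + 15 * scale(Z)
X <- pmin(pmax(X, 40), 160)
round(colMeans(X), 1)
```

Adding the length-n vector LE to the n × p matrix recycles it down each column, so every domain of a given person receives the same level shift.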
Pivots a three-column long data frame (id, time, value) to wide format and optionally prefixes the new column names.
write_long_to_wide(df, id_col, time_col, value_col, file, prefix = "")
| Argument | Description |
|---|---|
| df | Data frame to pivot. |
| id_col | Name of the person-identifier column. |
| time_col | Name of the within-person variable column (e.g. domain). |
| value_col | Name of the value column. |
| file | Character path for the output CSV. Pass NULL to skip writing. |
| prefix | Optional prefix prepended to the new wide-format column names (empty string = no prefix). |
The wide data frame, invisibly.
long_df <- data.frame(
  id = rep(1:3, each = 2),
  domain = rep(c("LT", "ST"), 3),
  value = c(100, 105, 98, 110, 102, 107)
)
wide <- write_long_to_wide(long_df, "id", "domain", "value", file = NULL)
wide
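The same pivot can be reproduced with stats::reshape from base R, which may help when checking the function's output. This sketch is only an equivalent-looking illustration under assumed column naming; the prefix value and the temporary file path are hypothetical.

```r
long_df <- data.frame(
  id = rep(1:3, each = 2),
  domain = rep(c("LT", "ST"), 3),
  value = c(100, 105, 98, 110, 102, 107)
)

# reshape() names the new columns "value.LT", "value.ST"
wide <- reshape(long_df, idvar = "id", timevar = "domain", direction = "wide")

# Strip reshape's "value." stem and apply an optional prefix ("" = none)
prefix <- ""
names(wide) <- sub("^value\\.", prefix, names(wide))
rownames(wide) <- NULL

# Optional CSV output, written to a temporary file for this illustration
tmp <- tempfile(fileext = ".csv")
write.csv(wide, tmp, row.names = FALSE)
wide
```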
Write an n x p matrix as a wide CSV with an ID column
write_matrix_wide(M, id, file, domain_names = NULL)
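| Argument | Description |
|---|---|
| M | Numeric matrix, n × p. |
| id | Vector of length n (one ID per row of M). |
| file | Character path for the output CSV. Pass NULL to skip writing. |
| domain_names | Optional character vector of length p used as column names. Default NULL. |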
The data frame (ID + matrix columns), invisibly.
M <- matrix(rnorm(6), nrow = 2)
out <- write_matrix_wide(M, id = c("A", "B"), file = NULL,
                         domain_names = c("X1","X2","X3"))
out