--- title: "Mixed-Subjects 1PL Calibration" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Mixed-Subjects 1PL Calibration} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4 ) ``` The 1PL (one-parameter logistic) model estimates a single shared discrimination $a$ across all items together with per-item intercepts $d_j$: $$P(x_j = 1 \mid \theta) = \text{logistic}(a\,\theta + d_j)$$ The parameter vector has length $J+1$ rather than $2J$. The package provides exact analogues of the 2PL mixed-subjects functions for the 1PL case. **When to prefer 1PL over 2PL:** - Ability-focused tests where the items are designed to be equally discriminating. - Tests built from a single item pool with homogeneous item characteristics. - When the 2PL discrimination estimates are very noisy (small $n$). > **Note on vcov.** `vcov_mixed_subjects_1pl()` currently uses the EM > complete-data Hessian (not Louis' marginal-information correction). The > uncertainty estimates are slightly over-precise. A Louis-corrected 1PL bread > is planned for a future release. ## Simulate a 1PL test ```{r simulate} library(mixedsubjectsirt) library(ggplot2) set.seed(2026) n_human <- 400 n_generated <- 1200 n_items <- 8 # True 1PL: shared discrimination a = 1.2, varying difficulties true_1pl <- data.frame( item = paste0("Item", seq_len(n_items)), a = 1.2, d = seq(-1.1, 1.1, length.out = n_items) ) true_1pl$b <- -true_1pl$d / true_1pl$a theta_human <- rnorm(n_human) observed <- simulate_2pl(theta_human, true_1pl) # LLM: same 1PL structure, small intercept shift llm_1pl <- true_1pl llm_1pl$d <- true_1pl$d + 0.25 llm_1pl$b <- -llm_1pl$d / llm_1pl$a predicted <- simulate_2pl(theta_human, llm_1pl) generated <- simulate_2pl(rnorm(n_generated), llm_1pl) ``` ## Step 1: Fit the 1PL baseline `fit_1pl()` estimates $a$ and $d_1, \ldots, d_J$ by maximizing the IRT marginal likelihood under a standard-normal ability prior. ```{r fit-1pl} fit1 <- fit_1pl(observed, n_quad = 15) cat("Shared a:", round(fit1$pars$a[1], 3), " (true:", true_1pl$a[1], ")\n") cat("Convergence:", fit1$convergence, "\n\n") fit1$pars ``` All items in the output have the same `a` value, confirming the 1PL constraint. ## Step 2: Fit mixed-subjects MML (1PL) `fit_mixed_subjects_mml_1pl()` uses the true marginal likelihood with a 1PL-specific gradient: the shared discrimination gradient accumulates contributions from all $J$ items, while each intercept has its own gradient. ```{r mml-1pl} fit_mml_1pl <- fit_mixed_subjects_mml_1pl( observed = observed, predicted = predicted, generated = generated, lambda = 0.5, initial_pars = fit1$pars, n_quad = 15, control = list(maxit = 300) ) print(fit_mml_1pl) fit_mml_1pl$item_pars ``` ## Step 3: Correct covariance — $(J+1) \times (J+1)$ sandwich `vcov()` dispatches to `vcov_mixed_subjects_1pl()` for 1PL fits, returning a $(J+1) \times (J+1)$ matrix with `a_shared` and per-item `d_j` as rows/columns. ```{r vcov-1pl} Sigma_1pl <- vcov(fit_mml_1pl) dim(Sigma_1pl) rownames(Sigma_1pl) ``` ## Step 4: Ability-score risk and lambda tuning `tune_lambda_ability_risk_1pl()` uses the 1PL-parameterized gradient $\partial\hat\theta / \partial (a_\text{shared}, d_1, \ldots, d_J)$ for the ability-score risk. The chain rule gives $\partial\hat\theta / \partial a_\text{shared} = \sum_j \partial\hat\theta / \partial a_j$. As in the 2PL version, $\lambda$ is chosen by direct 1-D optimization by default (pass `method = "grid"` to scan a grid instead). ```{r tune-1pl} tuned_1pl <- tune_lambda_ability_risk_1pl( observed = observed, predicted = predicted, generated = generated, initial_pars = fit1$pars, n_quad = 15, control = list(maxit = 300) ) tuned_1pl$best_lambda ``` ## Step 5: Verify — F = Y gives lambda > 0 With `predicted = observed` (perfect paired predictor), the ability-risk criterion should select a positive lambda. ```{r perfect-pred-1pl} tuned_fy <- tune_lambda_ability_risk_1pl( observed = observed, predicted = observed, # F = Y generated = simulate_2pl(rnorm(n_generated), true_1pl), initial_pars = fit1$pars, n_quad = 15, control = list(maxit = 300) ) cat("F=Y best lambda:", tuned_fy$best_lambda, " (theory: N/(n+N) =", round(n_generated / (n_human + n_generated), 3), ")\n") ``` ## Compare 1PL and 2PL On a well-specified 1PL test, how do the 1PL and 2PL estimators compare? ```{r compare-1pl-2pl} fit_2pl_mml <- fit_mixed_subjects_mml( observed = observed, predicted = predicted, generated = generated, lambda = tuned_1pl$best_lambda, initial_pars = fit_2pl(observed, technical = list(NCYCLES = 500))$pars, n_quad = 15, control = list(maxit = 300) ) rmse <- function(x, y) sqrt(mean((x - y)^2)) cat("1PL RMSE(a):", round(rmse(tuned_1pl$best_fit$item_pars$a, true_1pl$a), 4), "\n") cat("2PL RMSE(a):", round(rmse(fit_2pl_mml$item_pars$a, true_1pl$a), 4), "\n") # Difficulty recovery cat("1PL RMSE(d):", round(rmse(tuned_1pl$best_fit$item_pars$d, true_1pl$d), 4), "\n") cat("2PL RMSE(d):", round(rmse(fit_2pl_mml$item_pars$d, true_1pl$d), 4), "\n") ``` The 1PL uses fewer parameters ($J+1$ vs $2J$), which can give lower RMSE on a test generated from a true 1PL DGP — especially for small $n$. ## Ability-score risk: 1PL vs 2PL parameterization The 1PL ability-score risk is smaller in the $(J+1)$-parameter space because the shared $a$ concentrates all discrimination information in a single parameter. ```{r risk-compare} Sigma_2pl <- vcov(fit_2pl_mml) # 2J × 2J Louis-corrected risk_1pl <- ability_risk_1pl(observed, tuned_1pl$best_fit) risk_2pl <- ability_risk(observed, fit_2pl_mml, vcov = Sigma_2pl) cat("1PL mean param_var:", round(risk_1pl$summary$mean_param_var, 5), "\n") cat("2PL mean param_var:", round(risk_2pl$summary$mean_param_var, 5), "\n") ```