| Title: | Intersectional Differential Item Functioning Analysis |
|---|---|
| Description: | A toolkit for detecting Differential Item Functioning (DIF) using Logistic Regression (LR) as described in Swaminathan and Rogers (1990) <doi:10.1111/j.1745-3984.1990.tb00754.x>, the IRT Likelihood Ratio Test (LRT) following Thissen, Steinberg & Wainer (1993, ISBN:0-8058-0972-4), and model-based recursive partitioning (MOB) as implemented in 'strucchange' following Strobl, Kopf and Zeileis (2015) <doi:10.1007/s11336-013-9388-3>. Designed for both standard two-group and intersectional multi-group designs, 'iDIFr' prioritises effect size reporting alongside statistical significance, clear guidance on group construction, and interpretable output suitable for applied testing contexts. Built-in Intersectional Contrast Analysis (ICA) classifies items as amplified, pure-intersection, obscured, or none by comparing single-variable and intersectional analyses. |
| Authors: | Thomas Rogers [aut, cre] |
| Maintainer: | Thomas Rogers <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.1 |
| Built: | 2026-06-08 20:22:31 UTC |
| Source: | https://github.com/cran/iDIFr |
Provides a concise summary of the group structure defined by your demographic
variables. Reports how many groups meet the recommended minimum cell size,
optionally checks which levels of specified variables are fully crossed, and
points to group_details() and cross_details() for full breakdowns.
check_groups(data, group, min_cell_size = 50, cross_by = NULL, plot = TRUE)
| data | A data frame containing demographic variables. |
|---|---|
| group | A one-sided formula specifying the grouping variable(s), using the same syntax as |
| min_cell_size | Minimum recommended group size. Default is 50. |
| cross_by | Optional character vector of variable name(s) to check for complete crossing. For each unique value of the named variable(s), the function checks whether every intersectional cell containing that value meets |
| plot | Logical. If |
An object of class idifr_groups (invisibly), which can be passed
to merge_groups(), group_details(), or cross_details().
group_details(), cross_details(), merge_groups(), idifr()
dat <- simulate_dif(300, 10, demo_vars = list(nationality = c("UK", "DE", "FR")), seed = 1)
grp <- check_groups(dat, group = ~ group * nationality, cross_by = "nationality")
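The cell-size summary described above can be approximated in base R. The following is a minimal sketch, not the package's implementation: it crosses two demographic variables, counts respondents per intersectional cell, and flags cells below the minimum. The data frame `demo` and its columns are hypothetical.

```r
# Hypothetical demographic data; names are illustrative only.
set.seed(1)
demo <- data.frame(
  group       = sample(c("G1", "G2"), 300, replace = TRUE),
  nationality = sample(c("UK", "DE", "FR"), 300, replace = TRUE)
)

min_cell_size <- 50

# Cross the two variables and count respondents per intersectional cell.
cells <- as.data.frame(table(demo$group, demo$nationality),
                       responseName = "n")
names(cells)[1:2] <- c("group", "nationality")

# Flag cells that meet the recommended minimum.
cells$adequate <- cells$n >= min_cell_size

print(cells)
cat(sum(cells$adequate), "of", nrow(cells), "cells meet the minimum\n")
```

With 300 respondents spread over six cells, the expected cell size is about 50, so some cells will typically fall just below the default threshold.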
For each unique level of the specified variable, shows whether every intersectional cell containing that level meets the minimum cell size. One row per level, showing how many cells are adequate and the smallest cell size observed.
cross_details(grp, cross_by, min_cell_size = NULL)
| grp | An |
|---|---|
| cross_by | Character vector of variable name(s) to check. Must match variables in the group formula. |
| min_cell_size | Minimum recommended group size. Overrides the stored value if supplied. |
The idifr_groups object, invisibly.
check_groups(), group_details()
dat <- simulate_dif(300, 10, demo_vars = list(nationality = c("UK", "DE", "FR")), seed = 1)
grp <- check_groups(dat, group = ~ group * nationality, cross_by = "nationality", plot = FALSE)
cross_details(grp, cross_by = "nationality")
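To make the "fully crossed" idea concrete, here is a base R sketch of the per-level check: one row per level, with the number of adequate cells and the smallest cell observed. It is an illustration under assumed data, not the code behind `cross_details()`; the object names are hypothetical.

```r
# Hypothetical demographic data; names are illustrative only.
set.seed(1)
demo <- data.frame(
  group       = sample(c("G1", "G2"), 300, replace = TRUE),
  nationality = sample(c("UK", "DE", "FR"), 300, replace = TRUE)
)

min_cell_size <- 50

# Rows = levels of the variable being checked, columns = the other variable.
counts <- table(demo$nationality, demo$group)

crossing <- data.frame(
  level          = rownames(counts),
  cells_adequate = as.vector(rowSums(counts >= min_cell_size)),
  cells_total    = ncol(counts),
  min_cell       = as.vector(apply(counts, 1, min))
)

# A level is fully crossed when every cell containing it is adequate.
crossing$fully_crossed <- crossing$cells_adequate == crossing$cells_total

print(crossing)
```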
Writes an idifr result object to a formatted .xlsx workbook. Each
requested sheet is written as an Excel table so that column headers are
bold, filters are enabled, and values are properly typed.
Only columns that are actually present in the result object are written; columns listed in the per-method definitions that were not produced by the current run are silently omitted.
export_results(x, file, sheets = NULL, overwrite = TRUE)
| x | An |
|---|---|
| file | Path to the output |
| sheets | Character vector of sheet keys to include. Valid keys: |
| overwrite | Logical. If |
x invisibly (so the call can be piped).
if (requireNamespace("openxlsx", quietly = TRUE)) {
  dat <- simulate_dif(300, 10, dif_items = c(3, 7), seed = 1)
  result <- idifr(dat, 1:10, ~ group, method = "LR", verbose = FALSE)
  export_results(result, tempfile(fileext = ".xlsx"))
}
Fit a 2PL IRT model via marginal maximum likelihood (EM)
fit_2pl(
  resp,
  group = NULL,
  constrain = "items",
  n_nodes = 15,
  max_iter = 200,
  tol = 1e-04,
  start = NULL,
  verbose = FALSE
)
| resp | Integer matrix (0/1/NA). Rows = persons, columns = items. |
|---|---|
| group | Character/factor vector of group membership (length = nrow(resp)). NULL for single-group calibration. |
| constrain | Parameter constraint across groups: |
| n_nodes | Number of quadrature nodes. Default 15. Values of 11 to 21 are appropriate for DIF detection; use 21 for publication-quality parameter estimates. |
| max_iter | Maximum EM iterations. Default 200. |
| tol | Convergence tolerance on log-likelihood change. Default 1e-4. |
| start | Optional list with elements |
| verbose | Print iteration log. Default FALSE. |
Object of class irt_2pl.
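For orientation, the quantities that a marginal maximum likelihood fit works with can be sketched in base R. This is not the internals of `fit_2pl()`: the slope-difficulty parameterisation `a * (theta - b)`, the equally spaced nodes, and the item parameters below are all assumptions made for illustration.

```r
# 2PL item response function (assumed parameterisation).
p_2pl <- function(theta, a, b) plogis(a * (theta - b))

# Equally spaced quadrature nodes with normalised standard-normal weights.
n_nodes <- 15
nodes   <- seq(-4, 4, length.out = n_nodes)
weights <- dnorm(nodes) / sum(dnorm(nodes))

# Hypothetical item parameters for three items.
a <- c(1.0, 1.5, 0.8)
b <- c(-0.5, 0.0, 1.0)

# P[k, j]: probability of a correct response to item j at node k.
P <- outer(nodes, seq_along(a), function(th, j) p_2pl(th, a[j], b[j]))

# Likelihood of one response pattern at each node, then its marginal.
x <- c(1, 0, 1)
lik_at_node <- apply(P, 1, function(p) prod(p^x * (1 - p)^(1 - x)))
marginal    <- sum(weights * lik_at_node)

# Posterior weight of each node for this person (the E-step quantity).
posterior <- weights * lik_at_node / marginal
```

More nodes give a finer approximation of the integral over ability at the cost of more computation per EM iteration, which is the trade-off behind the `n_nodes` guidance above.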
Prints a detailed table showing the cell size for every intersectional
group, flagging those below the recommended minimum. This is the full
breakdown that check_groups() summarises in a single line.
group_details(grp, min_cell_size = NULL)
| grp | An |
|---|---|
| min_cell_size | Minimum recommended group size. Overrides the stored value if supplied. |
The idifr_groups object, invisibly.
check_groups(), cross_details()
dat <- simulate_dif(300, 10, demo_vars = list(nationality = c("UK", "DE", "FR")), seed = 1)
grp <- check_groups(dat, group = ~ group * nationality, plot = FALSE)
group_details(grp)
The main entry point for iDIFr. Detects Differential Item Functioning (DIF)
using one or more statistical methods, with full support for intersectional
group structures defined by crossing multiple demographic variables.
Effect sizes are reported alongside significance for all methods. Groups with
small cell sizes trigger a warning. Use exclude_below_min and
fully_crossed to control whether those groups are included in the analysis.
idifr(
  data,
  items,
  group,
  method,
  ica = FALSE,
  min_cell_size = 50,
  exclude_below_min = FALSE,
  fully_crossed = NULL,
  value_selection = NULL,
  anchor = NULL,
  alpha = 0.05,
  p_adjust = "BH",
  nonuniform_es = "MAPPD",
  verbose = TRUE
)
| data | A data frame containing item responses and demographic variables. |
|---|---|
| items | A numeric vector of column indices, or a character vector of column names, identifying the item response columns. Items must be dichotomously scored (0/1). |
| group | A one-sided formula specifying the grouping variable(s). Use |
| method | A character vector specifying which DIF method(s) to use. Must be one or more of |
| ica | Logical. If |
| min_cell_size | Minimum acceptable group size. Groups below this threshold trigger a warning. Also used as the crossing criterion when |
| exclude_below_min | Logical. If |
| fully_crossed | A character vector of variable name(s). Only levels of the named variable(s) that are fully crossed – meaning every intersectional cell for that level meets |
| value_selection | A named list for filtering specific values of demographic variables before analysis. Each element should be named after a grouping variable and contain a character vector of values to keep. Variables not mentioned are left unchanged (all values included). Default is |
| anchor | A numeric or character vector identifying anchor items (items assumed to be DIF-free) for IRT scaling. If |
| alpha | Significance level for DIF flagging. Default is |
| p_adjust | Method for p-value adjustment across items. Passed to |
| nonuniform_es | Character. The effect size metric to use for non-uniform DIF detection when |
| verbose | Logical. If |
An object of class idifr containing:
A data frame with one row per item per method, including test statistics, p-values, adjusted p-values, effect sizes, and DIF classification (negligible/moderate/large for all methods).
An idifr_groups object describing the group structure,
cell sizes, and any small-cell warnings.
Character vector of methods used.
The matched call.
Character vector of item names analysed.
The significance level used.
The p-value adjustment method used.
Character vector of group labels excluded by
exclude_below_min or fully_crossed, or NULL if no exclusions.
Named list of value_selection filters applied,
or NULL if none.
Data frame of ICA classifications (one row per item per
method) when ica = TRUE and the design is intersectional, otherwise
NULL. Columns: item, method, ica_class, marginal_vars,
intersectional_flag.
check_groups() for exploring group structure before analysis;
group_details() and cross_details() for full breakdowns;
merge_groups() for combining sparse cells.
# Basic two-group analysis using synthetic data
dat <- simulate_dif(300, 10, dif_items = c(3, 7), seed = 1)
result <- idifr(dat, 1:10, ~ group, method = "LR")
print(result)

# Intersectional analysis with ICA
dat_ix <- simulate_dif(500, 10, demo_vars = list(nationality = c("UK", "DE", "FR")), seed = 2)
result_ix <- idifr(dat_ix, 1:10, ~ group * nationality, method = "LR", ica = TRUE)
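As background for `method = "LR"`, the logistic regression approach of Swaminathan and Rogers (1990) can be sketched with base R's `glm()`. This is a generic illustration under assumed data, not the package's code: the total score serves as the matching variable, a two-degree-of-freedom likelihood-ratio test compares nested models for each item, and the p-values are adjusted with base R's `p.adjust()` using the "BH" method. Effect sizes and classification thresholds are deliberately left out.

```r
# Simulate hypothetical Rasch-type data with uniform DIF on item 3.
set.seed(1)
n     <- 400
group <- rep(c("G1", "G2"), each = n / 2)
theta <- rnorm(n)
b     <- seq(-1, 1, length.out = 10)

resp <- sapply(1:10, function(j) {
  dif <- if (j == 3) ifelse(group == "G2", 0.8, 0) else 0
  rbinom(n, 1, plogis(theta - b[j] - dif))
})
total <- rowSums(resp)  # matching variable

# For each item, compare a no-DIF model with a model that adds the
# group main effect (uniform DIF) and the interaction (non-uniform DIF).
p_values <- sapply(1:10, function(j) {
  y  <- resp[, j]
  m0 <- glm(y ~ total, family = binomial)
  m2 <- glm(y ~ total * group, family = binomial)
  pchisq(m0$deviance - m2$deviance, df = 2, lower.tail = FALSE)
})

# Benjamini-Hochberg adjustment across items.
p_adj <- p.adjust(p_values, method = "BH")
round(cbind(p_values, p_adj), 4)
```

Fitting the intermediate model `y ~ total + group` as well would allow uniform and non-uniform DIF to be tested separately.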
Uses local independence to decompose the total log-likelihood (LL) into item contributions:
LL_j = sum_i log P(x_ij | posterior_i)
where P(x_ij | posterior_i) = sum_k posterior[i,k] * P(x_ij | theta_k)
item_loglik(model, resp = NULL, post = NULL, gi = 1)
| model | An |
|---|---|
| resp | Response matrix (0/1/NA). Defaults to model$resp. |
| post | Posterior matrix (persons x nodes). Defaults to model$posterior. |
| gi | Group index (integer). Used to select group-specific item params and ability nodes. Default 1. |
Numeric vector of length n_items.
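The decomposition LL_j = sum_i log sum_k posterior[i,k] * P(x_ij | theta_k) can be written out directly in base R. The sketch below uses hypothetical item parameters and a placeholder posterior matrix whose rows sum to 1; skipping missing responses is an assumption, and none of this is taken from the package source.

```r
set.seed(1)
n_persons <- 5
n_nodes   <- 7
n_items   <- 3

nodes <- seq(-3, 3, length.out = n_nodes)
a <- c(1.0, 1.2, 0.9)   # hypothetical slopes
b <- c(-0.5, 0.0, 0.5)  # hypothetical difficulties

# P1[k, j]: probability of a correct response to item j at node k.
P1 <- sapply(seq_len(n_items), function(j) plogis(a[j] * (nodes - b[j])))

# Hypothetical responses with one missing value.
resp <- matrix(rbinom(n_persons * n_items, 1, 0.5), n_persons, n_items)
resp[2, 3] <- NA

# Placeholder posterior (persons x nodes); every row sums to 1.
prior <- dnorm(nodes) / sum(dnorm(nodes))
post  <- matrix(prior, n_persons, n_nodes, byrow = TRUE)

item_ll <- sapply(seq_len(n_items), function(j) {
  x  <- resp[, j]
  p1 <- as.vector(post %*% P1[, j])    # sum_k post[i, k] * P(x_ij = 1 | theta_k)
  p_obs <- ifelse(x == 1, p1, 1 - p1)  # probability of the observed response
  sum(log(p_obs), na.rm = TRUE)        # missing responses contribute nothing
})

item_ll
```

Because each posterior row sums to 1, the posterior-weighted probability of an incorrect response is simply `1 - p1`, which keeps the computation to a single matrix product per item.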
For the constrained model, each person uses shared item params but their own group-specific ability nodes.
item_loglik_mg(model, resp = NULL, post = NULL)
| model | An |
|---|---|
| resp | Response matrix. Defaults to model$resp. |
| post | Posterior matrix. Defaults to model$posterior. |
Numeric vector of length n_items.
Combines sparse intersectional cells by collapsing levels of one or more
demographic variables. Returns a modified data frame ready to pass back
to idifr() or check_groups().
merge_groups(groups, grp_formula = NULL, ..., min_cell_size = 50)
| groups | An |
|---|---|
| grp_formula | A formula, required only if |
| ... | Named arguments specifying merge rules. Each should be named after a demographic variable, with a named list mapping new level names to vectors of old level names. |
| min_cell_size | Minimum cell size to validate against after merging. |
The original data frame with recoded grouping variable(s).
dat <- simulate_dif(300, 10, demo_vars = list(nationality = c("UK", "DE", "FR", "ES")), seed = 1)
grp <- check_groups(dat, group = ~ group * nationality, plot = FALSE)
merged <- merge_groups(grp, nationality = list("Other" = c("DE", "FR", "ES")))
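The merge rule format (a named list mapping each new level to a vector of old levels) amounts to a simple recode. A base R sketch of that recode, using a small hypothetical data frame rather than the package's own logic:

```r
# Hypothetical data; column name chosen to match the example above.
demo <- data.frame(
  nationality = c("UK", "DE", "FR", "ES", "UK", "DE"),
  stringsAsFactors = FALSE
)

# New level name -> old level names to collapse into it.
merge_rule <- list(Other = c("DE", "FR", "ES"))

recoded <- demo$nationality
for (new_level in names(merge_rule)) {
  recoded[recoded %in% merge_rule[[new_level]]] <- new_level
}
demo$nationality <- recoded

table(demo$nationality)
```

After recoding, the cell sizes should be checked again against the minimum, since collapsing levels only helps if the merged cell is large enough.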
Plot method for idifr objects
## S3 method for class 'idifr'
plot(x, type = "items", ...)
| x | An |
|---|---|
| type | Plot type: |
| ... | Ignored. |
No return value, called for side effects.
Print method for idifr objects
## S3 method for class 'idifr'
print(x, ...)
| x | An |
|---|---|
| ... | Ignored. |
No return value, called for side effects.
Generates synthetic dichotomous item response data with a known DIF
structure. Supports three DIF patterns: standard group DIF ("standard"),
DIF confined to a single intersectional cell ("intersection"), and a
mixture of both ("mixed").
simulate_dif(
  n_persons = 500,
  n_items = 20,
  n_groups = 2,
  dif_items = c(3, 7),
  dif_effect = 0.8,
  dif_type = "uniform",
  dif_structure = "standard",
  dif_group = NULL,
  demo_vars = NULL,
  seed = NULL
)
| n_persons | Integer. Total number of respondents. |
|---|---|
| n_items | Integer. Number of items. Default 20. |
| n_groups | Integer. Number of groups. Default 2. |
| dif_items | Which items have DIF. For |
| dif_effect | Numeric. DIF shift size in logits. Default |
| dif_type | |
| dif_structure | One of |
| dif_group | Named list identifying the target intersectional cell for intersection DIF. Variable names must match |
| demo_vars | Named list of additional demographic variables to add, with their levels. Persons are assigned randomly with uniform probability. Example: |
| seed | Integer random seed for reproducibility. |
A data frame with item response columns (item_1, item_2, ...),
a group column, and any additional columns specified in demo_vars.
True item parameters and DIF metadata are stored as attributes.
# Standard DIF
dat <- simulate_dif(500, 20, 2, c(3, 7), 1.0)

# Intersection-only DIF
dat_ix <- simulate_dif(
  n_persons = 500,
  n_items = 20,
  dif_items = c(5, 12),
  dif_effect = 1.5,
  dif_structure = "intersection",
  dif_group = list(group = "G1", nationality = "UK", age_band = "Young"),
  demo_vars = list(nationality = c("UK", "DE", "FR"), age_band = c("Young", "Old")),
  seed = 42
)

# Mixed DIF
dat_mix <- simulate_dif(
  n_persons = 500,
  n_items = 20,
  dif_items = list(standard = c(3, 7), intersection = c(12, 15)),
  dif_effect = 1.0,
  dif_structure = "mixed",
  dif_group = list(group = "G1", nationality = "UK", age_band = "Young"),
  demo_vars = list(nationality = c("UK", "DE", "FR"), age_band = c("Young", "Old")),
  seed = 42
)
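To show what a uniform DIF shift in logits means in practice, the sketch below generates responses in base R with a fixed shift on selected items for one group. It assumes a Rasch-type model, treats "G2" as the affected group, and makes the DIF items harder for that group; the generating model and the direction of the shift in `simulate_dif()` are not stated above, so these are illustrative choices only.

```r
set.seed(42)
n_persons  <- 500
n_items    <- 20
dif_items  <- c(3, 7)
dif_effect <- 0.8  # shift in logits

group <- sample(c("G1", "G2"), n_persons, replace = TRUE)
theta <- rnorm(n_persons)  # person abilities
b     <- rnorm(n_items)    # item difficulties

# shift[i, j] is dif_effect when person i is in G2 and item j has DIF, else 0.
shift <- outer(as.numeric(group == "G2"),
               as.numeric(seq_len(n_items) %in% dif_items)) * dif_effect

# Logit of a correct response: ability minus difficulty minus the DIF shift.
eta  <- outer(theta, b, "-") - shift
resp <- matrix(rbinom(length(eta), 1, plogis(eta)), n_persons, n_items)

dat <- as.data.frame(resp)
names(dat) <- paste0("item_", seq_len(n_items))
dat$group <- group

# Observed proportion correct on a DIF item, by group.
tapply(dat$item_3, dat$group, mean)
```

For a respondent whose ability equals the item difficulty, a shift of 0.8 logits moves the success probability from `plogis(0)` to `plogis(-0.8)`.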
Summary method for idifr objects
## S3 method for class 'idifr'
summary(object, ...)
| object | An |
|---|---|
| ... | Ignored. |
No return value, called for side effects.
Re-exports generics::tidy so that
tidy() is available after library(iDIFr) without loading
broom or generics separately. For the iDIFr-specific
method see tidy.idifr.
tidy(x, ...)
| x | An object to tidy. When |
|---|---|
| ... | Additional arguments passed to the method. |
A data frame (exact structure depends on the method dispatched).
Returns results as a tidy data frame suitable for use with dplyr,
ggplot2, or for export. Use the table argument to choose which
table to return.
Implements the tidy generic from the generics package so that
tidy() works correctly regardless of whether broom is also loaded.
## S3 method for class 'idifr'
tidy(x, table = NULL, ...)
| x | An |
|---|---|
| table | Which table to return. |
| ... | Ignored. |
A data frame.
dat <- simulate_dif(300, 10, dif_items = c(3, 7), seed = 1)
result <- idifr(dat, 1:10, ~ group, method = "LR")

# Item-level results (default)
tidy(result)
tidy(result, table = "results")

# Group direction table for flagged items
tidy(result, table = "direction")

# ICA classification table (requires ica = TRUE)
dat_ix <- simulate_dif(500, 10, demo_vars = list(nationality = c("UK", "DE")), seed = 2)
result_ix <- idifr(dat_ix, 1:10, ~ group * nationality, method = "LR", ica = TRUE)
tidy(result_ix, table = "ica")