| Title: | Classical Test Theory Item Analysis for Multiple-Choice Tests |
|---|---|
| Description: | A unified toolkit for classical test theory (CTT) item analysis of multiple-choice test data, including item difficulty (p-value), item discrimination (point-biserial correlation and upper-lower 27-percent discrimination index), per-distractor analysis (frequency, proportion, and discrimination), and Haladyna's distractor efficiency. A wrapper function returns a tidy 'mcq_analysis' object with print, plot (difficulty-discrimination scatter), and APA-style table methods for direct inclusion in journal manuscripts. Implemented in pure R with no compiled code and minimal dependencies. |
| Authors: | Rashed Alqahtani [aut, cre] (ORCID: <https://orcid.org/0000-0002-3317-204X>) |
| Maintainer: | Rashed Alqahtani <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-15 22:10:05 UTC |
| Source: | https://github.com/cran/mcqAnalysis |
S3 generic for converting analysis objects into publication-ready
APA-style tables. The generic dispatches to class-specific
methods (e.g., apa_table.mcq_analysis). Output formats are data
frame, markdown, HTML, and LaTeX, for direct inclusion in manuscripts.
apa_table(x, format = c("data.frame", "markdown", "html", "latex"), ...)
x |
An object of an appropriate class (e.g., |
format |
One of |
... |
Additional arguments passed to methods. |
A formatted table object whose type depends on format.
Formats item-level results from an mcq_analysis object as a
publication-ready APA-style table, with optional Interpretation
columns based on conventional CTT cutoffs (Ebel & Frisbie, 1991).
## S3 method for class 'mcq_analysis'
apa_table(
  x,
  format = c("data.frame", "markdown", "html", "latex"),
  digits = 2,
  include_interpretation = TRUE,
  ...
)
x |
An object of class |
format |
Output format. One of |
digits |
Number of decimal places to display. Default 2. |
include_interpretation |
Logical. If |
... |
Additional arguments passed to |
A data frame (when format = "data.frame") or a character
string formatted in the requested style.
Ebel, R. L., & Frisbie, D. A. (1991). Essentials of educational measurement (5th ed.). Prentice Hall.
data(mcq_example)
result <- mcq_analysis(mcq_example$responses, mcq_example$key)
apa_table(result, format = "data.frame")
For each item, summarizes the selection frequency, proportion, and point-biserial correlation with the total test score for every response option (the key and all distractors). Distractor analysis is a core classical test theory diagnostic for multiple-choice items. The key should be the most-selected option and should correlate positively with total score; each distractor should be selected by at least some examinees and should correlate negatively with total score (Haladyna, 2004).
distractor_analysis(responses, key, options = NULL)
responses |
A matrix or data frame of student responses, with students in rows and items in columns. |
key |
A vector of correct answers with length equal to the number of items. |
options |
Optional character vector listing all possible response
options (e.g., |
A data frame in long format with one row per item-option combination, containing:
item: item identifier
option: response option
is_key: logical, TRUE if this option is the correct answer
frequency: number of students selecting this option
proportion: proportion of students selecting this option
point_biserial: correlation between selecting this option
and the total test score (using all items)
Haladyna, T. M. (2004). Developing and validating multiple-choice test items (3rd ed.). Lawrence Erlbaum Associates.
set.seed(1)
responses <- matrix(
  sample(c("A", "B", "C", "D"), 200, replace = TRUE),
  nrow = 50, ncol = 4,
  dimnames = list(NULL, paste0("Q", 1:4))
)
key <- c("A", "B", "C", "A")
distractor_analysis(responses, key)
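The per-option statistics described above can be reproduced by hand for a single item. The following is a minimal base R sketch, not the package's implementation: `option_stats` is a hypothetical helper, the toy responses and total scores are made up for illustration, and the total score is passed in directly rather than computed from a full response matrix.

```r
# Hypothetical helper for illustration; not a function from mcqAnalysis.
option_stats <- function(item_resp, total, key,
                         options = sort(unique(item_resp))) {
  rows <- lapply(options, function(opt) {
    chose <- as.numeric(item_resp == opt)  # 1 if the student picked this option
    data.frame(
      option = opt,
      is_key = opt == key,
      frequency = sum(chose),
      proportion = mean(chose),
      # The correlation is undefined when nobody (or everybody) chose the option
      point_biserial = if (sd(chose) > 0) cor(chose, total) else NA_real_
    )
  })
  do.call(rbind, rows)
}

# Toy data: 12 students with total scores 1..12; the key is "A"
total <- 1:12
item_resp <- c("B", "C", "D", "B", "C", "D", "B", "A", "A", "A", "A", "A")
stats <- option_stats(item_resp, total, key = "A")
print(stats)
```

In this toy item the key is chosen by the five highest scorers, so its point-biserial is positive, while every distractor is chosen by lower scorers and has a negative point-biserial, which is the pattern the text describes as desirable.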
Computes Haladyna's distractor efficiency for each item: the number of functioning distractors per item. A distractor is considered to be functioning if it meets two criteria: (a) it is selected by at least a threshold proportion of examinees (default 5 percent), and (b) it has a negative point-biserial correlation with the total test score (Haladyna & Downing, 1993). The key (correct answer) is excluded from the count.
distractor_efficiency(responses, key, options = NULL, min_proportion = 0.05)
responses |
A matrix or data frame of student responses, with students in rows and items in columns. |
key |
A vector of correct answers with length equal to the number of items. |
options |
Optional character vector listing all possible response
options. If |
min_proportion |
Minimum proportion of examinees selecting a distractor for it to be considered functioning. Default is 0.05. |
Distractor efficiency provides a simple integer summary of item quality. A four-option multiple-choice item with three functioning distractors (distractor efficiency = 3) is performing optimally. Items with fewer functioning distractors waste examinee time and reduce the item's contribution to score variance, and they are candidates for revision.
A named numeric vector of distractor efficiency values, one per item, representing the count of functioning distractors.
Haladyna, T. M., & Downing, S. M. (1993). How many options is enough for a multiple-choice test item? Educational and Psychological Measurement, 53(4), 999-1010.
set.seed(1)
responses <- matrix(
  sample(c("A", "B", "C", "D"), 400, replace = TRUE),
  nrow = 100, ncol = 4,
  dimnames = list(NULL, paste0("Q", 1:4))
)
key <- c("A", "B", "C", "A")
distractor_efficiency(responses, key)
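To make the two functioning criteria concrete, here is a base R sketch that counts functioning distractors for one item. It is an illustration under stated assumptions, not the package's code: `count_functioning` is a hypothetical helper, the total score is supplied directly, and the toy data are invented so that one distractor fails each criterion.

```r
# Hypothetical helper for illustration; not a function from mcqAnalysis.
count_functioning <- function(item_resp, total, key, options,
                              min_proportion = 0.05) {
  distractors <- setdiff(options, key)  # the key is excluded from the count
  functioning <- vapply(distractors, function(opt) {
    chose <- as.numeric(item_resp == opt)
    prop <- mean(chose)
    # Criterion (a): selected by at least min_proportion of examinees.
    # Also guard against a constant indicator, where cor() is undefined.
    if (prop < min_proportion || prop == 0 || prop == 1) return(FALSE)
    # Criterion (b): negative correlation with the total score.
    cor(chose, total) < 0
  }, logical(1))
  sum(functioning)
}

# Toy data: 20 students with total scores 1..20; the key is "A".
# "B" attracts low scorers, "C" attracts the top scorers, "D" is never chosen.
total <- 1:20
item_resp <- c(rep("B", 8), rep("A", 10), rep("C", 2))
de <- count_functioning(item_resp, total, key = "A",
                        options = c("A", "B", "C", "D"))
print(de)
```

Only "B" meets both criteria here: "C" is chosen often enough but by high scorers, and "D" is never selected, so the item would be a candidate for revision.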
Computes the proportion of students who answered each item correctly, commonly called the item p-value in classical test theory.
item_difficulty(responses, key, na.rm = FALSE)
responses |
A matrix or data frame of student responses, with students in rows and items in columns. Entries may be character or numeric (e.g., "A", "B", "C", "D" or 1, 2, 3, 4). |
key |
A vector of correct answers with length equal to the number of items. |
na.rm |
Logical. If |
Item difficulty is interpreted as the easiness of an item: values near 1 indicate an easy item (most students got it correct), while values near 0 indicate a hard item. Conventional interpretive guidelines suggest that well-functioning items typically have p-values between 0.30 and 0.90, with optimal difficulty around 0.50 for maximum discrimination (Crocker & Algina, 1986).
A named numeric vector of item p-values, one per item.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Holt, Rinehart and Winston.
responses <- matrix(
  c("A", "A", "B", "C",
    "A", "B", "B", "C",
    "A", "A", "C", "D",
    "B", "A", "B", "C",
    "A", "A", "B", "A"),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("Q1", "Q2", "Q3", "Q4"))
)
key <- c("A", "A", "B", "C")
item_difficulty(responses, key)
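The p-value is simply the column mean of the scored (correct/incorrect) matrix. As a cross-check that does not depend on the package, the same small example can be scored by hand in base R; this sketch assumes complete data (no missing responses) and is not the package's implementation.

```r
responses <- matrix(
  c("A", "A", "B", "C",
    "A", "B", "B", "C",
    "A", "A", "C", "D",
    "B", "A", "B", "C",
    "A", "A", "B", "A"),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("Q1", "Q2", "Q3", "Q4"))
)
key <- c("A", "A", "B", "C")

# Compare every column with its key entry: TRUE where the answer is correct
scored <- sweep(responses, 2, key, FUN = "==")

# Proportion correct per item
p_values <- colMeans(scored)
print(p_values)
# Q1 = 0.8, Q2 = 0.8, Q3 = 0.8, Q4 = 0.6
```

Four of the five students answered Q1, Q2, and Q3 correctly, and three of five answered Q4 correctly, so Q4 is the hardest item in this toy test.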
Computes a discrimination index for each item using one of two classical methods: the point-biserial correlation between item and total test score, or the upper-lower 27 percent discrimination index proposed by Kelley (1939).
item_discrimination(
  responses,
  key,
  method = c("point_biserial", "discrimination_index"),
  group_pct = 0.27
)
responses |
A matrix or data frame of student responses, with students in rows and items in columns. |
key |
A vector of correct answers with length equal to the number of items. |
method |
One of |
group_pct |
For |
The point-biserial method is the most widely used CTT discrimination
index. The discrimination index D compares the proportion of the
upper-scoring group (top 27 percent by total score) who answered the
item correctly to the proportion of the lower-scoring group (bottom
27 percent) who answered it correctly. Kelley (1939) demonstrated
that the 27 percent cutoff maximizes the difference between extreme
groups under a normal distribution of ability.
Interpretive guidelines for D (Ebel & Frisbie, 1991):
D >= 0.40: very good item
0.30 <= D < 0.40: good item, possibly subject to improvement
0.20 <= D < 0.30: marginal item, needs improvement
D < 0.20: poor item, revise or discard
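The cutoffs listed above map directly onto a set of half-open intervals. A minimal base R sketch of that mapping follows; `classify_d` is a hypothetical helper and the D values are made up for illustration, so this is not necessarily how the package labels items.

```r
# Hypothetical helper for illustration; not a function from mcqAnalysis.
classify_d <- function(d) {
  cut(
    d,
    breaks = c(-Inf, 0.20, 0.30, 0.40, Inf),
    labels = c("poor", "marginal", "good", "very good"),
    right = FALSE  # intervals are [lower, upper), so D = 0.40 is "very good"
  )
}

# Illustrative D values for four items
d <- c(Q1 = 0.45, Q2 = 0.35, Q3 = 0.25, Q4 = 0.10)
quality <- classify_d(d)
print(data.frame(item = names(d), D = d, quality = quality))
```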
A named numeric vector of discrimination values, one per item.
Ebel, R. L., & Frisbie, D. A. (1991). Essentials of educational measurement (5th ed.). Prentice Hall.
Kelley, T. L. (1939). The selection of upper and lower groups for the validation of test items. Journal of Educational Psychology, 30(1), 17-24.
set.seed(1)
responses <- matrix(
  sample(c("A", "B", "C", "D"), 200, replace = TRUE),
  nrow = 40, ncol = 5,
  dimnames = list(NULL, paste0("Q", 1:5))
)
key <- c("A", "B", "C", "A", "B")
item_discrimination(responses, key)
item_discrimination(responses, key, method = "discrimination_index")
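The upper-lower index can also be worked through by hand on an already-scored 0/1 matrix. The sketch below is a base R illustration under assumptions: `upper_lower_d` is a hypothetical helper, the group size is taken as the rounded product of `group_pct` and the number of students, and ties in total score are broken by row order. The package may handle group size and ties differently.

```r
# Toy scored matrix: 10 students, 3 items, 1 = correct, 0 = incorrect
scored <- cbind(
  Q1 = rep(c(1, 0), times = c(8, 2)),
  Q2 = rep(c(1, 0), times = c(5, 5)),
  Q3 = rep(c(1, 0), times = c(2, 8))
)

# Hypothetical helper for illustration; not a function from mcqAnalysis.
upper_lower_d <- function(scored, group_pct = 0.27) {
  total <- rowSums(scored)
  n_group <- max(1, round(group_pct * nrow(scored)))  # assumed rounding rule
  ranked <- order(total, decreasing = TRUE)
  upper <- head(ranked, n_group)  # highest total scores
  lower <- tail(ranked, n_group)  # lowest total scores
  # D = proportion correct in the upper group minus that in the lower group
  colMeans(scored[upper, , drop = FALSE]) -
    colMeans(scored[lower, , drop = FALSE])
}

d <- upper_lower_d(scored)
print(round(d, 2))
```

With 10 students and a 27 percent cutoff, each extreme group holds 3 students. Q2 separates them perfectly (D = 1), while Q1 and Q3 each give D = 2/3.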
Runs the full classical test theory item analysis on a multiple-choice
response matrix and returns a tidy mcq_analysis object containing
per-item difficulty, discrimination (both point-biserial and the
upper-lower 27 percent index), distractor efficiency, and the full
per-option distractor analysis. The returned object has dedicated
print(), plot(), and apa_table() methods.
mcq_analysis(responses, key, options = NULL, min_proportion = 0.05)
responses |
A matrix or data frame of student responses, with students in rows and items in columns. |
key |
A vector of correct answers with length equal to the number of items. |
options |
Optional character vector listing all possible response
options. If |
min_proportion |
Minimum proportion of examinees selecting a distractor for it to be considered functioning when computing distractor efficiency. Default 0.05. |
An object of class mcq_analysis (a list) with components:
items: Data frame with one row per item summarizing difficulty, point-biserial, discrimination index, and distractor efficiency.
distractors: Data frame with full per-option distractor analysis (one row per item-option combination).
total_scores: Numeric vector of total test scores, one per student.
n_students: Number of students.
n_items: Number of items.
key: Answer key.
data(mcq_example)
result <- mcq_analysis(mcq_example$responses, mcq_example$key)
result
A simulated dataset for demonstrating the mcqAnalysis package. The test contains 30 four-option multiple-choice items administered to 200 students. The data are generated under a two-parameter logistic framework with a deliberate mix of item quality:
Items 1-8 are easy items with strong discrimination.
Items 9-24 are medium-difficulty items, most discriminating well.
Items 25-28 are harder items with progressively weaker discrimination.
Items 29-30 are deliberately badly-written items with negative discrimination (high-ability students get them wrong more often).
Item 30 additionally has a "trap" distractor disproportionately chosen by high-ability students, useful for demonstrating distractor analysis.
mcq_example
A list with two components:
responses: A 200 x 30 character matrix of student responses (values in {"A", "B", "C", "D"}).
key: A named character vector of length 30 giving the correct answer for each item.
data(mcq_example)
str(mcq_example, max.level = 1)
mcq_example$key
head(mcq_example$responses)
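For readers unfamiliar with the two-parameter logistic (2PL) model mentioned above, the following base R sketch simulates correct/incorrect outcomes under it. This is not the code used to build `mcq_example`: the parameter values, the seed, and all object names are illustrative assumptions, and the sketch stops at 0/1 correctness rather than assigning letter options or building "trap" distractors.

```r
set.seed(42)
n_students <- 200
n_items <- 30

theta <- rnorm(n_students)              # student ability (assumed standard normal)
a <- runif(n_items, 0.5, 2.0)           # item discrimination (illustrative range)
b <- seq(-2, 2, length.out = n_items)   # item difficulty (illustrative range)

# 2PL model: P(correct) = logistic(a_j * (theta_i - b_j))
logit <- sweep(outer(theta, b, "-"), 2, a, "*")
p_correct <- plogis(logit)

# Draw one Bernoulli outcome per student-item pair
correct <- matrix(
  rbinom(length(p_correct), 1, p_correct),
  nrow = n_students,
  dimnames = list(NULL, paste0("Q", seq_len(n_items)))
)

# Items with low b should be easier (higher proportion correct) on average
print(round(colMeans(correct)[c(1, 15, 30)], 2))
```

A negative value of `a` for an item would reproduce the "negative discrimination" behavior described for items 29-30, since higher-ability students would then be less likely to answer correctly.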
Produces the classical item quality map: a scatterplot of item difficulty (x-axis) against item discrimination (y-axis), with reference lines marking conventional adequacy cutoffs. Items in the upper-middle region (medium difficulty, high discrimination) are performing well; items in the lower regions are candidates for revision.
## S3 method for class 'mcq_analysis'
plot(
  x,
  y = NULL,
  discrimination_metric = c("point_biserial", "discrimination_index"),
  label = c("flagged", "all", "none"),
  flag_threshold_difficulty = c(0.3, 0.9),
  flag_threshold_discrimination = 0.3,
  point_cex = 1.4,
  label_cex = 0.75,
  ...
)
x |
An object of class |
y |
Ignored. Present for S3 compatibility. |
discrimination_metric |
Which discrimination index to plot on the
y-axis. One of |
label |
One of |
flag_threshold_difficulty |
Numeric vector of length 2 giving the
informative difficulty range. Default |
flag_threshold_discrimination |
Numeric. Discrimination cutoff below which an item is considered weak. Default 0.30. |
point_cex |
Numeric. Point size. Default 1.4. |
label_cex |
Numeric. Label text size. Default 0.75. |
... |
Additional graphical parameters passed to |
By default, only flagged items (those falling outside the conventional
adequacy region) are labeled, to keep the plot legible when many items
cluster in the acceptable region. Use label = "all" to label every
item, or label = "none" to suppress labels entirely.
Reference lines are drawn at conventional cutoffs from Ebel and Frisbie (1991): discrimination >= 0.30 (acceptable) and difficulty between 0.30 and 0.90 (informative range).
The input mcq_analysis object, invisibly.
Ebel, R. L., & Frisbie, D. A. (1991). Essentials of educational measurement (5th ed.). Prentice Hall.
data(mcq_example)
result <- mcq_analysis(mcq_example$responses, mcq_example$key)
plot(result)
plot(result, label = "all")
plot(result, label = "none")
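The flagging rule implied by the default thresholds can be sketched in base R as follows. This is an assumption about the logic rather than the method's actual code: `flag_items` is a hypothetical helper, the item statistics are invented, and whether the package treats values exactly on a cutoff as flagged is not stated in this documentation.

```r
# Illustrative item statistics (not from mcq_example)
items <- data.frame(
  item = paste0("Q", 1:5),
  difficulty = c(0.95, 0.55, 0.25, 0.60, 0.70),
  discrimination = c(0.35, 0.42, 0.38, 0.12, -0.05)
)

# Hypothetical helper for illustration; not a function from mcqAnalysis.
flag_items <- function(items,
                       difficulty_range = c(0.3, 0.9),
                       min_discrimination = 0.3) {
  outside <- items$difficulty < difficulty_range[1] |
    items$difficulty > difficulty_range[2]
  weak <- items$discrimination < min_discrimination
  outside | weak  # flagged if either check fails
}

items$flagged <- flag_items(items)
print(items)
```

Here Q1 is flagged as too easy, Q3 as too hard, and Q4 and Q5 as weakly or negatively discriminating; only Q2 falls inside the adequacy region.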
Computes the point-biserial correlation between each item and the total test score. By default the total excludes the item itself (i.e., the correlation is corrected for item overlap). This is the standard classical test theory discrimination index based on the correlation between item performance and overall test performance.
point_biserial(responses, key, corrected = TRUE)
responses |
A matrix or data frame of student responses, with students in rows and items in columns. |
key |
A vector of correct answers with length equal to the number of items. |
corrected |
Logical. If |
Items with point-biserial correlations of 0.30 or above are generally considered to discriminate well between high- and low-ability students. Values between 0.20 and 0.29 are marginal; values below 0.20 indicate poor discrimination, and negative values suggest a problem with the item (Ebel & Frisbie, 1991).
A named numeric vector of point-biserial correlations, one per item.
Ebel, R. L., & Frisbie, D. A. (1991). Essentials of educational measurement (5th ed.). Prentice Hall.
set.seed(1)
responses <- matrix(
  sample(c("A", "B", "C", "D"), 100, replace = TRUE),
  nrow = 20, ncol = 5,
  dimnames = list(NULL, paste0("Q", 1:5))
)
key <- c("A", "B", "C", "A", "B")
point_biserial(responses, key)
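The effect of the overlap correction can be seen by computing both versions by hand on a scored 0/1 matrix. The following base R sketch is illustrative only: `item_total_cor` is a hypothetical helper, the toy data are invented, and the package's own handling of edge cases (such as items with zero variance) may differ.

```r
# Toy scored matrix: 10 students, 4 items, 1 = correct, 0 = incorrect
scored <- cbind(
  Q1 = rep(c(1, 0), times = c(9, 1)),
  Q2 = rep(c(1, 0), times = c(7, 3)),
  Q3 = rep(c(1, 0), times = c(4, 6)),
  Q4 = rep(c(1, 0), times = c(2, 8))
)

# Hypothetical helper for illustration; not a function from mcqAnalysis.
item_total_cor <- function(scored, corrected = TRUE) {
  total <- rowSums(scored)
  r <- vapply(seq_len(ncol(scored)), function(j) {
    # Corrected version: correlate the item with the total of the OTHER items
    criterion <- if (corrected) total - scored[, j] else total
    cor(scored[, j], criterion)
  }, numeric(1))
  setNames(r, colnames(scored))
}

r_corrected <- item_total_cor(scored, corrected = TRUE)
r_uncorrected <- item_total_cor(scored, corrected = FALSE)
print(round(rbind(corrected = r_corrected, uncorrected = r_uncorrected), 2))
```

Including the item in its own criterion inflates the correlation, so the uncorrected values are never smaller than the corrected ones. The gap is largest on short tests, where a single item makes up a large share of the total score.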