| Title: | Assessment of Predictions for an Ordinal Response |
|---|---|
| Description: | Produces several metrics to assess the prediction of ordinal categories based on the estimated probability distribution for each unit of analysis produced by any model returning a matrix with these probabilities. |
| Authors: | Javier Espinosa-Brito [aut, cre] |
| Maintainer: | Javier Espinosa-Brito <[email protected]> |
| License: | GPL-2 |
| Version: | 0.1.1 |
| Built: | 2026-06-02 10:41:26 UTC |
| Source: | https://github.com/cran/apor |
Compute the Normalized Ordinal Prediction Agreement (NOPA) metric, a performance measure for models with ordinal-scaled response variables that output estimated probability distributions (EPDs) instead of predicted labels.
This function assesses the predictive quality of a model for an ordinal response by aggregating the predicted probability mass as a function of the level of disagreement with respect to the observed category. It provides a normalized and interpretable score between 0 and 1, where 1 indicates perfect agreement and 0 represents the worst possible prediction.
NOPA compares the estimated probability distribution produced by a model for
each unit of analysis against the observed ordinal response of the same unit. The
maximum disagreement is $k - 1$, where $k$ is the number of ordinal categories
of the response variable, and the
minimum disagreement is 0. NOPA then aggregates the disagreements of all units of analysis into a single measure.
The function internally computes:
OPD: Ordinal Prediction Disagreement, the average level of disagreement
between the predicted and observed categories.
w: The worst possible OPD given the dataset, representing the
maximum disagreement achievable.
NOPA: The normalized agreement metric, defined as $\mathrm{NOPA} = 1 - \mathrm{OPD}/w$.
OPDempDist, OPDur, NOPAempDist, NOPAur: Reference values for
empirical and uniform-random baselines that put the model performance
reported by OPD and NOPA in context.
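The three core quantities above can be sketched in a few lines of base R. This is a minimal illustration, not the package's implementation: `nopa_sketch` is a hypothetical helper, and it assumes that the disagreement between category $j$ and the observed category $y_i$ is $|j - y_i|$, and that the worst prediction for a unit puts all probability mass on the category farthest from $y_i$.

```r
# Hypothetical helper (not part of apor): base-R sketch of OPD, w and NOPA.
nopa_sketch <- function(predMat, obsVect) {
  k <- ncol(predMat)
  # disagreements[i, j] = |j - y_i| (assumed linear disagreement levels)
  disagreements <- abs(outer(obsVect, seq_len(k), "-"))
  # Expected disagreement per unit, averaged over all units
  opd <- mean(rowSums(predMat * disagreements))
  # Assumed worst case: all mass on the category farthest from the observed one
  w <- mean(pmax(obsVect - 1, k - obsVect))
  c(OPD = opd, w = w, NOPA = 1 - opd / w)
}

# Three units, three categories
P <- rbind(c(1.0, 0.0, 0.0),   # perfect prediction for y = 1
           c(0.0, 0.0, 1.0),   # worst prediction for y = 1
           c(0.5, 0.5, 0.0))   # partial agreement for y = 2
y <- c(1, 1, 2)
res <- nopa_sketch(P, y)
res
```

In this toy case OPD is 2.5/3 and w is 5/3, so the sketch returns a NOPA of 0.5.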
nopa(predMat, obsVect)
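predMat | A numeric matrix of estimated probabilities with one row per unit of analysis and $k$ columns, one per ordinal category.
obsVect | A numeric or integer vector of observed categories, with values from 1 to $k$.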
A list containing:
predMat: Input matrix of predicted probabilities.
obsVect: Input vector of observed categories.
disagreementsObs: A matrix with $k$ columns (the number of ordinal categories
of the response variable) and $n$ rows (the number of units of analysis). Each row shows the level of disagreement
of each ordinal category with respect to the observed one for the same unit of analysis.
rearrangedProbObs: Matrix of probabilities aggregated by level of disagreement.
meanDistObs: Mean aggregated disagreement profile.
OPD: Observed Ordinal Prediction Disagreement.
w: OPD for the worst possible prediction (maximum disagreement).
NOPA: Normalized Ordinal Prediction Agreement (main metric).
OPDempDist: A reference point for OPD. It is the
ordinal prediction disagreement obtained when the estimated probability distribution
for the categories of the ordinal response matches the empirical distribution.
OPDur: A reference point for OPD. It is the
ordinal prediction disagreement obtained when the observed response
variable keeps its own empirical distribution and the estimated probability distribution
for the categories of the ordinal response is uniform.
NOPAempDist: A reference point for NOPA. It is the
normalized ordinal prediction agreement obtained when the estimated probability distribution
for the categories of the ordinal response matches the empirical distribution.
NOPAur: A reference point for NOPA. It is the
normalized ordinal prediction agreement obtained when the estimated probability distribution
for the categories of the ordinal response is uniform.
Javier
ordPredArgmax,
ordPredRandom
opdRef
# Simulate estimated probability distributions for 20 units and 5 categories
EPD <- t(apply(matrix(runif(100), ncol = 5), 1, function(y) y / sum(y)))
# Every row should sum to one (tolerance-based check instead of ==)
all.equal(rowSums(EPD), rep(1, nrow(EPD)))
ordResponse <- sample(1:5, 20, replace = TRUE)
nopa(predMat = EPD, obsVect = ordResponse)
Computes two reference values for the Ordinal Prediction Disagreement (OPD):
(i) the expected OPD when the predicted label $\hat{Y}$ follows the *same*
empirical distribution as the observed response $Y$; and (ii) the expected OPD when
$\hat{Y}$ is *uniform* over the ordered categories while $Y$
retains its empirical distribution. These values are useful as dataset-specific
anchors for interpreting raw OPD and for constructing normalized benchmarks.
opdRef(p)
p | A probability vector of length $k$.
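Let $p = (p_1, \dots, p_k)$ denote the empirical distribution of the observed response $Y$, and let $\hat{Y}$ denote the predicted label.
The function returns two scalars:
OPDempDist: $E|\hat{Y} - Y|$ when $\hat{Y} \sim p$
independently of $Y \sim p$.
OPDur: $E|\hat{Y} - Y|$ when $\hat{Y} \sim \mathrm{Uniform}\{1, \dots, k\}$
independently of $Y \sim p$.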
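Both are computed via the disagreement-level decomposition
$$\mathrm{OPD} = \sum_{d=0}^{k-1} d \, \Pr(|\hat{Y} - Y| = d),$$
with $\hat{Y}$ the predicted and $Y$ the observed category, where, for the uniform case,
$$\Pr(|\hat{Y} - Y| = d) = \frac{1}{k} \sum_{j=1}^{k} p_j \, \#\{i \in \{1, \dots, k\} : |i - j| = d\},$$
which is the discrete-$k$ version of the expression shown in the manuscript.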
A named numeric vector of length two:
c(OPDempDist = ..., OPDur = ...).
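As a cross-check on the two reference values, they can be obtained by direct enumeration over all pairs of observed and predicted categories. The sketch below is a base-R illustration under the assumption that both quantities are expected absolute differences between independent labels; `opd_ref_sketch` is a hypothetical name, not a function of the package.

```r
# Hypothetical helper (not part of apor): reference OPD values by enumeration.
opd_ref_sketch <- function(p) {
  k <- length(p)
  d <- abs(outer(seq_len(k), seq_len(k), "-"))  # d[i, j] = |i - j|
  u <- rep(1 / k, k)                            # uniform predicted distribution
  c(OPDempDist = sum(outer(p, p) * d),          # observed ~ p, predicted ~ p
    OPDur      = sum(outer(p, u) * d))          # observed ~ p, predicted ~ uniform
}

ref <- opd_ref_sketch(c(0.10, 0.20, 0.40, 0.20, 0.10))
ref
```

For this symmetric $p$ with $k = 5$, the enumeration gives 1.2 for the empirical-distribution case and 1.44 for the uniform case.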
nopa,
ordPredArgmax,
ordPredRandom
# Example with k = 5 categories and an empirical distribution p:
p <- c(0.10, 0.20, 0.40, 0.20, 0.10)
opdRef(p)
Deterministically maps each row of an estimated probability distribution (EPD) matrix to a single predicted class by taking the index of the maximum probability. Rows are normalized to sum to one (within tolerance). Ties can be broken by first, last, or at random among maximizers.
ordPredArgmax(P, tie_break = c("first", "random", "last"), tol = 1e-12)
P | A numeric matrix of size $n \times k$.
tie_break | Character string indicating how to break ties among equal maxima. One of "first", "random", or "last".
tol | Numeric tolerance used for (i) row-sum checks and (ii) equality when identifying ties among maximum probabilities. Defaults to 1e-12.
The function normalizes each row of P to sum to one (within
tol). Rows with (near) zero total probability trigger an error.
If multiple classes achieve the same (within tol) maximum probability,
the returned class depends on tie_break:
"first" — smallest index among maximizers (default).
"last" — largest index among maximizers.
"random" — one index sampled uniformly from the set of
maximizers.
An integer vector of length $n$ with the predicted class indices
in $\{1, \dots, k\}$ for each row of P.
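The normalization and tie-breaking rules described above can be illustrated with a small base-R sketch. It is not the package's code: `argmax_sketch` is a hypothetical helper, and treating NA entries as zero mass is an assumption taken from the comment in the package example.

```r
# Hypothetical helper (not part of apor): argmax prediction with tie-breaking.
argmax_sketch <- function(P, tie_break = c("first", "random", "last"),
                          tol = 1e-12) {
  tie_break <- match.arg(tie_break)
  P[is.na(P)] <- 0                      # assumption: NA counts as zero mass
  totals <- rowSums(P)
  if (any(totals <= tol)) stop("row with (near) zero total probability")
  P <- P / totals                       # normalize each row to sum to one
  vapply(seq_len(nrow(P)), function(i) {
    # All classes within tol of the row maximum count as maximizers
    winners <- which(P[i, ] >= max(P[i, ]) - tol)
    switch(tie_break,
           first  = winners[1],
           last   = winners[length(winners)],
           random = winners[sample.int(length(winners), 1)])
  }, integer(1))
}

P <- rbind(c(0.05, 0.10, 0.25, 0.60),
           c(0.40, 0.40, 0.10, 0.10),   # tie between classes 1 and 2
           c(NA,   0.20, 0.80, 0.00))
first_pick <- argmax_sketch(P)                      # 4 1 3
last_pick  <- argmax_sketch(P, tie_break = "last")  # 4 2 3
```

Only the second row is affected by the choice of `tie_break`, since it is the only row with more than one maximizer.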
P <- rbind(
  c(0.05, 0.10, 0.25, 0.60),
  c(0.40, 0.40, 0.10, 0.10),  # tie between classes 1 and 2
  c(NA,   0.20, 0.80, 0.00)   # NA treated as 0
)
ordPredArgmax(P)                      # default tie_break = "first"
ordPredArgmax(P, tie_break = "last")
Stochastically maps each row of an estimated probability distribution (EPD)
matrix to a single predicted class by drawing one sample from the row's
categorical distribution. Rows are normalized to sum to one (within tolerance),
and the cut-points method is used: the unit interval is split into consecutive intervals whose lengths equal the class probabilities,
with boundary values handled so that every draw maps to a valid class.
ordPredRandom(P, z = NULL, tol = 1e-12)
P | A numeric matrix of size $n \times k$.
z | Optional numeric vector of length $n$.
tol | Numeric tolerance used for row-sum checks and for guarding against underflow when normalizing. Defaults to 1e-12.
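The mapping follows the cumulative cut-points
$c_j = \sum_{i=1}^{j} p_i$, for
$j = 1, \dots, k$ (with $c_0 = 0$), and assigns class $j$ whenever
$z$ falls in the interval between $c_{j-1}$ and $c_j$. When z is supplied, values are
clipped to the unit interval to respect interval boundaries. Rows with (near) zero
total probability trigger an error.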
An integer vector of length $n$ with the predicted class indices
in $\{1, \dots, k\}$ for each row of P.
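The cut-points mapping can be sketched in base R as follows. This is an illustration only: `random_sketch` is a hypothetical helper, and the interval convention used here (class $j$ when $c_{j-1} < z \le c_j$, with z clipped to $[0, 1]$) is an assumption, since the exact boundary handling of the package function is not stated above.

```r
# Hypothetical helper (not part of apor): cut-points sampling from each row.
random_sketch <- function(P, z = NULL, tol = 1e-12) {
  totals <- rowSums(P)
  if (any(totals <= tol)) stop("row with (near) zero total probability")
  P <- P / totals                        # normalize each row to sum to one
  n <- nrow(P)
  k <- ncol(P)
  if (is.null(z)) z <- runif(n)          # one uniform draw per row
  z <- pmin(pmax(z, 0), 1)               # clip supplied values to [0, 1]
  cuts <- t(apply(P, 1, cumsum))         # cuts[i, j] = c_j for row i
  # Assumed convention: class j when c_{j-1} < z <= c_j, so the class index
  # is one plus the number of cut-points c_1..c_{k-1} strictly below z
  as.integer(rowSums(z > cuts[, -k, drop = FALSE]) + 1)
}

P <- rbind(c(0.05, 0.10, 0.25, 0.60),
           c(0.40, 0.40, 0.10, 0.10),
           c(0.00, 0.20, 0.80, 0.00))
fixed_draws <- random_sketch(P, z = c(0.2, 0.85, 1.0))  # 3 3 3
```

Under this convention a supplied value of 1 lands in the last interval with positive length, so the third row returns class 3 rather than the zero-probability class 4.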
set.seed(1)
P <- rbind(
  c(0.05, 0.10, 0.25, 0.60),
  c(0.40, 0.40, 0.10, 0.10),
  c(0.00, 0.20, 0.80, 0.00)
)
# Stochastic draws from each row's EPD
ordPredRandom(P)
# Reproducible draws using provided uniforms
z <- c(0.2, 0.85, 1.0)
ordPredRandom(P, z = z)