| Title: | Explore Fitted Linear and Generalised Linear Models with 'shiny' |
|---|---|
| Description: | Provides a 'shiny' application that helps learners connect regression tables to fitted generalised linear models. Users construct models via drag-and-drop controls, obtain fitted equations and plain-language explanations generated by a large language model, and can view plots of the fitted model in settings with a single continuous covariate. |
| Authors: | James Curran [aut, cre] |
| Maintainer: | James Curran <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.0.4 |
| Built: | 2026-07-04 11:27:44 UTC |
| Source: | https://github.com/cran/WMFM |
Validates and evaluates a single assignment statement such as
t = 1:nrow(data) or month = factor(rep(1:12, 12)).
The right-hand side (RHS) is evaluated in a safe environment created by makeSafeEvalEnv(),
so only the data columns plus a small allowlist of base functions are available.
addDerivedVariableToData(data, txt, allowOverwrite = FALSE)
data |
A data frame to modify. |
txt |
A length-1 character string containing one assignment statement. |
allowOverwrite |
Logical; if |
The derived variable is appended to the data frame as a new column.
Scalars are recycled to the number of rows; otherwise the result must have
length nrow(data).
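The evaluate-then-recycle behaviour described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `addDerivedSketch()` and its `allowed` allowlist are hypothetical names, and the actual allowlist built by makeSafeEvalEnv() is not shown in this document.

```r
# Hypothetical sketch: evaluate one assignment statement in a restricted environment.
addDerivedSketch = function(data, txt,
                            allowed = c(":", "(", "+", "-", "*", "/",
                                        "nrow", "rep", "factor", "log", "sqrt")) {
  tryCatch({
    exprs = parse(text = txt)
    if (length(exprs) != 1L) stop("expected exactly one statement")
    expr = exprs[[1]]
    isAssign = is.call(expr) &&
      is.symbol(expr[[1]]) &&
      as.character(expr[[1]]) %in% c("=", "<-") &&
      is.symbol(expr[[2]])
    if (!isAssign) stop("expected an assignment such as 't = 1:nrow(data)'")

    # The parent is emptyenv(), so only allowlisted functions and data columns resolve.
    env = new.env(parent = emptyenv())
    for (f in allowed) assign(f, get(f, envir = baseenv()), envir = env)
    list2env(as.list(data), envir = env)
    assign("data", data, envir = env)

    value = eval(expr[[3]], envir = env)
    # Scalars are recycled; anything else must match the number of rows.
    if (length(value) == 1L) value = rep(value, nrow(data))
    if (length(value) != nrow(data)) stop("result must have length nrow(data)")

    name = as.character(expr[[2]])
    newData = data
    newData[[name]] = value
    list(ok = TRUE, msg = "ok", data = newData, name = name)
  }, error = function(e) {
    list(ok = FALSE, msg = conditionMessage(e), data = data)
  })
}

res = addDerivedSketch(data.frame(passengers = 1:144), "t = 1:nrow(data)")
bad = addDerivedSketch(data.frame(passengers = 1:3), "t = system('ls')")
sc = addDerivedSketch(data.frame(a = 1:3), "k = 5")
```

Because the evaluation environment has no parent beyond the empty environment, a call to any function outside the allowlist fails with a "could not find function" error, which the sketch reports through `ok = FALSE`.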
A list with elements:
ok: logical, whether the operation succeeded.
msg: character status message.
data: the updated data frame (if ok), otherwise the original data.
name: the new variable name (if ok).
transformation: derived-variable metadata (if ok).
df = data.frame(passengers = 1:144)
res = addDerivedVariableToData(df, "t = 1:nrow(data)")
res$ok
names(res$data)
res2 = addDerivedVariableToData(res$data, "month = factor(rep(1:12, 12))")
table(res2$data$month)
Flattens a wmfmScores object into a long or wide data frame for analysis,
plotting, and comparison.
## S3 method for class 'wmfmScores'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  format = c("long", "wide"),
  ...
)
x |
A |
row.names |
Ignored. Included for S3 compatibility. |
optional |
Ignored. Included for S3 compatibility. |
format |
Character. One of |
... |
Unused. Included for S3 compatibility. |
A data frame.
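The difference between the long and wide layouts can be illustrated with base R alone. The column names below (`runId`, `metric`, `value`) are hypothetical; the actual columns produced for a wmfmScores object are not documented here.

```r
# Hypothetical long-format scores: one row per run and metric.
longScores = data.frame(
  runId = rep(1:2, each = 2),
  metric = rep(c("factualScore", "clarityScore"), times = 2),
  value = c(2, 1, 1, 2)
)

# Wide format: one row per run, one column per metric.
wideScores = reshape(
  longScores,
  idvar = "runId",
  timevar = "metric",
  direction = "wide"
)
```

The long layout suits grouped plotting and modelling, while the wide layout makes run-by-run comparison of metrics easier to read.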
Builds a compact audit summary around a set of already graded bad explanations, with optional comparison against a graded good explanation. This is intended as a lightweight helper for diagnosing adversarial or deliberately flawed explanations that may be slipping through the current deterministic rubric.
auditBadExplanationGrading(
  x,
  goodGrade = NULL,
  method = c("deterministic", "llm"),
  minExpectedDrop = 1,
  flaggedThreshold = 9,
  expectedMetrics = NULL,
  ...
)
x |
A |
goodGrade |
Optional |
method |
Character scalar. Grading method to audit. One of
|
minExpectedDrop |
Numeric scalar. Minimum drop in mark, relative to the good explanation, required before an explanation is treated as clearly penalised. |
flaggedThreshold |
Numeric scalar. Any explanation with a mark greater than or equal to this threshold is flagged. |
expectedMetrics |
Optional named list. Each element name should match an
explanation name in |
... |
Unused. |
An object of class wmfmBadExplanationAudit.
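The roles of minExpectedDrop and flaggedThreshold can be sketched with plain vectors. The marks below are invented for illustration, the column names are hypothetical, and the structure of the real wmfmBadExplanationAudit object may differ.

```r
# Hypothetical marks for three bad explanations and one good explanation.
marks = c(effectDirectionError = 6, wrongScaleError = 9.5, rSquaredOverclaim = 9)
goodMark = 10
minExpectedDrop = 1
flaggedThreshold = 9

auditSketch = data.frame(
  explanation = names(marks),
  mark = unname(marks),
  drop = goodMark - unname(marks),
  # Clearly penalised: the mark fell by at least minExpectedDrop.
  clearlyPenalised = unname(goodMark - marks >= minExpectedDrop),
  # Flagged: the mark is still at or above flaggedThreshold.
  flagged = unname(marks >= flaggedThreshold)
)
```

In this sketch an explanation can be both clearly penalised and flagged (here `rSquaredOverclaim`), which is the kind of case an audit would surface for review.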
if (interactive()) {
  badGrades = grade(modelObj, explanation = badVec, method = "deterministic")
  goodGrade = grade(modelObj, explanation = modelObj$explanation, method = "deterministic")
  audit = auditBadExplanationGrading(
    badGrades,
    goodGrade = goodGrade,
    expectedMetrics = list(
      effectDirectionError = c("factualScore"),
      wrongScaleError = c("factualScore", "clarityScore"),
      rSquaredOverclaim = c("factualScore", "calibrationScore"),
      logicalContradiction = c("factualScore", "clarityScore")
    )
  )
  print(audit)
}
Creates a student-safe, inspectable mapping from the final explanation text back to the deterministic evidence used to support it. The mapping is built from sentence-level units in the final explanation and links each sentence to evidence categories drawn from the explanation audit and teaching summary.
buildExplanationClaimEvidenceMap(
  explanationText,
  audit,
  teachingSummary = NULL,
  model = NULL
)
explanationText |
Character scalar containing the final explanation. |
audit |
A |
teachingSummary |
Optional |
model |
Optional fitted model object, used to derive a teaching summary and to improve sentence matching. |
This object is intentionally not presented as hidden chain-of-thought. It is a transparent post hoc map showing which deterministic ingredients best match the visible explanation text.
The current implementation uses a tag-first hybrid approach:
deterministic sentence-level claim tags from the audit and teaching summary
lightweight sentence-level evidence matching against the final explanation text
a derived legacy single-label claimType field kept for compatibility
An object of class wmfmExplanationClaimEvidenceMap.
Derives a plain-language teaching summary from the deterministic explanation audit. The returned object is designed for student-facing display and avoids exposing raw prompt ingredients or developer-facing internal structures.
buildExplanationTeachingSummary(audit, model, researchQuestion = NULL)
audit |
A |
model |
A fitted model object, typically of class |
researchQuestion |
Optional character string giving the research
question the model is being used to address. When |
An object of class wmfmExplanationTeachingSummary.
Build deterministic model-plot data
buildModelPlotData(model, plotType = c("observedFitted", "residualFitted"))
model |
A fitted |
plotType |
Plot type. Supported values are |
A list containing plot metadata and a data frame.
This helper constructs a short, deterministic set of language constraints
to guide natural-language interpretation of a contrast. The rules are
designed to enforce wording that is consistent with the model scale
(identity, log, logit, etc.) and any transformation applied to the response
in an lm() model.
buildScalePhrasingRules(
  isGlm,
  effectiveScale,
  respTransform,
  nounPhrase = NULL
)
isGlm |
Logical. |
effectiveScale |
Character string giving the interpretation scale.
For GLMs this should typically be the model link (e.g. |
respTransform |
Character string describing the detected response
transformation for |
nounPhrase |
Optional character string naming the outcome in
student-facing language (e.g. |
The returned text is intended to be embedded verbatim in an LLM prompt to prevent inconsistent phrasing (e.g., additive vs multiplicative language).
A single character string containing bullet-point language rules. The text is suitable for direct inclusion in an LLM prompt.
buildScalePhrasingRules(
  isGlm = FALSE,
  effectiveScale = "identity",
  respTransform = "none"
)
buildScalePhrasingRules(
  isGlm = TRUE,
  effectiveScale = "logit",
  respTransform = "none",
  nounPhrase = "disease status"
)
Creates a structured record for one repeated WMFM run using a schema that separates:
metadata and model context,
raw explanation text and text summaries, and
extracted semantic claims.
buildWmfmRunRecord(
  runId,
  exampleName,
  package,
  modelType,
  formula,
  equationsText,
  explanationText,
  researchQuestion = "",
  errorMessage = NA_character_,
  interactionTerms = character(0),
  interactionMinPValue = NA_real_,
  interactionAlpha = 0.05,
  hasFactorPredictors = FALSE,
  adjustmentVariables = character(0),
  primaryVariables = character(0),
  followupScoringContext = NULL
)
runId |
Integer run identifier. |
exampleName |
Character example identifier. |
package |
Character package name. |
modelType |
Character model class. |
formula |
Character model formula. |
equationsText |
Character fitted-equation text. |
explanationText |
Character explanation text. |
researchQuestion |
Character research question associated with the model. |
errorMessage |
Character error message, or |
interactionTerms |
Character vector of interaction-term names from the fitted model. |
interactionMinPValue |
Numeric minimum p-value across fitted interaction
terms, or |
interactionAlpha |
Numeric interaction threshold metadata stored with the run record. |
hasFactorPredictors |
Logical. Whether the fitted model includes at least one factor or character predictor. Used by downstream scoring gates to distinguish genuinely applicable factor criteria from numeric-only models. |
adjustmentVariables |
Character vector of adjustment-variable names to carry into explanation scoring. These variables are treated as controls rather than primary narrative targets. |
primaryVariables |
Character vector of variables of scientific interest to carry into explanation scoring. |
followupScoringContext |
Optional named list of deterministic follow-up prediction metadata to carry into scoring prompts and extracted evidence. |
This function is intentionally limited to describing what happened during a
run and what the explanation appeared to say. It does not create any
judged scoring fields or aggregate score placeholders. Scoring is handled
later by downstream functions such as score() and
scoreWmfmRepeatedRuns(), which should operate on the raw run record rather
than being partially embedded in it.
The returned named list contains the following groups of fields.
Integer run identifier.
Character timestamp created at record build time.
Example identifier.
Package name.
High-level model class.
Optional model family. Initialised to NA.
Optional link function. Initialised to NA.
Model formula as character.
Fitted-equation text.
Pipe-separated interaction-term names.
Logical. Whether the fitted model includes interactions.
Integer count of interaction terms.
Minimum interaction-term p-value, or NA.
Stored interaction threshold metadata from the run.
Logical. Whether an error was recorded for the run.
Run error message, or NA.
Raw explanation text.
Normalised explanation used for duplicate detection.
Logical. Whether a non-empty explanation is present.
Approximate word count.
Approximate sentence count.
Logical. Whether a confidence or credible interval is mentioned.
Logical. Whether percent language is used.
Logical. Whether reference/baseline framing is mentioned.
Logical. Whether interaction or differential-pattern language is mentioned.
Logical. Whether uncertainty or evidential caution is mentioned.
Logical. Whether inferential wording is present.
Logical. Whether wording is descriptive only.
Logical. Whether overstrong language is detected.
Logical. Whether unusually weak language is detected.
Logical. Whether conditional wording such as "depends on" or subgroup-conditioned interpretation is present.
Logical. Whether explicit or implicit comparison wording is present.
Logical. Placeholder for whether the response is clearly named. Initialised conservatively.
Logical. Placeholder for whether the focal predictor is clearly named. Initialised conservatively.
Character. One of increase, decrease, mixed_or_both, not_stated.
Character. One of additive, multiplicative, probability_or_odds, mixed_or_unclear, not_stated.
Character. One of difference_claimed_strongly, difference_claimed_cautiously, no_clear_difference, not_mentioned, unclear, not_applicable.
Character. One of inferential, descriptive_only, overclaiming, unclear.
Character. One of none, generic_uncertainty, confidence_interval, mixed, unclear.
Named list containing one structured raw run record.
Implements the layout rules for displaying fitted means when all predictors are factors:
1 factor: one-way table (single column of fitted means).
2 factors: two-way table with the factor having the most levels on rows.
3 factors: a set of two-way tables split by the factor with the fewest levels. Within each two-way table, the factor with the most levels is on rows and the remaining factor is on columns.
chooseFactorLayout(mf, predictors)
mf |
A model frame (as returned by |
predictors |
Character vector of predictor names (columns in |
A list describing the layout:
One of "oneWay", "twoWay", "threeWay", or "other".
Row factor name (for "oneWay", "twoWay", and "threeWay").
Column factor name (for "twoWay" and "threeWay").
Split factor name (for "threeWay" only).
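The layout rules above amount to ordering the factors by their number of levels. The following is a minimal sketch of that ordering, not the package's code; `chooseLayoutSketch()` and the element names `type`, `rowVar`, `colVar`, and `splitVar` are hypothetical, since the real element names are not shown in this document.

```r
# Hypothetical sketch of the layout rules: most levels on rows, fewest levels as the split.
chooseLayoutSketch = function(mf, predictors) {
  nLev = vapply(predictors, function(p) nlevels(factor(mf[[p]])), integer(1))
  byLevels = predictors[order(nLev, decreasing = TRUE)]
  switch(
    as.character(length(predictors)),
    "1" = list(type = "oneWay", rowVar = byLevels[1]),
    "2" = list(type = "twoWay", rowVar = byLevels[1], colVar = byLevels[2]),
    "3" = list(type = "threeWay", rowVar = byLevels[1],
               colVar = byLevels[2], splitVar = byLevels[3]),
    list(type = "other")
  )
}

df = data.frame(
  y = seq_len(24),
  A = factor(rep(c("a1", "a2"), each = 12)),
  B = factor(rep(paste0("b", 1:4), times = 6)),
  C = factor(rep(paste0("c", 1:6), times = 4))
)
mf = model.frame(y ~ A + B + C, data = df)
layout3 = chooseLayoutSketch(mf, c("A", "B", "C"))
layout2 = chooseLayoutSketch(mf, c("A", "B"))
```

With 2, 4, and 6 levels for A, B, and C, the sketch puts C on rows, B on columns, and splits by A.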
df = data.frame(
  y = rnorm(24),
  A = factor(rep(c("a1","a2"), each = 12)),
  B = factor(rep(paste0("b", 1:4), times = 6)),
  C = factor(rep(paste0("c", 1:6), times = 4))
)
mod = lm(y ~ A * B * C, data = df)
mf = model.frame(mod)
chooseFactorLayout(mf, c("A","B","C"))
Uses simple text heuristics to classify the scale on which the explanation describes the main effect. The intent is to identify the dominant scale used for coefficient interpretation, rather than unrelated numeric summaries such as model-fit percentages.
classifyEffectScaleClaim(text)
text |
Character scalar explanation text. |
The classifier is deliberately conservative about assigning
"mixed_or_unclear". In particular, generic percentage language such as
an statement should not by itself force a multiplicative label when
the coefficient interpretation is otherwise clearly additive.
One of "additive", "multiplicative",
"probability_or_odds", "mixed_or_unclear", or
"not_stated".
Removes simple LLM formatting artifacts that can leak into the visible
explanation, such as leading Answer, Answer:, or Answer - tokens.
This is a deterministic surface cleanup step. It does not rewrite the
statistical content of the explanation.
cleanExplanationText(text)
text |
Character vector of explanation text. |
A character vector with formatting artifacts removed.
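A cleanup of this kind can be approximated with one anchored regular expression. The pattern below is an assumption made for illustration (`cleanAnswerPrefixSketch()` is a hypothetical name); the rules actually applied by cleanExplanationText() may be broader.

```r
# Hypothetical sketch: strip a leading "Answer", "Answer:", or "Answer -" token.
cleanAnswerPrefixSketch = function(text) {
  # \\b stops the match from eating the start of words such as "Answers".
  sub("^\\s*Answer\\b\\s*[:-]?\\s*", "", text, perl = TRUE)
}

cleaned = cleanAnswerPrefixSketch(c(
  "Answer: The mean increases.",
  "Answer - The slope is positive.",
  "Answers vary between groups."
))
```

Only the leading token is removed, so the statistical content of each sentence is left untouched.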
Generic for comparing WMFM objects.
compare(x, y = NULL, ...)
x |
An object to compare. |
y |
Optional second object to compare. |
... |
Additional arguments passed to methods. |
Method specific comparison object.
Compares deterministic and llm grading either within a single wmfmGrade
object that contains both methods, or across two separate wmfmGrade
objects.
## S3 method for class 'wmfmGrade'
compare(x, y = NULL, methods = c("deterministic", "llm"), ...)
x |
A scored |
y |
Optional second scored |
methods |
Character vector of length 2 naming the methods to compare
when comparing within a single object. Defaults to
|
... |
Unused. Included for S3 compatibility. |
An object of class wmfmGradeComparison.
Compares score results either:
within a single wmfmScores object containing two methods, or
between two wmfmScores objects.
## S3 method for class 'wmfmScores'
compare(
  x,
  y = NULL,
  xMethod = NULL,
  yMethod = NULL,
  registry = getWmfmMetricRegistry(),
  ...
)
x |
A |
y |
Optional second |
xMethod |
Optional character string naming the method to use from |
yMethod |
Optional character string naming the method to use from |
registry |
Optional WMFM metric registry. Defaults to
|
... |
Unused. Included for S3 compatibility. |
The comparison summarizes agreement separately for binary, ordinal, and continuous score fields using the WMFM metric registry.
An object of class wmfmScoreComparison.
Computes an estimate and 95% confidence interval for a user-specified contrast in a factor-only model, optionally using robust (sandwich) variance estimators.
computeFactorOnlyContrast(
  model,
  newData,
  weights,
  ciType = c("standard", "sandwich"),
  hcType = c("HC0", "HC3"),
  level = 0.95
)
model |
A fitted model object, typically an |
newData |
A data frame whose rows define the conditions being contrasted. |
weights |
Numeric vector of contrast weights, one per row of |
ciType |
One of |
hcType |
Sandwich type passed to |
level |
Confidence level (default 0.95). |
The contrast is defined through a set of weights applied to the rows of
newData. For example, a pairwise contrast between level A and level B
can be represented with weights c(1, -1, 0, ..., 0).
For GLMs, inference is carried out on the linear predictor (link) scale and then back-transformed for interpretation when possible:
Poisson (log link): reports a mean ratio.
Binomial (logit link): reports an odds ratio.
A list with estimates and confidence intervals on the link scale and (when applicable) an interpreted scale.
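The weighted-contrast idea can be sketched with base R for a Poisson model. This is an illustration under stated assumptions rather than the package's implementation: the data are simulated, the interval uses a normal quantile, and the standard model-based covariance from vcov() is used where a sandwich estimator could be substituted.

```r
# Simulated data: a one-factor Poisson model (hypothetical example).
set.seed(1)
df = data.frame(site = factor(rep(c("A", "B", "C"), each = 20)))
df$y = rpois(60, lambda = c(2, 4, 6)[as.integer(df$site)])
mod = glm(y ~ site, family = poisson, data = df)

# Rows of newData define the conditions; weights define the contrast A minus B.
newData = data.frame(site = factor(c("A", "B"), levels = levels(df$site)))
w = c(1, -1)

# Turn the row weights into a weight vector on the coefficients.
X = model.matrix(delete.response(terms(mod)), data = newData, xlev = mod$xlevels)
L = drop(w %*% X)

# Estimate and interval on the link (log) scale.
est = sum(L * coef(mod))
se = sqrt(drop(t(L) %*% vcov(mod) %*% L))
crit = qnorm(1 - (1 - 0.95) / 2)
linkCi = est + c(-1, 1) * crit * se

# Back-transform: on the log link, exp() of a difference is a mean ratio.
ratio = exp(c(estimate = est, lower = linkCi[1], upper = linkCi[2]))
```

Because the model has a single factor, the estimated ratio matches the ratio of the observed group means for A and B, which gives a quick sanity check on the contrast weights.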
Computes pointwise confidence intervals for the fitted mean response
evaluated at newData. These are confidence intervals for the
conditional mean (i.e., the fitted line), not prediction intervals.
computeMeanCi(
  model,
  newData,
  ciType = "standard",
  hcType = "HC0",
  level = 0.95
)
model |
A fitted |
newData |
A data frame of covariate values at which to evaluate the fitted mean. |
ciType |
Character string: |
hcType |
Character string specifying the HC estimator
(e.g. |
level |
Confidence level for the intervals. |
For generalized linear models, intervals are computed on the link scale and transformed back to the response scale.
Robust (sandwich) confidence intervals are supported via vcovHC().
A data frame containing newData plus columns
fit, lower, and upper.
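For a GLM, the link-scale approach can be sketched with predict() and the family's inverse link. `meanCiSketch()` is a hypothetical helper that covers only the standard (non-robust) case with a normal quantile, and it assumes an increasing inverse link so that the lower and upper limits keep their order after back-transformation.

```r
# Simulated logistic regression data (hypothetical example).
set.seed(42)
df = data.frame(x = seq(-2, 2, length.out = 50))
df$y = rbinom(50, size = 1, prob = plogis(0.5 + df$x))
mod = glm(y ~ x, family = binomial, data = df)

# Hypothetical sketch: pointwise CI for the fitted mean, built on the link scale.
meanCiSketch = function(model, newData, level = 0.95) {
  pred = predict(model, newdata = newData, type = "link", se.fit = TRUE)
  crit = qnorm(1 - (1 - level) / 2)
  linkinv = family(model)$linkinv
  cbind(
    newData,
    fit = linkinv(pred$fit),
    lower = linkinv(pred$fit - crit * pred$se.fit),
    upper = linkinv(pred$fit + crit * pred$se.fit)
  )
}

ci = meanCiSketch(mod, data.frame(x = c(-1, 0, 1)))
```

Building the interval on the link scale keeps the limits inside the valid range of the response, here between 0 and 1 for a fitted probability.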
Count sentences in text
countWmfmSentences(x)
x |
Character text. |
Integer.
Count words in text
countWmfmWords(x)
x |
Character text. |
Integer.
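Approximate word and sentence counts of this kind can be obtained with simple regular expressions. The two helpers below are hypothetical sketches; the exact tokenisation rules used by countWmfmWords() and countWmfmSentences() are not documented here.

```r
# Hypothetical sketch: words are runs of non-whitespace characters.
countWordsSketch = function(x) {
  lengths(strsplit(trimws(x), "\\s+"))
}

# Hypothetical sketch: a sentence ends with . ! or ? followed by whitespace or end of text,
# so a decimal point inside a number such as 3.5 is not counted.
countSentencesSketch = function(x) {
  m = gregexpr("[.!?]+(\\s|$)", x, perl = TRUE)[[1]]
  if (m[1] == -1) 0L else length(m)
}

txt = "The mean rose by 3.5 units. It is uncertain!"
nWords = countWordsSketch(txt)
nSentences = countSentencesSketch(txt)
```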
S3 generic for describing fields stored in WMFM objects.
describeField(x, field, ...)
x |
A WMFM object. |
field |
Character scalar naming the field to describe. This may be a canonical field name, a recognised alias, or a pretty plot label if the underlying field registry supports it. |
... |
Additional arguments passed to methods. |
Method-specific description output.
Describes a field that is present in a wmfmRuns object. The method accepts
canonical field names, recognised aliases, and supported pretty plot labels.
It validates that the resolved canonical field is actually stored in the raw
run records before delegating to describeWmfmField().
## S3 method for class 'wmfmRuns'
describeField(
  x,
  field,
  format = c("text", "list", "data.frame"),
  includeExamples = TRUE,
  includeAliases = TRUE,
  ...
)
x |
A |
field |
Character scalar naming the field to describe. |
format |
Character. One of |
includeExamples |
Logical. Should examples be included? |
includeAliases |
Logical. Should recognised aliases be included? |
... |
Unused. Included for S3 compatibility. |
The result of describeWmfmField() in the requested format.
This is a character scalar for format = "text", a named list for
format = "list", or a one-row data.frame for
format = "data.frame".
Describes a field that is present in a wmfmScores object. The method accepts
canonical field names, recognised aliases, and supported pretty plot labels.
It validates that the resolved canonical field is actually stored in the
scored records before delegating to describeWmfmField().
## S3 method for class 'wmfmScores'
describeField(
  x,
  field,
  format = c("text", "list", "data.frame"),
  includeExamples = TRUE,
  includeAliases = TRUE,
  ...
)
x |
A |
field |
Character scalar naming the field to describe. |
format |
Character. One of |
includeExamples |
Logical. Should examples be included? |
includeAliases |
Logical. Should recognised aliases be included? |
... |
Unused. Included for S3 compatibility. |
The result of describeWmfmField() in the requested format.
This is a character scalar for format = "text", a named list for
format = "list", or a one-row data.frame for
format = "data.frame".
Prints a human-readable description of a field used in the WMFM workflow. This is intended to help interpret the raw run fields, judged fields, and aggregate score fields that appear in WMFM objects and related plots.
describeWmfmField(
  field,
  format = c("text", "list", "data.frame"),
  includeExamples = TRUE,
  includeAliases = TRUE
)
field |
Character scalar naming the field to describe. This may be a canonical field name, a recognised alias, or one of the pretty labels shown on the heatmap x-axis. |
format |
Character. One of |
includeExamples |
Logical. Should examples be included in the returned output? |
includeAliases |
Logical. Should recognised aliases be included in the returned output? |
The function accepts:
current schema names such as interactionEvidenceAppropriate,
older aliases such as interactionInference, and
the pretty x-axis labels used by plotWmfmExplanationClaimHeatmap(),
such as "Direction claim", "Scale claim", or "CI mention".
By default the function prints a formatted explanation to the console using
cat() and returns the underlying character vector invisibly. This makes it
convenient for interactive use while still allowing the text to be captured
programmatically when needed.
For judged fields scored on a 0/1/2 scale, the usual interpretation is:
0: Poor, missing, inappropriate, or clearly problematic.
1: Partly adequate, weak, borderline, or unclear.
2: Appropriate, clear, or well handled.
If format = "text", a formatted character vector returned
invisibly after being printed to the console. Otherwise a named list or a
one-row data frame describing the requested field.
describeWmfmField("interactionEvidenceAppropriate")
describeWmfmField("Direction claim")
describeWmfmField("CI mention")
describeWmfmField("overallScore", format = "list")
Detects comparison structure that may not use explicit phrases such as "compared with" or "relative to", but still clearly expresses a contrast between groups or conditions.
detectImplicitComparison(text)
text |
Character scalar explanation text. |
This helper is intended to support extraction in buildWmfmRunRecord(),
especially for explanations that use constructions such as:
"higher ... than those who did not"
"more ... than students who did not attend"
"lower ... than those without ..."
The function looks for comparative wording together with at least one structural cue indicating contrast or grouping.
Logical scalar. TRUE if implicit comparison language is detected,
otherwise FALSE.
Detects expressions like "from 3.0 to 4.0" or "between 3.8 and 12.2", which indicate uncertainty without explicitly mentioning confidence intervals.
detectRangeExpression(text)
text |
Character string |
Logical
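A range detector of this kind can be sketched with a single regular expression. The pattern below is an assumption covering only the two constructions quoted above; `detectRangeSketch()` is a hypothetical name and the package's own pattern may accept more forms.

```r
# Hypothetical sketch: detect "from <number> to <number>" or "between <number> and <number>".
detectRangeSketch = function(text) {
  num = "-?\\d+(?:\\.\\d+)?"
  pattern = paste0(
    "\\b(?:from\\s+", num, "\\s+to\\s+", num,
    "|between\\s+", num, "\\s+and\\s+", num, ")"
  )
  grepl(pattern, text, ignore.case = TRUE, perl = TRUE)
}

hits = detectRangeSketch(c(
  "The ratio is likely between 3.8 and 12.2.",
  "Scores rose from 3.0 to 4.0 on average.",
  "The mean increases by 3 units."
))
```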
Detect pattern in text
detectWmfmPattern(x, pattern)
x |
Character text. |
pattern |
Regex pattern. |
Logical.
S3 generic for diagnosing disagreement between deterministic and LLM scoring outputs.
diagnose(x, ...)
x |
An object to diagnose. |
... |
Additional arguments passed to methods. |
Method-specific diagnosis output.
Diagnoses disagreement between deterministic and LLM scoring for either a single metric or all available metrics.
## S3 method for class 'wmfmScores'
diagnose(x, metric = NULL, runs = NULL, cmp = NULL, maxExamples = 5, ...)
x |
A |
metric |
Optional single metric name. If |
runs |
Optional |
cmp |
Optional |
maxExamples |
Maximum number of flagged disagreement examples to retain for each metric diagnosis. |
... |
Reserved for future method-specific arguments. |
When metric is supplied, the function returns a
wmfmMetricDiagnosis object. When metric = NULL, it returns a
wmfmScoresDiagnosis object summarising all metrics that can be
compared.
A wmfmMetricDiagnosis or wmfmScoresDiagnosis object.
Runs the deterministic explanation surface processor in debug mode and records surface-language issues before and after processing. This helper is intended for developer review and tests. It does not change the explanation pipeline and it does not make statistical claims.
diagnoseExplanationSurfaceProcessing(text, audit = NULL)
text |
Character vector of explanation text. |
audit |
Optional explanation audit object passed to
|
A list with original text, processed text, applied rule names, detected issues before processing, detected issues after processing, and the number of unresolved issues.
diagnoseExplanationSurfaceProcessing(
  "A one-unit rise multiplies the expected count by 0.21."
)
Opens the WMFM user configuration file in the default R editor. If the configuration directory or file does not yet exist, they are created first.
editWmfmConfig(editor = utils::file.edit)
editor |
Function used to open the configuration file. Defaults to
|
Invisibly returns the path to the configuration file.
Provides a human-readable explanation of how a particular field (claim, judged field, or score) was derived for a given run.
explainWmfmFieldScore(field, x, runIndex = 1L)
field |
Character. Field name or plotted label (e.g. "Overclaim", "Interaction evidence", "overallScore"). |
x |
A |
runIndex |
Integer. Which run to explain. |
This function works directly with a repeated-runs object and will
internally call scoreWmfmRepeatedRuns() if needed.
Invisibly returns a character vector explanation. Also prints to console using cat().
Converts model explanation or equation objects into plain text.
extractWmfmText(x)
x |
Object returned by WMFM functions. |
Character string.
Fill missing predictor columns in new data using the model's training frame
fillMissingPredictors(model, newData)
model |
A fitted model object with a model frame (e.g., lm/glm). |
newData |
A data frame intended for prediction/design-matrix construction. |
newData with any missing predictor columns added, using values from the first row of the model frame and preserving factor levels.
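The fill-in step can be sketched as follows. `fillMissingSketch()` is a hypothetical helper that assumes predictors enter the formula untransformed, so that their names match columns of the model frame; it is not the package's implementation.

```r
# Hypothetical sketch: add missing predictors using the first row of the model frame.
fillMissingSketch = function(model, newData) {
  mf = model.frame(model)
  predictors = all.vars(delete.response(terms(model)))
  for (v in setdiff(predictors, names(newData))) {
    # Indexing a factor keeps its levels; rep() recycles the value over all rows.
    newData[[v]] = rep(mf[[v]][1], nrow(newData))
  }
  newData
}

set.seed(3)
df = data.frame(
  y = rnorm(10),
  site = factor(rep(c("A", "B"), each = 5)),
  depth = rnorm(10)
)
mod = lm(y ~ site + depth, data = df)
filled = fillMissingSketch(mod, data.frame(depth = c(0, 1)))
preds = predict(mod, newdata = filled)
```

Preserving the factor levels matters because predict() builds the design matrix from them; a plain character value would still work here, but a factor with different levels would not.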
Scans explanation text for recurring surface-language issues that the deterministic post-processing layer is intended to control. This helper is diagnostic only: it does not rewrite the text and it does not make any statistical claims.
findExplanationSurfaceIssues(
  text,
  issueTypes = c("modelMechanismLanguage", "unitChangePhrasing", "verbalFractions",
    "confidenceIntervalTerminology", "longSentencePatterns")
)
text |
Character vector of explanation text. |
issueTypes |
Character vector naming issue groups to scan for. Defaults to all supported issue groups. |
A data frame with one row per detected issue and columns element,
issueType, pattern, and matchedText.
Format a single numeric value for display in a summary table using a significant-digits rule that aims to keep only the digits that carry useful information. Integer values are shown without decimal places. Non-integer values are rounded to a chosen number of significant digits.
formatSummaryValue(x, sigDigits = 3)
x |
A numeric scalar to format. |
sigDigits |
Integer. The number of significant digits to keep for non-integer values. Defaults to 3. |
This is useful for compact summary tables where fixed decimal places can
produce distracting trailing zeros, for example showing 11 rather than
11.000.
A character scalar. Returns NA_character_ when x has length 0 or
is NA.
formatSummaryValue(11)
formatSummaryValue(52.8767)
formatSummaryValue(68.5)
formatSummaryValue(0.012345, sigDigits = 2)
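The significant-digits rule can be sketched with signif() and format(). `formatValueSketch()` is a hypothetical stand-in written for illustration, so its output need not match formatSummaryValue() in every case.

```r
# Hypothetical sketch: integers without decimals, other values to sigDigits significant digits.
formatValueSketch = function(x, sigDigits = 3) {
  if (length(x) == 0 || is.na(x)) return(NA_character_)
  if (isTRUE(all.equal(x, round(x)))) {
    return(format(round(x), scientific = FALSE, trim = TRUE))
  }
  format(signif(x, sigDigits), scientific = FALSE, trim = TRUE)
}

formatted = c(
  formatValueSketch(11),
  formatValueSketch(52.8767),
  formatValueSketch(68.5),
  formatValueSketch(0.012345, sigDigits = 2)
)
```

Under this sketch the four inputs come out as "11", "52.9", "68.5", and "0.012".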
Converts elapsed time in seconds to a compact human-readable string for console progress reporting.
formatWmfmElapsedTime(seconds)
seconds |
Numeric. Elapsed time in seconds. |
Character string such as "12s", "3m 08s", or "1h 02m 15s".
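The three output shapes can be reproduced with integer division and sprintf(). `formatElapsedSketch()` is a hypothetical helper, and rounding to whole seconds is an assumption of this sketch.

```r
# Hypothetical sketch: compact elapsed-time string in the "12s" / "3m 08s" / "1h 02m 15s" style.
formatElapsedSketch = function(seconds) {
  s = as.integer(round(seconds))
  h = s %/% 3600L
  m = (s %% 3600L) %/% 60L
  sec = s %% 60L
  if (h > 0L) {
    sprintf("%dh %02dm %02ds", h, m, sec)
  } else if (m > 0L) {
    sprintf("%dm %02ds", m, sec)
  } else {
    sprintf("%ds", sec)
  }
}

elapsed = c(formatElapsedSketch(12), formatElapsedSketch(188), formatElapsedSketch(3735))
```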
Generic for generating plausible but intentionally flawed explanations from a fitted model.
generateBadExplanation(x, ...)

## S3 method for class 'wmfmModel'
generateBadExplanation(
  x,
  explanation = NULL,
  type = "auto",
  severity = c("subtle", "moderate", "severe"),
  n = 1,
  mixTypes = FALSE,
  labelErrors = FALSE,
  provider = NULL,
  showProgress = interactive(),
  showTiming = interactive(),
  ...
)
x |
An object. |
... |
Additional arguments reserved for future use. |
explanation |
Optional character scalar giving the base good
explanation. If |
type |
Character vector of bad explanation types, or |
severity |
Character scalar. One of |
n |
Integer. Number of bad explanations to generate. |
mixTypes |
Logical. Should individual explanations combine multiple bad types? |
labelErrors |
Logical. Should the return value include error labels and severity metadata? |
provider |
Optional chat provider. If |
showProgress |
Logical. Should command-line progress messages be shown?
Defaults to |
showTiming |
Logical. Should command-line timing summaries be shown?
Defaults to |
Method-dependent output.
If n = 1 and labelErrors = FALSE, a character scalar.
If n > 1 and labelErrors = FALSE, a named character vector.
If labelErrors = TRUE, a named list containing explanations, error type
labels, and severity labels.
generateBadExplanation(wmfmModel): Generate plausible but intentionally flawed explanations starting from a
good explanation, either supplied directly or stored in a wmfmModel
object. The generated explanations are returned in a form that can be passed
directly to grade().
Returns the names of predictor variables in a fitted model that are factors in the supplied data set.
getFactorPredictors(model, data)
model |
A fitted model object (e.g. |
data |
A data frame containing the variables used to fit the model. |
A character vector of factor predictor names. May be empty.
df = data.frame(
  y = rpois(10, 3),
  site = factor(rep(c("A", "B"), each = 5)),
  depth = rnorm(10)
)
mod = glm(y ~ site + depth, family = poisson, data = df)
getFactorPredictors(mod, df)
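For readers who want to see the idea without the package, the same kind of lookup can be sketched in base R. This is an illustration under the assumption that predictors are plain column names in the formula; it is not the package's own code, and the object names are hypothetical.

```r
# Sketch: find which predictors of a fitted model are factors in the data.
set.seed(1)
df = data.frame(
  y = rpois(10, 3),
  site = factor(rep(c("A", "B"), each = 5)),
  depth = rnorm(10)
)
mod = glm(y ~ site + depth, family = poisson, data = df)

# Predictor names taken from the right-hand side of the model formula
predictorNames = all.vars(delete.response(terms(mod)))

# Keep only those stored as factors in the supplied data frame
isFactorColumn = vapply(df[predictorNames], is.factor, logical(1))
factorPredictors = predictorNames[isFactorColumn]
factorPredictors
```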
Returns run-level paired scores for a given metric from a
wmfmScoreComparison object.
getMetricComparisonData(x, metric)
x |
A |
metric |
Character name of the metric. Must be one of the metric names
stored in |
This accessor uses the metric registry stored on the comparison object as the
source of truth for valid metric names. Because the choices are dynamic,
match.arg() is used for validation and partial matching, but IDE tab
completion will not show the possible values automatically.
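The validation pattern described here can be sketched with `match.arg()` and a choices vector that is only known at run time. The metric names and the helper `pickMetricSketch()` below are hypothetical and stand in for whatever the registry on a real comparison object contains.

```r
# Sketch: validate a metric name against choices that are only known at run time.
metricNames = c("overallScore", "effectDirection", "clarity")  # hypothetical registry

pickMetricSketch = function(metric, choices) {
  # match.arg() accepts exact or unambiguous partial matches and
  # raises an error for anything else
  match.arg(metric, choices)
}

pickMetricSketch("clarity", metricNames)
pickMetricSketch("clar", metricNames)   # partial match resolves to "clarity"

# An unknown name is rejected with an error
rejected = inherits(try(pickMetricSketch("nope", metricNames), silent = TRUE), "try-error")
rejected
```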
An object of class metricComparisonData with one row per run.
When one of the compared methods is deterministic, the returned data frame
includes a detValue column and a method-specific value column such as
llmValue. When the comparison is between deterministic and LLM scoring,
the output also includes llmMinusDet and detMinusLlm.
Provides a single entry point for equation generation. Deterministic equations are now the default. The older LLM equation path remains available as an opt-in compatibility route during the transition.
getModelEquations(
  model,
  method = c("deterministic", "llm"),
  chat = NULL,
  digits = 2
)
model |
A fitted model object, typically of class |
method |
Character string giving the equation engine. Must be one of
|
chat |
Optional chat provider object. Required when |
digits |
Integer number of decimal places for displayed coefficients in
the deterministic path. Defaults to |
Either a deterministic equation table or the existing language-model equation object.
Returns a named character vector of colours for semantic values used in WMFM raw repeated-run claim-profile heatmaps.
getWmfmClaimColorMap()
The palette is designed for raw extracted claim fields and aims to keep semantically different values visually distinct.
A named character vector of hexadecimal colours.
getWmfmClaimColorMap()
Returns the directory that WMFM uses for user-specific local configuration.
The path respects options(wmfm.config_dir = ...) when that option has
been set.
getWmfmConfigDir()
Character scalar path to the WMFM configuration directory.
Returns the full path to the WMFM user configuration file. This is useful when checking or backing up a local configuration during setup and testing.
getWmfmConfigPath()
Character scalar path to config.json.
Defines metric metadata used by WMFM score comparison, stability summaries, and plotting. The registry is intended to be the single source of truth for which metrics are available and how they should be treated.
getWmfmMetricRegistry()
A data frame containing metric metadata.
Extracts selected raw claim fields from a wmfmRuns object and reshapes them
into long format for heatmap plotting.
getWmfmRunsClaimProfileData(
  x,
  fieldColumns = NULL,
  naLabel = "(missing)",
  prettyFieldLabels = TRUE,
  fieldOrder = c("semantic", "purity")
)
x |
A |
fieldColumns |
Optional character vector of raw claim fields to include.
If |
naLabel |
Character label used for missing values. |
prettyFieldLabels |
Logical. Should field names be converted to more readable display labels? |
fieldOrder |
Character. One of |
Runs are intended to be shown on rows and claim fields on columns. Runs are ordered by run purity. Fields can be ordered semantically or by field purity.
A data frame with columns runId, field, fieldLabel, value,
modalValue, fieldPurity, and runPurity.
Builds extracted-claim frequency data for a WMFM runs object.
getWmfmRunsClaimsData(x)
x |
A |
A data frame summarising claim frequencies.
Extracts selected per-run descriptive metrics from a wmfmRuns object for
plotting and inspection.
getWmfmRunsTextMetricsData(x)
x |
A |
A data frame with one row per run.
Generic for grading user-written explanations against a prepared WMFM model context.
grade(x, ...)
x |
An object to grade against. |
... |
Additional arguments passed to methods. |
A method-specific graded output.
Creates either a wmfmGrade object for a single explanation or a
wmfmGradeListObj when multiple explanations are supplied. By default the
returned object is immediately scored using either the deterministic or LLM
WMFM grading rubric.
## S3 method for class 'wmfmModel'
grade(
  x,
  explanation,
  modelAnswer = NULL,
  method = c("deterministic", "llm", "both"),
  autoScore = TRUE,
  score = NULL,
  scoreScale = 10,
  nLlm = 1L,
  confirmLargeLlmJob = FALSE,
  maxLlmJobsWithoutConfirmation = 20L,
  ...
)
x |
A |
explanation |
Character vector, character scalar, or list of character scalars giving the explanation(s) to grade. |
modelAnswer |
Optional character scalar giving a reference answer. |
method |
Character. One of |
autoScore |
Logical. Should the returned object be scored immediately?
Defaults to |
score |
Deprecated logical alias for |
scoreScale |
Numeric scalar giving the displayed mark scale. Defaults to
|
nLlm |
Integer. Number of repeated LLM gradings per explanation when
|
confirmLargeLlmJob |
Logical. Set to |
maxLlmJobsWithoutConfirmation |
Integer. Maximum number of LLM grading calls allowed without explicit confirmation. |
... |
Additional arguments passed to |
An object of class wmfmGrade or wmfmGradeListObj.
Determines whether all predictors in a fitted linear or generalised linear model are factors in the supplied data set.
isFactorOnlyModel(model, data)
model |
A fitted model object (e.g. |
data |
A data frame containing the variables used to fit the model. |
Intercept-only models are not considered factor-only.
Logical scalar. Returns TRUE if all predictors are factors,
otherwise FALSE.
df = data.frame(
  y = rpois(20, 5),
  site = factor(rep(c("A", "B"), each = 10))
)
mod = glm(y ~ site, family = poisson, data = df)
isFactorOnlyModel(mod, df)
Determines whether a fitted model has only factor predictors in its model frame (i.e., every predictor column is a factor). This is used to switch the "Fitted equations" view into an ANOVA-style "Fitted means" view.
isFactorOnlyPredictorModel(m)
m |
A fitted model object, typically an |
The response is not checked for being a factor; in typical "fitted means"
usage the response is numeric and predictors are factors. Intercept-only
models (no predictors) return FALSE.
A logical scalar. TRUE if all predictors in
model.frame(m) are factors; otherwise FALSE.
df = data.frame(y = rnorm(12), A = factor(rep(letters[1:3], each = 4)))
mod = lm(y ~ A, data = df)
isFactorOnlyPredictorModel(mod)
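The check described above can be approximated in a few lines of base R. `isFactorOnlySketch()` is a hypothetical name used for illustration only; it assumes the response is the first column of the model frame and that no weights or offset columns are present.

```r
# Sketch: TRUE only when the model frame has at least one predictor
# and every predictor column is a factor.
isFactorOnlySketch = function(m) {
  mf = model.frame(m)
  predictors = mf[-1]               # assumption: column 1 is the response
  ncol(predictors) > 0 && all(vapply(predictors, is.factor, logical(1)))
}

set.seed(1)
df = data.frame(
  y = rnorm(12),
  A = factor(rep(letters[1:3], each = 4)),
  x = rnorm(12)
)

isFactorOnlySketch(lm(y ~ A, data = df))       # factor predictor only
isFactorOnlySketch(lm(y ~ A + x, data = df))   # mixed predictors
isFactorOnlySketch(lm(y ~ 1, data = df))       # intercept-only model
```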
Returns the currently supported error types that can be requested when generating deliberately flawed model explanations.
listBadExplanationTypes()
A character vector of supported bad explanation types.
Returns metadata-backed details for packaged examples stored under
inst/extdata/examples. This is intended for release-mode and
developer-mode example-library displays that need more than the example
names alone.
listWMFMExampleDetails(package = "WMFM", includeTestExamples = FALSE)
package |
A character string giving the package name containing the packaged examples. |
includeTestExamples |
Logical. Should developer-only examples marked
as |
A data frame with one row per available example and columns for example name, path, audience, model family, difficulty, teaching topic, and developer purpose.
if (interactive()) {
  listWMFMExampleDetails(package = "WMFM")
}
Lists packaged examples stored in the package's
inst/extdata/examples directory by looking for specification files
ending in .spec.yml.
listWMFMExamples(package = "WMFM", includeTestExamples = FALSE)
package |
A character string giving the package name containing the packaged examples. |
includeTestExamples |
Logical. Should developer-only examples marked
as |
Each example is expected to live in its own subdirectory under
extdata/examples/<name>/ and to contain a specification file named
<name>.spec.yml. Example visibility is controlled by the optional
extdata/examples/example-metadata.yml manifest.
A character vector of available example names. If no examples are found, an empty character vector is returned.
if (interactive()) {
  listWMFMExamples(package = "WMFM")
}
Creates a salted password hash suitable for storing in the
WMFM_DEVELOPER_MODE_PASSWORD_HASH environment variable.
makeDeveloperModePasswordHash(password)
password |
Character scalar giving the plain-text password. |
A salted password hash string.
Produces a grouped plot for models with only factor predictors.
makeFactorOnlyPlot(
  model,
  data,
  ciType = c("standard", "sandwich"),
  hcType = c("HC0", "HC3")
)
model |
A fitted model object (e.g. |
data |
A data frame containing the variables used to fit the model. |
ciType |
Confidence-interval type. One of |
hcType |
Heteroskedasticity-consistent estimator type for robust CIs.
One of |
Uses boxplots when all groups have at least 10 observations
Uses jittered point plots when any group has fewer than 10 observations
Adds fitted means and 95% confidence intervals alongside each group
Supports standard (model-based) and robust (sandwich) confidence intervals
Applies a log(1 + y) scale when the model is Poisson
For multiple factor predictors, groups are defined by their interaction.
Robust confidence intervals are computed on the linear predictor scale using X V X^T, where V is either vcov(model) or sandwich::vcovHC(model, type = ...). For GLMs, intervals are then transformed back to the response scale using the inverse link function.
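The interval calculation described above can be sketched in base R for a Poisson model with the model-based covariance matrix. This is an illustration, not the package's code: the 95% level, the normal quantile, and all object names are assumptions. Swapping `vcov(mod)` for `sandwich::vcovHC(mod, type = "HC3")` would give the robust version if the sandwich package is installed.

```r
# Sketch: confidence intervals on the linear predictor scale via X V X^T,
# then back-transformed with the inverse link.
set.seed(42)
df = data.frame(
  y = rpois(40, 5),
  site = factor(rep(c("A", "B"), each = 20))
)
mod = glm(y ~ site, family = poisson, data = df)

# One design row per factor level
newData = data.frame(site = factor(c("A", "B"), levels = levels(df$site)))
X = model.matrix(delete.response(terms(mod)), newData)

V = vcov(mod)                              # model-based covariance of coefficients
eta = drop(X %*% coef(mod))                # linear predictor
seEta = sqrt(diag(X %*% V %*% t(X)))       # standard errors on the link scale

z = qnorm(0.975)                           # assumption: normal-based 95% interval
linkInv = family(mod)$linkinv              # exp() for the Poisson log link

ci = data.frame(
  site = newData$site,
  fit = linkInv(eta),
  lower = linkInv(eta - z * seEta),
  upper = linkInv(eta + z * seEta)
)
ci
```

The standard errors computed this way agree with those from `predict(mod, newData, type = "link", se.fit = TRUE)`, which is a convenient cross-check.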
A ggplot object.
Creates a full grid of factor-level combinations for all predictors in the fitted model frame and computes fitted values for each combination.
makeFittedMeansData(m)
m |
A fitted model object, typically an |
For glm models, fitted values are returned on the response scale
via predict(type = "response"). For lm models, fitted values
are returned from predict().
A list with components:
The response variable name.
Character vector of predictor names (in model frame order).
A data frame of predictor level combinations with an added
numeric column .fit.
The model frame used for extracting factor levels.
df = data.frame(
  y = rnorm(12),
  A = factor(rep(c("a", "b"), each = 6)),
  B = factor(rep(c("u", "v", "w"), times = 4))
)
mod = lm(y ~ A * B, data = df)
out = makeFittedMeansData(mod)
head(out$grid)
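The grid construction can also be reproduced by hand, which may help when checking the output. The following is a minimal base R sketch under the assumption that every predictor in the model frame is a factor; the variable names are illustrative and the code is not taken from the package.

```r
# Sketch: build every combination of factor levels and attach fitted values.
set.seed(1)
df = data.frame(
  y = rnorm(12),
  A = factor(rep(c("a", "b"), each = 6)),
  B = factor(rep(c("u", "v", "w"), times = 4))
)
mod = lm(y ~ A * B, data = df)

mf = model.frame(mod)
predictorNames = names(mf)[-1]            # assumption: column 1 is the response

# Full grid of level combinations, keeping the original level order
grid = expand.grid(lapply(mf[predictorNames], levels), stringsAsFactors = TRUE)

# Fitted value for each combination
grid$.fit = as.numeric(predict(mod, newdata = grid))
grid
```

Because `y ~ A * B` is saturated for two factors, each fitted value equals the observed mean of its cell.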
Builds an explicit equation for a single fitted mean using the model's
coefficient vector and the corresponding design row from
model.matrix(). This is designed to show how each fitted mean is
constructed from the regression table coefficients.
makeMeanEquation(m, oneRowDf, label)
m |
A fitted model object, typically an |
oneRowDf |
A one-row data frame containing predictor values for which to construct the equation. Column names must match the model terms. |
label |
A character label to place on the left-hand side of the equation. |
For GLMs, the regression coefficients combine to form the linear predictor
(often written η). This function shows that linear predictor and
then shows the back-transformation to the mean on the response scale.
A length-1 character string containing the equation (may include newline characters for multi-step GLM working).
df = data.frame(y = rnorm(8), A = factor(rep(c("a", "b"), each = 4)))
mod = lm(y ~ A, data = df)
oneRow = data.frame(A = factor("b", levels = levels(df$A)))
makeMeanEquation(mod, oneRow, "Mean(A=b)")
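To make the link between the coefficient table and a fitted mean concrete, the arithmetic can be written out in base R. This sketch only illustrates the idea for an `lm`; the text layout of the equation is an assumption and will not match the package's formatting.

```r
# Sketch: write a fitted mean as coefficient-times-design-value terms.
set.seed(1)
df = data.frame(y = rnorm(8), A = factor(rep(c("a", "b"), each = 4)))
mod = lm(y ~ A, data = df)
oneRow = data.frame(A = factor("b", levels = levels(df$A)))

# Design row for the requested predictor values
X = model.matrix(delete.response(terms(mod)), oneRow)
beta = coef(mod)

# One "coefficient * design value" term per coefficient
termText = sprintf("%.2f * %g", beta, X[1, ])
fittedMean = sum(beta * X[1, ])

equationText = paste0(
  "Mean(A=b) = ", paste(termText, collapse = " + "),
  " = ", sprintf("%.2f", fittedMean)
)
equationText
```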
Builds an environment that contains the columns of data and a small
allowlist of safe base functions. This is intended for evaluating the
right-hand side (RHS) of user-entered derived-variable assignments in a
Shiny app without exposing powerful functions like file I/O or system calls.
makeSafeEvalEnv(data)
data |
A data frame. |
The returned environment contains:
All columns of data as symbols.
A symbol data bound to the full data frame.
Only the allowlisted functions (from base) in the parent environment.
An environment suitable for eval(rhs, envir = env).
df = data.frame(x = 1:5)
env = makeSafeEvalEnv(df)
eval(parse(text = "log(x)"), envir = env)
eval(parse(text = "factor(rep(1:2, length.out = nrow(data)))"), envir = env)
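The layered-environment idea can be sketched as follows. `makeSafeEnvSketch()` and its allowlist are hypothetical and much smaller than anything a real application would need; the point is only that functions outside the allowlist cannot be found during evaluation. It should not be read as the package's implementation or as a complete sandbox.

```r
# Sketch: data columns in one environment, allowlisted functions in its parent,
# and the empty environment above that so nothing else is reachable.
makeSafeEnvSketch = function(data,
                             allowed = c("log", "exp", "sqrt", "factor", "rep",
                                         "nrow", "c", "(", ":", "+", "-", "*", "/")) {
  fnEnv = new.env(parent = emptyenv())
  for (nm in allowed) {
    assign(nm, get(nm, envir = baseenv()), envir = fnEnv)
  }
  dataEnv = list2env(as.list(data), parent = fnEnv)
  assign("data", data, envir = dataEnv)   # expose the full data frame as `data`
  dataEnv
}

env = makeSafeEnvSketch(data.frame(x = 1:5))
logX = eval(parse(text = "log(x)"), envir = env)
groups = eval(parse(text = "factor(rep(1:2, length.out = nrow(data)))"), envir = env)

# A function outside the allowlist cannot be found, so evaluation fails
blocked = tryCatch(
  eval(parse(text = "system('ls')"), envir = env),
  error = function(e) "blocked"
)
blocked
```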
Generates a stable, perceptually stronger mapping from category values to colours using only base R / grDevices. The same category value always maps to the same colour, regardless of the order in which values appear.
makeWmfmDeterministicCategoryColors(values, naLabel = "(missing)")
values |
A vector of category values. |
naLabel |
Character label used for missing values after coercion. |
Colours are generated in HCL space using a simple deterministic string hash. This gives better separation than short fixed palettes, especially when many distinct values are present.
A named character vector of colours, where names are category values.
makeWmfmDeterministicCategoryColors(c("yes", "no", "mixed", "(missing)"))
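A deterministic mapping of this kind can be sketched with a simple string hash that selects a hue in HCL space. The hash, the chroma and luminance values, and the name `hashToColourSketch()` are all assumptions made for illustration; the package's actual hash and palette settings are not documented here.

```r
# Sketch: map each category value to a colour that depends only on the value.
hashToColourSketch = function(values) {
  values = as.character(values)
  hues = vapply(values, function(v) {
    codes = utf8ToInt(v)
    # Position-weighted sum of character codes, reduced to a hue angle
    as.numeric(sum(codes * seq_along(codes)) %% 360)
  }, numeric(1))
  cols = grDevices::hcl(h = hues, c = 70, l = 60)
  names(cols) = values
  cols
}

first = hashToColourSketch(c("yes", "no", "mixed"))
second = hashToColourSketch(c("mixed", "yes"))

# The same value gets the same colour regardless of input order
identical(first[["yes"]], second[["yes"]])
```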
Converts raw semantic claim values into nicer display labels for plot legends.
makeWmfmLegendLabels(values)
values |
Character vector of raw legend values. |
A character vector of display labels.
makeWmfmLegendLabels(c("TRUE", "mixed_or_unclear", "(missing)", ""))
Creates a classed wmfmGrade object that stores a candidate explanation,
an optional reference answer, and the model context needed for scoring.
newWmfmGrade(
  x,
  explanation,
  modelAnswer = NULL,
  scoreScale = 10,
  records = list(),
  scores = list(),
  feedback = list(),
  meta = list()
)
x |
A |
explanation |
Character scalar. The explanation being graded. |
modelAnswer |
Optional character scalar giving a reference answer. |
scoreScale |
Numeric scalar giving the displayed mark scale. Defaults to
|
records |
Optional named list of run records. |
scores |
Optional named list of scored data frames. |
feedback |
Optional named list of feedback components. |
meta |
Optional named list of metadata. |
An object of class wmfmGrade.
Creates a classed wmfmGradeListObj object that stores multiple
wmfmGrade objects together with batch-level metadata.
newWmfmGradeListObj(grades, model, inputs, meta = list())
grades |
Named list of |
model |
A |
inputs |
Named list describing the supplied explanations. |
meta |
Optional named list of metadata. |
An object of class wmfmGradeListObj.
Creates a classed wmfmModel object that stores a fitted model together
with the additional context and generated outputs used by the WMFM
command-line workflow.
newWmfmModel(
  model,
  formula,
  modelType,
  data,
  dataContext = NULL,
  researchQuestion = NULL,
  equations = NULL,
  explanation = NULL,
  explanationAudit = NULL,
  explanationClaimEvidenceMap = NULL,
  modelProfile = NULL,
  variableTransformations = list(),
  responseTransformationMode = "both",
  interactionTerms = character(0),
  interactionMinPValue = NA_real_,
  meta = list()
)
model |
A fitted model object. |
formula |
A model formula. |
modelType |
Character string giving the model family. |
data |
A |
dataContext |
Optional character string giving dataset context. |
researchQuestion |
Optional character string giving the research question associated with the fitted model. |
equations |
Generated equations object, or |
explanation |
Generated explanation text, or |
explanationAudit |
Deterministic explanation-audit object, or |
explanationClaimEvidenceMap |
Deterministic claim-to-evidence map, or |
modelProfile |
Deterministic explanation model-profile metadata, or |
variableTransformations |
Named list of derived-variable transformation records used by this fitted model. |
responseTransformationMode |
Character scalar describing how later
response-scale interpretation should handle recognised response
transformations. One of |
interactionTerms |
Character vector of fitted interaction-term names. |
interactionMinPValue |
Minimum p-value across fitted interaction terms,
or |
meta |
Optional named list of metadata. |
An object of class wmfmModel.
Creates a wmfmScores object aligned to a wmfmRuns object but containing
no scoring results yet. This is a constructor/helper for the scoring
workflow. Actual score values are added later by score().
newWmfmScores(x, methods = character(0))
x |
A |
methods |
Character vector of scoring methods to reserve. Allowed values
are |
An object of class wmfmScores.
Converts text to lower case and removes extra whitespace.
normaliseWmfmText(x)
x |
Character text. |
Character string.
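A normalisation of this kind can be sketched in one line of base R. `normaliseTextSketch()` is a hypothetical stand-in; it assumes that "extra whitespace" means leading, trailing, and repeated internal whitespace, which the description does not spell out.

```r
# Sketch: lower-case the text, collapse runs of whitespace, trim the ends.
normaliseTextSketch = function(x) {
  tolower(trimws(gsub("\\s+", " ", x)))
}

normaliseTextSketch("  The  Fitted\tModel ")
```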
Returns legend values in a semantic order rather than alphabetical order. This makes the legend easier to read by grouping related claim values together.
orderWmfmLegendValues(values, includeBreaks = TRUE)
values |
Character vector of legend values to order. |
includeBreaks |
Logical. Should blank spacer rows be inserted between semantic groups? |
If includeBreaks = TRUE, blank entries are inserted between groups so the
legend displays with visual spacing.
A character vector of ordered legend values. If includeBreaks = TRUE, the returned vector may contain empty strings used as spacer rows.
orderWmfmLegendValues(c("unclear", "TRUE", "appropriate", "(missing)"))
orderWmfmLegendValues(
  c("unclear", "TRUE", "appropriate", "(missing)"),
  includeBreaks = TRUE
)
Parses user input and verifies it is exactly one assignment statement of the form
name = expr or name <- expr. This is designed for validating a
derived-variable input box in a Shiny app.
parseSingleAssignment(txt)
txt |
A length-1 character string containing R code. |
A list with elements:
ok: logical, whether parsing/validation succeeded.
msg: character message if ok = FALSE.
name: (if ok) the variable name on the LHS.
rhs: (if ok) the RHS expression (language object).
parseSingleAssignment("t = 1:10")$ok
parseSingleAssignment("1:10")$ok
parseSingleAssignment("x = log(y)")$name
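The validation steps can be sketched with `parse()` and a few checks on the resulting call. `parseAssignmentSketch()` and its messages are hypothetical; the sketch mirrors the documented return elements but is not the package's code.

```r
# Sketch: accept exactly one statement of the form `name = expr` or `name <- expr`.
parseAssignmentSketch = function(txt) {
  exprs = tryCatch(parse(text = txt), error = function(e) NULL)
  if (is.null(exprs) || length(exprs) != 1L) {
    return(list(ok = FALSE, msg = "Expected exactly one valid statement."))
  }
  e = exprs[[1]]
  isAssignment = is.call(e) && is.name(e[[1]]) &&
    as.character(e[[1]]) %in% c("=", "<-")
  if (!isAssignment || !is.name(e[[2]])) {
    return(list(ok = FALSE, msg = "Expected an assignment to a variable name."))
  }
  list(ok = TRUE, msg = "", name = as.character(e[[2]]), rhs = e[[3]])
}

parseAssignmentSketch("t = 1:10")$ok
parseAssignmentSketch("1:10")$ok
parseAssignmentSketch("x <- log(y)")$name
parseAssignmentSketch("a = 1; b = 2")$ok
```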
Provides quick diagnostic plots for the run-level output returned by
getMetricComparisonData().
## S3 method for class 'metricComparisonData'
plot(x, type = c("confusion", "runs"), ...)
x |
A |
type |
Plot type. One of |
... |
Unused. Included for S3 compatibility. |
A ggplot2 object.
Plots the relationship between disagreement and deterministic ease for a
metricComparisonSummary object.
## S3 method for class 'metricComparisonSummary'
plot(
  x,
  type = c("scatter", "lollipop", "diagnostic"),
  metricType = NULL,
  labelPoints = TRUE,
  orderBy = c("disagreement", "ease"),
  disagreementThreshold = 0.5,
  easeThreshold = 0.8,
  ...
)
x |
A |
type |
Plot type. One of |
metricType |
Optional metric type filter. One of |
labelPoints |
Logical. Should points be labelled in the scatter-style plots? |
orderBy |
Ordering for the lollipop plot. One of |
disagreementThreshold |
Numeric threshold used by |
easeThreshold |
Numeric threshold used by |
... |
Unused. Included for S3 compatibility. |
A ggplot2 object.
Produces plots for a raw wmfmRuns object. These plots are descriptive and
focus on run-to-run variation in generated outputs, extracted claims, and run
metrics. Judged fields and score summaries are intentionally excluded and
belong to wmfmScores objects instead.
## S3 method for class 'wmfmRuns'
plot(x, type = c("claims", "textMetrics", "claimProfile"), ...)
x |
A |
type |
Character. Plot type. One of |
... |
Passed through to lower-level plotting helpers. |
Supported plot types are:
"claims": Bar plot of extracted binary claim frequencies across runs.
"textMetrics": Bar plot of per-run text and timing metrics.
"claimProfile": Heatmap of raw extracted claim fields across runs.
A ggplot2 object.
Provides plotting modes for a wmfmScoreComparison object.
## S3 method for class 'wmfmScoreComparison'
plot(x, type = c("agreement", "overall", "heatmap"), ...)
x |
A |
type |
Character. One of |
... |
Additional arguments passed to the underlying helper. |
type = "overall": A Bland-Altman plot for paired overall scores.
type = "agreement": An ordinal-agreement summary plot showing exact agreement, adjacent agreement, weighted kappa, and mean absolute difference.
type = "heatmap": A run-by-metric disagreement heatmap based on run-level comparison pairs.
A ggplot object.
Provides plotting methods for a wmfmScores object returned by score().
Score heatmaps use a continuous colour scale and, by default, include only
the 0 to 2 dimension scores to avoid distortion from the 0 to 100 overall
score.
## S3 method for class 'wmfmScores'
plot(
  x,
  method = NULL,
  type = c("scores", "overall", "summary"),
  fieldColumns = NULL,
  ...
)
x |
A |
method |
Optional character. Scoring method to plot. One of "deterministic" or "llm". If omitted, an available method is chosen automatically. |
type |
Character. One of "scores", "overall", or "summary". |
fieldColumns |
Optional character vector of score columns to plot when
|
... |
Additional arguments for future extensibility. |
A ggplot2 object.
Creates a visual summary of within-method score stability.
## S3 method for class 'wmfmScoreStability'
plot(
  x,
  type = c("continuous", "ordinal", "binary"),
  metric = NULL,
  value = NULL,
  ...
)
x |
A |
type |
Character. One of |
metric |
Character or |
value |
Character. Quantity to plot. Allowed values depend on |
... |
Unused. Included for S3 compatibility. |
A ggplot2 object.
UI controls for confidence intervals. The UI adapts to the plot type:
"factorOnly": show CI type (standard vs robust) and HC choice.
"continuous": show an optional "Show confidence intervals"
checkbox; when enabled, show level, CI type, and HC choice.
plotCiControlsUi(
  mode = c("factorOnly", "continuous"),
  showCiInputId = "plotShowCi",
  ciLevelInputId = "plotCiLevel",
  ciTypeInputId = "plotCiType",
  hcTypeInputId = "plotHcType"
)
mode |
Either |
showCiInputId |
Input id for the "show confidence intervals" checkbox. |
ciLevelInputId |
Input id for the confidence level slider. |
ciTypeInputId |
Input id for CI type radio buttons. |
hcTypeInputId |
Input id for HC type dropdown. |
Designed to be placed in the Plot tab sidebar / controls area.
A Shiny tag list.
Draws a student-facing model plot.
plotModelPlot(
  model,
  plotType = c("observedFitted", "residualFitted"),
  showSmoothTrend = TRUE
)
model |
A fitted model object or |
plotType |
Plot type. |
showSmoothTrend |
Logical; if |
A ggplot object, or NULL for unsupported models.
Draws a run-by-field heatmap for a wmfmRuns object using raw extracted
claim fields only. Rows represent runs and columns represent claim fields.
plotWmfmExplanationClaimHeatmap(
  x,
  fieldColumns = NULL,
  naLabel = "(missing)",
  main = "Claim profile across repeated runs",
  xlab = NULL,
  ylab = "Run ID",
  prettyFieldLabels = TRUE,
  fieldOrder = c("semantic", "purity"),
  xLabelAngle = 45,
  includeLegendBreaks = FALSE
)
x |
A |
fieldColumns |
Optional character vector of raw claim fields to plot. |
naLabel |
Character label used for missing values. |
main |
Character plot title. |
xlab |
Character x-axis label. |
ylab |
Character y-axis label. |
prettyFieldLabels |
Logical. Should field names be prettified for display? |
fieldOrder |
Character. One of |
xLabelAngle |
Numeric rotation angle for x-axis tick labels. |
includeLegendBreaks |
Logical. Passed to |
A ggplot2 object.
Plots an ordinal agreement summary for a WMFM score comparison.
plotWmfmScoreAgreementSummary(x, orderBy = c("worst", "registry"))
x |
A wmfmScoreComparison object |
orderBy |
"worst" or "registry" |
A ggplot object.
Visualises run-level disagreement between two scoring methods stored in a
wmfmScoreComparison object. The heatmap is built from pairData and is
intended primarily for ordinal and binary metrics.
plotWmfmScoreHeatmap(
  x,
  includeMetricTypes = c("ordinal", "binary"),
  orderBy = c("worst", "registry", "alphabetical"),
  facetBy = c("group", "metricType"),
  prettyLabels = TRUE,
  showRunEvery = 1L,
  ...
)
x |
A |
includeMetricTypes |
Character vector of metric types to include.
Defaults to |
orderBy |
Character. One of |
facetBy |
Character. One of |
prettyLabels |
Logical. Should metric labels be prettified using the
stored labels in |
showRunEvery |
Integer. Show every |
... |
Unused. Included for future compatibility. |
Ordinal metrics are classified into disagreement classes using the absolute difference between the paired scores:
exact = no difference,
adjacent = difference of 1,
moderate = difference of 2 or more.
Binary metrics are classified as either exact or different.
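The classification rules above translate directly into a few lines of R. The function names and the example scores below are hypothetical and serve only to show the rules in action.

```r
# Sketch: classify paired scores into the disagreement classes described above.
classifyOrdinalSketch = function(scoreA, scoreB) {
  d = abs(scoreA - scoreB)
  ifelse(d == 0, "exact", ifelse(d == 1, "adjacent", "moderate"))
}

classifyBinarySketch = function(scoreA, scoreB) {
  ifelse(scoreA == scoreB, "exact", "different")
}

# Hypothetical paired scores from two scoring methods across four runs
detScores = c(2, 1, 0, 2)
llmScores = c(2, 2, 2, 0)
ordinalClasses = classifyOrdinalSketch(detScores, llmScores)
binaryClasses = classifyBinarySketch(c(TRUE, FALSE), c(TRUE, TRUE))

ordinalClasses
binaryClasses
```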
Metrics can be ordered by mean disagreement severity so that the worst-agreeing metrics appear first.
A ggplot object.
Applies deterministic surface-level cleanup to generated explanation text. This step is deliberately conservative: it removes known formatting artifacts, standardises a small set of recurring pedagogical phrasing problems, and avoids changing numeric values or statistical meaning.
postProcessExplanationText(text, audit = NULL, debug = FALSE)
text |
Character vector of explanation text. |
audit |
Optional explanation audit object. Reserved for future audit-aware cleanup rules. |
debug |
Logical. If |
A character vector with deterministic surface cleanup applied, or a
list with original, processed, and rulesApplied when debug = TRUE.
Prints a metric comparison summary.
## S3 method for class 'metricComparisonSummary'
print(x, ...)
x |
A |
... |
Unused. Included for S3 compatibility. |
The input object, invisibly.
Prints a WMFM grade summary.
## S3 method for class 'summary.wmfmGrade'
print(x, digits = 2, ...)
x |
A |
digits |
Number of digits for numeric output. |
... |
Unused. |
Invisibly returns x.
Prints a WMFM grade list summary.
## S3 method for class 'summary.wmfmGradeListObj'
print(x, digits = 2, ...)
x |
A |
digits |
Number of digits for numeric output. |
... |
Unused. |
Invisibly returns x.
Prints a summary.wmfmRuns object.
## S3 method for class 'summary.wmfmRuns'
print(x, ...)
x |
A |
... |
Reserved for future extensions. |
x, invisibly.
Prints a bad explanation grading audit.
## S3 method for class 'wmfmBadExplanationAudit'
print(x, digits = 2, maxExamples = 10, ...)
x |
A |
digits |
Number of digits for numeric output. |
maxExamples |
Maximum number of flagged examples to print in detail. |
... |
Unused. |
Invisibly returns x.
Prints one teaching case at a time, preserving separate scale-specific lines where available.
## S3 method for class 'wmfmEquationTable'
print(x, ...)
x |
A |
... |
Unused. |
Invisibly returns x.
Print a WMFM explanation audit
## S3 method for class 'wmfmExplanationAudit'
print(x, ...)
x |
A |
... |
Unused. |
Invisibly returns x.
Print an explanation surface diagnosis
## S3 method for class 'wmfmExplanationSurfaceDiagnosis'
print(x, ...)
x |
A |
... |
Additional arguments, currently unused. |
The input object, invisibly.
Print a WMFM grade object
## S3 method for class 'wmfmGrade'
print(
  x,
  method = NULL,
  format = c("plaintext", "html"),
  digits = 2,
  maxRows = 6,
  ...
)
x |
A |
method |
Optional character. One of "deterministic" or "llm". |
format |
Character. One of "plaintext" or "html". |
digits |
Number of digits for numeric output. |
maxRows |
Maximum rows per section. |
... |
Unused. |
Invisibly returns x for plaintext and an HTML document for html.
Print a wmfmGradeComparison object
## S3 method for class 'wmfmGradeComparison'
print(x, digits = 2, maxRows = 6, ...)
x |
A |
digits |
Number of digits for numeric output. |
maxRows |
Maximum number of rows to print in each section. |
... |
Unused. Included for S3 compatibility. |
Invisibly returns x.
Print a WMFM grade list object
## S3 method for class 'wmfmGradeListObj'
print(x, digits = 2, maxExamples = 6, ...)
x |
A |
digits |
Number of digits for numeric output. |
maxExamples |
Maximum number of explanation marks to print. |
... |
Unused. |
Invisibly returns x.
Print a metric diagnosis object
## S3 method for class 'wmfmMetricDiagnosis'
print(x, ...)
x |
A |
... |
Unused. |
The input object, invisibly.
Provides a concise console summary of agreement between two WMFM scoring result sets.
## S3 method for class 'wmfmScoreComparison'
print(x, digits = 2, ...)
x |
A |
digits |
Integer. Number of digits to display. |
... |
Unused. Included for S3 compatibility. |
Invisibly returns x.
Print a scores diagnosis object
## S3 method for class 'wmfmScoresDiagnosis'
print(x, ...)
x |
A |
... |
Unused. |
The input object, invisibly.
Provides a concise console summary of within-method score stability.
## S3 method for class 'wmfmScoreStability'
print(x, digits = 2, ...)
x |
A |
digits |
Integer. Number of digits to display. |
... |
Unused. Included for S3 compatibility. |
Invisibly returns x.
Alias for getWmfmConfigPath().
readWmfmConfigPath()
Character scalar path to config.json.
Recomputes raw extracted fields for an existing wmfmRuns object by
re-running buildWmfmRunRecord() on the stored run metadata and generated
text. This is useful when extraction rules change and you want to refresh the
raw run records without generating new LLM outputs.
rebuildWmfmRunRecords(x, preserveClass = TRUE)
x |
A |
preserveClass |
Logical. Should the class of |
This function is intentionally limited to rebuilding raw run records. It does
not rescore runs and does not compute summaries. If scoring is needed after
rebuilding, call score() on the returned object.
A rebuilt wmfmRuns object with refreshed runs records.
Creates a simple HTML table (as Shiny tag objects) for a one-factor fitted means display.
renderOneWayTable(df, rowVar, valueName)
df |
A data frame containing the factor column and a fitted mean column. |
rowVar |
The name of the factor column used for rows. |
valueName |
The name of the numeric fitted-mean column (e.g. |
A Shiny tag object representing an HTML table.
if (requireNamespace("shiny", quietly = TRUE)) {
  df = data.frame(A = c("a", "b"), .fit = c(1.2, 2.3))
  renderOneWayTable(df, "A", ".fit")
}
Creates a simple HTML two-way table (as Shiny tag objects) where one factor
defines the rows and another defines the columns. Cells are filled from
df[[valueName]] for each rowVar-colVar combination.
renderTwoWayTable(df, rowVar, colVar, valueName)
df |
A data frame containing |
rowVar |
Name of the factor used for table rows. |
colVar |
Name of the factor used for table columns. |
valueName |
Name of the numeric fitted-mean column (e.g. |
A Shiny tag object representing an HTML table.
if (requireNamespace("shiny", quietly = TRUE)) {
  df = data.frame(
    A = rep(c("a", "b"), each = 2),
    B = rep(c("u", "v"), times = 2),
    .fit = c(1, 2, 3, 4)
  )
  renderTwoWayTable(df, "A", "B", ".fit")
}
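To see how cells might be filled for each rowVar-colVar combination without building any HTML, the following base-R sketch reshapes the same long-format data into a matrix. This is an illustration under assumptions, not the function's actual internals; the names `rowLevels`, `colLevels`, and `cells` are introduced here for the example.

```r
# Same long-format data as in the example above
df = data.frame(
  A = rep(c("a", "b"), each = 2),
  B = rep(c("u", "v"), times = 2),
  .fit = c(1, 2, 3, 4)
)

rowLevels = unique(df$A)
colLevels = unique(df$B)

# Start with an empty grid, then place each fitted value by (row, column) index
cells = matrix(
  NA_real_,
  nrow = length(rowLevels),
  ncol = length(colLevels),
  dimnames = list(rowLevels, colLevels)
)
cells[cbind(match(df$A, rowLevels), match(df$B, colLevels))] = df$.fit
cells
```

Combinations that do not occur in the data frame would simply stay `NA` in this sketch.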
Loads a packaged example, fits the specified model, generates equations and a plain-language explanation, and stores each run as a separate raw run record.
runExample(
  name,
  package = "WMFM",
  nRuns = 1,
  printOutput = FALSE,
  pauseSeconds = 0,
  showProgress = TRUE,
  useExplanationCache = FALSE,
  interactionAlpha = 0.05,
  ...
)
name |
Character. Name of the packaged example. |
package |
Character. Package containing the example. Defaults to
|
nRuns |
Integer. Number of runs to perform. Defaults to |
printOutput |
Logical. Passed to |
pauseSeconds |
Numeric. Optional delay between runs. |
showProgress |
Logical. Should a console progress bar and timing summary be shown when work is repeated? |
useExplanationCache |
Logical. Passed to |
interactionAlpha |
Numeric. Threshold used when judging whether
interaction evidence wording is appropriate in |
... |
Additional arguments passed to |
This function is a unified replacement for separate single-run and
repeated-run entry points. Use nRuns = 1 for a single run.
No scoring is performed here. Scoring should be done later using score()
on the returned object.
An object of class wmfmRuns with elements:
Example name.
Package name.
Parsed example specification.
Optional example context text.
Optional example research question text.
List of raw run records.
Metadata about the run set, including elapsed time, average per-run time, and per-run timing details.
if (interactive()) {
  x = runExample("Course")
  y = runExample("Course", nRuns = 20)
}
Fits a model using a supplied dataset and formula, optionally attaches dataset context, and then attempts to generate fitted equations and a model explanation using the same helper functions used by the app.
runModel(
  data,
  formula,
  modelType = c("lm", "logistic", "poisson"),
  dataContext = NULL,
  researchQuestion = NULL,
  variableTransformations = NULL,
  responseTransformationMode = "both",
  ollamaBaseUrl = NULL,
  generateExplanation = TRUE,
  printOutput = TRUE,
  useExplanationCache = TRUE,
  equationMethod = c("deterministic", "llm")
)
data |
A |
formula |
A model formula, either as a formula object or a character string that can be converted to a formula. |
modelType |
A character string giving the model family. Must be one
of |
dataContext |
Optional character string giving additional context about the dataset, study, variables, coding, or research aim. |
researchQuestion |
Optional character string giving the research question the user wants the fitted model to help answer. |
variableTransformations |
Optional named list of derived-variable transformation records to preserve when the fitted formula uses derived variables. |
responseTransformationMode |
Character scalar describing how later
response-scale interpretation should handle recognised response
transformations. One of |
ollamaBaseUrl |
Optional character string giving the base URL for the language model service. |
generateExplanation |
Logical. If |
printOutput |
Logical. If |
useExplanationCache |
Logical. Should cached explanation text be reused
when the same fitted model is encountered? Defaults to |
equationMethod |
Character string giving the equation engine. Must be
one of |
Supported model types are linear regression, logistic regression, and Poisson regression.
Explanation caching can be controlled via useExplanationCache. For normal
usage this can remain TRUE, but when repeatedly querying the language
model for the same fitted model it is often useful to set it to FALSE
so that each run makes a fresh explanation request.
The returned object also includes interaction-term names and the minimum interaction-term p-value, which can be used later when evaluating whether the explanation interpreted interaction evidence appropriately.
Invisibly returns an object of class wmfmModel.
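The three supported model types correspond to standard fits in the stats package. The sketch below shows one plausible mapping from a `modelType` string to a fitted model; `fitByType()` is a hypothetical helper written for this illustration, and it is an assumption that `runModel()` dispatches in a similar way.

```r
# Hypothetical helper: maps a modelType string to a base-R model fit.
fitByType = function(data, formula, modelType = c("lm", "logistic", "poisson")) {
  modelType = match.arg(modelType)
  # Accept a character formula as well as a formula object
  if (is.character(formula)) {
    formula = stats::as.formula(formula)
  }
  switch(
    modelType,
    lm = stats::lm(formula, data = data),
    logistic = stats::glm(formula, data = data, family = stats::binomial()),
    poisson = stats::glm(formula, data = data, family = stats::poisson())
  )
}

linFit = fitByType(mtcars, mpg ~ wt, "lm")
logitFit = fitByType(mtcars, "am ~ wt", "logistic")
poisFit = fitByType(mtcars, carb ~ wt, "poisson")

stats::family(logitFit)$family
stats::family(poisFit)$family
```

Using `match.arg()` means an unsupported model type fails early with a clear error instead of silently fitting the wrong family.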
Launches the WMFM Shiny application. The Ollama base URL used for language-model calls can be configured here.
runWMFMApp(ollamaBaseUrl = NULL)
ollamaBaseUrl |
Optional character string giving the base URL of
the Ollama server, for example |
A shiny.appobj, invisibly.
Generic for scoring WMFM objects.
score(x, ...)
x |
An object to score. |
... |
Additional arguments passed to methods. |
Method-specific scored output.
Scores a wmfmGrade object using either the deterministic WMFM rubric or an
LLM-based scoring rubric. The candidate explanation is always scored. If a
reference answer is present, it is scored separately and used to enrich
feedback.
## S3 method for class 'wmfmGrade'
score(
  x,
  method = c("deterministic", "llm"),
  preferredMinWords = 80L,
  preferredMaxWords = 220L,
  fatalFlawCap = 40,
  passThreshold = 65,
  chat = NULL,
  useCache = FALSE,
  showProgress = TRUE,
  verbose = FALSE,
  nLlm = 1L,
  ...
)
x |
A |
method |
Character. One of |
preferredMinWords |
Integer. Passed to deterministic scoring. |
preferredMaxWords |
Integer. Passed to deterministic scoring. |
fatalFlawCap |
Numeric. Passed to deterministic scoring. |
passThreshold |
Numeric. Passed to deterministic scoring. |
chat |
Optional chat provider object. Used for LLM scoring. |
useCache |
Logical. Passed to LLM scoring. |
showProgress |
Logical. Should progress messages be shown for LLM scoring? |
verbose |
Logical. Passed to LLM scoring. |
nLlm |
Integer. Number of repeated LLM gradings for the student explanation. |
... |
Additional arguments passed to the relevant scoring helper. |
A scored wmfmGrade object.
Scores each contained wmfmGrade object and records batch-level timing.
## S3 method for class 'wmfmGradeListObj'
score(
  x,
  method = c("deterministic", "llm", "both"),
  nLlm = 1L,
  confirmLargeLlmJob = FALSE,
  maxLlmJobsWithoutConfirmation = 20L,
  showProgress = TRUE,
  ...
)
x |
A |
method |
Character. One of |
nLlm |
Integer. Number of repeated LLM gradings per explanation. |
confirmLargeLlmJob |
Logical. Whether to allow large requests. |
maxLlmJobsWithoutConfirmation |
Integer. Maximum number of LLM calls allowed without explicit confirmation. |
showProgress |
Logical. Should progress messages be shown? |
... |
Additional arguments passed to |
A scored wmfmGradeListObj object.
Scores a wmfmRuns object using deterministic scoring, LLM scoring, or both,
and returns a separate wmfmScores object.
## S3 method for class 'wmfmRuns'
score(
  x,
  method = c("deterministic", "llm", "both"),
  chat = NULL,
  useCache = FALSE,
  showProgress = TRUE,
  verbose = FALSE,
  ...
)
x |
A |
method |
Character. One of |
chat |
Optional chat provider object. If omitted and LLM scoring is
requested, a provider is obtained via |
useCache |
Logical. Passed to LLM scoring helpers. |
showProgress |
Logical. Should progress and timing be shown for LLM scoring? |
verbose |
Logical. Passed to LLM scoring helpers. |
... |
Reserved for future method-specific arguments. |
This method assumes that x is a raw runs object produced by
runExample(). Judged fields and aggregate scores are created during the
scoring step and are stored only in the returned wmfmScores object.
An object of class wmfmScores.
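A typical workflow chains the functions documented in this manual: generate raw runs, score them, then assess stability. The sketch below uses only signatures shown here, but it is untested against a live installation. It assumes WMFM is installed, that the packaged "Course" example is available, and that a language-model service is reachable for `runExample()`. The flag `ranWorkflow` is introduced purely for illustration.

```r
ranWorkflow = FALSE

# Only attempt the workflow in an interactive session with WMFM installed
if (interactive() && requireNamespace("WMFM", quietly = TRUE)) {
  library(WMFM)

  runs = runExample("Course", nRuns = 5)           # raw run records, no scoring yet
  scores = score(runs, method = "deterministic")   # separate wmfmScores object
  print(stability(scores))                         # within-method score stability

  ranWorkflow = TRUE
}
```

Keeping the raw runs and the scores in separate objects means the same runs can be rescored later with a different method without regenerating any explanations.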
Applies a rubric-based scoring framework to repeated WMFM explanation runs.
The scoring structure is designed to align with the revised
buildWmfmRunRecord() schema by separating:
scoreWmfmRepeatedRuns(
  runsDf,
  preferredMinWords = 80L,
  preferredMaxWords = 220L,
  penaliseDuplicates = TRUE,
  duplicatePenalty = 5,
  fatalFlawCap = 40,
  passThreshold = 65,
  factualWeight = 0.3,
  inferenceWeight = 0.25,
  completenessWeight = 0.2,
  clarityWeight = 0.15,
  calibrationWeight = 0.1
)
runsDf |
A data.frame of repeated-run outputs, or a list containing a
|
preferredMinWords |
Integer. Lower bound for a preferred explanation length band. |
preferredMaxWords |
Integer. Upper bound for a preferred explanation length band. |
penaliseDuplicates |
Logical. Should exact duplicate explanations be penalised? |
duplicatePenalty |
Numeric penalty subtracted from the final
|
fatalFlawCap |
Numeric in |
passThreshold |
Numeric in |
factualWeight |
Numeric weight for the factual dimension. |
inferenceWeight |
Numeric weight for the inference dimension. |
completenessWeight |
Numeric weight for the completeness dimension. |
clarityWeight |
Numeric weight for the clarity dimension. |
calibrationWeight |
Numeric weight for the calibration dimension. |
descriptive metadata about the run,
extracted claim variables describing what the explanation said,
judged quality variables describing whether those claims were appropriate, and
aggregate dimension scores.
The function accepts either:
a data.frame of run records, or
a list containing a runsDf element.
The function scores explanations across five dimensions:
How well the explanation states the main effect, effect scale, reference-group structure, and interaction substance.
How appropriate the explanation's inferential language is, especially for uncertainty and interaction evidence.
Whether the explanation covers the important ingredients that the model structure suggests should be present.
Whether the explanation is reasonably clear, sensibly expressed, and not excessively short or long.
Whether the explanation avoids overclaiming and severe underclaiming.
This function fills or overwrites the following judged fields when they are
missing or NA:
Integer in 0, 1, 2.
Integer in 0, 1, 2.
Integer in 0, 1, 2.
Integer in 0, 1, 2.
Integer in 0, 1, 2.
Integer in 0, 1, 2.
Integer in 0, 1, 2.
Integer in 0, 1, 2.
Integer in 0, 1, 2.
Integer in 0, 1, 2.
Integer in 0, 1, 2.
Integer in 0, 1, 2.
Logical.
Numeric score on a 0 to 2 scale.
Numeric score on a 0 to 2 scale.
Numeric score on a 0 to 2 scale.
Numeric score on a 0 to 2 scale.
Numeric score on a 0 to 2 scale.
Numeric weighted score on a 0 to 100 scale.
Logical.
Duplicate detection is based on normalizedExplanation when available,
otherwise on trimmed explanation text.
Fatal flaws do not force the score to zero, but they cap the final overall
score using fatalFlawCap.
A data.frame with judged quality columns and dimension scores added.
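The arithmetic implied by the defaults can be sketched as follows. This is an assumption-laden illustration, not the package's formula: it assumes the overall score is the weighted mean of the five 0-to-2 dimension scores rescaled to 0 to 100, that the duplicate penalty is subtracted before the fatal-flaw cap is applied, and that the result is floored at zero. `overallSketch()` is a hypothetical name.

```r
# Hypothetical scoring sketch using the documented default weights and thresholds.
overallSketch = function(dimScores,
                         fatalFlaw = FALSE,
                         duplicate = FALSE,
                         weights = c(factual = 0.30, inference = 0.25,
                                     completeness = 0.20, clarity = 0.15,
                                     calibration = 0.10),
                         duplicatePenalty = 5,
                         fatalFlawCap = 40) {
  stopifnot(setequal(names(dimScores), names(weights)))
  # Weighted mean of 0-2 scores, rescaled to a 0-100 scale
  overall = 100 * sum(weights * dimScores[names(weights)]) / (2 * sum(weights))
  if (duplicate) {
    overall = overall - duplicatePenalty  # assumed order: penalty first
  }
  if (fatalFlaw) {
    overall = min(overall, fatalFlawCap)  # cap rather than force to zero
  }
  max(overall, 0)
}

dimScores = c(factual = 2, inference = 1, completeness = 2, clarity = 1, calibration = 2)
plainScore = overallSketch(dimScores)
cappedScore = overallSketch(dimScores, fatalFlaw = TRUE)
passes = plainScore >= 65  # compare with the default passThreshold
```

Under these assumptions a strong explanation with a fatal flaw still receives credit, but can never exceed the cap and so cannot pass at the default threshold.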
Scores each raw run record in a wmfmRuns object using an LLM-based scorer.
The returned value is a list of score records containing only judged fields
and score summaries. The input run records are not modified.
scoreWmfmRunsWithLlm(
  runRecords,
  chat,
  useCache = FALSE,
  showProgress = TRUE,
  verbose = FALSE
)
runRecords |
List of raw WMFM run records. |
chat |
Chat provider object used for LLM scoring. |
useCache |
Logical. Whether to allow cached LLM responses. |
showProgress |
Logical. Whether to display progress and timing information during scoring. |
verbose |
Logical. Whether to print additional diagnostic information. |
Timing metadata is attached to the returned list as an attribute named
"timing".
A list of score records. Each element corresponds to one input run
record and contains only judged fields and score summaries. A "timing"
attribute is attached to the returned list.
Sends a scoring request to the supplied chat provider and returns a pure score record containing judged fields and score summaries. The input run record is not modified.
scoreWmfmRunWithLlm(runRecord, chat, useCache = FALSE, verbose = FALSE)
runRecord |
Named list produced by |
chat |
A chat provider object as returned by |
useCache |
Logical. Should scoring results be cached and reused for
identical run records? Defaults to |
verbose |
Logical. Should the raw scoring response be printed?
Defaults to |
This function expects a chat provider object returned by
getChatProvider() and uses chat$chat(prompt).
Named list containing only scored fields and LLM scoring metadata.
Generic for assessing stability of WMFM objects.
stability(x, ...)
x |
An object to assess. |
... |
Additional arguments passed to methods. |
Method-specific stability output.
Summarises within-method stability of scores stored in a wmfmScores object.
Stability is summarised separately for binary, ordinal, and continuous score
fields.
## S3 method for class 'wmfmScores'
stability(x, ...)
x |
A |
... |
Unused. Included for S3 compatibility. |
An object of class wmfmScoreStability.
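The manual does not state which statistics are used for each field type, so the following base-R sketch only illustrates the idea of summarising binary, ordinal, and continuous fields separately. The choice of statistics (modal proportion, range, standard deviation), the helper `modalProportion()`, and the example vectors are all assumptions made for this illustration.

```r
# Share of runs that agree with the most common value
modalProportion = function(x) {
  x = x[!is.na(x)]
  max(table(x)) / length(x)
}

# Hypothetical scores from five repeated runs of the same example
fatalFlaw = c(FALSE, FALSE, TRUE, FALSE, FALSE)  # binary field
clarity = c(2, 2, 1, 2, 2)                       # ordinal field on a 0-2 scale
overall = c(78, 81, 40, 80, 79)                  # continuous field on a 0-100 scale

stabilitySketch = list(
  binary = c(modalProportion = modalProportion(fatalFlaw)),
  ordinal = c(modalProportion = modalProportion(clarity),
              range = diff(range(clarity))),
  continuous = c(sd = stats::sd(overall), range = diff(range(overall)))
)
stabilitySketch
```

Treating the three field types separately avoids, for example, reporting a standard deviation for a logical flag where an agreement proportion is easier to interpret.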
Builds audit-only summary tables from developer scoring fixtures exported by the developer scoring UI. This helper does not rescore explanations or change any scoring decisions; it only summarises the scores and metric values already present in the supplied fixtures.
summariseDeveloperScoringAudit(
  fixtures,
  metricLabels = NULL,
  unstableSpreadThreshold = 2,
  lowMarkThreshold = 4,
  metricSpreadThreshold = 1
)
fixtures |
Named list of developer scoring fixture payloads. |
metricLabels |
Character vector of metric labels to audit. When |
unstableSpreadThreshold |
Numeric mark spread at or above which an example is flagged as unstable. |
lowMarkThreshold |
Numeric mark at or below which a run is flagged as a low-mark run. |
metricSpreadThreshold |
Numeric metric-value spread at or above which a metric is flagged as unstable within an example. |
A list with runSummary, exampleSummary, metricSummary, and
unstableMetrics data frames.
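The threshold arguments can be read as simple comparisons on marks already present in the fixtures. The sketch below applies the documented defaults ("at or above" 2 for mark spread, "at or below" 4 for low marks) to a small made-up data frame. The data and the column names `example`, `mark`, `lowMark`, `spread`, and `unstable` are hypothetical and may not match the real output.

```r
# Hypothetical marks from three runs of each of two examples
marks = data.frame(
  example = rep(c("ex1", "ex2"), each = 3),
  mark = c(7, 7.5, 8, 3, 6, 9)
)

unstableSpreadThreshold = 2
lowMarkThreshold = 4

# Run level: flag marks at or below the low-mark threshold
marks$lowMark = marks$mark <= lowMarkThreshold

# Example level: spread is max minus min of the marks within each example
spread = tapply(marks$mark, marks$example, function(m) diff(range(m)))
exampleSummary = data.frame(
  example = names(spread),
  spread = as.numeric(spread),
  unstable = as.numeric(spread) >= unstableSpreadThreshold
)
exampleSummary
```

Nothing is rescored in this sketch; it only summarises marks that already exist, which mirrors the audit-only role described above.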
Builds a metric-level summary combining cross-method disagreement with deterministic stability/ease measures.
summariseMetricComparison(
  scores,
  comparison,
  deterministicMethod = "deterministic",
  orderBy = NULL
)
scores |
A |
comparison |
A |
deterministicMethod |
Name of deterministic method. |
orderBy |
Optional ordering: NULL, |
An object of class metricComparisonSummary.
Produces a compact summary for a wmfmGrade object. When repeated LLM
grading has been used, the summary includes run-to-run mark variability and
per-dimension ranges.
## S3 method for class 'wmfmGrade'
summary(object, method = NULL, ...)
object |
A |
method |
Optional character. One of |
... |
Unused. |
An object of class summary.wmfmGrade.
Summarise a WMFM grade list object
## S3 method for class 'wmfmGradeListObj'
summary(object, ...)
object |
A |
... |
Unused. |
An object of class summary.wmfmGradeListObj.
Produces a concise summary of a raw wmfmRuns object. The summary focuses on
run-level variability, text metrics, timing, duplication, and extracted claim
frequencies. Judged fields and score summaries are intentionally excluded and
belong to wmfmScores objects instead.
## S3 method for class 'wmfmRuns'
summary(object, ...)
object |
A |
... |
Reserved for future extensions. |
An object of class summary.wmfmRuns.