Title: | Continuous Norming |
---|---|
Description: | A comprehensive toolkit for generating continuous test norms in psychometrics and biometrics, and analyzing model fit. The package offers both distribution-free modeling using Taylor polynomials and parametric modeling using the beta-binomial distribution. Originally developed for achievement tests, it is applicable to a wide range of mental, physical, or other test scores dependent on continuous or discrete explanatory variables. The package provides several advantages: It minimizes deviations from representativeness in subsamples, interpolates between discrete levels of explanatory variables, and significantly reduces the required sample size compared to conventional norming per age group. cNORM enables graphical and analytical evaluation of model fit, accommodates a wide range of scales including those with negative and descending values, and even supports conventional norming. It generates norm tables including confidence intervals. It also includes methods for addressing representativeness issues through Iterative Proportional Fitting. |
Authors: | Alexandra Lenhard [aut] , Wolfgang Lenhard [cre, aut] , Sebastian Gary [aut], WPS publisher [fnd] (<https://www.wpspublish.com/>) |
Maintainer: | Wolfgang Lenhard <[email protected]> |
License: | AGPL-3 |
Version: | 3.4.0 |
Built: | 2024-11-05 06:47:51 UTC |
Source: | CRAN |
Computes Taylor polynomial regression models by evaluating a series of models with an increasing number of predictors. It aims to find a consistent model that effectively captures the variance in the data. It draws on the regsubsets function from the leaps package and builds up to 20 models for each number of predictors, evaluates these models regarding model consistency, and selects the consistent model with the highest R^2. This automatic model selection should usually be accompanied by visual inspection of the percentile plots and assessment of fit statistics. Set R^2 or the number of terms manually to retrieve a more parsimonious model, if desired.
bestModel( data, raw = NULL, R2 = NULL, k = NULL, t = NULL, predictors = NULL, terms = 0, weights = NULL, force.in = NULL, plot = TRUE, extensive = TRUE, subsampling = TRUE )
data |
Preprocessed dataset with 'raw' scores, powers, interactions, and usually an explanatory variable (like age). |
raw |
Name of the raw score variable (default: 'raw'). |
R2 |
Adjusted R^2 stopping criterion for model building. |
k |
Power constant influencing model complexity (default: 4, max: 6). |
t |
Age power parameter. If unset, defaults to 'k'. |
predictors |
List of predictors or regression formula for model selection. Overrides 'k' and can include additional variables. |
terms |
Desired number of terms in the model. |
weights |
Optional case weights. If set to FALSE, default weights (if any) are ignored. |
force.in |
Variables forcibly included in the regression. |
plot |
If TRUE (default), displays a percentile plot of the model and information about the regression object. FALSE turns off plotting and report. |
extensive |
If TRUE (default), screens models for consistency and, if possible, excludes inconsistent ones. |
subsampling |
If TRUE (default), model coefficients are calculated using 10 folds and averaged across the folds. This produces more robust estimates with a slight increase in bias. |
The functions rankBySlidingWindow, rankByGroup, bestModel, computePowers and prepareData are usually not called directly, but accessed through other functions like cnorm.
Additional functions like plotSubset(model) and cnorm.cv can aid in model evaluation.
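The adjusted R^2 stopping criterion (argument R2) can be illustrated with a minimal base-R sketch. It is a simplified stand-in, not the package's procedure: predictors are added in a fixed order instead of being searched with regsubsets, there is no consistency screening, and the simulated data and all variable names are assumptions made for illustration.

```r
set.seed(42)

# Simulated norm sample (illustrative): L = norm score, A = age
n <- 500
A <- runif(n, 6, 12)
L <- rnorm(n, 50, 10)
raw <- 5 + 0.4 * L + 2 * A + 0.03 * L * A + rnorm(n, sd = 0.5)
d <- data.frame(raw, L1 = L, A1 = A, L1A1 = L * A, L2 = L^2, A2 = A^2)

# Candidate predictors in a fixed order (assumption; bestModel searches subsets)
candidates <- c("L1", "A1", "L1A1", "L2", "A2")
R2_target <- 0.99

selected <- NULL
for (i in seq_along(candidates)) {
  f <- reformulate(candidates[1:i], response = "raw")
  fit <- lm(f, data = d)
  adj <- summary(fit)$adj.r.squared
  # Stop at the first (smallest) model that reaches the target
  if (adj >= R2_target) {
    selected <- fit
    break
  }
}
# Fall back to the largest model if the target is never reached
if (is.null(selected)) selected <- fit

length(coef(selected)) - 1  # number of terms in the selected model
```

Lowering R2_target in this sketch yields a model with fewer terms, which mirrors the advice to set R^2 or the number of terms manually for a more parsimonious model.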
The model. Further exploration can be done using plotSubset(model) and plotPercentiles(data, model).
plotSubset, plotPercentiles, plotPercentileSeries, checkConsistency
Other model: checkConsistency(), cnorm.cv(), derive(), modelSummary(), print.cnorm(), printSubset(), rangeCheck(), regressionFunction(), summary.cnorm()
# Example with sample data
## Not run:
# It is not recommended to use this function directly. Rather use 'cnorm' instead.
normData <- prepareData(elfe)
model <- bestModel(normData)
plotSubset(model)
plotPercentiles(buildCnormObject(normData, model))

# Specifying variables explicitly
preselectedModel <- bestModel(normData, predictors = c("L1", "L3", "L1A3", "A2", "A3"))
print(regressionFunction(preselectedModel))
## End(Not run)
This function calculates the alpha (a) and beta (b) parameters of a beta-binomial distribution, along with the mean (m) and variance (var), based on the input vector 'x' and the maximum number 'n'.
betaCoefficients(x, n = NULL)
x |
A numeric vector of non-negative integers representing observed counts. |
n |
The maximum number or the maximum possible value of 'x'. If not specified, uses max(x) instead. |
The beta-binomial distribution is a discrete probability distribution that models the number of successes in a fixed number of trials, where the probability of success varies from trial to trial. This variability in success probability is modeled by a beta distribution. Such a calculation is particularly relevant in scenarios where there is heterogeneity in success probabilities across trials, which is common in real-world situations, as for example the number of correct solutions in a psychometric test, where the test has a fixed number of items.
A numeric vector containing the calculated parameters in the following order: alpha (a), beta (b), mean (m), standard deviation (sd), and the maximum number (n).
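As a rough illustration of how alpha and beta can be obtained from a vector of counts, the following base-R sketch uses the textbook method-of-moments estimator for the beta-binomial distribution. The helper name bb_moments is hypothetical, and it is an assumption that the package's own estimator follows the same approach.

```r
# Hypothetical helper: method-of-moments estimates for a beta-binomial
# distribution with n trials, based on the first two raw sample moments
bb_moments <- function(x, n = max(x)) {
  m1 <- mean(x)      # first raw moment
  m2 <- mean(x^2)    # second raw moment
  denom <- n * (m2 / m1 - m1 - 1) + m1
  a <- (n * m1 - m2) / denom
  b <- (n - m1) * (n - m2 / m1) / denom
  c(a = a, b = b, m = m1, var = var(x), n = n)
}

est <- bb_moments(c(1, 2, 3, 4, 5), n = 5)
est
```

For this input the estimates are a = 3 and b = 2, so the implied mean n * a / (a + b) = 3 matches the sample mean.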
x <- c(1, 2, 3, 4, 5)
n <- 5
betaCoefficients(x, n)

# or, to set n to max(x)
betaCoefficients(x)
Helper function to build a cnorm object from a data object and a model object from the bestModel function for compatibility reasons.
buildCnormObject(data, model)
data |
A data object from 'prepareData', or from 'rankByGroup' and 'computePowers' |
model |
Object obtained from the bestModel function |
A cnorm object
## Not run:
data <- prepareData(elfe)
model <- bestModel(data, k = 4)
model.cnorm <- buildCnormObject(data, model)
## End(Not run)
Build regression function for bestModel
buildFunction(raw, k, t, age)
raw |
name of the raw score variable |
k |
the power degree for location |
t |
the power degree for age |
age |
use age |
regression function
The function is an inline for searching zeros in the inverse regression function. It collapses the regression function at a specific age and simplifies the coefficients.
calcPolyInL(raw, age, model)
raw |
The raw value (subtracted from the intercept) |
age |
The age |
model |
The cNORM regression model |
The coefficients
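The idea of collapsing the regression function at a fixed age and then searching for zeros can be sketched in base R. The coefficient layout below (a matrix B whose entry B[i + 1, j + 1] is the coefficient of L^i * A^j) and the helper name collapse_at_age are assumptions made for illustration; the package stores its model differently.

```r
# Illustrative model: raw = 2 + 3 * L + 0 * A + 1 * L * A
# Rows index powers of L (0, 1), columns index powers of A (0, 1)
B <- matrix(c(2, 3,   # A^0 column: intercept and L coefficient
              0, 1),  # A^1 column: A coefficient and L * A coefficient
            nrow = 2)

# Hypothetical helper: plug in a fixed age so that only a polynomial in L remains
collapse_at_age <- function(B, age) {
  drop(B %*% age^(0:(ncol(B) - 1)))
}

coef_L <- collapse_at_age(B, age = 2)   # coefficients of L^0, L^1 at age 2

# Find the norm score L that yields raw = 12 at this age:
# subtract the raw value from the intercept and search for zeros
raw_value <- 12
coef_L[1] <- coef_L[1] - raw_value
roots <- polyroot(coef_L)
Re(roots)
```

At age 2 the collapsed polynomial is 2 + 5 * L, so the zero of (2 - 12) + 5 * L lies at L = 2.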
The function is an inline for searching zeros in the inverse regression function. It collapses the regression function at a specific age and simplifies the coefficients.
calcPolyInLBase(raw, age, coeff, k)
raw |
The raw value (subtracted from the intercept) |
age |
The age |
coeff |
The cNORM regression model coefficients |
k |
The cNORM regression model power parameter |
The coefficients
The function is an inline for searching zeros in the inverse regression function. It collapses the regression function at a specific age and simplifies the coefficients. Optimized version of the prior 'calcPolyInLBase'
calcPolyInLBase2(raw, age, coeff, k)
raw |
The raw value (subtracted from the intercept) |
age |
The age |
coeff |
The cNORM regression model coefficients |
k |
The cNORM regression model power parameter |
The coefficients
Courtesy of the Centers for Disease Control and Prevention (CDC), cNORM includes human growth data for children and adolescents aged 2 to 25 that can be used to model trajectories of the body mass index and to estimate percentiles for clinical definitions of under- and overweight. The data stem from the NHANES surveys in the US and were published in 2012 as public domain. The data were cleaned by removing missing values; the dataset includes the following variables from, or based on, the original dataset.
CDC
A data frame with 45053 rows and 7 variables:
continuous age in years, based on the month variable
age group; chronological age in years at the time of examination
chronological age in months at the time of examination
sex of the participant, 1 = male, 2 = female
height of the participants in cm
weight of the participants in kg
the body mass index, computed by (weight in kg)/(height in m)^2
A data frame with 45035 rows and 7 columns
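The body mass index formula stated above, (weight in kg)/(height in m)^2, can be written as a short base-R sketch. Since height is recorded in cm, it has to be converted to metres first; the helper name bmi_from_cm is hypothetical.

```r
# Hypothetical helper: BMI from weight in kg and height in cm
bmi_from_cm <- function(weight_kg, height_cm) {
  height_m <- height_cm / 100   # convert cm to m
  weight_kg / height_m^2
}

bmi_from_cm(70, 175)
```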
https://www.cdc.gov/nchs/nhanes/index.htm
CDC (2012). National Health and Nutrition Examination Survey: Questionnaires, Datasets and Related Documentation. Available at https://www.cdc.gov/nchs/nhanes/index.htm (date of retrieval: 25/08/2018)
This function checks if the predicted values from a linear model are monotonically increasing or decreasing across a range of L values for multiple age points.
check_monotonicity(lm_model, pred_data, minRaw, maxRaw)
lm_model |
An object of class 'lm' representing the fitted linear model. |
pred_data |
Matrix with prediction values |
minRaw |
lowest raw score in prediction |
maxRaw |
highest raw score in prediction |
The function creates a prediction data frame using all combinations of the provided L values and age points. It then generates predictions using the provided linear model and checks if these predictions are monotonically increasing or decreasing for each age point across the range of L values.
A named character vector where each element corresponds to an age point. Possible values for each element are 1 for "Monotonically increasing", -1 for "Monotonically decreasing", or 0 for "Not monotonic".
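The monotonicity check described for this function can be sketched in base R with diff(). The helper name monotonic_direction and the toy prediction function are assumptions for illustration; the sketch does not use a fitted 'lm' object and treats constant sequences as increasing.

```r
# Hypothetical helper: 1 = increasing, -1 = decreasing, 0 = not monotonic
monotonic_direction <- function(pred) {
  d <- diff(pred)
  if (all(d >= 0)) return(1L)
  if (all(d <= 0)) return(-1L)
  0L
}

# Toy predictions over a range of L values for three age points
L <- seq(25, 75, by = 5)
ages <- c(7, 9, 11)
pred <- sapply(ages, function(a) 2 + 0.5 * L + 0.1 * a * L)

# One result per age point (columns of pred)
result <- apply(pred, 2, monotonic_direction)
names(result) <- ages
result
```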
While abilities increase and decline over age, within one age group the norm scores always have to show a monotonic increase or decrease with increasing raw scores. Violations of this assumption indicate problems in modeling the relationship between raw and norm scores. There are several reasons why this might occur:
1. Vertical extrapolation: Choosing extreme norm scores, e.g. values x <= -3 or x >= 3. In order to model these extreme values, a large sample dataset is necessary.
2. Horizontal extrapolation: Taylor polynomials converge in a certain radius. Using the model values outside the original dataset may lead to inconsistent results.
3. The data cannot be modeled with Taylor polynomials, or you need another power parameter (k) or R2 for the model.
checkConsistency( model, minAge = NULL, maxAge = NULL, minNorm = NULL, maxNorm = NULL, minRaw = NULL, maxRaw = NULL, stepAge = NULL, stepNorm = 1, warn = FALSE, silent = FALSE )
model |
The model from the bestModel function or a cnorm object |
minAge |
Age to start with checking |
maxAge |
Upper end of the age check |
minNorm |
Lower end of the norm value range |
maxNorm |
Upper end of the norm value range |
minRaw |
clipping parameter for the lower bound of raw scores |
maxRaw |
clipping parameter for the upper bound of raw scores |
stepAge |
Stepping parameter for the age check. Lower values indicate higher precision / closer checks |
stepNorm |
Stepping parameter for the norm table check within age, with lower values indicating a higher precision. The choice depends on the norm scale used. With T scores, a stepping parameter of 1 is suitable |
warn |
If set to TRUE, even minor violations of the model assumptions are displayed (default = FALSE) |
silent |
turn off messages |
In general, extrapolation (points 1 and 2) can carefully be done to a certain degree outside the original sample, but it should be handled with caution. Please note that at extreme values, the models most likely become inconsistent, and it is thus recommended to restrict the norm score range to the relevant range of abilities, e.g. +/- 2.5 SD, via the minNorm and maxNorm parameters.
Boolean, indicating model violations (TRUE) or no problems (FALSE)
Other model: bestModel(), cnorm.cv(), derive(), modelSummary(), print.cnorm(), printSubset(), rangeCheck(), regressionFunction(), summary.cnorm()
model <- cnorm(raw = elfe$raw, group = elfe$group, plot = FALSE)
modelViolations <- checkConsistency(model, minNorm = 25, maxNorm = 75)
plotDerivative(model, minNorm = 25, maxNorm = 75)
Checks whether NA or values <= 0 occur and issues a warning
checkWeights(weights)
weights |
Raking weights |
Conducts continuous norming in one step and returns an object including ranked raw data and the continuous norming model. Please consult the function descriptions of 'rankByGroup', 'rankBySlidingWindow' and 'bestModel' for specifics of the steps in the data preparation and modeling process. In addition to the raw scores, either provide
a numeric vector for the grouping information (group), or
a numeric age vector and the width of the sliding window (age, width)
for the ranking of the raw scores. You can adjust the degree of smoothing of the regression model by setting the k and terms parameters. In general, increasing k to more than 4 and increasing the number of terms lead to a higher fit, while lower values lead to more smoothing. The power parameter for the age trajectory can be specified independently by 't'. If both parameters are missing, cnorm uses k = 5 and t = 3 by default.
cnorm( raw = NULL, group = NULL, age = NULL, width = NA, weights = NULL, scale = "T", method = 4, descend = FALSE, k = NULL, t = NULL, terms = 0, R2 = NULL, plot = TRUE, extensive = TRUE, subsampling = TRUE )
raw |
Numeric vector of raw scores |
group |
Numeric vector of grouping variable, e. g. grade. If no group or age variable is provided, conventional norming is applied |
age |
Numeric vector with chronological age, please additionally specify width of window |
width |
Size of the sliding window in case an age vector is used |
weights |
Vector or variable name in the dataset with weights for each individual case. It can be used to compensate for moderate imbalances due to insufficient norm data stratification. Weights should be numerical and positive. |
scale |
Type of norm scale, either T (default), IQ, z or percentile (= no transformation); a double vector with the mean and standard deviation can be provided as well, e.g. c(10, 3) for Wechsler scale index points |
method |
Ranking method in case of ties; please provide an index, choosing from the following methods: 1 = Blom (1958), 2 = Tukey (1949), 3 = Van der Waerden (1952), 4 = Rankit (default), 5 = Levenbach (1953), 6 = Filliben (1975), 7 = Yu & Huang (2001) |
descend |
Ranking order (default descend = FALSE): inverts the ranking order, with higher raw scores getting lower norm scores; relevant for example when norming error scores, where lower scores mean higher performance |
k |
The power constant. Higher values result in more detailed approximations but carry the danger of over-fit (max = 6). If not set, it uses t and if both parameters are NULL, k is set to 5. |
t |
The age power parameter (max = 6). If not set, it uses k and if both parameters are NULL, t is set to 3, since age trajectories are most often well captured by cubic polynomials. |
terms |
Selection criterion for model building. The best fitting model with this number of terms is used |
R2 |
Adjusted R square as a stopping criterion for the model building (default R2 = 0.99) |
plot |
Default TRUE; plots the regression model and prints report |
extensive |
If TRUE, screens models for consistency and, if possible, excludes inconsistent ones. |
subsampling |
If TRUE (default), model coefficients are calculated using 10-folds and averaged across the folds. This produces more robust estimates with a slight increase in bias. |
cnorm object including the ranked raw data and the regression model
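The ranking step that precedes the regression can be illustrated with a base-R sketch of the Rankit transformation (method = 4) onto the T scale. It ranks a single group only, assumes ties receive their average rank, and uses made-up raw scores; the package performs this per group or per sliding window and may handle details differently.

```r
# Made-up raw scores of one norm group
raw <- c(12, 15, 15, 20, 23, 30)

# Ranks, with tied observations sharing their average rank (assumption)
r <- rank(raw, ties.method = "average")

# Rankit: convert ranks to cumulative proportions
p <- (r - 0.5) / length(raw)

# Normal rank transformation and rescaling to T scores (M = 50, SD = 10)
z <- qnorm(p)
T_score <- 50 + 10 * z

data.frame(raw, rank = r, percentile = 100 * p, T = round(T_score, 1))
```

Other scales follow from the same z values, for example 100 + 15 * z for IQ scores or c(10, 3) for Wechsler scale index points.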
Gary, S. & Lenhard, W. (2021). In norming we trust. Diagnostica.
Gary, S., Lenhard, W. & Lenhard, A. (2021). Modelling Norm Scores with the cNORM Package in R. Psych, 3(3), 501-521. https://doi.org/10.3390/psych3030033
Lenhard, A., Lenhard, W., Suggate, S. & Segerer, R. (2016). A continuous solution to the norming problem. Assessment, Online first, 1-14. doi:10.1177/1073191116656437
Lenhard, A., Lenhard, W., Gary, S. (2018). Continuous Norming (cNORM). The Comprehensive R Network, Package cNORM, available: https://CRAN.R-project.org/package=cNORM
Lenhard, A., Lenhard, W., Gary, S. (2019). Continuous norming of psychometric tests: A simulation study of parametric and semi-parametric approaches. PLoS ONE, 14(9), e0222279. doi:10.1371/journal.pone.0222279
Lenhard, W., & Lenhard, A. (2020). Improvement of Norm Score Quality via Regression-Based Continuous Norming. Educational and Psychological Measurement(Online First), 1-33. https://doi.org/10.1177/0013164420928457
rankByGroup, rankBySlidingWindow, computePowers, bestModel
## Not run:
# Using this function with the example dataset 'elfe'

# Conventional norming (no modelling over age)
cnorm(raw = elfe$raw)

# Continuous norming
# You can use the 'getGroups()' function to set up a grouping variable in case
# you have a continuous age variable.
cnorm.elfe <- cnorm(raw = elfe$raw, group = elfe$group)

# Return norm tables including 90% confidence intervals for a
# test with a reliability of r = .95; tables are set to the mean of each quarter
# in grade 3 (children completed 2 years of schooling)
normTable(c(2.125, 2.375, 2.625), cnorm.elfe, CI = .90, reliability = .95)

# ... or instead of raw scores for norm scores, the other way round
rawTable(c(2.125, 2.375, 2.625), cnorm.elfe, CI = .90, reliability = .95)

# Using a continuous age variable instead of distinct groups, using a sliding
# window for percentile estimation. Please specify the continuous variable for age
# and the sliding window size.
cnorm.ppvt.continuous <- cnorm(raw = ppvt$raw, age = ppvt$age, width = 1)

# In case of unbalanced datasets, deviating from the census, the norm data
# can be weighted by means of raking / post-stratification. Please generate
# the weights with the computeWeights() function and pass them as the weights
# parameter. For computing the weights, please specify a data.frame with the
# population margins (further information is available in the computeWeights
# function). A demonstration based on sex and migration status in vocabulary
# development (ppvt dataset):
margins <- data.frame(variables = c("sex", "sex", "migration", "migration"),
                      levels = c(1, 2, 0, 1),
                      share = c(.52, .48, .7, .3))
weights <- computeWeights(ppvt, margins)
model <- cnorm(raw = ppvt$raw, group = ppvt$group, weights = weights)
## End(Not run)
This function fits a beta-binomial regression model where both the alpha and beta parameters of the beta-binomial distribution are modeled as polynomial functions of the predictor variable (typically age). Setting mode to 1 fits a beta-binomial model on the basis of mu and sigma; setting it to 2 (default) fits a beta-binomial model directly on the basis of alpha and beta.
cnorm.betabinomial( age, score, n = NULL, weights = NULL, mode = 2, alpha = 3, beta = 3, control = NULL, scale = "T", plot = T )
age |
A numeric vector of predictor values (e.g., age). |
score |
A numeric vector of response values. |
n |
The maximum score (number of trials in the beta-binomial distribution). If NULL, max(score) is used. |
weights |
A numeric vector of weights for each observation. Default is NULL (equal weights). |
mode |
Integer specifying the mode of the model. Default is 2 (direct modelling of alpha and beta). |
alpha |
Integer specifying the degree of the polynomial for the alpha model. Default is 3. If mode is set to 1, this parameter is used to specify the degree of the polynomial for the mu model. |
beta |
Integer specifying the degree of the polynomial for the beta model. Default is 3. If mode is set to 1, this parameter is used to specify the degree of the polynomial for the sigma model. |
control |
A list of control parameters to be passed to the 'optim' function. If NULL, default values are used, namely control = list(reltol = 1e-8, maxit = 1000) for mode 1 and control = list(factr = 1e-8, maxit = 1000) for mode 2. |
scale |
Type of norm scale, either "T" (default), "IQ", "z" or a double vector with the mean and standard deviation. |
plot |
Logical indicating whether to plot the model. Default is TRUE. |
The function standardizes the input variables, fits polynomial models for both the alpha and beta parameters, and uses maximum likelihood estimation to find the optimal parameters. The optimization is performed using the L-BFGS-B method.
A list of class "cnormBetaBinomial" or "cnormBetaBinomial2". For mode 2, the list contains:
alpha_est |
Estimated coefficients for the alpha model |
beta_est |
Estimated coefficients for the beta model |
se |
Standard errors of the estimated coefficients |
alpha_degree |
Degree of the polynomial for the alpha model |
beta_degree |
Degree of the polynomial for the beta model |
result |
Full result from the optimization procedure |
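The estimation approach described above (standardized predictor, polynomial models for alpha and beta, maximum likelihood via L-BFGS-B) can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: it uses simulated data, first-degree polynomials, a log link to keep alpha and beta positive, and box constraints chosen for numerical safety.

```r
set.seed(1)

# Simulated test with 20 items; age is standardized before modelling
n_items <- 20
age <- runif(300, 6, 12)
z <- as.numeric(scale(age))
p <- rbeta(300, shape1 = exp(1 + 0.5 * z), shape2 = exp(1 - 0.5 * z))
score <- rbinom(300, size = n_items, prob = p)

# Design matrix for a first-degree polynomial (intercept and slope)
X <- cbind(1, z)

# Negative beta-binomial log-likelihood; par = c(alpha coefs, beta coefs)
neg_loglik <- function(par) {
  a <- exp(drop(X %*% par[1:2]))   # log link (assumption)
  b <- exp(drop(X %*% par[3:4]))
  -sum(lchoose(n_items, score) +
         lbeta(score + a, n_items - score + b) - lbeta(a, b))
}

start <- rep(0, 4)
fit <- optim(start, neg_loglik, method = "L-BFGS-B",
             lower = -5, upper = 5,
             control = list(maxit = 1000))

fit$par    # estimated coefficients: alpha (intercept, slope), beta (intercept, slope)
fit$value  # negative log-likelihood at the optimum
```

Higher polynomial degrees would only require additional columns in X (for example z^2 and z^3) and correspondingly longer coefficient vectors.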
## Not run:
# Fit a beta-binomial regression model to the PPVT data
model <- cnorm.betabinomial(ppvt$age, ppvt$raw, n = 228)
summary(model)

# Use weights for post-stratification
marginals <- data.frame(var = c("sex", "sex", "migration", "migration"),
                        level = c(1, 2, 0, 1),
                        prop = c(0.51, 0.49, 0.65, 0.35))
weights <- computeWeights(ppvt, marginals)
model <- cnorm.betabinomial(ppvt$age, ppvt$raw, n = 228, weights = weights)
## End(Not run)
Assists in determining the optimal number of terms for the regression model using repeated Monte Carlo cross-validation. It leverages an 80-20 split between training and validation data, with stratification by norm group or random sample in case of using sliding window ranking.
cnorm.cv( data, formula = NULL, repetitions = 5, norms = TRUE, min = 1, max = 12, cv = "full", pCutoff = NULL, width = NA, raw = NULL, group = NULL, age = NULL, weights = NULL )
data |
Data frame of norm sample or a cnorm object. Should have ranking, powers, and interaction of L and A. |
formula |
Formula from an existing regression model; min/max functions ignored. If using a cnorm object, this is automatically fetched. |
repetitions |
Number of repetitions for cross-validation. |
norms |
If TRUE, computes norm score crossfit and R^2. Note: Computationally intensive. |
min |
Start with a minimum number of terms (default = 1). |
max |
Maximum terms in model, up to (k + 1) * (t + 1) - 1. |
cv |
"full" (default) splits data into training/validation, then ranks. Otherwise, expects a pre-ranked dataset. |
pCutoff |
Checks stratification for unbalanced data. Performs a t-test per group. Default set to 0.2 to minimize beta error. |
width |
If provided, ranking done via 'rankBySlidingWindow'. Otherwise, by group. |
raw |
Name of the raw score variable. |
group |
Name of the grouping variable. |
age |
Name of the age variable. |
weights |
Name of the weighting parameter. |
Successive models, with an increasing number of terms, are evaluated, and the RMSE for raw scores plotted. This encompasses the training, validation, and entire dataset. If 'norms' is set to TRUE (default), the function will also calculate the mean norm score reliability and crossfit measures. Note that due to the computational requirements of norm score calculations, execution can be slow, especially with numerous repetitions or terms.
When 'cv' is set to "full" (default), both training and validation datasets are ranked separately, providing comprehensive cross-validation. For a more streamlined validation process focused only on modeling, a pre-ranked dataset can be used. The output comprises RMSE for raw score models, norm score R^2, delta R^2, crossfit, and the norm score SE according to Oosterhuis, van der Ark, & Sijtsma (2016).
This function is not yet prepared for the 'extensive' search strategy, introduced in version 3.3, but instead relies on the first model per number of terms, without consistency check.
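The principle of repeated Monte Carlo cross-validation with an 80-20 split can be sketched in base R. The sketch is deliberately simplified and rests on assumptions: the data are simulated, model complexity is varied via the degree of a polynomial in age rather than the number of terms of a cNORM model, the split is not stratified, and no ranking or norm score computation takes place.

```r
set.seed(123)

# Simulated data: raw score as a curved function of age plus noise
n <- 400
age <- runif(n, 6, 12)
raw <- 10 + 3 * age - 0.1 * age^2 + rnorm(n, sd = 2)
d <- data.frame(age, raw)

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

repetitions <- 5
degrees <- 1:4
res <- matrix(NA_real_, nrow = repetitions, ncol = length(degrees),
              dimnames = list(NULL, paste0("degree", degrees)))

for (r in seq_len(repetitions)) {
  # Random 80-20 split into training and validation data
  train_idx <- sample(n, size = round(0.8 * n))
  train <- d[train_idx, ]
  valid <- d[-train_idx, ]
  for (j in seq_along(degrees)) {
    fit <- lm(raw ~ poly(age, degrees[j]), data = train)
    res[r, j] <- rmse(valid$raw, predict(fit, newdata = valid))
  }
}

# Validation RMSE per model complexity, averaged over repetitions
cv_rmse <- colMeans(res)
cv_rmse
```

A validation RMSE that stops decreasing, or rises again, as complexity grows is the kind of pattern the suggestions below refer to when warning against overfit.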
For assessing overfitting:
A CROSSFIT > 1 suggests overfitting, < 1 suggests potential underfitting, and values around 1 are optimal, given a low raw score RMSE and high norm score validation R^2.
Suggestions for ideal model selection:
Visual inspection of percentiles with 'plotPercentiles' or 'plotPercentileSeries'.
Pair visual inspection with repeated cross-validation (e.g., 10 repetitions).
Aim for low raw score RMSE and high norm score R^2, avoiding terms with significant overfit (e.g., crossfit > 1.1).
Table with results per term number: RMSE for raw scores, R^2 for norm scores, and crossfit measure.
Oosterhuis, H. E. M., van der Ark, L. A., & Sijtsma, K. (2016). Sample Size Requirements for Traditional and Regression-Based Norms. Assessment, 23(2), 191–202. https://doi.org/10.1177/1073191115580638
Other model: bestModel(), checkConsistency(), derive(), modelSummary(), print.cnorm(), printSubset(), rangeCheck(), regressionFunction(), summary.cnorm()
## Not run:
# Example: Plot cross-validation RMSE by number of terms (up to 9) with three repetitions.
result <- cnorm(raw = elfe$raw, group = elfe$group)
cnorm.cv(result$data, min = 2, max = 9, repetitions = 3)

# Using a cnorm object examines the predefined formula.
cnorm.cv(result, repetitions = 1)
## End(Not run)
Launcher for the graphical user interface of cNORM
cNORM.GUI(launch.browser = TRUE)
launch.browser |
Default TRUE; automatically open browser for GUI |
## Not run:
# Launch graphical user interface
cNORM.GUI()
## End(Not run)
This function creates a visualization comparing two norm models by displaying their percentile curves. The first model is shown with solid lines, the second with dashed lines. If age and score vectors are provided, manifest percentiles are displayed as dots. The function works with both regular cnorm models and beta-binomial models and allows comparison between different model types.
compare( model1, model2, percentiles = c(0.025, 0.1, 0.25, 0.5, 0.75, 0.9, 0.975), age = NULL, score = NULL, weights = NULL, title = NULL, subtitle = NULL )
model1 |
First model object (distribution free or beta-binomial) |
model2 |
Second model object (distribution free or beta-binomial) |
percentiles |
Vector with percentile scores, ranging from 0 to 1 (exclusive) |
age |
Optional vector with manifest age or group values |
score |
Optional vector with manifest raw score values |
weights |
Optional vector with manifest weights |
title |
Custom title for plot (optional) |
subtitle |
Custom subtitle for plot (optional) |
A ggplot object showing the comparison of both models
Other plot: plot.cnorm(), plot.cnormBetaBinomial(), plot.cnormBetaBinomial2(), plotDensity(), plotDerivative(), plotNorm(), plotNormCurves(), plotPercentileSeries(), plotPercentiles(), plotRaw(), plotSubset()
## Not run:
# Compare different types of models
model1 <- cnorm(group = elfe$group, raw = elfe$raw)
model2 <- cnorm.betabinomial(elfe$group, elfe$raw)
compare(model1, model2, age = elfe$group, score = elfe$raw)
## End(Not run)
The function computes powers of the norm variable, e.g. T scores (location, L), powers of an explanatory variable, e.g. age or grade (age, A), and the interactions of both (L x A). The k variable indicates the degree up to which powers and interactions are built. These predictors can be used later on in the bestModel function to model the norm sample. Higher values of k allow for modeling the norm sample more closely, but might lead to overfitting. In general, k = 3 or k = 4 is sufficient to model human performance data. For example, k = 2 results in the variables L1, L2, A1, A2, and their interactions L1A1, L2A1, L1A2 and L2A2 (but k = 2 is usually not sufficient for the modeling). Please note that you do not need to use a normal rank transformed scale like T or IQ; you can use percentiles for the 'normValue' as well.
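The construction of powers and interactions described above can be illustrated with a minimal base R sketch. It is not the cNORM implementation; the helper name build_powers and the toy values are assumptions made for illustration only.

```r
# Illustrative sketch only (hypothetical helper, not part of cNORM):
# build powers of a location variable L and an explanatory variable A
# plus all of their interactions up to degree k (location) and t (age).
build_powers <- function(L, A, k = 2, t = k) {
  out <- list()
  for (i in seq_len(k)) out[[paste0("L", i)]] <- L^i   # L1 ... Lk
  for (j in seq_len(t)) out[[paste0("A", j)]] <- A^j   # A1 ... At
  for (i in seq_len(k)) {
    for (j in seq_len(t)) {
      out[[paste0("L", i, "A", j)]] <- L^i * A^j       # interactions LiAj
    }
  }
  as.data.frame(out)
}

# Toy data: three T scores and three grade levels
toy <- build_powers(L = c(40, 50, 60), A = c(2, 3, 4), k = 2)
names(toy)
```

With k = 2, the resulting columns are the eight predictors listed in the description.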
computePowers(data, k = 5, norm = NULL, age = NULL, t = 3, silent = FALSE)
data |
data.frame with the norm data |
k |
degree |
norm |
the variable containing the norm data in the data.frame; might be T scores, IQ scores, percentiles ... |
age |
Explanatory variable like age or grade, which was also used for the grouping. Can be either the grouping variable itself or a finer grained variable like the exact age. Other explanatory variables can be used instead of an age variable as well, as long as the variable is at least ordered metric, e.g. language or development levels. The label 'age' is used, as this is the most common field of application. |
t |
the age power parameter. If not set, cNORM automatically uses k. The age power parameter can be used to produce rectangular matrices and to specify the course of scores over age independently from k |
silent |
set to TRUE to suppress messages |
The functions rankBySlidingWindow, rankByGroup, bestModel, computePowers and prepareData are usually not called directly, but accessed through other functions like cnorm.
data.frame with the powers and interactions of location and explanatory variable / age
bestModel
Other prepare: prepareData(), rankByGroup(), rankBySlidingWindow()
# Dataset with grade levels as grouping
data.elfe <- rankByGroup(elfe)
data.elfe <- computePowers(data.elfe)

# Dataset with continuous age variable and k = 5
data.ppvt <- rankByGroup(ppvt)
data.ppvt <- computePowers(data.ppvt, age = "age", k = 5)
Computes and standardizes weights via raking to compensate for non-stratified samples. It is based on the implementation in the survey R package. It reduces data collection biases in the norm data by means of post-stratification, thus reducing the effect of unbalanced data in percentile estimation and norm data modeling.
computeWeights(data, population.margins, standardized = TRUE)
data |
data.frame with norm sample data. |
population.margins |
A data.frame including three columns, specifying the variable name in the original dataset used for data stratification, the factor level of the variable and the corresponding population share. Please ensure that the original data does not include factor levels that are not present in population.margins. Additionally, summing up the shares of the different levels of a variable should result in a value near 1.0. The first column must specify the name of the stratification variable, the second the level and the third the proportion. |
standardized |
If TRUE (default), the raking weights are scaled to weights/min(weights) |
This function computes standardized raking weights to overcome biases in norm samples. It generates weights by drawing on the information of population shares (e.g. for sex, ethnic group, region ...) and subsequently reduces the influence of over-represented groups or increases the influence of underrepresented cases. The returned weights are either raw or standardized and scaled to be larger than 0.
Raking in general has a number of advantages over post-stratification, and it additionally allows cNORM to draw on larger datasets, since fewer cases have to be removed during stratification. To use this function, a data frame with stratification variables has to be specified in addition to the data. The data frame should include one row per factor level with (a) the variable name, (b) the level of the variable and (c) the corresponding population proportion.
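To make the idea of raking more concrete, the following base R sketch runs a simple iterative proportional fitting loop on simulated data. It is only an illustration under assumptions: the helper rake_weights and the toy data are hypothetical, and the actual cNORM implementation may differ in detail.

```r
# Illustrative sketch of raking (iterative proportional fitting).
# rake_weights is a hypothetical helper, not the cNORM function.
rake_weights <- function(data, margins, max_iter = 100, tol = 1e-8) {
  w <- rep(1, nrow(data))
  vars <- unique(as.character(margins[[1]]))
  for (iter in seq_len(max_iter)) {
    w_old <- w
    for (v in vars) {
      m <- margins[margins[[1]] == v, ]
      target <- setNames(m[[3]], as.character(m[[2]]))   # population shares
      lev <- as.character(data[[v]])
      # current weighted sample share of each level
      current <- sapply(names(target), function(l) sum(w[lev == l])) / sum(w)
      # shrink over-represented and inflate underrepresented levels
      w <- unname(w * target[lev] / current[lev])
    }
    if (max(abs(w - w_old)) < tol) break
  }
  w / min(w)   # standardize so that the smallest weight equals 1
}

# Simulated sample that deviates from the assumed population shares
set.seed(1)
toy <- data.frame(
  sex = sample(c(1, 2), 200, replace = TRUE, prob = c(.6, .4)),
  migration = sample(c(0, 1), 200, replace = TRUE, prob = c(.8, .2))
)
margins <- data.frame(
  variables = c("sex", "sex", "migration", "migration"),
  levels = c(1, 2, 0, 1),
  share = c(.52, .48, .7, .3)
)
w <- rake_weights(toy, margins)

# Weighted shares after raking
share_sex1 <- sum(w[toy$sex == 1]) / sum(w)
share_mig0 <- sum(w[toy$migration == 0]) / sum(w)
```

After convergence, the weighted shares of each level should match the specified margins closely, while each combination of factor levels receives one common weight.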
a vector with the standardized weights
# cNORM features a dataset on vocabulary development (ppvt)
# that includes variables like sex or migration. In order
# to weight the data, we have to specify the population shares.
# According to census, the population includes 52% boys
# (factor level 1 in the ppvt dataset) and 70% / 30% of persons
# without / with a history of migration (= 0 / 1 in the dataset).
# First we set up the population margins with all shares of the
# different levels:
margins <- data.frame(variables = c("sex", "sex", "migration", "migration"),
                      levels = c(1, 2, 0, 1),
                      share = c(.52, .48, .7, .3))
head(margins)

# Now we use the population margins to generate weights
# through raking
weights <- computeWeights(ppvt, margins)

# There are as many different weights as combinations of
# factor levels, thus only four in this specific case
unique(weights)

# To include the weights in the cNORM modelling, we have
# to pass them as weights. They are then used to set up
# weighted quantiles and as weights in the regression.
model <- cnorm(raw = ppvt$raw, group = ppvt$group, weights = weights)
In order to check model assumptions, a table of the first order derivative of the model coefficients is created.
derivationTable(A, model, minNorm = NULL, maxNorm = NULL, step = 0.1)
A |
the age |
model |
The regression model or a cnorm object |
minNorm |
The lower bound of the norm value range |
maxNorm |
The upper bound of the norm value range |
step |
Stepping parameter with lower values indicating higher precision |
data.frame with norm scores and the predicted scores based on the derived regression function
plotDerivative, derive
Other predict: getNormCurve(), normTable(), predict.cnormBetaBinomial(), predict.cnormBetaBinomial2(), predictNorm(), predictRaw(), rawTable()
# Generate cnorm object from example data
cnorm.elfe <- cnorm(raw = elfe$raw, group = elfe$group)

# retrieve function for time point 6
d <- derivationTable(6, cnorm.elfe, step = 0.5)
Calculates the derivative of the location / norm value from the regression model, with the first derivative as the default. This is useful for finding violations of model assumptions and problematic distribution features, e.g. bottom and ceiling effects, non-progressive norm scores within an age group or, in general, intersecting percentile curves.
derive(model, order = 1)
model |
The regression model or a cnorm object |
order |
The degree of the derivative, default: 1 |
The derived coefficients
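How a polynomial in location (L) and age (A) is differentiated with respect to L can be sketched in base R via the power rule. The representation of terms as a data frame, the helper derive_terms and the toy coefficients are assumptions for illustration and do not reflect the internal structure of a cnorm object.

```r
# Illustrative sketch (hypothetical helper): differentiate a polynomial
# sum(coef * L^l * A^a) with respect to L, 'order' times.
derive_terms <- function(terms, order = 1) {
  for (i in seq_len(order)) {
    terms$coef <- terms$coef * terms$l      # power rule: c * l * L^(l - 1)
    terms$l <- pmax(terms$l - 1, 0)
    terms <- terms[terms$coef != 0, ]       # terms without L vanish
  }
  terms
}

# Toy model: raw = 2 + 3 * L + 0.5 * L^2 * A
terms <- data.frame(coef = c(2, 3, 0.5), l = c(0, 1, 2), a = c(0, 0, 1))

# First derivative with respect to L: 3 + 1 * L * A
d <- derive_terms(terms)
d
```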
Other model: bestModel(), checkConsistency(), cnorm.cv(), modelSummary(), print.cnorm(), printSubset(), rangeCheck(), regressionFunction(), summary.cnorm()
m <- cnorm(raw = elfe$raw, group = elfe$group)
derivedCoefficients <- derive(m)
This function provides diagnostic information for a fitted beta-binomial model from the cnorm.betabinomial function. It returns various metrics related to model convergence, fit, and complexity. If age and raw scores are provided, the function also computes R2, RMSE and bias for the norm scores based on the manifest and predicted norm scores.
diagnostics.betabinomial(model, age = NULL, score = NULL, weights = NULL)
model |
An object of class "cnormBetaBinomial", typically the result of a call to cnorm.betabinomial(). |
age |
An optional vector with age values |
score |
An optional vector with raw values |
weights |
An optional vector with weights |
The AIC and BIC are calculated as:
AIC = 2k - 2ln(L)
BIC = ln(n)k - 2ln(L)
where k is the number of parameters, L is the maximum likelihood, and n is the number of observations.
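As a small worked example of these formulas, the following base R sketch computes both criteria from a negative log-likelihood. The helper info_criteria and all numbers are hypothetical; since -2ln(L) equals twice the negative log-likelihood, the latter is used directly.

```r
# Illustrative sketch with made-up numbers (not output of cNORM):
# nll = negative log-likelihood, k = number of parameters, n = observations
info_criteria <- function(nll, k, n) {
  c(AIC = 2 * k + 2 * nll,        # 2k - 2ln(L)
    BIC = log(n) * k + 2 * nll)   # ln(n)k - 2ln(L)
}

ic <- info_criteria(nll = 1234.5, k = 4, n = 1400)
ic
```

Because ln(n) exceeds 2 for n > 7, BIC penalizes additional parameters more strongly than AIC in all but very small samples.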
A list containing the following diagnostic information:
converged: Logical indicating whether the optimization algorithm converged.
n_evaluations: Number of function evaluations performed during optimization.
n_gradient: Number of gradient evaluations performed during optimization.
final_value: Final value of the objective function (negative log-likelihood).
message: Any message returned by the optimization algorithm.
AIC: Akaike Information Criterion.
BIC: Bayesian Information Criterion.
max_gradient: Maximum absolute gradient at the solution (if available).
## Not run:
# Fit a beta-binomial model
model <- cnorm.betabinomial(ppvt$age, ppvt$raw)

# Get diagnostic information
diag_info <- diagnostics.betabinomial(model)

# Print the diagnostic information
print(diag_info)

# Summarize the diagnostic information
summary(diag_info)

# Check if the model converged
if (diag_info$converged) {
  cat("Model converged successfully.\n")
} else {
  cat("Warning: Model did not converge.\n")
}

# Compare AIC and BIC
cat("AIC:", diag_info$AIC, "\n")
cat("BIC:", diag_info$BIC, "\n")
## End(Not run)
A dataset containing the raw data of 1400 students from grade 2 to 5 in the sentence comprehension test from ELFE 1-6 (Lenhard & Schneider, 2006). In this test, students are presented with lists of sentences with one gap each. The student has to fill in the correct solution by selecting from a list of 5 alternatives per sentence. The alternatives include verbs, adjectives, nouns, pronouns and conjunctions. Within each item, the alternatives stem from the same word type. The test is speeded, with a time cutoff of 180 seconds. The variables are as follows:
elfe
A data frame with 1400 rows and 3 variables:
personID: ID of the student
group: grade level, with x.5 indicating the end of the school year and x.0 indicating the middle of the school year
raw: the raw score of the student, spanning values from 0 to 28
A data frame with 1400 rows and 3 columns
https://www.psychometrica.de/elfe2.html
Lenhard, W. & Schneider, W.(2006). Ein Leseverstaendnistest fuer Erst- bis Sechstklaesser. Goettingen/Germany: Hogrefe.
# prepare data, retrieve model and plot percentiles
model <- cnorm(raw = elfe$raw, group = elfe$group)
Helps to split the continuous explanatory variable into groups and assigns the group mean. The variable can be split either into groups with an equal number of observations (default) or into equidistant intervals.
getGroups(x, n = NULL, equidistant = FALSE)
x |
The continuous variable to be split |
n |
The number of groups; if NULL then the function determines a number of groups with usually 100 cases or 3 <= n <= 20. |
equidistant |
If set to TRUE, builds equidistant intervals; otherwise (default), builds groups with an equal number of observations |
vector with group means for each observation
x <- rnorm(1000, mean = 50, sd = 10)
m <- getGroups(x, n = 10)
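The two splitting strategies can be illustrated with a short base R sketch. The helper group_means is hypothetical and only mimics the described behavior (assigning each observation the mean of its group); it is not the cNORM implementation.

```r
# Illustrative sketch (hypothetical helper, not cNORM's getGroups):
# split x into n groups and return the group mean for each observation.
group_means <- function(x, n = 10, equidistant = FALSE) {
  breaks <- if (equidistant) {
    seq(min(x), max(x), length.out = n + 1)            # equal interval width
  } else {
    unique(quantile(x, probs = seq(0, 1, length.out = n + 1),
                    names = FALSE))                    # equal group size
  }
  g <- cut(x, breaks = breaks, include.lowest = TRUE)
  ave(x, g)   # replace every value by the mean of its group
}

set.seed(42)
x <- rnorm(1000, mean = 50, sd = 10)
m <- group_means(x, n = 10)
table(round(m, 2))
```

With the default setting, every group contains the same number of cases, whereas equidistant = TRUE would yield sparsely populated groups in the tails of this distribution.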
Since raw scores are modeled as a function of age and norm score (location) in this continuous norming regression approach, getNormCurve is a straightforward way to show the raw score development over age while keeping the norm value constant. This way, e.g. the development of academic performance or intelligence for a specific ability level can be shown.
getNormCurve( norm, model, minAge = NULL, maxAge = NULL, step = 0.1, minRaw = NULL, maxRaw = NULL )
norm |
The specific norm score, e. g. T value |
model |
The model from the regression modeling obtained with the cnorm function |
minAge |
Age to start from |
maxAge |
Age to stop at |
step |
Stepping parameter for the precision when retrieving the values; lower values indicate higher precision (default 0.1). |
minRaw |
lower bound of the range of raw scores (default = 0) |
maxRaw |
upper bound of raw scores |
data.frame of the variables raw, age and norm
Other predict: derivationTable(), normTable(), predict.cnormBetaBinomial(), predict.cnormBetaBinomial2(), predictNorm(), predictRaw(), rawTable()
# Generate cnorm object from example data
cnorm.elfe <- cnorm(raw = elfe$raw, group = elfe$group)
getNormCurve(35, cnorm.elfe)
Calculates the standard error (SE) or root mean square error (RMSE) of the norm scores. In case of large datasets, both results should be almost identical.
getNormScoreSE(model, type = 2)
model |
a cnorm object |
type |
either '1' for the standard error sensu Oosterhuis et al. (2016) or '2' for the RMSE (default) |
The standard error (SE) of the norm scores sensu Oosterhuis et al. (2016) or the RMSE
Oosterhuis, H. E. M., van der Ark, L. A., & Sijtsma, K. (2016). Sample Size Requirements for Traditional and Regression-Based Norms. Assessment, 23(2), 191–202. https://doi.org/10.1177/1073191115580638
Prints the results and regression function of a cnorm model
modelSummary(object, ...)
object |
A regression model or cnorm object |
... |
additional parameters |
A report on the regression function, weights, R2 and RMSE
Other model: bestModel(), checkConsistency(), cnorm.cv(), derive(), print.cnorm(), printSubset(), rangeCheck(), regressionFunction(), summary.cnorm()
This function generates a norm table for a specific age based on the regression model by assigning raw scores to norm scores. Please specify the range of norm scores you want to cover. A T value of 25 corresponds to a percentile of 0.6. As a consequence, specifying a range of T = 25 to T = 75 would cover 98.8 % of the population. Please be careful when extrapolating vertically (at the lower and upper end of the age specific distribution). Depending on the size of your standardization sample, extreme values with T < 20 or T > 80 might lead to inconsistent results. If a confidence coefficient (CI, default .9) and the reliability are specified, confidence intervals are computed for the true score estimates, including a correction for regression to the mean (Eid & Schmidt, 2012, p. 272).
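The relation between T scores and percentiles mentioned above can be checked with the normal distribution function in base R. The helper t_to_percentile is hypothetical and assumes the usual T scale with mean 50 and standard deviation 10.

```r
# Illustrative sketch: convert T scores to percentiles (in percent),
# assuming a T scale with mean 50 and standard deviation 10.
t_to_percentile <- function(t, mean = 50, sd = 10) {
  100 * pnorm((t - mean) / sd)
}

lower <- t_to_percentile(25)   # percentile at the lower bound
upper <- t_to_percentile(75)   # percentile at the upper bound
coverage <- upper - lower      # share of the population between both bounds
round(c(lower = lower, upper = upper, coverage = coverage), 2)
```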
normTable( A, model, minNorm = NULL, maxNorm = NULL, minRaw = NULL, maxRaw = NULL, step = NULL, monotonuous = TRUE, CI = 0.9, reliability = NULL, pretty = T )
A |
the age as single value or a vector of age values |
model |
The regression model from the cnorm function |
minNorm |
The lower bound of the norm score range |
maxNorm |
The upper bound of the norm score range |
minRaw |
clipping parameter for the lower bound of raw scores |
maxRaw |
clipping parameter for the upper bound of raw scores |
step |
Stepping parameter with lower values indicating higher precision |
monotonuous |
corrects for decreasing norm scores in case of model inconsistencies (default) |
CI |
confidence coefficient, ranging from 0 to 1, default .9 |
reliability |
reliability coefficient, ranging from 0 to 1 |
pretty |
Format table by collapsing intervals and rounding to meaningful precision |
either a data.frame with norm scores, predicted raw scores and percentiles in case of a single A value, or a list of norm tables if a vector of A values was provided
Eid, M. & Schmidt, K. (2012). Testtheorie und Testkonstruktion. Hogrefe.
rawTable
Other predict: derivationTable(), getNormCurve(), predict.cnormBetaBinomial(), predict.cnormBetaBinomial2(), predictNorm(), predictRaw(), rawTable()
# Generate cnorm object from example data
cnorm.elfe <- cnorm(raw = elfe$raw, group = elfe$group)

# create single norm table
norms <- normTable(3.5, cnorm.elfe, minNorm = 25, maxNorm = 75, step = 0.5)

# create list of norm tables
norms <- normTable(c(2.5, 3.5, 4.5), cnorm.elfe,
  minNorm = 25, maxNorm = 75,
  step = 1, minRaw = 0, maxRaw = 26
)

# conventional norming, set age to arbitrary value
model <- cnorm(raw = elfe$raw)
normTable(0, model)
This function generates norm tables for specific ages based on the beta-binomial regression model. If a confidence coefficient (CI, default .9) and the reliability are specified, confidence intervals are computed for the true score estimates, including a correction for regression to the mean (Eid & Schmidt, 2012, p. 272).
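A confidence interval for the true score with correction for regression to the mean can be sketched with the common textbook formula. The following base R code is an assumption about the general approach, not a copy of the cNORM code; the helper true_score_ci and the T scale defaults (mean 50, SD 10) are chosen for illustration.

```r
# Illustrative sketch (hypothetical helper): confidence interval for the
# true score, using the textbook regression-to-the-mean correction.
true_score_ci <- function(norm, reliability, CI = 0.9, mean = 50, sd = 10) {
  z <- (norm - mean) / sd
  z_true <- reliability * z                      # estimate shrinks to the mean
  half <- qnorm(1 - (1 - CI) / 2) *
    sqrt(reliability * (1 - reliability))        # half width in z units
  data.frame(
    norm = norm,
    lowerCI = mean + sd * (z_true - half),
    upperCI = mean + sd * (z_true + half)
  )
}

ci <- true_score_ci(c(30, 50, 70), reliability = 0.9)
ci
```

Note that the intervals are centered on the estimated true score rather than on the observed norm score, so they are asymmetric around extreme observed scores.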
normTable.betabinomial( model, ages, n = NULL, m = NULL, range = 3, CI = 0.9, reliability = NULL )
model |
The model, which was fitted using the 'optimized.model' function. |
ages |
A numeric vector of age points at which to make predictions. |
n |
The number of items resp. the maximum score. |
m |
An optional stop criterion in table generation. Positive integer lower than n. |
range |
The range of the norm scores in standard deviations. Default is 3. Thus, scores in the range of +/- 3 standard deviations are considered. |
CI |
confidence coefficient, ranging from 0 to 1, default .9 |
reliability |
reliability coefficient, ranging from 0 to 1 |
A list of data frames with columns: x, Px, Pcum, Percentile, z, norm score and possibly confidence interval
S3 function for plotting cnorm objects
## S3 method for class 'cnorm'
plot(x, y, ...)
x |
the cnorm object |
y |
the type of plot, given either as a string or as the corresponding index: 'raw' (1), 'norm' (2), 'curves' (3), 'percentiles' (4), 'series' (5), 'subset' (6), or 'derivative' (7) |
... |
additional parameters for the specific plotting function |
Other plot: compare(), plot.cnormBetaBinomial(), plot.cnormBetaBinomial2(), plotDensity(), plotDerivative(), plotNorm(), plotNormCurves(), plotPercentileSeries(), plotPercentiles(), plotRaw(), plotSubset()
This function creates a visualization of a fitted cnormBetaBinomial model, including the original data points, manifest percentiles, and specified percentile lines.
## S3 method for class 'cnormBetaBinomial'
plot(x, ...)
x |
A fitted model object of class "cnormBetaBinomial" or "cnormBetaBinomial2". |
... |
Additional arguments passed to the plot method. |
A ggplot object.
Other plot: compare(), plot.cnorm(), plot.cnormBetaBinomial2(), plotDensity(), plotDerivative(), plotNorm(), plotNormCurves(), plotPercentileSeries(), plotPercentiles(), plotRaw(), plotSubset()
## Not run:
# Computing beta binomial models already displays plot
model.bb <- cnorm.betabinomial(elfe$group, elfe$raw)

# Without data points
plot(model.bb, age = elfe$group, score = elfe$raw, weights = NULL, points = FALSE)
## End(Not run)
This function creates a visualization of a fitted cnormBetaBinomial model, including the original data points, manifest percentiles, and specified percentile lines.
## S3 method for class 'cnormBetaBinomial2'
plot(x, ...)
x |
A fitted model object of class "cnormBetaBinomial" or "cnormBetaBinomial2". |
... |
Additional arguments passed to the plot method. |
A ggplot object.
Other plot: compare(), plot.cnorm(), plot.cnormBetaBinomial(), plotDensity(), plotDerivative(), plotNorm(), plotNormCurves(), plotPercentileSeries(), plotPercentiles(), plotRaw(), plotSubset()
General convenience plotting function
plotCnorm(x, y, ...)
x |
a cnorm object |
y |
the type of plot, given either as a string or as the corresponding index: 'raw' (1), 'norm' (2), 'curves' (3), 'percentiles' (4), 'series' (5), 'subset' (6), or 'derivative' (7) |
... |
additional parameters for the specific plotting function |
This function plots density curves based on the regression model against the raw scores. It supports both traditional continuous norming models and beta-binomial models. The function allows for customization of the plot range and groups to be displayed.
plotDensity( model, minRaw = NULL, maxRaw = NULL, minNorm = NULL, maxNorm = NULL, group = NULL )
model |
The model from the bestModel function, a cnorm object, or a cnormBetaBinomial or cnormBetaBinomial2 object. |
minRaw |
Lower bound of the raw score. If NULL, it's automatically determined based on the model type. |
maxRaw |
Upper bound of the raw score. If NULL, it's automatically determined based on the model type. |
minNorm |
Lower bound of the norm score. If NULL, it's automatically determined based on the model type. |
maxNorm |
Upper bound of the norm score. If NULL, it's automatically determined based on the model type. |
group |
Numeric vector specifying the age groups to plot. If NULL, groups are automatically selected. |
The function generates density curves for specified age groups, allowing for easy comparison of score distributions across different ages.
For beta-binomial models, the density is based on the probability mass function, while for traditional models, it uses a normal distribution based on the norm scores.
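For the beta-binomial case, the probability mass function underlying such a density plot can be sketched in base R. The helper dbetabinom_sketch and the shape parameters a and b are hypothetical; in a fitted model, the shape parameters would depend on age.

```r
# Illustrative sketch (hypothetical helper): beta-binomial probability mass
# function for x successes out of n items with shape parameters a and b.
dbetabinom_sketch <- function(x, n, a, b) {
  exp(lchoose(n, x) + lbeta(x + a, n - x + b) - lbeta(a, b))
}

# Toy example: test with 28 items and fixed shape parameters
scores <- 0:28
p <- dbetabinom_sketch(scores, n = 28, a = 4, b = 3)

sum(p)            # probabilities over all possible raw scores
sum(scores * p)   # expected raw score, equals n * a / (a + b)
```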
A ggplot object representing the density functions.
Please check for inconsistent curves, especially those showing implausible shapes such as violations of biuniqueness in the cnorm models.
plotNormCurves, plotPercentiles
Other plot: compare(), plot.cnorm(), plot.cnormBetaBinomial(), plot.cnormBetaBinomial2(), plotDerivative(), plotNorm(), plotNormCurves(), plotPercentileSeries(), plotPercentiles(), plotRaw(), plotSubset()
## Not run:
# For traditional continuous norming model
result <- cnorm(raw = elfe$raw, group = elfe$group)
plotDensity(result, group = c(2, 4, 6))

# For beta-binomial model
bb_model <- cnorm.betabinomial(age = ppvt$age, score = ppvt$raw, n = 228)
plotDensity(bb_model)
## End(Not run)
This function plots the scores obtained via the first order derivative of the regression model in dependence of the norm score.
plotDerivative( model, minAge = NULL, maxAge = NULL, minNorm = NULL, maxNorm = NULL, stepAge = NULL, stepNorm = NULL, order = 1 )
model |
The model from the bestModel function, a cnorm object. |
minAge |
Minimum age to start checking. If NULL, it's automatically determined from the model. |
maxAge |
Maximum age for checking. If NULL, it's automatically determined from the model. |
minNorm |
Lower end of the norm score range. If NULL, it's automatically determined from the model. |
maxNorm |
Upper end of the norm score range. If NULL, it's automatically determined from the model. |
stepAge |
Stepping parameter for the age check, usually 1 or 0.1; lower values indicate higher precision. |
stepNorm |
Stepping parameter for norm scores. |
order |
Degree of the derivative (default = 1). |
The results indicate the progression of the norm scores within each age group. The regression-based modeling approach relies on the assumption of a linear progression of the norm scores. Negative scores in the first order derivative indicate a violation of this assumption. Scores near zero are typical for bottom and ceiling effects in the raw data.
The regression models usually converge within the range of the original values. In case of vertical and horizontal extrapolation, with increasing distance to the original data, the risk of assumption violation increases as well.
A ggplot object representing the derivative of the regression function.
This function is currently incompatible with reversed raw score scales ('descent' option).
checkConsistency, bestModel, derive
Other plot: compare(), plot.cnorm(), plot.cnormBetaBinomial(), plot.cnormBetaBinomial2(), plotDensity(), plotNorm(), plotNormCurves(), plotPercentileSeries(), plotPercentiles(), plotRaw(), plotSubset()
# For traditional continuous norming model
result <- cnorm(raw = elfe$raw, group = elfe$group)
plotDerivative(result, minAge = 2, maxAge = 5, stepAge = .2,
               minNorm = 25, maxNorm = 75, stepNorm = 1)
This function plots the manifest norm score against the fitted norm score from the inverse regression model per group. This helps to inspect the precision of the modeling process. The scores should not deviate too far from the regression line. Applicable for Taylor polynomial models.
plotNorm( model, age = NULL, score = NULL, width = NULL, weights = NULL, group = FALSE, minNorm = NULL, maxNorm = NULL, type = 0 )
model |
The regression model, usually from the 'cnorm' or 'cnorm.betabinomial' function |
age |
In case of beta binomial model, please provide the age vector |
score |
In case of beta binomial model, please provide the score vector |
width |
In case of beta binomial model, please provide the width for the sliding window. If null, the function tries to determine a sensible setting. |
weights |
Vector or variable name in the dataset with weights for each individual case. If NULL, no weights are used. |
group |
An optional grouping variable; use an empty string for no group, the variable name for Taylor polynomial models or a vector with the groups for beta binomial models |
minNorm |
lower bound of fitted norm scores |
maxNorm |
upper bound of fitted norm scores |
type |
Type of display: 0 = plot manifest against fitted values, 1 = plot manifest against difference values |
A ggplot object representing the norm scores plot.
Other plot: compare(), plot.cnorm(), plot.cnormBetaBinomial(), plot.cnormBetaBinomial2(), plotDensity(), plotDerivative(), plotNormCurves(), plotPercentileSeries(), plotPercentiles(), plotRaw(), plotSubset()
## Not run:
# Load example data set, compute model and plot results

# Taylor polynomial model
model <- cnorm(raw = elfe$raw, group = elfe$group)
plot(model, "norm")

# Beta binomial models; maximum number of items in elfe is n = 28
model.bb <- cnorm.betabinomial(elfe$group, elfe$raw, n = 28)
plotNorm(model.bb, age = elfe$group, score = elfe$raw)
## End(Not run)
This function plots the norm curves based on the regression model. It supports both Taylor polynomial models and beta-binomial models.
plotNormCurves( model, normList = NULL, minAge = NULL, maxAge = NULL, step = 0.1, minRaw = NULL, maxRaw = NULL )
model |
The model from the bestModel function, a cnorm object, or a cnormBetaBinomial / cnormBetaBinomial2 object. |
normList |
Vector with norm scores to display. If NULL, default values are used. |
minAge |
Age to start with checking. If NULL, it's automatically determined from the model. |
maxAge |
Upper end of the age check. If NULL, it's automatically determined from the model. |
step |
Stepping parameter for the age check, usually 1 or 0.1; lower values indicate higher precision. |
minRaw |
Lower end of the raw score range, used for clipping implausible results. If NULL, it's automatically determined from the model. |
maxRaw |
Upper end of the raw score range, used for clipping implausible results. If NULL, it's automatically determined from the model. |
Please check the function for inconsistent curves: The different curves should not intersect. Violations of this assumption are a strong indication of violations of model assumptions in modeling the relationship between raw and norm scores.
Common reasons for inconsistencies include:
1. Vertical extrapolation: Choosing extreme norm scores (e.g., scores <= -3 or >= 3).
2. Horizontal extrapolation: Using the model scores outside the original dataset.
3. The data cannot be modeled with the current approach, or you need another power parameter (k) or R2 for the model.
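In addition to visual inspection, intersecting curves can be detected numerically. The following base R sketch assumes a matrix of predicted raw scores with one row per age and one column per norm score in increasing order; the helper curves_consistent and the toy values are hypothetical.

```r
# Illustrative sketch (hypothetical helper): norm curves are consistent if,
# at every age, the predicted raw score does not decrease with the norm score.
curves_consistent <- function(pred) {
  all(apply(pred, 1, function(row) all(diff(row) >= 0)))
}

# Toy predictions: rows = ages, columns = norm scores 30, 40, ..., 70
ages <- seq(2, 5, by = 0.5)
norms <- c(30, 40, 50, 60, 70)
ok <- outer(ages, norms, function(a, l) 5 * a + 0.3 * l)

# Introduce an intersection at the youngest age
bad <- ok
bad[1, 2] <- bad[1, 1] - 1

curves_consistent(ok)
curves_consistent(bad)
```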
A ggplot object representing the norm curves.
checkConsistency, plotDerivative, plotPercentiles
Other plot: compare(), plot.cnorm(), plot.cnormBetaBinomial(), plot.cnormBetaBinomial2(), plotDensity(), plotDerivative(), plotNorm(), plotPercentileSeries(), plotPercentiles(), plotRaw(), plotSubset()
## Not run:
# For Taylor continuous norming model
m <- cnorm(raw = ppvt$raw, group = ppvt$group)
plotNormCurves(m, minAge = 2, maxAge = 5)

# For beta-binomial model
bb_model <- cnorm.betabinomial(age = ppvt$age, score = ppvt$raw, n = 228)
plotNormCurves(bb_model)
## End(Not run)
The function plots the norm curves based on the regression model against the actual percentiles from the raw data. As in 'plotNormCurves', please check for inconsistent curves, especially intersections. Violations of this assumption are a strong indication of problems in modeling the relationship between raw and norm scores. Extrapolation (vertical and horizontal) beyond the original sample is possible to a certain degree, but it should be handled with caution. The original percentiles are displayed as distinct points in the corresponding color, and the model-based projections of the percentiles are drawn as lines. Please note that the percentiles of the raw data are estimated with the quantile function with the default settings. If you get 'jagged' or disorganized percentile curves, try to reduce the 'k' and/or 't' parameter in modeling.
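The manifest percentiles per group can be reproduced in base R with the quantile function and its default settings. The following sketch uses simulated data; the data frame toy and its columns are hypothetical and only mimic a grouped norm sample.

```r
# Illustrative sketch with simulated data: manifest percentiles per group,
# computed with quantile() and its default settings.
set.seed(123)
toy <- data.frame(group = rep(c(2, 3, 4), each = 200))
toy$raw <- round(rnorm(600, mean = 5 * toy$group, sd = 3))

probs <- c(0.025, 0.1, 0.25, 0.5, 0.75, 0.9, 0.975)

# One row per group, one column per percentile
manifest <- t(sapply(split(toy$raw, toy$group), quantile, probs = probs))
manifest
```

These values correspond to the points in such a plot, while the lines would stem from the fitted model.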
plotPercentiles( model, minRaw = NULL, maxRaw = NULL, minAge = NULL, maxAge = NULL, raw = NULL, group = NULL, percentiles = c(0.025, 0.1, 0.25, 0.5, 0.75, 0.9, 0.975), scale = NULL, title = NULL, subtitle = NULL, points = F )
model |
The Taylor polynomial regression model object from the cNORM |
minRaw |
Lower bound of the raw score (default = 0) |
maxRaw |
Upper bound of the raw score |
minAge |
Variable to restrict the lower bound of the plot to a specific age |
maxAge |
Variable to restrict the upper bound of the plot to a specific age |
raw |
The name of the raw variable |
group |
The name of the grouping variable; the distinct groups are automatically determined |
percentiles |
Vector with percentile scores, ranging from 0 to 1 (exclusive) |
scale |
The norm scale, either 'T', 'IQ', 'z', 'percentile' or self-defined with a double vector containing the mean and standard deviation, e.g. c(10, 3) for Wechsler scale index points; if NULL, scale information from the data preparation is used (default) |
title |
custom title for plot |
subtitle |
custom subtitle for plot |
points |
Logical indicating whether to plot the data points. Default is FALSE. |
plotNormCurves, plotPercentileSeries
Other plot: compare(), plot.cnorm(), plot.cnormBetaBinomial(), plot.cnormBetaBinomial2(), plotDensity(), plotDerivative(), plotNorm(), plotNormCurves(), plotPercentileSeries(), plotRaw(), plotSubset()
# Load example data set, compute model and plot results result <- cnorm(raw = elfe$raw, group = elfe$group) plotPercentiles(result)
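The observed percentiles that plotPercentiles shows as points can be approximated in base R. The following is a minimal sketch, not the package's internal code: the data frame `d` and its simulated values are hypothetical, and the only thing taken from the description above is that `quantile()` is applied with its default settings within each group.

```r
# Hypothetical example data: three age groups with increasing raw scores
set.seed(1)
d <- data.frame(
  group = rep(c(2, 3, 4), each = 200),
  raw   = c(rpois(200, 10), rpois(200, 14), rpois(200, 18))
)

# Same percentile levels as the default of plotPercentiles()
p <- c(0.025, 0.1, 0.25, 0.5, 0.75, 0.9, 0.975)

# One row per group, one column per percentile (quantile() with defaults)
observed <- t(sapply(split(d$raw, d$group), quantile, probs = p))
print(round(observed, 1))
```

Within each row the values can only stay equal or increase from left to right; the model-based percentile lines should follow these points without crossing each other.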
This function makes use of 'plotPercentiles' to generate a series of plots with different numbers of predictors. It draws on the information provided by the model object to determine the bounds of the modeling (age and standard score range). It can be used as an additional model check to determine the best fitting model. Please have a look at the 'plotPercentiles' function for further information.
plotPercentileSeries( model, start = 1, end = NULL, group = NULL, percentiles = c(0.025, 0.1, 0.25, 0.5, 0.75, 0.9, 0.975), filename = NULL )
model |
The Taylor polynomial regression model object from the cNORM |
start |
Number of predictors to start with |
end |
Number of predictors to end with |
group |
The name of the grouping variable; the distinct groups are automatically determined |
percentiles |
Vector with percentile scores, ranging from 0 to 1 (exclusive) |
filename |
Prefix of the filename. If specified, the plots are saved as png files in the directory of the workspace instead of being displayed |
the complete list of plots
plotPercentiles
Other plot: compare(), plot.cnorm(), plot.cnormBetaBinomial(), plot.cnormBetaBinomial2(), plotDensity(), plotDerivative(), plotNorm(), plotNormCurves(), plotPercentiles(), plotRaw(), plotSubset()
# Load example data set, compute model and plot results result <- cnorm(raw = elfe$raw, group = elfe$group) plotPercentileSeries(result, start=4, end=6)
The function plots the raw data against the fitted scores from the regression model per group. This helps to inspect the precision of the modeling process. The scores should not deviate too far from the regression line.
plotRaw(model, group = FALSE, raw = NULL, type = 0)
model |
The regression model from the 'cnorm' function |
group |
Should the fit be displayed by group? |
raw |
Vector of the observed raw data |
type |
Type of display: 0 = plot manifest against fitted values, 1 = plot manifest against difference values |
Other plot: compare(), plot.cnorm(), plot.cnormBetaBinomial(), plot.cnormBetaBinomial2(), plotDensity(), plotDerivative(), plotNorm(), plotNormCurves(), plotPercentileSeries(), plotPercentiles(), plotSubset()
# Compute model with example dataset and plot results result <- cnorm(raw = elfe$raw, group = elfe$group) plotRaw(result)
This function plots various information criteria and model fit statistics against the number of predictors or adjusted R-squared, depending on the type of plot selected. It helps in model selection by visualizing different aspects of model performance. Models that did not pass the initial consistency check are depicted with an empty circle.
plotSubset(model, type = 0)
model |
The regression model from the bestModel function or a cnorm object. |
type |
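Integer specifying the type of plot to generate: 0 = adjusted R2 (default), 1 = Mallow's Cp, 2 = BIC, 3 = RMSE, 4 = RSS, 5 = F-test statistic, 6 = p-values |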
The function generates different plots to help in model selection:
- For types 1 and 2 (Mallow's Cp and BIC), look for the "elbow" in the curve where the information criterion begins to drop. This often indicates a good balance between model fit and complexity.
- For type 0 (Adjusted R2), higher values indicate better fit, but be cautious of overfitting with values approaching 1.
- For types 3 and 4 (RMSE and RSS), lower values indicate better fit.
- For type 5 (F-test), higher values suggest significant improvement with added predictors.
- For type 6 (p-values), values below the significance level (typically 0.05) suggest significant improvement with added predictors.
A ggplot object representing the selected information criterion plot.
It's important to balance statistical measures with practical considerations and to visually inspect the model fit using functions like plotPercentiles.
bestModel, plotPercentiles, printSubset
Other plot: compare(), plot.cnorm(), plot.cnormBetaBinomial(), plot.cnormBetaBinomial2(), plotDensity(), plotDerivative(), plotNorm(), plotNormCurves(), plotPercentileSeries(), plotPercentiles(), plotRaw()
# Compute model with example data and plot information function cnorm.model <- cnorm(raw = elfe$raw, group = elfe$group) plotSubset(cnorm.model) # Plot BIC against adjusted R-squared plotSubset(cnorm.model, type = 2) # Plot RMSE against number of predictors plotSubset(cnorm.model, type = 3)
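To make the trade-off between fit and complexity concrete, the criteria can be computed by hand for a series of increasingly complex models. This is only an illustration under simplifying assumptions: it uses simulated data and plain nested polynomial fits with `lm()`, not the subset search that the package performs, and all object names are hypothetical.

```r
# Simulated data with a quadratic trend (hypothetical)
set.seed(42)
x <- runif(200, 0, 10)
y <- 2 + 1.5 * x - 0.1 * x^2 + rnorm(200)

# Nested polynomial models with 1 to 5 predictors
fits <- lapply(1:5, function(d) lm(y ~ poly(x, d, raw = TRUE)))

# Fit statistics by number of predictors
criteria <- data.frame(
  terms = 1:5,
  adjR2 = sapply(fits, function(m) summary(m)$adj.r.squared),
  BIC   = sapply(fits, BIC),
  RMSE  = sapply(fits, function(m) sqrt(mean(residuals(m)^2)))
)
print(round(criteria, 3))
```

RMSE can only decrease as terms are added, whereas BIC penalizes the additional terms, which is why the point where the curves level off is the more informative guide.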
A dataset based on an unstratified sample of PPVT4 data (German adaptation). The PPVT4 consists of blocks of 12 items each. Each item consists of 4 pictures. The test taker is given a word orally and has to point to the picture matching the word. Bottom and ceiling blocks of items are determined according to age and performance. For instance, when a student knows fewer than 4 words from a block of 12 items, the testing stops. The sample is not identical to the norm sample and includes duplicate cases in order to align the sample size per age group. It is primarily intended for running the cNORM analyses with regard to modeling and stratification.
ppvt
A data frame with 4542 rows and 6 variables:
age |
the chronological age of the child |
sex |
the sex of the test taker, 1=male, 2=female |
migration |
migration status of the family, 0=no, 1=yes |
region |
factor specifying the region where the data were collected; grouped into south, north, east and west |
raw |
the raw score of the student, spanning values from 0 to 228 |
group |
age group of the child, determined by the getGroups()-function with 12 equidistant age groups |
A data frame with 5600 rows and 9 columns
https://www.psychometrica.de/ppvt4.html
Lenhard, A., Lenhard, W., Segerer, R. & Suggate, S. (2015). Peabody Picture Vocabulary Test - Revision IV (Deutsche Adaption). Frankfurt a. M./Germany: Pearson Assessment.
## Not run: # Example with continuous age variable, ranked with sliding window model.ppvt.sliding <- cnorm(age=ppvt$age, raw=ppvt$raw, width=1) # Example with age groups; you might first want to experiment with # the granularity of the groups via the 'getGroups()' function model.ppvt.group <- cnorm(group=ppvt$group, raw=ppvt$raw) # with predefined groups model.ppvt.group <- cnorm(group=getGroups(ppvt$age, n=15, equidistant = T), raw=ppvt$raw) # groups built 'on the fly' # plot information function plot(model.ppvt.group, "subset") # check model consistency checkConsistency(model.ppvt.group) # plot percentiles plot(model.ppvt.group, "percentiles") ## End(Not run)
This function calculates norm scores based on raw scores, age, and a fitted cnormBetaBinomial model.
## S3 method for class 'cnormBetaBinomial' predict(object, ...)
object |
A fitted model object of class 'cnormBetaBinomial' or 'cnormBetaBinomial2'. |
... |
Additional arguments passed to the prediction method; the example passes a vector of ages followed by a vector of raw scores |
The function first predicts the alpha and beta parameters of the beta-binomial distribution for each age using the provided model. It then calculates the cumulative probability for each raw score given these parameters. Finally, it converts these probabilities to the norm scale specified in the model.
A numeric vector of norm scores.
Other predict: derivationTable(), getNormCurve(), normTable(), predict.cnormBetaBinomial2(), predictNorm(), predictRaw(), rawTable()
## Not run: # Fit a beta-binomial model to the ppvt data and predict norm scores bb_model <- cnorm.betabinomial(ppvt$age, ppvt$raw) raw <- c(100, 121, 97, 180) ages <- c(7, 8, 9, 10) norm_scores <- predict(bb_model, ages, raw) ## End(Not run)
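The three steps described for this method (obtain alpha and beta, compute the cumulative probability of the raw score, convert it to the norm scale) can be sketched in base R. This is an illustration only: the helper names `dbetabinom` and `bb_norm` are hypothetical, alpha and beta are passed in as fixed numbers rather than predicted from age, and the package may treat the cumulative probability differently (for example at the boundaries).

```r
# Beta-binomial probability mass function, computed on the log scale
dbetabinom <- function(x, n, a, b) {
  exp(lchoose(n, x) + lbeta(x + a, n - x + b) - lbeta(a, b))
}

# Norm score for raw score 'x' on a test with maximum score 'n',
# given the shape parameters a (alpha) and b (beta) at one age
bb_norm <- function(x, n, a, b, M = 50, SD = 10) {
  pmf <- dbetabinom(0:n, n, a, b)
  cdf <- cumsum(pmf)[x + 1]               # P(X <= x)
  cdf <- pmin(pmax(cdf, 1e-6), 1 - 1e-6)  # keep qnorm() finite
  M + SD * qnorm(cdf)                     # here: T scale
}

# Assumed parameters for a single age level (hypothetical values)
scores <- bb_norm(0:20, n = 20, a = 2, b = 3)
print(round(scores, 1))
```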
This function calculates norm scores based on raw scores, age, and a fitted cnormBetaBinomial2 model.
## S3 method for class 'cnormBetaBinomial2' predict(object, ...)
object |
A fitted model object of class 'cnormBetaBinomial' or 'cnormBetaBinomial2'. |
... |
Additional arguments passed to the prediction method; the example passes a vector of ages followed by a vector of raw scores |
The function first predicts the alpha and beta parameters of the beta-binomial distribution for each age using the provided model. It then calculates the cumulative probability for each raw score given these parameters. Finally, it converts these probabilities to the norm scale specified in the model.
A numeric vector of norm scores.
Other predict: derivationTable(), getNormCurve(), normTable(), predict.cnormBetaBinomial(), predictNorm(), predictRaw(), rawTable()
## Not run: # Fit a beta-binomial model to the ppvt data and predict norm scores bb_model <- cnorm.betabinomial(ppvt$age, ppvt$raw) raw <- c(100, 121, 97, 180) ages <- c(7, 8, 9, 10) norm_scores <- predict(bb_model, ages, raw) ## End(Not run)
This function numerically determines the norm score for raw scores depending on the level of the explanatory variable A, e.g. norm scores for raw scores at given ages.
predictNorm( raw, A, model, minNorm = NULL, maxNorm = NULL, force = FALSE, silent = FALSE )
raw |
The raw value, either single numeric or numeric vector |
A |
the explanatory variable (e. g. age), either single numeric or numeric vector |
model |
The regression model or a cnorm object |
minNorm |
The lower bound of the norm score range |
maxNorm |
The upper bound of the norm score range |
force |
Try to resolve missing norm scores in case of inconsistent models |
silent |
set to TRUE to suppress messages |
The predicted norm score for a raw score, either single value or vector
Other predict: derivationTable(), getNormCurve(), normTable(), predict.cnormBetaBinomial(), predict.cnormBetaBinomial2(), predictRaw(), rawTable()
# Generate cnorm object from example data cnorm.elfe <- cnorm(raw = elfe$raw, group = elfe$group) # return norm value for raw value 21 for grade 2, month 9 specificNormValue <- predictNorm(raw = 21, A = 2.75, cnorm.elfe) # predicted norm scores for the elfe dataset # predictNorm(elfe$raw, elfe$group, cnorm.elfe)
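Determining a norm score numerically means inverting the regression function, which returns a raw score for a given norm score and age. The sketch below shows the idea with `uniroot()` from base R; this is a swapped-in root finder, not the search strategy of the package, and the regression function `raw_fun` with its coefficients is entirely hypothetical. It also assumes that the raw score increases with the norm score.

```r
# Hypothetical regression function: raw score from norm score and age
raw_fun <- function(norm, age) -10 + 0.3 * norm + 4 * age + 0.05 * norm * age

# Find the norm score whose predicted raw score equals the observed one
predict_norm_sketch <- function(raw, age, minNorm = 20, maxNorm = 80) {
  f <- function(norm) raw_fun(norm, age) - raw
  # outside the modeled range the result is clipped to the bounds
  if (f(minNorm) >= 0) return(minNorm)
  if (f(maxNorm) <= 0) return(maxNorm)
  uniroot(f, lower = minNorm, upper = maxNorm, tol = 1e-8)$root
}

single  <- predict_norm_sketch(raw = 21, age = 2.75)
several <- mapply(predict_norm_sketch, raw = c(15, 21, 30), age = 2.75)
print(round(several, 2))
```

The bounds play the role of minNorm and maxNorm: raw scores that would require a norm score outside the interval are mapped to the nearest bound instead of being extrapolated.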
The most elementary function for predicting a raw score from the location (L, e.g. a T score), the age (A, e.g. the grouping variable) and the coefficients of a regression model.
predictRaw(norm, age, coefficients, minRaw = -Inf, maxRaw = Inf)
norm |
The norm score, e. g. a specific T score or a vector of scores |
age |
The age value or a vector of age values |
coefficients |
A cnorm object or the coefficients from the regression model |
minRaw |
Minimum score for the results; can be used for clipping unrealistic outcomes, usually set to the lower bound of the range of values of the test (default: -Inf, i.e. no clipping) |
maxRaw |
Maximum score for the results; can be used for clipping unrealistic outcomes, usually set to the upper bound of the range of values of the test (default: Inf, i.e. no clipping) |
the predicted raw score, or a data.frame of scores if vectors of norm scores or ages are used
Other predict: derivationTable(), getNormCurve(), normTable(), predict.cnormBetaBinomial(), predict.cnormBetaBinomial2(), predictNorm(), rawTable()
# Prediction of single scores model <- cnorm(raw = elfe$raw, group = elfe$group) predictRaw(35, 3.5, model)
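Conceptually, the prediction evaluates a polynomial in the norm score (L) and age (A) and then clips the result to the admissible score range. The following sketch assumes a deliberately small polynomial with four hypothetical coefficients; a fitted model would contain the terms selected during modeling, and `predict_raw_sketch` is not part of the package.

```r
# Evaluate a small polynomial in L (norm score) and A (age), then clip
predict_raw_sketch <- function(norm, age, coef, minRaw = -Inf, maxRaw = Inf) {
  raw <- coef[["Intercept"]] + coef[["L1"]] * norm + coef[["A1"]] * age +
    coef[["L1A1"]] * norm * age
  pmin(pmax(raw, minRaw), maxRaw)  # clip to the admissible score range
}

# Hypothetical coefficients, for illustration only
coef <- c(Intercept = -10, L1 = 0.3, A1 = 4, L1A1 = 0.05)

unclipped <- predict_raw_sketch(35, 3.5, coef)
clipped   <- predict_raw_sketch(c(20, 50, 80), 3.5, coef, minRaw = 0, maxRaw = 28)
print(unclipped)
print(clipped)
```

With minRaw and maxRaw set to the score range of the test, predictions for extreme norm scores cannot leave that range.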
This is a convenience method to either load the inbuilt sample dataset, or to provide a data frame with the variables "raw" (for the raw scores) and "group". The function ranks the data within groups and computes norm values, powers of the norm scores and interactions. Afterwards, you can use these preprocessed data to determine the best fitting model.
prepareData( data = NULL, group = "group", raw = "raw", age = "group", k = 4, t = NULL, width = NA, weights = NULL, scale = "T", descend = FALSE, silent = FALSE )
data |
data.frame with a grouping variable named 'group' and a raw score variable named 'raw'. |
group |
grouping variable in the data, e. g. age groups, grades ... Setting group = FALSE deactivates modeling in dependence of age. Use this in case you do want conventional norm tables. |
raw |
the raw scores |
age |
the continuous explanatory variable; by default set to "group" |
k |
The power parameter, default = 4 |
t |
the age power parameter (default NULL). If not set, cNORM automatically uses k. It can be used to produce rectangular matrices and to specify the course of the scores over age independently from k |
width |
if a width is provided, the function switches to rankBySlidingWindow to determine the observed raw scores, otherwise, ranking is done by group (default) |
weights |
Vector or variable name in the dataset with weights for each individual case. It can be used to compensate for moderate imbalances due to insufficient norm data stratification. Weights should be numerical and positive. Please use the 'computeWeights' function for this purpose. |
scale |
type of norm scale, either T (default), IQ, z or percentile (= no transformation); a double vector with the mean and standard deviation can be provided as well, e.g. c(10, 3) for Wechsler scale index points |
descend |
ranking order (default descend = FALSE): inverts the ranking order so that higher raw scores get lower norm scores; relevant, for example, when norming error scores, where lower scores mean higher performance |
silent |
set to TRUE to suppress messages |
The functions rankBySlidingWindow, rankByGroup, bestModel, computePowers and prepareData are usually not called directly, but accessed through other functions like cnorm.
data frame including the norm scores, powers and interactions of the norm score and grouping variable
Other prepare: computePowers(), rankByGroup(), rankBySlidingWindow()
# conducts ranking and computation of powers and interactions with the 'elfe' dataset data.elfe <- prepareData(elfe) # use vectors instead of data frame data.elfe <- prepareData(raw=elfe$raw, group=elfe$group) # variable names can be specified as well, here with the BMI data included in the package ## Not run: data.bmi <- prepareData(CDC, group = "group", raw = "bmi", age = "age") ## End(Not run) # modeling with only one group with the 'elfe' dataset as an example # this results in conventional norming data.elfe2 <- prepareData(data = elfe, group = FALSE) m <- bestModel(data.elfe2)
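The "powers and interactions" step can be pictured as building a design matrix from the norm score (L) and the explanatory variable (A). The sketch below is an assumption about the general structure, not the package code: the function name and the column naming scheme (L1, A1, L1A1, ...) are chosen here for illustration.

```r
# Powers of L and A up to k and t, plus all products L^i * A^j
compute_powers_sketch <- function(L, A, k = 4, t = k) {
  out <- list()
  for (i in 1:k) out[[paste0("L", i)]] <- L^i
  for (j in 1:t) out[[paste0("A", j)]] <- A^j
  for (i in 1:k) for (j in 1:t)
    out[[paste0("L", i, "A", j)]] <- L^i * A^j
  as.data.frame(out)
}

p <- compute_powers_sketch(L = c(40, 50, 60), A = c(2, 3, 4))
print(ncol(p))
```

With k = t = 4 this gives 4 + 4 + 16 = 24 candidate predictors, from which the model selection then picks a subset; setting t below k yields a rectangular set of terms.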
Format raw and norm tables. The function takes a raw or norm table, condenses intervals at the bottom and top, and rounds the numbers to a meaningful precision.
prettyPrint(table)
table |
The table to format |
formatted table
After conducting the model fitting procedure on the data set, the best fitting model has to be chosen. The print function shows the R2 and other information on the different best fitting models with increasing number of predictors.
## S3 method for class 'cnorm' print(x, ...)
x |
The model from the 'bestModel' function or a cnorm object |
... |
additional parameters |
A table with information criteria
Other model: bestModel(), checkConsistency(), cnorm.cv(), derive(), modelSummary(), printSubset(), rangeCheck(), regressionFunction(), summary.cnorm()
Displays R^2 and other metrics for models with varying predictors, aiding in choosing the best-fitting model after model fitting.
printSubset(x, ...)
x |
Model output from 'bestModel' or a cnorm object. |
... |
Additional parameters. |
Table with model information criteria.
Other model: bestModel(), checkConsistency(), cnorm.cv(), derive(), modelSummary(), print.cnorm(), rangeCheck(), regressionFunction(), summary.cnorm()
# Using cnorm object from sample data result <- cnorm(raw = elfe$raw, group = elfe$group) printSubset(result)
Regression models only work within a specific range, and extrapolation horizontally (outside the original age range) or vertically (extreme norm scores) might lead to inconsistent results. The function generates a message indicating extrapolation and the range of the original data.
rangeCheck( object, minAge = NULL, maxAge = NULL, minNorm = NULL, maxNorm = NULL, digits = 3, ... )
object |
The regression model or a cnorm object |
minAge |
The lower age bound |
maxAge |
The upper age bound |
minNorm |
The lower norm value bound |
maxNorm |
The upper norm value bound |
digits |
The precision for rounding the norm and age data |
... |
additional parameters |
the report
Other model: bestModel(), checkConsistency(), cnorm.cv(), derive(), modelSummary(), print.cnorm(), printSubset(), regressionFunction(), summary.cnorm()
m <- cnorm(raw = elfe$raw, group = elfe$group) rangeCheck(m)
This is the initial step in most test norming projects, carried out after the scale has been constructed and the norm sample established. First, the data are grouped according to a grouping variable; afterwards, the percentile for each raw value is retrieved. The percentiles can be used directly for the modeling procedure, but if the samples do not deviate too much from normality, T, IQ or z scores can be computed via a normal rank procedure based on the inverse cumulative normal distribution. Tied observations receive the average rank, and several methods are available for estimating the percentiles (default: Rankit).
rankByGroup( data = NULL, group = "group", raw = "raw", weights = NULL, method = 4, scale = "T", descend = FALSE, descriptives = TRUE, na.rm = TRUE, silent = FALSE )
data |
data.frame with norm sample data. If no data.frame is provided, the raw score and group vectors are directly used |
group |
name of the grouping variable (default 'group') or numeric vector, e.g. grade; setting group to FALSE cancels grouping (data is treated as one group) |
raw |
name of the raw value variable (default 'raw') or numeric vector |
weights |
Vector or variable name in the dataset with weights for each individual case. It can be used to compensate for moderate imbalances due to insufficient norm data stratification. Weights should be numerical and positive. Please use the 'computeWeights' function for this purpose. |
method |
Ranking method for estimating the percentiles, specified by index: 1 = Blom (1958), 2 = Tukey (1949), 3 = Van der Waerden (1952), 4 = Rankit (default), 5 = Levenbach (1953), 6 = Filliben (1975), 7 = Yu & Huang (2001) |
scale |
type of norm scale, either T (default), IQ, z or percentile (= no transformation); a double vector with the mean and standard deviation can be provided as well, e.g. c(10, 3) for Wechsler scale index points |
descend |
ranking order (default descend = FALSE): inverts the ranking order so that higher raw scores get lower norm scores; relevant, for example, when norming error scores, where lower scores mean higher performance |
descriptives |
If set to TRUE (default), information on n, mean, median and standard deviation per group is added to each observation |
na.rm |
remove cases for which the percentiles could not be estimated; this most likely happens in the context of weighting |
silent |
set to TRUE to suppress messages |
the dataset with the percentiles and norm scales per group
So far, the inclusion of a binary covariate is experimental and far from optimized. The corresponding variable name has to be specified in the ranking procedure, and the modeling includes it in the further process. At the moment, the data are split into the group x covariate cells during ranking, which leads to small sample sizes. Please make sure to have enough cases in each combination. Additionally, covariates can lead to unstable modeling solutions. Whether it is reasonable to include covariates when norming a test is a decision that goes beyond pure data modeling. Please use with care, or alternatively split the dataset into the two groups beforehand and model them separately.
The functions rankBySlidingWindow, rankByGroup, bestModel, computePowers and prepareData are usually not called directly, but accessed through other functions like cnorm.
rankBySlidingWindow, computePowers, computeWeights, weighted.rank
Other prepare: computePowers(), prepareData(), rankBySlidingWindow()
# Transformation with default parameters: RankIt and converting to T scores data.elfe <- rankByGroup(elfe, group = "group") # using a data frame with vector names data.elfe2 <- rankByGroup(raw=elfe$raw, group=elfe$group) # use vectors for raw score and group # Transformation into Wechsler scores with Yu & Huang (2001) ranking procedure data.elfe <- rankByGroup(raw = elfe$raw, group = elfe$group, method = 7, scale = c(10, 3)) # cNORM can as well be used for conventional norming, in case no group is given d <- rankByGroup(raw = elfe$raw) d <- computePowers(d) m <- bestModel(d) rawTable(0, m) # please use an arbitrary value for age when generating the tables
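The ranking step described for rankByGroup can be reproduced for the default method in a few lines of base R. This is a minimal sketch rather than the package implementation: it covers only the Rankit estimate, (rank - 0.5) / n, ignores weights and the descend option, and the function name `rankit_T` is hypothetical.

```r
# Rankit-based norm scores per group (default scale: T scores)
rankit_T <- function(raw, group, M = 50, SD = 10) {
  # ranks within each group; tied values share the average rank
  r <- ave(raw, group, FUN = function(x) rank(x, ties.method = "average"))
  n <- ave(raw, group, FUN = length)   # group size for every case
  p <- (r - 0.5) / n                   # Rankit percentile estimate
  M + SD * qnorm(p)                    # inverse normal transformation
}

raw   <- c(3, 5, 5, 9, 2, 4, 8, 8)
group <- c(1, 1, 1, 1, 2, 2, 2, 2)
tscores <- rankit_T(raw, group)
print(round(tscores, 1))
```

Passing M = 10 and SD = 3 gives Wechsler-type scores instead, which mirrors the scale = c(10, 3) setting in the example above.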
The function retrieves all individuals in the predefined age range (x +/- width/2) around each case and ranks that case based on this individually drawn sample. It can be used directly with a continuous age variable in order to avoid grouping. When data are collected on the basis of a continuous age variable, building discrete groups and generating percentiles with the traditional approach distorts the percentiles of cases located far from the mean age of their group. The distortion increases with the distance from the group mean, and the sliding window avoids this effect. Nonetheless, please ensure that the optional grouping variable in fact represents the correct mean age of the respective age groups, as this variable is later used for displaying the manifest data in the percentile plots.
rankBySlidingWindow( data = NULL, age = "age", raw = "raw", weights = NULL, width, method = 4, scale = "T", descend = FALSE, descriptives = TRUE, nGroup = 0, group = NA, na.rm = TRUE, silent = FALSE )
data |
data.frame with norm sample data |
age |
the continuous age variable. Setting 'age' to FALSE inhibits computation of powers of age and the interactions |
raw |
name of the raw value variable (default 'raw') |
weights |
Vector or variable name in the dataset with weights for each individual case. It can be used to compensate for moderate imbalances due to insufficient norm data stratification. Weights should be numerical and positive. It can be resource intense when applied to the sliding window. Please use the 'computeWeights' function for this purpose. |
width |
the width of the sliding window |
method |
Ranking method for estimating the percentiles, specified by index: 1 = Blom (1958), 2 = Tukey (1949), 3 = Van der Waerden (1952), 4 = Rankit (default), 5 = Levenbach (1953), 6 = Filliben (1975), 7 = Yu & Huang (2001) |
scale |
type of norm scale, either T (default), IQ, z or percentile (= no transformation); a double vector with the mean and standard deviation can be provided as well, e.g. c(10, 3) for Wechsler scale index points |
descend |
ranking order (default descend = FALSE): inverts the ranking order so that higher raw scores get lower norm scores; relevant, for example, when norming error scores, where lower scores mean higher performance |
descriptives |
If set to TRUE (default), information on n, mean, median and standard deviation per group is added to each observation |
nGroup |
If set to a positive value, a grouping variable is created with the desired number of equidistant groups, named by the mean age of each group. It creates the column 'group' in the data.frame and overwrites an existing column of that name. |
group |
Optional parameter for providing the name of the grouping variable (if present; overwritten if nGroup is used) |
na.rm |
remove cases for which the percentiles could not be estimated; this most likely happens in the context of weighting |
silent |
set to TRUE to suppress messages |
In case of ties, the function uses the average rank and applies the algorithms already described in the rankByGroup function. At the upper and lower end of the data sample, the sliding stops and the sample is drawn from the interval min + width and max - width, respectively.
the dataset with the individual percentiles and norm scores
So far, the inclusion of a binary covariate is experimental and far from optimized. The corresponding variable name has to be specified in the ranking procedure, and the modeling includes it in the further process. At the moment, the data are split by the levels of the covariate during ranking, and the ranking is done separately for each level. This may lead to small sample sizes. Please make sure to have enough cases in each combination. Additionally, covariates can lead to unstable modeling solutions. Whether it is reasonable to include covariates when norming a test is a decision that goes beyond pure data modeling. Please use with care, or alternatively split the dataset into the two groups beforehand and model them separately.
The functions rankBySlidingWindow, rankByGroup, bestModel, computePowers and prepareData are usually not called directly, but accessed through other functions like cnorm.
rankByGroup, computePowers, computeWeights, weighted.rank, weighted.quantile
Other prepare: computePowers(), prepareData(), rankByGroup()
## Not run: # Transformation using a sliding window data.elfe2 <- rankBySlidingWindow(elfe, raw = "raw", age = "group", width = 0.5) # Comparing this to the traditional approach should give us exactly the same # values, since the sample dataset only has a grouping variable for age data.elfe <- rankByGroup(elfe, group = "group") mean(data.elfe$normValue - data.elfe2$normValue) ## End(Not run)
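The sliding window can be written out for unweighted data as follows. This is a simplified sketch under stated assumptions, not the package code: it uses the Rankit estimate only, handles the edges by keeping a full-width window anchored at the minimum or maximum age (as described above), and the function name and simulated data are hypothetical.

```r
# Percentile of each case within a window of the given width around its age
sliding_percentile <- function(age, raw, width) {
  lo_all <- min(age)
  hi_all <- max(age)
  sapply(seq_along(raw), function(i) {
    lo <- age[i] - width / 2
    hi <- age[i] + width / 2
    # at the edges the window stops sliding and keeps its full width
    if (lo < lo_all) { lo <- lo_all; hi <- lo_all + width }
    if (hi > hi_all) { hi <- hi_all; lo <- hi_all - width }
    window <- raw[age >= lo & age <= hi]
    # average rank of case i within its window (ties share the rank)
    r <- sum(window < raw[i]) + (sum(window == raw[i]) + 1) / 2
    (r - 0.5) / length(window)   # Rankit percentile estimate
  })
}

# Simulated data: raw scores increase with age (hypothetical)
set.seed(7)
age <- runif(300, 6, 10)
raw <- round(5 * age + rnorm(300, sd = 4))
p <- sliding_percentile(age, raw, width = 1)
tscore <- 50 + 10 * qnorm(p)
print(round(head(tscore), 1))
```

Because every case is ranked against its own neighborhood, a young child in a wide age group is no longer compared with children almost a year older.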
This function is comparable to 'normTable', except that it reverses the assignment: it generates a table with raw scores and the corresponding norm scores for a specific age based on the regression model. To do so, the inverse function of the regression model is solved numerically by brute force. Please specify the range of raw values you want to cover. With higher precision and smaller stepping, this function becomes computationally intensive. If a confidence coefficient (CI, default .9) and the reliability are specified, confidence intervals are computed for the true score estimates, including a correction for regression to the mean (Eid & Schmidt, 2012, p. 272).
rawTable( A, model, minRaw = NULL, maxRaw = NULL, minNorm = NULL, maxNorm = NULL, step = 1, monotonuous = TRUE, CI = 0.9, reliability = NULL, pretty = TRUE )
A |
the age, either single value or vector with age values |
model |
The regression model or a cnorm object |
minRaw |
The lower bound of the raw score range |
maxRaw |
The upper bound of the raw score range |
minNorm |
Clipping parameter for the lower bound of norm scores (default 25) |
maxNorm |
Clipping parameter for the upper bound of norm scores (default 75) |
step |
Stepping parameter for the raw scores (default 1) |
monotonuous |
corrects for decreasing norm scores in case of model inconsistencies (default) |
CI |
confidence coefficient, ranging from 0 to 1, default .9 |
reliability |
reliability coefficient, ranging from 0 to 1 |
pretty |
Format table by collapsing intervals and rounding to meaningful precision |
either a data.frame with raw scores and the predicted norm scores in case of a single A value, or a list of norm tables if a vector of A values was provided
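The confidence intervals mentioned in the description can be illustrated with the textbook formula for the estimated true score. The following is a minimal sketch, assuming a T scale (M = 50, SD = 10); the helper name `trueScoreCI` is hypothetical and not part of the package, and the exact computation inside rawTable may differ.

```r
# Hypothetical helper, not part of cNORM: confidence interval for the
# estimated true score, including regression to the mean.
trueScoreCI <- function(norm, reliability, CI = 0.9, m = 50, sd = 10) {
  # The estimated true score shrinks the observed score toward the scale mean
  estimate <- m + reliability * (norm - m)
  # Standard error of estimation
  se <- sd * sqrt(reliability * (1 - reliability))
  # Two-sided critical value for the requested confidence coefficient
  z <- qnorm(1 - (1 - CI) / 2)
  data.frame(norm = norm, lower = estimate - z * se, upper = estimate + z * se)
}

trueScoreCI(c(30, 50, 70), reliability = 0.94)
```

Because of the shrinkage toward the mean, the interval is centered on the estimated true score rather than on the observed norm score.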
Eid, M. & Schmidt, K. (2012). Testtheorie und Testkonstruktion. Hogrefe.
normTable
Other predict: derivationTable(), getNormCurve(), normTable(), predict.cnormBetaBinomial(), predict.cnormBetaBinomial2(), predictNorm(), predictRaw()
# Generate cnorm object from example data
cnorm.elfe <- cnorm(raw = elfe$raw, group = elfe$group)

# generate a norm table for the raw value range from 0 to 28 for the time point month 7 of grade 3
table <- rawTable(3 + 7 / 12, cnorm.elfe, minRaw = 0, maxRaw = 28)

# generate several raw tables
table <- rawTable(c(2.5, 3.5, 4.5), cnorm.elfe, minRaw = 0, maxRaw = 28)

# additionally compute confidence intervals
table <- rawTable(c(2.5, 3.5, 4.5), cnorm.elfe, minRaw = 0, maxRaw = 28, CI = .9, reliability = .94)

# conventional norming, set age to arbitrary value
model <- cnorm(raw = elfe$raw)
rawTable(0, model)
The method builds the regression function for the regression model, including the beta weights. It can be used to predict the raw scores based on age and location.
regressionFunction(model, raw = NULL, digits = NULL)
model |
The regression model from the bestModel function or a cnorm object |
raw |
The name of the raw value variable (default 'raw') |
digits |
Number of digits for formatting the coefficients |
The regression formula as a string
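As an illustration of what building a regression formula string from beta weights can look like, here is a minimal sketch in base R. The helper `coefficientsToFormula` and the term names `L1` and `L1A1` are assumptions for illustration; the string layout produced by regressionFunction may differ.

```r
# Hypothetical helper, not part of cNORM: paste named coefficients
# into a readable formula string.
coefficientsToFormula <- function(coefs, raw = "raw", digits = 4) {
  values <- signif(coefs, digits)
  intercept <- values[["(Intercept)"]]
  slopes <- values[names(values) != "(Intercept)"]
  # One "+ (weight*term)" chunk per predictor, appended to the intercept
  paste0(raw, " ~ ", intercept,
         paste0(" + (", slopes, "*", names(slopes), ")", collapse = ""))
}

coefficientsToFormula(c("(Intercept)" = 1.5, L1 = 0.25, L1A1 = -0.01))
```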
Other model: bestModel(), checkConsistency(), cnorm.cv(), derive(), modelSummary(), print.cnorm(), printSubset(), rangeCheck(), summary.cnorm()
result <- cnorm(raw = elfe$raw, group = elfe$group)
regressionFunction(result)
Simulate mean per age
simMean(age)
age |
the age variable |
return predicted means
## Not run:
x <- simMean(a)
## End(Not run)
Simulate sd per age
simSD(age)
age |
the age variable |
return predicted sd
## Not run:
x <- simSD(a)
## End(Not run)
For testing purposes only: The function simulates raw test scores based on a virtual Rasch-based test with n results per age group, an evenly distributed age variable, and items.n test items with a simulated difficulty and standard deviation. The developmental trajectories over age groups are modeled by a curvilinear function of age, with fast progression at first that slows down with age, and a slightly increasing standard deviation in order to model a scissor effect. The item difficulties can be accessed via $theta and the raw data via $data of the returned object.
simulateRasch( data = NULL, n = 100, minAge = 1, maxAge = 7, items.n = 21, items.m = 0, items.sd = 1, Theta = "random", width = 1 )
data |
data.frame from previous simulations for recomputation (overrides n, minAge, maxAge) |
n |
The sample size per age group |
minAge |
The minimum age (default 1) |
maxAge |
The maximum age (default 7) |
items.n |
The number of items of the test |
items.m |
The mean difficulty of the items |
items.sd |
The standard deviation of the item difficulty |
Theta |
IRT-scaled difficulty parameters, either "random" for drawing a random sample, "even" for evenly distributed values, or a set of predefined values, which then overrides the items.n parameter |
width |
The width of the window size for the continuous age per group; +/- 1/2 width around the group center on items.m and items.sd; if set to FALSE, the distribution is not drawn randomly but normally nonetheless |
a list containing the simulated data and thetas
the data.frame with only age, group and raw
the complete simulated data with item level results
the difficulty of the items
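The core of a Rasch-based simulation can be sketched in a few lines of base R. This is a minimal illustration of the Rasch response model only, under the assumption of normally distributed abilities and difficulties; the helper `simulateRaschSketch` is hypothetical, and the age-dependent trajectories used by simulateRasch are not reproduced here.

```r
# Hypothetical helper, not cNORM's implementation
simulateRaschSketch <- function(ability, difficulty) {
  # Probability of a correct response: logistic function of ability - difficulty
  p <- plogis(outer(ability, difficulty, "-"))
  # Draw dichotomous item responses (rows = persons, columns = items)
  responses <- matrix(rbinom(length(p), size = 1, prob = p),
                      nrow = length(ability))
  # Raw score = number of solved items
  rowSums(responses)
}

set.seed(1)
difficulty <- rnorm(21, mean = 0, sd = 1)   # 21 items, as in items.n = 21
ability <- rnorm(100, mean = 0, sd = 1)     # 100 simulated persons
raw <- simulateRaschSketch(ability, difficulty)
range(raw)
```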
# simulate data for a rather easy test (m = -1.0)
sim <- simulateRasch(n = 150, minAge = 1, maxAge = 7, items.n = 30, items.m = -1.0, items.sd = 1, Theta = "random", width = 1.0)

# Show item difficulties
mean(sim$theta)
sd(sim$theta)
hist(sim$theta)

# Plot raw scores
boxplot(raw ~ group, data = sim$data)

# Model data
data <- prepareData(sim$data, age = "age")
model <- bestModel(data, k = 4)
printSubset(model)
plotSubset(model, type = 0)
This function standardizes a numeric vector by subtracting the mean and dividing by the standard deviation. The resulting vector will have a mean of 0 and a standard deviation of 1.
standardize(x)
x |
A numeric vector to be standardized. |
A numeric vector of the same length as x, containing the standardized values.
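The transformation itself amounts to a one-liner in base R. The sketch below uses the hypothetical name `standardizeSketch` and assumes a vector without missing values; how the package function treats NAs is not stated here.

```r
# Hypothetical equivalent of the described transformation
standardizeSketch <- function(x) (x - mean(x)) / sd(x)

z <- standardizeSketch(c(1, 2, 3, 4, 5))
mean(z)   # approximately 0
sd(z)     # approximately 1
```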
data <- c(1, 2, 3, 4, 5)
standardized_data <- standardize(data)
print(standardized_data)
Function for standardizing raking weights. Raking weights get divided by the smallest weight. Thereby, all weights become larger than or equal to 1 without changing the ratio of the weights to each other.
standardizeRakingWeights(weights)
weights |
Raking weights computed by computeWeights() |
the standardized weights
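The described division by the smallest weight can be reproduced directly in base R. The weights below are made-up illustration values, not output of computeWeights().

```r
# Made-up raking weights for illustration
weights <- c(0.8, 1.2, 2.4, 0.4)

# Divide by the smallest weight: the minimum becomes 1, ratios stay unchanged
standardized <- weights / min(weights)
standardized
```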
Performs k-fold resampling to estimate averaged coefficients for linear regression. The coefficients are averaged across k different subsets of the data to provide more stable estimates. For small samples (n < 100), returns a standard linear model instead.
subsample_lm(text, data, weights, k = 10)
text |
A character string or formula specifying the model to be fitted |
data |
A data frame containing the variables in the model |
weights |
Optional numeric vector of weights. If NULL, unweighted regression is performed |
k |
Integer specifying the number of resampling folds (default = 10) |
The function splits the data into k subsets, fits a linear model on k-1 subsets, and stores the coefficients. This process is repeated k times, and the final coefficients are averaged across all iterations to provide more stable estimates.
An object of class 'lm' with averaged coefficients from k-fold resampling. For small samples, returns a standard lm object.
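The resampling scheme described above can be sketched as follows. This is an unweighted illustration with the hypothetical helper `averagedCoefficients`; it returns the averaged coefficient vector only, whereas subsample_lm returns an 'lm' object, and the package's fold assignment may differ.

```r
# Hypothetical sketch of k-fold coefficient averaging (not cNORM's implementation)
averagedCoefficients <- function(formula, data, k = 10) {
  # Randomly assign each row to one of k folds
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  # Fit the model k times, each time leaving one fold out
  coefs <- sapply(seq_len(k), function(i) {
    coef(lm(formula, data = data[fold != i, , drop = FALSE]))
  })
  # Average each coefficient across the k fits
  rowMeans(coefs)
}

set.seed(42)
d <- data.frame(x = runif(200))
d$y <- 2 + 3 * d$x + rnorm(200, sd = 0.1)
averagedCoefficients(y ~ x, d, k = 10)
```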
S3 method for printing the results and regression function of a cnorm model
## S3 method for class 'cnorm'
summary(object, ...)
object |
A regression model or cnorm object |
... |
additional parameters |
A report on the regression function, weights, R2 and RMSE
Other model: bestModel(), checkConsistency(), cnorm.cv(), derive(), modelSummary(), print.cnorm(), printSubset(), rangeCheck(), regressionFunction()
This function provides a summary of a fitted beta-binomial continuous norming model, including model fit statistics, convergence information, and parameter estimates.
## S3 method for class 'cnormBetaBinomial'
summary(object, ...)
object |
An object of class "cnormBetaBinomial" or "cnormBetaBinomial2", typically
the result of a call to |
... |
Additional arguments passed to the summary method:
|
The summary includes:
Basic model information (type, number of observations, number of parameters)
Model fit statistics (log-likelihood, AIC, BIC)
R-squared, RMSE, and bias (if age and raw scores are provided) in comparison to manifest norm scores
Convergence information
Parameter estimates with standard errors, z-values, and p-values
Invisibly returns a list containing detailed diagnostic information about the model. The function primarily produces printed output summarizing the model.
cnorm.betabinomial, diagnostics.betabinomial
## Not run:
model <- cnorm.betabinomial(ppvt$age, ppvt$raw, n = 228)
summary(model)

# Including R-squared, RMSE, and bias in the summary:
summary(model, age = ppvt$age, score = ppvt$raw)
## End(Not run)
This function provides a summary of a fitted beta-binomial continuous norming model, including model fit statistics, convergence information, and parameter estimates.
## S3 method for class 'cnormBetaBinomial2'
summary(object, ...)
object |
An object of class "cnormBetaBinomial" or "cnormBetaBinomial2", typically
the result of a call to |
... |
Additional arguments passed to the summary method:
|
The summary includes:
Basic model information (type, number of observations, number of parameters)
Model fit statistics (log-likelihood, AIC, BIC)
R-squared, RMSE, and bias (if age and raw scores are provided) in comparison to manifest norm scores
Convergence information
Parameter estimates with standard errors, z-values, and p-values
Invisibly returns a list containing detailed diagnostic information about the model. The function primarily produces printed output summarizing the model.
cnorm.betabinomial, diagnostics.betabinomial
## Not run:
model <- cnorm.betabinomial(ppvt$age, ppvt$raw, n = 228)
summary(model)

# Including R-squared, RMSE, and bias in the summary:
summary(model, age = ppvt$age, raw = ppvt$raw)
## End(Not run)
Conducts distribution-free continuous norming and aims to find a fitting model. Raw data are modelled as a Taylor polynomial of powers of age and location and their interactions. In addition to the raw scores, provide either a numeric vector with grouping information (group) or a continuous age vector (age) together with the width of the sliding window (width) for the ranking of the raw scores. You can adjust the degree of smoothing of the regression model by setting the k, t and terms parameters. In general, increasing k and t leads to a higher fit, while lower values lead to more smoothing. If both parameters are missing, taylorSwift uses k = 5 and t = 3 by default.
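The candidate predictors of such a Taylor polynomial, i.e. all products of powers of location (L) up to k and powers of age (A) up to t, can be enumerated with the following sketch. The helper `taylorTerms` and the term labels such as `L1A1` are assumptions for illustration, not output of the package.

```r
# Hypothetical helper: enumerate the candidate predictors L^i * A^j
taylorTerms <- function(k = 5, t = 3) {
  grid <- expand.grid(i = 0:k, j = 0:t)
  grid <- grid[!(grid$i == 0 & grid$j == 0), ]   # drop the intercept term
  ifelse(grid$j == 0, paste0("L", grid$i),
         ifelse(grid$i == 0, paste0("A", grid$j),
                paste0("L", grid$i, "A", grid$j)))
}

candidates <- taylorTerms(k = 5, t = 3)
length(candidates)   # (k + 1) * (t + 1) - 1 candidate predictors
head(candidates)
```

The model selection then picks a subset of these candidates, which is why lower values of k and t restrict the pool and lead to smoother models.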
taylorSwift( raw = NULL, group = NULL, age = NULL, width = NA, weights = NULL, scale = "T", method = 4, descend = FALSE, k = NULL, t = NULL, terms = 0, R2 = NULL, plot = TRUE, extensive = TRUE, subsampling = TRUE )
raw |
Numeric vector of raw scores |
group |
Numeric vector of grouping variable, e. g. grade. If no group or age variable is provided, conventional norming is applied |
age |
Numeric vector with chronological age, please additionally specify width of window |
width |
Size of the sliding window in case an age vector is used |
weights |
Vector or variable name in the dataset with weights for each individual case. It can be used to compensate for moderate imbalances due to insufficient norm data stratification. Weights should be numerical and positive. |
scale |
type of norm scale, either T (default), IQ, z or percentile (= no transformation); a numeric vector with the mean and standard deviation can be provided as well, e. g. c(10, 3) for Wechsler scale index points |
method |
Ranking method in case of ties, please provide an index, choosing from the following methods: 1 = Blom (1958), 2 = Tukey (1949), 3 = Van der Waerden (1952), 4 = Rankit (default), 5 = Levenbach (1953), 6 = Filliben (1975), 7 = Yu & Huang (2001) |
descend |
ranking order (default descend = FALSE): inverts the ranking order, with higher raw scores getting lower norm scores; relevant for example when norming error scores, where lower scores mean higher performance |
k |
The power constant. Higher values result in more detailed approximations but have the danger of over-fit (max = 6). If not set, it uses t and if both parameters are NULL, k is set to 5. |
t |
The age power parameter (max = 6). If not set, it uses k and if both parameters are NULL, t is set to 3, since age trajectories are most often well captured by cubic polynomials. |
terms |
Selection criterion for model building. The best fitting model with this number of terms is used |
R2 |
Adjusted R square as a stopping criterion for the model building (default R2 = 0.99) |
plot |
Default TRUE; plots the regression model and prints report |
extensive |
If TRUE, screens models for consistency and, if possible, excludes inconsistent ones |
subsampling |
If TRUE (default), model coefficients are calculated using 10 folds and averaged across the folds. This produces more robust estimates with a slight increase in bias. |
cnorm object including the ranked raw data and the regression model
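To illustrate the `method` and `scale` arguments, the following sketch converts raw scores to T scores via rank-based normalization using four of the classic proportion formulas. The helper `rankToT` is hypothetical; it covers only a subset of the listed methods and ignores weights, groups and sliding windows.

```r
# Hypothetical helper, not cNORM's implementation
rankToT <- function(raw, method = c("rankit", "blom", "tukey", "vanDerWaerden")) {
  method <- match.arg(method)
  n <- length(raw)
  r <- rank(raw, ties.method = "average")   # tied scores share their mean rank
  # Convert ranks to cumulative proportions
  p <- switch(method,
    rankit        = (r - 0.5) / n,
    blom          = (r - 3 / 8) / (n + 1 / 4),
    tukey         = (r - 1 / 3) / (n + 1 / 3),
    vanDerWaerden = r / (n + 1)
  )
  50 + 10 * qnorm(p)   # T scale: mean 50, SD 10
}

rankToT(c(3, 7, 7, 12, 15))
```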
Gary, S. & Lenhard, W. (2021). In norming we trust. Diagnostica.
Gary, S., Lenhard, W. & Lenhard, A. (2021). Modelling Norm Scores with the cNORM Package in R. Psych, 3(3), 501-521. https://doi.org/10.3390/psych3030033
Lenhard, A., Lenhard, W., Suggate, S. & Segerer, R. (2016). A continuous solution to the norming problem. Assessment, Online first, 1-14. doi:10.1177/1073191116656437
Lenhard, A., Lenhard, W., Gary, S. (2018). Continuous Norming (cNORM). The Comprehensive R Archive Network, Package cNORM, available: https://CRAN.R-project.org/package=cNORM
Lenhard, A., Lenhard, W., Gary, S. (2019). Continuous norming of psychometric tests: A simulation study of parametric and semi-parametric approaches. PLoS ONE, 14(9), e0222279. doi:10.1371/journal.pone.0222279
Lenhard, W., & Lenhard, A. (2020). Improvement of Norm Score Quality via Regression-Based Continuous Norming. Educational and Psychological Measurement(Online First), 1-33. https://doi.org/10.1177/0013164420928457
rankByGroup, rankBySlidingWindow, computePowers, bestModel
## Not run:
# Using this function with the example dataset 'ppvt'
# You can use the 'getGroups()' function to set up a grouping variable in case
# you have a continuous age variable.
model <- taylorSwift(raw = ppvt$raw, group = ppvt$group)

# return norm tables including 90% confidence intervals for a
# test with a reliability of r = .95; tables are set to the mean of the quarter
# in grade 3 (children completed 2 years of schooling)
normTable(c(5, 15), model, CI = .90, reliability = .95)

# ... or instead of raw scores for norm scores, the other way round
rawTable(c(8, 12), model, CI = .90, reliability = .95)
## End(Not run)
Computes weighted quantiles (code from Andrey Akinshin (2023) "Weighted quantile estimators" arXiv:2304.07265 [stat.ME] Code made available via the CC BY-NC-SA 4.0 license) on the basis of either the weighted Harrell-Davis quantile estimator or an adaption of the type 7 quantile estimator of the generic quantile function in the base package. Please provide a vector with raw values, the probabilities for the quantiles and an additional vector with the weight of each observation. In case the weight vector is NULL, a normal quantile estimation is done. The vectors may not include NAs and the weights should be positive non-zero values. Please draw on the computeWeights() function for retrieving weights in post stratification.
weighted.quantile(x, probs, weights = NULL, type = "Harrell-Davis")
x |
A numerical vector |
probs |
Numerical vector of quantiles |
weights |
A numerical vector with weights; should have the same length as x |
type |
Type of estimator, either "inflation", "Harrell-Davis" (default), which uses a beta function to approximate the weighted percentiles (Harrell & Davis, 1982), or "Type7" (Hyndman & Fan, 1996), an adaptation of the generic quantile function in R that includes weighting. The inflation procedure is essentially a numerical, non-parametric solution that gives the same results as Harrell-Davis. It requires fewer resources with small datasets and always finds a solution (e. g. 1000 cases with weights between 1 and 10). If it becomes too resource-intensive, it switches to Harrell-Davis automatically. The Harrell-Davis and Type7 code is based on the work of Akinshin (2023). |
the weighted quantiles
Harrell, F.E. & Davis, C.E. (1982). A new distribution-free quantile estimator. Biometrika, 69(3), 635-640.
Hyndman, R. J. & Fan, Y. (1996). Sample quantiles in statistical packages, American Statistician 50, 361–365.
Akinshin, A. (2023). Weighted quantile estimators arXiv:2304.07265 [stat.ME]
weighted.quantile.inflation, weighted.quantile.harrell.davis, weighted.quantile.type7
Computes weighted quantiles; code from Andrey Akinshin (2023) "Weighted quantile estimators" arXiv:2304.07265 [stat.ME] Code made available via the CC BY-NC-SA 4.0 license
weighted.quantile.harrell.davis(x, probs, weights = NULL)
x |
A numerical vector |
probs |
Numerical vector of quantiles |
weights |
A numerical vector with weights; should have the same length as x. If no weights are provided (NULL), it falls back to the base quantile function, type 7 |
the quantiles
Applies weighted ranking numerically by inflating cases according to weight. This function becomes resource-intensive if the number of inflated cases gets too high; in this case, it switches to the parametric Harrell-Davis estimator.
weighted.quantile.inflation( x, probs, weights = NULL, degree = 3, cutoff = 1e+07 )
x |
A numerical vector |
probs |
Numerical vector of quantiles |
weights |
A numerical vector with weights; should have the same length as x. If no weights are provided (NULL), it falls back to the base quantile function, type 7 |
degree |
power parameter for case inflation (default = 3, equaling factor 1000) |
cutoff |
stop criterion for the sum of standardized weights to switch to Harrell-Davis, default = 1e+07 |
the quantiles
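The idea of case inflation can be sketched in base R: each observation is replicated in proportion to its weight, and ordinary quantiles are taken from the inflated vector. The helper `inflatedQuantile` is hypothetical; the scaling by `10^degree` is an assumption based on the description of the degree parameter, and the cutoff-based switch to Harrell-Davis is omitted.

```r
# Hypothetical sketch of quantile estimation by case inflation
inflatedQuantile <- function(x, probs, weights, degree = 3) {
  # Scale weights so the smallest equals 1, then convert to replication counts
  times <- round(weights / min(weights) * 10^degree)
  # Replicate each observation according to its weight, take ordinary quantiles
  quantile(rep(x, times = times), probs = probs, type = 7, names = FALSE)
}

x <- c(10, 20, 30, 40)
inflatedQuantile(x, probs = 0.5, weights = c(1, 1, 1, 1))   # equal weights
inflatedQuantile(x, probs = 0.5, weights = c(1, 1, 1, 5))   # median shifts upward
```

The memory cost grows with the sum of the scaled weights, which is why a stop criterion such as cutoff is needed for large or strongly weighted samples.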
Computes weighted quantiles; code from Andrey Akinshin (2023) "Weighted quantile estimators" arXiv:2304.07265 [stat.ME] Code made available via the CC BY-NC-SA 4.0 license
weighted.quantile.type7(x, probs, weights = NULL)
x |
A numerical vector |
probs |
Numerical vector of quantiles |
weights |
A numerical vector with weights; should have the same length as x. If no weights are provided (NULL), it falls back to the base quantile function, type 7 |
the quantiles
Conducts weighted ranking on the basis of sums of weights per unique raw score. Please provide a vector with raw values and an additional vector with the weight of each observation. In case the weight vector is NULL, a normal ranking is done. The vectors may not include NAs and the weights should be positive non-zero values.
weighted.rank(x, weights = NULL)
x |
A numerical vector |
weights |
A numerical vector with weights; should have the same length as x |
the weighted absolute ranks