Weighted Regression-Based Norming

Enhancing Representativeness with Weighted Regression-Based Norming in cNORM

Acquiring valid norm scores fundamentally relies on the representativeness of the norm sample. Traditionally, random sampling is employed to meet this end, but even this approach may result in a sample that diverges from the population structure. The cNORM R package provides a potent solution by incorporating sampling weights into the norming process, thereby diminishing the impact of non-representative norm samples on the norm score quality. A key component of this process is raking, or iterative proportional fitting, which allows for the post-stratification of the norm sample based on stratification variables (SVs), considering their population marginals.

Impact of Non-Representative Norm Samples

When a norm sample inadequately represents the target population, especially regarding pertinent stratification variables, it can compromise the quality of the resulting norm scores (Kruskal & Mosteller, 1979). An illustration of this issue is the neglect of parental educational background when determining norm scores for a children’s intelligence test. Such oversight can lead to skewed norm scores, thus distorting the child’s actual intelligence level (Hernandez et al., 2017). As norm scores commonly inform significant decisions, like school placement or the diagnosis of learning disabilities (Gary & Lenhard, 2021; Lenhard, Lenhard & Gary, 2019; Lenhard & Lenhard, 2021), any bias can potentially disadvantage those being evaluated. Therefore, techniques such as sample weighting methods become crucial to mitigate non-representativeness in norm samples.

Raking: Enhancing Sample Representativeness through Iterative Proportional Fitting

Raking, also called iterative proportional fitting, is a post-stratification approach that aims to enhance sample representativeness with respect to two or more stratification variables. For this purpose, sample weights are computed for every case in the norm sample based on the ratio between the proportion of the corresponding stratum in the target population and its proportion in the actual norm sample (Lumley, 2011). The procedure can be described as an iterative post-stratification with respect to one variable in each step. For example, assume a target population of 49% female and 51% male persons, while the norm sample contains 45% female and 55% male subjects. To enhance the representativeness of the norm sample with respect to the SV sex (female/male), every female case would be weighted with $w_{female}=\frac{49\%}{45\%}=1.09$ and every male case with $w_{male}=\frac{51\%}{55\%}=0.93$. To stratify a norm sample with respect to two or more variables, for example sex (female/male) and education (low/medium/high), this adjustment is applied repeatedly, to the marginals of one variable at a time. For example, if the weights are first adjusted with respect to sex, they are adjusted with respect to education in the second step. Since the weights no longer reproduce the population marginals for sex after the second step, they are adjusted to sex again in the third step, to education in the fourth step, and so on until the raking weights have converged. The resulting raking weights, and thus the weighted norm sample, then represent the target population with respect to the marginal proportions of the SVs used. Each case is assigned a weight such that the proportions of the strata in the weighted norm sample align with the composition of the target population.
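
The iteration described above can be sketched in a few lines of base R. This is a minimal illustration, not cNORM's implementation: the cell counts and the education shares are made-up values, and only the sex proportions are taken from the example in the text. After the first row adjustment, the weights equal 49/45 for female and 51/55 for male cases.

```r
# Hypothetical norm sample of 100 cases, cross-tabulated by sex and education.
# Row totals match the text's example: 45% female, 55% male.
sample_counts <- matrix(c(20, 15, 10,
                          25, 20, 10),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(sex = c("female", "male"),
                                        education = c("low", "medium", "high")))

# Population marginals (education shares are assumed for illustration)
target_sex <- c(female = 0.49, male = 0.51)
target_edu <- c(low = 0.30, medium = 0.40, high = 0.30)

# One weight per stratum (cell); every case in a cell shares that weight
w <- matrix(1, nrow = 2, ncol = 3, dimnames = dimnames(sample_counts))

for (iteration in 1:1000) {
  w_old <- w

  # Adjust to the sex marginals (rows)
  sex_share <- rowSums(w * sample_counts) / sum(w * sample_counts)
  w <- sweep(w, 1, target_sex / sex_share, `*`)

  # Adjust to the education marginals (columns)
  edu_share <- colSums(w * sample_counts) / sum(w * sample_counts)
  w <- sweep(w, 2, target_edu / edu_share, `*`)

  # Stop once the weights no longer change
  if (max(abs(w - w_old)) < 1e-8) break
}

weighted_counts <- w * sample_counts
rowSums(weighted_counts) / sum(weighted_counts)  # weighted sex shares
colSums(weighted_counts) / sum(weighted_counts)  # weighted education shares
```

After convergence, both sets of weighted shares should agree with the assumed population marginals, although no single adjustment step matched both at once.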

Integration of raking weights in regression-based norming in cNORM

The integration of raking weights in cNORM is accomplished in three steps.

  • Computation and standardization of raking weights
  • Initial ranking of test raw scores using standardized raking weights with weighted percentile estimation
  • Regression-based norming with standardized raking weights

Step 1: Computation and standardization of raking weights

Raking weights are computed from the proportions of the SVs in the target population and in the actual norm sample. Afterwards, the resulting raking weights are standardized by dividing every weight by the smallest raking weight, i.e., the smallest weight is set to 1.0, while the ratios between the weights remain the same. Consequently, underrepresented cases in the sample are weighted with a factor larger than 1.0. To compute the weights, please provide a data frame with three columns to specify the population marginals. The first column specifies the stratification variable, the second the factor level of the stratification variable, and the third the proportion in the representative population. The function ‘computeWeights()’ is used to retrieve the weights. The original data and the marginals have to be passed as function parameters.
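
The standardization can be illustrated with the two weights from the sex example (population share divided by sample share). The following base R lines are only a sketch of the arithmetic; the variable names are chosen for illustration and are not part of cNORM.

```r
# Raw weights for the sex example: population share / sample share
raw_weights <- c(female = 0.49 / 0.45, male = 0.51 / 0.55)

# Standardize so that the smallest weight equals 1.0
std_weights <- raw_weights / min(raw_weights)

# The ratio between the weights is unchanged by the standardization
std_weights[["female"]] / std_weights[["male"]]
raw_weights[["female"]] / raw_weights[["male"]]
```

Here the overrepresented male cases receive the weight 1.0, and the underrepresented female cases a weight of roughly 1.17.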

Step 2: Weighted percentile estimation

Secondly, the norm sample is ranked using weighted percentiles based on the raking weights. This step is the actual start of the regression-based norming approach, and it is applied automatically in the ‘cnorm()’ function as soon as weights are specified.
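
To give an idea of what a weighted ranking does, the following base R sketch computes a simple weighted percentile for each raw score: the weight of all cases with lower scores plus half the weight of the tied cases, divided by the total weight. The function `weighted_percentile()` and the toy data are hypothetical, and the estimator used internally by cNORM may differ in detail.

```r
# Weighted percentile: share of the total weight below a score,
# counting tied cases with half of their weight
weighted_percentile <- function(raw, weights) {
  total <- sum(weights)
  vapply(seq_along(raw), function(i) {
    below <- sum(weights[raw < raw[i]])
    tied <- sum(weights[raw == raw[i]])
    (below + 0.5 * tied) / total
  }, numeric(1))
}

raw <- c(10, 12, 12, 15, 20)

# With equal weights this reduces to (rank - 0.5) / n
weighted_percentile(raw, rep(1, 5))

# With larger weights on the high scorers, the lower scores move
# to lower percentiles
weighted_percentile(raw, c(1.0, 1.0, 1.0, 1.4, 1.4))
```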

Step 3: Regression-based norming with standardized raking weights

Finally, the standardized raking weights are used in the weighted best-subset regression to obtain an adequate norm model. While the former steps can be seen as a kind of data preparation, the computation of the regression-based norm model represents the actual norming process, since the resulting regression model is used for the actual mapping between achieved raw score and assigned norm score. Using the standardized raking weights in the weighted regression should reduce overfitting of the regression model to overrepresented data points. This third step is also applied automatically when using the ‘cnorm()’ function.
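
The effect of case weights in a regression can be sketched with base R's `lm()` and its `weights` argument. This is a strongly simplified stand-in: the data are simulated, the model has a single fixed set of terms, and no best-subset search is performed, so it only illustrates the weighting principle and not cNORM's modeling procedure.

```r
set.seed(1)
n <- 200

# Simulated data (hypothetical): location L, age A and a raw score
d <- data.frame(L = rnorm(n, mean = 50, sd = 10),
                A = runif(n, min = 3, max = 16))
d$raw <- 5 + 0.8 * d$L + 6 * d$A + 0.05 * d$L * d$A + rnorm(n, sd = 3)

# Made-up standardized weights; the smallest weight is 1
d$w <- sample(c(1, 1.2, 1.4), n, replace = TRUE)

# Weighted least squares: cases with larger weights count more
fit_weighted <- lm(raw ~ L + A + L:A, data = d, weights = w)
fit_unweighted <- lm(raw ~ L + A + L:A, data = d)

# Weighted sum of squared residuals for both fits
wsse <- function(fit) sum(d$w * (d$raw - fitted(fit))^2)
c(weighted = wsse(fit_weighted), unweighted = wsse(fit_unweighted))
```

The weighted fit minimizes the weighted sum of squared residuals, so deviations in underrepresented (highly weighted) cases are penalized more strongly than in the unweighted fit.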

Example

In the following, the use of raking weights in regression-based norming with cNORM is illustrated in detail, based on a non-representative norm sample for the German version of the Peabody Picture Vocabulary Test (PPVT-IV).

library(cNORM)
# Assign data to object norm.data
norm.data <- ppvt
head(norm.data)
#>      age sex migration region raw    group
#> 1 2.5971   1         0   west 120 3.160655
#> 2 2.5993   1         0   west  67 3.160655
#> 3 2.6241   1         0   west  23 3.160655
#> 4 2.8622   1         0  south  50 3.160655
#> 5 2.8764   1         0  south  44 3.160655
#> 6 2.9308   1         0   west  55 3.160655

For the post-stratification, we need population marginals for the relevant stratification variables as a data frame, with each level of each stratification variable in a row. The data frame must contain the names of the SVs (column 1), the single levels (column 2) and the corresponding proportion in the target population (column 3).

# Generate population marginals
marginals <- data.frame(var = c("sex", "sex", "migration", "migration"),
                        level = c(1, 2, 0, 1),
                        prop = c(0.51, 0.49, 0.65, 0.35))
head(marginals)
#>         var level prop
#> 1       sex     1 0.51
#> 2       sex     2 0.49
#> 3 migration     0 0.65
#> 4 migration     1 0.35

To calculate raking weights, cNORM’s ‘computeWeights()’ function is used, with the norm sample data and the population marginals as function parameters.

weights <- computeWeights(data = norm.data, population.margins = marginals)
#> Raking converged normally after 3 iterations.
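
A quick plausibility check is to compare the weighted shares of each stratification variable with the population marginals. The helper `weighted_shares()` below is hypothetical and written in base R with toy data. Assuming that the returned weights contain one value per row of the data in the original row order, it could be applied to the example as `weighted_shares(weights, norm.data$sex)`.

```r
# Share of the total weight carried by each level of a stratification variable
weighted_shares <- function(weights, sv) {
  tapply(weights, sv, sum) / sum(weights)
}

# Toy data (hypothetical): three cases with sex = 1, two with sex = 2
sex <- c(1, 1, 1, 2, 2)
w <- c(1, 1, 1, 1.5, 1.5)

# Unweighted shares are 0.6 and 0.4; the weights shift them to 0.5 each
weighted_shares(w, sex)
```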

Passing the raking weights to the ‘cnorm()’ function via the ‘weights’ parameter starts the initial weighted ranking and the actual norming process.

norm.model <- cnorm(raw = norm.data$raw, group = norm.data$group,
                    weights = weights)

The resulting model contains 22 terms with an RMSE of 3.48482.

summary(norm.model)
#> cNORM Model Summary
#> -------------------
#> Number of terms: 22 
#> Adjusted R-squared: 0.9907 
#> RMSE: 3.4848 
#> Selection strategy: 1,  largest consistent model
#> Highest consistent model: 22 
#> Raw score variable: raw 
#> Raw score range: 7 to 221 
#> Age range: 3.160655 to 16.40039 
#> 
#> Regression function:
#> raw ~ 591.569913549695 + (-82.1272182737657*L1) + (3.54376001215521*L2) + (-0.0702913073207041*L3) + (0.000658677990759558*L4) + (-2.31677604416621e-06*L5) + (-141.497142358744*A1) + (5.75677695006734*A2) + (20.0332305860305*L1A1) + (-1.07536561157269*L1A2) + (0.0146318083408461*L1A3) + (-0.835444767723218*L2A1) + (0.0492353298189443*L2A2) + (-0.000819925025778186*L2A3) + (0.0163201977514885*L3A1) + (-0.000985151392562178*L3A2) + (1.69657651184986e-05*L3A3) + (-0.000148048801673851*L4A1) + (8.62285933174051e-06*L4A2) + (-1.36763213219113e-07*L4A3) + (4.90478024097055e-07*L5A1) + (-2.55768016488594e-08*L5A2) + (2.99448988970298e-10*L5A3) 
#> Final solution: 22 terms (highest consistent model)
#> R-Square Adj. = 0.990684
#> Final regression model: raw ~ L1 + L2 + L3 + L4 + L5 + A1 + A2 + L1A1 + L1A2 + L1A3 + L2A1 + L2A2 + L2A3 + L3A1 + L3A2 + L3A3 + L4A1 + L4A2 + L4A3 + L5A1 + L5A2 + L5A3
#> Regression function: raw ~ 591.5699135 + (-82.12721827*L1) + (3.543760012*L2) + (-0.07029130732*L3) + (0.0006586779908*L4) + (-2.316776044e-06*L5) + (-141.4971424*A1) + (5.75677695*A2) + (20.03323059*L1A1) + (-1.075365612*L1A2) + (0.01463180834*L1A3) + (-0.8354447677*L2A1) + (0.04923532982*L2A2) + (-0.0008199250258*L2A3) + (0.01632019775*L3A1) + (-0.0009851513926*L3A2) + (1.696576512e-05*L3A3) + (-0.0001480488017*L4A1) + (8.622859332e-06*L4A2) + (-1.367632132e-07*L4A3) + (4.904780241e-07*L5A1) + (-2.557680165e-08*L5A2) + (2.99448989e-10*L5A3)
#> Raw Score RMSE = 3.48482
#> Post stratification was applied. The weights range from 1 to 1.415 (m = 1.116, sd = 0.182).

Moreover, the percentile plot reveals no signs of model violation, such as intersecting percentile curves. The model reaches a high adjusted $R^2$ of 0.9907.

plot(norm.model, "subset")
plot(norm.model, "norm")

Caveats and recommendation for use

We extensively simulated biased distributions and assessed whether our approach can mitigate the effects of unrepresentative samples. cNORM itself already corrects for several types of sampling error, namely deviations that occur in specific age groups or unbalanced joint probabilities of stratification variables (while the marginals are preserved). Weighted Continuous Norming also works well in most, but not all, use cases. Please note the following:

  • Non-representativeness in most cases leads to (moderately) increased error in the norm scores. It is, of course, always better to ensure the highest feasible degree of representativeness during data collection.
  • The data collection should be as random as possible.
  • In most but not all cases, Weighted Continuous Norming reduces the negative effects of non-representative norm samples. If the mean of the standardized weights exceeds $m_{weights} = 2$, this indicates that weighting should rather not be used.
  • With cNORM, representativeness need not necessarily be established in every single age group. If the marginals are more or less correct, weighting is unnecessary.
  • Only use stratification for variables with considerable influence on the dependent variable.
  • If available, the probabilities of cross-classifications of the stratification variables can be used. You can recode several variables into one and directly specify the corresponding population marginals (especially in combination with the next point).
  • Avoid too many stratification variables with many fine-grained levels. This leads to high weights in specific subgroups. Instead, combine levels of stratification variables if the corresponding subgroups do not differ in the outcome variable.
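
Two of the recommendations above can be sketched in base R: recoding several stratification variables into one cross-classified variable, and checking the mean of the standardized weights against the threshold of 2. All names and numbers are assumptions for illustration. The joint proportions are invented so that they imply the marginals used in the example (0.51/0.49 for sex, 0.65/0.35 for migration), and `mean_weight_acceptable()` is a hypothetical helper, not a cNORM function.

```r
# Toy sample (hypothetical) with two stratification variables
d <- data.frame(sex = c(1, 2, 1, 2, 1), migration = c(0, 0, 1, 1, 0))

# Recode both variables into one cross-classified stratification variable
d$sex_migration <- as.character(interaction(d$sex, d$migration, sep = "_"))

# Joint population proportions in the three-column layout described above
# (assumed values, one row per combination of levels)
joint_marginals <- data.frame(
  var = "sex_migration",
  level = c("1_0", "2_0", "1_1", "2_1"),
  prop = c(0.33, 0.32, 0.18, 0.17)
)

# Rule of thumb: a mean standardized weight above 2 suggests
# that weighting should rather not be used
mean_weight_acceptable <- function(weights) {
  mean(weights / min(weights)) <= 2
}

mean_weight_acceptable(c(1.0, 1.2, 1.4))
```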

References

  • Gary, S., & Lenhard, W. (2021). In norming we trust - Verfahren zur statistischen Modellierung kontinuierlicher Testnormen auf dem Prüfstand. Diagnostica, 67(2), 75-86.
  • Hernandez, A., Aguilar, C., Paradell, È., Munoz, M., Vannier, L.C., & Vallar, F. (2017). The effect of demographic variables on the assessment of cognitive ability. Psicothema, 29(4), 469-474.
  • Kruskal, W., & Mosteller, F. (1979). Representative sampling, III: The current statistical literature. International Statistical Review/Revue Internationale de Statistique, 245-265.
  • Lenhard, A., Lenhard, W., Segerer, R., & Suggate, S. (2015). Peabody Picture Vocabulary Test (PPVT-4). Frankfurt: Pearson Clinical Assessment.
  • Lenhard, A., Lenhard, W., & Gary, S. (2019). Continuous norming of psychometric tests: A simulation study of parametric and semi-parametric approaches. PloS one, 14(9), e0222279.
  • Lenhard, W., & Lenhard, A. (2021). Improvement of norm score quality via regression-based continuous norming. Educational and Psychological Measurement, 81(2), 229-261.
  • Lumley, T. (2011). Complex surveys: A guide to analysis using R (Vol. 565). John Wiley & Sons.
  • Mercer, A., Lau, A., & Kennedy, C. (2018). For weighting online opt-in samples, what matters most. Pew Research Center.