Package: irrICC 1.0

Kilem L. Gwet

irrICC: Intraclass Correlations for Quantifying Inter-Rater Reliability

Calculates various intraclass correlation coefficients used to quantify inter-rater and intra-rater reliability. The assumption here is that the raters produced quantitative ratings. Most of the statistical procedures implemented in this package are described in details in Gwet, K.L. (2014, ISBN:978-0970806284): "Handbook of Inter-Rater Reliability," 4th edition, Advanced Analytics, LLC.

Authors: Kilem L. Gwet, Ph.D.

irrICC_1.0.tar.gz
irrICC_1.0.tar.gz (r-4.5-noble) | irrICC_1.0.tar.gz (r-4.4-noble)
irrICC_1.0.tgz (r-4.4-emscripten) | irrICC_1.0.tgz (r-4.3-emscripten)
irrICC.pdf | irrICC.html
irrICC/json (API)

# Install 'irrICC' in R:
install.packages('irrICC', repos = 'https://cloud.r-project.org')
Datasets:
  • iccdata1 - Scores assigned by 4 judges to 5 targets/subjects.
  • iccdata2 - Scores assigned by 4 judges to 5 targets/subjects distributed in 2 groups A and B.
  • iccdata3 - Scores assigned by 3 raters to 4 subjects.

This package does not link to any Github/Gitlab/R-forge repository. No issue tracker or development information is available.

Score: 3.00 | Downloads: 196 | Mentions: 1 | Exports: 27 | Dependencies: 0

Last updated 6 years ago from 2da1f07f4c. Checks: 1 OK, 2 WARNING. Indexed: yes.

Target          | Result  | Latest binary
Doc / Vignettes | OK      | Mar 12 2025
R-4.5-linux     | WARNING | Mar 12 2025
R-4.4-linux     | WARNING | Mar 12 2025

Exports: ci.ICC1a, ci.ICC1b, ci.ICC2a.inter, ci.ICC2a.nointer, ci.ICC2r.inter, ci.ICC2r.nointer, ci.ICC3a.inter, ci.ICC3r.inter, ci.ICC3r.nointer, icc1a.fn, icc1b.fn, icc2.inter.fn, icc2.nointer.fn, icc3.inter.fn, icc3.nointer.fn, pval.ICC1a, pval.ICC1b, pval.ICC2r.inter, pvals.ICC1a, pvals.ICC1b, pvals.ICC2a.inter, pvals.ICC2a.nointer, pvals.ICC2r.inter, pvals.ICC2r.nointer, pvals.ICC3a.inter, pvals.ICC3r.inter, pvals.ICC3r.nointer

Dependencies:

Calculating Confidence Intervals and P-values for Various ICCs

Rendered from PrecisionMeasures.Rmd using knitr::rmarkdown on Mar 12 2025.

Last update: 2019-09-23
Started: 2019-09-23

Intraclass Correlation Coefficients (ICC) with the irrICC Package

Rendered from UserGuide.Rmd using knitr::rmarkdown on Mar 12 2025.

Last update: 2019-09-23
Started: 2019-09-23

Citation

To cite package ‘irrICC’ in publications use:

Gwet KL, Ph.D. (2019). irrICC: Intraclass Correlations for Quantifying Inter-Rater Reliability. R package version 1.0, https://CRAN.R-project.org/package=irrICC.

ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see ‘help("citation")’.

Corresponding BibTeX entry:

  @Manual{,
    title = {irrICC: Intraclass Correlations for Quantifying
      Inter-Rater Reliability},
    author = {Kilem L. Gwet},
    year = {2019},
    note = {R package version 1.0},
    url = {https://CRAN.R-project.org/package=irrICC},
  }

Readme and manuals

library(irrICC)

Installation

devtools::install_github("kgwet/irrICC")

Abstract

irrICC is an R package that provides several functions for calculating various Intraclass Correlation Coefficients (ICC). This package follows closely the general framework of inter-rater and intra-rater reliability presented by Gwet (2014).

All input datasets to be used with this package must contain a mandatory “Target” column listing all subjects that were rated, and 2 or more columns “Rater1”, “Rater2”, … showing the ratings assigned to the subjects. The Target variable must be the first column of the data frame, and every other column is assumed to contain ratings from a rater. Note that all ratings must be numeric values for the ICC to be calculated. For example, here is the dataset “iccdata1” included in this package:

  iccdata1
#>    Target   J1  J2  J3 J4
#> 1       1  6.0 1.0 3.0  2
#> 2       1  6.5  NA 3.0  4
#> 3       1  4.0 3.0 5.5  4
#> 4       5 10.0 5.0 6.0  9
#> 5       5  9.5 4.0  NA  8
#> 6       4  6.0 2.0 4.0 NA
#> 7       4   NA 1.0 3.0  6
#> 8       4  8.0 2.5  NA  5
#> 9       2  9.0 2.0 5.0  8
#> 10      2  7.0  NA 2.0  6
#> 11      2  8.0  NA 2.0  7
#> 12      3 10.0 5.0 6.0 NA

The first column “Target” (the name Target can be replaced with any other name you like) contains subject identifiers, while J1, J2, J3, and J4 are the 4 raters (referred to here as judges) and the ratings they assigned to the subjects. You will notice that the Target column contains duplicates, indicating that some subjects were rated multiple times. Moreover, none of these judges rated all subjects, as seen from the presence of missing ratings identified with the symbol NA.

Two other datasets, iccdata2 and iccdata3, come with the package for you to experiment with. Even if your data frame contains several variables, note that only the Target and Rater columns must be supplied as parameters to the functions. For example, the iccdata2 data frame contains a variable named Group, which indicates the group in which each Target is categorized. It must be excluded from the input dataset as follows: iccdata2[,2:6].
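As a quick illustration of the expected layout, here is a small hand-built data frame (with hypothetical rater names R1 to R3 and made-up values, not a package dataset) in the same shape as iccdata1:

```r
# Toy dataset in the layout irrICC expects (hypothetical values):
# first column = Target (subject identifier; duplicates mark repeated ratings),
# remaining columns = one rater each; NA marks a missing rating.
ratings <- data.frame(
  Target = c(1, 1, 2, 2, 3),
  R1     = c(6.0, 6.5, 9.0, 7.0, 10.0),
  R2     = c(1.0, NA,  2.0, NA,  5.0),
  R3     = c(3.0, 3.0, 5.0, 2.0, 6.0)
)
# If the frame carried extra variables (e.g. a Group column), subset to
# Target plus the rater columns before calling the package functions.
```
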


Computing various ICC values

To determine which function you need, you must first have a statistical description of the experimental data. There are essentially 3 statistical models recommended in the literature for describing quantitative inter-rater reliability data. These are commonly referred to as Model 1, Model 2, and Model 3.

  • Model 1
    Model 1 uses a single factor (hence the number 1) to explain the variation in the ratings. When that factor is the subject, the model is referred to as Model 1A; when it is the rater, the model is named Model 1B. You will want to use Model 1A if not all subjects are rated by the same roster of raters; that is, the raters may change from subject to subject. Model 1B is more appropriate if different raters may rate different rosters of subjects. Note that Model 1A only allows for the calculation of inter-rater reliability, while Model 1B only allows for the calculation of intra-rater reliability.

Calculating the ICC under Model 1A is done as follows:

  icc1a.fn(iccdata1)
#>      sig2s    sig2e     icc1a n r max.rep min.rep Mtot ov.mean
#> 1 1.761312 5.225529 0.2520899 5 4       3       1   40     5.2

It follows that the inter-rater reliability is given by 0.252, the first 2 output statistics being the subject variance component 1.761 and the error variance component 5.226, respectively. A description of the other statistics can be found in the function’s documentation.
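Under Model 1A the coefficient is the share of total variance attributable to subjects, so it can be recomputed directly from the two variance components; a minimal base-R check against the output above:

```r
# Model 1A: icc1a = sig2s / (sig2s + sig2e)
sig2s <- 1.761312  # subject variance component, from the output above
sig2e <- 5.225529  # error variance component
icc1a <- sig2s / (sig2s + sig2e)
round(icc1a, 7)  # 0.2520899, matching icc1a above
```
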

The ICC under Model 1B is calculated as follows:

  icc1b.fn(iccdata1)
#>     sig2r    sig2e     icc1b n r max.rep min.rep Mtot ov.mean
#> 1 4.32087 3.365846 0.5621217 5 4       3       1   40     5.2

It follows that the intra-rater reliability is given by 0.562, the first 2 output statistics being the rater variance component 4.321 and error variance component 3.366 respectively. A description of the other statistics can be found in the function’s documentation.
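Analogously, under Model 1B the coefficient is the share of total variance attributable to raters:

```r
# Model 1B: icc1b = sig2r / (sig2r + sig2e)
sig2r <- 4.320870  # rater variance component, from the output above
sig2e <- 3.365846  # error variance component
icc1b <- sig2r / (sig2r + sig2e)
round(icc1b, 7)  # 0.5621217, matching icc1b above
```
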

  • Model 2
    Model 2 includes a subject factor and a rater factor, both of which are considered random. That is, Model 2 is a pure random factorial ANOVA model. You may have Model 2 with a subject-rater interaction or Model 2 without a subject-rater interaction. Model 2 with subject-rater interaction is made up of 3 factors: the rater, subject, and interaction factors, and is implemented in the function icc2.inter.fn.
    For information, the mathematical formulation of the full Model 2 is y_ijk = μ + s_i + r_j + (sr)_ij + e_ijk, where y_ijk is the rating associated with subject i, rater j, and replicate (or measurement) k. Moreover, μ is the average rating, s_i is subject i’s effect, r_j is rater j’s effect, (sr)_ij is the subject-rater interaction effect associated with subject i and rater j, and e_ijk is the error effect. The other statistical models are similar to this one: some may be based on fewer factors, or the assumptions applicable to these factors may vary from model to model. Please read Gwet (2014) for a technical discussion of these models.

Calculating the ICC from the iccdata1 dataset (included in this package) and under the assumption of Model 2 with interaction is done as follows:

  icc2.inter.fn(iccdata1)
#>      sig2s    sig2r    sig2e    sig2sr    icc2r     icc2a n r max.rep
#> 1 2.018593 4.281361 1.315476 0.4067361 0.251627 0.8360198 5 4       3
#>   min.rep Mtot ov.mean
#> 1       1   40     5.2

This function produces 2 intraclass correlation coefficients, icc2r and icc2a. While icc2r represents the inter-rater reliability, estimated at 0.252, icc2a represents the intra-rater reliability, estimated at 0.836. The first 4 output statistics are the subject, rater, error, and interaction variance components, respectively.
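Both coefficients are ratios of the reported variance components. The expressions below reproduce the output above; they are a numerical check of the formulas, not the package’s internal code:

```r
# Model 2 with interaction:
#   icc2r = sig2s / (sig2s + sig2r + sig2sr + sig2e)        inter-rater
#   icc2a = (sig2s + sig2r + sig2sr) / (same denominator)   intra-rater
sig2s  <- 2.018593   # subject variance component
sig2r  <- 4.281361   # rater variance component
sig2e  <- 1.315476   # error variance component
sig2sr <- 0.4067361  # subject-rater interaction variance component
total  <- sig2s + sig2r + sig2sr + sig2e
round(sig2s / total, 6)                     # 0.251627 (icc2r above)
round((sig2s + sig2r + sig2sr) / total, 6)  # 0.83602 (icc2a above)
```
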

The ICC calculation with the iccdata1 dataset and under the assumption of Model 2 without interaction is done as follows:

  icc2.nointer.fn(iccdata1)
#>      sig2s   sig2r    sig2e     icc2r    icc2a n r max.rep min.rep Mtot
#> 1 2.090769 4.34898 1.598313 0.2601086 0.801157 5 4       3       1   40
#>   ov.mean
#> 1     5.2

The 2 intraclass correlation coefficients have now become icc2r = 0.26 and icc2a = 0.801. That is, the estimated inter-rater reliability went up slightly while the intra-rater reliability coefficient went down slightly.
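Without the interaction term, sig2sr simply drops out of both ratios; checking against the output above:

```r
# Model 2 without interaction:
#   icc2r = sig2s / (sig2s + sig2r + sig2e)
#   icc2a = (sig2s + sig2r) / (sig2s + sig2r + sig2e)
sig2s <- 2.090769; sig2r <- 4.348980; sig2e <- 1.598313
total <- sig2s + sig2r + sig2e
round(sig2s / total, 5)            # 0.26011 (icc2r above)
round((sig2s + sig2r) / total, 5)  # 0.80116 (icc2a above)
```
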

  • Model 3
    To calculate the ICC using the iccdata1 dataset under the assumption of Model 3 with interaction, proceed as follows:
  icc3.inter.fn(iccdata1)
#>      sig2s    sig2e    sig2sr     icc2r     icc2a n r max.rep min.rep Mtot
#> 1 2.257426 1.315476 0.2238717 0.5749097 0.6535279 5 4       3       1   40
#>   ov.mean
#> 1     5.2

Here, the 2 intraclass correlation coefficients are given by icc2r = 0.575 and icc2a = 0.654. Compared to Model 2 with interaction, the estimated inter-rater reliability went up substantially while the intra-rater reliability coefficient went down substantially.
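Under Model 3 the raters are treated as fixed, so the coefficients are no longer simple variance shares. The expressions below (with r = 4 raters) reproduce the output above; see Gwet (2014) for their derivation:

```r
# Model 3 with interaction (r = number of raters):
#   icc3r = (sig2s - sig2sr/(r - 1)) / (sig2s + sig2sr + sig2e)
#   icc3a = (sig2s + sig2sr) / (sig2s + sig2sr + sig2e)
sig2s <- 2.257426; sig2sr <- 0.2238717; sig2e <- 1.315476
r <- 4
denom <- sig2s + sig2sr + sig2e
round((sig2s - sig2sr / (r - 1)) / denom, 6)  # 0.57491 (icc2r above)
round((sig2s + sig2sr) / denom, 6)            # 0.653528 (icc2a above)
```
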
Assuming Model 3 without interaction, the same coefficients are calculated as follows:

  icc3.nointer.fn(iccdata1)
#>      sig2s    sig2e     icc2r     icc2a n r max.rep min.rep Mtot ov.mean
#> 1 2.241792 1.470638 0.6038611 0.6038611 5 4       3       1   40     5.2

It follows that the 2 ICCs are given by icc2r = 0.604 and icc2a = 0.604. As usual, the omission of an interaction factor leads to a slight increase in inter-rater reliability and a slight decrease in intra-rater reliability. In this case, both become identical.
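With no interaction component in Model 3, both coefficients reduce to the same ratio of subject variance to total variance, which is why the two estimates coincide:

```r
# Model 3 without interaction: icc2r = icc2a = sig2s / (sig2s + sig2e)
sig2s <- 2.241792; sig2e <- 1.470638
icc <- sig2s / (sig2s + sig2e)
round(icc, 7)  # 0.6038611, matching both coefficients above
```
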

References:

  1. Gwet, K.L. (2014, ISBN:978-0970806284). “Handbook of Inter-Rater Reliability,” 4th Edition. Advanced Analytics, LLC

Help Manual

Help pages by topic:

ci.ICC1a: Confidence Interval of Intraclass Correlation Coefficient (ICC) under ANOVA Model 1A.
ci.ICC1b: Confidence Interval of Intraclass Correlation Coefficient (ICC) under ANOVA Model 1B.
ci.ICC2a.inter: Confidence Interval of ICCa(2,1), a measure of intra-rater reliability under Model 2 with interaction.
ci.ICC2a.nointer: Confidence Interval of ICCa(2,1) under Model 2 without subject-rater interaction.
ci.ICC2r.inter: Confidence Interval of ICC(2,1) under ANOVA Model 2 with interaction.
ci.ICC2r.nointer: Confidence Interval of the ICC(2,1) under Model 2 without subject-rater interaction.
ci.ICC3a.inter: Confidence Interval of ICCa(3,1), a measure of intra-rater reliability under Model 3.
ci.ICC3r.inter: Confidence Interval of ICC(3,1) under ANOVA Model 3 with interaction.
ci.ICC3r.nointer: Confidence Interval of the ICC(3,1) under Model 3 without subject-rater interaction.
icc1a.fn: Intraclass Correlation Coefficient (ICC) under ANOVA Model 1A.
icc1b.fn: Intraclass Correlation Coefficient (ICC) under ANOVA Model 1B.
icc2.inter.fn: Intraclass Correlation Coefficients ICC(2,1) & ICCa(2,1) under the Random Factorial ANOVA Model with Interaction.
icc2.nointer.fn: Intraclass Correlation Coefficients ICC(2,1) and ICCa(2,1) under ANOVA Model 2 without interaction.
icc3.inter.fn: Intraclass Correlation Coefficients (ICC) under the Mixed Factorial ANOVA Model with Interaction.
icc3.nointer.fn: Intraclass Correlation Coefficients ICC(3,1) and ICCa(3,1) under ANOVA Model 3 without interaction.
iccdata1: Scores assigned by 4 judges to 5 targets/subjects.
iccdata2: Scores assigned by 4 judges to 5 targets/subjects distributed in 2 groups A and B.
iccdata3: Scores assigned by 3 raters to 4 subjects.
mse1a.fn: Mean of Squares for Errors (MSE) under ANOVA Model 1A.
mse1b.fn: Mean of Squares for Errors (MSE) under ANOVA Model 1B.
mse2.inter.fn: Mean of Squares for Errors (MSE) under ANOVA Models 2 & 3 with interaction.
mse2.nointer.fn: Mean of Squares for Errors (MSE) under Models 2 & 3 without replication.
msi2.fn: Mean of Squares for Interaction (MSI) under ANOVA Models 2 & 3 with interaction.
msr1b.fn: Mean of Squares for Raters (MSR) under ANOVA Model 1B.
msr2.fn: Mean of Squares for Raters (MSR) under ANOVA Model 2 with or without interaction.
mss1a.fn: Mean of Squares for Subjects (MSS) under ANOVA Model 1A.
mss2.fn: Mean of Squares for Subjects (MSS) under ANOVA Models 2 and 3, with or without interaction.
pval.ICC1a: P-value of the ICC under ANOVA Model 1A for the specific null values 0, 0.1, 0.3, 0.5, 0.7, 0.9.
pval.ICC1b: P-value of the ICC under ANOVA Model 1B for the specific null values 0, 0.1, 0.3, 0.5, 0.7, 0.9.
pval.ICC2r.inter: P-values of ICC(2,1) under Model 2 with subject-rater interaction, for 6 specific null values.
pvals.ICC1a: P-value of the ICC under ANOVA Model 1A for arbitrary null values.
pvals.ICC1b: P-value of the ICC under ANOVA Model 1B for arbitrary null values.
pvals.ICC2a.inter: P-values of ICCa(2,1) under Model 2 with interaction.
pvals.ICC2a.nointer: P-values of ICCa(2,1) under Model 2 without subject-rater interaction.
pvals.ICC2r.inter: P-values of ICC(2,1) under Model 2 with subject-rater interaction, for user-provided null values.
pvals.ICC2r.nointer: P-values of ICC(2,1) under Model 2 without subject-rater interaction.
pvals.ICC3a.inter: P-values of ICCa(3,1) under Model 3 with subject-rater interaction.
pvals.ICC3r.inter: P-value of the Intraclass Correlation Coefficient ICC(3,1) under Model 3 with subject-rater interaction.
pvals.ICC3r.nointer: P-values of ICC(3,1) under Model 3 without subject-rater interaction.