Evaluating the immunogenicity of biologics is a critical step in the drug development process. Non-clinical and clinical immunogenicity is evaluated via detection of treatment-induced anti-drug antibodies (ADAs). This assessment is typically carried out through a multi-tiered process that begins with screening cut point determination. The screening cut point establishes the line between ADA-negative and potentially positive samples. While there are well-established guidelines for determining the screening cut point of immunoassays (Shankar et al. 2008; Shen, Dong, and Tsong 2015), many tools are either incomplete or not available for widespread use.
rADA is an R package that provides an open-source solution for screening cut-point determination. It simplifies the process of ADA detection, provides a framework for reproducibility, and makes it easy to generate figures for reports. rADA offers the key features summarised in the diagram below.
In this vignette, the functions in the R package rADA will be used to evaluate data from immunoassays and calculate the screening cut point, as well as to produce plots that may be useful in further assessment of the data. Note that confirmatory cut-point determination is not implemented in rADA.
rADA can be installed from CRAN or from its source repository.
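A minimal sketch of both routes is shown below; the CRAN call is standard, while the repository path in the commented line is a placeholder and should be replaced with the actual repository location.

install.packages('rADA')
# Install from a source repository (path shown is a placeholder):
# remotes::install_github('<owner>/rADA')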
The following libraries are needed to calculate screening cut point and produce plots. To load these libraries, type in the R console:
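The set below is a sketch based on the packages attached in the session information at the end of this vignette; load only those you need.

library(rADA)        # screening cut-point determination
library(ggplot2)     # plotting
library(gridExtra)   # arranging plots in a grid
library(grid)        # low-level viewport handling
library(reshape2)    # melting data.frames
library(forestplot)  # forest plots of cut points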
The rADA package includes the lognormAssay dataset, which is simulated from a log-normal distribution. The parameters of this simulated dataset are based on datasets observed by the authors. To load the dataset, use the call below.
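This sketch assumes lognormAssay is exported as a standard package dataset:

data(lognormAssay, package = 'rADA')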
To load your own data from your local hard drive, use an appropriate data import function for the file format. For example, data stored in comma-separated values (.csv) files can be read into R with the read.csv function, and Excel (.xlsx) files can be read with the read.xlsx function from the openxlsx package (make sure all of your data is on a single sheet). For more information and help on reading xlsx files into R, refer to the openxlsx documentation.
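As an illustration only (the file names below are hypothetical; substitute your own paths):

my.data <- read.csv('my_assay_data.csv')
# Or, for an Excel workbook with all data on one sheet:
# my.data <- openxlsx::read.xlsx('my_assay_data.xlsx', sheet = 1)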
At a minimum, your file should contain three columns: an ID column, a Lot column, and at least one data column.
Currently, each data column should be named using the pattern DayID_OperatorID_ReplicateID. The IDs do not have to follow a particular format, but they should be consistent across all columns and distinct from each other. For instance, if ‘1’ is used to denote replicate number 1, use ‘D1’ (rather than ‘1’) to designate day 1.
For the lognormAssay dataset, the simulated experiment is based on a design of 3 replicates performed by 2 separate operators over 3 days.
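The first rows can be inspected with a standard head() call (a plausible reconstruction of the chunk that produced the output below):

head(lognormAssay)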
## ID Lot D1_Op1_1 D1_Op1_2 D1_Op1_3 D1_Op2_1 D1_Op2_2 D1_Op2_3 D2_Op1_1
## 1 ID1 ID1 8.242108 8.685232 9.571481 9.596608 8.089986 10.39423 9.748730
## 2 ID2 ID2 13.240265 11.247107 10.962370 17.799476 17.514739 17.65711 17.796055
## 3 ID3 ID3 10.292218 12.602716 11.552489 10.241992 11.292218 12.55249 9.241992
## 4 ID4 ID4 13.015193 12.780685 12.546177 12.021875 10.732081 12.60814 11.608145
## 5 ID5 ID5 28.791520 38.388693 36.789165 32.670672 36.189636 35.22992 32.950295
## 6 ID6 ID6 10.073062 10.878907 12.389867 10.166487 11.173793 12.18110 7.957719
## D2_Op1_2 D2_Op1_3 D2_Op2_1 D2_Op2_2 D2_Op2_3 D3_Op1_1 D3_Op1_2 D3_Op1_3
## 1 10.98948 7.887609 7.646862 8.887609 11.36910 9.128356 7.621735 8.951107
## 2 11.95895 15.945265 17.657108 18.226581 17.94184 11.247107 11.958949 17.226581
## 3 12.91778 10.187195 13.707738 11.502263 12.23742 8.821901 10.187195 9.241992
## 4 14.42224 13.718717 15.187733 13.897939 10.26307 13.601463 13.953225 10.318351
## 5 26.55218 36.149353 30.751237 38.428976 36.18964 37.109070 24.952651 31.670672
## 6 10.07306 10.878907 13.591328 12.986944 10.46868 7.554797 8.763564 12.087675
## D3_Op2_1 D3_Op2_2 D3_Op2_3
## 1 9.773857 11.634978 7.735486
## 2 12.816581 12.816581 14.952107
## 3 10.241992 8.981720 11.817331
## 4 14.249701 11.787367 11.435605
## 5 36.509541 33.950295 26.592462
## 6 13.188405 8.554797 10.065756
After loading the data, the next step is to assign it to an ImmunoAssay object, as this is the class accepted by most of the functions in rADA. The exp.name argument holds the name of the experiment (the default is Experiment1) and can be changed if required.
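A sketch of this step, assuming the importAssay function referenced later in this vignette takes the data frame and the exp.name argument described above (see ?importAssay for the exact signature):

assay.obj <- importAssay(lognormAssay, exp.name = 'Experiment1')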
The data is stored both as imported and in a reformatted version, in the data and melted.data slots respectively. This melted version of the data is used by the downstream functions of rADA (to read more about melting data.frames, see the reshape2 documentation). In addition, the rADA import function extracts the analytical variables from the column names defined above.
The coefficient of variation (CV) can be calculated using the calcCvStats function. The CV is used to assess the variability between replicates: if a replicate falls outside of the established range, it can be excluded from the final analysis. The default CV threshold in this function is 20%, though this can be changed within the function.
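A minimal sketch of the call; the name of the CV threshold argument is not shown here, see ?calcCvStats for the exact parameters:

assay.obj <- calcCvStats(assay.obj)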
## Using index as id variables
This function produces a list of dataframes that can be used for the remainder of the analysis.
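Their names can be listed from the stats slot of the object (a plausible call producing the output below):

names(assay.obj@stats)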
## [1] "means" "sd" "cv.df" "means.adj.df" "sd.adj.df"
## [6] "cv.adj.df" "is.adj.df"
This function produces dataframes that summarize the values calculated in the function. This allows the user to see how many samples fell outside the defined CV and whether this is consistent across the different analytical variables.
For instance, one can examine whether there were outliers among the replicates run on Day 1 by Operator 2.
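One plausible way to tabulate this, assuming the is.adj.df element contains a logical column per day/operator combination (the column name D1Op2 is an assumption):

table(assay.obj@stats$is.adj.df$D1Op2)  # column name assumed for illustration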
##
## TRUE
## 9
Accordingly, 9 of the 100 samples are outside the defined CV.
Similarly, the outlier examination can be done for replicates run on Day 2 by Operator 1.
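Again with an assumed column name:

table(assay.obj@stats$is.adj.df$D2Op1)  # column name assumed for illustration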
##
## TRUE
## 16
There seems to be more variability between the replicates generated on this day by this operator. This is something to keep in mind during further evaluation of the dataset.
Once this analysis is completed, the melted.data slot is replaced with a copy of the original (melted) dataset with all of the outliers removed. This can then be used as input for the rest of the pipeline.
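The first rows of the updated slot can be inspected (a plausible call producing the output below):

head(assay.obj@melted.data, 7)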
## ID Lot variable value Day Operator Replicate Category DayOperator
## 1 ID1 ID1 D1_Op1_1 8.242108 D1 Op1 1 Experiment1 D1Op1
## 2 ID2 ID2 D1_Op1_1 13.240265 D1 Op1 1 Experiment1 D1Op1
## 3 ID3 ID3 D1_Op1_1 10.292218 D1 Op1 1 Experiment1 D1Op1
## 4 ID4 ID4 D1_Op1_1 13.015193 D1 Op1 1 Experiment1 D1Op1
## 5 ID5 ID5 D1_Op1_1 28.791520 D1 Op1 1 Experiment1 D1Op1
## 6 ID6 ID6 D1_Op1_1 10.073062 D1 Op1 1 Experiment1 D1Op1
## 7 ID7 ID7 D1_Op1_1 NA D1 Op1 1 Experiment1 D1Op1
## xid
## 1 D1Op1_ID1_ID1
## 2 D1Op1_ID2_ID2
## 3 D1Op1_ID3_ID3
## 4 D1Op1_ID4_ID4
## 5 D1Op1_ID5_ID5
## 6 D1Op1_ID6_ID6
## 7 D1Op1_ID7_ID7
The melted.data slot includes columns that represent the days, operators, and replicates. This format makes it easier to create plots for visualization and to group the data by the analytical variables of interest.
While an examination of the CV is helpful for understanding the variability within the dataset, the distribution of the data is better evaluated using boxplots.
The evalBoxplot function produces a boxplot of the signal for a given experimental variable. For example, the signal differences between days can be visualized as shown below.
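A sketch of the call, using the 'Day' option of the var argument described below:

evalBoxplot(assay.obj, var = 'Day')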
This can be helpful when comparing different experiments, as evalBoxplot will separate the values of the chosen experimental variable by experiment.
Note that the other options for the var argument are “Operator”, “Replicate”, or “Day”. evalBoxplot is built on ggplot2, so additional ggplot2 components can be added to the returned plot for further customization. In this instance, the plot theme is adjusted using theme_minimal().
To differentiate between the different variables, colour adjustments can also be applied:
evalBoxplot(assay.obj, var = 'Replicate') + ggplot2::theme_minimal() + ggplot2::scale_fill_manual(values='#00a4b2')
Further customization can be done using ggplot2 functionality. For more information, refer to the ggplot2 documentation.
Histograms along with boxplots are commonly used to evaluate the distribution of the data. This section will provide the R code to re-create a figure from the original white paper (Shankar et al. 2008). The following code will create a composite of a histogram and a boxplot:
# Create the boxplot for the top of the plot
p1 <- ggplot(assay.obj@melted.data, aes(x=Category,y=value)) + stat_boxplot(geom ='errorbar',width=0.2, size = 1.5) +geom_boxplot(size = 1.1) + coord_flip() + scale_y_continuous(limits = c(0, 40)) +
theme(
axis.title=element_text(size=12,face="bold"),
panel.background = element_blank(),
panel.grid = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.ticks.x=element_blank(),
axis.ticks.y=element_blank(),
panel.border = element_rect(colour = "black", fill=NA, size=2),
axis.text.x=element_blank(),
axis.text.y=element_blank())
# Create the histogram for the bottom of the plot
p2 <- ggplot(assay.obj@melted.data, aes(x=value)) + geom_histogram(aes(x = value, y = ..density..), colour="black", fill="#6c78a7", size = 2) + scale_x_continuous(limits = c(0, 40)) +
stat_function(fun = dnorm, args = list(mean = mean(assay.obj@melted.data$value, na.rm = TRUE), sd = sd(assay.obj@melted.data$value, na.rm = TRUE)), size = 2) +
theme(
axis.title=element_text(size=12,face="bold"),
panel.border = element_rect(colour = "black", fill=NA, size=2),
panel.background = element_blank(),
panel.grid = element_blank())
The histogram and the boxplot can be put together in the same viewport using functions from the grid and gridExtra libraries.
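A minimal sketch using gridExtra::grid.arrange to stack the two plots (the original figure may arrange the viewports with grid directly):

gridExtra::grid.arrange(p1, p2, ncol = 1)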
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
As seen, the distribution of the data is skewed to the right.
rADA offers normality evaluation using the Shapiro-Wilk normality test (Royston 1982) via the evalNorm function. Using evalNorm, normality can be assessed on the original untransformed data or on transformed data, with or without outliers.
The function takes the following arguments:

- assay.obj: an ImmunoAssay object imported by importAssay
- category: if assay.df.melted consists of more than 1 dataset, choose the category here to split the dataset
- data.transf: should the data be transformed before normality is evaluated
- transf.method: if data.transf is TRUE, which method should be used; choose between 'log10' and 'ln'
- excl.outliers: should outliers be excluded from this analysis? If TRUE, data points which lie beyond the extremes of the whiskers in the boxplot will be excluded; see boxplot.stats for details
- hist: should a histogram be output? TRUE/FALSE
- p.val: value to be used as the cutoff for the Shapiro-Wilk test; defaults to 0.05
- skew: value to be used to determine skewness; defaults to 1
- return.object: if FALSE, only the plot is returned and the stats are returned as a list

The evaluation of normality on the untransformed (original) values can be done with the data.transf = FALSE argument in evalNorm, as sketched below.
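This call follows the argument pattern of the transformed-data example later in the vignette:

assay.obj <- evalNorm(assay.obj = assay.obj, category = 'Experiment1', data.transf = FALSE)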
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## $transf.method
## [1] "None"
##
## $sw.results
##
## Shapiro-Wilk normality test
##
## data: assay.df.melted$value
## W = 0.85107, p-value < 2.2e-16
##
##
## $skew.val
## [1] 1.642846
##
## $recommendation
## [1] "nonparametric"
As seen, when the data has not been transformed, the distribution is positively skewed.
Now, the data can be transformed using log10 to see if the transformed data follows a normal distribution. The arguments data.transf = TRUE and transf.method = 'log10' can be used for this purpose. Note that, if there are multiple experiments, the experiment name can be specified accordingly via the category argument.
assay.obj <- evalNorm(assay.obj = assay.obj, category = 'Experiment1', data.transf = TRUE, transf.method = 'log10')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The normality evaluation function adds additional entries to the stats slot of the ImmunoAssay object.
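Listing the slot contents again shows the new entries (a plausible call producing the output below):

names(assay.obj@stats)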
## [1] "means" "sd" "cv.df" "means.adj.df"
## [5] "sd.adj.df" "cv.adj.df" "is.adj.df" "transformation"
## [9] "sw.results" "skewness" "recommendation"
The new entries are:

- transformation: a string containing the transformation method used; "None" will be displayed if data.transf = FALSE
- sw.results: the output of shapiro.test(); this output is used for the recommendation
- skewness: the skewness value used in order to make the recommendation
- recommendation: a string containing the official recommendation of the function; may be used as input for the scp() function

Additionally, evalNorm produces a histogram of the dataset, which can be used to confirm or reject the recommendation offered by the function.
The recommendation made by this function can be used to make a final decision on the statistical methodology (i.e. parametric or non-parametric) used for screening cut-point determination, as explained in Shankar et al. (2008). If you are satisfied with the recommendation, it can be used in the next stage of screening cut-point determination.
Accordingly, after log10 transformation of the data, it is recommended to use a parametric method for screening cut point determination.
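The individual results can be read from the stats slot; the entry names below are taken from the listing above:

assay.obj@stats$sw.results
assay.obj@stats$skewness
assay.obj@stats$recommendation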
##
## Shapiro-Wilk normality test
##
## data: assay.df.melted$value
## W = 0.98911, p-value = 1.846e-09
## [1] 0.3258241
## [1] "normal"
If required, normality can be assessed after removing the outlying values in the data. When the excl.outliers argument is set to TRUE, the normality assessment is carried out with outliers removed.
assay.obj <- evalNorm(assay.obj = assay.obj, category = 'Experiment1', data.transf = FALSE, excl.outliers = TRUE)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Note that the outliers are not removed from the data at this point; if desired, they can be removed in the next step.
Once the normality of the distribution has been evaluated and logarithmic transformations have been tested on the data, the next step is to calculate the screening cut point using the recommendation.
Since this data is skewed without data transformation and outlier removal, the distribution can be characterized as “nonparametric” and the screening cut point can be found using the non-parametric approach explained in Shankar et al. (2008) as follows:
assay.obj <- scp(assay.obj = assay.obj,
category = 'Experiment1',
distrib = 'nonparametric',
data.transf = FALSE,
rm.out = FALSE)
print(assay.obj@scp.table)
## OperatorDay mean sd distrib cp mean.conf.int1
## 1 D1Op1 11.22877 6.242176 nonparametric 24.18815 10.52240
## 2 D1Op2 12.31018 6.291923 nonparametric 26.08928 11.59818
## 3 D2Op1 11.71666 6.758840 nonparametric 26.62153 10.95182
## 4 D2Op2 12.26978 6.409614 nonparametric 25.50627 11.54446
## 5 D3Op1 11.40482 6.479217 nonparametric 24.61321 10.67163
## 6 D3Op2 12.26036 6.526981 nonparametric 28.24917 11.52176
## 7 all 11.87038 6.453973 nonparametric 26.57031 11.57223
## mean.conf.int2 cp.conf.int1 cp.conf.int2
## 1 11.93514 23.48178 24.89452
## 2 13.02218 25.37728 26.80127
## 3 12.48149 25.85670 27.38637
## 4 12.99509 24.78095 26.23158
## 5 12.13802 23.88002 25.34640
## 6 12.99895 27.51058 28.98777
## 7 12.16854 26.27215 26.86847
By default, the scp function splits the cut point calculations by the experimental variables (i.e., operators and days) used in the immunoassay experiment in order to provide a better understanding of how the screening cut point may differ between experimental groups. The last line (named all) provides the overall cut point estimate and the 95% confidence interval for that estimate.
Based on this output and the boxplots generated earlier, there is no substantial difference among the cut points, indicating minimal influence of analytical variability. The same calculation can be carried out on the transformed data and the results compared systematically, to see how data transformation affects the cut point calculation.
Using the most suitable parameters determined from the normality evaluation, the screening cut-point calculation can be carried out on the log10-transformed data, with the results saved in the scp.table slot.
assay.obj <- scp(assay.obj = assay.obj,
category = 'Experiment1',
distrib = 'normal', #assay.norm.eval$recommendation,
data.transf = TRUE,
transf.method = 'log10',
rm.out = FALSE)
print(assay.obj@scp.table)
## OperatorDay mean.transf sd.transf distrib cp cp.inv mean.conf.int1
## 1 D1Op1 0.9951773 0.2143719 normal 1.347819 22.27506 0.9709188
## 2 D1Op2 1.0431544 0.1970483 normal 1.367299 23.29694 1.0208563
## 3 D2Op1 1.0081372 0.2260505 normal 1.379990 23.98779 0.9825571
## 4 D2Op2 1.0397956 0.2013784 normal 1.371063 23.49974 1.0170075
## 5 D3Op1 1.0014410 0.2135421 normal 1.352718 22.52774 0.9772764
## 6 D3Op2 1.0378627 0.2046658 normal 1.374538 23.68852 1.0147026
## 7 all 1.0212249 0.2101269 normal 1.366884 23.27468 1.0115175
## mean.conf.int2 cp.conf.int1 cp.conf.int2
## 1 1.019436 21.06495 23.55469
## 2 1.065452 22.13098 24.52432
## 3 1.033717 22.61571 25.44312
## 4 1.062584 22.29846 24.76573
## 5 1.025606 21.30851 23.81673
## 6 1.061023 22.45835 24.98608
## 7 1.030932 22.76022 23.80078
As seen from the output, the calculated cut points differ between the two methods chosen.
The columns in this table are slightly different from those in the previous cut point table because the data was transformed. Instead of just one cp, there is an additional cut point called cp.inv: the cut point back-transformed to the scale of the original dataset.
There are two 95% confidence intervals calculated: one for the mean and one for the screening cut point. Since the cut point is an unknown parameter, the confidence interval gives the range within which we can be 95% confident that the true value falls. The narrower the range, the more confident we can be that the calculated cut point is close to the real value. In this case, the confidence limits are relatively close to each other.
A forest plot can help visualize the data generated from the table of calculated cut points. This type of plot will also be useful when comparing different methods of cut point calculation (such as outlier inclusion/exclusion and various data transformation methods).
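A minimal sketch of such a plot built directly from the scp.table slot with forestplot::forestplot; the vignette's own call may differ, and the column names are taken from the printed table above:

cp.tab <- assay.obj@scp.table
# For log-transformed results, cp.inv (the back-transformed cut point) is on the
# same scale as the confidence interval columns; for untransformed results use cp.
forestplot::forestplot(labeltext = cp.tab$OperatorDay,
                       mean  = cp.tab$cp.inv,
                       lower = cp.tab$cp.conf.int1,
                       upper = cp.tab$cp.conf.int2,
                       xlab  = 'Screening cut point')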
In this case, it can be seen that the 95% confidence intervals calculated earlier by the scp function overlap. For further explanation of the inputs and variables for the forestplot function, please refer to the documentation of the forestplot package.
The forest plot suggests that while there is slight variation in the cut point separated by day/operator, the values are all relatively close to each other with the 95% confidence intervals all overlapping with each other.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] reshape2_1.4.4 gridExtra_2.3 ggplot2_3.5.1 forestplot_3.1.6
## [5] abind_1.4-8 checkmate_2.3.2 rADA_1.1.9
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.9 utf8_1.2.4 generics_0.1.3 class_7.3-22
## [5] stringi_1.8.4 digest_0.6.37 magrittr_2.0.3 evaluate_1.0.1
## [9] fastmap_1.2.0 plyr_1.8.9 jsonlite_1.8.9 e1071_1.7-16
## [13] backports_1.5.0 fansi_1.0.6 scales_1.3.0 codetools_0.2-20
## [17] jquerylib_0.1.4 cli_3.6.3 rlang_1.1.4 munsell_0.5.1
## [21] withr_3.0.2 cachem_1.1.0 yaml_2.3.10 tools_4.4.2
## [25] dplyr_1.1.4 colorspace_2.1-1 buildtools_1.0.0 vctrs_0.6.5
## [29] R6_2.5.1 matrixStats_1.4.1 proxy_0.4-27 lifecycle_1.0.4
## [33] stringr_1.5.1 pkgconfig_2.0.3 pillar_1.9.0 bslib_0.8.0
## [37] gtable_0.3.6 glue_1.8.0 Rcpp_1.0.13-1 xfun_0.49
## [41] tibble_3.2.1 tidyselect_1.2.1 sys_3.4.3 knitr_1.49
## [45] farver_2.1.2 htmltools_0.5.8.1 rmarkdown_2.29 maketools_1.3.1
## [49] labeling_0.4.3 compiler_4.4.2