Package: gaQSAR 1.2.3

Jos Hageman

gaQSAR: QSAR Modelling Using Genetic Algorithm Based Variable Selection

Implements genetic algorithm-based variable selection for building quantitative structure-activity relationship (QSAR) models. The package provides a workflow for selecting optimal predictor subsets from large descriptor spaces using leave-one-out cross-validation (LOOCV) with Q2 as the fitness criterion. Features include automatic handling of multicollinearity via variance inflation factor (VIF) thresholding, customizable genetic algorithm operators, and diagnostic tools for model evaluation. Supports both training set optimization and external validation, plus nested (double) cross-validation for unbiased performance estimation and predictor stability diagnostics. Built-in visualization functions include Q2 curves and Williams plots to assess model applicability domain. The method is demonstrated in papers predicting antibacterial activity by Araya-Cloutier et al. (2018) <doi:10.1038/s41598-018-27545-4> and Kalli et al. (2021) <doi:10.1038/s41598-021-92964-9>.
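The two quantities named in the description, LOOCV Q2 and the variance inflation factor, can be illustrated with base R alone. The sketch below does not use gaQSAR and is not taken from its source code: the helper names `loocv_q2` and `vif` are hypothetical, the `mtcars` data merely stand in for a descriptor table, and the Q2 convention shown (1 - PRESS / total sum of squares around the overall mean) is one common definition that the package may or may not follow exactly.

```r
# Leave-one-out Q2 for an ordinary least-squares fit (base R only).
# For linear models the leave-one-out residual has a closed form:
#   e_loo[i] = e[i] / (1 - h[i]), where h are the hat values (leverages).
loocv_q2 <- function(fit) {
  y     <- model.response(model.frame(fit))
  e_loo <- residuals(fit) / (1 - hatvalues(fit))
  press <- sum(e_loo^2)                      # predictive residual sum of squares
  1 - press / sum((y - mean(y))^2)
}

# Variance inflation factor of each predictor: 1 / (1 - R2_j), where R2_j
# comes from regressing predictor j on all remaining predictors.
vif <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(j) {
    f    <- reformulate(setdiff(names(X), j), response = j)
    r2_j <- summary(lm(f, data = X))$r.squared
    1 / (1 - r2_j)
  })
}

# Illustrative data only (mtcars is not QSAR data).
fit  <- lm(mpg ~ wt + hp, data = mtcars)
q2   <- loocv_q2(fit)
r2   <- summary(fit)$r.squared
vifs <- vif(mtcars[c("wt", "hp")])
```

Because every leave-one-out residual is at least as large in magnitude as the ordinary residual, Q2 computed this way never exceeds the fitted R2, which is why it is a more conservative fitness criterion for variable selection.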

Authors: Jos Hageman [aut, cre]

gaQSAR_1.2.3.tar.gz
gaQSAR_1.2.3.tar.gz (r-4.7-any) | gaQSAR_1.2.3.tar.gz (r-4.6-any)
gaQSAR_1.2.3.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
gaQSAR/json (API)

# Install 'gaQSAR' in R:
install.packages('gaQSAR', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/joshageman/gaqsar/issues

2.30 score | 5 scripts | 17 exports | 38 dependencies

Last updated from: fca1d03eed. Checks: 4 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | OK | 139
source / vignettes | OK | 203
linux-release-x86_64 | OK | 141
wasm-release | OK | 148

Exports: createBestFitnessPlot, createDCVTrainingMetricsPlot, createDCVWilliamsPlot, createQ2Plot, createWilliamsPlot, gaDoubleCrossValidation, gaintegerMutation, gaintegerOnePointCrossover, gaintegerPopulation, gaintegerTwoPointCrossover, gaPermutationTest, gaVariableSelection, predictOOBObjects, Q2, QSARMonitorFactory, singleCV, splitUp

Dependencies: cli, codetools, cpp11, crayon, digest, farver, foreach, future, future.apply, GA, ggplot2, ggrepel, globals, glue, gtable, isoband, iterators, labeling, lifecycle, listenv, magrittr, mathjaxr, parallelly, plyr, prospectr, R6, RColorBrewer, Rcpp, RcppArmadillo, reshape2, rlang, S7, scales, stringi, stringr, vctrs, viridisLite, withr

Double cross-validation workflow with gaQSAR
Load packages and helper function | Prepare the data | Choose settings | Run double cross-validation | Optional parallel execution | Compare model sizes | Select a model size | Inspect the selected model size | Williams plot across outer folds | Best fitness plot | Permutation test | Save results | Summary

Last update: 2026-06-24
Started: 2026-06-24

Train/test QSAR workflow with gaQSAR
Load packages and helper function | Prepare the data | Choose settings | Split the data | Run the GA for several model sizes | Optional parallel execution | Predict the test set | Compare Q2 values | Williams plot, fitness plot and Observed versus Predicted plot | Select one model | Permutation test | Save results | Summary

Last update: 2026-06-24
Started: 2026-06-24

Readme and manuals

Help Manual

Help page | Topics
Plot best fitness per generation | createBestFitnessPlot
Plot training metrics (R2, R2adj, Q2) versus model size for nested CV runs | createDCVTrainingMetricsPlot
Williams plot for double cross-validation diagnostics | createDCVWilliamsPlot
Plot Q2 versus number of predictors | createQ2Plot
Create Williams plots for QSAR model diagnostics | createWilliamsPlot
Nested (double) cross-validation for GA-based variable selection | gaDoubleCrossValidation
Integer-valued GA mutation operator | gaintegerMutation
Integer-valued one-point crossover operator | gaintegerOnePointCrossover
Integer-valued GA population initializer | gaintegerPopulation
Integer-valued two-point crossover operator | gaintegerTwoPointCrossover
Y-scrambling permutation test for GA-based variable selection | gaPermutationTest
Genetic algorithm based variable selection for QSAR | gaVariableSelection
Plot method for gaQSAR objects | plot.gaQSAR
Plot method for gaQSAR_dcv objects | plot.gaQSAR_dcv
Plot method for gaQSAR_permTest objects | plot.gaQSAR_permTest
Predict out-of-bag objects and compute external Q2 | predictOOBObjects
Print method for gaQSAR objects | print.gaQSAR
Print method for gaQSAR_dcv objects | print.gaQSAR_dcv
Print method for gaQSAR_permTest objects | print.gaQSAR_permTest
Compute Q2 (cross-validated R-squared) | Q2
GA monitor function for QSAR variable selection | QSARMonitorFactory
LOOCV Q2 fitness function for small datasets | singleCV
Split data into training and test sets | splitUp
Summary method for gaQSAR objects | summary.gaQSAR
Summary method for gaQSAR_dcv objects | summary.gaQSAR_dcv
Summary method for gaQSAR_permTest objects | summary.gaQSAR_permTest
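
Several help topics refer to Williams plots. As background, the quantities usually shown in such a plot can be computed in base R as sketched below. This is not the output or the implementation of createWilliamsPlot: the `mtcars` data are illustrative only, and the cutoffs used here (warning leverage h* = 3(p + 1)/n and standardized residuals beyond ±3) are common conventions assumed for the example, which may differ from the package defaults.

```r
# Williams plot ingredients for a linear model (base R only).
fit <- lm(mpg ~ wt + hp, data = mtcars)   # illustrative data, not QSAR data

h <- hatvalues(fit)                       # leverage of each object
r <- rstandard(fit)                       # standardized residuals
n <- nobs(fit)
p <- length(coef(fit)) - 1                # number of predictors (intercept excluded)

h_star  <- 3 * (p + 1) / n                # assumed warning leverage
outside <- h > h_star | abs(r) > 3        # objects flagged by either cutoff

# Leverage on the x-axis, standardized residual on the y-axis.
plot(h, r, xlab = "Leverage", ylab = "Standardized residual",
     col = ifelse(outside, "red", "black"), pch = 19)
abline(v = h_star, h = c(-3, 3), lty = 2)
```

Objects to the right of the vertical line lie far from the bulk of the training descriptors, so predictions for them are extrapolations; objects above or below the horizontal lines are response outliers.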