Package: Colossus 1.2

Eric Giunta

Colossus: "Risk Model Regression and Analysis with Complex Non-Linear Models"

Performs survival analysis using general non-linear models. Risk models can be the sum or product of terms. Each term is the product of exponential/linear functions of covariates. Additionally, sub-terms can be defined as a sum of exponential, linear threshold, and step functions. Cox proportional hazards <https://en.wikipedia.org/wiki/Proportional_hazards_model>, Poisson <https://en.wikipedia.org/wiki/Poisson_regression>, and Fine-Gray competing risks <https://www.publichealth.columbia.edu/research/population-health-methods/competing-risk-analysis> regression are supported. This work was sponsored by NASA Grant 80NSSC19M0161 through a subcontract from the National Council on Radiation Protection and Measurements (NCRP). The computing for this project was performed on the Beocat Research Cluster at Kansas State University, which is funded in part by NSF grants CNS-1006860, EPS-1006860, EPS-0919443, ACI-1440548, CHE-1726332, and NIH P20GM113109.

Authors: Eric Giunta [aut, cre], Amir Bahadori [ctb], Dan Andresen [ctb], Linda Walsh [ctb], Benjamin French [ctb], Lawrence Dauer [ctb], John Boice Jr [ctb], Kansas State University [cph], NASA [fnd], NCRP [fnd], NRC [fnd]

Colossus_1.2.tar.gz
Colossus_1.2.tar.gz (r-4.5-noble) | Colossus_1.2.tar.gz (r-4.4-noble)
Colossus_1.2.tgz (r-4.4-emscripten) | Colossus_1.2.tgz (r-4.3-emscripten)
Colossus.pdf | Colossus.html
Colossus/json (API)
NEWS

# Install 'Colossus' in R:
install.packages('Colossus', repos = 'https://cloud.r-project.org')

Bug tracker: https://github.com/ericgiunta/colossus/issues (0 issues)

Pkgdown site: https://ericgiunta.github.io

Uses libs:
  • c++ – GNU Standard C++ Library v3
  • openmp – GCC OpenMP (GOMP) support library

4.72 score 508 downloads 50 exports 41 dependencies

Last updated 2 months ago from: 392950d215. Checks: 3 OK. Indexed: no.

Target | Result | Latest binary
Doc / Vignettes | OK | Mar 15 2025
R-4.5-linux-x86_64 | OK | Mar 15 2025
R-4.4-linux-x86_64 | OK | Mar 15 2025

Exports: Check_Dupe_Columns, Check_Trunc, Convert_Model_Eq, Correct_Formula_Order, Cox_Relative_Risk, CoxCurveSolver, Date_Shift, Def_Control, Def_Control_Guess, Def_model_control, Def_modelform_fix, Event_Count_Gen, Event_Time_Gen, factorize, factorize_par, Gather_Guesses_CPP, gen_time_dep, GetCensWeight, interact_them, Interpret_Output, Joint_Multiple_Events, Likelihood_Ratio_Test, Linked_Dose_Formula, Linked_Lin_Exp_Para, Model_Results_Log, PoissonCurveSolver, Replace_Missing, RunCoxNull, RunCoxPlots, RunCoxRegression, RunCoxRegression_Basic, RunCoxRegression_CR, RunCoxRegression_Guesses_CPP, RunCoxRegression_Omnibus, RunCoxRegression_Omnibus_Multidose, RunCoxRegression_Single, RunCoxRegression_Strata, RunCoxRegression_Tier_Guesses, RunPoissonEventAssignment, RunPoissonEventAssignment_bound, RunPoissonRegression, RunPoissonRegression_Guesses_CPP, RunPoissonRegression_Joint_Omnibus, RunPoissonRegression_Omnibus, RunPoissonRegression_Residual, RunPoissonRegression_Single, RunPoissonRegression_Strata, RunPoissonRegression_Tier_Guesses, System_Version, Time_Since

Dependencies: brio, callr, cli, cpp11, crayon, data.table, desc, diffobj, digest, dplyr, evaluate, fansi, fs, generics, glue, jsonlite, lifecycle, lubridate, magrittr, pillar, pkgbuild, pkgconfig, pkgload, praise, processx, ps, R6, Rcpp, RcppEigen, rlang, rprojroot, stringi, stringr, testthat, tibble, tidyselect, timechange, utf8, vctrs, waldo, withr

Alternative Regression Options

Rendered from Alt_Run_Opt.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2025-02-13
Started: 2024-09-07

Colossus Description

Rendered from Starting-Description.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2025-02-13
Started: 2024-09-07

Confidence Interval Selection

Rendered from Wald_and_Log_Bound.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2025-02-13
Started: 2024-09-07

Distributed Start Framework

Rendered from Alt_Distrib_Starts.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2024-10-21
Started: 2023-09-28

Dose Response Formula Terms

Rendered from Dose_Formula_Inputs.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2024-10-21
Started: 2024-09-07

Excess and Predicted Cases

Rendered from Excess_and_Predicted_Cases.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2025-02-13
Started: 2024-10-21

Functions for Plotting and Analysis

Rendered from Plotting_And_Analysis.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2025-02-13
Started: 2024-02-21

Generating Person-Count and Person-Time Tables

Rendered from count_time_tables.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2025-02-13
Started: 2025-02-13

Gradient and Hessian Approaches

Rendered from Grad_Hess.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2025-02-13
Started: 2025-02-13

List of Control Options

Rendered from Control_Options.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2025-02-13
Started: 2024-09-07

Multiple Realization Methods

Rendered from Multi_Realization.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2025-02-13
Started: 2024-09-07

Script comparisons with 32-bit Epicure

Rendered from Script_Comparison_Epicure.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2025-02-13
Started: 2024-09-07

SMR Analysis

Rendered from SMR_Analysis.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2025-02-13
Started: 2025-02-13

Time Dependent Covariate Use

Rendered from Time_Dep_Cov.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2024-10-21
Started: 2023-09-28

Unified Equation Representation

Rendered from Equation_Expression.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2025-02-13
Started: 2025-02-13

Citation

Giunta E, Bahadori A, Andresen D, Walsh L, French B, Dauer L, Boice Jr J (2024). Colossus: Risk Model Regression and Analysis with Complex Non-Linear Models. https://github.com/ericgiunta/Colossus.

Giunta E, Stutzman D, Cohen S, French B, Walsh L, Dauer L, Boice Jr J, Blattnig S, Andresen D, Bahadori A (2024). “Colossus: Software for radiation epidemiologic studies with big data.” https://burkclients.com/IRPA/2024/site/welcome.html.

Corresponding BibTeX entries:

  @Manual{,
    title = {Colossus: Risk Model Regression and Analysis with Complex
      Non-Linear Models},
    author = {Eric Giunta and Amir Bahadori and Dan Andresen and Linda
      Walsh and Benjamin French and Lawrence Dauer and John {Boice
      Jr}},
    year = {2024},
    url = {https://github.com/ericgiunta/Colossus},
  }
  @Misc{,
    title = {Colossus: Software for radiation epidemiologic studies
      with big data},
    author = {Eric Giunta and Dawson Stutzman and Sarah Cohen and
      Benjamin French and Linda Walsh and Lawrence Dauer and John
      {Boice Jr} and Steve Blattnig and Dan Andresen and Amir
      Bahadori},
    year = {2024},
    url = {https://burkclients.com/IRPA/2024/site/welcome.html},
    publisher = {IRPA 16},
  }

Readme and manuals

Colossus

The goal of Colossus is to provide an open-source means of performing survival analysis on big data with complex risk formulas. Colossus is designed to perform Cox proportional hazards regressions and Poisson regressions on datasets loaded as data.tables or data.frames. The risk models allowed are sums or products of linear, log-linear, or several other radiation dose-response formulas highlighted in the vignettes. Additional plotting capabilities are available.
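
As a rough illustration of what "sums or products" of linear and log-linear pieces means, the base R sketch below builds one log-linear term and one linear term and combines them both ways. The helper names (loglin_term, lin_term), the covariate values, and the parameter values are hypothetical and chosen only for illustration; this is not the Colossus implementation, whose exact formulas are described in the vignettes.

```r
# Illustrative helpers only; these are NOT Colossus functions.
loglin_term <- function(x, beta) exp(sum(beta * x))  # log-linear: exp(beta * x)
lin_term <- function(x, beta) sum(beta * x)          # linear: beta * x

# Hypothetical covariate values for one row and hypothetical parameters
x <- c(a = 1, b = 2.1)
beta <- c(a = 0.1, b = 0.1)

term_0 <- loglin_term(x[["a"]], beta[["a"]])
term_1 <- lin_term(x[["b"]], beta[["b"]])

# A risk model formed as a product of terms, and one formed as a sum
risk_product <- term_0 * term_1
risk_sum <- term_0 + term_1
```

Which combination applies in an actual Colossus run is controlled by the model inputs (such as modelform in the example further down), not by code like this.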

By default, a fully portable version of the code is compiled, which does not support OpenMP on every system. Colossus requires OpenMP support to perform parallel calculations. When compiling on Linux with clang, the environment variable “R_COLOSSUS_NOT_CRAN” is checked to determine whether OpenMP should be disabled: the number of cores is set to 1 if the environment variable is empty, the operating system is detected as Linux, and the default compiler or R compiler is clang. OpenMP support is currently not configured for Linux compiling with clang. Separately, Colossus testing checks the “NOT_CRAN” variable to determine whether additional tests should be run; setting “NOT_CRAN” to “false” disables the longer tests.
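
The single-core rule above can be restated as a small base R sketch. The helper use_single_core and its arguments are hypothetical and only mirror the three conditions as described here; Colossus performs its own detection (see System_Version in the help manual), and the operating system and compiler names passed in below are example inputs rather than detected values.

```r
# Hypothetical helper restating the rule described above; this is not
# the check that Colossus itself runs.
use_single_core <- function(env_value, os, default_compiler, r_compiler) {
  env_empty <- identical(env_value, "")
  uses_clang <- default_compiler == "clang" || r_compiler == "clang"
  env_empty && os == "linux" && uses_clang
}

# Sys.getenv() returns "" when a variable is unset
env_value <- Sys.getenv("R_COLOSSUS_NOT_CRAN")

# Example inputs: empty variable on Linux, with and without clang
clang_case <- use_single_core("", "linux", "clang", "gcc")  # TRUE
gcc_case <- use_single_core("", "linux", "gcc", "gcc")      # FALSE

# Disable the longer tests, as described above
Sys.setenv(NOT_CRAN = "false")
```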

Example

This is a basic example which shows you how to solve a common problem:

library(data.table)
library(parallel)
library(Colossus)
## basic example code reproduced from the starting-description vignette

df <- data.table(
  "UserID" = c(112, 114, 213, 214, 115, 116, 117),
  "Starting_Age" = c(18, 20, 18, 19, 21, 20, 18),
  "Ending_Age" = c(30, 45, 57, 47, 36, 60, 55),
  "Cancer_Status" = c(0, 0, 1, 0, 1, 0, 0),
  "a" = c(0, 1, 1, 0, 1, 0, 1),
  "b" = c(1, 1.1, 2.1, 2, 0.1, 1, 0.2),
  "c" = c(10, 11, 10, 11, 12, 9, 11),
  "d" = c(0, 0, 0, 1, 1, 1, 1)
)
# For the interval case
time1 <- "Starting_Age"
time2 <- "Ending_Age"
event <- "Cancer_Status"

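# Covariate columns, the term each one belongs to, and its subterm type
names <- c("a", "b", "c", "d")
term_n <- c(0, 1, 1, 2)
tform <- c("loglin", "lin", "lin", "plin")
# "M" selects the multiplicative model form
modelform <- "M"

# Starting parameter values, one per covariate
a_n <- c(0.1, 0.1, 0.1, 0.1)

# 0 leaves a parameter free to vary; 1 would hold it constant
keep_constant <- c(0, 0, 0, 0)
der_iden <- 0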

control <- list(
  "lr" = 0.75, "maxiter" = 100, "halfmax" = 5, "epsilon" = 1e-9,
  "deriv_epsilon" = 1e-9, "abs_max" = 1.0,
  "verbose" = FALSE, "ties" = "breslow"
)

e <- RunCoxRegression(df, time1, time2, event, names, term_n, tform, keep_constant, a_n, modelform, control = control)
Interpret_Output(e)
#> |-------------------------------------------------------------------|
#> Final Results
#>    Covariate Subterm Term Number Central Estimate Standard Deviation
#>       <char>  <char>       <int>            <num>              <num>
#> 1:         a  loglin           0         44.53340       9.490627e+07
#> 2:         b     lin           1         98.72266                NaN
#> 3:         c     lin           1         96.82311       2.408255e+02
#> 4:         d    plin           2        101.10000       5.207003e+02
#> 
#> Cox Model Used
#> -2*Log-Likelihood: 1.35,  AIC: 9.35
#> Iterations run: 100
#> maximum step size: 1.00e+00, maximum first derivative: 1.92e-04
#> Analysis did not converge, check convergence criteria or run further
#> Run finished in 0.25 seconds
#> |-------------------------------------------------------------------|
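
The AIC in the output above can be checked by hand. Assuming the usual definition AIC = -2*log-likelihood + 2*k, with k = 4 estimated parameters (one per covariate row in the results table), the reported values are consistent with each other:

```r
# Values copied from the printed output above
neg2_loglik <- 1.35  # reported -2*Log-Likelihood
n_params <- 4        # one parameter per covariate: a, b, c, d

# Usual AIC definition (assumed here): -2*logLik + 2*k
aic <- neg2_loglik + 2 * n_params
aic  # 9.35
```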

Help Manual

Help page | Topics
checks for duplicated column names | Check_Dupe_Columns
Applies time duration truncation limits to create columns for Cox model | Check_Trunc
General purpose verbosity check | Check_Verbose
Converts a string equation to regression model inputs | Convert_Model_Eq
Corrects the order of terms/formula/etc | Correct_Formula_Order
Calculates hazard ratios for a reference vector | Cox_Relative_Risk
Calculates the likelihood curve for a cox model directly | CoxCurveSolver
Automates creating a date difference column | Date_Shift
Automatically assigns missing control values | Def_Control
Automatically assigns missing guessing control values | Def_Control_Guess
Automatically assigns missing model control values | Def_model_control
Automatically assigns geometric-mixture values and checks that a valid modelform is used | Def_modelform_fix
uses a table, list of categories, and list of event summaries to generate person-count tables | Event_Count_Gen
uses a table, list of categories, list of summaries, list of events, and person-year information to generate person-time tables | Event_Time_Gen
Splits a parameter into factors | factorize
Splits a parameter into factors in parallel | factorize_par
Performs checks to gather a list of guesses and iterations | Gather_Guesses_CPP
Checks default c++ compiler | gcc_version
Applies time dependence to parameters | gen_time_dep
Checks system OS | get_os
Calculates and returns data for time by hazard and survival to estimate censoring rate | GetCensWeight
Defines Interactions | interact_them
Prints a regression output clearly | Interpret_Output
Automates creating data for a joint competing risks analysis | Joint_Multiple_Events
Defines the likelihood ratio test | Likelihood_Ratio_Test
Calculates Full Parameter list for Special Dose Formula | Linked_Dose_Formula
Calculates The Additional Parameter For a linear-exponential formula with known maximum | Linked_Lin_Exp_Para
Saves information about a run to a log file | Model_Results_Log
Checks the OMP flag | OMP_Check
Calculates the likelihood curve for a poisson model directly | PoissonCurveSolver
Checks how R was compiled | Rcomp_version
Checks default R c++ compiler | Rcpp_version
Automatically assigns missing values in listed columns | Replace_Missing
Performs basic Cox Proportional Hazards regression with the null model | RunCoxNull
Performs Cox Proportional Hazard model plots | RunCoxPlots
Performs basic Cox Proportional Hazards regression without special options | RunCoxRegression
Performs basic Cox Proportional Hazards regression with a multiplicative log-linear model | RunCoxRegression_Basic
Performs basic Cox Proportional Hazards regression with competing risks | RunCoxRegression_CR
Performs basic Cox Proportional Hazards regression, Generates multiple starting guesses on c++ side | RunCoxRegression_Guesses_CPP
Performs Cox Proportional Hazards regression using the omnibus function | RunCoxRegression_Omnibus
Performs Cox Proportional Hazards regression using the omnibus function with multiple column realizations | RunCoxRegression_Omnibus_Multidose
Performs basic Cox Proportional Hazards calculation with no derivative | RunCoxRegression_Single
Performs basic Cox Proportional Hazards regression with strata effect | RunCoxRegression_Strata
Performs basic cox regression, with multiple guesses, starts with solving for a single term | RunCoxRegression_Tier_Guesses
Predicts how many events are due to baseline vs excess | RunPoissonEventAssignment
Predicts how many events are due to baseline vs excess at the confidence bounds of a single parameter | RunPoissonEventAssignment_bound
Performs basic poisson regression | RunPoissonRegression
Performs basic Poisson regression, generates multiple starting guesses on c++ side | RunPoissonRegression_Guesses_CPP
Performs joint Poisson regression using the omnibus function | RunPoissonRegression_Joint_Omnibus
Performs basic Poisson regression using the omnibus function | RunPoissonRegression_Omnibus
Calculates poisson residuals | RunPoissonRegression_Residual
Performs poisson regression with no derivative calculations | RunPoissonRegression_Single
Performs poisson regression with strata effect | RunPoissonRegression_Strata
Performs basic poisson regression, with multiple guesses, starts with a single term | RunPoissonRegression_Tier_Guesses
Checks OS, compilers, and OMP | System_Version
Automates creating a date since a reference column | Time_Since