List of Control Options

library(Colossus)
library(data.table)
library(ggplot2)

General Use

There are three control lists used in Colossus that the user can use to customize the regression, all three of which have functions defined that assign default values to any missing list items. The following sections will go through each control list, how it is used, and what every item assigned to it means.

Standard Control List

This is the “control” variable used by every regression function. This list focuses on options that control the convergence criteria and standard Cox proportional hazards options.

Option	Description
verbose	integer valued 0-4 controlling what information is printed to the terminal. Each level includes the lower levels. 0: silent, 1: errors printed, 2: warnings printed, 3: notes printed, 4: debug information printed. Errors are situations that stop the regression, warnings are situations that assume default values that the user might not have intended, notes provide information on regression progress, and debug prints out C++ progress and intermediate results. The default level is 2 and True/False is converted to 3/0.
ncores	Number of cores used for parallel operations. The user can’t assign more cores than available. Defaults to the number of detected cores
lr	Learning rate used. Every step is multiplied by the learning rate. Default is 0.75.
maxiter	Maximum iterations run for a single initial position. Defaults to 20.
maxiters	Used for multiple starting positions. List of maximum number of iterations per starting point and maximum number of iterations for final guess. Assumes N values for N guesses and 1 value for final solution. Defaults to c(1,maxiter), running 1 iteration per provided guess.
halfmax	Maximum number of half-steps taken per iteration. Defaults to 5.
epsilon	Minimum step size before convergence is assumed. Defaults to 1e-9
deriv_epsilon	Smallest maximum first derivative before convergence is assumed. Defaults to 1e-9.
abs_max	Largest acceptable step size for non-intercept parameters. Defaults to 1.0.
dose_abs_max	Largest acceptable step size for intercept parameters. Defaults to 100.0.
ties	Chooses between ‘breslow’ and ‘efron’ methods for tied event times in Cox proportional hazards regressions. Defaults to ‘breslow’.
double_step	0/1 Choosing if the model should be optimized by solving the inversion hessian (1) or optimizing each parameter independently (0). Defaults to 1.
change_all	Boolean that control if the model is optimized (TRUE), or if a single parameter is changed a constant value each iteration (FALSE). Used for debugging purposes. Defaults to TRUE.

Model Control List

This is the “model_control” variable used primarily by the omnibus functions. The non-omnibus functions use this list to select non-standard modeling options.

Option	Description
single	Boolean used by both Cox and Poisson functions to calculate score without derivatives. This method can be used to quickly test multiple starting points. Defaults to FALSE.
basic	Boolean used by Cox functions to use the single-term log-linear model. This method can apply simplifications which reduce the time and memory requirements. Defaults to FALSE.
null	Boolean used by Cox functions to calculate the null model, which assumes a constant hazard ratio. Defaults to FALSE.
cr	Boolean used by Cox functions to apply competing risks model options to use a Fine-Grey model. Defaults to FALSE.
constraint	Boolean used by both Cox and Poisson function to apply a system of linear constraints. Defaults to FALSE.
strata	Boolean used by both Cox and Poisson functions to apply stratification. Cox functions use stratified risk groups and Poisson functions apply an average event rate per strata. Defaults to FALSE.
surv	Boolean used by Cox plotting function to choose survival curve plotting. Defaults to FALSE
schoenfeld	Boolean used by Cox plotting function to choose schoenfeld residual curve plotting. Defaults to FALSE
risk	Boolean used by Cox plotting function to choose hazard ratio plotting by a single parameter. Defaults to FALSE
unique_values	integer used by the risk plotting option to select how many values calculate risk at. Depending on function either defaults to the number of unique values in the column or 2.
risk_subset	Boolean used by Cox_Relative_Risk function to calculate a relative risk for every row. Not directly used by the user, Defaults to FALSE.
log_bound	Boolean used by Cox and Poisson functions to select calculating likelihood-based confidence bounds. Defaults to FALSE.
pearson	Boolean used by RunPoissonRegression_Residual to select calculating pearson residuals. Defaults to FALSE.
deviance	Boolean used by RunPoissonRegression_Residual to select calculating deviance residuals. Defaults to FALSE.
gmix_theta	Theta value for the geometric mixture model, double between 0-1. Defaults to 0.5.
gmix_term	Integer vector indicating if each term in the geometric mixture model is a relative risk (0) or excess risk (1). Defaults to c(0), should be provided if used.

If the log_bound method is used, there are several additional control options used:

Option	Description
alpha	Confidence level to use, valued 0-1. Defaults to 0.05.
qchi	Two tailed Chi-squared value at one degree of freedom and the provided alpha. Defaults to the value corresponding to alpha.
para_number	Parameter number to solve the boundary for. Indexed starting at 0. Defaults to 0.
manual	Boolean which selects if the standard Venzon-Moolgavkar algorithm (FALSE) or if a modified version is used which starts by optimizing multiple points (TRUE). The manual mode is beneficial for problems with linear terms, which may have issues with local optimums. Defaults to FALSE.
maxstep	Integer used by the manual search option, splits the initial step into maxstep points. Then each point is optimized with a constant boundary parameter. This is done to find the closest estimate for the boundary point. Defaults to 10.
search_mult	Double used by manual search option, scales the initial step by that amount. This can be done to widen or narrow the scope of the search if the initial quadratic assumption made by the Venzon-Moolgavkar algorithm is incorrect.

Guess Control List

This is the “guesses_control” variable used by the distributed start functions. These options are used to control the number and scope of distributed initial parameter sets used.

Option	Description
maxiter	Iterations run for every random starting point
guesses	Number of starting points to test
guesses_start	Number of starting points tested for Tiered or Strata First regressions
guess_constant	Binary values to denote if any parameter values should not be randomized
exp_min	minimum exponential parameter change
exp_max	maximum exponential parameter change
intercept_min	minimum intercept parameter change
intercept_max	maximum intercept parameter change
lin_min	minimum linear slope parameter change
lin_max	maximum linear slope parameter change
exp_slope_min	minimum linear-exponential, exponential slope parameter change
exp_slope_max	maximum linear-exponential, exponential slope parameter change
strata	True/False if stratification is used
term_initial	List of term numbers to run first if Tiered guessing is used
rmin	list of minimum change for each parameter
rmax	list of maximum change for each parameter
verbose	integer valued 0-4 controlling what information is printed to the terminal. Each level includes the lower levels. 0: silent, 1: errors printed, 2: warnings printed, 3: notes printed, 4: debug information printed. Errors are situations that stop the regression, warnings are situations that assume default values that the user might not have intended, notes provide information on regression progress, and debug prints out C++ progress and intermediate results. The default level is 2 and True/False is converted to 3/0.

- General Use