| Title: | Poisson Super Learner |
|---|---|
| Description: | Provides tools for fitting piecewise-constant hazard models for survival and competing risks data, including ensemble hazard estimation via the Super Learner framework. The package supports estimation of survival functions and absolute risk predictions from fitted cause-specific hazard models. For the Super Learner framework see van der Laan, Polley and Hubbard (2007) <doi:10.2202/1544-6115.1309>. |
| Authors: | Gabriele Pittarello [aut, cre], Helene Rytgaard [aut], Thomas Gerds [aut] |
| Maintainer: | Gabriele Pittarello <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.2.0 |
| Built: | 2026-05-18 19:42:54 UTC |
| Source: | https://github.com/cran/poissonsuperlearner |
Convenience method to extract (cause-specific) model coefficients from a fitted
base_learner returned by fit_learner().
    ## S3 method for class 'base_learner'
    coef(object, cause = NULL, ...)

| Argument | Description |
|---|---|
| object | |
| cause | |
| ... | Passed to the underlying |
For competing risks, fit_learner() fits one model per cause, stored in
object$learner_fit[[k]] for k = 1, 2, ..., K. This method simply dispatches
to the underlying model’s coef() method for each fitted object.
Learner-dependent output. The returned coefficient object depends on the
base learner used (e.g. a numeric vector, a sparse matrix, a list, etc.).
This method does not post-process or rename coefficients; it returns the output
of coef(object$learner_fit[[k]], ...) unchanged.
If cause is a single integer, returns the coefficient object produced by
coef() for that cause-specific fitted model.
If cause = NULL, returns a list of length object$data_info$n_crisks,
where element [[k]] contains coefficients for cause k.
If no fitted model is present (object$learner_fit is NULL), signals a message
and returns invisible(object).
    d <- simulateStenoT1(30, competing_risks = TRUE)
    lrn <- Learner_glmnet(covariates = c("age", "value_LDL"),
                          lambda = 0, cross_validation = FALSE)
    bl <- fit_learner(d, learner = lrn, id = "id", status = "status_cvd",
                      event_time = "time_cvd", number_of_nodes = 4)
    # coefficients for cause 1
    coef(bl, cause = 1)
    # coefficients for all causes (list)
    coef(bl)
Extracts the meta-learner coefficients (stacking weights) from a fitted
poisson_superlearner object returned by Superlearner().
    ## S3 method for class 'poisson_superlearner'
    coef(object, cause = NULL, model = "sl", ...)

| Argument | Description |
|---|---|
| object | |
| cause | |
| model | Model selector. Default is "sl". |
| ... | Passed to the underlying |
For each cause k, the ensemble stores a fitted meta-learner in
object$superlearner[[k]]$meta_learner_fit. This method dispatches to the
underlying coef() method for that fitted meta-learner.
What coefficients represent. These coefficients correspond to the meta-learner
regression of the outcome on the cross-validated base-learner predictions
(Z1, Z2, ...). Under the default meta-learner, they are the stacking
weights (on the scale defined by the meta-learner).
Learner-dependent output. The returned coefficient object depends on the
meta-learner implementation (by default a glmnet fit, often returning a sparse
matrix). This method does not rename Z* terms or post-process coefficients; it
returns the output of coef(object$superlearner[[k]]$meta_learner_fit, ...)
unchanged.
Single-learner special case. If the ensemble was fit with only one base learner,
no meta-learner is fit and meta_learner_fit is NULL. In that case, coef()
for the poisson_superlearner does not have meta-learner coefficients to return.
If cause is a single integer, returns the coefficient object produced by
coef() for the selected cause-specific fitted model: the meta-learner when
model = "sl" and a meta-learner is available, or the selected base learner
when model selects a base learner or no meta-learner is available.
If cause = NULL, returns a list of length object$data_info$n_crisks,
where element [[k]] contains coefficients for the selected model for cause
k.
If no fitted ensemble is present (object$superlearner is NULL), signals a message
and returns invisible(object).
    d <- simulateStenoT1(50, competing_risks = TRUE)
    learners <- list(
      glm = Learner_glmnet(covariates = c("age", "value_LDL"),
                           lambda = 0, cross_validation = FALSE),
      gam = Learner_gam(covariates = c("age", "value_LDL"))
    )
    fit <- Superlearner(d, id = "id", status = "status_cvd",
                        event_time = "time_cvd", learners = learners,
                        number_of_nodes = 4, nfold = 2)
    # meta-learner coefficients (cause 1)
    coef(fit, cause = 1)
    # meta-learner coefficients for all causes (list)
    coef(fit)
Pre-processes subject-level time-to-event data into a long Poisson format on a piecewise-constant time grid, then fits one initialized learner object. For competing risks, a separate model is fit for each event type (cause) using the standard cause-specific Poisson likelihood on the long data.
    fit_learner(
      data,
      learner,
      id = "id",
      stratified_k_fold = FALSE,
      status = "status",
      event_time = NULL,
      number_of_nodes = NULL,
      nodes = NULL,
      variable_transformation = NULL,
      ...
    )

| Argument | Description |
|---|---|
| data | |
| learner | Reference-class learner object (e.g. from |
| id | |
| stratified_k_fold | |
| status | |
| event_time | |
| number_of_nodes | |
| nodes | |
| variable_transformation | |
| ... | Additional arguments currently ignored. |
An object of class base_learner, i.e. a named list with the following elements.

The learner object that was fit (the input learner), stored for later prediction. It contains the learner specification (e.g., covariates, tuning parameters).

learner_fit: A list of fitted model objects, one per cause. Its length equals data_info$n_crisks. The list is created by splitting the internally pre-processed long data by cause indicator k and calling model$private_fit() on each split. Names typically correspond to the cause labels "1", "2", ..., "K". Each element is learner-dependent: for Learner_glmnet it may be a "glmnet" fit (often wrapped, e.g. "fishnet"); for other learners it is whatever $private_fit() returns. Each fitted object is trained on long Poisson data representing the piecewise-constant hazard for that cause across the node intervals.

data_info: A list of bookkeeping information needed for prediction and interpretation. It holds:

The identifier column name used.

The status column name used.

event_time: The event/censoring time column name used.

nodes: Numeric vector of node cut points used for the piecewise grid (includes 0 and is sorted). These are the interval boundaries used in the long Poisson representation.

The maximum observed follow-up time, max(data[[event_time]]).

n_crisks: Number of event types (causes) detected. If censoring is present (0 in status), n_crisks is the number of unique status values minus one; otherwise it is the number of unique status values.

The transformation specification passed in variable_transformation (or NULL).
    d <- simulateStenoT1(50, competing_risks = TRUE)
    lrn <- Learner_glmnet(covariates = c("age", "value_LDL"),
                          lambda = 0, cross_validation = FALSE)
    bl <- fit_learner(d, learner = lrn, id = "id", status = "status_cvd",
                      event_time = "time_cvd", number_of_nodes = 2)
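The pre-processing step that fit_learner() performs, from subject-level data to long Poisson format, can be illustrated with a small base R sketch. The helper to_long_poisson() and the column names node, tij and delta are hypothetical and chosen for illustration only; the package's internal representation may differ, for example in how interval boundaries are closed.

```r
# Hypothetical helper (not part of poissonsuperlearner): expand subject-level
# data into one row per subject and node interval in which the subject is at risk.
# Assumes every follow-up time is strictly positive.
to_long_poisson <- function(id, time, status, nodes) {
  nodes <- sort(unique(c(0, nodes)))   # grid includes 0 and is sorted
  upper <- c(nodes[-1], Inf)           # right end of each interval
  rows <- lapply(seq_along(id), function(i) {
    at_risk <- which(nodes < time[i])  # intervals entered before follow-up ends
    n_int <- length(at_risk)
    data.frame(
      id = id[i],
      node = nodes[at_risk],
      # time at risk spent in each interval (used as offset log(tij))
      tij = pmin(time[i], upper[at_risk]) - nodes[at_risk],
      # status code appears only in the interval where follow-up ends
      delta = c(rep(0L, n_int - 1L), as.integer(status[i]))
    )
  })
  do.call(rbind, rows)
}

long <- to_long_poisson(id = 1:2, time = c(2.5, 0.7), status = c(1, 0),
                        nodes = c(0, 1, 2))
print(long)
```

For competing risks, a cause-specific response for cause k could then be derived as `as.integer(long$delta == k)`, so that one Poisson model per cause is fit on the same long data.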
mgcv::bam
Learner_gam is a Reference Class implementing the learner interface
used by Superlearner() and fit_learner().
| Argument | Description |
|---|---|
| covariates | |
| cross_validation | |
User-facing API: users should only initialize the learner and pass it
to Superlearner() / fit_learner(). The remaining methods documented below
are part of the internal learner interface and are not meant to be called
directly by users.
Wrapper role: this class wraps mgcv::bam in a piecewise-constant hazard
workflow. The package-specific contribution is to provide a convenient
interface for the long-format Poisson likelihood with offsets for time at risk,
and optional node terms encoding the baseline hazard, while forwarding standard
mgcv::bam arguments supplied via ....
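Let $0 = t_0 < t_1 < \dots < t_M$ denote time knots and
define interval indicators $I_m(t) = \mathbf{1}\{t_{m-1} < t \le t_m\}$, $m = 1, \dots, M$.
The piecewise-constant hazard model with an additive predictor is

$$\lambda(t \mid x) = \exp\Big( \sum_{m=1}^{M} \alpha_m I_m(t) + f(x) \Big).$$

The additive predictor $f(x)$ is constructed from covariates
(smooth terms such as s(age) and/or linear terms) and estimated by mgcv.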
covariates (character): Terms used to build the additive predictor (may include s() terms).
cross_validation (logical): Workflow flag; see Details.
intercept (logical): Whether to include an intercept.
formula (character): Formula string passed to mgcv::bam.
learner (function): Backend fitter (mgcv::bam).
fit_arguments (list): Additional arguments forwarded to mgcv::bam.
initialize(...): Construct and configure the learner. This is the only method users should call.
private_fit(data, ...): Internal. Fits a Poisson GAM with offset log(tij) on long-format data.
private_fit_all_causes(data, ...): Internal. Fits cause-specific Poisson GAMs for all requested causes using a shared long-format data setup.
private_predictor(model, newdata, ...): Internal. Predicts hazards on the response scale.
    lrn <- Learner_gam(covariates = c("s(age)", "value_LDL"))
glmnet
Learner_glmnet is a Reference Class implementing the learner interface
used by Superlearner() and fit_learner().
User-facing API: users are expected to initialize the learner (i.e.,
call Learner_glmnet(...)) and pass the resulting object to
Superlearner() or fit_learner(). The remaining methods documented below
are part of the internal learner interface and are not meant to be called
directly by users.
Wrapper role: this class is a user-friendly wrapper around the existing
glmnet implementation. The package-specific contribution is to provide a
piecewise-constant hazard workflow: create the long-format Poisson data with
offsets for time at risk, include interval ("node") indicators for the
baseline hazard, and forward standard glmnet arguments supplied at
initialization to the backend fitter.
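Let $0 = t_0 < t_1 < \dots < t_M$ denote time knots and
define interval indicators $I_m(t) = \mathbf{1}\{t_{m-1} < t \le t_m\}$, $m = 1, \dots, M$.
The piecewise-constant hazard model is

$$\lambda(t \mid x) = \exp\Big( \sum_{m=1}^{M} \alpha_m I_m(t) + x^\top \beta \Big).$$
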
Penalization is applied to the regression coefficients through the glmnet
elastic-net penalty. Node (baseline) terms are given zero penalty by default;
if this backend call fails, the learner retries with a fully penalized design.
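The link between the long-format Poisson likelihood with offset log(tij) and a piecewise-constant hazard can be checked on toy data. The sketch below uses stats::glm as an unpenalized stand-in for the glmnet backend and includes only node indicators; the data values and the column names node, tij and delta are assumptions made for illustration. With node terms alone, the fitted hazard in each interval equals the number of events divided by the total time at risk in that interval.

```r
# Toy long-format data: one row per subject-interval (values are illustrative).
long <- data.frame(
  node  = factor(c("[0,1)", "[0,1)", "[0,1)", "[1,2)", "[1,2)")),
  tij   = c(1.0, 1.0, 0.5, 1.0, 0.6),   # time at risk in the interval
  delta = c(0, 0, 1, 1, 1)              # event indicator
)

# Poisson regression with log time at risk as offset and one term per interval.
fit <- glm(delta ~ 0 + node + offset(log(tij)), family = poisson(), data = long)
hazard_glm <- exp(coef(fit))

# Occurrence/exposure rates per interval for comparison.
hazard_rate <- tapply(long$delta, long$node, sum) / tapply(long$tij, long$node, sum)

print(cbind(glm = unname(hazard_glm), rate = as.numeric(hazard_rate)))
```

Adding covariates to the formula turns the interval-specific rates into baseline hazards that are scaled multiplicatively by exp(x'beta).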
covariates (character): Names of covariate columns used in the model.
cross_validation (logical): If TRUE, chooses lambda by glmnet::cv.glmnet.
intercept (logical): Backend intercept flag; currently fixed to TRUE by the constructor.
lambda (numeric): If cross_validation = FALSE, the lambda used in the final fit.
formula (character): Formula string used to create the design matrix in long format.
learner (function): Backend fitter (glmnet::glmnet or glmnet::cv.glmnet).
fit_arguments (list): Additional arguments forwarded to the backend fitter.
initialize(...): Construct and configure the learner. This is the only method users should call.
private_fit(data, ...): Internal. Fits a Poisson model with offset log(tij) on long-format data.
private_fit_all_causes(data, ...): Internal. Fits cause-specific Poisson models for all requested causes using a shared long-format data setup.
private_predictor(model, newdata, ...): Internal. Predicts hazards on the response scale for long-format newdata.
    lrn <- Learner_glmnet(covariates = c("age", "sex"), alpha = 1,
                          cross_validation = TRUE)
Learner_hal is a Reference Class implementing the learner interface
used by Superlearner() and fit_learner().
User-facing API: users should only initialize the learner and pass it
to Superlearner() / fit_learner(). The remaining methods documented below
are part of the internal learner interface and are not meant to be called
directly by users.
Wrapper role: this class provides a piecewise-constant hazard wrapper around
a HAL-style indicator-basis construction, estimated by L1-penalized Poisson
regression using a glmnet backend. The package-specific contribution is to
(i) construct the long-format Poisson representation with offsets for time at
risk, (ii) generate indicator bases compatible with piecewise hazards, and
(iii) forward backend fitting arguments supplied via ....
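Let $0 = t_0 < t_1 < \dots < t_M$ denote time knots and
define interval indicators $I_m(t) = \mathbf{1}\{t_{m-1} < t \le t_m\}$, $m = 1, \dots, M$.
The HAL piecewise-constant hazard model is

$$\lambda(t \mid x) = \exp\{ f(t, x) \},$$

where $f(t, x)$ is approximated by a finite linear combination of
indicator basis functions.
Let $x_1, x_2$ be two covariates and let
$t_1 < \dots < t_M$ be time grid points used to
create step functions in time. Choose covariate cutpoints
$c_{1,1} < \dots < c_{1,J_1}$ for $x_1$ and
$c_{2,1} < \dots < c_{2,J_2}$ for $x_2$.
Define indicator bases:

$$\phi^{(0)}_m(t) = \mathbf{1}\{t \ge t_m\}, \qquad \phi^{(1)}_j(x_1) = \mathbf{1}\{x_1 \ge c_{1,j}\}, \qquad \phi^{(2)}_l(x_2) = \mathbf{1}\{x_2 \ge c_{2,l}\}.$$

A main-effects HAL approximation on the log-hazard scale can be written as:

$$f(t, x) \approx \beta_0 + \sum_{m=1}^{M} \beta^{(0)}_m \phi^{(0)}_m(t) + \sum_{j=1}^{J_1} \beta^{(1)}_j \phi^{(1)}_j(x_1) + \sum_{l=1}^{J_2} \beta^{(2)}_l \phi^{(2)}_l(x_2).$$

If max_degree >= 2, the learner additionally includes interaction bases such as
$\phi^{(1)}_j(x_1)\,\phi^{(2)}_l(x_2)$ and $\phi^{(0)}_m(t)\,\phi^{(1)}_j(x_1)$.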
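The indicator-basis construction can be sketched in base R for two covariates. The helper indicator_basis(), the cutpoints and the covariate values are hypothetical, and the use of a "greater than or equal" comparison is an assumption; the package may place or compare cutpoints differently.

```r
# Hypothetical helper: one indicator column 1{x >= c} per cutpoint c.
indicator_basis <- function(x, cutpoints, prefix) {
  B <- outer(x, cutpoints, FUN = ">=") * 1L
  colnames(B) <- paste0(prefix, "_ge_", cutpoints)
  B
}

x1 <- c(30, 45, 60)     # e.g. an age-like covariate
x2 <- c(2.1, 3.4, 2.8)  # e.g. an LDL-like covariate

B1 <- indicator_basis(x1, cutpoints = c(40, 55), prefix = "x1")
B2 <- indicator_basis(x2, cutpoints = c(2.5, 3.0), prefix = "x2")

# Degree-2 interaction bases: products of every pair of main-effect columns.
pairs <- expand.grid(j = seq_len(ncol(B1)), l = seq_len(ncol(B2)))
B12 <- B1[, pairs$j, drop = FALSE] * B2[, pairs$l, drop = FALSE]
colnames(B12) <- paste(colnames(B1)[pairs$j], colnames(B2)[pairs$l], sep = ":")

# Design matrix for a max_degree = 2 style expansion (covariate part only).
design <- cbind(B1, B2, B12)
print(design)
```

An L1-penalized Poisson regression on such a design selects a sparse set of step functions, which is the idea behind the HAL-style fit described above.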
| Argument | Description |
|---|---|
| covariates | Covariate columns used to build covariate indicator bases. |
| num_knots | Controls the number of cutpoints per covariate used for indicator bases. |
| max_degree | Maximum interaction order included in the basis expansion. |
| intercept | Whether the backend penalized regression includes an intercept term. |
| cross_validation | If TRUE, selects the penalty level using glmnet::cv.glmnet. |
| maxit_prefit | Optional maxit value used for the initial HAL backend fit. Leave as NA to use the backend default. |
| fit_arguments | Additional arguments forwarded to the glmnet backend (e.g. nfolds). |
covariates (character): Names of covariate columns used in the basis.
cross_validation (logical): Whether to use cv.glmnet to select the penalty.
intercept (logical): Backend intercept flag.
max_degree (integer): Maximum interaction order.
num_knots (numeric): Knots used for basis construction.
lambda_opt (numeric): Selected penalty level when using cross-validation.
maxit_prefit (numeric): Optional maxit value used for the initial HAL backend fit.
fit_arguments (list): Extra backend arguments forwarded to glmnet.
initialize(...): Construct and configure the learner. This is the only method users should call.
hal_basis(...): Internal helper. Constructs HAL basis matrices and metadata for fitting.
hal_prepare_new(...): Internal helper. Builds prediction-time HAL basis matrices from fitted basis metadata.
private_fit(data, ...): Internal. Builds bases and fits the penalized Poisson model with offset log(tij).
private_fit_all_causes(data, ...): Internal. Fits penalized Poisson HAL models for all requested causes using a shared basis setup.
private_predictor(model, newdata, ...): Internal. Evaluates the fitted approximation and returns hazards on the response scale.
    lrn <- Learner_hal(covariates = c("age", "sex"), max_degree = 2L,
                       num_knots = c(10L, 5L))
Computes, per row, the cumulative incidence function at the end of each interval,
grouped by id. The number of causes is inferred from the number of columns in haz.
    pch_absolute_risk(id, dt, haz, cause_idx, one_based = TRUE, na_is_zero = FALSE)

| Argument | Description |
|---|---|
| id | Integer vector. Sorted by |
| dt | Numeric vector of interval lengths. |
| haz | Numeric matrix (n x C) of cause-specific hazards per interval. Columns correspond to causes 1..C. |
| cause_idx | Integer. Index of the cause of interest (1-based by default). |
| one_based | Logical. If |
| na_is_zero | Logical. If |
Numeric vector of cumulative incidence values at the end of each interval.
    id <- c(1L, 1L, 2L, 2L)
    dt <- c(1, 1, 1, 1)
    haz <- rbind(
      c(0.10, 0.05),
      c(0.20, 0.10),
      c(0.05, 0.02),
      c(0.10, 0.03)
    )
    pch_absolute_risk(id = id, dt = dt, haz = haz, cause_idx = 1)
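For intuition, the cumulative incidence under piecewise-constant cause-specific hazards has a closed form on each interval: the survival at the interval start, times the share of the cause in the all-cause hazard, times the probability of any event within the interval. The base R sketch below implements that standard expression; the function name cif_sketch() is hypothetical, and whether pch_absolute_risk() uses exactly this expression internally is an assumption.

```r
# Hypothetical base R sketch of the cumulative incidence for one cause under
# piecewise-constant hazards; rows must be sorted by id, then time.
cif_sketch <- function(id, dt, haz, cause_idx) {
  total <- rowSums(haz)                          # all-cause hazard per interval
  cumhaz <- ave(total * dt, id, FUN = cumsum)    # cumulative hazard at interval end
  surv_start <- exp(-(cumhaz - total * dt))      # survival at interval start
  share <- ifelse(total > 0, haz[, cause_idx] / total, 0)
  increment <- surv_start * share * (1 - exp(-total * dt))
  ave(increment, id, FUN = cumsum)               # accumulate within each id
}

id <- c(1L, 1L, 2L, 2L)
dt <- c(1, 1, 1, 1)
haz <- rbind(c(0.10, 0.05), c(0.20, 0.10), c(0.05, 0.02), c(0.10, 0.03))

F1 <- cif_sketch(id, dt, haz, cause_idx = 1)
F2 <- cif_sketch(id, dt, haz, cause_idx = 2)
S  <- exp(-ave(rowSums(haz) * dt, id, FUN = cumsum))

# Cumulative incidences over all causes and survival add up to one.
print(F1 + F2 + S)
```

The final check is a useful sanity test for any competing-risks prediction: at every horizon, the cause-specific absolute risks and the event-free survival probability must sum to one.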
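Computes the cumulative incidence function using the first-order Euler (discrete) approximation:

$$F_k(t_j) \approx \sum_{i \le j} S(t_{i-1})\, \lambda_k(t_i)\, \Delta_i,$$

where $\Delta_i$ is the length of interval $i$ (dt), $\lambda_k(t_i)$ is the cause-$k$ hazard on that interval (haz), and $S(t_{i-1})$ is the survival probability at the start of the interval.
Grouped by id, this returns the cumulative incidence at the end of each interval.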
    pch_absolute_risk_euler(
      id,
      dt,
      haz,
      cause_idx,
      one_based = TRUE,
      na_is_zero = FALSE
    )

| Argument | Description |
|---|---|
| id | Integer vector. Sorted by |
| dt | Numeric vector of interval lengths. |
| haz | Numeric matrix (n x C) of cause-specific hazards per interval. |
| cause_idx | Integer. Index of the cause of interest (1-based by default). |
| one_based | Logical. If |
| na_is_zero | Logical. If |
Numeric vector of cumulative incidence values (Euler approximation) at the end of each interval.
    id <- c(1L, 1L, 2L, 2L)
    dt <- c(1, 1, 1, 1)
    haz <- rbind(
      c(0.10, 0.05),
      c(0.20, 0.10),
      c(0.05, 0.02),
      c(0.10, 0.03)
    )
    pch_absolute_risk_euler(id = id, dt = dt, haz = haz, cause_idx = 1)
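A first-order Euler step replaces the exact within-interval event probability by hazard times interval length. The sketch below is one possible reading of that approximation in base R; the function name cif_euler_sketch() is hypothetical, and the choice to compute the survival at the interval start as the exponential of minus the cumulative all-cause hazard is an assumption that may differ from the package's implementation.

```r
# Hypothetical base R sketch of an Euler-type cumulative incidence;
# rows must be sorted by id, then time.
cif_euler_sketch <- function(id, dt, haz, cause_idx) {
  total <- rowSums(haz)                          # all-cause hazard per interval
  cumhaz <- ave(total * dt, id, FUN = cumsum)    # cumulative hazard at interval end
  surv_start <- exp(-(cumhaz - total * dt))      # survival at interval start (assumed form)
  increment <- surv_start * haz[, cause_idx] * dt
  ave(increment, id, FUN = cumsum)               # accumulate within each id
}

id <- c(1L, 1L, 2L, 2L)
dt <- c(1, 1, 1, 1)
haz <- rbind(c(0.10, 0.05), c(0.20, 0.10), c(0.05, 0.02), c(0.10, 0.03))

F1_euler <- cif_euler_sketch(id, dt, haz, cause_idx = 1)
print(F1_euler)
```

Because hazard times interval length is never smaller than the exact within-interval event probability, this approximation is most accurate when the intervals are short or the hazards are small.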
Computes survival at the end of each interval for competing risks with piecewise constant hazards.
    pch_survival(id, dt, haz, na_is_zero = FALSE)

| Argument | Description |
|---|---|
| id | Integer vector of subject IDs, sorted by id then time. |
| dt | Numeric vector of interval lengths. |
| haz | Numeric matrix (n x C) of cause-specific hazards. |
| na_is_zero | Logical. If TRUE, treat NA hazards as zero. |
Numeric vector of survival probabilities at the end of each interval.
    id <- c(1L, 1L, 2L, 2L)
    dt <- c(1, 1, 1, 1)
    haz <- rbind(
      c(0.10, 0.05),
      c(0.20, 0.10),
      c(0.05, 0.02),
      c(0.10, 0.03)
    )
    pch_survival(id = id, dt = dt, haz = haz)
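With piecewise-constant hazards, event-free survival at the end of an interval is the exponential of minus the accumulated all-cause hazard. A minimal base R sketch of that identity follows; the function name surv_sketch() is hypothetical, and the sketch ignores NA handling, so it is not a substitute for pch_survival().

```r
# Hypothetical base R sketch: survival at the end of each interval,
# with rows sorted by id, then time.
surv_sketch <- function(id, dt, haz) {
  increment <- rowSums(haz) * dt            # all-cause hazard times interval length
  exp(-ave(increment, id, FUN = cumsum))    # cumulative sum restarts for each id
}

id <- c(1L, 1L, 2L, 2L)
dt <- c(1, 1, 1, 1)
haz <- rbind(c(0.10, 0.05), c(0.20, 0.10), c(0.05, 0.02), c(0.10, 0.03))

S <- surv_sketch(id, dt, haz)
print(S)
```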
Computes cause-specific piecewise-constant hazards (pwch_k), the corresponding
survival function, and absolute risk for a given cause, at user-supplied
prediction horizons times, using a fitted base_learner object (single learner;
no stacking).
    ## S3 method for class 'base_learner'
    predict(object, newdata, times, cause = 1, ...)

| Argument | Description |
|---|---|
| object | |
| newdata | |
| times | |
| cause | |
| ... | Additional arguments (currently ignored). |
Internally, newdata is expanded to a Cartesian product with times, converted to
long Poisson format on object$data_info$nodes, and the fitted learner for each
cause in object$learner_fit is used to predict the cause-specific hazards.
Survival and absolute risk are then computed from the predicted hazards.
Special case times = 0: when 0 is included in times, the returned rows
have survival_function = 1, absolute_risk = 0, and all pwch_k = 0 at time 0.
Identifiers in the output: if newdata contains the id column, it is carried
into the output. If newdata does not contain an id column, an internal id is
created for computation, but it is not guaranteed to appear in the returned table
unless it was present in newdata.
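The expansion of newdata to a Cartesian product with the requested times can be pictured with base R. The data values and the column name time are made up for illustration; in the package the time column carries the name stored in object$data_info$event_time, and the internal implementation may differ.

```r
# Toy prediction data and horizons (illustrative values only).
newdata <- data.frame(id = 1:2, age = c(40, 55))
times <- c(0, 2, 5)

# merge() with by = NULL returns the Cartesian product of its inputs.
grid <- merge(newdata, data.frame(time = times), by = NULL)
grid <- grid[order(grid$id, grid$time), ]
rownames(grid) <- NULL

print(grid)
```

Each row of the grid is one (subject, horizon) pair, which matches the layout of the returned table: one row per row in newdata and time in times.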
A data.table with one row per (row in newdata, time in times) and columns:
All columns from newdata (excluding ignored event columns).
A column with name object$data_info$event_time holding the requested horizon.
pwch_1, ..., pwch_K: Predicted cause-specific piecewise hazards at the horizon.
survival_function: Predicted survival probability at the horizon.
absolute_risk: Predicted cumulative incidence (absolute risk) for cause at the horizon.
    d <- simulateStenoT1(30, competing_risks = TRUE)
    lrn <- Learner_glmnet(covariates = c("age", "value_LDL"),
                          lambda = 0, cross_validation = FALSE)
    bl <- fit_learner(d, learner = lrn, id = "id", status = "status_cvd",
                      event_time = "time_cvd", number_of_nodes = 8)
    p <- predict(bl, newdata = d[1:5], times = c(0, 2, 5), cause = 1)
    head(p)
Computes cause-specific piecewise-constant hazards (pwch_k), the corresponding
survival function, and absolute risk for a given cause, at user-supplied
prediction horizons times, for each row in newdata.
    ## S3 method for class 'poisson_superlearner'
    predict(object, newdata, times, cause = 1, model = "sl", ...)

| Argument | Description |
|---|---|
| object | |
| newdata | |
| times | |
| cause | |
| model | Model selector. Default is "sl". Numeric positions refer to the learners actually retained for each cause in the fitted object. |
| ... | Additional arguments (currently ignored). |
Internally, newdata is expanded to a Cartesian product with the requested
times, converted to long Poisson format on object$data_info$nodes, and hazards
are predicted either from the stacked super learner (model = "sl"), the
discrete super learner (model = "discrete_sl"), or selected fitted base
learners. Survival and absolute risk are then computed from the predicted
hazards.
Special case times = 0: when 0 is included in times, the returned rows
have survival_function = 1, absolute_risk = 0, and all pwch_k = 0 at time 0.
Identifiers in the output: if newdata contains the id column, it is carried
into the output. If newdata does not contain an id column, an internal id is
created for computation, but it is not guaranteed to appear in the returned table
unless it was present in newdata.
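A discrete super learner selects the single base learner with the smallest cross-validated loss, rather than combining learners with stacking weights. The sketch below illustrates that selection rule with a Poisson deviance on held-out long-format rows. The helper poisson_deviance(), the toy hazards and the learner names are hypothetical, and the exact loss and fold structure used by the package are assumptions.

```r
# Hypothetical helper: Poisson deviance of observed counts y given means mu,
# using the convention y * log(y / mu) = 0 when y == 0.
poisson_deviance <- function(y, mu) {
  term <- ifelse(y > 0, y * log(y / mu), 0)
  2 * sum(term - (y - mu))
}

# Toy held-out long-format rows (illustrative values only).
delta <- c(0, 1, 0, 1)      # event indicators
tij   <- c(1, 0.5, 1, 0.8)  # time at risk

# Toy held-out hazard predictions from two base learners.
haz <- list(
  lasso = c(0.2, 0.9, 0.3, 0.8),
  ridge = c(0.5, 0.5, 0.5, 0.5)
)

# Expected count per row is hazard times time at risk.
cv_dev <- vapply(haz, function(h) poisson_deviance(delta, h * tij), numeric(1))
selected <- names(which.min(cv_dev))

print(cv_dev)
print(selected)
```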
A data.table with one row per (row in newdata, time in times) and columns:
All columns from newdata (excluding ignored event columns).
A column with name object$data_info$event_time holding the requested horizon.
pwch_1, ..., pwch_K: Predicted cause-specific piecewise hazards at the horizon.
survival_function: Predicted survival probability at the horizon.
absolute_risk: Predicted cumulative incidence (absolute risk) for cause at the horizon.
    d <- simulateStenoT1(30, competing_risks = TRUE)
    learners <- list(
      lasso = Learner_glmnet(
        covariates = "sex", alpha = 1, lambda = 0.01, cross_validation = FALSE
      ),
      ridge = Learner_glmnet(
        covariates = c("sex", "value_LDL"), alpha = 0, lambda = 0.01,
        cross_validation = FALSE
      )
    )
    fit <- Superlearner(
      data = d, id = "id", status = "status_cvd", event_time = "time_cvd",
      learners = learners, number_of_nodes = 3, nfold = 2
    )
    p <- predict(fit, newdata = d[1:3], times = c(0, 2), cause = 1)
    p[, .(id, time_cvd, absolute_risk)]
Absolute-risk matrix predictions for a fitted base learner
    ## S3 method for class 'base_learner'
    predictRisk(object, newdata, times, cause = 1, ...)

| Argument | Description |
|---|---|
| object | |
| newdata | |
| times | |
| cause | |
| ... | Unused. |
A numeric matrix with nrow(newdata) rows and length(times) columns.

    d <- simulateStenoT1(30, competing_risks = TRUE)
    lrn <- Learner_glmnet(
      covariates = c("sex", "value_LDL"), lambda = 0.01, cross_validation = FALSE
    )
    bl <- fit_learner(
      d, learner = lrn, id = "id", status = "status_cvd",
      event_time = "time_cvd", number_of_nodes = 3
    )
    if (requireNamespace("riskRegression", quietly = TRUE)) {
      riskRegression::predictRisk(bl, newdata = d[1:3], times = c(1, 3), cause = 1)
    }
S3 method compatible with riskRegression::predictRisk returning one column
per requested time.
    ## S3 method for class 'poisson_superlearner'
    predictRisk(object, newdata, times, cause = 1, model = "sl", ...)

| Argument | Description |
|---|---|
| object | |
| newdata | |
| times | |
| cause | |
| model | Model selector. Default is "sl". |
| ... | Unused. |
A numeric matrix with nrow(newdata) rows and length(times) columns.

    d <- simulateStenoT1(30, competing_risks = TRUE)
    learners <- list(
      lasso = Learner_glmnet(
        covariates = "sex", alpha = 1, lambda = 0.01, cross_validation = FALSE
      ),
      ridge = Learner_glmnet(
        covariates = c("sex", "value_LDL"), alpha = 0, lambda = 0.01,
        cross_validation = FALSE
      )
    )
    fit <- Superlearner(
      data = d, id = "id", status = "status_cvd", event_time = "time_cvd",
      learners = learners, number_of_nodes = 3, nfold = 2
    )
    if (requireNamespace("riskRegression", quietly = TRUE)) {
      riskRegression::predictRisk(fit, newdata = d[1:3], times = c(1, 3), cause = 1)
    }
base_learner
Prints a compact description of the fitted base learner, including the learner type, the time-grid used, and (optionally) the fitted model object for a given cause.
    ## S3 method for class 'base_learner'
    print(x, cause = 1, ...)

| Argument | Description |
|---|---|
| x | |
| cause | |
| ... | Passed to the underlying fitted object |
Invisibly returns x.
    d <- simulateStenoT1(30, competing_risks = TRUE)
    lrn <- Learner_glmnet(
      covariates = c("sex", "value_LDL"), lambda = 0.01, cross_validation = FALSE
    )
    bl <- fit_learner(
      d, learner = lrn, id = "id", status = "status_cvd",
      event_time = "time_cvd", number_of_nodes = 3
    )
    print(bl, cause = NULL)
poisson_superlearner
Prints a compact description of the fitted Poisson Super Learner, including the number of base learners, the meta-learner, the time-grid used, and competing-risk structure. Optionally prints the fitted meta-learner for a given cause.
    ## S3 method for class 'poisson_superlearner'
    print(x, cause = 1, model = "sl", ...)

| Argument | Description |
|---|---|
| x | |
| cause | |
| model | Model selector. Default is "sl". |
| ... | Passed to the underlying fitted meta-learner |
Invisibly returns x.
    d <- simulateStenoT1(30, competing_risks = TRUE)
    learners <- list(
      lasso = Learner_glmnet(
        covariates = "sex", alpha = 1, lambda = 0.01, cross_validation = FALSE
      ),
      ridge = Learner_glmnet(
        covariates = c("sex", "value_LDL"), alpha = 0, lambda = 0.01,
        cross_validation = FALSE
      )
    )
    fit <- Superlearner(
      data = d, id = "id", status = "status_cvd", event_time = "time_cvd",
      learners = learners, number_of_nodes = 3, nfold = 2
    )
    print(fit, cause = NULL)
Simulate synthetic data inspired by the Steno Type-1 risk engine
    simulateStenoT1(
      n,
      coefficient_age = 0.05,
      coefficient_LDL = 0.1,
      value_diabetis = 0.02,
      seed = NULL,
      keep = NULL,
      scenario = c("alpha", "beta"),
      competing_risks = FALSE
    )

| Argument | Description |
|---|---|
| n | |
| coefficient_age | |
| coefficient_LDL | |
| value_diabetis | |
| seed | |
| keep | |
| scenario | |
| competing_risks | |
Generates baseline covariates and event times for CVD and censoring, with an optional competing-risks setting, for examples, benchmarks and tests.
The simulator uses a structural equation model (via lava::lvm) to generate
realistic correlations between covariates. Event times are then generated from
cause-specific Weibull proportional hazards models, where the linear predictor
depends on the simulated covariates (and scenario).
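Drawing event times from a Weibull proportional hazards model can be done by inverting the cumulative hazard. The base R sketch below uses the parametrisation with cumulative hazard scale * t^shape * exp(lp); the shape and scale values, the centring of age and the variable names are assumptions for illustration, and lava's internal parametrisation may differ. Only the coefficient 0.05 mirrors the default coefficient_age.

```r
set.seed(1)
n <- 5

# Toy covariate and linear predictor (0.05 mirrors the default coefficient_age).
age <- rnorm(n, mean = 50, sd = 10)
lp <- 0.05 * (age - 50)

# Assumed Weibull parameters: cumulative hazard H(t) = scale * t^shape * exp(lp).
shape <- 1.5
scale <- 0.01

# Inverse-transform sampling: solve H(T) = -log(U) for T with U ~ Uniform(0, 1).
u <- runif(n)
time_event <- (-log(u) / (scale * exp(lp)))^(1 / shape)

print(round(time_event, 2))
```

Repeating the draw with a separate linear predictor for each cause, and for censoring, gives one latent time per process.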
The following baseline covariates are generated (column name, type, interpretation):
factor. Binary sex indicator (generated Bernoulli, then stored as factor).
numeric. Age at baseline (years).
numeric. Duration of diabetes at baseline (years).
numeric. Systolic blood pressure (SBP).
numeric. LDL cholesterol.
numeric. HbA1c.
factor with levels Normal, Micro, Macro. Albuminuria category.
numeric. Estimated glomerular filtration rate, constructed from latent
age-dependent log2 eGFR components (higher values indicate better kidney function).
factor. Smoking indicator (generated from a logistic model, then stored as factor).
factor. Physical activity indicator (generated from a logistic model, then stored as factor).
Event time variables are generated from latent Weibull PH models:
time.event.1 (CVD), time.event.0 (censoring), and, if
competing_risks = TRUE, time.event.2 (death without prior CVD).
These latent variables are used to
construct the observed outcome variables returned by the function (see below).
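The step from latent times to observed outcomes follows the usual competing-risks construction: the observed time is the smallest latent time, and the status records which process produced it. The sketch below shows this in base R with made-up latent times; the latent column names follow the text above, while the exact internal code of the simulator is an assumption.

```r
# Toy latent times (illustrative values only).
latent <- data.frame(
  time.event.0 = c(4.0, 1.2, 5.0),  # censoring
  time.event.1 = c(2.5, 3.0, 7.1),  # CVD
  time.event.2 = c(9.0, 0.8, 5.9)   # death without prior CVD
)
codes <- c(0L, 1L, 2L)              # status code for each latent column

# Observed outcome: earliest latent time and the process that produced it.
first <- apply(as.matrix(latent), 1, which.min)
observed <- data.frame(
  time_cvd = do.call(pmin, latent),
  status_cvd = codes[first]
)

# Uncensored outcome: earliest event cause, ignoring censoring.
events <- latent[, c("time.event.1", "time.event.2")]
observed$uncensored_time_cvd <- do.call(pmin, events)
observed$uncensored_status_cvd <- apply(as.matrix(events), 1, which.min)

print(observed)
```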
A data.table with at least the following columns:
id: integer. Subject identifier (1, ..., n).
time_cvd: numeric. Observed follow-up time (minimum of event and censoring times; also includes the competing risk time if competing_risks = TRUE in scenario "alpha").
status_cvd: integer. Observed event status: 0 = censored, 1 = CVD, and, if competing_risks = TRUE in scenario "alpha", 2 = death without prior CVD.
An alias of time_cvd (numeric), kept for convenience.
An alias of status_cvd (integer), kept for convenience.
uncensored_time_cvd: numeric. Event time ignoring censoring (minimum of event causes only).
uncensored_status_cvd: integer. Event cause ignoring censoring. In scenario "alpha" this is 1 (CVD) or 2 (death without CVD); in scenario "beta" this is always 1.
An alias of uncensored_time_cvd (numeric).
An alias of uncensored_status_cvd (integer).
In addition, the returned table contains all baseline covariates listed in
Details. Internal latent variables used only for simulation are removed
before returning (e.g., log2 eGFR components and, in scenario "beta", the
hinge-squared features).
Thomas A. Gerds [email protected]
simulateStenoT1(n = 20, scenario = "alpha", competing_risks = TRUE)
Dispatches to the underlying fitted model’s summary() method for the selected
cause, or returns a list of summaries for all causes.
## S3 method for class 'base_learner'
summary(object, cause = 1, ...)
object |
|
cause |
|
... |
Passed to the underlying |
If cause is a single integer, returns the underlying model summary for
that cause. If cause = NULL, returns a list of summaries (one per cause).
d <- simulateStenoT1(30, competing_risks = TRUE)
lrn <- Learner_glmnet(
  covariates = c("sex", "value_LDL"),
  lambda = 0.01,
  cross_validation = FALSE
)
bl <- fit_learner(
  d,
  learner = lrn,
  id = "id",
  status = "status_cvd",
  event_time = "time_cvd",
  number_of_nodes = 3
)
out <- summary(bl, cause = 1)
Prints:
- a compact description of the fitted ensemble,
- cross-validated deviances for base learners (when available),
- cause-specific meta-learner coefficients (stacking weights).
## S3 method for class 'poisson_superlearner'
summary(object, cause = NULL, model = "sl", ...)
object |
|
cause |
|
model |
Model selector. Default is
|
... |
Passed to the underlying |
Invisibly returns a list with elements:
data.table (or NULL).
List of length n_crisks with cause-specific coefficient objects (or NULL).
d <- simulateStenoT1(30, competing_risks = TRUE)
learners <- list(
  lasso = Learner_glmnet(
    covariates = "sex",
    alpha = 1,
    lambda = 0.01,
    cross_validation = FALSE
  ),
  ridge = Learner_glmnet(
    covariates = c("sex", "value_LDL"),
    alpha = 0,
    lambda = 0.01,
    cross_validation = FALSE
  )
)
fit <- Superlearner(
  data = d,
  id = "id",
  status = "status_cvd",
  event_time = "time_cvd",
  learners = learners,
  number_of_nodes = 3,
  nfold = 2
)
s <- summary(fit, cause = 1)
names(s)
Fits an ensemble of cause-specific piecewise-constant hazard models using a long-format Poisson representation and combines them through a meta-learner (stacking).
Superlearner(
  data,
  id = "id",
  status = "status",
  event_time = NULL,
  learners,
  number_of_nodes = NULL,
  nodes = NULL,
  variable_transformation = NULL,
  nfold = 3,
  verbose = FALSE,
  ...
)
data |
|
id |
|
status |
|
event_time |
|
learners |
|
number_of_nodes |
|
nodes |
|
variable_transformation |
Optional transformation specification passed to
|
nfold |
|
verbose |
logical(1). If TRUE, display progress bars during full-data fitting and cross-validation fitting. Defaults to FALSE. |
... |
Additional arguments currently ignored. |
Internally, the function:
1. builds a time grid (nodes) and converts the subject-level data to a long Poisson format;
2. fits each base learner once on the full long data for each cause;
3. removes learners that already fail on the full data;
4. uses nfold cross-validation to obtain out-of-sample base-learner predictions (Z1, Z2, ...) for stacking;
5. removes learners whose cross-validated prediction column is entirely missing for at least one cause;
6. fits a cause-specific meta-learner on the retained stacked predictions.
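The conversion to a long Poisson format in the first step can be sketched in base R as follows. This is a minimal illustration under assumptions: the simulated data, the column names time_at_risk and event, the chosen grid, and the use of stats::glm() are hypothetical choices, not the package's internal implementation.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
event_time <- rexp(n, rate = 0.3 * exp(0.5 * x))  # latent event time
cens_time <- runif(n, 0, 6)                       # latent censoring time
d <- data.frame(
  id = seq_len(n),
  time = pmin(event_time, cens_time),
  status = as.integer(event_time <= cens_time),
  x = x
)

# Hypothetical time grid; interval k is (nodes[k], nodes[k + 1]].
nodes <- c(0, 1, 3, 6)

# One row per subject and per interval the subject enters.
long <- do.call(rbind, lapply(seq_len(nrow(d)), function(i) {
  k <- which(nodes[-length(nodes)] < d$time[i])
  lower <- nodes[k]
  upper <- pmin(nodes[k + 1], d$time[i])
  data.frame(
    id = d$id[i],
    node = factor(k, levels = seq_len(length(nodes) - 1)),
    time_at_risk = upper - lower,
    # event indicator: 1 only in the interval where follow-up ends with an event
    event = as.integer(d$status[i] == 1 & upper == d$time[i]),
    x = d$x[i]
  )
}))

# Poisson regression with log time-at-risk as offset: the fitted hazard is
# constant within each interval and multiplicative in x.
fit <- glm(event ~ node + x + offset(log(time_at_risk)),
           family = poisson(), data = long)
coef(fit)
```

Each subject contributes one row per interval entered, and the offset log(time_at_risk) is what turns the Poisson mean into a hazard rate rather than an event count.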
If all learners fail on the full data, the function stops with an error.
If only one learner remains after the full-data screening step or after the
cross-validation screening step, no meta-learner is fit. In that case,
metalearner is NULL, each superlearner[[k]]$meta_learner_fit is NULL,
and prediction is based directly on the stored fitted base learner. If some,
but not all, causes retain only one learner after screening, those causes are
predicted directly while other causes may still use a fitted meta-learner.
Numeric learner positions always refer to the learners actually retained for
the corresponding cause in the fitted object.
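The cross-validation and stacking steps can be sketched in the same spirit. The example below simulates data directly in long format, uses two stats::glm() Poisson fits as stand-in base learners, and fits a no-intercept Poisson meta-learner on their out-of-fold log-hazard predictions. The package is documented to use glmnet::glmnet with lambda = 0 for the meta-learner; glm() is swapped in here only to keep the sketch free of dependencies, and all variable names are illustrative.

```r
set.seed(7)
n <- 400

# Simulated long-format rows (hypothetical): covariate, time at risk, event count.
long <- data.frame(x = rnorm(n), time_at_risk = runif(n, 0.2, 1))
long$event <- rpois(n, long$time_at_risk * exp(-1 + 0.5 * long$x))

# Two stand-in base learners, both with log time-at-risk as offset.
learners <- list(
  Z1 = event ~ 1 + offset(log(time_at_risk)),
  Z2 = event ~ x + offset(log(time_at_risk))
)

# Assign rows to two folds and collect out-of-fold predictions.
fold <- sample(rep(1:2, length.out = n))
Z <- matrix(NA_real_, n, length(learners),
            dimnames = list(NULL, names(learners)))

for (v in 1:2) {
  train <- long[fold != v, ]
  holdout <- long[fold == v, ]
  for (j in names(learners)) {
    fit_j <- glm(learners[[j]], family = poisson(), data = train)
    # predict() on the link scale includes the offset; subtracting it
    # leaves the predicted log-hazard.
    Z[fold == v, j] <- predict(fit_j, newdata = holdout, type = "link") -
      log(holdout$time_at_risk)
  }
}

# Meta-learner: Poisson regression on the stacked log-hazards, no intercept.
stacked <- cbind(long, Z)
meta <- glm(event ~ 0 + Z1 + Z2 + offset(log(time_at_risk)),
            family = poisson(), data = stacked)
coef(meta)  # stacking weights on the log-hazard scale
```

A learner whose column in Z stayed entirely NA would have to be dropped before the final glm() call, which corresponds to the cross-validation screening step described above.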
An object of class poisson_superlearner, stored as a named list
with the following components:
- learners: a cause-specific list of retained base learner libraries. Thus learners[[k]][[j]] is the j-th retained learner object for cause k.
- metalearner: a list describing the internal meta-learner used for stacking (engine = "glmnet::glmnet", Poisson family, no intercept, lambda = 0, add_nodes = FALSE, log-hazard scale). If no stacking is performed because only one learner remains for every cause, metalearner is NULL.
- superlearner: a list of length data_info$n_crisks, one entry per cause. For cause k, superlearner[[k]] is a list with two elements:
  - learners_fit: the fitted base learner object or objects for cause k. If more than one learner is retained, this is a list with one fitted object per retained learner. If only one learner remains, this is the single fitted learner object itself.
  - meta_learner_fit: the fitted cause-specific meta-learner for cause k. If no stacking is performed, this is NULL.
- cross_validation_deviance: a data.table with columns cause_index, cause, learner_index, learner, and deviance, giving the cross-validated Poisson deviance for each retained base learner within each cause. This component is absent when all causes are fitted directly with a single retained learner.
- data_info: a list of bookkeeping information used for prediction and interpretation, containing:
  - id: identifier column name used.
  - status: status column name used.
  - event_time: event-time column name used.
  - nodes: numeric vector of node cut points used for the piecewise grid.
  - nfold: number of folds used for stacking.
  - maximum_followup: maximum observed follow-up time.
  - n_crisks: number of event types detected.
  - learners_labels: list of character vectors with retained learner labels for each cause.
  - variable_transformation: the transformation specification passed in variable_transformation, or NULL.
data <- simulateStenoT1(50, competing_risks = TRUE)
learners <- list(
  glm = Learner_glmnet(
    covariates = c("sex", "value_LDL"),
    lambda = 0,
    cross_validation = FALSE
  ),
  ridge = Learner_glmnet(
    covariates = c("sex", "value_LDL"),
    alpha = 0,
    lambda = 0.01,
    cross_validation = FALSE
  )
)
fit <- Superlearner(
  data = data,
  id = "id",
  status = "status_cvd",
  event_time = "time_cvd",
  learners = learners,
  number_of_nodes = 3,
  nfold = 2
)
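Assuming the package is installed, the documented components of the returned object can be inspected directly. The sketch below refits the example model and only accesses elements described in the Value section; the printed values depend on the simulated data and are therefore not shown.

```r
library(poissonsuperlearner)

set.seed(1)
data <- simulateStenoT1(50, competing_risks = TRUE)
learners <- list(
  glm = Learner_glmnet(covariates = c("sex", "value_LDL"),
                       lambda = 0, cross_validation = FALSE),
  ridge = Learner_glmnet(covariates = c("sex", "value_LDL"),
                         alpha = 0, lambda = 0.01, cross_validation = FALSE)
)
fit <- Superlearner(data = data, id = "id", status = "status_cvd",
                    event_time = "time_cvd", learners = learners,
                    number_of_nodes = 3, nfold = 2)

# Time grid and retained learner labels per cause.
fit$data_info$nodes
fit$data_info$learners_labels

# Cross-validated deviances; NULL when no stacking was performed.
fit$cross_validation_deviance

# Check, cause by cause, whether a meta-learner was fitted.
for (k in seq_len(fit$data_info$n_crisks)) {
  stacked <- !is.null(fit$superlearner[[k]]$meta_learner_fit)
  cat("cause", k,
      if (stacked) "uses a meta-learner\n" else "is predicted by a single learner\n")
}
```

Checking meta_learner_fit for NULL before using stacking weights guards against the case where screening left only one learner for a cause.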