Families and model types in fbrglm

Overview

fbrglm exposes a single formula/data interface for regularized GLMs and reuses one set of S3 methods across model types. The common operations are:

  • fbrglm(formula, data, family, lambda, ...)
  • print() / summary() / coef() / nobs()
  • predict(newdata, type = c("link", "response", "class"), newoffset)
  • plot() (delegates to the underlying glmnet / cv.glmnet plot)
  • as_glmnet() / as_cv_glmnet()

Classical standard errors, z values, and p values are not produced under infer = "none" (the only inference mode currently implemented). The package returns regularized point estimates; inference modes are the next milestone.

The family argument accepts the same forms as glmnet::glmnet() itself: the six canonical character strings ("gaussian", "binomial", "poisson", "cox", "multinomial", "mgaussian"), any GLM family object (e.g. stats::Gamma(link = "log")), or a family object from another package (e.g. MASS::negative.binomial(theta = 2)). fbrglm validates the argument shape, stores both the value passed to glmnet and a display name on the fit, and wraps glmnet’s errors with the offending family name when something downstream fails.

library(fbrglm)

Summary of supported model types

Model fbrglm formulation Response Status
Linear regression family = "gaussian" continuous y supported
Logistic regression family = "binomial" 0 / 1 y supported
Poisson regression family = "poisson" count y supported
Piecewise exponential survival "poisson" + interval factor + offset(log(exposure)) event count / interval supported (formulation)
Native Cox regression family = "cox" + Surv(time, status) ~ ... survival object supported (experimental)
Gamma regression family = stats::Gamma(link = "log") positive continuous supported (experimental)
Negative binomial (fixed θ) family = MASS::negative.binomial(theta = ...) overdispersed counts supported (fixed θ only)
Multinomial regression family = "multinomial" factor y (≥ 3 levels) experimental
Multi-response Gaussian family = "mgaussian", cbind(y1, y2) ~ ... LHS matrix y experimental

Two important caveats:

  • The piecewise exponential survival model and native Cox model are not the same thing. The former is a Poisson regression on a long-format dataset with an interval factor and offset(log(exposure)); the latter is glmnet’s partial-likelihood Cox path triggered by family = "cox" together with a Surv() LHS. Both are useful, but the parameters they estimate and the data they consume are different. Do not conflate them.
  • Negative binomial support is for the fixed-θ family object (MASS::negative.binomial(theta = ...)). Joint estimation of θ in the style of MASS::glm.nb() is a different problem (alternating optimization plus a separate likelihood for θ) and is not implemented here.

Linear regression

set.seed(101)
n <- 150
dat_lm <- data.frame(
    y  = 0.5 + 0.8 * rnorm(n) - 0.3 * rnorm(n) + rnorm(n, sd = 0.5),
    x1 = rnorm(n),
    x2 = rnorm(n)
)
fit_lm <- fbrglm(y ~ x1 + x2, data = dat_lm,
                 family = "gaussian",
                 lambda = "fix", lambda_value = 0.05)
coef(fit_lm)
#> (Intercept)          x1          x2 
#>   0.4336309   0.0000000   0.0000000
head(predict(fit_lm, type = "response"))
#> [1] 0.4336309 0.4336309 0.4336309 0.4336309 0.4336309 0.4336309

Logistic regression

set.seed(102)
n <- 200
dat_logit <- data.frame(
    y  = rbinom(n, 1, 0.5),
    x1 = rnorm(n), x2 = rnorm(n)
)
fit_logit <- fbrglm(y ~ x1 + x2, data = dat_logit,
                    family = "binomial",
                    lambda = "fix", lambda_value = 0.05)
head(predict(fit_logit, type = "response"))   # probabilities in [0, 1]
#> [1] 0.4581520 0.4524843 0.4504296 0.4556070 0.4548203 0.4551897
head(predict(fit_logit, type = "class"))      # 0 / 1
#> [1] 0 0 0 0 0 0

Poisson regression with an offset

A common count-data setup attaches an exposure (time, area, population) to each row. The Poisson likelihood is written on the event count with log(exposure) as an additive offset on the linear predictor.

set.seed(103)
n <- 200
exposure <- runif(n, min = 0.5, max = 2)
x1 <- rnorm(n); x2 <- rnorm(n)
rate <- exp(-1 + 0.4 * x1 - 0.2 * x2)
dat_pois <- data.frame(
    y  = rpois(n, rate * exposure),
    x1 = x1, x2 = x2,
    exposure = exposure
)

fit_pois <- fbrglm(y ~ x1 + x2, data = dat_pois,
                   family = "poisson",
                   offset = log(dat_pois$exposure),
                   lambda = "fix", lambda_value = 0.05)

new_dat <- dat_pois[1:5, ]
predict(fit_pois,
        newdata   = new_dat,
        newoffset = log(new_dat$exposure),
        type = "response")
#> [1] 0.3353762 0.3098936 0.4821580 0.3127004 0.2636854

Piecewise exponential survival model via Poisson regression

A Cox-like survival model can be fit with a Poisson regression on a long-format dataset whose rows are subject × time-interval. The construction is:

  • split each subject’s follow-up time into a small number of intervals;
  • for each interval the subject is at risk in, generate one row with the time at risk (exposure), an event indicator (event), the interval label (a factor), and the time-fixed covariates;
  • model event ~ x1 + x2 + interval + offset(log(exposure)) with a Poisson likelihood.

This is the piecewise exponential model and is closely related to Cox regression when the intervals are fine enough. It is not the same as glmnet’s partial-likelihood Cox path (see the next section).

set.seed(104)
n_subj <- 120
breaks <- c(0, 1, 2, 3)
n_int  <- length(breaks) - 1

x1 <- rnorm(n_subj)
x2 <- rnorm(n_subj)
rate    <- exp(-1 + 0.4 * x1 - 0.2 * x2)
T_event <- rexp(n_subj, rate)
T_obs   <- pmin(T_event, 3)
delta   <- as.integer(T_event <= 3)

rows <- list()
for (i in seq_len(n_subj)) {
    for (j in seq_len(n_int)) {
        lo <- breaks[j]; hi <- breaks[j + 1]
        if (T_obs[i] <= lo) next
        exposure   <- min(T_obs[i], hi) - lo
        if (exposure <= 0) next
        event_here <- as.integer(delta[i] == 1L &
                                 T_obs[i] >  lo &
                                 T_obs[i] <= hi)
        rows[[length(rows) + 1]] <- data.frame(
            id       = i,
            interval = j,
            exposure = exposure,
            event    = event_here,
            x1       = x1[i],
            x2       = x2[i]
        )
    }
}
long_dat <- do.call(rbind, rows)
long_dat$interval <- factor(long_dat$interval, levels = seq_len(n_int))

fit_pwe <- fbrglm(event ~ x1 + x2 + interval,
                  data   = long_dat,
                  family = "poisson",
                  offset = log(long_dat$exposure),
                  lambda = "fix", lambda_value = 0.05)
coef(fit_pwe)
#> (Intercept)          x1          x2   interval2   interval3 
#> -0.90734870  0.03831948 -0.14253822  0.00000000  0.00000000

Native Cox regression

family = "cox" together with survival::Surv(time, status) ~ ... on the LHS hands the problem to glmnet’s partial-likelihood Cox path directly. fbrglm carries the Surv response through model.frame() without converting it. type = "class" is not meaningful here and is rejected.

if (requireNamespace("survival", quietly = TRUE)) {
    set.seed(105)
    n <- 200
    x1 <- rnorm(n); x2 <- rnorm(n)
    rate    <- exp(0.4 * x1 - 0.2 * x2)
    T_event <- rexp(n, rate)
    t_obs   <- pmin(T_event, 3)
    delta   <- as.integer(T_event <= 3)
    dat_cox <- data.frame(t_obs = t_obs, delta = delta,
                          x1 = x1, x2 = x2)
    fit_cox <- fbrglm(survival::Surv(t_obs, delta) ~ x1 + x2,
                      data   = dat_cox,
                      family = "cox",
                      lambda = "fix", lambda_value = 0.05)
    fit_cox$family_name
    head(predict(fit_cox, type = "link"))
}
#> [1] -0.25503520  0.18920995  0.23509476  0.34977148 -0.08956644 -0.11334088

Cox support is marked experimental: glmnet’s coxnet is mature on its own, but fbrglm has not yet been exercised against the full breadth of Cox usage (strata, ties, time-varying covariates), so unusual setups may need to fall back to as_glmnet(fit_cox).

Gamma regression

set.seed(106)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
eta <- 0.4 * x1 - 0.2 * x2
dat_g <- data.frame(
    y  = rgamma(n, shape = 2, rate = exp(-eta)),
    x1 = x1, x2 = x2
)
fit_g <- fbrglm(y ~ x1 + x2, data = dat_g,
                family = stats::Gamma(link = "log"),
                lambda = "fix", lambda_value = 0.05)
fit_g$family_name
#> [1] "Gamma"
head(predict(fit_g, type = "response"))
#> [1] 1.491878 1.807995 1.382459 2.960397 2.206824 2.523459

Negative binomial regression (fixed θ)

if (requireNamespace("MASS", quietly = TRUE)) {
    set.seed(107)
    n <- 200
    x1 <- rnorm(n); x2 <- rnorm(n)
    mu <- exp(0.4 * x1 - 0.2 * x2)
    dat_nb <- data.frame(
        y  = rnbinom(n, mu = mu, size = 2),
        x1 = x1, x2 = x2
    )
    fit_nb <- fbrglm(y ~ x1 + x2, data = dat_nb,
                     family = MASS::negative.binomial(theta = 2),
                     lambda = "fix", lambda_value = 0.05)
    fit_nb$family_name
    head(predict(fit_nb, type = "response"))
}
#> [1] 0.4753379 0.9441341 1.1140226 1.2485429 0.6242180 0.7468528

Joint estimation of θ (MASS::glm.nb() style) is not in scope here; θ must be chosen externally.

Multinomial regression (experimental)

set.seed(108)
n <- 200
dat_mult <- data.frame(
    y  = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
    x1 = rnorm(n), x2 = rnorm(n)
)
fit_mult <- fbrglm(y ~ x1 + x2, data = dat_mult,
                   family = "multinomial",
                   lambda = "fix", lambda_value = 0.05)
fit_mult$family_name
#> [1] "multinomial"
p_resp <- predict(fit_mult, type = "response")
dim(p_resp)     # one column per class
#> [1] 200   3
head(p_resp)
#>       a   b     c
#> 1 0.335 0.3 0.365
#> 2 0.335 0.3 0.365
#> 3 0.335 0.3 0.365
#> 4 0.335 0.3 0.365
#> 5 0.335 0.3 0.365
#> 6 0.335 0.3 0.365
head(predict(fit_mult, type = "class"))
#> [1] "c" "c" "c" "c" "c" "c"

For multinomial fits, coef(fit) returns a list of (sparse) matrices — one per class — exactly as glmnet does; callers who want the underlying object can use as_glmnet(fit_mult).

Multi-response Gaussian (experimental)

set.seed(109)
n <- 150
dat_mg <- data.frame(
    y1 = rnorm(n), y2 = rnorm(n),
    x1 = rnorm(n), x2 = rnorm(n)
)
fit_mg <- fbrglm(cbind(y1, y2) ~ x1 + x2, data = dat_mg,
                 family = "mgaussian",
                 lambda = "fix", lambda_value = 0.05)
fit_mg$family_name
#> [1] "mgaussian"
pred_mg <- predict(fit_mg, type = "response")
dim(pred_mg)   # one column per response
#> [1] 150   2

Limitations

  • infer = "none" only — no classical standard errors, z values, p values, or confidence intervals. Honest post-selection inference is the next milestone.
  • Native Cox, multinomial, and mgaussian are marked experimental: fit and predict work for the basic cases shown here, but the more unusual variants (Cox strata, ties handling, multinomial grouped / ungrouped) have not been exhaustively tested.
  • Negative binomial supports fixed θ only. Joint θ estimation in the spirit of MASS::glm.nb() is out of scope.
  • The piecewise exponential survival example is a Poisson formulation and is not identical to native Cox partial likelihood.