| Title: | Safe Formula-Based Regularized Generalized Linear Models |
|---|---|
| Description: | A formula-based wrapper around 'glmnet' that brings the 'glm()'-compatible modeling workflow to regularized generalized linear models. Training-time 'terms', 'xlevels', and 'contrasts' are stored on the fit object and reused at predict time, so the design matrix is reconstructed consistently across sessions. Complete-case bookkeeping is exposed via 'nobs_info', and linearly dependent columns are detected by a QR pivot and reported as 'NA' in 'coef()' and 'summary()' (the 'stats::glm()' convention), distinguishing "not identifiable" from "shrunk to zero by the penalty". Novel factor levels at predict time raise the same error 'stats::predict.glm()' does by default, with 'on_new_levels = "na"' as a production-style opt-in. Accepts character family strings ('gaussian', 'binomial', 'poisson', 'cox', 'multinomial', 'mgaussian') and any 'glm' family object the underlying 'glmnet' itself accepts, including 'Gamma' and fixed-theta negative binomial via 'MASS::negative.binomial'. |
| Authors: | Koki Tsuyuzaki [aut, cre] |
| Maintainer: | Koki Tsuyuzaki <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.1 |
| Built: | 2026-06-22 19:33:16 UTC |
| Source: | https://github.com/cran/fbrglm |
Returns the raw cv.glmnet object stored inside an fbrglm model. This
is NULL when the model was fit with lambda = "fix".
as_cv_glmnet(object, ...)
object |
An |
... |
Ignored. |
A cv.glmnet object, or NULL.
Returns the raw glmnet object stored inside an fbrglm model. For a
lambda = "fix" fit this is the direct glmnet::glmnet() return; for
a CV fit it is the underlying glmnet.fit (cv_fit$glmnet.fit).
as_glmnet(object, ...)
object |
An |
... |
Ignored. |
A glmnet object, or NULL if no fit has been attached yet.
Fits a regularized generalized linear model with a formula/data interface
that mirrors base R's stats::glm() while delegating the actual penalized
fit to glmnet::glmnet() / glmnet::cv.glmnet().
fbrglm(formula, data, family = c("gaussian", "binomial", "poisson"), weights = NULL, offset = NULL, infer = c("none", "split", "selective"), selection_frac = 0.2, alpha = 1, lambda = c("cv_min", "cv_1se", "fix"), lambda_value = NULL, x = NULL, y = NULL, ...)
formula |
A model formula, e.g. |
data |
A data frame containing the variables in |
family |
A character string ( |
weights |
Optional observation weights, passed to glmnet / cv.glmnet. |
offset |
Optional offset vector, passed to glmnet / cv.glmnet.
Reused at predict time when |
infer |
Inference mode: |
selection_frac |
Selection-share for |
alpha |
Elastic-net mixing parameter, passed to glmnet. |
lambda |
|
lambda_value |
Numeric |
x, y |
Optional pre-built design matrix and response. Not yet supported; supply |
... |
Additional arguments forwarded to |
Current scope: infer = "none" only, with the same family argument
surface as glmnet itself. The character strings "gaussian",
"binomial", "poisson", "cox", "multinomial", and "mgaussian"
are accepted; so are GLM family objects (e.g.
stats::Gamma(link = "log"), MASS::negative.binomial(theta = 2)).
The native Cox, multinomial, and mgaussian paths are exercised by the
tests but marked experimental: less common usage (Cox strata,
tie handling, time-varying covariates) is not yet validated. Joint
theta estimation in the spirit of MASS::glm.nb() is out of scope;
pass the desired theta to MASS::negative.binomial() directly.
The supported lambda rules are "cv_min", "cv_1se", and "fix".
Rank-deficient designs are handled in the spirit of stats::glm():
linearly dependent columns are dropped via a QR pivot, the underlying
glmnet fit sees only the independent subset, and the dropped columns
surface as NA in coef() / summary(). Novel factor levels in newdata at
predict time also follow stats::predict.glm() by default: an unseen
level raises an error. Production scoring pipelines can opt into
predict(fit, newdata, on_new_levels = "na"), which sets the affected
rows to NA (with a warning) instead. Heavier features (split /
selective inference) are tracked in TODO.md.
An object of class c("fbrglm", "regularized_glm") with
fields including family (the value passed to glmnet – a string or
a family object), family_name (a short display string), weights,
offset, alpha, lambda_rule, lambda_value, infer,
selection_frac, fit (the underlying glmnet object), cv_fit
(cv.glmnet, or NULL for lambda = "fix"), coefficients,
nonzero, terms, xlevels, contrasts, x_colnames, x_train,
nobs_info (n_total / n_dropped_missing / n_used), and
rank_info (rank / ncol / rank_deficient / pivot /
kept_cols / dropped_cols). When the design is rank-deficient,
linearly dependent columns are dropped before fitting (in the
spirit of stats::glm()); their entries in coefficients are
reported as NA to distinguish "not identifiable" from
"shrunk to zero by the penalty".
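To make the nobs_info and rank_info fields concrete, here is a minimal base R sketch of how such bookkeeping could be computed. It is an illustration under stated assumptions, not the package's code: the field names follow the list above, the toy data frame d is hypothetical, and stats::lm.fit() stands in for the penalized glmnet fit purely so the example runs without extra packages.

```r
# Toy data: x3 is an exact linear combination of x1 and x2,
# and one row has a missing predictor value.
set.seed(1)
d <- data.frame(x1 = rnorm(21), x2 = rnorm(21))
d$x3 <- d$x1 + d$x2
d$y  <- 1 + 2 * d$x1 - d$x2 + rnorm(21, sd = 0.1)
d$x2[5] <- NA

# Complete-case bookkeeping: count the rows removed by na.omit.
mf <- model.frame(y ~ x1 + x2 + x3, data = d, na.action = na.omit)
nobs_info <- list(
  n_total           = nrow(d),
  n_dropped_missing = nrow(d) - nrow(mf),
  n_used            = nrow(mf)
)

# Pivoted QR on the design matrix: columns found to be linearly
# dependent are moved to the end of the pivot vector.
X    <- model.matrix(attr(mf, "terms"), mf)
y    <- model.response(mf)
qr_x <- qr(X)
is_kept <- seq_len(ncol(X)) <= qr_x$rank
rank_info <- list(
  rank           = qr_x$rank,
  ncol           = ncol(X),
  rank_deficient = qr_x$rank < ncol(X),
  pivot          = qr_x$pivot,
  kept_cols      = colnames(X)[sort(qr_x$pivot[is_kept])],
  dropped_cols   = colnames(X)[qr_x$pivot[!is_kept]]
)

# Fit only on the independent columns (lm.fit is a stand-in for glmnet),
# then report the dropped columns as NA rather than 0.
coefs <- setNames(rep(NA_real_, ncol(X)), colnames(X))
fit   <- lm.fit(X[, rank_info$kept_cols, drop = FALSE], y)
coefs[rank_info$kept_cols] <- fit$coefficients
```

Starting from an all-NA coefficient vector and filling in only the kept columns is what keeps "not identifiable" (NA) visibly separate from a coefficient that a penalty has shrunk to exactly zero.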