Package: ebTobit 1.0.2
ebTobit: Empirical Bayesian Tobit Matrix Estimation
Estimation tools for multidimensional Gaussian means using empirical Bayesian g-modeling. Methods are able to handle fully observed data as well as left-, right-, and interval-censored observations (Tobit likelihood); descriptions of these methods can be found in Barbehenn and Zhao (2023) <doi:10.48550/arXiv.2306.07239>. Additional, lower-level functionality based on Kiefer and Wolfowitz (1956) <doi:10.1214/aoms/1177728066> and Jiang and Zhang (2009) <doi:10.1214/08-AOS638> is provided that can be used to accelerate many empirical Bayes and nonparametric maximum likelihood problems.
Authors: Alton Barbehenn, Sihai Dave Zhao
# Install 'ebTobit' in R:
install.packages('ebTobit', repos = 'https://cloud.r-project.org')
Bug tracker: https://github.com/barbehenna/ebtobit/issues (0 issues)
- BileAcid - Bile Acid Data
Last updated 11 months ago from: bb5929813f. Checks: 3 OK. Indexed: no.
Target | Result | Latest binary |
---|---|---|
Doc / Vignettes | OK | Mar 27 2025 |
R-4.5-linux-x86_64 | OK | Mar 27 2025 |
R-4.4-linux-x86_64 | OK | Mar 27 2025 |
Exports: ConvexDual, ConvexPrimal, ebTobit, EM, is.ebTobit, lik_GaussianPIC, likMat, new_ebTobit, posterior_L1mediod.ebTobit, posterior_mean.ebTobit, posterior_mode.ebTobit, tobit_sd, tobit_sd_mle
Dependencies: Rcpp, RcppArmadillo, RcppParallel
Citation
To cite package ‘ebTobit’ in publications use:
Barbehenn A, Zhao S (2024). ebTobit: Empirical Bayesian Tobit Matrix Estimation. R package version 1.0.2, https://CRAN.R-project.org/package=ebTobit.
Corresponding BibTeX entry:
@Manual{,
  title = {ebTobit: Empirical Bayesian Tobit Matrix Estimation},
  author = {Alton Barbehenn and Sihai Dave Zhao},
  year = {2024},
  note = {R package version 1.0.2},
  url = {https://CRAN.R-project.org/package=ebTobit},
}
Readme and manuals
Empirical Bayesian Estimation of Censored Gaussian (Tobit) Matrices
What is it?
An R package for denoising censored Gaussian means with empirical Bayes $g$-modeling. The general model is as follows:
$$ \theta_i \sim_{iid} g \quad (\subseteq \mathbb{R}^p) $$
$$ X_{ij} \mid \theta_{ij} \sim_{indep.} N(\theta_{ij}, \sigma^2) $$
$$ L_{ij} \leq X_{ij} \leq R_{ij} $$
The data is represented with matrices:
$$ \theta = \begin{bmatrix} \theta_{11} & \dots & \theta_{1p} \\ \theta_{21} & \dots & \theta_{2p} \\ \vdots & \ddots & \vdots \\ \theta_{n1} & \dots & \theta_{np} \end{bmatrix} \qquad X = \begin{bmatrix} X_{11} & \dots & X_{1p} \\ X_{21} & \dots & X_{2p} \\ \vdots & \ddots & \vdots \\ X_{n1} & \dots & X_{np} \end{bmatrix} $$
$$ L = \begin{bmatrix} L_{11} & \dots & L_{1p} \\ L_{21} & \dots & L_{2p} \\ \vdots & \ddots & \vdots \\ L_{n1} & \dots & L_{np} \end{bmatrix} \qquad R = \begin{bmatrix} R_{11} & \dots & R_{1p} \\ R_{21} & \dots & R_{2p} \\ \vdots & \ddots & \vdots \\ R_{n1} & \dots & R_{np} \end{bmatrix} $$
The bounds $L_{ij}$ and $R_{ij}$ are assumed to be known. When $L_{ij} = R_{ij}$ there is a direct (noisy) measurement of $\theta_{ij}$; when $L_{ij} < R_{ij}$ there is a censored measurement of $\theta_{ij}$. This structure is commonly referred to as partially interval-censored data, and it allows for any combination of observed measurements and left-, right-, and interval-censored measurements.
We use a Tobit likelihood for each measurement:
$$ P(L, R \mid \theta) = \begin{cases} \phi_{\sigma} ( L - \theta ) & L = R \\ \Phi_{\sigma} ( R - \theta ) - \Phi_{\sigma} ( L - \theta ) & L < R \end{cases} $$
where the Gaussian density is used when there is a direct Gaussian measurement (i.e. $L = X = R$) and a Gaussian probability is used when there is a censored Gaussian measurement (i.e. $L < R$). Here $\phi_{\sigma}$ and $\Phi_{\sigma}$ denote the density and distribution function of $N(0, \sigma^2)$.
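The two cases of the likelihood can be sketched in base R. This is a minimal illustration only: `tobit_lik` is a hypothetical helper written for this note, not a function from the package, and it assumes `L`, `R`, and `theta` are scalars or vectors of equal length.

```r
# Hypothetical helper (not part of ebTobit): Tobit likelihood of a
# measurement with known bounds L <= R, mean theta, and noise sd sigma.
tobit_lik <- function(L, R, theta, sigma = 1) {
  stopifnot(all(L <= R), sigma > 0)
  ifelse(
    L == R,
    # direct measurement: Gaussian density at the observed value
    stats::dnorm(L, mean = theta, sd = sigma),
    # censored measurement: Gaussian probability of the interval [L, R]
    stats::pnorm(R, mean = theta, sd = sigma) -
      stats::pnorm(L, mean = theta, sd = sigma)
  )
}

tobit_lik(L = 1.3, R = 1.3, theta = 1)   # direct measurement
tobit_lik(L = 0, R = 1, theta = 1)       # interval-censored: P(0 <= X <= 1)
tobit_lik(L = -Inf, R = 1, theta = 1)    # left-censored: P(X <= 1)
tobit_lik(L = 2, R = Inf, theta = 1)     # right-censored: P(X >= 2)
```

Infinite bounds cover the one-sided cases, since `pnorm` returns 0 at `-Inf` and 1 at `Inf`.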
What does it do?
This package provides an object ebTobit (Empirical Bayes model with Tobit likelihood) that estimates the prior $g$ over a user-specified grid gr and then computes the posterior mean or $\ell_1$ mediod as estimates for $\theta$. In one dimension, the $\ell_1$ mediod is the median. By default gr is set using the exemplar method, so the grid is the maximum likelihood estimate for each $\theta_{ij}$. When the censoring interval is finite, the maximum likelihood estimate for each $\theta_{ij}$ is $0.5 ( L_{ij} + R_{ij} )$.
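The midpoint claim for a finite censoring interval can be checked numerically. For $L < R$, the derivative of $\Phi_{\sigma}(R - \theta) - \Phi_{\sigma}(L - \theta)$ with respect to $\theta$ is $\phi_{\sigma}(L - \theta) - \phi_{\sigma}(R - \theta)$, which vanishes only when $\theta$ is equidistant from both bounds. The sketch below uses base R only; the bounds and $\sigma$ are illustrative values, not package defaults.

```r
# Interval-censored observation with finite bounds (illustrative values).
L <- 0; R <- 1; sigma <- 1

# Tobit log-likelihood of theta for a censored measurement (L < R).
loglik <- function(theta) {
  log(stats::pnorm(R, mean = theta, sd = sigma) -
        stats::pnorm(L, mean = theta, sd = sigma))
}

# Maximize numerically over a window around the interval.
fit <- stats::optimize(loglik, interval = c(L - 5, R + 5), maximum = TRUE)

fit$maximum     # numerical maximizer, close to the midpoint
0.5 * (L + R)   # midpoint of the censoring interval
```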
Suppose $p = 1$ and there is no censoring. The basic usage is then:
library(ebTobit)
# create noisy measurements
n <- 100
t <- sample(c(0, 5), size = n, replace = TRUE, prob = c(0.8, 0.2))
x <- t + stats::rnorm(n)
# fit g-model with default prior grid
res1 <- ebTobit(x)
# measure performance of estimated posterior mean
mean((t - fitted(res1))^2)
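To make the $g$-modeling step concrete, here is a rough base R sketch of what fitting a prior over a grid and computing posterior means involves in the uncensored, $p = 1$ case. It is a plain EM iteration written for illustration, with $\sigma = 1$ and a fixed iteration count as assumptions. It is not the package's implementation and does not call any ebTobit function.

```r
set.seed(1)

# Simulated data, mirroring the example above (theta is 0 or 5, unit noise).
n <- 100
theta <- sample(c(0, 5), size = n, replace = TRUE, prob = c(0.8, 0.2))
x <- theta + stats::rnorm(n)

# Grid of candidate means: with no censoring, the observations themselves.
gr <- x

# n x m likelihood matrix: A[i, k] = density of x[i] under N(gr[k], 1).
A <- outer(x, gr, function(xi, gk) stats::dnorm(xi, mean = gk, sd = 1))

# EM updates for the mixing weights w over the grid (fixed iteration count).
w <- rep(1 / length(gr), length(gr))
for (iter in seq_len(500)) {
  marg <- as.vector(A %*% w)                      # marginal density of each x[i]
  w <- w * as.vector(crossprod(A, 1 / marg)) / n  # average posterior weight
}

# Posterior mean of each theta_i: grid points averaged by posterior weight.
marg <- as.vector(A %*% w)
post_mean <- as.vector(A %*% (w * gr)) / marg

mean((theta - post_mean)^2)   # squared error of the denoised estimates
mean((theta - x)^2)           # squared error of the raw observations
```

Each posterior mean is a convex combination of grid points, so the estimates always stay within the range of the grid.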
Next we can look at a more complicated example with $p = 10$:
library(ebTobit)
# create noisy measurements (low rank structure)
n <- 1000; p <- 10
t <- matrix(stats::rgamma(n*p, shape = 5, rate = 1), n, p)
x <- t + matrix(stats::rnorm(n*p), n, p)
# assume we can't accurately measure x < 1 but we know theta > 0
L <- ifelse(x < 1, 0, x)
R <- ifelse(x < 1, 1, x)
# fit g-model with default prior grid
res2 <- ebTobit(x)
res3 <- ebTobit(L, R)
# observe that the censoring affects the fitted range
range(fitted(res2))
range(fitted(res3))
# fit censored data with a different grid (large and random, not the MLE grid)
res4 <- ebTobit(
L = L,
R = R,
gr = sapply(1:p, function(j) stats::runif(1e+4, min = min(L[,j]), max = max(R[,j]))),
algorithm = "EM"
)
# compute posterior mean and L1mediod given new data
# we can also predict based on partially interval-censored observations
y <- matrix(stats::rexp(5*p, rate = 0.5), 5, p)
predict(res4, y) # posterior mean
predict(res4, y, method = "L1mediod") # posterior L1-mediod
How do I install it?
This package is available on CRAN. It can also be installed directly from GitHub:
remotes::install_github("barbehenna/ebTobit")
Data
This R package also includes a real bile acid data.frame taken directly from Lei et al. (2018) (https://doi.org/10.1096/fj.201700055R) via https://github.com/WandeRum/GSimp (https://doi.org/10.1371/journal.pcbi.1005973). The bile acid data contains measurements of 34 bile acids for 198 patients; no missing values are present in the data. In our modeling, we assume the bile acid values are independent log-normal measurements.
data(BileAcid, package = "ebTobit") # load the bile acid data
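Under a log-normal assumption, the Gaussian model applies on the log scale, so censoring bounds are built from logged values. The sketch below shows one way such bounds might be constructed. The simulated concentrations and the detection limit `lod` are hypothetical and are not taken from the BileAcid data.

```r
set.seed(1)

# Simulated positive measurements standing in for one bile acid column
# (hypothetical values, not the BileAcid data).
conc <- stats::rlnorm(8, meanlog = 0, sdlog = 1)

# Hypothetical lower limit of detection on the original scale.
lod <- 0.5

# On the log scale the measurements are Gaussian. Values below the limit
# are only known to lie in (-Inf, log(lod)], i.e. they are left-censored.
below <- conc < lod
L <- ifelse(below, -Inf, log(conc))
R <- ifelse(below, log(lod), log(conc))

cbind(conc, L, R)
```

Rows with `L == R` are direct measurements and rows with `L < R` are censored, matching the convention used for the bounds in the model description.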
Who wrote it?
Alton Barbehenn and Sihai Dave Zhao
What license?
GPL (>= 3)
Help Manual
Help page | Topics |
---|---|
Bile Acid Data | BileAcid |
Convex Optimization of the Kiefer-Wolfowitz NPMLE | ConvexDual |
Convex Optimization of the Kiefer-Wolfowitz NPMLE | ConvexPrimal |
Empirical Bayes Matrix Estimation under a Tobit Likelihood | ebTobit |
Nonparametric Maximum Likelihood via Expectation Maximization | EM |
Fitted Estimates of an ebTobit object | fitted.ebTobit |
Validate ebTobit Object | is.ebTobit |
Helper Function - generate likelihood for pair (L,R) and mean gr | lik_GaussianPIC |
Helper Function - generate likelihood matrix | likMat |
Marginal Log-likelihood of an ebTobit object | logLik.ebTobit |
Create a new ebTobit object | new_ebTobit |
Compute the Posterior L1 Mediod of an ebTobit object | posterior_L1mediod.ebTobit |
Compute Posterior Mean of an ebTobit object | posterior_mean.ebTobit |
Compute Posterior Mode of an ebTobit object | posterior_mode.ebTobit |
Fitted Estimates of an ebTobit object | predict.ebTobit |
Fit Tobit Standard Deviation via Maximum Likelihood | tobit_sd |
Maximum Likelihood Estimator for a Single Standard Deviation Parameter | tobit_sd_mle |