scDECO-Copula

library(scDECO)

Quick Start

n <- 2500

x.use <- rnorm(n)
w.use <- runif(n,-1,1)
marginals.use <- c("ZINB", "ZIGA")

# simulate data
y.use <- scdeco.sim.cop(marginals=marginals.use, x=x.use,
                    eta1.true=c(-2, 0.8), eta2.true=c(-2, 0.8),
                    beta1.true=c(1, 0.5), beta2.true=c(1, 1),
                    alpha1.true=7, alpha2.true=3,
                    tau.true=c(-0.2, .3), w=w.use)

Parameters:

  • marginals: The two marginals. Options are NB, ZINB, GA, ZIGA, Beta, ZIBEta
  • x: The vector (or matrix) containing the covariate values to be regressed for mean and rho parameters.
  • eta1.true: The coefficients of the 1st marginal’s zero-inflation parameter.
  • eta2.true: The coefficients of the 2nd marginal’s zero-inflation parameter.
  • beta1.true: The coefficients of the 1st marginal’s mean parameter.
  • beta2.true: The coefficients of the 2nd marginal’s mean parameter.
  • alpha1.true: The coefficient of the 1st marginal’s second parameter.
  • alpha2.true: The coefficient of the 2nd marginal’s second parameter.
  • tau.true: The coefficients of the correlation parameter.
  • w: A vector (or matrix) containing the covariate values to be regressed for zero-inflation parameters.

This will simulate a 2-column matrix of NROW(x) rows of observations from the scdeco.cop model.

# fit the model
mcmc.out <- scdeco.cop(y=y.use, x=x.use, marginals=marginals.use, w=w.use,
                       n.mcmc=10, burn=0, thin=1) # n.mcmc=5000, burn=1000, thin=10)

Parameters:

  • y: 2-column matrix with the dependent variable observations.
  • n.mcmc: The number of MCMC iterations to run.
  • burn: The number of MCMC iterations to burn from the beginning of the chain.
  • thin: The number of MCMC iterations to thin.

This will return a matrix where the columns correspond to the different parameters of the model and the rows correspond to MCMC samples where the burn and thin has already been incorporated.

One can obtain estimates and confidence intervals for each parameter by looking at quantiles of these MCMC samples.

# extract estimates and confidence intervals
lowerupper <- t(apply(mcmc.out, 2, quantile, c(0.025, 0.5, 0.975)))
estmat <- cbind(lowerupper[,1],
                c(c(-2, 0.8), c(-2, 0.8), c(1, 0.5), c(1, 1), 7, 3, c(-0.2, .3)),
                lowerupper[,c(2,3)])
colnames(estmat) <- c("lower", "trueval", "estval", "upper")
estmat
#>               lower trueval      estval       upper
#> eta10  -0.281827649    -2.0 -0.19463733  0.00000000
#> eta11  -0.105697394     0.8  0.00000000  0.08029813
#> eta20  -0.649341301    -2.0 -0.42460064 -0.03277390
#> eta21  -0.084548165     0.8  0.06274363  0.09842503
#> beta10  0.000000000     1.0  0.05354789  0.23986102
#> beta11 -0.074742673     0.5 -0.07101422  0.03140429
#> beta20  0.000000000     1.0  0.02718200  0.04801565
#> beta21  0.000000000     1.0  0.17337972  0.27143767
#> alpha1  0.783465517     7.0  0.79051080  1.03489862
#> alpha2  0.653701008     3.0  0.78293823  0.97948762
#> tau0    0.017465825    -0.2  0.18776674  0.29458446
#> tau1    0.003947131     0.3  0.05727547  0.16681860

Model Details

Allow Y1, …, Yn to be n independent bivariate random vectors. For j = 1, 2 we assume the marginal CDF of Yij is given by Fj(⋅; θj, xi) where θj represents a set of parameters associated with Fj, and xi = (1, xi1, …, xip) a set of covariates for the ith cell. We construct the joint CDF of Yi via Gaussian copula with covariate-dependent parameters as follows. Let Zi = (Zi1, Zi2) be such that

$$\boldsymbol {Z}_i \sim N_2{\left(\boldsymbol {0}= \def\eqcellsep{&}\begin{bmatrix} 0 \\ 0 \end{bmatrix} , \boldsymbol {R}_i = \def\eqcellsep{&}\begin{bmatrix} 1 & \rho _i \\ \rho _i & 1 \end{bmatrix} \right)} $$

with

$$ \rho _i = \text{corr}(Z_{i1},Z_{i2}) = \frac{\exp ({\boldsymbol {x}}_i^{\prime }\boldsymbol{\tau }) - 1}{\exp ({\boldsymbol {x}}_i^{\prime }\boldsymbol{\tau }) + 1}$$ where τ = (τ0, τ1, …, τp).

In the marginal distributions supported by this paper (Negative Binomial, Gamma, and Beta), we model their mean parameter as a function of covariates using the log link function like so

μij = E[Yij] = exp {xiβ(j)}

where β(j) = (β0(j), β1(j), …, βp(j)).

and we allow the second parameter of those distributions, which we call α, to be free of covariates.

This formulation incorporates dynamic association between Yi1 and Yi2, that is, association that depends on covariates, through the correlation between Zi1 and Zi2. We denote the joint CDF of Zi by Φτ to reflect its dependence on parameter τ. For both discrete and continuous marginals, the general form of the joint CDF of Y is given by

FY(yi; θ1, θ2, τ, xi) = Φτ(Φ−1[F1(yi1; θ1, xi)],Φ−1[F2(yi2; θ2, xi)])

where Φ−1 represents the inverse CDF of N(0, 1).

Marginal Parameterizations

We use the following parameterization of the negative binomial for use in the marginals:

$$ f_{\text{NB}}(y_{ij};\mu_{ij}, \alpha_{j}) = \frac{\Gamma(y_{ij} + \alpha_{j})}{\Gamma(y_{ij}+1)\Gamma(\alpha_{j})}\left(\frac{\alpha_{j}}{\alpha_{j}+\mu_{ij}}\right)^{\alpha_{j}}\left(\frac{\mu_{ij}}{\alpha_{j}+\mu_{ij}}\right)^{y_{ij}} $$

This has mean μij and variance μij + μij2/αj.

For the gamma distribution, we use the following parameterization:

$$ f(y_{ij};\mu_{ij},\alpha_{j}) = \frac{\alpha_{j} ^{\mu_{ij}\alpha_{j} }}{\Gamma (\mu_{ij}\alpha_{j} )}\,y_i^{\mu_{ij}\alpha_{j} -1}e^{-\alpha_{j} y_{ij}} $$

This has mean μij and variance μij/αj.

For the beta distribution, we use the following parameterization:

$$ f(y_{ij};\mu_{ij},\alpha_{j}) = \frac{\Gamma(\alpha_j)}{\Gamma(\mu_{ij}\alpha_{j})\Gamma((1-\mu_{ij})\alpha_j)}y_{ij}^{\mu_{ij}\alpha_j-1}(1-y_{ij})^{(1-\mu_{ij})\alpha_j-1} $$

this has mean of μij and variance μij(1 − μij)/(αij + 1).

Incorporating Zero-inflation

In multi-omics data, dropout events occur when observations for certain molecules are not detected and thus recorded as zeros. For this reason, we incorporate zero-inflation into the above model by including two additional covariate-dependent parameters p1 and p2 which represent the probability of an observation from their respective marginal being zeroed-out by a dropout event.

Thus, for a Gamma or Beta marginal, the zero-inflated PDF is given by

fjzinf(yij; θj, xi) = (1 − pj)fj(yij; θj, xi)1(yij > 0) + pj1(yij = 0)

and for a NB marginal, the zero-inflated PDF is given by:

fjzinf(yij; θj, xi) = (1 − pj)fj(yij; θj, xi) + pj1(yij = 0)

We allow p1, p2 to be dependent on a different set of covariates than the x used in the previous section, because often zero-inflation is affected by different covariates than the marginal mean and/or correlation parameter is. We will call this new set of covariates wi = (1, wi1, …, wik) and it will be tied to p1, p2 in the following way:

$$ p_{ij} = \frac{1}{1+\exp\left\{\boldsymbol{w}_i^{\prime}\boldsymbol{\eta^{(j)}}\right\}} $$

where η(j) = (η0(j), η1(j), …, ηq(j)).

Parameter Estimation

Parameter estimation is achieved using an adaptive MCMC approach involving a Metropolis scheme, and using a one-margin-at-a-time approach. For more details please refer to the paper.

Citations

Zichen Ma, Shannon W. Davis, Yen-Yi Ho, Flexible Copula Model for Integrating Correlated Multi-Omics Data from Single-Cell Experiments, Biometrics, Volume 79, Issue 2, June 2023, Pages 1559–1572, https://doi.org/10.1111/biom.13701