Title: | Log-Linear Poisson Graphical Model with Hot-Deck Multiple Imputation |
---|---|
Description: | Infer log-linear Poisson Graphical Model with an auxiliary data set. Hot-deck multiple imputation method is used to improve the reliability of the inference with an auxiliary dataset. Standard log-linear Poisson graphical model can also be used for the inference and the Stability Approach for Regularization Selection (StARS) is implemented to drive the selection of the regularization parameter. The method is fully described in <doi:10.1093/bioinformatics/btx819>. |
Authors: | Alyssa Imbert [aut], Nathalie Vialaneix [aut, cre] |
Maintainer: | Nathalie Vialaneix <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.5 |
Built: | 2024-12-17 06:46:39 UTC |
Source: | CRAN |
chooseSigma
computes the average intra-donor pool variance for
different values of sigma. It helps choosing a sigma that makes a good
trade-off between homogeneity within the pool of donors and variety (large
enough number of donors in every pool).
chooseSigma(X, Y, sigma_list, seed = NULL)
chooseSigma(X, Y, sigma_list, seed = NULL)
X |
n x p numeric matrix containing RNA-seq expression with missing rows (numeric matrix or data frame) |
Y |
auxiliary dataset (n' x q numeric matrix or data frame) |
sigma_list |
a sequence of increasing positive values for sigma (numeric vector) |
seed |
single value, interpreted as an in integer, used to initialize the random number generation state |
The average intra-donor pool variance is described in (Imbert et al., 2018).
a data frame with the values of sigma and the corresponding intra-donor pool variances
Alyssa Imbert, [email protected]
Nathalie Vialaneix, [email protected]
Imbert, A., Valsesia, A., Le Gall, C., Armenise, C., Lefebvre, G. Gourraud, P.A., Viguerie, N. and Villa-Vialaneix, N. (2018) Multiple hot-deck imputation for network inference from RNA sequencing data. Bioinformatics. doi:10.1093/bioinformatics/btx819.
data(lung) data(thyroid) nobs <- nrow(lung) miss_ind <- sample(1:nobs, round(0.2 * nobs), replace = FALSE) lung[miss_ind, ] <- NA lung <- na.omit(lung) sigma_stats <- chooseSigma(lung, thyroid, 1:5) ## Not run: plot(sigma_stats, type = "b")
data(lung) data(thyroid) nobs <- nrow(lung) miss_ind <- sample(1:nobs, round(0.2 * nobs), replace = FALSE) lung[miss_ind, ] <- NA lung <- na.omit(lung) sigma_stats <- chooseSigma(lung, thyroid, 1:5) ## Not run: plot(sigma_stats, type = "b")
GLMnetToGraph
combines the m inferred networks, obtained from m
imputed datasets, into a single stable network or convert a matrix of
coefficients of a GLM model into a network (non zero coefficients are
converted to edges)
GLMnetToGraph(object, threshold = 0.9)
GLMnetToGraph(object, threshold = 0.9)
object |
an object of class |
threshold |
the percentage of times, among the m imputed networks, that
an edge has to be predicted to be in the final network. Used only for objects
of class |
an 'igraph' object. See igraph
Alyssa Imbert, [email protected]
Nathalie Vialaneix, [email protected]
Imbert, A., Valsesia, A., Le Gall, C., Armenise, C., Lefebvre, G. Gourraud, P.A., Viguerie, N. and Villa-Vialaneix, N. (2018) Multiple hot-deck imputation for network inference from RNA sequencing data. Bioinformatics. doi:10.1093/bioinformatics/btx819.
data(lung) data(thyroid) nobs <- nrow(lung) miss_ind <- sample(1:nobs, round(0.2 * nobs), replace = FALSE) lung[miss_ind, ] <- NA lung <- na.omit(lung) lambdas <- 4 * 10^(seq(0, -2, length = 10)) ## Not run: lung_hdmi <- imputedGLMnetwork(lung, thyroid, sigma = 2, lambdas = lambdas, m = 10, B = 5) lung_net <- GLMnetToGraph(lung_hdmi, 0.75) lung_net plot(lung_net) ## End(Not run)
data(lung) data(thyroid) nobs <- nrow(lung) miss_ind <- sample(1:nobs, round(0.2 * nobs), replace = FALSE) lung[miss_ind, ] <- NA lung <- na.omit(lung) lambdas <- 4 * 10^(seq(0, -2, length = 10)) ## Not run: lung_hdmi <- imputedGLMnetwork(lung, thyroid, sigma = 2, lambdas = lambdas, m = 10, B = 5) lung_net <- GLMnetToGraph(lung_hdmi, 0.75) lung_net plot(lung_net) ## End(Not run)
GLMnetwork
infers a network from RNA-seq expression with the
log-linear Poisson graphical model of (Allen and Liu, 2012).
GLMnetwork(counts, lambdas = NULL, normalize = TRUE)
GLMnetwork(counts, lambdas = NULL, normalize = TRUE)
counts |
a n x p matrix of RNA-seq expression (numeric matrix or data frame) |
lambdas |
a sequence of decreasing positive numbers to control the
regularization (numeric vector). Default to |
normalize |
logical value to normalize predictors in the log-linear
Poisson graphical model. If |
When input lambdas
are null the default sequence of
glmnet
for the first model (the one with the first
column of count
as the target) is used.
S3 object of class GLMnetwork
: a list consisting of
lambda |
regularization parameters used for LLGM path(vector) |
path |
a list having the same length than |
Alyssa Imbert, [email protected]
Nathalie Vialaneix, [email protected]
Allen, G. and Liu, Z. (2012) A log-linear model for inferring genetic networks from high-throughput sequencing data. In Proceedings of IEEE International Conference on Bioinformatics and Biomedecine (BIBM).
data(lung) lambdas <- 4 * 10^(seq(0, -2, length = 10)) ref_lung <- GLMnetwork(lung, lambdas = lambdas)
data(lung) lambdas <- 4 * 10^(seq(0, -2, length = 10)) ref_lung <- GLMnetwork(lung, lambdas = lambdas)
Methods for the result of GLMnetwork
(GLMpath
object)
## S3 method for class 'GLMpath' summary(object, ...) ## S3 method for class 'GLMpath' print(x, ...)
## S3 method for class 'GLMpath' summary(object, ...) ## S3 method for class 'GLMpath' print(x, ...)
object |
|
... |
not used |
x |
|
Alyssa Imbert, [email protected]
Nathalie Vialaneix, [email protected]
Methods for the result of imputeHD
(HDImputed
object)
## S3 method for class 'HDimputed' summary(object, ...) ## S3 method for class 'HDImputed' print(x, ...)
## S3 method for class 'HDimputed' summary(object, ...) ## S3 method for class 'HDImputed' print(x, ...)
object |
|
... |
not used |
x |
|
Alyssa Imbert, [email protected]
Nathalie Vialaneix, [email protected]
Methods for the result of imputedGLMnetwork
(HDpath
object)
## S3 method for class 'HDpath' summary(object, ...) ## S3 method for class 'HDpath' print(x, ...) ## S3 method for class 'HDpath' plot(x, ...)
## S3 method for class 'HDpath' summary(object, ...) ## S3 method for class 'HDpath' print(x, ...) ## S3 method for class 'HDpath' plot(x, ...)
object |
|
... |
not used |
x |
|
Alyssa Imbert, [email protected]
Nathalie Vialaneix, [email protected]
data(lung) data(thyroid) nobs <- nrow(lung) miss_ind <- sample(1:nobs, round(0.2 * nobs), replace = FALSE) lung[miss_ind, ] <- NA lung <- na.omit(lung) lambdas <- 4 * 10^(seq(0, -2, length = 10)) ## Not run: lung_hdmi <- imputedGLMnetwork(lung, thyroid, sigma = 2, lambdas = lambdas, m = 10, B = 5) plot(lung_hdmi) ## End(Not run)
data(lung) data(thyroid) nobs <- nrow(lung) miss_ind <- sample(1:nobs, round(0.2 * nobs), replace = FALSE) lung[miss_ind, ] <- NA lung <- na.omit(lung) lambdas <- 4 * 10^(seq(0, -2, length = 10)) ## Not run: lung_hdmi <- imputedGLMnetwork(lung, thyroid, sigma = 2, lambdas = lambdas, m = 10, B = 5) plot(lung_hdmi) ## End(Not run)
imputedGLMnetwork
performs a multiple hot-deck imputation and infers a
network for each imputed dataset with a log-linear Poisson graphical model
(LLGM).
imputedGLMnetwork(X, Y, sigma, m = 50, lambdas = NULL, B = 20)
imputedGLMnetwork(X, Y, sigma, m = 50, lambdas = NULL, B = 20)
X |
n x p numeric matrix containing RNA-seq expression with missing rows (numeric matrix or data frame) |
Y |
auxiliary dataset (n' x q numeric matrix or data frame) |
sigma |
affinity threshold for donor pool |
m |
number of replicates in multiple imputation (integer). Default to 50 |
lambdas |
a sequence of decreasing positive numbers to control the
regularization (numeric vector). Default to |
B |
number of iterations for stability selection. Default to 20 |
When input lambdas
are null the default sequence of
glmnet
for the first model (the one with the first
column of count
as the target) is used. A common default sequence is
generated for all imputed datasets using this method.
S3 object of class HDpath
: a list consisting of
path |
a list of |
efreq |
a numeric matrix of size p x p, which indicates the
number of times an edge has been predicted among the |
Alyssa Imbert, [email protected]
Nathalie Vialaneix, [email protected]
Imbert, A., Valsesia, A., Le Gall, C., Armenise, C., Lefebvre, G. Gourraud, P.A., Viguerie, N. and Villa-Vialaneix, N. (2018) Multiple hot-deck imputation for network inference from RNA sequencing data. Bioinformatics. doi:10.1093/bioinformatics/btx819.
data(lung) data(thyroid) nobs <- nrow(lung) miss_ind <- sample(1:nobs, round(0.2 * nobs), replace = FALSE) lung[miss_ind, ] <- NA lung <- na.omit(lung) lambdas <- 4 * 10^(seq(0, -2, length = 10)) ## Not run: lung_hdmi <- imputedGLMnetwork(lung, thyroid, sigma = 2, lambdas = lambdas, m = 10, B = 5) ## End(Not run)
data(lung) data(thyroid) nobs <- nrow(lung) miss_ind <- sample(1:nobs, round(0.2 * nobs), replace = FALSE) lung[miss_ind, ] <- NA lung <- na.omit(lung) lambdas <- 4 * 10^(seq(0, -2, length = 10)) ## Not run: lung_hdmi <- imputedGLMnetwork(lung, thyroid, sigma = 2, lambdas = lambdas, m = 10, B = 5) ## End(Not run)
imputeHD
performs multiple hot-deck imputation on an input data frame
with missing rows. Each missing row is imputed with a unique donor. This
method requires an auxiliary dataset to compute similaritities between
individuals and create the pool of donors.
imputeHD(X, Y, sigma, m = 50, seed = NULL)
imputeHD(X, Y, sigma, m = 50, seed = NULL)
X |
n x p numeric matrix containing RNA-seq expression with missing rows (numeric matrix or data frame) |
Y |
auxiliary dataset (n' x q numeric matrix or data frame) |
sigma |
threshold for hot-deck imputation (numeric, positive) |
m |
number of replicates in multiple imputation (integer). Default to 50 |
seed |
single value, interpreted as an in integer, used to initialize
the random number generation state. Default to |
Missing values are identified by matching rownames in X
and
Y
. If rownames are not provided the missing rows in X
are
supposed to correspond to the last rows of Y
.
S3 object of class HDImputed
: a list consisting of
donors |
a list. Each element of this list contains the donor pool for every missing observations |
draws |
a data frame which indicates which donor was chosen for each missing samples |
data |
a list of |
Alyssa Imbert, [email protected]
Nathalie Vialaneix, [email protected]
Imbert, A., Valsesia, A., Le Gall, C., Armenise, C., Lefebvre, G. Gourraud, P.A., Viguerie, N. and Villa-Vialaneix, N. (2018) Multiple hot-deck imputation for network inference from RNA sequencing data. Bioinformatics. doi:10.1093/bioinformatics/btx819.
chooseSigma
, imputedGLMnetwork
data(lung) data(thyroid) nobs <- nrow(lung) miss_ind <- sample(1:nobs, round(0.2 * nobs), replace = FALSE) lung[miss_ind, ] <- NA lung <- na.omit(lung) imputed_lung <- imputeHD(lung, thyroid, sigma = 2)
data(lung) data(thyroid) nobs <- nrow(lung) miss_ind <- sample(1:nobs, round(0.2 * nobs), replace = FALSE) lung[miss_ind, ] <- NA lung <- na.omit(lung) imputed_lung <- imputeHD(lung, thyroid, sigma = 2)
This data set is a small subset of the full data set from GTEx. It contains RNA-seq expressions measured from lung tissue. The RNA-seq expressions have been normalized with the TMM method.
a data frame with 221 rows and 100 variables (genes). Row names are identifiers for individuals.
Alyssa Imbert <[email protected]>
The raw data were download from
https://gtexportal.org/home/index.html. The TMM normalization of
RNA-seq expression was performed with the R package edgeR
.
Find the location of the RNAseqNet User's Guide and optionnaly opens it
RNAseqNetUsersGuide(html = TRUE, view = html)
RNAseqNetUsersGuide(html = TRUE, view = html)
html |
logical. Should the document returned by the function be the
compiled PDF or the Rmd source. Default to |
view |
logical. Should the document be opened using the default HTML
viewer? Default to |
The function vignette("RNAseqNet")
will find the short
RNAseqNet vignette that describes how to obtain the RNAseqNet User's Guide.
The User's Guide is not itself a true vignette because it is not
automatically generated during the package build process. However, the
location of the Rmarkdown source is returned by the function if
html = FALSE
.
If the operating system is not Windows, then the HTML viewer used is that
given by Sys.getenv("R_BROWSER")
. The HTML viewer can be changed using
Sys.setenv(R_BROWSER = )
.
Character string giving the file location. If html = TRUE
and
view = TRUE
, the HTML document reader is started and the User's Guide
is opened in it.
Alyssa Imbert, [email protected]
Nathalie Vialaneix, [email protected]
RNAseqNetUsersGuide(view = FALSE) RNAseqNetUsersGuide(html = FALSE) ## Not run: RNAseqNetUsersGuide()
RNAseqNetUsersGuide(view = FALSE) RNAseqNetUsersGuide(html = FALSE) ## Not run: RNAseqNetUsersGuide()
stabilitySelection
implements the regularization parameter selection
of (Liu et al., 2010) called 'Stability Approach to Regularization Selection'
(StARS).
stabilitySelection(counts, lambdas = NULL, B = 20)
stabilitySelection(counts, lambdas = NULL, B = 20)
counts |
a n x p matrix of RNA-seq expression (numeric matrix or data frame) |
lambdas |
a sequence of decreasing positive numbers to control the
regularization (numeric vector). Default to |
B |
number of iterations for stability selection. Default to 20 |
When input lambdas
are null the default sequence of
glmnet
(see GLMnetwork
for details).
S3 object of class stabilitySelection
: a list consisting of
lambdas |
numeric regularization parameters used for regularization path |
B |
number of iterations for stability selection |
best |
index of the regularization parameter selected by StARS
in |
variabilities |
numeric vector having same length than lambdas and providing the variability value as defined by StARS along the path |
Alyssa Imbert, [email protected] Nathalie Vialaneix, [email protected]
Liu, H., Roeber, K. and Wasserman, L. (2010) Stability approach to regularization selection (StARS) for high dimensional graphical models. In Proceedings of Neural Information Processing Systems (NIPS 2010), 23, 1432-1440, Vancouver, Canada.
data(lung) lambdas <- 4 * 10^(seq(0, -2, length = 5)) stability_lung <- stabilitySelection(lung, lambdas = lambdas, B = 4) ## Not run: plot(stability_lung)
data(lung) lambdas <- 4 * 10^(seq(0, -2, length = 5)) stability_lung <- stabilitySelection(lung, lambdas = lambdas, B = 4) ## Not run: plot(stability_lung)
Methods for the result of stabilitySelection
(stars
object)
## S3 method for class 'stars' summary(object, ...) ## S3 method for class 'stars' print(x, ...) ## S3 method for class 'stars' plot(x, ...)
## S3 method for class 'stars' summary(object, ...) ## S3 method for class 'stars' print(x, ...) ## S3 method for class 'stars' plot(x, ...)
object |
|
... |
not used |
x |
|
Alyssa Imbert, [email protected]
Nathalie Vialaneix, [email protected]
This data set is a small subset of the full data set from GTEx. It contains RNA-seq expressions measured from thyroid tissue. The RNA-seq expressions have been normalized with the TMM method.
a data frame with 221 rows and 50 variables (genes). Rown ames are identifiers for individuals.
Alyssa Imbert <[email protected]>
The raw data were downloaded from
https://gtexportal.org/home/index.html. The TMM normalisation of
RNA-seq expression was performed with the R package edgeR
.
varIntra
computes the average intra-donor pool variance.
varIntra(X, Y, donors)
varIntra(X, Y, donors)
X |
n x p numeric matrix containing RNA-seq expression with missing rows (numeric matrix or data frame) |
Y |
auxiliary dataset (n' x q numeric matrix or data frame) |
donors |
donor pool (a list, as given |
varIntra
returns a numeric value which is the average
intra-donor pool variance, as described in (Imbert et al., 2018).
Alyssa Imbert, [email protected]
Nathalie Vialaneix, [email protected]
Imbert, A., Valsesia, A., Le Gall, C., Armenise, C., Lefebvre, G. Gourraud, P.A., Viguerie, N. and Villa-Vialaneix, N. (2018) Multiple hot-deck imputation for network inference from RNA sequencing data. Bioinformatics. doi:10.1093/bioinformatics/btx819.