Title: | Dependent Gaussian Processes for Longitudinal Correlated Factors |
---|---|
Description: | Functionalities for analyzing high-dimensional and longitudinal biomarker data to facilitate precision medicine, using a joint model of Bayesian sparse factor analysis and dependent Gaussian processes. This paper illustrates the method in detail: J Cai, RJB Goudie, C Starr, BDM Tom (2023) <doi:10.48550/arXiv.2307.02781>. |
Authors: | Jiachen Cai [aut, cre] |
Maintainer: | Jiachen Cai <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.0 |
Built: | 2024-11-25 06:55:52 UTC |
Source: | CRAN |
This function visualizes estimates of the factor loadings as a heatmap.
factor_loading_heatmap(factor_loading_matrix, heatmap_title)
factor_loading_matrix |
A matrix of dimension (p, k), which stores results for factor loadings. |
heatmap_title |
A character. Title for the heatmap. |
A heatmap presenting posterior median estimates of factor loadings.
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
This function is used to visualize results of factor score trajectories.
factor_score_trajectory(factor_score_matrix, factor_index, person_index, trajectory_title, cex_main = 1)
factor_score_matrix |
A matrix of dimension (q, k, n), used to store results for factor scores. |
factor_index |
A numeric scalar. Index of the factor of interest. |
person_index |
A numeric scalar. Index of the person of interest. |
trajectory_title |
A character. Title for the factor trajectory plot. |
cex_main |
A numeric scalar. Text size of the title. |
Trajectory of the designated person-factor.
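Because factor_score_matrix is documented as a (q, k, n) array, the trajectory for one person and one factor is a length-q slice of it. A minimal base R sketch of that indexing, using a hypothetical simulated array and a hypothetical helper extract_trajectory() (not part of DGP4LCF); the plot is only a rough stand-in for what factor_score_trajectory draws.

```r
# Hypothetical array standing in for factor_score_matrix: q times x k factors x n people
set.seed(42)
q <- 8; k <- 3; n <- 4
factor_score_matrix <- array(rnorm(q * k * n), dim = c(q, k, n))

# Hypothetical helper: one factor's trajectory for one person is a length-q vector
extract_trajectory <- function(scores, factor_index, person_index) {
  scores[, factor_index, person_index]
}

traj <- extract_trajectory(factor_score_matrix, factor_index = 2, person_index = 1)

# Rough visual stand-in, drawn only in interactive sessions
if (interactive()) {
  plot(seq_len(q), traj, type = "b", xlab = "Time index", ylab = "Factor score",
       main = "Person 1, factor 2", cex.main = 1)
}
```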
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
Generating posterior samples for parameters (other than DGP parameters) in the model and predicted gene expression for one chain.
gibbs_after_mcem_algorithm(chain_index, mc_num, burnin, thin_step, pathname, pred_indicator = FALSE, pred_time_index = NULL, x, mcem_parameter_setup_result, mcem_algorithm_result, gibbs_after_mcem_diff_initials_result)
chain_index |
A numeric scalar. Index of the chain. |
mc_num |
A numeric scalar. Number of iterations in the Gibbs sampler. |
burnin |
A numeric scalar. Number of iterations to be discarded as 'burn-in'. |
thin_step |
A numeric scalar. Only every 'thin_step'-th iteration is saved in the specified directory, to reduce the storage space needed. Note that this number can differ from the one used in the function 'mcem_algorithm'. |
pathname |
A character. The directory where the saved Gibbs samples are stored. |
pred_indicator |
A logical value. pred_indicator = TRUE denotes the need to predict gene expression at new time points. The default value is FALSE. |
pred_time_index |
Only needed if pred_indicator = TRUE. Index of the new time points in the full time vector. |
x |
A list of n elements. Each element is a matrix of dimension (p, q_i), storing the gene expression observed at q_i time points for the ith subject. |
mcem_parameter_setup_result |
A list of objects returned from the function 'mcem_parameter_setup'. |
mcem_algorithm_result |
A list of objects returned from the function 'mcem_algorithm'. |
gibbs_after_mcem_diff_initials_result |
A list of objects returned from the function 'gibbs_after_mcem_diff_initials'. |
This function corresponds to Algorithm 2, Step 1 in the main manuscript; readers can consult the paper for further explanation.
Posterior samples for parameters (other than DGP parameters) in the model and predicted gene expression for one chain.
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")
Combining the posterior samples for parameters in the model and predicted gene expressions from all chains.
gibbs_after_mcem_combine_chains(tot_chain, gibbs_after_mcem_algorithm_result)
tot_chain |
A numeric scalar. Total number of chains. |
gibbs_after_mcem_algorithm_result |
A list of objects storing model constants. Should be the same as that input to the function 'gibbs_after_mcem_load_chains'. |
All saved posterior samples for parameters in the model and predicted gene expressions.
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
Generating different initials for multiple chains.
gibbs_after_mcem_diff_initials(ind_x = TRUE, tot_chain = 5, mcem_parameter_setup_result, mcem_algorithm_result)
ind_x |
A logical value. ind_x = TRUE uses the model including the intercept term for the subject-gene mean in the after-MCEM Gibbs sampler; otherwise the model without the intercept term is used. |
tot_chain |
A numeric scalar. Number of parallel chains. |
mcem_parameter_setup_result |
A list of objects returned from the function 'mcem_parameter_setup'. |
mcem_algorithm_result |
A list of objects returned from the function 'mcem_algorithm'. |
Different initials for multiple chains.
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")
Loading the saved posterior samples for parameters in the model and predicted gene expressions.
gibbs_after_mcem_load_chains(chain_index, gibbs_after_mcem_algorithm_result)
chain_index |
A numeric scalar. Index of the chain. |
gibbs_after_mcem_algorithm_result |
A list of objects storing model constants. |
All saved posterior samples for parameters in the model and predicted gene expressions.
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
This function returns the maximum likelihood estimates (MLE) of the DGP parameters.
mcem_algorithm(ind_x, ig_parameter = 10^-2, increasing_rate = 0.5, prob_conf_interval = 0.9, iter_count_num = 5, x, mcem_parameter_setup_result, ipt_x = FALSE, missing_list = NULL, missing_num = NULL)
ind_x |
A logical value. ind_x = TRUE uses the model including the intercept term for the subject-gene mean in the within-MCEM Gibbs sampler; otherwise the model without the intercept term is used. |
ig_parameter |
A numeric scalar. Hyper-parameters for the prior Inverse-Gamma distribution. |
increasing_rate |
A numeric scalar. Rate at which the Monte Carlo sample size is increased. |
prob_conf_interval |
A numeric scalar. The probability that the true change in the Q-function is larger than the lower bound. |
iter_count_num |
A numeric scalar. Maximum number of times the sample size can be increased; exceeding this number ends the algorithm. |
x |
A list of n elements. Each element is a matrix of dimension (p, q_i), storing the gene expression observed at q_i time points for the ith subject. |
mcem_parameter_setup_result |
A list of objects returned from the function 'mcem_parameter_setup'. |
ipt_x |
A logical value. ipt_x = TRUE denotes the need to impute NAs in the gene expression data. The default value is ipt_x = FALSE. |
missing_list |
A list of n elements. Each element is a matrix of dimension (missing_num, 2): each row corresponds to the position of one NA that needs imputation; first and second columns denote the row and column indexes, respectively, of the NA in the corresponding person's matrix of gene expression. |
missing_num |
A vector of n elements. Each element is a single person's number of NAs that need imputation. |
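The layout described for missing_list and missing_num can be derived directly from x. A minimal sketch, assuming each subject's NAs are listed as (row, column) pairs of that subject's own (p, q_i) matrix; the toy data are hypothetical, and how the package expects subjects without any NA to be encoded is not confirmed here.

```r
# Hypothetical toy input: n = 2 subjects, p = 3 genes, some NAs in subject 1
x <- list(
  matrix(c(1.2, NA, 0.4, 2.1, 1.7, NA), nrow = 3),
  matrix(c(0.3, 0.8, 1.1, 0.9, 1.5, 0.2, 0.6, 1.0, 0.7), nrow = 3)
)

# One row per NA: column 1 = gene (row) index, column 2 = time (column) index
missing_list <- lapply(x, function(m) which(is.na(m), arr.ind = TRUE))
missing_num <- vapply(missing_list, nrow, integer(1))

# Request imputation only when at least one NA is present
ipt_x <- any(missing_num > 0)
```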
The MLE of DGP parameters.
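The descriptions of increasing_rate, prob_conf_interval and iter_count_num suggest an ascent-based MCEM scheme, in which the Monte Carlo sample size grows whenever a lower confidence bound on the change in the Q-function is not positive. The sketch below illustrates that rule under this assumption only; the helper mcem_size_update() and its normal-approximation bound are hypothetical and not taken from the package.

```r
# Hypothetical sketch of an ascent-based MCEM sample-size rule (not DGP4LCF's code).
# delta: Monte Carlo estimates of the change in the Q-function; m: current sample size
mcem_size_update <- function(delta, m, increasing_rate = 0.5, prob_conf_interval = 0.9) {
  z <- qnorm(prob_conf_interval)
  # Normal-approximation lower bound: P(true change > lower) is about prob_conf_interval
  lower <- mean(delta) - z * sd(delta) / sqrt(length(delta))
  if (lower > 0) {
    list(accept = TRUE, m = m)  # confident the update increases Q: accept it
  } else {
    list(accept = FALSE, m = ceiling(m * (1 + increasing_rate)))  # enlarge the sample
  }
}

clear_gain <- mcem_size_update(delta = rep(c(0.9, 1.1), 5), m = 10)
noisy_gain <- mcem_size_update(delta = rep(c(-1, 1), 5), m = 10)
```

In such a scheme, iter_count_num would cap how many times the enlargement branch can fire before the algorithm stops.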
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")
Visualizing cross-correlations among factors.
mcem_cov_plot(k, q, cov_input, title)
k |
A numeric scalar. Number of latent factors. |
q |
A numeric scalar. Number of time points in the covariance matrix of factors. |
cov_input |
A matrix of dimension (kq, kq). The covariance matrix of the vector obtained from vectorizing the matrix of latent factor scores. |
title |
A character. Title for the plot. |
Visualization of cross-correlations among factors.
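A (kq, kq) covariance matrix such as cov_input can be read as a k x k grid of q x q blocks, where the off-diagonal blocks hold the cross-correlations between factors over time. A base R sketch, assuming factor-major ordering (factor 1 at all q times, then factor 2, and so on); the helper cross_cor_block() and the toy covariance are hypothetical.

```r
# Hypothetical helper (not part of DGP4LCF): q x q cross-correlation block between
# factors i and j, assuming the kq vector is ordered factor by factor
cross_cor_block <- function(cov_input, q, i, j) {
  cor_all <- cov2cor(cov_input)        # stats::cov2cor rescales to unit diagonal
  rows <- (i - 1) * q + seq_len(q)
  cols <- (j - 1) * q + seq_len(q)
  cor_all[rows, cols]
}

# Toy covariance: 2 factors with correlation 0.5, each with AR(1)-type time correlation
q <- 3
factor_cor <- matrix(c(1, 0.5, 0.5, 1), 2)
time_cor <- 0.8^abs(outer(1:q, 1:q, "-"))
cov_input <- kronecker(factor_cor, time_cor)

block_12 <- cross_cor_block(cov_input, q, i = 1, j = 2)
```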
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")
This function creates R objects that store the parameters in the required format and assigns their initial values, so that they are ready for use in the MCEM algorithm.
mcem_parameter_setup(p, k, n, q, ind_num = 10, burn_in_prop = 0.2, thin_step = 5, prior_sparsity = 0.1, em_num = 50, obs_time_num, obs_time_index, a_person, col_person_index, y_init, a_init, z_init, phi_init, a_full, train_index, x, model_dgp = TRUE)
p |
A numeric scalar. Number of genes. |
k |
A numeric scalar. Number of latent factors. |
n |
A numeric scalar. Number of subjects. |
q |
A numeric scalar. Complete number of time points in the training data. |
ind_num |
A numeric scalar. Starting size of approximately independent samples for MCEM. |
burn_in_prop |
A numeric scalar. Proportion of burn-in, which is used to calculate the number of Monte Carlo samples needed in the Gibbs sampler. Must be the same as that in the function 'mcem_algorithm_irregular_time'. |
thin_step |
A numeric scalar. Thinning step, which is used to calculate the number of Monte Carlo samples needed in the Gibbs sampler. Must be the same as that in the function 'mcem_algorithm_irregular_time'. |
prior_sparsity |
A numeric scalar. Prior expected proportion of genes involved within each pathway. |
em_num |
A numeric scalar. Maximum iterations of the expectation maximization (EM) algorithm allowed. |
obs_time_num |
An n-dimensional vector. Each element is one person's number of observed time points in the training data. |
obs_time_index |
A list of n elements. One element is a vector of observed time indexes for one person in the training data, sorted from early to late. |
a_person |
A list of n elements. One element is a vector of observed time for one subject in the training data, sorted from early to late. |
col_person_index |
A list of n elements. One element is a vector of column indexes for one subject in y_init. |
y_init |
A matrix of dimension (k, sum(obs_time_num)). Initial values of the latent factor score. Can be obtained using BFRM software. |
a_init |
A matrix of dimension (p, k). Initial values of the regression coefficients of factor loadings. Can be obtained using BFRM software. |
z_init |
A matrix of dimension (p, k). Initial values of the binary variables of factor loadings. Can be obtained using BFRM software. |
phi_init |
A p-dimensional column vector. Initial values of the residual variances when modeling gene expressions, corresponding to |
a_full |
A numeric vector. Complete set of observed time points, sorted from early to late. |
train_index |
A q-dimensional column vector. Index of time points used in the training data. |
x |
A list of n elements. Each element is a matrix of dimension (p, q_i), storing the gene expressions for the ith subject. |
model_dgp |
A logical value. model_dgp = TRUE (default setting) uses the Dependent Gaussian Process to model latent factor trajectories, otherwise the Independent Gaussian Process is used. |
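Several of the arguments above describe the same observation pattern from different angles. A sketch of deriving them consistently from a toy data set, assuming that y_init stores subjects' columns contiguously in subject order (subject 1's time points first); a_full, the observation indices and the expression values are all hypothetical.

```r
set.seed(1)
p <- 5                                   # number of genes
a_full <- c(0, 1, 2, 4, 8, 12, 16, 24)   # hypothetical complete time grid (q = 8)

# Subject 1 observed at all 8 times, subject 2 at 6 of them
obs_time_index <- list(1:8, c(1, 2, 3, 5, 6, 8))
x <- lapply(obs_time_index, function(idx) matrix(rnorm(p * length(idx)), nrow = p))

obs_time_num <- vapply(x, ncol, integer(1))                     # q_i per subject
a_person <- lapply(obs_time_index, function(idx) a_full[idx])   # observed times

# Column block of each subject in a (k, sum(obs_time_num)) matrix such as y_init
ends <- cumsum(obs_time_num)
starts <- ends - obs_time_num + 1
col_person_index <- Map(seq, starts, ends)
```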
The following parameters deserve particular attention, and users should tune them according to the specific data.
'burn_in_prop' and 'thin_step' jointly control the number of Gibbs samples needed to generate approximately 'ind_num' independent samples. The ultimate purpose of tuning these two parameters is to generate high-quality posterior samples of the latent factor scores. Therefore, if the initial values of the Gibbs sampler are poor, users may need to increase 'burn_in_prop' to discard more burn-in samples; if high autocorrelation between successive samples is a concern, 'thin_step' may need to be larger.
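To see how the two parameters interact, suppose (an assumption; the package's exact formula may differ) that the sampler discards the first burn_in_prop fraction of its draws and then keeps every thin_step-th one. The smallest run that leaves ind_num draws is then roughly as computed by the hypothetical helper below.

```r
# Hypothetical helper (not part of DGP4LCF): Gibbs iterations needed so that about
# ind_num draws remain after burn-in removal and thinning
gibbs_iterations_needed <- function(ind_num = 10, burn_in_prop = 0.2, thin_step = 5) {
  ceiling(ind_num * thin_step / (1 - burn_in_prop))
}

gibbs_iterations_needed()                     # 63, with the documented defaults
gibbs_iterations_needed(burn_in_prop = 0.5)   # 100, discarding more burn-in
gibbs_iterations_needed(thin_step = 10)       # 125, thinning more heavily
```

Either adjustment raises the cost of every E-step, so both are worth increasing only as far as the diagnostics require.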
A list of R objects required in the MCEM algorithm.
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")
Numerical summary for important continuous variables that do not need alignment.
numerics_summary_do_not_need_alignment(burnin = 0, thin_step = 1, pred_x_truth_indicator = FALSE, pred_x_truth = NULL, gibbs_after_mcem_combine_chains_result)
burnin |
A numeric scalar. The saved samples have already had burn-in removed; therefore the default value of this parameter here is 0. Further samples can be discarded if needed. |
thin_step |
A numeric scalar. The saved samples have already been thinned; therefore the default value of this parameter here is 1. They can be thinned further if needed. |
pred_x_truth_indicator |
A logical value. pred_x_truth_indicator = TRUE means that the true values of the predicted gene expressions are available. The default value is FALSE. |
pred_x_truth |
Only needed if pred_x_truth_indicator = TRUE. An array of dimension (n, p, num_time_test), storing true gene expressions in the testing data. |
gibbs_after_mcem_combine_chains_result |
A list of objects returned from the function 'gibbs_after_mcem_combine_chains'. |
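Since the saved draws are already burned in and thinned, burnin and thin_step here act on the saved draws themselves. A sketch of that index arithmetic, assuming the usual convention of dropping the first burnin saved draws and then keeping every thin_step-th one; keep_index() is a hypothetical helper.

```r
# Hypothetical helper (not part of DGP4LCF): indexes of the saved draws that are kept
keep_index <- function(n_saved, burnin = 0, thin_step = 1) {
  if (burnin >= n_saved) return(integer(0))   # nothing left after burn-in
  seq(from = burnin + 1, to = n_saved, by = thin_step)
}

keep_index(10)                              # defaults keep all draws: 1 to 10
keep_index(10, burnin = 2, thin_step = 3)   # 3 6 9
```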
This function corresponds to Algorithm 2, Steps 3 and 4 in the main manuscript; readers can consult the paper for further explanation.
Convergence assessment for important continuous variables that do not need alignment, and posterior summary for predicted gene expressions.
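The specific convergence diagnostic used is described in the paper. As a generic illustration only, the classic Gelman-Rubin potential scale reduction factor, a common multi-chain diagnostic, can be computed in base R as below; values close to 1 suggest the chains agree. This is not necessarily what the package reports.

```r
# Gelman-Rubin potential scale reduction factor for one scalar quantity.
# draws: matrix with one row per saved iteration and one column per chain
gelman_rubin <- function(draws) {
  n <- nrow(draws)
  chain_means <- colMeans(draws)
  B <- n * var(chain_means)              # between-chain variance
  W <- mean(apply(draws, 2, var))        # mean within-chain variance
  var_hat <- (n - 1) / n * W + B / n     # pooled estimate of the posterior variance
  sqrt(var_hat / W)
}

set.seed(7)
mixed <- matrix(rnorm(2000), ncol = 4)             # 4 chains, same distribution
stuck <- cbind(rnorm(500), rnorm(500, mean = 5))   # 2 chains in different regions
gelman_rubin(mixed)  # close to 1
gelman_rubin(stuck)  # well above 1
```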
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
Numerical summary for factor loadings and factor scores, which need alignment.
numerics_summary_need_alignment(burnin = 0, thin_step = 1, gibbs_after_mcem_combine_chains_result)
burnin |
A numeric scalar. The saved samples have already had burn-in removed; therefore the default value of this parameter here is 0. Further samples can be discarded if needed. |
thin_step |
A numeric scalar. The saved samples have already been thinned; therefore the default value of this parameter here is 1. They can be thinned further if needed. |
gibbs_after_mcem_combine_chains_result |
A list of objects returned from the function 'gibbs_after_mcem_combine_chains'. |
This function corresponds to Algorithm 2, Steps 2, 3 and 4 in the main manuscript; readers can consult the paper for further explanation.
Reordered posterior samples, convergence assessment, and summarized posterior results for factor loadings and factor scores.
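Factor loadings and scores in factor models are typically identified only up to a permutation of the factors and a sign flip of each factor, which is why draws must be aligned before they are summarised. As a generic illustration, and not the package's procedure (Algorithm 2, Step 2 in the paper), the hypothetical align_to_reference() below greedily matches each reference column to the unused draw column with the largest absolute correlation and flips its sign when that correlation is negative.

```r
# Hypothetical sketch (not DGP4LCF's code): greedy permutation-and-sign alignment
# of one loading draw (p x k) to a reference loading matrix (p x k)
align_to_reference <- function(sample_mat, ref_mat) {
  k <- ncol(ref_mat)
  aligned <- matrix(NA_real_, nrow(sample_mat), k)
  unused <- seq_len(k)
  for (j in seq_len(k)) {
    cors <- sapply(unused, function(u) cor(ref_mat[, j], sample_mat[, u]))
    best <- which.max(abs(cors))
    aligned[, j] <- sign(cors[best]) * sample_mat[, unused[best]]  # fix sign
    unused <- unused[-best]                                        # no reuse
  }
  aligned
}

ref <- cbind(c(1, 2, 3, 4), c(4, 1, 0, 2))
draw <- cbind(ref[, 2], -ref[, 1])   # factors swapped and one sign flipped
aligned <- align_to_reference(draw, ref)
```

A greedy match is simple but can be suboptimal when factors are strongly correlated; an exhaustive search over permutations is feasible for small k.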
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")
Initial values provided by the two-step approach.
sim_fcs_init
An object of class list of length 14.
Results when people have irregularly observed time points (some at 6 time points, others at 8).
sim_fcs_results_irregular_6_8
An object of class list of length 3.
Results when people are observed at common 8 time points.
sim_fcs_results_regular_8
An object of class list of length 3.
Simulated data under the scenario where factors are correlated and have small variability (CS).
sim_fcs_truth
An object of class list of length 19.
Constructing subject-specific objects required for Gibbs sampler (for subjects with incomplete observations only).
subject_specific_objects(k, q, a_full, a_avail, cor_all)
k |
A numeric scalar. Number of latent factors. |
q |
A numeric scalar. Number of time points in the complete factor covariance matrix. |
a_full |
A q-dimensional numeric vector. Complete set of time points, sorted from early to late. |
a_avail |
A vector of time points at which gene expression is available, sorted from early to late. |
cor_all |
A matrix of dimension (kq, kq). Correlation matrix of latent factor scores. |
This function extracts the subject-specific factor covariance matrix from the complete factor covariance matrix by constructing a subject-specific indicator matrix, which indicates the time indexes at which gene expression is available.
Subject-specific objects needed for Gibbs sampler.
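The indicator-matrix construction described above can be sketched in base R. This assumes the kq vector is ordered factor by factor (factor 1 at all q times, then factor 2, and so on); subject_indicator() and the toy correlation matrix are hypothetical, not the package's internals.

```r
# Hypothetical sketch (not DGP4LCF's code): selection matrix picking the observed
# time points of one subject for every factor
subject_indicator <- function(k, a_full, a_avail) {
  q <- length(a_full)
  idx <- match(a_avail, a_full)            # positions of observed times in a_full
  E <- matrix(0, length(idx), q)
  E[cbind(seq_along(idx), idx)] <- 1       # one unit entry per observed time
  kronecker(diag(k), E)                    # same selection repeated for each factor
}

k <- 2
a_full <- c(0, 1, 2, 4)
a_avail <- c(0, 2, 4)
cor_all <- kronecker(matrix(c(1, 0.3, 0.3, 1), 2), 0.7^abs(outer(1:4, 1:4, "-")))

S <- subject_indicator(k, a_full, a_avail)
cor_subject <- S %*% cor_all %*% t(S)      # (k * q_i) x (k * q_i) submatrix
```

Multiplying by S is equivalent to subsetting rows and columns of cor_all; the matrix form mirrors how such indicators usually appear in derivations.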
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")
Generating a table listing all possible combinations of the binary variables for one gene.
table_generator(k)
k |
A numeric scalar. Number of latent factors. |
A table listing all possible combinations of the binary variables for one gene.
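For one gene with k binary loading indicators there are 2^k possible configurations. A base R sketch of such a table using expand.grid(); the helper binary_combinations() and its column names are hypothetical, and the row order may differ from that of table_generator.

```r
# Hypothetical sketch (not DGP4LCF's code): every 0/1 configuration of k indicators
binary_combinations <- function(k) {
  grid <- expand.grid(rep(list(0:1), k))   # one column per factor
  colnames(grid) <- paste0("z", seq_len(k))
  as.matrix(grid)
}

tab <- binary_combinations(3)
nrow(tab)  # 8
```

Because the table has 2^k rows, enumerating it is practical only for a moderate number of factors.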
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")