Title: | Generating Multi-Omics Datasets |
---|---|
Description: | Designed to generate multi-omics datasets that closely reflect biological complexity, the package allows for testing, validation, and benchmarking of multi-omics integrative methods. The simulated data includes one or multiple predefined signals (latent/unobserved factors), giving users complete control over the data-generated characteristics. Tini, Giulia, et al (2019) <doi:10.1093/bib/bbx167>. |
Authors: | Bernard Isekah Osang'ir [aut, cre] , Bernard Isekah Osang'ir [aut] |
Maintainer: | Bernard Isekah Osang'ir <[email protected]> |
License: | CC BY 4.0 |
Version: | 0.1.0 |
Built: | 2024-12-22 06:21:16 UTC |
Source: | CRAN |
Dividing features to create vectors with signal in the first omic for single data
divide_features_one(n_features_one, num.factor)
n_features_one |
number of features of first omic |
num.factor |
number of factors (should be set to '1') |
A list of numeric vectors. Each vector contains 80% of the features from one segment of the original feature set.
The number of segments is determined by the number of factors provided (num.factor).
Each vector is a subset of the original feature set, selected randomly to contain 80% of the elements from the segment.
If the minimum segment size constraint is too large for the given feature length and number of segments, the function retries using the divide_vector() function.
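The segment-and-sample behaviour described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name sample_segments, its keep argument, and the use of equal-width contiguous segments are all assumptions made for the example.

```r
# Hypothetical helper, not part of the package: split feature indices into
# contiguous, near-equal segments and keep a random 80% of each segment.
sample_segments <- function(n_features, n_segments, keep = 0.8) {
  stopifnot(n_segments >= 1, n_features >= n_segments)
  idx <- seq_len(n_features)
  # Label every index with the segment (1..n_segments) it falls into
  block <- ceiling(idx * n_segments / n_features)
  segments <- split(idx, block)
  # Draw a random subset of each segment; sorting is only for readability
  lapply(segments, function(seg) {
    sort(seg[sample.int(length(seg), size = round(keep * length(seg)))])
  })
}

set.seed(1)
signal_features <- sample_segments(n_features = 100, n_segments = 1)
length(signal_features[[1]])
```

With a single factor there is one segment spanning all features, so the result is a list holding one vector of feature indices.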
Dividing features to create vectors with signal in the second omic for single data
divide_features_two(n_features_two, num.factor)
n_features_two |
number of features of second omic |
num.factor |
number of factors (should be set to '1') |
A list of numeric vectors. Each vector contains 80% of the features from one segment of the original feature set.
The number of segments is determined by the number of factors provided (num.factor).
Each vector is a subset of the original feature set, selected randomly to contain 80% of the elements from the segment.
If the minimum segment size constraint is too large for the given feature length and number of segments, the function retries using the divide_vector() function.
A global variable used in multiple functions.
divide_samples(n_samples, num, min_size)
n_samples |
number of samples |
num |
number of factors |
min_size |
Minimum length of any segment of sample scores |
A list of numeric vectors.
If num == 1, the list contains a single vector representing a random selection of between 10% and 55% of the elements from the full dataset.
If num > 1, the list contains num vectors, each representing 75% of the elements from one of the num segments of the dataset.
The segmentation ensures that all segments are at least the size of min_size.
If the segment sizes are too small, the function retries the segmentation process.
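The two cases above (num == 1 versus num > 1, with a retry when a segment falls below min_size) can be sketched as follows. This is an illustrative base R version under stated assumptions, not the package's code: the function name sketch_divide_samples is hypothetical, segments are assumed to be contiguous with random cut points, and the selection within each segment is assumed to be a uniform random draw.

```r
# Hypothetical sketch of the described behaviour; not the package's code.
sketch_divide_samples <- function(n_samples, num, min_size) {
  idx <- seq_len(n_samples)
  if (num == 1) {
    # Single factor: one vector covering a random 10% to 55% of all samples
    frac <- runif(1, min = 0.10, max = 0.55)
    size <- max(1, round(frac * n_samples))
    return(list(sort(idx[sample.int(n_samples, size)])))
  }
  # Without this guard the retry loop below could never succeed
  stopifnot(num * min_size <= n_samples)
  repeat {
    # Draw num - 1 random cut points, giving num contiguous segments
    cuts <- sort(sample.int(n_samples - 1, num - 1))
    sizes <- diff(c(0, cuts, n_samples))
    # Retry until every segment is at least min_size long
    if (all(sizes >= min_size)) break
  }
  segments <- split(idx, rep(seq_len(num), times = sizes))
  # Keep a random 75% of each segment
  lapply(segments, function(seg) {
    sort(seg[sample.int(length(seg), round(0.75 * length(seg)))])
  })
}

set.seed(10)
blocks <- sketch_divide_samples(n_samples = 50, num = 3, min_size = 5)
lengths(blocks)
```

Because each vector is drawn from its own segment, the sample sets assigned to different factors never overlap in this sketch.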
Divide features into randomized subsets based on factor Segments
divide_vector(n_samples, num, min_size)
n_samples |
number of samples |
num |
number of factors |
min_size |
Minimum length of any samples scores |
A list of numeric vectors. Each vector contains 80% of the features from one segment of the original feature set.
The number of segments is determined by the number of factors provided (num.factor).
Each vector is a subset of the original feature set, selected randomly to contain 80% of the elements from the segment.
Only used when the minimum segment size constraint is too large for the given feature length and number of segments.
Dividing features to create vectors with signal in the first omic
feature_selection_one(n_features_one, num.factor, no_factor)
n_features_one |
number of features of first omic |
num.factor |
type of factors - single or multiple |
no_factor |
number of factors |
A list of numeric vectors.
The first vector contains a consecutive subset of the first num_elements from the original vector.
The subsequent vectors are sub-vectors derived from remaining segments, each containing 40% of the elements from the corresponding segment.
If num.factor == 'multiple', the segments are divided based on no_factor, and the function ensures the segments meet the size constraints.
The function recursively retries segmentation if any segment size is smaller than the minimum constraint of 10 elements.
The function returns an error if the input parameters or constraints are invalid (e.g., num.factor is not "multiple" or no_factor is missing).
Dividing features to create vectors with signal in the second omic
feature_selection_two(n_features_two, num.factor, no_factor)
n_features_two |
number of features of second omic |
num.factor |
type of factors - single or multiple |
no_factor |
number of factors |
A list of numeric vectors. The first vector represents a random subset of between 10% and 60% of the elements from the original feature vector. The remaining vectors represent 40% of the elements from each of the segments created from the rest of the feature vector. If the segment sizes are too small or there are overlapping elements across the final vectors, the function retries and returns a new list of vectors. Each vector is guaranteed to have no overlapping elements with the others. If the input parameters are invalid, the function throws an error.
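The no-overlap guarantee is easy to check on any list of index vectors. The helper below is a hypothetical convenience written for this illustration (it is not exported by the package), and it assumes each individual vector contains no repeated values of its own.

```r
# Hypothetical helper: TRUE when no element appears in more than one vector.
# Assumes each vector has no internal duplicates.
all_disjoint <- function(vectors) {
  anyDuplicated(unlist(vectors, use.names = FALSE)) == 0
}

all_disjoint(list(1:5, 8:12, 20:25))  # vectors share no elements
all_disjoint(list(1:5, 5:9))          # element 5 appears in both vectors
```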
Simulation of high-dimensional data with predefined single factor or multiple factors in multi-omics
OmixCraftHD(
  vector_features = c(2000, 2000),
  n_samples = 50,
  sigmas_vector = c(3, 5),
  n_factors = 3,
  num.factor = "multiple",
  advanced_dist = NULL
)
vector_features |
Vector giving the number of features in each of the two simulated datasets: the first element for the first dataset, the second element for the second dataset |
n_samples |
The number of samples common between the two simulated datasets |
sigmas_vector |
Vector giving the noise variability of each of the two simulated datasets: the first element for the first dataset, the second element for the second dataset |
n_factors |
Number of predefined factors |
num.factor |
Category of factors to be simulated specified as 'single', or 'multiple'. |
advanced_dist |
Applicable only when num.factor = 'multiple'. Accepts one of six values: '', NULL, 'mixed', 'omic.one', 'omic.two', or 'exclusive' |
A list containing:
dataset_1: A matrix or data frame representing the first simulated dataset with rows as samples and columns as features.
dataset_2: A matrix or data frame representing the second simulated dataset with rows as samples and columns as features.
factors: A matrix representing the predefined factors used in generating the datasets. If num.factor is 'single', this contains one set of factors. If num.factor is 'multiple', it contains multiple sets of factors.
noise: A list containing the noise terms added to both datasets based on the sigmas_vector.
factor_assignment: A vector indicating how factors are assigned to datasets, depending on the num.factor and advanced_dist settings.
The output provides simulated multi-omics datasets with predefined latent factors and noise, which can be used to model complex biological data structures.
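To make the idea of "predefined latent factors plus noise" concrete, the following base R sketch builds one dataset from a single factor. It assumes a generic low-rank factor model (sample scores multiplied by feature loadings, plus Gaussian noise); the exact generative model, the variable names, and the chosen means are assumptions for illustration and are not taken from the package.

```r
set.seed(42)
n_samples  <- 50
n_features <- 200
sigma      <- 3    # noise standard deviation, playing the role of one sigmas_vector entry

# Assumed model: dataset = scores %*% t(loadings) + Gaussian noise
scores <- matrix(0, nrow = n_samples, ncol = 1)
scores[1:20, 1] <- rnorm(20, mean = 5)        # samples that carry the signal

loadings <- matrix(0, nrow = n_features, ncol = 1)
loadings[1:40, 1] <- rnorm(40, mean = 4)      # features that carry the signal

noise   <- matrix(rnorm(n_samples * n_features, sd = sigma), nrow = n_samples)
dataset <- scores %*% t(loadings) + noise     # rows are samples, columns are features

dim(dataset)
```

Entries where a signal-carrying sample meets a signal-carrying feature are shifted away from zero, while the rest of the matrix is pure noise. That planted block is what an integrative method would be expected to recover.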
# Examples
set.seed(1234)
output_obj <- OmixCraftHD(
  vector_features = c(2000, 3000),
  sigmas_vector = c(8, 5),
  n_samples = 100,
  n_factors = 5,
  num.factor = 'multiple',
  advanced_dist = 'mixed'
)
output_obj <- OmixCraftHD(
  vector_features = c(5000, 3000),
  sigmas_vector = c(3, 4),
  n_samples = 30,
  n_factors = 1
)
Visualization of factor scores
plot_factor(sim_object = NULL, factor_num = NULL)
sim_object |
R object containing data to be plotted |
factor_num |
Factor to be plotted. |
A ggplot object representing the factor scores for the specified factor (or all factors) in sim_object.
If factor_num = 'all', a combined plot of all factors is returned. If a specific factor_num is provided, the plot for that factor is returned.
The plot can be further customized or displayed using standard ggplot2 functions.
# Examples
output_obj <- OmixCraftHD(
  vector_features = c(2000, 3000),
  sigmas_vector = c(3, 4),
  n_samples = 30,
  n_factors = 1
)
plot_factor(sim_object = output_obj, factor_num = 1)
plot_factor(sim_object = output_obj, factor_num = 'all')
Visualizing the simulated data using image map and 3D visualization
plot_simData(sim_object, type = "heatmap")
sim_object |
R object containing simulated data to be plotted |
type |
Type of plot: "heatmap" for a 2D image plot, "3D" for a perspective (persp) 3D plot |
The function generates and displays a plot based on the specified type.
If type is "heatmap", the function displays a 2D heatmap of the simulated data.
If type is "3D", the function creates a 3D surface plot of the simulated data.
The function does not return any values but generates the requested plot as a side effect.
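For readers who want to see what the two plot types look like on a plain matrix, here is a sketch using the base graphics functions image() and persp(). It draws a small hand-made matrix rather than a real simulation object, and the way plot_simData() prepares its data internally is not shown here, so treat the layout and labels as assumptions.

```r
set.seed(7)
# Toy matrix standing in for simulated data: rows are samples, columns are features
sim <- matrix(rnorm(30 * 40), nrow = 30, ncol = 40)
sim[1:10, 1:15] <- sim[1:10, 1:15] + 4   # planted block of signal

# 2D image map of the matrix
image(x = seq_len(nrow(sim)), y = seq_len(ncol(sim)), z = sim,
      xlab = "Samples", ylab = "Features", main = "Image map")

# 3D surface of the same matrix
persp(x = seq_len(nrow(sim)), y = seq_len(ncol(sim)), z = sim,
      theta = 30, phi = 30,
      xlab = "Samples", ylab = "Features", zlab = "Value")
```

Both calls draw to the active graphics device and are used here only for their side effect, which mirrors the behaviour described above.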
# Examples
output_obj <- OmixCraftHD(
  vector_features = c(2000, 3000),
  sigmas_vector = c(8, 5),
  n_samples = 100,
  n_factors = 5,
  num.factor = 'multiple',
  advanced_dist = 'mixed'
)
plot_simData(sim_object = output_obj, type = "heatmap")
plot_simData(sim_object = output_obj, type = "3D")
Visualizing the loading of the features
plot_weights(
  sim_object = NULL,
  factor_num = 1,
  data = "omic.one",
  type = "scatter"
)
sim_object |
R object containing data to be plotted |
factor_num |
Factor to be plotted. |
data |
Section of the integrated data to be plotted: either 'omic.one' or 'omic.two' |
type |
Type of plot: either 'scatter' or 'histogram' |
A ggplot object.
If type is "scatter", the function returns a scatter plot visualizing the loadings of features for the selected factor.
If type is "histogram", the function returns a histogram displaying the distribution of the loadings for the selected factor.
The plot visualizes either omic.one or omic.two data based on the user input in the data parameter.
The ggplot object can be further modified or directly plotted.
# Examples
output_obj <- OmixCraftHD(
  vector_features = c(2000, 3000),
  sigmas_vector = c(8, 5),
  n_samples = 100,
  n_factors = 4,
  num.factor = 'multiple',
  advanced_dist = 'mixed'
)
plot_weights(sim_object = output_obj, factor_num = 1, data = 'omic.one', type = 'scatter')
plot_weights(sim_object = output_obj, factor_num = 1, data = 'omic.one', type = 'histogram')
plot_weights(sim_object = output_obj, factor_num = 1, data = 'omic.two', type = 'scatter')
plot_weights(sim_object = output_obj, factor_num = 1, data = 'omic.two', type = 'histogram')