Title: | A Bootstrap-Based Power Estimation Tool for Spatial Transcriptomics |
---|---|
Description: | Power estimation and sample size calculation for 10X Visium Spatial Transcriptomics data to detect differentially expressed genes between two conditions, based on bootstrap resampling. See Shui et al. (2024) <doi:10.1101/2024.08.30.610564> for method details. |
Authors: | Lan Shui [aut, cre] |
Maintainer: | Lan Shui <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2025-02-07 07:21:21 UTC |
Source: | CRAN |
This function loads the power values and the corresponding avg_log2FC and avg_PCT values derived from bootstrap sampling, and uses the scam package to fit two-dimensional smoothing splines under two monotone constraints: (1) a positive relationship between power and avg_log2FC; (2) a positive relationship between power and avg_PCT. The avg_log2FC and avg_PCT values can be either the averages over the bootstrap samples or the values from the original spatial transcriptomics data.
fit_powerest(power,avg_log2FC,avg_PCT,filter_zero=TRUE)
Argument | Description |
---|---|
power | The raw power values. |
avg_log2FC | The corresponding log2FC values. |
avg_PCT | The corresponding PCT values. |
filter_zero | Whether to filter out power values equal to 0. Default=TRUE. |
A 'scam' object, as returned by the scam function. More information about the contents of a 'scam' object can be found in the documentation of the R package scam.
Lan Shui [email protected]
data(result_example)
b <- fit_powerest(result_example$power, result_example$avg_logFC, result_example$avg_PCT)
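The monotone-constrained fit described above can be illustrated directly with the scam package on simulated data. This is only a sketch: the data frame `toy`, the basis choice `bs = "tedmi"` (a tensor product smooth that is increasing in both covariates) and the basis dimension `k = c(6, 6)` are assumptions made for illustration, and the source does not state which settings 'fit_powerest' uses internally.

```r
library(scam)

# Simulated stand-in for bootstrap results (hypothetical data, not from PoweREST)
set.seed(1)
n <- 300
toy <- data.frame(avg_log2FC = runif(n, 0, 3), avg_PCT = runif(n, 0, 1))
# Power rises with both covariates; a little noise is added
toy$power <- plogis(2 * toy$avg_log2FC + 3 * toy$avg_PCT - 4) + rnorm(n, sd = 0.05)

# Two-dimensional smooth constrained to increase in both covariates (assumed basis)
fit <- scam(power ~ s(avg_log2FC, avg_PCT, bs = "tedmi", k = c(6, 6)), data = toy)

# Predictions along avg_log2FC at a fixed avg_PCT
newd <- data.frame(avg_log2FC = seq(0.2, 2.8, length.out = 20), avg_PCT = 0.5)
p <- as.numeric(predict(fit, newdata = newd))
```

Because of the constraint, the predicted values in `p` should not decrease as avg_log2FC grows, even where the noisy data locally suggest otherwise.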
This function estimates power values using XGBoost under three-dimensional monotone constraints on avg_log2FC, avg_PCT and replicates. It is recommended when the power surfaces fitted by 'fit_powerest' cross one another, and it is used for estimating local power values.
fit_XGBoost(power,avg_log2FC,avg_PCT,replicates,filter_zero=TRUE, max.depth=6,eta=0.3,nround=100)
Argument | Description |
---|---|
power | The raw power values. |
avg_log2FC | The corresponding log2FC values. |
avg_PCT | The corresponding PCT values. |
replicates | The corresponding number of replicates. |
filter_zero | Whether to filter out power values equal to 0. Default=TRUE. |
max.depth | Maximum depth of a tree. Default=6. |
eta | Controls the learning rate: the contribution of each tree is scaled by a factor of 0 < eta < 1 when it is added to the current approximation. Used to prevent overfitting by making the boosting process more conservative. Default=0.3. |
nround | Maximum number of boosting iterations. Default=100. |
An object of class 'xgb.Booster'. More information about the contents of an 'xgb.Booster' object can be found in the documentation of the R package xgboost.
Lan Shui [email protected]
data(power_example)
# Fit the local power surface of avg_log2FC_abs between 1 and 2
avg_log2FC_abs_1_2 <- dplyr::filter(power_example, avg_log2FC_abs > 1 & avg_log2FC_abs < 2)
# Fit the model
bst <- fit_XGBoost(power_example$power, avg_log2FC = power_example$avg_log2FC_abs,
                   avg_PCT = power_example$mean_pct, replicates = power_example$sample_size)
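To show what three-dimensional monotone constraints mean in practice, the following sketch trains a small XGBoost model on simulated data with the xgboost package's `xgb.train`, `xgb.DMatrix` and the `monotone_constraints` parameter. The simulated data and the squared-error objective are assumptions for illustration; the source does not say how 'fit_XGBoost' configures the booster beyond the arguments listed above.

```r
library(xgboost)

# Simulated stand-in for power results (hypothetical data)
set.seed(1)
n <- 500
X <- cbind(avg_log2FC = runif(n, 0, 3),
           avg_PCT    = runif(n, 0, 1),
           replicates = sample(2:6, n, replace = TRUE))
y <- plogis(1.5 * X[, "avg_log2FC"] + 2 * X[, "avg_PCT"] +
            0.4 * X[, "replicates"] - 5)

# One constraint per feature: 1 means predictions may not decrease in that feature
params <- list(objective = "reg:squarederror", max_depth = 6, eta = 0.3,
               monotone_constraints = c(1, 1, 1))
bst <- xgb.train(params = params, data = xgb.DMatrix(X, label = y), nrounds = 100)

# Predictions along avg_log2FC with the other two features held fixed
newX <- cbind(avg_log2FC = seq(0, 3, length.out = 25), avg_PCT = 0.5, replicates = 3)
p <- predict(bst, xgb.DMatrix(newX))
```

Constraining all three features keeps the predicted power from dropping as the effect size, detection rate or number of replicates increases, which is what prevents surfaces for different replicate numbers from crossing.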
This function creates a 3D interactive plot of the power against other parameters using 'plot_ly'.
plotly_powerest(pred,opacity=0.8,colors='BrBG',fig_title=NULL)
Argument | Description |
---|---|
pred | The result from 'pred_powerest'. |
opacity | The opacity of the graph, default=0.8. |
colors | The color for the graph, default='BrBG'. |
fig_title | The title of the graph, default=NULL. |
A 3d interactive plot of the power surface. Users can also plot multiple surfaces together to compare them.
Lan Shui [email protected]
data(result_example)
b <- fit_powerest(result_example$power, result_example$avg_logFC, result_example$avg_PCT)
pred <- pred_powerest(b, xlim = c(0, 6), ylim = c(0, 1))
plotly_powerest(pred, fig_title = 'Power estimation result')
A subset of power results from PoweREST, covering multiple numbers of replicates.
power_example
power_example
A data frame with 844 rows and 5 columns:
average log2FC
percentage of spots detecting the gene
number of replicates
power values
the absolute value of average log2FC
This function performs bootstrap resampling on a Seurat object under each condition to resemble the real dataset, which allows exact power calculation, and then performs DE analysis. Users can specify the test for the DE analysis in '...', which should not contain min.pct, logfc.threshold or any other parameter that pre-filters genes, because min.pct and logfc.threshold are set to 0 so that power is calculated for all available genes. As a result, the function may take a night to run if the ST data contain thousands of genes. To speed this up, try the function 'PoweREST_subset', which includes gene pre-filtering.
PoweREST(Seurat_obj,cond,replicates=1,spots_num, iteration=100,random_seed=1,pvalue=0.05,...)
Argument | Description |
---|---|
Seurat_obj | A Seurat object. |
cond | The name of the variable that indicates the conditions. It must be stored in the meta.data of Seurat_obj and be of character type. |
replicates | The number of sample replicates per group. |
spots_num | The number of spots per replicate. |
iteration | The number of resampling iterations. |
random_seed | The random seed to set. |
pvalue | The p-value threshold below which a result is considered significant. |
... | DE test to use other than the default Wilcoxon test. |
A list containing the power, the average log2FC and the percentage of spots detecting the gene across the resampled data, the replicate value and the number of spots per slice specified by the user, and the corresponding gene names.
Lan Shui [email protected]
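The bootstrap idea behind this function can be sketched for a single gene in base R: resample spots with replacement within each condition, test for a difference, and report the fraction of iterations that reach significance. The helper `bootstrap_power` below is hypothetical and is not the package's implementation; in particular, pooling `replicates * spots_num` spots per condition and calling `wilcox.test` directly are simplifying assumptions.

```r
# Hypothetical helper (not part of PoweREST): bootstrap power for one gene
bootstrap_power <- function(expr_a, expr_b, replicates = 1, spots_num = 50,
                            iteration = 100, random_seed = 1, pvalue = 0.05) {
  set.seed(random_seed)
  n <- replicates * spots_num  # spots drawn per condition (assumed pooling)
  significant <- logical(iteration)
  for (i in seq_len(iteration)) {
    # Resample spots with replacement within each condition
    boot_a <- sample(expr_a, n, replace = TRUE)
    boot_b <- sample(expr_b, n, replace = TRUE)
    # Wilcoxon rank-sum test; tie warnings are suppressed
    p <- suppressWarnings(wilcox.test(boot_a, boot_b)$p.value)
    significant[i] <- !is.na(p) && p < pvalue
  }
  # Power = share of iterations in which the gene is called significant
  mean(significant)
}

# Simulated expression of one gene under two conditions
set.seed(42)
cond_a <- rpois(200, lambda = 2)
cond_b <- rpois(200, lambda = 3)
power_small <- bootstrap_power(cond_a, cond_b, replicates = 1, spots_num = 20)
power_large <- bootstrap_power(cond_a, cond_b, replicates = 5, spots_num = 20)
```

Comparing `power_small` with `power_large` shows how the estimated power responds to the number of replicates while the number of spots per replicate stays fixed.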
This function performs bootstrap resampling on a Seurat object under each condition to resemble the real dataset, which allows exact power calculation, and then performs DE analysis on one gene specified by the user. Users can specify the test for the DE analysis in '...'. Note that the results are not corrected for multiple testing and should therefore be interpreted carefully.
PoweREST_gene(Seurat_obj,cond,replicates=1,spots_num, gene_name,iteration=100,random_seed=1,pvalue=0.05,...)
Argument | Description |
---|---|
Seurat_obj | A Seurat object. |
cond | The name of the variable that indicates the conditions. It must be stored in the meta.data of Seurat_obj and be of character type. |
replicates | The number of sample replicates per group. |
spots_num | The number of spots per replicate. |
gene_name | The name of the gene for which power is calculated. |
iteration | The number of resampling iterations. |
random_seed | The random seed to set. |
pvalue | The p-value threshold below which a result is considered significant. |
... | DE test to use other than the default Wilcoxon test. |
A list containing the power, the average log2FC and the percentage of spots detecting the gene across the resampled data, the replicate value and the number of spots per slice specified by the user, and the corresponding gene name.
Lan Shui [email protected]
This function performs bootstrap resampling on a Seurat object under each condition to resemble the real dataset, which allows exact power calculation, and then performs DE analysis. As with 'PoweREST', users can specify the test for the DE analysis in '...' (see Seurat for more test options). Unlike 'PoweREST', users can set 'min.pct' and 'logfc.threshold' to pre-filter genes by their minimum detection rate ('min.pct') and by requiring at least an X-fold difference (log-scale) across both groups ('logfc.threshold'). Note that this kind of filtering can miss weaker signals.
PoweREST_subset(Seurat_obj,cond,replicates=1,spots_num, iteration=100,random_seed=1,pvalue=0.05,logfc.threshold = 0.1, min.pct = 0.01,...)
Argument | Description |
---|---|
Seurat_obj | A Seurat object. |
cond | The name of the variable that indicates the conditions. It must be stored in the meta.data of Seurat_obj and be of character type. |
replicates | The number of sample replicates per group. |
spots_num | The number of spots per replicate. |
iteration | The number of resampling iterations. |
random_seed | The random seed to set. |
pvalue | The p-value threshold below which a result is considered significant. |
logfc.threshold | For every resampling, limit testing to genes which show, on average, at least an X-fold difference (log-scale) between the two groups. Default is 0.1. Increasing logfc.threshold speeds up the function, but can miss weaker signals. |
min.pct | For every resampling, only test genes that are detected in a minimum fraction of min.pct spots in either of the two populations. Meant to speed up the function by not testing genes that are very infrequently expressed. Default is 0.01. |
... | DE test to use other than the default Wilcoxon test. |
A list of values containing the power, average log2FC and percentage of spots detecting the gene among the resampling data, the replicate value and the spots number per slice specified by the user and the filtered.
Lan Shui [email protected]
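The effect of 'min.pct' and 'logfc.threshold' can be illustrated with a small base R sketch on a toy gene-by-spot count matrix. The helper `prefilter_genes` is hypothetical, and its fold-change formula (a difference of log2 group means with a pseudocount of 1) is a simplification assumed here for illustration; Seurat computes its fold change differently.

```r
# Hypothetical helper (not part of PoweREST or Seurat): gene pre-filtering
prefilter_genes <- function(counts, group, logfc.threshold = 0.1, min.pct = 0.01) {
  stopifnot(ncol(counts) == length(group))
  groups <- unique(group)
  stopifnot(length(groups) == 2)
  a <- counts[, group == groups[1], drop = FALSE]
  b <- counts[, group == groups[2], drop = FALSE]
  # Detection rate: fraction of spots with a non-zero count, per group
  pct_a <- rowMeans(a > 0)
  pct_b <- rowMeans(b > 0)
  # Simplified log2 fold change with a pseudocount of 1 (assumption)
  log2fc <- log2(rowMeans(b) + 1) - log2(rowMeans(a) + 1)
  # Keep genes detected often enough in either group AND different enough
  keep <- pmax(pct_a, pct_b) >= min.pct & abs(log2fc) >= logfc.threshold
  rownames(counts)[keep]
}

counts <- rbind(
  geneA = c(0, 0, 0, 0, 0, 0),  # never detected
  geneB = c(5, 6, 5, 5, 6, 5),  # detected, but no difference between groups
  geneC = c(1, 0, 2, 8, 9, 7)   # detected and clearly different
)
group <- c("ctrl", "ctrl", "ctrl", "case", "case", "case")
kept <- prefilter_genes(counts, group)
```

Only geneC passes both filters in this toy example. A gene with a small but real difference would be dropped in the same way as geneB, which is how this kind of filtering can miss weaker signals.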
This function computes predictions from the model fitted by 'fit_powerest'. The predictions can be visualized with 'plotly_powerest' and 'vis_powerest', or reported as the power result in a proposal or research study. It is a modified version of predict.scam from the scam library.
pred_powerest(x,n.grid=30,xlim=NULL,ylim=NULL)
Argument | Description |
---|---|
x | The fitted 'scam' object returned by 'fit_powerest'. |
n.grid | The number of grid nodes within 'xlim' and 'ylim', default=30. |
xlim | The range of the absolute value of log2FC used for prediction, default=NULL which means the original range. |
ylim | The range of the avg_pct used for prediction, default=NULL which means the original range. |
The prediction values of the power.
Lan Shui [email protected] based partly on 'scam' by Natalya Pya
data(result_example)
b <- fit_powerest(result_example$power, result_example$avg_logFC, result_example$avg_PCT)
pred <- pred_powerest(b, xlim = c(0, 6), ylim = c(0, 1))
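Prediction over an n.grid by n.grid lattice spanning 'xlim' and 'ylim' can be sketched in base R as follows. The helper `predict_on_grid`, the linear model used as a stand-in for the fitted surface, and the returned list layout are all assumptions for illustration; the source does not describe the structure that 'pred_powerest' returns.

```r
# Hypothetical helper: evaluate a fitted model on a regular grid
predict_on_grid <- function(fit, n.grid = 30, xlim, ylim) {
  x <- seq(xlim[1], xlim[2], length.out = n.grid)
  y <- seq(ylim[1], ylim[2], length.out = n.grid)
  # expand.grid varies its first variable fastest, so rows of z index x
  grid <- expand.grid(avg_log2FC = x, avg_PCT = y)
  z <- predict(fit, newdata = grid)
  list(x = x, y = y, z = matrix(z, nrow = n.grid, ncol = n.grid))
}

# Toy data and a plain linear model standing in for the fitted power surface
set.seed(1)
toy <- data.frame(avg_log2FC = runif(200, 0, 6), avg_PCT = runif(200, 0, 1))
toy$power <- pmin(1, 0.1 * toy$avg_log2FC + 0.4 * toy$avg_PCT +
                     rnorm(200, sd = 0.02))
fit <- lm(power ~ avg_log2FC + avg_PCT, data = toy)

pred <- predict_on_grid(fit, n.grid = 30, xlim = c(0, 6), ylim = c(0, 1))
```

A list in this x, y, z layout can be passed to base R surface functions, for example `persp(pred$x, pred$y, pred$z)` or `contour(pred$x, pred$y, pred$z)`.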
This function takes the result from 'fit_XGBoost' and makes predictions.
pred_XGBoost(x,n.grid=30,xlim,ylim,replicates)
Argument | Description |
---|---|
x | An object of class 'xgb.Booster'. |
n.grid | The number of grid nodes within 'xlim' and 'ylim', default=30. |
xlim | The range of the absolute value of avg_log2FC used for prediction. |
ylim | The range of the avg_pct used for prediction. |
replicates | The number of replicates. |
The power estimations from XGBoost.
Lan Shui [email protected]
data(power_example)
# Fit the local power surface of avg_log2FC_abs between 1 and 2
avg_log2FC_abs_1_2 <- dplyr::filter(power_example, avg_log2FC_abs > 1 & avg_log2FC_abs < 2)
# Fit the model
bst <- fit_XGBoost(power_example$power, avg_log2FC = power_example$avg_log2FC_abs,
                   avg_PCT = power_example$mean_pct, replicates = power_example$sample_size)
pred <- pred_XGBoost(bst, n.grid = 30, xlim = c(0, 1.5), ylim = c(0, 0.1), replicates = 3)
A subset of power results from PoweREST, obtained by running PoweREST(Peri,cond='Condition', replicates=5,spots_num=80,iteration=2)
result_example
result_example
A data frame with ~20,000 rows and 3 columns:
power | power values |
avg_logFC | average log2FC |
avg_PCT | percentage of spots detecting the gene |
This function takes the result from 'pred_powerest' and plots 2D views of it. Supply ticktype="detailed" to get proper axis annotation. It is a modified version of 'vis.scam' from the 'scam' library.
vis_powerest(x,color="heat",contour.col=NULL, se=-1,zlim=NULL,n.grid=30,col=NA,plot.type="persp", nCol=50,...)
Argument | Description |
---|---|
x | The result from 'pred_powerest'. |
color | The color scheme of the plot, one of "heat", "topo", "cm", "terrain", "gray" or "bw". |
contour.col | The color of the contour plot when using plot.type="contour". |
se | If less than or equal to zero, only the predicted surface is plotted. If greater than zero, three surfaces are plotted: one at the predicted values minus se standard errors, one at the predicted values and one at the predicted values plus se standard errors. |
zlim | The range of power values to show. |
n.grid | The number of grid nodes in each direction used for calculating the plotted surface. |
col | The colors for the facets of the plot. If this is NA, then the facets are transparent if se>0; otherwise the color scheme specified in color is used. If col is not NA, it is used as the facet color. |
plot.type | One of "contour" or "persp". |
nCol | The number of colors to use in color schemes. |
... | Other arguments. |
A 2D plot of the power surface. See the scam documentation for more details.
Lan Shui [email protected] based partly on 'scam' by Natalya Pya
data(result_example)
b <- fit_powerest(result_example$power, result_example$avg_logFC, result_example$avg_PCT)
pred <- pred_powerest(b, xlim = c(0, 6), ylim = c(0, 1))
vis_powerest(pred, theta = -30, phi = 30, color = 'heat', ticktype = "detailed", xlim = c(0, 6), nticks = 5)
This function takes the result from 'pred_XGBoost' and plots 2D/3D views of it.
vis_XGBoost(x,view='2D',legend_name='Power', xlab='avg_log2FC_abs',ylab='mean_pct')
Argument | Description |
---|---|
x | The result data frame from 'pred_XGBoost'. |
view | Determines whether to plot a 2D or 3D view, default='2D'. |
legend_name | The name of the legend, default='Power'. |
xlab | The x-axis label, default='avg_log2FC_abs'. |
ylab | The y-axis label, default='mean_pct'. |
A 2D/3D plot of the power results from XGBoost.
Lan Shui [email protected]
data(power_example)
# Fit the local power surface of avg_log2FC_abs between 1 and 2
avg_log2FC_abs_1_2 <- dplyr::filter(power_example, avg_log2FC_abs > 1 & avg_log2FC_abs < 2)
# Fit the model
bst <- fit_XGBoost(power_example$power, avg_log2FC = power_example$avg_log2FC_abs,
                   avg_PCT = power_example$mean_pct, replicates = power_example$sample_size)
pred <- pred_XGBoost(bst, n.grid = 30, xlim = c(0, 1.5), ylim = c(0, 0.1), replicates = 3)
vis_XGBoost(pred, view = '2D', legend_name = 'Power', xlab = 'avg_log2FC_abs', ylab = 'mean_pct')