Title: | Visualization of Distribution of Selected Model |
---|---|
Description: | Although model selection is ubiquitous in scientific discovery, the stability and uncertainty of the selected model are often hard to evaluate. Characterizing the random behavior of the model selection procedure is key to understanding and quantifying model selection uncertainty. This R package offers several graphical tools for visualizing the distribution of the selected model, for example Gplot(), Hplot(), VDSM_scatterplot() and VDSM_heatmap(). To the best of our knowledge, this is the first attempt to visualize such a distribution. For what the distribution of the selected model is and how it works, please see Qin, Y. and Wang, L. (2021) "Visualization of Model Selection Uncertainty" <https://homepages.uc.edu/~qinyn/VDSM/VDSM.html>. |
Authors: | Linna Wang [aut, cre], Yichen Qin [aut] |
Maintainer: | Linna Wang <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.1 |
Built: | 2024-12-15 07:39:17 UTC |
Source: | CRAN |
Input a valid matrix
CheckInput(X, f, p)
X |
An m*p matrix in which each row represents one unique model. All elements are either 0 or 1. |
f |
A vector of m elements giving the frequency of each model in X. |
p |
The number of variates in the model. |
The standardized matrix
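The argument descriptions above imply a few conditions on X, f and p. The following base R sketch spells them out; `check_models` is a hypothetical helper written for illustration and is not the package's CheckInput() implementation, and the toy data are made up.

```r
# Hypothetical helper (an assumption, not the package's CheckInput()):
# checks the conditions stated in the argument descriptions.
check_models <- function(X, f, p) {
  X <- as.matrix(X)
  stopifnot(
    ncol(X) == p,            # one column per variate
    all(X %in% c(0, 1)),     # every element is 0 or 1
    length(f) == nrow(X),    # one frequency per model
    all(f >= 0),             # frequencies cannot be negative
    !anyDuplicated(X)        # each row is a unique model
  )
  invisible(TRUE)
}

# Made-up toy input: 3 unique models on p = 3 variates.
X_toy <- rbind(c(1, 1, 0), c(1, 0, 0), c(1, 1, 1))
f_toy <- c(60, 25, 15)
check_models(X_toy, f_toy, p = 3)
```

A matrix with a repeated row, or a frequency vector of the wrong length, makes the helper stop with an error.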
DSM_plot plots a naive visualization of the distribution of the selected model
DSM_plot( X, f, p, Anchor.model = NULL, circlesize = NULL, linewidth = NULL, fontsize = NULL )
X |
An m*p matrix containing m different p-dimensional models. All elements are either 0 or 1. |
f |
A vector of m elements giving the frequency of each model in X. |
p |
The number of variates in the model. |
Anchor.model |
A vector of p elements, each either 0 or 1, that must match a row of X. Defaults to the model with the highest frequency. |
circlesize |
The size of the circles in the plot. Default is 10. |
linewidth |
The width of the lines in the plot. Default is 1. |
fontsize |
The font size of the text in the circles. Default is 1.5. |
Summary information about the grouped models.
data(exampleX)
X <- exampleX
data(examplef)
f <- examplef
p <- 8
DSM_example1 <- DSM_plot(X, f, p)
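The Anchor.model argument defaults to the model with the highest frequency and must otherwise match a row of X. A minimal base R sketch of both rules, using made-up toy data rather than the package's internal code:

```r
# Made-up toy input: 4 unique models on p = 4 variates.
X_toy <- rbind(c(1, 1, 0, 0), c(1, 0, 0, 0), c(1, 1, 1, 0), c(1, 1, 0, 1))
f_toy <- c(50, 20, 20, 10)

# Default anchor: the row of X with the largest frequency.
anchor <- X_toy[which.max(f_toy), ]

# A user-supplied anchor must match some row of X exactly.
candidate <- c(1, 0, 0, 0)
candidate_in_X <- any(apply(X_toy, 1, function(row) all(row == candidate)))
```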
This small data set contains the frequencies of the m=30 models in the exampleX data set.
examplef
A vector containing the model frequencies f.
This small data set contains m=30 unique models and p=8 variates.
exampleX
A matrix containing the models X.
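For readers preparing their own input, the sketch below shows one way to collapse repeated selection outcomes into a matrix of unique models and a matching frequency vector. The `selected` matrix is made-up data standing in for, say, the models chosen across bootstrap replicates, and using raw counts as frequencies is an assumption; the package documentation does not say whether counts or proportions are expected.

```r
# Made-up selection outcomes: one row per replicate, p = 3 variates.
selected <- rbind(
  c(1, 1, 0), c(1, 1, 0), c(1, 0, 0),
  c(1, 1, 1), c(1, 1, 0), c(1, 0, 0)
)

# Encode each row as a string so identical models share a key.
keys   <- apply(selected, 1, paste, collapse = "")
counts <- table(keys)

# Keep the first occurrence of each model and look up its count.
first_rows <- !duplicated(keys)
X_toy <- selected[first_rows, , drop = FALSE]
f_toy <- as.vector(counts[keys[first_rows]])
p_toy <- ncol(X_toy)
```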
Plotting Gplot.
Gplot( X, f, p, Anchor.model = NULL, xlim = NULL, ylim = NULL, circlesize = NULL, linewidth = NULL, fontsize = NULL )
X |
An m*p matrix containing m different p-dimensional models. All elements are either 0 or 1. |
f |
A vector of m elements giving the frequency of each model in X. |
p |
The number of variates in the model. |
Anchor.model |
A vector of p elements, each either 0 or 1, that must match a row of X. Defaults to the model with the highest frequency. |
xlim |
A vector of two elements giving the range of the x-axis in the plot. |
ylim |
A vector of two elements giving the range of the y-axis in the plot. |
circlesize |
The size of the circles in the plot. Default is 10. |
linewidth |
The width of the lines in the plot. Default is 1. |
fontsize |
The font size of the text in the circles. Default is 1.5. |
A list with components
Gplot.info |
A table summarizing each group, i.e., the total number of possible models in the group and the number of models actually present in it. |
MC.histogram |
The frequency of model complexity. |
HD.histogram |
The frequency of Hamming distance. |
data(exampleX)
X <- exampleX
data(examplef)
f <- examplef
p <- 8
G_example1 <- Gplot(X, f, p)
G_example2 <- Gplot(X, f, p, xlim = c(0, 7), ylim = c(3, 8))
G_example3 <- Gplot(X, f, p, xlim = c(0, 7), ylim = c(3, 8), circlesize = 15, linewidth = 2, fontsize = 3)
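The MC.histogram and HD.histogram components summarize model complexity and Hamming distance. As a rough illustration, the base R sketch below computes both quantities on made-up toy data, taking model complexity to be the number of selected variates and Hamming distance to be the number of positions in which a model differs from the anchor. These readings are assumptions, and the code is not the package's implementation.

```r
# Made-up toy input: 4 unique models on p = 4 variates.
X_toy <- rbind(c(1, 1, 0, 0), c(1, 0, 0, 0), c(1, 1, 1, 0), c(1, 1, 0, 1))
f_toy <- c(50, 20, 20, 10)
anchor <- X_toy[which.max(f_toy), ]   # highest-frequency model

# Hamming distance: number of positions that differ from the anchor.
hd <- rowSums(sweep(X_toy, 2, anchor, "!="))

# Model complexity: number of variates included in each model.
mc <- rowSums(X_toy)

# Total frequency at each distinct value of each quantity.
hd_freq <- tapply(f_toy, hd, sum)
mc_freq <- tapply(f_toy, mc, sum)
```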
Group the given models
Groupinfo(X, f, p, Anchor.model = NULL)
X |
An m*p matrix containing m different p-dimensional models. All elements are either 0 or 1. |
f |
A vector of m elements giving the frequency of each model in X. |
p |
The number of variates in the model. |
Anchor.model |
A vector of p elements, each either 0 or 1, that must match a row of X. Defaults to the model with the highest frequency. |
Summary information about the grouped models.
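To make the idea of grouped models concrete, the sketch below groups made-up toy models by the pair (Hamming distance to the anchor, model complexity) and reports, for each group, the total frequency, the number of models present, and the number of models that could possibly fall in the group. Grouping by this pair is an assumption about what Groupinfo() does, and the column names are illustrative only.

```r
# Made-up toy input: 4 unique models on p = 4 variates.
X_toy <- rbind(c(1, 1, 0, 0), c(1, 0, 0, 0), c(1, 1, 1, 0), c(1, 1, 0, 1))
f_toy <- c(50, 20, 20, 10)
p <- ncol(X_toy)
anchor <- X_toy[which.max(f_toy), ]
a <- sum(anchor)                      # complexity of the anchor model

df <- data.frame(
  hd = rowSums(sweep(X_toy, 2, anchor, "!=")),  # Hamming distance
  mc = rowSums(X_toy),                          # model complexity
  freq = f_toy,
  existing = 1                                  # counts models per group
)

# One row per (hd, mc) group with summed frequency and model count.
groups <- aggregate(cbind(freq, existing) ~ hd + mc, data = df, FUN = sum)

# A model in group (hd, mc) adds `added` variates and drops `dropped`
# variates relative to the anchor, which fixes the number of possible models.
added   <- (groups$hd + groups$mc - a) / 2
dropped <- (groups$hd - groups$mc + a) / 2
groups$possible <- choose(a, dropped) * choose(p - a, added)
```

In this toy case the group with distance 1 and complexity 3 holds both of its 2 possible models, while the group with distance 1 and complexity 1 holds only 1 of 2.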
Plotting Hplot.
Hplot( X, f, p, Anchor.model = NULL, xlim = NULL, ylim = NULL, circlesize = NULL, linewidth = NULL, fontsize = NULL )
X |
An m*p matrix containing m different p-dimensional models. All elements are either 0 or 1. |
f |
A vector of m elements giving the frequency of each model in X. |
p |
The number of variates in the model. |
Anchor.model |
A vector of p elements, each either 0 or 1, that must match a row of X. Defaults to the model with the highest frequency. |
xlim |
A vector of two elements giving the range of the x-axis in the plot. |
ylim |
A vector of two elements giving the range of the y-axis in the plot. |
circlesize |
The size of the circles in the plot. Default is 10. |
linewidth |
The width of the lines in the plot. Default is 1. |
fontsize |
The font size of the text in the circles. Default is 1.5. |
A list with components
Hplot.info |
A table summarizing each group, i.e., the total number of possible models in the group and the number of models actually present in it. |
Hplus.histogram |
The frequency of Hamming distance plus. |
Hminus.histogram |
The frequency of Hamming distance minus. |
data(exampleX)
X <- exampleX
data(examplef)
f <- examplef
p <- 8
H_example1 <- Hplot(X, f, p)
H_example2 <- Hplot(X, f, p, xlim = c(0, 4), ylim = c(0, 2))
H_example3 <- Hplot(X, f, p, xlim = c(0, 4), ylim = c(0, 2), circlesize = 15, linewidth = 2, fontsize = 3)
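The help page does not define "Hamming distance plus" and "Hamming distance minus". The sketch below assumes that "plus" counts variates included in a model but absent from the anchor, and "minus" counts variates in the anchor that the model leaves out, so that the two sum to the ordinary Hamming distance. This reading is an assumption, and the toy data are made up.

```r
# Made-up toy input: 4 unique models on p = 4 variates.
X_toy <- rbind(c(1, 1, 0, 0), c(1, 0, 0, 0), c(1, 1, 1, 0), c(1, 1, 0, 1))
f_toy <- c(50, 20, 20, 10)
anchor <- X_toy[which.max(f_toy), ]

# Assumed "plus": variates in the model that the anchor excludes.
h_plus <- as.vector(X_toy %*% (1 - anchor))

# Assumed "minus": anchor variates that the model excludes.
h_minus <- as.vector((1 - X_toy) %*% anchor)

# Total frequency at each value, analogous to the two histograms.
plus_freq  <- tapply(f_toy, h_plus, sum)
minus_freq <- tapply(f_toy, h_minus, sum)
```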
Plotting the VDSM-heatmap.
VDSM_heatmap( X, f, p, Anchor.estimate, xlim = NULL, ylim = NULL, Anchor.model = NULL, fontsize = NULL )
X |
An m*p matrix containing m different p-dimensional models. All elements are either 0 or 1. |
f |
A vector of m elements giving the frequency of each model in X. |
p |
The number of variates in the model. |
Anchor.estimate |
An estimate for the anchor model. |
xlim |
A vector of two elements giving the range of the x-axis in the plot. |
ylim |
A vector of two elements giving the range of the y-axis in the plot. |
Anchor.model |
A vector of p elements, each either 0 or 1, that must match a row of X. Defaults to the model with the highest frequency. |
fontsize |
The font size of the text in the circles. Default is 1.5. |
A list with components
Heatmap.info |
A table summarizing each group, i.e., the total number of possible models in the group and the number of models actually present in it. |
Hplus.histogram |
The frequency of Hamming distance plus. |
Hminus.weighted.histogram |
The frequency of Hamming distance minus-weighted. |
data(exampleX)
X <- exampleX
data(examplef)
f <- examplef
p <- 8
Anchor.estimate <- c(3, 2.5, 2, 1.5, 1, 0, 0, 0)
Heatmap_example1 <- VDSM_heatmap(X, f, p, Anchor.estimate)
Heatmap_example2 <- VDSM_heatmap(X, f, p, Anchor.estimate, fontsize = 3)
Heatmap_example3 <- VDSM_heatmap(X, f, p, Anchor.estimate, xlim = c(0, 5), ylim = c(0, 5), fontsize = 3)
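How the "minus-weighted" distance uses Anchor.estimate is not stated on this page. One plausible reading, sketched below purely as an assumption, is that each anchor variate dropped by a model contributes the absolute value of its estimate instead of 1. The package may use a different weighting; the toy data and the `anchor_estimate` values are made up.

```r
# Made-up toy input: 4 unique models on p = 4 variates.
X_toy <- rbind(c(1, 1, 0, 0), c(1, 0, 0, 0), c(1, 1, 1, 0), c(1, 1, 0, 1))
f_toy <- c(50, 20, 20, 10)
anchor <- X_toy[which.max(f_toy), ]

# Hypothetical estimates for the anchor model (zero for excluded variates).
anchor_estimate <- c(3, 1.5, 0, 0)

# Assumed weights: absolute estimate for each variate in the anchor.
w <- anchor * abs(anchor_estimate)

# Assumed weighted "minus" distance: total weight of dropped anchor variates.
h_minus_weighted <- as.vector((1 - X_toy) %*% w)

# Total frequency at each distinct weighted distance.
weighted_freq <- tapply(f_toy, h_minus_weighted, sum)
```

Under this assumption, dropping a variate with a large estimate moves a model further from the anchor than dropping one with a small estimate.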
Report VDSM scatter-heatmap information
VDSM_scatter_heat(X, f, p, Anchor.estimate, Anchor.model = NULL)
X |
An m*p matrix containing m different p-dimensional models. All elements are either 0 or 1. |
f |
A vector of m elements giving the frequency of each model in X. |
p |
The number of variates in the model. |
Anchor.estimate |
An estimate for the anchor model. |
Anchor.model |
A vector of p elements, each either 0 or 1, that must match a row of X. Defaults to the model with the highest frequency. |
A list of information used to plot the VDSM scatterplot and heatmap.
Plotting the VDSM-Scatterplot.
VDSM_scatterplot( X, f, p, Anchor.estimate, xlim = NULL, ylim = NULL, Anchor.model = NULL, circlesize = NULL, fontsize = NULL )
X |
An m*p matrix containing m different p-dimensional models. All elements are either 0 or 1. |
f |
A vector of m elements giving the frequency of each model in X. |
p |
The number of variates in the model. |
Anchor.estimate |
An estimate for the anchor model. |
xlim |
A vector of two elements giving the range of the x-axis in the plot. |
ylim |
A vector of two elements giving the range of the y-axis in the plot. |
Anchor.model |
A vector of p elements, each either 0 or 1, that must match a row of X. Defaults to the model with the highest frequency. |
circlesize |
The size of the circles in the plot. Default is 10. |
fontsize |
The font size of the text in the circles. Default is 1.5. |
A list with components
Scatterplot.info |
A table summarizing each group, i.e., the total number of possible models in the group and the number of models actually present in it. |
Hplus.histogram |
The frequency of Hamming distance plus. |
Hminus.weighted.histogram |
The frequency of Hamming distance minus-weighted. |
data(exampleX)
X <- exampleX
data(examplef)
f <- examplef
p <- 8
Anchor.estimate <- c(3, 2.5, 2, 1.5, 1, 0, 0, 0)
Scatter_example1 <- VDSM_scatterplot(X, f, p, Anchor.estimate)
Scatter_example2 <- VDSM_scatterplot(X, f, p, Anchor.estimate, xlim = c(0, 5), ylim = c(0, 8), circlesize = 15, fontsize = 2)