Package 'VDSM'

Title: Visualization of Distribution of Selected Model
Description: Although model selection is ubiquitous in scientific discovery, the stability and uncertainty of the selected model are often hard to evaluate. Characterizing the random behavior of the model selection procedure is key to understanding and quantifying model selection uncertainty. This R package offers several graphical tools to visualize the distribution of the selected model, for example Gplot(), Hplot(), VDSM_scatterplot() and VDSM_heatmap(). To the best of our knowledge, this is the first attempt to visualize such a distribution. For what the distribution of the selected model is and how it works, please see Qin, Y. and Wang, L. (2021) "Visualization of Model Selection Uncertainty" <https://homepages.uc.edu/~qinyn/VDSM/VDSM.html>.
Authors: Linna Wang [aut, cre], Yichen Qin [aut]
Maintainer: Linna Wang <[email protected]>
License: GPL (>= 2)
Version: 0.1.1
Built: 2024-10-16 06:37:48 UTC
Source: CRAN

Help Index


Check if the input is valid or not

Description

Checks whether the input is a valid model matrix and returns it in standardized form.

Usage

CheckInput(X, f, p)

Arguments

X

An m*p matrix in which each row represents one unique model. All elements are either 0 or 1.

f

A vector of m elements containing the frequency of each model in X.

p

The number of variates in the model.

Value

The standardized matrix
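
The requirements listed under Arguments can be sketched as explicit checks. The helper below, check_model_input, is hypothetical and is not part of VDSM; it only illustrates, in base R, the kind of validation the argument descriptions imply, and it makes no claim about what CheckInput does internally or what "standardized" means.

```r
# Hypothetical helper (not part of VDSM): basic checks on X, f and p,
# derived only from the argument descriptions above.
check_model_input <- function(X, f, p) {
  X <- as.matrix(X)
  stopifnot(
    ncol(X) == p,          # one column per variate
    all(X %in% c(0, 1)),   # every element is 0 or 1
    length(f) == nrow(X),  # one frequency per model
    all(f >= 0),           # frequencies cannot be negative
    !any(duplicated(X))    # each row is a unique model
  )
  X
}

# Two models on three variates, with frequencies 7 and 3.
X_demo <- rbind(c(1, 0, 1), c(1, 1, 0))
f_demo <- c(7, 3)
checked <- check_model_input(X_demo, f_demo, p = 3)
```

Calling the helper with a mismatched p, a non-binary entry, or a repeated row stops with an error, so problems surface before any plotting function is called.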


DSM_plot plots the naive visualization of the distribution of the selected model

Description

DSM_plot plots the naive visualization of the distribution of the selected model.

Usage

DSM_plot(
  X,
  f,
  p,
  Anchor.model = NULL,
  circlesize = NULL,
  linewidth = NULL,
  fontsize = NULL
)

Arguments

X

An m*p matrix containing m different p-dimensional models. All elements are either 0 or 1.

f

A vector of m elements giving the frequency of each model in X.

p

The number of variates in the model.

Anchor.model

A vector of p elements, each either 1 or 0, which must appear as a row of X. Default is the model with the highest frequency.

circlesize

The size of the circles in the plot. Default is 10.

linewidth

The width of the lines in the plot. Default is 1.

fontsize

The size of the font in the circles. Default is 1.5.

Value

A summary of the grouped models.

Examples

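library(VDSM)
# Load the example models and their frequencies
data(exampleX)
X <- exampleX
data(examplef)
f <- examplef
p <- 8  # number of variates
# Plot with the default anchor model and default sizes
DSM_example1 <- DSM_plot(X, f, p)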

examplef

Description

This small data set contains the frequencies of the m=30 models in the exampleX data set.

Usage

examplef

Format

A vector containing the model frequencies f.


exampleX

Description

This small data set contains m=30 unique models and p=8 variates.

Usage

exampleX

Format

A matrix containing the models X.
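
For readers who start from raw selection results rather than the bundled data, the following base R sketch shows one way a matrix of unique models and a matching frequency vector could be assembled. The object `selected` is made-up illustration data, and using raw counts as frequencies is an assumption; the source does not say whether f should hold counts or proportions.

```r
# Hypothetical raw output: one row per replicate (for example, one per
# bootstrap sample), one column per variate; 1 means the variate was selected.
selected <- rbind(
  c(1, 1, 0, 0),
  c(1, 1, 0, 0),
  c(1, 0, 1, 0),
  c(1, 1, 0, 0),
  c(1, 0, 1, 0),
  c(1, 1, 1, 0)
)

# Encode each row as a string so that identical models can be counted.
keys <- apply(selected, 1, paste, collapse = "")
counts <- table(keys)

# Unique models (rows of X) and their counts (f), in matching order.
first_seen <- !duplicated(keys)
X_demo <- selected[first_seen, , drop = FALSE]
f_demo <- as.vector(counts[keys[first_seen]])
p_demo <- ncol(X_demo)
```

Here X_demo has three rows, one per distinct model, and f_demo sums to the number of replicates.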


Gplot.

Description

Plots the Gplot.

Usage

Gplot(
  X,
  f,
  p,
  Anchor.model = NULL,
  xlim = NULL,
  ylim = NULL,
  circlesize = NULL,
  linewidth = NULL,
  fontsize = NULL
)

Arguments

X

An m*p matrix containing m different p-dimensional models. All elements are either 0 or 1.

f

A vector of m elements giving the frequency of each model in X.

p

The number of variates in the model.

Anchor.model

A vector of p elements, each either 1 or 0, which must appear as a row of X. Default is the model with the highest frequency.

xlim

A vector of two elements giving the range of the x-axis in the plot.

ylim

A vector of two elements giving the range of the y-axis in the plot.

circlesize

The size of the circles in the plot. Default is 10.

linewidth

The width of the lines in the plot. Default is 1.

fontsize

The size of the font in the circles. Default is 1.5.

Value

A list with components

Gplot.info

A table with information about each group, i.e., the total possible number of models in the group and the number of models actually present in the group.

MC.histogram

The frequency of model complexity.

HD.histogram

The frequency of Hamming distance.
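
To make the two quantities concrete, here is a minimal base R sketch that computes model complexity and Hamming distance to the anchor model on a small made-up example. X_demo and f_demo are illustration data, and tallying total frequency per value is an assumption about what "frequency" means for these histograms; the package's own computation may differ.

```r
# Made-up example: four models on four variates, with their frequencies.
X_demo <- rbind(
  c(1, 1, 0, 0),
  c(1, 0, 1, 0),
  c(1, 1, 1, 0),
  c(0, 1, 0, 0)
)
f_demo <- c(5, 2, 2, 1)

# Default anchor as described above: the model with the highest frequency.
anchor <- X_demo[which.max(f_demo), ]

# Model complexity: number of selected variates in each model.
mc <- rowSums(X_demo)
# Hamming distance: number of positions that differ from the anchor.
hd <- rowSums(sweep(X_demo, 2, anchor, FUN = "!="))

# ASSUMPTION: histograms tally total frequency at each value.
mc_hist <- tapply(f_demo, mc, sum)
hd_hist <- tapply(f_demo, hd, sum)
```

The anchor itself always has Hamming distance 0, so the first bar of hd_hist is at least as large as the anchor model's frequency.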

Examples

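library(VDSM)
# Load the example models and their frequencies
data(exampleX)
X <- exampleX
data(examplef)
f <- examplef
p <- 8  # number of variates
# Default anchor model, axis ranges and sizes
G_example1 <- Gplot(X, f, p)
# Restrict the axis ranges
G_example2 <- Gplot(X, f, p, xlim = c(0, 7), ylim = c(3, 8))
# Restrict the axis ranges and enlarge circles, lines and font
G_example3 <- Gplot(X, f, p, xlim = c(0, 7), ylim = c(3, 8),
                    circlesize = 15, linewidth = 2, fontsize = 3)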

Group the models according to their Hamming distance and model complexity relative to the anchor model

Description

Group the given models

Usage

Groupinfo(X, f, p, Anchor.model = NULL)

Arguments

X

An m*p matrix containing m different p-dimensional models. All elements are either 0 or 1.

f

A vector of m elements giving the frequency of each model in X.

p

The number of variates in the model.

Anchor.model

A vector of p elements, each either 1 or 0, which must appear as a row of X. Default is the model with the highest frequency.

Value

A summary of the grouped models.
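
As an illustration of grouping by Hamming distance and model complexity, the base R sketch below builds one row per group on made-up data. The column names (n_existing, frequency, n_possible) are hypothetical and need not match the output of Groupinfo. The count of possible models is a combinatorial consequence of the grouping, not something stated in the source: a model that adds a variates to an anchor of size s and drops d of them has Hamming distance a + d and complexity s + a - d, so there are choose(p - s, a) * choose(s, d) such models.

```r
# Made-up example: five models on four variates, with their frequencies.
X_demo <- rbind(
  c(1, 1, 0, 0),
  c(1, 0, 1, 0),
  c(1, 1, 1, 0),
  c(0, 1, 0, 0),
  c(1, 1, 0, 1)
)
f_demo <- c(5, 2, 2, 1, 1)
p_demo <- ncol(X_demo)
anchor <- X_demo[which.max(f_demo), ]  # default anchor: most frequent model
s <- sum(anchor)                       # complexity of the anchor model

d <- data.frame(
  hd = rowSums(sweep(X_demo, 2, anchor, FUN = "!=")),  # Hamming distance
  mc = rowSums(X_demo),                                # model complexity
  n_existing = 1,
  frequency = f_demo
)

# One row per (Hamming distance, model complexity) group.
groups <- aggregate(cbind(n_existing, frequency) ~ hd + mc, data = d, FUN = sum)

# Recover the numbers of added and dropped variates, then count all
# models that could fall in each group.
added   <- (groups$hd + groups$mc - s) / 2
dropped <- (groups$hd - groups$mc + s) / 2
groups$n_possible <- choose(p_demo - s, added) * choose(s, dropped)
```

In this example the group with Hamming distance 1 and complexity 3 contains both models that add a single variate to the anchor, which is every possible model in that group.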


Hplot.

Description

Plots the Hplot.

Usage

Hplot(
  X,
  f,
  p,
  Anchor.model = NULL,
  xlim = NULL,
  ylim = NULL,
  circlesize = NULL,
  linewidth = NULL,
  fontsize = NULL
)

Arguments

X

An m*p matrix containing m different p-dimensional models. All elements are either 0 or 1.

f

A vector of m elements giving the frequency of each model in X.

p

The number of variates in the model.

Anchor.model

A vector of p elements, each either 1 or 0, which must appear as a row of X. Default is the model with the highest frequency.

xlim

A vector of two elements giving the range of the x-axis in the plot.

ylim

A vector of two elements giving the range of the y-axis in the plot.

circlesize

The size of the circles in the plot. Default is 10.

linewidth

The width of the lines in the plot. Default is 1.

fontsize

The size of the font in the circles. Default is 1.5.

Value

A list with components

Hplot.info

A table with information about each group, i.e., the total possible number of models in the group and the number of models actually present in the group.

Hplus.histogram

The frequency of Hamming distance plus.

Hminus.histogram

The frequency of Hamming distance minus.
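
The manual does not define "Hamming distance plus" and "Hamming distance minus". The sketch below assumes one natural reading: "plus" counts variates a model includes that the anchor model does not, and "minus" counts variates the anchor includes that the model drops. This reading, the made-up data, and the frequency-weighted tallies are all assumptions; consult the cited paper for the authors' definitions.

```r
# Made-up example: four models on four variates, with their frequencies.
X_demo <- rbind(
  c(1, 1, 0, 0),
  c(1, 0, 1, 0),
  c(1, 1, 1, 0),
  c(0, 1, 0, 0)
)
f_demo <- c(5, 2, 2, 1)
anchor <- X_demo[which.max(f_demo), ]  # default anchor: most frequent model

# ASSUMPTION: "plus" = variates added relative to the anchor,
#             "minus" = variates dropped relative to the anchor.
h_plus  <- as.vector(X_demo %*% (1 - anchor))
h_minus <- as.vector((1 - X_demo) %*% anchor)

# Under this reading the two parts add up to the ordinary Hamming distance.
hd <- rowSums(sweep(X_demo, 2, anchor, FUN = "!="))
stopifnot(all(h_plus + h_minus == hd))

# ASSUMPTION: histograms tally total frequency at each value.
hplus_hist  <- tapply(f_demo, h_plus, sum)
hminus_hist <- tapply(f_demo, h_minus, sum)
```

Splitting the distance this way separates models that overfit relative to the anchor from models that underfit, which a single Hamming distance cannot distinguish.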

Examples

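library(VDSM)
# Load the example models and their frequencies
data(exampleX)
X <- exampleX
data(examplef)
f <- examplef
p <- 8  # number of variates
# Default anchor model, axis ranges and sizes
H_example1 <- Hplot(X, f, p)
# Restrict the axis ranges
H_example2 <- Hplot(X, f, p, xlim = c(0, 4), ylim = c(0, 2))
# Restrict the axis ranges and enlarge circles, lines and font
H_example3 <- Hplot(X, f, p, xlim = c(0, 4), ylim = c(0, 2),
                    circlesize = 15, linewidth = 2, fontsize = 3)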

VDSM-heatmap.

Description

Plots the VDSM-heatmap.

Usage

VDSM_heatmap(
  X,
  f,
  p,
  Anchor.estimate,
  xlim = NULL,
  ylim = NULL,
  Anchor.model = NULL,
  fontsize = NULL
)

Arguments

X

An m*p matrix containing m different p-dimensional models. All elements are either 0 or 1.

f

A vector of m elements giving the frequency of each model in X.

p

The number of variates in the model.

Anchor.estimate

A numeric vector giving the estimate for the anchor model.

xlim

A vector of two elements giving the range of the x-axis in the plot.

ylim

A vector of two elements giving the range of the y-axis in the plot.

Anchor.model

A vector of p elements, each either 1 or 0, which must appear as a row of X. Default is the model with the highest frequency.

fontsize

The size of the font in the circles. Default is 1.5.

Value

A list with components

Heatmap.info

A table with information about each group, i.e., the total possible number of models in the group and the number of models actually present in the group.

Hplus.histogram

The frequency of Hamming distance plus.

Hminus.weighted.histogram

The frequency of Hamming distance minus-weighted.

Examples

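library(VDSM)
# Load the example models and their frequencies
data(exampleX)
X <- exampleX
data(examplef)
f <- examplef
p <- 8  # number of variates
# Estimate for the anchor model, one value per variate
Anchor.estimate <- c(3, 2.5, 2, 1.5, 1, 0, 0, 0)
# Default axis ranges and font size
Heatmap_example1 <- VDSM_heatmap(X, f, p, Anchor.estimate)
# Larger font
Heatmap_example2 <- VDSM_heatmap(X, f, p, Anchor.estimate, fontsize = 3)
# Larger font and restricted axis ranges
Heatmap_example3 <- VDSM_heatmap(X, f, p, Anchor.estimate,
                                 xlim = c(0, 5), ylim = c(0, 5), fontsize = 3)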

VDSM-Scatter-heatmap-info

Description

Reports the VDSM-Scatter-heatmap information.

Usage

VDSM_scatter_heat(X, f, p, Anchor.estimate, Anchor.model = NULL)

Arguments

X

An m*p matrix containing m different p-dimensional models. All elements are either 0 or 1.

f

A vector of m elements giving the frequency of each model in X.

p

The number of variates in the model.

Anchor.estimate

A numeric vector giving the estimate for the anchor model.

Anchor.model

A vector of p elements, each either 1 or 0, which must appear as a row of X. Default is the model with the highest frequency.

Value

A list of information which helps to plot VDSM-Scatter-heatmap.


VDSM-Scatterplot.

Description

Plots the VDSM-Scatterplot.

Usage

VDSM_scatterplot(
  X,
  f,
  p,
  Anchor.estimate,
  xlim = NULL,
  ylim = NULL,
  Anchor.model = NULL,
  circlesize = NULL,
  fontsize = NULL
)

Arguments

X

An m*p matrix containing m different p-dimensional models. All elements are either 0 or 1.

f

A vector of m elements giving the frequency of each model in X.

p

The number of variates in the model.

Anchor.estimate

A numeric vector giving the estimate for the anchor model.

xlim

A vector of two elements giving the range of the x-axis in the plot.

ylim

A vector of two elements giving the range of the y-axis in the plot.

Anchor.model

A vector of p elements, each either 1 or 0, which must appear as a row of X. Default is the model with the highest frequency.

circlesize

The size of the circles in the plot. Default is 10.

fontsize

The size of the font in the circles. Default is 1.5.

Value

A list with components

Scatterplot.info

A table with information about each group, i.e., the total possible number of models in the group and the number of models actually present in the group.

Hplus.histogram

The frequency of Hamming distance plus.

Hminus.weighted.histogram

The frequency of Hamming distance minus-weighted.

Examples

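library(VDSM)
# Load the example models and their frequencies
data(exampleX)
X <- exampleX
data(examplef)
f <- examplef
p <- 8  # number of variates
# Estimate for the anchor model, one value per variate
Anchor.estimate <- c(3, 2.5, 2, 1.5, 1, 0, 0, 0)
# Default axis ranges and sizes
Scatter_example1 <- VDSM_scatterplot(X, f, p, Anchor.estimate)
# Restricted axis ranges, larger circles and font
Scatter_example2 <- VDSM_scatterplot(X, f, p, Anchor.estimate, xlim = c(0, 5),
                                     ylim = c(0, 8), circlesize = 15, fontsize = 2)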