Package 'ganDataModel'

Title: Build a Metric Subspaces Data Model for a Data Source
Description: Neural networks are applied to create a density value function which approximates density values for a data source. The trained neural network is analyzed for different levels. For each level metric subspaces with density values above a level are determined. The obtained set of metric subspaces and the trained neural network are assembled into a data model. A prerequisite is the definition of a data source, the generation of generative data and the calculation of density values. These tasks are executed using package 'ganGenerativeData' <https://cran.r-project.org/package=ganGenerativeData>.
Authors: Werner Mueller
Maintainer: Werner Mueller <[email protected]>
License: GPL (>= 2)
Version: 1.1.7
Built: 2024-11-18 06:38:33 UTC
Source: CRAN

Help Index


Build a Metric Subspaces Data Model for a Data Source

Description

Neural networks are applied to create a density value function which approximates density values for a data source. The trained neural network is analyzed for different levels. For each level metric subspaces with density values above a level are determined. The obtained set of metric subspaces and the trained neural network are assembled into a data model. A prerequisite is the definition of a data source, the generation of generative data and the calculation of density values. These tasks are executed using package 'ganGenerativeData' <https://cran.r-project.org/package=ganGenerativeData>.

Properties of built metric subspaces:

1. They contain data with continuously varying density values above a level.
2. They have the topological property connected. In topology a space is connected when it cannot be represented as the union of disjoint open subspaces.
3. An inclusion relation is defined on them by levels. Higher level metric subspaces are contained in lower level ones.

The inserted images show two-dimensional projections of generative data contained in metric subspaces with assigned labels for the iris dataset.





dm12.pngdm34.pngdm12c.pngdm34c.png

Details

The API includes main functions dmTrain() and dmBuildMetricSubspaces(). dmTrain() trains a neural network that approximates density values for a data source. dmBuildMetricSubspaces() analyzes the trained neural network for a level and determines metric subspaces with density values above a level. The API is used as follows:

1. Prerequisite for building a metric subspaces data model: Create a data source, generate generative data and calculate density values using package ganGenerativeData

dsCreateWithDataFrame() Create a data source with passed data frame.

dsDeactivateColumns() Deactivate columns of a data source in order to exclude them in generation of generative data. In current version only columns with values of type double or float can be used in generation of generative data. All columns with values of other type have to be deactivated.

dsWrite() Write created data source including settings of active columns to a file in binary format.

gdGenerate() Read a data source from a file, generate generative data for the data source in iterative training steps and write generated data to a file in binary format.

gdCalculateDensityValues() Read generative data from a file, calculate density values and write generative data with assigned density values to original file.

2. Build a metric subspaces data model

dmTrain() Read a data source and generative data from files, train a neural network which approximates density values for a data source in iterative training steps, create a data model containing the trained neural network and write it to a file in binary format.

dmBuildMetricSubspaces() Read a data model and generative data from files, analyze the trained neural network in the data model for a level, determine metric subspaces with density values above a level, add obtained metric subspaces to the data model and write it to original file.

dmRemoveMetricSubspaces() Remove metric subspaces in a data model for a level.

dmRead() Read a data model and generative data from files.

dmGetLevels() Get levels for metric subspaces in a data model.

dmGetMetricSubspacesProperties() Get metric subspace properties in a data model for a level.

dmGetContainedInMetricSubspaces() Get metric subspaces in a data model in which a data record is contained.

dmPlotMetricSubspaceParameters() Specify plot parameters for metric subspaces for a level.

dmPlotEvaluateDataSourceParameters() Specify plot parameters for evaluated data source.

dmPlotMetricSubspaces() Create an image file containing two-dimensional projections of generative data contained in metric subspaces and evaluated data source.

dmReset() Reset API.

Author(s)

Werner Mueller

Maintainer: Werner Mueller <[email protected]>

References

Package 'ganGenerativeData' <https://cran.r-project.org/package=ganGenerativeData>

Examples

# Environment used for execution of examples:

# Operating system: Ubuntu 22.04.1
# Compiler: g++ 11.3.0 (supports C++17 standard)
# R applications: R 4.1.2, RStudio 2022.02.2
# Installed packages: 'Rcpp' 1.0.11, 'tensorflow' 2.11.0,
# 'ganGenerativeData' 2.0.2, 'ganDataModel' 1.1.7

# Package 'tensorflow' provides an interface to machine learning framework
# TensorFlow. To complete the installation function install_tensorflow() has to
# be called.
## Not run: 
library(tensorflow)
install_tensorflow()
## End(Not run)

# 1. Prerequisite for building a metric subspaces data model for the iris
# dataset: Create a data source, generate generative data and calculate density
# values for the iris dataset.

# Load library
## Not run: 
library(ganGenerativeData)
## End(Not run)

# Create a data source with passed iris data frame.
## Not run: 
dsCreateWithDataFrame(iris)
## End(Not run)

# Deactivate the column with index 5 and name Species in order to exclude it in
# generation of generative data.
## Not run: 
dsDeactivateColumns(c(5))
## End(Not run)

# Write the data source including settings of active columns to file "ds.bin" in
# binary format.
## Not run: 
dsWrite("ds.bin")
## End(Not run)

# Read data source from file "ds.bin", train a generative model in iterative
# training steps (used number of iterations in tests is in the range of 10000 to
# 50000), write trained generative model and generated data in training steps to
# files "gm.bin" and "gd.bin".
## Not run: 
gdTrain("gm.bin", "gd.bin", "ds.bin", c(1, 2), gdTrainParameters(1000))
## End(Not run)
# Read generative data from file "gd.bin", calculate density values and
# write generative data with density values to original file.
## Not run: 
gdCalculateDensityValues("gd.bin")
## End(Not run)

# 2. Build a metric subspaces data model for the iris data set

# Load  library
## Not run: 
library(ganDataModel)
## End(Not run)

# Read a data source and generative data from files "ds.bin" and "gd.bin",
# train a neural network which approximates density values for a data source
# in iterative training steps (used number of iterations in tests is in the
# range of 250000 to 300000), create a data model containing the trained neural
# network and write it to a file "dm.bin" in binary format.
## Not run: 
dmTrain("dm.bin", "ds.bin", "gd.bin", 10000)
## End(Not run)

# Read a data model and generative data from files "dm.bin" and "gd.bin",
# build metric subspaces for level 0.7,
# add obtained metric subspaces to the data model
# and write it to original file.
## Not run: 
dmBuildMetricSubspaces("dm.bin", 0.67, "gd.bin")
## End(Not run)

# Read a data model and generative data from files "dm.bin" and "gd,bin".
# Read in data is accessed in function dmPlotMetricSubspaces.
## Not run: 
dmRead("dm.bin", "gd.bin")
## End(Not run)

# Create an image showing a two-dimensional projection of generative data
# contained in metric subspaces fpr level 0.67 for column indices 3, 4 and write
# it to file "ms.png".
## Not run: 
dmPlotMetricSubspaces(
  list(dmPlotMetricSubspaceParameters(level = 0.67,
                                       labels = c("*"),
                                       percent = 100,
                                       boundary = TRUE,
                                       color = "red",
                                       backgroundPercent = 0,
                                       backgroundColor = "red",
                                       backgroundReset = TRUE,
                                       plotLabels = TRUE)),
  "msl.png",
  "Metric Subspaces for the Iris Dataset",
  c(3, 4),
  "ds.bin",
  dmPlotEvaluateDataSourceParameters(0.67))
## End(Not run)

# Read a data model and generative data from files "dm.bin" and "gd.bin",
# build metric subspaces for level 0.71,
# add obtained metric subspaces to the data model
# and write it to original file.
## Not run: 
dmBuildMetricSubspaces("dm.bin", 0.71, "gd.bin")
## End(Not run)

# Read a data model and generative data from files "dm.bin" and "gd,bin".
# Read in data is accessed in function dmPlotMetricSubspaces.
## Not run: 
dmRead("dm.bin", "gd.bin")
## End(Not run)

# Create an image showing a two-dimensional projection of generative data
# contained in metric subspaces for levels 0.67, 0.71 for column indices 3, 4
# and write it to file "msls.png".
## Not run: 
dmPlotMetricSubspaces(
  list(dmPlotMetricSubspaceParameters(level = 0.67,
                                       labels = c("*"),
                                       percent = 100,
                                       boundary = TRUE,
                                       color = "red",
                                       backgroundPercent = 0,
                                       backgroundColor = "red",
                                       backgroundReset = TRUE,
                                       plotLabels = TRUE),
       dmPlotMetricSubspaceParameters(level = 0.71,
                                       labels = c("*"),
                                       percent = 100,
                                       boundary = TRUE,
                                       color = "green",
                                       backgroundPercent = 5,
                                       backgroundColor = "red",
                                       backgroundReset = TRUE,
                                       plotLabels = TRUE)),                                       
  "msls.png",
  "Metric Subspaces for the Iris Dataset",
  c(3, 4),
  "ds.bin",
  dmPlotEvaluateDataSourceParameters(0.67))
## End(Not run)

Build metric subspaces for a level

Description

Read a data model and generative data from files, analyze the contained neural network in the data model for a level, determine metric subspaces with density values above a level, add obtained metric subspaces to the data model and write it to original file.

Usage

dmBuildMetricSubspaces(dataModelFileName, level, generativeDataFileName)

Arguments

dataModelFileName

Name of data model file

level

Level

generativeDataFileName

Name of generative data file

Value

None

Examples

## Not run: 
dmBuildMetricSubspaces("dm.bin", 0.7, "gd.bin")
## End(Not run)

Calculate a density value for a data record

Description

Calculate a density value for a data record by evaluating the contained neural network in a data model.

Usage

dmCalculateDensityValue(dataRecord)

Arguments

dataRecord

List containing a data record

Value

Normalized density value

Examples

## Not run: 
dmRead("dm.bin", "gd.bin")
dmCalculateDensityValue(list(4.4, 2.9, 1.4, 0.3))
## End(Not run)

Get metric subspaces in which a data record is contained

Description

Determine in which metric subspaces in a data model a data record is contained.

Usage

dmGetContainedInMetricSubspaces(dataRecord)

Arguments

dataRecord

List of a data record

Value

List of list containing level and label of metric subspaces

Examples

## Not run: 
dmRead("dm.bin", "gd.bin")
dmGetContainedInMetricSubspaces(list(4.4, 2.9, 1.4, 0.3))
## End(Not run)

Get levels for metric subspaces

Description

Get levels for metric subspaces in a data model.

Usage

dmGetLevels()

Value

Vector of levels

Examples

## Not run: 
dmRead("dm.bin", "gd.bin")
dmGetLevels()
## End(Not run)

Get metric subspace properties for a level

Description

Get properties of metric subspaces in a data model for a level.

Usage

dmGetMetricSubspaceProperties(level)

Arguments

level

Level for metric subspaces

Value

List of list containing label and size of contained generative data for metric subspaces

Examples

## Not run: 
dmRead("dm.bin", "gd.bin")
dmGetMetricSubspaceProperties(0.73)
## End(Not run)

Specify plot parameters for evaluated data source

Description

Specify plot parameters for evaluated data source passed to dmPlotMetricSubspaces().

Usage

dmPlotEvaluateDataSourceParameters(level = 0, color = "blue")

Arguments

level

Level for evaluation

color

Color for data points of evaluaded data source

Value

List of plot parameters for evaluated data source

Examples

## Not run: 
dmPlotEvaluateDataSourceParameters()
## End(Not run)

Specify plot parameters for metric subspaces for a level

Description

Specify plot parameters for metric subspaces in a data model for a level. A list of plot parameters is created for different levels and passed to dmPlotMetricSubspaces().

Usage

dmPlotMetricSubspaceParameters(
  level,
  labels = c("*"),
  percent = 10,
  boundary = TRUE,
  color = "red",
  backgroundPercent = 0,
  backgroundColor = "red",
  backgroundReset = TRUE,
  plotLabels = TRUE
)

Arguments

level

Level for metric subspaces.

labels

Vector of labels for metric subspaces. The default vector contains the wildcard character * which includes all labels.

percent

Percent of randomly selected data points of generative data contained in metric subspaces

boundary

Boolean value indicating if only data points of metric subspace boundaries should be selected

color

Color for data points of generative data contained in metric subspaces

backgroundPercent

Percent of randomly selected data points of generative data contained in metric subspaces for background

backgroundColor

Color for data points of generative data contained in metric subspaces for background

backgroundReset

Before data points for a metric subspace are drawn reset its background.

plotLabels

Boolean value indicating if labels for metric subspaces for a level should be displayed

Value

List of plot parameters for metric subspaces

Examples

## Not run: 
dmPlotMetricSubspaceParameters(0.73)
## End(Not run)

Create an image file for metric subspaces

Description

Create an image file containing two-dimensional projections of generative data contained in metric subspaces in a data model and optionally an evaluated data source. Plot parameters are passed by a list of generated plot parameters for different levels by dmPlotMetricSubspaceParameters() and by dmPlotEvaluateDataSourceParameters(). Data points are drawn in the order generative data contained in metric subspaces by increasing level and evaluated data source.

Usage

dmPlotMetricSubspaces(
  plotMetricSubspaceParametersList = list(),
  imageFileName,
  title,
  columnIndices,
  evaluateDataSourceFileName = "",
  plotEvaluateDataSourceParameters = NULL
)

Arguments

plotMetricSubspaceParametersList

List of plot parameters for metric subspaces for different levels, see dmPlotMetricSubspaceParameters().

imageFileName

Name of image file

title

Title of image

columnIndices

Vector of two column indices that are used for the two-dimensional projection. Indices refer to indices of active columns of the data source used to create the data model.

evaluateDataSourceFileName

Name of evaluated data source file

plotEvaluateDataSourceParameters

Plot parameters for evaluated data source, see dmPlotEvaluateDataSourceParameters().

Value

None

Examples

## Not run: 
dmRead("dm.bin", "gd.bin")
dmPlotMetricSubspaces(
  list(dmPlotMetricSubspaceParameters(level = 0.7,
                                       labels = c("*"),
                                       percent = 50,
                                       boundary = TRUE,
                                       color = "red",
                                       backgroundPercent = 0,
                                       backgroundColor = "red",
                                       backgroundReset = TRUE,
                                       plotLabels = TRUE)),
  "ms.png",
  "Metric Subspaces for the Iris Dataset",
  c(3, 4),
  "ds.bin",
  dmPlotEvaluateDataSourceParameters(0.67))
## End(Not run)

Read a data model and generative data

Description

Read a data model and generative data from files. This function has to be called before calling API functions when file names for a data model and gernerative data are not passed to functions directly.

Usage

dmRead(dataModelFileName, generativeDataFileName)

Arguments

dataModelFileName

Name of data model file

generativeDataFileName

Name of generative data file

Value

None

Examples

## Not run: 
dmRead("dm.bin", "gd.bin")
## End(Not run)

Remove metric subspaces for a level

Description

Read a data model from file, remove metric subspaces in the data model for a level and write it to original file.

Usage

dmRemoveMetricSubspaces(dataModelFileName, level)

Arguments

dataModelFileName

Name of data model file

level

Level

Value

None

Examples

## Not run: 
dmRemoveMetricSubspaces("dm.bin", 0.7)
## End(Not run)

Reset API

Description

Reset API

Usage

dmReset()

Value

None

Examples

## Not run: 
dmReset()
## End(Not run)

Train a neural network which approximates density values for a data source

Description

Read a data source and generative data from files, train a neural network which approximates density values for a data source in iterative training steps, create a data model containing the trained neural network and write it to a file in binary format.

Usage

dmTrain(
  dataModelFileName,
  dataSourceFileName,
  generativeDataFileName,
  numberOfIterations
)

Arguments

dataModelFileName

Name of data model file

dataSourceFileName

Name of data source file

generativeDataFileName

Name of generative data file

numberOfIterations

Number of iterations.

Value

None

Examples

## Not run: 
dmTrain("dm.bin", "ds.bin", "gd.bin", 10000)
## End(Not run)