Title: | Build a Metric Subspaces Data Model for a Data Source |
---|---|
Description: | Neural networks are applied to create a density value function which approximates density values for a data source. The trained neural network is analyzed for different levels. For each level, metric subspaces with density values above that level are determined. The obtained set of metric subspaces and the trained neural network are assembled into a data model. A prerequisite is the definition of a data source, the generation of generative data and the calculation of density values. These tasks are executed using package 'ganGenerativeData' <https://cran.r-project.org/package=ganGenerativeData>. |
Authors: | Werner Mueller |
Maintainer: | Werner Mueller |
License: | GPL (>= 2) |
Version: | 1.1.7 |
Built: | 2024-11-18 06:38:33 UTC |
Source: | CRAN |
Properties of built metric subspaces:
1. They contain data with continuously varying density values above a level.
2. They have the topological property of being connected. In topology, a space is connected when it cannot be represented as the union of two or more disjoint non-empty open subsets.
3. An inclusion relation is defined on them by levels: higher-level metric subspaces are contained in lower-level ones.
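The properties above can be illustrated with a small base-R sketch. It is not the algorithm used by 'ganDataModel' (which analyzes the trained neural network); it assumes a finite set of points with known density values and treats two points above a level as connected when their distance is at most a hypothetical radius `eps`. Each connected component then plays the role of one metric subspace, and the inclusion property can be checked directly.

```r
# Hypothetical sketch (not the package's algorithm): label the points whose
# density value exceeds 'level' by connected component. Two such points are
# linked when their Euclidean distance is at most 'eps' (an assumed radius).
levelComponents <- function(points, density, level, eps) {
  above <- which(density > level)            # points above the level
  labels <- rep(NA_integer_, nrow(points))   # NA = not above the level
  if (length(above) == 0) return(labels)
  d <- as.matrix(dist(points[above, , drop = FALSE]))
  current <- 0L
  for (start in seq_along(above)) {
    if (!is.na(labels[above[start]])) next   # already assigned to a component
    current <- current + 1L
    labels[above[start]] <- current
    queue <- start                           # breadth-first search over local indices
    while (length(queue) > 0) {
      i <- queue[1]
      queue <- queue[-1]
      neighbours <- which(d[i, ] <= eps & is.na(labels[above]))
      labels[above[neighbours]] <- current
      queue <- c(queue, neighbours)
    }
  }
  labels
}

# Illustrative points: two clusters plus one low-density point in between
pts  <- rbind(c(0, 0), c(0.1, 0), c(0.2, 0), c(5, 5), c(5.1, 5), c(2.5, 2.5))
dens <- c(0.9, 0.8, 0.6, 0.95, 0.75, 0.3)
low  <- levelComponents(pts, dens, 0.5, eps = 0.5)
high <- levelComponents(pts, dens, 0.7, eps = 0.5)
print(low)
print(high)
```

Every point labeled at level 0.7 is also labeled at level 0.5, which mirrors property 3: the higher-level components lie inside the lower-level ones.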
The inserted images show two-dimensional projections of generative data contained in metric subspaces with assigned labels for the iris dataset.
The API includes the main functions dmTrain() and dmBuildMetricSubspaces(). dmTrain() trains a neural network that approximates density values for a data source. dmBuildMetricSubspaces() analyzes the trained neural network for a level and determines metric subspaces with density values above that level. The API is used as follows:
1. Prerequisite for building a metric subspaces data model: create a data source, generate generative data and calculate density values using package 'ganGenerativeData'.
dsCreateWithDataFrame()
Create a data source with a passed data frame.
dsDeactivateColumns()
Deactivate columns of a data source in order to exclude them from the generation of generative data. In the current version, only columns with values of type double or float can be used in the generation of generative data. All columns with values of other types have to be deactivated.
dsWrite()
Write the created data source, including the settings of active columns, to a file in binary format.
gdGenerate()
Read a data source from a file, generate generative data for the data source in iterative training steps and write the generated data to a file in binary format.
gdCalculateDensityValues()
Read generative data from a file, calculate density values and write the generative data with assigned density values to the original file.
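Because only columns of type double can be used, the indices to pass to dsDeactivateColumns() can be read off the data frame itself. A minimal sketch; the helper name `nonDoubleColumns` is hypothetical, and integer columns are flagged as well, since R has no separate float type.

```r
# Hypothetical helper: indices of data-frame columns that are not of type
# double and therefore have to be deactivated before generation.
nonDoubleColumns <- function(df) {
  which(!vapply(df, is.double, logical(1)))
}

idx <- nonDoubleColumns(iris)
print(idx)   # Species is a factor, not a double column
# dsDeactivateColumns(unname(idx))   # requires package 'ganGenerativeData'
```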
2. Build a metric subspaces data model
dmTrain()
Read a data source and generative data from files,
train a neural network which approximates density values for a data source in iterative training steps,
create a data model containing the trained neural network and write it to a file in binary format.
dmBuildMetricSubspaces()
Read a data model and generative data from files,
analyze the trained neural network in the data model for a level,
determine metric subspaces with density values above that level,
add the obtained metric subspaces to the data model and write it to the original file.
dmRemoveMetricSubspaces()
Remove the metric subspaces for a level from a data model.
dmRead()
Read a data model and generative data from files.
dmGetLevels()
Get levels for metric subspaces in a data model.
dmGetMetricSubspaceProperties()
Get metric subspace properties in a data model for a level.
dmGetContainedInMetricSubspaces()
Get metric subspaces in a data model in which a data record is contained.
dmPlotMetricSubspaceParameters()
Specify plot parameters for metric subspaces for a level.
dmPlotEvaluateDataSourceParameters()
Specify plot parameters for an evaluated data source.
dmPlotMetricSubspaces()
Create an image file containing two-dimensional projections of generative data contained in metric subspaces and of an evaluated data source.
dmReset()
Reset the API.
Werner Mueller
Maintainer: Werner Mueller
Package 'ganGenerativeData' <https://cran.r-project.org/package=ganGenerativeData>
# Environment used for execution of examples:
# Operating system: Ubuntu 22.04.1
# Compiler: g++ 11.3.0 (supports C++17 standard)
# R applications: R 4.1.2, RStudio 2022.02.2
# Installed packages: 'Rcpp' 1.0.11, 'tensorflow' 2.11.0,
# 'ganGenerativeData' 2.0.2, 'ganDataModel' 1.1.7

# Package 'tensorflow' provides an interface to the machine learning framework
# TensorFlow. To complete the installation, function install_tensorflow() has to
# be called.
## Not run:
library(tensorflow)
install_tensorflow()
## End(Not run)

# 1. Prerequisite for building a metric subspaces data model for the iris
# dataset: create a data source, generate generative data and calculate density
# values for the iris dataset.

# Load library
## Not run:
library(ganGenerativeData)
## End(Not run)

# Create a data source with the iris data frame.
## Not run:
dsCreateWithDataFrame(iris)
## End(Not run)

# Deactivate the column with index 5 and name Species in order to exclude it
# from the generation of generative data.
## Not run:
dsDeactivateColumns(c(5))
## End(Not run)

# Write the data source, including the settings of active columns, to file
# "ds.bin" in binary format.
## Not run:
dsWrite("ds.bin")
## End(Not run)

# Read the data source from file "ds.bin", train a generative model in iterative
# training steps (the number of iterations used in tests ranges from 10000 to
# 50000) and write the trained generative model and the data generated in the
# training steps to files "gm.bin" and "gd.bin".
## Not run:
gdTrain("gm.bin", "gd.bin", "ds.bin", c(1, 2), gdTrainParameters(1000))
## End(Not run)

# Read generative data from file "gd.bin", calculate density values and
# write the generative data with density values to the original file.
## Not run:
gdCalculateDensityValues("gd.bin")
## End(Not run)

# 2. Build a metric subspaces data model for the iris dataset

# Load library
## Not run:
library(ganDataModel)
## End(Not run)

# Read a data source and generative data from files "ds.bin" and "gd.bin",
# train a neural network which approximates density values for the data source
# in iterative training steps (the number of iterations used in tests ranges
# from 250000 to 300000), create a data model containing the trained neural
# network and write it to file "dm.bin" in binary format.
## Not run:
dmTrain("dm.bin", "ds.bin", "gd.bin", 10000)
## End(Not run)

# Read a data model and generative data from files "dm.bin" and "gd.bin",
# build metric subspaces for level 0.67,
# add the obtained metric subspaces to the data model
# and write it to the original file.
## Not run:
dmBuildMetricSubspaces("dm.bin", 0.67, "gd.bin")
## End(Not run)

# Read a data model and generative data from files "dm.bin" and "gd.bin".
# The data read in is accessed in function dmPlotMetricSubspaces().
## Not run:
dmRead("dm.bin", "gd.bin")
## End(Not run)

# Create an image showing a two-dimensional projection of generative data
# contained in metric subspaces for level 0.67 for column indices 3, 4 and write
# it to file "msl.png".
## Not run:
dmPlotMetricSubspaces(
  list(dmPlotMetricSubspaceParameters(level = 0.67, labels = c("*"), percent = 100,
    boundary = TRUE, color = "red", backgroundPercent = 0, backgroundColor = "red",
    backgroundReset = TRUE, plotLabels = TRUE)),
  "msl.png", "Metric Subspaces for the Iris Dataset", c(3, 4), "ds.bin",
  dmPlotEvaluateDataSourceParameters(0.67))
## End(Not run)

# Read a data model and generative data from files "dm.bin" and "gd.bin",
# build metric subspaces for level 0.71,
# add the obtained metric subspaces to the data model
# and write it to the original file.
## Not run:
dmBuildMetricSubspaces("dm.bin", 0.71, "gd.bin")
## End(Not run)

# Read a data model and generative data from files "dm.bin" and "gd.bin".
# The data read in is accessed in function dmPlotMetricSubspaces().
## Not run:
dmRead("dm.bin", "gd.bin")
## End(Not run)

# Create an image showing a two-dimensional projection of generative data
# contained in metric subspaces for levels 0.67, 0.71 for column indices 3, 4
# and write it to file "msls.png".
## Not run:
dmPlotMetricSubspaces(
  list(
    dmPlotMetricSubspaceParameters(level = 0.67, labels = c("*"), percent = 100,
      boundary = TRUE, color = "red", backgroundPercent = 0,
      backgroundColor = "red", backgroundReset = TRUE, plotLabels = TRUE),
    dmPlotMetricSubspaceParameters(level = 0.71, labels = c("*"), percent = 100,
      boundary = TRUE, color = "green", backgroundPercent = 5,
      backgroundColor = "red", backgroundReset = TRUE, plotLabels = TRUE)),
  "msls.png", "Metric Subspaces for the Iris Dataset", c(3, 4), "ds.bin",
  dmPlotEvaluateDataSourceParameters(0.67))
## End(Not run)
Read a data model and generative data from files, analyze the contained neural network in the data model for a level, determine metric subspaces with density values above that level, add the obtained metric subspaces to the data model and write it to the original file.
dmBuildMetricSubspaces(dataModelFileName, level, generativeDataFileName)
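Argument | Description |
---|---|
dataModelFileName | Name of data model file |
level | Level; metric subspaces with density values above this level are determined |
generativeDataFileName | Name of generative data file |

Value: None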
## Not run:
dmBuildMetricSubspaces("dm.bin", 0.7, "gd.bin")
## End(Not run)
Calculate a density value for a data record by evaluating the contained neural network in a data model.
dmCalculateDensityValue(dataRecord)
Argument | Description |
---|---|
dataRecord | List containing a data record |

Value: Normalized density value
## Not run:
dmRead("dm.bin", "gd.bin")
dmCalculateDensityValue(list(4.4, 2.9, 1.4, 0.3))
## End(Not run)
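The dataRecord argument is a list with one numeric value per column, as in list(4.4, 2.9, 1.4, 0.3). Assuming the values have to appear in the order of the active columns of the data source, a data-frame row can be converted with a sketch like the following (the helper `toDataRecord` is hypothetical):

```r
# Hypothetical helper: convert one data-frame row into the list form used for
# dataRecord, keeping only the given (active) columns in their order.
toDataRecord <- function(df, row, activeColumns) {
  as.list(unname(unlist(df[row, activeColumns])))
}

record <- toDataRecord(iris, 1, 1:4)   # Species (column 5) is not active
str(record)
# dmCalculateDensityValue(record)      # requires a data model read with dmRead()
```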
Determine in which metric subspaces in a data model a data record is contained.
dmGetContainedInMetricSubspaces(dataRecord)
Argument | Description |
---|---|
dataRecord | List containing a data record |

Value: List of lists containing the level and label of each metric subspace in which the data record is contained
## Not run:
dmRead("dm.bin", "gd.bin")
dmGetContainedInMetricSubspaces(list(4.4, 2.9, 1.4, 0.3))
## End(Not run)
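Because higher-level metric subspaces are contained in lower-level ones, the entry with the highest level in the result identifies the smallest metric subspace containing the record. The sketch below assumes each inner list holds the level first and the label second; the element names are not documented here, so positions are used. The helper `highestContaining` and the mock value are hypothetical.

```r
# Hypothetical helper: from a value shaped like the result of
# dmGetContainedInMetricSubspaces() (assumed: list of list(level, label)),
# return the entry with the highest level, or NULL if the list is empty.
highestContaining <- function(contained) {
  if (length(contained) == 0) return(NULL)   # record lies in no metric subspace
  lv <- vapply(contained, function(x) as.numeric(x[[1]]), numeric(1))
  contained[[which.max(lv)]]
}

# Mock value standing in for a real result (illustrative only)
mock <- list(list(0.67, "1"), list(0.71, "2"))
best <- highestContaining(mock)
str(best)
```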
Get levels for metric subspaces in a data model.
dmGetLevels()
Value: Vector of levels
## Not run:
dmRead("dm.bin", "gd.bin")
dmGetLevels()
## End(Not run)
Get properties of metric subspaces in a data model for a level.
dmGetMetricSubspaceProperties(level)
Argument | Description |
---|---|
level | Level for metric subspaces |

Value: List of lists containing the label and the size of contained generative data for each metric subspace
## Not run:
dmRead("dm.bin", "gd.bin")
dmGetMetricSubspaceProperties(0.73)
## End(Not run)
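A sketch for tabulating these properties, assuming each inner list holds the label first and the size of contained generative data second. The helper `propertiesToDataFrame` and the mock values are hypothetical, not package output.

```r
# Hypothetical helper: turn a value shaped like the result of
# dmGetMetricSubspaceProperties() (assumed: list of list(label, size)) into a
# data frame sorted by decreasing size.
propertiesToDataFrame <- function(properties) {
  df <- data.frame(
    label = vapply(properties, function(x) as.character(x[[1]]), character(1)),
    size  = vapply(properties, function(x) as.numeric(x[[2]]), numeric(1)),
    stringsAsFactors = FALSE
  )
  df <- df[order(df$size, decreasing = TRUE), , drop = FALSE]
  df$share <- df$size / sum(df$size)   # fraction among subspaces of this level
  rownames(df) <- NULL
  df
}

mock <- list(list("1", 120), list("2", 480))
res <- propertiesToDataFrame(mock)
print(res)
```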
Specify plot parameters for an evaluated data source passed to dmPlotMetricSubspaces().
dmPlotEvaluateDataSourceParameters(level = 0, color = "blue")
Argument | Description |
---|---|
level | Level for evaluation |
color | Color for data points of the evaluated data source |

Value: List of plot parameters for the evaluated data source
## Not run:
dmPlotEvaluateDataSourceParameters()
## End(Not run)
Specify plot parameters for metric subspaces in a data model for a level. A list of such plot parameters for different levels is passed to dmPlotMetricSubspaces().
dmPlotMetricSubspaceParameters( level, labels = c("*"), percent = 10, boundary = TRUE, color = "red", backgroundPercent = 0, backgroundColor = "red", backgroundReset = TRUE, plotLabels = TRUE )
Argument | Description |
---|---|
level | Level for metric subspaces |
labels | Vector of labels for metric subspaces. The default vector contains the wildcard character "*", which includes all labels. |
percent | Percent of randomly selected data points of generative data contained in metric subspaces |
boundary | Boolean value indicating if only data points of metric subspace boundaries should be selected |
color | Color for data points of generative data contained in metric subspaces |
backgroundPercent | Percent of randomly selected data points of generative data contained in metric subspaces for the background |
backgroundColor | Color for data points of generative data contained in metric subspaces for the background |
backgroundReset | Boolean value indicating if the background of a metric subspace is reset before its data points are drawn |
plotLabels | Boolean value indicating if labels for metric subspaces for a level should be displayed |

Value: List of plot parameters for metric subspaces
## Not run:
dmPlotMetricSubspaceParameters(0.73)
## End(Not run)
Create an image file containing two-dimensional projections of generative data contained in metric subspaces in a data model and, optionally, of an evaluated data source. Plot parameters are passed as a list of plot parameters for different levels, created with dmPlotMetricSubspaceParameters(), and as plot parameters created with dmPlotEvaluateDataSourceParameters(). Data points are drawn in the following order: generative data contained in metric subspaces by increasing level, then the evaluated data source.
dmPlotMetricSubspaces( plotMetricSubspaceParametersList = list(), imageFileName, title, columnIndices, evaluateDataSourceFileName = "", plotEvaluateDataSourceParameters = NULL )
Argument | Description |
---|---|
plotMetricSubspaceParametersList | List of plot parameters for metric subspaces for different levels, see dmPlotMetricSubspaceParameters(). |
imageFileName | Name of image file |
title | Title of image |
columnIndices | Vector of two column indices used for the two-dimensional projection. Indices refer to the active columns of the data source used to create the data model. |
evaluateDataSourceFileName | Name of the evaluated data source file |
plotEvaluateDataSourceParameters | Plot parameters for the evaluated data source, see dmPlotEvaluateDataSourceParameters(). |

Value: None
## Not run:
dmRead("dm.bin", "gd.bin")
dmPlotMetricSubspaces(
  list(dmPlotMetricSubspaceParameters(level = 0.7, labels = c("*"), percent = 50,
    boundary = TRUE, color = "red", backgroundPercent = 0, backgroundColor = "red",
    backgroundReset = TRUE, plotLabels = TRUE)),
  "ms.png", "Metric Subspaces for the Iris Dataset", c(3, 4), "ds.bin",
  dmPlotEvaluateDataSourceParameters(0.67))
## End(Not run)
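Since columnIndices count only the active columns, an index can differ from the column's position in the original data frame when an earlier column has been deactivated. The hypothetical helper below maps column names to active-column indices; in the iris example only Species (the last column) is deactivated, so Petal.Length and Petal.Width keep indices 3 and 4.

```r
# Hypothetical helper: translate column names into active-column indices for
# the columnIndices argument. 'activeColumns' holds the names of the columns
# left active in the data source, in data-source order.
activeColumnIndices <- function(activeColumns, columnNames) {
  idx <- match(columnNames, activeColumns)
  if (anyNA(idx)) {
    stop("not an active column: ", paste(columnNames[is.na(idx)], collapse = ", "))
  }
  idx
}

active <- names(iris)[-5]   # column 5 (Species) was deactivated
cols <- activeColumnIndices(active, c("Petal.Length", "Petal.Width"))
print(cols)
```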
Read a data model and generative data from files. This function has to be called before calling API functions to which file names for a data model and generative data are not passed directly.
dmRead(dataModelFileName, generativeDataFileName)
Argument | Description |
---|---|
dataModelFileName | Name of data model file |
generativeDataFileName | Name of generative data file |

Value: None
## Not run:
dmRead("dm.bin", "gd.bin")
## End(Not run)
Read a data model from a file, remove the metric subspaces for a level from the data model and write it to the original file.
dmRemoveMetricSubspaces(dataModelFileName, level)
Argument | Description |
---|---|
dataModelFileName | Name of data model file |
level | Level of the metric subspaces to remove |

Value: None
## Not run:
dmRemoveMetricSubspaces("dm.bin", 0.7)
## End(Not run)
Reset the API.
dmReset()
Value: None
## Not run:
dmReset()
## End(Not run)
Read a data source and generative data from files, train a neural network which approximates density values for a data source in iterative training steps, create a data model containing the trained neural network and write it to a file in binary format.
dmTrain( dataModelFileName, dataSourceFileName, generativeDataFileName, numberOfIterations )
Argument | Description |
---|---|
dataModelFileName | Name of data model file |
dataSourceFileName | Name of data source file |
generativeDataFileName | Name of generative data file |
numberOfIterations | Number of training iterations |

Value: None
## Not run:
dmTrain("dm.bin", "ds.bin", "gd.bin", 10000)
## End(Not run)