Title: | Transparent and Reproducible Artificial Data Generation |
---|---|
Description: | The main function generateDataset() processes a user-supplied .R file that contains metadata parameters in order to generate actual data. The metadata parameters have to be structured in the form of metadata objects, the format of which is outlined in the package vignette. This approach allows to generate artificial data in a transparent and reproducible manner. |
Authors: | Rainer Dangl [aut, cre] |
Maintainer: | Rainer Dangl <[email protected]> |
License: | GPL-2 |
Version: | 0.9-2 |
Built: | 2024-11-20 06:48:13 UTC |
Source: | CRAN |
Add an empty cluster to a metadata object
addCluster(m)
addCluster(m)
m |
A metadata object |
A metadata object with an empty additional cluster
require(MASS) m <- new("metadata.metric", clusters = list(c1 = list(n = 25, mu = c(4,5), Sigma=diag(1,2)), c2 = list(n = 25, mu = c(-1,-2), Sigma=diag(1,2))), genfunc = mvrnorm) m2 <- addCluster(m)
require(MASS) m <- new("metadata.metric", clusters = list(c1 = list(n = 25, mu = c(4,5), Sigma=diag(1,2)), c2 = list(n = 25, mu = c(-1,-2), Sigma=diag(1,2))), genfunc = mvrnorm) m2 <- addCluster(m)
Performs various consistency checks on a setup file
checkSetup(file)
checkSetup(file)
file |
A .R file with a new simulation setup |
Create a new setup file template
createFileskeleton( newname, mail, inst, author, type = c("metric", "functional", "ordinal", "binary", "randomstring", "wordnet"), infotable = NULL, ref = "Unpublished", codefile = F )
createFileskeleton( newname, mail, inst, author, type = c("metric", "functional", "ordinal", "binary", "randomstring", "wordnet"), infotable = NULL, ref = "Unpublished", codefile = F )
newname |
The name of the new setup (and subsequently the file name) |
mail |
The contact e-mail address of the author |
inst |
The institution of the author |
author |
The full name of the author |
type |
The data type of this setup |
infotable |
The setup summary table |
ref |
The reference to the publication where the setup was used |
codefile |
If functions that are needed for the data generation of the setup are stored in some other .R file, the path can be supplied |
Delete a cluster from a metadata object
deleteCluster(m, clnumber)
deleteCluster(m, clnumber)
m |
A metadata object |
clnumber |
The cluster to delete |
A metadata object
require(MASS) m <- new("metadata.metric", clusters = list(c1 = list(n = 25, mu = c(4,5), Sigma=diag(1,2)), c2 = list(n = 25, mu = c(-1,-2), Sigma=diag(1,2))), genfunc = mvrnorm) m2 <- deleteCluster(m, 2)
require(MASS) m <- new("metadata.metric", clusters = list(c1 = list(n = 25, mu = c(4,5), Sigma=diag(1,2)), c2 = list(n = 25, mu = c(-1,-2), Sigma=diag(1,2))), genfunc = mvrnorm) m2 <- deleteCluster(m, 2)
Generate a dataset from a metadata object
generateData(m)
generateData(m)
m |
A metadata object |
A dataset as specified by the metadata object
require(MASS) m <- new("metadata.metric", clusters = list(c1 = list(n = 25, mu = c(4,5), Sigma=diag(1,2)), c2 = list(n = 25, mu = c(-1,-2), Sigma=diag(1,2))), genfunc = mvrnorm) generateData(m)
require(MASS) m <- new("metadata.metric", clusters = list(c1 = list(n = 25, mu = c(4,5), Sigma=diag(1,2)), c2 = list(n = 25, mu = c(-1,-2), Sigma=diag(1,2))), genfunc = mvrnorm) generateData(m)
Generate a dataset from a metadata object
## S4 method for signature 'metadata.binary' generateData(m)
## S4 method for signature 'metadata.binary' generateData(m)
m |
A metadata object |
A dataset as specified by the metadata object
Generate a dataset from a metadata object
## S4 method for signature 'metadata.functional' generateData(m)
## S4 method for signature 'metadata.functional' generateData(m)
m |
A metadata object |
A dataset as specified by the metadata object
Generate a dataset from a metadata object
## S4 method for signature 'metadata.metric' generateData(m)
## S4 method for signature 'metadata.metric' generateData(m)
m |
A metadata object |
A dataset as specified by the metadata object
Generate a dataset from a metadata object
## S4 method for signature 'metadata.ordinal' generateData(m)
## S4 method for signature 'metadata.ordinal' generateData(m)
m |
A metadata object |
A dataset as specified by the metadata object
Generate a dataset from a metadata object
## S4 method for signature 'metadata.randomstring' generateData(m)
## S4 method for signature 'metadata.randomstring' generateData(m)
m |
A metadata object |
A dataset as specified by the metadata object
Generates a number of datasets from one metadata scenario
generateDatabase( name = NULL, setnr = NULL, draws = 1, seedinfo = list(100, paste(R.version$major, R.version$minor, sep = "."), RNGkind()), metaseedinfo = list(100, paste(R.version$major, R.version$minor, sep = "."), RNGkind()), file = NULL, seedincrement = 1 )
generateDatabase( name = NULL, setnr = NULL, draws = 1, seedinfo = list(100, paste(R.version$major, R.version$minor, sep = "."), RNGkind()), metaseedinfo = list(100, paste(R.version$major, R.version$minor, sep = "."), RNGkind()), file = NULL, seedincrement = 1 )
name |
The path to the setup file |
setnr |
The metadata scenario, as taken from the info table |
draws |
The number of datasets that are drawn from the metadata scenario |
seedinfo |
The random number generator seed parameters |
metaseedinfo |
If necessary, a separate set of random number generator parameters for the metadata (e.g. cluster centers) |
file |
A custom file name for the output database. Defaults to the pattern setupname_setnr_seed.db |
seedincrement |
The random number seed will by default increase by 1 for each draw from the base seed given in seedinfo unless specified otherwise here |
An SQLite database that contains the desired number of data sets drawn from a certain metadata scenario
## Not run: source(system.file("dangl2014.R", package="bdlp")) generateDatabase(name="dangl2014.R", setnr=1, draws=10) unlink("dangl2014_set_1_seed_100.db") ## End(Not run)
## Not run: source(system.file("dangl2014.R", package="bdlp")) generateDatabase(name="dangl2014.R", setnr=1, draws=10) unlink("dangl2014_set_1_seed_100.db") ## End(Not run)
Generates random strings
getRandomstrings( center = NULL, maxdist = NULL, length = nchar(center), n = 1, method = "lv" )
getRandomstrings( center = NULL, maxdist = NULL, length = nchar(center), n = 1, method = "lv" )
center |
Reference string, i.e. the cluster center |
maxdist |
The maximum allowed string distance |
length |
The length of the string |
n |
Number of strings to be generated |
method |
The string distance method used to calculate the string, defaults to Levensthein distance |
A character string
getRandomstrings(center="hello", maxdist = 2, n = 5)
getRandomstrings(center="hello", maxdist = 2, n = 5)
Initialize a new metadata object
initializeObject( type, k, genfunc, seed = list(100, paste(R.version$major, R.version$minor, sep = "."), RNGkind()) )
initializeObject( type, k, genfunc, seed = list(100, paste(R.version$major, R.version$minor, sep = "."), RNGkind()) )
type |
The data type for the new object |
k |
Number of clusters |
genfunc |
The distribution function for data generation |
seed |
The random number seed parameters for the data generation |
A metadata object
require(MASS) initializeObject(type = "metric", k = 3, genfunc = mvrnorm)
require(MASS) initializeObject(type = "metric", k = 3, genfunc = mvrnorm)
A class that represents a metadata object for binary data
A class that represents a metadata object for functional data
A class to represent a metadata object
clusters
A list of cluster information
genfunc
A string specifying a distribution for the random numbers
seedinfo
A list with the parameters for the random number generator
A class that represents a metadata object for metric data
standardization
If standardization is needed, function can be supplied
A class that represents a metadata object for ordinal data
A class that represents a metadata object for string data
3d plot of a metric metadata object
plot3dMetadata(m)
plot3dMetadata(m)
m |
A metadata object (for metric data) |
A 3d plot using function plot3d
from package rgl
require(MASS) m <- new("metadata.metric", clusters = list(c1 = list(n = 25, mu = c(4,5,4), Sigma=diag(1,3)), c2 = list(n = 25, mu = c(-1,-2,-2), Sigma=diag(1,3))), genfunc = mvrnorm) plot3dMetadata(m)
require(MASS) m <- new("metadata.metric", clusters = list(c1 = list(n = 25, mu = c(4,5,4), Sigma=diag(1,3)), c2 = list(n = 25, mu = c(-1,-2,-2), Sigma=diag(1,3))), genfunc = mvrnorm) plot3dMetadata(m)
3d plot of a metric metadata object
## S4 method for signature 'metadata.metric' plot3dMetadata(m)
## S4 method for signature 'metadata.metric' plot3dMetadata(m)
m |
A metadata object (for metric data) |
A 3d plot using function plot3d
from package rgl
Plot a metadata object
plotMetadata(m)
plotMetadata(m)
m |
A metadata object |
A plot, created by generating an instance of the dataset from the metadata object
require(MASS) m <- new("metadata.metric", clusters = list(c1 = list(n = 25, mu = c(4,5), Sigma=diag(1,2)), c2 = list(n = 25, mu = c(-1,-2), Sigma=diag(1,2))), genfunc = mvrnorm) plotMetadata(m)
require(MASS) m <- new("metadata.metric", clusters = list(c1 = list(n = 25, mu = c(4,5), Sigma=diag(1,2)), c2 = list(n = 25, mu = c(-1,-2), Sigma=diag(1,2))), genfunc = mvrnorm) plotMetadata(m)
Plot a metadata object
## S4 method for signature 'metadata.binary' plotMetadata(m)
## S4 method for signature 'metadata.binary' plotMetadata(m)
m |
A metadata object |
A plot, created by generating an instance of the dataset from the metadata object
Plot a metadata object
## S4 method for signature 'metadata.functional' plotMetadata(m)
## S4 method for signature 'metadata.functional' plotMetadata(m)
m |
A metadata object |
A plot, created by generating an instance of the dataset from the metadata object
Plot a metadata object
## S4 method for signature 'metadata.metric' plotMetadata(m)
## S4 method for signature 'metadata.metric' plotMetadata(m)
m |
A metadata object |
A plot, created by generating an instance of the dataset from the metadata object
Plot a metadata object
## S4 method for signature 'metadata.ordinal' plotMetadata(m)
## S4 method for signature 'metadata.ordinal' plotMetadata(m)
m |
A metadata object |
A plot, created by generating an instance of the dataset from the metadata object
Sample grid points for functional data
sampleGrid(total_n, minT, maxT, granularity, regular = FALSE)
sampleGrid(total_n, minT, maxT, granularity, regular = FALSE)
total_n |
Number of Observations |
minT |
Minimum number of time points sampled |
maxT |
Maximum number of time points sampled |
granularity |
Number of possible time points in total |
regular |
If TRUE, maxT time points are sampled at the same time points for each function |
A binary matrix indicating whether the function should be evaluated at a given time point
sampleGrid(total_n = 10, minT = 4, maxT = 10, granularity = 20)
sampleGrid(total_n = 10, minT = 4, maxT = 10, granularity = 20)
Saves a list of metadata objects to a new setup file
saveSetup( name, author, mail, inst, cit = "Unpublished", objects, table, seedinfo = list(100, paste(R.version$major, R.version$minor, sep = "."), RNGkind()), metaseedinfo = list(100, paste(R.version$major, R.version$minor, sep = "."), RNGkind()), custom_funcs = NULL, custom_name = NULL )
saveSetup( name, author, mail, inst, cit = "Unpublished", objects, table, seedinfo = list(100, paste(R.version$major, R.version$minor, sep = "."), RNGkind()), metaseedinfo = list(100, paste(R.version$major, R.version$minor, sep = "."), RNGkind()), custom_funcs = NULL, custom_name = NULL )
name |
The name of the new setup (and thus also the filename) |
author |
Full name of the author |
mail |
Contact e-mail address of the author |
inst |
Institution of the author |
cit |
Reference to the publication where the setup was used, defaults to unpublished |
objects |
List of metadata objects |
table |
Info table for the setup |
seedinfo |
Random number generator parameters for the data sets |
metaseedinfo |
Random number generator parameters for the metadata |
custom_funcs |
Custom functions that are needed to generate the meta(data) |
custom_name |
Custom filename that deviates from the authorYear format |
A .R file that can be processed by create.dataset
require(MASS) a = new("metadata.metric", clusters = list(c1 = list(n = 25, mu = c(4,5), Sigma=diag(1,2)), c2 = list(n = 25, mu = c(-1,-2), Sigma=diag(1,2))), genfunc = mvrnorm) b = new("metadata.metric", clusters = list(c1 = list(n = 44, mu = c(1,2), Sigma=diag(1,2)), c2 = list(n = 66, mu = c(-5,-6), Sigma=diag(1,2))), genfunc = mvrnorm) ## Not run: saveSetup(name="doe2002.R", author="John Dow", mail="[email protected]", inst="Example University", cit="Simple Data, pp. 23-24", objects=list(a, b), table=data.frame(n = c(50, 110), k = c(2,2), shape = c("spherical", "spherical"))) unlink("doe2002.R") ## End(Not run)
require(MASS) a = new("metadata.metric", clusters = list(c1 = list(n = 25, mu = c(4,5), Sigma=diag(1,2)), c2 = list(n = 25, mu = c(-1,-2), Sigma=diag(1,2))), genfunc = mvrnorm) b = new("metadata.metric", clusters = list(c1 = list(n = 44, mu = c(1,2), Sigma=diag(1,2)), c2 = list(n = 66, mu = c(-5,-6), Sigma=diag(1,2))), genfunc = mvrnorm) ## Not run: saveSetup(name="doe2002.R", author="John Dow", mail="[email protected]", inst="Example University", cit="Simple Data, pp. 23-24", objects=list(a, b), table=data.frame(n = c(50, 110), k = c(2,2), shape = c("spherical", "spherical"))) unlink("doe2002.R") ## End(Not run)
Returns the setup summary
summarizeSetup(name)
summarizeSetup(name)
name |
The name of the setup |
The summary table of name