Title: | Generate Simulated Sawn Timber Strength Grading Data |
---|---|
Description: | Tools for generating simulated sawn timber strength grading data with a main focus on statistical simulation based on covariance matrices. Simulation data for Norway spruce sawn timber from Austria and reference values of means and standard deviations of grade determining properties from literature for a number of European countries are provided, as well. |
Authors: | Andreas Weidenhiller [cre, aut] , Anton Wegscheider [aut] |
Maintainer: | Andreas Weidenhiller <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.6.1 |
Built: | 2024-11-29 08:49:36 UTC |
Source: | CRAN |
Means and standard deviations of grade determining properties (GDPs) from literature
gdp_data
Wood species as a four letter code according to EN 13556. Currently, this is always "PCAB" for Norway spruce (Picea abies).
Kind of destructive testing applied to the material – "t" for material tested in tension, "be" for material tested in bending.
Research project from which the data originated; "null" if unknown or not applicable.
Country from which the material originated, as a two letter country code.
Number of pieces on which the values are based.
Mean and standard deviation of strength, in N/mm².
Mean and standard deviation of the static modulus of elasticity, in N/mm².
Mean and standard deviation of density, in kg/m³.
Reference to the literature source; "null" if not published yet.
For distinguishing multiple rows with the same species, loadtype and country – if there are no duplicates, it is the same as country; if there are duplicates, it is country plus a suffixed number separated by "_".
For simulating an entire dataset consisting of subsamples with different characteristics (see simulate_dataset), it may be useful to refer to existing results from literature as a basis.
In the dataset gdp_data, means and standard deviations for a number of such subsamples have been collected.
The GDP values collected in gdp_data were selected from publications which aimed at representative sampling within the respective countries.
All the same, care must be taken when using these values, due to the naturally high variability of timber properties.
The values have been extracted from the following publications:
Ranta-Maunus, Alpo, Julia K. Denzler, and Peter Stapel. 2011. Strength of European Timber. Part 2. Properties of Spruce and Pine Tested in Gradewood Project. VTT.
Rohanová, Alena, and Erika Nunez. 2014. "Prediction Models of Slovakian Structural Timber." Wood Research 59 (5): 757–69.
Stapel, Peter, and Jan-Willem G. van de Kuilen. 2014. “Efficiency of Visual Strength Grading of Timber with Respect to Origin, Species, Cross Section, and Grading Rules: A Critical Evaluation of the Common Standards.” Holzforschung 68 (2): 203–16.
In the WoodSimulatR package, means and standard deviations of grade determining properties (GDPs) for a number of Norway spruce (Picea abies) samples from literature are stored for use in simulate_dataset. They are indexed by a two-letter country code (and a suffixed number if disambiguation is required).
get_subsample_definitions(country = NULL, loadtype = "t", species = "PCAB")
country |
Can be either the number of desired samples, or a
named vector of relative subsample sizes where the names can be
abbreviations of country names. Alternatively, |
loadtype |
Can be either |
species |
A species code according to EN 13556:2003. Currently, only 'PCAB' (Picea abies = Norway spruce) is supported. |
The descriptive data can also be accessed directly (gdp_data).
The present function is meant to prepare the data as input to the subsets argument of simulate_dataset.
It allows picking multiple samples from the same country (disambiguating by creating appropriately named entries in the column subsample) and creating random sample data (uniformly distributed within the range of values given in the full dataset gdp_data for the respective loadtype and species) for sample names not found in this dataset.
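The disambiguation of repeated country codes can be sketched in base R as follows. This is only an illustration of the naming rule described for the column subsample; the numbering (first occurrence unchanged, later ones suffixed "_1", "_2", and so on) is inferred from the names 'se' and 'se_1' used in the examples, and the helper make_subsample_names is hypothetical, not part of WoodSimulatR.

```r
# Hypothetical helper (not part of WoodSimulatR): derive subsample names
# from country codes, adding "_<k>" to repeated codes.
make_subsample_names <- function(country) {
  # running index of each entry within its country: 0, 1, 2, ...
  idx <- ave(seq_along(country), country, FUN = seq_along) - 1L
  ifelse(idx == 0L, country, paste0(country, "_", idx))
}

make_subsample_names(c("ro", "ro", "ua", "sk", "ro"))
```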
The dataset gdp_data contains a column share which gives the number of pieces in the original sample. Unless relative subsample sizes are explicitly asked for by providing a named numeric vector for the argument country, the present function always resets share to 1, prompting simulate_dataset to create (approximately) equal-sized subsamples.
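How relative shares translate into subsample sizes can be illustrated with plain arithmetic. The sketch below assumes a simple proportional allocation with rounding; the actual allocation inside simulate_dataset may differ in detail. The share vector is the one from the Romania/Ukraine/Slovakia example, and n = 1200 is an arbitrary choice that divides evenly.

```r
# Shares as in the example: four samples from Romania, two from Ukraine,
# one from Slovakia, weighted so that each country contributes equally.
share <- c(ro = 1, ro = 1, ro = 1, ro = 1, ua = 2, ua = 2, sk = 4)
n <- 1200  # arbitrary total number of rows (assumption for this sketch)

# proportional allocation; rounding may not sum exactly to n in general
sizes <- round(n * share / sum(share))

# total number of rows per country
per_country <- tapply(sizes, names(share), sum)
per_country
```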
The GDPs depend on the type of destructive testing done (loadtype) – therefore, giving the proper loadtype is required for realistic values.
If country is NULL (or omitted), the full dataset gdp_data for the respective loadtype (and species) is returned.
For sample names not contained in the internal list, a warning is issued and random sample data is returned (uniformly distributed within the range of values given in the full table for the respective loadtype and species).
If country is just a number (and not a named vector), random sample data is returned as well; the different "countries" are then named "C1", "C2" and so on.
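Drawing random sample definitions "uniformly within the range of stored values" can be sketched in base R as below. The reference table uses made-up numbers (not taken from gdp_data), the column names f_mean and f_sd are assumed for illustration, and the helper random_subsamples is hypothetical; only the naming "C1", "C2", ... follows the description above.

```r
# Made-up reference values for illustration only (NOT taken from gdp_data)
ref <- data.frame(f_mean = c(26, 31, 29), f_sd = c(9, 12, 10))

# Hypothetical helper: draw n rows, each column uniformly within the
# range of the corresponding reference column
random_subsamples <- function(ref, n) {
  out <- as.data.frame(lapply(ref, function(col) runif(n, min(col), max(col))))
  out$country <- paste0("C", seq_len(n))
  out
}

set.seed(1)
rnd <- random_subsamples(ref, 3)
rnd
```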
A data frame with country and subsample names, relative subsample sizes and some meta-information like project and literature references, as well as mean strength and standard deviation of strength, static modulus of elasticity and density.
The GDP values collected in gdp_data were selected from publications which aimed at representative sampling within the respective countries.
All the same, care must be taken when using these values, due to the naturally high variability of timber properties.
# get all subsample data for loadtype tension (the default), or bending
get_subsample_definitions()
get_subsample_definitions(loadtype='be')

# get six random samples, explicitly state loadtype tension
get_subsample_definitions(country=6, loadtype='t')

# get subsample data for the German tension sample in different ways
get_subsample_definitions(country='de', loadtype='t')
get_subsample_definitions(country=c(de=1), loadtype='t')
get_subsample_definitions(country=c(de=6), loadtype='t')

# bending samples from Sweden (both samples), Poland, and France, equally
# weighted
get_subsample_definitions(c('se', 'se_1', 'pl', 'fr'))
get_subsample_definitions(c(se=1, se_1=1, pl=1, fr=1))
get_subsample_definitions(c(se=5, se_1=5, pl=5, fr=5))

# four tension samples from Romania, two from Ukraine and one from Slovakia,
# weighted so that each country contributes equally
get_subsample_definitions(c(ro=1, ro=1, ro=1, ro=1, ua=2, ua=2, sk=4), loadtype='t')

# non-existent subsample names get replaced by random values (which are based
# on the range of stored values for the respective loadtype)
get_subsample_definitions(c('xx', 'yy', 'zz'))
get_subsample_definitions(c('xx', 'yy', 'zz'), loadtype='t')

# subsample names are case-sensitive!
get_subsample_definitions(c('at', 'aT', 'At', 'AT'), loadtype='t')
The function simbase_covar allows the specification of a transform for one or more variables. The present function creates short names for such transforms for use in labelling (by default, the labelling is done by simbase_labeler).
get_transform_names( transforms, prefer_primitive = c("if_shorter", "never", "always") )
transforms |
A named list of objects of class |
prefer_primitive |
If "never", the function always returns the value of
the field |
The label of a transform could be the value of the field name from each object of class trans (or transform), but also the name of the transform function itself, if it is a primitive function or just calls one function.
Each object of class trans (or transform) should have a field name which can be returned by the present function.
The function examines the field transform. If this field contains a primitive function (see is_primitive), or if there is just one function call in the body of this transform function, we can also return the name of this called function.
If there is no field name and no single function is called from the function defined in the field transform, a generic function name "f." is returned.
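The idea of "a transform function that just calls one function" can be sketched in base R by inspecting the body of the function. This is a simplified illustration, not the package's implementation: the helpers single_call_name and short_label are hypothetical, primitives are not resolved to their names here, and the sketch always prefers the called function's name (roughly what one might expect from prefer_primitive = "always").

```r
# Hypothetical helper: name of the only function called in the body of `f`,
# or NA if `f` is primitive, is not a function, or contains nested calls.
single_call_name <- function(f) {
  b <- body(f)  # NULL for primitive functions
  if (is.call(b) && identical(b[[1]], as.name("{")) && length(b) == 2) {
    b <- b[[2]]  # unwrap a body of the form { expr }
  }
  if (!is.call(b) || !is.name(b[[1]])) return(NA_character_)
  # "just one call": none of the arguments is itself a call
  simple_args <- !any(vapply(as.list(b)[-1], is.call, logical(1)))
  if (simple_args) as.character(b[[1]]) else NA_character_
}

# Hypothetical labelling rule: called function, else field `name`, else "f."
short_label <- function(tr) {
  called <- single_call_name(tr$transform)
  if (!is.na(called)) return(called)
  if (!is.null(tr$name)) return(tr$name)
  "f."
}

short_label(list(name = "a very long name", transform = function(x) log(x)))
short_label(list(transform = function(x) { y <- x; y + 1 }))
```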
A named vector of transform names.
get_transform_names(list(a = scales::log_trans(), b = scales::boxcox_trans(0)))
get_transform_names(list(x = list(name = 'a very long name', transform = log, inverse = exp)))
Predefined simbases in WoodSimulatR
ws_t ws_t_tr ws_t_te ws_t_logf ws_t_tr_logf ws_t_te_logf ws_be ws_be_tr ws_be_te ws_be_logf ws_be_tr_logf ws_be_te_logf
For statistical simulation of datasets in WoodSimulatR, one needs a simbase_covar object. WoodSimulatR contains a set of such predefined simbases for Norway spruce (Picea abies) grown in Austria.
The names of the simbases follow this schema, with the different parts separated by "_":
"ws" – abbreviation of "WoodSimulatR simbase"
loadtype – can either be "t" for material tested in tension, or "be" for material tested in bending
subsample – empty for the full dataset, "tr" for the part of the dataset that was used for training, "te" for the part that was used for testing. The latter two can be used to more closely simulate independent training and test samples
transformation – empty for no transformation, "logf" if the strength has been log-transformed prior to calculation of the simbase – see also the argument transforms in simbase_covar.
The simbases contain the basis for simulating the following variables:
Bending or tension strength, in N/mm².
Static modulus of elasticity in bending or tension, in N/mm².
Density of a small clear sample, in kg/m³.
Dynamic modulus of elasticity of the timber after drying to a moisture content of about 12%, in N/mm².
Dynamic modulus of elasticity of the timber in the green state, with moisture contents mostly above fibre saturation point, in N/mm².
An "indicating property" (IP) for density, established by measuring the weight of each board and dividing by its volume, in kg/m³.
An "indicating property" (IP) for strength, established by linear regression on E_dyn, ip_rho and a knot parameter called "total knot area ratio" (tKAR), in N/mm².
The simbases were created based on data from the research project SiOSiP of Holzforschung Austria. "SiOSiP" is short for "simulation-based optimization of sawn timber production" and ran from 2014 to 2017.
Given the covariance matrix and the means of a set of variables, we can simulate not only the distribution of the variables, but also their correlations. The present function calculates the basic values required for the simulation and returns them packed into an object of class simbase_covar.
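The principle of simulating from means and a covariance matrix can be sketched in base R without the package. The sketch assumes multivariate normality and uses a Cholesky factor of the covariance matrix; the reference data are made up for illustration (not real timber data), and this is not a description of how simbase_covar is implemented internally.

```r
set.seed(1)

# Made-up reference data with two correlated variables (illustration only)
ref <- data.frame(E_dyn = rnorm(500, mean = 12500, sd = 2200))
ref$E <- 0.9 * ref$E_dyn + rnorm(500, mean = 0, sd = 1000)

# the "basic values" needed for simulation: means and covariance matrix
mu <- colMeans(ref)
sigma <- cov(ref)

# Cholesky factor R with t(R) %*% R == sigma
R <- chol(sigma)

# transform independent standard normal draws into correlated ones
z <- matrix(rnorm(2000 * ncol(ref)), ncol = ncol(ref))
sim <- sweep(z %*% R, 2, mu, "+")
colnames(sim) <- names(mu)

# the simulated data reproduce the correlation of the reference data
round(c(reference = cor(ref)[1, 2], simulated = cor(sim)[1, 2]), 2)
```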
simbase_covar( data, variables = NULL, transforms = list(), label = simbase_labeler, ... )
data |
The dataset for the calculation of the reference data for
simulation; for grouped datasets (see |
variables |
Character vector containing the names in |
transforms |
A named list of objects of class |
label |
Either a string describing the data and the simulation approach,
or a labelling function which returns a label string and takes as input
the data, a string giving the class
of the simbase object (here |
... |
Arguments to be passed on to |
If some of the variables are non-normally distributed, a transform may improve the prediction. The transforms are passed to the function as a named list, where the name of a list entry must correspond to the name of the variable in the data which is to be transformed.
Predefined transforms can be found in the package scales, where they are used for axis transformations as a preparation for plotting. The package scales also contains a function trans_new which can be used to define new transforms.
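Why a transform can help for a skewed variable such as strength may be seen from a small base R sketch. It uses plain log and exp rather than a scales transform object, and the data are made-up lognormal values (an assumption for illustration): simulating on the log scale and transforming back always yields positive values, whereas simulating on the raw scale under a normal assumption can produce negative ones.

```r
set.seed(1)

# Made-up, right-skewed "strength" values (illustration only)
f <- rlnorm(1000, meanlog = 3.3, sdlog = 0.35)

# simulate on the raw scale, assuming normality
sim_raw <- rnorm(1000, mean = mean(f), sd = sd(f))

# simulate on the log scale, then apply the inverse transform
sim_log <- exp(rnorm(1000, mean = mean(log(f)), sd = sd(log(f))))

# compare the smallest simulated values of both approaches
c(min_raw = min(sim_raw), min_log = min(sim_log))
```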
In the context of destructively measured sawn timber properties, the type of destructive test applied is of interest. If the dataset data contains a variable loadtype which consistently throughout the dataset has either the value "t" (i.e. all sawn timber has been tested in tension) or the value "be" (i.e. all sawn timber has been tested in bending, edgewise), then the returned object also has a field loadtype with that value.
One can also calculate a simbase under the assumption that the correlations are different for different subgroups of the data. This is done by grouping the dataset data prior to passing it to the function, using group_by. In this case, several objects of class simbase_covar are created and joined together in a tibble – see also simbase_list.
An S3 object of class simbase_list if data is grouped, and an object of class simbase_covar otherwise.
# obtain a dataset for demonstration
dataset <- simulate_dataset()

# calculate a simbase without transforms
simbase_covar(dataset, c('f', 'E', 'rho', 'E_dyn'))

# calculate a simbase with log-transformed f
simbase_covar(dataset, c('f', 'E', 'rho', 'E_dyn'), list(f = scales::log_trans()))

# if we group the dataset, we get a simbase_list object
simbase_covar(dplyr::group_by(dataset, country), c('f', 'E', 'rho', 'E_dyn'))
Each simbase object should have a label which can be used for differentiating different simulations. This function tries to simplify the label generation.
simbase_labeler(data, simbase_class, transforms)
data |
The dataset for the calculation of the basic simulation data. |
simbase_class |
The class of the simbase object for which the label is
to be generated. Currently, only |
transforms |
The transforms applied to variables in the dataset.
Must be objects of class |
Primarily, this function is intended to be called as a default from simbase_covar. It can also serve as a template for creating custom labelling functions.
A string for labelling a simbase object.
simbase_* functions for grouped data
If a function of the simbase_* family encounters grouped data (as caused by group_by), it should invoke simbase_list to create a collection of separate simbases for each group.
simbase_list(data, simbase_constructor, ..., suffix = "_lst")
data |
A grouped dataset (see |
simbase_constructor |
A function which returns a |
... |
Further arguments passed to the |
suffix |
Suffix to be added to the individual simbase labels if they are all the same (see details). |
A simbase_list object; this is essentially a tibble with the grouping columns of data and a column .simbase which contains the simbase_* objects.
Currently, the "simbase_* family" only consists of simbase_covar (although, in a broader sense, simbase_list can also be thought to be part of this "family"). It is planned to add further simulation types in a later release.
The functions of the simbase_* family support label generation (see e.g. simbase_covar). These functions should generate the label before invoking simbase_list, so that there is a common label for all of the simbases; simbase_list then appends the string given in the argument suffix. A warning is issued if the labels of the different simbases are not all equal; no suffix is added in this case.
Add simulated values to a dataset conditionally, based on a simbase_* object
simulate_conditionally(data, simbase, force_positive = TRUE, ...)
data |
The dataset where simulated values are added to.
The dataset has to contain at least one variable which is also included in
the |
simbase |
Basic data object for the simulation, as calculated e.g.
by |
force_positive |
If |
... |
further arguments passed to or from other methods. |
Given a simbase_* object, this function adds simulated values to a dataset, conditional on the values of some of the variables already contained in the dataset.
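For the bivariate normal case, simulating one variable conditional on another can be sketched in base R with the standard conditional mean and variance formulas. The means, standard deviations and the correlation of 0.7 below are made-up values (not taken from any WoodSimulatR simbase), and the final truncation at zero is only an assumption about what an option like force_positive might do.

```r
# Made-up simulation basis (illustration only): means and covariance of
# strength f and dynamic modulus of elasticity E_dyn, correlation 0.7
mu <- c(f = 30, E_dyn = 12500)
sigma <- matrix(
  c(10^2,            0.7 * 10 * 2200,
    0.7 * 10 * 2200, 2200^2),
  nrow = 2, byrow = TRUE, dimnames = list(names(mu), names(mu))
)

set.seed(1)
dataset <- data.frame(E_dyn = rnorm(n = 100, mean = 12500, sd = 2200))

# conditional distribution of f given E_dyn (bivariate normal)
beta <- sigma["f", "E_dyn"] / sigma["E_dyn", "E_dyn"]
cond_mean <- mu[["f"]] + beta * (dataset$E_dyn - mu[["E_dyn"]])
cond_sd <- sqrt(sigma["f", "f"] - beta * sigma["E_dyn", "f"])

dataset$f <- rnorm(nrow(dataset), mean = cond_mean, sd = cond_sd)

# assumption: negative simulated values are set to zero
dataset$f <- pmax(dataset$f, 0)
head(dataset)
```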
The modified dataset data with simulated values.
# add simulated tension data based on a simbase stored in WoodSimulatR
dataset <- data.frame(E_dyn = rnorm(n = 100, mean = 12500, sd = 2200))
dataset_t <- simulate_conditionally(dataset, ws_t)

# add simulated bending data
dataset_be <- simulate_conditionally(dataset, ws_be)
Add simulated values to a dataset conditionally, based on a simbase_list object
## S3 method for class 'simbase_list'
simulate_conditionally( data, simbase, force_positive = TRUE, ..., error_when_groups_missing = TRUE )
data |
The dataset where simulated values are added to. |
simbase |
Basic data object for the simulation, as calculated by
|
force_positive |
If |
... |
further arguments passed to or from other methods. |
error_when_groups_missing |
Whether to raise an error if for a certain
value combination in the grouping variables no dedicated |
Simulating values based on a simbase_list object has some special aspects compared to that of other simbase_* objects (see simulate_conditionally).
In particular, a simbase_list object stores simbases for specific value combinations within the grouping variables. These grouping variables must also be present in data.
If there is a value combination in these grouping variables for which no dedicated simbase object exists, this will lead to NA values in the columns to be simulated and either to an error (if error_when_groups_missing = TRUE) or to a warning.
Due to the internal call to nest and subsequent call to unnest, the returned dataset will be ordered according to the grouping variables in the simbase, with any grouping variable combinations missing in the simbase coming last.
The modified dataset data with simulated values.
# create a simbase_list object for the values of subsets = c('AT', 'DE')
dataset_0 <- simulate_dataset(subsets = c('AT', 'DE'))
simbase <- simbase_covar(dplyr::group_by(dataset_0, country), c('f', 'E', 'E_dyn'))

# simulate on another dataset
dataset <- data.frame(E_dyn = rnorm(n = 100, mean = 12500, sd = 2200), country = 'AT')
dataset_1 <- simulate_conditionally(dataset, simbase)
head(dataset_1)

# warning if for some value of country we don't have an entry in the simbase
dataset$country <- 'CH'
dataset_2 <- simulate_conditionally(dataset, simbase, error_when_groups_missing = FALSE)
head(dataset_2)
Generate an artificial dataset with correlated variables and defined means and standard deviations.
simulate_dataset( n = 5000, subsets = 4, random_seed = NULL, simbase = WoodSimulatR::ws_t_logf, loadtype = NULL, ..., RNGversion = "3.6.0" )
n |
Number of rows in the dataset |
subsets |
Either |
random_seed |
Allows setting an integer seed value for the random number
generator to achieve reproducible results
(see also |
simbase |
An object of class |
loadtype |
For passing on to |
... |
arguments passed on to |
RNGversion |
In |
In the package WoodSimulatR, a number of predefined base values for simulation are stored – see simbase.
Using a character vector for the argument subsets leads to subsets as equal in size as possible.
The argument subsets enables differing means and standard deviations for different subsamples. There are several possible usages:
If subsets = NULL, the information about means and standard deviations is taken from the simbase. There can still be different means and standard deviations if simbase is an object of class simbase_list.
If a numeric vector or a character vector, it is used as argument country in an internal call to get_subsample_definitions.
If a dataset, there are the following requirements:
identifier columns: The dataset has to have one or more discrete-valued identifier columns (usually character vectors or factors) which uniquely identify each row. These identifier columns are named "country" and "subsample" in the standard case as yielded by get_subsample_definitions. In the general case, the identifier columns are detected as those columns which are not named share, species, loadtype or literature and which do not end in _mean or _sd. If the argument simbase is of class simbase_list, further restrictions apply (see below).
means and standard deviations: For at least one of the variables defined in the simbase, also the mean and the standard deviation need to be given in each row; the column names for this data must be the name of the respective variable(s) from the simbase, suffixed by _mean and _sd, respectively.
optional: A column share can be used to create subsamples of different sizes proportional to the values in share.
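The detection rule for identifier columns stated in the requirements above can be written down in a few lines of base R. The helper find_id_columns is hypothetical and only mirrors the rule as described (it is not the package's internal code); the example table contains made-up values.

```r
# Hypothetical helper: identifier columns are those not named share, species,
# loadtype or literature and not ending in _mean or _sd
find_id_columns <- function(subsets) {
  nm <- names(subsets)
  reserved <- c("share", "species", "loadtype", "literature")
  nm[!(nm %in% reserved) & !grepl("(_mean|_sd)$", nm)]
}

# Made-up subsample definitions (illustration only)
subsets <- data.frame(
  country   = c("at", "de"),
  subsample = c("at", "de"),
  share     = c(3, 2),
  f_mean    = c(30, 28),
  f_sd      = c(10, 9)
)

find_id_columns(subsets)
```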
The argument simbase can be either an object of class simbase_covar or of class simbase_list.
Various predefined simbase_covar objects are available in WoodSimulatR – see simbase.
For objects of class simbase_list, additional restrictions apply:
The object may only have grouping variable(s) which are also identifier columns according to the subsets definition above – if the subsets argument is not a data frame, the identifier columns are "country" and "subsample".
The value combinations in the identifier columns have to match those which the subsets argument leads to (see also get_subsample_definitions).
Both the means and standard deviations in the subsample definitions (see get_subsample_definitions) and the values in the simbase depend on the way the destructive testing of the sawn timber was done. If the simbase has a field loadtype (see also simbase_covar), this value is used in the call to get_subsample_definitions. Otherwise, the loadtype has to be passed directly to the present function unless no call to get_subsample_definitions is necessary (this depends on the value of subsets – see above). If a loadtype has been defined, a variable loadtype is also created in the resulting dataset for reference.
Negative values in any numeric column of the result dataset are forced to zero.
If random_seed is not NULL, reproducibility of results is enforced by using set.seed with arguments kind='Mersenne-Twister' and normal.kind='Inversion', and by calling RNGversion with argument RNGversion.
If random_seed is not NULL, the random number generator is reset at the end of the function using set.seed(NULL) and RNGversion(toString(getRversion())).
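The seed handling described above can be mimicked with base R alone. The wrapper with_fixed_seed below is a hypothetical helper written for illustration; it uses only the base functions named in the text (set.seed, RNGversion, getRversion), and the order of the calls is an assumption rather than a copy of the package's code.

```r
# Hypothetical wrapper: run `fun` with a fixed, version-independent RNG state
# and restore the default RNG configuration afterwards
with_fixed_seed <- function(seed, fun, rng_version = "3.6.0") {
  RNGversion(rng_version)
  set.seed(seed, kind = "Mersenne-Twister", normal.kind = "Inversion")
  on.exit({
    # reset the generator to the defaults of the running R version
    RNGversion(toString(getRversion()))
    set.seed(NULL)
  })
  fun()
}

a <- with_fixed_seed(1, function() rnorm(3))
b <- with_fixed_seed(1, function() rnorm(3))
identical(a, b)
```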
simulate_dataset(n = 10, subsets = 1, random_seed = 1)

# As the loadtype is defined in the simbase, the argument loadtype is ignored
# with a warning
simulate_dataset(n = 10, subsets = 1, random_seed = 1, loadtype = 'be')

# Two subsamples
simulate_dataset(n = 10, subsets = 2, random_seed = 1)

# Two subsamples from pre-defined countries
simulate_dataset(n = 10, subsets = c('at', 'de'), random_seed = 1)

# Two subsamples from pre-defined countries with different sample sizes
simulate_dataset(n = 10, subsets = c(at = 3, de = 2), random_seed = 1)