Introduction to dad

This document gives an overview of the data analysis methods provided by the dad package and describes the type of data it handles.

For more information on these elements, see: https://journal.r-project.org/archive/2021/RJ-2021-071/index.html

Data under consideration

The dad package provides tools for analysing multi-group data. Such data consist of variables observed on individuals, these individuals being organised into groups (or occasions). Hence, there are three types of objects: groups, individuals and variables.

Implemented methods

To analyse such data, a probability density function is associated with each group. The package implements the following methods, which operate on these functions:

  • Multidimensional scaling (MDS) of probability density functions: function fmdsd (continuous data) or mdsdd (discrete data)
  • Hierarchical cluster analysis (HCA) of probability density functions: fhclustd (continuous) or hclustdd (discrete)
  • Discriminant analysis (DA) of probability density functions:
    • Computation of the misclassification ratio using the leave-one-out method: fdiscd.misclass (continuous) or discdd.misclass (discrete)
    • Assignment of groups of individuals whose class is unknown, one group after another: fdiscd.predict (continuous) or discdd.predict (discrete)
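
To make the idea of "one density per group" more concrete, here is a minimal base R sketch. It is not the dad implementation: it assumes simulated univariate data, Gaussian densities estimated from each group's mean and standard deviation, and the closed-form Hellinger distance between two normal densities. The names dat, hellinger, coords and tree are illustrative only. In practice the dad functions listed above would be used instead.

```r
# Illustrative sketch only (base R, simulated data); not the dad implementation.
set.seed(1)

# One numeric variable observed on individuals organised into four groups.
dat <- data.frame(
  value = c(rnorm(30, 0, 1), rnorm(30, 0.2, 1), rnorm(30, 3, 1), rnorm(30, 3, 2)),
  group = rep(c("g1", "g2", "g3", "g4"), each = 30)
)

# Step 1: associate a Gaussian density with each group (mean and sd per group).
m <- tapply(dat$value, dat$group, mean)
s <- tapply(dat$value, dat$group, sd)

# Step 2: Hellinger distance between two univariate Gaussian densities.
# bc is the Bhattacharyya coefficient; the squared distance is 1 - bc.
hellinger <- function(m1, s1, m2, s2) {
  bc <- sqrt(2 * s1 * s2 / (s1^2 + s2^2)) *
    exp(-(m1 - m2)^2 / (4 * (s1^2 + s2^2)))
  sqrt(max(0, 1 - bc))
}

g <- names(m)
d <- outer(seq_along(g), seq_along(g),
           Vectorize(function(i, j) hellinger(m[i], s[i], m[j], s[j])))
dimnames(d) <- list(g, g)

# Step 3: apply MDS and hierarchical clustering to the groups,
# using the distances between their densities.
coords <- cmdscale(as.dist(d), k = 2)
tree <- hclust(as.dist(d), method = "complete")
```

The groups, not the individuals, are the objects being mapped and clustered here: d has one row and one column per group.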

Data organisation

In order to facilitate the work with these multi-group data, the dad package uses objects of class "folder" or "folderh". These objects are lists of data frames having particular formats.

Objects of class folder

Such objects are lists of data frames that all have the same column names. Each data frame corresponds to one occasion (a group of individuals).

An object of class "folder" is created by the functions folder or as.folder (see their help in R).

Example: Ten rosebushes A, B, …, J were evaluated by 14 assessors, at three sessions, according to several descriptors including their shape Sha, their foliage thickness Den and their symmetry Sym.

library(dad)
data("roses")
x <- roses[, c("Sha", "Den", "Sym", "rose")]
head(x)
##   Sha Den Sym rose
## 1 7.0 6.7 6.7    A
## 2 7.1 7.8 8.1    A
## 3 7.0 6.8 7.4    A
## 4 6.7 4.3 8.1    A
## 5 4.5 7.2 7.8    A
## 6 6.0 7.2 6.1    A

Coerce these data into an object of class "folder":

rosesf <- as.folder(x, groups = "rose")
print(rosesf, max = 9)
## $A
##   Sha Den Sym
## 1 7.0 6.7 6.7
## 2 7.1 7.8 8.1
## 3 7.0 6.8 7.4
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
## 
## $B
##    Sha Den Sym
## 43 8.1 7.7 3.0
## 44 8.6 5.9 6.7
## 45 7.7 6.7 7.4
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
## 
## $C
##    Sha Den Sym
## 85 0.7 9.3 1.4
## 86 2.3 7.7 2.4
## 87 3.6 7.9 7.2
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
## 
## $D
##     Sha Den Sym
## 127 9.2 1.8 9.0
## 128 9.0 2.3 9.2
## 129 6.9 2.6 7.6
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
## 
## $E
##     Sha Den Sym
## 169 5.6 1.7 8.2
## 170 7.5 3.4 8.6
## 171 5.8 3.9 5.8
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
## 
## $F
##     Sha Den Sym
## 211 8.3 8.0 6.5
## 212 8.4 7.8 3.3
## 213 9.2 8.2 7.6
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
## 
## $G
##     Sha Den Sym
## 253 8.6 2.0 5.4
## 254 8.5 2.3 7.9
## 255 7.6 3.5 7.1
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
## 
## $H
##     Sha Den Sym
## 295 6.5 4.3 2.6
## 296 6.6 2.9 2.9
## 297 8.4 5.1 6.4
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
## 
## $I
##     Sha Den Sym
## 337 4.9 6.5 7.6
## 338 5.8 6.6 7.9
## 339 4.3 5.6 6.0
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
## 
## $J
##     Sha Den Sym
## 379 4.9 5.2 8.9
## 380 4.6 8.1 8.6
## 381 3.5 7.8 7.4
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
## 
## attr(,"class")
## [1] "folder"
## attr(,"same.rows")
## [1] FALSE
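
Since a folder is a list of data frames sharing the same columns, its layout can be imitated in base R with split(). The sketch below is only an analogy and does not create an object of class "folder". The data frame toy is a hypothetical extract (a few Sha and Den values copied from the printed output above), and the names groups, same_cols, n_per_group and group_means are illustrative.

```r
# Base R analogy of the folder layout; does not use or require dad.
# A few rows copied from the output above (two per rosebush, two variables).
toy <- data.frame(
  Sha  = c(7.0, 7.1, 8.1, 8.6, 0.7, 2.3),
  Den  = c(6.7, 7.8, 7.7, 5.9, 9.3, 7.7),
  rose = c("A", "A", "B", "B", "C", "C")
)

# One data frame per group; the grouping column is dropped from each element,
# and the original row names are kept, as in the printed folder.
groups <- split(toy[, c("Sha", "Den")], toy$rose)

# Every element has the same column names.
same_cols <- all(vapply(
  groups,
  function(df) identical(names(df), names(groups[[1]])),
  logical(1)
))

# Per-group summaries are obtained with ordinary list tools.
n_per_group <- vapply(groups, nrow, integer(1))
group_means <- t(vapply(groups, colMeans, numeric(2)))
```

Assuming a "folder" behaves like an ordinary list, as its description suggests, the same list tools should apply to rosesf.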

Objects of class folderh

Objects of class "folderh" can be used to avoid redundancies in the data.

In the most useful case, such objects are hierarchical lists of two data frames df1 and df2 related by means of a key which describes the “1 to N” relationship between the data frames.

They are created by the function folderh (see its help in R for the case of three data frames or more).

Example: Data on 5 rosebushes (roseflowers$variety), with measurements on several flowers for each rosebush (roseflowers$flower).

library(dad)
data(roseflowers)
df1 <- roseflowers$variety
df2 <- roseflowers$flower

Build an object of class "folderh":

fh1 <- folderh(df1, "rose", df2)
print(fh1)
## $df1
##         place rose variety
## 34   outdoors   34      v1
## 40   outdoors   40      v4
## 60   outdoors   60      v3
## 66 glasshouse   66      v3
## 68 glasshouse   68      v4
## 
## $df2
##    rose numflower diameter height nleaves
## 1    34         1     94.5   57.0       8
## 2    34         2     89.5   54.0      10
## 3    40         1     57.0   21.5       9
## 4    40         2     52.5   20.5       5
## 5    40         3     51.5   14.0       7
## 6    60         1     53.0   23.0       4
## 7    60         2     52.0   24.5       9
## 8    66         1     35.0    9.5       4
## 9    66         2     35.0   14.0       6
## 10   66         3     36.0   13.5       7
## 11   68         1     45.5   19.5      10
## 
## attr(,"class")
## [1] "folderh"
## attr(,"keys")
## [1] "rose"
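
To see which redundancy the two-table layout avoids, the following base R sketch flattens a small extract of the two data frames with merge(). It is an illustration only and does not use folderh. The objects v and f are hypothetical extracts copied from the output above (rosebushes 34 and 40, three columns of df2), and key_is_unique, key_is_known and flat are illustrative names.

```r
# Base R illustration of the "1 to N" relationship; does not use or require dad.
# v: one row per rosebush. f: one row per flower.
v <- data.frame(
  place   = c("outdoors", "outdoors"),
  rose    = c(34, 40),
  variety = c("v1", "v4")
)
f <- data.frame(
  rose      = c(34, 34, 40, 40, 40),
  numflower = c(1, 2, 1, 2, 3),
  diameter  = c(94.5, 89.5, 57.0, 52.5, 51.5)
)

# The key identifies each row of v uniquely, and every flower
# refers to a rosebush that is present in v.
key_is_unique <- !anyDuplicated(v$rose)
key_is_known  <- all(f$rose %in% v$rose)

# Flattening into a single table repeats place and variety for every flower.
flat <- merge(v, f, by = "rose")
```

In flat, the place and variety of each rosebush appear once per flower, whereas the two-table layout stores them once per rosebush.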