--- title: "Introduction to dad" author: "Pierre Santagostini, Rachid Boumaza" date: "2023-08-29" output: rmarkdown::html_vignette: toc: true toc_depth: 2 # number_sections: true vignette: > %\VignetteIndexEntry{Introduction to dad} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` Below is an overview of the data analysis methods provided by the dad package, and a presentation of the type of data manipulated. For more information on these elements, see: [https://journal.r-project.org/archive/2021/RJ-2021-071/index.html](https://doi.org/10.32614/RJ-2021-071) ## Data under consideration The **dad** package provides tools for analysing multi-group data. Such data consist of variables observed on individuals, these individuals being organised into groups (or occasions). Hence, there are three types of objects: groups, individuals and variables. ## Implemented methods For the analysis of such data, a probability density function is associated to each group. Some methods dealing with these functions are implemented: * **Multidimensional scaling (MDS) of probability density functions**: function `fmdsd` (continuous data) or `mdsdd` (discrete data) * **Hierarchical cluster analysis (HCA) of probability density functions**: `fhclustd` (continuous) or `hclustdd` (discrete) * **Discriminant analysis (DA) of probability density functions**: + Computation of the misclassification ratio using the one-leave-out method: `fdiscd.misclass` (continuous) or `discdd.misclass` (discrete) + Assignment of groups of individuals, one group after another, for which the class is unknown: `fdiscd.predict` (continuous) or `discdd.predict` (discrete) ## Data organisation In order to facilitate the work with these multi-group data, the **dad** package uses objects of class `"folder"` or `"folderh"`. These objects are lists of data frames having particular formats. ### Objects of class `folder` Such objects are lists of data frames which have the same column names. Each data frame matches with an occasion (a group of individuals). An object of class `"folder"` is created by the functions `folder` or `as.folder` (see their help in R). **Example:** Ten rosebushes $A$, $B$, $\dots$, $J$ were evaluated by 14 assessors, at three sessions, according to several descriptors including their shape `Sha`, their foliage thickness `Den` and their symmetry `Sym`. ```{r} library(dad) data("roses") x <- roses[, c("Sha", "Den", "Sym", "rose")] head(x) ``` Coerce these data into an object of class `"folder"`: ```{r} rosesf <- as.folder(x, groups = "rose") print(rosesf, max = 9) ``` ### Objects of class `folderh` Objects of class `"folderh"` can be used to avoid redundancies in the data. In the most useful case, such objects are hierarchical lists of two data frames `df1` and `df2` related by means of a key which describes the "1 to N" relationship between the data frames. They are created by the function `folderh` (see its help in R for the case of three data frames or more). **Example:** Data about 5 rosebushes (`roseflowers$variety`). For each rosebush, measures on several flowers (`roseflowers$flower`). ```{r} library(dad) data(roseflowers) df1 <- roseflowers$variety df2 <- roseflowers$flower ``` Build an object of class `"folderh"`: ```{r} fh1 <- folderh(df1, "rose", df2) print(fh1) ```