Package 'NaileR'

Title: Interpreting Latent Variables with AI
Description: A small package designed for interpreting continuous and categorical latent variables. You provide a data set containing the latent variable you want to understand and some explanatory variables. The package describes the latent variable in terms of the explanatory variables and suggests a name for it.
Authors: Nel Hervé [aut], Sébastien Lê [aut, cre]
Maintainer: Sébastien Lê <[email protected]>
License: GPL (>= 2)
Version: 1.2.1
Built: 2024-12-12 06:45:41 UTC
Source: CRAN

Help Index


Agribusiness studies survey

Description

These data were collected after a Q-method-like survey on students' expectations of agribusiness studies. Participants had to rank how much they agreed with 38 statements about possible benefits from agribusiness studies; then, they were asked personal questions.

Usage

agri_studies

Format

A data frame with 53 rows (participants) and 42 columns (questions):

  • columns 1-38: statements about agribusiness studies

  • columns 39-42: personal information

Source

Juliette LE COLLONNIER and Lou ROBERT, students at l'Institut Agro Rennes-Angers

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

library(NaileR)
data(agri_studies)

res_mca_agri <- FactoMineR::MCA(agri_studies, quali.sup = 39:42,
level.ventil = 0.05, graph = FALSE)
agri_work <- res_mca_agri$ind$coord |> as.data.frame()
agri_work <- agri_work[,1] |> cbind(agri_studies)

intro_agri <- "These data were collected after a survey
on students' expectations of agribusiness studies.
Participants had to rank how much they agreed with 38 statements
about possible benefits from agribusiness studies;
then, they were asked personal questions."
intro_agri <- gsub('\n', ' ', intro_agri) |>
stringr::str_squish()

res_agri <- nail_condes(agri_work, num.var = 1,
introduction = intro_agri, generate = TRUE)
cat(res_agri$response)

## End(Not run)
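The examples in this manual write each prompt as a multi-line string and then flatten it before sending it to the LLM. The sketch below shows what that clean-up step does, using base R only. Here squish() is a hypothetical stand-in for stringr::str_squish(), and the toy string is illustrative.

```r
# Multi-line prompt, written this way for readability in the script.
intro <- "These data were collected after a survey
   on students' expectations   of agribusiness studies.  "

# Hypothetical base R stand-in for stringr::str_squish(): trim both ends,
# then collapse every run of whitespace (spaces, tabs, newlines) into one space.
squish <- function(x) gsub("\\s+", " ", trimws(x))

intro_clean <- squish(intro)
intro_clean
```

Because line breaks count as whitespace, this single step flattens the string and removes the stray indentation in one pass.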

Atomic habits survey

Description

People often think they need to make big changes to alter the course of their lives. In his book Atomic Habits, James Clear argues that the smallest changes, coupled with a good knowledge of psychology and neuroscience, can have a revolutionary effect on one's life and relationships. To explore this concept of atomic habits, we interviewed 167 people and asked them whether they felt able to, for example, never drive alone in their car again or buy local products. We also asked them how restrictive they found each change and why.

Usage

atomic_habit

Format

A data frame with 167 rows and 50 columns:

  • columns 1-10, do you feel able to...

  • columns 11-20, from 0 to 5 how restrictive...

  • columns 21-30, is it restrictive, yes or no...

  • columns 31-40, justify your answers

  • columns 41-50, a combination of able and restrictive

Source

Applied mathematics department, Institut Agro Rennes-Angers

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

library(FactoMineR)
library(NaileR)
data(atomic_habit)

res_mfa <- MFA(atomic_habit[,1:30],
              group = c(10,10,10),
              type = c("n","s","n"),
              num.group.sup = 3,
              name.group = c("capable","restrictive", "restrictive binary"),
              graph = FALSE)

plot.MFA(res_mfa, choix = "ind", invisible = c("quali","quali.sup"),
        lab.ind = FALSE,
        title = "MFA based on being capable and restrictiveness data")

res_hcpc <- HCPC(res_mfa, nb.clust = 3, graph = FALSE)
plot.HCPC(res_hcpc, choice = "map",
         draw.tree = FALSE,
         ind.names = FALSE,
         title = "Atomic habits - typology")
summary(res_hcpc$data.clust)

## End(Not run)

Atomic habits survey

Description

People often think they need to make big changes to alter the course of their lives. In his book Atomic Habits, James Clear argues that the smallest changes, coupled with a good knowledge of psychology and neuroscience, can have a revolutionary effect on one's life and relationships. To explore this concept of atomic habits, we interviewed 167 people and asked them whether they felt able to, for example, never drive alone in their car again or buy local products. We also asked them how restrictive they found each change and why.

Usage

atomic_habit_clust

Format

A data frame with 167 rows and 51 columns:

  • columns 1-10, do you feel able to...

  • columns 11-20, from 0 to 5 how restrictive...

  • columns 21-30, is it restrictive, yes or no...

  • columns 31-40, justify your answers

  • columns 41-50, a combination of able and restrictive

  • column 51, cluster variable based on MFA (first 20 variables)

Source

Applied mathematics department, Institut Agro Rennes-Angers

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

library(FactoMineR)
library(NaileR)
data(atomic_habit_clust)

catdes(atomic_habit_clust, num.var = 51)

## End(Not run)

Beard descriptions

Description

These data refer to 8 types of beards. Each beard was evaluated by 62 assessors (except beard 8 which only had 60 evaluations).

Usage

beard

Format

A data frame with 494 rows and 2 columns:

  • the types of beards;

  • the words used to describe them.

Source

Applied mathematics department, Institut Agro Rennes-Angers

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

data(beard)
beard[1:8,]

## End(Not run)

Beard descriptions

Description

These data refer to 8 types of beards. Each beard was evaluated by 62 assessors (except beard 8 which only had 60 evaluations).

Usage

beard_cont

Format

A contingency table (data frame) with 8 rows and 337 columns:

  • rows are the types of beards;

  • columns are the words used at least once to describe them.

Source

Applied mathematics department, Institut Agro Rennes-Angers

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

library(NaileR)
data(beard_cont)

FactoMineR::descfreq(beard_cont)

intro_beard <- 'A survey was conducted about beards
and 8 types of beards were described.
In the data that follow, beards are named B1 to B8.'
intro_beard <- gsub('\n', ' ', intro_beard) |>
stringr::str_squish()

req_beard <- 'Please give a name to each beard
and summarize what makes this beard unique.'
req_beard <- gsub('\n', ' ', req_beard) |>
stringr::str_squish()

res_beard <- nail_descfreq(beard_cont,
introduction = intro_beard,
request = req_beard, generate = TRUE)
cat(res_beard$response)

## End(Not run)

Beard descriptions

Description

These data refer to 8 types of beards. They come from a subset of the original "beard" dataset. Each beard was evaluated by 62 assessors (except beard 8 which only had 60 evaluations).

Usage

beard_wide

Format

A data frame with 8 rows and 24 columns:

  • rows are the types of beards;

  • columns are the assessors' opinions.

Source

Applied mathematics department, Institut Agro Rennes-Angers

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

library(NaileR)
data(beard_wide)

intro_beard <- "As a barber, you make
recommendations based on consumers' comments.
Examples of consumers' descriptions of beards
are as follows."
intro_beard <- gsub('\n', ' ', intro_beard) |>
stringr::str_squish()

res <- nail_sort(beard_wide[,1:5], name_size = 3,
stimulus_id = "beard", introduction = intro_beard,
measure = 'the description was')

res$dta_sort
cat(res$prompt_llm[[1]])

## End(Not run)

Ideal boss survey

Description

These data were collected after a Q-method-like survey on participants' perception of an "ideal boss". Participants had to rank how much they agreed with 30 statements about an ideal boss; then, they were asked personal questions.

Usage

boss

Format

A data frame with 73 rows (participants) and 39 columns (questions):

  • columns 1-30: statements about the ideal boss

  • columns 31-39: personal information

Source

Florian LECLERE and Marianne ANDRE, students at l'Institut Agro Rennes-Angers

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

library(FactoMineR)
library(NaileR)
data(boss)
res_mca_boss <- MCA(boss, quali.sup = 31:39,
ncp = 30, level.ventil = 0.05, graph = FALSE)
res_hcpc_boss <- HCPC(res_mca_boss, nb.clust = 4, graph = FALSE)
don_clust_boss <- res_hcpc_boss$data.clust

intro_boss <- 'A study on "the ideal boss" was led on 73 participants.
The study had 2 parts. In the first part,
participants were given statements about the ideal boss
(starting with "My ideal boss...").
They had to rate, on a scale from 1 to 5,
how much they agreed with the statements;
1 being "Strongly disagree", 3 being "neutral"
and 5 being "Strongly agree".
In the second part, they were asked for personal information:
work experience, age, etc.
Participants were then split into groups based on their answers.'
intro_boss <- gsub('\n', ' ', intro_boss) |>
stringr::str_squish()

req_boss <- "Please describe, for each group, their ideal boss.
Then, give each group a new name, based on your conclusions."
req_boss <- gsub('\n', ' ', req_boss) |>
stringr::str_squish()


res_boss <- nail_catdes(don_clust_boss, num.var = 40,
introduction = intro_boss, request = req_boss,
isolate.groups = FALSE, drop.negative = TRUE, generate = TRUE)
res_boss$response |> cat()

## End(Not run)

Atomic habits survey

Description

People often think they need to make big changes to alter the course of their lives. In his book Atomic Habits, James Clear argues that the smallest changes, coupled with a good knowledge of psychology and neuroscience, can have a revolutionary effect on one's life and relationships. To explore this concept of atomic habits, we interviewed 167 people and asked them whether they felt able to, for example, never drive alone in their car again or buy local products. We also asked them how restrictive they found each change and why.

Usage

car_alone

Format

A data frame with 2 columns:

  • column 1, a combination of being able and finding it restrictive

  • column 2, justify your answer

Source

Applied mathematics department, Institut Agro Rennes-Angers

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

library(FactoMineR)
library(NaileR)
library(dplyr)
data(car_alone)
sampled_car_alone <- car_alone %>%
group_by(car_alone_capable_restrictive) %>%
sample_frac(0.5)
sampled_car_alone <- as.data.frame(sampled_car_alone)

intro_car <- "Knowing the impact on the climate,
I have made these choices based on
the following benefits and constraints..."
intro_car <- gsub('\n', ' ', intro_car) |>
stringr::str_squish()
res_nail_textual <- nail_textual(sampled_car_alone, num.var = 1,
                                num.text = 2,
                                introduction = intro_car,
                                request = NULL,
                                model = 'llama3', isolate.groups = TRUE,
                                generate = TRUE)
res_nail_textual[[1]]$response |> cat()
res_nail_textual[[3]]$response |> cat()
res_nail_textual[[2]]$response |> cat()
res_nail_textual[[4]]$response |> cat()

## End(Not run)

LLM distance matrix

Description

Compute a distance matrix between randomly-generated responses to an LLM prompt.

Usage

dist_mat_llm(ppt, n, per_miss = 0)

Arguments

ppt

an LLM prompt.

n

the number of responses to be generated.

per_miss

the proportion of missing values in the final matrix (between 0 and 1; 0 by default).

Details

The final percentage of missing values might differ from the per_miss parameter value; rather than a percentage of values being turned to NA, each value has a per_miss probability of being NA.
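The difference between a fixed share of missing values and a per-value probability can be illustrated with a short base R sketch. It is not the package's implementation: mask_matrix() and the toy matrix are hypothetical, and the sketch assumes that each entry is masked independently of the others.

```r
# Hypothetical helper: mask each entry with probability per_miss.
mask_matrix <- function(mat, per_miss = 0) {
  stopifnot(per_miss >= 0, per_miss <= 1)
  # One uniform draw per cell; TRUE means the cell becomes NA.
  to_na <- matrix(runif(length(mat)) < per_miss, nrow = nrow(mat))
  mat[to_na] <- NA
  mat
}

set.seed(42)
toy <- matrix(seq_len(100), nrow = 10)   # toy 10 x 10 matrix, not real distances
masked <- mask_matrix(toy, per_miss = 0.2)

# The realized share of NA values fluctuates around 0.2 instead of equalling it.
mean(is.na(masked))
```

Changing the seed changes the realized share, which is why the final matrix rarely contains exactly the requested proportion of missing values.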

Value

A list containing:

  • a list of the LLM results for each iteration;

  • a distance matrix.

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

data(iris)

intro_iris <- "A study measured various parts of iris flowers
from 3 different species: setosa, versicolor and virginica.
I will give you the results from this study.
You will have to identify what sets these flowers apart."
intro_iris <- gsub('\n', ' ', intro_iris) |>
stringr::str_squish()

req_iris <- "Please explain what makes each species distinct.
Also, tell me which species has the biggest flowers,
and which species has the smallest."
req_iris <- gsub('\n', ' ', req_iris) |>
stringr::str_squish()

res_iris <- nail_catdes(iris, num.var = 5,
introduction = intro_iris, request = req_iris, generate = TRUE)

dist_mat_llm(res_iris$prompt, n = 5, per_miss = 0)

## End(Not run)

LLM response consistency

Description

Compute distances between an LLM response of interest and some other responses to the same prompt.

Usage

dist_ref_llm(ppt, ref, n)

Arguments

ppt

an LLM prompt.

ref

the reference response.

n

the number of new responses to be generated.

Value

A list containing:

  • a list of the newly generated responses;

  • a vector of distances to the reference response.

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

data(iris)

intro_iris <- "A study measured various parts of iris flowers
from 3 different species: setosa, versicolor and virginica.
I will give you the results from this study.
You will have to identify what sets these flowers apart."
intro_iris <- gsub('\n', ' ', intro_iris) |>
stringr::str_squish()

req_iris <- "Please explain what makes each species distinct.
Also, tell me which species has the biggest flowers,
and which species has the smallest."
req_iris <- gsub('\n', ' ', req_iris) |>
stringr::str_squish()

res_iris <- nail_catdes(iris, num.var = 5,
introduction = intro_iris, request = req_iris, generate = TRUE)

dist_ref_llm(res_iris$prompt, res_iris$response, n = 5)

## End(Not run)

Car seat fabrics

Description

This dataset was initially collected to understand the free jar data.

Usage

fabric

Format

A data frame with 567 rows and 4 columns:

  • The ID of the judge

  • The product

  • The reason why the product was liked or disliked

  • 0 if the product was disliked, 1 otherwise

Source

Applied mathematics department, Institut Agro Rennes-Angers

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

library(NaileR)
data(fabric)

intro_car <- "For this consumer study,
a car seat fabric was evaluated by consumers.
Some of them didn't like it (group '0'),
others liked it (group '1'). The consumers
gave their reasons for disliking or liking the fabric."
intro_car <- gsub('\n', ' ', intro_car) |>
stringr::str_squish()

request_car <- "Based on the comments provided by the consumers,
please explain the reasons why
the fabric was not appreciated for group '0',
and the reasons why the fabric was appreciated for group '1'.
In other words, what are the drivers
for disliking and liking this fabric."
request_car <- gsub('\n', ' ', request_car) |>
stringr::str_squish()

fabric_A <- droplevels(fabric[fabric$Fabric=="A",])

res_nail_textual_fabric <- nail_textual(fabric_A, num.var = 4,
                                        num.text = 3,
                                        introduction = intro_car,
                                        request = request_car,
                                        model = 'llama3',
                                        isolate.groups = FALSE,
                                        generate = FALSE)
cat(res_nail_textual_fabric$prompt)

res_nail_textual_fabric <- nail_textual(fabric_A, num.var = 4,
                                        num.text = 3,
                                        introduction = intro_car,
                                        request = request_car,
                                        model = 'llama3',
                                        isolate.groups = FALSE,
                                        generate = TRUE)
cat(res_nail_textual_fabric$response)

## End(Not run)

Glossophobia survey

Description

These data were collected after a Q-method-like survey on participants' feelings about speaking in public. Participants had to rank how much they agreed with 25 descriptions of speaking in public; then, they were asked personal questions.

Usage

glossophobia

Format

A data frame with 139 rows (participants) and 41 columns (questions):

  • columns 1-25: descriptions of speaking in public

  • columns 26-41: personal information

Source

Elina BIAU and Théo LEDAIN, students at l'Institut Agro Rennes-Angers

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

library(NaileR)
data(glossophobia)

res_mca_phobia <- FactoMineR::MCA(glossophobia, quali.sup = 26:41,
level.ventil = 0.05, graph = FALSE)
phobia_work <- res_mca_phobia$ind$coord |> as.data.frame()
phobia_work <- phobia_work[,1] |> cbind(glossophobia)

intro_phobia <- "These data were collected after a survey
on participants' feelings about speaking in public.
Participants had to rank how much they agreed with
25 descriptions of speaking in public;
then, they were asked personal questions."
intro_phobia <- gsub('\n', ' ', intro_phobia) |>
stringr::str_squish()

res_phobia <- nail_condes(phobia_work, num.var = 1,
introduction = intro_phobia, generate = TRUE)
cat(res_phobia$response)

## End(Not run)

Local food systems survey

Description

These data were collected after a Q-method-like survey on sustainable food systems. Participants had to rank how acceptable they found 45 statements about a sustainable food system; then, they were asked if they agreed with 14 other statements.

Usage

local_food

Format

A data frame with 573 rows (participants) and 63 columns (questions):

  • columns 1-45: statements about food systems

  • columns 46-59: opinions

  • columns 60-63: personal information

Source

Applied mathematics department, Institut Agro Rennes-Angers

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

library(FactoMineR)
library(NaileR)
data(local_food)

res_mca_food <- MCA(local_food, quali.sup = 46:63,
ncp = 100, level.ventil = 0.05, graph = FALSE)
res_hcpc_food <- HCPC(res_mca_food, nb.clust = 3, graph = FALSE)
don_clust_food <- res_hcpc_food$data.clust

intro_food <- 'A study on sustainable food systems
was led on several French participants.
This study had 2 parts. In the first part,
participants had to rate how acceptable
"a food system that..." (e.g., "a food system that
only uses renewable energy") was to them.
In the second part, they had to say
if they agreed or disagreed with some statements.'
intro_food <- gsub('\n', ' ', intro_food) |>
stringr::str_squish()

req_food <- 'I will give you the answers from one group.
Please explain who the individuals of this group are,
what their beliefs are.
Then, give this group a new name,
and explain why you chose this name.
Do not use 1st person ("I", "my"...) in your answer.'
req_food <- gsub('\n', ' ', req_food) |>
stringr::str_squish()

res_food <- nail_catdes(don_clust_food, num.var = 64,
introduction = intro_food,
request = req_food,
isolate.groups = TRUE, drop.negative = TRUE, generate = TRUE)
res_food[[1]]$response |> cat()

## End(Not run)

Interpret a categorical latent variable

Description

Generate an LLM response to analyze a categorical latent variable.

Usage

nail_catdes(
  dataset,
  num.var,
  introduction = NULL,
  request = NULL,
  model = "llama3",
  isolate.groups = FALSE,
  drop.negative = FALSE,
  proba = 0.05,
  row.w = NULL,
  generate = FALSE
)

Arguments

dataset

a data frame made up of at least one categorical variable and a set of quantitative variables and/or categorical variables.

num.var

the index of the variable to be characterized.

introduction

the introduction for the LLM prompt.

request

the request made to the LLM.

model

the model name ('llama3' by default).

isolate.groups

a boolean that indicates whether to give the LLM a single prompt (FALSE) or one prompt per category (TRUE). Setting it to TRUE is recommended with long catdes results.

drop.negative

a boolean that indicates whether to drop negative v.test values from the interpretation, keeping only positive v.tests. Setting it to TRUE is recommended with long catdes results.

proba

the significance threshold considered to characterize the categories (by default 0.05).

row.w

an optional vector of integer row weights (by default, a vector of 1s for uniform row weights).

generate

a boolean that indicates whether to generate the LLM response. If FALSE, the function only returns the prompt.

Details

This function directly sends a prompt to an LLM. Therefore, to get a consistent answer, we highly recommend customizing the introduction and request parameters and adding all relevant information about your data for the LLM. We also recommend renaming the columns with clear, unabbreviated and unambiguous names.

Additionally, if isolate.groups = TRUE, you will need an introduction and a request that take into account the fact that only one group is analyzed at a time.
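The roles of the proba and drop.negative arguments described above can be sketched on a toy description table. The table values and the helper filter_description() are hypothetical. The sketch only mirrors the documented behavior (a significance threshold, then an optional restriction to positive v.tests) and is not the package's internal code.

```r
# Toy description table for one group (hypothetical values, not package output).
desc <- data.frame(
  variable = c("Petal.Length", "Petal.Width", "Sepal.Width", "Sepal.Length"),
  v.test   = c(8.2, 7.9, -6.1, 1.1),
  p.value  = c(1e-16, 3e-15, 1e-9, 0.27)
)

# Hypothetical helper mirroring the documented roles of proba and drop.negative.
filter_description <- function(desc, proba = 0.05, drop.negative = FALSE) {
  keep <- desc$p.value < proba              # significance threshold
  if (drop.negative) {
    keep <- keep & desc$v.test > 0          # keep positive v.tests only
  }
  desc[keep, , drop = FALSE]
}

filter_description(desc)                        # the three significant rows
filter_description(desc, drop.negative = TRUE)  # the two rows with positive v.test
```

Dropping negative v.tests shortens the material handed to the LLM, which is the reason the option is suggested for long results.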

Value

If generate = TRUE, a data frame (or a list of data frames if isolate.groups = TRUE) containing the LLM's prompt and response. If generate = FALSE, only the prompt is returned.

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

### Example 1: Fisher's iris ###
library(NaileR)
data(iris)

intro_iris <- "A study measured various parts of iris flowers
from 3 different species: setosa, versicolor and virginica.
I will give you the results from this study.
You will have to identify what sets these flowers apart."
intro_iris <- gsub('\n', ' ', intro_iris) |>
stringr::str_squish()

req_iris <- "Please explain what makes each species distinct.
Also, tell me which species has the biggest flowers,
and which species has the smallest."
req_iris <- gsub('\n', ' ', req_iris) |>
stringr::str_squish()

res_iris <- nail_catdes(iris,
                        num.var = 5,
                        introduction = intro_iris,
                        request = req_iris,
                        generate = TRUE)

cat(res_iris$response)

### Example 2: food waste dataset ###

library(FactoMineR)

data(waste)
waste <- waste[-14]    # no variability on this question

set.seed(1)
res_mca_waste <- MCA(waste, quali.sup = c(1,2,50:76),
ncp = 35, level.ventil = 0.05, graph = FALSE)
plot.MCA(res_mca_waste, choix = "ind",
invisible = c("var", "quali.sup"), label = "none")
res_hcpc_waste <- HCPC(res_mca_waste, nb.clust = 3, graph = FALSE)
plot.HCPC(res_hcpc_waste, choice = "map", draw.tree = FALSE,
ind.names = FALSE)
don_clust_waste <- res_hcpc_waste$data.clust

intro_waste <- 'These data were collected
after a survey on food waste,
with participants describing their habits.'
intro_waste <- gsub('\n', ' ', intro_waste) |>
stringr::str_squish()

req_waste <- 'Please summarize the characteristics of each group.
Then, give each group a new name, based on your conclusions.
Finally, give each group a grade between 0 and 10,
based on how wasteful they are with food:
0 being "not at all", 10 being "absolutely".'
req_waste <- gsub('\n', ' ', req_waste) |>
stringr::str_squish()

res_waste <- nail_catdes(don_clust_waste,
                         num.var = ncol(don_clust_waste),
                         introduction = intro_waste,
                         request = req_waste,
                         drop.negative = TRUE,
                         generate = TRUE)

cat(res_waste$response)


### Example 3: local_food dataset ###

data(local_food)

set.seed(1)
res_mca_food <- MCA(local_food, quali.sup = 46:63,
ncp = 100, level.ventil = 0.05, graph = FALSE)
plot.MCA(res_mca_food, choix = "ind",
invisible = c("var", "quali.sup"), label = "none")
res_hcpc_food <- HCPC(res_mca_food, nb.clust = 3, graph = FALSE)
plot.HCPC(res_hcpc_food, choice = "map", draw.tree = FALSE,
ind.names = FALSE)
don_clust_food <- res_hcpc_food$data.clust

intro_food <- 'A study on sustainable food systems
was led on several French participants.
This study had 2 parts. In the first part,
participants had to rate how acceptable
"a food system that..." (e.g., "a food system that
only uses renewable energy") was to them.
In the second part, they had to say
if they agreed or disagreed with some statements.'
intro_food <- gsub('\n', ' ', intro_food) |>
stringr::str_squish()

req_food <- 'I will give you the answers from one group.
Please explain who the individuals of this group are,
what their beliefs are.
Then, give this group a new name,
and explain why you chose this name.
Do not use 1st person ("I", "my"...) in your answer.'
req_food <- gsub('\n', ' ', req_food) |>
stringr::str_squish()

res_food <- nail_catdes(don_clust_food,
                        num.var = 64,
                        introduction = intro_food,
                        request = req_food,
                        isolate.groups = TRUE,
                        drop.negative = TRUE,
                        generate = TRUE)

res_food[[1]]$response |> cat()
res_food[[2]]$response |> cat()
res_food[[3]]$response |> cat()

## End(Not run)

Interpret a continuous latent variable

Description

Generate an LLM response to analyze a continuous latent variable.

Usage

nail_condes(
  dataset,
  num.var,
  introduction = NULL,
  request = NULL,
  model = "llama3",
  quanti.threshold = 0,
  quanti.cat = c("Significantly above average", "Significantly below average", "Average"),
  weights = NULL,
  proba = 0.05,
  generate = FALSE
)

Arguments

dataset

a data frame made up of at least one quantitative variable and a set of quantitative variables and/or categorical variables.

num.var

the index of the variable to be characterized.

introduction

the introduction for the LLM prompt.

request

the request made to the LLM.

model

the model name ('llama3' by default).

quanti.threshold

the threshold above (resp. below) which a scaled variable is considered significantly above (resp. below) the average. Used when converting continuous variables to categorical ones.

quanti.cat

a vector of the 3 possible categories for continuous variables converted to categorical ones according to the threshold. Default is "Significantly above average", "Significantly below average" and "Average".

weights

weights for the individuals (see FactoMineR::condes()).

proba

the significance threshold considered to characterize the category (by default 0.05).

generate

a boolean that indicates whether to generate the LLM response. If FALSE, the function only returns the prompt.

Details

This function directly sends a prompt to an LLM. Therefore, to get a consistent answer, we highly recommend customizing the introduction and request parameters and adding all relevant information about your data for the LLM. We also recommend renaming the columns with clear, unabbreviated and unambiguous names.
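The quanti.threshold and quanti.cat arguments described above control how continuous explanatory variables are turned into three categories. The base R sketch below shows one plausible reading of that conversion. The helper to_categories() is hypothetical, and it assumes that the variable is standardized first and that the threshold is applied symmetrically around zero; the package may differ in its details.

```r
# Hypothetical helper: recode a numeric vector into three categories.
# Assumption: standardize, then compare with +/- quanti.threshold.
to_categories <- function(x, quanti.threshold = 0,
                          quanti.cat = c("Significantly above average",
                                         "Significantly below average",
                                         "Average")) {
  z <- as.numeric(scale(x))                  # centre and scale to unit variance
  out <- rep(quanti.cat[3], length(z))       # start with the "average" label
  out[z >  quanti.threshold] <- quanti.cat[1]
  out[z < -quanti.threshold] <- quanti.cat[2]
  factor(out, levels = quanti.cat)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 20)        # toy values
res_cat <- to_categories(x, quanti.threshold = 1,
                         quanti.cat = c("High", "Low", "Average"))
table(res_cat)
```

A larger threshold sends more observations to the third category, so the prompt focuses on the most extreme values.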

Value

If generate = TRUE, a data frame containing the LLM's prompt and response. If generate = FALSE, only the prompt is returned.

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

### Example 1: decathlon dataset ###

library(FactoMineR)
data(decathlon)

names(decathlon) <- c('Time taken to complete the 100m',
'Distance reached for the long jump',
'Distance reached for the shot put',
'Height reached for the high jump',
'Time taken to complete the 400m',
'Time taken to complete the 110m hurdles',
'Distance reached for the discus',
'Height reached for the pole vault',
'Distance reached for the javelin',
'Time taken to complete the 1500 m',
'Rank/Counter-performance indicator',
'Points', 'Competition')

res_pca_deca <- FactoMineR::PCA(decathlon,
quanti.sup = 11:12, quali.sup = 13, graph = FALSE)
plot.PCA(res_pca_deca, choix = 'var')
deca_work <- res_pca_deca$ind$coord |> as.data.frame()
deca_work <- deca_work[,1] |> cbind(decathlon)

intro_deca <- "A study was led on athletes
participating in a decathlon event.
Their performance was assessed on each part of the decathlon,
and they were all placed on a unidimensional scale."
intro_deca <- gsub('\n', ' ', intro_deca) |>
stringr::str_squish()

res_deca <- nail_condes(deca_work,
                        num.var = 1,
                        quanti.threshold = 1,
                        quanti.cat = c('High', 'Low', 'Average'),
                        introduction = intro_deca,
                        generate = TRUE)

cat(res_deca$response)


### Example 2: agri_studies dataset ###

data(agri_studies)

set.seed(1)
res_mca_agri <- FactoMineR::MCA(agri_studies, quali.sup = 39:42,
level.ventil = 0.05, graph = FALSE)
plot.MCA(res_mca_agri, choix = 'ind',
invisible = c('var', 'quali.sup'), label = 'none')

agri_work <- res_mca_agri$ind$coord |> as.data.frame()
agri_work <- agri_work[,1] |> cbind(agri_studies)

intro_agri <- "These data were collected after a survey
on students' expectations of agribusiness studies.
Participants had to rank how much they agreed with 38 statements
about possible benefits from agribusiness studies;
then, they were asked personal questions."
intro_agri <- gsub('\n', ' ', intro_agri) |>
stringr::str_squish()

res_agri <- nail_condes(agri_work,
                        num.var = 1,
                        introduction = intro_agri,
                        generate = TRUE)

cat(res_agri$response)

### Example 3: glossophobia dataset ###

data(glossophobia)

set.seed(1)
res_mca_phobia <- FactoMineR::MCA(glossophobia,
quali.sup = 26:41, level.ventil = 0.05, graph = FALSE)
plot.MCA(res_mca_phobia, choix = 'ind',
invisible = c('var', 'quali.sup'), label = 'none')

phobia_work <- res_mca_phobia$ind$coord |> as.data.frame()
phobia_work <- phobia_work[,1] |> cbind(glossophobia)

intro_phobia <- "These data were collected after a survey
on participants' feelings about speaking in public.
Participants had to rank how much they agreed with
25 descriptions of speaking in public;
then, they were asked personal questions."
intro_phobia <- gsub('\n', ' ', intro_phobia) |>
stringr::str_squish()

res_phobia <- nail_condes(phobia_work,
                          num.var = 1,
                          introduction = intro_phobia,
                          generate = TRUE)

cat(res_phobia$response)

### Example 4: beard_cont dataset ###

data(beard_cont)

set.seed(1)
res_ca_beard <- FactoMineR::CA(beard_cont, graph = FALSE)
plot.CA(res_ca_beard, invisible = 'col')

beard_work <- res_ca_beard$row$coord |> as.data.frame()
beard_work <- beard_work[,1] |> cbind(beard_cont)

intro_beard <- "These data refer to 8 types of beards.
Each beard was evaluated by 62 assessors."
intro_beard <- gsub('\n', ' ', intro_beard) |>
stringr::str_squish()

req_beard <- "Please explain what differentiates beards
on both sides of the scale.
Then, give the scale a name."
req_beard <- gsub('\n', ' ', req_beard) |>
stringr::str_squish()

res_beard <- nail_condes(beard_work,
                         num.var = 1,
                         quanti.threshold = 0.5,
                         quanti.cat = c('Very often used', 'Never used', 'Sometimes used'),
                         introduction = intro_beard,
                         request = req_beard)

res_beard

ppt <- stringr::str_replace_all(res_beard, 'observations', 'beards')
cat(ppt)

res_beard <- ollamar::generate(model = 'llama3', prompt = ppt, output = 'text')

cat(res_beard)

## End(Not run)

Interpret the rows of a contingency table

Description

Describes the rows of a contingency table. For each row, this description is based on the columns of the contingency table that are significantly related to it.

Usage

nail_descfreq(
  dataset,
  introduction = NULL,
  request = NULL,
  model = "llama3",
  isolate.groups = FALSE,
  by.quali = NULL,
  proba = 0.05,
  generate = FALSE
)

Arguments

dataset

a data frame corresponding to a contingency table.

introduction

the introduction for the LLM prompt.

request

the request made to the LLM.

model

the model name ('llama3' by default).

isolate.groups

a boolean that indicates whether to give the LLM a single prompt (FALSE), or one prompt per row (TRUE). Setting it to TRUE is recommended if the contingency table has a large number of rows.

by.quali

a factor used to merge the data from different rows of the contingency table; by default NULL and each row is characterized.

proba

the significance threshold considered to characterize the category (by default 0.05).

generate

a boolean that indicates whether to generate the LLM response. If FALSE, the function only returns the prompt.
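As an illustration of the by.quali argument, the following base R sketch merges rows of a toy contingency table according to a factor. It assumes that merging means summing the counts of the rows that share a level; the table, the factor and the object names are made up for the example and do not come from the package.

```r
# Toy contingency table (made-up counts): rows are beards, columns are words
tab <- data.frame(
  soft  = c(10, 8, 1, 2),
  rough = c(1, 2, 9, 7),
  long  = c(3, 4, 5, 6),
  row.names = c("B1", "B2", "B3", "B4")
)

# Hypothetical grouping factor, with one value per row of the table
grp <- factor(c("short", "short", "full", "full"))

# Sum the counts of the rows that share the same level
merged <- rowsum(tab, group = grp)
merged
```

The merged table has one row per level of the factor, and it is this kind of table that would then be characterized instead of the individual rows.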

Details

This function directly sends a prompt to an LLM. Therefore, to get a consistent answer, we highly recommend customizing the parameters introduction and request and adding all relevant information on your data for the LLM.

Additionally, if isolate.groups = TRUE, you will need an introduction and a request that take into account the fact that only one group is analyzed at a time.
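To give an idea of what "significantly related" can mean for a row and a column, here is a minimal base R sketch that flags over-represented columns with a hypergeometric test and a threshold playing the role of proba. It is only one possible way to perform such a test, under the assumption that over-representation is what matters; it is not presented as the computation performed inside nail_descfreq, and the counts are made up.

```r
# Toy contingency table (made-up counts): rows are beards, columns are words
tab <- matrix(c(30,  5, 10,
                 4, 25, 12,
                 8,  9, 11),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("B1", "B2", "B3"),
                              c("soft", "rough", "long")))

proba <- 0.05        # same role as the proba argument
n  <- sum(tab)       # grand total
rs <- rowSums(tab)   # row totals
cs <- colSums(tab)   # column totals

# P(X >= observed count) when rs[i] items are drawn from n items,
# cs[j] of which belong to column j
over_p <- tab
for (i in seq_len(nrow(tab))) {
  for (j in seq_len(ncol(tab))) {
    over_p[i, j] <- phyper(tab[i, j] - 1, cs[j], n - cs[j], rs[i],
                           lower.tail = FALSE)
  }
}

# Columns flagged as over-represented in each row
significant <- over_p < proba
significant
```

Lowering proba keeps fewer columns per row, which shortens the description handed to the LLM.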

Value

A data frame, or a list of data frames, containing the LLM's prompt and response (if generate = TRUE).

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

### Example 1: beard dataset ###

data(beard_cont)

intro_beard_iso <- 'A survey was conducted about beards
and 8 types of beards were described.
I will give you the results for one type of beard.'
intro_beard_iso <- gsub('\n', ' ', intro_beard_iso) |>
stringr::str_squish()

req_beard_iso <- 'Please give a name to this beard
and summarize what makes this beard unique.'
req_beard_iso <- gsub('\n', ' ', req_beard_iso) |>
stringr::str_squish()

res_beard <- nail_descfreq(beard_cont,
                           introduction = intro_beard_iso,
                           request = req_beard_iso,
                           isolate.groups = TRUE,
                           generate = FALSE)

res_beard[[1]]
res_beard[[2]]

intro_beard <- 'A survey was conducted about beards
and 8 types of beards were described.
In the data that follow, beards are named B1 to B8.'
intro_beard <- gsub('\n', ' ', intro_beard) |>
stringr::str_squish()

req_beard <- 'Please give a name to each beard
and summarize what makes this beard unique.'
req_beard <- gsub('\n', ' ', req_beard) |>
stringr::str_squish()

res_beard <- nail_descfreq(beard_cont,
                           introduction = intro_beard,
                           request = req_beard,
                           generate = TRUE)

cat(res_beard$response)

text <- res_beard$response
titles <- stringr::str_extract_all(text, "\\*\\*B[0-9]+: [^\\*\\*]+\\*\\*")[[1]]

titles

# for the following code to work, the response must have the beards'
# new names with this format: **B1: The Nice beard**, etc.

titles <- stringr::str_replace_all(titles, "\\*\\*", "")  # remove asterisks
beard_names <- stringr::str_extract(titles, ": .+")
beard_names <- stringr::str_replace_all(beard_names, ": ", "")  # remove the colon and space

rownames(beard_cont) <- beard_names

library(FactoMineR)

res_ca_beard <- CA(beard_cont, graph = FALSE)
plot.CA(res_ca_beard, invisible = "col")


### Example 2: children dataset ###

data(children)

children <- children[1:14, 1:5] |> t() |> as.data.frame()
rownames(children) <- c('No education', 'Elementary school',
'Middle school', 'High school', 'University')

intro_children <- 'The data used here is a contingency table
that summarizes the answers
given by different categories of people to the following question:
"according to you, what are the reasons that can make
a woman of a couple hesitate to have children?".
Each row corresponds to a level of education, and columns are reasons.'
intro_children <- gsub('\n', ' ', intro_children) |>
stringr::str_squish()

req_children <- "Please explain the main differences
between more educated and less educated couples,
when it comes to hesitating to have children."
req_children <- gsub('\n', ' ', req_children) |>
stringr::str_squish()

res_children <- nail_descfreq(children,
                              introduction = intro_children,
                              request = req_children,
                              generate = TRUE)

cat(res_children$response)

## End(Not run)
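The name-extraction step of Example 1 can be tried without calling an LLM. The sketch below applies the same idea to a made-up response string, using base R instead of stringr; the response text and the object names are assumptions for illustration, and a real response may not follow this format.

```r
# Made-up LLM response following the format **B1: name**
text <- paste(
  "**B1: The Hipster Beard**",
  "Soft and long.",
  "**B2: The Stubble**",
  "Short and rough.",
  sep = "\n"
)

# Extract the bold titles, then strip the asterisks and the "B<number>: " prefix
titles <- regmatches(text, gregexpr("\\*\\*B[0-9]+: [^*]+\\*\\*", text))[[1]]
titles <- gsub("\\*\\*", "", titles)
new_names <- sub("^B[0-9]+: ", "", titles)
new_names

# Only rename rows when one name was found per row of the table
n_rows <- 2  # number of rows of the (hypothetical) contingency table
length(new_names) == n_rows
```

Checking the number of extracted names before assigning row names avoids an error when the LLM does not follow the expected format.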

Interpret QDA data

Description

Generate an LLM response to analyze QDA (quantitative descriptive analysis) data.

Usage

nail_qda(
  dataset,
  formul,
  firstvar,
  lastvar = length(colnames(dataset)),
  introduction = NULL,
  request = NULL,
  model = "llama3",
  isolate.groups = FALSE,
  drop.negative = FALSE,
  proba = 0.05,
  generate = FALSE
)

Arguments

dataset

a data frame made up of at least two categorical variables (product, panelist) and a set of quantitative variables (sensory attributes).

formul

the analysis of variance model to be evaluated for each sensory attribute.

firstvar

the index of the first sensory attribute.

lastvar

the index of the last sensory attribute.

introduction

the introduction for the LLM prompt.

request

the request for the LLM prompt.

model

the model name ('llama3' by default).

isolate.groups

a boolean that indicates whether to give the LLM a single prompt, or one prompt per product.

drop.negative

a boolean that indicates whether to drop negative v.test values for interpretation (keeping only positive v.tests).

proba

the significance threshold considered to characterize the products (by default 0.05).

generate

a boolean that indicates whether to generate the LLM response. If FALSE, the function only returns the prompt.
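The proba and drop.negative arguments can be pictured as two filters applied to a table of test statistics. The sketch below uses a made-up table for a single product and assumes that a v.test can be read as a standard normal quantile, so that a two-sided p-value can be derived from it; the column names are hypothetical and this is not the package's internal code.

```r
# Made-up v.test values for one product
desc <- data.frame(
  attribute = c("Cocoa", "Sweet", "Bitter", "Milky"),
  v.test    = c(3.1, -2.6, 0.4, -1.2),
  stringsAsFactors = FALSE
)

# Two-sided p-value, assuming v.test behaves like a standard normal quantile
desc$p.value <- 2 * pnorm(-abs(desc$v.test))

proba <- 0.05
drop.negative <- TRUE

# Keep significant attributes; optionally keep only positive v.tests
keep <- desc$p.value < proba
if (drop.negative) keep <- keep & desc$v.test > 0
desc[keep, ]
```

With drop.negative = FALSE, the attribute "Sweet" would also be kept, as a characteristic that the product has less of than the others.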

Details

This function directly sends a prompt to an LLM. Therefore, to get a consistent answer, we highly recommend customizing the parameters introduction and request and adding all relevant information on your data for the LLM. We also recommend renaming the columns with clear, unshortened and unambiguous names.

Additionally, if isolate.groups = TRUE, you will need an introduction and a request that take into account the fact that only one group is analyzed at a time.
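The roles of formul, firstvar and lastvar can be sketched in base R as one analysis of variance per sensory attribute. The example below runs on simulated data and assumes that the right-hand side given in formul is pasted after each attribute name; it is a rough illustration of the idea, not the code used by nail_qda, and all names and values are made up.

```r
set.seed(42)

# Simulated QDA-like data: 3 products rated by 5 panelists
toy <- expand.grid(Product  = paste0("choc", 1:3),
                   Panelist = paste0("p", 1:5))
toy$Sweetness  <- rnorm(nrow(toy), mean = rep(c(3, 5, 7), times = 5))
toy$Bitterness <- rnorm(nrow(toy), mean = 5)

formul   <- "~Product+Panelist"
firstvar <- 3           # index of the first sensory attribute
lastvar  <- ncol(toy)   # index of the last sensory attribute

# One ANOVA per attribute; keep the p-value of the first term (Product)
p_product <- sapply(firstvar:lastvar, function(i) {
  fit <- aov(as.formula(paste(colnames(toy)[i], formul)), data = toy)
  summary(fit)[[1]][["Pr(>F)"]][1]
})
names(p_product) <- colnames(toy)[firstvar:lastvar]
p_product
```

Attributes with a small p-value for the product effect are the ones that discriminate between products.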

Value

A data frame, or a list of data frames, containing the LLM's prompt and response (if generate = TRUE).

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

### Example 1: QDA data on chocolates with isolate.groups = FALSE ###
library(NaileR)
library(SensoMineR)
data(chocolates)

intro_sensochoc <- "Six chocolates were measured according
to sensory attributes by a trained panel.
I will give you the results from this study.
You will have to identify what sets these chocolates apart."
intro_sensochoc <- gsub('\n', ' ', intro_sensochoc) |>
stringr::str_squish()

req_sensochoc <- "Please explain what makes each chocolate different
and provide a sensory profile of each chocolate, as well as a name."
req_sensochoc <- gsub('\n', ' ', req_sensochoc) |>
stringr::str_squish()

res_nail_qda <- nail_qda(sensochoc,
                         formul="~Product+Panelist",
                         firstvar = 5,
                         introduction = intro_sensochoc,
                         request = req_sensochoc,
                         model = 'llama3',
                         isolate.groups = FALSE,
                         drop.negative = FALSE,
                         proba = 0.05,
                         generate = TRUE)

cat(res_nail_qda$prompt)
cat(res_nail_qda$response)

### Example 2: QDA data on chocolates with isolate.groups = TRUE ###
library(NaileR)
library(SensoMineR)
data(chocolates)

intro_sensochoc <- "A chocolate was measured according
to sensory attributes by a trained panel.
I will give you the results from this study.
You will have to identify the characteristics of this chocolate."
intro_sensochoc <- gsub('\n', ' ', intro_sensochoc) |>
stringr::str_squish()

req_sensochoc <- "Please provide a detailed sensory profile for this chocolate,
as well as a name."
req_sensochoc <- gsub('\n', ' ', req_sensochoc) |>
stringr::str_squish()

res_nail_qda <- nail_qda(sensochoc,
                         formul="~Product+Panelist",
                         firstvar = 5,
                         introduction = intro_sensochoc,
                         request = req_sensochoc,
                         model = 'llama3',
                         isolate.groups = TRUE,
                         drop.negative = FALSE,
                         proba = 0.05,
                         generate = TRUE)

cat(res_nail_qda[[1]]$prompt)
cat(res_nail_qda[[1]]$response)

## End(Not run)

Sort textual data

Description

Group textual data according to their similarity, in a context in which the assessors have commented on a set of stimuli.

Usage

nail_sort(
  dataset,
  name_size = 3,
  stimulus_id = "stimulus",
  introduction = "",
  measure = "",
  nb_max = 6,
  generate = FALSE
)

Arguments

dataset

a data frame where each row is a stimulus and each column is an assessor.

name_size

the maximum number of words in a group name created by the LLM.

stimulus_id

the nature of the stimulus. Customizing it is highly recommended.

introduction

the introduction to the LLM prompt.

measure

the type of measure used in the experiment.

nb_max

the maximum number of clusters the LLM can form per assessor.

generate

a boolean that indicates whether to generate the LLM response. If FALSE, the function only returns the prompt.
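The expected layout of dataset (one row per stimulus, one column per assessor) can be obtained from comments stored in long format. The following base R sketch assumes that each cell holds a free-text comment; the data and the column names are made up for the example.

```r
# Made-up comments in long format: one row per (stimulus, assessor) pair
long <- data.frame(
  stimulus = rep(c("beard_1", "beard_2"), times = 2),
  assessor = rep(c("judge_A", "judge_B"), each = 2),
  comment  = c("short, neat", "long, bushy", "tidy", "wild and thick"),
  stringsAsFactors = FALSE
)

# One row per stimulus, one column per assessor
wide <- reshape(long, idvar = "stimulus", timevar = "assessor",
                direction = "wide")
rownames(wide) <- wide$stimulus
wide$stimulus <- NULL
colnames(wide) <- sub("^comment\\.", "", colnames(wide))
wide
```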

Details

This function uses a while loop to ensure that the LLM gives the right number of groups. Therefore, customizing the stimulus ID, prompt introduction and measure is highly recommended; a clear prompt can help the LLM finish its task faster.
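The retry logic described above can be sketched as follows. The LLM is replaced by a hypothetical function ask_llm that returns canned replies, and the validity checks (one group per stimulus, at most nb_max distinct groups) as well as the cap max_tries are assumptions made for the illustration; the checks actually performed by nail_sort may differ.

```r
# Canned replies standing in for successive LLM answers
replies <- list(
  c("A", "B", "C", "D", "E", "F", "G", "H"),  # 8 groups: too many
  c("A", "A", "B"),                           # wrong number of stimuli
  c("A", "A", "B", "C", "C", "B", "A", "B")   # valid answer
)
ask_llm <- function(i) replies[[min(i, length(replies))]]  # hypothetical

n_stimuli <- 8
nb_max    <- 6
max_tries <- 5   # safety cap added for this sketch

tries  <- 0
groups <- NULL
while (is.null(groups) && tries < max_tries) {
  tries <- tries + 1
  candidate <- ask_llm(tries)
  # Accept the answer only if it has the right shape
  if (length(candidate) == n_stimuli &&
      length(unique(candidate)) <= nb_max) {
    groups <- candidate
  }
}

tries
groups
```

Each rejected answer costs one more call to the LLM, which is why a clear prompt shortens processing time.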

Value

A list consisting of:

  • a list of prompts (one per assessor);

  • a data frame with the group names.

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

library(NaileR)
data(beard_wide)

intro_beard <- "As a barber, you make
recommendations based on consumers' comments.
Examples of consumers' descriptions of beards
are as follows."
intro_beard <- gsub('\n', ' ', intro_beard) |>
stringr::str_squish()

res <- nail_sort(beard_wide[,1:5], name_size = 3,
stimulus_id = "beard", introduction = intro_beard,
measure = 'the description was', generate = TRUE)

res$dta_sort
cat(res$prompt_llm[[1]])

## End(Not run)

Interpret a group based on answers to open-ended questions

Description

Generate an LLM response to analyze a categorical latent variable, based on answers to open-ended questions.

Usage

nail_textual(
  dataset,
  num.var,
  num.text,
  introduction = NULL,
  request = NULL,
  model = "llama3",
  isolate.groups = TRUE,
  generate = FALSE
)

Arguments

dataset

a data frame made up of at least one categorical variable and a textual variable.

num.var

the index of the categorical variable to be characterized.

num.text

the index of the textual variable that characterizes the categorical variable of interest.

introduction

the introduction for the LLM prompt.

request

the request made to the LLM.

model

the model name ('llama3' by default).

isolate.groups

a boolean that indicates whether to give the LLM a single prompt (FALSE), or one prompt per category (TRUE, the default). One prompt per category is recommended with long catdes results.

generate

a boolean that indicates whether to generate the LLM response. If FALSE, the function only returns the prompt.

Details

This function directly sends a prompt to an LLM. Therefore, to get a consistent answer, we highly recommend customizing the parameters introduction and request and adding all relevant information on your data for the LLM. We also recommend renaming the columns with clear, unshortened and unambiguous names.

Additionally, if isolate.groups = TRUE, you will need an introduction and a request that take into account the fact that only one group is analyzed at a time.
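Since custom introduction and request strings are recommended, it can be convenient to write them over several lines and collapse them before use. The helper below is a base R sketch of that clean-up step; the function name clean_prompt is hypothetical and is not part of the package.

```r
# Collapse newlines, tabs and repeated spaces into single spaces,
# then remove leading and trailing whitespace
clean_prompt <- function(x) {
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

intro <- "These data were collected
  after a survey.   Each group is
described separately."

clean_prompt(intro)
```

The cleaned string can then be passed to the introduction or request argument.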

Value

A data frame, or a list of data frames, containing the LLM's prompt and response (if generate = TRUE).

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

### Example 1: Car alone survey ###
library(NaileR)
library(dplyr)
data(car_alone)

sampled_car_alone <- car_alone %>%
group_by(car_alone_capable_restrictive) %>%
dplyr::sample_frac(0.5)
sampled_car_alone <- as.data.frame(sampled_car_alone)

intro_car <- "Knowing the impact on the climate,
I have made these choices based on
the following benefits and constraints..."
intro_car <- gsub('\n', ' ', intro_car) |>
stringr::str_squish()

res_nail_textual <- nail_textual(sampled_car_alone, num.var = 1,
                                 num.text = 2,
                                 introduction = intro_car,
                                 request = NULL,
                                 model = 'llama3', isolate.groups = TRUE,
                                 generate = TRUE)

res_nail_textual[[1]]$response |> cat()
res_nail_textual[[3]]$response |> cat()
res_nail_textual[[2]]$response |> cat()
res_nail_textual[[4]]$response |> cat()

### Example 2: Atomic habits survey ###
library(NaileR)
library(dplyr)
data(atomic_habit_clust)

intro_atomic <- "These data were collected
after a survey on atomic habits: we asked
what people were prepared to change about their daily habits
to make the world a better place,
what habits they felt able to adopt,
what habits were restrictive."
intro_atomic <- gsub('\n', ' ', intro_atomic) |>
stringr::str_squish()

dta_plane <- atomic_habit_clust[,c(32,51)] %>%
            filter(never_plane_text != 'THAT')

sampled_dta_plane <- dta_plane %>%
                    group_by(clust) %>%
                    dplyr::sample_frac(0.75)

sampled_dta_plane <- as.data.frame(sampled_dta_plane)
summary(sampled_dta_plane)

res_nail_textual_plane <- nail_textual(sampled_dta_plane, num.var = 2,
                                       num.text = 1,
                                       introduction = intro_atomic,
                                       request = NULL,
                                       model = 'llama3',
                                       isolate.groups = TRUE,
                                       generate = TRUE)

cat(res_nail_textual_plane[[1]]$prompt)
cat(res_nail_textual_plane[[1]]$response)

cat(res_nail_textual_plane[[2]]$prompt)
cat(res_nail_textual_plane[[2]]$response)

cat(res_nail_textual_plane[[3]]$prompt)
cat(res_nail_textual_plane[[3]]$response)

res_nail_textual_plane <- nail_textual(sampled_dta_plane, num.var = 2,
                                       num.text = 1,
                                       introduction = intro_atomic,
                                       request = NULL,
                                       model = 'llama3',
                                       isolate.groups = FALSE,
                                       generate = TRUE)

cat(res_nail_textual_plane$prompt)
cat(res_nail_textual_plane$response)

### Example 3: Car seat fabrics ###

# Drivers of liking and disliking
# isolate.groups = FALSE

intro_car <- "In this consumer study, a number of car seat fabrics
were rated by consumers who gave their reasons
for liking or disliking the fabrics.
Reasons for disliking the fabrics were reported in group '0',
while reasons for liking the fabrics were reported in group '1'."
intro_car <- gsub('\n', ' ', intro_car) |>
stringr::str_squish()

request_car <- "Based on the comments provided by the consumers,
please explain the reasons why
the fabrics were not appreciated (group '0'),
and the reasons why fabrics were appreciated (group '1').
In other words, what are the drivers for disliking
and liking the fabrics."
request_car <- gsub('\n', ' ', request_car) |>
stringr::str_squish()

res_nail_textual_fabric <- nail_textual(fabric, num.var = 4,
                                        num.text = 3,
                                        introduction = intro_car,
                                        request = request_car,
                                        model = 'llama3',
                                        isolate.groups = FALSE,
                                        generate = TRUE)

cat(res_nail_textual_fabric$response)

# Drivers of disliking with a specific prompt
# isolate.groups = TRUE

intro_car_disliking <- "In this consumer study, a range of car seat fabrics
were rated by consumers who gave their reasons
for disliking the fabrics.
In these data, only the reasons for disliking the fabrics were reported."
intro_car_disliking <- gsub('\n', ' ', intro_car_disliking) |>
stringr::str_squish()

request_car_disliking <- "Based on the comments provided by the consumers,
please explain the reasons why
the fabrics were not appreciated.
In other words, what are the drivers for disliking the fabrics."
request_car_disliking <- gsub('\n', ' ', request_car_disliking) |>
stringr::str_squish()

res_nail_textual_fabric <- nail_textual(fabric, num.var = 4,
                                        num.text = 3,
                                        introduction = intro_car_disliking,
                                        request = request_car_disliking,
                                        model = 'llama3',
                                        isolate.groups = TRUE,
                                        generate = FALSE)

ppt <- res_nail_textual_fabric[1]
cat(ppt)

res_disliking <- ollamar::generate(model = 'llama3', prompt = ppt,
                                   output = "df")
cat(res_disliking$response)

# Drivers of liking with a specific prompt
# isolate.groups = TRUE

intro_car_liking <- "In this consumer study, a range of car seat fabrics
were rated by consumers who gave their reasons
for liking the fabrics.
In these data, only the reasons for liking the fabrics were reported."
intro_car_liking <- gsub('\n', ' ', intro_car_liking) |>
stringr::str_squish()

request_car_liking <- "Based on the comments provided by the consumers,
please explain the reasons why
the fabrics were appreciated.
In other words, what are the drivers for liking the fabrics."
request_car_liking <- gsub('\n', ' ', request_car_liking) |>
stringr::str_squish()

res_nail_textual_fabric <- nail_textual(fabric, num.var = 4,
                                        num.text = 3,
                                        introduction = intro_car_liking,
                                        request = request_car_liking,
                                        model = 'llama3',
                                        isolate.groups = TRUE,
                                        generate = FALSE)

ppt <- res_nail_textual_fabric[2]
cat(ppt)

res_liking <- ollamar::generate(model = 'llama3', prompt = ppt,
                                output = "df")
cat(res_liking$response)

### Example 4: Rorschach inkblots ###

# Description of each inkblot
# isolate.groups = TRUE

intro_rorschach <- "For this study,
we asked sixty people to briefly describe
one of the inkblots of the Rorschach test."
intro_rorschach <- gsub('\n', ' ', intro_rorschach) |>
stringr::str_squish()

request_rorschach <- "Based on the comments of the 60 people,
please give me a description of that inkblot
in terms of how it was perceived. Tell me if it was
a rather positive or negative perception."
request_rorschach <- gsub('\n', ' ', request_rorschach) |>
stringr::str_squish()

res_nail_textual_rorschach <- nail_textual(rorschach, num.var = 2,
                                           num.text = 5,
                                           introduction = intro_rorschach,
                                           request = request_rorschach,
                                           model = 'llama3',
                                           isolate.groups = TRUE,
                                           generate = FALSE)

cat(res_nail_textual_rorschach[[10]])

ppt <- gsub("## Group", "## Stimulus", res_nail_textual_rorschach[[10]])
cat(ppt)

res_inkblot_10 <- ollamar::generate(model = 'llama3', prompt = ppt, output = "df")
cat(res_inkblot_10$response)

cat(res_nail_textual_rorschach[[5]])

ppt <- gsub("## Group", "## Stimulus", res_nail_textual_rorschach[[5]])
cat(ppt)

res_inkblot_5 <- ollamar::generate(model = 'llama3', prompt = ppt,
                                   output = "df")
cat(res_inkblot_5$response)


# Comparison of panels

rorschach_10 <- droplevels(rorschach[rorschach$Inkblot=="10",])

intro_rorschach <- "For this study,
we asked sixty people to briefly describe
one of the inkblots of the Rorschach test.
The sixty people belonged to three different panels,
with 20 people per panel."
intro_rorschach <- gsub('\n', ' ', intro_rorschach) |>
stringr::str_squish()

request_rorschach <- "Based on the comments of the 60 people,
please tell me what is common from panel to panel
and what is specific to each panel
in terms of the perception of the inkblot."
request_rorschach <- gsub('\n', ' ', request_rorschach) |>
stringr::str_squish()

res_nail_textual_rorschach <- nail_textual(rorschach_10, num.var = 1,
                                           num.text = 5,
                                           introduction = intro_rorschach,
                                           request = request_rorschach,
                                           model = 'llama3',
                                           isolate.groups = FALSE,
                                           generate = TRUE)

cat(res_nail_textual_rorschach$prompt)
cat(res_nail_textual_rorschach$response)

## End(Not run)

Nutri-score survey

Description

These data were collected after a survey on the nutri-score. Participants were asked various questions about their views on the nutri-score, and about their eating habits.

Usage

nutriscore

Format

A data frame with 112 rows (participants) and 36 columns (questions).

Source

Anaëlle YANNIC and Jessie PICOT, students at l'Institut Agro Rennes-Angers

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

library(NaileR)
library(FactoMineR)

data(nutriscore)

res_mca_nutriscore <- MCA(nutriscore, quali.sup = 17:36,
ncp = 15, level.ventil = 0.05, graph = FALSE)

res_hcpc_nutriscore <- HCPC(res_mca_nutriscore, nb.clust = 3,
graph = FALSE)
don_clust_nutriscore <- res_hcpc_nutriscore$data.clust

intro_nutri <- 'These data were collected after a survey
on the nutri-score. Participants were asked
various questions about their views on the nutri-score,
and about their eating habits.
Participants were split into groups according to their answers.'
intro_nutri <- gsub('\n', ' ', intro_nutri) |>
stringr::str_squish()

req_nutri <- 'Please summarize the characteristics
of each group. Then, give each group a new name,
based on your conclusions.'
req_nutri <- gsub('\n', ' ', req_nutri) |>
stringr::str_squish()

res_nutriscore <- nail_catdes(don_clust_nutriscore, num.var = 37,
introduction = intro_nutri, request = req_nutri,
drop.negative = TRUE)

cat(res_nutriscore$response)

## End(Not run)

Perception of food quality

Description

These data were collected after a study on the perception of food quality. Participants were given 9 French logos; they had to rate, on a scale from 0 (not at all) to 10 (absolutely), how much a product bearing them aligned with their own perception of quality.

Usage

quality

Format

A data frame with 55 rows and 9 columns. Here is the list of logos:

  • AB: organic;

  • Label Rouge: superior quality (from the taste, process, packaging...);

  • FairTrade: decent wages and working conditions for the producers;

  • Bleu Blanc Coeur: diverse and balanced diet for the livestock;

  • AOC: controlled designation of origin;

  • Produit en Bretagne: processed in Brittany;

  • Viandes de France: livestock bred, raised and slaughtered in France, with respectful living conditions;

  • Nourri sans OGM: no GMOs in livestock food;

  • Médailles Agro: a prize won at a yearly contest based on taste.

Source

Sébastien Lê, applied mathematics department, Institut Agro Rennes-Angers

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

library(NaileR)
data(quality)

colnames(quality) <- c("Agriculture biologique",
"Label Rouge",
"FairTrade",
"Bleu Blanc Coeur",
"Appellation d'origine contrôlée",
"Produit en Bretagne",
"Viandes de France",
"Nourri sans OGM",
"Médailles Agro")

res_pca_quality <- FactoMineR::PCA(quality, graph = FALSE)
quali_work <- res_pca_quality$ind$coord |> as.data.frame()
quali_work <- quali_work[,1] |> cbind(quality)

intro_quali <- "These data were collected after a study
on the perception of food quality.
Participants were given 9 French logos;
they had to rate, on a scale from 0 (not at all)
to 10 (absolutely), how much a product bearing them
aligned with their own perception of quality."
intro_quali <- gsub('\n', ' ', intro_quali) |>
stringr::str_squish()

res_quality <- nail_condes(quali_work, num.var = 1,
quanti.cat = c('Higher quality', 'Lower quality', 'Neutral'),
introduction = intro_quali, generate = FALSE)

ppt <- gsub('characteristics', 'opinions', res_quality$prompt)

res_quality <- ollamar::generate('llama3', ppt, output = 'df')

cat(res_quality$response)

## End(Not run)

Rorschach inkblots

Description

This dataset was initially collected to understand the perception of the Rorschach test.

Usage

rorschach

Format

A data frame with 600 rows and 5 columns:

  • The Panel effect (3 panels)

  • The Inkblot effect (10 inkblots)

  • The Panelist effect (20 panelists per panel)

  • The interaction Panel and Panelist

  • The perception of the inkblot
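The 600 rows are consistent with a full crossing of 3 panels, 20 panelists per panel and 10 inkblots. The sketch below rebuilds such a design in base R; the panel labels "B" and "C" and the numeric codes are assumptions made for the illustration, and the actual coding in the dataset may differ.

```r
# Full design: every panelist of every panel describes every inkblot
design <- expand.grid(
  Panelist = 1:20,
  Inkblot  = 1:10,
  Panel    = c("A", "B", "C")
)

nrow(design)          # 3 panels x 20 panelists x 10 inkblots
table(design$Panel)   # number of descriptions per panel
```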

Source

Applied mathematics department, Institut Agro Rennes-Angers

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

library(NaileR)
data(rorschach)

### Example 1: perception of the inkblots for one panel ###
intro_rorschach <- "For this study,
we asked 20 people to briefly describe
the 10 inkblots of the Rorschach test."
intro_rorschach <- gsub('\n', ' ', intro_rorschach) |>
stringr::str_squish()

request_rorschach <- "Based on the comments of the 20 people,
please give me a description of each inkblot
in terms of how it was perceived. Tell me if it was
a rather positive or negative perception."
request_rorschach <- gsub('\n', ' ', request_rorschach) |>
stringr::str_squish()

rorschach_A <- droplevels(rorschach[rorschach$Panel=="A",])

res_nail_textual_rorschach <- nail_textual(rorschach_A, num.var = 2,
                                           num.text = 5,
                                           introduction = intro_rorschach,
                                           request = request_rorschach,
                                           model = 'llama3',
                                           isolate.groups = FALSE,
                                           generate = FALSE)

cat(res_nail_textual_rorschach$prompt)

ppt <- gsub("## Group", "## Inkblot", res_nail_textual_rorschach$prompt)
cat(ppt)

res_inkblot <- ollamar::generate(model = 'llama3', prompt = ppt,
                                 output = "df")

cat(res_inkblot$response)

## End(Not run)

LLM text similarity

Description

Compute a similarity score, on a scale ranging from 0 (totally different) to 100 (identical), between two character strings.

Usage

sim_llm(textA, textB)

Arguments

textA, textB

two character strings.

Details

The similarity score is generated by an LLM. Therefore, the result might vary if the function is run several times.

Value

An integer between 0 and 100.
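Because the score may change from one run to the next, one option is to repeat the call and summarize the results. The sketch below uses a hypothetical stand-in, mock_sim, that draws a random integer instead of calling an LLM; with the real function, mock_sim would be replaced by sim_llm and each repetition would take noticeably longer.

```r
set.seed(123)

# Stand-in for sim_llm: returns a random integer score (no LLM involved)
mock_sim <- function(textA, textB) sample(60:80, 1)

textA <- "Participant A was described as a nice, outgoing man."
textB <- "Participant A was an extroverted and caring individual."

# Repeat the scoring several times and summarize the results
scores <- vapply(1:5, function(i) mock_sim(textA, textB), integer(1))
c(mean = mean(scores), sd = sd(scores), min = min(scores), max = max(scores))
```

The spread of the scores gives a rough idea of how stable the similarity judgment is for a given pair of texts.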

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

textA <- "Participant A was described as a nice, outgoing man, with a friendly attitude."
textB <- "Participant A was an extroverted and caring individual."

sim_llm(textA, textB)

## End(Not run)

Food waste survey

Description

These data were collected after a survey on food waste, with participants describing their habits.

Usage

waste

Format

A data frame with 180 rows (participants) and 77 columns (questions).

Source

Héloïse BILLES and Amélie RATEAU, students at l'Institut Agro Rennes-Angers

Examples

## Not run: 
# Processing time is often longer than ten seconds
# because the function uses a large language model.

library(NaileR)
library(FactoMineR)
data(waste)
waste <- waste[-14]

res_mca_waste <- MCA(waste, quali.sup = c(1,2,50:76),
ncp = 35, level.ventil = 0.05, graph = FALSE)
res_hcpc_waste <- HCPC(res_mca_waste, nb.clust = 3, graph = FALSE)
don_clust_waste <- res_hcpc_waste$data.clust

intro_waste <- 'These data were collected
after a survey on food waste,
with participants describing their habits.'
intro_waste <- gsub('\n', ' ', intro_waste) |>
stringr::str_squish()

req_waste <- 'Please summarize the characteristics of each group.
Then, give each group a new name, based on your conclusions.
Finally, give each group a grade between 0 and 10,
based on how wasteful they are with food:
0 being "not at all", 10 being "absolutely".'
req_waste <- gsub('\n', ' ', req_waste) |>
stringr::str_squish()

res_waste <- nail_catdes(don_clust_waste,
num.var = ncol(don_clust_waste),
introduction = intro_waste, request = req_waste,
drop.negative = TRUE)
cat(res_waste$response)

## End(Not run)