--- title: "Using latent2likert" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Using latent2likert} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: references.bib link-citations: true --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Introduction In social sciences, variables of interest are often conceptualized as latent variables — hidden continuous variables measured through Likert scale questions, typically categorized as Strongly disagree, Disagree, Neutral, Agree, and Strongly agree. Researchers frequently aim to uncover these latent variables using various statistical techniques. Accurate modeling of survey data is essential for comparative analysis through simulation. The **latent2likert** package addresses this need by providing an effective algorithm to simulate Likert response variables from hypothetical latent variables. This vignette provides two practical workflow examples demonstrating the use of the **latent2likert** package. ## Simulating Survey Data The following hypothetical survey simulation is loosely based on an actual comparative study on teaching and learning R in a pair of introductory statistics labs [@mcnamara2024teaching]. Imagine a situation where 10 participants from Course A and 20 participants from Course B have completed the survey. Suppose the initial question was: > "How would you rate your experience with the course?" with four possible answers: > Poor, Fair, Good, and Excellent. Let's assume that the participants in Course A were neutral regarding the question, while participants in Course B had a more positive experience on average. By choosing appropriate parameters for the latent distributions and setting the number of categories n_levels = 4, we can generate hypothetical responses (standard deviation sd = 1 and skewness skew = 0, by default): ```{r} library(latent2likert) # Load the package set.seed(12345) # Ensure reproducible results # Generate responses for Course A and Course B responses_A <- rlikert(size = 10, n_items = 1, n_levels = 4, mean = 0, sd = 1) responses_B <- rlikert(size = 20, n_items = 1, n_levels = 4, mean = 1, sd = 1) ``` To summarize the results, create a data frame from all responses: ```{r} n_levels <- 4 n_groups <- 2 categories <- c("Poor", "Fair", "Good", "Excellent") # Create a data frame to summarize the responses response_data <- data.frame( Course = rep(c("A", "B"), each = n_levels), Response = factor(rep(categories, n_groups), levels = categories), Proportion = c( response_prop(responses_A, n_levels), response_prop(responses_B, n_levels) ) ) # Filter out rows with zero proportions response_data <- response_data[response_data$Proportion > 0, ] response_data ``` The results can then be visualized using a grouped bar chart:

## Pre and Post Comparison Now suppose that the survey also asked the participants to rate their skills on a 5-point Likert scale, ranging from 1 (very poor) to 5 (very good) in: * Programming, * Searching Online, * Solving Problems. The survey was completed by the participants both before and after taking the course for a pre and post-comparison. Suppose that participants' assessments of: * Programming skills on average increased, * Searching Online stayed about the same, * Solving Problems increased in Course A, but decreased for participants in Course B. Let's simulate the survey data for this scenario: ```{r} # Pre- and post-assessments of skills for Course A pre_A <- rlikert(size = 10, n_items = 3, n_levels = 5, mean = c(-1, 0, 1)) post_A <- rlikert(size = 10, n_items = 3, n_levels = 5, mean = c(0, 0, 2)) # Pre- and post-assessments of skills for Course B pre_B <- rlikert(size = 20, n_items = 3, n_levels = 5, mean = c(-1, 0, 1)) post_B <- rlikert(size = 20, n_items = 3, n_levels = 5, mean = c(0, 0, 0)) ``` Create a data frame from all responses to summarize the results: ```{r} # Combine pre and post responses into a list pre_post <- list(pre_A, post_A, pre_B, post_B) # Number of items and response levels n_items <- 3 n_levels <- 5 # Define skills assessed skills <- c("Programming", "Searching online", "Solving problems") # Generate repeated skill labels for questions questions <- rep(rep(skills, each = n_levels), 4) questions <- factor(questions, levels = skills) # Create a data frame to summarize the responses response_data <- data.frame( Course = rep(c("Course A", "Course B"), each = 2 * n_items * n_levels), Question = questions, Time = as.factor(rep(c( rep("before", n_items * n_levels), rep("after", n_items * n_levels) ), 2)), Response = rep(seq_len(n_levels), 2 * n_items * 2), Proportion = as.vector(sapply(pre_post, function(d) { as.vector(t(response_prop(d, n_levels))) })) ) head(response_data) ``` And visualize the results with a stacked bar chart:

## Recreating Scale Scores We will use part of the [bfi](https://search.r-project.org/CRAN/refmans/psych/html/bfi.html) data set from [@psych]. Specifically, we'll focus on the first 5 items corresponding to agreeableness. To investigate the differences in agreeableness between men and women, we'll also use the gender attribute. Load the data: ```{r} data(part_bfi) head(part_bfi) ``` Separate the items into two groups according to their gender: ```{r} vars <- c("A1", "A2", "A3", "A4", "A5") items_male <- part_bfi[part_bfi$gender == 0, vars] items_female <- part_bfi[part_bfi$gender == 1, vars] ``` Estimate the parameters of the latent variables, assuming they are normal and providing the number of possible response categories `n_levels = 6`: ```{r} params_male <- estimate_params(data = items_male, n_levels = 6) params_female <- estimate_params(data = items_female, n_levels = 6) ``` Generate new responses to the items using the estimated parameters and estimated correlations: ```{r} set.seed(12345) # Ensure reproducible results new_items_male <- rlikert( size = nrow(items_male), n_items = 5, n_levels = 6, mean = params_male["mean", ], sd = params_male["sd", ], corr = cor(items_male) ) new_items_female <- rlikert( size = nrow(items_female), n_items = 5, n_levels = 6, mean = params_female["mean", ], sd = params_female["sd", ], corr = cor(items_female) ) ``` Create agreeableness scale scores for both groups of participants by taking the average of these 5 items: ```{r} # Combine new items and gender in new data frame new_data <- data.frame(rbind(new_items_male, new_items_female)) new_data$gender <- c(rep(0, nrow(items_male)), rep(1, nrow(items_female))) head(new_data) # Reverse the first item because it has negative correlations part_bfi$A1 <- (min(part_bfi$A1) + max(part_bfi$A1)) - part_bfi$A1 new_data$Y1 <- (min(new_data$Y1) + max(new_data$Y1)) - new_data$Y1 # Create agreeableness scale scores part_bfi$agreeable <- rowMeans(part_bfi[, vars]) new_data$agreeable <- rowMeans(new_data[, c("Y1", "Y2", "Y3", "Y4", "Y5")]) ``` The results can be visualized with a grouped boxplot:

## References