Package 'latent2likert'

Title: Converting Latent Variables into Likert Scale Responses
Description: Effectively simulates the discretization process inherent to Likert scales while minimizing distortion. It converts continuous latent variables into ordinal categories to generate Likert scale item responses. Particularly useful for accurately modeling and analyzing survey data that use Likert scales, especially when applying statistical techniques that require metric data.
Authors: Marko Lalovic [aut, cre]
Maintainer: Marko Lalovic <[email protected]>
License: MIT + file LICENSE
Version: 1.2.1
Built: 2024-09-23 06:16:33 UTC
Source: CRAN

Discretize Density

Description

Transforms the density function of a continuous random variable into a discrete probability distribution with minimal distortion using the Lloyd-Max algorithm.

Usage

discretize_density(density_fn, n_levels, eps = 1e-06)

Arguments

density_fn

probability density function.

n_levels

cardinality of the set of all possible outcomes.

eps

convergence threshold for the algorithm.

Details

The function addresses the problem of transforming a continuous random variable $X$ into a discrete random variable $Y$ with minimal distortion. Distortion is measured as mean-squared error (MSE):

$$\text{E}\left[ (X - Y)^2 \right] = \sum_{k=1}^{K} \int_{x_{k-1}}^{x_{k}} f_{X}(x) \left( x - r_{k} \right)^2 \, dx$$

where:

$f_{X}$

is the probability density function of $X$,

$K$

is the number of possible outcomes of $Y$,

$x_{k}$

are endpoints of intervals that partition the domain of $X$,

$r_{k}$

are representation points of the intervals.

This problem is solved using the following iterative procedure:

1. Start with an arbitrary initial set of representation points: $r_{1} < r_{2} < \dots < r_{K}$.

2. Repeat the following steps until the improvement in MSE falls below a given $\varepsilon$.

3. Calculate endpoints as $x_{k} = (r_{k+1} + r_{k})/2$ for each $k = 1, \dots, K-1$ and set $x_{0}$ and $x_{K}$ to $-\infty$ and $\infty$, respectively.

4. Update representation points by setting $r_{k}$ equal to the conditional mean of $X$ given $X \in (x_{k-1}, x_{k})$ for each $k = 1, \dots, K$.

With each execution of step (3) and step (4), the MSE decreases or remains the same. As MSE is nonnegative, it approaches a limit. The algorithm terminates when the improvement in MSE is less than a given $\varepsilon > 0$, ensuring convergence after a finite number of iterations.

This procedure is known as the Lloyd-Max algorithm. It was initially used for scalar quantization and is closely related to the k-means algorithm. Kieffer (1983) proved local convergence for log-concave density functions. Many common probability distributions are log-concave, including the normal and skew normal distributions, as shown by Azzalini (1985).
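
The iterative procedure described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the function name `lloyd_max_sketch`, the `max_iter` safeguard, and the initial grid of representation points on [-2, 2] are all hypothetical choices made for this example.

```r
# Minimal sketch of the Lloyd-Max iteration (hypothetical helper, not the
# package's code). Integrals are evaluated numerically with stats::integrate.
lloyd_max_sketch <- function(density_fn, n_levels, eps = 1e-6, max_iter = 1000) {
  # Step 1: arbitrary increasing initial representation points (assumed grid)
  repr <- seq(-2, 2, length.out = n_levels)
  prob <- numeric(n_levels)
  mse_old <- Inf
  for (iter in seq_len(max_iter)) {
    # Step 3: endpoints are midpoints between neighbouring representation points
    endp <- c(-Inf, (repr[-1] + repr[-n_levels]) / 2, Inf)
    for (k in seq_len(n_levels)) {
      prob[k] <- stats::integrate(density_fn, endp[k], endp[k + 1])$value
      # Step 4: representation point becomes the conditional mean on the interval
      repr[k] <- stats::integrate(function(x) x * density_fn(x),
                                  endp[k], endp[k + 1])$value / prob[k]
    }
    # Distortion (MSE) for the current endpoints and updated representation points
    mse <- sum(vapply(seq_len(n_levels), function(k) {
      stats::integrate(function(x) (x - repr[k])^2 * density_fn(x),
                       endp[k], endp[k + 1])$value
    }, numeric(1)))
    # Step 2: stop once the improvement in MSE falls below eps
    if (mse_old - mse < eps) break
    mse_old <- mse
  }
  list(prob = prob, endp = endp, repr = repr, dist = mse)
}

res <- lloyd_max_sketch(stats::dnorm, n_levels = 5)
```

For a standard normal density and two levels, the sketch should settle on representation points near plus and minus sqrt(2 / pi), which is the known optimum for that case.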

Value

A list containing:

prob

discrete probability distribution.

endp

endpoints of intervals that partition the continuous domain.

repr

representation points of the intervals.

dist

distortion measured as the mean-squared error (MSE).

References

Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12(2), 171–178.

Kieffer, J. (1983). Uniqueness of locally optimal quantizer for log-concave density and convex error function. IEEE Transactions on Information Theory 29, 42–47.

Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory 28 (2), 129–137.

Examples

discretize_density(density_fn = stats::dnorm, n_levels = 5)
discretize_density(density_fn = function(x) {
  2 * stats::dnorm(x) * stats::pnorm(0.5 * x)
}, n_levels = 4)

Estimate Latent Parameters

Description

Estimates the location and scaling parameters of the latent variables from existing survey data.

Usage

estimate_params(data, n_levels, skew = 0)

Arguments

data

survey data with columns representing individual items. The data can be of almost any class, such as "data.frame", "matrix", or "array".

n_levels

number of response categories, a vector or a number.

skew

marginal skewness of latent variables, defaults to 0.

Details

The relationship between the continuous random variable $X$ and the discrete probability distribution $p_{k}$, for $k = 1, \dots, K$, can be described by a system of non-linear equations:

$$p_{k} = F_{X}\left( \frac{x_{k} - \xi}{\omega} \right) - F_{X}\left( \frac{x_{k - 1} - \xi}{\omega} \right) \quad \text{for} \ k = 1, \dots, K$$

where:

$F_{X}$

is the cumulative distribution function of $X$,

$K$

is the number of possible response categories,

$x_{k}$

are the endpoints defining the boundaries of the response categories,

$p_{k}$

is the probability of the $k$-th response category,

$\xi$

is the location parameter of $X$,

$\omega$

is the scaling parameter of $X$.

The endpoints $x_{k}$ are calculated by discretizing a random variable $Z$ with mean 0 and standard deviation 1 that follows the same distribution as $X$. By solving the above system of non-linear equations iteratively, we can find the parameters that best fit the observed discrete probability distribution $p_{k}$.

The function estimate_params:

  • Computes the proportion table of the responses for each item.

  • Estimates the probabilities $p_{k}$ for each item.

  • Computes the estimates of $\xi$ and $\omega$ for each item.

  • Combines the estimated parameters for all items into a table.
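
The fitting step for a single item can be sketched as a small least-squares problem. The example below is only an illustration under stated assumptions: it uses a normal latent variable (skew = 0), endpoint values that are assumed for K = 4 rather than computed by the package, and the hypothetical helper names `category_probs` and `objective`. It does not reproduce the solver used by estimate_params.

```r
# Illustrative endpoints on the standardized scale for K = 4 categories
# (assumed values; in practice they would come from discretizing Z).
endp <- c(-Inf, -0.98, 0, 0.98, Inf)

# Category probabilities implied by location xi and scale omega,
# assuming a normal latent variable.
category_probs <- function(xi, omega, endp) {
  diff(stats::pnorm((endp - xi) / omega))
}

# "Observed" proportions generated from known parameters for this demo.
p_obs <- category_probs(xi = 0.5, omega = 1.2, endp = endp)

# Sum of squared residuals of the system of equations; omega is optimized
# on the log scale so that it stays positive.
objective <- function(par) {
  sum((category_probs(par[1], exp(par[2]), endp) - p_obs)^2)
}

fit <- stats::optim(c(0, 0), objective, method = "BFGS")
estimates <- c(xi = fit$par[1], omega = exp(fit$par[2]))
```

Because the demo proportions are generated from the model itself, the recovered `estimates` should be close to the parameters used to create `p_obs`.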

Value

A table of estimated parameters for each latent variable.

See Also

discretize_density for details on calculating the endpoints, and part_bfi for an example of survey data.

Examples

data(part_bfi)
vars <- c("A1", "A2", "A3", "A4", "A5")
estimate_params(data = part_bfi[, vars], n_levels = 6)

Agreeableness and Gender Data

Description

This dataset is a cleaned-up version of a small part of the bfi dataset from the psychTools package. It contains responses to the first 5 items of the agreeableness scale from the International Personality Item Pool (IPIP) and the gender attribute. It includes responses from 2800 subjects. Each item was answered on a six-point Likert scale ranging from 1 (very inaccurate) to 6 (very accurate). Gender was coded as 0 for male and 1 for female. Missing values were addressed using mode imputation.
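
Mode imputation replaces each missing value with the most frequent observed value of that item. A minimal sketch of the idea is given below. The helper name `impute_mode` is hypothetical, and the exact preprocessing applied to part_bfi is not documented here, so this should be read as an illustration only.

```r
# Replace missing values in a numeric response vector with its mode.
# table() ignores NA by default; ties are resolved in favour of the
# smallest value because which.max() returns the first maximum.
impute_mode <- function(x) {
  mode_value <- as.numeric(names(which.max(table(x))))
  x[is.na(x)] <- mode_value
  x
}

responses <- c(1, 2, 2, NA, 3)
imputed <- impute_mode(responses)
```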

Usage

data(part_bfi)

Format

An object of class "data.frame" with 2800 observations on the following 6 variables:

A1

Am indifferent to the feelings of others.

A2

Inquire about others' well-being.

A3

Know how to comfort others.

A4

Love children.

A5

Make people feel at ease.

gender

Gender of the respondent.

Source

International Personality Item Pool (https://ipip.ori.org)

https://search.r-project.org/CRAN/refmans/psychTools/html/bfi.html

References

Revelle, W. (2024). Psych: Procedures for Psychological, Psychometric, and Personality Research. Evanston, Illinois: Northwestern University. https://CRAN.R-project.org/package=psych

Examples

data(part_bfi)
head(part_bfi)

Plot Transformation

Description

Plots the densities of latent variables and the corresponding transformed discrete probability distributions.

Usage

plot_likert_transform(n_items, n_levels, mean = 0, sd = 1, skew = 0)

Arguments

n_items

number of Likert scale items (questions).

n_levels

number of response categories for each Likert item. Integer or vector of integers.

mean

means of the latent variables. Numeric or vector of numerics. Defaults to 0.

sd

standard deviations of the latent variables. Numeric or vector of numerics. Defaults to 1.

skew

marginal skewness of the latent variables. Numeric or vector of numerics. Defaults to 0.

Value

NULL. The function produces a plot.

Examples

plot_likert_transform(n_items = 3, n_levels = c(3, 4, 5))
plot_likert_transform(n_items = 3, n_levels = 5, mean = c(0, 1, 2))
plot_likert_transform(n_items = 3, n_levels = 5, sd = c(0.8, 1, 1.2))
plot_likert_transform(n_items = 3, n_levels = 5, skew = c(-0.5, 0, 0.5))

Calculate Response Proportions

Description

Returns a table of proportions for each possible response category.

Usage

response_prop(data, n_levels)

Arguments

data

numeric vector or matrix of responses.

n_levels

number of response categories.

Value

A table of response category proportions.
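
For a single vector of responses, the computation can be approximated in base R as shown below. This is a sketch with a hypothetical helper name (`response_prop_sketch`), not the package's code. Converting to a factor with explicit levels is an assumption made here so that categories nobody selected still appear with proportion 0.

```r
# Proportion of responses in each of the n_levels categories.
response_prop_sketch <- function(x, n_levels) {
  counts <- table(factor(x, levels = seq_len(n_levels)))
  counts / sum(counts)
}

props <- response_prop_sketch(c(1, 2, 2, 3, 3, 3), n_levels = 3)
props_with_gap <- response_prop_sketch(c(1, 1, 4), n_levels = 4)
```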

Examples

data <- c(1, 2, 2, 3, 3, 3)
response_prop(data, n_levels = 3)

data_matrix <- matrix(c(1, 2, 2, 3, 3, 3), ncol = 2)
response_prop(data_matrix, n_levels = 3)

Generate Random Responses

Description

Generates an array of random responses to Likert-type questions based on specified latent variables.

Usage

rlikert(size, n_items, n_levels, mean = 0, sd = 1, skew = 0, corr = 0)

Arguments

size

number of observations.

n_items

number of Likert scale items (number of questions).

n_levels

number of response categories for each item. Integer or vector of integers.

mean

means of the latent variables. Numeric or vector of numerics. Defaults to 0.

sd

standard deviations of the latent variables. Numeric or vector of numerics. Defaults to 1.

skew

marginal skewness of the latent variables. Numeric or vector of numerics. Defaults to 0.

corr

correlations between latent variables. Can be a single numeric value representing the same correlation for all pairs, or an actual correlation matrix. Defaults to 0.

Value

A matrix of random responses with dimensions size by n_items. The column names are Y1, Y2, ..., Yn where n is the number of items. Each entry in the matrix represents a Likert scale response, ranging from 1 to n_levels.
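
The general idea of drawing correlated latent variables and cutting them into categories can be sketched in base R. The example below is a simplified illustration, not the package's implementation: it assumes normal latent variables without skewness, uses the same illustrative cut points (`cuts`) for both items, and induces the correlation with a Cholesky factor.

```r
set.seed(1)
size <- 1000
corr <- matrix(c(1, 0.5,
                 0.5, 1), nrow = 2)

# Correlated standard normal latent variables: independent draws multiplied
# by the upper-triangular Cholesky factor of the correlation matrix.
latent <- matrix(stats::rnorm(size * 2), ncol = 2) %*% chol(corr)

# Illustrative interior cut points for 4 response categories (assumed values).
cuts <- c(-0.98, 0, 0.98)

# Y = k if x_{k-1} < X <= x_k, so intervals are open on the left.
responses <- apply(latent, 2, function(col) {
  findInterval(col, cuts, left.open = TRUE) + 1L
})
colnames(responses) <- paste0("Y", seq_len(ncol(responses)))
```

The correlation between the discretized columns is typically somewhat weaker than the latent correlation, since information is lost when the continuous values are grouped into categories.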

Examples

# Generate responses for a single item with 5 levels
rlikert(size = 10, n_items = 1, n_levels = 5)

# Generate responses for three items with different levels and parameters
rlikert(
  size = 10, n_items = 3, n_levels = c(4, 5, 6),
  mean = c(0, -1, 0), sd = c(0.8, 1, 1), corr = 0.5
)

# Generate responses with a correlation matrix
corr <- matrix(c(
  1.00, -0.63, -0.39,
  -0.63, 1.00, 0.41,
  -0.39, 0.41, 1.00
), nrow = 3)
data <- rlikert(
  size = 1000, n_items = 3, n_levels = c(4, 5, 6),
  mean = c(0, -1, 0), sd = c(0.8, 1, 1), corr = corr
)

Simulate Likert Scale Item Responses

Description

Simulates Likert scale item responses based on a specified number of response categories and the centered parameters of the latent variable.

Usage

simulate_likert(n_levels, cp)

Arguments

n_levels

number of response categories for the Likert scale item.

cp

centered parameters of the latent variable. Named vector including mean (mu), standard deviation (sd), and skewness (skew). Skewness must be between -0.95 and 0.95.

Details

The simulation process uses the model described by Boari and Nai Ruscone (2015). Let $X$ be the continuous variable of interest, measured using Likert scale questions with $K$ response categories. The observed discrete variable $Y$ is defined as follows:

$$Y = k, \quad \text{ if } \ \ x_{k - 1} < X \leq x_{k} \quad \text{ for } \ \ k = 1, \dots, K$$

where $x_{k}$, $k = 0, \dots, K$, are endpoints defined in the domain of $X$ such that:

$$-\infty = x_{0} < x_{1} < \dots < x_{K - 1} < x_{K} = \infty.$$

The endpoints dictate the transformation of the density $f_{X}$ of $X$ into a discrete probability distribution:

$$\text{Pr}(Y = k) = \int_{x_{k - 1}}^{x_{k}} f_{X}(x) \, dx \quad \text{ for } \ \ k = 1, \dots, K.$$

The continuous latent variable is modeled using a skew normal distribution. The function simulate_likert performs the following steps:

  • Ensures the centered parameters are within the acceptable range.

  • Converts the centered parameters to direct parameters.

  • Defines the density function for the skew normal distribution.

  • Computes the probabilities for each response category using optimal endpoints.
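
The conversion from centered parameters (mean, standard deviation, skewness) to direct parameters (location, scale, shape) of the skew normal distribution can be sketched with the standard moment relations. The helper name `cp_to_dp` and the output names `xi`, `omega`, and `alpha` are assumptions made for this illustration, and the code is not taken from the package.

```r
# Convert centered parameters of a skew normal distribution to direct ones.
cp_to_dp <- function(cp) {
  mu <- cp[["mu"]]
  sd <- cp[["sd"]]
  skew <- cp[["skew"]]
  stopifnot(sd > 0, abs(skew) <= 0.95)
  b <- sqrt(2 / pi)
  g <- abs(skew)^(2 / 3)
  # Invert the skewness formula to obtain delta, then the shape alpha.
  delta <- sign(skew) * sqrt((pi / 2) * g / (g + ((4 - pi) / 2)^(2 / 3)))
  alpha <- delta / sqrt(1 - delta^2)
  # Scale and location follow from the variance and mean formulas.
  omega <- sd / sqrt(1 - b^2 * delta^2)
  xi <- mu - omega * b * delta
  c(xi = xi, omega = omega, alpha = alpha)
}

dp <- cp_to_dp(c(mu = 0, sd = 1, skew = 0.5))

# Skew normal density expressed in the direct parameters.
density_fn <- function(x) {
  z <- (x - dp[["xi"]]) / dp[["omega"]]
  2 / dp[["omega"]] * stats::dnorm(z) * stats::pnorm(dp[["alpha"]] * z)
}
```

A quick sanity check is to integrate `density_fn` numerically and confirm that its mean, variance, and skewness match the centered parameters that were supplied.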

Value

A named vector of probabilities for each response category.
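
How a density and a set of endpoints turn into category probabilities can be illustrated by integrating the density over each interval. In the sketch below, the density is the skew normal with shape 0.5 from the discretize_density example, while the endpoints in `endp` are illustrative values chosen for this example rather than the optimal endpoints computed by the package.

```r
# Skew normal density with shape parameter 0.5.
density_fn <- function(x) 2 * stats::dnorm(x) * stats::pnorm(0.5 * x)

# Illustrative endpoints for K = 4 categories (assumed, not optimal).
endp <- c(-Inf, -0.5, 0.5, 1.5, Inf)
n_levels <- length(endp) - 1

# Pr(Y = k) is the integral of the density between consecutive endpoints.
prob <- vapply(seq_len(n_levels), function(k) {
  stats::integrate(density_fn, lower = endp[k], upper = endp[k + 1])$value
}, numeric(1))
names(prob) <- seq_len(n_levels)
```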

References

Boari, G. and Nai Ruscone, M. (2015). A procedure simulating Likert scale item responses. Electronic Journal of Applied Statistical Analysis 8(3), 288–297. doi:10.1285/i20705948v8n3p288

See Also

discretize_density for details on how to calculate the optimal endpoints.

Examples

cp <- c(mu = 0, sd = 1, skew = 0.5)
simulate_likert(n_levels = 5, cp = cp)
cp2 <- c(mu = 1, sd = 2, skew = -0.3)
simulate_likert(n_levels = 7, cp = cp2)