| Title: | Beta Factor Model |
|---|---|
| Description: | Provides tools for factor analysis in financial and econometric settings under Beta factor models. It includes functions to simulate factor-model data with Beta-distributed idiosyncratic components (e.g., standard Beta, scaled Beta, and truncated Beta distributions) and to conduct model diagnostic assessments such as likelihood ratio tests for factor number selection and goodness-of-fit tests for Beta distribution assumptions. Estimation routines encompass maximum likelihood estimation for finite-dimensional Beta factor models, regularized Beta factor analysis for high-dimensional datasets, and shrinkage-based estimation for robust Beta factor loading recovery in noisy or incomplete data environments. The package's methodological framework is detailed in Guo G. (2023) <doi:10.1007/s00180-022-01270-z>. |
| Authors: | Guangbao Guo [aut, cre], Jiahui Feng [aut] |
| Maintainer: | Guangbao Guo <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.11 |
| Built: | 2026-05-18 21:14:54 UTC |
| Source: | https://github.com/cran/BFM |
A county-level monthly alcohol use dataset from California students (grades 7-11, 2008-2010).
The response variable Percentage is a proportion (0 < Percentage < 1), suitable for zero-inflated beta regression.
AlcoholUse
A data frame with multiple rows and variables:
numeric: percentage of students who drank alcohol
factor: student grade level
factor: student gender
numeric: mid-point of days bucket
numeric: days bucket
factor: county identifier
http://www.kidsdata.org
data(AlcoholUse)
str(AlcoholUse)
Simulates data from a Beta factor model (BFM). The function supports several distribution types for generating the data.
BFM(n, p, m, mub, phib, distribution_type)
| Argument | Description |
|---|---|
| n | Sample size. |
| p | Sample dimensionality. |
| m | Number of factors. |
| mub | Mean parameter for Beta distribution (numeric vector or scalar, 0 < mub < 1). |
| phib | Precision parameter for Beta distribution (positive numeric vector or scalar). |
| distribution_type | Type of Beta distribution. |
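The mean and precision arguments can be illustrated with a small base-R sketch. It assumes the common mean-precision parameterization of the Beta distribution, in which shape1 = mu * phi and shape2 = (1 - mu) * phi; whether BFM() uses exactly this mapping internally is an assumption, and the helper `rbeta_mean_precision` is hypothetical and not part of the package.

```r
# Hypothetical helper, not part of the BFM package: draw an n x p matrix of
# Beta variates from per-column means (mu) and precisions (phi).
set.seed(1)
n <- 2000
p <- 3
mub  <- c(0.2, 0.5, 0.8)   # column means, each in (0, 1)
phib <- c(5, 15, 30)       # column precisions, each > 0

rbeta_mean_precision <- function(n, mu, phi) {
  # Assumed mapping: shape1 = mu * phi, shape2 = (1 - mu) * phi
  sapply(seq_along(mu), function(j) {
    rbeta(n, shape1 = mu[j] * phi[j], shape2 = (1 - mu[j]) * phi[j])
  })
}

E <- rbeta_mean_precision(n, mub, phib)

# Under this parameterization the column means should be close to mub and
# the column variances close to mub * (1 - mub) / (1 + phib).
colMeans(E)
apply(E, 2, var)
mub * (1 - mub) / (1 + phib)
```

Larger values of phib concentrate the draws around mub, which is why the parameter is called a precision.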
A list containing:
| Component | Description |
|---|---|
| data | Generated BFM data matrix (n rows, p columns). |
| A | A matrix representing the factor loadings. |
| D | Diagonal matrix of unique variances. |
| kmo | Kaiser-Meyer-Olkin sampling adequacy measure. |
| bartlett | Bartlett's test of sphericity. |
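For readers unfamiliar with the two diagnostics in the returned list, the following base-R sketch computes the textbook versions of Bartlett's test of sphericity and the overall KMO measure from a correlation matrix. The simulated data and all variable names are illustrative assumptions; the package may compute these quantities differently (for example through another package).

```r
set.seed(42)
n <- 500
p <- 6
m <- 2

# Illustrative data with a two-factor structure (not generated by BFM())
A      <- matrix(runif(p * m, 0.5, 1), nrow = p)   # loadings
scores <- matrix(rnorm(n * m), nrow = n)           # factor scores
X      <- scores %*% t(A) + matrix(rnorm(n * p, sd = 0.5), nrow = n)
R      <- cor(X)

# Bartlett's test of sphericity: H0 is that R is the identity matrix
chisq   <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
df      <- p * (p - 1) / 2
p_value <- pchisq(chisq, df, lower.tail = FALSE)

# Overall KMO: squared correlations relative to squared correlations plus
# squared partial correlations (off-diagonal entries only)
R_inv   <- solve(R)
partial <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
off     <- row(R) != col(R)
kmo     <- sum(R[off]^2) / (sum(R[off]^2) + sum(partial[off]^2))

c(chisq = chisq, df = df, p_value = p_value, kmo = kmo)
```

A small Bartlett p-value together with a KMO value well above 0.5 is conventionally read as support for running a factor analysis on the data.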
n <- 1000
p <- 10
m <- 5
mub <- runif(p, 0.2, 0.8)
phib <- runif(p, 5, 30)
dist_type <- "Elliptical Distribution"
X <- BFM(n, p, m, mub, phib, dist_type)
This function calculates the Mean Squared Error (MSE) and relative error for factor loadings and uniqueness estimates.
calculate_errors(data, A, D, estimation_results)
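| Argument | Description |
|---|---|
| data | Matrix of BFM data. |
| A | Matrix of true factor loadings. |
| D | Matrix of true uniquenesses (diagonal matrix). |
| estimation_results | A list containing the estimates; the example for this function passes elements named A_hat and D_hat. |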
A named vector containing:
| Element | Description |
|---|---|
| MSEA | Mean Squared Error for factor loadings. |
| MSED | Mean Squared Error for uniqueness estimates. |
| LSA | Relative error for factor loadings. |
| LSD | Relative error for uniqueness estimates. |
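The four returned quantities can be sketched in base R as follows. The formulas are assumptions chosen for illustration (element-wise mean squared error, and squared Frobenius norm of the error divided by the squared Frobenius norm of the truth); the exact definitions used by calculate_errors() are not stated here and may differ. The helpers `mse` and `rel_err` are hypothetical.

```r
set.seed(123)
p <- 5
m <- 2

# True parameters and slightly perturbed "estimates" (illustrative only)
A     <- matrix(runif(p * m, -1, 1), nrow = p)
D     <- diag(runif(p, 1, 2))
A_hat <- A + matrix(rnorm(p * m, sd = 0.05), nrow = p)
D_hat <- diag(diag(D) + rnorm(p, sd = 0.05))

# Assumed definitions, not taken from the package source
mse <- function(est, truth) mean((est - truth)^2)
rel_err <- function(est, truth) {
  norm(est - truth, type = "F")^2 / norm(truth, type = "F")^2
}

errors <- c(
  MSEA = mse(A_hat, A),
  MSED = mse(D_hat, D),
  LSA  = rel_err(A_hat, A),
  LSD  = rel_err(D_hat, D)
)
print(errors)
```

All four values are zero when the estimates equal the true matrices, which is the case in the package example where A_hat and D_hat are set to A and D.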
set.seed(123)
n <- 10
p <- 5
A <- matrix(runif(p * p, -1, 1), nrow = p)
D <- diag(runif(p, 1, 2))
data <- matrix(runif(n * p), nrow = n)
estimation_results <- list(A_hat = A, D_hat = D)
errors <- calculate_errors(data, A, D, estimation_results)
print(errors)
A dataset from Griffiths, Hill, and Judge (1993) on household food expenditure, income, and household size.
The response variable food is a proportion (0 < food < 1), suitable for beta regression.
FoodExpenditure
A data frame with 38 rows and 3 variables:
numeric: proportion of household income spent on food
numeric: household income (in thousands of dollars)
numeric: number of persons living in the household
Griffiths, W. E., Hill, R. C., & Judge, G. G. (1993). Learning and Practicing Econometrics. Wiley.
data(FoodExpenditure)
str(FoodExpenditure)
A dataset containing 32 observations on gasoline yield under different experimental conditions.
The response variable yield is a proportion (0 < yield < 1), making it suitable for beta regression.
GasolineYield
A data frame with 32 rows and 6 variables:
numeric: proportion of crude oil converted to gasoline
factor: 10 unique batches of crude oil
numeric: temperature (Fahrenheit)
numeric: crude oil gravity
numeric: pressure
numeric: temperature (scaled)
Prater (1956), as cited in Ferrari, S. & Cribari-Neto, F. (2004). Beta Regression for Modelling Rates and Proportions. https://www.jstor.org/stable/4110074
data(GasolineYield, package = "betareg")
str(GasolineYield)
A dataset from Smithson and Verkuilen (2006) on reading accuracy, dyslexia status, and IQ scores.
The response variable accuracy is a proportion (0 < accuracy < 1), suitable for beta regression.
ReadingSkills
A data frame with 44 rows and 4 variables:
numeric: proportion of correct responses in a reading task
numeric: transformed accuracy measure
factor: dyslexia status (levels: "yes", "no")
numeric: IQ score
Smithson, M. & Verkuilen, J. (2006). A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. https://psycnet.apa.org/doi/10.1037/1082-989X.11.1.54
data(ReadingSkills)
str(ReadingSkills)