Package 'robustSFA' reference manual

Title:	Robust Estimation of Stochastic Frontier Models with MDPDE
Description:	This provides a robust estimator for stochastic frontier models, employing the Minimum Density Power Divergence Estimator (MDPDE) for enhanced robustness against outliers. Additionally, it includes a function to recommend the optimal tuning parameter, alpha, which controls the robustness of the MDPDE. The methods implemented in this package are based on Song et al. (2017) <doi:10.1016/j.csda.2016.08.005>.
Authors:	Junmo Song [aut, cre], Dong-hyun Oh [aut]
Maintainer:	Junmo Song <[email protected]>
License:	GPL-3
Version:	0.2.0
Built:	2026-05-19 07:23:38 UTC
Source:	https://github.com/cran/robustSFA

Bootstrap Test for Comparing Estimates in Stochastic Frontier Models

Description

This function performs a bootstrap test to assess the closeness of two MDPD estimates at different values of $\alpha$ . This test is based on the observation that outliers particularly affect the estimation of $\sigma^2$ and $\lambda$ . Accordingly, the bootstrap test is conducted using the following similarity measure:

$sim(\hat\theta_0,\hat\theta_1)= \frac{|\hat \sigma^2_1-\hat\sigma^2_0|}{\hat\sigma^2_0} +\frac{|\hat\lambda_1-\hat\lambda_0|}{\hat\lambda_0},$

where $\hat\theta_0$ and $\hat\theta_1$ are the MDPD estimates corresponding to $\alpha = \alpha_0$ and $\alpha = \alpha_1$ , respectively. If the two MDPD estimates are close, $sim(\hat\theta_0,\hat\theta_1)$ will be close to zero. We note that this similarity measure differs from the one used in Song et al. (2017). Apart from this, the bootstrap procedure follows the same steps described in Song et al. (2017). A low p-value indicates that the two estimates are significantly different. Note that this test may require significant computational time, as it involves numerous estimation procedures.

Usage

bootstrap_test(formula, data = NULL, alpha0, alpha1, B = 99)
bootstrap_test(formula, data = NULL, alpha0, alpha1, B = 99)

Arguments

formula

A symbolic description of the model to be estimated, specified using the standard R formula syntax (e.g., y ~ x1 + x2).

data

A data frame containing the variables in the model.

alpha0

First value of $\alpha$ . The bootstrap samples are generated using the MDPD estimates with $\alpha=\alpha_0$ .

alpha1

Second value of $\alpha$ .

B

A numeric value specifying the number of bootstrap replications. The default is 99.

Value

A numeric. p-value of the bootstrap test.

Examples


## Example using the 'riceProdPhil' dataset from the `frontier` package
library(frontier)
data(riceProdPhil)

my.model <- log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) + log(OTHER)


## Evaluate the closeness of ML estimates (alpha = 0) and
## MDPD estimates with alpha = 0.5.
bootstrap_test(my.model, data = riceProdPhil, alpha0=0.5, alpha1=0)


## Data with a single outlying observation
riceProdPhil2 <- riceProdPhil
riceProdPhil3 <- riceProdPhil

idx <- which.max(riceProdPhil$PROD)
riceProdPhil2$PROD[idx] <- riceProdPhil$PROD[idx]*10
riceProdPhil3$PROD[idx] <- riceProdPhil$PROD[idx]/100


## Evaluate the closeness of ML estimates (alpha = 0) and
## MDPD estimates with alpha = 0.5.
bootstrap_test( my.model, data = riceProdPhil2, alpha0=0.5, alpha1=0)
bootstrap_test( my.model, data = riceProdPhil3, alpha0=0.5, alpha1=0)

## Example using the 'riceProdPhil' dataset from the `frontier` package
library(frontier)
data(riceProdPhil)

my.model <- log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) + log(OTHER)


## Evaluate the closeness of ML estimates (alpha = 0) and
## MDPD estimates with alpha = 0.5.
bootstrap_test(my.model, data = riceProdPhil, alpha0=0.5, alpha1=0)


## Data with a single outlying observation
riceProdPhil2 <- riceProdPhil
riceProdPhil3 <- riceProdPhil

idx <- which.max(riceProdPhil$PROD)
riceProdPhil2$PROD[idx] <- riceProdPhil$PROD[idx]*10
riceProdPhil3$PROD[idx] <- riceProdPhil$PROD[idx]/100


## Evaluate the closeness of ML estimates (alpha = 0) and
## MDPD estimates with alpha = 0.5.
bootstrap_test( my.model, data = riceProdPhil2, alpha0=0.5, alpha1=0)
bootstrap_test( my.model, data = riceProdPhil3, alpha0=0.5, alpha1=0)

Coefficients Method for Class `rsfa`

Description

This function extracts the estimates from the object of class rsfa.

Usage

## S3 method for class 'rsfa'
coef(object, ...)
## S3 method for class 'rsfa'
coef(object, ...)

Arguments

object

An object of class rsfa.

...

Unused.

Value

A named vector of MDPD estimates.

Examples

## Example using the 'riceProdPhil' dataset from the `frontier` package
library(frontier)
data(riceProdPhil)

my.model <- log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) + log(OTHER)
fit.ml <- rsfa(my.model, data = riceProdPhil)
coef(fit.ml)
fit.mdpde<- rsfa(my.model, data = riceProdPhil, alpha = 0.1)
coef(fit.mdpde)
## Example using the 'riceProdPhil' dataset from the `frontier` package
library(frontier)
data(riceProdPhil)

my.model <- log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) + log(OTHER)
fit.ml <- rsfa(my.model, data = riceProdPhil)
coef(fit.ml)
fit.mdpde<- rsfa(my.model, data = riceProdPhil, alpha = 0.1)
coef(fit.mdpde)

Function for Estimating Technical Efficiencies for Class `rsfa`

Description

This function returns the individual efficiencies calculated from the MDPD or ML estimates. The efficiencies are calculated using the estimator of Battese and Coelli (1988).

Usage

efficiency(object, ...)
efficiency(object, ...)

Arguments

object

An object of class rsfa.

...

Unsed.

Value

A column vector of the estimated individual efficiencies.

References

Battese, G.E. and Coelli, T. (1988). Prediction of firm-level technical efficiencies with a generalized frontier production function and panel data. Journal of Econometrics, 38, 387-399.

Examples

## Example using the 'riceProdPhil' dataset from the `frontier` package
library(frontier)
data(riceProdPhil)

my.model <- log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) + log(OTHER)
fit.ml <- rsfa(my.model, data = riceProdPhil)
efficiency(fit.ml)
fit.mdpde<- rsfa(my.model, data = riceProdPhil, alpha = 0.1)
efficiency(fit.mdpde)
## Example using the 'riceProdPhil' dataset from the `frontier` package
library(frontier)
data(riceProdPhil)

my.model <- log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) + log(OTHER)
fit.ml <- rsfa(my.model, data = riceProdPhil)
efficiency(fit.ml)
fit.mdpde<- rsfa(my.model, data = riceProdPhil, alpha = 0.1)
efficiency(fit.mdpde)

Selecting an Optimal $\alpha$ for MDPDE

Description

This function recommends an optimal $\alpha$ for performing MDPD estimation. The function repeatedly calls bootstrap_test and selects the optimal $\alpha$ from the set {0, 0.05, 0.1, ..., specified $\alpha$ }.

Usage

optimal_alpha(formula, data = NULL, base_alpha = 0.5, B = 99, s_level = 0.1)
optimal_alpha(formula, data = NULL, base_alpha = 0.5, B = 99, s_level = 0.1)

Arguments

formula

A symbolic description of the model to be estimated, specified using the standard R formula syntax (e.g., y ~ x1 + x2).

data

A data frame containing the variables in the model.

base_alpha

A numeric value. This selection procedure includes a bootstrap test. The bootstrap samples are generated using the MDPD estimates with $\alpha$ = base_alpha, and the procedure selects an optimal $\alpha$ from the set {0, 0.05, 0.1, ..., base_alpha}.

B

A numeric. This selecting procedure includes a bootstrap test, where B is the number of the bootstrap replications required for calculating a critical value. The default value is 99.

s_level

A numeric value representing the significance level for the bootstrap test. The default value is 0.1.

Value

A character string indicating the recommended optimal $\alpha$ for MDPD estimation, selected from the set {0, 0.05, 0.1, ...,base_alpha}.

Examples


## Example using the 'riceProdPhil' dataset from the `frontier` package
library(frontier)
data(riceProdPhil)

## It takes to time to get the result of the optimal_alpha() function.
my.model <- log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) + log(OTHER)
optimal_alpha(my.model, data = riceProdPhil, base_alpha=0.5)


## Data with a single outlying observation
riceProdPhil2 <- riceProdPhil
riceProdPhil3 <- riceProdPhil

idx <- which.max(riceProdPhil$PROD)
riceProdPhil2$PROD[idx] <- riceProdPhil$PROD[idx]*10
riceProdPhil3$PROD[idx] <- riceProdPhil$PROD[idx]/100

optimal_alpha(my.model, data = riceProdPhil2, base_alpha=0.5)
optimal_alpha(my.model, data = riceProdPhil3, base_alpha=0.5)

## Example using the 'riceProdPhil' dataset from the `frontier` package
library(frontier)
data(riceProdPhil)

## It takes to time to get the result of the optimal_alpha() function.
my.model <- log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) + log(OTHER)
optimal_alpha(my.model, data = riceProdPhil, base_alpha=0.5)


## Data with a single outlying observation
riceProdPhil2 <- riceProdPhil
riceProdPhil3 <- riceProdPhil

idx <- which.max(riceProdPhil$PROD)
riceProdPhil2$PROD[idx] <- riceProdPhil$PROD[idx]*10
riceProdPhil3$PROD[idx] <- riceProdPhil$PROD[idx]/100

optimal_alpha(my.model, data = riceProdPhil2, base_alpha=0.5)
optimal_alpha(my.model, data = riceProdPhil3, base_alpha=0.5)

Print Method for Class `rsfa`

Description

This function prints the results of the rsfa estimation.

Usage

## S3 method for class 'rsfa'
print(x, ...)
## S3 method for class 'rsfa'
print(x, ...)

Arguments

x

An object of class rsfa.

...

Unused.

Value

A named vector of MDPD estimates.

Examples

## Example using the 'riceProdPhil' dataset from the `frontier` package
library(frontier)
data(riceProdPhil)

my.model <- log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) + log(OTHER)
fit.ml <- rsfa(my.model, data = riceProdPhil)
print(fit.ml)
fit.mdpde<- rsfa(my.model, data = riceProdPhil, alpha = 0.1)
print(fit.mdpde)
## Example using the 'riceProdPhil' dataset from the `frontier` package
library(frontier)
data(riceProdPhil)

my.model <- log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) + log(OTHER)
fit.ml <- rsfa(my.model, data = riceProdPhil)
print(fit.ml)
fit.mdpde<- rsfa(my.model, data = riceProdPhil, alpha = 0.1)
print(fit.mdpde)

Robust Estimation of Stochastic Frontier Models

Description

This function performs the robust estimation for stochastic frontier models using the Minimum Density Power Divergence Estimator (MDPDE). The MDPDE is particularly useful when outliers in a dataset distort estimation results, especially regarding technical efficiencies. The parameter $\alpha$ in the MDPDE plays a crucial role in mitigating the impact of outliers when estimating coefficients and technical efficiencies. It actually controls the trade-off between robustness and efficiency of the MDPDE. The current estimation process is based on the following model:

$Y=\beta_0 +\beta_1 X_1 +\cdots +\beta_p X_p + V - U:=g(X,\beta)+V-U,$

where $V$ follows the normal distribution with mean 0 and variance $\sigma_v^2$ , and $U$ follows the half-normal distribution with variance $\sigma_u^2$ . The MDPDE with $\alpha$ for the model above is given as follows:

$\underset{\theta \in \Theta}{\operatorname{argmin}} \, \frac{1}{\sigma^\alpha} \bigg\{ \frac{\sqrt{2}}{\sqrt{\pi}} \int e^{-(1+\alpha)\frac{z^2}{2}}\Phi^{1+\alpha}(-z\lambda)dz$

$\hspace{5cm} - \Big(1+\frac{1}{\alpha}\Big)\frac{1}{n} \sum_{i=1}^n e^{-\alpha\frac{(Y_i-g(X_i,\beta))^2}{2\sigma^2}} \Phi^\alpha\Big(-\frac{Y_i-g(X_i,\beta)}{\sigma}\lambda\Big) \bigg\},$

where $\theta=(\beta_0, \beta_1, \cdots, \beta_p, \sigma^2, \lambda)$ with $\sigma^2=\sigma^2_v+\sigma^2_u$ and $\lambda=\sigma_u/\sigma_v$ . For more details, refer to Song et al. (2017).

Usage

rsfa(formula, data = NULL, alpha = 0, se = TRUE)
rsfa(formula, data = NULL, alpha = 0, se = TRUE)

Arguments

formula

A symbolic description of the model to be estimated, specified using the standard R formula syntax (e.g., y ~ x1 + x2).

data

A data frame containing the variables in the model.

alpha

A numeric value controlling the robustness of the MDPDE. When alpha is zero (the default), 'rsfa' returns the ML estimates. It is not necessary for $\alpha$ to be large. Values of $\alpha$ close to zero are known to be efficient and sufficiently robust. The usually recommended range of $\alpha$ is 0.1 to 0.3. A function for selecting the optimal $\alpha$ is also provided in this package. For this purpose, see the optimal_alpha() function.

se

Logical. If TRUE (the default), standard errors are presented.

Value

A list containing robust estimation results: estimates, standard errors, and residuals, as well as technical efficiencies.

call

The matched call.

coefficients

A named vector of parameter estimates.

se

A vector of estimated standard errors.

residuals

A vector of residuals.

efficiencies

A vector of estimated technical efficiencies calculated using the estimator of Battese and Coelli (1988).

sigma2_V

Numeric. Estimated value of $\sigma^2_V$ .

sigma2_U

Numeric. Estimated value of $\sigma^2_U$ .

alpha

Numeric. $\alpha$ used in the MDPDE.

References

Battese, G.E. and Coelli, T. (1988). Prediction of firm-level technical efficiencies with a generalized frontier production function and panel data. Journal of Econometrics, 38, 387-399.

Coelli, T. and Henningsen, A. (2020). Stochastic Frontier Analysis. R package version 1.1-8.

Song, J., Oh, D-h., and Kang, J. (2017). Robust estimation in stochastic frontier models. Computational Statistics & Data Analysis, 105, 243-267.

Examples


## Example using the 'riceProdPhil' dataset from the `frontier` package
library(frontier)
data(riceProdPhil)
summary(riceProdPhil)


## ML estimates and MDPD estimates with alpha=0.1
my.model <- log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) + log(OTHER)
fit.ml <- rsfa(my.model, data = riceProdPhil)
summary(fit.ml)

fit.mdpde<- rsfa(my.model, data = riceProdPhil, alpha = 0.1)
summary(fit.mdpde)


## Comparison of the technical efficiencies from MLE and MDPDE
eff.ml<-efficiency(fit.ml)
eff.mdpde<-efficiency(fit.mdpde)
plot(eff.ml, eff.mdpde, pch=20,
     xlab="efficiency from MLE", ylab="efficiency from MDPDE")
abline(0,1, lty=2)


## Data with a single outlying observation
riceProdPhil2 <- riceProdPhil
riceProdPhil3 <- riceProdPhil
idx <- which.max(riceProdPhil$PROD)
riceProdPhil2$PROD[idx] <- riceProdPhil$PROD[idx]*10
riceProdPhil3$PROD[idx] <- riceProdPhil$PROD[idx]/100


## The technical efficiencies for "riceProdPhil2" data
fit.ml2<- rsfa(my.model, data = riceProdPhil2)
eff.ml2 <- efficiency(fit.ml2)
fit.mdpde2<- rsfa(my.model, data = riceProdPhil2, alpha = 0.1)
eff.mdpde2 <- efficiency(fit.mdpde2)

plot(eff.ml, eff.ml2, xlim = c(0, 1), ylim = c(0, 1), col="red", pch=20,
     xlab = "efficiency from MLE (riceProdPhil)",
     ylab = "efficiency (riceProdPhil2)")
points(eff.ml, eff.mdpde2, col="blue", pch=20)
abline(0,1,lty=2)
legend("topleft", legend = c("MLE", "MDPDE"), col = c("red", "blue"), pch = 20, bty="n")     
title(main =
          "Efficiency comparison: MLE(no outlier) vs MLE and MPDPE (one over-performing producer)")


## The technical efficiencies for "riceProdPhil3" data
fit.ml3<- rsfa(my.model, data = riceProdPhil3)
eff.ml3 <- efficiency(fit.ml3)
fit.mdpde3<- rsfa(my.model, data = riceProdPhil3, alpha = 0.1)
eff.mdpde3 <- efficiency(fit.mdpde3)

plot(eff.ml, eff.ml3, xlim = c(0, 1), ylim = c(0, 1), col="red", pch=20,
     xlab = "efficiency from MLE (riceProdPhil)",
     ylab = "efficiency (riceProdPhil3)")
points(eff.ml, eff.mdpde3, col="blue", pch=20)
abline(0,1,lty=2)
legend("topleft", legend = c("MLE", "MDPDE"), col = c("red", "blue"), pch = 20, bty="n")     
title(main =
          "Efficiency comparison: MLE(no outlier) vs MLE and MPDPE (one under-performing producer)")
          
## Example using the 'riceProdPhil' dataset from the `frontier` package
library(frontier)
data(riceProdPhil)
summary(riceProdPhil)


## ML estimates and MDPD estimates with alpha=0.1
my.model <- log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) + log(OTHER)
fit.ml <- rsfa(my.model, data = riceProdPhil)
summary(fit.ml)

fit.mdpde<- rsfa(my.model, data = riceProdPhil, alpha = 0.1)
summary(fit.mdpde)


## Comparison of the technical efficiencies from MLE and MDPDE
eff.ml<-efficiency(fit.ml)
eff.mdpde<-efficiency(fit.mdpde)
plot(eff.ml, eff.mdpde, pch=20,
     xlab="efficiency from MLE", ylab="efficiency from MDPDE")
abline(0,1, lty=2)


## Data with a single outlying observation
riceProdPhil2 <- riceProdPhil
riceProdPhil3 <- riceProdPhil
idx <- which.max(riceProdPhil$PROD)
riceProdPhil2$PROD[idx] <- riceProdPhil$PROD[idx]*10
riceProdPhil3$PROD[idx] <- riceProdPhil$PROD[idx]/100


## The technical efficiencies for "riceProdPhil2" data
fit.ml2<- rsfa(my.model, data = riceProdPhil2)
eff.ml2 <- efficiency(fit.ml2)
fit.mdpde2<- rsfa(my.model, data = riceProdPhil2, alpha = 0.1)
eff.mdpde2 <- efficiency(fit.mdpde2)

plot(eff.ml, eff.ml2, xlim = c(0, 1), ylim = c(0, 1), col="red", pch=20,
     xlab = "efficiency from MLE (riceProdPhil)",
     ylab = "efficiency (riceProdPhil2)")
points(eff.ml, eff.mdpde2, col="blue", pch=20)
abline(0,1,lty=2)
legend("topleft", legend = c("MLE", "MDPDE"), col = c("red", "blue"), pch = 20, bty="n")     
title(main =
          "Efficiency comparison: MLE(no outlier) vs MLE and MPDPE (one over-performing producer)")


## The technical efficiencies for "riceProdPhil3" data
fit.ml3<- rsfa(my.model, data = riceProdPhil3)
eff.ml3 <- efficiency(fit.ml3)
fit.mdpde3<- rsfa(my.model, data = riceProdPhil3, alpha = 0.1)
eff.mdpde3 <- efficiency(fit.mdpde3)

plot(eff.ml, eff.ml3, xlim = c(0, 1), ylim = c(0, 1), col="red", pch=20,
     xlab = "efficiency from MLE (riceProdPhil)",
     ylab = "efficiency (riceProdPhil3)")
points(eff.ml, eff.mdpde3, col="blue", pch=20)
abline(0,1,lty=2)
legend("topleft", legend = c("MLE", "MDPDE"), col = c("red", "blue"), pch = 20, bty="n")     
title(main =
          "Efficiency comparison: MLE(no outlier) vs MLE and MPDPE (one under-performing producer)")

Summary Method for Class `rsfa`

Description

This function provides a summary of the MDPD estimation, including coefficient estimates, standard errors, z-values, p-values, and significance codes.

Usage

## S3 method for class 'rsfa'
summary(object, ...)
## S3 method for class 'rsfa'
summary(object, ...)

Arguments

object

An object of class rsfa.

...

Unused.

Value

A summary table with estimates, standard errors, z-values, p-values, and significance codes.

Examples

## Example using the 'riceProdPhil' dataset from the `frontier` package
library(frontier)
data(riceProdPhil)

my.model <- log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) + log(OTHER)
fit.ml <- rsfa(my.model, data = riceProdPhil)
summary(fit.ml)
fit.mdpde<- rsfa(my.model, data = riceProdPhil, alpha = 0.1)
summary(fit.mdpde)
## Example using the 'riceProdPhil' dataset from the `frontier` package
library(frontier)
data(riceProdPhil)

my.model <- log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) + log(OTHER)
fit.ml <- rsfa(my.model, data = riceProdPhil)
summary(fit.ml)
fit.mdpde<- rsfa(my.model, data = riceProdPhil, alpha = 0.1)
summary(fit.mdpde)

Package 'robustSFA'

Help Index

Bootstrap Test for Comparing Estimates in Stochastic Frontier Models

Description

Usage

Arguments

Value

Examples

Coefficients Method for Class rsfa

Description

Usage

Arguments

Value

Examples

Function for Estimating Technical Efficiencies for Class rsfa

Description

Usage

Arguments

Value

References

Examples

Selecting an Optimal α\alphaα for MDPDE

Description

Usage

Arguments

Value

Examples

Print Method for Class rsfa

Description

Usage

Arguments

Value

Examples

Robust Estimation of Stochastic Frontier Models

Description

Usage

Arguments

Value

References

Examples

Summary Method for Class rsfa

Description

Usage

Arguments

Value

Examples

Coefficients Method for Class `rsfa`

Function for Estimating Technical Efficiencies for Class `rsfa`

Selecting an Optimal $\alpha$ for MDPDE

Print Method for Class `rsfa`

Summary Method for Class `rsfa`