Title: Nonparametric Boundary Regression
Description: A variety of functions for the best known and most innovative approaches to nonparametric boundary estimation. The selected methods are concerned with empirical, smoothed, unrestricted as well as constrained fits under both separate and multiple shape constraints. They cover robust approaches to outliers as well as data envelopment techniques based on piecewise polynomials, splines, local linear fitting, extreme values and kernel smoothing. The package also seamlessly allows for Monte Carlo comparisons among these different estimation methods. Its use is illustrated via a number of empirical applications and simulated examples.
Authors: Abdelaati Daouia <[email protected]>, Thibault Laurent <[email protected]>, Hohsuk Noh <[email protected]>
Maintainer: Thibault Laurent <[email protected]>
License: GPL (>= 2)
Version: 1.8
Built: 2024-12-06 06:50:21 UTC
Source: CRAN
The package npbr (Daouia et al., 2017) is the first free specialized software for data edge and frontier analysis in the statistical literature. It provides a variety of functions for the best known and most innovative approaches to nonparametric boundary estimation. The selected methods are concerned with empirical, smoothed, unrestricted as well as constrained fits under both separate and multiple shape constraints. They also cover data envelopment techniques as well as robust approaches to outliers. The routines included in npbr are user-friendly and afford a large degree of flexibility in the estimation specifications. They provide smoothing parameter selection for the modern local linear and polynomial spline methods and for some promising extreme-value techniques. Also, they seamlessly allow for Monte Carlo comparisons among the implemented estimation procedures. This package will be very useful for statisticians and applied researchers interested in employing nonparametric boundary regression models. Its use is illustrated with a number of empirical applications and simulated examples.
Suppose that we have pairs of observations $(x_i, y_i)$, $i=1,\ldots,n$, from a bivariate distribution with a density $f(x,y)$ in $\mathbb{R}^2$. The support $\Psi$ of $f$ is assumed to be of the form $\Psi=\{(x,y): y\le\varphi(x)\}$, where the graph of $\varphi$ corresponds to the locus of the curve above which the density $f$ is zero. We consider the estimation of the frontier function $\varphi$ based on the sample $\{(x_i,y_i),\ i=1,\ldots,n\}$ in the general setting where the density $f$ may have sudden jumps at the frontier, decay to zero or rise up to infinity as it approaches its support boundary.
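To fix ideas, here is a small simulated instance of this setup; the frontier $\varphi(x)=\sqrt{x}$ and the beta-distributed responses are illustrative choices, not package defaults.

# Simulate n points below the frontier phi(x) = sqrt(x) on [0,1]: the joint
# density is positive below the curve and zero above it (illustrative model).
set.seed(1)
n <- 200
xtab <- runif(n)
phi <- function(x) sqrt(x)
ytab <- phi(xtab) * rbeta(n, 3, 1)  # responses lie below the frontier
plot(xtab, ytab)
curve(phi, add = TRUE, lwd = 2)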
The overall objective of the present package is to provide a large variety of functions for the best known approaches to nonparametric boundary regression, including the vast class of methods employed in both Monte Carlo comparisons of Daouia et al. (2016) and Noh (2014) as well as other promising nonparametric devices, namely the extreme-value techniques of Gijbels and Peng (2000), Daouia et al. (2010) and Daouia et al. (2012). The various functions in the npbr package are summarized in the table below. We are not aware of any other existing set of statistical routines more adapted to data envelope fitting and robust frontier estimation. Only the classical nonsmooth FDH and DEA methods can be found in some available packages dedicated to the economic literature on measurements of the production performance of enterprises, such as the programs Benchmarking by Bogetoft and Otto (2011) and FEAR by Wilson (2008). Other contributions to the econometric literature on frontier analysis by Parmeter and Racine (2013) can be found at https://socialsciences.mcmaster.ca/racinej/Gallery/Home.html. The package npbr is actually the first free specialized software for the statistical literature on nonparametric frontier analysis. The routines included in npbr are user-friendly and highly flexible in terms of estimation specifications. They allow the user to locate the boundary from data by making use of both empirical and smooth fits as well as (un)constrained estimates under single and multiple shape constraints. They also integrate smoothing parameter selection for the innovative methods based on local linear techniques, polynomial splines, extreme values and kernel smoothing, though the proposed selection procedures can be computationally demanding.
In addition, the package will be extremely useful for researchers and practitioners interested in employing nonparametric boundary regression methods. On one hand, such methods are very appealing because they rely on very few assumptions and benefit from their modeling flexibility, function approximation power and ability to detect the boundary structure of data without recourse to any a priori parametric restrictions on the shape of the frontier and/or the distribution of noise. On the other hand, the package offers R users and statisticians in this active area of research simple functions to compute the empirical mean integrated squared error, the empirical integrated squared bias and the empirical integrated variance of the implemented frontier estimators. This seamlessly allows the interested researcher to reproduce the Monte Carlo estimates obtained in the original articles and, perhaps most importantly, to easily compare the quality of any new proposal with the competitive existing methods.
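As a sketch of such a Monte Carlo comparison, the following hand-rolled loop approximates the empirical MISE of the FDH estimator under the toy model above; the grid, the model and the use of dea_est are illustrative choices, not the package's dedicated comparison routines.

library(npbr)
# Empirical MISE of the FDH estimator over B Monte Carlo samples (sketch)
B <- 100; n <- 200
x.grid <- seq(0.05, 0.95, length.out = 101)
phi <- function(x) sqrt(x)
ise <- replicate(B, {
  xtab <- runif(n)
  ytab <- phi(xtab) * rbeta(n, 3, 1)
  fit <- dea_est(xtab, ytab, x.grid, type = "fdh")
  # integrated squared error via a Riemann approximation on the grid
  mean((fit - phi(x.grid))^2) * diff(range(x.grid))
})
mean(ise)  # empirical mean integrated squared error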
Function | Description | Reference
---|---|---
dea_est | DEA, FDH and linearized FDH | Farrell (1957), Deprins et al. (1984), Hall and Park (2002), Jeong and Simar (2006)
loc_est | Local linear fitting | Hall et al. (1998), Hall and Park (2004)
loc_est_bw | Bandwidth choice for local linear fitting | Hall and Park (2004)
poly_est | Polynomial estimation | Hall et al. (1998)
poly_degree | Optimal polynomial degree selection | Daouia et al. (2015)
dfs_momt | Moment type estimation | Daouia et al. (2010), Dekkers et al. (1989)
dfs_pick | Pickands type estimation | Daouia et al. (2010), Dekkers and de Haan (1989)
rho_momt_pick | Conditional tail index estimation | Daouia et al. (2010), Dekkers et al. (1989), Dekkers and de Haan (1989)
kopt_momt_pick | Threshold selection for moment/Pickands frontiers | Daouia et al. (2010)
dfs_pwm | Nonparametric frontier regularization | Daouia et al. (2012)
loc_max | Local constant estimation | Gijbels and Peng (2000)
pick_est | Local extreme-value estimation | Gijbels and Peng (2000)
quad_spline_est | Quadratic spline fitting | Daouia et al. (2015)
quad_spline_kn | Knot selection for quadratic spline fitting | Daouia et al. (2015)
cub_spline_est | Cubic spline fitting | Daouia et al. (2015)
cub_spline_kn | Knot selection for cubic spline fitting | Daouia et al. (2015)
kern_smooth | Nonparametric kernel boundary regression | Parmeter and Racine (2013), Noh (2014)
kern_smooth_bw | Bandwidth choice for kernel boundary regression | Parmeter and Racine (2013), Noh (2014)
Abdelaati Daouia <[email protected]>, Thibault Laurent <[email protected]>, Hohsuk Noh <[email protected]>
Maintainer: Thibault Laurent <[email protected]>
Daouia, A., Florens, J.-P. and Simar, L. (2010). Frontier estimation and extreme value theory. Bernoulli, 16, 1039-1063.
Daouia, A., Florens, J.-P. and Simar, L. (2012). Regularization of Nonparametric Frontier Estimators. Journal of Econometrics, 168, 285-299.
Daouia, A., Laurent, T. and Noh, H. (2017). npbr: A Package for Nonparametric Boundary Regression in R. Journal of Statistical Software, 79(9), 1-43. doi:10.18637/jss.v079.i09.
Daouia, A., Noh, H. and Park, B.U. (2016). Data Envelope fitting with constrained polynomial splines. Journal of the Royal Statistical Society: Series B, 78(1), 3-30. doi:10.1111/rssb.12098.
Dekkers, A.L.M. and L. de Haan (1989). On the estimation of the extreme-value index and large quantile estimation. Annals of Statistics, 17, 1795-1832.
Dekkers, A.L.M., Einmahl, J.H.J. and L. de Haan (1989). A moment estimator for the index of an extreme-value distribution. Annals of Statistics, 17, 1833-1855.
Deprins, D., Simar, L. and Tulkens H. (1984). Measuring labor efficiency in post offices, in: M. Marchand, P. Pestieau and H. Tulkens (Eds), The performance of Public Enterprises: Concepts and Measurements. North-Holland, Amsterdam, 243-267.
Farrell, M.J. (1957). The measurement of productive efficiency. Journal of the Royal Statistical Society, Series A, 120, 253-281.
Gijbels, I. and Peng, L. (2000). Estimation of a support curve via order statistics. Extremes, 3, 251-277.
Hall, P., Park, B.U. and Stern, S.E. (1998). On polynomial estimators of frontiers and boundaries. Journal of Multivariate Analysis, 66, 71-98.
Hall, P. and Park, B.U. (2004). Bandwidth choice for local polynomial estimation of smooth boundaries. Journal of Multivariate Analysis, 91, 240-261.
Jeong, S.-O. and Simar, L. (2006). Linearly interpolated FDH efficiency score for nonconvex frontiers. Journal of Multivariate Analysis, 97, 2141-2161.
Noh, H. (2014). Frontier Estimation using Kernel Smoothing with Data Transformation. Journal of the Korean Statistical Society, 43, 503-512.
Parmeter, C. and Racine, J.S. (2013). Smooth Constrained Frontier Analysis. In Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis: Essays in Honor of Halbert L. White, Jr., Springer Verlag, (X. Chen and N.R. Swanson Eds), 463-488.
The dataset is concerned with the assessment of the efficiency of 37 European Air Controllers. The performance of each controller can be measured by its “distance” from the upper support boundary, or equivalently, from the set of the most efficient controllers. This dataset is taken from Mouchart and Simar (2002). Here, the activity of the controllers is described by one input (an aggregate factor of different kinds of labor) and one output (an aggregate factor of the activity produced, based on the number of controlled air movements, the number of controlled flight hours, etc.). See also Daouia, Florens and Simar (2008).
data(air)
A data frame with 37 observations on the following 2 variables.
xtab | an input.
ytab | an output.
Daouia, A., Florens, J.-P. and Simar, L. (2008). Functional Convergence of Quantile-type Frontiers with Application to Parametric Approximations. Journal of Statistical Planning and Inference, 138, 708-725.
Mouchart, M. and L. Simar (2002). Efficiency analysis of Air Controllers: first insights, Consulting report 0202, Institut de Statistique, UCL, Belgium.
data("air")
data("air")
The function cub_spline_est is an implementation of the (un)constrained cubic spline estimates proposed by Daouia, Noh and Park (2016).
cub_spline_est(xtab, ytab, x, kn = ceiling((length(xtab))^(1/4)), method = "u",
  all.dea = FALSE, control = list("tm_limit" = 700))
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
kn | an integer specifying the number of inter-knot segments used in the computation of the spline estimate.
method | a character equal to "u" (unconstrained estimator), "m" (under the monotonicity constraint) or "mc" (under simultaneous monotonicity and concavity constraints).
all.dea | a boolean.
control | a list of parameters passed to the GLPK solver. See *Details* of help(Rglpk_solve_LP).
Let $a$ and $b$ be, respectively, the minimum and maximum of the design points $x_1,\ldots,x_n$. Denote a partition of $[a,b]$ by $a=t_0<t_1<\cdots<t_{k_n}=b$ (see below the selection process). Let $N=k_n+3$ and $\pi(x)=(\pi_1(x),\ldots,\pi_N(x))^T$ be the vector of normalized B-splines of order 4 based on the knot mesh $\{t_j\}$ (see Daouia et al. (2015)). The unconstrained (option method="u") cubic spline estimate of the frontier $\varphi(x)$ is then defined by $\tilde\varphi_n(x)=\pi(x)^T\tilde\alpha$, where $\tilde\alpha$ minimizes $\int_a^b \pi(x)^T\alpha\,dx$ over $\alpha\in\mathbb{R}^N$ subject to the envelopment constraints $\pi(x_i)^T\alpha\ge y_i$, $i=1,\ldots,n$.
A simple way of choosing the knot mesh in this unconstrained setting is by considering the $j/k_n$th quantiles $t_j$ of the distinct values of $x_i$ for $j=1,\ldots,k_n-1$. The number of inter-knot segments $k_n$ is obtained by calling the function cub_spline_kn (option method="u").
In what concerns the monotonicity constraint, we use the following sufficient condition for monotonicity: $\alpha_1\le\alpha_2\le\cdots\le\alpha_N$. This condition was previously used in Lu et al. (2007) and Pya and Wood (2015). Note that since the condition corresponds to linear constraints on $\alpha$, the estimator satisfying monotonicity can be obtained via linear programming.
When the estimate is required to be both monotone and concave, we use the function cub_spline_est with the option method="mc". Such an estimate is obtained as the cubic spline function which minimizes the same linear objective function as the unconstrained estimate, subject to the same linear envelopment constraints, the monotonicity constraint above and the additional linear concavity constraints $\pi''(t_j)^T\alpha\le 0$, $j=0,1,\ldots,k_n$, where the second derivative $\pi''$ is a linear spline. Regarding the choice of knots, we just apply the same scheme as for the unconstrained cubic spline estimate.
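For intuition, the envelopment program above can be written down directly as a small linear program; the following sketch (assuming the splines and Rglpk packages, with an equispaced knot mesh for simplicity) illustrates the formulation and is not the package's internal code.

library(splines)
library(Rglpk)
# Unconstrained cubic spline frontier as a linear program (illustration)
cub_spline_lp <- function(xtab, ytab, x, kn = 3) {
  a <- min(xtab); b <- max(xtab)
  inner <- if (kn > 1) seq(a, b, length.out = kn + 1)[-c(1, kn + 1)] else NULL
  knots <- c(rep(a, 4), inner, rep(b, 4))  # boundary knots repeated for order 4
  N <- kn + 3                              # number of B-spline coefficients
  B <- splineDesign(knots, xtab, ord = 4)  # n x N envelopment constraint matrix
  # Objective: the integral of the j-th B-spline over [a,b] is (t_{j+4}-t_j)/4
  obj <- (knots[(1:N) + 4] - knots[1:N]) / 4
  sol <- Rglpk_solve_LP(obj, mat = B, dir = rep(">=", length(ytab)), rhs = ytab,
                        bounds = list(lower = list(ind = 1:N, val = rep(-Inf, N))))
  drop(splineDesign(knots, x, ord = 4) %*% sol$solution)
}
# e.g., cub_spline_lp(air$xtab, air$ytab, x.air, kn = 3)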
Returns a numeric vector with the same length as x. Returns a vector of NA if no solution has been found by the solver (GLPK).
Hohsuk Noh
Daouia, A., Noh, H. and Park, B.U. (2016). Data Envelope fitting with constrained polynomial splines. Journal of the Royal Statistical Society: Series B, 78(1), 3-30. doi:10.1111/rssb.12098.
Lu, M., Zhang, Y. and Huang, J. (2007). Estimation of the mean function with panel count data using monotone polynomial splines. Biometrika, 94, 705-718.
Pya, N. and Wood, S.N. (2015). Shape constrained additive models. Statistics and Computing, 25, 543-559.
Schumaker, L.L. (2007). Spline Functions: Basic Theory, 3rd edition, Cambridge University Press.
data("air") x.air <- seq(min(air$xtab), max(air$xtab), length.out=101) # 1. Unconstrained cubic spline fits # Optimal number of inter-knot segments via the BIC criterion (kn.bic.air.u<-cub_spline_kn(air$xtab, air$ytab, method="u", type="BIC")) # Unconstrained spline estimate y.cub.air.u<-cub_spline_est(air$xtab, air$ytab, x.air, kn=kn.bic.air.u, method="u") # 2. Monotonicity constraint # Optimal number of inter-knot segments via the BIC criterion (kn.bic.air.m<-cub_spline_kn(air$xtab, air$ytab, method="m", type="BIC")) # Monotonic splines estimate y.cub.air.m<-cub_spline_est(air$xtab, air$ytab, x.air, kn=kn.bic.air.m, method="m") # 3. Monotonicity and Concavity constraints # Optimal number of inter-knot segments via the BIC criterion (kn.bic.air.mc<-cub_spline_kn(air$xtab, air$ytab, method="mc", type="BIC")) # Monotonic/Concave splines estimate y.cub.air.mc<-cub_spline_est(air$xtab, air$ytab, x.air, kn=kn.bic.air.mc, method="mc", all.dea=TRUE) # Representation plot(x.air, y.cub.air.u, lty=1, lwd=4, col="green", type="l", xlab="log(COST)", ylab="log(OUTPUT)") lines(x.air, y.cub.air.m, lty=2, lwd=4, col="cyan") lines(x.air, y.cub.air.mc, lty=3, lwd=4, col="magenta") points(ytab~xtab, data=air) legend("topleft", col=c("green","cyan","magenta"), lty=c(1,2,3), legend=c("unconstrained", "monotone", "monotone + concave"), lwd=4, cex=0.8)
data("air") x.air <- seq(min(air$xtab), max(air$xtab), length.out=101) # 1. Unconstrained cubic spline fits # Optimal number of inter-knot segments via the BIC criterion (kn.bic.air.u<-cub_spline_kn(air$xtab, air$ytab, method="u", type="BIC")) # Unconstrained spline estimate y.cub.air.u<-cub_spline_est(air$xtab, air$ytab, x.air, kn=kn.bic.air.u, method="u") # 2. Monotonicity constraint # Optimal number of inter-knot segments via the BIC criterion (kn.bic.air.m<-cub_spline_kn(air$xtab, air$ytab, method="m", type="BIC")) # Monotonic splines estimate y.cub.air.m<-cub_spline_est(air$xtab, air$ytab, x.air, kn=kn.bic.air.m, method="m") # 3. Monotonicity and Concavity constraints # Optimal number of inter-knot segments via the BIC criterion (kn.bic.air.mc<-cub_spline_kn(air$xtab, air$ytab, method="mc", type="BIC")) # Monotonic/Concave splines estimate y.cub.air.mc<-cub_spline_est(air$xtab, air$ytab, x.air, kn=kn.bic.air.mc, method="mc", all.dea=TRUE) # Representation plot(x.air, y.cub.air.u, lty=1, lwd=4, col="green", type="l", xlab="log(COST)", ylab="log(OUTPUT)") lines(x.air, y.cub.air.m, lty=2, lwd=4, col="cyan") lines(x.air, y.cub.air.mc, lty=3, lwd=4, col="magenta") points(ytab~xtab, data=air) legend("topleft", col=c("green","cyan","magenta"), lty=c(1,2,3), legend=c("unconstrained", "monotone", "monotone + concave"), lwd=4, cex=0.8)
Computes the optimal number of inter-knot segments for the (un)constrained cubic spline fit proposed by Daouia, Noh and Park (2016).
cub_spline_kn(xtab, ytab, method, krange = 1:20, type = "AIC", control = list("tm_limit" = 700))
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
method | a character equal to "u" (unconstrained estimator), "m" (under the monotonicity constraint) or "mc" (under simultaneous monotonicity and concavity constraints).
krange | a vector of integers specifying the range in which the optimal number of inter-knot segments is to be selected.
type | a character equal to "AIC" or "BIC".
control | a list of parameters passed to the GLPK solver. See *Details* of help(Rglpk_solve_LP).
The implementation of the unconstrained cubic spline smoother $\tilde\varphi_n$ (see cub_spline_est) is based on the knot mesh $\{t_j\}$, with $t_j$ being the $j/k_n$th quantile of the distinct values of $x_i$ for $j=1,\ldots,k_n-1$. Because the number of knots $k_n$ determines the complexity of the spline approximation, its choice may then be viewed as model selection through the minimization of two information criteria, both combining the log of the total envelopment gap $\sum_{i=1}^n(\tilde\varphi_n(x_i)-y_i)$ with a penalty that increases with the number of inter-knot segments $k$. The first one (option type="AIC") is similar to the famous Akaike information criterion (Akaike, 1973) and the second one (option type="BIC") to the Bayesian information criterion (Schwarz, 1978). For the implementation of the constrained cubic spline estimators, just apply the same scheme as for the unconstrained version.
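Schematically, the selection amounts to the following loop; the penalty written below is only indicative of the BIC-type form, the exact criterion being the one of Daouia et al. (2016).

# Sketch of a BIC-type choice of the number of inter-knot segments
select_kn <- function(xtab, ytab, krange = 1:20) {
  n <- length(xtab)
  crit <- sapply(krange, function(k) {
    fit <- cub_spline_est(xtab, ytab, xtab, kn = k, method = "u")
    # log of the total envelopment gap plus a complexity penalty in k
    log(sum(fit - ytab)) + log(n) * (k + 3) / (2 * n)
  })
  krange[which.min(crit)]
}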
Returns an integer.
Hohsuk Noh.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle, in Second International Symposium on Information Theory, eds. B.N. Petrov and F. Csaki, Budapest: Akademiai Kiado, 267-281.
Daouia, A., Noh, H. and Park, B.U. (2016). Data Envelope fitting with constrained polynomial splines. Journal of the Royal Statistical Society: Series B, 78(1), 3-30. doi:10.1111/rssb.12098.
Schwarz, G. (1978). Estimating the dimension of a model, Annals of Statistics, 6, 461-464.
data("air") # a. Unconstrained cubic spline fits (kn.bic.air.u<-cub_spline_kn(air$xtab, air$ytab, method="u", type="BIC")) # b. Monotone cubic spline smoother (kn.bic.air.m<-cub_spline_kn(air$xtab, air$ytab, method="m", type="BIC")) # c. Monotone and Concave cubic spline smoother (kn.bic.air.mc<-cub_spline_kn(air$xtab, air$ytab, method="mc", type="BIC"))
data("air") # a. Unconstrained cubic spline fits (kn.bic.air.u<-cub_spline_kn(air$xtab, air$ytab, method="u", type="BIC")) # b. Monotone cubic spline smoother (kn.bic.air.m<-cub_spline_kn(air$xtab, air$ytab, method="m", type="BIC")) # c. Monotone and Concave cubic spline smoother (kn.bic.air.mc<-cub_spline_kn(air$xtab, air$ytab, method="mc", type="BIC"))
The function implements the empirical FDH (free disposal hull), LFDH (linearized FDH) and DEA (data envelopment analysis) frontier estimators.
dea_est(xtab, ytab, x, type = "dea")
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
type | a character equal to "dea", "fdh" or "lfdh".
There are mainly two usual frontier estimation methods for preserving monotonicity: the free disposal hull (FDH) introduced by Deprins et al. (1984) and the data envelopment analysis (DEA) initiated by Farrell (1957). The FDH boundary is the lowest “stair-case” monotone curve covering all the data points $(x_i,y_i)$: $\varphi_n(x):=\max\{y_i,\ i: x_i\le x\}$. An improved version of this estimator, referred to as the linearized FDH (LFDH), is obtained by drawing the polygonal line smoothing the staircase FDH curve. It has been considered in Hall and Park (2002) and Jeong and Simar (2006). When the joint support of data is in addition convex, the DEA estimator is defined as the least concave majorant of the FDH frontier (see also Gijbels et al. (1999)). We employ the function DEA from the package Benchmarking to implement the function dea_est.
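By way of illustration, the FDH staircase can be hand-rolled in a few lines (dea_est with type="fdh" remains the supported implementation):

# Lowest staircase monotone curve enveloping the data
# (assumes every evaluation point satisfies x >= min(xtab))
fdh <- function(xtab, ytab, x) {
  sapply(x, function(x0) max(ytab[xtab <= x0]))
}
# e.g., fdh(green$COST, green$OUTPUT, x) should reproduce
# dea_est(green$COST, green$OUTPUT, x, type = "fdh")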
Returns a numeric vector with the same length as x.
Hohsuk Noh.
Bogetoft, P. and Otto, L. (2011), Benchmarking with DEA, SFA and R, Springer-Verlag.
Deprins, D., Simar, L. and H. Tulkens (1984). Measuring labor efficiency in post offices, in The performance of Public Enterprises: Concepts and Measurements (M. Marchand, P. Pestieau and H. Tulkens Eds), North-Holland, Amsterdam, 243–267.
Farrell, M.J. (1957). The measurement of productive efficiency. Journal of the Royal Statistical Society: Series A, 120, 253–281.
Gijbels, I., Mammen, E., Park, B.U. and Simar, L. (1999). On estimation of monotone and concave frontier functions, Journal of American Statistical Association, 94, 220–228.
Hall, P. and Park, B.U. (2002). New methods for bias correction at endpoints and boundaries, Annals of Statistics, 30, 1460-1479.
Jeong, S.-O. and Simar, L. (2006). Linearly interpolated FDH efficiency score for nonconvex frontiers, Journal of Multivariate Analysis, 97, 2141–2161.
quad_spline_est, cub_spline_est.
data("green") plot(OUTPUT~COST, data=green) x <- seq(min(green$COST), max(green$COST), length.out=1001) # We compute and represent the DEA, FDH and LFDH estimates lines(x, dea_est(green$COST, green$OUTPUT, x, type="dea"), lty=4, lwd=4, col="cyan") lines(x, dea_est(green$COST, green$OUTPUT, x, type="fdh"), lty=1, lwd=4, col="green") lines(x, dea_est(green$COST, green$OUTPUT, x, type="lfdh"), lty=2, lwd=4, col="magenta") legend("topleft",legend=c("dea","fdh","lfdh"), col=c("cyan","green","magenta"), lty=c(4,1,2), lwd=4)
data("green") plot(OUTPUT~COST, data=green) x <- seq(min(green$COST), max(green$COST), length.out=1001) # We compute and represent the DEA, FDH and LFDH estimates lines(x, dea_est(green$COST, green$OUTPUT, x, type="dea"), lty=4, lwd=4, col="cyan") lines(x, dea_est(green$COST, green$OUTPUT, x, type="fdh"), lty=1, lwd=4, col="green") lines(x, dea_est(green$COST, green$OUTPUT, x, type="lfdh"), lty=2, lwd=4, col="magenta") legend("topleft",legend=c("dea","fdh","lfdh"), col=c("cyan","green","magenta"), lty=c(4,1,2), lwd=4)
This function is an implementation of the moment-type estimator developed by Daouia, Florens and Simar (2010).
dfs_momt(xtab, ytab, x, rho, k, ci = TRUE)
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
rho | a numeric vector of the same length as x, or a scalar, giving the value(s) of the tail index $\rho_x$.
k | a numeric vector of the same length as x giving the sample fraction $k=k_n(x)$ to be used at each evaluation point.
ci | a boolean, TRUE for computing the confidence interval.
Combining ideas from Dekkers, Einmahl and de Haan (1989) with the dimensionless transformation $z^x_i := y_i\,1_{\{x_i\le x\}}$, $i=1,\ldots,n$, of the observed sample $\{(x_i,y_i)\}$, the authors estimate the conditional endpoint $\varphi(x)$ by the moment-type estimator $\tilde\varphi_{momt}(x)$ built from the top of the ascending order statistics $z^x_{(1)}\le\cdots\le z^x_{(n)}$ corresponding to the transformed sample $\{z^x_i,\ i=1,\ldots,n\}$. Here $\rho_x>0$ is referred to as the extreme-value index and has the following interpretation: when $\rho_x>2$, the joint density of data decays smoothly to zero at a speed of power $\rho_x-2$ of the distance from the frontier; when $\rho_x=2$, the density has sudden jumps at the frontier; when $\rho_x<2$, the density increases toward infinity at a speed of power $\rho_x-2$ of the distance from the frontier. Most of the contributions to the econometric literature on frontier analysis assume that the joint density is strictly positive at its support boundary, or equivalently, $\rho_x=2$ for all $x$.
When $\rho_x$ is unknown, Daouia et al. (2010) suggest using the following two-step estimator: first, estimate $\rho_x$ by the moment estimator $\tilde\rho_x$ implemented in the function rho_momt_pick by utilizing the option method="moment", or by the Pickands estimator $\hat\rho_x$ by using the option method="pickands"; second, use the estimator $\tilde\varphi_{momt}(x)$, as if $\rho_x$ were known, by substituting the estimated value $\tilde\rho_x$ or $\hat\rho_x$ in place of $\rho_x$.
The pointwise $95\%$ confidence interval of $\varphi(x)$ derived from the asymptotic normality of $\tilde\varphi_{momt}(x)$ is returned along with the point estimates when ci=TRUE. The sample fraction $k=k_n(x)$ plays here the role of the smoothing parameter and varies between 1 and $N_x-1$, with $N_x=\sum_{i=1}^n 1_{\{x_i\le x\}}$ being the number of observations $(x_i,y_i)$ with $x_i\le x$. See kopt_momt_pick for an automatic data-driven rule for selecting $k$.
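The transformation itself is straightforward to write out; the following lines show, for a single (hypothetical) evaluation point x0, the transformed sample and the quantity $N_x$ that bounds the admissible sample fractions (illustrative code only).

z_transform <- function(xtab, ytab, x0) {
  z <- ytab * (xtab <= x0)  # z_i^x = y_i * 1{x_i <= x0}
  sort(z)                   # ascending order statistics z_(1) <= ... <= z_(n)
}
Nx <- function(xtab, x0) sum(xtab <= x0)  # k = k_n(x) varies in 1, ..., Nx - 1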
Returns a numeric vector with the same length as x.
As it is common in extreme-value theory, good results require a large sample size at each evaluation point $x$. See also the note in kopt_momt_pick.
Abdelaati Daouia and Thibault Laurent (converted from Leopold Simar's Matlab code).
Daouia, A., Florens, J.P. and Simar, L. (2010). Frontier Estimation and Extreme Value Theory, Bernoulli, 16, 1039-1063.
Dekkers, A.L.M., Einmahl, J.H.J. and L. de Haan (1989). A moment estimator for the index of an extreme-value distribution, Annals of Statistics, 17, 1833-1855.
data("post") x.post <- seq(post$xinput[100], max(post$xinput), length.out = 100) # 1. When rho[x] is known and equal to 2, we set: rho <- 2 # To determine the sample fraction k=k[n](x) # in tilde(varphi[momt])(x). best_kn.1 <- kopt_momt_pick(post$xinput, post$yprod, x.post, rho = rho) # To compute the frontier estimates and confidence intervals: res.momt.1 <- dfs_momt(post$xinput, post$yprod, x.post, rho = rho, k = best_kn.1) # Representation plot(yprod~xinput, data = post, xlab = "Quantity of labor", ylab = "Volume of delivered mail") lines(x.post, res.momt.1[,1], lty = 1, col = "cyan") lines(x.post, res.momt.1[,2], lty = 3, col = "magenta") lines(x.post, res.momt.1[,3], lty = 3, col = "magenta") ## Not run: # 2. rho[x] is unknown and estimated by # the Pickands estimator tilde(rho[x]) rho_momt <- rho_momt_pick(post$xinput, post$yprod, x.post) best_kn.2 <- kopt_momt_pick(post$xinput, post$yprod, x.post, rho = rho_momt) res.momt.2 <- dfs_momt(post$xinput, post$yprod, x.post, rho = rho_momt, k = best_kn.2) # 3. rho[x] is unknown independent of x and estimated # by the (trimmed) mean of tilde(rho[x]) rho_trimmean <- mean(rho_momt, trim=0.00) best_kn.3 <- kopt_momt_pick(post$xinput, post$yprod, x.post, rho = rho_trimmean) res.momt.3 <- dfs_momt(post$xinput, post$yprod, x.post, rho = rho_trimmean, k = best_kn.3) # Representation plot(yprod~xinput, data = post, col = "grey", xlab = "Quantity of labor", ylab = "Volume of delivered mail") lines(x.post, res.momt.2[,1], lty = 1, lwd = 2, col = "cyan") lines(x.post, res.momt.2[,2], lty = 3, lwd = 4, col = "magenta") lines(x.post, res.momt.2[,3], lty = 3, lwd = 4, col = "magenta") plot(yprod~xinput, data = post, col = "grey", xlab = "Quantity of labor", ylab = "Volume of delivered mail") lines(x.post, res.momt.3[,1], lty = 1, lwd = 2, col = "cyan") lines(x.post, res.momt.3[,2], lty = 3, lwd = 4, col = "magenta") lines(x.post, res.momt.3[,3], lty = 3, lwd = 4, col = "magenta") ## End(Not run)
data("post") x.post <- seq(post$xinput[100], max(post$xinput), length.out = 100) # 1. When rho[x] is known and equal to 2, we set: rho <- 2 # To determine the sample fraction k=k[n](x) # in tilde(varphi[momt])(x). best_kn.1 <- kopt_momt_pick(post$xinput, post$yprod, x.post, rho = rho) # To compute the frontier estimates and confidence intervals: res.momt.1 <- dfs_momt(post$xinput, post$yprod, x.post, rho = rho, k = best_kn.1) # Representation plot(yprod~xinput, data = post, xlab = "Quantity of labor", ylab = "Volume of delivered mail") lines(x.post, res.momt.1[,1], lty = 1, col = "cyan") lines(x.post, res.momt.1[,2], lty = 3, col = "magenta") lines(x.post, res.momt.1[,3], lty = 3, col = "magenta") ## Not run: # 2. rho[x] is unknown and estimated by # the Pickands estimator tilde(rho[x]) rho_momt <- rho_momt_pick(post$xinput, post$yprod, x.post) best_kn.2 <- kopt_momt_pick(post$xinput, post$yprod, x.post, rho = rho_momt) res.momt.2 <- dfs_momt(post$xinput, post$yprod, x.post, rho = rho_momt, k = best_kn.2) # 3. rho[x] is unknown independent of x and estimated # by the (trimmed) mean of tilde(rho[x]) rho_trimmean <- mean(rho_momt, trim=0.00) best_kn.3 <- kopt_momt_pick(post$xinput, post$yprod, x.post, rho = rho_trimmean) res.momt.3 <- dfs_momt(post$xinput, post$yprod, x.post, rho = rho_trimmean, k = best_kn.3) # Representation plot(yprod~xinput, data = post, col = "grey", xlab = "Quantity of labor", ylab = "Volume of delivered mail") lines(x.post, res.momt.2[,1], lty = 1, lwd = 2, col = "cyan") lines(x.post, res.momt.2[,2], lty = 3, lwd = 4, col = "magenta") lines(x.post, res.momt.2[,3], lty = 3, lwd = 4, col = "magenta") plot(yprod~xinput, data = post, col = "grey", xlab = "Quantity of labor", ylab = "Volume of delivered mail") lines(x.post, res.momt.3[,1], lty = 1, lwd = 2, col = "cyan") lines(x.post, res.momt.3[,2], lty = 3, lwd = 4, col = "magenta") lines(x.post, res.momt.3[,3], lty = 3, lwd = 4, col = "magenta") ## End(Not run)
This function is an implementation of the Pickands type estimator developed by Daouia, Florens and Simar (2010).
dfs_pick(xtab, ytab, x, k, rho, ci = TRUE)
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
k | a numeric vector of the same length as x giving the sample fraction $k=k_n(x)$ to be used at each evaluation point.
rho | a numeric vector of the same length as x, or a scalar, giving the value(s) of the tail index $\rho_x$.
ci | a boolean, TRUE for computing the confidence interval.
Built on the ideas of Dekkers and de Haan (1989), Daouia et al. (2010) propose to estimate the frontier point $\varphi(x)$ by the Pickands-type estimator $\hat\varphi_{pick}(x)$, constructed from spacings between the top order statistics of the transformed data $\{z^x_i\}$ described in dfs_momt, where $\rho_x>0$ is the same tail index as in dfs_momt. If $\rho_x$ is known (typically equal to 2 if the joint density of data is believed to have sudden jumps at the frontier), then one can use the estimator $\hat\varphi_{pick}(x)$ in conjunction with the function kopt_momt_pick, which implements an automatic data-driven method for selecting the threshold $k$. In contrast, if $\rho_x$ is unknown, one could consider using the following two-step estimator: first, estimate $\rho_x$ by the Pickands estimator $\hat\rho_x$ implemented in the function rho_momt_pick by using the option method="pickands", or by the moment estimator $\tilde\rho_x$ by utilizing the option method="moment"; second, use the estimator $\hat\varphi_{pick}(x)$, as if $\rho_x$ were known, by substituting the estimated value $\hat\rho_x$ or $\tilde\rho_x$ in place of $\rho_x$. The pointwise $95\%$ confidence interval of the frontier function obtained from the asymptotic normality of $\hat\varphi_{pick}(x)$ is returned along with the point estimates when ci=TRUE. Finally, to select the threshold $k=k_n(x)$, one could use the automatic data-driven method of Daouia et al. (2010) implemented in the function kopt_momt_pick (option method="pickands").
Returns a numeric vector with the same length as x.
As it is common in extreme-value theory, good results require a large sample size at each evaluation point $x$. See also the note in kopt_momt_pick.
Abdelaati Daouia and Thibault Laurent (converted from Leopold Simar's Matlab code).
Daouia, A., Florens, J.P. and Simar, L. (2010). Frontier Estimation and Extreme Value Theory, Bernoulli, 16, 1039-1063.
Dekkers, A.L.M., Einmahl, J.H.J. and L. de Haan (1989), A moment estimator for the index of an extreme-value distribution, Annals of Statistics, 17, 1833-1855.
data("post") x.post<- seq(post$xinput[100],max(post$xinput), length.out=100) # 1. When rho[x] is known and equal to 2, we set: rho<-2 # To determine the sample fraction k=k[n](x) # in hat(varphi[pick])(x). best_kn.1<-kopt_momt_pick(post$xinput, post$yprod, x.post, method="pickands", rho=rho) # To compute the frontier estimates and confidence intervals: res.pick.1<-dfs_pick(post$xinput, post$yprod, x.post, rho=rho, k=best_kn.1) # Representation plot(yprod~xinput, data=post, xlab="Quantity of labor", ylab="Volume of delivered mail") lines(x.post, res.pick.1[,1], lty=1, col="cyan") lines(x.post, res.pick.1[,2], lty=3, col="magenta") lines(x.post, res.pick.1[,3], lty=3, col="magenta") ## Not run: # 2. rho[x] is unknown and estimated by # the Pickands estimator hat(rho[x]) rho_pick<-rho_momt_pick(post$xinput, post$yprod, x.post, method="pickands") best_kn.2<-kopt_momt_pick(post$xinput, post$yprod, x.post, method="pickands", rho=rho_pick) res.pick.2<-dfs_pick(post$xinput, post$yprod, x.post, rho=rho_pick, k=best_kn.2) # 3. rho[x] is unknown independent of x and estimated # by the (trimmed) mean of hat(rho[x]) rho_trimmean<-mean(rho_pick, trim=0.00) best_kn.3<-kopt_momt_pick(post$xinput, post$yprod, x.post, rho=rho_trimmean, method="pickands") res.pick.3<-dfs_pick(post$xinput, post$yprod, x.post, rho=rho_trimmean, k=best_kn.3) # Representation plot(yprod~xinput, data=post, col="grey", xlab="Quantity of labor", ylab="Volume of delivered mail") lines(x.post, res.pick.2[,1], lty=1, lwd=2, col="cyan") lines(x.post, res.pick.2[,2], lty=3, lwd=4, col="magenta") lines(x.post, res.pick.2[,3], lty=3, lwd=4, col="magenta") plot(yprod~xinput, data=post, col="grey", xlab="Quantity of labor", ylab="Volume of delivered mail") lines(x.post, res.pick.3[,1], lty=1, lwd=2, col="cyan") lines(x.post, res.pick.3[,2], lty=3, lwd=4, col="magenta") lines(x.post, res.pick.3[,3], lty=3, lwd=4, col="magenta") ## End(Not run)
data("post") x.post<- seq(post$xinput[100],max(post$xinput), length.out=100) # 1. When rho[x] is known and equal to 2, we set: rho<-2 # To determine the sample fraction k=k[n](x) # in hat(varphi[pick])(x). best_kn.1<-kopt_momt_pick(post$xinput, post$yprod, x.post, method="pickands", rho=rho) # To compute the frontier estimates and confidence intervals: res.pick.1<-dfs_pick(post$xinput, post$yprod, x.post, rho=rho, k=best_kn.1) # Representation plot(yprod~xinput, data=post, xlab="Quantity of labor", ylab="Volume of delivered mail") lines(x.post, res.pick.1[,1], lty=1, col="cyan") lines(x.post, res.pick.1[,2], lty=3, col="magenta") lines(x.post, res.pick.1[,3], lty=3, col="magenta") ## Not run: # 2. rho[x] is unknown and estimated by # the Pickands estimator hat(rho[x]) rho_pick<-rho_momt_pick(post$xinput, post$yprod, x.post, method="pickands") best_kn.2<-kopt_momt_pick(post$xinput, post$yprod, x.post, method="pickands", rho=rho_pick) res.pick.2<-dfs_pick(post$xinput, post$yprod, x.post, rho=rho_pick, k=best_kn.2) # 3. rho[x] is unknown independent of x and estimated # by the (trimmed) mean of hat(rho[x]) rho_trimmean<-mean(rho_pick, trim=0.00) best_kn.3<-kopt_momt_pick(post$xinput, post$yprod, x.post, rho=rho_trimmean, method="pickands") res.pick.3<-dfs_pick(post$xinput, post$yprod, x.post, rho=rho_trimmean, k=best_kn.3) # Representation plot(yprod~xinput, data=post, col="grey", xlab="Quantity of labor", ylab="Volume of delivered mail") lines(x.post, res.pick.2[,1], lty=1, lwd=2, col="cyan") lines(x.post, res.pick.2[,2], lty=3, lwd=4, col="magenta") lines(x.post, res.pick.2[,3], lty=3, lwd=4, col="magenta") plot(yprod~xinput, data=post, col="grey", xlab="Quantity of labor", ylab="Volume of delivered mail") lines(x.post, res.pick.3[,1], lty=1, lwd=2, col="cyan") lines(x.post, res.pick.3[,2], lty=3, lwd=4, col="magenta") lines(x.post, res.pick.3[,3], lty=3, lwd=4, col="magenta") ## End(Not run)
This function is an implementation of the probability-weighted moment frontier estimator developed by Daouia, Florens and Simar (2012).
dfs_pwm(xtab, ytab, x, coefm, a = 2, rho, ci = TRUE)
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
coefm | a tuning parameter (integer) larger than or equal to 1.
a | a smoothing parameter (integer) larger than or equal to 2.
rho | a numeric vector of the same length as x, or a scalar, giving the value(s) of the tail index $\rho_x$.
ci | a boolean, TRUE for computing the confidence interval.
The regularized frontier estimator introduced by Daouia et al. (2012) is based on the unregularized probability-weighted moment estimator $\hat\varphi_{pwm}(x)$, where the trimming order $m=m_n$ is an integer such that $m_n\to\infty$ as $n\to\infty$ and $m_n/n\to 0$. The implemented (regularized) estimator of $\varphi(x)$ is then a bias-corrected version of $\hat\varphi_{pwm}(x)$, with the bias correction involving the tail index $\rho_x$ and a second probability-weighted moment whose trimming order is inflated by the fixed integer factor $a\ge 2$. If the true tail index $\rho_x$ is known, its value is simply plugged into the expressions above. The two smoothing parameters coefm and a have to be fixed by the user in the 4th and 5th arguments of the function.
The pointwise $95\%$ confidence interval of $\varphi(x)$ derived from the asymptotic normality of the regularized estimator is returned along with the point estimates when ci=TRUE. Note that the standard deviation of the bias-corrected estimator is adjusted by a bootstrap estimator in the numerical illustrations of Daouia et al. (2012), whereas the exact asymptotic estimate is utilized in the implemented function.
A practical choice of $m$ that Daouia et al. (2012) have employed is the simple rule of thumb $m=\mathrm{coefm}\times N_x^{1/3}$, where $N_x=\sum_{i=1}^n 1_{\{x_i\le x\}}$, and the integer coefm as well as the second smoothing parameter a are to be tuned by the user to avoid numerical instabilities in the pointwise estimates of the tail index and the frontier function. The user may start with the values coefm=5 and a=2 [respectively, coefm=10 and a=20] for computing the frontier estimator [respectively, the tail-index estimator implemented in the function rho_pwm]. Note that tail-index estimation and frontier estimation are conducted separately.
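The rule of thumb for the trimming order translates into one line of code; whether the integer part is taken by truncation or rounding is an implementation detail, so ceiling() below is an illustrative choice (x0 is a hypothetical evaluation point).

# m = coefm * N_x^(1/3), with N_x the number of observations with x_i <= x0
m_rule <- function(xtab, x0, coefm = 5) {
  ceiling(coefm * sum(xtab <= x0)^(1/3))
}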
Returns a numeric vector with the same length as x.
The computational burden here is demanding, so be forewarned.
Abdelaati Daouia and Thibault Laurent (converted from Abdelaati Daouia's Matlab code).
Daouia, A., Florens, J.-P. and Simar, L. (2012). Regularization of Nonparametric Frontier Estimators. Journal of Econometrics, 168, 285-299.
data("post") x.post<- seq(post$xinput[100],max(post$xinput), length.out=100) ## Not run: # 1. When rho[x] is known and equal to 2, we set: rho<-2 res.pwm.1<- dfs_pwm(post$xinput, post$yprod, x.post, coefm=5, a=2, rho, ci=TRUE) # 2. When rho[x] is unknown and dependent of x, # its estimate hat(rho[x]) is obtained via: rho_pwm <- rho_pwm(post$xinput, post$yprod, x.post, coefm=10, a=20) # and the corresponding frontier estimator via: res.pwm.2<- dfs_pwm(post$xinput, post$yprod, x.post, coefm=5, a=2, rho_pwm, ci=TRUE) # 3. When rho[x] is unknown but independent of x, # a robust estimation strategy is by using the (trimmed) mean # over the estimates hat(rho[x]): rho_trimmean<-mean(rho_pwm, trim=0.00) res.pwm.3<- dfs_pwm(post$xinput, post$yprod, x.post, coefm=5, a=2, rho_trimmean, ci=TRUE) ## End(Not run)
data("post") x.post<- seq(post$xinput[100],max(post$xinput), length.out=100) ## Not run: # 1. When rho[x] is known and equal to 2, we set: rho<-2 res.pwm.1<- dfs_pwm(post$xinput, post$yprod, x.post, coefm=5, a=2, rho, ci=TRUE) # 2. When rho[x] is unknown and dependent of x, # its estimate hat(rho[x]) is obtained via: rho_pwm <- rho_pwm(post$xinput, post$yprod, x.post, coefm=10, a=20) # and the corresponding frontier estimator via: res.pwm.2<- dfs_pwm(post$xinput, post$yprod, x.post, coefm=5, a=2, rho_pwm, ci=TRUE) # 3. When rho[x] is unknown but independent of x, # a robust estimation strategy is by using the (trimmed) mean # over the estimates hat(rho[x]): rho_trimmean<-mean(rho_pwm, trim=0.00) res.pwm.3<- dfs_pwm(post$xinput, post$yprod, x.post, coefm=5, a=2, rho_trimmean, ci=TRUE) ## End(Not run)
The dataset consists of 123 American electric utility companies. As in the set-up of Gijbels et al. (1999), we use the measurements of the variables $y_i$ and $x_i$, where $y_i$ is the production output of the company $i$ and $x_i$ is the total cost involved in the production. For a detailed description and analysis of these data see, e.g., Christensen and Greene (1976) and Greene (1990).
data(green)
A data frame with 123 observations on the following 2 variables.
COST | a numeric vector.
OUTPUT | a numeric vector.
Gijbels et al. (1999).
Christensen, L.R. and Greene, W.H. (1976). Economies of Scale in U.S. Electric Power Generation, Journal of Political Economy, University of Chicago Press, 84, 655-76.
Gijbels, I., Mammen, E., Park, B.U. and Simar, L. (1999). On estimation of monotone and concave frontier functions. Journal of American Statistical Association, 94, 220-228.
Greene, W.H. (1990). A Gamma-distributed stochastic frontier model, Journal of Econometrics, 46, 141-163.
data("green")
data("green")
The function kern_smooth implements two frontier estimators based on kernel smoothing techniques. One is from Noh (2014) and the other is from Parmeter and Racine (2013).
kern_smooth(xtab, ytab, x, h, method = "u", technique = "noh", control = list("tm_limit" = 700))
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
h | determines the bandwidth at which the smoothed kernel estimate will be computed.
method | a character equal to "u" (unconstrained estimator), "m" (under the monotonicity constraint) or "mc" (under simultaneous monotonicity and concavity constraints).
technique | which estimation method to use: "noh" specifies the use of the method in Noh (2014) and "pr" is for the method in Parmeter and Racine (2013).
control | a list of parameters passed to the GLPK solver. See *Details* of help(Rglpk_solve_LP).
To estimate the frontier function, Parmeter and Racine (2013) considered the following generalization of linear regression smoothers: $\hat\varphi(x)=\sum_{i=1}^n p_i A_i(x)\,y_i$, where $A_i(x)$ is the kernel weight function of $x$ for the $i$th data point, depending on the $x_i$'s and on the sort of linear smoother. For example, the Nadaraya-Watson kernel weights are $A_i(x)=K_h(x-x_i)\big/\sum_{j=1}^n K_h(x-x_j)$, where $K_h(u)=K(u/h)/h$, with the kernel function $K$ being a bounded and symmetric probability density, and $h$ is a bandwidth. Then, the weight vector $p=(p_1,\ldots,p_n)^T$ is chosen to minimize the distance $D(p)=(p-p_u)^T(p-p_u)$ subject to the envelopment constraints and the choice of the shape constraints, where $p_u$ is an $n$-dimensional vector with all elements being one. The envelopment and shape constraints are $\hat\varphi(x_i)\ge y_i$ for all $i$ (envelopment), $\hat\varphi^{(1)}(x)\ge 0$ for $x\in\mathcal{M}$ (monotonicity) and $\hat\varphi^{(2)}(x)\le 0$ for $x\in\mathcal{C}$ (concavity), where $\hat\varphi^{(s)}$ is the $s$th derivative of $\hat\varphi$, with $\mathcal{M}$ and $\mathcal{C}$ being the collections of points where monotonicity and concavity are imposed, respectively. In our implementation of the estimator, we simply take the entire dataset to be $\mathcal{M}$ and $\mathcal{C}$ and, in case of small samples, we augment the sample points by an equispaced grid of length 201 over the observed support $[\min_i x_i,\max_i x_i]$ of $x$. For the weight $A_i(x)$, we use the Nadaraya-Watson weights.
Noh (2014) considered the same generalization of linear smoothers for frontier estimation, but with a different choice of the weight vector $p$. Using the same envelopment and shape constraints as Parmeter and Racine (2013), the weight vector $p$ is chosen to minimize the area under the fitted curve $\hat\varphi(x)$, that is $\int_a^b\hat\varphi(x)\,dx=\sum_{i=1}^n p_i y_i\int_a^b A_i(x)\,dx$, where $[a,b]$ is the true support of $x$. In practice, we integrate over the observed support since the true one is unknown. In what concerns the kernel weights $A_i(x)$, we use the Priestley-Chao weights $A_i(x)=(x_i-x_{i-1})\,K_h(x-x_i)$, where it is assumed that the pairs $(x_i,y_i)$ have been ordered so that $x_1\le\cdots\le x_n$. The choice of such weights is motivated by their convenience for the evaluation of the integral $\int_a^b A_i(x)\,dx$.
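For concreteness, both weight schemes can be computed as follows with a Gaussian kernel (an illustrative choice of $K$; the package may use a different kernel internally):

# Nadaraya-Watson weights A_i(x), normalized to sum to one
nw_weights <- function(x, xtab, h) {
  w <- dnorm((x - xtab) / h)
  w / sum(w)
}
# Priestley-Chao weights A_i(x) = (x_i - x_{i-1}) K_h(x - x_i),
# with the data ordered in x; the first spacing is set to zero here
pc_weights <- function(x, xtab, h) {
  xs <- sort(xtab)
  sp <- c(0, diff(xs))
  sp * dnorm((x - xs) / h) / h
}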
Returns a numeric vector with the same length as x. Returns a vector of NA if no solution has been found by the solver (GLPK).
Hohsuk Noh
Noh, H. (2014). Frontier estimation using kernel smoothing estimators with data transformation. Journal of the Korean Statistical Society, 43, 503-512.
Parmeter, C.F. and Racine, J.S. (2013). Smooth constrained frontier analysis in Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis, Springer-Verlag, New York, 463-488.
## Not run: data("green") x.green <- seq(min(log(green$COST)), max(log(green$COST)), length.out = 101) options(np.tree=TRUE, crs.messages=FALSE, np.messages=FALSE) # 1. Unconstrained (h.bic.green.u <- kern_smooth_bw(log(green$COST), log(green$OUTPUT), method = "u", technique = "noh", bw_method = "bic")) y.ks.green.u <- kern_smooth(log(green$COST), log(green$OUTPUT), x.green, h = h.bic.green.u, method = "u", technique = "noh") # 2. Monotonicity constraint (h.bic.green.m <- kern_smooth_bw(log(green$COST), log(green$OUTPUT), method = "m", technique = "noh", bw_method = "bic")) y.ks.green.m <- kern_smooth(log(green$COST), log(green$OUTPUT), x.green, h = h.bic.green.m, method = "m", technique = "noh") # 3. Monotonicity and Concavity constraints (h.bic.green.mc<-kern_smooth_bw(log(green$COST), log(green$OUTPUT), method="mc", technique="noh", bw_method="bic")) y.ks.green.mc<-kern_smooth(log(green$COST), log(green$OUTPUT), x.green, h=h.bic.green.mc, method="mc", technique="noh") # Representation plot(log(OUTPUT)~log(COST), data=green, xlab="log(COST)", ylab="log(OUTPUT)") lines(x.green, y.ks.green.u, lty=1, lwd=4, col="green") lines(x.green, y.ks.green.m, lty=2, lwd=4, col="cyan") lines(x.green, y.ks.green.mc, lty=3, lwd=4, col="magenta") legend("topleft", col=c("green","cyan","magenta"), lty=c(1,2,3), legend=c("unconstrained", "monotone", "monotone + concave"), lwd=4, cex=0.8) ## End(Not run)
## Not run: data("green") x.green <- seq(min(log(green$COST)), max(log(green$COST)), length.out = 101) options(np.tree=TRUE, crs.messages=FALSE, np.messages=FALSE) # 1. Unconstrained (h.bic.green.u <- kern_smooth_bw(log(green$COST), log(green$OUTPUT), method = "u", technique = "noh", bw_method = "bic")) y.ks.green.u <- kern_smooth(log(green$COST), log(green$OUTPUT), x.green, h = h.bic.green.u, method = "u", technique = "noh") # 2. Monotonicity constraint (h.bic.green.m <- kern_smooth_bw(log(green$COST), log(green$OUTPUT), method = "m", technique = "noh", bw_method = "bic")) y.ks.green.m <- kern_smooth(log(green$COST), log(green$OUTPUT), x.green, h = h.bic.green.m, method = "m", technique = "noh") # 3. Monotonicity and Concavity constraints (h.bic.green.mc<-kern_smooth_bw(log(green$COST), log(green$OUTPUT), method="mc", technique="noh", bw_method="bic")) y.ks.green.mc<-kern_smooth(log(green$COST), log(green$OUTPUT), x.green, h=h.bic.green.mc, method="mc", technique="noh") # Representation plot(log(OUTPUT)~log(COST), data=green, xlab="log(COST)", ylab="log(OUTPUT)") lines(x.green, y.ks.green.u, lty=1, lwd=4, col="green") lines(x.green, y.ks.green.m, lty=2, lwd=4, col="cyan") lines(x.green, y.ks.green.mc, lty=3, lwd=4, col="magenta") legend("topleft", col=c("green","cyan","magenta"), lty=c(1,2,3), legend=c("unconstrained", "monotone", "monotone + concave"), lwd=4, cex=0.8) ## End(Not run)
The function kern_smooth_bw provides two bandwidth selection methods. One is the least squares cross-validation developed by Parmeter and Racine (2013). The other is the BIC developed in Noh (2014).
kern_smooth_bw(xtab, ytab, method = "u", technique = "noh", bw_method = "bic", control = list("tm_limit" = 700))
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
method | a character equal to "u" (unconstrained estimator), "m" (under the monotonicity constraint) or "mc" (under simultaneous monotonicity and concavity constraints).
technique | which estimation technique to use: "noh" specifies the use of the method in Noh (2014), while "pr" is for the method in Parmeter and Racine (2013).
bw_method | which bandwidth selection method to use: "cv" returns the bandwidth that minimizes the least squares cross-validation criterion, and "bic" returns the bandwidth minimizing the BIC.
control | a list of parameters passed to the GLPK solver. See *Details* of help(Rglpk_solve_LP).
As with any smoothing technique, bandwidth selection is critical to the quality of the frontier estimator. Parmeter and Racine (2013)'s recommendation is to use the least squares cross-validation method implemented with bw_method="cv" in the function kern_smooth_bw.
Instead, Noh (2014) proposed to select the bandwidth which minimizes the following BIC-type criterion: $BIC(h)=\log\Big(\sum_{i=1}^n\big(\hat\varphi(x_i)-y_i\big)\Big)+\frac{\log n\cdot tr(S(h))}{2n}$, where $\hat p(h)$ is the chosen weight vector associated to the bandwidth $h$, and $tr(S(h))$ is the trace of the smoothing matrix $S(h)$ defined through $\hat\varphi(x_i)=\sum_{j=1}^n \hat p_j(h)A_j(x_i)\,y_j$. The function kern_smooth_bw computes the optimal bandwidth from this criterion with option bw_method="bic".
Returns an optimal bandwidth depending on the specified selection method.
Hohsuk Noh
Noh, H. (2014). Frontier estimation using kernel smoothing estimators with data transformation. Journal of the Korean Statistical Society, 43, 503-512.
Parmeter, C.F. and Racine, J.S. (2013). Smooth constrained frontier analysis in Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis, Springer-Verlag, New York, 463-488.
## Not run: data("green") x.green <- seq(min(log(green$COST)), max(log(green$COST)),length.out=101) options(np.tree=TRUE,crs.messages=FALSE,np.messages=FALSE) h.pr.green.m<-kern_smooth_bw(log(green$COST),log(green$OUTPUT), method="m", technique="pr", bw_method="cv") h.noh.green.m<-kern_smooth_bw(log(green$COST),log(green$OUTPUT), method="m", technique="noh", bw_method="bic") y.pr.green.m<-kern_smooth(log(green$COST),log(green$OUTPUT), x.green, h=h.pr.green.m, method="m", technique="pr") y.noh.green.m<-kern_smooth(log(green$COST),log(green$OUTPUT), x.green, h=h.noh.green.m, method="m", technique="noh") plot(log(OUTPUT)~log(COST), data=green, xlab="log(COST)",ylab="log(OUTPUT)") lines(x.green, y.pr.green.m, lwd=4, lty=3, col="red") lines(x.green, y.noh.green.m, lwd=4, lty=3, col="blue") legend("topleft", col=c("blue","red"),lty=3, legend=c("noh","pr"), lwd=4, cex=0.8) ## End(Not run)
## Not run: data("green") x.green <- seq(min(log(green$COST)), max(log(green$COST)),length.out=101) options(np.tree=TRUE,crs.messages=FALSE,np.messages=FALSE) h.pr.green.m<-kern_smooth_bw(log(green$COST),log(green$OUTPUT), method="m", technique="pr", bw_method="cv") h.noh.green.m<-kern_smooth_bw(log(green$COST),log(green$OUTPUT), method="m", technique="noh", bw_method="bic") y.pr.green.m<-kern_smooth(log(green$COST),log(green$OUTPUT), x.green, h=h.pr.green.m, method="m", technique="pr") y.noh.green.m<-kern_smooth(log(green$COST),log(green$OUTPUT), x.green, h=h.noh.green.m, method="m", technique="noh") plot(log(OUTPUT)~log(COST), data=green, xlab="log(COST)",ylab="log(OUTPUT)") lines(x.green, y.pr.green.m, lwd=4, lty=3, col="red") lines(x.green, y.noh.green.m, lwd=4, lty=3, col="blue") legend("topleft", col=c("blue","red"),lty=3, legend=c("noh","pr"), lwd=4, cex=0.8) ## End(Not run)
Optimal $k$ in moment and Pickands frontier estimators
This function gives the optimal sample fraction $k$ in the moment and Pickands-type estimators introduced by Daouia, Florens and Simar (2010).
kopt_momt_pick(xtab, ytab, x, rho, method = "moment", wind.coef = 0.1)
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
rho | a numeric vector of the same length as x, or a scalar, giving the value(s) of the tail index $\rho_x$.
method | a character equal to "moment" or "pickands".
wind.coef | a scalar coefficient to be selected in the interval (0,1].
This function is an implementation of an experimental method by Daouia et al. (2010) for the automated threshold selection (choice of $k=k_n(x)$) for the moment frontier estimator $\tilde\varphi_{momt}(x)$ [see dfs_momt] in case method="moment" and for the Pickands frontier estimator $\hat\varphi_{pick}(x)$ [see dfs_pick] in case method="pickands". The idea is to select first (for each $x$) a grid of values for the sample fraction $k_n(x)$ given by $k=1,\ldots,[\sqrt{N_x}]$, where $[\sqrt{N_x}]$ stands for the integer part of $\sqrt{N_x}$ with $N_x=\sum_{i=1}^n 1_{\{x_i\le x\}}$, and then select the $k$ where the variation of the results is the smallest. To achieve this here, Daouia et al. (2010) compute the standard deviations of $\tilde\varphi_{momt}(x)$ [option method="moment"] or $\hat\varphi_{pick}(x)$ [option method="pickands"] over a moving “window” of successive values of $k$, where the coefficient wind.coef should be selected in the interval (0,1] in such a way as to avoid numerical instabilities. The default option wind.coef=0.1 corresponds to having a window large enough to cover around 10% of the possible values of $k$ in the selected range of values for $k$. The value of $k$ where the standard deviation is minimal defines the desired sample fraction $k_n(x)$.
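The stability rule can be sketched as follows: given the frontier estimates over the grid of sample fractions, slide a window across the grid and keep the k opening the window with the smallest standard deviation. The window size below is an illustrative choice, not the package's exact formula, and est_k stands for any vector of estimates indexed by k (hypothetical input).

select_k <- function(est_k, wind.coef = 0.1) {
  K <- length(est_k)                 # grid k = 1, ..., K with K = floor(sqrt(N_x))
  w <- max(3, floor(wind.coef * K))  # window size (illustrative)
  sds <- sapply(1:(K - w + 1), function(j) sd(est_k[j:(j + w - 1)]))
  which.min(sds)                     # k at which the estimates are most stable
}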
Returns a numeric vector with the same length as x.
In order to choose a reasonable estimate $\tilde\varphi_{momt}(x)$ [see dfs_momt] or $\hat\varphi_{pick}(x)$ [see dfs_pick] of the frontier function $\varphi(x)$, for each fixed $x$, one can construct the plot of the estimator of interest, consisting of the points $\{(k,\tilde\varphi_{momt}(x))\}_k$ or $\{(k,\hat\varphi_{pick}(x))\}_k$, and select a value of the estimate at which the obtained graph looks stable. It is this kind of idea which guides the proposed automatic data-driven rule for a chosen grid of values of $k$. The main difficulty with such a method is that the plots of $\tilde\varphi_{momt}(x)$ or $\hat\varphi_{pick}(x)$ as functions of $k$, for each $x$, may be so unstable that reasonable values of $k$ [which would correspond to the true value of $\varphi(x)$] may be hidden in the graphs. As a result, the obtained frontier estimator may exhibit considerable volatility as a function of $x$. One way to avoid such instabilities is by tuning the choice of the parameter wind.coef in the interval (0,1]. Note that the default value is wind.coef=0.1.
The user can also improve appreciably the estimation of $\varphi(x)$ by refining the estimation of the extreme-value index $\rho_x$ (see rho_momt_pick for details).
Abdelaati Daouia and Thibault Laurent (converted from Leopold Simar's Matlab code).
Daouia, A., Florens, J.P. and Simar, L. (2010). Frontier Estimation and Extreme Value Theory, Bernoulli, 16, 1039-1063.
Dekkers, A.L.M., Einmahl, J.H.J. and L. de Haan (1989), A moment estimator for the index of an extreme-value distribution, Annals of Statistics, 17, 1833-1855.
data("post") x.post<- seq(post$xinput[100],max(post$xinput), length.out=100) # When rho[x] is known and equal to 2, we set: rho<-2 # a. Optimal k in Pickands frontier estimators best_kn.pick<-kopt_momt_pick(post$xinput, post$yprod, x.post, method="pickands", rho=rho) # b. Optimal k in moment frontier estimators ## Not run: best_kn.momt<-kopt_momt_pick(post$xinput, post$yprod, x.post, rho=rho) ## End(Not run)
data("post") x.post<- seq(post$xinput[100],max(post$xinput), length.out=100) # When rho[x] is known and equal to 2, we set: rho<-2 # a. Optimal k in Pickands frontier estimators best_kn.pick<-kopt_momt_pick(post$xinput, post$yprod, x.post, method="pickands", rho=rho) # b. Optimal k in moment frontier estimators ## Not run: best_kn.momt<-kopt_momt_pick(post$xinput, post$yprod, x.post, rho=rho) ## End(Not run)
Computes the local linear smoothing frontier estimator of Hall, Park and Stern (1998) and Hall and Park (2004).
loc_est(xtab, ytab, x, h, method = "u", control = list("tm_limit" = 700))
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
h | determines the bandwidth at which the local linear estimate will be computed.
method | a character equal to "u" (unconstrained estimator) or "m" (improved version of the unconstrained estimator).
control | a list of parameters passed to the GLPK solver. See *Details* of help(Rglpk_solve_LP).
In the unconstrained case (option method="u"), the implemented estimator of $\varphi(x)$ is defined by $\hat\varphi_n(x)=\min\{z:\ \text{there exists}\ \theta\ \text{such that}\ y_i\le z+\theta(x_i-x)\ \text{for all}\ i\ \text{with}\ x_i\in(x-h,x+h)\}$, where the bandwidth $h$ has to be fixed by the user in the 4th argument of the function. This estimator may lack smoothness in case of small samples and has no guarantee of being monotone even if the true frontier is so. Following the curvature of the monotone frontier $\varphi$, the unconstrained estimator $\hat\varphi_n$ is likely to exhibit substantial bias, especially at the sample boundaries (see Daouia et al. (2016) for numerical illustrations). A simple way to remedy this drawback is by imposing the extra condition $\theta\ge 0$ in the definition of $\hat\varphi_n(x)$ to get the improved fit $\tilde\varphi_n(x)$. As shown in Daouia et al. (2016), this version only reduces the vexing bias and border defects of the original estimator when the true frontier is monotone. The option method="m" indicates that the improved fit $\tilde\varphi_n$ should be utilized in place of $\hat\varphi_n$. Hall and Park (2004) proposed a bootstrap procedure for selecting the optimal bandwidth $h$ in $\hat\varphi_n$ and $\tilde\varphi_n$ (see the function loc_est_bw).
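A per-evaluation-point sketch of the unconstrained fit as a tiny linear program, assuming the Rglpk package; this illustrates the definition above and is not the package's internal code. Note the program is bounded only when the window around x0 contains data on both sides.

library(Rglpk)
# Local linear frontier at a single point x0: minimize the local intercept z
# subject to the cover constraints y_i <= z + theta * (x_i - x0)
loc_lin_frontier <- function(xtab, ytab, x0, h) {
  in_win <- abs(xtab - x0) < h
  xi <- xtab[in_win] - x0
  yi <- ytab[in_win]
  sol <- Rglpk_solve_LP(obj = c(1, 0),            # variables (z, theta)
                        mat = cbind(1, xi),
                        dir = rep(">=", length(yi)),
                        rhs = yi,
                        bounds = list(lower = list(ind = 1:2,
                                                   val = rep(-Inf, 2))))
  sol$solution[1]  # the boundary estimate at x0
}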
Returns a numeric vector with the same length as x. Returns a vector of NA if no solution has been found by the solver (GLPK).
Hohsuk Noh.
Daouia, A., Noh, H. and Park, B.U. (2016). Data Envelope fitting with constrained polynomial splines. Journal of the Royal Statistical Society: Series B, 78(1), 3-30. doi:10.1111/rssb.12098.
Hall, P. and Park, B.U. (2004). Bandwidth choice for local polynomial estimation of smooth boundaries. Journal of Multivariate Analysis, 91, 240-261.
Hall, P., Park, B.U. and Stern, S.E. (1998). On polynomial estimators of frontiers and boundaries. Journal of Multivariate Analysis, 66, 71-98.
data("nuclear") x.nucl <- seq(min(nuclear$xtab), max(nuclear$xtab), length.out=101) # 1. Unconstrained estimator # Optimal bandwidths over 100 bootstrap replications ## Not run: h.nucl.u <- loc_est_bw(nuclear$xtab, nuclear$ytab, x.nucl, h=40, B=100, method="u") ## End(Not run) (h.nucl.u<-79.11877) y.nucl.u<-loc_est(nuclear$xtab, nuclear$ytab, x.nucl, h=h.nucl.u, method="u") # 2. improved version of the estimator # Optimal bandwidths over 100 bootstrap replications ## Not run: h.nucl.m <- loc_est_bw(nuclear$xtab, nuclear$ytab, x.nucl, h=40, B=100, method="m") ## End(Not run) (h.nucl.m<-79.12) y.nucl.m<-loc_est(nuclear$xtab, nuclear$ytab, x.nucl, h=h.nucl.m, method="m") # 3. Representation plot(x.nucl, y.nucl.u, lty=1, lwd=4, col="magenta", type="l") lines(x.nucl, y.nucl.m, lty=2, lwd=4, col="cyan") points(ytab~xtab, data=nuclear) legend("topleft",legend=c("unconstrained", "improved"), col=c("magenta","cyan"), lwd=4, lty=c(1,2))
data("nuclear") x.nucl <- seq(min(nuclear$xtab), max(nuclear$xtab), length.out=101) # 1. Unconstrained estimator # Optimal bandwidths over 100 bootstrap replications ## Not run: h.nucl.u <- loc_est_bw(nuclear$xtab, nuclear$ytab, x.nucl, h=40, B=100, method="u") ## End(Not run) (h.nucl.u<-79.11877) y.nucl.u<-loc_est(nuclear$xtab, nuclear$ytab, x.nucl, h=h.nucl.u, method="u") # 2. improved version of the estimator # Optimal bandwidths over 100 bootstrap replications ## Not run: h.nucl.m <- loc_est_bw(nuclear$xtab, nuclear$ytab, x.nucl, h=40, B=100, method="m") ## End(Not run) (h.nucl.m<-79.12) y.nucl.m<-loc_est(nuclear$xtab, nuclear$ytab, x.nucl, h=h.nucl.m, method="m") # 3. Representation plot(x.nucl, y.nucl.u, lty=1, lwd=4, col="magenta", type="l") lines(x.nucl, y.nucl.m, lty=2, lwd=4, col="cyan") points(ytab~xtab, data=nuclear) legend("topleft",legend=c("unconstrained", "improved"), col=c("magenta","cyan"), lwd=4, lty=c(1,2))
Computes the optimal bootstrap bandwidth proposed by Hall and Park (2004) for the local linear frontier estimator.
loc_est_bw(xtab, ytab, x, hini, B = 5, method = "u", fix.seed = FALSE, control = list("tm_limit" = 700))
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
hini | the initial bandwidth at which the local linear estimate will be computed.
B | number of bootstrap replications.
method | a character equal to "u" (unconstrained estimator) or "m" (improved version of the unconstrained estimator).
fix.seed | a boolean equal to TRUE if the seed of the bootstrap sampling is to be fixed (FALSE by default).
control | a list of parameters passed to the GLPK solver. See *Details* of help(Rglpk_solve_LP).
For a detailed description of the bootstrap procedure, see Hall and Park (2004).
Returns the optimal bootstrap bandwidth.
The computational burden here is very demanding, so be forewarned.
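Since each bootstrap replication solves a sequence of linear programs, a practical compromise is a short, reproducible trial run before committing to a large B; the values below are illustrative assumptions, not recommendations.
data("nuclear")
x.nucl <- seq(min(nuclear$xtab), max(nuclear$xtab), length.out = 101)
# quick reproducible trial with few replications before running B = 100
h.try <- loc_est_bw(nuclear$xtab, nuclear$ytab, x.nucl,
                    hini = 40, B = 5, method = "u", fix.seed = TRUE)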
Hohsuk Noh.
Hall, P. and Park, B.U. (2004). Bandwidth choice for local polynomial estimation of smooth boundaries. Journal of Multivariate Analysis, 91, 240-261.
## Not run: data("nuclear") x.nucl <- seq(min(nuclear$xtab), max(nuclear$xtab), length.out = 101) # 1. Unconstrained case # Optimal bandwidths over 100 bootstrap replications system.time( h.nucl.u <- loc_est_bw(nuclear$xtab, nuclear$ytab, x.nucl, hini = 40, B = 1, method = "u") ) # result is 79.11877 # 2. Monotonicity constraint # Optimal bandwidths over 100 bootstrap replications h.nucl.m <- loc_est_bw(nuclear$xtab, nuclear$ytab, x.nucl, hini = 40, B = 100, method = "m") # result is 79.12 ## End(Not run)
## Not run: data("nuclear") x.nucl <- seq(min(nuclear$xtab), max(nuclear$xtab), length.out = 101) # 1. Unconstrained case # Optimal bandwidths over 100 bootstrap replications system.time( h.nucl.u <- loc_est_bw(nuclear$xtab, nuclear$ytab, x.nucl, hini = 40, B = 1, method = "u") ) # result is 79.11877 # 2. Monotonicity constraint # Optimal bandwidths over 100 bootstrap replications h.nucl.m <- loc_est_bw(nuclear$xtab, nuclear$ytab, x.nucl, hini = 40, B = 100, method = "m") # result is 79.12 ## End(Not run)
Computes the local constant and local DEA boundary estimates proposed by Gijbels and Peng (2000).
loc_max(xtab, ytab, x, h, type="one-stage")
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
h | determines the bandwidth at which the estimate will be computed.
type | a character equal to "one-stage" or "two-stage" (see Details).
When estimating $\varphi(x)$, for a given point $x$, the methodology of Gijbels and Peng consists of considering a strip around $x$ of width $2h$, where $h = h_n \to 0$ with $nh_n \to \infty$ as $n \to \infty$, and focusing then on the $y_i$ observations falling into this strip. More precisely, they consider the transformed variables $z^{xh}_i = y_i\,\mathbf{1}(|x_i - x| \le h)$, $i = 1, \ldots, n$, and the corresponding order statistics $z^{xh}_{(1)} \le \cdots \le z^{xh}_{(n)}$.
The simple maximum $z^{xh}_{(n)} = \max_{i:\,|x_i - x| \le h}\, y_i$ then defines the local constant estimator of the frontier point $\varphi(x)$ [option type="one-stage"].
This opens the way to a two-stage estimation procedure as follows. In a first stage, Gijbels and Peng calculate the maximum $z^{xh}_{(n)}$. Then, they suggest replacing each observation $y_i$ in the strip of width $2h$ around $x$ by this maximum, leaving all observations outside the strip unchanged. More precisely, they define $\tilde y_i = z^{xh}_{(n)}$ if $|x_i - x| \le h$ and $\tilde y_i = y_i$ otherwise. Then, they apply the DEA estimator (see the function dea_est) to these transformed data $(x_i, \tilde y_i)$, giving the local DEA estimator (option type="two-stage").
An ad hoc way of selecting $h$ is by using, for instance, the function npcdistbw from the np package (see Daouia et al. (2017) for details).
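A hedged sketch of that bandwidth selection follows; it assumes the np package is installed and that the x-component (xbw) of the returned conditional-distribution bandwidth object is a sensible choice for h.
data("green")
library(np)
# conditional-distribution bandwidth as an ad hoc choice of h (assumption)
bw <- npcdistbw(xdat = log(green$COST), ydat = log(green$OUTPUT))
x.green <- seq(min(log(green$COST)), max(log(green$COST)), length.out = 101)
y.locmax <- loc_max(log(green$COST), log(green$OUTPUT), x.green,
                    h = bw$xbw, type = "two-stage")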
Returns a numeric vector with the same length as x.
Abdelaati Daouia and Thibault Laurent.
Daouia, A., Laurent, T. and Noh, H. (2017). npbr: A Package for Nonparametric Boundary Regression in R. Journal of Statistical Software, 79(9), 1-43. doi:10.18637/jss.v079.i09.
Gijbels, I. and Peng, L. (2000). Estimation of a support curve via order statistics, Extremes, 3, 251–277.
data("green") x.green <- seq(min(log(green$COST)), max(log(green$COST)), length.out=101) # Local maximum frontier estimates # a. Local constant estimator loc_max_1stage<-loc_max(log(green$COST), log(green$OUTPUT), x.green, h=0.5, type="one-stage") # b. Local DEA estimator loc_max_2stage<-loc_max(log(green$COST), log(green$OUTPUT), x.green, h=0.5, type="two-stage") # Representation plot(log(OUTPUT)~log(COST), data=green) lines(x.green, loc_max_1stage, lty=1, col="magenta") lines(x.green, loc_max_2stage, lty=2, col="cyan") legend("topleft",legend=c("one-stage", "two-stage"), col=c("magenta","cyan"), lty=c(1,2))
data("green") x.green <- seq(min(log(green$COST)), max(log(green$COST)), length.out=101) # Local maximum frontier estimates # a. Local constant estimator loc_max_1stage<-loc_max(log(green$COST), log(green$OUTPUT), x.green, h=0.5, type="one-stage") # b. Local DEA estimator loc_max_2stage<-loc_max(log(green$COST), log(green$OUTPUT), x.green, h=0.5, type="two-stage") # Representation plot(log(OUTPUT)~log(COST), data=green) lines(x.green, loc_max_1stage, lty=1, col="magenta") lines(x.green, loc_max_2stage, lty=2, col="cyan") legend("topleft",legend=c("one-stage", "two-stage"), col=c("magenta","cyan"), lty=c(1,2))
Computes the optimal smoothing parameter coefm involved in the probability-weighted moment frontier estimator of Daouia, Florens and Simar (2012).
mopt_pwm(xtab, ytab, x, a=2, rho, wind.coef=0.1)
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
a | a smoothing parameter (integer) larger than or equal to 2 (2 by default).
rho | a numeric vector of the same length as x, or a scalar, giving the value(s) of the tail index $\rho_x$.
wind.coef | a scalar coefficient to be selected in the interval (0,1].
This is an implementation of an automated selection of the parameter coefm involved in the probability-weighted moment (PWM) estimator $\tilde\varphi_{pwm}(x)$ [see dfs_pwm]. It is an adaptation of the experimental method kopt_momt_pick by Daouia et al. (2010). The idea is to first select, for each $x$, a grid of values for the parameter coefm, and then to pick out the value where the variation of the results is the smallest. To achieve this, we compute the standard deviations of the estimates $\tilde\varphi_{pwm}(x)$ over a moving "window" of successive values of coefm, where the coefficient wind.coef determining the window size should be selected in the interval (0,1] in such a way as to avoid numerical instabilities. The default option wind.coef=0.1 corresponds to a window large enough to cover around 10% of the possible values of coefm in the selected range. The value of coefm where the standard deviation is minimal defines the desired coefm.
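The minimal-variation rule itself is easy to sketch; the helper below is a hypothetical illustration of the idea in base R, not the package's internal code.
# Hypothetical sketch of the minimal-variation rule: given estimates over a
# grid of candidate smoothing values, return the starting index of the window
# of w successive values with the smallest standard deviation
pick_stable <- function(est, w = max(3, ceiling(0.1 * length(est)))) {
  sds <- sapply(seq_len(length(est) - w + 1),
                function(i) sd(est[i:(i + w - 1)]))
  which.min(sds)
}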
Returns a numeric vector with the same length as x.
Abdelaati Daouia and Thibault Laurent.
Daouia, A., Florens, J.-P. and Simar, L. (2010). Frontier estimation and extreme value theory. Bernoulli, 16, 1039-1063.
data("post") x.post<- seq(post$xinput[100],max(post$xinput), length.out=100) ## Not run: # When rho[x] is known and equal to 2: best_cm.1<- mopt_pwm(post$xinput, post$yprod, x.post, a=2, rho=2) ## End(Not run)
data("post") x.post<- seq(post$xinput[100],max(post$xinput), length.out=100) ## Not run: # When rho[x] is known and equal to 2: best_cm.1<- mopt_pwm(post$xinput, post$yprod, x.post, a=2, rho=2) ## End(Not run)
The dataset, from the US Electric Power Research Institute (EPRI), consists of 254 toughness results obtained from non-irradiated representative steels. For each steel $i$, the fracture toughness $y_i$ and the temperature $x_i$ were measured.
data(nuclear)
A data frame with 254 observations on the following 2 variables.
xtab | Temperature.
ytab | Fracture toughness of each material.
US Electric Power Research Institute (EPRI).
Daouia, A., Girard, S. and Guillou, A. (2014). A Gamma-moment approach to monotonic boundary estimation. Journal of Econometrics, 178, 727-740.
data("nuclear")
data("nuclear")
Computes the Pickands-type estimator introduced by Gijbels and Peng (2000).
pick_est(xtab, ytab, x, h, k, type="one-stage")
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
h | determines the bandwidth at which the estimate will be computed.
k | a numeric vector of the same length as x specifying, for each evaluation point, the threshold (number of upper order statistics) at which the estimator is computed.
type | a character equal to "one-stage" or "two-stage".
The local Pickands frontier estimator (option type="one-stage"), obtained by applying the well-known approach of Dekkers and de Haan (1989) in conjunction with the transformed sample of $z^{xh}_i$'s described in the function loc_max, is defined as
$$\hat\varphi_{pick}(x) = z^{xh}_{(n-k+1)} + \frac{z^{xh}_{(n-k+1)} - z^{xh}_{(n-2k+1)}}{2^{-\hat\gamma_x} - 1}, \qquad \hat\gamma_x = \frac{1}{\log 2}\,\log\frac{z^{xh}_{(n-k+1)} - z^{xh}_{(n-2k+1)}}{z^{xh}_{(n-2k+1)} - z^{xh}_{(n-4k+1)}}.$$
It is based on the three upper order statistics $z^{xh}_{(n-k+1)}$, $z^{xh}_{(n-2k+1)}$, $z^{xh}_{(n-4k+1)}$, and depends on $h$ (see loc_max) as well as on an intermediate sequence $k = k_n(x) \to \infty$ with $k/n \to 0$ as $n \to \infty$. The two smoothing parameters $h$ and $k$ have to be fixed in the 4th and 5th arguments of the function.
Also, the user can replace each observation $y_i$ in the strip of width $2h$ around $x$ by the resulting local Pickands estimate, leaving all observations outside the strip unchanged. Then, one may apply the DEA estimator (see the function dea_est) to the obtained transformed data, giving the local DEA estimator (option type="two-stage").
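A hedged sketch contrasting the two options on the same grid; the bandwidth and the k-sequence below are illustrative choices borrowed from the example at the end of this entry, not recommendations.
data("green")
xg <- seq(min(log(green$COST)), max(log(green$COST)), length.out = 101)
h <- 0.5
nx <- sapply(xg, function(y) sum(abs(log(green$COST) - y) <= h))
k <- trunc(nx^0.1)                     # illustrative intermediate sequence
y.one <- pick_est(log(green$COST), log(green$OUTPUT), xg, h = h, k = k,
                  type = "one-stage")
y.two <- pick_est(log(green$COST), log(green$OUTPUT), xg, h = h, k = k,
                  type = "two-stage")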
Returns a numeric vector with the same length as x.
Abdelaati Daouia and Thibault Laurent.
Dekkers, A.L.M. and L. de Haan (1989). On the estimation of the extreme-value index and large quantile estimation, Annals of Statistics, 17, 1795-1832.
Gijbels, I. and Peng, L. (2000). Estimation of a support curve via order statistics, Extremes, 3, 251-277.
## Not run: data("green") plot(log(OUTPUT)~log(COST), data=green) x <- seq(min(log(green$COST)), max(log(green$COST)), length.out=101) h=0.5 nx<-unlist(lapply(x,function(y) length(which(abs(log(green$COST)-y)<=h)))) k<-trunc(nx^0.1) lines(x, pick_est(log(green$COST), log(green$OUTPUT), x, h=h, k=k), lty=1, col="red") ## End(Not run)
## Not run: data("green") plot(log(OUTPUT)~log(COST), data=green) x <- seq(min(log(green$COST)), max(log(green$COST)), length.out=101) h=0.5 nx<-unlist(lapply(x,function(y) length(which(abs(log(green$COST)-y)<=h)))) k<-trunc(nx^0.1) lines(x, pick_est(log(green$COST), log(green$OUTPUT), x, h=h, k=k), lty=1, col="red") ## End(Not run)
Computes the optimal degree of the unconstrained polynomial frontier estimator proposed by Hall, Park and Stern (1998).
poly_degree(xtab, ytab, prange=0:20, type="AIC", control = list("tm_limit" = 700))
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
prange | a vector of integers specifying the range in which the optimal degree of the polynomial frontier estimator is to be selected.
type | a character equal to "AIC" or "BIC".
control | a list of parameters passed to the GLPK solver. See *Details* of help(Rglpk_solve_LP).
As the degree $p$ of the polynomial estimator $\hat\varphi_{n,p}$ (see poly_est) determines the dimensionality of the approximating function, we may view the problem of choosing $p$ as model selection. By analogy to the information criteria proposed by Daouia et al. (2016) in the boundary regression context, we obtain the optimal polynomial degree by minimizing
$$AIC(p) = \log\Big(\sum_{i=1}^{n}\big(\hat\varphi_{n,p}(x_i) - y_i\big)\Big) + (p+1)/n,$$
$$BIC(p) = \log\Big(\sum_{i=1}^{n}\big(\hat\varphi_{n,p}(x_i) - y_i\big)\Big) + \log n \times (p+1)/(2n).$$
The first criterion (option type="AIC") is similar to the famous Akaike information criterion (Akaike, 1973) and the second one (option type="BIC") to the Bayesian information criterion (Schwarz, 1978).
Returns an integer.
Hohsuk Noh.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle, in Second International Symposium of Information Theory, eds. B. N. Petrov and F. Csaki, Budapest: Akademia Kiado, 267–281.
Daouia, A., Noh, H. and Park, B.U. (2016). Data Envelope fitting with constrained polynomial splines. Journal of the Royal Statistical Society: Series B, 78(1), 3-30. doi:10.1111/rssb.12098.
Hall, P., Park, B.U. and Stern, S.E. (1998). On polynomial estimators of frontiers and boundaries. Journal of Multivariate Analysis, 66, 71-98.
Schwarz, G. (1978). Estimating the dimension of a model, Annals of Statistics, 6, 461-464.
data("air") x.air <- seq(min(air$xtab), max(air$xtab), length.out = 101) # Optimal polynomial degrees via the AIC criterion (p.aic.air <- poly_degree(air$xtab, air$ytab, type = "AIC")) # Optimal polynomial degrees via the BIC criterion (p.bic.air <- poly_degree(air$xtab, air$ytab, type = "BIC"))
data("air") x.air <- seq(min(air$xtab), max(air$xtab), length.out = 101) # Optimal polynomial degrees via the AIC criterion (p.aic.air <- poly_degree(air$xtab, air$ytab, type = "AIC")) # Optimal polynomial degrees via the BIC criterion (p.bic.air <- poly_degree(air$xtab, air$ytab, type = "BIC"))
Computes the polynomial-type estimators of frontiers and boundaries proposed by Hall, Park and Stern (1998).
poly_est(xtab, ytab, x, deg, control = list("tm_limit" = 700))
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
deg | an integer (polynomial degree).
control | a list of parameters passed to the GLPK solver. See *Details* of help(Rglpk_solve_LP).
The data edge is modeled by a single polynomial $\varphi_\theta(x) = \theta_0 + \theta_1 x + \cdots + \theta_p x^p$ of known degree $p$ that envelopes the full data and minimizes the area under its graph for $x \in [a, b]$, with $a$ and $b$ being respectively the lower and upper endpoints of the design points $x_1, \ldots, x_n$. The implemented function is the estimate $\hat\varphi_{n,p}(x) = \hat\theta_0 + \hat\theta_1 x + \cdots + \hat\theta_p x^p$ of $\varphi(x)$, where $\hat\theta = (\hat\theta_0, \ldots, \hat\theta_p)^{\top}$ minimizes $\int_a^b \varphi_\theta(x)\,dx$ over $\theta \in \mathbb{R}^{p+1}$ subject to the envelopment constraints $\varphi_\theta(x_i) \ge y_i$, $i = 1, \ldots, n$.
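Under the hood this is a linear program. What follows is a minimal sketch of that LP, assuming the Rglpk package; poly_env_lp is a hypothetical helper written for illustration, not the package's internal code.
library(Rglpk)
# Hypothetical sketch of the LP behind poly_est: minimize the area under the
# polynomial over [a, b] subject to the envelopment constraints
poly_env_lp <- function(xtab, ytab, deg) {
  a <- min(xtab); b <- max(xtab)
  powers <- 0:deg
  obj <- (b^(powers + 1) - a^(powers + 1)) / (powers + 1)  # integral of x^j
  mat <- outer(xtab, powers, "^")       # one constraint row per observation
  Rglpk_solve_LP(obj, mat, rep(">=", length(xtab)), ytab,
                 bounds = list(lower = list(ind = seq_along(obj),
                                            val = rep(-Inf, length(obj)))))$solution
}
Evaluating the fitted polynomial on a grid then reduces to a matrix product with the returned coefficients.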
Returns a numeric vector with the same length as x. Returns a vector of NA if no solution has been found by the solver (GLPK).
Hohsuk Noh.
Hall, P., Park, B.U. and Stern, S.E. (1998). On polynomial estimators of frontiers and boundaries. Journal of Multivariate Analysis, 66, 71-98.
data("air") x.air <- seq(min(air$xtab), max(air$xtab), length.out = 101) # Optimal polynomial degrees via the AIC criterion (p.aic.air <- poly_degree(air$xtab, air$ytab, type = "AIC")) # Polynomial boundaries estimate y.poly.air<-poly_est(air$xtab, air$ytab, x.air, deg = p.aic.air) # Representation plot(x.air, y.poly.air, lty = 1, lwd = 4, col = "magenta", type = "l") points(ytab~xtab, data = air) legend("topleft",legend = paste("degree =", p.aic.air), col = "magenta", lwd = 4, lty = 1)
data("air") x.air <- seq(min(air$xtab), max(air$xtab), length.out = 101) # Optimal polynomial degrees via the AIC criterion (p.aic.air <- poly_degree(air$xtab, air$ytab, type = "AIC")) # Polynomial boundaries estimate y.poly.air<-poly_est(air$xtab, air$ytab, x.air, deg = p.aic.air) # Representation plot(x.air, y.poly.air, lty = 1, lwd = 4, col = "magenta", type = "l") points(ytab~xtab, data = air) legend("topleft",legend = paste("degree =", p.aic.air), col = "magenta", lwd = 4, lty = 1)
The dataset post, about the cost of the delivery activity of the postal services in Europe, was first analyzed by Cazals, Florens and Simar (2002). There are 4,000 post offices observed in 1994. For each post office $i$, the input $x_i$ is the labor cost measured by the quantity of labor, which represents more than 80% of the total cost of the delivery activity. The output $y_i$ is defined as the volume of delivered mail (in number of objects). It should be noted that noise has been added to the original data.
data(post)
A data frame with 4000 observations on the following 2 variables.
xinput | a numeric vector: the labor cost (quantity of labor).
yprod | a numeric vector: the volume of delivered mail (number of objects).
Cazals, C., Florens, J.-P., Simar, L. (2002), Nonparametric frontier estimation: a robust approach, Journal of Econometrics, 106, 1-25.
data("post")
data("post")
This function is an implementation of the (un)constrained quadratic spline smoother proposed by Daouia, Noh and Park (2016).
quad_spline_est(xtab, ytab, x, kn = ceiling((length(xtab))^(1/4)), method= "u", all.dea = FALSE, control = list("tm_limit" = 700))
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
kn | an integer specifying the number of inter-knot segments used in the computation of the spline estimate.
method | a character equal to "u" (unconstrained estimator), "m" (under the monotonicity constraint) or "mc" (under simultaneous monotonicity and concavity constraints).
all.dea | a boolean (see Details).
control | a list of parameters passed to the GLPK solver. See *Details* of help(Rglpk_solve_LP).
Let $a$ and $b$ be, respectively, the minimum and maximum of the design points $x_1, \ldots, x_n$. Denote a partition of $[a,b]$ by $a = t_0 < t_1 < \cdots < t_{k_n} = b$ (see below the selection process). Let $N = k_n + 2$ and let $\pi(x) = (\pi_1(x), \ldots, \pi_N(x))^{\top}$ be the vector of normalized B-splines of order 3 based on the knot mesh $\{t_j\}$ (see, e.g., Schumaker (2007)). When the true frontier $\varphi(x)$ is known or required to be monotone nondecreasing (option method="m"), its constrained quadratic spline estimate is defined by $\hat\varphi_n(x) = \hat\alpha^{\top}\pi(x)$, where $\hat\alpha$ minimizes $\int_a^b \alpha^{\top}\pi(x)\,dx$ over $\alpha \in \mathbb{R}^N$ subject to the envelopment and monotonicity constraints $\alpha^{\top}\pi(x_i) \ge y_i$, $i = 1, \ldots, n$, and $\alpha^{\top}\pi'(t_j) \ge 0$, $j = 0, 1, \ldots, k_n$, with $\pi'$ being the derivative of $\pi$.
Considering the special connection of the spline smoother $\hat\varphi_n$ with the traditional FDH frontier (see the function dea_est), Daouia et al. (2016) propose an easy way of choosing the knot mesh. Let $(\mathcal{X}_1, \mathcal{Y}_1), \ldots, (\mathcal{X}_{\mathcal{N}}, \mathcal{Y}_{\mathcal{N}})$ be the observations $(x_i, y_i)$ lying on the FDH boundary. The basic idea is to pick out a set of knots equally spaced in percentile ranks among the $\mathcal{N}$ FDH points by taking $t_j$ to be the $(j/k_n)$th quantile of the values of $\mathcal{X}_\ell$, for $j = 1, \ldots, k_n - 1$. The choice of the number of internal knots is then viewed as model selection through the minimization of the AIC and BIC information criteria (see the function quad_spline_kn).
When the monotone boundary is also believed to be concave (option method="mc"), its constrained fit is defined as $\check\varphi_n(x) = \check\alpha^{\top}\pi(x)$, where $\check\alpha$ minimizes the same objective function as $\hat\alpha$ subject to the same envelopment and monotonicity constraints and the additional concavity constraints $\alpha^{\top}\pi''(s_j) \le 0$, $j = 1, \ldots, k_n$, where $\pi''$ is the constant second derivative of $\pi$ on each inter-knot interval and $s_j$ is the midpoint of $(t_{j-1}, t_j]$.
Regarding the choice of knots, the same scheme as for $\hat\varphi_n$ can be applied by replacing the FDH points with the DEA points, that is, the observations lying on the piecewise linear DEA frontier (see the function dea_est). Alternatively, the strategy of just using all the DEA points as knots also works quite well for datasets of modest size, as shown in Daouia et al. (2016). In this case, the user has to choose the option all.dea=TRUE.
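The percentile-rank knot mesh can be sketched in a few lines of base R; this is a hypothetical reconstruction of the idea (with the FDH boundary computed as a running maximum over sorted inputs), not the package's internal code, and kn = 5 is an assumed value.
data("green")
ord <- order(log(green$COST))
xs <- log(green$COST)[ord]
ys <- log(green$OUTPUT)[ord]
on.fdh <- ys == cummax(ys)             # observations on the FDH boundary
kn <- 5                                # assumed number of inter-knot segments
knots <- quantile(xs[on.fdh], probs = seq(0, 1, length.out = kn + 1))
# interior knots: knots[2:kn]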
Returns a numeric vector with the same length as x. Returns a vector of NA if no solution has been found by the solver (GLPK).
Hohsuk Noh.
Daouia, A., Noh, H. and Park, B.U. (2016). Data Envelope fitting with constrained polynomial splines. Journal of the Royal Statistical Society: Series B, 78(1), 3-30. doi:10.1111/rssb.12098.
Schumaker, L.L. (2007). Spline Functions: Basic Theory, 3rd edition, Cambridge University Press.
## Not run: data("green") x.green <- seq(min(log(green$COST)), max(log(green$COST)), length.out=101) # 1. Unconstrained quadratic spline fits # Optimal number of inter-knot segments via the BIC criterion (kn.bic.green.u<-quad_spline_kn(log(green$COST), log(green$OUTPUT), method="u", type="BIC")) # Unconstrained spline estimate y.quad.green.u<-quad_spline_est(log(green$COST), log(green$OUTPUT), x.green, kn=kn.bic.green.u, method="u") # 2. Monotonicity constraint # Optimal number of inter-knot segments via the BIC criterion (kn.bic.green.m<-quad_spline_kn(log(green$COST), log(green$OUTPUT), method="m", type="BIC")) # Monotonic splines estimate y.quad.green.m<-quad_spline_est(log(green$COST), log(green$OUTPUT), x.green, kn=kn.bic.green.m, method="m") # 3. Monotonicity and Concavity constraints # Optimal number of inter-knot segments via the BIC criterion (kn.bic.green.mc<-quad_spline_kn(log(green$COST), log(green$OUTPUT), method="mc", type="BIC")) # Monotonic/Concave splines estimate y.quad.green.mc<-quad_spline_est(log(green$COST), log(green$OUTPUT), x.green, kn=kn.bic.green.mc, method="mc", all.dea=TRUE) # Representation plot(x.green, y.quad.green.u, lty=1, lwd=4, col="green", type="l", xlab="log(COST)", ylab="log(OUTPUT)") lines(x.green, y.quad.green.m, lty=2, lwd=4, col="cyan") lines(x.green, y.quad.green.mc, lwd=4, lty=3, col="magenta") points(log(OUTPUT)~log(COST), data=green) legend("topleft", col=c("green","cyan","magenta"), lty=c(1,2,3), legend=c("unconstrained", "monotone", "monotone + concave"), lwd=4, cex=0.8) ## End(Not run)
## Not run: data("green") x.green <- seq(min(log(green$COST)), max(log(green$COST)), length.out=101) # 1. Unconstrained quadratic spline fits # Optimal number of inter-knot segments via the BIC criterion (kn.bic.green.u<-quad_spline_kn(log(green$COST), log(green$OUTPUT), method="u", type="BIC")) # Unconstrained spline estimate y.quad.green.u<-quad_spline_est(log(green$COST), log(green$OUTPUT), x.green, kn=kn.bic.green.u, method="u") # 2. Monotonicity constraint # Optimal number of inter-knot segments via the BIC criterion (kn.bic.green.m<-quad_spline_kn(log(green$COST), log(green$OUTPUT), method="m", type="BIC")) # Monotonic splines estimate y.quad.green.m<-quad_spline_est(log(green$COST), log(green$OUTPUT), x.green, kn=kn.bic.green.m, method="m") # 3. Monotonicity and Concavity constraints # Optimal number of inter-knot segments via the BIC criterion (kn.bic.green.mc<-quad_spline_kn(log(green$COST), log(green$OUTPUT), method="mc", type="BIC")) # Monotonic/Concave splines estimate y.quad.green.mc<-quad_spline_est(log(green$COST), log(green$OUTPUT), x.green, kn=kn.bic.green.mc, method="mc", all.dea=TRUE) # Representation plot(x.green, y.quad.green.u, lty=1, lwd=4, col="green", type="l", xlab="log(COST)", ylab="log(OUTPUT)") lines(x.green, y.quad.green.m, lty=2, lwd=4, col="cyan") lines(x.green, y.quad.green.mc, lwd=4, lty=3, col="magenta") points(log(OUTPUT)~log(COST), data=green) legend("topleft", col=c("green","cyan","magenta"), lty=c(1,2,3), legend=c("unconstrained", "monotone", "monotone + concave"), lwd=4, cex=0.8) ## End(Not run)
Computes the optimal number of inter-knot segments in the quadratic spline fits proposed by Daouia, Noh and Park (2016).
quad_spline_kn(xtab, ytab, method, krange = 1:20, type = "AIC", control = list("tm_limit" = 700))
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
method | a character equal to "u" (unconstrained estimator), "m" (under the monotonicity constraint) or "mc" (under simultaneous monotonicity and concavity constraints).
krange | a vector of integers specifying the range in which the optimal number of inter-knot segments is to be selected.
type | a character equal to "AIC" or "BIC".
control | a list of parameters passed to the GLPK solver. See *Details* of help(Rglpk_solve_LP).
For the implementation of the unconstrained quadratic spline smoother $\tilde\varphi_n$ (see quad_spline_est), based on the knot mesh described there, the user has to employ the option method="u". Since the number $k_n$ of inter-knot segments determines the complexity of the spline approximation, its choice may be viewed as model selection via the minimization of the following Akaike (option type="AIC") or Bayesian (option type="BIC") information criteria:
$$AIC(k) = \log\Big(\sum_{i=1}^{n}\big(\tilde\varphi_n(x_i) - y_i\big)\Big) + (k+2)/n,$$
$$BIC(k) = \log\Big(\sum_{i=1}^{n}\big(\tilde\varphi_n(x_i) - y_i\big)\Big) + \log n \times (k+2)/(2n).$$
For the implementation of the monotone (option method="m") quadratic spline smoother $\hat\varphi_n$ (see quad_spline_est), the authors first suggest using the set of knots selected among the FDH points as described in quad_spline_est. Then, they propose to choose $k_n$ by minimizing the same AIC (option type="AIC") or BIC (option type="BIC") information criteria, evaluated this time with $\hat\varphi_n$ in place of $\tilde\varphi_n$. A small number of knots is typically needed, as elucidated by the asymptotic theory.
For the implementation of the monotone and concave (option method="mc") spline estimator, just apply the same scheme as above by replacing the FDH points with the DEA points (see dea_est).
Returns an integer.
Hohsuk Noh.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle, in Second International Symposium of Information Theory, eds. B. N. Petrov and F. Csaki, Budapest: Akademia Kiado, 267–281.
Daouia, A., Noh, H. and Park, B.U. (2016). Data Envelope fitting with constrained polynomial splines. Journal of the Royal Statistical Society: Series B, 78(1), 3-30. doi:10.1111/rssb.12098.
Schwarz, G. (1978). Estimating the dimension of a model, Annals of Statistics, 6, 461-464.
data("green") ## Not run: # BIC criteria for choosing the optimal number of # inter-knot segments in: # a. Unconstrained quadratic spline fits (kn.bic.green.u <- quad_spline_kn(log(green$COST), log(green$OUTPUT), method = "u", type = "BIC")) # b. Monotone quadratic spline smoother (kn.bic.green.m <- quad_spline_kn(log(green$COST), log(green$OUTPUT), method = "m", type = "BIC")) # c. Monotone and concave quadratic spline smoother (kn.bic.green.mc<-quad_spline_kn(log(green$COST), log(green$OUTPUT), method = "mc", type = "BIC")) ## End(Not run)
data("green") ## Not run: # BIC criteria for choosing the optimal number of # inter-knot segments in: # a. Unconstrained quadratic spline fits (kn.bic.green.u <- quad_spline_kn(log(green$COST), log(green$OUTPUT), method = "u", type = "BIC")) # b. Monotone quadratic spline smoother (kn.bic.green.m <- quad_spline_kn(log(green$COST), log(green$OUTPUT), method = "m", type = "BIC")) # c. Monotone and concave quadratic spline smoother (kn.bic.green.mc<-quad_spline_kn(log(green$COST), log(green$OUTPUT), method = "mc", type = "BIC")) ## End(Not run)
The dataset records is concerned with the yearly best men's outdoor 1500m times, starting from 1966. Following Jirak, Meister and Reiss (2014), the lower boundary can be interpreted as the best possible time for a given year. This boundary is not believed to be shape-constrained and can be estimated by any unconstrained nonparametric method.
data(records)
A data frame with 46 observations on the following 2 variables.
year | year.
result | 1500m record in seconds.
Jirak, M., Meister, A. and M. Reiss (2014), Optimal adaptive estimation in nonparametric regression with one-sided errors. Annals of Statistics, 42, 1970–2002.
data("records")
data("records")
Computes a data-driven estimate of the extreme-value index $\rho_x$ involved in the moment and Pickands frontier estimators of Daouia, Florens and Simar (2010).
rho_momt_pick(xtab, ytab, x, method="moment", lrho=1, urho=Inf)
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
method | a character equal to "moment" or "pickands".
lrho | a scalar, minimum rho threshold value.
urho | a scalar, maximum rho threshold value.
This function computes the moment and Pickands estimates of the extreme-value index $\rho_x$ involved in the frontier estimators $\tilde\varphi_{momt}(x)$ [see dfs_momt] and $\hat\varphi_{pick}(x)$ [see dfs_pick].
In case method="moment", the estimator of $\rho_x$, defined as
$$\tilde\rho_x = -\left( M^{(1)}_n + 1 - \frac{1}{2}\Big[1 - \big(M^{(1)}_n\big)^2 / M^{(2)}_n\Big]^{-1} \right)^{-1},$$
is based on the moments $M^{(j)}_n = (1/k)\sum_{i=0}^{k-1}\big(\log z^x_{(n-i)} - \log z^x_{(n-k)}\big)^j$ for $j = 1, 2$, where $z^x_{(1)} \le \cdots \le z^x_{(n)}$ are the ascending order statistics corresponding to the transformed sample $\{z^x_i := y_i\,\mathbf{1}(x_i \le x),\ i = 1, \ldots, n\}$. In case method="pickands", the estimator of $\rho_x$ is given by
$$\hat\rho_x = -\log 2 \Big/ \log\Big\{\big(z^x_{(n-k+1)} - z^x_{(n-2k+1)}\big)\big/\big(z^x_{(n-2k+1)} - z^x_{(n-4k+1)}\big)\Big\}.$$
To select the threshold $k = k_n(x)$ in $\tilde\rho_x$ and $\hat\rho_x$, Daouia et al. (2010) suggested the following data-driven method for each $x$: they first select a grid of values for $k_n(x)$. For the Pickands estimator $\hat\rho_x$, the grid is indexed by an integer $k$ varying between 1 and the integer part $[N_x/4]$ of $N_x/4$, with $N_x = \sum_{i=1}^{n}\mathbf{1}(x_i \le x)$; for the moment estimator $\tilde\rho_x$, the integer $k$ varies between 1 and $N_x - 1$. Then, they evaluate the estimator $\hat\rho_x(k)$ (respectively, $\tilde\rho_x(k)$) and select the $k$ where the variation of the results is the smallest. They achieve this by computing the standard deviation of $\hat\rho_x(k)$ (respectively, $\tilde\rho_x(k)$) over a "window" of successive values of $k$. The value of $k$ where this standard deviation is minimal defines the value of $k_n(x)$.
The user can also appreciably improve the estimation of $\rho_x$ and $\varphi(x)$ itself by tuning the choice of the lower limit (default option lrho=1) and upper limit (default option urho=Inf).
Returns a numeric vector with the same length as x.
In order to choose a reasonable estimate $\tilde\rho_x$ or $\hat\rho_x$ of the extreme-value index $\rho_x$, for each fixed $x$ one can construct the plot of the estimator of interest, consisting of the points $\{(k, \tilde\rho_x(k))\}$ or $\{(k, \hat\rho_x(k))\}$, and select a value of the estimate at which the obtained graph looks stable. It is this kind of idea which guides the proposed automatic data-driven rule for a chosen grid of values of $k$. The main difficulty with such a method is that the plots of $\tilde\rho_x(k)$ or $\hat\rho_x(k)$ as functions of $k$, for each $x$, may be so unstable that reasonable values of $k$ [which would correspond to the true value of $\rho_x$] may be hidden in the graphs. As a result, the obtained extreme-value index estimator and the frontier estimator itself may exhibit considerable volatility as functions of $x$. The user can appreciably improve the estimation of $\rho_x$ and $\varphi(x)$ by tuning the choice of the lower limit (default option lrho=1) and upper limit (default option urho=Inf).
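A hedged sketch of such a stability plot for the Pickands variant at a single point x0, using the transformed sample described in Details; the evaluation point and the grid of k values are illustrative, and the package's internal computations may differ.
data("post")
x0 <- as.numeric(quantile(post$xinput, 0.75))  # illustrative evaluation point
z <- sort(post$yprod * (post$xinput <= x0))    # transformed sample z_i^x
n <- length(z)
Nx <- sum(post$xinput <= x0)
ks <- 1:floor(Nx / 4)
gamma.pick <- sapply(ks, function(k)
  log((z[n - k + 1] - z[n - 2 * k + 1]) /
      (z[n - 2 * k + 1] - z[n - 4 * k + 1])) / log(2))
plot(ks, -1 / gamma.pick, type = "l", xlab = "k",
     ylab = "Pickands estimate of rho at x0")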
Abdelaati Daouia and Thibault Laurent (code converted from Leopold Simar's Matlab code).
Daouia, A., Florens, J.P. and Simar, L. (2010). Frontier Estimation and Extreme Value Theory, Bernoulli, 16, 1039-1063.
Dekkers, A.L.M., Einmahl, J.H.J. and L. de Haan (1989), A moment estimator for the index of an extreme-value distribution, The Annals of Statistics, 17(4), 1833-1855.
data("post") x.post<- seq(post$xinput[100],max(post$xinput), length.out=100) ## Not run: # a. Optimal rho for Pickands frontier estimator rho_pick<-rho_momt_pick(post$xinput, post$yprod, x.post, method="pickands") # b. Optimal rho for moment frontier estimator rho_momt<-rho_momt_pick(post$xinput, post$yprod, x.post, method="moment") ## End(Not run)
data("post") x.post<- seq(post$xinput[100],max(post$xinput), length.out=100) ## Not run: # a. Optimal rho for Pickands frontier estimator rho_pick<-rho_momt_pick(post$xinput, post$yprod, x.post, method="pickands") # b. Optimal rho for moment frontier estimator rho_momt<-rho_momt_pick(post$xinput, post$yprod, x.post, method="moment") ## End(Not run)
This function implements the probability-weighted moment (PWM) estimator of the extreme-value index $\rho_x$ involved in the frontier estimator developed by Daouia, Florens and Simar (2012).
rho_pwm(xtab, ytab, x, a=2, lrho=1, urho=Inf)
xtab | a numeric vector containing the observed inputs $x_1,\ldots,x_n$.
ytab | a numeric vector of the same length as xtab containing the observed outputs $y_1,\ldots,y_n$.
x | a numeric vector of evaluation points at which the estimator is to be computed.
a | a smoothing parameter (integer) larger than or equal to 2.
lrho | a scalar, minimum rho threshold value.
urho | a scalar, maximum rho threshold value.
The function computes the probability-weighted moment (PWM) estimator $\hat\rho_x$ utilized in the frontier estimate $\tilde\varphi_{pwm}(x)$ [see dfs_pwm]. This estimator depends on the smoothing parameters $a$ and $m$. A simple selection rule of thumb that Daouia et al. (2012) have employed is $a = 2$ [default option in the 4th argument of the function] and $m = coefm \times N_x^{1/3}$, where $N_x = \sum_{i=1}^{n}\mathbf{1}(x_i \le x)$ and the integer coefm is to be tuned by the user. To choose this parameter in an optimal way for each $x$, we adapt the automated threshold selection method of Daouia et al. (2010) as follows: we first evaluate the estimator $\hat\rho_x$ over a grid of values of coefm, and then select the value where the variation of the results is the smallest. This is achieved by computing the standard deviation of the estimates $\hat\rho_x$ over a "window" of successive values of coefm; the value of coefm where this standard deviation is minimal defines the desired estimate. The user can also appreciably improve the estimation of the extreme-value index $\rho_x$ and the frontier function $\varphi(x)$ itself by tuning the choice of the lower limit (default option lrho=1) and upper limit (default option urho=Inf).
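In practice the PWM-related functions chain together; the following is a hedged sketch of the full unknown-$\rho_x$ workflow, assuming dfs_pwm accepts the pointwise smoothing constants and index estimates through its coefm and rho arguments.
data("post")
x.post <- seq(post$xinput[100], max(post$xinput), length.out = 100)
rho.hat <- rho_pwm(post$xinput, post$yprod, x.post, a = 20)   # 1. tail index
cm.hat <- mopt_pwm(post$xinput, post$yprod, x.post,
                   a = 2, rho = rho.hat)                      # 2. coefm
y.pwm <- dfs_pwm(post$xinput, post$yprod, x.post,
                 coefm = cm.hat, a = 2, rho = rho.hat)        # 3. frontier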
Returns a numeric vector with the same length as x.
The computational burden here is demanding, so be forewarned.
Abdelaati Daouia and Thibault Laurent.
Daouia, A., Florens, J.-P. and Simar, L. (2010). Frontier estimation and extreme value theory. Bernoulli, 16, 1039-1063.
Daouia, A., Florens, J.-P. and Simar, L. (2012). Regularization of Nonparametric Frontier Estimators. Journal of Econometrics, 168, 285-299.
data("post") x.post<- seq(post$xinput[100],max(post$xinput), length.out=100) ## Not run: # When rho[x] is unknown and dependent of x, # its estimate hat(rho[x]) is obtained via: rho_pwm <- rho_pwm(post$xinput, post$yprod, x.post, a=20) ## End(Not run)
data("post") x.post<- seq(post$xinput[100],max(post$xinput), length.out=100) ## Not run: # When rho[x] is unknown and dependent of x, # its estimate hat(rho[x]) is obtained via: rho_pwm <- rho_pwm(post$xinput, post$yprod, x.post, a=20) ## End(Not run)