Title: | Robust Regression and Estimation Through Maximum Mean Discrepancy Minimization |
---|---|
Description: | The functions in this package compute robust estimators by minimizing a kernel-based distance known as MMD (Maximum Mean Discrepancy) between the sample and a statistical model. Recent works proved that these estimators enjoy a universal consistency property, and are extremely robust to outliers. Various optimization algorithms are implemented: stochastic gradient is available for most models, but the package also allows gradient descent in a few models for which an exact formula is available for the gradient. In terms of distribution fit, a large number of continuous and discrete distributions are available: Gaussian, exponential, uniform, gamma, Poisson, geometric, etc. In terms of regression, the models available are: linear, logistic, gamma, beta and Poisson. Alquier, P. and Gerber, M. (2024) <doi:10.1093/biomet/asad031> Cherief-Abdellatif, B.-E. and Alquier, P. (2022) <doi:10.3150/21-BEJ1338>. |
Authors: | Pierre Alquier [aut, cre] , Mathieu Gerber [aut] |
Maintainer: | Pierre Alquier <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.0.1 |
Built: | 2024-10-26 03:37:57 UTC |
Source: | CRAN |
Fits a statistical model to the data, using the robust procedure based on maximum mean discrepancy (MMD) minimization introduced and studied in Briol et al. (2019) and Chérief-Abdellatif and Alquier (2022).
mmd_est(x, model, par1, par2, kernel, bdwth, control= list())
x |
Data. Must be a vector for univariate models, or a matrix of dimension n by d for multivariate models, where n is the sample size and d the dimension of the model. |
model |
Parametric model to be fitted to the data. No default. See details for the list of available models. |
par1 |
First parameter of the model. In models where the first parameter is fixed, it is necessary to provide a value for |
par2 |
Second parameter of the model (if any). In models where the second parameter is fixed, it is necessary to provide a value for |
kernel |
Kernel to be used in the MMD. Available options for |
bdwth |
Bandwidth parameter for the kernel. |
control |
A |
Available options for model
are:
beta
Beta distribution; par1 and par2 are both estimated.
binomial
Binomial distribution; par1 and par2 are both estimated. Note that in this case, if the user specifies a value for par1, it is used as an upper bound rather than an initialization.
binomial.prob
Binomial distribution; par1 is fixed and must be specified by the user while par2 is estimated.
binomial.size
Binomial distribution; par1 is estimated while par2 is fixed and must be specified by the user. Note that in this case, if the user specifies a value for par1, it is used as an upper bound rather than an initialization.
Cauchy
Cauchy distribution; par1 is estimated.
continuous.uniform.loc
Uniform distribution; par1 is estimated while par2 is fixed and must be specified by the user.
continuous.uniform.upper
Uniform distribution; par1 is fixed and must be specified by the user while par2 is estimated.
continuous.uniform.lower.upper
Uniform distribution; par1 and par2 are estimated.
Dirac
Dirac mass at a point on the reals; par1 is estimated.
discrete.uniform
Discrete uniform distribution; par1 is estimated. Note that in this case, if the user specifies a value for par1, it is used as an upper bound rather than an initialization.
exponential
Exponential distribution on the positive reals; par1 is estimated.
gamma
Gamma distribution on the positive reals; par1 and par2 are estimated.
gamma.shape
Gamma distribution on the positive reals; par1 is estimated while par2 is fixed and must be specified by the user.
gamma.rate
Gamma distribution on the positive reals; par1 is fixed and must be specified by the user while par2 is estimated.
Gaussian
Gaussian distribution on the reals; par1 and par2 are estimated.
Gaussian.loc
Gaussian distribution on the reals; par1 is estimated while par2 is fixed and must be specified by the user.
Gaussian.scale
Gaussian distribution on the reals; par1 is fixed and must be specified by the user while par2 is estimated.
geometric
Geometric distribution; par1 is estimated.
multidim.Dirac
Multivariate Dirac mass at a point; par1 (d-dimensional vector) is estimated.
multidim.Gaussian
Multivariate Gaussian distribution; par1 (d-dimensional vector) and par2 (d-by-d matrix) are estimated.
multidim.Gaussian.loc
Multivariate Gaussian distribution; par1 (d-dimensional vector) is estimated while par2 is fixed.
multidim.Gaussian.scale
Multivariate Gaussian distribution; par1 (d-dimensional vector) is fixed and must be specified by the user while par2 (d-by-d matrix) is estimated.
Pareto
Pareto distribution; par1 is estimated.
Poisson
Poisson distribution; par1 is estimated.
The control
argument is a list that can supply any of the following components:
burnin
Length of the burn-in period in GD or SGD. burnin must be a non-negative integer and by default burnin=.
nsteps
Number of iterations performed after the burn-in period in GD or SGD. nsteps must be an integer strictly larger than 2 and by default nsteps=
stepsize
Stepsize parameter. An adaptive gradient step is used (adagrad), but it is possible to pre-multiply it by stepsize. It must be a strictly positive number and by default stepsize=
epsilon
Parameter used in adagrad to avoid numerical errors in the computation of the step-size. epsilon must be a strictly positive real number and by default epsilon=.
method
Optimization method to be used: "EXACT" for exact, "GD" for gradient descent and "SGD" for stochastic gradient descent. Not all methods are available for all models. By default, exact is preferred to GD, which is preferred to SGD.
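As a hedged illustration of how the control list might be passed, the sketch below overrides a few of the components described above. The numeric values are arbitrary choices for illustration, not the package defaults, and the component name method is an assumption inferred from the field of the same name in the returned object. The call itself is skipped when the regMMD package is not installed.

```r
# Illustrative control settings; the values are arbitrary, not the package defaults.
ctrl <- list(
  burnin   = 200,    # burn-in iterations (non-negative integer)
  nsteps   = 500,    # iterations after the burn-in (integer strictly larger than 2)
  stepsize = 1,      # multiplier applied to the adagrad step (strictly positive)
  method   = "SGD"   # assumed component name; one of "EXACT", "GD", "SGD"
)

set.seed(1)
x <- rnorm(50, 0, 1.5)

# Only run the fit when the package is available.
if (requireNamespace("regMMD", quietly = TRUE)) {
  Est <- regMMD::mmd_est(x, model = "Gaussian", control = ctrl)
  summary(Est)
}
```

Components left out of the list (here epsilon) keep their default values.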
mmd_est returns an object of class "estMMD".
The function summary can be used to print the results.
An object of class estMMD is a list containing the following components:
model |
Model estimated |
par1 |
In models where the first parameter is fixed, this is the value |
par2 |
In models where the second parameter is fixed, this is the value |
kernel |
Kernel used in the MMD |
bdwth |
Bandwidth used. That is, either the value specified by the user or the bandwidth computed by the median heuristic |
burnin |
Number of steps in the burnin of the GD or SGD algorithm |
nstep |
Number of steps in the GD or SGD algorithm |
stepsize |
Stepsize parameter in GD or SGD |
epsilon |
Parameter used in adagrad to avoid numerical errors in the computation of the step-size |
method |
Optimization method used |
error |
Error message (if any) |
estimator |
Estimated parameter(s) |
type |
Takes the value " |
Briol F, Barp A, Duncan AB, Girolami M (2019).
“Statistical inference for generative models with maximum mean discrepancy.”
arXiv preprint arXiv:1906.05944.
Chérief-Abdellatif B, Alquier P (2022).
“Finite Sample Properties of Parametric MMD Estimation: Robustness to Misspecification and Dependence.”
Bernoulli, 28(1), 181-213.
Garreau D, Jitkrittum W, Kanagawa M (2017).
“Large sample analysis of the median heuristic.”
arXiv preprint arXiv:1707.07269.
library(regMMD)

# simulate data
x = rnorm(50, 0, 1.5)

# estimate the mean and variance (assuming the data is Gaussian)
Est = mmd_est(x, model = "Gaussian")
# print a summary
summary(Est)

# estimate the mean (assuming the data is Gaussian with known standard deviation = 1.5)
Est2 = mmd_est(x, model = "Gaussian.loc", par2 = 1.5)
# print a summary
summary(Est2)

# estimate the standard deviation (assuming the data is Gaussian with known mean = 0)
Est3 = mmd_est(x, model = "Gaussian.scale", par1 = 0)
# print a summary
summary(Est3)

# test of the robustness: replace one observation by a gross outlier
x[42] = 100
mean(x)
# estimate the mean and variance (assuming the data is Gaussian)
Est4 = mmd_est(x, model = "Gaussian")
summary(Est4)
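To make the robustness check above more concrete, here is a minimal base-R sketch of MMD minimization for a Gaussian location model with known standard deviation. It is not the package's implementation: it assumes a Gaussian kernel parameterized as k(x, y) = exp(-(x - y)^2 / (2 h^2)) with a hand-picked bandwidth h, and the helper names mmd_loc_objective and mmd_loc_fit are hypothetical. For this kernel, the part of the squared MMD that depends on the location theta has a closed form, so a one-dimensional optimizer is enough.

```r
# Part of MMD^2 between N(theta, s^2) and the sample that depends on theta,
# for the assumed kernel k(x, y) = exp(-(x - y)^2 / (2 * h^2)).
# It uses E_{X ~ N(theta, s^2)} k(X, x_i)
#   = h / sqrt(h^2 + s^2) * exp(-(x_i - theta)^2 / (2 * (h^2 + s^2))).
mmd_loc_objective <- function(theta, x, s, h) {
  v <- h^2 + s^2
  -2 * mean(h / sqrt(v) * exp(-(x - theta)^2 / (2 * v)))
}

# Minimize over theta in a window around the median (a robust search region).
mmd_loc_fit <- function(x, s, h = 1) {
  window <- median(x) + c(-5, 5) * mad(x)
  optimize(mmd_loc_objective, interval = window, x = x, s = s, h = h)$minimum
}

set.seed(1)
x <- rnorm(50, 0, 1.5)
x[42] <- 100   # one gross outlier, as in the example above

theta_hat <- mmd_loc_fit(x, s = 1.5)
c(sample_mean = mean(x), mmd_location = theta_hat)
```

The outlier contributes almost nothing to the objective because the kernel decays quickly with distance, so the minimizer stays close to the bulk of the data while the sample mean is pulled towards the outlier.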
Fits a regression model to the data, using the robust procedure based on maximum mean discrepancy (MMD) minimization introduced and studied in Alquier and Gerber (2024).
mmd_reg(y, X, model, intercept, par1, par2, kernel.y, kernel.x, bdwth.y, bdwth.x, control= list())
y |
Response variable. Must be a vector of length |
X |
Design matrix. |
model |
Regression model to be fitted to the data. By default, the linear regression model with |
intercept |
If |
par1 |
Values of the regression coefficients of the model used as starting values to numerically optimize the objective function. |
par2 |
A value for the auxiliary parameter |
kernel.y |
Kernel applied on the response variable. Available options for |
kernel.x |
Kernel applied on the explanatory variables. Available options for |
bdwth.y |
Bandwidth parameter for the kernel |
bdwth.x |
Bandwidth parameter for the kernel |
control |
A |
Available options for model
are:
"linearGaussian"
Linear regression model with error terms, with
unknown.
"linearGaussian.loc"
Linear regression model with error terms, with
known.
"gamma"
Gamma regression model with unknown shape parameter . The inverse function is used as link function.
"gamma.loc"
Gamma regression model with known shape parameter . The inverse function is used as link function.
"beta"
Beta regression model with unknown precision parameter . The logistic function is used as link function.
"beta.loc"
Beta regression model with known precision parameter . The logistic function is used as link function.
"logistic"
Logistic regression model.
"exponential"
Exponential regression.
"poisson"
Poisson regression model.
Depending on bdwth.x, the function mmd_reg computes one of two estimators introduced in Alquier and Gerber (2024): one when bdwth.x>0 and another when bdwth.x=0. The estimator obtained with bdwth.x>0 has stronger theoretical properties but is more expensive to compute (see below).
When bdwth.x=0 and model is "linearGaussian", "linearGaussian.loc" or "logistic", the objective function and its gradient can be computed in operations, where n is the sample size (i.e. the dimension of y). In this case, gradient descent with backtracking line search is used to perform the minimization. The algorithm stops when the maximum number of iterations maxit is reached, or as soon as the change in the objective function is less than eps_gd times the current function value. In the former case, a warning message is generated. By default, maxit= and eps_gd=sqrt(.Machine$double.eps), and the value of these two parameters can be changed using the control argument of mmd_reg.
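The stopping rule described above can be sketched generically in base R. This is a minimal illustration on a toy quadratic, not the package's code: the function name backtracking_gd is hypothetical, the use of alpha as the factor by which the step is shrunk is an assumption, and so is the Armijo sufficient-decrease constant 0.5.

```r
# Gradient descent with backtracking line search (generic sketch).
backtracking_gd <- function(f, grad, theta0, alpha = 0.8, maxit = 1000,
                            eps_gd = sqrt(.Machine$double.eps)) {
  theta <- theta0
  f_cur <- f(theta)
  for (it in seq_len(maxit)) {
    g <- grad(theta)
    step <- 1
    # Shrink the step by alpha until the sufficient-decrease condition holds.
    while (f(theta - step * g) > f_cur - 0.5 * step * sum(g^2) && step > 1e-12) {
      step <- alpha * step
    }
    theta <- theta - step * g
    f_new <- f(theta)
    # Stop when the change is small relative to the current function value.
    converged <- abs(f_cur - f_new) < eps_gd * abs(f_new)
    f_cur <- f_new
    if (converged) {
      return(list(par = theta, value = f_cur, iterations = it, converged = TRUE))
    }
  }
  warning("maximum number of iterations reached")
  list(par = theta, value = f_cur, iterations = maxit, converged = FALSE)
}

# Toy objective with minimum at (1, -2).
f    <- function(th) sum((th - c(1, -2))^2) + 1
grad <- function(th) 2 * (th - c(1, -2))
fit  <- backtracking_gd(f, grad, theta0 = c(0, 0))
fit$par
```

A relative tolerance such as eps_gd makes the stopping rule insensitive to the scale of the objective, which is one plausible reason for comparing the change to the current function value rather than to a fixed threshold.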
When bdwth.x>0 and model is "linearGaussian", "linearGaussian.loc" or "logistic", the objective function and its gradient can be computed in operations. To reduce the computational cost, the objective function is minimized using norm adagrad (Duchi et al. 2011), an adaptive step size stochastic gradient algorithm. Each iteration of the algorithm requires operations. However, the algorithm has an initialization step that requires operations and has a memory requirement of size .
When model is not in c("linearGaussian", "linearGaussian.loc", "logistic"), the objective function and its gradient cannot be computed explicitly and the minimization is performed using norm adagrad. The cost per iteration of the algorithm is but, for bdwth.x>0, the memory requirement and the initialization cost are both of size .
When adagrad is used, burnin iterations are performed as a warm-up step. The algorithm then stops when burnin+maxit iterations are performed, or as soon as the norm of the average value of the gradient evaluations computed in all the previous iterations is less than eps_sg. A warning message is generated if the maximum number of iterations is reached. By default, burnin=, maxit= and eps_sg=, and the value of these three parameters can be changed using the control argument of mmd_reg.
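The loop described above can be sketched as follows. This is a generic illustration with a noisy toy gradient, not the package's implementation: the function name adagrad_norm is hypothetical, the default values are arbitrary, the placement of epsilon inside the square root is an assumption, and the gradient average is taken over the post-burn-in iterations only, which is one possible reading of the stopping rule.

```r
# Norm version of adagrad: a single scalar step size driven by the
# accumulated squared gradient norms (generic sketch).
adagrad_norm <- function(stoch_grad, theta0, burnin = 100, maxit = 500,
                         stepsize = 1, epsilon = 1e-4, eps_sg = 1e-3) {
  theta <- theta0
  acc   <- 0                        # running sum of squared gradient norms
  g_bar <- rep(0, length(theta0))   # running mean of post-burn-in gradients
  for (t in seq_len(burnin + maxit)) {
    g     <- stoch_grad(theta)
    acc   <- acc + sum(g^2)
    theta <- theta - stepsize * g / sqrt(epsilon + acc)
    if (t > burnin) {
      g_bar <- g_bar + (g - g_bar) / (t - burnin)
      if (sqrt(sum(g_bar^2)) < eps_sg) {
        return(list(par = theta, iterations = t, converged = TRUE))
      }
    }
  }
  warning("maximum number of iterations reached")
  list(par = theta, iterations = burnin + maxit, converged = FALSE)
}

# Toy problem: noisy gradient of a quadratic with minimum at `target`.
set.seed(1)
target     <- c(2, -1)
noisy_grad <- function(th) 2 * (th - target) + rnorm(2, sd = 0.1)
fit <- suppressWarnings(adagrad_norm(noisy_grad, theta0 = c(0, 0)))
fit$par
```

Because the step size shrinks as gradient norms accumulate, large early gradients lead to cautious steps, while stepsize rescales the whole sequence.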
If bdwth.y="auto" then the value of the bandwidth parameter of kernel.y is equal to with the median value of the set , where denotes the ith component of y. This definition of bdwth.y is motivated by the results in Garreau et al. (2017). If the bandwidth parameter of kernel.y is set to 1.
If bdwth.x="auto" then the value of the bandwidth parameter of kernel.x is equal to with the median value of the set , where denotes the ith row of the design matrix X. If the bandwidth parameter of kernel.x is set to 1.
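The median that underlies the "auto" bandwidths can be computed in base R as sketched below. The sketch assumes the usual form of the median heuristic, that is, the median of the pairwise distances over distinct pairs of observations, and it deliberately stops at the median: the scaling applied to it by the package is not reproduced here. The helper names median_pairwise_y and median_pairwise_X are hypothetical.

```r
# Median of pairwise absolute differences between the components of y.
median_pairwise_y <- function(y) median(as.numeric(dist(y)))

# Median of pairwise Euclidean distances between the rows of X.
median_pairwise_X <- function(X) median(as.numeric(dist(X)))

y <- c(0, 1, 3)
X <- rbind(c(0, 0), c(3, 4), c(0, 8))

median_pairwise_y(y)   # pairwise differences are 1, 3 and 2
median_pairwise_X(X)   # pairwise distances are 5, 8 and 5
```

Note that dist() stores all n(n-1)/2 pairwise distances, so this direct computation is only practical for moderate sample sizes.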
The control
argument is a list that can supply any of the following components:
rescale
If rescale=TRUE the (non-constant) columns of X are rescaled before performing the optimization, while if rescale=FALSE no rescaling is applied. By default rescale=TRUE.
burnin
A non-negative integer.
eps_gd
A non-negative real number.
eps_sg
A non-negative real number.
maxit
An integer strictly larger than 2.
stepsize
Scaling constant for the step-sizes used by adagrad. stepsize must be a strictly positive number and by default stepsize=1.
trace
If trace=TRUE then the parameter value obtained at the end of each iteration (after the burn-in period for adagrad) is returned. By default, trace=TRUE and trace is automatically set to TRUE if the maximum number of iterations is reached.
epsilon
Parameter used in adagrad to avoid numerical errors in the computation of the step-sizes. epsilon must be a strictly positive real number and by default epsilon=.
alpha
Parameter for the backtracking line search. alpha must be a real number in and by default alpha=0.8.
c_det
Parameter used to control the computational cost of the algorithm when gamma.x , see the Supplementary material in Alquier and Gerber (2024) for more details. c_det must be a real number in and by default c_det=0.2.
c_rand
Parameter used to control the computational cost of the algorithm when bdwth.x , see the Supplementary material in Alquier and Gerber (2024) for more details. c_rand must be a real number in and by default c_rand=0.1.
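As a hedged usage sketch, the control list for mmd_reg might be supplied as below. The values of maxit and eps_gd are arbitrary illustrative choices, not the package defaults, and the simulated data are made up for the example. The fit is only attempted when the regMMD package is installed.

```r
# Illustrative control settings; the values are arbitrary, not the package defaults.
ctrl <- list(
  rescale = TRUE,   # rescale the non-constant columns of X
  maxit   = 2000,   # cap on the number of iterations
  eps_gd  = 1e-6    # relative-change tolerance used by gradient descent
)

# Small simulated linear model with an intercept.
set.seed(1)
n <- 200
p <- 2
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- as.numeric(1 + X %*% c(1, 2) + rnorm(n))

# Only run the fit when the package is available.
if (requireNamespace("regMMD", quietly = TRUE)) {
  Est <- regMMD::mmd_reg(y, X, bdwth.x = 0, control = ctrl)
  summary(Est)
}
```

With bdwth.x = 0 and the linear Gaussian model, the text above indicates that gradient descent with backtracking line search is used, so maxit and eps_gd are the relevant stopping parameters in this call.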
mmd_reg returns an object of class "regMMD".
The function summary can be used to print the results.
An object of class regMMD is a list containing the following components:
coefficients |
Estimated regression coefficients. |
intercept |
If |
phi |
If relevant (see details), either the estimated value of the |
kernel.y |
Kernel applied on the response variable used to fit the model. |
bdwth.y |
Value of the bandwidth for the kernel applied on the response variable used to fit the model. |
kernel.x |
Kernel applied on the explanatory variables used to fit the model. |
bdwth.x |
Value of the bandwidth for the kernel applied on the explanatory variables used to fit the model. |
par1 |
Value of the parameter |
par2 |
Value of parameter |
trace |
If the control parameter |
Alquier P, Gerber M (2024).
“Universal robust regression via maximum mean discrepancy.”
Biometrika, 111(1), 71-92.
Duchi J, Hazan E, Singer Y (2011).
“Adaptive subgradient methods for online learning and stochastic optimization.”
Journal of machine learning research, 12(7), 2121-2159.
Garreau D, Jitkrittum W, Kanagawa M (2017).
“Large sample analysis of the median heuristic.”
arXiv preprint arXiv:1707.07269.
library(regMMD)

# Simulate data
n <- 1000
p <- 4
beta <- 1:p
phi <- 1
X <- matrix(data = rnorm(n * p, 0, 1), nrow = n, ncol = p)
data <- 1 + X %*% beta + rnorm(n, 0, phi)

## Example 1: Linear Gaussian model
y <- data
Est <- mmd_reg(y, X)
summary(Est)

## Example 2: Logistic regression model
y <- data
y[data > 5] <- 1
y[data <= 5] <- 0
Est <- mmd_reg(y, X, model = "logistic")
summary(Est)
Est <- mmd_reg(y, X, model = "logistic", bdwth.x = "auto")
summary(Est)
Summary method for the class "estMMD"
## S3 method for class 'estMMD'
summary(object, ...)
object |
An object of |
... |
Additional arguments (not used). |
No return value, called only to print information on the output of "estMMD".
Summary method for the class "regMMD"
## S3 method for class 'regMMD'
summary(object, ...)
object |
An object of |
... |
Additional arguments (not used). |
No return value, called only to print information on the output of "regMMD".