Title: Stochastic Gradient Descent for Scalable Estimation
Description: A fast and flexible set of tools for large-scale estimation. It features many stochastic gradient methods, built-in models, visualization tools, automated hyperparameter tuning, model checking, interval estimation, and convergence diagnostics.
Authors: Junhyung Lyle Kim [cre, aut], Dustin Tran [aut], Panos Toulis [aut], Tian Lan [ctb], Ye Kuang [ctb], Edoardo Airoldi [ctb]
Maintainer: Junhyung Lyle Kim <[email protected]>
License: GPL-2
Version: 1.1.2
Built: 2024-10-31 06:54:12 UTC
Source: CRAN
coef.sgd: Extract Model Coefficients

Extract model coefficients from sgd objects. coefficients is an alias for it.

Usage:
## S3 method for class 'sgd'
coef(object, ...)
Arguments:
object: object of class sgd.
...: some methods for this generic require additional arguments. None are used in this method.
Value:
Coefficients extracted from the model object.
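For example, a minimal usage sketch (assuming sgd.theta, the fitted object from the linear-regression example under sgd() below):

theta_hat <- coef(sgd.theta)   # named vector of estimated coefficients
coefficients(sgd.theta)        # alias; returns the same vector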
fitted.sgd: Extract Model Fitted Values

Extract fitted values from sgd objects. fitted.values is an alias for it.

Usage:
## S3 method for class 'sgd'
fitted(object, ...)
Arguments:
object: object of class sgd.
...: some methods for this generic require additional arguments. None are used in this method.
Value:
Fitted values extracted from the model object.
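For example (again assuming sgd.theta from the sgd() examples below):

head(fitted(sgd.theta))    # fitted mean values, one per observation
fitted.values(sgd.theta)   # alias; dispatches to the same method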
plot.sgd: Plot Objects of Class sgd

Plot objects of class sgd.

Usage:
## S3 method for class 'sgd'
plot(x, ..., type = "mse", xaxis = "iteration")

## S3 method for class 'list'
plot(x, ..., type = "mse", xaxis = "iteration")
Arguments:
x: object of class sgd (or, for the list method, a list of such objects).
...: additional arguments used for each type of plot. See 'Details'.
type: character specifying the type of plot: "mse", "clf", or "mse-param". See 'Details'. Default is "mse".
xaxis: character specifying the x-axis of the plot. Default is "iteration".
Details:
Types of plots available (a usage sketch follows this list):

mse: mean squared error in predictions, which takes the following arguments:
  x_test: test set
  y_test: test responses to compare predictions to

clf: classification error in predictions, which takes the following arguments:
  x_test: test set
  y_test: test responses to compare predictions to

mse-param: mean squared error in parameters, which takes the following argument:
  true_param: true vector of parameters to compare to
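A hedged usage sketch (assuming sgd.theta and theta from the sgd() examples below; true_param is assumed to be passed through '...' under the name given above):

plot(sgd.theta, true_param=theta, type="mse-param", xaxis="iteration")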
predict.sgd: Model Predictions

Form predictions using the estimated model parameters from stochastic gradient descent.

Usage:
## S3 method for class 'sgd'
predict(object, newdata, type = "link", ...)

predict_all(object, newdata, ...)
Arguments:
object: object of class sgd.
newdata: design matrix to form predictions on.
type: the type of prediction required. The default "link" is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale.
...: further arguments passed to or from other methods.
Details:
A column of 1's must be included in newdata if the parameters include a bias (intercept) term.
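A short sketch (assuming sgd.theta and d from the linear-regression example under sgd() below; note the hand-added intercept column):

x_new <- matrix(rnorm(10*d), ncol=d)
predict(sgd.theta, newdata=cbind(1, x_new))                    # linear predictor scale
predict(sgd.theta, newdata=cbind(1, x_new), type="response")   # for the linear model these coincide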
print.sgd: Print Objects of Class sgd

Print objects of class sgd.

Usage:
## S3 method for class 'sgd'
print(x, ...)
Arguments:
x: object of class sgd.
...: further arguments passed to or from other methods.
residuals.sgd: Extract Model Residuals

Extract model residuals from sgd objects. resid is an alias for it.

Usage:
## S3 method for class 'sgd'
residuals(object, ...)
Arguments:
object: object of class sgd.
...: some methods for this generic require additional arguments. None are used in this method.

Value:
Residuals extracted from the model object.
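For example (assuming sgd.theta and dat from the sgd() examples below), the residuals equal the response minus the fitted values:

r <- resid(sgd.theta)   # alias for residuals(sgd.theta)
all.equal(as.numeric(r), as.numeric(dat$y - fitted(sgd.theta)))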
sgd: Stochastic Gradient Descent

Run stochastic gradient descent in order to optimize the induced loss function given a model and data.

Usage:
sgd(x, ...)

## S3 method for class 'formula'
sgd(formula, data, model, model.control = list(), sgd.control = list(...), ...)

## S3 method for class 'matrix'
sgd(x, y, model, model.control = list(), sgd.control = list(...), ...)

## S3 method for class 'big.matrix'
sgd(x, y, model, model.control = list(), sgd.control = list(...), ...)
Arguments:
x, y: a design matrix and the respective vector of outcomes.
...: arguments to be used to form the default sgd.control arguments if it is not supplied directly.
formula: an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.
data: an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model.
model: character specifying the model to be used: "lm" (linear model), "glm" (generalized linear model), "cox" (Cox proportional hazards model), "gmm" (generalized method of moments), or "m" (M-estimation). See 'Details'.
model.control: a list of parameters for controlling the model.
sgd.control: an optional list of parameters for controlling the estimation.
Details:
Models: The Cox model assumes that the survival data is ordered when passed in, i.e., such that the risk set of an observation i is all data points after it.

Methods (a usage sketch follows this list):
sgd: stochastic gradient descent (Robbins and Monro, 1951)
implicit: implicit stochastic gradient descent (Toulis et al., 2014)
asgd: stochastic gradient descent with averaging (Polyak and Juditsky, 1992)
ai-sgd: implicit stochastic gradient descent with averaging (Toulis et al., 2015)
momentum: "classical" momentum (Polyak, 1964)
nesterov: Nesterov's accelerated gradient (Nesterov, 1983)
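A hedged sketch of selecting one of these methods (the 'method' entry of sgd.control is an assumption here; dat is from the Examples section below):

fit.im <- sgd(y ~ ., data=dat, model="lm",
              sgd.control=list(method="implicit"))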
Learning rates and hyperparameters (a usage sketch follows this list):

one-dim: scalar value prescribed in Xu (2011) as
    a_n = scale * gamma / (1 + alpha * gamma * n)^c,
  where the defaults are lr.control = (scale=1, gamma=1, alpha=1, c), with c = 1 if implemented without averaging and c = 2/3 if with averaging

one-dim-eigen: diagonal matrix; lr.control = NULL

d-dim: diagonal matrix; lr.control = (epsilon=1e-6)

adagrad: diagonal matrix prescribed in Duchi et al. (2011) as
    a_n = eta * diag(epsilon * I + sum_{i=1}^{n} g_i g_i^T)^(-1/2),
  where g_i is the stochastic gradient at iteration i; lr.control = (eta=1, epsilon=1e-6)

rmsprop: diagonal matrix prescribed in Tieleman and Hinton (2012), which replaces AdaGrad's sum by an exponential moving average:
    v_n = gamma * v_{n-1} + (1 - gamma) * g_n^2 (elementwise),  a_n = eta * diag(v_n + epsilon)^(-1/2);
  lr.control = (eta=1, gamma=0.9, epsilon=1e-6)
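A hedged sketch of choosing a learning rate (the 'lr' and 'lr.control' entries of sgd.control are assumptions, with hyperparameter names taken from the list above; dat is from the Examples section below):

fit.ada <- sgd(y ~ ., data=dat, model="lm",
               sgd.control=list(method="sgd", lr="adagrad",
                                lr.control=c(eta=1, epsilon=1e-6)))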
Value:
An object of class "sgd", which is a list containing the following components:

model: name of the model
coefficients: a named vector of coefficients
converged: logical. Was the algorithm judged to have converged?
estimates: estimates from the algorithm stored at each iteration specified in pos
fitted.values: the fitted mean values
pos: vector of indices specifying the iteration number each estimate was stored for
residuals: the residuals, that is, response minus fitted values
times: vector of times in seconds it took to complete the number of iterations specified in pos
model.out: a list of model-specific output attributes
Author(s):
Dustin Tran, Tian Lan, Panos Toulis, Ye Kuang, Edoardo Airoldi

References:
John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121-2159, 2011.
Yurii Nesterov. A method for solving a convex programming problem with convergence rate O(1/k^2). Soviet Mathematics Doklady, 27(2):372-376, 1983.
Boris T. Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5):1-17, 1964.
Boris T. Polyak and Anatoli B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838-855, 1992.
Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400-407, 1951.
Panos Toulis, Jason Rennie, and Edoardo M. Airoldi, "Statistical analysis of stochastic gradient methods for generalized linear models", In Proceedings of the 31st International Conference on Machine Learning, 2014.
Panos Toulis, Dustin Tran, and Edoardo M. Airoldi, "Stability and optimality in stochastic gradient descent", arXiv preprint arXiv:1505.02417, 2015.
Wei Xu. Towards optimal one pass large scale learning with averaged stochastic gradient descent. arXiv preprint arXiv:1107.2490, 2011.
Examples:

## Linear regression
set.seed(42)
# Dimensions
N <- 1e4
d <- 5
X <- matrix(rnorm(N*d), ncol=d)
theta <- rep(5, d+1)
eps <- rnorm(N)
y <- cbind(1, X) %*% theta + eps
dat <- data.frame(y=y, x=X)
sgd.theta <- sgd(y ~ ., data=dat, model="lm")
sprintf("Mean squared error: %0.3f",
        mean((theta - as.numeric(sgd.theta$coefficients))^2))
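A further hedged sketch, logistic regression via model="glm"; the binomial model.control specification is an assumption, following glm() family conventions:

## Logistic regression
set.seed(42)
N <- 1e4
d <- 5
X <- matrix(rnorm(N*d), ncol=d)
theta <- rep(1, d+1)
p <- 1/(1 + exp(-cbind(1, X) %*% theta))   # true success probabilities
y <- rbinom(N, 1, p)
dat2 <- data.frame(y=y, x=X)
sgd.glm <- sgd(y ~ ., data=dat2, model="glm",
               model.control=binomial(link="logit"))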
winequality: Wine Quality Data of White Portuguese "Vinho Verde" Wine

This dataset is a collection of white "Vinho Verde" wine samples from the north of Portugal. Due to privacy and logistic issues, only physicochemical (input) and sensory (output) variables are available (e.g., there is no data about grape types, wine brand, or wine selling price).

Usage:
winequality

Format:
A data frame with 4898 rows and 12 variables:
fixed acidity.
volatile acidity.
citric acid.
residual sugar.
chlorides.
free sulfur dioxide.
total sulfur dioxide.
density.
pH.
sulphates.
alcohol.
quality (score between 0 and 10).
Source:
https://archive.ics.uci.edu/ml/datasets/Wine+Quality
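A hedged sketch of using the dataset with sgd() (the 'quality' column name is assumed from the variable list above):

data("winequality", package="sgd")
fit.wine <- sgd(quality ~ ., data=winequality, model="lm")
coef(fit.wine)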