---
title: "OptimModel Vignette"
author: "Steven Novick"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{OptimModel Vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

The R package OptimModel provides various nonlinear (or linear) curve-fitting functions that use `stats::optim()` as its base. The packages contains many commonly-used curves and also permits the user to create a new curve function as well.  To estimate curve parameters, the user may call upon ordinary least squares (OLS), weighted least squares (WLS), iterative reweighted least squares (IRWLS), or maximum likelihood estimate (MLE).  For WLS, the user provides a fixed set of weights.  For IRWLS and MLE, the user may choose among a variety of weight functions.  Finally, there is also a function for robust outlier detection (ROUT), based on the work of Motulsky and Brown (2006). 

# Pre-specified models

In the OptimModel package, data are assumed to stem from the model

$$ y_i = f(\boldsymbol{\theta}, x_i) + e_i, i = 1, \ldots, n $$
where $x_i$ is a concentration or dose, $y_i$ is the response, $\boldsymbol{\theta}$ is a vector of mean-model parameters, and $e_i \sim N(0, \sigma^2 g^2(\boldsymbol{\theta}, \boldsymbol{\phi}))$.  The mean of $y_i | x_i$ is $f(\boldsymbol{\theta}, x_i)$ and the variance is $\sigma^2 g^2(\boldsymbol{\theta}, \boldsymbol{\phi}))$ with variance-model parameters $\boldsymbol{\phi}$.  The default is $g(\boldsymbol{\theta}, \boldsymbol{\phi}) = 1$ so that $e_i \sim N(0, \sigma^2)$.  The user, however, may instead provide non-negative valued numerical weights $\{w_1, \ldots, w_n \}$ so that $e_i  \sim N(0, \sigma^2 \sqrt{w_i})$.


A number of pre-defined models for the mean and variance are provided in OptimModel.  User-defined mean models and/or variance models may be supplied to the main function, optim_fit().


## Mean-model functions

Mean models are functions of the form $f(\theta, x)$, where $\theta$ is a parameter vector and $x$ is an R object that is processed by the function.  Usually $x$ is a vector of doses or concentrations; however, $x$ may also be a matrix, data.frame, or list.  Examples of user-defined mean models are provided at the end of this section.

Although it is not a requirement, each of the mean-model functions included in OptimModel is given several attributes, shown in the table below, where f_model takes the place of the model function.


```{r, echo=FALSE, results='asis'}

d = data.frame( Attribute=c("attr(f_model, 'gradient')", "attr(f_model, 'backsolve')",
                            "attr(f_model, 'start')"),
                Purpose=c("Gradient of f_model with respect to theta", 
                          "Find x such that f_model(theta, x) = y",
                          "Starting values for optimization")
)
                           
knitr::kable(d)
```

If attr(f_model, 'gradient') is missing, the gradient of f_model will be approximated.  If attr(f_model, 'start') is missing, the user must supply a starting value.

### Exponential decay (first order)

The first-order exponential decay model is given by
$$f(\boldsymbol{\theta}, x) = A + B \times exp(-K x),$$
where $x$ is the concentration or dose, $\boldsymbol{\theta} = (A, B, lK)$, $A$ is the minimum value, $A+B$ is the maximum value, and $K=exp(lK)$ is the shape parameter.

## Exponential decay (first order) with plateau

The first-order exponential decay model with plateau is an extension of the exponential decay model and given as $f(\boldsymbol{\theta}, x) = y_{max}$ if $x \le X_0$ and $f(\boldsymbol{\theta}, x) = y_{min} + (y_{max} - y_{min}) \times exp(-K (x-X0))$ if $x > X_0$, where $\boldsymbol{\theta} = (x_0, y_{min}, y_{max}, lK)$, $X_0=exp(x_0)$ is the inflection point between plateau and exponential decay curve, $y_{min}$ is the minimum value, $y_{max}$ is the maximum value, and $K=exp(lK)$ is the shape parameter.

## Exponential decay (second order)

The second-order exponential decay model is given by 
$$f(\boldsymbol{\theta}, x) = A + B \times \{  p \times \exp(-K_1 x) + (1-p) \times \exp(-K_2 x) \},$$
where $x$ is the concentration or dose, $\boldsymbol{\theta} = (A, B, lK_1, lK_2, p)$, $A$ is the minimum value, $A+B$ is the maximum value, $K_1=exp(lK_1)$ and $K_2 = exp(lK_2)$ are shape parameters, and $p$ is the proportion of signal from the first exponential term.

### Gompertz model

The four-parameter Gompertz model is a sigmoidal shaped curve, given by


$$ f(\boldsymbol{\theta}, x) = A + (B - A)\times \exp( -\exp( m*(x- x_0)) ), $$
where $x$ is the concentration or dose, $\boldsymbol{\theta} = (A, B, m, x_0)$, $A$ is the minimum value, $A + (B-A)\times exp(-exp(-m x_0 ))$ is the maximum value, $m$ is the shape parameter, and $x_0$ is an offset that shifts the curve on the $x$-axis.

### Four-parameter Hill (4PL) model

The Hill equation (Hill, 1910), which is equivalent to the four-parameter logistic curve (Verhulst, 1845), is a sigmoidal-shaped curve with meaningful parameters.  The mean model is

$$ f(\boldsymbol{\theta}, x) = e_{min} + \frac{e_{max} - e_{min}}{ 1 + \exp( m \times( lec50 - log(x) ) )}, $$
where $x$ is the concentration or dose, $\boldsymbol{\theta}=(e_{min}, e_{max}, lec50, m)$, $e_{min}$ is the minimum asymptote, $e_{max}$ is the maximum asymptote, $lec50$ is the natural log EC50, and $m$ is the shape parameter.  Note that $f(\boldsymbol{\theta}, EC50) = \frac{1}{2}( e_{min} + e_{max})$ and $m$ is often called the *slope* or *Hill* parameter. Refer to Hill (1910) or Verhulst (1845) for more information. 

### Five-parameter Hill model

The 5-parameter Hill equation is an extension of the 4-parameter model that is asymmetric around the EC50.  The model is given by

$$ f(\boldsymbol{\theta}, x) = e_{min} + \frac{e_{max} - e_{min}}{ \{1 + exp( m \times( lec50 - log(x) ) )\}^s}, $$
where $x$ is the concentration or dose, $\boldsymbol{\theta}=(e_{min}, e_{max}, lec50, m, ls)$, $e_{min}$ is the minimum asymptote, $e_{max}$ is the maximum asymptote, $lec50$ is the natural log EC50, $m$ is the shape parameter, and $s=exp(ls)$ is the symmetry parameter.  Note that $f(\boldsymbol{\theta}, EC50) = \frac{1}{2^s}( e_{min} + e_{max})$ and $m$ is often called the *slope* or *Hill* parameter.

### Hill quadratic model

The Hill model with a quadratic term is a five-parameter equation for biphasic data.  The mean model is given by

$$ f(\boldsymbol{\theta}, x) = e_{min} + \frac{e_{max} - e_{min}}{ \{1 + exp( -\{a + b z + c z^2 \})}, $$
where $z=log(x)$, $x$ is the concentration or dose, $\boldsymbol{\theta}=(e_{min}, e_{max}, a, b, c)$, $e_{min}$ is the minimum asymptote, $e_{max}$ is the maximum asymptote, and $(a, b, c)$ are respectively the intercept, linear, and quadratic terms.

Note that if $c = 0$, this model is equivalent to the four-parameter Hill model (hill.model).  Also, the $EC50$ is defined where $a + b z + c z^2 = 0$.  If the roots of the quadratic equation are real, then the $EC50$ is given by $(-b +/- \sqrt{b^2 - 4 a c})/(2 c)$.  The user decides which of the roots is meaningful.


### Hill switchpoint model

The Hill switchpoint is a biphasic equation that extends the four-parameter Hill model. The mean model is

$$ f(\boldsymbol{\theta}, x) = e_{min} + \frac{e_{max} - e_{min}}{ 1 + exp( m f(s, x) \times( lec50 - log(x) ) )}, $$
where $x$ is the concentration or dose, $\boldsymbol{\theta}=(e_{min}, e_{max}, lec50, m, ls)$, $e_{min}$ is the minimum asymptote, $e_{max}$ is the maximum asymptote, $lec50$ is the natural log EC50, and $m$ is the shape parameter.  Note that $f(\boldsymbol{\theta}, EC50) = \frac{1}{2}( e_{min} + e_{max})$ and $m$ is often called the *slope* or *Hill* parameter.


### Beta model

The Beta model is a biphasic curve based on the Beta function. Found in `MCPMod::betaMod()`, the curve is given by

$$ f(\boldsymbol{\theta}, x) = e_{min} + e_{max} \times exp \left( log\{ \beta(\delta_1, \delta_2) \} +
    \delta_1 \times log(x) + \delta_2 \times log(sc - x) - (\delta_1 + \delta_2) \times log(sc) \right),
$$
where $\beta(\delta_1, \delta_2) = (\delta_1+\delta_2)^{(\delta_1+\delta_2)} / ( \delta_1^{\delta_1} \times \delta_2^{\delta^2} )$, $e_{min}$ is a lower aymptote, $e_{max}$ is an upper asymptote, $sc$ is a scaling parameter, and $\delta_1$ and $\delta_2$ are power parameters.

### User defined mean model

An example of a user-defined mean model is from simple linear regression.  Code is provided below in which theta = c( intercept, slope ) and $x$ is a vector.

```
slr_model = function(theta, x){  theta[1] + theta[2]*x }

theta = c( 2, -5 )   ## Intercept, slope
x = seq(1, 10, by=0.5)
slr_model(theta, x)

```

The next example shows $x$ as a matrix for multivariate linear regression.

```
mlr_model = function(theta, x){  theta[1] + theta[2]*x[,1] + theta[3]*x[,2] }

theta = c( 2, -5, 0.1 )   ## Intercept, slope1, slope2
x1 = seq(1, 10, by=0.5)
x2 = seq(10, 1, by=-0.5)
mlr_model(theta, x=cbind(x1, x2))

```


## Weight/Variance functions

The package OptimModel also provides several pre-defined weight/variance functions.  The default function is weights_varIdent(), which sets $g(\boldsymbol{\theta}, \boldsymbol{\phi}) = 1$ for all observations (i.e., no weights).  The user has no reason to call weights_varIdent().

The following table shows the forms of $g(\boldsymbol{\theta}, \boldsymbol{\phi})$ for pre-defined functions, where $mu = f(\boldsymbol{\theta}, x)$ and $resid = y - mu$.

```{r, echo=FALSE, results='asis'}

d = data.frame( Function=c("weights_varIdent(phi, mu)", "weights_varExp(phi, mu)", 
                           "weights_varPower(phi, mu)", "weights_varConstPower(phi, mu)", 
                           "weights_tukey_bw(phi=4.685, resid)", "weights_huber(phi=1.345, resid)"),
                Code=c("rep(1, length(mu))", "exp(phi*mu)", "abs(mu)^(phi)", "phi[1]+abs(mu)^(phi[2])",
                       "see below", "see below") )
                           
knitr::kable(d)
```

### Tukey bi-weight

  The weight function is $w = (1-(r/\phi)^2)^2$ if $r \le \phi$ and $w=0$ otherwise, where $r = |resid|/sig$ and sig = mad(resid, center=0).  Traditionally, $phi = 4.685$.

### Huber weights

The weight function is $w = min(1, \phi / r)$, where $r = |resid|/sig$ and sig = mad(resid, center=0).  Traditionally, $phi = 1.345$.  This is also the default weight used in `MASS::rlm()`.

### User-defined weights

The user may provide a weight function with arguments *phi* and *mu*.  For example,

```
weights_user1 = function(phi, mu){  ifelse( mu < 5, (1/mu), (1/mu)^phi ) }

mu = 1:10
phi = 2.4
weights_user1(phi, mu)
```

## Calling optim_fit

Usage of the optim_fit() function will be illustrated with a data set generated from the hill_model().  An optim_fit() object is associated with S3-type functionality for *print*, *fitted*, and *residuals*.

### OLS and WLS

Data are generated from hill_model().  For OLS and WLS, the call to optim_fit() is bare boned.  The print.optim_fit() function provides parameter estimates, standard errors, and 95\% confidence intervals. It also provides the rMSE (sigma), an $r^2$ value, and BIC value.

```{r, echo=TRUE}

library(OptimModel)

set.seed(456)
x = rep( c(0, 2^(-4:4)), each=4 )
theta = c(0, 100, log(.5), 2)
y = hill_model(theta, x) + rnorm( length(x), sd=2 )

  ## No need to include a starting value.
fit = optim_fit(theta0=NULL, f.model=hill_model, x=x, y=y)
print(fit)

## Coefficients
coef(fit)

## Fitted values
fitted(fit)

## Residuals
residuals(fit)
```

The predict() function may be used to obtain mean-model estimates at specific *x* values. It may also provide standard errors, confidence intervals, and prediction intervals.  A graph of the data with 95\% confidence bands is provided.

```{r, echo=TRUE}
predict(fit, x=2:5)

predict(fit, x=2:5, se.fit=TRUE)

## 95% confidence interval
predict(fit, x=2:5, interval="confidence")

## 90% confidence interval
predict(fit, x=2:5, interval="confidence", level=0.9)

## 95% prediciton interval for next observation (K=1)
predict(fit, x=2:5, interval="prediction", K=1)


d = data.frame(x=x, y=y)
pred = as.data.frame(predict(fit, x=2^seq(-6, 4, length=25), interval="confidence"))

library(ggplot2)
ggplot( d, aes(x=x, y=y) ) + geom_point() + 
  geom_line( data=pred, aes(x, y.hat)) +
  geom_ribbon( data=pred, aes(x=x, y=y.hat, ymin=lower, ymax=upper), alpha=0.3) +
  scale_x_log10() + ggtitle("OLS")

## Fixed weights.  In this example, weights are increasing step-wise with x
w = ifelse(x < 0.1, 0.5, ifelse(x < 2, 1, 2))
  ## No need to include a starting value.
fit = optim_fit(theta0=NULL, f.model=hill_model, x=x, y=y, wts=w)
print(fit)

pred = as.data.frame(predict(fit, x=2^seq(-6, 4, length=25), interval="confidence"))

ggplot( d, aes(x=x, y=y) ) + geom_point() + 
  geom_line( data=pred, aes(x, y.hat)) +
  geom_ribbon( data=pred, aes(x=x, y=y.hat, ymin=lower, ymax=upper), alpha=0.3) +
  scale_x_log10() + ggtitle("WLS")
```

### MLE

The same curve may be fitted to the data via maximum likelihood.  In the following code output, note that the starting value needs to include log of the standard deviation ('lsigma') as the last parameter.

```{r, echo=TRUE}

fit = optim_fit(theta0=NULL, f.model=hill_model, x=x, y=y, fit.method="mle")
print(fit)
```

Maximum likelihood can also be used with variance model functions.  The following code illustrates the method with a power-of-the-mean variance model (weights_varPower) with $\sigma=0.7$ and $\phi = 0.5$.

```{r, echo=TRUE}

set.seed(876)

## Standard deviation = sigma*( Mean )^0.5
y2 = hill_model(theta, x) + rnorm( length(x), sd=0.7*sqrt(hill_model(theta, x)) )


fit = optim_fit(theta0=NULL, f.model=hill_model, x=x, y=y2, wts=weights_varPower,
                fit.method="mle", phi.fixed=FALSE)
print(fit)
```

### IRWLS

The same variance model may be fitted to the data using iterative reweighted least squares.

```{r, echo=TRUE}

fit = optim_fit(theta0=NULL, f.model=hill_model, x=x, y=y2, wts=weights_varPower,
                fit.method="irwls", phi.fixed=FALSE)
print(fit)


d = data.frame(x=x, y=y2)
pred = as.data.frame(predict(fit, x=2^seq(-6, 4, length=25), interval="confidence"))

ggplot( d, aes(x=x, y=y) ) + geom_point() + 
  geom_line( data=pred, aes(x, y.hat)) +
  geom_ribbon( data=pred, aes(x=x, y=y.hat, ymin=lower, ymax=upper), alpha=0.3) +
  scale_x_log10()

```


### Robust model fit

Iterative reweighted least squares may be implemented with a robust weighting scheme, such as Huber weighting, to down-weight outliers.  With OLS, the curve gets pulled slightly downward by the outliers, but with the Huber weighting, the outliers are virtually ignored.

```{r, echo=TRUE}

set.seed(456)
x = rep( c(0, 2^(-4:4)), each=4 )
theta = c(0, 100, log(.5), 2)
y = hill_model(theta, x) + rnorm( length(x), sd=2 )

## Create outliers
y[c(8, 11)] = y[c(8, 11)] + -30

## Ordinary least squares
fit = optim_fit(theta0=NULL, f.model=hill_model, x=x, y=y)


d = data.frame(x=x, y=y)
pred = as.data.frame(predict(fit, x=2^seq(-6, 4, length=25), interval="confidence"))

ggplot( d, aes(x=x, y=y) ) + geom_point() + 
  geom_line( data=pred, aes(x, y.hat)) +
  geom_ribbon( data=pred, aes(x=x, y=y.hat, ymin=lower, ymax=upper), alpha=0.3) +
  scale_x_log10() + ggtitle("OLS fit")

## IRWLS with Huber weights
fit = optim_fit(theta0=NULL, f.model=hill_model, x=x, y=y, wts=weights_huber,
                fit.method="irwls")

d = data.frame(x=x, y=y)
pred = as.data.frame(predict(fit, x=2^seq(-6, 4, length=25), interval="confidence"))

ggplot( d, aes(x=x, y=y) ) + geom_point() + 
  geom_line( data=pred, aes(x, y.hat)) +
  geom_ribbon( data=pred, aes(x=x, y=y.hat, ymin=lower, ymax=upper), alpha=0.3) +
  scale_x_log10() + ggtitle("IRWLS + Huber weights fit")


```

### Multiple starting values

As a convenience, optim_fit() can be called upon to try starting values in a neighborhood of the original starting value (either user-supplied *theta0* or the attr(f_model, 'start') value).  The following code illustrates its usage. The model is initially fitted with OLS and a singular starting value.  Although the fitted object claims that convergence is achieved, the $r^2=0.50$ value is contradictory.  When a grid of neighboring starting value is tried, one of them achieves a much lower BIC.  A graph of the model fit is shown with a single starting value (red) and the fixed (green) and random (blue) neighborhoods of starting values. Note that the green and blue curves overlap.

```{r, echo=TRUE}

set.seed(123)
theta = c(0, 100, log(0.25), -3, -2)
y = hill_switchpoint_model(theta, x) + rnorm( length(x), mean=0, sd=5 )

## Try OLS, but original starting value does not provide convergence
fit1 = optim_fit(NULL, hill_switchpoint_model, x=x, y=y) 
print(fit1)


## Fixed grid of equally-spaced starting values
fit2 = optim_fit(NULL, hill_switchpoint_model, x=x, y=y, ntry=50, 
       start.method="fixed", until.converge=FALSE)
print(fit2)

## Random grid of starting values
fit3 = optim_fit(NULL, hill_switchpoint_model, x=x, y=y, ntry=50, 
       start.method="fixed", until.converge=FALSE)
print(fit3)

d = data.frame(x=x, y=y)
x.seq = 2^seq(-6, 4, length=25)
pred = data.frame(x=rep(x.seq, 3),
                  Fit=rep(c("1 starting value", "Fixed grid", "Random grid"), each=25),
                  mu=c(predict(fit1, x=x.seq)[,"y.hat"],
                       predict(fit2, x=x.seq)[,"y.hat"],
                       predict(fit3, x=x.seq)[,"y.hat"])
                )

ggplot( d, aes(x=x, y=y) ) + geom_point() + 
  geom_line( data=pred, aes(x, mu, col=Fit)) +
  scale_x_log10() + theme(legend.position="top") +
  guides(col=guide_legend(ncol=2))


```

### User-defined model

To illustrate a simple user-defined model, a three-parameter Hill model is constructed with $e_{max}=100$.


```{r, echo=TRUE}

set.seed(456)
x = rep( c(0, 2^(-4:4)), each=4 )
theta = c(0, log(.5), 2)

hill3_model = function(theta, x){ 
  ## Parameters are theta = (emin, lec50, m).  Assumes emax = 100.
  mu = theta[1] + (100 - theta[1])/(1+exp(theta[3]*log(x) - theta[3]*theta[2]))
  return(mu)
  }

y = hill3_model(theta, x) + rnorm( length(x), sd=2 )

fit = optim_fit(theta0=c(emin=-1, lec50=log(0.5), m=1), f.model=hill3_model, x=x, y=y)
print(fit)
```


### Functions of parameters

As a convenience function, the standard error (using the *delta* method) and confidence interval for a function of model parameters may be obtained with  get_se_func() as demonstrated below.  The hill_model parameters are $(e_{min}, e_{max}, lec50, m)$ and the function of interest is $lec50 + (1/m) log( \frac{0.5 e_{max}}{ 0.5 e_{max} - e_{min}} )$.


```{r, echo=TRUE}

set.seed(456)
x = rep( c(0, 2^(-4:4)), each=4 )
theta = c(0, 100, log(.5), 2)
y = hill_model(theta, x) + rnorm( length(x), sd=2 )

fit = optim_fit(theta0=NULL, f.model=hill_model, x=x, y=y)

myfunc = function(theta){ theta[3] + (1/theta[4])*log( 0.5*theta[2]/(0.5*theta[2] - theta[1])) }
myfunc( coef(fit) )
get_se_func(object=fit, Func=myfunc) 
```

## Detecting outliers with ROUT

Instead of down-weighting outliers, sometimes it is more desirable to detect and eliminate them.  The Robust Outlier (ROUT) detection method of Motulsky and Brown (2008) has been implemented in the OptimModel package.  The Huber weights data example is repeated, but with the rout_fitter().

```{r, echo=TRUE}

set.seed(456)
x = rep( c(0, 2^(-4:4)), each=4 )
theta = c(0, 100, log(.5), 2)
y = hill_model(theta, x) + rnorm( length(x), sd=2 )

## Create outliers
y[c(8, 11)] = y[c(8, 11)] + -30

## ROUT
fit_r = rout_fitter(theta0=NULL, f.model=hill_model, x=x, y=y)
print(fit_r)

## Get the outliers and remove them
print(which(fit_r$outlier.adj))
index = !fit_r$outlier.adj
x.clean = x[index]
y.clean = y[index]

## Refit model with OLS, but outliers removed
fit = optim_fit(NULL, hill_model, x=x.clean, y=y.clean)


d = data.frame(x=x, y=y)
pred = as.data.frame(predict(fit, x=2^seq(-6, 4, length=25), interval="confidence"))

ggplot( d, aes(x=x, y=y) ) + geom_point() + 
  geom_line( data=pred, aes(x, y.hat)) +
  geom_ribbon( data=pred, aes(x=x, y=y.hat, ymin=lower, ymax=upper), alpha=0.3) +
  scale_x_log10() + ggtitle("OLS fit after ROUT")


```

## Summary

The OptimModel package provides a wide variety of curve-fitting options.  From an optim_fit() object, the user may obtain a printed summary of the model fit, residuals, predicted values with standard error, confidence intervals, and prediction intervals.  The standard error and confidence intervals may also be obtained for functions of the model parameters.  The user may opt to fit the model with OLS, MLE, or IRWLS.  The ROUT outlier detection method is also available.

## References

1. Motulsky, H.J. and Brown, R.E. (2006) Detecting Outliers When Fitting Data with Nonlinear Regression: A New Method Based on Robust Nonlinear Regression and the False Discovery Rate. BMC Bioinformatics, 7, 123.

2. Hill AV. (1910) The possible effects of the aggregation of the molecules of haemoglobin on its dissociation curves. J Physiol (Lond.) 40:iv–vii.

3 Verhulst, Pierre-François (1845). "Recherches mathématiques sur la loi d'accroissement de la population" [Mathematical Researches into the Law of Population Growth Increase]. Nouveaux Mémoires de l'Académie Royale des Sciences et Belles-Lettres de Bruxelles. 18: 8. Retrieved 18 February 2013. Nous donnerons le nom de logistique à la courbe [We will give the name logistic to the curve]