---
title: "Custom logistic regression model"
output: rmarkdown::html_vignette
description: |
  Examples of using custom model to reproduce change point detection in
  logistic regression models.
vignette: >
  %\VignetteIndexEntry{Custom logistic regression model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = FALSE)
library(fastcpd)
```

We try to reproduce the logistic regression models with
custom cost functions and show that the results are similar to the built-in
logistic regression models.

The built-in logistic regression model in `fastcpd()` is implemented with the
help of the **fastglm** package. The **fastglm** package
utilizes the iteratively reweighted least squares with the step-halving approach
to help safeguard against convergence issues. If a
custom cost function is used with gradient descent, we should expect the results
will be similar to the built-in logistic regression model.

Specifying the `cost`, `cost_gradient` and `cost_hessian`
parameters below,
we can obtain similar results as the built-in logistic regression
model.

```{r fastcpd_binomial}
set.seed(1)
x <- matrix(rnorm(1500, 0, 1), ncol = 5)
theta <- rbind(rnorm(5, 0, 1), rnorm(5, 2, 1))
y <- c(
  rbinom(125, 1, 1 / (1 + exp(-x[1:125, ] %*% theta[1, ]))),
  rbinom(175, 1, 1 / (1 + exp(-x[126:300, ] %*% theta[2, ])))
)
binomial_data <- data.frame(y = y, x = x)

result <- fastcpd.binomial(cbind(y, x), r.progress = FALSE, cost_adjustment = NULL)
#> Warning: fit_glm: fitted probabilities numerically 0 or 1 occurred

#> Warning: fit_glm: fitted probabilities numerically 0 or 1 occurred

#> Warning: fit_glm: fitted probabilities numerically 0 or 1 occurred

#> Warning: fit_glm: fitted probabilities numerically 0 or 1 occurred
summary(result)
#>
#> Call:
#> fastcpd.binomial(data = cbind(y, x), r.progress = FALSE, cost_adjustment = NULL)
#>
#> Change points:
#> 126
#>
#> Cost values:
#> 56.90525 30.76875
#>
#> Parameters:
#>    segment 1 segment 2
#> 1  0.7259293  1.878525
#> 2 -1.0294802  2.704376
#> 3  1.0576503  3.702310
#> 4 -0.8812767  2.258796
#> 5  0.2419351  2.524173
```

```{r custom_cost_functions}
logistic_loss <- function(data, theta) {
  x <- data[, -1]
  y <- data[, 1]
  u <- x %*% theta
  nll <- -y * u + log(1 + exp(u))
  nll[u > 10] <- -y[u > 10] * u[u > 10] + u[u > 10]
  sum(nll)
}
logistic_gradient <- function(data, theta) {
  x <- data[nrow(data), -1]
  y <- data[nrow(data), 1]
  c(-(y - 1 / (1 + exp(-x %*% theta)))) * x
}
logistic_hessian <- function(data, theta) {
  x <- data[nrow(data), -1]
  prob <- 1 / (1 + exp(-x %*% theta))
  (x %o% x) * c((1 - prob) * prob)
}
result <- fastcpd(
  y ~ . - 1, binomial_data, epsilon = 1e-5, cost = logistic_loss,
  cost_gradient = logistic_gradient, cost_hessian = logistic_hessian,
  r.progress = FALSE
)
summary(result)
#>
#> Call:
#> fastcpd(formula = y ~ . - 1, data = binomial_data, cost = logistic_loss,
#>     cost_gradient = logistic_gradient, cost_hessian = logistic_hessian,
#>     epsilon = 1e-05, r.progress = FALSE)
#>
#> Change points:
#> 22 125
#>
#> Parameters:
#>   segment 1  segment 2 segment 3
#> 1 -59.20045  0.8170446  1.902379
#> 2 -34.56676 -0.9600438  2.751578
#> 3 216.53373  0.9353306  3.734179
#> 4 -80.96420 -0.7393653  2.247423
#> 5  51.25224  0.1390591  2.535372
```

Note that the result obtained through custom cost functions is inferior
compared to
the one obtained through built-in models.
We remark that the results can be improved with extra parameters
already provided in the package.
The detailed discussion of several advanced usages of the package can be found
in
[Advanced examples](examples-advanced.html).

# Notes

The evaluation of this vignette is set to be `FALSE`.

# Appendix: all code snippets

```{r ref.label = knitr::all_labels(), echo = TRUE}
```