---
title: "emery"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{emery}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(emery)
set.seed(65123)
```

`emery` is an R package for estimating the accuracy statistics of multiple
measurement methods in the absence of a gold standard. It supports sets of
methods that are binary, ordinal, or continuous.

The `generate_multimethod_data()` function can be used to simulate the results 
from paired measurements of a set of objects. 

```{r simulate binary data}
ex_bin_data <- 
  generate_multimethod_data(
    type = "binary",
    n_method = 3,
    n_obs = 200,
    se = c(0.85, 0.90, 0.95),
    sp = c(0.95, 0.90, 0.85),
    method_names = c("alpha", "beta", "gamma")
  )
ex_bin_data$generated_data[98:103, ]
```

The resulting list contains the simulated data as well as the parameters used
to generate it. If method, observation, or level (ordinal only) names are not
provided, default names will be applied.
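
To make the idea of simulated paired measurements concrete, the base R sketch
below draws a hidden true status for each object and then a result from each
method conditional on that status. It is only an illustration and not a
description of how `generate_multimethod_data()` is implemented. The prevalence
of 0.5, the 0/1 coding, the assumption that methods err independently given the
true status, and all `sketch_` names are assumptions made for this example.

```{r}
# Illustrative sketch only; every name here is hypothetical, not part of emery
sketch_n    <- 1000
sketch_prev <- 0.5                   # assumed prevalence of "positive" objects
sketch_se   <- c(0.85, 0.90, 0.95)   # sensitivities, as in the example above
sketch_sp   <- c(0.95, 0.90, 0.85)   # specificities, as in the example above

# Hidden true status of each object (1 = positive, 0 = negative)
sketch_truth <- rbinom(sketch_n, size = 1, prob = sketch_prev)

# Each method reports 1 with probability se for positives and 1 - sp for
# negatives, independently of the other methods given the true status
sketch_data <- sapply(seq_along(sketch_se), function(j) {
  p_one <- ifelse(sketch_truth == 1, sketch_se[j], 1 - sketch_sp[j])
  rbinom(sketch_n, size = 1, prob = p_one)
})
colnames(sketch_data) <- c("alpha", "beta", "gamma")

# Because the truth is known here, empirical sensitivity can be checked directly
colMeans(sketch_data[sketch_truth == 1, , drop = FALSE])
```

With real data the true status is unobserved, which is why the estimation step
described next is needed.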

Estimating the accuracy statistics of each method is as simple as calling the
`estimate_ML()` function on the data set. The function expects the data to be a
matrix of results with each row representing an observation and each column 
representing a method. Starting values for the EM algorithm can be provided 
through the `init` argument, but these are not required.

```{r}
ex_bin <- 
  estimate_ML(
    type = "binary",
    data = ex_bin_data$generated_data,
    init = list(
      prev_1 = 0.8,
      se_1 = c(0.7, 0.8, 0.75),
      sp_1 = c(0.85, 0.95, 0.75)
    )
  )
ex_bin
```
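
For readers curious about what starting values feed into, here is a minimal EM
sketch for the textbook two-class latent class model with conditionally
independent binary methods. It is written for illustration and is not taken
from emery's source. `em_sketch()`, the 0/1 coding, the fixed iteration count,
and the simulated inputs are all assumptions of this example.

```{r}
# Hypothetical helper, not emery's implementation: EM for a two-class latent
# class model in which methods are independent given the true status
em_sketch <- function(y, prev, se, sp, n_iter = 200) {
  for (i in seq_len(n_iter)) {
    # E-step: probability that each observation is truly positive
    lik_pos <- prev * apply(y, 1, function(r) prod(se^r * (1 - se)^(1 - r)))
    lik_neg <- (1 - prev) * apply(y, 1, function(r) prod((1 - sp)^r * sp^(1 - r)))
    w <- lik_pos / (lik_pos + lik_neg)
    # M-step: update the parameters as weighted proportions
    prev <- mean(w)
    se <- colSums(w * y) / sum(w)
    sp <- colSums((1 - w) * (1 - y)) / sum(1 - w)
  }
  list(prev = prev, se = se, sp = sp)
}

# Simulated 0/1 results for 3 methods (assumed prevalence of 0.4)
em_truth <- rbinom(1000, size = 1, prob = 0.4)
em_se <- c(0.85, 0.90, 0.95)
em_sp <- c(0.95, 0.90, 0.85)
em_y <- sapply(1:3, function(j) {
  rbinom(1000, size = 1, prob = ifelse(em_truth == 1, em_se[j], 1 - em_sp[j]))
})

# The starting values here play the role that `init` plays in the call above
em_fit <- em_sketch(em_y, prev = 0.5, se = rep(0.8, 3), sp = rep(0.8, 3))
em_fit
```

A fixed number of iterations keeps the sketch short; a practical implementation
would more likely stop once the parameter changes fall below a tolerance.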

The result of this function is an S4 object of class `MultiMethodMLEstimate`.
Basic plots illustrating the estimation process can be created by calling the
standard `plot()` function on the object.

```{r, fig.width=7, fig.height=4}
plot(ex_bin)
```

If the true population parameters are known, as is the case with simulated data,
these can be provided to the plot function to enhance the information provided.

```{r, fig.width=7, fig.height=4}
plot(ex_bin, params = ex_bin_data$params)
```

The process for working with ordinal or continuous data is similar to above, 
though the inputs tend to be more complex.

To simulate ordinal data, we must supply the probability mass functions (pmfs)
over each method's levels for the "positive" and "negative" observations. It is
assumed that "positive" observations correspond to higher levels.

For example, the pmfs for "positive" observations for 3 methods with 5 levels
might look like this, with one row per method and one column per level.

```{r}
pmf_pos_ex <- 
  matrix(
    c(
      c(0.05, 0.10, 0.15, 0.30, 0.40),
      c(0.00, 0.05, 0.20, 0.25, 0.50),
      c(0.10, 0.15, 0.20, 0.25, 0.30)
    ),
    nrow = 3, 
    byrow = TRUE
  )

pmf_pos_ex
```

For simplicity, we'll assume here that the pmfs for negative observations are
just the reverse of these.

```{r}
pmf_neg_ex <- pmf_pos_ex[, 5:1]
```
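
As a rough picture of how a pmf translates into simulated ordinal results, the
base R sketch below samples levels for a single method using the first row of
the example matrix and its reverse. This is an illustration under stated
assumptions, not emery's internal procedure: the prevalence of 0.5, the integer
coding of levels 1 to 5, and the `ord_` names are all hypothetical.

```{r}
# Illustrative sketch only; names and prevalence are assumptions
ord_pmf_pos <- c(0.05, 0.10, 0.15, 0.30, 0.40)  # first row of the matrix above
ord_pmf_neg <- rev(ord_pmf_pos)                 # reversed for negatives

# Hidden true status of each object (1 = positive, 0 = negative)
ord_truth <- rbinom(1000, size = 1, prob = 0.5)

# Draw a level from the pmf that matches each object's true status
ord_level <- integer(length(ord_truth))
ord_level[ord_truth == 1] <-
  sample(5, sum(ord_truth == 1), replace = TRUE, prob = ord_pmf_pos)
ord_level[ord_truth == 0] <-
  sample(5, sum(ord_truth == 0), replace = TRUE, prob = ord_pmf_neg)

# Observed level frequencies by true status; each row should resemble its pmf
prop.table(table(ord_truth, factor(ord_level, levels = 1:5)), margin = 1)
```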


```{r}
ex_ord_data <- 
  generate_multimethod_data(
    type = "ordinal",
    n_method = 3,
    n_obs = 200,
    pmf_pos = pmf_pos_ex,
    pmf_neg = pmf_neg_ex,
    method_names = c("alice", "bob", "carrie"),
    level_names = c("strongly dislike", "dislike", "neutral", "like", "strongly like")
  )
ex_ord_data$generated_data[98:103, ]
```


```{r}
ex_ord <- 
  estimate_ML(
    type = "ordinal",
    data = ex_ord_data$generated_data,
    level_names = ex_ord_data$params$level_names
  )
ex_ord
```

```{r, fig.width=7, fig.height=4}
plot(ex_ord, params = ex_ord_data$params)
```

Unlike binary and ordinal data, which require results from 3 or more methods to
produce estimates, continuous data can be analyzed with as few as 2 methods.

```{r}
ex_con_data <- 
  generate_multimethod_data(
    type = "continuous",
    n_method = 3,
    n_obs = 200,
    method_names = c("phi", "kappa", "sigma")
  )
ex_con_data$generated_data[98:103, ]
```

Estimating the accuracy parameters is the same as above.

```{r}
ex_con <- 
  estimate_ML(
    type = "continuous",
    data = ex_con_data$generated_data
  )
ex_con
```

```{r, fig.width=7, fig.height=4}
plot(ex_con, params = ex_con_data$params)
```

Confidence intervals for all accuracy statistics can be estimated by bootstrap.
The `boot_ML()` function is a handy tool for generating bootstrapped estimates.

```{r}
ex_boot_bin <- boot_ML(
  type = "binary",
  data = ex_bin_data$generated_data,
  n_boot = 20
)

# print the estimates of sensitivity from the complete data set
ex_boot_bin$v_0@results$se_est

# print the first 3 bootstrap estimates of sensitivity
ex_boot_bin$v_star[[1]]$se_est
ex_boot_bin$v_star[[2]]$se_est
ex_boot_bin$v_star[[3]]$se_est
```
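
One common way to turn bootstrap replicates into confidence intervals is the
percentile method, sketched below in base R. `percentile_ci()` is a hypothetical
helper written for this vignette, not an emery function, and it assumes each
replicate is a list whose elements hold one value per method, as the `se_est`
output above suggests. The mock replicates contain made-up values purely to
show the shape of the input.

```{r}
# Hypothetical helper (not part of emery): percentile bootstrap interval
percentile_ci <- function(boot_list, stat, level = 0.95) {
  # Stack one statistic from every replicate: rows = replicates, cols = methods
  est <- do.call(rbind, lapply(boot_list, function(b) b[[stat]]))
  alpha <- 1 - level
  # Column-wise empirical quantiles give the lower and upper limit per method
  apply(est, 2, quantile, probs = c(alpha / 2, 1 - alpha / 2), na.rm = TRUE)
}

# Mock replicates shaped like the `v_star` elements above (values are made up)
mock_boot <- list(
  list(se_est = c(0.80, 0.90)),
  list(se_est = c(0.84, 0.92)),
  list(se_est = c(0.82, 0.88))
)
mock_ci <- percentile_ci(mock_boot, "se_est")
mock_ci
```

Under the same assumptions, `percentile_ci(ex_boot_bin$v_star, "se_est")` would
summarize the replicates generated above, although with only 20 replicates the
limits are very rough and a much larger `n_boot` would be preferable in
practice.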