--- title: "Getting started with seqcomp" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting started with seqcomp} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4 ) ``` ```{r setup} library(seqcomp) ``` ## Overview `seqcomp` compares two sequential probabilistic forecasters. The basic question is: > As outcomes arrive over time, is one forecaster consistently scoring better > than the other? The package follows the convention that all scores are positively oriented: larger scores are better. For two forecasters `p` and `q`, the pointwise score difference is \[ \hat{\delta}_t = S(p_t, y_t) - S(q_t, y_t). \] Positive values favour forecaster `p`; negative values favour forecaster `q`. ## A simple binary forecasting example Suppose we observe a sequence of binary outcomes. ```{r} set.seed(1) n <- 300 y <- rbinom(n, size = 1, prob = 0.55) ``` Now create two forecasters. Forecaster `p` is mildly informative: it tends to assign higher probability when the event occurs and lower probability when it does not. Forecaster `q` is less informative and stays closer to 0.5. ```{r} p <- ifelse(y == 1, 0.62, 0.38) q <- rep(0.50, n) head(data.frame(y = y, p = p, q = q)) ``` This toy example is deliberately simple. In a real application, `p` and `q` would come from two forecasting models, analysts, or institutions. ## Compare the forecasts The easiest workflow is to use `compare_forecasts()`. ```{r} cmp <- compare_forecasts( p = p, q = q, y = y, scoring_rule = "brier" ) head(cmp) ``` The output contains one row per time point. The most important columns are: - `score_p`: the score of forecaster `p`, - `score_q`: the score of forecaster `q`, - `delta`: the score difference `score_p - score_q`, - `estimate`: the running mean score difference, - `lower` and `upper`: the confidence sequence, - `e_pq` and `e_qp`: the two one-sided e-processes. ## Interpreting the confidence sequence The confidence sequence tracks the running average score advantage. ```{r} plot( cmp$t, cmp$estimate, type = "l", ylim = range(c(cmp$lower, cmp$upper, 0), finite = TRUE), xlab = "Time", ylab = "Mean score difference", main = "Sequential comparison using the Brier score" ) lines(cmp$t, cmp$lower, lty = 2) lines(cmp$t, cmp$upper, lty = 2) abline(h = 0, col = "gray50") ``` If the whole interval lies above zero, the data favour `p`. If the whole interval lies below zero, the data favour `q`. If the interval still contains zero, the comparison is not yet decisive at the chosen level. ## Interpreting the e-process The e-process is an evidence process. Larger values mean stronger evidence against the corresponding null hypothesis. For a two-sided comparison at level `alpha`, the rejection threshold used by `eprocess()` is `2 / alpha`. ```{r} alpha <- 0.05 threshold <- 2 / alpha threshold ``` The column `e_pq` gives evidence that `p` outperforms `q`. The column `e_qp` gives evidence that `q` outperforms `p`. ```{r} plot( cmp$t, cmp$e_pq, type = "l", log = "y", xlab = "Time", ylab = "e-process value", main = "Evidence that p outperforms q" ) abline(h = threshold, lty = 2, col = "gray50") ``` We can summarize rejection times with `eprocess_rejections()`. ```{r} eprocess_rejections(cmp, alpha = alpha) ``` ## Using lower-level functions directly The wrapper is convenient, but the package also exposes the individual pieces. First compute the scores. ```{r} score_p <- brier_score(p, y) score_q <- brier_score(q, y) head(score_p) head(score_q) ``` Then construct a confidence sequence. ```{r} cs <- cs_bernstein( scores1 = score_p, scores2 = score_q, alpha = 0.05, c = 2 ) head(cs) ``` And construct an e-process. ```{r} ep <- eprocess( scores1 = score_p, scores2 = score_q, alpha = 0.05, c = 2 ) head(ep) ``` This gives the same core information as `compare_forecasts()`, but with more manual control. ## Choosing a scoring rule For binary probability forecasts, `"brier"` is the safest starting point. ```{r} cmp_brier <- compare_forecasts(p, q, y, scoring_rule = "brier") tail(cmp_brier) ``` The spherical score is also bounded and can be used with finite-sample confidence sequences and e-processes. ```{r} cmp_spherical <- compare_forecasts(p, q, y, scoring_rule = "spherical") tail(cmp_spherical) ``` The logarithmic score is unbounded. Therefore `compare_forecasts()` uses an asymptotic confidence sequence by default and does not compute an e-process. ```{r} cmp_log <- compare_forecasts( p = p, q = q, y = y, scoring_rule = "log", compute_e = FALSE ) tail(cmp_log) ``` For binary log-score comparisons where the Winkler construction is appropriate, use `winkler_compare()`. ```{r} wcmp <- winkler_compare(p, q, y) names(wcmp) ``` ## Practical guidance Use `compare_forecasts()` for the common workflow. Use the lower-level functions when you want more control over the scoring rule, confidence sequence, e-process, lag handling, or predictable bounds. For bounded binary or categorical scores such as Brier and spherical scores, finite-sample confidence sequences and e-processes are available. For unbounded scores such as the log score, QLIKE, tick loss, and CRPS, be more careful. Use asymptotic confidence sequences, Winkler scores where applicable, or problem-specific predictable bounds.