--- title: "Getting Started" author: "Matthew Cornell" date: "2022-04-04" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message=FALSE, warning=FALSE, eval = nzchar(Sys.getenv("IS_DEVELOPMENT_MACHINE")) ) ``` # Getting Started with zoltr zoltr is an R package that simplifies access to the [zoltardata.com](https://www.zoltardata.com/) API. This vignette takes you through the package's main features. So that you can experiment without needing a Zoltar account, we use the example project from [docs.zoltardata.com](https://docs.zoltardata.com/), which should always be available for public read-only access. # Setting up your account You need to have an account on Zoltar and be authenticated to the server in order to access data from the API. Once you have an account, we recommend storing your Zoltar username and password in your .Renviron file. In practice this means having a file named `.Renviron` in your home directory. (You can read more about R and environment variables [here](https://db.rstudio.com/best-practices/managing-credentials/#use-environment-variables).) The lines of code in this vignette will work if you have the following two lines somewhere in your `.Renviron` file (where you replace your username and password in the appropriate locations). Note there is no space around the `=` sign: ```bash Z_USERNAME=insert-your-username-here Z_PASSWORD=insert-your-password-here ``` Note that the Zoltar service uses a "token"-based scheme for authentication. These tokens have a five-minute expiration for security, which requires re-authentication after that period of time. The zoltr library takes care of re-authenticating as needed by passing your username and password back to the server to get another token. Note that the connection object returned by the `new_connection` function stores a token internally, so be careful if saving that object into a file. ## Connect to the host and authenticate The starting point for working with Zoltar's API is a `ZoltarConnection` object, obtained via the `new_connection` function. Most zoltr functions take a `ZoltarConnection` along with the API _URL_ of the thing of interest, e.g., a project, model, or forecast. API URLs look like `https://www.zoltardata.com/api/project/3/`, which is that of the "Docs Example Project". An important note regarding URLs: zoltr's convention for URLs is to require a trailing slash character ('/') on all URLs. The only exception is the optional `host` parameter passed to `new_connection()`. Thus, `https://www.zoltardata.com/api/project/3/` is valid, but `https://www.zoltardata.com/api/project/3` is not. You can obtain a URL using some of the `*_info` functions, and you can always use the web interface to navigate to the item of interest and look at its URL in the browser address field. Keep in mind that you'll need to add `api` to the browsable address, along with the trailing slash character. For example, if you browsed the _Docs Example Project_ project at (say) `https://www.zoltardata.com/project/3` then its API for use in zoltr would be `https://www.zoltardata.com/api/project/3/`. ```{r setup, include=FALSE} library(httr) # o/w devtools::check() gets `could not find function "POST"` error ``` ```{r, include=FALSE} library(zoltr) zoltar_connection <- new_connection(host = Sys.getenv("Z_HOST")) zoltar_authenticate(zoltar_connection, Sys.getenv("Z_USERNAME"), Sys.getenv("Z_PASSWORD")) zoltar_connection ``` ```{r, eval=FALSE, include=TRUE} library(zoltr) zoltar_connection <- new_connection() zoltar_authenticate(zoltar_connection, Sys.getenv("Z_USERNAME"), Sys.getenv("Z_PASSWORD")) zoltar_connection ``` ## Get a list of all projects on the host Now that you have a connection, you can use the `projects()` function to get all projects as a `data.frame`. Note that it will only list those that you are authorized to access, i.e., all public projects plus any private ones that you own or are a model owner. ```{r} the_projects <- projects(zoltar_connection) str(the_projects) ``` ## Get a project to work with and list its info and models Let's start by getting a public project to work with. We will search the projects list for it by name. Then we will pass its URL to the `project_info()` function to get a `list` of details, and then pass it to the `models()` function to get a `data.frame` of its models. ```{r} project_url <- the_projects[the_projects$name == "Docs Example Project", "url"] the_project_info <- project_info(zoltar_connection, project_url) names(the_project_info) the_project_info$description the_models <- models(zoltar_connection, project_url) str(the_models) ``` There is other project-related information that you can access, such as its configuration (`zoltar_units()`, `targets()`, and `timezeros()` - concepts that are explained at [docs.zoltardata.com](https://docs.zoltardata.com/) - and `truth()` ## Query a project's forecast data You can query a project's forecast data using the `submit_query()` function. Keep in mind that Zoltar enqueues long operations like querying and uploading forecasts, which keeps the site responsive but makes the Zoltar API a little more complicated. Rather than having the `submit_query()` function _block_ until the query is done, you instead get a quick response in the form of a `Job` URL that you can pass to the `job_info()` function to check its status and find out if the upload is pending, successfully finished, or failed. (This is called _polling_ the host to ask the status.) Here we poll every second using the `busy_poll_job()` helper function. Then we use the `job_data()` function when the query is successfully completed to get the results as a `data.frame`. > Note: You may find the `do_zoltar_query()` function helpful, which combines `submit_query()`, `busy_poll_job()`, and `job_data()` in one call. Putting it together, we'll show the long way to do it (for reference) but use `do_zoltar_query()` to actually run the example: ```{r, eval=FALSE, include=TRUE} query <- list("targets" = list("pct next week", "cases next week"), "types" = list("point")) job_url <- submit_query(zoltar_connection, project_url, "forecasts", query) busy_poll_job(zoltar_connection, job_url) the_job_data <- job_data(zoltar_connection, job_url) the_job_data ``` ```{r} forecast_data <- do_zoltar_query(zoltar_connection, project_url, "forecasts", "docs_mod", c("loc1", "loc2"), c("pct next week", "cases next week"), c("2011-10-02", "2011-10-09", "2011-10-16"), types = c("point", "quantile")) forecast_data ``` Hopefully you'll see "SUCCESS" eventually printed and then the resulting data itself. > Note: Zoltar returns a 404 Not Found error if `job_data()` is called on a Job that has no underlying data file (Zoltar saves query results as temporary files on the server). This can happen for two reasons: 1) 24 hours has passed (the expiration time for temporary files) or 2) the Job is not complete and therefore there is no data file yet. As noted above, you can avoid the latter condition by using `busy_poll_job()` to ensure the job is done. > Note: Zoltar limits the number of rows a query can return, giving you an error if they are exceeded. The job's failure message will indicate whether this has happened. ## Query a project's truth data Similarly, querying truth is done by passing a `query_type` of `"truth"`. Further, only the `units`, `targets`, `timezeros`, and `as_of` args are allowed: ```{r} truth_data <- do_zoltar_query(zoltar_connection, project_url, "truth", NULL, c("loc1", "loc2"), c("pct next week", "cases next week"), c("2011-10-02", "2011-10-09", "2011-10-16"), "2020-12-18 12:00:00 UTC") truth_data ``` ## Get project's latest forecast IDs and their sources This is a somewhat specialized function that returns the `ID` and `source` of the latest versions of a project's forecasts. (Later we may generalize to allow passing specific columns to retrieve, such as 'forecast_model_id', 'time_zero_id', 'issued_at', 'created_at', 'source', and 'notes'.) ```{r} the_latest_forecasts <- latest_forecasts(zoltar_connection, project_url) the_latest_forecasts ``` ## Get a model to work with and list its info and forecasts Now let's work with a particular model, getting its URL by name and then passing it to the `model_info()` function to get details. Then use the `forecasts()` function to get a `data.frame` of that model's forecasts (there is only one). Note that obtaining the model's URL is straightforward because it is provided in the `url` column of `the_models`. ```{r} model_url <- the_models[the_models$name == "docs forecast model", "url"] the_model_info <- model_info(zoltar_connection, model_url) names(the_model_info) the_model_info$name the_forecasts <- forecasts(zoltar_connection, model_url) str(the_forecasts) ``` ## Finally, download the forecast's data in three formats You can get forecast data using the `download_forecast()` function, which returns a nested `list` format that corresponds to Zoltar's native JSON one. That format can be converted to a CSV-friendly `data.frame` via `data_frame_from_forecast_data()`, which can represent all prediction types, or `quantile_data_frame_from_forecast_data()` for users who are mainly interested in `point` and `quantile` data. Please see [docs.zoltardata.com](https://docs.zoltardata.com/) for forecast format details. ```{r} forecast_url <- the_forecasts[1, "url"] forecast_info <- forecast_info(zoltar_connection, forecast_url) forecast_data <- download_forecast(zoltar_connection, forecast_url) length(forecast_data$predictions) ``` As a `data.frame`: ```{r} forecast_data_frame <- data_frame_from_forecast_data(forecast_data) str(forecast_data_frame) ``` And just quantile data: ```{r} forecast_data_frame <- quantile_data_frame_from_forecast_data(forecast_data) str(forecast_data_frame) ```