---
title: "Get started with steves"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Get started with steves}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4
)
```

```{r setup}
library(steves)
library(dplyr)
library(ggplot2)
```

> Source: Rick Steves' Europe (compiled dataset). This dataset was created
> from public sources for teaching purposes and is not an official or
> verified Rick Steves' Europe dataset. It is shared with the permission of
> the Rick Steves' Europe team.

## What's in `episodes`?

```{r}
glimpse(episodes)
```

One row per episode, 13 seasons, 159 episodes spanning 2000–2025. Most columns fall into a few groups:

- **Identity**: `overall_episode`, `season`, `episode_in_season`, `season_episode_code`.
- **Editorial**: `title`, `synopsis`, `theme_tags`, `region`, `primary_destination`, `episode_type`, `is_retired`.
- **Geography**: `primary_country`, `all_countries`, `iso2`, `flag`, `lat`, `long`, `geo_match`.
- **Dates**: `original_air_date`, `air_year`, `air_month`, `air_weekday`, plus derived gap and span fields.
- **IMDB**: `imdb_rating`, `imdb_votes`, `imdb_rating_shrunk`, `imdb_low_votes`, `imdb_url`, `imdb_tconst`.
- **Canonical bests**: `image_url`, `best_summary`, `best_runtime`, plus `*_source` provenance flags.

## How often does Rick Steves visit each country?

```{r}
episodes |>
  count(primary_country, flag, sort = TRUE) |>
  head(10)
```

```{r country-bar, fig.height = 5, fig.alt = "Horizontal bar chart of the number of Rick Steves' Europe episodes set in each country, sorted from most to fewest."}
episodes |>
  count(primary_country) |>
  filter(primary_country != "Multiple") |>
  mutate(primary_country = forcats::fct_reorder(primary_country, n)) |>
  ggplot(aes(n, primary_country)) +
  geom_col(fill = "#1B3A6B") +
  labs(title = "Episodes per country", x = "Episodes", y = NULL) +
  theme_minimal()
```

## Are highly-rated episodes a particular kind of place?

`imdb_rating_shrunk` pulls noisy ratings (some episodes have only 5 votes) toward the show mean. It is always populated, so it sorts cleanly.

```{r}
episodes |>
  filter(!is.na(lat)) |>
  group_by(region) |>
  summarise(n = n(), median_rating = median(imdb_rating_shrunk)) |>
  arrange(desc(median_rating))
```

## Mapping the show

`lat` and `long` are filled for ~93% of episodes. Compilation episodes ("Travel Skills Special", "Why We Travel") are intentionally `NA` because there is no single coordinate.

```{r leaflet, eval = requireNamespace("leaflet", quietly = TRUE)}
library(leaflet)

episodes |>
  filter(!is.na(lat)) |>
  leaflet() |>
  addTiles() |>
  addCircleMarkers(
    ~long, ~lat,
    radius = ~ pmax(3, imdb_rating_shrunk - 5),
    popup = ~ sprintf("%s<br>%s %s<br>%s", title, flag, primary_country, best_summary),
    color = "#1B3A6B", fillOpacity = 0.6, stroke = FALSE
  )
```

## Production cadence

```{r cadence, fig.alt = "Bar chart of episodes aired per calendar year, showing seasonal production cadence from 2000 to 2025."}
episodes |>
  count(air_year) |>
  ggplot(aes(air_year, n)) +
  geom_col(fill = "#FFC72C") +
  labs(title = "Episodes aired per year", x = NULL, y = "Episodes") +
  theme_minimal()
```
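
## What does shrinking a rating look like?

The `imdb_rating_shrunk` column is described above as pulling noisy ratings toward the show mean. This vignette does not state the exact rule the package uses, so the chunk below is only a sketch of one common approach: a vote-weighted average of each episode's rating and the overall mean. The helper `shrink_rating()`, the constant `prior_votes = 25`, the fallback to the mean for unrated episodes, and the `toy` data are all assumptions made for illustration; none of them come from `steves`.

```{r shrinkage-sketch}
# Hypothetical helper, not part of steves. Blends each rating with the
# overall mean; episodes with few votes lean more heavily on the mean.
# `prior_votes` is an assumed tuning constant, not the package's value.
shrink_rating <- function(rating, votes, prior_votes = 25) {
  overall_mean <- mean(rating, na.rm = TRUE)
  weight <- votes / (votes + prior_votes)
  shrunk <- weight * rating + (1 - weight) * overall_mean
  # Assumption: unrated episodes fall back to the overall mean, which
  # would keep the column fully populated.
  ifelse(is.na(shrunk), overall_mean, shrunk)
}

# Made-up ratings and vote counts, for illustration only.
toy <- data.frame(
  rating = c(9.5, 8.0, 7.0),
  votes  = c(5, 200, 50)
)
toy$shrunk <- shrink_rating(toy$rating, toy$votes)
toy
```

Under these assumptions the 5-vote row moves most of the way toward the mean, while the 200-vote row barely changes. A larger `prior_votes` would shrink every row harder.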