Title: | Modeling and Forecasting Visitor Counts Using Social Media |
---|---|
Description: | Performs modeling and forecasting of park visitor counts using social media data and (partial) on-site visitor counts. Specifically, the model is built based on an automatic decomposition of the trend and seasonal components of the social media-based park visitor counts, from which short-term forecasts of the visitor counts and percent changes in the visitor counts can be made. A reference for the underlying model that 'VisitorCounts' uses can be found at Russell Goebel, Austin Schmaltz, Beth Ann Brackett, Spencer A. Wood, Kimihiro Noguchi (2023) <doi:10.1002/for.2965>. |
Authors: | Robert Bowen [aut, cre], Russell Goebel [aut], Beth Ann Brackett [ctb], Kimihiro Noguchi [aut], Dylan Way [aut] |
Maintainer: | Robert Bowen <[email protected]> |
License: | GPL-3 |
Version: | 2.0.2 |
Built: | 2024-10-31 22:25:14 UTC |
Source: | CRAN |
Automatically decomposes a time series using singular spectrum analysis. See package Rssa for details on singular spectrum analysis.
auto_decompose( time_series, suspected_periods = c(12, 6, 4, 3), proportion_of_variance_type = c("leave_out_first", "total"), max_proportion_of_variance = 0.995, log_ratio_cutoff = 0.2, window_length = "auto", num_trend_components = 2 )
time_series |
A vector which stores the time series of interest in the log scale. |
suspected_periods |
A vector which stores the suspected periods in descending order of importance. The default option is c(12,6,4,3), corresponding to 12, 6, 4, and 3 months if observations are monthly. |
proportion_of_variance_type |
A character string specifying the option for choosing the maximum number of eigenvalues based on the proportion of total variance explained. If "leave_out_first" is chosen, then the contribution made by the first eigenvector is ignored; otherwise, if "total" is chosen, then the contribution made by all the eigenvectors is considered. |
max_proportion_of_variance |
A numeric specifying the maximum proportion of total variance explained, computed using the method specified in proportion_of_variance_type. The default option is 0.995. |
log_ratio_cutoff |
A numeric specifying the threshold for the deviation between the estimated period and candidate periods in suspected_periods. The default option is 0.2, which means that, if the absolute log ratio between the estimated and candidate period is within 0.2 (approximately a 20% difference), then the estimated period is deemed equal to the candidate period. |
window_length |
A character string or positive integer specifying the window length for the SSA estimation. If "auto" is chosen, then the algorithm automatically selects the window length by taking a multiple of 12 which does not exceed half the length of time_series. The default option is "auto". |
num_trend_components |
A positive integer specifying the number of eigenvectors to be chosen for describing the trend in SSA. The default option is 2. |
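The "auto" window length and the proportion-of-variance rule described above can be sketched in base R as follows. This is a minimal sketch, not the package's code: it assumes that "auto" picks the largest multiple of 12 not exceeding half the series length, and that the number of eigenvalues is the smallest count whose cumulative share of variance reaches max_proportion_of_variance. The helper names auto_window_length() and n_eigenvalues() are hypothetical.

```r
# Hypothetical helper: largest multiple of `period` not exceeding half the series
# length (assumes "auto" means the *largest* such multiple).
auto_window_length <- function(n, period = 12) {
  floor((n / 2) / period) * period
}

# Hypothetical helper: number of eigenvalues needed to explain `max_prop` of the
# variance. With "leave_out_first", the first eigenvalue (typically dominated by
# the level/trend) is excluded from the cumulative share and then counted back in.
n_eigenvalues <- function(eigenvalues, max_prop = 0.995,
                          type = c("leave_out_first", "total")) {
  type <- match.arg(type)
  ev <- if (type == "leave_out_first") eigenvalues[-1] else eigenvalues
  k <- which(cumsum(ev) / sum(ev) >= max_prop)[1]
  if (type == "leave_out_first") k + 1 else k
}

auto_window_length(156)  # 72 for 13 years of monthly data
eig <- c(100, 5, 3, 1, 0.5, 0.01)  # made-up eigenvalues
n_eigenvalues(eig, type = "leave_out_first")  # 5
n_eigenvalues(eig, type = "total")            # 4
```

Leaving out the first eigenvalue makes the rule more demanding: because the first eigenvalue usually dominates the total, counting it would let the threshold be reached with fewer components.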
reconstruction |
A list containing important information about the reconstructed time series. In particular, it contains the reconstructed main trend component, overall trend component, seasonal component for each period specified in suspected_periods, and overall seasonal component. |
grouping |
A matrix containing information about the locations of the eigenvalue groups for each period in suspected_periods and trend component. The locations are indicated by '1'. |
window_length |
A numeric indicating the window length. |
ts_ssa |
An ssa object storing the singular spectrum analysis decomposition. |
data("park_visitation")
### Decompose National Park Service visitor counts and Flickr photo-user-days
# parameters ---------------------------------------------
suspected_periods <- c(12, 6, 4, 3)
proportion_of_variance_type <- "leave_out_first"
max_proportion_of_variance <- 0.995
log_ratio_cutoff <- 0.2
# load data ----------------------------------------------
park <- "YELL" # for Yellowstone National Park
nps_ts <- ts(park_visitation[park_visitation$park == park, ]$nps, start = 2005, frequency = 12)
nps_ts <- log(nps_ts)
pud_ts <- ts(park_visitation[park_visitation$park == park, ]$pud, start = 2005, frequency = 12)
pud_ts <- log(pud_ts)
# decompose time series and plot decompositions -----------
decomp_pud <- auto_decompose(pud_ts,
  suspected_periods = suspected_periods,
  proportion_of_variance_type = proportion_of_variance_type,
  max_proportion_of_variance = max_proportion_of_variance,
  log_ratio_cutoff = log_ratio_cutoff
)
plot(decomp_pud)
decomp_nps <- auto_decompose(nps_ts,
  suspected_periods = suspected_periods,
  proportion_of_variance_type = proportion_of_variance_type,
  max_proportion_of_variance = max_proportion_of_variance,
  log_ratio_cutoff = log_ratio_cutoff
)
plot(decomp_nps)
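The log_ratio_cutoff rule for auto_decompose can be illustrated with a small base R helper. It is a sketch under stated assumptions, not the package's implementation: match_period() is a hypothetical name, and when several candidates fall within the cutoff it assumes the first one (the most important, given the ordering of suspected_periods) wins. A cutoff of 0.2 accepts ratios between exp(-0.2), about 0.82, and exp(0.2), about 1.22.

```r
# Hypothetical helper: map an estimated period onto the first candidate in
# suspected_periods whose absolute log ratio is within log_ratio_cutoff.
# Returns NA when no candidate is close enough.
match_period <- function(estimated_period,
                         suspected_periods = c(12, 6, 4, 3),
                         log_ratio_cutoff = 0.2) {
  close_enough <- abs(log(estimated_period / suspected_periods)) <= log_ratio_cutoff
  if (any(close_enough)) suspected_periods[which(close_enough)[1]] else NA
}

match_period(11.2)  # 12: |log(11.2 / 12)| is about 0.069
match_period(5.1)   # 6:  |log(5.1 / 6)| is about 0.163
match_period(8)     # NA: 8 is more than 0.2 away (log scale) from every candidate
```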
Check arguments.
check_arguments( popularity_proxy, onsite_usage, constant, omit_trend, trend, ref_series, is_input_logged, ... )
popularity_proxy |
A vector which stores a time series which may be used as a proxy for the monthly popularity of social media over time. The length of |
onsite_usage |
A vector which stores monthly on-site usage for a particular social media platform and recreational site. |
constant |
A numeric specifying the constant term (beta0) in the model. This constant is understood as the mean log adjusted monthly visitation relative to the base month. The default option is 0, implying that the (logged) |
omit_trend |
This is obsolete and is left only for compatibility. In other words, |
trend |
A character string specifying how the trend is modeled. Can be any of NULL, "linear", "none", and "estimated", where "none" and "estimated" correspond to |
ref_series |
A numeric vector specifying the original visitation series. The default option is NULL, implying that no such series is available. If such series is available, then its length must be the same as that of |
is_input_logged |
A Boolean specifying whether the input is logged. |
... |
Additional arguments. |
No return value, called for extra information.
Method for converting a time series to a data frame so that it can be plotted with ggplot2 while keeping a Date x-axis.
convert_ts_forecast_to_df(forecast)
forecast |
A time series object to convert. |
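The idea behind convert_ts_forecast_to_df can be sketched in base R: map each observation's time index to a calendar Date so that ggplot2 can use a date scale on the x-axis. This is a hypothetical helper (ts_to_date_df), not the package's function, and it assumes monthly data (frequency 12), dating each observation to the first of its month.

```r
# Hypothetical helper (not the package's convert_ts_forecast_to_df): turn a
# monthly ts into a data frame with a Date column usable as a ggplot2 x-axis.
ts_to_date_df <- function(x) {
  stopifnot(is.ts(x), frequency(x) == 12)  # this sketch handles monthly data only
  # Round to whole months to avoid floating-point drift in time(x).
  month_index <- round(as.numeric(time(x)) * 12)
  year <- month_index %/% 12
  month <- month_index %% 12 + 1
  data.frame(
    date = as.Date(sprintf("%d-%02d-01", year, month)),
    value = as.numeric(x)
  )
}

df <- ts_to_date_df(ts(c(5, 7, 9), start = c(2005, 11), frequency = 12))
df$date  # "2005-11-01" "2005-12-01" "2006-01-01"
```

Working with an integer month index rather than the fractional years returned by time() avoids off-by-one months caused by rounding at year boundaries.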
Decomposes the popularity proxy time series into trend and seasonality components.
decompose_proxy( onsite_usage, popularity_proxy = NULL, suspected_periods = c(12, 6, 4, 3), proportion_of_variance_type = c("leave_out_first", "total"), max_proportion_of_variance = 0.995, log_ratio_cutoff = 0.2, window_length = "auto", num_trend_components = 2, criterion = c("cross-correlation", "MSE", "rank"), possible_lags = -36:36, leave_off = 6, estimated_change = 0, order_of_polynomial_approximation = 7, order_of_derivative = 1, ref_series = NULL, constant = 0, beta = "estimate", slope = 0, is_input_logged = FALSE, spline = FALSE, parameter_estimates = c("separate", "joint"), omit_trend = TRUE, trend = c("linear", "none", "estimated"), onsite_usage_decomposition, ... )
onsite_usage |
A vector which stores monthly on-site usage for a particular social media platform and recreational site. |
popularity_proxy |
A vector which stores a time series which may be used as a proxy for the monthly popularity of social media over time. The length of |
suspected_periods |
A vector which stores the suspected periods in the descending order of importance. The default option is c(12,6,4,3), corresponding to 12, 6, 4, and 3 months if observations are monthly. |
proportion_of_variance_type |
A character string specifying the option for choosing the maximum number of eigenvalues based on the proportion of total variance explained. If "leave_out_first" is chosen, then the contribution made by the first eigenvector is ignored; otherwise, if "total" is chosen, then the contribution made by all the eigenvectors is considered. |
max_proportion_of_variance |
A numeric specifying the proportion of total variance explained using the method specified in proportion_of_variance_type. The default option is 0.995. |
log_ratio_cutoff |
A numeric specifying the threshold for the deviation between the estimated period and candidate periods in suspected_periods. The default option is 0.2, which means that if the absolute log ratio between the estimated and candidate period is within 0.2 (approximately a 20 percent difference), then the estimated period is deemed equal to the candidate period. |
window_length |
A character string or positive integer specifying the window length for the SSA estimation. If "auto" is chosen, then the algorithm automatically selects the window length by taking a multiple of 12 which does not exceed half the length of |
num_trend_components |
A positive integer specifying the number of eigenvectors to be chosen for describing the trend in SSA. The default option is 2. This is relevant only when |
criterion |
A character string specifying the criterion for estimating the lag in |
possible_lags |
A numeric vector specifying all the candidate lags for |
leave_off |
A positive integer specifying the number of observations to be left off when estimating the lag. The default option is 6. This is relevant only when |
estimated_change |
A numeric specifying the estimated change in the visitation trend. The default option is 0, implying no change in the trend. |
order_of_polynomial_approximation |
A numeric specifying the order of the polynomial approximation of the difference between time series used in |
order_of_derivative |
A numeric specifying the order of derivative for the approximated difference between lagged |
ref_series |
A numeric vector specifying the original visitation series. The default option is NULL, implying that no such series is available. If such series is available, then its length must be the same as that of |
constant |
A numeric specifying the constant term (beta0) in the model. This constant is understood as the mean log adjusted monthly visitation relative to the base month. The default option is 0, implying that the (logged) |
beta |
A numeric or a character string specifying the seasonality adjustment factor (beta1). The default option is "estimate", in which case it is estimated using Fisher's z-transformed lag-12 autocorrelation. Even if an actual value is supplied, if |
slope |
A numeric specifying the slope coefficient (beta2) in the model. This coefficient is applicable only when |
is_input_logged |
A Boolean describing whether the |
spline |
A Boolean specifying whether or not to use a smoothing spline for the lag estimation. This is relevant only when |
parameter_estimates |
A character string specifying how to estimate the beta and constant parameters when a reference series is supplied. Both options use least squares estimates, but "separate" indicates that the differenced series should be used to estimate beta separately from the constant, while "joint" indicates that both should be estimated using the non-differenced, detrended series. |
omit_trend |
This is obsolete and is left only for compatibility. In other words, |
trend |
A character string specifying how the trend is modeled. Can be any of NULL, "linear", "none", and "estimated", where "none" and "estimated" correspond to |
onsite_usage_decomposition |
A "decomposition" class object containing decomposition data for the onsite usage time series (outputs from 'auto_decompose'). |
... |
Additional arguments to be passed onto the smoothing spline ( |
proxy_decomposition |
A "decomposition" object representing the automatic decomposition obtained from popularity_proxy (see |
lagged_proxy_trend_and_forecasts_window |
A 'ts' object storing the potentially lagged popularity proxy trend and any forecasts needed due to the lag. |
ts_trend_window |
A 'ts' object storing the trend component of the onsite social media usage. This trend component is potentially truncated to match available popularity proxy data. |
ts_seasonality_window |
A 'ts' object storing the seasonality component of the onsite social media usage. This seasonality component is potentially truncated to match available popularity proxy data. |
latest_starttime |
A 'tsp' attribute of a 'ts' object representing the latest of the two start times of the potentially lagged popularity proxy and the onsite social media usage. |
endtime |
A 'tsp' attribute of a 'ts' object representing the time of the final onsite usage observation. |
forecasts_needed |
An integer representing the number of forecasts of popularity_proxy needed to obtain all fitted values. Negative values indicate extra observations which may be useful for predictions. |
lag_estimate |
A list storing both the MSE-based and rank-based estimates of the lag. |
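The truncation of the trend and seasonality windows to the span covered by both series (see latest_starttime and endtime above) can be illustrated with stats::ts.intersect. This sketch shows only the alignment idea on made-up series; decompose_proxy may align its windows differently, for instance by forecasting the proxy when the lag requires extra observations.

```r
# Made-up monthly series: an on-site usage trend and a proxy trend that starts later.
onsite_trend <- ts(1:10, start = c(2005, 1), frequency = 12)     # Jan to Oct 2005
proxy_trend  <- ts(101:108, start = c(2005, 4), frequency = 12)  # Apr to Nov 2005

# Keep only the months covered by both series; the common window starts at the
# later of the two start times and ends at the earlier of the two end times.
aligned <- stats::ts.intersect(onsite = onsite_trend, proxy = proxy_trend)
start(aligned)  # 2005 4
end(aligned)    # 2005 10
nrow(aligned)   # 7
```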
Uses polynomial approximation and derivatives of time series objects to estimate the lag between two series.
estimate_lag( time_series1, time_series2, possible_lags, method = c("cross-correlation", "MSE", "rank"), leave_off, estimated_change = 0, order_of_polynomial_approximation = 7, order_of_derivative = 1, spline = FALSE, ... )
time_series1 |
A numeric vector which stores the time series of interest in the log scale. |
time_series2 |
A numeric vector which stores the trend proxy time series in the log scale. The length of time_series2 must be the same as that of time_series1. |
possible_lags |
A numeric vector specifying all the candidate lags for time_series2 (for example, -36:36, the default used by decompose_proxy). |
method |
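A character string specifying the method used to obtain the lag estimate. "cross-correlation" chooses the lag that maximizes the correlation between the two series, while "MSE" and "rank" compare derivatives of a polynomial approximation of the difference between time_series1 and the lagged time_series2. |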
leave_off |
A positive integer specifying the number of observations to be left off when estimating the lag. |
estimated_change |
A numeric specifying the estimated change in the visitation trend. The default option is 0, implying no change in the trend. |
order_of_polynomial_approximation |
A numeric specifying the order of the polynomial approximation of the difference between time series used in |
order_of_derivative |
A numeric specifying the order of derivative for the approximated difference between time_series1 and lagged time_series2. The default option is 1, the first derivative. |
spline |
A Boolean specifying whether or not to use a smoothing spline for the lag estimation. |
... |
Additional arguments to be passed onto the |
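The polynomial-derivative ("MSE") criterion can be sketched as follows. This is an interpretation of the argument descriptions, not the package's code. For each candidate lag, the sketch fits a polynomial of degree order_of_polynomial_approximation to the difference between time_series1 and the lagged time_series2, differentiates it once, and picks the lag whose derivative stays closest, in mean squared error, to estimated_change. The function name mse_lag_sketch, the lag sign convention (x2 at time t + k is compared with x1 at time t), and the rescaling of time to [-1, 1] are all assumptions.

```r
# Hypothetical sketch of an MSE-type lag criterion (first derivative only).
mse_lag_sketch <- function(x1, x2, possible_lags, degree = 7, estimated_change = 0) {
  n <- length(x1)
  stopifnot(length(x2) == n)
  criterion <- sapply(possible_lags, function(k) {
    # Assumed convention: compare x1[t] with x2[t + k] over the overlapping points.
    d <- if (k >= 0) x1[1:(n - k)] - x2[(1 + k):n] else x1[(1 - k):n] - x2[1:(n + k)]
    s <- seq(-1, 1, length.out = length(d))  # rescaled time for numerical stability
    b <- coef(lm(d ~ poly(s, degree, raw = TRUE)))
    b[is.na(b)] <- 0                         # drop any aliased terms
    j <- seq_len(degree)
    # Derivative of sum_j b_j s^j is sum_j j * b_j * s^(j - 1).
    deriv <- as.vector(outer(s, j - 1, "^") %*% (j * b[-1]))
    mean((deriv - estimated_change)^2)
  })
  possible_lags[which.min(criterion)]
}

# Made-up data: x2 is a quadratic trend delayed by 3 months relative to x1.
g <- function(u) (u / 10)^2
time_index <- 1:60
x1 <- g(time_index)
x2 <- g(time_index - 3)
mse_lag_sketch(x1, x2, possible_lags = -12:12)  # 3
```

Comparing derivatives of the difference, rather than the levels, makes the criterion insensitive to a constant offset between the two series, which matters when one series is a scaled proxy for the other.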
cc_lag |
A numeric indicating the estimated lag with the cross-correlation criterion. |
mse_criterion |
A numeric indicating the estimated lag with the MSE criterion. |
rank_criterion |
A numeric indicating the estimated lag with the rank criterion. |
# Generate dataset with known lag and recover this lag --------------
lag <- 3
n <- 156
start_year <- 2005
frequency <- 12
trend_function <- function(x) x^2
x <- seq(-3, 3, length.out = n)
y1 <- ts(trend_function(x), start = start_year, frequency = frequency)
y2 <- stats::lag(y1, k = lag)
# Recover lag
estimate_lag(y1, y2, possible_lags = -36:36, method = "rank", leave_off = 0, spline = FALSE)
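For comparison, the "cross-correlation" criterion can be sketched in base R: compute the correlation between the overlapping parts of the two series for every candidate lag and keep the lag with the largest correlation. This is a hypothetical sketch (cc_lag_sketch), not the package's estimate_lag(); in particular, its sign convention (x2 at time t + k is compared with x1 at time t) may differ from the package's.

```r
# Hypothetical sketch of a cross-correlation lag criterion (not the package's code).
cc_lag_sketch <- function(x1, x2, possible_lags) {
  n <- length(x1)
  stopifnot(length(x2) == n)
  cors <- sapply(possible_lags, function(k) {
    # Assumed convention: compare x1[t] with x2[t + k] over the overlapping points.
    if (k >= 0) cor(x1[1:(n - k)], x2[(1 + k):n]) else cor(x1[(1 - k):n], x2[1:(n + k)])
  })
  possible_lags[which.max(cors)]
}

set.seed(1)
base <- cumsum(rnorm(200))  # made-up random-walk trend
x1 <- base[4:153]
x2 <- base[1:150]           # x2[t + 3] equals x1[t]
cc_lag_sketch(x1, x2, possible_lags = -12:12)  # 3
```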
Estimate the two parameters (y-intercept and seasonality factor) for the visitation model.
estimate_parameters( popularity_proxy_decomposition_data = NULL, onsite_usage, onsite_usage_decomposition, omit_trend, trend, ref_series, constant, beta, slope, parameter_estimates, is_input_logged, ... )
popularity_proxy_decomposition_data |
A "decomposition" class object containing decomposition data for the popularity proxy time series (outputs from |
onsite_usage |
A vector which stores monthly onsite usage for a particular social media platform and recreational site. |
onsite_usage_decomposition |
A "decomposition" class object containing decomposition data for the monthly onsite usage time series (outputs from |
omit_trend |
This is obsolete and is left only for compatibility. In other words, |
trend |
A character string specifying how the trend is modeled. Can be any of NULL, "linear", "none", and "estimated", where "none" and "estimated" correspond to |
ref_series |
A numeric vector specifying the original visitation series. The default option is NULL, implying that no such series is available. If such series is available, then its length must be the same as that of |
constant |
A numeric specifying the constant term (beta0) in the model. This constant is understood as the mean log adjusted monthly visitation relative to the base month. The default option is 0, implying that the (logged) |
beta |
A numeric or a character string specifying the seasonality adjustment factor (beta1). The default option is "estimate", in which case it is estimated using Fisher's z-transformed lag-12 autocorrelation. Even if an actual value is supplied, if |
slope |
A numeric specifying the slope coefficient (beta2) in the model. This coefficient is applicable only when |
parameter_estimates |
A character string specifying how to estimate the beta and constant parameters when a reference series is supplied. Both options use least squares estimates, but "separate" indicates that the differenced series should be used to estimate beta separately from the constant, while "joint" indicates that both should be estimated using the non-differenced, detrended series. |
is_input_logged |
A Boolean specifying whether the input is logged. |
... |
Additional arguments. |
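The default beta = "estimate" relies on Fisher's z-transform of the lag-12 autocorrelation. The sketch below shows only that ingredient, z = atanh(r), computed with stats::acf on a made-up seasonal series. Which series the package computes r on, and how it maps z to beta, are not documented on this page and are not reproduced here.

```r
# Made-up log-scale series with a 12-month cycle (13 years of monthly data).
x <- ts(sin(2 * pi * (1:156) / 12), start = 2005, frequency = 12)

# Lag-12 sample autocorrelation; acf() stores lag 0 first, so lag 12 is element 13.
r12 <- acf(x, lag.max = 12, plot = FALSE)$acf[13]

# Fisher's z-transform: z = 0.5 * log((1 + r) / (1 - r)) = atanh(r).
z12 <- atanh(r12)
c(r12 = r12, z12 = z12)  # r12 = 144/156 (about 0.923), z12 = log(5) (about 1.609)
```

For this pure 12-month cycle, r12 equals 144/156 because acf() divides by the full series length while only 144 pairs of observations are 12 months apart.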
lagged_proxy_trend_and_forecasts_window |
A 'ts' object storing the potentially lagged popularity proxy trend and any forecasts needed due to the lag. |
ts_trend_window |
A 'ts' object storing the trend component of the onsite social media usage. This trend component is potentially truncated to match available popularity proxy data. |
ts_seasonality_window |
A 'ts' object storing the seasonality component of the onsite social media usage. This seasonality component is potentially truncated to match available popularity proxy data. |
latest_starttime |
A 'tsp' attribute of a 'ts' object representing the latest of the two start times of the potentially lagged popularity proxy and the onsite social media usage. |
endtime |
A 'tsp' attribute of a 'ts' object representing the time of the final onsite usage observation. |
beta |
A numeric storing the estimated seasonality adjustment factor. |
constant |
A numeric storing estimated constant term used in the model. |
slope |
A numeric storing the estimated slope term used in the model. Applicable when the trend parameter is "linear". Otherwise, NULL is returned. |
Fit the visitation model.
fit_model( parameter_estimates_and_time_series_windows, omit_trend, trend, is_input_logged, ... )
parameter_estimates_and_time_series_windows |
A list storing the outputs of |
omit_trend |
This is obsolete and is left only for compatibility. In other words, |
trend |
A character string specifying how the trend is modeled. Can be any of NULL, "linear", "none", and "estimated", where "none" and "estimated" correspond to |
is_input_logged |
A Boolean specifying whether the input is logged. |
... |
Additional arguments |
visitation_fit |
A vector storing the fitted values of the visitation model. |
A time series representing the popularity of Flickr in the United States, as measured in user-days. Here, user-days count the number of unique users posting on Flickr on a given day.
flickr_userdays
A time series object with 156 observations.
Flickr. (2019). Retrieved October, 2019, from https://flickr.com/
A data frame storing monthly visitation counts by National Forest Service (NFS) for 4 popular US national forests and associated Flickr photo-user-days (PUD). Here, photo-user-days (PUD) count the number of unique users posting a photo on Flickr on a given day from within the boundaries of a given National Forest.
forest_visitation
A data frame with 995 observations and 4 variables.
Date of monthly observation, in year-month-day format.
National Forest three-letter identifier code, except for San Juan County, which is labeled as SJC.
Flickr photo-user-days (PUD). Here, PUD count the number of unique users posting a photo on Flickr on a given day from within the boundaries of a given National Forest.
Annual visitation count for the corresponding forest and year given by the National Forest Service (NFS), distributed monthly using the PUD as a proxy.
Flickr (2022). Retrieved August, 2022, from https://flickr.com/
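One plausible reading of "distributed monthly using the PUD as a proxy" is proportional allocation: each month receives the annual count times that month's share of the year's PUD. The sketch below assumes that rule and uses made-up numbers; the documentation does not spell out the exact allocation, including how months with zero PUD are handled, and allocate_annual() is a hypothetical name.

```r
# Hypothetical proportional allocation of an annual visitation count to months.
allocate_annual <- function(annual_count, monthly_pud) {
  stopifnot(length(monthly_pud) == 12, sum(monthly_pud) > 0)
  annual_count * monthly_pud / sum(monthly_pud)
}

pud <- c(2, 2, 4, 6, 10, 20, 30, 30, 16, 8, 4, 4)  # made-up photo-user-days, sum 136
monthly <- allocate_annual(680000, pud)
monthly[7]    # 150000: July holds 30/136 of the year's PUD
sum(monthly)  # 680000: the monthly values add back up to the annual count
```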
Generates proxy trend forecasts from objects of the class "visitation_model".
generate_proxy_trend_forecasts( object, n_ahead, starttime, endtime, proxy_trend_correction, ts_frequency )
object |
A visitation model object. |
n_ahead |
The number of desired forecasts. |
starttime |
The start time of the desired forecasts. |
endtime |
The end time of the desired forecasts. |
proxy_trend_correction |
The lag correction needed on the proxy trend. |
ts_frequency |
Frequency of the time series to forecast. |
A time series object storing forecasts for the proxy trend.
Imputation by replacing negative infinities with appropriate numbers.
imputation(x)
x |
A numeric vector (usually the log visitation counts or photo-user-days). |
A numeric vector with the negative infinities replaced with appropriate numbers.
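Negative infinities typically appear after taking logs of months with zero counts, since log(0) is -Inf. This page does not say which replacement values imputation() uses. As one possible approach, the sketch below (a hypothetical helper, impute_neg_inf, not the package's rule) linearly interpolates each -Inf from its finite neighbours with stats::approx and carries the nearest finite value at the ends.

```r
# Hypothetical -Inf imputation by linear interpolation (not the package's rule).
impute_neg_inf <- function(x) {
  bad <- is.infinite(x) & x < 0
  if (!any(bad)) return(x)
  ok <- which(!bad)
  stopifnot(length(ok) >= 2)  # approx() needs at least two finite points
  # rule = 2 carries the nearest finite value beyond the first/last finite point.
  x[bad] <- approx(ok, x[ok], xout = which(bad), rule = 2)$y
  x
}

logged <- log(c(4, 0, 16, 0))  # 1.386, -Inf, 2.773, -Inf
exp(impute_neg_inf(logged))    # 4 8 16 16
```

Interpolating on the log scale fills a zero month with the geometric mean of its neighbours (8 between 4 and 16), which is consistent with modeling the series in logs.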
Class for visitation_model predictions (for use with predict.visitation_model()).
label_visitation_forecast(visitation_forecast, label)
visitation_forecast |
A visitation_forecast object. |
label |
A character string giving the label of the forecast. |
Object of class "visitation_forecast_ensemble".
Constructs objects of the "decomposition" class.
new_decomposition(reconstruction_list, grouping_matrix, window_length, ts_ssa)
reconstruction_list |
A list containing important information about the reconstructed time series. In particular, it contains the reconstructed main trend component, overall trend component, seasonal component for each period specified in suspected_periods, and overall seasonal component. |
grouping_matrix |
A matrix containing information about the locations of the eigenvalue groups for each period in suspected_periods and trend component. The locations are indicated by '1'. |
window_length |
A numeric indicating the window length. |
ts_ssa |
An object of the class "ssa". |
A list of the class "decomposition".
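A constructor such as new_decomposition() follows the usual S3 pattern: bundle the components in a list and tag it with a class attribute, so that generics like plot() dispatch to plot.decomposition. The sketch below is a hypothetical stand-in (new_decomposition_sketch), not the package's constructor; its field names are assumed to mirror the values documented for auto_decompose (reconstruction, grouping, window_length, ts_ssa).

```r
# Hypothetical S3 constructor in the style of new_decomposition().
new_decomposition_sketch <- function(reconstruction_list, grouping_matrix,
                                     window_length, ts_ssa) {
  stopifnot(is.list(reconstruction_list), is.matrix(grouping_matrix))
  structure(
    list(
      reconstruction = reconstruction_list,
      grouping = grouping_matrix,
      window_length = window_length,
      ts_ssa = ts_ssa
    ),
    class = "decomposition"
  )
}

dec <- new_decomposition_sketch(list(), matrix(0, 2, 2), 72, NULL)
class(dec)  # "decomposition"
```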
Class for visitation_model predictions (for use with predict.visitation_model()).
new_visitation_forecast( forecasts, logged_forecasts, differenced_logged_forecasts, differenced_standard_forecasts, n_ahead, proxy_forecasts, onsite_usage_forecasts, beta, constant, slope, criterion, past_observations, lag_estimate )
forecasts |
A time series of forecasts for the visitation model. The forecasts are in the standard scale of visitors per month. |
logged_forecasts |
A time series of the logged forecasts for the visitation model. |
differenced_logged_forecasts |
A time series of the differenced logged forecasts for the visitation model. |
differenced_standard_forecasts |
A time series of the exponentiated differenced logged forecasts for the visitation model. |
n_ahead |
An integer describing the number of forecasts made. |
proxy_forecasts |
A time series of forecasts of the popularity proxy series. |
onsite_usage_forecasts |
A time series of forecasts of the original time series. |
beta |
A numeric or a character string specifying the seasonality adjustment factor (beta_1). |
constant |
A numeric specifying the constant term in the model (beta_0). This constant is understood as the mean of the trend-adjusted time_series. |
slope |
A numeric specifying the slope term in the model when a linear trend is assumed (beta_2). |
criterion |
One of "MSE" or "Nonparametric", to specify the criterion used to select the lag. |
past_observations |
One of "none", "fitted", or "ref_series". If "fitted", past model fitted values are used. If "ref_series", the reference series in the visitation model object is used. Note that if difference = TRUE, one of these is needed to forecast the first difference. |
lag_estimate |
A numeric value specifying the estimated lag in the visitation model. |
Object of class "labeled_visitation_forecast".
Object of class "Visitation_forecast".
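The relationship between forecasts, logged_forecasts, differenced_logged_forecasts, and differenced_standard_forecasts can be illustrated with a made-up forecast series, assuming that differencing means a lag-1 first difference. Exponentiating a differenced log series gives month-over-month ratios, and subtracting 1 from those ratios gives percent changes.

```r
forecasts <- ts(c(100, 110, 99), start = c(2019, 1), frequency = 12)  # made-up values

logged_forecasts <- log(forecasts)
# Lag-1 differences of the logs: log(110 / 100) and log(99 / 110).
differenced_logged_forecasts <- diff(logged_forecasts)
# Exponentiating gives month-over-month ratios: approximately 1.1 and 0.9.
differenced_standard_forecasts <- exp(differenced_logged_forecasts)
# Percent changes: approximately +10% and -10%.
percent_change <- 100 * (differenced_standard_forecasts - 1)
```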
Class for plotting an array of visitation_forecast objects.
new_visitation_forecast_ensemble(visitation_forecasts, labels)
visitation_forecasts |
An array of visitation_forecast objects. |
labels |
An array of labels associated with the visitation_forecast objects. |
Constructs objects of the "visitation_model" class.
new_visitation_model( visitation_fit, differenced_fit, beta, constant, slope, lag_estimate, proxy_decomposition, onsite_usage_decomposition, forecasts_needed, ref_series, criterion, omit_trend, trend, call )
visitation_fit |
A time series storing the fitted values of the visitation model. |
differenced_fit |
A time series storing the differenced fitted values of the visitation model. |
beta |
Seasonality adjustment factor (beta_1). |
constant |
A numeric describing the constant term used in the model (beta_0). |
slope |
A numeric describing the slope term used in the model when trend is set to "linear" (beta_2). |
lag_estimate |
An integer representing the lag parameter for the model fit. |
proxy_decomposition |
A decomposition class object representing the decomposition of a popularity measure (e.g., US Photo-User-Days). |
onsite_usage_decomposition |
A decomposition class object representing the decomposition of time series (e.g., park Photo-User-Days). |
forecasts_needed |
An integer describing how many forecasts for the proxy_decomposition are needed for the fit. |
ref_series |
A reference time series (or NULL) used in the model fit. |
criterion |
A character string specifying the criterion for estimating the lag in popularity_proxy. If "cross-correlation" is chosen, it chooses the lag that maximizes the correlation coefficient between lagged popularity_proxy and onsite_usage. If "MSE" is chosen, it does so by identifying the lagged popularity_proxy whose derivative is closest to that of onsite_usage by minimizing the mean squared error. If "rank" is chosen, it does so by first ranking the squared errors of the derivatives and identifying the lag that minimizes the mean rank. |
omit_trend |
This is obsolete and is left only for compatibility. A Boolean specifying whether or not to consider the NPS trend to be zero. |
trend |
A character string specifying how the trend is modeled. Can be any of NULL, "linear", "none", and "estimated", where "none" and "estimated" correspond to |
call |
A call for the visitation model. |
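The "rank" criterion described above can be sketched in base R, assuming the squared errors of the derivatives are arranged with one row per time point and one column per candidate lag. Under that assumption, lags are ranked within each time point and the lag with the smallest mean rank is chosen. The function name rank_lag_sketch and the matrix layout are hypothetical; the package may organize the computation differently.

```r
# Hypothetical rank criterion: sq_err has one row per time point and one column
# per candidate lag (squared errors between derivatives).
rank_lag_sketch <- function(sq_err, possible_lags) {
  stopifnot(ncol(sq_err) == length(possible_lags))
  # Rank the lags within each time point (1 = smallest squared error) ...
  ranks <- t(apply(sq_err, 1, rank))
  # ... then pick the lag with the smallest mean rank across time points.
  possible_lags[which.min(colMeans(ranks))]
}

sq_err <- rbind(c(0.10, 0.50, 0.90),
                c(0.20, 0.10, 0.80),
                c(0.05, 0.30, 0.70))  # made-up squared errors
rank_lag_sketch(sq_err, possible_lags = c(-1, 0, 1))  # -1 (mean ranks 4/3, 5/3, 3)
```

Because only ranks enter the criterion, a few very large squared errors cannot dominate the choice the way they can under the "MSE" criterion.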
A list of the class "model_forecast".
A data frame storing monthly visitation counts by National Park Service (NPS) for 20 popular US national parks and associated Flickr photo-user-days (PUD). Here, photo-user-days (PUD) count the number of unique users posting a photo on Flickr on a given day from within the boundaries of a given National Park.
park_visitation
A data frame with 3276 rows and 4 variables.
Date of monthly observation, in year-month-day format.
National Park alpha code identifying a National Park.
Flickr photo-user-days (PUD). Here, PUD count the number of unique users posting a photo on Flickr on a given day from within the boundaries of a given National Park.
Visitation count for the corresponding park and month given by the National Park Service (NPS).
National Park Service (2018). National park service visitor use statistics. Retrieved May 10, 2018 from https://irma.nps.gov/Stats/
Flickr (2019). Retrieved October, 2019, from https://flickr.com/
Methods for plotting objects of the class "decomposition".
## S3 method for class 'decomposition'
plot(x, type = c("full", "period", "classical"), legend = TRUE, ...)
x |
An object of class "decomposition". |
type |
A character string. One of "full","period", or "classical". If "full", the full reconstruction is plotted. If "period", the reconstruction of each period is plotted individually. If "classical", the trend and seasonality are plotted. |
legend |
A Boolean specifying whether a legend should be added when type is "full". The default option is TRUE. |
... |
Additional arguments. |
A plot of the reconstruction in the "decomposition" class object.
data("park_visitation")
park <- "YELL"
nps_ts <- ts(park_visitation[park_visitation$park == park, ]$nps, start = 2005, frequency = 12)
nps_ts <- log(nps_ts)
pud_ts <- ts(park_visitation[park_visitation$park == park, ]$pud, start = 2005, frequency = 12)
pud_ts <- log(pud_ts)
decomposition_pud <- auto_decompose(pud_ts)
decomposition_nps <- auto_decompose(nps_ts)
plot(decomposition_pud, lwd = 2)
plot(decomposition_pud, type = "period")
plot(decomposition_pud, type = "classical")
plot(decomposition_nps, legend = TRUE)
plot(decomposition_nps, type = "period")
plot(decomposition_nps, type = "classical")
Methods for plotting objects of the class "visitation_forecast".
## S3 method for class 'visitation_forecast'
plot(
  x,
  difference = FALSE,
  log_outputs = FALSE,
  actual_visitation = NULL,
  xlab = "Time",
  ylab = "Fitted Value",
  pred_color = "#228B22",
  actual_color = "#FF0000",
  size = 1.5,
  main = "Forecasts for Visitation Model",
  plot_points = FALSE,
  date_breaks = "1 month",
  date_labels = "%y %b",
  ...
)
x |
An object of the "visitation_forecast" class. |
difference |
A Boolean specifying whether to plot the differenced series. The default option is FALSE. |
log_outputs |
A Boolean specifying whether to plot the logged outputs of the forecast. The default option is FALSE. |
actual_visitation |
A time series object representing the actual visitation, which will be plotted alongside the "visitation_forecast" object. |
xlab |
A string used for the x-axis label of the plot. |
ylab |
A string used for the y-axis label of the plot. |
pred_color |
A string specifying the color of the predicted series. |
actual_color |
A string specifying the color of the actual series. |
size |
A number specifying the thickness of the plotted lines. |
main |
A string used for the title of the plot. |
plot_points |
A Boolean specifying whether to plot points (TRUE) or a continuous line (FALSE). |
date_breaks |
A string specifying the spacing between x-axis date breaks, e.g., "1 month" or "1 year". |
date_labels |
A string specifying the format of the x-axis date labels, e.g., "%y %b" (the default). |
... |
Additional arguments. |
No return value, called for plotting objects of the class "visitation_forecast".
# Example:
data("park_visitation")
data("flickr_userdays")
n_ahead <- 12
park <- "YELL"
pud_ts <- ts(park_visitation[park_visitation$park == park, ]$pud, start = 2005, frequency = 12)
pud_ts <- log(pud_ts)
trend_proxy <- log(flickr_userdays)
mf <- visitation_model(pud_ts, trend_proxy)
# Forecast n_ahead months, returning only the new values
vf <- predict(mf, n_ahead, only_new = TRUE)
plot(vf)
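The date_breaks and date_labels arguments take strings such as "1 month" and "%y %b". The base R sketch below shows what those strings mean for a monthly axis, using seq() for break spacing and format() for labels; it does not call the plotting method, and how the method passes these strings on internally is not shown here.

```r
# Illustrative monthly forecast dates covering one year
dates <- seq(as.Date("2020-01-01"), by = "1 month", length.out = 12)

# date_breaks = "1 month" puts a tick at every month;
# "3 months" keeps every third one
breaks_1m <- seq(min(dates), max(dates), by = "1 month")
breaks_3m <- seq(min(dates), max(dates), by = "3 months")

# date_labels = "%y %b": two-digit year, then abbreviated month name
# (month names depend on the current locale)
labels_3m <- format(breaks_3m, "%y %b")
print(labels_3m)
```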
Method for plotting objects of the class "visitation_forecast_ensemble".
## S3 method for class 'visitation_forecast_ensemble'
plot(
  x,
  difference = FALSE,
  log_outputs = FALSE,
  plot_cumsum = FALSE,
  plot_percent_change = FALSE,
  actual_visitation = NULL,
  actual_visitation_label = "Actual",
  xlab = "Time",
  ylab = "Fitted Value",
  pred_colors = c("#ff6361", "#58508d", "#bc5090", "#003f5c"),
  actual_color = "#ffa600",
  size = 1.5,
  main = "Forecasts for Visitation Model",
  plot_points = FALSE,
  date_breaks = "1 month",
  date_labels = "%y %b",
  ...
)
x |
An object of class "visitation_forecast_ensemble". |
difference |
A Boolean specifying whether to plot the original fit or differenced series. The default option is FALSE, in which case, the series is not differenced. |
log_outputs |
A Boolean specifying whether to plot the logged outputs of the forecasts. The default option is FALSE. |
plot_cumsum |
A Boolean specifying whether to plot the cumulative sum. The default option is FALSE. |
plot_percent_change |
A Boolean specifying whether to plot the percent change. The default option is FALSE. |
actual_visitation |
A time series object representing the actual visitation, which will be plotted alongside the forecasts in the "visitation_forecast_ensemble" object. |
actual_visitation_label |
A string used as the label for the actual visitation. The default option is "Actual". |
xlab |
A string used for the x-axis label of the plot. |
ylab |
A string used for the y-axis label of the plot. |
pred_colors |
A character vector of colors used for the predicted series. |
actual_color |
A string specifying the color of the actual series. |
size |
A number specifying the thickness of the plotted lines. |
main |
A string used for the title of the plot. |
plot_points |
A Boolean specifying whether to plot points (TRUE) or a continuous line (FALSE). |
date_breaks |
A string specifying the spacing between x-axis date breaks, e.g., "1 month" or "1 year". |
date_labels |
A string specifying the format of the x-axis date labels. |
... |
Additional arguments. |
No return value, called for plotting objects of the class "visitation_forecast_ensemble".
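Because forecasts are made on the log scale, differences of the logged series are log ratios, which is what makes percent changes and cumulative sums easy to recover. The sketch below assumes month-over-month percent change computed from made-up log-scale forecasts; the exact definitions used by plot_percent_change and plot_cumsum are not stated here, so treat this as an illustration of the arithmetic only.

```r
# Made-up forecasts on the log scale
log_forecasts <- log(c(100, 110, 99, 108.9))

# Differencing logs gives log(x_t / x_{t-1}); exp() - 1 turns that into a
# relative change, and * 100 into a percent change
pct_change <- 100 * (exp(diff(log_forecasts)) - 1)

# Cumulative sums of the differenced logs give each log level relative to
# the first forecast, i.e. log(x_t / x_1)
cum_log_change <- cumsum(diff(log_forecasts))

print(round(pct_change, 6))
```

Working with log differences keeps increases and decreases symmetric: a rise from 99 to 108.9 and a fall from 110 to 99 are both 10 percent moves, and they have log differences of equal magnitude.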
Methods for plotting objects of the class "visitation_model".
## S3 method for class 'visitation_model'
plot(x, type = c("fitted"), difference = FALSE, ...)
x |
An object of class "visitation_model". |
type |
A character string specifying what to plot. The usage lists only "fitted", i.e., the fitted values of the visitation model. |
difference |
A Boolean specifying whether to plot the original fit or differenced series. The default option is FALSE, in which case, the series is not differenced. |
... |
Additional arguments. |
No return value, called for plotting objects of the class "visitation_model".
data("park_visitation")
data("flickr_userdays")
park <- "YELL"
pud_ts <- ts(park_visitation[park_visitation$park == park, ]$pud, start = 2005, frequency = 12)
pud_ts <- log(pud_ts)
nps_ts <- ts(park_visitation[park_visitation$park == park, ]$nps, start = 2005, frequency = 12)
nps_ts <- log(nps_ts)
nps_decomp <- auto_decompose(nps_ts)
trend_proxy <- log(flickr_userdays)
vm <- visitation_model(pud_ts, trend_proxy, ref_series = nps_ts)
plot(vm)
Methods for generating predictions from objects of the class "decomposition".
## S3 method for class 'decomposition'
predict(object, n_ahead, only_new = TRUE, ...)
object |
An object of class "decomposition". |
n_ahead |
An integer describing the number of forecasts to make. |
only_new |
A Boolean specifying whether to return only the forecasts (TRUE) or to also include the reconstruction of past values (FALSE). The default option is TRUE. |
... |
Additional arguments. |
forecasts |
A vector with overall forecast values. |
trend_forecasts |
A vector with trend forecast values. |
seasonality_forecasts |
A vector with seasonality forecast values. |
data("park_visitation")
# Decomposition settings (these match the defaults of auto_decompose)
suspected_periods <- c(12, 6, 4, 3)
proportion_of_variance_type <- "leave_out_first"
max_proportion_of_variance <- 0.995
log_ratio_cutoff <- 0.2
park <- "DEVA"
pud_ts <- ts(park_visitation[park_visitation$park == park, ]$pud, start = 2005, frequency = 12)
pud_ts <- log(pud_ts)
decomp_pud <- auto_decompose(pud_ts,
                             suspected_periods = suspected_periods,
                             proportion_of_variance_type = proportion_of_variance_type,
                             max_proportion_of_variance = max_proportion_of_variance,
                             log_ratio_cutoff = log_ratio_cutoff)
n_ahead <- 36
# only_new = FALSE keeps the past reconstruction along with the forecasts
pud_predictions <- predict(decomp_pud, n_ahead = n_ahead, only_new = FALSE)
Methods for generating predictions from objects of the class "visitation_model".
## S3 method for class 'visitation_model'
predict(
  object,
  n_ahead,
  only_new = TRUE,
  past_observations = c("fitted", "reference"),
  ...
)
object |
An object of class "visitation_model". |
n_ahead |
An integer indicating how many observations to forecast. |
only_new |
A Boolean specifying whether to include only the forecasts (if TRUE) or the full reconstruction (if FALSE). The default option is TRUE. |
past_observations |
A character string; one of "fitted" or "reference". Here, "fitted" uses the fitted values of the visitation model, while "reference" uses values supplied in ‘ref_series’. |
... |
Additional arguments. |
An object of class "visitation_forecast": a list with the following components.
forecasts |
A vector with forecast values. |
n_ahead |
A numeric giving the number of steps ahead. |
proxy_forecasts |
A vector of forecasts for the trend proxy. |
onsite_usage_forecasts |
A vector for the visitation forecasts. |
beta |
A numeric for the seasonality adjustment factor. |
constant |
A numeric for the value of the constant in the model. |
slope |
A numeric for the value of the slope term in the model when trend is set to "linear". |
criterion |
A string which specifies the method used to select the appropriate lag. Only applicable if the trend component is part of the forecasts. |
past_observations |
A vector which specifies the fitted values for the past observations. |
lag_estimate |
A numeric for the estimated lag. Only applicable if the trend component is part of the forecasts. |
data("park_visitation")
data("flickr_userdays")
n_ahead <- 36
park <- "ROMO"
pud_ts <- ts(park_visitation[park_visitation$park == park, ]$pud, start = 2005, frequency = 12)
pud_ts <- log(pud_ts)
nps_ts <- ts(park_visitation[park_visitation$park == park, ]$nps, start = 2005, frequency = 12)
nps_ts <- log(nps_ts)
popularity_proxy <- log(flickr_userdays)
vm <- visitation_model(pud_ts, popularity_proxy, ref_series = nps_ts, trend = "linear")
# Past values taken from the reference series
predict_vm <- predict(vm, n_ahead, only_new = FALSE, past_observations = "reference")
plot(predict_vm)
# Past values taken from the fitted values of the model
predict_vm2 <- predict(vm, n_ahead, only_new = FALSE, past_observations = "fitted")
plot(predict_vm2)
Notifies the user that the outputs of the model may be inaccurate when the constant of the model is 0.
prediction_warning(constant)
constant |
The constant term (beta0) of the model. |
No return value.
S3 method for printing objects of the class "decomposition".
## S3 method for class 'decomposition'
print(x, ...)
x |
An object of class "decomposition". |
... |
Additional arguments. |
A "decomposition" class object.
data("park_visitation")
park <- "YELL"
nps_ts <- ts(park_visitation[park_visitation$park == park, ]$nps, start = 2005, frequency = 12)
nps_ts <- log(nps_ts)
pud_ts <- ts(park_visitation[park_visitation$park == park, ]$pud, start = 2005, frequency = 12)
pud_ts <- log(pud_ts)
decomposition_pud <- auto_decompose(pud_ts)
decomposition_nps <- auto_decompose(nps_ts)
summary(decomposition_pud)
summary(decomposition_nps)
Method for printing objects of the class "visitation_forecast".
## S3 method for class 'visitation_forecast'
print(x, ...)
x |
An object of class "visitation_forecast". |
... |
Additional arguments. |
A "visitation_forecast" class object.
# Example:
data("park_visitation")
data("flickr_userdays")
n_ahead <- 12
park <- "YELL"
pud_ts <- ts(park_visitation[park_visitation$park == park, ]$pud, start = 2005, frequency = 12)
pud_ts <- log(pud_ts)
trend_proxy <- log(flickr_userdays)
mf <- visitation_model(pud_ts, trend_proxy)
vf <- predict(mf, n_ahead, only_new = FALSE)
summary(vf)
Method for printing objects of the class "visitation_model".
## S3 method for class 'visitation_model'
print(x, ...)
x |
An object of class "visitation_model". |
... |
Additional arguments. |
A "visitation_model" class object.
# Example:
data("park_visitation")
data("flickr_userdays")
n_ahead <- 12
park <- "YELL"
pud_ts <- ts(park_visitation[park_visitation$park == park, ]$pud, start = 2005, frequency = 12)
pud_ts <- log(pud_ts)
trend_proxy <- log(flickr_userdays)
vm <- visitation_model(pud_ts, trend_proxy)
summary(vm)
S3 method for summarizing objects of the class "decomposition".
## S3 method for class 'decomposition'
summary(object, ...)
object |
An object of class "decomposition". |
... |
Additional arguments. |
A "decomposition" class object.
data("park_visitation")
park <- "YELL"
nps_ts <- ts(park_visitation[park_visitation$park == park, ]$nps, start = 2005, frequency = 12)
nps_ts <- log(nps_ts)
pud_ts <- ts(park_visitation[park_visitation$park == park, ]$pud, start = 2005, frequency = 12)
pud_ts <- log(pud_ts)
decomposition_pud <- auto_decompose(pud_ts)
decomposition_nps <- auto_decompose(nps_ts)
summary(decomposition_pud)
summary(decomposition_nps)
Method for summarizing objects of the class "visitation_forecast".
## S3 method for class 'visitation_forecast'
summary(object, ...)
object |
An object of class "visitation_forecast". |
... |
Additional arguments. |
A "visitation_forecast" class object.
# Example:
data("park_visitation")
data("flickr_userdays")
n_ahead <- 12
park <- "YELL"
pud_ts <- ts(park_visitation[park_visitation$park == park, ]$pud, start = 2005, frequency = 12)
pud_ts <- log(pud_ts)
trend_proxy <- log(flickr_userdays)
mf <- visitation_model(pud_ts, trend_proxy)
vf <- predict(mf, n_ahead, only_new = FALSE)
summary(vf)
Method for summarizing objects of the class "visitation_model".
## S3 method for class 'visitation_model'
summary(object, ...)
object |
An object of class "visitation_model". |
... |
Additional arguments. |
A "visitation_model" class object.
# Example:
data("park_visitation")
data("flickr_userdays")
n_ahead <- 12
park <- "YELL"
pud_ts <- ts(park_visitation[park_visitation$park == park, ]$pud, start = 2005, frequency = 12)
pud_ts <- log(pud_ts)
trend_proxy <- log(flickr_userdays)
vm <- visitation_model(pud_ts, trend_proxy)
summary(vm)
Ensures that the supplied onsite_usage and ref_series contain at least 12 observations and overlap in time.
trim_training_data(onsite_usage = NULL, ref_series = NULL)
onsite_usage |
A vector which stores monthly on-site usage for a particular social media platform and recreational site. |
ref_series |
A numeric vector specifying the original visitation series. The default option is NULL, implying that no such series is available. If such series is available, then its length must be the same as that of |
A list containing onsite_usage and ref_series, trimmed so that both share the same time window.
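A minimal sketch of the kind of trimming described above, using stats::ts.intersect() to keep only the months covered by both series. trim_to_overlap() is a hypothetical helper written for illustration, not the package's trim_training_data(), which may also handle missing values or other edge cases.

```r
# Hypothetical helper, not the package's trim_training_data()
trim_to_overlap <- function(onsite_usage, ref_series, min_length = 12) {
  # ts.intersect() keeps only the time points covered by both series;
  # it returns NULL (with a warning) if they do not overlap at all
  both <- stats::ts.intersect(onsite_usage, ref_series)
  if (is.null(both) || NROW(both) < min_length) {
    stop("the series must overlap in at least ", min_length, " observations")
  }
  list(onsite_usage = both[, 1], ref_series = both[, 2])
}

# PUD from 2005-01 to 2009-12 and visitation from 2007-01 to 2010-12
pud <- ts(1:60, start = c(2005, 1), frequency = 12)
nps <- ts(101:148, start = c(2007, 1), frequency = 12)
trimmed <- trim_to_overlap(pud, nps)
```

Both trimmed series then cover 2007-01 to 2009-12 (36 months), which passes the 12-observation check.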
Fits a time series model that uses social media posts and popularity of the social media to model visitation to recreational sites.
visitation_model(
  onsite_usage,
  popularity_proxy = NULL,
  suspected_periods = c(12, 6, 4, 3),
  proportion_of_variance_type = c("leave_out_first", "total"),
  max_proportion_of_variance = 0.995,
  log_ratio_cutoff = 0.2,
  window_length = "auto",
  num_trend_components = 2,
  criterion = c("cross-correlation", "MSE", "rank"),
  possible_lags = -36:36,
  leave_off = 6,
  estimated_change = 0,
  order_of_polynomial_approximation = 7,
  order_of_derivative = 1,
  ref_series = NULL,
  constant = 0,
  beta = "estimate",
  slope = 0,
  is_input_logged = FALSE,
  spline = FALSE,
  parameter_estimates = c("joint", "separate"),
  omit_trend = TRUE,
  trend = c("linear", "none", "estimated"),
  ...
)
onsite_usage |
A vector which stores monthly on-site usage for a particular social media platform and recreational site. |
popularity_proxy |
A vector which stores a time series which may be used as a proxy for the monthly popularity of social media over time. The length of |
suspected_periods |
A vector which stores the suspected periods in the descending order of importance. The default option is c(12,6,4,3), corresponding to 12, 6, 4, and 3 months if observations are monthly. |
proportion_of_variance_type |
A character string specifying the option for choosing the maximum number of eigenvalues based on the proportion of total variance explained. If "leave_out_first" is chosen, then the contribution made by the first eigenvector is ignored; otherwise, if "total" is chosen, then the contribution made by all the eigenvectors is considered. |
max_proportion_of_variance |
A numeric specifying the proportion of total variance explained using the method specified in |
log_ratio_cutoff |
A numeric specifying the threshold for the deviation between the estimated period and candidate periods in suspected_periods. The default option is 0.2, which means that if the absolute log ratio between the estimated and candidate period is within 0.2 (approximately a 20 percent difference), then the estimated period is deemed equal to the candidate period. |
window_length |
A character string or positive integer specifying the window length for the SSA estimation. If "auto" is chosen, then the algorithm automatically selects the window length by taking a multiple of 12 which does not exceed half the length of |
num_trend_components |
A positive integer specifying the number of eigenvectors to be chosen for describing the trend in SSA. The default option is 2. This is relevant only when |
criterion |
A character string specifying the criterion for estimating the lag in |
possible_lags |
A numeric vector specifying all the candidate lags for |
leave_off |
A positive integer specifying the number of observations to be left off when estimating the lag. The default option is 6. This is relevant only when |
estimated_change |
A numeric specifying the estimated change in the visitation trend. The default option is 0, implying no change in the trend. |
order_of_polynomial_approximation |
A numeric specifying the order of the polynomial approximation of the difference between time series used in |
order_of_derivative |
A numeric specifying the order of derivative for the approximated difference between lagged |
ref_series |
A numeric vector specifying the original visitation series. The default option is NULL, implying that no such series is available. If such series is available, then its length must be the same as that of |
constant |
A numeric specifying the constant term (beta0) in the model. This constant is understood as the mean log adjusted monthly visitation relative to the base month. The default option is 0, implying that the (logged) |
beta |
A numeric or a character string specifying the seasonality adjustment factor (beta1). The default option is "estimate", in which case it is estimated using the Fisher z-transformed lag-12 autocorrelation. Even if an actual value is supplied, if |
slope |
A numeric specifying the slope coefficient (beta2) in the model. This constant is applicable only when |
is_input_logged |
A Boolean describing whether the |
spline |
A Boolean specifying whether or not to use a smoothing spline for the lag estimation. This is relevant only when |
parameter_estimates |
A character string specifying how to estimate the beta and constant parameters when a reference series is supplied. Both options use least-squares estimates, but "separate" uses the differenced series to estimate beta separately from the constant, while "joint" estimates both using the non-differenced, detrended series. |
omit_trend |
This is obsolete and is left only for compatibility. In other words, |
trend |
A character string specifying how the trend is modeled. Can be any of NULL, "linear", "none", and "estimated", where "none" and "estimated" correspond to |
... |
Additional arguments to be passed onto the smoothing spline ( |
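The log_ratio_cutoff rule can be checked by hand: exp(0.2) is about 1.22, so an estimated period within a factor of roughly 1.22 of a candidate is matched to it. match_period() below is a hypothetical helper; picking the closest candidate when several fall within the cutoff is an assumption of this sketch.

```r
# Hypothetical helper illustrating log_ratio_cutoff
match_period <- function(estimated_period,
                         suspected_periods = c(12, 6, 4, 3),
                         log_ratio_cutoff = 0.2) {
  # Absolute log ratio between the estimate and each candidate period
  log_ratios <- abs(log(estimated_period / suspected_periods))
  if (all(log_ratios > log_ratio_cutoff)) {
    return(NA_real_)  # no candidate is close enough
  }
  # Assumption: among candidates within the cutoff, take the closest one
  suspected_periods[which.min(log_ratios)]
}

match_period(11.3)  # 12, since |log(11.3 / 12)| is about 0.06
match_period(5.2)   # 6, since |log(5.2 / 6)| is about 0.14
match_period(8)     # NA, since 8 is more than 0.2 (log scale) from both 6 and 12
```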
visitation_fit |
A vector storing fitted values of visitation model. |
differenced_fit |
A vector storing differenced fitted values of visitation model. (Equal to |
constant |
A numeric storing estimated constant term used in the model (beta0). |
beta |
A numeric storing the estimated seasonality adjustment factor (beta1). |
slope |
A numeric storing estimated slope coefficient term used in the model (beta2). |
proxy_decomposition |
A "decomposition" object representing the automatic decomposition obtained from |
time_series_decomposition |
A "decomposition" object representing the automatic decomposition obtained from |
forecasts_needed |
An integer representing the number of forecasts of |
lag_estimate |
A list storing both the MSE-based estimate and rank-based estimates for the lag. |
criterion |
A string; one of "cross-correlation", "MSE", or "rank", specifying the method used to select the appropriate lag. |
ref_series |
The reference series, if one was supplied. |
omit_trend |
Whether or not trend was considered 0 in the model. This is obsolete and is left only for compatibility. |
trend |
The trend used in the model. |
call |
The model call. |
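The default beta = "estimate" relies on the Fisher z-transformed lag-12 autocorrelation. The sketch below only shows that ingredient on a synthetic series with a 12-month cycle, using stats::acf() and atanh(); how the package turns the z value into the estimate of beta1 is not reproduced here.

```r
# Synthetic monthly series with an exact 12-month cycle over 10 years
x <- sin(2 * pi * (1:120) / 12)

# Sample autocorrelation at lag 12 (element 1 of $acf is lag 0)
r12 <- acf(x, lag.max = 12, plot = FALSE)$acf[13]

# Fisher's z-transformation of that autocorrelation
z12 <- atanh(r12)  # same as 0.5 * log((1 + r12) / (1 - r12))

print(c(r12 = r12, z12 = z12))
```

For this series r12 is 0.9: the numerator sums 108 lagged products while the denominator sums all 120 squared values, so r12 = 54 / 60.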
See predict.visitation_model for forecast methods, estimate_lag for details on the lag estimation, and auto_decompose for details on the automatic decomposition of time series using singular spectrum analysis (SSA). See the package Rssa for details regarding singular spectrum analysis.
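For readers new to SSA, the sketch below shows its first step in base R: embedding a series into an L x K trajectory (Hankel) matrix and taking its singular value decomposition. auto_window_length() encodes one reading of the "auto" rule for window_length (the largest multiple of 12 not exceeding half the series length), which is an assumption; Rssa should be used for real analyses.

```r
# Assumed reading of window_length = "auto": the largest multiple of 12
# that does not exceed half the series length
auto_window_length <- function(n) 12 * floor((n / 2) / 12)

# Basic SSA embedding: column j holds x[j], ..., x[j + L - 1], so the
# result is an L x K Hankel matrix with K = n - L + 1
trajectory_matrix <- function(x, L) {
  K <- length(x) - L + 1
  sapply(seq_len(K), function(j) x[j:(j + L - 1)])
}

# Synthetic log-scale series: linear trend + annual cycle + noise
set.seed(1)
n <- 150
idx <- seq_len(n)
x <- 0.01 * idx + sin(2 * pi * idx / 12) + rnorm(n, sd = 0.1)

L <- auto_window_length(n)     # 72 when n = 150
X <- trajectory_matrix(x, L)   # 72 x 79
d <- svd(X)$d                  # singular values, largest first
eigen_share <- d^2 / sum(d^2)  # share of variance for each eigentriple
print(round(head(eigen_share, 5), 3))
```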
### load data --------------------
data("park_visitation")
data("flickr_userdays")
park <- "YELL" # Yellowstone National Park
pud_ts <- ts(park_visitation[park_visitation$park == park, ]$pud, start = 2005, frequency = 12)
nps_ts <- ts(park_visitation[park_visitation$park == park, ]$nps, start = 2005, frequency = 12)
### fit three models ---------------
vm_pud_linear <- visitation_model(onsite_usage = pud_ts,
                                  ref_series = nps_ts,
                                  parameter_estimates = "joint",
                                  trend = "linear")
vm_pud_only <- visitation_model(onsite_usage = pud_ts,
                                popularity_proxy = flickr_userdays,
                                trend = "estimated")
vm_ref_series <- visitation_model(onsite_usage = pud_ts,
                                  popularity_proxy = flickr_userdays,
                                  ref_series = nps_ts,
                                  parameter_estimates = "separate",
                                  possible_lags = -36:36,
                                  trend = "none")
### visualize fit ------------------
# Overlay the differenced log NPS counts so they share the fit's scale
plot(vm_pud_linear, ylim = c(-3, 3), difference = TRUE)
lines(diff(log(nps_ts)), col = "red")
plot(vm_pud_only, ylim = c(-3, 3), difference = TRUE)
lines(diff(log(nps_ts)), col = "red")
plot(vm_ref_series, ylim = c(-3, 3), difference = TRUE)
lines(diff(log(nps_ts)), col = "red")
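The proportion_of_variance_type option changes which eigenvalues enter the cumulative share. In the hypothetical helper below, "leave_out_first" computes the share over all eigenvalues except the first (typically dominated by the trend) and then counts the first one back in. That final counting convention is an assumption of this sketch, not documented behavior.

```r
# Hypothetical helper: number of eigenvectors needed to reach
# max_proportion_of_variance under either proportion_of_variance_type
n_components_needed <- function(eigenvalues,
                                max_proportion_of_variance = 0.995,
                                type = c("leave_out_first", "total")) {
  type <- match.arg(type)
  # "leave_out_first" ignores the first eigenvalue when forming the shares
  lambda <- if (type == "leave_out_first") eigenvalues[-1] else eigenvalues
  cum_share <- cumsum(lambda) / sum(lambda)
  k <- which(cum_share >= max_proportion_of_variance)[1]
  # Assumption: count the first eigenvector back in for "leave_out_first"
  if (type == "leave_out_first") k + 1 else k
}

# A dominant first eigenvalue, as is typical when the trend is strong
eigenvalues <- c(100, 5, 3, 1, 0.5, 0.5)
n_components_needed(eigenvalues, 0.85, "leave_out_first")  # 4
n_components_needed(eigenvalues, 0.85, "total")            # 1
```

With "total", the dominant first eigenvalue alone exceeds the threshold, so the seasonal structure carried by later eigenvectors would be missed; leaving it out makes the threshold respond to the remaining components.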
Convert annual counts into monthly counts using photo-user-days.
yearsToMonths(visitation_years, pud)
visitation_years |
A numeric vector with annual visitation counts. If not available, NA should be entered. |
pud |
A numeric vector for the monthly photo-user-days corresponding to |
A numeric vector with estimated monthly visitation counts based on the annual counts and monthly photo-user-days.
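One natural way to convert annual counts to monthly counts with PUD is to split each year's total across its months in proportion to that year's monthly PUD. The sketch below implements that idea as a hypothetical years_to_months_sketch(). It assumes pud covers whole calendar years aligned with visitation_years, and the package's yearsToMonths() may differ.

```r
# Hypothetical sketch, not the package's yearsToMonths()
years_to_months_sketch <- function(visitation_years, pud) {
  # Assumes pud holds 12 monthly values for every entry of visitation_years
  stopifnot(length(pud) == 12 * length(visitation_years))
  pud_by_year <- matrix(pud, nrow = 12)                # one column per year
  shares <- sweep(pud_by_year, 2, colSums(pud_by_year), "/")
  monthly <- sweep(shares, 2, visitation_years, "*")   # NA years stay NA
  as.vector(monthly)
}

annual <- c(1200, NA)
pud <- c(1:12, rep(5, 12))
monthly <- years_to_months_sketch(annual, pud)
```

Each year's monthly estimates add back up to its annual count, and a year entered as NA yields NA for all twelve months.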