Title: | Robust Exponential Decreasing Index |
---|---|
Description: | Implementation of the Robust Exponential Decreasing Index (REDI), proposed in the article by Issa Moussa, Arthur Leroy et al. (2019) <https://bmjopensem.bmj.com/content/bmjosem/5/1/e000573.full.pdf>. The REDI represents a measure of cumulated workload, robust to missing data, providing control of the decreasing influence of workload over time. Various functions are provided to format data, compute REDI, and visualise results in a simple and convenient way. |
Authors: | Alexia Grenouillat [aut, cre], Arthur Leroy [aut] |
Maintainer: | Alexia Grenouillat <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.0 |
Built: | 2024-12-25 06:38:06 UTC |
Source: | CRAN |
Compute REDI for a specific input
compute_redi(data, coef = 0.1)
data |
A tibble or data frame, containing an Input column (time) and an Output column (workload). |
coef |
A number corresponding to the lambda coefficient, controlling the decay of the exponential weights. Default is 0.1. |
A number, corresponding to the REDI value at the last Input time, computed over the whole period.
data <- simu_db()
compute_redi(data = data, coef = 0.1)
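To illustrate the idea behind this function, the base R sketch below computes an exponentially weighted average of the observed Output values. Each weight decays with the distance to the last Input time, and missing values are skipped. The helper name `weighted_decay_mean` and the exact weighting formula are assumptions made for illustration only; the package's own code and the reference article remain authoritative.

```r
# Hypothetical helper (not part of the package): exponentially weighted
# average of `output`, ignoring missing values. The formula is an assumption.
weighted_decay_mean <- function(output, coef = 0.1) {
  stopifnot(length(output) > 0)
  # Number of time steps between each observation and the last one
  lag <- (length(output) - 1):0
  weights <- exp(-coef * lag)
  observed <- !is.na(output)
  # Normalise by the weights of observed values only, so that gaps
  # in the series do not pull the result towards zero
  sum(weights[observed] * output[observed]) / sum(weights[observed])
}

workload <- c(40, NA, 55, 60, NA, 50)
weighted_decay_mean(workload, coef = 0.1)
```

With a larger `coef`, recent observations dominate the result; with `coef = 0`, the sketch reduces to the plain mean of the observed values.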
Format the dataset to the syntax of REDI functions
format_data( data, input = 1, output = 2, by = "day", format = "%Y%m%d", summary_duplicate = mean )
data |
A tibble or data frame containing one column indicating time and another indicating the quantity for which we want to compute REDI. |
input |
A character or a number, indicating the name or the index of the input column (time). |
output |
A character or a number, indicating the name or the index of the output column (workload). |
by |
A number or a character string, indicating the reference time period between two observations. Possible values are 'day', 'week', 'month', 'year', or any arbitrary number. See the documentation of seq() for additional information if necessary. Default is 'day'. |
format |
A character string, indicating the date format of the input.
Please read |
summary_duplicate |
A function, used to summarise Output values for duplicated Input values. Default is mean. |
A tibble with Input and Output columns and explicit missing values between observations.
TRUE
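The formatting step described above can be approximated in base R as follows. This is a sketch under stated assumptions, not the package's implementation: the column names `date` and `load` and the toy values are hypothetical, and only the `by = "day"`, `format = "%Y%m%d"` and `summary_duplicate = mean` defaults are mirrored.

```r
# Hypothetical raw data: dates stored as "%Y%m%d" strings, one duplicated date
raw <- data.frame(
  date = c("20220101", "20220101", "20220104"),
  load = c(40, 60, 55)
)

# Parse the dates, then summarise duplicated dates with mean()
raw$Input <- as.Date(raw$date, format = "%Y%m%d")
summarised <- aggregate(load ~ Input, data = raw, FUN = mean)
names(summarised)[2] <- "Output"

# Build the full daily grid with seq() and merge on it, so that
# unobserved days appear as explicit NA values
grid <- data.frame(
  Input = seq(min(summarised$Input), max(summarised$Input), by = "day")
)
formatted <- merge(grid, summarised, by = "Input", all.x = TRUE)
formatted
```

The result has one row per day between the first and last observation, with `NA` in Output for days without a measurement.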
Compute the evolution of REDI over successive inputs
loop_redi(data, coef = 0.1)
data |
A tibble or data frame, containing an Input column (time) and an Output column (workload). |
coef |
A number corresponding to the lambda coefficient, controlling the decay of the exponential weights. Default is 0.1. |
A tibble similar to data, containing an additional REDI column computed over the successive input values.
data <- simu_db()
loop_redi(data = data, coef = 0.1)
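The notion of a value computed over successive inputs can be sketched in base R by recomputing a decayed weighted average on each growing prefix of the series. The helper `decay_mean`, the column name `REDI_sketch` and the weighting formula are assumptions for illustration; they are not taken from the package.

```r
# Hypothetical helper: exponentially weighted average ignoring NA values
# (assumed formula, for illustration only)
decay_mean <- function(output, coef) {
  lag <- (length(output) - 1):0
  w <- exp(-coef * lag)
  ok <- !is.na(output)
  if (!any(ok)) return(NA_real_)
  sum(w[ok] * output[ok]) / sum(w[ok])
}

output <- c(40, NA, 55, 60, NA, 50)

# The value at step i only uses observations 1..i
running <- vapply(
  seq_along(output),
  function(i) decay_mean(output[seq_len(i)], coef = 0.1),
  numeric(1)
)

data.frame(Step = seq_along(output), Output = output, REDI_sketch = running)
```

Steps with a missing Output still receive a value, since the average is carried by the earlier observations with reduced weights.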
Display the evolution of REDI over time and data points.
plot_redi( redi, plot_data = TRUE, x_axis = "Input", y_axis = "Output", alpha = 0.2, size = 1 )
redi |
A tibble or data frame containing 4 mandatory columns: Input, Output, Lambda and REDI. |
plot_data |
A boolean, indicating whether original data should be displayed. Default is TRUE. |
x_axis |
A character string, label of the x-axis. Default is 'Input'. |
y_axis |
A character string, label of the y-axis. Default is 'Output'. |
alpha |
A number, between 0 and 1, controlling the transparency of data points. Default is 0.2. |
size |
A number, controlling the size of the data points. Default is 1. |
Graph of the evolution of REDI over time, possibly for different values of Lambda, along with the original data points.
TRUE
Wrapper function that converts the dataset to the adequate format, computes the value of REDI for each Input value, displays a generic plot of the results, and returns a tibble containing both the data and the corresponding REDI values.
redi( data, coef = c(0.05, 0.1, 0.5), input = 1, output = 2, plot = TRUE, by = "day", format = "%Y%m%d", summary_duplicate = mean )
data |
A tibble or a data frame, containing an input column (time) and an output column (workload). |
coef |
A number or vector, containing the values of the lambda coefficient used in the REDI computations, controlling the decay of the exponential weights. Default is c(0.05, 0.1, 0.5). |
input |
A character or a number, indicating the name or the index of the input column (time). |
output |
A character or a number, indicating the name or the index of the output column (workload). |
plot |
A boolean, indicating whether results should be displayed. Default is TRUE. |
by |
A number or a character string, indicating the reference time period between two observations. Possible values are 'day', 'week', 'month', 'year', or any arbitrary number. See the documentation of seq() for additional information if necessary. Default is 'day'. |
format |
A character string, indicating the date format of the input.
Please read |
summary_duplicate |
A function, used to summarise Output values for duplicated Input values. Default is mean. |
A tibble containing 4 columns: Input (without duplicates), Output, Lambda and REDI, which corresponds to the vector returned by the loop_redi() function.
data <- simu_db()
redi <- redi(data)
Simulate a complete training dataset, which may be representative of various applications. Several flexible arguments allow adjustment of the range of observed days, the distribution and the mean of Output values, as well as the ratio of missing data.
simu_db( start_date = "2022-01-01", end_date = "2023-01-01", by = "day", output_distrib = "Gaussian", ratio_missing = 0.5, mean = 50, var = 10, range_unif = c(0, 100) )
start_date |
A date, indicating the starting time of observations. Default is '2022-01-01'. |
end_date |
A date, indicating the ending time of observations. Default is '2023-01-01'. |
by |
A number or a character string, indicating the reference time period between two observations. Possible values are 'day', 'week', 'month', 'year', or any arbitrary number. See the documentation of seq() for additional information if necessary. Default is 'day'. |
output_distrib |
A character string, indicating the distribution of Output values, for example 'Gaussian' or 'Uniform'. Default is 'Gaussian'. |
ratio_missing |
A number, between 0 and 1, indicating the ratio of missing values in the dataset. Default is 0.5. |
mean |
A number, indicating the mean value of the Gaussian distribution. Default is 50. |
var |
A number, indicating the variance of the Gaussian distribution. Default is 10. |
range_unif |
A vector, indicating the range of values for the Uniform distribution. Default is c(0,100). |
A full dataset of synthetic data.
## Generate a dataset with Gaussian measurements
data = simu_db(output_distrib = 'Gaussian')
## Generate a dataset with Uniform measurements and 30% of missing data.
data = simu_db(output_distrib = 'Uniform', ratio_missing = 0.3)
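For readers who want to see what such a simulated dataset might look like without the package, the base R sketch below draws Gaussian values on a daily grid and blanks out a fraction of them. It is only an approximation under assumptions: the one-month date range is arbitrary, and marking missing days as `NA` (rather than dropping the rows) is a choice made here, not a documented behaviour of simu_db().

```r
set.seed(1)  # make the sketch reproducible

# Daily grid over one month (arbitrary range chosen for the sketch)
dates <- seq(as.Date("2022-01-01"), as.Date("2022-01-31"), by = "day")

# Gaussian draws with mean 50 and variance 10, i.e. sd = sqrt(10)
values <- rnorm(length(dates), mean = 50, sd = sqrt(10))

# Blank out a fraction of the observations at random
ratio_missing <- 0.3
n_missing <- round(ratio_missing * length(dates))
values[sample(length(dates), n_missing)] <- NA

simulated <- data.frame(Input = dates, Output = values)

# Proportion of missing Output values actually obtained
mean(is.na(simulated$Output))
```

Because the number of blanked days is rounded to an integer, the realised proportion of missing values is close to, but not always exactly, `ratio_missing`.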