Title: | Accident and Development Period Adjusted Linear Pools for Actuarial Stochastic Reserving |
---|---|
Description: | Loss reserving generally focuses on identifying a single model that can generate superior predictive performance. However, different loss reserving models specialise in capturing different aspects of loss data. This is recognised in practice in the sense that results from different models are often considered, and sometimes combined. For instance, actuaries may take a weighted average of the prediction outcomes from various loss reserving models, often based on subjective assessments. This package allows for the use of a systematic framework to objectively combine (i.e. ensemble) multiple stochastic loss reserving models such that the strengths offered by different models can be utilised effectively. Our framework is developed in Avanzi et al. (2023). Firstly, our criterion for model combination considers the full distributional properties of the ensemble and not just the central estimate, which is of particular importance in the reserving context. Secondly, our framework is tailored to the features inherent to reserving data. These include, for instance, accident, development, calendar, and claim maturity effects. Crucially, the relative importance and scarcity of data across accident periods renders the problem distinct from the traditional ensemble techniques in statistical learning. Our framework is illustrated with a complex synthetic dataset. In the results, the optimised ensemble outperforms both (i) traditional model selection strategies, and (ii) an equally weighted ensemble. In particular, the improvement occurs not only with central estimates but also relevant quantiles, such as the 75th percentile of reserves (typically of interest to both insurers and regulators). Reference: Avanzi B, Li Y, Wong B, Xian A (2023) "Ensemble distributional forecasting for insurance loss reserving" <doi:10.48550/arXiv.2206.08541>. |
Authors: | Benjamin Avanzi [aut], William Ho [aut], Yanfeng Li [aut, cre], Bernard Wong [aut], Alan Xian [aut] |
Maintainer: | Yanfeng Li <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2024-10-16 06:42:57 UTC |
Source: | CRAN |
General framework for any user-defined function for partitions of claims triangle data.
adlp_partition(df, ...)

adlp_partition_none(df)

adlp_partition_ap(df, tri.size, size = 1, weights = rep(1, size))
df |
data.frame format of claims and related information for each cell.
Dataframe will have columns |
... |
Other parameters used to calculate ADLP partitions |
tri.size |
Triangle size in claims |
size |
Number of partitions |
weights |
a vector of weights for the size of each partition. |
adlp_partition_none is the default functionality with no partitions. This is equivalent to the standard linear pooling.
adlp_partition_ap will partition the claims triangle by accident period, where the choice of accident period to partition will be determined to most closely resemble the desired weights.
The choice of accident period relies on a greedy algorithm that aims to find the accident period that provides a number of cells greater than or equal to the desired split.
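The greedy choice described above can be sketched in base R. This is an illustrative sketch only, not the package's implementation: the helper name `greedy_ap_cutoffs` is hypothetical, and it assumes a square triangle in which accident period i contributes `tri.size - i + 1` cells to the upper triangle.

```r
# Hypothetical helper (not part of the package): pick accident-period
# cut-offs so that the cumulative cell count first reaches each desired
# cumulative share of the upper triangle.
greedy_ap_cutoffs <- function(tri.size, weights) {
  # Assumption: accident period i has (tri.size - i + 1) observed cells.
  cells_per_ap <- tri.size - seq_len(tri.size) + 1
  cum_cells <- cumsum(cells_per_ap)
  total_cells <- sum(cells_per_ap)

  # Desired cumulative number of cells at the end of each partition.
  target <- cumsum(weights) / sum(weights) * total_cells

  # For each target, take the first accident period whose cumulative cell
  # count is greater than or equal to it (small tolerance for rounding).
  vapply(target, function(tg) which(cum_cells >= tg - 1e-9)[1], integer(1))
}

cutoffs <- greedy_ap_cutoffs(tri.size = 40, weights = rep(1, 3))
print(cutoffs)
```

Each returned value is the last accident period of a partition under these assumptions; unequal `weights` shift the cut-offs accordingly.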
List containing the df as a result of the partitions.
adlp_partition_none, adlp_partition_ap
data("test_claims_dataset")
adlp_partition_none(test_claims_dataset)

data("test_claims_dataset")
adlp_partition_ap(test_claims_dataset, tri.size = 40, size = 3)
Accident and Development period Adjusted Linear Pools Component Models
calc_adlp_component(
  component,
  newdata,
  y = NULL,
  model = c("train", "full"),
  calc = c("pdf", "cdf", "mu", "sim")
)

calc_adlp_component_lst(
  components_lst,
  newdata,
  model = c("train", "full"),
  calc = c("pdf", "cdf", "mu", "sim"),
  ...
)
component |
Object of class |
newdata |
Claims Triangle and other information. |
y |
Optional vector of |
model |
Whether the training component model or the full component model should be used |
calc |
Type of calculation to perform |
components_lst |
List of objects of class |
... |
Other parameters to be passed into |
Calls the specified function for an object of class adlp_component.
calc_adlp_component_lst is a wrapper for calc_adlp_component for each component in the list components_lst. This wrapper also contains functionality to signal the component that causes an error if it is occurring downstream.
The result of the evaluated function on the adlp_component. This would be a vector of calculations with the same length as the number of rows in newdata.
data(test_adlp_component)
newdata <- test_adlp_component$model_train$data
pdf_data = calc_adlp_component(test_adlp_component, newdata = newdata,
                               model = "train", calc = "pdf")

data(test_adlp_component)
test_component1 <- test_adlp_component
test_component2 <- test_adlp_component
test_components <- adlp_components(
  component1 = test_component1,
  component2 = test_component2
)
newdata <- test_adlp_component$model_train$data
pdf_data = calc_adlp_component_lst(test_components, newdata = newdata,
                                   model = "train", calc = "pdf")
Function to define basic functionality needed for a custom model that does not fit the general framework of models that align with adlp_component
custom_model(formula, data, ...)

## S3 method for class 'custom_model'
update(object, data, ...)
formula |
Formula needed that defines all variables required for the model |
data |
data to update custom model |
... |
Additional variables for update |
object |
Object of type |
Custom model should support the S3 methods formula and update.
An object of class custom_model. custom_model is a list that stores the required formula to update the model and the data used to update the model.
data("test_claims_dataset")
custom_model <- custom_model(claims~., data=test_claims_dataset)
The Minorization-Maximization algorithm aims to optimize a surrogate objective function that approximates the Log Score. This approach typically results in fast and stable convergence, while ensuring that combination weights adhere to the constraints of being non-negative and summing to one. For a detailed description of the algorithm, one might refer to Conflitti, De Mol, and Giannone (2015).
MM_optim(w_init, dat, niter = 500)
w_init |
initial weights for each ADLP |
dat |
matrix of densities for each ADLP |
niter |
maximum number of iterations. Defaults to 500 |
An object of class mm_optim. mm_optim is a list that stores the results of the MM algorithm performed, including the final parameters, the final loss and number of iterations.
Conflitti, Cristina, Christine De Mol, and Domenico Giannone. "Optimal combination of survey forecasts." International Journal of Forecasting 31.4 (2015): 1096-1103.
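As a rough illustration of the kind of update described above, the following base R sketch applies the standard multiplicative MM step for linear pool weights, in which each weight is rescaled by the average ratio of its component density to the pooled density. The function name `mm_weights`, its `tol` argument, and the returned list layout are assumptions made for this sketch; it is not the package's `MM_optim` code.

```r
# Hypothetical re-implementation of an MM weight update (not the package code).
# dens: matrix of component densities, rows = observations, columns = components.
mm_weights <- function(w_init, dens, niter = 500, tol = 1e-10) {
  w <- w_init
  for (iter in seq_len(niter)) {
    pooled <- as.vector(dens %*% w)        # ensemble density per observation
    w_new <- w * colMeans(dens / pooled)   # multiplicative MM update
    converged <- max(abs(w_new - w)) < tol
    w <- w_new
    if (converged) break
  }
  list(weights = w,
       log_score = mean(log(as.vector(dens %*% w))),
       iterations = iter)
}

w_init <- rep(1/3, 3)
set.seed(1)
dens <- matrix(runif(9), nrow = 3, ncol = 3)
fit <- mm_weights(w_init, dens)
print(fit$weights)
```

Because the update only rescales the current weights, non-negative starting weights stay non-negative, and the rescaled weights continue to sum to one without an explicit projection step.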
w_init <- rep(1/3, 3)
set.seed(1)
density_data <- matrix(runif(9), nrow = 3, ncol = 3)
MM_optim(w_init, density_data, niter = 500)
Family of functions used to support ADLP inference and prediction.
## S3 method for class 'adlp'
predict(object, newdata = NULL, ...)

adlp_dens(adlp, newdata, model = c("train", "full"))

adlp_logS(adlp, newdata, model = c("train", "full"), epsilon = 1e-06)

adlp_CRPS(
  adlp,
  newdata,
  response_name,
  model = c("train", "full"),
  lower = 1,
  upper = NULL,
  sample_n = 2000
)

adlp_simulate(n, adlp, newdata = NULL)
object |
Object of class |
newdata |
Data to perform the function on |
... |
Other parameters to pass onto predict |
adlp |
Object of class |
model |
Whether the |
epsilon |
Offset added to the density before calculating the log |
response_name |
The column name of the response variable; in string format |
lower |
The lower limit to calculate CRPS; the default value is set to be 1 |
upper |
The upper limit to calculate CRPS; the default value is set to be twice the maximum value of the response variable in the dataset |
sample_n |
The number of evenly spaced values to sample between lower and upper range of numeric integration used to calculate CRPS. This sample function is designed to constrain memory usage during the computation of CRPS, particularly when dealing with large response variables. |
n |
number of simulations |
Predicts the central estimates based on the ADLP component models and weights.
Calculates the probability density at each point, given newdata.
Calculates the log score, which is the log of the probability density, with an offset epsilon to handle zero densities. Log Score is a strictly proper scoring rule. For full discussion of the mathematical details and advantages of Log Score, one might refer to Gneiting and Raftery (2007).
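A minimal base R sketch of the log score of a linear pool follows. The two component distributions, their parameters, and the weights are made up for illustration and do not come from the package; only the `epsilon` offset mirrors the argument described above.

```r
# Illustrative observations and two hypothetical component models.
y <- c(100, 250, 40)
dens <- cbind(
  normal = dnorm(y, mean = 120, sd = 60),
  gamma  = dgamma(y, shape = 2, rate = 0.02)
)

w <- c(0.3, 0.7)                    # illustrative weights summing to one
pooled <- as.vector(dens %*% w)     # linear pool density at each point

epsilon <- 1e-6                     # offset guarding against log(0)
log_score <- log(pooled + epsilon)  # log score per observation
print(mean(log_score))
```

The offset keeps the score finite even when every component assigns zero density to an observation, at the cost of a small bias for very small densities.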
Continuously Ranked Probability Score (CRPS) is calculated for each data point. lower and upper are used as limits when approximating the integral. CRPS is a strictly proper scoring rule. For full discussion of the mathematical details and advantages of CRPS, one might refer to Gneiting and Raftery (2007). The CRPS function has been discretized in this context to ensure adaptability to various distributions. For details, one might refer to Gneiting and Ranjan (2011).
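The discretisation idea can be sketched as a Riemann sum of the squared difference between the forecast CDF and the step function at the observation, evaluated on an evenly spaced grid between the lower and upper limits. The helper `crps_grid` is hypothetical and is not the package's `adlp_CRPS`; the normal forecast is chosen only because its CRPS has a known closed form to compare against.

```r
# Hypothetical helper (not part of the package): grid approximation of
# CRPS(F, y) = integral over [lower, upper] of (F(z) - 1{z >= y})^2 dz.
crps_grid <- function(cdf_fun, y, lower, upper, sample_n = 2000) {
  z <- seq(lower, upper, length.out = sample_n)  # evenly spaced grid
  step <- z[2] - z[1]
  integrand <- (cdf_fun(z) - as.numeric(z >= y))^2
  sum(integrand) * step                          # Riemann sum
}

# Illustrative normal forecast and a single observed value.
mu <- 100; sigma <- 20; y_obs <- 110
approx_crps <- crps_grid(function(z) pnorm(z, mu, sigma), y_obs,
                         lower = 1, upper = 2 * y_obs, sample_n = 2000)

# Closed-form CRPS of a normal forecast, used here only as a check.
z_std <- (y_obs - mu) / sigma
exact_crps <- sigma * (z_std * (2 * pnorm(z_std) - 1) +
                       2 * dnorm(z_std) - 1 / sqrt(pi))
print(c(grid = approx_crps, exact = exact_crps))
```

A coarser grid (smaller `sample_n`) lowers memory use but increases the discretisation error, particularly around the jump at the observed value.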
Simulations of ADLP predictions, given component models and ADLP weights.
data.frame of results, where the first and second columns correspond to the $origin and $dev columns from the triangles. An index column for simulation # is also included when simulating ADLP.
Gneiting, T., Raftery, A. E., 2007. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102 (477), 359–378.
Gneiting, T., Ranjan, R., 2011. Comparing density forecasts using threshold-and quantile-weighted scoring rules. Journal of Business & Economic Statistics 29 (3), 411–422.
data(test_adlp_component)
test_component1 <- test_adlp_component
test_component2 <- test_adlp_component
test_components <- adlp_components(
  component1 = test_component1,
  component2 = test_component2
)
newdata <- test_component1$model_train$data
test_adlp <- adlp(test_components, newdata = newdata,
                  partition_func = adlp_partition_ap, tri.size = 40, size = 3)
test_adlp_dens <- adlp_dens(test_adlp, newdata, "full")

data(test_adlp_component)
test_component1 <- test_adlp_component
test_component2 <- test_adlp_component
test_components <- adlp_components(
  component1 = test_component1,
  component2 = test_component2
)
newdata <- test_component1$model_train$data
test_adlp <- adlp(test_components, newdata = newdata,
                  partition_func = adlp_partition_ap, tri.size = 40, size = 3)
test_adlp_logs <- adlp_logS(test_adlp, newdata, "full")

data(test_adlp_component)
test_component1 <- test_adlp_component
test_component2 <- test_adlp_component
test_components <- adlp_components(
  component1 = test_component1,
  component2 = test_component2
)
newdata <- test_component1$model_train$data
test_adlp <- adlp(test_components, newdata = newdata,
                  partition_func = adlp_partition_ap, tri.size = 40, size = 3)
test_adlp_crps <- adlp_CRPS(test_adlp, newdata, "full",
                            response_name = "claims", sample_n = 100)

data(test_adlp_component)
test_component1 <- test_adlp_component
test_component2 <- test_adlp_component
test_components <- adlp_components(
  component1 = test_component1,
  component2 = test_component2
)
newdata <- test_component1$model_train$data
test_adlp <- adlp(test_components, newdata = newdata,
                  partition_func = adlp_partition_ap, tri.size = 40, size = 3)
test_adlp_sim <- adlp_simulate(10, test_adlp, newdata=newdata)
Class to estimate an ADLP model fitted by Minorization-Maximisation.
## S3 method for class 'adlp'
print(x, ...)

adlp(components_lst, newdata, partition_func, param_tol = 1e-16, ...)
x |
Object of class |
... |
Other named parameters passed onto further functions |
components_lst |
List of |
newdata |
Validation data to fit the ADLP partitions on |
partition_func |
Partition function used to subset the data. ADLP weights
will be generated for each partition. To specify partition preferences,
set the parameter to |
param_tol |
Tolerance for weights. Any value less than tolerance in magnitude is assumed zero. |
See adlp_component and adlp_components objects for more information on required format for inputs.
See adlp_partition for information on valid partition functions.
For an understanding of how partitions affect the performance of the ADLP ensemble, one might refer to Avanzi, Li, Wong and Xian (2022).
Object of class adlp. This object has the following components:
adlp_components; List of adlp_components, see also adlp_components
vector; vector of model weights fitted for each component
function; Partition function used to fit the components
mm_optim; Details related to the MM algorithm, see also MM_optim()
data.frame; Data.frame used to fit the ADLP
Avanzi, B., Li, Y., Wong, B., & Xian, A. (2022). Ensemble distributional forecasting for insurance loss reserving. arXiv preprint arXiv:2206.08541.
data(test_adlp_component)
test_component1 <- test_adlp_component
test_component2 <- test_adlp_component
test_components <- adlp_components(
  component1 = test_component1,
  component2 = test_component2
)
newdata <- test_component1$model_train$data
test_adlp <- adlp(test_components, newdata = newdata, response_name = "claims",
                  partition_func = adlp_partition_ap, tri.size = 40, size = 3)
Class to store component models and related functions required for ADLP estimation, prediction and goodness of fit.
## S3 method for class 'adlp_component'
print(x, ...)

adlp_component(
  model_train,
  model_full,
  calc_dens,
  calc_mu,
  calc_cdf,
  sim_fun,
  ...
)
x |
Object of class |
... |
Other named parameters required for the model or any of its related functions to run. |
model_train |
Model trained on training data |
model_full |
Model trained on all in-sample data |
calc_dens |
function to calculate the pdf of each point |
calc_mu |
function to calculate the estimated mean of each point |
calc_cdf |
function to calculate the cdf of each point |
sim_fun |
function to simulate new from |
Component models model_train and model_full are designed to be objects of class glm, lm, or similar. The models should ideally have an S3 method for formula. For models that do not fit under this umbrella, see custom_model. For a potential list of candidate models, one might refer to Avanzi, Li, Wong and Xian (2022).
Functions are assumed to have the following parameter naming convention:
y as the response variable
model as the modeling object model_train or model_full
newdata to designate new data
Other inputs not in this list will need to be initialised with the adlp_component.
Object of class adlp_component
Avanzi, B., Li, Y., Wong, B., & Xian, A. (2022). Ensemble distributional forecasting for insurance loss reserving. arXiv preprint arXiv:2206.08541.
data("test_claims_dataset")

train_val <- train_val_split_method1(
  df = test_claims_dataset,
  tri.size = 40,
  val_ratio = 0.3,
  test = TRUE
)
train_data <- train_val$train
valid_data <- train_val$valid
insample_data <- rbind(train_data, valid_data)

base_model1 <- glm(formula = claims~factor(dev),
                   family=gaussian(link = "identity"), data=train_data)
base_model1_full <- update(base_model1, data = insample_data)

dens_normal <- function(y, model, newdata){
  pred_model <- predict(model, newdata=newdata, type="response", se.fit=TRUE)
  mu <- pred_model$fit
  sigma <- pred_model$residual.scale
  return(dnorm(x=y, mean=mu, sd=sigma))
}

cdf_normal<-function(y, model, newdata){
  pred_model <- predict(model, newdata=newdata, type="response", se.fit=TRUE)
  mu <- pred_model$fit
  sigma <- pred_model$residual.scale
  return(pnorm(q=y, mean=mu, sd=sigma))
}

mu_normal<-function(model, newdata){
  mu <- predict(model, newdata=newdata, type="response")
  mu <- pmax(mu, 0)
  return(mu)
}

sim_normal<-function(model, newdata){
  pred_model <- predict(model, newdata=newdata, type="response", se.fit=TRUE)
  mu <- pred_model$fit
  sigma <- pred_model$residual.scale
  sim <- rnorm(length(mu), mean=mu, sd=sigma)
  sim <- pmax(sim, 0)
  return(sim)
}

base_component1 = adlp_component(
  model_train = base_model1,
  model_full = base_model1_full,
  calc_dens = dens_normal,
  calc_cdf = cdf_normal,
  calc_mu = mu_normal,
  sim_fun = sim_normal
)
Accident and Development period Adjusted Linear Pools Component Models
## S3 method for class 'adlp_components'
print(x, ...)

adlp_components(...)
x |
Object of class |
... |
Individual |
Class to structure a list of adlp_components.
An object of class adlp_components
data(test_adlp_component)
test_component1 <- test_adlp_component
test_component2 <- test_adlp_component
test_components <- adlp_components(
  component1 = test_component1,
  component2 = test_component2
)
An adlp_component object created for examples.
test_adlp_component
An object in adlp_component format; see adlp_component.
test_adlp_component
A data.frame of claims, with the corresponding Accident Period (origin) and Development Period (dev). A calendar column is also included (as the sum of dev and origin). This format is required for the ADLP package.
test_claims_dataset
A data.frame with 4 components:
Accident Period
Development Period
Accident Period + Development Period
Claim amount
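A data frame in this layout can be assembled by hand, as in the sketch below. The column names origin, dev and calendar follow the description above, and claims is taken from the `test_claims_dataset$claims` example; the triangle size and the simulated claim amounts are purely illustrative and are not package data.

```r
# Build a small square of accident (origin) and development (dev) periods.
tri.size <- 4
claims_df <- expand.grid(origin = 1:tri.size, dev = 1:tri.size)

# Calendar column defined as the sum of origin and dev.
claims_df$calendar <- claims_df$origin + claims_df$dev

# Simulated claim amounts, for illustration only.
set.seed(42)
claims_df$claims <- round(rgamma(nrow(claims_df), shape = 2, rate = 0.01), 2)

head(claims_df)
```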
test_claims_dataset$claims
General framework for any user-defined training/validation/testing split of claims triangle data. The ADLP package contains three default splitting algorithms.
train_val_split(df, ...)
df |
Claims Triangle and other information. |
... |
Other parameters used to calculate train/test splitting. |
List containing $train, $valid, $test, which should partition the input df.
train_val_split_by_AP, train_val_split_method1, train_val_split_method2
Function for training/validation splitting.
train_val_split_by_AP(df, accident_periods, max_dev_periods, test = FALSE)
df |
Claims Triangle and other information. |
accident_periods |
Vector of accident periods. Will be equivalent to
|
max_dev_periods |
Vector of development periods |
test |
Returns the test set if |
Assigns the training set defined by a maximum development period for each accident period.
Validation set is therefore cells outside of this period but within the upper triangle. The test set is all observations in the lower triangle.
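The split rule above can be sketched in base R on a small triangle. This is an assumption-laden illustration rather than the package's code: it treats the upper triangle as the cells with `origin + dev <= tri.size + 1`, and the cut-offs in `max_dev_periods` are made up.

```r
tri.size <- 5
df <- expand.grid(origin = 1:tri.size, dev = 1:tri.size)
df$calendar <- df$origin + df$dev

accident_periods <- 1:tri.size
max_dev_periods  <- c(4, 3, 2, 1, 1)   # illustrative cut-off per accident period

# Look up the maximum training development period for each row.
max_dev <- max_dev_periods[match(df$origin, accident_periods)]

# Assumption: upper triangle = cells observed by the valuation date.
upper <- df$calendar <= tri.size + 1

train <- df[upper & df$dev <= max_dev, ]  # within the cut-off
valid <- df[upper & df$dev >  max_dev, ]  # rest of the upper triangle
test  <- df[!upper, ]                     # lower triangle

c(train = nrow(train), valid = nrow(valid), test = nrow(test))
```

The three pieces are disjoint and together cover every row of `df`, matching the requirement that the returned sets partition the input.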
List containing $train, $valid, $test, which should partition the input df.
data("test_claims_dataset")
train_val <- train_val_split_by_AP(
  df = test_claims_dataset,
  accident_periods = 1:40,
  max_dev_periods = 40:1,
  test = TRUE
)
Function for training/validation splitting.
train_val_split_method1(df, tri.size, val_ratio, test = FALSE)
df |
Claims Triangle and other information. |
tri.size |
Triangle size. |
val_ratio |
Value between 0 and 1 as the approximate size of validation set. |
test |
Returns the test set if |
Approximates the validation set by taking the n most recent calendar years as validation to best fit val_ratio.
The training set is therefore the cells outside of this period but within the upper triangle. The test set is all observations in the lower triangle.
Note that accident period 1 and development period 1 will always be within the training set.
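One way to picture the choice of n is the base R sketch below, which counts the cells on each calendar diagonal of the upper triangle and picks the number of most recent diagonals whose share of cells is closest to `val_ratio`. It is a simplified illustration under stated assumptions, not the package's algorithm: it assumes diagonal k holds k cells, and it ignores the rule that accident period 1 and development period 1 always stay in the training set.

```r
tri.size <- 40
val_ratio <- 0.3

# Assumption: the k-th calendar diagonal of the upper triangle holds k cells.
cells_by_calendar <- seq_len(tri.size)
total <- sum(cells_by_calendar)

# Share of upper-triangle cells in the n most recent diagonals, n = 1, 2, ...
share <- cumsum(rev(cells_by_calendar)) / total

# Leave at least the oldest diagonal for training.
n_candidates <- seq_len(tri.size - 1)
n_best <- n_candidates[which.min(abs(share[n_candidates] - val_ratio))]

c(n = n_best, share = share[n_best])
```

Because whole diagonals are moved at once, the achieved share only approximates `val_ratio`, which is why the argument is described as an approximate size.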
List containing $train, $valid, $test, which should partition the input df.
data("test_claims_dataset")
train_val <- train_val_split_method1(
  df = test_claims_dataset,
  tri.size = 40,
  val_ratio = 0.3,
  test = TRUE
)
Function for training/validation splitting.
train_val_split_method2(df, tri.size, val_ratio, test = FALSE)
df |
Claims Triangle and other information. |
tri.size |
Triangle size. |
val_ratio |
Value between 0 and 1 as the approximate size of validation set. |
test |
Returns the test set if |
Approximates the validation set by defining the training set as the cells
below the function . Where
is equal to
the triangle size and
is optimised to best fit
val_ratio
.
The training set is therefore cells outside of this period but within the upper triangle. The test set is all observations in the lower triangle.
Note that accident period 1 and development period 1 will always be within the training set.
List containing $train, $valid, $test, which should partition the input df.
data("test_claims_dataset")
train_val <- train_val_split_method2(
  df = test_claims_dataset,
  tri.size = 40,
  val_ratio = 0.3,
  test = TRUE
)