Title: | Machine Learning Models for Predicting Claim Counts |
---|---|
Description: | Prediction of claim counts using the feature-based development factors introduced in the manuscript Hiabu M., Hofman E. and Pittarello G. (2023) <doi:10.48550/arXiv.2312.14549>. Implementation of Neural Networks, Extreme Gradient Boosting, and Cox model with splines to optimise the partial log-likelihood of proportional hazard models. |
Authors: | Emil Hofman [aut, cre, cph], Gabriele Pittarello [aut, cph], Munir Hiabu [aut, cph] |
Maintainer: | Emil Hofman <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.0 |
Built: | 2024-11-15 09:13:04 UTC |
Source: | CRAN |
This function generates the monthly individual claims data used in the accompanying methodological paper, using the SynthETIC
package.
This simple function provides a sandbox for testing the ReSurv
approach.
Some parameters of the simulation can be changed.
data_generator( ref_claim = 2e+05, time_unit = 1/360, years = 4, random_seed = 1964, period_exposure = 200, period_frequency = 0.2, scenario = 1 )
ref_claim |
|
time_unit |
|
years |
|
random_seed |
|
period_exposure |
|
period_frequency |
|
scenario |
|
Individual claims data. It contains the following columns:
claim_number
: Policy ID.
claim_type
: Type of claim. It can be either 0 or 1.
AP
: Accident period.
RP
: Reporting period.
Avanzi, B., Taylor, G., Wang, M., & Wong, B. (2021). SynthETIC: an individual insurance claim simulator with feature control. Insurance: Mathematics and Economics, 100, 296-308.
Hiabu, M., Hofman, E., & Pittarello, G. (2023). A machine learning approach based on survival analysis for IBNR frequencies in non-life reserving. arXiv preprint arXiv:2312.14549.
input_data_0 <- data_generator( random_seed = 1964, scenario = "alpha", time_unit = 1, years = 2, period_exposure = 100)
This function pre-processes the data for the application of a ReSurv
model.
IndividualDataPP( data, id = NULL, continuous_features = NULL, categorical_features = NULL, accident_period, calendar_period, input_time_granularity = "months", output_time_granularity = "quarters", years = NULL, calendar_period_extrapolation = FALSE, continuous_features_spline = NULL, degrees_cf = 3, degrees_of_freedom_cf = 4, degrees_cp = 3, degrees_of_freedom_cp = 4 )
data |
|
id |
|
continuous_features |
|
categorical_features |
|
accident_period |
|
calendar_period |
|
input_time_granularity |
Default is "months". |
output_time_granularity |
The output granularity must be at least as coarse as the input granularity.
Also, the output granularity must be consistent with the input granularity, meaning that the time conversion must be possible.
E.g., it is possible to group quarters into years. It is not possible to group quarters into semesters.
Default is "quarters". |
years |
|
calendar_period_extrapolation |
|
continuous_features_spline |
|
degrees_cf |
|
degrees_of_freedom_cf |
|
degrees_cp |
|
degrees_of_freedom_cp |
|
The input accident_period is coded as $AP_i$. The input development periods are derived as $DP_i = \text{calendar\_period} - \text{accident\_period} + 1$.
The reverse-time development periods are $DP\_rev_i = DP\_max - DP_i$, where $DP\_max$ is the maximum number of development periods, so that $DP_i \le DP\_max$. Given the parameter years, $DP\_max$ is derived internally by the package.
As for the truncation time, $TR_i = AP_i - 1$.
$AP_i$, $DP_i$, $DP\_rev_i$ and $TR_i$ are converted to $AP_o$, $DP_o$, $DP\_rev_o$ and $TR_o$ (from the input_time_granularity to the output_time_granularity) using a multiplicative conversion factor $c$. E.g., $AP_o = AP_i \cdot c$.
The conversion factor is computed as
$$c = \frac{p_i}{p_o},$$
where $p_i$ and $p_o$ are the fractions of a year corresponding to input_time_granularity and output_time_granularity. $p_i$ and $p_o$ take values 1/360, 1/12, 1/4, 1/2, 1 for "days", "months", "quarters", "semesters", "years" respectively. For instance, going from months to quarters gives $c = (1/12)/(1/4) = 1/3$.
We then have $RP_o = AP_o + DP_o$.
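The time encoding described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's internal code: the helpers year_fraction, conversion_factor and encode_times are hypothetical, and DP_max is passed in directly instead of being derived from years.

```r
# Fraction of a year for each supported granularity, as listed above.
year_fraction <- c(days = 1/360, months = 1/12, quarters = 1/4,
                   semesters = 1/2, years = 1)

# Hypothetical helper: multiplicative conversion factor c = p_i / p_o.
conversion_factor <- function(input_granularity, output_granularity) {
  unname(year_fraction[input_granularity] / year_fraction[output_granularity])
}

# Hypothetical helper: input-granularity time components for each claim.
# dp_max is supplied directly here; the package derives it from `years`.
encode_times <- function(accident_period, calendar_period, dp_max) {
  dp_i <- calendar_period - accident_period + 1
  stopifnot(all(dp_i <= dp_max))
  data.frame(AP_i = accident_period,
             DP_i = dp_i,
             DP_rev_i = dp_max - dp_i,
             TR_i = accident_period - 1)
}

toy <- encode_times(accident_period = c(1, 1, 2),
                    calendar_period = c(1, 3, 2),
                    dp_max = 3)
print(toy)
conversion_factor("months", "quarters")
```

The months-to-quarters factor agrees with the 1/3 reported for conversion_factor in the returned object.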
IndividualDataPP
object. A list containing
full.data
: data.frame
. The input data after pre-processing.
starting.data
: data.frame
. The input data as they were provided from the user.
training.data
: data.frame
. The input data pre-processed for training.
conversion_factor
: numeric
. The conversion factor for going from input granularity to output granularity. E.g., the conversion factor for going from months to quarters is 1/3.
string_formula_i
: character
. The survival
formula to model the data in input granularity.
string_formula_o
: character
. The survival
formula to model the data in output granularity.
continuous_features
: character
. The continuous features names as provided from the user.
categorical_features
: character
. The categorical features names as provided from the user.
calendar_period_extrapolation
: logical
. The value specifying if a calendar period component is extrapolated.
years
: numeric
. Total number of development years in the data. Default is NULL and computed automatically from the data.
accident_period
: character
. Accident period column name.
calendar_period
: character
. Calendar period column name.
input_time_granularity
: character
. Input time granularity.
output_time_granularity
: character
. Output time granularity.
After pre-processing, a standard encoding is provided for the time components. It applies to the output in training.data
and full.data
.
In the ReSurv
notation:
AP_i
: Input granularity accident period.
AP_o
: Output granularity accident period.
DP_i
: Input granularity development period in forward time.
DP_rev_i
: Input granularity development period in reverse time.
DP_rev_o
: Output granularity development period in reverse time.
TR_i
: Input granularity truncation time.
TR_o
: Output granularity truncation time.
I
: Event indicator; under this framework it is equal to one for each entry.
Hiabu, M., Hofman, E., & Pittarello, G. (2023). A machine learning approach based on survival analysis for IBNR frequencies in non-life reserving. arXiv preprint arXiv:2312.14549.
input_data_0 <- data_generator(random_seed = 1964, scenario = "alpha", time_unit = 1, years = 2, period_exposure = 100)
individual_data <- IndividualDataPP(data = input_data_0, categorical_features = "claim_type", continuous_features = "AP", accident_period = "AP", calendar_period = "RP", input_time_granularity = "years", output_time_granularity = "years", years = 2)
Install a Python environment that allows the user to apply the Neural Network (NN) models.
install_pyresurv( ..., envname = "pyresurv", new_env = identical(envname, "pyresurv") )
... |
Additional arguments for 'virtualenv_create'. |
envname |
'character'. Name of the environment created. Default 'pyresurv'. |
new_env |
'logical'. If 'TRUE', any existing Python virtual environment and/or 'conda' environment specified by 'envname' is deleted first. |
No return value.
When the lower triangle data are available, this method computes the likelihood on the lower triangle.
ooslkh(object, ...)
object |
|
... |
Other arguments to pass to ooslkh. |
numeric
, out-of-sample likelihood.
When the lower triangle data are available, this method computes the likelihood on the lower triangle.
## Default S3 method: ooslkh(object, ...)
object |
|
... |
Other arguments to pass to ooslkh. |
numeric
, out-of-sample likelihood.
When the lower triangle data are available, this method computes the likelihood on the lower triangle.
## S3 method for class 'ReSurvFit' ooslkh(object, ...)
object |
|
... |
Other arguments to pass to ooslkh. |
numeric
, out-of-sample likelihood.
This script contains the utility functions used in ReSurv.
pkg.env
An object of class environment
of length 60.
This function plots the mean absolute SHAP values for the ReSurv fits of machine learning models.
## S3 method for class 'ReSurvFit' plot(x, nsamples = NULL, ...)
x |
|
nsamples |
|
... |
Other arguments to be passed to plot. |
ggplot2
of the SHAP values for an "XGB"
model or a "NN"
model.
Plots the development factors by group code.
## S3 method for class 'ReSurvPredict' plot( x, granularity = "input", group_code = 1, color_par = "royalblue", linewidth_par = 2.5, ylim_par = NULL, ticks_by_par = NULL, base_size_par = NULL, title_par = NULL, x_text_par = NULL, plot.title.size_par = NULL, ... )
x |
"ReSurvPredict" object specifying hazard and development factors. |
granularity |
|
group_code |
|
color_par |
|
linewidth_par |
|
ylim_par |
|
ticks_by_par |
|
base_size_par |
|
title_par |
|
x_text_par |
|
plot.title.size_par |
|
... |
Other arguments to be passed to plot. Optional. |
ggplot2
of the development factors
This function predicts the results from the ReSurv fits.
## S3 method for class 'ReSurvFit' predict( object, newdata = NULL, grouping_method = "probability", check_value = 1.85, ... )
object |
|
newdata |
|
grouping_method |
Default is "probability". |
check_value |
|
... |
Additional arguments to pass to the predict function. |
Predictions for the ReSurvFit
model. It includes
ReSurvFit
: Fitted ReSurv
model.
long_triangle_format_out
: data.frame
. Predicted development factors and IBNR claim counts for each feature combination in long format.
input_granularity
: data.frame
. Predictions for each feature combination in long format for input_time_granularity
.
AP_i
: Accident period, input_time_granularity
.
DP_i
: Development period, input_time_granularity
.
f_i
: Predicted development factors, input_time_granularity
.
group_i
: Group code, input_time_granularity
. This associates to each feature combination an identifier.
expected_counts
: Expected counts, input_time_granularity
.
IBNR
: Predicted IBNR claim counts, input_time_granularity
.
output_granularity
: data.frame
. Predictions for each feature combination in long format for output_time_granularity
.
AP_o
: Accident period, output_time_granularity
.
DP_o
: Development period, output_time_granularity
.
f_o
: Predicted development factors, output_time_granularity
.
group_o
: Group code, output_time_granularity
. This associates to each feature combination an identifier.
expected_counts
: Expected counts, output_time_granularity
.
IBNR
: Predicted IBNR claim counts, output_time_granularity
.
lower_triangle
: Predicted lower triangle.
input_granularity
: data.frame
. Predicted lower triangle for input_time_granularity
.
output_granularity
: data.frame
. Predicted lower triangle for output_time_granularity
.
predicted_counts
: numeric
. Predicted total frequencies.
grouping_method
: character
. Chosen grouping method.
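The prediction tables above are in long format, with one row per feature combination and period. If a triangle layout is more convenient, base R's xtabs() can reshape such a table. The sketch below uses a small made-up data.frame with the AP_i, DP_i and expected_counts columns listed above; it is an illustration, not part of the ReSurv API. With several feature combinations, subset on group_i first (or add it to the formula).

```r
# Hypothetical long-format predictions with the AP_i, DP_i and
# expected_counts columns described above (values are made up).
long_pred <- data.frame(
  AP_i = c(1, 1, 1, 2, 2, 3),
  DP_i = c(1, 2, 3, 1, 2, 1),
  expected_counts = c(10, 6, 2, 12, 7, 11)
)

# Cross-tabulate into an accident period x development period layout.
# xtabs() sums expected_counts per cell and fills empty cells with 0.
triangle <- xtabs(expected_counts ~ AP_i + DP_i, data = long_pred)
print(triangle)
```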
Gives an overview of IBNR predictions.
## S3 method for class 'summaryReSurvPredict' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
"ReSurvPredict" object specifying hazard and development factors. |
digits |
|
... |
Other arguments to be passed to print. |
Printed summary of predictions.
ReSurv models on the individual data. This function fits and computes the reserves for the ReSurv models.
ReSurv( IndividualDataPP, hazard_model = "COX", tie = "efron", baseline = "spline", continuous_features_scaling_method = "minmax", random_seed = 1, hparameters = list(), percentage_data_training = 0.8, grouping_method = "exposure", check_value = 1.85 )
IndividualDataPP |
IndividualDataPP object to use for the |
hazard_model |
Hazard model: "COX", "NN" or "XGB". Default is "COX". |
tie |
Ties handling. Default is the Efron approach. |
baseline |
Baseline hazard handling. Default is a spline. |
continuous_features_scaling_method |
Method to preprocess the continuous features. Default is "minmax". |
random_seed |
|
hparameters |
|
percentage_data_training |
|
grouping_method |
Default is "exposure". |
check_value |
|
The model fit uses the theoretical framework of Hiabu et al. (2023), which relies on the correspondence between hazard models and development factors:
To be completed with final notation of the paper.
The ReSurv
package assumes proportional hazard models.
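Given an i.i.d. sample of individuals with features $x_i$, $i = 1, \ldots, n$, the individual hazard at time $t$ is
$$\lambda_i(t) = \lambda_0(t)\, e^{g(x_i)},$$
composed of a baseline $\lambda_0(t)$ and a proportional effect $e^{g(x_i)}$.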
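To illustrate the proportional hazard structure, the sketch below multiplies a toy baseline by exponentiated risk scores, mirroring the hazard = expg * baseline relation reported in hazard_frame. The baseline values and risk scores are hypothetical; in a ReSurv fit they would come from the fitted model.

```r
# Hypothetical baseline hazard evaluated at four development times.
baseline <- c(0.40, 0.25, 0.15, 0.10)

# Hypothetical risk scores g(x) for two claims.
g <- c(claim_a = 0, claim_b = log(2))

# Proportional hazards: hazard = exp(g) * baseline (cf. expg * baseline).
expg <- exp(g)
hazard <- outer(expg, baseline)  # rows: claims, columns: development times

# The hazard ratio between the two claims does not depend on time.
ratio <- hazard["claim_b", ] / hazard["claim_a", ]
print(ratio)
```

The time-constant ratio is what makes the partial likelihood depend only on the proportional effects, leaving the baseline to be handled separately.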
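Currently, the implementation optimizes the partial likelihood (with respect to the proportional effects) using one of the following statistical learning approaches: the Cox model (COX), neural networks (NN) or extreme gradient boosting (XGB), selected through hazard_model.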
ReSurv
fit. A list containing
model.out
: list
containing the pre-processed covariates data for the fit (data
) and the basic model output (model.out
; COX, XGB or NN).
is_lkh
: numeric
Training negative log likelihood.
os_lkh
: numeric
Validation negative log likelihood. Not available for COX.
hazard_frame
: data.frame
containing the fitted hazard model with the corresponding covariates. It contains:
expg
: fitted risk score.
baseline
: fitted baseline.
hazard
: fitted hazard rate (expg
*baseline
).
f_i
: fitted development factors.
cum_f_i
: fitted cumulative development factors.
S_i
: fitted survival function.
S_i_lag
: fitted survival function (lag version; for further information see ?dplyr::lag
).
S_i_lead
: fitted survival function (lead version; for further information see ?dplyr::lead
).
hazard_model
: string
chosen hazard model (COX, NN or XGB)
IndividualDataPP
: starting IndividualDataPP
object.
Hiabu, M., Hofman, E., & Pittarello, G. (2023). A machine learning approach based on survival analysis for IBNR frequencies in non-life reserving. arXiv preprint arXiv:2312.14549.
Therneau, T. M., & Lumley, T. (2015). Package ‘survival’. R Top Doc, 128(10), 28-33.
Katzman, J. L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., & Kluger, Y. (2018). DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC medical research methodology, 18(1), 1-12.
Chen, T., He, T., Benesty, M., & Khotilovich, V. (2019). Package ‘xgboost’. R version, 90, 1-66.
input_data_0 <- data_generator(random_seed = 1964, scenario = "alpha", time_unit = 1, years = 4, period_exposure = 100)
individual_data <- IndividualDataPP(data = input_data_0, categorical_features = "claim_type", continuous_features = "AP", accident_period = "AP", calendar_period = "RP", input_time_granularity = "years", output_time_granularity = "years", years = 4)
resurv_fit_cox <- ReSurv(individual_data, hazard_model = "COX")
ReSurv models on the individual data. This function fits and computes the reserves for the ReSurv models.
## Default S3 method: ReSurv( IndividualDataPP, hazard_model = "COX", tie = "efron", baseline = "spline", continuous_features_scaling_method = "minmax", random_seed = 1, hparameters = list(), percentage_data_training = 0.8, grouping_method = "exposure", check_value = 1.85 )
IndividualDataPP |
IndividualDataPP object to use for the |
hazard_model |
Hazard model: "COX", "NN" or "XGB". Default is "COX". |
tie |
Ties handling. Default is the Efron approach. |
baseline |
Baseline hazard handling. Default is a spline. |
continuous_features_scaling_method |
Method to preprocess the continuous features. Default is "minmax". |
random_seed |
|
hparameters |
|
percentage_data_training |
|
grouping_method |
Default is "exposure". |
check_value |
|
The model fit uses the theoretical framework of Hiabu et al. (2023), which relies on the correspondence between hazard models and development factors:
To be completed with final notation of the paper.
The ReSurv
package assumes proportional hazard models.
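Given an i.i.d. sample of individuals with features $x_i$, $i = 1, \ldots, n$, the individual hazard at time $t$ is
$$\lambda_i(t) = \lambda_0(t)\, e^{g(x_i)},$$
composed of a baseline $\lambda_0(t)$ and a proportional effect $e^{g(x_i)}$.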
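Currently, the implementation optimizes the partial likelihood (with respect to the proportional effects) using one of the following statistical learning approaches: the Cox model (COX), neural networks (NN) or extreme gradient boosting (XGB), selected through hazard_model.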
ReSurv
fit. A list containing
model.out
: list
containing the pre-processed covariates data for the fit (data
) and the basic model output (model.out
; COX, XGB or NN).
is_lkh
: numeric
Training negative log likelihood.
os_lkh
: numeric
Validation negative log likelihood. Not available for COX.
hazard_frame
: data.frame
containing the fitted hazard model with the corresponding covariates. It contains:
expg
: fitted risk score.
baseline
: fitted baseline.
hazard
: fitted hazard rate (expg
*baseline
).
f_i
: fitted development factors.
cum_f_i
: fitted cumulative development factors.
S_i
: fitted survival function.
S_i_lag
: fitted survival function (lag version; for further information see ?dplyr::lag
).
S_i_lead
: fitted survival function (lead version; for further information see ?dplyr::lead
).
hazard_model
: string
chosen hazard model (COX, NN or XGB)
IndividualDataPP
: starting IndividualDataPP
object.
Pittarello, G., Hiabu, M., & Villegas, A. M. (2023). Chain Ladder Plus: a versatile approach for claims reserving. arXiv preprint arXiv:2301.03858.
Therneau, T. M., & Lumley, T. (2015). Package ‘survival’. R Top Doc, 128(10), 28-33.
Katzman, J. L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., & Kluger, Y. (2018). DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC medical research methodology, 18(1), 1-12.
Chen, T., He, T., Benesty, M., & Khotilovich, V. (2019). Package ‘xgboost’. R version, 90, 1-66.
input_data_0 <- data_generator(random_seed = 1964, scenario = "alpha", time_unit = 1, years = 4, period_exposure = 100)
individual_data <- IndividualDataPP(data = input_data_0, categorical_features = "claim_type", continuous_features = "AP", accident_period = "AP", calendar_period = "RP", input_time_granularity = "years", output_time_granularity = "years", years = 4)
resurv_fit_cox <- ReSurv(individual_data, hazard_model = "COX")
ReSurv models on the individual data. This function fits and computes the reserves for the ReSurv models.
## S3 method for class 'IndividualDataPP' ReSurv( IndividualDataPP, hazard_model = "COX", tie = "efron", baseline = "spline", continuous_features_scaling_method = "minmax", random_seed = 1, hparameters = list(), percentage_data_training = 0.8, grouping_method = "exposure", check_value = 1.85 )
IndividualDataPP |
IndividualDataPP object to use for the |
hazard_model |
Hazard model: "COX", "NN" or "XGB". Default is "COX". |
tie |
Ties handling. Default is the Efron approach. |
baseline |
Baseline hazard handling. Default is a spline. |
continuous_features_scaling_method |
Method to preprocess the continuous features. Default is "minmax". |
random_seed |
|
hparameters |
|
percentage_data_training |
|
grouping_method |
Default is "exposure". |
check_value |
|
The model fit uses the theoretical framework of Hiabu et al. (2023), which relies on the correspondence between hazard models and development factors:
To be completed with final notation of the paper.
The ReSurv
package assumes proportional hazard models.
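Given an i.i.d. sample of individuals with features $x_i$, $i = 1, \ldots, n$, the individual hazard at time $t$ is
$$\lambda_i(t) = \lambda_0(t)\, e^{g(x_i)},$$
composed of a baseline $\lambda_0(t)$ and a proportional effect $e^{g(x_i)}$.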
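Currently, the implementation optimizes the partial likelihood (with respect to the proportional effects) using one of the following statistical learning approaches: the Cox model (COX), neural networks (NN) or extreme gradient boosting (XGB), selected through hazard_model.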
ReSurv
fit. A list containing
model.out
: list
containing the pre-processed covariates data for the fit (data
) and the basic model output (model.out
; COX, XGB or NN).
is_lkh
: numeric
Training negative log likelihood.
os_lkh
: numeric
Validation negative log likelihood. Not available for COX.
hazard_frame
: data.frame
containing the fitted hazard model with the corresponding covariates. It contains:
expg
: fitted risk score.
baseline
: fitted baseline.
hazard
: fitted hazard rate (expg
*baseline
).
f_i
: fitted development factors.
cum_f_i
: fitted cumulative development factors.
S_i
: fitted survival function.
S_i_lag
: fitted survival function (lag version; for further information see ?dplyr::lag
).
S_i_lead
: fitted survival function (lead version; for further information see ?dplyr::lead
).
hazard_model
: string
chosen hazard model (COX, NN or XGB)
IndividualDataPP
: starting IndividualDataPP
object.
Pittarello, G., Hiabu, M., & Villegas, A. M. (2023). Chain Ladder Plus: a versatile approach for claims reserving. arXiv preprint arXiv:2301.03858.
Therneau, T. M., & Lumley, T. (2015). Package ‘survival’. R Top Doc, 128(10), 28-33.
Katzman, J. L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., & Kluger, Y. (2018). DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC medical research methodology, 18(1), 1-12.
Chen, T., He, T., Benesty, M., & Khotilovich, V. (2019). Package ‘xgboost’. R version, 90, 1-66.
input_data_0 <- data_generator(random_seed = 1964, scenario = "alpha", time_unit = 1, years = 4, period_exposure = 100)
individual_data <- IndividualDataPP(data = input_data_0, categorical_features = "claim_type", continuous_features = "AP", accident_period = "AP", calendar_period = "RP", input_time_granularity = "years", output_time_granularity = "years", years = 4)
resurv_fit_cox <- ReSurv(individual_data, hazard_model = "COX")
ReSurv model. This function computes a K-fold cross-validation of a pre-specified machine learning model supported by the ReSurv
package for a given grid of hyperparameters.
The hyperparameters to be tested are provided in a list, namely hparameters_grid
.
Conversely, the parameters for the model runs are provided separately as arguments and are specific to each supported machine learning model.
ReSurvCV( IndividualDataPP, model, hparameters_grid, folds, random_seed, continuous_features_scaling_method = "minmax", print_every_n = 1L, nrounds = NULL, early_stopping_rounds = NULL, epochs = 1, parallel = FALSE, ncores = 1, num_workers = 0, verbose = FALSE, verbose.cv = FALSE )
IndividualDataPP |
|
model |
|
hparameters_grid |
|
folds |
|
random_seed |
|
continuous_features_scaling_method |
|
print_every_n |
|
nrounds |
|
early_stopping_rounds |
|
epochs |
|
parallel |
|
ncores |
|
num_workers |
|
verbose |
|
verbose.cv |
|
Best ReSurv
model fit. The output is different depending on the machine learning approach that is required for cross-validation. A list containing:
out.cv
: data.frame
, total output of the cross-validation (all the input parameters combinations).
out.cv.best.oos
: data.frame
, combination with the best out-of-sample likelihood.
For XGB the columns in out.cv
and out.cv.best.oos
are the hyperparameters booster
, eta
, max_depth
, subsample
, alpha
, lambda
, min_child_weight
. They also contain the metrics train.lkh
, test.lkh
, and the computational time time
. For NN the columns in out.cv
and out.cv.best.oos
are the hyperparameters num_layers
, optim
, activation
, lr
, xi
, eps
, tie
, batch_size
, early_stopping
, patience
, node
. They also contain the metrics train.lkh
, test.lkh
, and the computational time time
.
Hiabu, M., Hofman, E., & Pittarello, G. (2023). A machine learning approach based on survival analysis for IBNR frequencies in non-life reserving. arXiv preprint arXiv:2312.14549.
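The general pattern behind a K-fold cross-validation over a hyperparameter grid can be sketched in base R. This is a generic illustration, not the internals of ReSurvCV: a polynomial linear model stands in for the ReSurv model, mean squared error stands in for the negative log-likelihood, and the objects out_cv and out_cv_best_oos are hypothetical analogues of out.cv and out.cv.best.oos.

```r
set.seed(1)

# Hypothetical toy data: quadratic signal plus Gaussian noise.
n <- 120
x <- runif(n, -2, 2)
cv_data <- data.frame(x = x, y = 1 + x - 0.5 * x^2 + rnorm(n, sd = 0.3))

# Candidate hyperparameter values supplied as a list, then expanded into every
# combination (here a single hyperparameter, the polynomial degree).
hparameters_grid <- list(degree = 1:4)
grid <- expand.grid(hparameters_grid)

# Randomly assign each observation to one of K folds of equal size.
K <- 5
fold_id <- sample(rep(seq_len(K), length.out = n))

# Average training and held-out mean squared error across the K folds.
cv_loss <- function(degree) {
  train_mse <- test_mse <- numeric(K)
  for (k in seq_len(K)) {
    train <- cv_data[fold_id != k, ]
    test <- cv_data[fold_id == k, ]
    fit <- lm(y ~ poly(x, degree, raw = TRUE), data = train)
    train_mse[k] <- mean((train$y - fitted(fit))^2)
    test_mse[k] <- mean((test$y - predict(fit, newdata = test))^2)
  }
  c(train.loss = mean(train_mse), test.loss = mean(test_mse))
}

# One row per grid combination, then keep the best out-of-sample score.
out_cv <- cbind(grid, t(sapply(grid$degree, cv_loss)))
out_cv_best_oos <- out_cv[which.min(out_cv$test.loss), ]
print(out_cv_best_oos)
```

Selecting on the held-out column rather than the training column is what guards against picking the most flexible configuration by default.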
This function computes a K-fold cross-validation of a pre-specified ReSurv model for a given grid of parameters.
## Default S3 method: ReSurvCV( IndividualDataPP, model, hparameters_grid, folds, random_seed, continuous_features_scaling_method = "minmax", print_every_n = 1L, nrounds = NULL, early_stopping_rounds = NULL, epochs = 1, parallel = FALSE, ncores = 1, num_workers = 0, verbose = FALSE, verbose.cv = FALSE )
IndividualDataPP |
|
model |
|
hparameters_grid |
|
folds |
|
random_seed |
|
continuous_features_scaling_method |
|
print_every_n |
|
nrounds |
|
early_stopping_rounds |
|
epochs |
|
parallel |
|
ncores |
|
num_workers |
|
verbose |
|
verbose.cv |
|
Best ReSurv
model fit. The output is different depending on the machine learning approach that is required for cross-validation. A list containing:
out.cv
: data.frame
, total output of the cross-validation (all the input parameters combinations).
out.cv.best.oos
: data.frame
, combination with the best out-of-sample likelihood.
For XGB the columns in out.cv
and out.cv.best.oos
are the hyperparameters booster
, eta
, max_depth
, subsample
, alpha
, lambda
, min_child_weight
. They also contain the metrics train.lkh
, test.lkh
, and the computational time time
. For NN the columns in out.cv
and out.cv.best.oos
are the hyperparameters num_layers
, optim
, activation
, lr
, xi
, eps
, tie
, batch_size
, early_stopping
, patience
, node
. They also contain the metrics train.lkh
, test.lkh
, and the computational time time
.
Hiabu, M., Hofman, E., & Pittarello, G. (2023). A machine learning approach based on survival analysis for IBNR frequencies in non-life reserving. arXiv preprint arXiv:2312.14549.
This function computes a K-fold cross-validation of a pre-specified ReSurv model for a given grid of parameters.
## S3 method for class 'IndividualDataPP' ReSurvCV( IndividualDataPP, model, hparameters_grid, folds, random_seed, continuous_features_scaling_method = "minmax", print_every_n = 1L, nrounds = NULL, early_stopping_rounds = NULL, epochs = NULL, parallel = FALSE, ncores = 1, num_workers = 0, verbose = FALSE, verbose.cv = FALSE )
IndividualDataPP |
|
model |
|
hparameters_grid |
|
folds |
|
random_seed |
|
continuous_features_scaling_method |
|
print_every_n |
|
nrounds |
|
early_stopping_rounds |
|
epochs |
|
parallel |
|
ncores |
|
num_workers |
|
verbose |
|
verbose.cv |
|
Best ReSurv
model fit. The output is different depending on the machine learning approach that is required for cross-validation. A list containing:
out.cv
: data.frame
, total output of the cross-validation (all the input parameters combinations).
out.cv.best.oos
: data.frame
, combination with the best out-of-sample likelihood.
For XGB the columns in out.cv
and out.cv.best.oos
are the hyperparameters booster
, eta
, max_depth
, subsample
, alpha
, lambda
, min_child_weight
. They also contain the metrics train.lkh
, test.lkh
, and the computational time time
. For NN the columns in out.cv
and out.cv.best.oos
are the hyperparameters num_layers
, optim
, activation
, lr
, xi
, eps
, tie
, batch_size
, early_stopping
, patience
, node
. They also contain the metrics train.lkh
, test.lkh
, and the computational time time
.
Gives an overview of IBNR predictions.
## S3 method for class 'ReSurvPredict' summary(object, granularity = "input", ...)
object |
"ReSurvPredict" object specifying hazard and development factors. |
granularity |
Default is "input". |
... |
Other arguments to be passed to summary. |
Summary of predictions
Return the Survival Continuous Ranked Probability Score (SCRPS) of a ReSurv
model.
survival_crps(ReSurvFit, user_data_set = NULL)
ReSurvFit |
ReSurvFit object to use for the score computation. |
user_data_set |
data.frame provided from the user to compute the survival CRPS, optional. |
The model fit uses the theoretical framework of Hiabu et al. (2023), that relies on the
Survival CRPS, data.table
that contains the CRPS (crps
) for each observation (id
).
Pittarello, G., Hiabu, M., & Villegas, A. M. (2023). Chain Ladder Plus: a versatile approach for claims reserving. arXiv preprint arXiv:2301.03858.
Therneau, T. M., & Lumley, T. (2015). Package ‘survival’. R Top Doc, 128(10), 28-33.
Katzman, J. L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., & Kluger, Y. (2018). DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC medical research methodology, 18(1), 1-12.
Chen, T., He, T., Benesty, M., & Khotilovich, V. (2019). Package ‘xgboost’. R version, 90, 1-66.
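For intuition, the CRPS compares the predicted distribution function $F = 1 - S$ with the step function of the observed time $y$, i.e. $\mathrm{CRPS}(S, y) = \int_0^{y} (1 - S(t))^2\,dt + \int_{y}^{\infty} S(t)^2\,dt$, where lower is better. The sketch below approximates this integral on a unit-spaced grid. The helper crps_on_grid is hypothetical, and survival_crps may discretize, truncate or aggregate differently; the sum stops at the last grid point, which loses nothing when the survival curve has reached 0 there.

```r
# Hypothetical helper: CRPS of a discrete survival curve on a unit-spaced grid.
# surv[k] is the predicted probability that the event occurs after time k,
# for k = 1, ..., length(surv); y is the observed event time on that grid.
crps_on_grid <- function(surv, y) {
  grid_t <- seq_along(surv)
  cdf <- 1 - surv                     # predicted distribution function
  observed <- as.numeric(grid_t >= y) # step function 1{t >= y}
  sum((cdf - observed)^2)             # Riemann sum with unit spacing
}

s_pred <- c(0.8, 0.5, 0.2, 0.0)
crps_on_grid(s_pred, y = 2)        # imperfect forecast: positive score
crps_on_grid(c(1, 0, 0, 0), y = 2) # forecast matching the outcome: zero
```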
Return the Survival Continuous Ranked Probability Score (SCRPS) of a ReSurv
model.
## Default S3 method: survival_crps(ReSurvFit, user_data_set = NULL)
ReSurvFit |
ReSurvFit object to use for the score computation. |
user_data_set |
data.frame provided from the user to compute the survival CRPS, optional. |
The model fit uses the theoretical framework of Hiabu et al. (2023), that relies on the
Survival CRPS, data.table
that contains the CRPS (crps
) for each observation (id
).
Pittarello, G., Hiabu, M., & Villegas, A. M. (2023). Chain Ladder Plus: a versatile approach for claims reserving. arXiv preprint arXiv:2301.03858.
Therneau, T. M., & Lumley, T. (2015). Package ‘survival’. R Top Doc, 128(10), 28-33.
Katzman, J. L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., & Kluger, Y. (2018). DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC medical research methodology, 18(1), 1-12.
Chen, T., He, T., Benesty, M., & Khotilovich, V. (2019). Package ‘xgboost’. R version, 90, 1-66.
Return the Survival Continuous Ranked Probability Score (SCRPS) of a ReSurv
model.
## S3 method for class 'ReSurvFit' survival_crps(ReSurvFit, user_data_set = NULL)
ReSurvFit |
ReSurvFit object to use for the score computation. |
user_data_set |
data.frame provided from the user to compute the survival CRPS, optional. |
The model fit uses the theoretical framework of Hiabu et al. (2023), that relies on the
Survival CRPS, data.table
that contains the CRPS (crps
) for each observation (id
).
Pittarello, G., Hiabu, M., & Villegas, A. M. (2023). Chain Ladder Plus: a versatile approach for claims reserving. arXiv preprint arXiv:2301.03858.
Therneau, T. M., & Lumley, T. (2015). Package ‘survival’. R Top Doc, 128(10), 28-33.
Katzman, J. L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., & Kluger, Y. (2018). DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC medical research methodology, 18(1), 1-12.
Chen, T., He, T., Benesty, M., & Khotilovich, V. (2019). Package ‘xgboost’. R version, 90, 1-66.