Title: | Causes of Outcome Learning |
---|---|
Description: | Implementing the computational phase of the Causes of Outcome Learning approach as described in Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <doi:10.1093/ije/dyac078>. The optional 'ggtree' package can be obtained through Bioconductor. |
Authors: | Andreas Rieckmann [aut, cre], Piotr Dworzynski [aut], Leila Arras [ctb], Claus Thorn Ekstrom [aut] |
Maintainer: | Andreas Rieckmann <[email protected]> |
License: | GPL-2 |
Version: | 1.1.2 |
Built: | 2024-11-13 06:46:45 UTC |
Source: | CRAN |
This function binary encodes the exposure data set so that each category becomes its own variable coded 0 or 1 (e.g. the variable sex will be two variables, men (1/0) and women (0/1)).
CoOL_0_binary_encode_exposure_data(exposure_data)
exposure_data |
The exposure data set. |
Data frame with the expanded exposure data, where all variables are binary encoded.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
#See the example under CoOL_0_working_example
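As a rough illustration of the encoding described above, the base-R sketch below expands each variable into one 0/1 column per category. The helper name `binary_encode_sketch`, the column naming scheme, and the toy data are assumptions made for this example; it is not the package's implementation of `CoOL_0_binary_encode_exposure_data`.

```r
# Hypothetical helper: one 0/1 column per category of each variable.
binary_encode_sketch <- function(exposure_data) {
  encoded <- lapply(names(exposure_data), function(v) {
    x <- as.factor(exposure_data[[v]])
    # One indicator per level; matrix() keeps the shape even with a single row
    m <- sapply(levels(x), function(l) as.integer(x == l))
    m <- matrix(m, nrow = length(x),
                dimnames = list(NULL, paste(v, levels(x), sep = "_")))
    as.data.frame(m)
  })
  do.call(cbind, encoded)
}

toy <- data.frame(sex = c("men", "women", "men"), drug_a = c(0, 1, 1))
encoded_toy <- binary_encode_sketch(toy)
print(encoded_toy)
```

Every row of the result has exactly one 1 per original variable, so the row sums equal the number of original variables.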
To reproduce the common causes example.
CoOL_0_common_simulation(n)
n |
number of observations for the synthetic data. |
A data frame with the columns Y, A, B, C, D, E, F and n rows.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
To reproduce the complex example.
CoOL_0_complex_simulation(n)
n |
number of observations for the synthetic data. |
A data frame with the columns Y, Physically_active, Low_SES, Mutation_X, LDL, Night_shifts, Air_pollution and n rows.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
To reproduce the confounding example.
CoOL_0_confounding_simulation(n)
n |
number of observations for the synthetic data. |
A data frame with the columns Y, A, B, C, D, E, F and n rows.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
To reproduce the mediation example.
CoOL_0_mediation_simulation(n)
n |
number of observations for the synthetic data. |
A data frame with the columns Y, A, B, C, D, E, F and n rows.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
To reproduce the CoOL working example with sex, drug A, and drug B.
CoOL_0_working_example(n)
n |
number of observations for the synthetic data. |
A data frame with the columns Y, sex, drug_a, drug_b and rows equal to n.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
while (FALSE) {
  library(CoOL)
  set.seed(1)
  data <- CoOL_0_working_example(n = 10000)
  outcome_data <- data[, 1]
  exposure_data <- data[, -1]
  exposure_data <- CoOL_0_binary_encode_exposure_data(exposure_data)
  model <- CoOL_1_initiate_neural_network(inputs = ncol(exposure_data),
    output = outcome_data, hidden = 5)

  # Train the non-negative model (the model can be retrained)
  model <- CoOL_2_train_neural_network(lr = 1e-4, X_train = exposure_data,
    Y_train = outcome_data, X_test = exposure_data, Y_test = outcome_data,
    model = model, epochs = 1000, patience = 200, input_parameter_reg = 1e-3)
  model <- CoOL_2_train_neural_network(lr = 1e-5, X_train = exposure_data,
    Y_train = outcome_data, X_test = exposure_data, Y_test = outcome_data,
    model = model, epochs = 1000, patience = 100, input_parameter_reg = 1e-3)
  model <- CoOL_2_train_neural_network(lr = 1e-6, X_train = exposure_data,
    Y_train = outcome_data, X_test = exposure_data, Y_test = outcome_data,
    model = model, epochs = 1000, patience = 50, input_parameter_reg = 1e-3)

  # Model performance
  plot(model$train_performance, type = 'l', yaxs = 'i', ylab = "Mean squared error",
    xlab = "Epochs", main = "A) Performance during training\n\n",
    ylim = quantile(model$train_performance, c(0, .975)))

  # Model visualization
  CoOL_3_plot_neural_network(model, names(exposure_data), 5 / max(model[[1]]),
    title = "B) Model connection weights\nand intercepts")

  # AUC
  CoOL_4_AUC(outcome_data, exposure_data, model,
    title = "C) Receiver operating\ncharacteristic curve")

  # Risk contributions
  risk_contributions <- CoOL_5_layerwise_relevance_propagation(exposure_data, model)

  # Number of sub-groups and dendrogram
  CoOL_6_number_of_sub_groups(risk_contributions = risk_contributions,
    low_number = 1, high_number = 5)
  CoOL_6_dendrogram(risk_contributions, number_of_subgroups = 3,
    title = "D) Dendrogram with 3 sub-groups")

  # Assign sub-groups
  sub_groups <- CoOL_6_sub_groups(risk_contributions, number_of_subgroups = 3)

  CoOL_6_calibration_plot(exposure_data = exposure_data, outcome_data = outcome_data,
    model = model, sub_groups = sub_groups)

  # Prevalence and mean risk plot
  CoOL_7_prevalence_and_mean_risk_plot(risk_contributions, sub_groups,
    title = "E) Prevalence and mean risk of sub-groups")

  # Mean risk contributions by sub-groups
  results <- CoOL_8_mean_risk_contributions_by_sub_group(risk_contributions,
    sub_groups, outcome_data = outcome_data, exposure_data = exposure_data,
    model = model, exclude_below = 0.01)

  CoOL_9_visualised_mean_risk_contributions(results = results, sub_groups = sub_groups)
  CoOL_9_visualised_mean_risk_contributions_legend(results = results)
}
This function initiates a non-negative neural network. The one-hidden-layer non-negative neural network is designed to resemble a DAG with hidden synergistic components. With the model, we intend to learn the various synergistic interactions between the exposures and the outcome. The model needs to be non-negative and to estimate the risk on an additive scale. Neural networks include hidden activation functions (if the sum of the input exceeds a threshold, information is passed on), which can model minimum threshold values of interactions between exposures. We need to specify the upper limit of the number of possible hidden activation functions; through model fitting, the model may then learn both stand-alone and synergistically interacting factors.
CoOL_1_initiate_neural_network(inputs, output, hidden = 10)
inputs |
The number of exposures. |
output |
The output variable; its mean is used to initiate the baseline risk. |
hidden |
Number of hidden nodes. |
The non-negative neural network can be denoted as:
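Written out under assumed notation, a one-hidden-layer network of the kind described above takes the following form. Here $X_i^+$ are the non-negative exposure values, $w_{i,j}^+$ and $w_{j,k}^+$ are non-negative connection weights, $b_j^-$ are hidden-node intercepts acting as thresholds (their non-positive sign is an assumption of this sketch), and $R^b$ is the baseline risk.

```latex
P(Y = 1 \mid X^+) = \sum_j \left( w_{j,k}^+ \, \mathrm{ReLU}_j\!\left( \sum_i \left( w_{i,j}^+ X_i^+ \right) + b_j^- \right) \right) + R^b
```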
A list with connection weights, bias weights and meta data.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
#See the example under CoOL_0_working_example
This function trains the non-negative neural network. The model is fitted in a step-wise procedure, one individual at a time: the model estimates the individual's risk of the disease outcome, computes the prediction's residual error, and adjusts the model parameters to reduce this error. By iterating through all individuals for multiple epochs (one complete iteration through all individuals is called an epoch), we end with model parameters for which the errors are as small as possible for the full population. The model fit follows the linear expectation that synergism is a combined effect larger than the sum of independent effects. The initial values, derivatives, and learning rates are described in further detail in the Supplementary material. The non-negative model ensures that the predicted value cannot be negative. The model does not prevent estimating probabilities above 1, but this would be unlikely, as risks of disease and mortality, even for high-risk groups, are in general far below 1. The use of a test data set does not seem to assist in deciding on the optimal number of epochs, possibly because of the constraints imposed by the non-negative assumption. We suggest splitting data into a train and a test data set, such that findings from the train data set can be confirmed in the test data set before developing hypotheses.
CoOL_2_train_neural_network( X_train, Y_train, X_test, Y_test, C_train = 0, C_test = 0, model, lr = c(1e-04, 1e-05, 1e-06), epochs = 2000, patience = 100, monitor = TRUE, plot_and_evaluation_frequency = 50, input_parameter_reg = 0.001, spline_df = 10, restore_par_options = TRUE, drop_out = 0, fix_baseline_risk = -1, ipw = 1 )
X_train |
The exposure data for the training data. |
Y_train |
The outcome data for the training data. |
X_test |
The exposure data for the test data (currently the training data is used). |
Y_test |
The outcome data for the test data (currently the training data is used). |
C_train |
One variable to adjust the analysis for such as calendar time (training data). |
C_test |
One variable to adjust the analysis for such as calendar time (currently the training data is used). |
model |
The fitted non-negative neural network. |
lr |
Learning rate (several LR can be provided, such that the model training will train for each LR and continue to the next). |
epochs |
The maximum number of epochs. |
patience |
The number of epochs allowed without an improvement in performance. |
monitor |
Whether a monitoring plot will be shown during training. |
plot_and_evaluation_frequency |
The interval for plotting the performance and checking the patience. |
input_parameter_reg |
Regularisation decreasing parameter value at each iteration for the input parameters. |
spline_df |
Degrees of freedom for the spline fit for the performance plots. |
restore_par_options |
Restore par options. |
drop_out |
To drop connections if their weights reach zero. |
fix_baseline_risk |
To fix the baseline risk at a value. |
ipw |
a vector of weights per observation to allow for inverse probability of censoring weighting to correct for selection bias |
An updated list of connection weights, bias weights and meta data.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
#See the example under CoOL_0_working_example
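The one-individual-at-a-time fitting described above can be illustrated with a deliberately simplified stand-in: an additive risk model with no hidden layer, trained by per-individual gradient steps on the squared error. Clipping the weights at zero is this sketch's way of imposing non-negativity, and the simulated data, learning rates, and epoch counts are all assumptions; none of it is taken from the package source.

```r
set.seed(1)
n <- 2000
X <- cbind(x1 = rbinom(n, 1, 0.5), x2 = rbinom(n, 1, 0.5))
y <- rbinom(n, 1, 0.05 + 0.30 * X[, "x1"])   # only x1 raises the risk

w <- c(x1 = 0, x2 = 0)   # non-negative exposure weights
b <- mean(y)             # baseline risk initiated at the outcome mean
mse <- function() mean((y - (b + X %*% w))^2)
mse_before <- mse()

for (lr in c(1e-2, 1e-3)) {        # train with each learning rate in turn
  for (epoch in 1:20) {            # one epoch = one pass over all individuals
    for (i in sample(n)) {
      err <- (b + sum(X[i, ] * w)) - y[i]   # residual error for individual i
      w <- pmax(w - lr * err * X[i, ], 0)   # gradient step, clipped at zero
      b <- max(b - lr * err, 0)
    }
  }
}
mse_after <- mse()
round(c(baseline = b, w), 3)
```

In this toy setting the weight for `x1` should end well above the weight for `x2`, and the mean squared error should fall relative to the starting model that predicts the outcome mean for everyone.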
This function plots the non-negative neural network.
CoOL_3_plot_neural_network( model, names, arrow_size = NA, title = "Model connection weights and intercepts", restore_par_options = TRUE )
model |
The fitted non-negative neural network. |
names |
Labels of each exposure. |
arrow_size |
Define the arrow_size for the model illustration in the reported training progress. |
title |
Title on the plot. |
restore_par_options |
Restore par options. |
A plot visualizing the connection weights.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
#See the example under CoOL_0_working_example
Plots the receiver operating characteristic (ROC) curve and reports the area under the curve (AUC).
CoOL_4_AUC( outcome_data, exposure_data, model, title = "Receiver operating\ncharacteristic curve", restore_par_options = TRUE )
outcome_data |
The outcome data. |
exposure_data |
The exposure data. |
model |
The fitted non-negative neural network. |
title |
Title on the plot. |
restore_par_options |
Restore par options. |
A plot of the ROC and the ROC AUC value.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
#See the example under CoOL_0_working_example
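For readers who want to see what the AUC value summarises, the base-R sketch below computes it from the ranks of the predicted risks (the Mann-Whitney formulation). The helper name `auc_sketch` and the toy vectors are assumptions for illustration; the package may compute the AUC differently.

```r
# Hypothetical helper: AUC as the share of case/non-case pairs ranked correctly.
auc_sketch <- function(outcome, predicted_risk) {
  r <- rank(predicted_risk)          # tied predictions receive average ranks
  n1 <- sum(outcome == 1)            # number of cases
  n0 <- sum(outcome == 0)            # number of non-cases
  (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

outcome <- c(0, 0, 1, 1)
auc_perfect <- auc_sketch(outcome, c(0.1, 0.2, 0.3, 0.4))  # every case ranked above every non-case
auc_partial <- auc_sketch(outcome, c(0.1, 0.3, 0.2, 0.4))  # one of four pairs is discordant
c(auc_perfect, auc_partial)
```

The two calls give 1 and 0.75, since three of the four case/non-case pairs are ordered correctly in the second one.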
Predict the risk of the outcome using the fitted non-negative neural network.
CoOL_4_predict_risks(X, model)
X |
The exposure data. |
model |
The fitted non-negative neural network. |
A vector with the predicted risk of the outcome for each individual.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
#See the example under CoOL_0_working_example
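A minimal sketch of how a one-hidden-layer ReLU network turns exposures into a predicted risk is given below. The parameter values, the names `W1`, `B1`, `W2`, `Rb`, and the helper `predict_risk_sketch` are hypothetical and chosen so that two exposures act only in combination while a third acts alone; the layout of the package's own model object is not assumed.

```r
relu_sketch <- function(x) pmax(x, 0)

# Hypothetical parameters for 3 exposures and 2 hidden nodes
W1 <- matrix(c(0.2, 0.0,
               0.2, 0.0,
               0.0, 0.1), nrow = 3, byrow = TRUE)  # exposure -> hidden weights
B1 <- c(-0.3, 0)   # hidden intercepts (thresholds)
W2 <- c(1, 1)      # hidden -> output weights
Rb <- 0.05         # baseline risk

predict_risk_sketch <- function(X, W1, B1, W2, Rb) {
  H <- relu_sketch(sweep(X %*% W1, 2, B1, "+"))  # hidden node values
  as.vector(H %*% W2 + Rb)                       # additive risk scale
}

X <- rbind(c(1, 1, 0),   # both interacting exposures present
           c(1, 0, 0),   # only one of them: the threshold is not reached
           c(0, 0, 1))   # the stand-alone exposure
predicted <- predict_risk_sketch(X, W1, B1, W2, Rb)
predicted
```

The three individuals get risks of about 0.15, 0.05 and 0.15: the first hidden node only contributes when both of its exposures are present.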
Calculates risk contributions for each exposure and a baseline using layer-wise relevance propagation of the fitted non-negative neural network and data.
CoOL_5_layerwise_relevance_propagation(X, model)
X |
The exposure data. |
model |
The fitted non-negative neural network. |
For each individual:
The below procedure is conducted for all individuals in a one by one fashion. The baseline risk, $R^b$, is simply parameterised in the model. The decomposition of the risk contributions for exposures, $R^X_i$, takes 3 steps:
Step 1 - Subtract the baseline risk, $R^b$:
Step 2 - Decompose to the hidden layer:
Where $H_j$ is the value taken by each of the $ReLU()_j$ functions for the specific individual.
Step 3 - Hidden layer to exposures:
This creates a dataset with the dimensions equal to the number of individuals times the number of exposures plus a baseline risk value, which can be termed a risk contribution matrix. Instead of exposure values, individuals are given risk contributions, $R^X_i$.
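The three steps can be written out as follows. This is a hedged sketch under assumed notation: $w_{j,k}$ denotes the hidden-to-output weights, $w_{i,j}$ the exposure-to-hidden weights, and $R^X_k$, $R^X_j$ the relevance at the output and at hidden node $j$; only $R^b$, $H_j$ and $R^X_i$ are named in the text above.

```latex
\text{Step 1:}\quad R^X_k = P(Y = 1 \mid X^+) - R^b

\text{Step 2:}\quad R^X_j = \frac{H_j \, w_{j,k}}{\sum_j \left( H_j \, w_{j,k} \right)} \, R^X_k

\text{Step 3:}\quad R^X_i = \sum_j \frac{X_i \, w_{i,j}}{\sum_i \left( X_i \, w_{i,j} \right)} \, R^X_j
```

Under these rules the exposure contributions of an individual sum to the predicted risk minus the baseline risk.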
A data frame with the risk contribution matrix [number of individuals, risk contributors + the baseline risk].
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
#See the example under CoOL_0_working_example
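The decomposition described above can be sketched in base R for a single individual. The parameters and the helper `lrp_sketch` are hypothetical, and the proportional redistribution rule used in steps 2 and 3 is an assumption of this example rather than a transcription of the package code.

```r
relu_sketch <- function(x) pmax(x, 0)

# Hypothetical parameters: 3 exposures, 2 hidden nodes
W1 <- matrix(c(0.2, 0.0,
               0.2, 0.0,
               0.0, 0.1), nrow = 3, byrow = TRUE)
B1 <- c(-0.3, 0)
W2 <- c(1, 1)
Rb <- 0.05

lrp_sketch <- function(x, W1, B1, W2, Rb) {
  H <- relu_sketch(as.vector(x %*% W1) + B1)   # hidden node values H_j
  risk <- sum(H * W2) + Rb
  R_k <- risk - Rb                             # step 1: subtract the baseline risk
  z_j <- H * W2
  R_j <- if (sum(z_j) > 0) z_j / sum(z_j) * R_k else 0 * z_j  # step 2
  z_ij <- x * W1                               # X_i * w_ij, one column per hidden node
  share <- sweep(z_ij, 2, pmax(colSums(z_ij), .Machine$double.eps), "/")
  R_i <- as.vector(share %*% R_j)              # step 3: hidden layer to exposures
  c(R_i, baseline = Rb)
}

contributions <- lrp_sketch(c(1, 1, 0), W1, B1, W2, Rb)
contributions
sum(contributions)   # equals the predicted risk for this individual
```

For this individual the risk of about 0.15 splits into 0.05 for each of the two interacting exposures, 0 for the third exposure, and 0.05 for the baseline.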
Shows the calibration curve, i.e. the predicted risk vs the actual risk by subgroups.
CoOL_6_calibration_plot( exposure_data, outcome_data, model, sub_groups, ipw = 1, restore_par_options = TRUE )
exposure_data |
The exposure dataset. |
outcome_data |
The outcome vector. |
model |
The fitted non-negative neural network. |
sub_groups |
The vector with the assigned sub_group numbers. |
ipw |
a vector of weights per observation to allow for inverse probability of censoring weighting to correct for selection bias |
restore_par_options |
Restore par options. |
A calibration curve.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
#See the example under CoOL_0_working_example
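The comparison behind a calibration curve can be sketched with simulated values: for each sub-group, the mean predicted risk is set against the observed proportion with the outcome. The data and object names below are assumptions for illustration and do not use a fitted model.

```r
set.seed(1)
sub_groups <- rep(1:3, each = 200)
predicted_risk <- c(0.05, 0.20, 0.40)[sub_groups]          # toy predicted risks
outcome <- rbinom(length(sub_groups), 1, predicted_risk)   # simulated outcomes

calibration <- data.frame(
  sub_group = sort(unique(sub_groups)),
  predicted = as.vector(tapply(predicted_risk, sub_groups, mean)),
  observed  = as.vector(tapply(outcome, sub_groups, mean))
)
calibration
```

Plotting `calibration$predicted` against `calibration$observed` and adding `abline(0, 1)` gives a simple calibration curve; points close to the diagonal indicate good calibration.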
Calculates and presents a dendrogram coloured by the pre-defined number of sub-groups and provides the vector with sub-groups.
CoOL_6_dendrogram( risk_contributions, number_of_subgroups = 3, title = "Dendrogram", colours = NA, ipw = 1 )
risk_contributions |
The risk contributions. |
number_of_subgroups |
The number of sub-groups chosen (Visual inspection is necessary). |
title |
The title of the plot. |
colours |
Colours indicating each sub-group. |
ipw |
a vector of weights per observation to allow for inverse probability of censoring weighting to correct for selection bias |
A dendrogram illustrating similarities between individuals based on their risk contributions.
#See the example under CoOL_0_working_example
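Grouping individuals by the similarity of their risk contributions can be sketched with base R's hierarchical clustering. The Manhattan distance and Ward linkage used here are assumptions of this sketch, as are the toy data; the package's own distance, linkage, and weighting choices are not restated in this manual.

```r
set.seed(1)
# Toy risk contribution matrix: 3 latent groups of 20 individuals, 2 exposures
centres <- rbind(c(0.00, 0.00), c(0.20, 0.00), c(0.00, 0.30))
risk_contributions <- centres[rep(1:3, each = 20), ] +
  matrix(runif(120, 0, 0.01), ncol = 2)

tree <- hclust(dist(risk_contributions, method = "manhattan"),
               method = "ward.D2")
sub_groups <- cutree(tree, k = 3)   # cut the tree into 3 sub-groups
table(sub_groups)
```

`plot(as.dendrogram(tree))` draws the corresponding dendrogram, which is what visual inspection of the number of sub-groups would be based on.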
Estimating the risk contribution for each exposure if each individual had been exposed to only one exposure, with the value the individual actually had.
CoOL_6_individual_effects_matrix(X, model)
X |
The exposure data. |
model |
The fitted non-negative neural network. |
A matrix [Number of individuals, exposures] with the estimated individual effects by each exposure had all other values been set to zero.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
#See the example under CoOL_0_working_example
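The idea of an individual effect (keep one exposure at its observed value, set all others to zero, and read off the risk) can be sketched as follows. The parameters and the helper `individual_effects_sketch` are hypothetical, and reporting the risk above the baseline is a choice made for this example.

```r
relu_sketch <- function(x) pmax(x, 0)

# Hypothetical parameters: 3 exposures, 2 hidden nodes
W1 <- matrix(c(0.2, 0.0,
               0.2, 0.0,
               0.0, 0.1), nrow = 3, byrow = TRUE)
B1 <- c(-0.3, 0)
W2 <- c(1, 1)

# Risk above baseline when only exposure i keeps its observed value
individual_effects_sketch <- function(X, W1, B1, W2) {
  sapply(seq_len(ncol(X)), function(i) {
    X_single <- X * 0
    X_single[, i] <- X[, i]
    H <- relu_sketch(sweep(X_single %*% W1, 2, B1, "+"))
    as.vector(H %*% W2)
  })
}

X <- rbind(c(1, 1, 0),   # two exposures that only act together
           c(0, 0, 1))   # one stand-alone exposure
effects <- individual_effects_sketch(X, W1, B1, W2)
effects
rowSums(effects)   # sum of individual effects per individual
```

The first individual has a sum of individual effects of 0 even though the combined exposures raise the risk by about 0.1 in this toy network, which is the kind of gap that points to synergy.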
Calculates the mean distance for several numbers of subgroups to determine the optimal number of subgroups.
CoOL_6_number_of_sub_groups( risk_contributions, low_number = 1, high_number = 5, ipw = 1, restore_par_options = TRUE )
risk_contributions |
The risk contributions. |
low_number |
The lowest number of subgroups. |
high_number |
The highest number of subgroups. |
ipw |
a vector of weights per observation to allow for inverse probability of censoring weighting to correct for selection bias |
restore_par_options |
Restore par options. |
A plot of the mean distance by the number of subgroups. The mean distance converges when the optimal number of subgroups is found.
#See the example under CoOL_0_working_example
Calculates and presents a dendrogram coloured by the pre-defined number of sub-groups and provides the vector with sub-groups.
CoOL_6_sub_groups(risk_contributions, number_of_subgroups = 3, ipw = 1)
risk_contributions |
The risk contributions. |
number_of_subgroups |
The number of sub-groups chosen (Visual inspection is necessary). |
ipw |
a vector of weights per observation to allow for inverse probability of censoring weighting to correct for selection bias |
A vector [number of individuals] with an assigned sub-group.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
#See the example under CoOL_0_working_example
Sums the risk contributions as if each individual had been exposed to only one exposure at a time, with the value the individual actually had.
CoOL_6_sum_of_individual_effects(X, model)
X |
The exposure data. |
model |
The fitted non-negative neural network. |
A value: the sum of individual effects, had there been no interactions between exposures.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
#See the example under CoOL_0_working_example
This plot shows the prevalence and mean risk for each sub-group. Its distribution hints at sub-groups with great public health potential.
CoOL_7_prevalence_and_mean_risk_plot( risk_contributions, sub_groups, title = "Prevalence and mean risk\nof sub-groups", y_max = NA, restore_par_options = TRUE, colours = NA, ipw = 1 )
risk_contributions |
The risk contributions. |
sub_groups |
The vector with the sub-groups. |
title |
The title of the plot. |
y_max |
Fix the axis of the risk of the outcome. |
restore_par_options |
Restore par options. |
colours |
Colours indicating each sub-group. |
ipw |
a vector of weights per observation to allow for inverse probability of censoring weighting to correct for selection bias |
A plot with prevalence and mean risks by sub-groups.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
#See the example under CoOL_0_working_example
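The two quantities shown in this plot can be computed by hand from a risk contribution matrix and a sub-group vector, as in the sketch below. The toy values are assumptions, and an individual's risk is taken to be the row sum of the risk contributions including the baseline.

```r
# Toy risk contribution matrix (two exposures plus the baseline risk)
risk_contributions <- data.frame(
  drug_a   = c(0.00, 0.00, 0.10, 0.10, 0.00),
  drug_b   = c(0.00, 0.00, 0.00, 0.00, 0.30),
  baseline = 0.05
)
sub_groups <- c(1, 1, 2, 2, 3)

total_risk <- rowSums(risk_contributions)   # predicted risk per individual
summary_table <- data.frame(
  sub_group  = sort(unique(sub_groups)),
  prevalence = as.vector(table(sub_groups)) / length(sub_groups),
  mean_risk  = as.vector(tapply(total_risk, sub_groups, mean))
)
summary_table
```

A sub-group that combines a high prevalence with a high mean risk accounts for a large share of the cases, which is what makes it interesting from a public health perspective.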
Table with the mean risk contributions by sub-groups.
CoOL_8_mean_risk_contributions_by_sub_group( risk_contributions, sub_groups, exposure_data, outcome_data, model, exclude_below = 0.001, restore_par_options = TRUE, colours = NA, ipw = 1 )
risk_contributions |
The risk contributions. |
sub_groups |
The vector with the sub-groups. |
exposure_data |
The exposure data. |
outcome_data |
The outcome data. |
model |
The trained non-negative model. |
exclude_below |
A lower cut-off below which risk contributions are not shown. |
restore_par_options |
Restore par options. |
colours |
Colours indicating each sub-group. |
ipw |
a vector of weights per observation to allow for inverse probability of censoring weighting to correct for selection bias |
A plot and a dataset with the mean risk contributions by sub-groups.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
#See the example under CoOL_0_working_example
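The table of mean risk contributions can be approximated with `aggregate()`, hiding values below a cut-off. This is a sketch on toy data with assumed object names; the package function returns more than this and its exact output format is not reproduced here.

```r
# Toy risk contribution matrix (two exposures plus the baseline risk)
risk_contributions <- data.frame(
  drug_a   = c(0.00, 0.00, 0.10, 0.10, 0.00),
  drug_b   = c(0.00, 0.00, 0.00, 0.00, 0.30),
  baseline = 0.05
)
sub_groups <- c(1, 1, 2, 2, 3)
exclude_below <- 0.01

# Mean of every risk contribution column within each sub-group
mean_contributions <- aggregate(risk_contributions,
                                by = list(sub_group = sub_groups), FUN = mean)

# Blank out contributions below the cut-off
vals <- as.matrix(mean_contributions[, -1])
vals[vals < exclude_below] <- NA
shown <- data.frame(sub_group = mean_contributions$sub_group, vals)
shown
```

Each row then lists only the exposures that contribute meaningfully to the risk in that sub-group.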
Visualisation of the mean risk contributions by sub-groups. The function uses the output of CoOL_8_mean_risk_contributions_by_sub_group.
CoOL_9_visualised_mean_risk_contributions( results, sub_groups, ipw = 1, restore_par_options = TRUE )
results |
The output of CoOL_8_mean_risk_contributions_by_sub_group. |
sub_groups |
The vector with the sub-groups. |
ipw |
a vector of weights per observation to allow for inverse probability of censoring weighting to correct for selection bias |
restore_par_options |
Restore par options. |
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
#See the example under CoOL_0_working_example
Legend to the visualisation of the mean risk contributions by sub-groups. The function uses the output of CoOL_8_mean_risk_contributions_by_sub_group.
CoOL_9_visualised_mean_risk_contributions_legend( results, restore_par_options = TRUE )
results |
The output of CoOL_8_mean_risk_contributions_by_sub_group. |
restore_par_options |
Restore par options. |
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
#See the example under CoOL_0_working_example
The analysis and plots presented in the main paper. We recommend using View(CoOL_default) and View() on the many sub-functions to understand the steps and modify to your own research question. 3 sets of training will run with a learning rate of 1e-4 and a patience of 200 epochs, a learning rate of 1e-5 and a patience of 100 epochs, and a learning rate of 1e-6 and a patience of 50 epochs.
CoOL_default( data, sub_groups = 3, exclude_below = 0.01, input_parameter_reg = 0.001, hidden = 10, monitor = TRUE, epochs = 10000 )
data |
A data.frame(cbind(outcome_data,exposure_data)). |
sub_groups |
Define the number of expected sub-groups. |
exclude_below |
Risk contributions below this value are not shown in the table. |
input_parameter_reg |
The regularization of the input parameters. |
hidden |
The number of synergy-functions. |
monitor |
Whether monitoring plots will be shown in R. |
epochs |
The maximum number of epochs. |
A series of plots across the full Causes of Outcome Learning approach.
Rieckmann, Dworzynski, Arras, Lapuschkin, Samek, Arah, Rod, Ekstrom. 2022. Causes of outcome learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. International Journal of Epidemiology <https://doi.org/10.1093/ije/dyac078>
# Not run
while (FALSE) {
  # See the example under CoOL_0_working_example for a more detailed tutorial
  library(CoOL)
  data <- CoOL_0_working_example(n = 10000)
  CoOL_default(data)
}
Non-negative neural network
cpp_train_network_relu( x, y, c, testx, testy, testc, W1_input, B1_input, W2_input, B2_input, C2_input, ipw, lr = 0.01, maxepochs = 100, input_parameter_reg = 1e-06, drop_out = 0L, fix_baseline_risk = -1 )
x |
A matrix of predictors for the training dataset of shape (nsamples, nfeatures) |
y |
A vector of output values for the training data with a length equal to the number of rows of x |
c |
A vector of the data to adjust the analysis for such as calendar time (training data) with the same number of rows as x. |
testx |
A matrix of predictors for the test dataset of shape (nsamples, nfeatures) |
testy |
A vector of output values for the test data with a length equal to the number of rows of testx |
testc |
A vector of the data to adjust the analysis for, such as calendar time (test data), with the same number of rows as testx. |
W1_input |
Input-hidden layer weights of shape (nfeatures, hidden) |
B1_input |
Biases for the hidden layer of shape (1, hidden) |
W2_input |
Hidden-output layer weights of shape (hidden, 1) |
B2_input |
Bias for the output layer (the baseline risk) of shape (1, 1) |
C2_input |
Bias for the data to adjust the analysis for |
ipw |
a vector of weights per observation to allow for inverse probability of censoring weighting to correct for selection bias |
lr |
Initial learning rate |
maxepochs |
The maximum number of epochs |
input_parameter_reg |
Regularisation decreasing parameter value at each iteration for the input parameters |
drop_out |
To drop connections if their weights reach zero. |
fix_baseline_risk |
To fix the baseline risk at a value. |
A list of class "SCL" giving the estimated matrices and performance indicators
Andreas Rieckmann, Piotr Dworzynski, Leila Arras, Claus Ekstrøm
Function used as part of other functions
random(r, c)
r |
rows in matrix |
c |
columns in matrix |
relu-function
rcpprelu(x)
x |
input in the relu function |
negative relu-function
rcpprelu_neg(x)
x |
input in the negative relu-function |
Function used as part of other functions
relu(input)
input |
input in the relu function |