Title: | Evolutionary Version of the Metropolis-Hastings Algorithm |
---|---|
Description: | Provides computational methods for detecting adverse high-order drug interactions from individual case safety reports using statistical techniques, allowing the exploration of higher-order interactions among drug cocktails. |
Authors: | Jules Bangard [aut, cre] |
Maintainer: | Jules Bangard <[email protected]> |
License: | GPL-3 |
Version: | 1.2 |
Built: | 2025-02-27 18:33:33 UTC |
Source: | CRAN |
Jules Bangard [aut, cre] (<https://orcid.org/0009-0007-4670-7860>)
Example dataset representing the ATC tree structure, sourced from the WHO website (2024-02-23). This dataset is provided for demonstration and testing purposes with the package.
ATC_Tree_UpperBound_2024
A data frame with 4 variables:
The code of ATC nodes
The name of ATC nodes
The number of characters in the ATCCode
The index of the last child node in the tree
World Health Organization, ATC classification register
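The last variable (the index of the last child node) is what allows subtree membership to be tested without walking the tree. The following is a minimal sketch on a toy tree, assuming that the nodes are numbered in depth-first order and that this variable stores the DFS index of the last descendant of each node. The column names `code` and `upper_bound` are hypothetical and are not taken from the dataset.

```r
# Toy tree listed in depth-first order; column names are illustrative only.
toy_tree <- data.frame(
  code        = c("A", "A01", "A01A", "A02", "B", "B01"),
  upper_bound = c(4L,  3L,    3L,     4L,    6L,  6L),
  stringsAsFactors = FALSE
)

# Node j lies in the subtree rooted at node i when i <= j <= upper_bound[i].
is_in_subtree <- function(tree, i, j) {
  j >= i & j <= tree$upper_bound[i]
}

in_a   <- is_in_subtree(toy_tree, 1, 3)  # "A01A" sits under "A"
in_a01 <- is_in_subtree(toy_tree, 2, 4)  # "A02" does not sit under "A01"
```

With this encoding a subtree test costs two comparisons, whatever the depth of the tree.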
Convert the ATC codes of each patient to the corresponding DFS numbers in the ATC tree
ATCtoNumeric(patientATC, tree)
patientATC |
: patient observations; for each patient, a string containing the medications taken (ATC codes) |
tree |
: ATC tree (assumed to contain a column 'ATCCode') |
A matrix of the same size as patientATC, containing integers that are the indices of the corresponding ATC codes.
ATC_code <- c('A01AA30 A01AB03', 'A10AC30')
ATCtoNumeric(ATC_code, ATC_Tree_UpperBound_2024)
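The conversion described above can be pictured as a lookup of each code in the 'ATCCode' column. This is a rough base R sketch on a toy tree, not the package's implementation. The helper name `atc_to_index` is hypothetical, it returns a list rather than a matrix, and it assumes 1-based row positions, which may differ from the indexing used by ATCtoNumeric.

```r
# Toy tree with only an ATCCode column (illustrative rows, not the WHO register).
toy_tree <- data.frame(
  ATCCode = c("A", "A01", "A01AA30", "A01AB03", "A10AC30"),
  stringsAsFactors = FALSE
)

# Split each patient's string on spaces and look every code up in the tree.
atc_to_index <- function(patient_atc, tree) {
  lapply(strsplit(patient_atc, " ", fixed = TRUE),
         function(codes) match(codes, tree$ATCCode))
}

indices <- atc_to_index(c("A01AA30 A01AB03", "A10AC30"), toy_tree)
```

Codes that are absent from the tree come back as NA with this approach.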
Calculate the divergence between two distributions (the true distribution and the learned one)
calculate_divergence(
  empirical_distribution,
  true_distribution,
  method = "TV",
  Filtered = FALSE
)
empirical_distribution |
A numeric vector of values representing the empirical distribution (return value of the DistributionApproximation function) |
true_distribution |
A numeric vector of values representing the true distribution computed by the trueDistributionSizeTwoCocktail function |
method |
A string, either "TV" or "KL", to use the total variation distance or the Kullback-Leibler divergence respectively (default = "TV") |
Filtered |
Whether to use the filtered distribution (TRUE) or the unfiltered one (default = FALSE) |
A numeric value representing the divergence between the two distributions
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
estimated_score_distribution = DistributionApproximation(epochs = 10,
                                  ATCtree = ATC_Tree_UpperBound_2024,
                                  observations = FAERS_myopathy[1:100,], Smax = 2)
true_score_distribution = trueDistributionSizeTwoCocktail(ATCtree = ATC_Tree_UpperBound_2024,
                                  observations = FAERS_myopathy[1:100,], beta = 4)
divergence <- calculate_divergence(empirical_distribution = estimated_score_distribution,
                                   true_distribution = true_score_distribution)
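For reference, the two measures named by the method argument have simple closed forms for discrete distributions. The sketch below computes them in base R on two toy histograms. The helper names are hypothetical, and the direction chosen for the Kullback-Leibler divergence (first argument relative to second) is an assumption about what calculate_divergence does.

```r
# Turn raw counts into probabilities.
normalise <- function(counts) counts / sum(counts)

# Total variation distance: half the L1 distance between the two distributions.
total_variation <- function(p, q) 0.5 * sum(abs(p - q))

# Kullback-Leibler divergence KL(p || q); terms with p == 0 contribute nothing.
kl_divergence <- function(p, q) {
  keep <- p > 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

p <- normalise(c(2, 2, 0))   # 0.50 0.50 0.00
q <- normalise(c(1, 2, 1))   # 0.25 0.50 0.25

tv <- total_variation(p, q)  # 0.5 * (0.25 + 0 + 0.25) = 0.25
kl <- kl_divergence(p, q)    # 0.5 * log(2)
```

The total variation distance always lies in [0;1], whereas the Kullback-Leibler divergence is unbounded and becomes infinite when q is zero on a bin where p is positive.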
Clustering of the solutions of the genetic algorithm using the hclust algorithm
clustering_genetic_algorithm(
  genetic_results,
  ATCtree,
  dist.normalize = TRUE,
  umap_config = NULL
)
genetic_results |
A list of cocktails in the form of integer vectors |
ATCtree |
ATC tree with upper bound of the DFS |
dist.normalize |
Whether to normalize the distance so that it belongs to [0;1] |
umap_config |
The configuration used to project the cocktails into a lower-dimensional space (umap::umap.defaults by default) |
A data frame containing the two UMAP coordinates (UMAP 1 and UMAP 2) of each cocktail in the plane, as well as the cluster number of each cocktail
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
results = GeneticAlgorithm(epochs = 10, nbIndividuals = 10,
                           ATCtree = ATC_Tree_UpperBound_2024,
                           observations = FAERS_myopathy)
hclust_genetic_solution(genetic_results = results, ATCtree = ATC_Tree_UpperBound_2024)
Function used to compute the Hypergeometric score on a list of cocktails
compute_hypergeom_on_list(cocktails, ATCtree, observations, num_thread = 1L)
cocktails |
: a list of cocktails, each in the form of a vector of integers (ATC indices) |
ATCtree |
: ATC tree with upper bound of the DFS (without the root) |
observations |
: observations of the AE based on the medications of each patient (a data frame containing the medications in the first column and the ADR (boolean) in the second) on which the risk distribution is computed |
num_thread |
: number of threads to run in parallel if OpenMP is available, 1 by default |
Hypergeometric scores of the cocktails passed in the "cocktails" parameter
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
cocktails = list(c(561, 904), c(1902, 4585))
Hypergeom_of_cocktails = compute_hypergeom_on_list(cocktails = cocktails,
                                                   ATCtree = ATC_Tree_UpperBound_2024,
                                                   observations = FAERS_myopathy)
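The exact definition of the hypergeometric score is not given on this page. As an illustration only, a common way to build such a score is the upper-tail probability of the hypergeometric distribution, often reported on a -log10 scale. The sketch below uses stats::phyper on toy counts. The helper name and the -log10 transform are assumptions, not the package's documented formula.

```r
# Upper-tail hypergeometric probability P(X >= k), where
#   N = number of patients, K = patients with the ADR,
#   n = patients who took the cocktail, k = patients with both.
hypergeom_tail <- function(k, n, K, N) {
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Toy counts: 10 patients, 4 with the ADR, 3 exposed, 2 exposed with the ADR.
p_tail <- hypergeom_tail(k = 2, n = 3, K = 4, N = 10)  # (36 + 4) / 120 = 1/3
score  <- -log10(p_tail)                               # larger means more surprising
```

A small tail probability means that the overlap between exposure and ADR is larger than expected by chance, so the transformed score grows with the strength of the signal.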
Function used to compute the Relative Risk on a list of cocktails
compute_RR_on_list(cocktails, ATCtree, observations, num_thread = 1L)
cocktails |
: a list of cocktails, each in the form of a vector of integers (ATC indices) |
ATCtree |
: ATC tree with upper bound of the DFS (without the root) |
observations |
: observations of the AE based on the medications of each patient (a data frame containing the medications in the first column and the ADR (boolean) in the second) on which the risk distribution is computed |
num_thread |
: number of threads to run in parallel if OpenMP is available, 1 by default |
RR scores of the cocktails passed in the "cocktails" parameter
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
cocktails = list(c(561, 904), c(1902, 4585))
RR_of_cocktails = compute_RR_on_list(cocktails = cocktails,
                                     ATCtree = ATC_Tree_UpperBound_2024,
                                     observations = FAERS_myopathy)
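As a reminder of the quantity involved, the relative risk compares the ADR rate among patients exposed to a cocktail with the rate among the others. The sketch below computes it on toy logical vectors in base R. The helper name is hypothetical, and how the package decides that a patient is exposed to a cocktail (for example through the ATC tree) is not reproduced here.

```r
# Relative risk = P(ADR | exposed) / P(ADR | not exposed).
relative_risk <- function(exposed, adr) {
  risk_exposed   <- mean(adr[exposed])
  risk_unexposed <- mean(adr[!exposed])
  risk_exposed / risk_unexposed
}

# Toy data: 3 exposed patients (2 with the ADR), 7 unexposed (2 with the ADR).
exposed <- c(TRUE, TRUE, TRUE, rep(FALSE, 7))
adr     <- c(TRUE, TRUE, FALSE, TRUE, TRUE, rep(FALSE, 5))

rr <- relative_risk(exposed, adr)  # (2/3) / (2/7) = 7/3
```

A value above 1 indicates that the ADR is reported more often among exposed patients than among the rest.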
Function used in the reference article to compare diverse Disproportionality Analysis metrics
computeMetrics_size2(CocktailList, ATCtree, observations, num_thread = 1L)
CocktailList |
: a list of cocktails on which the disproportionality analysis (DA) metrics should be computed |
ATCtree |
: ATC tree with upper bound of the DFS (without the root) |
observations |
: observations of the AE based on the medications of each patient (a data frame containing the medications in the first column and the ADR (boolean) in the second) on which the risk distribution is computed |
num_thread |
: number of threads to run in parallel if OpenMP is available, 1 by default |
Multiple DA metrics computed on the cocktails of CocktailList
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
cocktails = list(c(561, 904), c(1902, 4585)) # only size 2 cocktails allowed for this function
scores_of_cocktails = computeMetrics_size2(CocktailList = cocktails,
                                           ATCtree = ATC_Tree_UpperBound_2024,
                                           observations = FAERS_myopathy[1:100,])
Function used to convert genetic algorithm results stored in a .csv file into a data structure that can be used by the clustering algorithm
csv_to_population(ATC_name, filename, sep = ";")
ATC_name |
the ATC_name column of the ATC tree |
filename |
Name of the file where the results are located |
sep |
the separator to use when opening the csv file (';' by default) |
An R List that can be used by other algorithms (e.g. clustering algorithm)
data("ATC_Tree_UpperBound_2024")
genetic_results = csv_to_population(ATC_Tree_UpperBound_2024$Name, "path/to/output.csv")
The MCMC method that runs the random walk on a single cocktail in order to estimate the distribution of scores among cocktails of size Smax.
DistributionApproximation(
  epochs,
  ATCtree,
  observations,
  temperature = 1L,
  nbResults = 5L,
  Smax = 2L,
  p_type1 = 0.01,
  beta = 4L,
  max_score = 500L,
  num_thread = 1L,
  verbose = FALSE
)
epochs |
: number of steps for the MCMC algorithm |
ATCtree |
: ATC tree with upper bound of the DFS (without the root; see the GitHub repository for an example) |
observations |
: real observations of the AE based on the medications of each patient (a data frame containing the medications in the first column and the ADR (boolean) in the second) |
temperature |
: starting temperature, default = 1 (denoted T in the article) |
nbResults |
: number of returned solutions (cocktails of size Smax with the best observed scores during the run), 5 by default |
Smax |
: size of the cocktails whose score distribution is approximated |
p_type1 |
: probability of applying a type 1 mutation. Note: the probability of applying a type 2 mutation is then 1 - p_type1. p_type1 must be in [0;1]. Default is 0.01 |
beta |
: minimum number of patients that must have taken a cocktail for its risk to be taken into account in the DistributionScoreBeta. Default is 4 |
max_score |
: maximum value the score can take. Scores greater than this value are added to the distribution as the value max_score. Default is 500 |
num_thread |
: number of threads to run in parallel if OpenMP is available, 1 by default |
verbose |
: output a summary (default is FALSE) |
If there is no problem, returns a list containing:
- ScoreDistribution: the distribution of the scores as an array, each cell holding the number of risks equal to (index - 1) / 10.
- Outstanding_score: an array of the scores greater than max_score.
- Best_cocktails: the nbResults best cocktails encountered during the run.
- Best_scores: the scores corresponding to Best_cocktails.
- FilteredDistribution: the distribution of scores for cocktails taken by at least beta patients.
- Best_cocktails_beta: the nbResults best cocktails taken by at least beta patients encountered during the run.
- Best_scores_beta: the scores corresponding to Best_cocktails_beta.
- cocktailSize: the Smax parameter used during the run.
Otherwise the list is empty.
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
estimation = DistributionApproximation(epochs = 10,
                                       ATCtree = ATC_Tree_UpperBound_2024,
                                       observations = FAERS_myopathy)
Example dataset representing drug intake and adverse event reports from FAERS. This dataset is provided to demonstrate the functionality of genetic and MCMC algorithms in the package.
FAERS_myopathy
A data frame with 2 columns:
Drug intake for each patient as a vector of ATC tree indices
Indicates if the patient experienced myopathy as an adverse event
U.S. Food and Drug Administration Adverse Event Reporting System (FAERS)
Genetic algorithm that searches for the riskiest cocktails (the ones that maximize the fitness function, the hypergeometric score in our case)
GeneticAlgorithm(
  epochs,
  nbIndividuals,
  ATCtree,
  observations,
  num_thread = 1L,
  diversity = FALSE,
  p_crossover = 0.8,
  p_mutation = 0.01,
  nbElite = 0L,
  tournamentSize = 2L,
  alpha = 1,
  summary = TRUE
)
epochs |
: number of steps of the algorithm |
nbIndividuals |
: size of the population |
ATCtree |
: ATC tree with upper bound of the DFS (without the root) |
observations |
: real observations of the AE based on the medications of each patient (a data frame containing the medications in the first column and the ADR (boolean) in the second) |
num_thread |
: number of threads to run in parallel if OpenMP is available, 1 by default |
diversity |
: enable the diversity mechanism of the algorithm (favors the diversity of cocktails in the population), default is FALSE |
p_crossover |
: probability of applying a crossover during the crossover phase. Default is 80% |
p_mutation |
: probability of applying a mutation after the crossover phase. Default is 1% |
nbElite |
: number of best individuals kept from generation to generation. Default is 0 |
tournamentSize |
: size of the tournament (the best individual among tournamentSize sampled individuals is selected) |
alpha |
: when making a type 1 mutation, there is an (alpha / size of cocktail) chance of adding a drug |
summary |
: whether to print a summary of the population at each step |
If there is no problem, returns a list:
- meanFitnesses: the mean score of the population at each epoch of the algorithm.
- BestFitnesses: the best score of the population at each epoch of the algorithm.
- FinalPopulation: the final population of the algorithm when finished (medications and corresponding scores).
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
results = GeneticAlgorithm(epochs = 10, nbIndividuals = 10,
                           ATCtree = ATC_Tree_UpperBound_2024,
                           observations = FAERS_myopathy)
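The tournamentSize argument refers to tournament selection: a few individuals are sampled and the fittest of them is kept as a parent. A minimal base R sketch of that idea follows. The helper name is hypothetical, and sampling without replacement is an assumption, since the package's sampling scheme is not described on this page.

```r
# Pick one parent: draw tournament_size candidates and keep the fittest.
tournament_select <- function(fitness, tournament_size = 2L) {
  contenders <- sample.int(length(fitness), tournament_size)  # without replacement
  contenders[which.max(fitness[contenders])]
}

set.seed(1)
fitness <- c(0.4, 2.1, 1.3, 0.9)   # toy scores for 4 individuals
winner  <- tournament_select(fitness, tournament_size = 2L)

# When the tournament covers the whole population, the best individual always wins.
best <- tournament_select(fitness, tournament_size = length(fitness))
```

Larger tournaments increase the selection pressure, because weak individuals are less likely to win against more opponents.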
Recover the square matrix of distance between cocktails where the index (i,j) of the matrix is the distance between cocktails i and j in an arbitrary cocktail list
get_dissimilarity_from_cocktail_list(cocktails, ATCtree, normalization = TRUE)
cocktails |
: a list of cocktails, each in the form of a vector of integers |
ATCtree |
: ATC tree with upper bound of the DFS (without the root) |
normalization |
: whether to keep the distance between cocktails in the range [0;1] |
The square matrix of distances between cocktails
data("ATC_Tree_UpperBound_2024")
cocktails = list(c(561, 904), c(1902, 4585)) # only size 2 cocktails allowed for this function
distance_matrix = get_dissimilarity_from_cocktail_list(cocktails = cocktails,
                                                       ATCtree = ATC_Tree_UpperBound_2024,
                                                       normalization = TRUE)
Recover the square matrix of distance between cocktails where the index (i,j) of the matrix is the distance between cocktails i and j in the genetic_results list.
get_dissimilarity_from_genetic_results(genetic_results, ATCtree, normalization)
genetic_results |
: the list returned by the genetic algorithm |
ATCtree |
: ATC tree with upper bound of the DFS (without the root) |
normalization |
: whether to keep the distance between cocktails in the range [0;1] |
The square matrix of distances between cocktails
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
genetic_results = GeneticAlgorithm(epochs = 10, nbIndividuals = 10,
                                   ATCtree = ATC_Tree_UpperBound_2024,
                                   observations = FAERS_myopathy)
distance_matrix = get_dissimilarity_from_genetic_results(genetic_results = genetic_results,
                                                         ATCtree = ATC_Tree_UpperBound_2024,
                                                         normalization = TRUE)
Recover the square matrix of distance between cocktails where the index (i,j) of the matrix is the distance between cocktails i and j in the csv file containing results of genetic algorithm
get_dissimilarity_from_txt_file(filename, ATCtree, normalization = TRUE)
filename |
: the name of the file returned by the print_csv function |
ATCtree |
: ATC tree with upper bound of the DFS (without the root) |
normalization |
: whether to keep the distance between cocktails in the range [0;1] |
The square matrix of distances between cocktails
data("ATC_Tree_UpperBound_2024")
distance_matrix = get_dissimilarity_from_txt_file(filename = '250e_700ind_0.2mr_0ne_2alpha.txt',
                                                  ATCtree = ATC_Tree_UpperBound_2024,
                                                  normalization = TRUE)
Clustering of the solutions of the genetic algorithm using the hclust algorithm
hclust_genetic_solution(
  genetic_results,
  ATCtree,
  dist.normalize = TRUE,
  method = "complete"
)
genetic_results |
The return value of the genetic algorithm |
ATCtree |
ATC tree with upper bound of the DFS |
dist.normalize |
Whether to normalize the distance so that it belongs to [0;1] |
method |
(from the hclust function) the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). |
The hierarchical clustering of the results of the genetic algorithm
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
results = GeneticAlgorithm(epochs = 10, nbIndividuals = 10,
                           ATCtree = ATC_Tree_UpperBound_2024,
                           observations = FAERS_myopathy)
hclust_genetic_solution(genetic_results = results, ATCtree = ATC_Tree_UpperBound_2024)
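To show what the method argument controls, here is a small base R sketch that runs stats::hclust on a hand-written dissimilarity matrix and then cuts the tree into two groups. The matrix values are invented for illustration and do not come from the package's cocktail distance.

```r
# Toy symmetric dissimilarity matrix for 4 cocktails:
# cocktails 1 and 2 are close, as are cocktails 3 and 4.
d <- matrix(c(0.00, 0.10, 0.90, 0.80,
              0.10, 0.00, 0.85, 0.90,
              0.90, 0.85, 0.00, 0.20,
              0.80, 0.90, 0.20, 0.00),
            nrow = 4, byrow = TRUE)

# Complete linkage merges clusters using their largest pairwise dissimilarity.
hc <- hclust(as.dist(d), method = "complete")

# Cut the dendrogram into two clusters.
clusters <- cutree(hc, k = 2)
```

The object returned by hclust can also be passed to plot() to draw the dendrogram.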
Convert the histogram returned by the DistributionApproximation function to a distribution of real numbers (that can be used in a test, for example)
histogramToDitribution(vec)
vec |
: distribution returned by the DistributionApproximation function |
A vector containing the risks sampled during the MCMC algorithm
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
DistributionApproximationResults = DistributionApproximation(epochs = 10,
                                       ATCtree = ATC_Tree_UpperBound_2024,
                                       observations = FAERS_myopathy)
histogramToDitribution(DistributionApproximationResults$ScoreDistribution)
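The conversion can be understood from the layout of ScoreDistribution, where cell i counts the sampled risks equal to (i - 1) / 10. A minimal base R sketch of that expansion follows. The helper name is hypothetical and this is not necessarily how histogramToDitribution is implemented.

```r
# Expand a histogram into the individual values it summarises:
# cell i holds the number of samples whose score is (i - 1) / 10.
histogram_to_values <- function(counts) {
  rep((seq_along(counts) - 1) / 10, times = counts)
}

counts <- c(2, 0, 1, 3)                # 2 samples at 0, 1 at 0.2, 3 at 0.3
values <- histogram_to_values(counts)  # 0.0 0.0 0.2 0.3 0.3 0.3
```

The expanded vector has one entry per sampled score, which is the shape expected by most base R tests and summary functions.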
This function provides a convenient way to try different sets of parameters for the genetic algorithm. It runs every possible combination of mutation_rate, nb_elite and alphas nb_test_desired times. For each set of parameters, the results are saved in a file named after that parameter set. The results of each run can then be gathered in a csv file with the print_csv function, by specifying the names of the files to process and the number of runs performed on each parameter set
hyperparam_test_genetic_algorithm(
  epochs,
  nb_individuals,
  ATCtree,
  observations,
  nb_test_desired,
  mutation_rate,
  nb_elite,
  alphas,
  path = "./",
  num_thread = 1L
)
epochs |
: the number of epochs for the genetic algorithm |
nb_individuals |
: the size of the population in the genetic algorithm |
ATCtree |
: ATC tree with upper bound of the DFS (without the root) |
observations |
: observations of the AE based on the medications of each patient (a data frame containing the medications in the first column and the ADR (boolean) in the second) on which the risk distribution is computed |
nb_test_desired |
: number of genetic algorithm runs for each set of parameters |
mutation_rate |
: a vector with each mutation_rate to be tested |
nb_elite |
: a vector with each nb_elite to be tested |
alphas |
: a vector with each alpha to be tested |
path |
: the path where the resulting files should be written |
num_thread |
: number of threads to run in parallel if OpenMP is available, 1 by default |
No return value. This function writes the results of the genetic algorithm runs in a specific format supported by the functions print_csv and p_value_csv_file. The files are written to path, which is the current directory by default.
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
# different parameter to test for
mutation_rate = c(.1,.2,.3)
nb_elite = c(0,1,2)
alphas = c(0.5,1,2)
hyperparam_test_genetic_algorithm(epochs = 2, nb_individuals = 2,
                                  ATCtree = ATC_Tree_UpperBound_2024,
                                  observations = FAERS_myopathy,
                                  nb_test_desired = 5, mutation_rate = mutation_rate,
                                  nb_elite = nb_elite, alphas = alphas)
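To get a feel for the cost of such a search, the parameter combinations from the example can be enumerated with base R's expand.grid. This is only a counting sketch: it does not reproduce the function's file naming or run order, and the column name `alpha` is chosen here for illustration.

```r
# Every combination of the three hyperparameter vectors from the example.
grid <- expand.grid(
  mutation_rate = c(0.1, 0.2, 0.3),
  nb_elite      = c(0, 1, 2),
  alpha         = c(0.5, 1, 2)
)

nb_test_desired <- 5
n_combinations  <- nrow(grid)                        # 3 * 3 * 3 = 27
n_runs          <- n_combinations * nb_test_desired  # 27 * 5 = 135
```

With these settings the genetic algorithm would be launched 135 times, which is worth keeping in mind when choosing epochs and nb_individuals.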
Function used to convert integer cocktails (like the ones output by the DistributionApproximation function) to string cocktails in order to make them more readable
int_cocktail_to_string_cocktail(cocktails, ATC_name)
cocktails |
cocktails vector to be converted (index in the ATC tree) |
ATC_name |
The ATC_name column of the ATC tree |
The names of the drugs in each integer cocktail of cocktails
data("ATC_Tree_UpperBound_2024")
int_list = list(c(561, 904), c(1902, 4585))
int_cocktail_to_string_cocktail(int_list, ATC_Tree_UpperBound_2024$Name)
Convert the outstanding scores (Outstanding_score) returned by the MCMC algorithm to the ScoreDistribution format
OutsandingScoreToDistribution(outstanding_score, max_score)
outstanding_score |
: Outstanding_score returned by the MCMC algorithm, to be converted to the ScoreDistribution format |
max_score |
: max_score parameter used during the MCMC algorithm |
outstanding_score in a format compatible with the MCMC algorithm output
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
DistributionApproximationResults = DistributionApproximation(epochs = 10,
                                       ATCtree = ATC_Tree_UpperBound_2024,
                                       observations = FAERS_myopathy)
# max_score must match the value used by DistributionApproximation (500 by default)
OutsandingScoreToDistribution(DistributionApproximationResults$Outstanding_score, max_score = 500)
Used to add the p_value to each cocktail of cocktail list
p_value_cocktails(
  distribution_outputs,
  cocktails,
  ATCtree,
  observations,
  num_thread = 1L,
  filtred_distribution = FALSE
)
distribution_outputs |
A list of distributions for cocktails of different sizes, so that p-values can be computed for multiple cocktail sizes |
cocktails |
A list of cocktails, each in the form of a vector of integers (ATC indices) |
ATCtree |
ATC tree with upper bound of the DFS (without the root) |
observations |
Observations of the AE based on the medications of each patient (a data frame containing the medications in the first column and the ADR (boolean) in the second) on which the risk distribution is computed |
num_thread |
Number of threads to run in parallel if OpenMP is available, 1 by default |
filtred_distribution |
Whether the p-values should be computed using the filtered distribution rather than the unfiltered one (default = FALSE, i.e. the unfiltered distribution) |
A real-valued vector containing the p-values of the input cocktails, computed on the distribution_outputs list.
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
DistributionApproximationResults_size2 = DistributionApproximation(epochs = 10,
                                       ATCtree = ATC_Tree_UpperBound_2024,
                                       observations = FAERS_myopathy, Smax = 2)
DistributionApproximationResults_size3 = DistributionApproximation(epochs = 10,
                                       ATCtree = ATC_Tree_UpperBound_2024,
                                       observations = FAERS_myopathy, Smax = 3)
score_distribution_list = list(DistributionApproximationResults_size2,
                               DistributionApproximationResults_size3)
cocktails = list(c(561, 904), c(1902, 4585))
p_value_cocktails(score_distribution_list, cocktails, ATC_Tree_UpperBound_2024, FAERS_myopathy)
Used to add the p_value to each cocktail of a csv_file that is an output of the genetic algorithm
p_value_csv_file(
  distribution_outputs,
  filename,
  filtred_distribution = FALSE,
  sep = ";"
)
distribution_outputs |
A list of distributions for cocktails of different sizes, so that p-values can be computed for multiple cocktail sizes |
filename |
The file name of the .csv file containing the output |
filtred_distribution |
Whether the p-values should be computed using the filtered distribution rather than the unfiltered one (default = FALSE, i.e. the unfiltered distribution) |
sep |
The separator used in the csv file (';' by default) |
A real-valued vector containing the p-values of the cocktails in the input csv file filename, computed on the distribution_outputs list.
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
DistributionApproximationResults_size2 = DistributionApproximation(epochs = 10,
                                       ATCtree = ATC_Tree_UpperBound_2024,
                                       observations = FAERS_myopathy, Smax = 2)
DistributionApproximationResults_size3 = DistributionApproximation(epochs = 10,
                                       ATCtree = ATC_Tree_UpperBound_2024,
                                       observations = FAERS_myopathy, Smax = 3)
score_distribution_list = list(DistributionApproximationResults_size2,
                               DistributionApproximationResults_size3)
p_value_csv_file(score_distribution_list, "path/to/output.csv")
Used to add the p_value to each cocktail of an output of the genetic algorithm
p_value_genetic_results(
  distribution_outputs,
  genetic_results,
  filtred_distribution = FALSE
)
distribution_outputs |
A list of distributions for cocktails of different sizes, so that p-values can be computed for multiple cocktail sizes |
genetic_results |
Outputs of the genetic algorithm |
filtred_distribution |
Whether the p-values should be computed using the filtered distribution rather than the unfiltered one (default = FALSE, i.e. the unfiltered distribution) |
A real-valued vector containing the p-values of the input genetic algorithm results (genetic_results), computed on the distribution_outputs list.
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
DistributionApproximationResults_size2 = DistributionApproximation(epochs = 10,
                                       ATCtree = ATC_Tree_UpperBound_2024,
                                       observations = FAERS_myopathy, Smax = 2)
DistributionApproximationResults_size3 = DistributionApproximation(epochs = 10,
                                       ATCtree = ATC_Tree_UpperBound_2024,
                                       observations = FAERS_myopathy, Smax = 3)
score_distribution_list = list(DistributionApproximationResults_size2,
                               DistributionApproximationResults_size3)
genetic_results = GeneticAlgorithm(epochs = 10, nbIndividuals = 20,
                                   ATCtree = ATC_Tree_UpperBound_2024,
                                   observations = FAERS_myopathy)
p_value_genetic_results(score_distribution_list, genetic_results)
Calculate p-value of sampled value
p_value_on_sampled(
  empirical_distribution,
  sampled_values,
  isFiltered = FALSE,
  includeZeroValue = FALSE
)
empirical_distribution |
A numeric vector of values representing the empirical distribution (return value of the DistributionApproximation function) |
sampled_values |
A scalar or vector of real-valued numbers representing the sampled values (scores to be tested) |
isFiltered |
A boolean indicating whether to use the filtered distribution or the distribution as is (FALSE by default) |
includeZeroValue |
A boolean indicating whether to take the null (zero) score into account (FALSE by default) |
A numeric value representing the empirical p-value
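As a rough illustration of what an empirical p-value means here, the sketch below computes an upper-tail proportion from a histogram of counts. It is not the package's implementation. The helper name `empirical_p_value` is hypothetical. The binning (cell i holds the count for score (i - 1) / 10) is taken from the ScoreDistribution description elsewhere in this manual. The upper-tail definition and the handling of the zero score are assumptions.

```r
# Hypothetical helper, not part of the package: empirical upper-tail p-value
# from a histogram of counts where cell i holds the count for score (i - 1) / 10.
empirical_p_value <- function(counts, sampled_values, include_zero = FALSE) {
  scores <- (seq_along(counts) - 1) / 10
  if (!include_zero) {
    # Assumption: excluding the null score means dropping the first cell
    keep <- scores > 0
    counts <- counts[keep]
    scores <- scores[keep]
  }
  total <- sum(counts)
  # Proportion of the distribution at or above each tested score
  vapply(sampled_values, function(v) sum(counts[scores >= v]) / total, numeric(1))
}

toy_counts <- c(50, 30, 15, 4, 1)   # made-up counts for scores 0.0, 0.1, 0.2, 0.3, 0.4
p <- empirical_p_value(toy_counts, c(0.2, 0.4))
```

With the zero cell excluded, 20 of the 50 remaining counts sit at or above 0.2, so the first p-value is 0.4.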
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
cocktails = list(c(561, 904), c(1902, 4585))
estimated_score_distribution = DistributionApproximation(epochs = 10, ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy)
Hypergeom_of_cocktails = compute_hypergeom_on_list(cocktails = cocktails, ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy)
p_value = p_value_on_sampled(empirical_distribution = estimated_score_distribution, sampled_values = Hypergeom_of_cocktails)
Plot the evolution of the mean and the best value of the population used by the GeneticAlgorithm
plot_evolution( list, mean_color = "#F2A900", best_color = "#008080", xlab = "Epochs", ylab = "Score" )
list |
A list with 2 elements returned by the GeneticAlgorithm: "mean" and "best", containing the numeric vectors representing the mean and best fitness of the population |
mean_color |
A string specifying the color of the mean values |
best_color |
A string specifying the color of the best values |
xlab |
A string specifying the label for the x-axis |
ylab |
A string specifying the label for the y-axis |
No return value; plots the evolution of the genetic algorithm results (mean and best score for each epoch).
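To make the expected input shape concrete, the sketch below draws a comparable figure with base R graphics from a toy list with "mean" and "best" elements. The values are made up and the plotting calls are a stand-in. How plot_evolution itself draws the figure is not shown in this manual.

```r
# Toy stand-in for the list returned by GeneticAlgorithm (values are made up).
toy_results <- list(mean = c(1.0, 1.4, 1.9, 2.1), best = c(2.0, 2.5, 2.5, 3.0))
epochs <- seq_along(toy_results$mean)

# Best score per epoch, then the population mean on the same axes
plot(epochs, toy_results$best, type = "l", col = "#008080",
     ylim = range(unlist(toy_results)), xlab = "Epochs", ylab = "Score")
lines(epochs, toy_results$mean, col = "#F2A900")
legend("bottomright", legend = c("best", "mean"),
       col = c("#008080", "#F2A900"), lty = 1)
```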
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
results = GeneticAlgorithm(epochs = 10, nbIndividuals = 10, ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy)
plot_evolution(list = results)
Plot the histogram of the approximation of the RR distribution
plot_frequency( estimated, sqrt = FALSE, binwidth = 0.1, hist_color = "#69b3a2", density_color = "#FF5733", xlab = "Score" )
estimated |
The ScoreDistribution element in the list returned by the DistributionApproximation function |
sqrt |
A Boolean specifying whether to normalize the estimated distribution; recommended for large random walks. |
binwidth |
The width of the histogram bins |
hist_color |
The fill color for the histogram bars |
density_color |
The color for the density curve |
xlab |
Label of X axis |
No return value; plots the histogram of the estimated distribution (estimated).
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
estimation = DistributionApproximation(epochs = 10, ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy)
plot_frequency(estimated = estimation$ScoreDistribution)
Print every cocktail found by the genetic algorithm when used with the hyperparam_test_genetic_algorithm function. This condenses the solutions found in each file by collapsing similar cocktails into a single row per cocktail.
print_csv( input_filenames, observations, repetition, ATCtree, csv_filename = "solutions.csv" )
input_filenames |
: A list containing the filenames of the hyperparam_test_genetic_algorithm output files |
observations |
: observations of the AE based on the medications of each patient (a DataFrame containing the medication in the first column and the ADR (boolean) in the second) on which we want to compute the risk distribution |
repetition |
: The parameter nb_test_desired used in the hyperparam test function |
ATCtree |
: ATC tree with upper bound of the DFS (without the root) |
csv_filename |
: Name of the output file, "solutions.csv" by default |
No return value; processes the genetic algorithm output in the files produced by hyperparam_test_genetic_algorithm and writes a summary csv file. The csv file is written to the current directory and named after the csv_filename argument (solutions.csv by default).
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
files = c('250e_700ind_0.2mr_0ne_2alpha.txt') # results of hyperparam_test_genetic_algorithm
print_csv(input_filenames = files, observations = FAERS_myopathy, repetition = 5, ATCtree = ATC_Tree_UpperBound_2024)
Make a quantile-quantile diagram from the output of the MCMC algorithm (DistributionApproximation) and the algorithm that exhaustively calculates the distribution
qq_plot_output(estimated, true, filtered = FALSE, color = "steelblue")
estimated |
Output object of the DistributionApproximation function |
true |
Output object of either the DistributionApproximation function or a true distribution computation function |
filtered |
Whether to use the classic distribution estimate or the filtered one (number of patients taking the cocktail > beta) |
color |
The color of the dashed line of the qq-plot |
No return value; plots the quantile-quantile plot of the estimated distribution (estimated) against the true distribution (true).
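The idea behind a quantile-quantile comparison of two binned distributions can be sketched as follows. This is an illustration under assumptions, not the package's code. The helper `binned_quantiles` is hypothetical, the counts are made up, and the binning (cell i holds the count for score (i - 1) / 10) follows the ScoreDistribution description in this manual.

```r
# Hypothetical helper: quantiles of a binned distribution where
# cell i counts scores equal to (i - 1) / 10.
binned_quantiles <- function(counts, probs) {
  scores <- (seq_along(counts) - 1) / 10
  cdf <- cumsum(counts) / sum(counts)
  # Smallest score whose cumulative probability reaches each requested level
  vapply(probs, function(p) scores[which(cdf >= p)[1]], numeric(1))
}

probs <- c(0.25, 0.5, 0.75)
estimated_counts <- c(40, 30, 20, 10)   # made-up estimated histogram
true_counts <- c(45, 30, 15, 10)        # made-up exhaustive histogram
q_est <- binned_quantiles(estimated_counts, probs)
q_true <- binned_quantiles(true_counts, probs)

# Points close to the dashed identity line indicate matching distributions
plot(q_true, q_est, xlab = "True quantiles", ylab = "Estimated quantiles")
abline(0, 1, lty = 2, col = "steelblue")
```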
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
estimated_score_distribution = DistributionApproximation(epochs = 10, ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy[1:100,], Smax = 2)
true_score_distribution = trueDistributionSizeTwoCocktail(ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy[1:100,], beta = 4)
qq_plot_output(estimated = estimated_score_distribution, true = true_score_distribution)
Convert a string vector of drugs in the form "drug1:drug2" to a vector of ATC tree indices, e.g. c(ATC_index(drug1), ATC_index(drug2))
string_list_to_int_cocktails(ATC_name, lines)
ATC_name |
the ATC_name column of the ATC tree |
lines |
A string vector of drug cocktails in the form "drug1:drug2:...:drug_n" |
An R List that can be used by other algorithms (e.g. clustering algorithm)
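The conversion described above can be approximated in base R by splitting each string on ":" and looking the names up in the name column. The sketch below uses a toy name vector and a hypothetical helper `names_to_indices`. It returns R's 1-based positions, which may differ from the indexing convention the package uses for ATC tree nodes.

```r
# Toy name column standing in for ATC_Tree_UpperBound_2024$Name (made-up subset)
toy_names <- c("alimentary tract", "metformin", "nervous system", "prasugrel")

# Hypothetical stand-in for string_list_to_int_cocktails
names_to_indices <- function(atc_names, lines) {
  # Split "drug1:drug2:..." and look up each name's position (NA if absent)
  lapply(strsplit(lines, ":", fixed = TRUE), function(drugs) match(drugs, atc_names))
}

cocktails <- names_to_indices(toy_names, c("metformin:prasugrel", "nervous system"))
```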
data("ATC_Tree_UpperBound_2024")
string_list = c('hmg coa reductase inhibitors:nervous system', 'metformin:prasugrel')
string_list_to_int_cocktails(ATC_Tree_UpperBound_2024$Name, string_list)
The true distribution of the score among all single nodes of the ATC tree
trueDistributionDrugs( ATCtree, observations, beta, max_score = 1000L, nbResults = 100L, num_thread = 1L )
ATCtree |
: ATC tree with upper bound of the DFS (without the root) |
observations |
: observations of the AE based on the medications of each patient (a DataFrame containing the medication in the first column and the ADR (boolean) in the second) on which we want to compute the risk distribution |
beta |
: minimum number of patients taking a cocktail for it to be included in the beta score distribution |
max_score |
: maximum value the score can take. Scores greater than this are added to the distribution as the value max_score. Default is 1000 |
nbResults |
: Number of returned solutions (cocktails with the best observed score during the run), 100 by default |
num_thread |
: Number of threads to run in parallel if OpenMP is available, 1 by default |
Return a List containing:
- ScoreDistribution : the distribution of the score as an array, each cell holding the number of scores equal to (index-1)/10.
- Filtered_score_distribution : distribution containing the scores of cocktails taken by at least beta patients.
- Outstanding_score : an array of the scores greater than max_score.
- Best_cocktails : the nbResults best cocktails encountered during the run.
- Best_cocktails_beta : the nbResults best cocktails taken by at least beta patients encountered during the run.
- Best_scores : scores corresponding to the Best_cocktails.
- Best_scores_beta : scores corresponding to the Best_cocktails_beta.
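The indexing convention of ScoreDistribution and the role of max_score can be sketched on made-up scores as follows. The helper `bin_scores` is hypothetical and only mirrors the description above. Rounding scores to one decimal is an assumption about how scores are assigned to cells.

```r
# Hypothetical sketch of the binning convention: cell i counts scores
# equal to (i - 1) / 10, and scores above max_score are counted at max_score.
bin_scores <- function(scores, max_score) {
  outstanding <- scores[scores > max_score]      # kept separately
  capped <- pmin(scores, max_score)
  idx <- round(capped * 10) + 1                  # assumption: round to one decimal
  counts <- tabulate(idx, nbins = max_score * 10 + 1)
  list(ScoreDistribution = counts, Outstanding_score = outstanding)
}

toy <- bin_scores(c(0, 0.1, 0.1, 0.3, 2.5), max_score = 1)
```

Here the score 2.5 exceeds max_score, so it is counted in the last cell and also reported in Outstanding_score.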
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
size_1_score_distribution = trueDistributionDrugs(ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy[1:100,], beta = 4)
The true distribution of the score among all size-two cocktails
trueDistributionSizeTwoCocktail( ATCtree, observations, beta, max_score = 100L, nbResults = 100L, num_thread = 1L )
ATCtree |
: ATC tree with upper bound of the DFS (without the root) |
observations |
: observations of the AE based on the medications of each patient (a DataFrame containing the medication in the first column and the ADR (boolean) in the second) on which we want to compute the risk distribution |
beta |
: minimum number of patients taking a cocktail for it to be included in the beta score distribution |
max_score |
: maximum value the score can take. Scores greater than this are added to the distribution as the value max_score. Default is 100 |
nbResults |
: Number of returned solutions (cocktails with the best observed score during the run), 100 by default |
num_thread |
: Number of threads to run in parallel if OpenMP is available, 1 by default |
Return a List containing:
- ScoreDistribution : the distribution of the score as an array, each cell holding the number of scores equal to (index-1)/10.
- Filtered_score_distribution : distribution containing the scores of cocktails taken by at least beta patients.
- Outstanding_score : an array of the scores greater than max_score.
- Best_cocktails : the nbResults best cocktails encountered during the run.
- Best_cocktails_beta : the nbResults best cocktails taken by at least beta patients encountered during the run.
- Best_scores : scores corresponding to the Best_cocktails.
- Best_scores_beta : scores corresponding to the Best_cocktails_beta.
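The role of beta in the filtered distribution can be illustrated by enumerating every pair of drugs on toy data and keeping only the pairs taken by at least beta patients. This is a simplified sketch: the patient data are made up, the ATC hierarchy is ignored (each drug is a flat index), and no score is computed.

```r
# Toy observations: each patient is a vector of drug indices (made-up data).
patients <- list(c(1, 2), c(1, 2, 3), c(2, 3), c(1, 2), c(3))
drugs <- sort(unique(unlist(patients)))

# Every size-two cocktail, one per row
pairs <- t(combn(drugs, 2))

# Number of patients taking both drugs of each pair
n_taking <- apply(pairs, 1, function(pair) {
  sum(vapply(patients, function(p) all(pair %in% p), logical(1)))
})

beta <- 2
# Pairs taken by at least beta patients would feed the filtered distribution
filtered_pairs <- pairs[n_taking >= beta, , drop = FALSE]
```

Exhaustive enumeration grows quadratically with the number of nodes, which is why the size-two examples in this manual restrict the observations to the first 100 rows.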
data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")
size_2_score_distribution = trueDistributionSizeTwoCocktail(ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy[1:100,], beta = 4)