| Title: | Digital Epidemiological Analysis and Visualization Tools |
|---|---|
| Description: | Integrates methods for epidemiological analysis, modeling, and visualization, including functions for summary statistics, SIR (Susceptible-Infectious-Recovered) modeling, DALY (Disability-Adjusted Life Years) estimation, age standardization, diagnostic test evaluation, NLP (Natural Language Processing) keyword extraction, clinical trial power analysis, survival analysis, SNP (Single Nucleotide Polymorphism) association, and machine learning methods such as logistic regression, k-means clustering, Random Forest, and Support Vector Machine (SVM). Includes datasets for prevalence estimation, SIR modeling, genomic analysis, clinical trials, DALY, diagnostic tests, and survival analysis. Methods are based on Gelman et al. (2013) <doi:10.1201/b16018> and Wickham et al. (2019, ISBN:9781492052040). |
| Authors: | Esther Atsabina Wanjala [aut, cre] |
| Maintainer: | Esther Atsabina Wanjala <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.2 |
| Built: | 2026-06-03 09:10:28 UTC |
| Source: | https://github.com/cran/EpidigiR |
A dataset containing simulated clinical trial data for analyzing treatment outcomes, suitable for power calculations, logistic regression, Random Forest, and SVM.
clinical_data
A data frame with 200 rows and 6 columns:
Character, unique identifier for each trial participant.
Character, treatment arm (e.g., Treatment, Control).
Numeric, binary outcome (0 = no response, 1 = response).
Numeric, patient age (years).
Numeric, baseline health score (0 to 100).
Numeric, treatment dose level (e.g., 0 for control, 1 for low dose, 2 for high dose).
Simulated data for demonstration purposes.
data("clinical_data")
clinical_data$outcome <- as.factor(clinical_data$outcome)
epi_model(clinical_data, formula = outcome ~ age + health_score + dose, type = "logistic")
epi_model(clinical_data, formula = outcome ~ age + health_score + dose, type = "rf")
epi_visualize(clinical_data, x = "arm", y = "outcome", type = "boxplot")
A dataset containing simulated data for calculating Disability-Adjusted Life Years (DALY) in epidemiological studies.
daly_data
A data frame with 20 rows and 3 columns:
Character, population group (e.g., region or age group).
Numeric, years of life lost due to premature mortality.
Numeric, years lived with disability.
Simulated data for demonstration purposes.
data("daly_data")
epi_analyze(daly_data, outcome = NULL, type = "daly")
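As a rough illustration of the quantity involved, DALY is conventionally the sum of years of life lost (YLL) and years lived with disability (YLD). The sketch below computes it in base R on a small hypothetical data frame. The column names `group`, `yll`, and `yld` and all values are assumptions made for this example, and the code is not taken from the package's internals.

```r
# Hypothetical toy data; column names and values are assumptions for this sketch.
toy_daly <- data.frame(
  group = c("North", "South", "East"),
  yll   = c(120.5, 98.0, 143.2),  # years of life lost
  yld   = c(45.3, 60.1, 38.7)     # years lived with disability
)

# DALY per group is the sum of YLL and YLD.
toy_daly$daly <- toy_daly$yll + toy_daly$yld

# Overall burden across all groups.
total_daly <- sum(toy_daly$daly)

print(toy_daly)
print(total_daly)
```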
A dataset containing simulated data for evaluating diagnostic tests in epidemiological studies.
diagnostic_data
A data frame with 10 rows and 5 columns:
Character, unique identifier for each test.
Numeric, number of true positive results.
Numeric, number of false positive results.
Numeric, number of true negative results.
Numeric, number of false negative results.
Simulated data for demonstration purposes.
data("diagnostic_data")
epi_analyze(diagnostic_data, outcome = NULL, type = "diagnostic")
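The four counts described above are enough to derive the usual diagnostic accuracy measures. The following base R sketch shows the standard formulas for sensitivity, specificity, and predictive values. The helper `diag_metrics()`, the column names (`test_id`, `tp`, `fp`, `tn`, `fn`), and the counts are all hypothetical, and the output of `epi_analyze(type = "diagnostic")` may be organized differently.

```r
# Hypothetical counts; column names are assumptions for this sketch.
toy_tests <- data.frame(
  test_id = c("T1", "T2"),
  tp = c(90, 70),    # true positives
  fp = c(10, 5),     # false positives
  tn = c(180, 190),  # true negatives
  fn = c(20, 35)     # false negatives
)

# Standard diagnostic accuracy measures computed from a 2x2 table.
diag_metrics <- function(tp, fp, tn, fn) {
  data.frame(
    sensitivity = tp / (tp + fn),  # proportion of diseased correctly detected
    specificity = tn / (tn + fp),  # proportion of non-diseased correctly ruled out
    ppv         = tp / (tp + fp),  # positive predictive value
    npv         = tn / (tn + fn)   # negative predictive value
  )
}

metrics <- cbind(
  toy_tests["test_id"],
  with(toy_tests, diag_metrics(tp, fp, tn, fn))
)
print(metrics)
```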
Performs summary statistics, SIR modeling, DALY calculation, age standardization, diagnostic test evaluation, or NLP keyword extraction.
epi_analyze(data, outcome, population, group = NULL, type = c("summary", "sir", "daly", "age_standardize", "diagnostic", "nlp"), ...)
| Argument | Description |
|---|---|
| data | Input data frame with relevant columns (e.g., cases, population, yll, yld, text). |
| outcome | Outcome column name (character, e.g., "cases"). |
| population | Population column name (character, e.g., "population", required for summary). |
| group | Grouping column name (character, e.g., "region", optional). |
| type | Analysis type: "summary", "sir", "daly", "age_standardize", "diagnostic", "nlp". |
| ... | Additional parameters (e.g., N, beta, gamma for SIR). |
A data frame with analysis results.
Performs clinical trial power calculation, survival analysis, SNP association, logistic regression, k-means clustering, Random Forest, or SVM.
epi_model(data, formula = NULL, type = c("power", "survival", "snp", "logistic", "kmeans", "rf", "svmRadial"), ...)
| Argument | Description |
|---|---|
| data | Input data frame with relevant columns (e.g., outcome, genotypes). |
| formula | Model formula (optional, for survival/logistic/rf/svmRadial, e.g., "outcome ~ x"). |
| type | Model type: "power", "survival", "snp", "logistic", "kmeans", "rf", "svmRadial". |
| ... | Additional parameters (e.g., n, effect_size for power; k for kmeans). |
A data frame or list with model results.
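This page does not spell out how the "power" type is computed. To give a feel for what a clinical trial power calculation involves, the sketch below uses `power.prop.test()` from base R's stats package for a two-arm comparison of response proportions. This is an illustrative stand-in, not necessarily the method `epi_model()` uses, and the response proportions and sample size are hypothetical.

```r
# Assumed response proportions for a hypothetical two-arm trial.
p_control   <- 0.3
p_treatment <- 0.5

# Power achieved with 100 participants per arm at the default 5% significance level.
power_at_100 <- power.prop.test(n = 100, p1 = p_control, p2 = p_treatment)

# Sample size per arm needed to reach 80% power for the same effect.
n_for_80 <- power.prop.test(p1 = p_control, p2 = p_treatment, power = 0.8)

print(power_at_100$power)
print(ceiling(n_for_80$n))
```

Leaving exactly one of `n`, `p1`, `p2`, `power`, or `sig.level` unspecified tells `power.prop.test()` which quantity to solve for.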
A dataset containing disease prevalence data across different regions and age groups, including spatial coordinates.
epi_prevalence
A data frame with 12 rows and 7 columns:
Character, region name (e.g., North, South, East, West).
Character, age group (e.g., 0-19, 20-59, 60+).
Numeric, number of disease cases.
Numeric, population size in the region and age group.
Numeric, prevalence percentage (cases / population * 100).
Numeric, latitude for spatial mapping.
Numeric, longitude for spatial mapping.
Simulated data for demonstration purposes.
data("epi_prevalence")
library(sp)
coordinates(epi_prevalence) <- ~lon+lat
epi_visualize(epi_prevalence, x = "prevalence", type = "map")
epi_analyze(epi_prevalence, outcome = "cases", population = "population", type = "summary")
if (interactive()) {
  epi_prevalence$region_id <- as.numeric(factor(epi_prevalence$region))
  epi_visualize(epi_prevalence, x = "region_id", y = "prevalence", type = "scatter")
  with(epi_prevalence, axis(1, at = unique(region_id), labels = levels(factor(region))))
}
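The prevalence column is defined above as cases / population * 100. The base R sketch below applies that formula to a small hypothetical data frame and then pools strata by region. The data values are invented for illustration, and the pooling step is an assumption about what a grouped summary might look like rather than a description of what `epi_analyze(type = "summary")` returns.

```r
# Hypothetical toy data mirroring the region / age group / cases / population layout.
toy_prev <- data.frame(
  region     = c("North", "North", "South", "South"),
  age_group  = c("0-19", "20-59", "0-19", "20-59"),
  cases      = c(10, 40, 5, 45),
  population = c(1000, 2000, 500, 1500)
)

# Stratum-level prevalence as a percentage.
toy_prev$prevalence <- toy_prev$cases / toy_prev$population * 100

# Pool cases and population by region before dividing, so that
# larger strata contribute proportionally more to the regional figure.
by_region <- aggregate(cbind(cases, population) ~ region, data = toy_prev, FUN = sum)
by_region$prevalence <- by_region$cases / by_region$population * 100

print(toy_prev)
print(by_region)
```

Summing counts before dividing avoids the bias that comes from averaging stratum percentages with unequal population sizes.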
Creates visualizations for prevalence mapping, epidemic curves, or general plots (scatter, boxplot).
epi_visualize(data, x, y = NULL, type = c("map", "curve", "scatter", "boxplot"), ...)
| Argument | Description |
|---|---|
| data | Input data frame or SpatialPolygonsDataFrame with relevant columns. |
| x | X-axis column name (character, e.g., "region"). |
| y | Y-axis column name (character, e.g., "prevalence", optional). |
| type | Plot type: "map", "curve", "scatter", "boxplot". |
| ... | Additional plotting parameters (e.g., main, xlab). |
A plot (spplot for maps, base R for others).
A dataset containing simulated genotypes and case-control status for SNP association analysis.
geno_data
A data frame with 100 rows and 2 columns:
Numeric, genotype (0 = AA, 1 = Aa, 2 = aa).
Numeric, case (1) or control (0) status.
Simulated data for demonstration purposes.
data("geno_data")
epi_model(geno_data, type = "snp")
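The test behind the "snp" type is not stated here. Two common choices for a genotype by case-control comparison are a chi-squared test on the contingency table and an additive logistic regression, both available in base R. The sketch below shows each on hypothetical counts. The column names `genotype` and `status` and the counts are assumptions, and the package may use a different test.

```r
# Hypothetical genotype counts: 50 controls followed by 50 cases.
toy_geno <- data.frame(
  genotype = c(rep(0:2, times = c(30, 15, 5)),    # controls: AA, Aa, aa
               rep(0:2, times = c(18, 20, 12))),  # cases:    AA, Aa, aa
  status   = rep(c(0, 1), each = 50)
)

# Genotypic test: chi-squared test on the 3 x 2 genotype-by-status table.
tab <- table(genotype = toy_geno$genotype, status = toy_geno$status)
chi <- chisq.test(tab)

# Additive model: log-odds of being a case per additional copy of the minor allele.
fit <- glm(status ~ genotype, data = toy_geno, family = binomial)
odds_ratio <- exp(coef(fit)[["genotype"]])

print(tab)
print(chi$p.value)
print(odds_ratio)
```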
A dataset containing simulated patient data for predicting disease risk, suitable for logistic regression, clustering, Random Forest, and SVM.
ml_data
A data frame with 100 rows and 5 columns:
Numeric, binary disease status (0 = healthy, 1 = diseased).
Numeric, patient age (years).
Numeric, exposure level (0 to 1, e.g., environmental risk).
Numeric, genetic risk score (0 to 1).
Character, region name (e.g., North, South, East, West).
Simulated data for demonstration purposes.
data("ml_data")
ml_data$outcome <- as.factor(ml_data$outcome)
epi_model(ml_data, formula = outcome ~ age + exposure + genetic_risk, type = "logistic")
epi_model(ml_data, formula = outcome ~ age + exposure + genetic_risk, type = "rf")
epi_visualize(ml_data, x = "age", y = "outcome", type = "scatter")
A dataset containing simulated epidemiological text data, such as outbreak reports or health alerts, for NLP analysis.
nlp_data
A data frame with 100 rows and 2 columns:
Character, unique identifier for each text entry.
Character, text content (e.g., outbreak descriptions, health reports).
Simulated data for demonstration purposes.
data("nlp_data")
epi_analyze(nlp_data, outcome = NULL, type = "nlp", n = 5)
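Keyword extraction can be approximated by simple term-frequency counting. The base R sketch below tokenizes a few hypothetical outbreak reports, drops a small hand-picked stop-word list, and ranks the remaining terms by frequency. The texts, the stop-word list, and the interpretation of `n` as the number of top keywords to return are all assumptions. The package's own tokenization and ranking may differ.

```r
# Hypothetical text entries resembling outbreak reports.
texts <- c(
  "Cholera outbreak reported in the northern region",
  "Health alert: cholera cases rising after flooding",
  "Flooding increases risk of cholera outbreak"
)

# Minimal stop-word list, chosen by hand for this example only.
stop_words <- c("in", "the", "of", "after")

# Lowercase, split on anything that is not a letter, and drop empty or stop-word tokens.
tokens <- unlist(strsplit(tolower(texts), "[^a-z]+"))
tokens <- tokens[nchar(tokens) > 0 & !(tokens %in% stop_words)]

# Rank terms by how often they appear and keep the top n.
n <- 3
freq <- sort(table(tokens), decreasing = TRUE)
top_keywords <- head(freq, n)

print(top_keywords)
```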
A dataset containing simulated SIR model outputs for a population of 1000.
sir_data
A data frame with 50 rows and 4 columns:
Numeric, time point (1 to 50 days).
Numeric, number of susceptible individuals.
Numeric, number of infected individuals.
Numeric, number of recovered individuals.
Generated using epi_analyze(type = "sir", N = 1000, beta = 0.3, gamma = 0.1, days = 50).
data("sir_data")
epi_visualize(sir_data, x = "time", y = "Infected", type = "curve")
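For readers unfamiliar with the SIR model, the sketch below simulates it in base R with the same parameter values quoted above (N = 1000, beta = 0.3, gamma = 0.1, 50 days). It uses a simple one-day forward Euler step and starts from a single infected individual. Both choices are assumptions, as are the helper name `simulate_sir()` and the `Susceptible` and `Recovered` column names. The integration scheme and initial conditions used by `epi_analyze()` are not documented here, so the values in `sir_data` may not match this output.

```r
# Minimal discrete-time SIR simulation (forward Euler, one-day steps).
# I0, the initial number of infected individuals, is an assumption.
simulate_sir <- function(N = 1000, beta = 0.3, gamma = 0.1, days = 50, I0 = 1) {
  S <- numeric(days)
  I <- numeric(days)
  R <- numeric(days)
  S[1] <- N - I0
  I[1] <- I0

  for (t in seq_len(days - 1)) {
    new_infections <- beta * S[t] * I[t] / N  # mass-action transmission
    new_recoveries <- gamma * I[t]            # constant recovery rate
    S[t + 1] <- S[t] - new_infections
    I[t + 1] <- I[t] + new_infections - new_recoveries
    R[t + 1] <- R[t] + new_recoveries
  }

  data.frame(time = seq_len(days), Susceptible = S, Infected = I, Recovered = R)
}

sim <- simulate_sir()
head(sim)
```

With these parameters the basic reproduction number is beta / gamma = 3, so the infected compartment grows at first and declines once enough of the population has left the susceptible pool.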
A dataset containing simulated survey data for age standardization in epidemiological studies.
survey_data
A data frame with 20 rows and 3 columns:
Character, age group (e.g., 0-19, 20-39, 40-59, 60+).
Numeric, disease rates (e.g., cases per 1000).
Numeric, population weights for standardization.
Simulated data for demonstration purposes.
data("survey_data")
epi_analyze(survey_data, outcome = NULL, type = "age_standardize")
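Direct age standardization is a weighted average of age-specific rates, using standard-population weights. The base R sketch below shows that calculation on hypothetical values. The column names `age_group`, `rate`, and `weight` and the numbers are assumptions, and whether `epi_analyze(type = "age_standardize")` uses the direct method is not stated on this page.

```r
# Hypothetical age-specific rates (cases per 1000) and standard-population weights.
toy_survey <- data.frame(
  age_group = c("0-19", "20-39", "40-59", "60+"),
  rate      = c(2, 5, 12, 30),
  weight    = c(0.30, 0.30, 0.25, 0.15)
)

# Directly standardized rate: weighted mean of the age-specific rates.
# Dividing by the weight total keeps the result valid even if weights do not sum to 1.
standardized_rate <- sum(toy_survey$rate * toy_survey$weight) / sum(toy_survey$weight)

print(standardized_rate)
```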
A dataset containing simulated data for survival analysis in epidemiological studies.
survival_data
A data frame with 100 rows and 3 columns:
Character, unique identifier for each individual.
Numeric, time to event (e.g., years until death or censoring).
Numeric, event status (0 = censored, 1 = event occurred).
Simulated data for demonstration purposes.
data("survival_data")
epi_model(survival_data, type = "survival")
epi_visualize(survival_data, x = "time", y = "status", type = "scatter")
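To show how time-to-event data with a censoring indicator are typically summarized, the sketch below computes a Kaplan-Meier survival estimate by hand in base R. The helper `km_estimate()` and the toy values are hypothetical. The estimator behind `epi_model(type = "survival")` is not specified here, so this is an illustration of the general technique rather than of the package's output.

```r
# Hypothetical follow-up data: status 1 = event occurred, 0 = censored.
toy_surv <- data.frame(
  time   = c(2, 3, 3, 5, 7, 8, 8, 10),
  status = c(1, 1, 0, 1, 0, 1, 1, 0)
)

# Kaplan-Meier product-limit estimator.
km_estimate <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  # Number still under observation just before each event time.
  at_risk <- vapply(event_times, function(t) sum(time >= t), numeric(1))
  # Number of events occurring exactly at each event time.
  events <- vapply(event_times, function(t) sum(time == t & status == 1), numeric(1))
  data.frame(
    time     = event_times,
    at_risk  = at_risk,
    events   = events,
    survival = cumprod(1 - events / at_risk)
  )
}

km <- km_estimate(toy_surv$time, toy_surv$status)
print(km)
```

Censored individuals contribute to the at-risk count until their censoring time but never count as events, which is why the estimate only steps down at observed event times.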