Package: UKBAnalytica 1.0.0

Nan He

UKBAnalytica: UK Biobank Data Processing and Survival Analysis Toolkit

Provides an integrated workflow for analysis tables hosted on or generated by the UK Biobank Research Analysis Platform (RAP). The package supports RAP phenotype extraction planning, predefined variable sets and disease definitions, standardized baseline preprocessing, multi-source endpoint ascertainment, prevalent and incident case classification, survival-ready cohort construction, regression, multiple imputation, propensity score analysis, mediation analysis, subgroup and sensitivity analyses, machine learning, proteomics enrichment and protein-protein interaction analysis, and publication-oriented visualization. The package workflow is described in He et al. (2026) <doi:10.64898/2026.06.19.26356057>.
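
The prevalent/incident split and follow-up time named in the description can be illustrated with a small base-R sketch. It does not call any UKBAnalytica function. The column names (`baseline_date`, `first_dx_date`, `death_date`), the toy dates, and the censoring date are all assumptions made up for illustration.

```r
# Toy cohort: one row per participant (all values are invented).
cohort <- data.frame(
  eid           = 1:4,
  baseline_date = as.Date(c("2008-03-01", "2009-06-15", "2010-01-10", "2007-11-20")),
  first_dx_date = as.Date(c("2005-02-01", "2015-09-30", NA, "2012-04-05")),
  death_date    = as.Date(c(NA, NA, "2018-07-01", NA))
)
censor_date <- as.Date("2022-12-31")  # hypothetical administrative censoring date

# Prevalent: first diagnosis on or before baseline. Incident: first diagnosis after baseline.
has_dx <- !is.na(cohort$first_dx_date)
cohort$prevalent <- has_dx & cohort$first_dx_date <= cohort$baseline_date
cohort$incident  <- has_dx & cohort$first_dx_date >  cohort$baseline_date

# Follow-up ends at the earliest of diagnosis, death, or administrative censoring.
# Dates are compared as day counts so that missing dates can be skipped with na.rm.
end_num <- pmin(
  as.numeric(cohort$first_dx_date),
  as.numeric(cohort$death_date),
  as.numeric(censor_date),
  na.rm = TRUE
)
cohort$time_years <- (end_num - as.numeric(cohort$baseline_date)) / 365.25

# The event is counted only when follow-up actually ends at the diagnosis date.
cohort$event <- as.integer(cohort$incident & as.numeric(cohort$first_dx_date) == end_num)

# Survival-ready table: prevalent cases are excluded from the at-risk cohort.
analysis <- cohort[!cohort$prevalent, c("eid", "time_years", "event")]
print(analysis)
```

Excluding prevalent cases before computing events is one common convention; a real analysis may instead keep them as a covariate, which is a separate design decision.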

Authors: Nan He [aut, cre]

UKBAnalytica_1.0.0.tar.gz
UKBAnalytica_1.0.0.tar.gz (r-4.7-any)
UKBAnalytica_1.0.0.tar.gz (r-4.6-any)
UKBAnalytica_1.0.0.tgz (r-4.6-emscripten)

# Install 'UKBAnalytica' in R:
install.packages('UKBAnalytica', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))
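
After installation, a quick check such as the following can confirm that the package loads and show what it exports. This is a minimal sketch that relies only on base R and `utils` functions; it makes no assumption about the arguments of any UKBAnalytica function, and the `^ukb_ml_` filter is just an illustrative pattern.

```r
pkg <- "UKBAnalytica"

if (requireNamespace(pkg, quietly = TRUE)) {
  # List exported names without attaching the package to the search path.
  exports <- sort(getNamespaceExports(pkg))
  cat(pkg, as.character(utils::packageVersion(pkg)),
      "with", length(exports), "exports\n")
  # Show a few of the machine-learning helpers as an example of filtering by prefix.
  print(head(grep("^ukb_ml_", exports, value = TRUE)))
} else {
  message(pkg, " is not installed; run the install.packages() call above first.")
}
```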

Bug tracker: https://github.com/hinna0818/ukbanalytica/issues

1.63 score, 43 scripts, 196 exports, 91 dependencies

Last updated from: 38137fe91b. Checks: 4 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | OK | 291
source / vignettes | OK | 335
linux-release-x86_64 | OK | 284
wasm-release | OK | 252

Exports: assess_balance, build_survival_dataset, calculate_air_pollution, calculate_blood_pressure, calculate_diet_score, calculate_weights, classify_metabolites, combine_disease_definitions, compare_data_sources, compute_protein_ppi_metrics, create_baseline_table, create_disease_definition, create_imputation_list, create_medication_definition, estimate_propensity_score, extract_cases_by_source, extract_diabetes_subtype_baseline, extract_disease_diagnosis, extract_disease_history, extract_disease_history_sensitivity, extract_medications, extract_self_report_medications, fit_mi_models, get_death_dates, get_disease_catalog, get_field_info, get_field_metadata, get_medication_catalog, get_pomegranate_diseases, get_pomegranate_source_manifest, get_predefined_diseases, get_predefined_medications, get_protein_ppi, get_ukb_demo_colnames, get_variable_info, get_variable_set, get_variable_sets, load_pomegranate_portal_coding, load_ukb_medication_coding, load_ukb_metabolite_panel, match_propensity, metabolite_to_metaboanalyst_name, parse_cancer_registry, parse_death_records, parse_icd10_diagnoses, parse_icd9_diagnoses, parse_opcs4_procedures, parse_self_reported_illnesses, plot_balance, plot_calibration, plot_correlation, plot_cox_loghr_correlation, plot_cox_sensitivity_correlation, plot_enrichment_lollipop, plot_forest, plot_go_ora_bar, plot_heatmap, plot_km_curve, plot_mediation, plot_mediation_forest, plot_metabolite_ora_barplot, plot_metabolite_ora_dotplot, plot_mi_diagnostics, plot_mi_pooled, plot_ml_calibration, plot_ml_compare, plot_ml_confusion, plot_ml_dca, plot_ml_gain, plot_ml_importance, plot_ml_ks, plot_ml_lift, plot_ml_pr, plot_ml_roc, plot_ml_roc_compare, plot_participant_flow, plot_ps_distribution, plot_rcs, plot_regression_volcano, plot_scatter, plot_shap_beeswarm, plot_shap_dependence, plot_shap_force, plot_shap_summary, plot_stacked_bar, plot_top_hr_bars, plot_violin, pool_custom_estimates, pool_mi_models, preprocess_baseline, protein_to_gene_symbol, rank_protein_ppi_nodes, rap_extract_pheno, rap_find_dataset, rap_list_fields, rap_plan_extract, rap_submit_extract, run_correlation, run_imputation, run_mediation, run_metabolite_ora, run_multi_mediator, run_multi_subgroup, run_protein_kegg_ora, run_protein_ora, run_protein_ppi_clustering, run_protein_ppi_robustness, run_rcs, run_regression, run_sensitivity_mediation, run_subgroup_analysis, run_weighted_analysis, runmulti_competing, runmulti_cox, runmulti_cox_lag, runmulti_cox_zph, runmulti_gam, runmulti_glm, runmulti_lm, runmulti_logit, runmulti_negbin, runmulti_trend, score_protein_ppi_clusters, select_incident_by_years, sensitivity_exclude_early_events, sensitivity_exclude_missing_covariates, subset_protein_ppi, tidy.mi_pooled_result, ukb_check_rap_env, ukb_clean_missing, ukb_compare_cox_results, ukb_compare_sensitivity_cox, ukb_cox_diagnostics, ukb_create_extraction_manifest, ukb_decode, ukb_decode_column_names, ukb_decode_values, ukb_demo, ukb_download_rap_dictionary, ukb_extract_fields, ukb_field_info, ukb_metadata_setup, ukb_ml_as_split, ukb_ml_calibration, ukb_ml_compare, ukb_ml_compare_feature_sets, ukb_ml_compare_flows, ukb_ml_confusion, ukb_ml_cv, ukb_ml_dca, ukb_ml_evaluate_test, ukb_ml_feature_select, ukb_ml_fit_final, ukb_ml_flow, ukb_ml_gain_lift, ukb_ml_importance, ukb_ml_ks, ukb_ml_metrics, ukb_ml_model, ukb_ml_pr, ukb_ml_predict, ukb_ml_roc, ukb_ml_roc_data, ukb_ml_split_data, ukb_ml_supported_models, ukb_ml_survival, ukb_ml_survival_as_split, ukb_ml_survival_evaluate_test, ukb_ml_survival_feature_select, ukb_ml_survival_fit_final, ukb_ml_survival_importance, ukb_ml_survival_predict, ukb_ml_survival_shap, ukb_ml_survival_split_data, ukb_ml_survival_tune, ukb_ml_survival_workflow, ukb_ml_threshold, ukb_ml_tune, ukb_ml_workflow, ukb_participant_flow, ukb_protein_annotation, ukb_query_dictionary, ukb_scale_with_parameters, ukb_search_fields, ukb_sensitivity_suite, ukb_shap, ukb_shap_dependence, ukb_shap_force, ukb_shap_summary, ukb_snapshot, ukb_standardize_by_train, ukb_time_skeleton, ukb_top_hr_results, ukb_train_validation_cox, ukb_validate_columns, ukb_write_extraction_manifest

Dependencies: backports, bit, bit64, boot, broom, class, cli, clipr, codetools, cpp11, crayon, data.table, DBI, dplyr, e1071, farver, forcats, foreach, gdata, generics, ggplot2, glmnet, glue, gmodels, gtable, gtools, haven, hms, igraph, isoband, iterators, jomo, labeling, labelled, lattice, lifecycle, lme4, lmtest, magrittr, MASS, Matrix, mgcv, mice, minqa, mitml, mitools, nlme, nloptr, nnet, numDeriv, ordinal, pan, pillar, pkgconfig, prettyunits, pROC, progress, proxy, purrr, R6, rbibutils, RColorBrewer, Rcpp, RcppArmadillo, RcppEigen, Rdpack, readr, reformulas, rlang, rpart, S7, sandwich, scales, shape, stringi, stringr, survey, survival, tableone, tibble, tidyr, tidyselect, tzdb, ucminf, utf8, vctrs, viridisLite, vroom, withr, xml2, zoo

Readme and manuals

Help Manual

Help page | Topics
Assess Covariate Balance | assess_balance
Build Survival Analysis Dataset | build_survival_dataset
Calculate air pollution exposure averages | calculate_air_pollution
Calculate blood pressure from multiple readings | calculate_blood_pressure
Calculate diet score | calculate_diet_score
Calculate IPTW Weights | calculate_weights
Classify UK Biobank metabolite names | classify_metabolites
Extract Coefficients from Mediation Results | coef.mediation_result
Combine Multiple Disease Definitions | combine_disease_definitions
Compare Case Counts Across Data Sources | compare_data_sources
Compute topological metrics for a PPI network | compute_protein_ppi_metrics
Confidence Intervals for Mediation Results | confint.mediation_result
Create a baseline table comparing cases and controls under different conditions. | create_baseline_table
Create Disease Definition Object | create_disease_definition
Create an imputationList Object | create_imputation_list
Create a medication definition object | create_medication_definition
Estimate Propensity Score | estimate_propensity_score
Extract Cases by Specified Data Sources | extract_cases_by_source
Extract Baseline Diabetes Subtypes (T1DM/T2DM) with HbA1c Support | extract_diabetes_subtype_baseline
Extract participant-level disease diagnosis status | extract_disease_diagnosis
Extract Disease History (Prevalent Cases) for Covariates | extract_disease_history
Extract Disease History with Multiple Source Comparisons | extract_disease_history_sensitivity
Extract medication use from UKB drug fields | extract_medications
Extract self-reported medication indicators from field 20003 | extract_self_report_medications
Fit Regression Models on Multiple Imputed Datasets | fit_mi_models
Extract Death Dates for All Deceased Participants | get_death_dates
Query the built-in disease code catalog | get_disease_catalog
Get one UK Biobank field's metadata | get_field_info
Get structured UK Biobank field metadata | get_field_metadata
Query the built-in medication code catalog | get_medication_catalog
Get Pomegranate-derived disease definitions | get_pomegranate_diseases
Get the Pomegranate source manifest | get_pomegranate_source_manifest
Get Predefined Disease Definitions | get_predefined_diseases
Get predefined UK Biobank medication definitions | get_predefined_medications
Retrieve a STRING PPI network for proteomics hits | get_protein_ppi
Get column names of the synthetic UK Biobank-style demo dataset | get_ukb_demo_colnames
Get information about available variables | get_variable_info
Get one curated UK Biobank variable set | get_variable_set
Curated UK Biobank variable sets for extraction | get_variable_sets
Load the Pomegranate portal coding evidence table | load_pomegranate_portal_coding
Load UK Biobank field 20003 medication coding | load_ukb_medication_coding
Load the bundled UK Biobank non-ratio metabolite panel | load_ukb_metabolite_panel
Propensity Score Matching | match_propensity
Map metabolite names to MetaboAnalyst-compatible names | metabolite_to_metaboanalyst_name
Parse Cancer Registry Records | parse_cancer_registry
Parse Death Registry Records | parse_death_records
Parse ICD-10 Hospital Diagnosis Records | parse_icd10_diagnoses
Parse ICD-9 Hospital Diagnosis Records | parse_icd9_diagnoses
Parse OPCS4 Hospital Procedure Records | parse_opcs4_procedures
Parse Self-Reported Illness Records | parse_self_reported_illnesses
Plot Covariate Balance (Love Plot) | plot_balance
Plot Calibration Curve | plot_calibration
Visualize correlation matrix as a heatmap | plot_correlation
Plot training-validation Cox log(HR) concordance | plot_cox_loghr_correlation
Plot sensitivity-analysis Cox log(HR) concordance | plot_cox_sensitivity_correlation
Plot enrichment results as a lollipop chart via TCMDATA | plot_enrichment_lollipop
Plot Forest Plot for Subgroup Analysis | plot_forest
Plot GO ORA results as a bar chart via TCMDATA | plot_go_ora_bar
Plot a publication-style heatmap | plot_heatmap
Plot Kaplan-Meier Survival Curve | plot_km_curve
Plot Mediation Analysis Results | plot_mediation
Plot Forest Plot for Multiple Mediator Analysis | plot_mediation_forest
Plot metabolite ORA results as a bar plot | plot_metabolite_ora_barplot
Plot metabolite ORA results as a dot plot | plot_metabolite_ora_dotplot
Plot Multiple Imputation Diagnostics | plot_mi_diagnostics
Plot Multiple Imputation Pooled Results | plot_mi_pooled
Plot Calibration Curve | plot_ml_calibration
Plot Model Comparison | plot_ml_compare
Plot Confusion Matrix | plot_ml_confusion
Plot Decision Curve Analysis | plot_ml_dca
Plot Gain Curve | plot_ml_gain
Plot Variable Importance | plot_ml_importance
Plot KS Curve | plot_ml_ks
Plot Lift Curve | plot_ml_lift
Plot PR Curve | plot_ml_pr
Plot ROC Curves | plot_ml_roc
Plot One or More ROC Curves from Tidy ROC Data | plot_ml_roc_compare
Plot a participant flow table | plot_participant_flow
Plot Propensity Score Distribution | plot_ps_distribution
Plot a restricted cubic spline exposure-response curve | plot.ukb_rcs, plot_rcs, plot_rcs.ukb_rcs
Plot a volcano-style regression summary | plot_regression_volcano
Plot a publication-style scatter plot | plot_scatter
Plot SHAP Beeswarm Summary | plot_shap_beeswarm
Plot SHAP Dependence | plot_shap_dependence
Plot SHAP Force (Waterfall) | plot_shap_force
Plot SHAP Summary | plot_shap_summary
Plot a publication-style stacked bar chart | plot_stacked_bar
Plot top positive and inverse Cox associations | plot_top_hr_bars
Plot a publication-style violin plot | plot_violin
Plot a UKB ML Flow Object | plot.ukb_ml_flow
Plot a UKB ML Flow Comparison Object | plot.ukb_ml_flow_compare
Pool Custom Estimates from Multiple Imputations | pool_custom_estimates
Pool Results from Multiple Imputation Models | pool_mi_models
Preprocess UKB baseline variables | preprocess_baseline
Print Method for Mediation Results | print.mediation_result
Convert protein identifiers to gene symbols | protein_to_gene_symbol
Rank nodes in a PPI network by integrated centrality | rank_protein_ppi_nodes
Extract RAP Phenotype Data Synchronously | rap_extract_pheno
Find the RAP Dataset File in the Current Project | rap_find_dataset
List Approved RAP Dataset Fields | rap_list_fields
Plan a RAP Phenotype Extraction | rap_plan_extract
Submit a RAP Table-Exporter Phenotype Extraction Job | rap_submit_extract
Calculate correlation between variables | run_correlation
Multiple imputation and merge back to full data | run_imputation
Run Causal Mediation Analysis | run_mediation
Run metabolite over-representation analysis | run_metabolite_ora
Run Multiple Mediator Analysis | run_multi_mediator
Run Multiple Subgroup Analyses | run_multi_subgroup
Run KEGG ORA enrichment for proteomics hits | run_protein_kegg_ora
Run GO ORA enrichment for proteomics hits | run_protein_ora
Cluster a protein-protein interaction network | run_protein_ppi_clustering
Evaluate PPI network robustness for selected protein targets | run_protein_ppi_robustness
Fit a restricted cubic spline exposure-response model | run_rcs
Run a regression model (unified interface) | run_regression
Sensitivity Analysis for Mediation | run_sensitivity_mediation
Run Subgroup Analysis | run_subgroup_analysis
Run Weighted Analysis | run_weighted_analysis
Run Multiple Fine-Gray Competing-Risk Models | runmulti_competing
Run multiple Cox proportional hazards models | runmulti_cox
Run Lagged Cox Sensitivity Analyses | runmulti_cox_lag
Run Multiple Cox Models with PH Diagnostics | runmulti_cox_zph
Run multiple generalised additive models | runmulti_gam
Run multiple generalised linear models | runmulti_glm
Run multiple linear regression models | runmulti_lm
Run multiple logistic regression models | runmulti_logit
Run multiple negative-binomial regression models | runmulti_negbin
Run Grouped-Exposure Trend Tests | runmulti_trend
Score network clusters in a PPI graph | score_protein_ppi_clusters
Select Incident Cases by Time Since Enrollment | select_incident_by_years
Exclude Early Events for Sensitivity Analysis | sensitivity_exclude_early_events
Exclude Rows with Missing Covariates for Sensitivity Analysis | sensitivity_exclude_missing_covariates
Filter a STRING PPI network via TCMDATA | subset_protein_ppi
Summary Method for Mediation Results | summary.mediation_result
Tidy Method for mi_pooled_result | tidy.mi_pooled_result
Check the UK Biobank RAP execution environment | ukb_check_rap_env
Clean UK Biobank Missing and Non-response Values | ukb_clean_missing
Compare Cox results between training and validation sets | ukb_compare_cox_results
Compare sensitivity Cox results against a main analysis | ukb_compare_sensitivity_cox
Diagnose Proportional Hazards Assumptions for a Cox Model | ukb_cox_diagnostics
Create a RAP extraction manifest | ukb_create_extraction_manifest
Decode UK Biobank RAP exports | ukb_decode
Decode UK Biobank column names | ukb_decode_column_names
Decode UK Biobank coded values | ukb_decode_values
Generate a small synthetic UK Biobank-style demo dataset | ukb_demo
Chinese UK Biobank field-path dictionary | ukb_dictionary_zh
Download the official RAP data dictionary | ukb_download_rap_dictionary
Extract UK Biobank fields from a search result or field list | ukb_extract_fields
Inspect one UK Biobank field | ukb_field_info
Set up UK Biobank metadata for search, extraction, and decoding | ukb_metadata_setup
Standardize Manual ML Train/Test Splits | ukb_ml_as_split
Calibration Curve Analysis | ukb_ml_calibration
Compare Multiple ML Models | ukb_ml_compare
Compare Multiple Feature Sets with a Frozen-Test ML Workflow | ukb_ml_compare_feature_sets
Compare Multiple Feature Sets and/or Models | ukb_ml_compare_flows
Confusion Matrix | ukb_ml_confusion
Cross-Validation for ML Models | ukb_ml_cv
Decision Curve Analysis | ukb_ml_dca
Evaluate the Final Model Once on the Frozen Test Set | ukb_ml_evaluate_test
Select Features for UKB ML Workflows | ukb_ml_feature_select
Refit the Final ML Model on Training Development Data | ukb_ml_fit_final
Run a Complete Single-Model UKB ML Flow | ukb_ml_flow
Gain and Lift Curve Analysis | ukb_ml_gain_lift
Get Variable Importance | ukb_ml_importance
KS Curve Analysis | ukb_ml_ks
Calculate Model Performance Metrics | ukb_ml_metrics
Train a Machine Learning Model | ukb_ml_model
Precision-Recall Curve Analysis | ukb_ml_pr
Predict from ML Model | ukb_ml_predict
ROC Curve Analysis | ukb_ml_roc
Create ROC Curve Data for Binary ML Predictions | ukb_ml_roc_data
Split Data into Frozen ML Train/Test Sets | ukb_ml_split_data
List Supported Machine Learning Models | ukb_ml_supported_models
Train Survival Machine Learning Model | ukb_ml_survival
Standardize Manual Survival ML Train/Test Splits | ukb_ml_survival_as_split
Evaluate Survival ML Once on the Frozen Test Set | ukb_ml_survival_evaluate_test
Select Features for Survival ML Workflows | ukb_ml_survival_feature_select
Refit Final Survival ML Model | ukb_ml_survival_fit_final
Get Variable Importance for Survival Model | ukb_ml_survival_importance
Predict from Survival ML Model | ukb_ml_survival_predict
SHAP Values for Survival Models | ukb_ml_survival_shap
Split Data into Frozen Survival ML Train/Test Sets | ukb_ml_survival_split_data
Tune Survival ML Hyperparameters Without Touching the Test Set | ukb_ml_survival_tune
Run a Frozen-Test Survival ML Workflow | ukb_ml_survival_workflow
Learn a Binary Classification Threshold | ukb_ml_threshold
Tune ML Hyperparameters Without Touching the Test Set | ukb_ml_tune
Run a Frozen-Test UKB ML Workflow | ukb_ml_workflow
Build a participant flow table | ukb_participant_flow
Annotate Olink-style protein variables | ukb_protein_annotation
Query UK Biobank dictionary metadata | ukb_query_dictionary
Standardize variables using existing scaling parameters | ukb_scale_with_parameters
Search UK Biobank fields | ukb_search_fields
Run a Cox sensitivity-analysis suite | ukb_sensitivity_suite
Compute SHAP Values | ukb_shap
SHAP Dependence Values | ukb_shap_dependence
SHAP Force Plot Data | ukb_shap_force
SHAP Summary Statistics | ukb_shap_summary
Record or Retrieve UKB Cohort Snapshots | ukb_snapshot
Standardize variables using training-set parameters | ukb_standardize_by_train
Build a UK Biobank follow-up time skeleton | ukb_time_skeleton
Select top Cox associations by hazard ratio | ukb_top_hr_results
Run Cox models in training and validation sets | ukb_train_validation_cox
Validate requested columns against a data object | ukb_validate_columns
Write a RAP extraction manifest | ukb_write_extraction_manifest
UKBAnalytica: UK Biobank Data Processing and Survival Analysis Toolkit | UKBAnalytica-package, UKBAnalytica
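
Several help pages above concern pooling estimates across multiply imputed datasets. The idea behind such pooling is usually Rubin's rules, which can be sketched in base R as follows. This is an illustration of the general technique under stated assumptions, not the implementation used by `pool_custom_estimates` or `pool_mi_models`; the function name `rubin_pool` and the input values are hypothetical.

```r
# Hypothetical point estimates and standard errors from m = 5 imputed datasets.
est <- c(0.42, 0.38, 0.45, 0.40, 0.41)
se  <- c(0.10, 0.11, 0.09, 0.10, 0.12)

# Pool per-imputation estimates with Rubin's rules (hypothetical helper).
rubin_pool <- function(est, se, level = 0.95) {
  m     <- length(est)
  q_bar <- mean(est)                 # pooled point estimate
  w     <- mean(se^2)                # within-imputation variance
  b     <- stats::var(est)           # between-imputation variance
  t_var <- w + (1 + 1 / m) * b       # total variance
  # Classical (large-sample) degrees of freedom from Rubin's rules.
  df    <- (m - 1) * (1 + w / ((1 + 1 / m) * b))^2
  half  <- stats::qt(1 - (1 - level) / 2, df) * sqrt(t_var)
  c(estimate = q_bar, se = sqrt(t_var), df = df,
    lower = q_bar - half, upper = q_bar + half)
}

pooled <- rubin_pool(est, se)
print(round(pooled, 4))
```

The pooled standard error is always at least as large as the average within-imputation standard error, because the between-imputation term adds the uncertainty caused by the missing data.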