- ukb_time_skeleton() to create a reusable follow-up time skeleton with baseline date, death date, loss-to-follow-up date, administrative censoring, follow-up end reason, and valid follow-up indicators.
- time_skeleton support to build_survival_dataset() while preserving the default survival workflow.
- ukb_download_rap_dictionary(), ukb_query_dictionary(), and ukb_validate_columns(), for official RAP dictionary lookup, Chinese/English field search, and column validation.
- ukb_dictionary_zh metadata dataset for Chinese UKB field-path lookup.
- run_regression() with covariate_sets for nested epidemiological models such as crude, partially adjusted, and fully adjusted analyses.
- tests/ from package builds and remote tracking.
- CancerRegistry disease source using fields 40006, 40005, 40011, and 40012.
- cancer_icd10_pattern, cancer_histology, and cancer_behaviour to create_disease_definition().
- Lung_Cancer with cancer registry, ICD-10, and death-registry ascertainment.
- FirstOccurrence disease source.
- first_occurrence_fields and first_occurrence_source_fields to create_disease_definition().
- p13xxxx First Occurrence date/source fields, including UKB special date coding 819 handling.
- ukb_clean_missing() for converting common UKB non-response labels and numeric missing codes into analysis-ready values.
- ukb_snapshot() to record row/column counts, missingness, complete rows, object size, and deltas across analysis pipeline checkpoints.
- ukb_ml_workflow() API for binary, multiclass, and continuous non-survival ML with a frozen final test set.
- ukb_ml_as_split(), enhanced ukb_ml_split_data(), ukb_ml_feature_select(), ukb_ml_tune(), ukb_ml_threshold(), ukb_ml_fit_final(), and ukb_ml_evaluate_test().
- split_ratio style in ukb_ml_split_data() by keeping $internal_validation as an alias for the held-out split.
- ukb_shap() to support ukb_ml_workflow and ukb_ml_final objects, defaulting to the frozen test set for workflow objects.
- rpart decision tree and naive_bayes model backends to ukb_ml_workflow().
- options(UKBAnalytica.auto_install_ml = TRUE); by default, optional model packages are checked only when the selected model needs them and are not installed automatically.
- ukb_ml_survival_workflow() and survival-specific split, feature-selection, tuning, final-refit, and frozen-test evaluation helpers for time-to-event ML.
- model = "cox" as the lightweight default survival ML backend and aligned survival prediction output with the new workflow object structure.
- ukb_ml_survival() as deprecated in favor of ukb_ml_survival_workflow().
- dx extract_dataset and RAP table-exporter.
- rap_find_dataset(), rap_list_fields(), rap_plan_extract(), rap_extract_pheno(), and rap_submit_extract().
- variables = ... using UKBAnalytica predefined baseline mappings, while preserving field_id = ... for all instances and arrays of a UKB field.
- inst/python/ as legacy/helper entry points.
- /mnt/project.
- ukb_ml_workflow() path.
- OPCS4 operative procedure support for hospital summary operations via p41272 + p41282_a*.
- opcs4_pattern to create_disease_definition() so procedure evidence is opt-in and ignored by default when unspecified.
- OPCS4 in sources, prevalent_sources, and outcome_sources.
- Arrhythmia, Ventricular_Arrhythmia, AV_Block, Intraventricular_Block, and SVT.
- Atrial_Fibrillation with OPCS4 support for procedure-augmented atrial arrhythmia ascertainment.
- opcs4_pattern and arrhythmia phenotyping with ICD10 + OPCS4.
- README.md with an ICD-10 + OPCS4 phenotyping example and clarified the default opt-in behavior for procedure data.
- build_survival_dataset() with show_flow to print step-by-step participant attrition in the terminal for wide output.
- n_before, n_after, excluded, retention rates from previous/raw cohort.
- attr(result, "participant_flow").
- dt_threads in build_survival_dataset() to let users temporarily configure data.table thread count for large runs.
- .safe_as_date() utility (R/date_utils.R) to parse mixed date formats safely and convert malformed values to NA with warnings instead of stopping execution.
- as.Date() calls in key pipelines with .safe_as_date() (ICD, death, baseline, incident-time utilities, and case extraction paths).
- parse_self_reported_illnesses() to handle malformed year values (Inf, -Inf, NaN, non-numeric strings) without charToDate crashes.
- p{field}_i0 and p{field} naming conventions for date/source fields.
- Diabetes, T1DM, T2DM in cohort construction workflows.
- ukb_ml_split_data() for train/internal-validation splitting.
- seed.
- man/ukb_ml_split_data.Rd and NAMESPACE export.
- Added a sensitivity analysis module and refined the docs.
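
The .safe_as_date() entry above describes parsing mixed date formats and turning malformed values into NA with a warning instead of an error. A minimal base-R sketch of that idea follows. It is not the package's implementation: the function name safe_as_date_sketch and the list of candidate formats are assumptions made purely for illustration.

```r
# Hypothetical helper illustrating the idea only; NOT UKBAnalytica's .safe_as_date().
# The candidate formats below are an assumption, not taken from the package.
safe_as_date_sketch <- function(x, formats = c("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")) {
  x_chr <- trimws(as.character(x))
  x_chr[x_chr %in% c("", "NA")] <- NA_character_   # treat blanks as missing

  out <- rep(as.Date(NA), length(x_chr))
  for (fmt in formats) {
    todo <- is.na(out) & !is.na(x_chr)             # values not parsed yet
    if (!any(todo)) break
    # With an explicit format, as.Date() returns NA (no error) on a mismatch
    out[todo] <- as.Date(x_chr[todo], format = fmt)
  }

  n_bad <- sum(is.na(out) & !is.na(x_chr))         # non-missing input, no format matched
  if (n_bad > 0) {
    warning(n_bad, " value(s) could not be parsed as dates and were set to NA")
  }
  out
}

# Usage: mixed formats plus one malformed value and one missing value
parsed <- safe_as_date_sketch(c("2020-01-15", "15/01/2020", "20200115", "not a date", NA))
```

Passing an explicit format on every attempt is the key design choice here: calling as.Date() on a character vector without a format can stop with a charToDate error when the first non-missing value is not in a standard form.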
- select_incident_by_years() utility to split incident cases within or after a year cutoff from enrollment.

ml_model.R:
- ukb_ml_model(): Unified interface for training ML models
  - ranger
  - xgboost
  - glmnet
  - e1071
  - nnet
- ukb_ml_predict(): Generate predictions
- ukb_ml_cv(): K-fold cross-validation with optional repeats
- ukb_ml_compare(): Compare multiple models
- ukb_ml_importance(): Extract variable importance

ml_evaluate.R:
- ukb_ml_metrics(): Compute performance metrics (AUC, accuracy, etc.)
- ukb_ml_roc(): ROC curve analysis with CI
- ukb_ml_calibration(): Calibration curve with Brier score and ECE
- ukb_ml_confusion(): Confusion matrix

ml_shap.R:
- ukb_shap(): Compute SHAP values for model interpretation
- ukb_shap_summary(): Feature importance from SHAP
- ukb_shap_dependence(): Single feature SHAP analysis
- ukb_shap_force(): Single observation explanation

ml_survival.R:
- ukb_ml_survival(): Survival machine learning models
  - randomForestSRC
  - gbm
  - glmnet
- ukb_ml_survival_predict(): Survival probability prediction
- ukb_ml_survival_importance(): Variable importance
- ukb_ml_survival_shap(): SHAP for survival models

- plot_ml_importance(): Variable importance bar/dot plot
- plot_ml_roc(): ROC curve plot
- plot_ml_calibration(): Calibration curve plot
- plot_ml_confusion(): Confusion matrix heatmap
- plot_ml_compare(): Model comparison plot
- plot_shap_summary(): SHAP beeswarm/bar plot
- plot_shap_dependence(): SHAP dependence plot
- plot_shap_force(): SHAP waterfall plot

- ranger, xgboost, glmnet, e1071, nnet, fastshap, pROC, randomForestSRC

subgroup.R:
- run_subgroup_analysis(): Stratified analysis with interaction p-values
- run_multi_subgroup(): Batch analysis across multiple subgroup variables

propensity.R:
- estimate_propensity_score(): PS estimation via logistic regression or GBM
- match_propensity(): 1:k nearest neighbor matching with caliper
- calculate_weights(): IPTW weights (ATE, ATT, ATC)
- assess_balance(): Covariate balance assessment with SMD
- run_weighted_analysis(): Weighted regression analysis

mediation.R:
- run_mediation(): Causal mediation analysis (wrapping regmedint)
- run_multi_mediator(): Test multiple mediators
- run_sensitivity_mediation(): Sensitivity analysis for unmeasured confounding

mi_pool.R:
- pool_mi_models(): Combine regression results using Rubin's Rules
- fit_mi_models(): Fit models across imputed datasets
- create_imputation_list(): Convert to mitools imputationList
- pool_custom_estimates(): Pool custom statistics

visualization.R:
- plot_forest(): Forest plots for subgroup/regression results
- plot_km_curve(): Kaplan-Meier survival curves
- plot_ps_distribution(): Propensity score distribution (histogram/density)
- plot_balance(): Covariate balance before/after matching
- plot_calibration(): Calibration plots
- plot_mediation(): Mediation effect plots (bar, decomposition, path diagram)
- plot_mediation_forest(): Multi-mediator forest plot
- plot_mi_pooled(): MI pooled results forest plot
- plot_mi_diagnostics(): FMI and variance diagnostics

- docs/08-advanced-analysis.Rmd
- MatchIt, gbm, regmedint, mitools, MASS, cobalt

Fixed a bug in survival.R: participants whose primary disease occurred before the initial time now have their survival time set to NA. This distinguishes them from participants whose primary disease occurred after the initial time, who keep a non-NA survival time.
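
The survival.R fix above can be pictured with a small toy example. This is only a sketch of the rule as stated in the changelog, not the package's code: the data frame, its column names (baseline_date, diagnosis_date, censor_date, surv_time), and the choice to treat "before" as strictly earlier than baseline are all assumptions.

```r
# Toy cohort; every column name here is hypothetical.
cohort <- data.frame(
  eid            = 1:4,
  baseline_date  = as.Date(rep("2010-01-01", 4)),
  diagnosis_date = as.Date(c("2008-06-01", "2012-01-01", NA, "2015-07-01")),
  censor_date    = as.Date(rep("2020-01-01", 4))
)

# Prevalent: diagnosed before baseline (assumed strict inequality)
prevalent <- !is.na(cohort$diagnosis_date) &
  cohort$diagnosis_date < cohort$baseline_date
# Incident: diagnosed, but not before baseline
incident <- !is.na(cohort$diagnosis_date) & !prevalent

# Follow-up ends at diagnosis for incident cases, otherwise at censoring
end_date <- cohort$censor_date
end_date[incident] <- cohort$diagnosis_date[incident]

# Survival time in years from baseline
cohort$surv_time <- as.numeric(
  difftime(end_date, cohort$baseline_date, units = "days")
) / 365.25

# The rule from the changelog: prevalent cases get NA survival time
cohort$surv_time[prevalent] <- NA_real_
```

With this layout, is.na(surv_time) identifies prevalent cases, which makes it straightforward to exclude them before fitting a time-to-event model.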
Add variable_preprocess.R module for preprocessing baseline variables.
- primary_disease argument to compute outcome_status and outcome_surv_time for a single primary endpoint.
- prevalent_sources and outcome_sources arguments in the build_survival_dataset() function to manage self-report bias.
- sources (ICD-10, ICD-9, self-report, death).
- inst/python/ to extract:
  - inst/extdata/metabolites_non_ratio.txt
- man/figures/.