| Title: | Analyse WHO STEPS Survey Data |
|---|---|
| Description: | Provides a complete analysis pipeline for the WHO STEPwise Approach to NCD Risk Factor Surveillance (STEPS) as described in Riley et al. (2016) <doi:10.2105/AJPH.2015.302962>. Imports raw survey data ('CSV', 'Excel', 'Stata', 'SPSS'), applies WHO-standard cleaning and recoding, sets up complex survey designs, computes all standard NCD indicators (tobacco, alcohol, diet, physical activity, anthropometry, blood pressure, biochemical), and generates publication-ready tables, visualisations, and 'Word'/'HTML' reports (fact sheet, data book, country report). |
| Authors: | Abhijit Pakhare [aut, cre], Ankur Joshi [aut], Lena Charlette [aut], WHO STEPS R Pipeline Contributors [ctb] |
| Maintainer: | Abhijit Pakhare <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-11 22:14:20 UTC |
| Source: | https://github.com/cran/stepssurvey |
A complete analysis pipeline for the WHO STEPwise Approach to NCD Risk Factor Surveillance (STEPS).
Useful links:
Report bugs at https://github.com/drpakhare/stepssurvey/issues
Build all tables from computed results
build_all_tables(results)
| results | A named list of results from |
|---|---|
A named list of flextable objects. NULL entries excluded.
Creates a horizontal point-and-CI plot (forest plot style) for all key indicators, grouped by domain.
build_forest_plot(key_indicators, country_name, survey_year)
| key_indicators | A data frame with domain, indicator, estimate, lower, upper. |
|---|---|
| country_name | Country name for title. |
| survey_year | Survey year for title. |
A ggplot2 object.
Creates a radar-style chart showing prevalence of key risk factors on a polar coordinate system for quick visual comparison.
build_radar_plot(key_indicators, country_name, survey_year)
| key_indicators | A data frame with domain, indicator, estimate. |
|---|---|
| country_name | Country name for title. |
| survey_year | Survey year for title. |
A ggplot2 object.
Generates a list of ggplot2 plots showing key NCD risk factor prevalence with 95% confidence intervals, stratified by sex and age group.
build_steps_plots(indicators, key_indicators, country_name, survey_year)
| indicators | A list of indicator results from |
|---|---|
| key_indicators | A data frame with key indicators (domain, indicator, estimate, lower, upper). |
| country_name | Country name for plot titles. |
| survey_year | Survey year for plot titles. |
All plots use the WHO STEPS colour scheme and professional styling. Error bars represent 95% confidence intervals. Prevalence values are displayed on bars/points with light background text.
A named list of ggplot2 objects:
overview: Horizontal bar chart of key indicators
tobacco_by_sex: Sex-stratified tobacco use
bp_by_sex: Sex-stratified blood pressure
obesity_by_sex: Sex-stratified overweight/obesity
glucose_by_sex: Sex-stratified blood glucose
bp_by_age: Age-stratified blood pressure with ribbon CI
obesity_by_age: Age-stratified overweight/obesity with ribbon CI
sex_dashboard: Combined 2x2 dashboard of sex-stratified charts (if >=2 sex plots available)
NULL entries are preserved in the list.
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
all_ind <- compute_all_indicators(design)
plots <- build_steps_plots(all_ind$results, all_ind$key_indicators, "Test", 2023)
names(plots)
Generates formatted flextable objects for all available STEPS indicators, with rows for age groups and columns for both sexes combined, males, and females. Tables include 95% confidence intervals.
build_steps_tables(indicators)
| indicators | A list of indicator results from |
|---|---|
Each table has age groups as rows and prevalence (with 95% CI) as a column. The last row shows the total (age-standardised) estimate. Column header styling uses WHO STEPS branding (dark blue background).
A named list of flextable objects, one per indicator.
Names correspond to indicators (e.g., current_tobacco, raised_bp).
NULL entries are excluded. Prints count of tables generated.
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
all_ind <- compute_all_indicators(design)
tables <- build_steps_tables(all_ind$results)
names(tables)
Dispatches to the appropriate formatting method based on table type.
build_table(result)
| result | A result list from |
|---|---|
A flextable object, or NULL if the table is not available.
Processes raw STEPS survey data: renames columns, coerces types, derives standard indicators, handles missing values, and applies plausibility checks.
clean_steps_data(
  data,
  cols,
  age_min = 18,
  age_max = 69,
  bp_sbp_threshold = 140,
  bp_dbp_threshold = 90,
  bmi_overweight = 25,
  bmi_obese = 30,
  glucose_threshold = 7,
  glucose_impaired_threshold = 6.1,
  chol_threshold = 5
)
| data | A data frame (typically from |
|---|---|
| cols | A named list of column names, as returned by |
| age_min | Minimum age for inclusion (default 18). |
| age_max | Maximum age for inclusion (default 69). |
| bp_sbp_threshold | SBP threshold for raised BP (default 140; Mongolia uses 130). |
| bp_dbp_threshold | DBP threshold for raised BP (default 90; Mongolia uses 80). |
| bmi_overweight | BMI threshold for overweight (default 25.0). |
| bmi_obese | BMI threshold for obesity (default 30.0). |
| glucose_threshold | Fasting glucose threshold for raised glucose / diabetes in mmol/L (default 7.0). |
| glucose_impaired_threshold | Fasting glucose threshold for impaired fasting glucose in mmol/L (default 6.1). |
| chol_threshold | Total cholesterol threshold for raised cholesterol in mmol/L (default 5.0). |
The function performs the following transformations:
Renames columns to standard names (age, sex, wt_final, etc.)
Converts numeric strings to appropriate types
Restricts age to [age_min, age_max]
Creates WHO standard age groups (18-24, 25-34, etc.)
Harmonises sex coding to Male/Female
Derives body mass index (BMI) and categories
Averages blood pressure readings (last 2 of 3)
Recodes yes/no variables to logical
Creates derived risk indicators (raised BP, diabetes, etc.)
Applies plausibility checks to measurements
Drops records with missing age or sex
A data frame with standardised and derived variables, ready for survey design setup.
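A few of the derivations listed above can be illustrated in base R. This is only a sketch under stated assumptions, not the package's implementation: the column names (sbp2, sbp3, dbp2, dbp3, height_cm, weight_kg), the helper name derive_physical(), and the category labels are hypothetical, and medication status is ignored when flagging raised blood pressure.

```r
# Hypothetical helper and column names; thresholds mirror the documented defaults.
derive_physical <- function(df, sbp_threshold = 140, dbp_threshold = 90,
                            bmi_overweight = 25, bmi_obese = 30) {
  # Average the last 2 of 3 readings (second and third)
  df$mean_sbp <- unname(rowMeans(df[, c("sbp2", "sbp3")], na.rm = TRUE))
  df$mean_dbp <- unname(rowMeans(df[, c("dbp2", "dbp3")], na.rm = TRUE))
  # rowMeans() yields NaN when both readings are missing; store NA instead
  df$mean_sbp[is.nan(df$mean_sbp)] <- NA
  df$mean_dbp[is.nan(df$mean_dbp)] <- NA
  # Raised BP: either averaged reading at or above its threshold
  df$raised_bp <- df$mean_sbp >= sbp_threshold | df$mean_dbp >= dbp_threshold
  # BMI in kg/m^2, then categories split at the two thresholds
  df$bmi <- df$weight_kg / (df$height_cm / 100)^2
  df$bmi_cat <- cut(df$bmi,
                    breaks = c(-Inf, bmi_overweight, bmi_obese, Inf),
                    labels = c("Not overweight", "Overweight", "Obese"),
                    right = FALSE)
  df
}

demo <- data.frame(
  sbp1 = c(150, 128), sbp2 = c(142, 120), sbp3 = c(138, 118),
  dbp1 = c(95, 80),   dbp2 = c(88, 78),   dbp3 = c(86, 76),
  height_cm = c(170, 160), weight_kg = c(80, 52)
)
out <- derive_physical(demo)
out[, c("mean_sbp", "mean_dbp", "raised_bp", "bmi_cat")]
```

Passing sbp_threshold = 130 and dbp_threshold = 80 would mimic the lower cut-offs mentioned for Mongolia.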
Calculates prevalence of alcohol use from a survey design object. Computes proportions of current alcohol use and heavy episodic drinking, stratified by sex and age group where available.
compute_alcohol_indicators(design)
| design | A survey design object from |
|---|---|
A named list of survey estimates. Each element contains proportion estimates (as tibble with columns: estimate, lower, upper, etc.) for:
current_alcohol_total: current alcohol use, overall
current_alcohol_by_sex: current alcohol use, by sex
current_alcohol_by_age: current alcohol use, by age group
heavy_episodic_total: heavy episodic drinking, overall
heavy_episodic_by_sex: heavy episodic drinking, by sex
heavy_episodic_by_age: heavy episodic drinking, by age group
(if the corresponding variables exist in design)
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
alcohol_results <- compute_alcohol_indicators(design)
Runs all indicator modules (tobacco, alcohol, diet & physical activity, anthropometry, blood pressure, and biochemical), using the appropriate step-specific survey design for each domain per WHO STEPS methodology:
Step 1 (behavioural): tobacco, alcohol, diet & physical activity
Step 2 (physical): anthropometry, blood pressure
Step 3 (biochemical): biochemical measures
compute_all_indicators(design)
| design | A |
|---|---|
A list with two elements:
results: a named list containing indicator results grouped by domain
(tobacco, alcohol, diet_pa, anthropometry, blood_pressure, biochemical)
key_indicators: a tibble with columns domain, indicator, estimate,
lower, and upper, summarising headline estimates across all domains
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
all_indicators <- compute_all_indicators(design)
names(all_indicators$results)
Iterates through the full steps_table_registry() and computes every
table that has available data. Returns a named list of results.
compute_all_tables(designs, data = NULL)
| designs | A list of survey designs, with elements |
|---|---|
| data | The cleaned data frame. |
A named list of table results (from compute_table()).
Only entries with available == TRUE are included.
Calculates prevalence of overweight, obesity, and central obesity, plus mean BMI and waist circumference, from a survey design object.
compute_anthropometry_indicators(design)
| design | A survey design object from |
|---|---|
A named list of survey estimates. Each element contains estimates (as tibble with columns: estimate, lower, upper, etc.) for:
overweight_obese_total: overweight or obese (BMI >=25), overall
overweight_obese_by_sex: overweight or obese, by sex
overweight_obese_by_age: overweight or obese, by age group
obese_total: obese (BMI >=30), overall
obese_by_sex: obese, by sex
obese_by_age: obese, by age group
central_obesity_total: central obesity, overall
central_obesity_by_sex: central obesity, by sex
central_obesity_by_age: central obesity, by age group
bmi_mean_total: mean BMI, overall
bmi_mean_by_sex: mean BMI, by sex
waist_cm_mean_total: mean waist circumference, overall
waist_cm_mean_by_sex: mean waist circumference, by sex
(if the corresponding variables exist in design)
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
anthropometry_results <- compute_anthropometry_indicators(design)
Calculates prevalence of raised glucose, diabetes, impaired fasting glucose, and raised cholesterol, plus mean fasting glucose and total cholesterol, from a survey design object.
compute_biochemical_indicators(design)
| design | A survey design object from |
|---|---|
A named list of survey estimates. Each element contains estimates (as tibble with columns: estimate, lower, upper, etc.) for:
raised_glucose_total: raised fasting glucose, overall
raised_glucose_by_sex: raised fasting glucose, by sex
raised_glucose_by_age: raised fasting glucose, by age group
diabetes_total: diabetes, overall
diabetes_by_sex: diabetes, by sex
diabetes_by_age: diabetes, by age group
impaired_glucose_total: impaired fasting glucose, overall
impaired_glucose_by_sex: impaired fasting glucose, by sex
impaired_glucose_by_age: impaired fasting glucose, by age group
raised_chol_total: raised total cholesterol, overall
raised_chol_by_sex: raised total cholesterol, by sex
raised_chol_by_age: raised total cholesterol, by age group
fasting_glucose_mean_total: mean fasting glucose, overall
fasting_glucose_mean_by_sex: mean fasting glucose, by sex
total_chol_mean_total: mean total cholesterol, overall
total_chol_mean_by_sex: mean total cholesterol, by sex
(if the corresponding variables exist in design)
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
biochemical_results <- compute_biochemical_indicators(design)
Calculates prevalence of raised blood pressure and mean systolic and diastolic blood pressure from a survey design object.
compute_bp_indicators(design)
| design | A survey design object from |
|---|---|
A named list of survey estimates. Each element contains estimates (as tibble with columns: estimate, lower, upper, etc.) for:
raised_bp_total: raised blood pressure, overall
raised_bp_by_sex: raised blood pressure, by sex
raised_bp_by_age: raised blood pressure, by age group
mean_sbp_mean_total: mean systolic BP, overall
mean_sbp_mean_by_sex: mean systolic BP, by sex
mean_sbp_mean_by_age: mean systolic BP, by age group
mean_dbp_mean_total: mean diastolic BP, overall
mean_dbp_mean_by_sex: mean diastolic BP, by sex
mean_dbp_mean_by_age: mean diastolic BP, by age group
(if the corresponding variables exist in design)
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
bp_results <- compute_bp_indicators(design)
Calculates prevalence of insufficient physical activity and low fruit & vegetable intake, plus mean metabolic equivalent (MET) values, from a survey design object.
compute_diet_pa_indicators(design)
| design | A survey design object from |
|---|---|
A named list of survey estimates. Each element contains estimates (as tibble with columns: estimate, lower, upper, etc.) for:
insufficient_pa_total: insufficient physical activity, overall
insufficient_pa_by_sex: insufficient physical activity, by sex
insufficient_pa_by_age: insufficient physical activity, by age group
low_fruit_veg_total: low fruit & vegetable intake, overall
low_fruit_veg_by_sex: low fruit & vegetable intake, by sex
low_fruit_veg_by_age: low fruit & vegetable intake, by age group
met_mean_total: mean MET (if available)
met_mean_by_sex: mean MET by sex (if available)
(if the corresponding variables exist in design)
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
diet_pa_results <- compute_diet_pa_indicators(design)
Takes a table specification from steps_table_registry() and a survey
design object, and produces the survey-weighted estimates needed to fill
the standard WHO STEPS data book table.
This is the main workhorse: given one registry entry and a survey design,
it dispatches to the appropriate method based on entry$type and returns
a standardised result list.
compute_table(entry, design, data = NULL)
| entry | A single list element from |
|---|---|
| design | A survey design object (from |
| data | The cleaned data frame (used for variable availability checks). |
A list with:
Table identifier.
Table title.
Table type.
Logical: TRUE if the required variable(s) exist.
A list of data frames: For proportion: total, by_sex, by_age (each with estimate, lower, upper). For mean: total, by_sex, by_age (each with estimate, lower, upper). For category: total, by_sex, by_age (each with level, estimate, lower, upper). For cascade: named list of proportion results.
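The dispatch-on-type idea can be sketched in base R. This is an illustration only: the entry fields id and var, the weight column wt, and the helper name compute_table_sketch() are assumptions (the text above names only entry$type). It also returns plain weighted point estimates without confidence intervals, whereas the package works from a survey design object.

```r
# Hypothetical sketch: entry fields `id`/`var` and the `wt` column are assumed.
compute_table_sketch <- function(entry, data, weight = "wt") {
  # Availability check: the required variable must exist in the data
  if (!entry$var %in% names(data)) {
    return(list(id = entry$id, type = entry$type,
                available = FALSE, results = NULL))
  }
  x <- data[[entry$var]]
  w <- data[[weight]]
  keep <- !is.na(x) & !is.na(w)
  # Dispatch on the table type; unknown types fail loudly
  total <- switch(entry$type,
    proportion = sum(w[keep] * x[keep]) / sum(w[keep]),
    mean       = weighted.mean(x[keep], w[keep]),
    stop("Unsupported table type: ", entry$type)
  )
  list(id = entry$id, type = entry$type, available = TRUE,
       results = list(total = data.frame(estimate = total)))
}

demo <- data.frame(smoker = c(TRUE, FALSE, FALSE, TRUE),
                   sbp = c(120, 130, 140, 150),
                   wt = c(1, 1, 2, 2))
prop_res <- compute_table_sketch(list(id = "T1", var = "smoker", type = "proportion"), demo)
mean_res <- compute_table_sketch(list(id = "B1", var = "sbp", type = "mean"), demo)
none_res <- compute_table_sketch(list(id = "X1", var = "absent", type = "mean"), demo)
```

Returning available = FALSE instead of raising an error lets a caller loop over a whole registry and keep only the tables that have data.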
Calculates prevalence of tobacco use from a survey design object. Computes proportions of current and daily tobacco use, stratified by sex and age group where available.
compute_tobacco_indicators(design)
| design | A survey design object from |
|---|---|
When both smoking and smokeless tobacco variables are present,
current_tobacco_any (either smoking or smokeless) is preferred
as the headline tobacco indicator. The function also reports
current_smoker and current_smokeless separately if available.
A named list of survey estimates. Each element contains proportion estimates (as tibble with columns: estimate, lower, upper, etc.) for:
current_tobacco_any_total/by_sex/by_age: any current tobacco use
(smoking or smokeless) – preferred headline variable
current_tobacco_total/by_sex/by_age: current tobacco smoking
current_smoker_total/by_sex/by_age: current smoker
current_smokeless_total/by_sex/by_age: current smokeless tobacco
daily_tobacco_total/by_sex/by_age: daily tobacco use
(only elements for variables present in design are returned)
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
tobacco_results <- compute_tobacco_indicators(design)
Tries to find a column in the data matching one of several candidate names (case-insensitive).
detect_col(data, candidates, label = NULL)
| data | A data frame. |
|---|---|
| candidates | Character vector of possible column names. |
| label | Optional label for progress messages. |
The matched column name (character) or NULL.
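Case-insensitive, priority-ordered matching of this kind can be written in a few lines of base R. The function below is a sketch of the idea, not the package's code: the name detect_col_sketch() is hypothetical and the label argument is omitted.

```r
# Hypothetical re-implementation of the matching idea (no progress messages).
detect_col_sketch <- function(data, candidates) {
  # Position of each candidate among the column names, ignoring case;
  # candidates are checked in the order given, so earlier ones win
  hit <- match(tolower(candidates), tolower(names(data)))
  hit <- hit[!is.na(hit)]
  if (length(hit) == 0) return(NULL)
  names(data)[hit[1]]  # return the column name as spelled in the data
}

df <- data.frame(Age = 30, M4a = 120, B1 = 118)
first_sbp <- detect_col_sketch(df, c("m4a", "b1"))
swapped   <- detect_col_sketch(df, c("b1", "m4a"))
missing   <- detect_col_sketch(df, "sbp1")
```

Swapping the order of the candidates changes which column is returned, which is why the order of aliases matters when a dataset mixes codes from different instrument versions.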
Scans a data frame for standard WHO STEPS variable names across versions 3.1 and 3.2. Aliases are listed in priority order: the first match wins, so put the most specific / unambiguous name first.
detect_steps_columns(data)
| data | A data frame (typically from |
|---|---|
WHO STEPS reorganised variable codes between v3.1 and v3.2:
v3.1 / Epi Info codes (still common in many country datasets): B1-B6 = blood-pressure readings, B7 = BP meds, C1 = fasting glucose, C5 = DM meds, C6 = total cholesterol, C10 = chol meds, M1 = height, M2 = weight, M3 = waist.
v3.2 instrument codes: M4a/M5a/M6a = SBP readings, M4b/M5b/M6b = DBP readings, M7 = BP meds, M11 = height, M12 = weight, M14 = waist, M15 = hip, B5 = fasting glucose, B6 = DM meds, B8 = total cholesterol, B9 = chol meds, B16 = triglycerides, B17 = HDL cholesterol, C1 = sex, C3 = age.
The function includes aliases for both versions so datasets from either instrument version are detected automatically.
A named list of detected column names (or NULL for missing).
Creates a realistic simulated dataset matching WHO STEPS survey structure. Includes sampling design variables, demographics, and measures from all three steps (behavioural, physical, biochemical).
generate_test_data(n = 3000, seed = 42)
| n | Number of observations (default 3000). |
|---|---|
| seed | Random seed for reproducibility (default 42). |
Simulation parameters are chosen to be realistic for low- and middle-income settings:
Tobacco prevalence: 32% males, 8% females
Alcohol current use: 55% males, 28% females
Heavy episodic drinking: 35% of drinkers
Physical activity: MET-minutes/week, mean 1800, SD 1200
Diet: Fruit/veg days and servings per day (0-7, 1-5)
BP increases with age; medication prevalence 12%
Glucose: mean 5.2 mmol/L, increases with age
Total cholesterol: mean 4.8 mmol/L
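To show how sex-specific prevalences like those above can be simulated, here is a minimal base R sketch. It is not the generator's actual code: the variable names, the equal split between sexes, and the independence between tobacco and alcohol use are all assumptions made for illustration.

```r
set.seed(42)
n <- 20000  # larger than the package default so proportions are stable

# Assumed 50/50 split between sexes
sex <- sample(c("Male", "Female"), n, replace = TRUE)

# Sex-specific Bernoulli draws using the documented prevalences
tobacco <- rbinom(n, size = 1, prob = ifelse(sex == "Male", 0.32, 0.08)) == 1
alcohol <- rbinom(n, size = 1, prob = ifelse(sex == "Male", 0.55, 0.28)) == 1

# Heavy episodic drinking is only defined among current drinkers (35% of them)
heavy <- alcohol & (rbinom(n, size = 1, prob = 0.35) == 1)

tobacco_prev <- tapply(tobacco, sex, mean)
alcohol_prev <- tapply(alcohol, sex, mean)
heavy_among_drinkers <- mean(heavy[alcohol])
round(tobacco_prev, 2)
```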
Use this function for:
Testing the STEPS pipeline
Developing reports before real data arrives
Training analysts on the analysis system
A data frame with n rows and the following columns:
stratum: Strata identifier (S1-S5)
psu: Primary sampling unit (PSU1-PSU40)
wt_final: Final analysis weight
sex: Sex (1=Male, 2=Female)
age: Age in years (18-69)
Step 1 (behavioural): t1, t2 (tobacco), a1, a5 (alcohol),
met_total (physical activity), d1-d4 (diet)
Step 2 (physical): m1 (height), m2 (weight), m3 (waist),
b1-b6 (blood pressure), b7 (BP medication)
Step 3 (biochemical): c1_mmol (glucose), c5 (DM meds),
c6 (cholesterol), c10 (cholesterol meds)
# Generate smaller dataset for quick testing
test_data <- generate_test_data(n = 500, seed = 123)
head(test_data)
Get table registry entries by section
get_registry_by_section(section = NULL)
| section | Section name (e.g., "Tobacco Use", "Blood Pressure"). If NULL, returns all entries. |
|---|---|
A filtered list of registry entries.
Get table registry entries by step
get_registry_by_step(step)
| step | STEPS step number (1, 2, or 3). |
|---|---|
A filtered list of registry entries.
Reads a raw STEPS data file (CSV, Excel, Stata, or SPSS) and standardises column names to lowercase with underscores.
import_steps_data(path)
| path | Character. Path to the data file. |
|---|---|
A data frame with cleaned column names.
## Not run:
raw <- import_steps_data("data/raw/steps_data.csv")
## End(Not run)
List all available sections in the registry
list_registry_sections()
Character vector of unique section names.
Creates a tile heatmap showing missingness percentage by variable, grouped by STEPS domain.
plot_completeness(dq)
| dq | A |
|---|---|
A ggplot object.
Creates a bar chart of terminal-digit frequencies with a reference line at the 10% frequency expected under a uniform distribution.
plot_digit_preference(dq, measure)
| dq | A |
|---|---|
| measure | Character: one of "SBP", "DBP", "Height", "Weight", "Waist". |
A ggplot object.
Creates a histogram of sampling weights with summary statistics.
plot_weights(dq, step = "weight_step1")
| dq | A |
|---|---|
| step | Character: which weight to plot ("weight_step1", "weight_step2", or "weight_step3"). Defaults to "weight_step1". |
A ggplot object.
Reads a filled-in column mapping template (Excel or CSV) and returns a
named list suitable for passing to clean_steps_data(). The mapping file
should have at least two columns: one with the standard variable name
(column A) and one with the user's column name (column C in the template,
or the third column).
read_column_mapping(path, data = NULL)
| path | Path to the filled mapping file (.xlsx or .csv). |
|---|---|
| data | Optional data frame. If provided, the function validates that every mapped column actually exists in the data. |
This function is the manual alternative to detect_steps_columns().
Use it when your dataset has non-standard variable names that
auto-detection cannot resolve.
A blank template can be obtained from
system.file("templates", "column_mapping_template.xlsx",
package = "stepssurvey")
or downloaded from the Shiny app.
The function ignores domain-header rows (rows where column A is all-caps with no entry in column C) and skips any row where the user's column name is blank.
A named list where names are standard variable identifiers
(e.g. "age", "sbp1") and values are the corresponding
column names in the user's dataset.
Unmapped variables are set to NULL.
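The row-handling rules described above (skip domain-header rows, leave unmapped variables as NULL, optionally validate against the data) can be sketched for the CSV case in base R. The three-column layout, the column headings, and the function name read_mapping_sketch() are assumptions for illustration; the real template may differ.

```r
# Hypothetical mapping file: standard name, description, user's column name
mapping_csv <- "standard_name,description,your_column
DEMOGRAPHICS,,
age,Age in years,C3
sex,Sex,C1
BLOOD PRESSURE,,
sbp1,First systolic reading,M4a
sbp2,Second systolic reading,
"

read_mapping_sketch <- function(text, data = NULL) {
  m <- read.csv(text = text, colClasses = "character")
  std <- trimws(m[[1]])  # column A: standard variable name
  usr <- trimws(m[[3]])  # column C: user's column name
  # Domain headers: all-caps in column A with nothing in column C
  is_header <- std == toupper(std) & !nzchar(usr)
  std <- std[!is_header]
  usr <- usr[!is_header]
  # Start with every variable unmapped (NULL), then fill in mapped ones
  out <- setNames(vector("list", length(std)), std)
  mapped <- nzchar(usr)
  out[mapped] <- as.list(usr[mapped])
  # Optional validation against the dataset
  if (!is.null(data)) {
    absent <- setdiff(unlist(out), names(data))
    if (length(absent) > 0) {
      stop("Mapped columns not found in data: ", paste(absent, collapse = ", "))
    }
  }
  out
}

cols <- read_mapping_sketch(mapping_csv)
str(cols)
```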
## Not run:
cols <- read_column_mapping("my_mapping.xlsx")
raw <- import_steps_data("survey.dta")
clean <- clean_steps_data(raw, cols)
## End(Not run)
Generates a comprehensive Word document with executive summary, indicator-by-indicator analysis, and recommendations for public health action.
render_country_report(config, output_dir = tempdir())
| config | A list from |
|---|---|
| output_dir | Directory for output reports (default |
Sections include:
Executive summary with key findings
Tobacco use
Physical activity
Overweight and obesity
Blood pressure
Blood glucose and cholesterol
Recommendations for public health action
Methodology
Requires pre-computed indicators, tables, and plots in data/processed/.
Path to generated Word document (invisibly). Prints message with output location.
Generates a Word document with detailed age-stratified prevalence tables for all available indicators, organised by STEPS step.
render_data_book(config, output_dir = tempdir())
| config | A list from |
|---|---|
| output_dir | Directory for output reports (default |
Sections correspond to STEPS steps:
Step 1: Behavioural Risk Factors (tobacco, alcohol, diet, physical activity)
Step 2: Physical Measurements (overweight/obesity, blood pressure)
Step 3: Biochemical (glucose, cholesterol)
Requires pre-computed tables and plots in data/processed/.
Path to generated Word document (invisibly). Prints message with output location.
Generates an HTML or Word document with an overview of key NCD risk factor prevalence, including a summary table and sex-stratified charts.
render_fact_sheet(config, output_dir = tempdir(), format = c("html", "word"))
| config | A list from |
|---|---|
| output_dir | Directory for output reports (default |
| format | Output format: |
The fact sheet template uses pre-computed indicators, key_indicators, and plots (via .rds files in data/processed/). Requires rmarkdown, flextable, ggplot2, glue, patchwork packages.
Path to generated output file (invisibly). Prints message with output location.
Starts the interactive STEPS survey analysis app in the user's browser. The app provides a guided workflow: upload data, clean, set survey design, compute indicators, visualise results, and generate Word reports.
run_app(...)
| ... | Additional arguments passed to |
|---|---|
A Shiny app object (invisibly). Called for its side effect of launching the application.
## Not run:
run_app()
## End(Not run)
Imports raw data, cleans it, sets up the survey design, computes all indicators, generates publication-ready tables and plots, and optionally renders Word reports.
run_steps_pipeline(
  data_path,
  country_name = "Country Name",
  survey_year = 2024,
  age_min = 18,
  age_max = 69,
  output_dir = tempdir(),
  render_reports = TRUE,
  mapping_file = NULL
)
| data_path | Path to raw STEPS data file (CSV, Excel, Stata, or SPSS). |
|---|---|
| country_name | Country name for reports (default "Country Name"). |
| survey_year | Survey year (default 2024). |
| age_min | Minimum age in years (default 18). |
| age_max | Maximum age in years (default 69). |
| output_dir | Directory for all outputs (default |
| render_reports | Logical; render Word documents? (default TRUE). |
| mapping_file | Optional path to a filled column mapping template (Excel or CSV). If provided, uses |
This is the main entry point for end-to-end STEPS analysis.
A list with elements:
Original imported data frame
Cleaned and recoded data
Detected column mapping from detect_steps_columns()
survey::svydesign object
List of all computed indicator results by domain
Summary tibble of headline estimates
List of flextable::flextable objects
List of ggplot2::ggplot objects
Configuration list from steps_config()
## Not run:
# Auto-detect columns
result <- run_steps_pipeline("data/raw/steps_data.csv", country_name = "Senegal", survey_year = 2023)
result$key_indicators
result$plots$overview
# Use a custom column mapping
result <- run_steps_pipeline("data/raw/steps_data.csv", country_name = "Senegal", survey_year = 2023, mapping_file = "my_mapping.xlsx")
## End(Not run)
Exports all plots in a list to PNG files in the specified directory.
save_steps_plots(plots, output_dir = tempdir())
| plots | A named list of ggplot2 objects (from |
|---|---|
| output_dir | Output directory path (default |
Files are named:
01_overview_indicators.png (12x8 in)
02_by_sex_dashboard.png (12x8 in)
03_bp_by_age.png (10x6 in)
04_obesity_by_age.png (10x6 in)
All saved at 150 dpi with white background.
NULL (invisibly). Prints messages about saved files.
Creates up to three survey design objects — one per WHO STEPS Step —
each using the appropriate step-specific weight column
(wt_step1, wt_step2, wt_step3).
setup_survey_design(data)
| data | A data frame (typically from |
|---|---|
The returned object is a list of class "steps_designs" with elements
$step1, $step2, $step3. For backward compatibility it can also
be used directly as a single design (it delegates to $step1).
The function handles five design cases per step:
Full complex design: weights + strata + clusters
Weights + clusters, no strata
Weights + strata, no clusters
Weights only
Unweighted (simple random sampling)
Weights are used as-is without trimming, consistent with the WHO official STEPS analysis scripts.
A list of class "steps_designs" with three
survey::svydesign objects (step1, step2, step3).
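One way to picture the five design cases is as a function that chooses design arguments according to which columns are present. The sketch below uses only base R; the helper name design_args() and the default column names are assumptions, and this is not how the package necessarily implements the logic.

```r
# Hypothetical helper: picks design arguments from whichever columns exist.
design_args <- function(data, weight = "wt_step1",
                        strata = "stratum", cluster = "psu") {
  has <- function(v) !is.null(v) && v %in% names(data)
  list(
    ids     = if (has(cluster)) reformulate(cluster) else ~1,  # ~1: no clusters
    strata  = if (has(strata))  reformulate(strata)  else NULL,
    weights = if (has(weight))  reformulate(weight)  else NULL  # NULL: unweighted
  )
}

full    <- design_args(data.frame(wt_step1 = 1, stratum = 1, psu = 1))
no_str  <- design_args(data.frame(wt_step1 = 1, psu = 1))
wt_only <- design_args(data.frame(wt_step1 = 1))
srs     <- design_args(data.frame(x = 1))
```

If the survey package is installed, a list like this could be handed to survey::svydesign() together with the data, for example via do.call(survey::svydesign, c(full, list(data = my_data))), where my_data stands for the cleaned data frame.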
A named list of colours used in WHO STEPS reports and visualisations.
steps_colors()
A named list of hex colour codes.
steps_colors()$blue
Builds a configuration list that specifies data paths, design variables, and report parameters for the STEPS pipeline.
steps_config(
  data_path,
  country_name = "Country Name",
  survey_year = 2024,
  age_min = 18,
  age_max = 69,
  weight_var = "wt_final",
  strata_var = "stratum",
  cluster_var = "psu",
  bp_sbp_threshold = 140,
  bp_dbp_threshold = 90,
  bmi_overweight = 25,
  bmi_obese = 30,
  glucose_threshold = 7,
  glucose_impaired_threshold = 6.1,
  chol_threshold = 5
)
| data_path | Path to raw STEPS data file (CSV or Excel). |
|---|---|
| country_name | Country name for reports (default "Country Name"). |
| survey_year | Survey year (default 2024). |
| age_min | Minimum age (default 18). |
| age_max | Maximum age (default 69). |
| weight_var | Weight variable name (default "wt_final", set NULL if none). |
| strata_var | Strata variable name (default "stratum", set NULL if none). |
| cluster_var | Cluster variable name (default "psu", set NULL if none). |
| bp_sbp_threshold | SBP threshold for raised BP (default 140). |
| bp_dbp_threshold | DBP threshold for raised BP (default 90). |
| bmi_overweight | BMI threshold for overweight (default 25.0). |
| bmi_obese | BMI threshold for obesity (default 30.0). |
| glucose_threshold | Fasting glucose threshold in mmol/L (default 7.0). |
| glucose_impaired_threshold | Impaired fasting glucose threshold in mmol/L (default 6.1). |
| chol_threshold | Total cholesterol threshold in mmol/L (default 5.0). |
A list with elements:
data_path: Input file path
country_name: Country name
survey_year: Survey year
age_min, age_max: Age range
weight_var, strata_var, cluster_var: Design variable names
Threshold parameters for BP, BMI, glucose, cholesterol
## Not run:
cfg <- steps_config("data/steps_2023.csv", "Senegal", 2023)
cfg <- steps_config("data/steps.csv", "Mongolia", 2019, bp_sbp_threshold = 130, bp_dbp_threshold = 80)
## End(Not run)
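To show how threshold parameters like these might be applied downstream, here is a minimal base-R sketch. The rules are assumptions for illustration (raised BP when either reading meets its threshold, BMI classes closed on the left); the package's own recoding may differ, for example by also counting respondents on medication. The `cfg` list is a plain stand-in, not the object returned by `steps_config()`.

```r
# Stand-in configuration using the documented default thresholds
cfg <- list(bp_sbp_threshold = 140, bp_dbp_threshold = 90,
            bmi_overweight = 25, bmi_obese = 30)

# Made-up measurements for three respondents
d <- data.frame(sbp = c(118, 142, 135),
                dbp = c(76, 85, 92),
                bmi = c(22.4, 27.9, 31.2))

# Assumed rule: raised BP if SBP or DBP is at or above its threshold
d$raised_bp <- d$sbp >= cfg$bp_sbp_threshold | d$dbp >= cfg$bp_dbp_threshold

# Assumed BMI classes; right = FALSE puts exactly 25 in "Overweight"
# and exactly 30 in "Obese"
d$bmi_class <- cut(d$bmi,
                   breaks = c(-Inf, cfg$bmi_overweight, cfg$bmi_obese, Inf),
                   labels = c("Not overweight", "Overweight", "Obese"),
                   right = FALSE)
```

Keeping the cut-offs in one list means a country-specific definition (such as the 130/80 example above) only changes the configuration, not the recoding code.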
Produces a comprehensive data quality report covering digit preference, completeness, plausibility, and sampling weight diagnostics.
steps_data_quality(raw, cleaned, cols)
raw |
The raw (pre-cleaning) data frame, typically from
|
cleaned |
The cleaned data frame from |
cols |
Column mapping list from |
Digit preference / heaping is assessed using a Whipple-style heaping index: the ratio of the observed frequency of a terminal digit (0 or 5) to the frequency expected if terminal digits were uniformly distributed. An index of 1.0 indicates no preference, values above 1.5 indicate moderate heaping, and values above 2.0 indicate severe heaping.
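The index can be sketched in a few lines of base R. `heaping_index()` is a hypothetical helper, not a function exported by the package, and it assumes whole-number readings with the index computed separately for each digit; the package may round or pool digits differently.

```r
# Hypothetical helper: observed share of one terminal digit divided by
# the 1/10 share expected when terminal digits are uniform
heaping_index <- function(x, digit) {
  x <- x[!is.na(x)]
  last_digit <- round(x) %% 10
  mean(last_digit == digit) * 10
}

# Made-up systolic readings: 6 of the 10 values end in 0, 1 ends in 5
sbp <- c(120, 130, 140, 120, 125, 132, 118, 150, 110, 127)

idx_0 <- heaping_index(sbp, 0)  # well above 2, i.e. severe heaping on 0
idx_5 <- heaping_index(sbp, 5)  # close to 1, i.e. no preference for 5
```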
Completeness reports missing values for key STEPS variables grouped by Step (behavioural, physical, biochemical).
Plausibility counts values outside WHO-recommended ranges (e.g. height 100–250 cm, weight 20–300 kg, SBP 60–300 mmHg).
Weight diagnostics summarise the distribution of sampling weights and flag potential issues (high CV, zero/NA weights).
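The plausibility and weight checks described above can be approximated as follows. This is a sketch, not the package's implementation: the ranges are the ones quoted in the text, while the data, the `wt` column name, and the output structure are assumptions.

```r
# Made-up data with a few deliberately problematic values
d <- data.frame(
  height = c(165, 17.2, 180, NA),
  weight = c(70, 68, 350, 55),
  sbp    = c(120, 45, 130, 310),
  wt     = c(1.2, 0, NA, 2.4)   # hypothetical sampling weight column
)

# Plausible ranges quoted above (cm, kg, mmHg)
ranges <- list(height = c(100, 250), weight = c(20, 300), sbp = c(60, 300))

# Count non-missing values that fall outside each range
n_implausible <- vapply(names(ranges), function(v) {
  x <- d[[v]]
  sum(x < ranges[[v]][1] | x > ranges[[v]][2], na.rm = TRUE)
}, integer(1))

# Weight diagnostics: missing, zero, and coefficient of variation
usable <- !is.na(d$wt) & d$wt > 0
weight_diag <- list(
  n_na   = sum(is.na(d$wt)),
  n_zero = sum(d$wt == 0, na.rm = TRUE),
  cv     = sd(d$wt[usable]) / mean(d$wt[usable])
)
```

Missing values are left to the completeness check, so they are excluded here rather than counted as implausible.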
A list of class "steps_quality" with elements:
Terminal-digit tables and heaping indices for physical measurements (SBP, DBP, height, weight, waist).
Per-variable missingness counts and percentages, grouped by STEPS domain.
Summary of values outside plausible ranges.
Sampling weight distribution statistics.
Defines all standard tables from the WHO STEPS Epi Info report template. Each entry specifies the table metadata; generic compute and formatting functions use this registry to produce the full data book automatically.
steps_table_registry()
A list of table specification lists.
proportion: Single binary indicator, reported as % (95% CI) by age × sex. Most common type. Example: "Current smokers among all respondents."
mean: Continuous variable, reported as mean (95% CI) by age × sex. Example: "Mean BMI (kg/m²)."
category: Multi-level factor, reported as % per level (95% CI) by age × sex. Example: "BMI classifications (Underweight / Normal / Overweight / Obese)."
cascade: Diagnosis → treatment → control chain, with multiple proportions on nested denominators. Example: "Raised BP diagnosis, treatment and control."
combined: Summary of combined risk factors: 0, 1–2, 3–5 risk factors.
Unique short identifier (e.g., "T_smoking_current").
Data book section (e.g., "Tobacco Use", "Blood Pressure").
STEPS step number (1, 2, or 3).
Table title as shown in the data book.
One-line description from the WHO template.
One of: "proportion", "mean", "category", "cascade", "combined".
Column name(s) in the cleaned data frame to analyse. For proportion: single logical variable. For mean: single numeric variable. For category: single factor variable. For cascade: named list of logical variables.
NULL (= all respondents) or column name for subsetting (e.g., "current_alcohol" to restrict to drinkers).
For category type: named character vector of level labels.
Epi Info program name(s) for reference.
Display unit (e.g., "%", "mmHg", "cm", "kg/m²", "mmol/L").
STEPS instrument question codes used.
Logical. TRUE = 3 panels (Men/Women/Both); FALSE = 2 panels (Men/Women only, e.g., height/weight means). Default TRUE.
Calculates weighted means with 95% confidence intervals for a continuous variable, optionally stratified by a grouping variable.
svymn(formula, design, by = NULL, na.rm = TRUE)
formula |
A formula (e.g., |
design |
A survey design object (from |
by |
Optional formula for stratification (e.g., |
na.rm |
Logical; if TRUE (default), omit NA values. |
A data frame with columns:
estimate: estimated mean
lower: 95% CI lower bound
upper: 95% CI upper bound
se: standard error
If by is specified: grouping column(s) prepended
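For orientation, a data frame with these columns can be assembled directly from the survey package. The sketch below uses the `api` example data that ships with survey rather than STEPS data, and a normal-approximation interval from `confint()`; whether `svymn()` computes its interval the same way is an assumption.

```r
library(survey)

# Example data from the survey package (not STEPS data)
data(api, package = "survey")
dclus1 <- svydesign(ids = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Weighted mean, then a 95% normal-approximation confidence interval
m  <- svymean(~api00, dclus1, na.rm = TRUE)
ci <- confint(m, level = 0.95)

mean_tbl <- data.frame(
  estimate = as.numeric(coef(m)),
  lower    = as.numeric(ci[, 1]),
  upper    = as.numeric(ci[, 2]),
  se       = as.numeric(SE(m))
)
```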
Calculates weighted proportions (as percentages) with 95% confidence intervals for a yes/no variable, optionally stratified by a grouping variable.
svyprop(formula, design, by = NULL, na.rm = TRUE)
formula |
A formula (e.g., |
design |
A survey design object (from |
by |
Optional formula for stratification (e.g., |
na.rm |
Logical; if TRUE (default), omit NA values. |
A data frame with columns:
estimate: estimated proportion (%)
lower: 95% CI lower bound (%)
upper: 95% CI upper bound (%)
se: standard error (%)
If by is specified: grouping column(s) prepended
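A stratified version of this output can be sketched with `survey::svyby()`. The example uses the survey package's `api` data and an illustrative 0/1 indicator (`high`), and scales a Wald-type interval to percentages. These choices are assumptions: `svyprop()` may use a different interval method, and Wald limits can fall outside 0 to 100 for extreme proportions.

```r
library(survey)

# Example data from the survey package (not STEPS data)
data(api, package = "survey")
dclus1 <- svydesign(ids = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Illustrative yes/no variable coded 0/1
dclus1 <- update(dclus1, high = as.numeric(api00 >= 700))

# Weighted proportion within each school type, with Wald-type 95% CIs
by_type <- svyby(~high, ~stype, dclus1, svymean, na.rm = TRUE)
ci <- confint(by_type)

# Scale everything to percentages; grouping column comes first
prop_tbl <- data.frame(
  stype    = by_type$stype,
  estimate = 100 * by_type$high,
  lower    = 100 * as.numeric(ci[, 1]),
  upper    = 100 * as.numeric(ci[, 2]),
  se       = 100 * by_type$se
)
```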
Takes computed results from compute_table() or compute_all_tables()
and produces formatted flextable objects in the standard WHO STEPS
3-panel format (Men / Women / Both Sexes).
A clean, minimal ggplot2 theme styled with WHO STEPS colours.
theme_steps(base_size = 11)
base_size |
Base font size (default 11). |
A ggplot2::theme object.
library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) + geom_point() + theme_steps()