Package 'stepssurvey' reference manual

Title:	Analyse WHO STEPS Survey Data
Description:	Provides a complete analysis pipeline for the WHO STEPwise Approach to NCD Risk Factor Surveillance (STEPS) as described in Riley et al. (2016) <doi:10.2105/AJPH.2015.302962>. Imports raw survey data ('CSV', 'Excel', 'Stata', 'SPSS'), applies WHO-standard cleaning and recoding, sets up complex survey designs, computes all standard NCD indicators (tobacco, alcohol, diet, physical activity, anthropometry, blood pressure, biochemical), and generates publication-ready tables, visualisations, and 'Word'/'HTML' reports (fact sheet, data book, country report).
Authors:	Abhijit Pakhare [aut, cre], Ankur Joshi [aut], Lena Charlette [aut], WHO STEPS R Pipeline Contributors [ctb]
Maintainer:	Abhijit Pakhare <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.0
Built:	2026-05-11 22:14:20 UTC
Source:	https://github.com/cran/stepssurvey

stepssurvey: Analyse WHO STEPS Survey Data

Description

A complete analysis pipeline for the WHO STEPwise Approach to NCD Risk Factor Surveillance (STEPS).

Author(s)

Maintainer: Abhijit Pakhare [email protected]

Authors:

Ankur Joshi
Lena Charlette

Other contributors:

WHO STEPS R Pipeline Contributors [contributor]

Build all tables from computed results

Description

Build all tables from computed results

Usage

build_all_tables(results)

Arguments

results

A named list of results from compute_all_tables().

Value

A named list of flextable objects. NULL entries are excluded.

Build forest plot of key indicators with 95% CIs

Description

Creates a horizontal point-and-CI plot (forest plot style) for all key indicators, grouped by domain.

Usage

build_forest_plot(key_indicators, country_name, survey_year)

Arguments

key_indicators

A data frame with domain, indicator, estimate, lower, upper.

country_name

Country name for title.

survey_year

Survey year for title.

Value

A ggplot2 object.
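
As a rough sketch of the input layout described under Arguments, the data frame below carries the five expected columns. The values, the indicator labels, and the use of a percent scale are illustrative assumptions, not package output.

```r
# Hypothetical key-indicator table; all values are made up for illustration.
key_indicators <- data.frame(
  domain    = c("Tobacco", "Blood pressure", "Anthropometry"),
  indicator = c("Current tobacco use", "Raised blood pressure", "Obesity"),
  estimate  = c(20.1, 27.4, 12.8),
  lower     = c(17.5, 24.2, 10.6),
  upper     = c(22.9, 30.9, 15.3),
  stringsAsFactors = FALSE
)

# Sanity checks worth running before plotting: the required columns exist
# and every interval brackets its point estimate.
required     <- c("domain", "indicator", "estimate", "lower", "upper")
has_columns  <- all(required %in% names(key_indicators))
intervals_ok <- all(key_indicators$lower <= key_indicators$estimate &
                    key_indicators$estimate <= key_indicators$upper)
```

In practice this table would normally come from the key_indicators element returned by compute_all_indicators() rather than being typed by hand.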

Build radar / spider chart of NCD risk factor profile

Description

Creates a radar-style chart showing prevalence of key risk factors on a polar coordinate system for quick visual comparison.

Usage

build_radar_plot(key_indicators, country_name, survey_year)

Arguments

key_indicators

A data frame with domain, indicator, estimate.

country_name

Country name for title.

survey_year

Survey year for title.

Value

A ggplot2 object.

Build publication-ready STEPS visualizations

Description

Generates a list of ggplot2 plots showing key NCD risk factor prevalence with 95% confidence intervals, stratified by sex and age group.

Usage

build_steps_plots(indicators, key_indicators, country_name, survey_year)

Arguments

indicators

A list of indicator results from compute_all_indicators().

key_indicators

A data frame with key indicators (domain, indicator, estimate, lower, upper).

country_name

Country name for plot titles.

survey_year

Survey year for plot titles.

Details

All plots use the WHO STEPS colour scheme and professional styling. Error bars represent 95% confidence intervals. Prevalence values are displayed on bars/points with light background text.

Value

A named list of ggplot2 objects:

overview: Horizontal bar chart of key indicators
tobacco_by_sex: Sex-stratified tobacco use
bp_by_sex: Sex-stratified blood pressure
obesity_by_sex: Sex-stratified overweight/obesity
glucose_by_sex: Sex-stratified blood glucose
bp_by_age: Age-stratified blood pressure with ribbon CI
obesity_by_age: Age-stratified overweight/obesity with ribbon CI
sex_dashboard: Combined 2x2 dashboard of sex-stratified charts (if >=2 sex plots available)

NULL entries are preserved in the list.

Examples


  test_data <- generate_test_data(n = 500, seed = 42)
  cols <- detect_steps_columns(test_data)
  clean <- clean_steps_data(test_data, cols)
  design <- setup_survey_design(clean)
  all_ind <- compute_all_indicators(design)
  plots <- build_steps_plots(all_ind$results, all_ind$key_indicators, "Test", 2023)
  names(plots)



Build survey-weighted tables for STEPS indicators

Description

Generates formatted flextable objects for all available STEPS indicators, with rows for age groups and columns for both sexes combined, males, and females. Tables include 95% confidence intervals.

Usage

build_steps_tables(indicators)

Arguments

indicators

A list of indicator results from compute_all_indicators(), containing elements like tobacco, alcohol, diet_pa, anthropometry, blood_pressure, biochemical, etc. Each indicator list should contain *_total, *_by_sex, and *_by_age elements.

Details

Each table has age groups as rows and prevalence (with 95% CI) as a column. The last row shows the total (age-standardised) estimate. Column header styling uses WHO STEPS branding (dark blue background).

Value

A named list of flextable objects, one per indicator. Names correspond to indicators (e.g., current_tobacco, raised_bp). NULL entries are excluded. Prints count of tables generated.
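
To illustrate the "prevalence (with 95% CI)" cells mentioned under Details, the following sketch formats an estimate and its interval into one string. The helper name format_ci and the exact cell layout are assumptions; the package may format its cells differently.

```r
# Hypothetical helper: combine an estimate and its 95% CI into one table cell.
format_ci <- function(estimate, lower, upper, digits = 1) {
  # Build a format string such as "%.1f (%.1f-%.1f)" for the chosen precision.
  fmt <- paste0("%.", digits, "f (%.", digits, "f-%.", digits, "f)")
  sprintf(fmt, estimate, lower, upper)
}

# Two made-up rows (for example, males and females).
cells <- format_ci(
  estimate = c(32.14, 8.06),
  lower    = c(28.4, 6.1),
  upper    = c(35.9, 10.4)
)
```

Here cells holds "32.1 (28.4-35.9)" and "8.1 (6.1-10.4)", ready to be placed in a table column.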

Examples


  test_data <- generate_test_data(n = 500, seed = 42)
  cols <- detect_steps_columns(test_data)
  clean <- clean_steps_data(test_data, cols)
  design <- setup_survey_design(clean)
  all_ind <- compute_all_indicators(design)
  tables <- build_steps_tables(all_ind$results)
  names(tables)

Build a formatted table from a computed result

Description

Dispatches to the appropriate formatting method based on table type.

Usage

build_table(result)

Arguments

result

A result list from compute_table().

Value

A flextable object, or NULL if the table is not available.

Clean and recode WHO STEPS data

Description

Processes raw STEPS survey data: renames columns, coerces types, derives standard indicators, handles missing values, and applies plausibility checks.

Usage

clean_steps_data(
  data,
  cols,
  age_min = 18,
  age_max = 69,
  bp_sbp_threshold = 140,
  bp_dbp_threshold = 90,
  bmi_overweight = 25,
  bmi_obese = 30,
  glucose_threshold = 7,
  glucose_impaired_threshold = 6.1,
  chol_threshold = 5
)

Arguments

data

A data frame (typically from import_steps_data()).

cols

A named list of column names, as returned by detect_steps_columns().

age_min

Minimum age for inclusion (default 18).

age_max

Maximum age for inclusion (default 69).

bp_sbp_threshold

SBP threshold for raised BP (default 140; Mongolia uses 130).

bp_dbp_threshold

DBP threshold for raised BP (default 90; Mongolia uses 80).

bmi_overweight

BMI threshold for overweight (default 25.0).

bmi_obese

BMI threshold for obesity (default 30.0).

glucose_threshold

Fasting glucose threshold for raised glucose / diabetes in mmol/L (default 7.0).

glucose_impaired_threshold

Fasting glucose threshold for impaired fasting glucose in mmol/L (default 6.1).

chol_threshold

Total cholesterol threshold for raised cholesterol in mmol/L (default 5.0).

Details

The function performs the following transformations:

Renames columns to standard names (age, sex, wt_final, etc.)
Converts numeric strings to appropriate types
Restricts age to [age_min, age_max]
Creates WHO standard age groups (18-24, 25-34, etc.)
Harmonises sex coding to Male/Female
Derives body mass index (BMI) and categories
Averages blood pressure readings (last 2 of 3)
Recodes yes/no variables to logical
Creates derived risk indicators (raised BP, diabetes, etc.)
Applies plausibility checks to measurements
Drops records with missing age or sex

Value

A data frame with standardised and derived variables, ready for survey design setup.

Compute Alcohol Use Indicators

Description

Calculates prevalence of alcohol use from a survey design object. Computes proportions of current alcohol use and heavy episodic drinking, stratified by sex and age group where available.

Usage

compute_alcohol_indicators(design)

Arguments

design

A survey design object from setup_survey_design().

Value

A named list of survey estimates. Each element contains proportion estimates (as tibble with columns: estimate, lower, upper, etc.) for:

current_alcohol_total: current alcohol use, overall
current_alcohol_by_sex: current alcohol use, by sex
current_alcohol_by_age: current alcohol use, by age group
heavy_episodic_total: heavy episodic drinking, overall
heavy_episodic_by_sex: heavy episodic drinking, by sex
heavy_episodic_by_age: heavy episodic drinking, by age group

Elements are returned only if the corresponding variables exist in design.

Examples


  test_data <- generate_test_data(n = 500, seed = 42)
  cols <- detect_steps_columns(test_data)
  clean <- clean_steps_data(test_data, cols)
  design <- setup_survey_design(clean)
  alcohol_results <- compute_alcohol_indicators(design)


Compute All STEPS Indicators

Description

Runs all indicator modules (tobacco, alcohol, diet & physical activity, anthropometry, blood pressure, and biochemical), using the appropriate step-specific survey design for each domain per WHO STEPS methodology:

Step 1 (behavioural): tobacco, alcohol, diet & physical activity
Step 2 (physical): anthropometry, blood pressure
Step 3 (biochemical): biochemical measures

Usage

compute_all_indicators(design)

Arguments

design

A steps_designs list from setup_survey_design() (with elements $step1, $step2, $step3), or a single survey::svydesign object for backward compatibility.

Value

A list with two elements:

results: a named list containing indicator results grouped by domain (tobacco, alcohol, diet_pa, anthropometry, blood_pressure, biochemical)
key_indicators: a tibble with columns domain, indicator, estimate, lower, and upper, summarising headline estimates across all domains
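
The domain-to-step pairing from the Description can be pictured as a simple lookup. The names domain_step and pick_design are hypothetical, and the strings below stand in for real survey design objects. How the package actually selects a design is not shown in this manual.

```r
# Domain-to-step mapping as stated in the Description.
domain_step <- c(
  tobacco        = "step1",
  alcohol        = "step1",
  diet_pa        = "step1",
  anthropometry  = "step2",
  blood_pressure = "step2",
  biochemical    = "step3"
)

# Hypothetical selector: use the step-specific design when a step1/step2/step3
# list is supplied, otherwise fall back to the single design that was passed.
pick_design <- function(design, domain) {
  step_names <- c("step1", "step2", "step3")
  if (is.list(design) && all(step_names %in% names(design))) {
    design[[domain_step[[domain]]]]
  } else {
    design
  }
}

# Placeholder strings stand in for survey design objects.
designs <- list(
  step1 = "design for step 1",
  step2 = "design for step 2",
  step3 = "design for step 3"
)
bp_design     <- pick_design(designs, "blood_pressure")
legacy_design <- pick_design("single design", "tobacco")
```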

Examples


  test_data <- generate_test_data(n = 500, seed = 42)
  cols <- detect_steps_columns(test_data)
  clean <- clean_steps_data(test_data, cols)
  design <- setup_survey_design(clean)
  all_indicators <- compute_all_indicators(design)
  names(all_indicators$results)

Compute all tables from the registry

Description

Iterates through the full steps_table_registry() and computes every table that has available data. Returns a named list of results.

Usage

compute_all_tables(designs, data = NULL)

Arguments

designs

A list of survey designs, with elements step1, step2, step3 (as returned by setup_survey_design()).

data

The cleaned data frame.

Value

A named list of table results (from compute_table()). Only entries with available == TRUE are included.
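
The "compute everything, keep what is available" pattern might look roughly like this. The registry entries, their id and var fields, and the availability rule are invented for the sketch and do not reflect the real steps_table_registry() contents.

```r
# Hypothetical registry: each entry names the variable it needs.
registry <- list(
  list(id = "T1", var = "current_tobacco"),
  list(id = "B1", var = "raised_bp"),
  list(id = "C1", var = "raised_chol")
)

# Columns assumed to be present in the cleaned data.
data_cols <- c("current_tobacco", "raised_bp")

# "Compute" each table; here only the availability flag is filled in.
results <- lapply(registry, function(entry) {
  list(id = entry$id, available = entry$var %in% data_cols)
})
names(results) <- vapply(results, function(r) r$id, character(1))

# Keep only the entries whose required variable exists.
results <- Filter(function(r) isTRUE(r$available), results)
```

After filtering, the list contains the T1 and B1 entries only, because raised_chol is absent from data_cols.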

Compute Anthropometry Indicators

Description

Calculates prevalence of overweight, obesity, and central obesity, plus mean BMI and waist circumference, from a survey design object.

Usage

compute_anthropometry_indicators(design)

Arguments

design

A survey design object from setup_survey_design().

Value

A named list of survey estimates. Each element contains estimates (as tibble with columns: estimate, lower, upper, etc.) for:

overweight_obese_total: overweight or obese (BMI >=25), overall
overweight_obese_by_sex: overweight or obese, by sex
overweight_obese_by_age: overweight or obese, by age group
obese_total: obese (BMI >=30), overall
obese_by_sex: obese, by sex
obese_by_age: obese, by age group
central_obesity_total: central obesity, overall
central_obesity_by_sex: central obesity, by sex
central_obesity_by_age: central obesity, by age group
bmi_mean_total: mean BMI, overall
bmi_mean_by_sex: mean BMI, by sex
waist_cm_mean_total: mean waist circumference, overall
waist_cm_mean_by_sex: mean waist circumference, by sex

Elements are returned only if the corresponding variables exist in design.

Examples


  test_data <- generate_test_data(n = 500, seed = 42)
  cols <- detect_steps_columns(test_data)
  clean <- clean_steps_data(test_data, cols)
  design <- setup_survey_design(clean)
  anthropometry_results <- compute_anthropometry_indicators(design)


Compute Biochemical Indicators

Description

Calculates prevalence of raised glucose, diabetes, impaired fasting glucose, and raised cholesterol, plus mean fasting glucose and total cholesterol, from a survey design object.

Usage

compute_biochemical_indicators(design)

Arguments

design

A survey design object from setup_survey_design().

Value

A named list of survey estimates. Each element contains estimates (as tibble with columns: estimate, lower, upper, etc.) for:

raised_glucose_total: raised fasting glucose, overall
raised_glucose_by_sex: raised fasting glucose, by sex
raised_glucose_by_age: raised fasting glucose, by age group
diabetes_total: diabetes, overall
diabetes_by_sex: diabetes, by sex
diabetes_by_age: diabetes, by age group
impaired_glucose_total: impaired fasting glucose, overall
impaired_glucose_by_sex: impaired fasting glucose, by sex
impaired_glucose_by_age: impaired fasting glucose, by age group
raised_chol_total: raised total cholesterol, overall
raised_chol_by_sex: raised total cholesterol, by sex
raised_chol_by_age: raised total cholesterol, by age group
fasting_glucose_mean_total: mean fasting glucose, overall
fasting_glucose_mean_by_sex: mean fasting glucose, by sex
total_chol_mean_total: mean total cholesterol, overall
total_chol_mean_by_sex: mean total cholesterol, by sex

Elements are returned only if the corresponding variables exist in design.

Examples


  test_data <- generate_test_data(n = 500, seed = 42)
  cols <- detect_steps_columns(test_data)
  clean <- clean_steps_data(test_data, cols)
  design <- setup_survey_design(clean)
  biochemical_results <- compute_biochemical_indicators(design)


Compute Blood Pressure Indicators

Description

Calculates prevalence of raised blood pressure and mean systolic and diastolic blood pressure from a survey design object.

Usage

compute_bp_indicators(design)

Arguments

design

A survey design object from setup_survey_design().

Value

A named list of survey estimates. Each element contains estimates (as tibble with columns: estimate, lower, upper, etc.) for:

raised_bp_total: raised blood pressure, overall
raised_bp_by_sex: raised blood pressure, by sex
raised_bp_by_age: raised blood pressure, by age group
mean_sbp_mean_total: mean systolic BP, overall
mean_sbp_mean_by_sex: mean systolic BP, by sex
mean_sbp_mean_by_age: mean systolic BP, by age group
mean_dbp_mean_total: mean diastolic BP, overall
mean_dbp_mean_by_sex: mean diastolic BP, by sex
mean_dbp_mean_by_age: mean diastolic BP, by age group

Elements are returned only if the corresponding variables exist in design.

Examples


  test_data <- generate_test_data(n = 500, seed = 42)
  cols <- detect_steps_columns(test_data)
  clean <- clean_steps_data(test_data, cols)
  design <- setup_survey_design(clean)
  bp_results <- compute_bp_indicators(design)


Compute Diet and Physical Activity Indicators

Description

Calculates prevalence of insufficient physical activity and low fruit & vegetable intake, plus mean metabolic equivalent (MET) values, from a survey design object.

Usage

compute_diet_pa_indicators(design)

Arguments

design

A survey design object from setup_survey_design().

Value

A named list of survey estimates. Each element contains estimates (as tibble with columns: estimate, lower, upper, etc.) for:

insufficient_pa_total: insufficient physical activity, overall
insufficient_pa_by_sex: insufficient physical activity, by sex
insufficient_pa_by_age: insufficient physical activity, by age group
low_fruit_veg_total: low fruit & vegetable intake, overall
low_fruit_veg_by_sex: low fruit & vegetable intake, by sex
low_fruit_veg_by_age: low fruit & vegetable intake, by age group
met_mean_total: mean MET (if available)
met_mean_by_sex: mean MET by sex (if available)

Elements are returned only if the corresponding variables exist in design.

Examples


  test_data <- generate_test_data(n = 500, seed = 42)
  cols <- detect_steps_columns(test_data)
  clean <- clean_steps_data(test_data, cols)
  design <- setup_survey_design(clean)
  diet_pa_results <- compute_diet_pa_indicators(design)


Generic Compute Engine for WHO STEPS Tables

Description

Takes a table specification from steps_table_registry() and a survey design object, and produces the survey-weighted estimates needed to fill the standard WHO STEPS data book table.

Compute a single table from a registry entry

Description

This is the main workhorse: given one registry entry and a survey design, it dispatches to the appropriate method based on entry$type and returns a standardised result list.

Usage

compute_table(entry, design, data = NULL)

Arguments

entry

A single list element from steps_table_registry().

design

A survey design object (from survey::svydesign()).

data

The cleaned data frame (used for variable availability checks).

Value

A list with:

id: Table identifier.
title: Table title.
type: Table type.
available: Logical: TRUE if the required variable(s) exist.
results: A list of data frames: For proportion: total, by_sex, by_age (each with estimate, lower, upper). For mean: total, by_sex, by_age (each with estimate, lower, upper). For category: total, by_sex, by_age (each with level, estimate, lower, upper). For cascade: named list of proportion results.
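
Dispatching on entry$type can be sketched with switch(). The function below is a simplified, unweighted stand-in that works on a plain vector instead of a survey design. The name compute_table_sketch, the percent scale, and the reduced result shape (a single data frame, no by_sex or by_age) are assumptions.

```r
# Hypothetical, unweighted stand-in for type-based dispatch.
compute_table_sketch <- function(entry, x) {
  # A table is "available" when the variable exists and is not entirely NA.
  available <- !is.null(x) && !all(is.na(x))
  results <- NULL
  if (available) {
    results <- switch(
      entry$type,
      proportion = data.frame(estimate = 100 * mean(x, na.rm = TRUE)),
      mean       = data.frame(estimate = mean(x, na.rm = TRUE)),
      category   = {
        shares <- prop.table(table(x))
        data.frame(level = names(shares), estimate = 100 * as.numeric(shares))
      },
      stop("Unsupported table type: ", entry$type)
    )
  }
  list(id = entry$id, title = entry$title, type = entry$type,
       available = available, results = results)
}

# Made-up registry entry and data.
entry  <- list(id = "T1", title = "Current smokers", type = "proportion")
smoker <- c(TRUE, FALSE, FALSE, TRUE, NA)
out     <- compute_table_sketch(entry, smoker)
missing <- compute_table_sketch(entry, NULL)
```

A real implementation would replace the plain means with survey-weighted estimates and confidence intervals.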

Compute Tobacco Use Indicators

Description

Calculates prevalence of tobacco use from a survey design object. Computes proportions of current and daily tobacco use, stratified by sex and age group where available.

Usage

compute_tobacco_indicators(design)

Arguments

design

A survey design object from setup_survey_design().

Details

When both smoking and smokeless tobacco variables are present, current_tobacco_any (either smoking or smokeless) is preferred as the headline tobacco indicator. The function also reports current_smoker and current_smokeless separately if available.

Value

A named list of survey estimates. Each element contains proportion estimates (as tibble with columns: estimate, lower, upper, etc.) for:

current_tobacco_any_total/by_sex/by_age: any current tobacco use (smoking or smokeless) – preferred headline variable
current_tobacco_total/by_sex/by_age: current tobacco smoking
current_smoker_total/by_sex/by_age: current smoker
current_smokeless_total/by_sex/by_age: current smokeless tobacco
daily_tobacco_total/by_sex/by_age: daily tobacco use

Only elements for variables present in design are returned.

Examples


  test_data <- generate_test_data(n = 500, seed = 42)
  cols <- detect_steps_columns(test_data)
  clean <- clean_steps_data(test_data, cols)
  design <- setup_survey_design(clean)
  tobacco_results <- compute_tobacco_indicators(design)


Detect a STEPS column by alias

Description

Tries to find a column in the data matching one of several candidate names (case-insensitive).

Usage

detect_col(data, candidates, label = NULL)

Arguments

data

A data frame.

candidates

Character vector of possible column names.

label

Optional label for progress messages.

Value

The matched column name (character) or NULL.
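
Case-insensitive matching against an ordered list of candidates can be sketched as follows. The name detect_col_sketch and the toy column names are hypothetical, and the progress messages that the label argument controls are left out.

```r
# Hypothetical re-creation of case-insensitive alias matching.
detect_col_sketch <- function(data, candidates) {
  cols <- names(data)
  for (candidate in candidates) {
    # Compare in lower case so "M11" matches the alias "m11".
    hit <- which(tolower(cols) == tolower(candidate))
    if (length(hit) > 0) {
      return(cols[hit[1]])  # return the name as spelled in the data
    }
  }
  NULL  # no candidate matched
}

survey <- data.frame(Age = c(34, 51), SEX = c(1, 2), M11 = c(162, 175))
height_col <- detect_col_sketch(survey, c("m11", "m1", "height"))
hip_col    <- detect_col_sketch(survey, c("hip", "m15"))
```

Here height_col is "M11" and hip_col is NULL, matching the documented return value of a column name or NULL.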

Auto-detect all standard STEPS columns

Description

Scans a data frame for standard WHO STEPS variable names across versions 3.1 and 3.2. Aliases are listed in priority order: the first match wins, so put the most specific / unambiguous name first.

Usage

detect_steps_columns(data)

Arguments

data

A data frame (typically from import_steps_data()).

Details

WHO STEPS reorganised variable codes between v3.1 and v3.2:

v3.1 / Epi Info codes (still common in many country datasets): B1-B6 = blood-pressure readings, B7 = BP meds, C1 = fasting glucose, C5 = DM meds, C6 = total cholesterol, C10 = chol meds, M1 = height, M2 = weight, M3 = waist.

v3.2 instrument codes: M4a/M5a/M6a = SBP readings, M4b/M5b/M6b = DBP readings, M7 = BP meds, M11 = height, M12 = weight, M14 = waist, M15 = hip, B5 = fasting glucose, B6 = DM meds, B8 = total cholesterol, B9 = chol meds, B16 = triglycerides, B17 = HDL cholesterol, C1 = sex, C3 = age.

The function includes aliases for both versions so datasets from either instrument version are detected automatically.

Value

A named list of detected column names (or NULL for missing).
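
The "first match wins" rule across instrument versions can be illustrated with a small alias table. The variable codes come from the Details above, but the alias order, the helper first_match, and the toy dataset are assumptions rather than the package's actual alias lists.

```r
# Hypothetical helper: return the first alias found in the data, else NULL.
first_match <- function(data, candidates) {
  idx <- match(tolower(candidates), tolower(names(data)))
  idx <- idx[!is.na(idx)]
  if (length(idx) == 0) NULL else names(data)[idx[1]]
}

# Aliases in an assumed priority order: v3.2 code first, then v3.1 code.
aliases <- list(
  height = c("m11", "m1"),
  weight = c("m12", "m2"),
  waist  = c("m14", "m3")
)

# Toy dataset using v3.1 codes, with no waist measurement.
v31 <- data.frame(M1 = c(162, 175), M2 = c(58, 81))

# lapply() keeps NULL results, so undetected variables stay in the list.
cols <- lapply(aliases, function(candidates) first_match(v31, candidates))
```

The result has one entry per standard variable: height and weight resolve to "M1" and "M2", while waist is NULL.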

Generate simulated STEPS test data

Description

Creates a realistic simulated dataset matching WHO STEPS survey structure. Includes sampling design variables, demographics, and measures from all three steps (behavioural, physical, biochemical).

Usage

generate_test_data(n = 3000, seed = 42)

Arguments

n

Number of observations (default 3000).

seed

Random seed for reproducibility (default 42).

Details

Simulation parameters are realistic for low-middle income settings:

Tobacco prevalence: 32% males, 8% females
Alcohol current use: 55% males, 28% females
Heavy episodic drinking: 35% of drinkers
Physical activity: MET-minutes/week, mean 1800, SD 1200
Diet: Fruit/veg days and servings per day (0-7, 1-5)
BP increases with age; medication prevalence 12%
Glucose: mean 5.2 mmol/L, increases with age
Total cholesterol: mean 4.8 mmol/L

Use this function for:

Testing the STEPS pipeline
Developing reports before real data arrives
Training analysts on the analysis system

Value

A data frame with n rows and the following columns:

stratum: Strata identifier (S1-S5)
psu: Primary sampling unit (PSU1-PSU40)
wt_final: Final analysis weight
sex: Sex (1=Male, 2=Female)
age: Age in years (18-69)
Step 1 (behavioural): t1, t2 (tobacco), a1, a5 (alcohol), met_total (physical activity), d1-d4 (diet)
Step 2 (physical): m1 (height), m2 (weight), m3 (waist), b1-b6 (blood pressure), b7 (BP medication)
Step 3 (biochemical): c1_mmol (glucose), c5 (DM meds), c6 (cholesterol), c10 (cholesterol meds)
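
As an illustration of how sex-specific prevalences like those under Details could be simulated, the sketch below draws a binary tobacco variable with the documented 32% and 8% targets. It is not the package's generator; the sample size and variable names are assumptions.

```r
# Reproducible draw, in the spirit of the seed argument.
set.seed(42)
n <- 2000

# Sex coded 1 = Male, 2 = Female, as in the documented output.
sex <- sample(c(1, 2), size = n, replace = TRUE)

# Target prevalence depends on sex: 32% for males, 8% for females.
p_tobacco <- ifelse(sex == 1, 0.32, 0.08)
tobacco   <- rbinom(n, size = 1, prob = p_tobacco)

# Observed prevalence by sex; values vary around the targets by chance.
prev_by_sex <- tapply(tobacco, sex, mean)
```

Resetting the seed before the draw reproduces exactly the same simulated values, which is what makes a seeded generator useful for tests and training material.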

Examples


  # Generate smaller dataset for quick testing
  test_data <- generate_test_data(n = 500, seed = 123)
  head(test_data)



Get table registry entries by section

Description

Get table registry entries by section

Usage

get_registry_by_section(section = NULL)

Arguments

section

Section name (e.g., "Tobacco Use", "Blood Pressure"). If NULL, returns all entries.

Value

A filtered list of registry entries.

Get table registry entries by step

Description

Get table registry entries by step

Usage

get_registry_by_step(step)

Arguments

step

STEPS step number (1, 2, or 3).

Value

A filtered list of registry entries.

Import raw STEPS survey data

Description

Reads a raw STEPS data file (CSV, Excel, Stata, or SPSS) and standardises column names to lowercase with underscores.

Usage

import_steps_data(path)

Arguments

path

Character. Path to the data file.

Value

A data frame with cleaned column names.

Examples

## Not run: 
raw <- import_steps_data("data/raw/steps_data.csv")

## End(Not run)

List all available sections in the registry

Description

List all available sections in the registry

Usage

list_registry_sections()

Value

Character vector of unique section names.

Plot completeness heatmap across STEPS domains

Description

Creates a tile heatmap showing missingness percentage by variable, grouped by STEPS domain.

Usage

plot_completeness(dq)

Arguments

dq

A steps_quality object from steps_data_quality().

Value

A ggplot object.
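
The quantity shown in each tile, missingness percentage by variable, can be computed in base R as sketched below. The column names and the domain labels are hypothetical, and the steps_quality object itself may store this information differently.

```r
# Toy data with hypothetical column names and some missing values.
survey <- data.frame(
  tobacco = c(1, 2, NA, 1),
  height  = c(162, NA, NA, 175),
  glucose = c(5.1, 5.6, 6.2, 4.9)
)

# Percentage of missing values per variable.
pct_missing <- 100 * colMeans(is.na(survey))

# Attach an assumed STEPS domain to each variable for grouping.
completeness <- data.frame(
  variable    = names(pct_missing),
  domain      = c("Step 1", "Step 2", "Step 3"),
  pct_missing = unname(pct_missing),
  stringsAsFactors = FALSE
)
```

In this toy example tobacco is 25% missing, height 50%, and glucose 0%.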

Plot digit preference histogram for a physical measurement

Description

Creates a bar chart of terminal-digit frequencies with the expected uniform line at 10%.

Usage

plot_digit_preference(dq, measure)

Arguments

dq

A steps_quality object from steps_data_quality().

measure

Character: one of "SBP", "DBP", "Height", "Weight", "Waist".

Value

A ggplot object.
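
Terminal-digit frequencies can be tabulated as in the sketch below, which assumes whole-number readings. The readings are made up; measurements recorded with decimals would need the last recorded digit extracted differently.

```r
# Made-up systolic readings recorded as whole numbers.
sbp <- c(120, 130, 140, 128, 150, 132, 110, 120, 135, 140)

# Terminal digit of each reading; keep all ten levels so unused digits show 0.
last_digit <- factor(sbp %% 10, levels = 0:9)

# Percentage of readings ending in each digit.
digit_pct <- 100 * prop.table(table(last_digit))

# Without digit preference each digit would be expected near 10%.
expected_pct <- 100 / 10
```

In this example 7 of the 10 readings end in zero, well above the 10% expected under a uniform distribution, which is the kind of rounding pattern the plot is meant to reveal.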

Plot sampling weight distribution

Description

Creates a histogram of sampling weights with summary statistics.

Usage

plot_weights(dq, step = "weight_step1")

Arguments

dq

A steps_quality object from steps_data_quality().

step

Character: which weight to plot ("weight_step1", "weight_step2", or "weight_step3"). Defaults to "weight_step1".

Value

A ggplot object.

Read a column mapping file

Description

Reads a filled-in column mapping template (Excel or CSV) and returns a named list suitable for passing to clean_steps_data(). The mapping file should have at least two columns: one with the standard variable name (column A) and one with the user's column name (column C in the template, or the third column).

Usage

read_column_mapping(path, data = NULL)

Arguments

path

Path to the filled mapping file (.xlsx or .csv).

data

Optional data frame. If provided, the function validates that every mapped column actually exists in the data.

Details

This function is the manual alternative to detect_steps_columns(). Use it when your dataset has non-standard variable names that auto-detection cannot resolve.

A blank template can be obtained from system.file("templates", "column_mapping_template.xlsx", package = "stepssurvey") or downloaded from the Shiny app.

The function ignores domain-header rows (rows where column A is all-caps with no entry in column C) and skips any row where the user's column name is blank.

Value

A named list where names are standard variable identifiers (e.g. "age", "sbp1") and values are the corresponding column names in the user's dataset. Unmapped variables are set to NULL.
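
The row-filtering rules from Details can be sketched on an in-memory table shaped like the template (standard name in the first column, the user's column name in the third). The column headers, the row contents, and the validation step are illustrative assumptions.

```r
# Hypothetical mapping table mimicking the template layout.
mapping <- data.frame(
  standard    = c("DEMOGRAPHICS", "age", "sex", "BLOOD PRESSURE", "sbp1", "sbp2"),
  description = c("", "Age in years", "Sex", "", "First systolic reading",
                  "Second systolic reading"),
  user_column = c("", "q_age", "gender", "", "bp_sys_1", ""),
  stringsAsFactors = FALSE
)

# Domain-header rows: all-caps name in column A and nothing in column C.
is_blank  <- is.na(mapping$user_column) | trimws(mapping$user_column) == ""
is_header <- is_blank & mapping$standard == toupper(mapping$standard)
vars <- mapping[!is_header, ]

# Named list of mappings; variables left blank by the user become NULL.
cols <- lapply(seq_len(nrow(vars)), function(i) {
  value <- trimws(vars$user_column[i])
  if (is.na(value) || value == "") NULL else value
})
names(cols) <- vars$standard

# Optional validation: every mapped column should exist in the data.
data_names   <- c("q_age", "gender", "bp_sys_1")
missing_cols <- setdiff(unlist(cols), data_names)
```

The resulting list maps age, sex, and sbp1 to the user's names and leaves sbp2 as NULL; missing_cols is empty because every mapped column is present.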

Examples

## Not run: 
  cols <- read_column_mapping("my_mapping.xlsx")
  raw  <- import_steps_data("survey.dta")
  clean <- clean_steps_data(raw, cols)

## End(Not run)


Render STEPS Country Report

Description

Generates a comprehensive Word document with executive summary, indicator-by-indicator analysis, and recommendations for public health action.

Usage

render_country_report(config, output_dir = tempdir())

Arguments

config

A list from steps_config() with survey metadata. Expected to have country_name, survey_year, age_min, age_max.

output_dir

Directory for output reports (default tempdir()).

Details

Sections include:

Executive summary with key findings
Tobacco use
Physical activity
Overweight and obesity
Blood pressure
Blood glucose and cholesterol
Recommendations for public health action
Methodology

Requires pre-computed indicators, tables, and plots in data/processed/.

Value

Path to generated Word document (invisibly). Prints message with output location.

Render STEPS Data Book report

Description

Generates a Word document with detailed age-stratified prevalence tables for all available indicators, organized by STEPS step.

Usage

render_data_book(config, output_dir = tempdir())

Arguments

config

A list from steps_config() with survey metadata. Expected to have country_name, survey_year, age_min, age_max.

output_dir

Directory for output reports (default tempdir()).

Details

Sections correspond to STEPS steps:

Step 1: Behavioural Risk Factors (tobacco, alcohol, diet, physical activity)
Step 2: Physical Measurements (overweight/obesity, blood pressure)
Step 3: Biochemical (glucose, cholesterol)

Requires pre-computed tables and plots in data/processed/.

Value

Path to generated Word document (invisibly). Prints message with output location.

Render STEPS Fact Sheet report

Description

Generates an HTML or Word document with an overview of key NCD risk factor prevalence, including a summary table and sex-stratified charts.

Usage

render_fact_sheet(config, output_dir = tempdir(), format = c("html", "word"))

Arguments

config

A list from steps_config() with survey metadata and paths. Expected to have country_name, survey_year, age_min, age_max.

output_dir

Directory for output reports (default tempdir()).

format

Output format: "html" for self-contained HTML (default) or "word" for Word (.docx).

Details

The fact sheet template uses pre-computed indicators, key_indicators, and plots (via .rds files in data/processed/). Requires rmarkdown, flextable, ggplot2, glue, patchwork packages.

Value

Path to generated output file (invisibly). Prints message with output location.
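
A signature of the form format = c("html", "word") is the usual R idiom for a choice argument resolved with match.arg(), where the first element acts as the default. Whether render_fact_sheet() uses match.arg() internally is an assumption; the helper pick_format and the output file name below are hypothetical.

```r
# Hypothetical helper showing how a c("html", "word") default is resolved.
pick_format <- function(format = c("html", "word")) {
  # match.arg() returns the first choice when the argument is not supplied
  # and raises an error for values outside the allowed set.
  format <- match.arg(format)
  extension <- if (format == "word") ".docx" else ".html"
  paste0("fact_sheet", extension)
}

default_file <- pick_format()
word_file    <- pick_format("word")
```

Calling the helper without arguments yields the HTML file name, consistent with "html" being documented as the default format.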

Launch the stepssurvey Shiny Application

Description

Starts the interactive STEPS survey analysis app in the user's browser. The app provides a guided workflow: upload data, clean, set survey design, compute indicators, visualise results, and generate Word reports.

Usage

run_app(...)

Arguments

...

Additional arguments passed to shiny::shinyApp().

Value

A Shiny app object (invisibly). Called for its side effect of launching the application.

Examples

## Not run: 
  run_app()

## End(Not run)

Run the complete STEPS analysis pipeline

Description

Imports raw data, cleans it, sets up the survey design, computes all indicators, generates publication-ready tables and plots, and optionally renders Word reports.

Usage

run_steps_pipeline(
  data_path,
  country_name = "Country Name",
  survey_year = 2024,
  age_min = 18,
  age_max = 69,
  output_dir = tempdir(),
  render_reports = TRUE,
  mapping_file = NULL
)

Arguments

data_path

Path to raw STEPS data file (CSV, Excel, Stata, or SPSS).

country_name

Country name for reports (default "Country Name").

survey_year

Survey year (default 2024).

age_min

Minimum age in years (default 18).

age_max

Maximum age in years (default 69).

output_dir

Directory for all outputs (default tempdir()).

render_reports

Logical; render Word documents? (default TRUE).

mapping_file

Optional path to a filled column mapping template (Excel or CSV). If provided, uses read_column_mapping() instead of auto-detection. See the template at system.file("templates", "column_mapping_template.xlsx", package = "stepssurvey").

Details

This is the main entry point for end-to-end STEPS analysis.

Value

A list with elements:

raw_data: Original imported data frame
clean_data: Cleaned and recoded data
cols: Detected column mapping from detect_steps_columns()
design: survey::svydesign object
indicators: List of all computed indicator results by domain
key_indicators: Summary tibble of headline estimates
tables: List of flextable::flextable objects
plots: List of ggplot2::ggplot objects
config: Configuration list from steps_config()

Examples

## Not run: 
# Auto-detect columns
result <- run_steps_pipeline("data/raw/steps_data.csv",
                             country_name = "Senegal",
                             survey_year = 2023)
result$key_indicators
result$plots$overview

# Use a custom column mapping
result <- run_steps_pipeline("data/raw/steps_data.csv",
                             country_name = "Senegal",
                             survey_year = 2023,
                             mapping_file = "my_mapping.xlsx")

## End(Not run)

Save STEPS plots to PNG files

Description

Exports all plots in a list to PNG files in the specified directory.

Usage

save_steps_plots(plots, output_dir = tempdir())

Arguments

plots

A named list of ggplot2 objects (from build_steps_plots()).

output_dir

Output directory path (default tempdir()).

Details

Files are named:

01_overview_indicators.png (12x8 in)
02_by_sex_dashboard.png (12x8 in)
03_bp_by_age.png (10x6 in)
04_obesity_by_age.png (10x6 in)

All saved at 150 dpi with white background.

Value

NULL (invisibly). Prints messages about saved files.

Set up survey designs for STEPS data (one per Step)

Description

Creates up to three survey design objects — one per WHO STEPS Step — each using the appropriate step-specific weight column (wt_step1, wt_step2, wt_step3).

Usage

setup_survey_design(data)

Arguments

data

A data frame (typically from clean_steps_data()).

Details

The returned object is a list of class "steps_designs" with elements $step1, $step2, $step3. For backward compatibility it can also be used directly as a single design (it delegates to $step1).

The function handles five design cases per step:

Full complex design: weights + strata + clusters
Weights + clusters, no strata
Weights + strata, no clusters
Weights only
Unweighted (simple random sampling)

Weights are used as-is without trimming, consistent with the WHO official STEPS analysis scripts.
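The five design cases above can be read as a choice of arguments for survey::svydesign(). The following is a minimal sketch under stated assumptions, not the package's implementation: design_args() is a hypothetical helper, and the column names "stratum" and "psu" are taken from the steps_config() defaults rather than confirmed for this function.

```r
# Hypothetical helper: pick svydesign()-style arguments from the columns
# that are actually present in the data. Column names are assumptions.
design_args <- function(data, weight = "wt_step1",
                        strata = "stratum", cluster = "psu") {
  has <- function(col) col %in% names(data)
  list(
    # Clusters present: sample PSUs; otherwise every row is its own unit.
    ids     = if (has(cluster)) reformulate(cluster) else ~1,
    # NULL means "no stratification" / "unweighted".
    strata  = if (has(strata)) reformulate(strata) else NULL,
    weights = if (has(weight)) reformulate(weight) else NULL
  )
}

full_case <- design_args(data.frame(wt_step1 = 1, stratum = 1, psu = 1))
srs_case  <- design_args(data.frame(age = 40))
```

The resulting list could then be passed to survey::svydesign() together with data =; repeating the call with wt_step2 and wt_step3 would give one design per Step.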

Value

A list of class "steps_designs" with up to three survey::svydesign objects (step1, step2, step3).

WHO STEPS colour palette

Description

A named list of colours used in WHO STEPS reports and visualisations.

Usage

steps_colors()

Value

A named list of hex colour codes.

Examples

steps_colors()$blue

Create STEPS analysis configuration

Description

Builds a configuration list that specifies data paths, design variables, and report parameters for the STEPS pipeline.

Usage

steps_config(
  data_path,
  country_name = "Country Name",
  survey_year = 2024,
  age_min = 18,
  age_max = 69,
  weight_var = "wt_final",
  strata_var = "stratum",
  cluster_var = "psu",
  bp_sbp_threshold = 140,
  bp_dbp_threshold = 90,
  bmi_overweight = 25,
  bmi_obese = 30,
  glucose_threshold = 7,
  glucose_impaired_threshold = 6.1,
  chol_threshold = 5
)

Arguments

data_path

Path to raw STEPS data file (CSV, Excel, Stata, or SPSS).

country_name

Country name for reports (default "Country Name").

survey_year

Survey year (default 2024).

age_min

Minimum age (default 18).

age_max

Maximum age (default 69).

weight_var

Weight variable name (default "wt_final", set NULL if none).

strata_var

Strata variable name (default "stratum", set NULL if none).

cluster_var

Cluster variable name (default "psu", set NULL if none).

bp_sbp_threshold

SBP threshold for raised BP (default 140).

bp_dbp_threshold

DBP threshold for raised BP (default 90).

bmi_overweight

BMI threshold for overweight (default 25.0).

bmi_obese

BMI threshold for obesity (default 30.0).

glucose_threshold

Fasting glucose threshold in mmol/L (default 7.0).

glucose_impaired_threshold

Impaired fasting glucose threshold in mmol/L (default 6.1).

chol_threshold

Total cholesterol threshold in mmol/L (default 5.0).

Value

A list with elements:

data_path: Input file path
country_name: Country name
survey_year: Survey year
age_min, age_max: Age range
weight_var, strata_var, cluster_var: Design variable names
Threshold parameters for BP, BMI, glucose, cholesterol
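To illustrate how the threshold parameters might be applied, here is a minimal sketch. The default values are those listed above; the helpers raised_bp() and bmi_class() are hypothetical, and the use of "greater than or equal to" comparisons is an assumption that the manual does not confirm.

```r
# Default thresholds as documented for steps_config().
cfg <- list(bp_sbp_threshold = 140, bp_dbp_threshold = 90,
            bmi_overweight = 25, bmi_obese = 30)

# Hypothetical helper: raised BP if either reading reaches its threshold
# (assumed ">="; treatment status is ignored in this sketch).
raised_bp <- function(sbp, dbp, cfg) {
  sbp >= cfg$bp_sbp_threshold | dbp >= cfg$bp_dbp_threshold
}

# Hypothetical helper: BMI groups with left-closed intervals [25, 30), [30, Inf).
bmi_class <- function(bmi, cfg) {
  cut(bmi,
      breaks = c(-Inf, cfg$bmi_overweight, cfg$bmi_obese, Inf),
      labels = c("Below overweight", "Overweight", "Obese"),
      right = FALSE)
}

bp_flags   <- raised_bp(c(120, 142, 135), c(80, 85, 92), cfg)
bmi_groups <- bmi_class(c(22.0, 25.0, 31.5), cfg)
```

Changing a threshold in the configuration (for example bp_sbp_threshold = 130) would change the classification without touching the helper code.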

Examples

## Not run: 
  cfg <- steps_config("data/steps_2023.csv", "Senegal", 2023)
  cfg <- steps_config("data/steps.csv", "Mongolia", 2019,
                      bp_sbp_threshold = 130, bp_dbp_threshold = 80)

## End(Not run)


Data Quality Diagnostics for WHO STEPS Data

Description

Produces a comprehensive data quality report covering digit preference, completeness, plausibility, and sampling weight diagnostics.

Usage

steps_data_quality(raw, cleaned, cols)

Arguments

raw

The raw (pre-cleaning) data frame, typically from import_steps_data().

cleaned

The cleaned data frame from clean_steps_data().

cols

Column mapping list from detect_steps_columns().

Details

Digit preference / heaping is assessed using a Whipple-style heaping index: the ratio of the observed frequency of a terminal digit (0 or 5) to the frequency expected under a uniform distribution of terminal digits. An index of 1.0 indicates no preference, values above 1.5 indicate moderate heaping, and values above 2.0 indicate severe heaping.
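The heaping index described here can be sketched in a few lines of base R. This is an illustration only: heaping_index() is a hypothetical helper, and it assumes that each terminal digit is expected 10% of the time and that readings are rounded to whole units before the last digit is taken.

```r
# Hypothetical helper: observed share of a terminal digit divided by the
# 10% share expected if all ten terminal digits were equally likely.
heaping_index <- function(x, digit) {
  x <- x[!is.na(x)]
  terminal <- round(x) %% 10
  mean(terminal == digit) / 0.1
}

# Made-up systolic readings with many values ending in 0.
sbp <- c(120, 130, 140, 125, 118, 150, 160, 132, 110, 145)
index_zero <- heaping_index(sbp, 0)  # 6 of 10 readings end in 0
index_five <- heaping_index(sbp, 5)  # 2 of 10 readings end in 5
```

In this made-up sample both indices exceed 2.0 and would therefore fall in the "severe" band given above.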

Completeness reports missing values for key STEPS variables grouped by Step (behavioural, physical, biochemical).

Plausibility counts values outside WHO-recommended ranges (e.g. height 100–250 cm, weight 20–300 kg, SBP 60–300 mmHg).
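A plausibility count of this kind can be sketched as follows. The ranges are the examples quoted above; the helper count_implausible() and the column names height, weight and sbp are hypothetical and may not match the cleaned data.

```r
# Example ranges quoted in the text (cm, kg, mmHg).
ranges <- list(height = c(100, 250), weight = c(20, 300), sbp = c(60, 300))

# Hypothetical helper: number of non-missing values outside each range.
count_implausible <- function(data, ranges) {
  vapply(names(ranges), function(v) {
    x <- data[[v]]
    sum(x < ranges[[v]][1] | x > ranges[[v]][2], na.rm = TRUE)
  }, integer(1))
}

# Made-up measurements; missing values are not counted as implausible.
measurements <- data.frame(
  height = c(170, 95, 260, NA),
  weight = c(70, 15, 80, 65),
  sbp    = c(120, 301, 59, 130)
)
flags <- count_implausible(measurements, ranges)
```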

Weight diagnostics summarise the distribution of sampling weights and flag potential issues (high CV, zero/NA weights).
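The kind of summary meant here can be sketched in base R. weight_summary() is a hypothetical helper; the manual does not state which statistics are reported or what CV counts as "high", so no cut-off is applied.

```r
# Hypothetical helper: basic sampling-weight diagnostics.
weight_summary <- function(w) {
  usable <- w[!is.na(w) & w > 0]
  list(
    n_missing = sum(is.na(w)),
    n_zero    = sum(w == 0, na.rm = TRUE),
    # Coefficient of variation of the usable (positive) weights.
    cv        = sd(usable) / mean(usable),
    range     = range(usable)
  )
}

diag <- weight_summary(c(1, 2, 3, 0, NA))
```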

Value

A list of class "steps_quality" with elements:

digit_preference: Terminal-digit tables and heaping indices for physical measurements (SBP, DBP, height, weight, waist).
completeness: Per-variable missingness counts and percentages, grouped by STEPS domain.
plausibility: Summary of values outside plausible ranges.
weights: Sampling weight distribution statistics.

WHO STEPS Data Book Table Registry

Description

Defines all standard tables from the WHO STEPS Epi Info report template. Each entry specifies the table metadata; generic compute and formatting functions use this registry to produce the full data book automatically.

Usage

steps_table_registry()

Value

A list of table specification lists.

Table types

proportion: Single binary indicator: % (95% CI) by age × sex. Most common type. Example: "Current smokers among all respondents."
mean: Continuous variable: mean (95% CI) by age × sex. Example: "Mean BMI (kg/m²)."
category: Multi-level factor: % per level (95% CI) by age × sex. Example: "BMI classifications (Underweight / Normal / Overweight / Obese)."
cascade: Diagnosis → treatment → control chain: multiple proportions with nested denominators. Example: "Raised BP diagnosis, treatment and control."
combined: Summary of combined risk factors: 0, 1-2, 3-5 risk factors.

Registry fields

id: Unique short identifier (e.g., "T_smoking_current").
section: Data book section (e.g., "Tobacco Use", "Blood Pressure").
step: STEPS step number (1, 2, or 3).
title: Table title as shown in the data book.
description: One-line description from the WHO template.
type: One of: "proportion", "mean", "category", "cascade", "combined".
variable: Column name(s) in the cleaned data frame to analyse. For proportion: single logical variable. For mean: single numeric variable. For category: single factor variable. For cascade: named list of logical variables.
denominator: NULL (= all respondents) or column name for subsetting (e.g., "current_alcohol" to restrict to drinkers).
levels: For category type: named character vector of level labels.
epi_info: Epi Info program name(s) for reference.
unit: Display unit (e.g., "%", "mmHg", "cm", "kg/m²", "mmol/L").
questions: STEPS instrument question codes used.
sex_panels: Logical. TRUE = 3 panels (Men/Women/Both); FALSE = 2 panels (Men/Women only, e.g., height/weight means). Default TRUE.
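For orientation, a single registry entry with the fields listed above might look like the sketch below. The id, section and title reuse examples from this page; the variable name current_smoker is hypothetical, and the Epi Info program name and question codes are deliberately left unspecified rather than guessed.

```r
# Illustrative registry entry; values marked "hypothetical" are assumptions.
entry <- list(
  id          = "T_smoking_current",
  section     = "Tobacco Use",
  step        = 1,
  title       = "Current smokers among all respondents",
  description = "Current smokers among all respondents",
  type        = "proportion",
  variable    = "current_smoker",  # hypothetical column name
  denominator = NULL,              # NULL = all respondents
  levels      = NULL,              # only used for type "category"
  epi_info    = NA_character_,     # not specified in this sketch
  unit        = "%",
  questions   = NA_character_,     # not specified in this sketch
  sex_panels  = TRUE
)

# A registry is a list of such entries, so it can be filtered by field.
registry    <- list(entry)
proportions <- Filter(function(e) e$type == "proportion", registry)
```

Keeping the metadata in plain lists means the generic compute and formatting functions only need to dispatch on the type field.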

Weighted mean estimation with 95% CI

Description

Calculates weighted means with 95% confidence intervals for a continuous variable, optionally stratified by a grouping variable.

Usage

svymn(formula, design, by = NULL, na.rm = TRUE)

Arguments

formula

A formula (e.g., ~age).

design

A survey design object (from setup_survey_design()).

by

Optional formula for stratification (e.g., ~sex).

na.rm

Logical; if TRUE (default), omit NA values.

Value

A data frame with columns:

estimate: estimated mean
lower: 95% CI lower bound
upper: 95% CI upper bound
se: standard error
If by is specified: grouping column(s) prepended
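To show what the returned columns represent, here is a base R sketch for the simplest design case only (weights, no strata or clusters, sampling treated as with replacement). weighted_mean_ci() is a hypothetical helper, not the package's svymn(); stratified or clustered designs need the variance estimators in the survey package.

```r
# Hypothetical helper for the weights-only case; not the package's svymn().
weighted_mean_ci <- function(x, w, level = 0.95) {
  keep <- !is.na(x) & !is.na(w)  # missing values are simply dropped here
  x <- x[keep]
  w <- w[keep]
  n <- length(x)
  est <- sum(w * x) / sum(w)
  # Linearised contributions of the ratio estimator sum(w * x) / sum(w).
  z  <- w * (x - est) / sum(w)
  se <- sqrt(n / (n - 1) * sum(z^2))
  q  <- qnorm(1 - (1 - level) / 2)
  data.frame(estimate = est, lower = est - q * se, upper = est + q * se, se = se)
}

# With equal weights this reduces to the usual mean and sd(x) / sqrt(n).
res <- weighted_mean_ci(c(1, 2, 3, 4, 5), rep(1, 5))
```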

Weighted proportion estimation with 95% CI

Description

Calculates weighted proportions (as percentages) with 95% confidence intervals for a yes/no variable, optionally stratified by a grouping variable.

Usage

svyprop(formula, design, by = NULL, na.rm = TRUE)

Arguments

formula

A one-sided formula naming a binary (yes/no) variable (e.g., ~variable).

design

A survey design object (from setup_survey_design()).

by

Optional formula for stratification (e.g., ~sex).

na.rm

Logical; if TRUE (default), omit NA values.

Value

A data frame with columns:

estimate: estimated proportion (%)
lower: 95% CI lower bound (%)
upper: 95% CI upper bound (%)
se: standard error (%)
If by is specified: grouping column(s) prepended
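The grouped output can be illustrated with a base R sketch for a weights-only design. weighted_pct_by() is a hypothetical helper, not the package's svyprop(); it uses a simple symmetric (Wald) interval and treats each group as a domain of the full sample, which may differ from the interval method the package uses.

```r
# Hypothetical helper for the weights-only case; not the package's svyprop().
# x is a 0/1 indicator, w the sampling weights, g the grouping variable.
weighted_pct_by <- function(x, w, g) {
  n <- length(x)
  q <- qnorm(0.975)
  rows <- lapply(split(seq_len(n), g), function(idx) {
    in_grp <- seq_len(n) %in% idx
    p <- sum(w[in_grp] * x[in_grp]) / sum(w[in_grp])
    # Respondents outside the group contribute zero to the variance sum.
    z  <- ifelse(in_grp, w * (x - p), 0) / sum(w[in_grp])
    se <- sqrt(n / (n - 1) * sum(z^2))
    100 * c(estimate = p, lower = p - q * se, upper = p + q * se, se = se)
  })
  data.frame(group = names(rows), do.call(rbind, rows), row.names = NULL)
}

# Made-up data: smoking indicator, weights and sex for eight respondents.
smoker <- c(1, 0, 1, 1, 0, 0, 1, 0)
sex    <- rep(c("Men", "Women"), each = 4)
wt     <- c(2, 1, 1, 1, 1, 1, 2, 1)
res <- weighted_pct_by(smoker, wt, sex)
```

Because a Wald interval is symmetric, its bounds can fall outside 0 to 100% for small groups or extreme proportions.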

Generic Table Builder for WHO STEPS Data Book

Description

Takes computed results from compute_table() or compute_all_tables() and produces formatted flextable objects in the standard WHO STEPS 3-panel format (Men / Women / Both Sexes).

WHO STEPS ggplot2 theme

Description

A clean, minimal ggplot2 theme styled with WHO STEPS colours.

Usage

theme_steps(base_size = 11)

Arguments

base_size

Base font size (default 11).

Value

A ggplot2::theme object.

Examples

library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) + geom_point() + theme_steps()
