Package: canpumf 0.5.2

Jens von Bergmann

canpumf: Parse StatCan PUMF Files

Facilitates working with Statistics Canada (StatCan) Public Use Microdata Files (PUMF). Downloads available PUMF data, parses metadata from command files or other sources to infer the layout structure, variable labels, value labels and missing-data values, and returns a connection to a 'DuckDB' database with the labelled data. Data and documentation come from Statistics Canada's Public Use Microdata Files <https://www.statcan.gc.ca/en/microdata/pumf>, distributed under the Statistics Canada Open Licence <https://www.statcan.gc.ca/en/terms-conditions/open-licence>.

Authors: Jens von Bergmann [aut, cre]

canpumf_0.5.2.tar.gz
canpumf_0.5.2.tar.gz (r-4.7-any) | canpumf_0.5.2.tar.gz (r-4.6-any)
canpumf_0.5.2.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
canpumf/json (API)

# Install 'canpumf' in R:
install.packages('canpumf', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))
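
The description above says the package returns a connection to a DuckDB database holding the labelled data. The following is a minimal sketch of how such a connection might be queried with DBI and dplyr. It deliberately calls no canpumf functions, since their signatures are not shown on this page. The table `toy_pumf` and its columns `PROV` and `WEIGHT` are made-up stand-ins, and the in-memory database is only an assumption used to keep the example self-contained.

```r
library(DBI)
library(dplyr)

# Stand-in for a connection to a DuckDB database: an in-memory database
# holding a tiny, hypothetical "survey" table (not real PUMF data).
con <- dbConnect(duckdb::duckdb())
toy <- data.frame(
  PROV   = factor(c("BC", "BC", "ON", "ON")),  # labelled categorical column
  WEIGHT = c(10, 20, 30, 40)                   # hypothetical survey weight
)
dbWriteTable(con, "toy_pumf", toy)

# Lazy dplyr query: dbplyr translates it to SQL, DuckDB executes it, and
# only the small aggregated result is pulled into R by collect().
totals <- tbl(con, "toy_pumf") |>
  group_by(PROV) |>
  summarise(n_weighted = sum(WEIGHT, na.rm = TRUE)) |>
  arrange(PROV) |>
  collect()

# Close the connection and shut down the in-memory database.
dbDisconnect(con, shutdown = TRUE)
```

Keeping the aggregation inside the database and collecting only the summary is the usual pattern when the underlying microdata table is large.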

Bug tracker: https://github.com/mountainmath/canpumf/issues

Pkgdown/docs site: https://mountainmath.github.io

3.54 score | 21 exports | 52 dependencies

Last updated from: df6858e294. Checks: 4 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | OK | 171
source / vignettes | OK | 262
linux-release-x86_64 | OK | 148
wasm-release | OK | 176

Exports: add_bootstrap_weights, add_lfs_GENDER_SEX, add_lfs_SURVDATE, bsw_info, close_pumf, get_pumf, get_pumf_connection, label_pumf_columns, list_available_lfs_pumf_versions, list_canpumf_collection, list_pumf_cache, list_pumf_registry, list_statcan_pumf_catalogue, open_pumf_documentation, pumf_metadata, pumf_module, pumf_registry, pumf_registry_entry, pumf_var_labels, remove_bootstrap_weights, remove_pumf_cache

Dependencies: askpass, bit, bit64, blob, cachem, cli, clipr, collections, cpp11, crayon, curl, DBI, dbplyr, dplyr, duckdb, duckplyr, fastmap, forcats, generics, glue, haven, hms, httr, jsonlite, lifecycle, magrittr, memoise, mime, openssl, pillar, pkgconfig, prettyunits, progress, purrr, R6, readr, rlang, rvest, selectr, stringi, stringr, sys, tibble, tidyr, tidyselect, tzdb, utf8, vctrs, vroom, withr, xml2, zip

Bootstrap weights
The method: a resampling bootstrap | Two modes | Where the weights are stored (DuckDB path) | Identifying rows | Stratified bootstrap weights | Estimating uncertainty | Incremental re-runs | Reuse — nothing to do | More replicates requested | Rows added to the survey table | Forcing a full regeneration | Multiple weight columns | Filtered input tables | Connection lifecycle | Inspecting and removing weights

Last update: 2026-07-03
Started: 2026-07-03

canpumf Pipeline Architecture
High-level flow | Stage 1 — Locate or download | Version resolution | Stage 2 — Parse metadata | Format detection | Parsers | SPSS monolithic (parse_spss_mono) | SPSS split-file (parse_spss_split) | SAS reading cards (parse_sas_cards) | LFS codebook CSV (parse_lfs_codebook) | CPSS variables CSV (parse_cpss_csv) | SPSS .sav (parse_spss_sav) | PDF Data Dictionary (parse_pdf_dictionary) | PDF frequency codebook (parse_pdf_codebook) | Metadata encoding | Merge | Stage 3 — Build DuckDB | Data file selection | FWF vs. CSV | Trailing junk row removal (FWF only) | Data fixups (pre-label) | Bootstrap weight join (BSW) | Numeric conversion | Code labels → factors | DuckDB write and ENUM enforcement | Multi-module surveys | LFS pipeline | Connection provenance registry | Registry configuration | Newest-sibling inheritance

Last update: 2026-07-03
Started: 2026-07-03

Census

Last update: 2026-07-03
Started: 2026-07-03

LFS
Timelines

Last update: 2026-07-03
Started: 2026-07-03

Onboarding a new PUMF
Naming conventions and where to put the files | Smart defaults and the newest-sibling fallback | When the automatic import fails | See what is actually in the directory | Parse the metadata in isolation | Start from an existing entry as a template | Tweak fixups for data-level issues | Build the full table | Promote the configuration into the registry | Summary

Last update: 2026-07-03
Started: 2026-07-03

Working with canpumf
Forced moves

Last update: 2026-07-03
Started: 2026-07-03

Working with multi-module PUMF surveys
Loading the primary module | Opening a sibling module | Joining modules for analysis | A second example: the Survey of Household Spending | Cleaning up | Database connections | Notes

Last update: 2026-07-03
Started: 2026-07-03