| Title: | Load Microdata from Colombia's 'GEIH' ('DANE') |
|---|---|
| Description: | Programmatic access to microdata from Colombia's Gran Encuesta Integrada de Hogares ('GEIH'), published by 'DANE'. Provides a tidy interface to download, parse, and harmonize labor market surveys from 2007 to present. R companion to the 'pulso-co' 'Python' package. |
| Authors: | Esteban Labastidas [aut, cre] |
| Maintainer: | Esteban Labastidas <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-12 14:53:18 UTC |
| Source: | https://github.com/cran/pulso |
Returns a human-readable summary of a GEIH survey module: its survey level, available epochs, and harmonized canonical variables.
pulso_describe(module)

| Argument | Description |
|---|---|
| module | Character. Module name (e.g., "ocupados"). Must be one of the modules registered in sources.json. |
A multi-line character string.
pulso_describe("ocupados")
Pretty-prints metadata for a column in a tibble loaded with
metadata = TRUE. Output format mirrors Python's
pulso.describe_column().
pulso_describe_column(df, column)

| Argument | Description |
|---|---|
| df | A tibble loaded with metadata = TRUE. |
| column | Character. Column name to describe. |
A multi-line character string.
df <- pulso_load(2024, 6, "ocupados", metadata = TRUE)
cat(pulso_describe_column(df, "P6020"))
Returns a human-readable summary of a canonical variable defined in variable_map.json: its module, description, source mappings per epoch, and any comparability warning.
pulso_describe_variable(variable_name)

| Argument | Description |
|---|---|
| variable_name | Character. Canonical variable name (e.g., "sexo"). The lookup is case-insensitive. |
A single multi-line character string.
pulso_describe_variable("sexo")
pulso_describe_variable("edad")
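The case-insensitive lookup mentioned in the arguments can be pictured with a minimal sketch in base R. It is not the package's implementation: `variable_map` is a hypothetical stand-in for the parsed contents of variable_map.json, and `lookup_variable()` is an illustrative helper that does not exist in pulso.

```r
# Hypothetical stand-in for the parsed contents of variable_map.json.
# The entries hold placeholder text only.
variable_map <- list(
  sexo = list(description = "placeholder"),
  edad = list(description = "placeholder")
)

# Look up a canonical variable while ignoring case.
# Returns NULL when the name is not registered.
lookup_variable <- function(variable_name, map = variable_map) {
  idx <- match(tolower(variable_name), tolower(names(map)))
  if (is.na(idx)) {
    return(NULL)
  }
  map[[idx]]
}

entry_upper <- lookup_variable("SEXO")
entry_lower <- lookup_variable("sexo")
missing_entry <- lookup_variable("no_such_variable")
```

Lowercasing both the query and the registered names before matching is one simple way to get this behavior; the package may normalize names differently.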
Returns a tibble summary of all columns with their metadata.
pulso_list_columns_metadata(df)

| Argument | Description |
|---|---|
| df | A tibble loaded with metadata = TRUE. |
A tibble with columns: column, label, type, module, source, has_categories.
df <- pulso_load(2024, 6, "ocupados", metadata = TRUE)
summary <- pulso_list_columns_metadata(df)
print(summary)
Returns the registry entries for which the downloaded data has been manually validated against DANE published figures.
pulso_list_validated_range()
A tibble with one row per validated (year, month) pair, with
columns: year (integer), month (integer),
epoch (character), validated (logical, always TRUE),
validated_at (character, ISO-8601 or NA),
num_modules (integer).
Sorted by year, month ascending.
pulso_list_validated_range()
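One possible use of this table is checking whether a given period has been validated before loading it. The sketch below works on a toy data frame with the documented year and month columns; the rows are invented for illustration, and `is_validated()` is a hypothetical helper, not a pulso function.

```r
# Toy stand-in for the tibble returned by pulso_list_validated_range().
# Only the documented year and month columns are used; the rows are made up.
validated <- data.frame(
  year  = c(2024L, 2024L, 2024L),
  month = c(4L, 5L, 6L)
)

# TRUE when the requested (year, month) pair appears in the validated range.
is_validated <- function(range, year, month) {
  any(range$year == year & range$month == month)
}

in_range <- is_validated(validated, 2024L, 6L)
out_of_range <- is_validated(validated, 2023L, 6L)
```

With the real tibble in place of `validated`, the same check could be used to decide whether a call needs allow_unvalidated = TRUE.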
Returns a tibble summarizing all canonical variables defined in variable_map.json, optionally filtered by module.
pulso_list_variables(module = NULL)

| Argument | Description |
|---|---|
| module | Character or NULL. When non-NULL, restricts output to variables belonging to that module. |
A tibble with one row per variable and columns:
chr. Key used throughout the pulso package.
chr. Survey module the variable belongs to.
chr. Spanish description, or NA if absent.
chr. Always NA in this version (see Note).
lgl. TRUE when a comparability_warning is
present and non-empty for the variable.
int. Number of epoch mappings defined.
Rows are sorted ascending by canonical_name.
The comparability column is always NA_character_ in the
current version because the comparability field (expected values:
"high" / "limited") has not yet been added to variable_map.json.
Only comparability_warning (free text) is present in the data.
# All variables
pulso_list_variables()
# Subset by module
pulso_list_variables(module = "ocupados")
Downloads and parses microdata from Colombia's Gran Encuesta Integrada de Hogares (GEIH), published by DANE.
pulso_load(
  year,
  month,
  module,
  area = NULL,
  harmonize = TRUE,
  cache = TRUE,
  metadata = FALSE,
  allow_unvalidated = FALSE
)

| Argument | Description |
|---|---|
| year | Integer. Year (2007 to current year). |
| month | Integer. Month (1-12). |
| module | Character. Module name (e.g., "ocupados"). |
| area | Character or NULL. Optional area filter. NOT IMPLEMENTED in v0.1.0. |
| harmonize | Logical. Whether to apply harmonization. Default TRUE. |
| cache | Logical. Whether to cache downloads. Default TRUE. |
| metadata | Logical. Whether to attach DANE metadata to result. Default FALSE for Python parity. When TRUE, attaches metadata via |
| allow_unvalidated | Logical. When FALSE (default), raises |
A tibble with the microdata. If metadata = TRUE, the tibble has an attribute "pulso_metadata" with structured column info.
df <- pulso_load(year = 2024, month = 6, module = "ocupados", metadata = TRUE)
cat(pulso_describe_column(df, "P6020"))
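The "pulso_metadata" attribute described above can be read with base R's `attr()`. The sketch below uses a toy data frame rather than a real download, and the layout of the attribute (one named list entry per column, with a `label` field) is an assumption made for illustration; the actual structure may differ.

```r
# Toy data frame standing in for the result of pulso_load(..., metadata = TRUE).
df <- data.frame(P6020 = c(1L, 2L, 1L))

# Hypothetical layout: one named list entry per column.
# The structure that pulso actually attaches may differ.
attr(df, "pulso_metadata") <- list(
  P6020 = list(label = "placeholder label")
)

# attr() returns NULL when the attribute is absent, so the result
# doubles as a check that metadata was attached.
meta <- attr(df, "pulso_metadata", exact = TRUE)
has_meta <- !is.null(meta)
described_columns <- names(meta)
```

Testing for NULL first is a cheap guard for code that may also receive tibbles loaded with the default metadata = FALSE.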
Downloads, parses, and joins microdata from multiple GEIH modules for a single year-month. All modules must be at the same survey level (persona or hogar).
pulso_load_merged(
  year,
  month,
  modules,
  harmonize = TRUE,
  cache = TRUE,
  metadata = FALSE,
  allow_unvalidated = FALSE
)

| Argument | Description |
|---|---|
| year | Integer. Year (2007 to current year). |
| month | Integer. Month (1-12). |
| modules | Character vector. Module names to merge (length >= 2). Use |
| harmonize | Logical. Whether to lowercase column names. Default TRUE. |
| cache | Logical. Whether to cache downloads. Default TRUE. |
| metadata | Logical. Whether to attach DANE metadata. Default FALSE. |
| allow_unvalidated | Logical. Passed through to each |
A tibble with columns from all requested modules, joined on shared identifier keys (directorio, secuencia_p, orden for persona-level modules). The join is an outer join by default so that modules covering different person subsets (e.g., ocupados vs desocupados) are combined correctly.
Mixed-level merges (persona + hogar modules in the same call) are deferred to v0.2.0. If you need a hogar-level module, merge the result manually after separate calls.
df <- pulso_load_merged(2024, 6, c("ocupados", "caracteristicas_generales"))
nrow(df)
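To show what an outer join on the shared identifier keys does, here is a minimal sketch using base R's `merge()` on two toy data frames. It is not how pulso performs the join internally. The key names (directorio, secuencia_p, orden) come from the description above; the value columns `ingreso` and `semanas_buscando` and all row values are invented.

```r
# Toy stand-ins for two persona-level modules covering different people.
ocupados <- data.frame(
  directorio  = c(1L, 1L, 2L),
  secuencia_p = c(1L, 1L, 1L),
  orden       = c(1L, 2L, 1L),
  ingreso     = c(100, 200, 300)        # hypothetical column
)
desocupados <- data.frame(
  directorio       = c(2L, 3L),
  secuencia_p      = c(1L, 1L),
  orden            = c(2L, 1L),
  semanas_buscando = c(8L, 20L)         # hypothetical column
)

keys <- c("directorio", "secuencia_p", "orden")

# all = TRUE requests a full outer join: every person from either module
# is kept, and columns from the module they are absent from become NA.
merged <- merge(ocupados, desocupados, by = keys, all = TRUE)

n_people <- nrow(merged)
```

Because no person appears in both toy modules, the result has one row per person across the two inputs. An inner join (the `merge()` default) would return zero rows here, which is why an outer join suits modules that cover disjoint subsets.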
Returns Colombia's monetary policy rate as a tibble. Data source: Banco de la Republica SDMX API (DF_CBR_DAILY_HIST). When the API is unavailable, falls back to a bundled snapshot extending to 2026-04-21.
pulso_tpm(start = NULL, end = NULL, use_fixture = NULL)

| Argument | Description |
|---|---|
| start | Optional start date as character "YYYY-MM-DD" or Date. |
| end | Optional end date as character "YYYY-MM-DD" or Date. |
| use_fixture | Logical or NULL. If NULL (default), auto-detect: tries API first, falls back to snapshot if unavailable. If TRUE, always use bundled snapshot. If FALSE, force API call. |
A tibble with columns:
Date. Observation date.
numeric. TPM rate in percentage points.
character. Always "tpm".
# Recent TPM
tpm_2024 <- pulso_tpm(start = "2024-01-01", end = "2024-12-31")
# Full history (uses fixture if API down)
tpm_all <- pulso_tpm()
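The three-way behavior of use_fixture can be sketched as follows. This is an illustration of the documented logic only, not the package's code: `resolve_source()`, `fetch_api`, and `load_snapshot` are hypothetical names, and the stand-in functions return labels rather than real rate data.

```r
# Choose a data source following the documented use_fixture semantics.
# fetch_api and load_snapshot are caller-supplied functions (hypothetical).
resolve_source <- function(use_fixture = NULL, fetch_api, load_snapshot) {
  if (isTRUE(use_fixture)) {
    return(load_snapshot())             # TRUE: always use the bundled snapshot
  }
  if (isFALSE(use_fixture)) {
    return(fetch_api())                 # FALSE: force the API, errors propagate
  }
  # NULL: try the API first, fall back to the snapshot if it fails
  tryCatch(fetch_api(), error = function(e) load_snapshot())
}

# Stand-ins that simulate an unavailable API.
failing_api <- function() stop("API unavailable")
snapshot    <- function() "snapshot"

auto_result    <- resolve_source(NULL, failing_api, snapshot)
forced_fixture <- resolve_source(TRUE, failing_api, snapshot)
forced_api     <- tryCatch(
  resolve_source(FALSE, failing_api, snapshot),
  error = function(e) "error"
)
```

Under this reading, use_fixture = FALSE is the only mode in which an API outage surfaces as an error to the caller.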
Returns structured metadata about a specific year-month entry in the pulso registry, including whether it has been manually validated against published DANE figures.
pulso_validation_status(year, month)

| Argument | Description |
|---|---|
| year | Integer. Year (2007 to current year). |
| month | Integer. Month (1-12). |
A one-row tibble with columns: year, month,
epoch, validated, validated_by,
validated_at, source_url, file_size_mb,
modules_available, checksum_sha256.
pulso_validation_status(2024, 6)