In the United States, ZIP Codes are used to facilitate postal deliveries. They are created and maintained by the United States Postal Service (USPS), but are also required by other shippers and appear in a variety of administrative data contexts. ZIP stands for “Zone Improvement Plan,” which was introduced in 1963.
A complete ZIP Code consists of two elements - a five-digit number and a four-digit number. Together, these nine digits represent a specific delivery area (sometimes referred to as a carrier route). The first five digits, which we’ll refer to as ZIP Codes for shorthand, represent one of the following:
The most common form of a ZIP Code is the general delivery area. While we imagine these to be distinct, non-overlapping regions on a map, they are actually collections of carrier routes that sometimes overlap with each other.
Those individual carrier routes are represented by the second, four-digit number. These may correspond to a particular city block, apartment building, or other delivery area within a ZIP Code. For PO Boxes, individual boxes may be assigned their own four-digit number. Generally speaking, the four-digit add-on is not especially useful for RWE.
ZIP codes are assigned to four types of jurisdictions:
The Compact of Free Association (COFA) is a wide-ranging legal agreement that gives the three nations access to many U.S. federal services usually considered domestic programs. This includes USPS deliveries, and so all three nations are assigned ZIP Codes as well.
ZIP Codes generally are patterned regionally, with ZIP Codes
beginning with 0
being located in the Northeast. Values
increase westward, with the highest ZIP Codes in Alaska, Hawaii, and
islands in the Pacific. These ZIP Codes all begin with a
9
.
States are assigned one or more initial digits. For example, New York
State primarily has ZIP Codes beginning with values between
10
and 14
for their first two digits. However,
there are routine exceptions. 06390
(Fishers Island) is
also a valid ZIP Code in New York even though it begins with
0
. Likewise, federal agencies located in Maryland and
Virginia have Washington, D.C. assigned ZIP Codes. The first digit,
therefore, does not correspond to Census Region or
Division, and it is not possible to aggregate up to
those geographies based on the first digit alone. ZIP Codes also
cannot be aggregated to counties or states with
complete accuracy.
Since ZIP Codes are designed to facilitate the delivery of mail, and not other uses, they do not neatly nest into other jurisdictional boundaries. They regularly cross county boundaries, and sometimes cross state boundaries as well, especially in rural areas where postal delivery is best facilitated from a neighboring state.
Finally, it is important to know that ZIP Codes are not permanent. Carrier routes are updated frequently, and even the ZIP Codes themselves are subject to revision occasionally. For example, some ZIP Codes have been split to accommodate population growth and housing construction.
Three-digit ZIP Codes have a specific meaning for the USPS, and are sometimes also used in patient-level data to preserve confidentiality. More details on working with three-digit ZIPs can be found in a separate vignette.
The USPS does not publish a map of ZIP Codes because carrier routes are constantly changing and being revised. Moreover, since ZIP Codes are not areas on maps as we imagine them, they are not suitable for aggregation or analysis. The Census Bureau has created ZIP Code Tabulation Areas (ZCTAs) to approximate ZIP Code areas for the purposes of data analysis and mapping. ZCTAs are created by aggregating Census blocks that have the same first three digits of their ZIP Codes. This means that ZCTAs are not the same as ZIP Codes, and they are not the same as carrier routes. They are a useful approximation for many purposes, but they are not perfect. ZCTAs are updated every year, though the most significant updates occur every decade with the release of the Decennial Census.
ZCTAs are created by identifying the most common ZIP Code within each
Census Block, and then dissolving those individual Census Blocks
together to create a single polygon. This results in misclassification
at the address-level, where some address points within a given Census
Block will be assigned a ZCTA that differs from their individual ZIP
Code. Correcting this form misclassification requires point-level
address data, which may be available in some areas but are not available
for the entire United States. Addressing this problem is beyond the
scope of zippeR
.
ZCTAs are not created for areas that have no population, or areas that have very sparse populations. This affects areas in the American West, for example, that have very few residents.
Finally, it is important to note that not all ZIP Codes have an analogous ZCTA. Some ZIP Codes are used for Post Office Boxes, and these are not included in the ZCTA data. Using ZIP to ZCTA crosswalk files can help address this form of misclassification, and is the subject of a separate vignette.
One of the core features of zippeR
validate inputs of
ZIP codes or ZCTA codes. For example, here are a set of ZCTAs that lie
on the Missouri/Iowa border:
Notice how the last element contains a non-numeric character. When
zcta5
is passed to zi_validate()
, it will
catch the formatting issue. There are two options, one of which returns
a single logical value (TRUE
or FALSE
):
The other option, with verbose = TRUE
, provides
additional data about where formatting issues may exist:
> zi_validate(zcta5, verbose = TRUE)
# A tibble: 4 × 2
condition result
<chr> <lgl>
1 Input is a character vector? TRUE
2 All input values have 5 characters? TRUE
3 No input values are over 5 characters long? TRUE
4 All input values are numeric? FALSE
For the third and fourth tests, users are strongly encourage to
attempt to manually correct problems. However, zi_repair()
can be used to address the first and second tests, and will return
NA
values for ZIPs or ZCTAs that do not pass the third and
fourth tests:
> zi_repair(zcta5)
[1] "51640" "52542" "52573" NA
Warning message:
In zi_repair(zcta5) : NAs introduced by coercion
When malformed ZIPs or ZCTAs are replaced with NA
values, zi_repair()
will return a warning. Note that
zi_validate()
also works with three-digit ZCTAs as
well:
Note that, at this time, the validation process does not ensure that inputs correspond to valid ZCTAs.