Package 'pii'

Title: Search Data Frames for Personally Identifiable Information
Description: Check a data frame for personal information, including names, location, disability status, and geo-coordinates.
Authors: Jacob Patterson-Stein [aut, cre]
Maintainer: Jacob Patterson-Stein <[email protected]>
License: MIT + file LICENSE
Version: 1.0.0
Built: 2024-10-22 07:25:39 UTC
Source: CRAN

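Help Index

check_PII         Search Data Frames for Personally Identifiable Information
split_PII_data    Split Data Into PII and Non-PII Columns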


Search Data Frames for Personally Identifiable Information

Description

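Searches a data frame for columns that may contain personally identifiable information (PII), such as names, location, disability status, and geo-coordinates.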

Usage

check_PII(df)

Arguments

df

A data frame to search for PII.

Value

A data frame of the columns that potentially contain PII.

Examples

# create a data frame containing various personally identifiable information
pii_df <- data.frame(
 lat = c(40.7128, 34.0522, 41.8781),
 long = c(-74.0060, -118.2437, -87.6298),
 first_name = c("John", "Michael", "Linda"),
 phone = c("123-456-7890", "234-567-8901", "345-678-9012"),
 age = sample(30:60, 3, replace = TRUE),
 email = c("[email protected]", "[email protected]", "[email protected]"),
 disabled = c("No", "Yes", "No"),
 stringsAsFactors = FALSE
)

check_PII(pii_df)
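
To make the idea of flagging columns concrete, the following is a minimal base R sketch of one simple approach: matching column names against a list of patterns. It is not the method used by check_PII(), whose internal rules are not described here. The helper flag_columns_by_name(), its patterns argument, and the output column name "column" are all hypothetical.

```r
# Hypothetical helper for illustration only; NOT part of the pii package
# and not a description of how check_PII() works internally.
flag_columns_by_name <- function(df,
                                 patterns = c("name", "phone", "email",
                                              "lat", "long", "disab")) {
  stopifnot(is.data.frame(df))
  # Combine the patterns into one case-insensitive regular expression
  regex <- paste(patterns, collapse = "|")
  hits <- grepl(regex, names(df), ignore.case = TRUE)
  # Return the names of the matching columns, one per row
  data.frame(column = names(df)[hits], stringsAsFactors = FALSE)
}

# Small example data frame with made-up values
demo_df <- data.frame(
  lat = c(40.7128, 34.0522),
  long = c(-74.0060, -118.2437),
  first_name = c("John", "Linda"),
  phone = c("123-456-7890", "345-678-9012"),
  age = c(35, 52),
  email = c("a@example.com", "b@example.com"),
  disabled = c("No", "Yes"),
  stringsAsFactors = FALSE
)

flagged <- flag_columns_by_name(demo_df)
print(flagged)
```

Matching on names alone is crude: it misses PII stored under uninformative column names and can flag harmless columns, so treat any such result as a list of candidates to review by hand.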

Split Data Into PII and Non-PII Columns

Description

Splits a data frame into two data frames: one containing the columns flagged as PII and one without them.

Usage

split_PII_data(df, exclude_columns = NULL)

Arguments

df

A data frame to split.

exclude_columns

Optional character vector of column names to exclude from the data frame split. Defaults to NULL.

Value

Assigns two data frames to the global environment: one containing the PII columns and one without the PII columns. A unique merge key is created so that the two data frames can be joined again. The function then prints the columns that were flagged and split to the console.

Examples

# create a data frame containing various personally identifiable information
pii_df <- data.frame(
 lat = c(40.7128, 34.0522, 41.8781),
 long = c(-74.0060, -118.2437, -87.6298),
 first_name = c("John", "Michael", "Linda"),
 phone = c("123-456-7890", "234-567-8901", "345-678-9012"),
 age = sample(30:60, 3, replace = TRUE),
 email = c("[email protected]", "[email protected]", "[email protected]"),
 disabled = c("No", "Yes", "No"),
 stringsAsFactors = FALSE
)

split_PII_data(pii_df, exclude_columns = c("phone"))
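
The Value section describes a split into two data frames linked by a unique merge key. The base R sketch below illustrates that general pattern under stated assumptions; it does not call split_PII_data(). The names merge_key, pii_part, and non_pii_part are hypothetical, and the set of flagged columns is hard-coded rather than detected.

```r
# Illustrative only: a split joined by a key, written in base R.
# merge_key, pii_part and non_pii_part are hypothetical names, not
# necessarily the ones that split_PII_data() creates.
df <- data.frame(
  first_name = c("John", "Michael", "Linda"),
  age = c(34, 41, 58),
  disabled = c("No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Columns assumed to have been flagged as PII
pii_cols <- c("first_name", "disabled")

# Add a unique key per row so the two halves can be matched up later
df$merge_key <- seq_len(nrow(df))

# One data frame holds the key plus the PII columns,
# the other holds the key plus everything else
pii_part <- df[, c("merge_key", pii_cols), drop = FALSE]
non_pii_part <- df[, setdiff(names(df), pii_cols), drop = FALSE]

# Re-join the two halves on the key when the full data is needed
rejoined <- merge(non_pii_part, pii_part, by = "merge_key")
print(rejoined)
```

Keeping the key in both halves means the non-PII data frame can be shared or analyzed on its own, while the PII half is stored separately and merged back only when needed.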