Package 'usdatasets' reference manual

Title:	A Comprehensive Collection of U.S. Datasets
Description:	Provides a diverse collection of U.S. datasets encompassing various fields such as crime, economics, education, finance, energy, healthcare, and more. It serves as a valuable resource for researchers and analysts seeking to perform in-depth analyses and derive insights from U.S.-specific data.
Authors:	Renzo Caceres Rossi [aut, cre]
Maintainer:	Renzo Caceres Rossi <[email protected]>
License:	GPL-3
Version:	0.1.0
Built:	2024-12-08 07:16:11 UTC
Source:	CRAN

American Community Survey 2012

Description

The dataset name has been changed to 'acs12_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble data frame, helping to differentiate it from other datasets within the package. The original content of the dataset has not been modified in any way.

Usage

data(acs12_tbl_df)

Format

A tibble with 2,000 observations and 13 variables:

income: Income of individuals (integer).
employment: Employment status (factor with 3 levels).
hrs_work: Number of hours worked per week (integer).
race: Race of individuals (factor with 4 levels).
age: Age of individuals (integer).
gender: Gender of individuals (factor with 2 levels: "male", "female").
citizen: Citizenship status (factor with 2 levels: "no", "yes").
time_to_work: Time taken to travel to work in minutes (integer).
lang: Primary language spoken at home (factor with 2 levels: "english", "other").
married: Marital status (factor with 2 levels: "no", "yes").
edu: Educational attainment (factor with 3 levels).
disability: Disability status (factor with 2 levels).
birth_qrtr: Birth quarter of individuals (factor with 4 levels).

Source

American Community Survey, 2012.
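
As a minimal sketch of how this dataset might be explored, the R code below loads it when the 'usdatasets' package is installed and otherwise falls back to a tiny stand-in data frame. The stand-in values and the employment labels are illustrative assumptions, not survey data; only the column names 'income' and 'employment' come from the Format list above.

```r
# Load the packaged data if available; otherwise use a hypothetical stand-in.
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("acs12_tbl_df", package = "usdatasets")
  acs <- as.data.frame(acs12_tbl_df)
} else {
  # Made-up rows; the level labels are assumptions for illustration only.
  acs <- data.frame(
    income = c(52000L, 0L, 31000L, NA, 78000L, 12000L),
    employment = factor(c("employed", "not in labor force", "employed",
                          "unemployed", "employed", "unemployed"))
  )
}

# Median income per employment status, ignoring missing incomes.
median_income <- tapply(acs$income, acs$employment, median, na.rm = TRUE)
print(median_income)
```

The 'na.rm = TRUE' argument is passed through to 'median', so groups with some missing incomes still get a summary value.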

Age at first marriage of 5,534 US women.

Description

The dataset name has been changed to 'age_at_mar_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble data frame, helping to differentiate it from other datasets within the package. The original content of the dataset has not been modified in any way.

Usage

data(age_at_mar_tbl_df)

Format

A tibble with 5,534 observations and 1 variable:

age: Age at first marriage (integer).

Source

United States Census Data.

Airline names - U.S. Airlines Carrier Codes and Names

Description

The dataset name has been changed to 'airlines_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble, helping to differentiate it from other datasets within the package. The original content of the dataset has not been modified in any way.

Usage

data(airlines_tbl_df)

Format

A tibble with 16 observations and 2 variables:

carrier: Carrier code (character) representing the airline.
name: Name of the airline (character).

Source

U.S. Department of Transportation.

Airport metadata - U.S. Airports Information

Description

The dataset name has been changed to 'airports_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble, helping to differentiate it from other datasets within the package. The original content of the dataset has not been modified in any way.

Usage

data(airports_tbl_df)

Format

A tibble with 1,458 observations and 8 variables:

faa: FAA airport code (character).
name: Name of the airport (character).
lat: Latitude of the airport (numeric).
lon: Longitude of the airport (numeric).
alt: Altitude of the airport (numeric).
tz: Time zone (numeric).
dst: Daylight saving time flag (character).
tzone: Time zone name (character).

Source

U.S. Federal Aviation Administration (FAA).

New York Air Quality Measurements

Description

The dataset name has been changed to 'airquality_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'df' identifies the dataset as a data frame, helping to differentiate it from other datasets within the package. The original content of the dataset has not been modified in any way.

Usage

data(airquality_df)

Format

A data frame with 153 observations and 6 variables:

Ozone: Mean ozone concentration (parts per billion), ranging from 1 to 168.
Solar.R: Solar radiation (in Langleys).
Wind: Wind speed (miles per hour).
Temp: Temperature (degrees Fahrenheit).
Month: Month of the observation (integer from 5 to 9).
Day: Day of the observation (integer from 1 to 31).

Source

United States Environmental Protection Agency (EPA).
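
The following sketch shows one way to summarise ozone by month. It assumes that 'airquality_df' matches the 'airquality' data frame shipped with base R (same title, dimensions and column names), so the base R copy is used as a fallback when 'usdatasets' is not installed. In the base R copy the 'Ozone' and 'Solar.R' columns contain missing values, which the example handles explicitly.

```r
# Use the packaged data if available; otherwise fall back to base R's copy
# (assumed here to be equivalent).
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("airquality_df", package = "usdatasets")
  aq <- as.data.frame(airquality_df)
} else {
  aq <- datasets::airquality
}

# Count missing ozone readings before summarising.
n_missing_ozone <- sum(is.na(aq$Ozone))

# Mean ozone per month; the formula interface drops rows with missing Ozone.
monthly_ozone <- aggregate(Ozone ~ Month, data = aq, FUN = mean)
print(monthly_ozone)
```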

Housing prices in Ames, Iowa

Description

The dataset name has been changed to 'ames_tbl_df' to avoid confusion with other packages in the R ecosystem from which datasets have been sourced. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble. The original content of the dataset has not been modified in any way.

Usage

data(ames_tbl_df)

Format

A tibble with 2,930 observations and 82 variables:

Order: Row number in the dataset.
PID: Parcel Identifier.
area: Total house area in square feet.
price: Sale price of the house.
MS.SubClass: Building class type.
MS.Zoning: Zoning classification of the property.
Lot.Frontage: Lot frontage length in feet.
Lot.Area: Total lot area in square feet.
Street: Street type access to the property.
Alley: Alley type access.
Lot.Shape: Shape of the lot.
Land.Contour: Land contour around the property.
Utilities: Availability of utilities.
Lot.Config: Lot configuration.
Land.Slope: Slope of the land.
Neighborhood: Neighborhood in Ames.
Condition.1: Proximity to main conditions like railroads.
Condition.2: Proximity to secondary conditions.
Bldg.Type: Type of building.
House.Style: Architectural style of the house.
Overall.Qual: Overall quality of the materials and finish.
Overall.Cond: Overall condition of the house.
Year.Built: Year the house was built.
Year.Remod.Add: Year of the last remodel or addition.
Roof.Style: Roof style.
Roof.Matl: Roof material.
Exterior.1st: Primary exterior material.
Exterior.2nd: Secondary exterior material.
Mas.Vnr.Type: Masonry veneer type.
Mas.Vnr.Area: Masonry veneer area in square feet.
Exter.Qual: Exterior material quality.
Exter.Cond: Condition of the exterior material.
Foundation: Type of foundation.
Bsmt.Qual: Basement quality.
Bsmt.Cond: Basement condition.
Bsmt.Exposure: Basement exposure to the outside.
BsmtFin.Type.1: Type 1 of finished basement.
BsmtFin.SF.1: Square feet of finished basement type 1.
BsmtFin.Type.2: Type 2 of finished basement.
BsmtFin.SF.2: Square feet of finished basement type 2.
Bsmt.Unf.SF: Unfinished basement area in square feet.
Total.Bsmt.SF: Total basement area in square feet.
Heating: Type of heating system.
Heating.QC: Heating system quality.
Central.Air: Presence of central air conditioning.
Electrical: Type of electrical system.
X1st.Flr.SF: First floor area in square feet.
X2nd.Flr.SF: Second floor area in square feet.
Low.Qual.Fin.SF: Low-quality finished area in square feet.
Bsmt.Full.Bath: Number of full bathrooms in the basement.
Bsmt.Half.Bath: Number of half bathrooms in the basement.
Full.Bath: Number of full bathrooms above ground.
Half.Bath: Number of half bathrooms above ground.
Bedroom.AbvGr: Number of bedrooms above ground.
Kitchen.AbvGr: Number of kitchens above ground.
Kitchen.Qual: Kitchen quality.
TotRms.AbvGrd: Total number of rooms above ground.
Functional: Functionality of the house.
Fireplaces: Number of fireplaces.
Fireplace.Qu: Fireplace quality.
Garage.Type: Type of garage.
Garage.Yr.Blt: Year the garage was built.
Garage.Finish: Garage finish type.
Garage.Cars: Number of cars the garage can accommodate.
Garage.Area: Garage area in square feet.
Garage.Qual: Garage quality.
Garage.Cond: Garage condition.
Paved.Drive: Indicates whether the driveway is paved.
Wood.Deck.SF: Wood deck area in square feet.
Open.Porch.SF: Open porch area in square feet.
Enclosed.Porch: Enclosed porch area in square feet.
X3Ssn.Porch: Three-season porch area in square feet.
Screen.Porch: Screened porch area in square feet.
Pool.Area: Pool area in square feet.
Pool.QC: Pool quality.
Fence: Type of fence.
Misc.Feature: Miscellaneous features of the property.
Misc.Val: Value of miscellaneous features.
Mo.Sold: Month the house was sold.
Yr.Sold: Year the house was sold.
Sale.Type: Type of sale.
Sale.Condition: Condition of the sale.

Source

Ames Housing Dataset, provided by Dean De Cock

North Carolina births, 100 cases

Description

The dataset name has been changed to 'births_tbl_df' to avoid confusion with other packages in the R ecosystem from which datasets have been sourced. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble. The original content of the dataset has not been modified in any way.

Usage

data(births_tbl_df)

Format

A tibble with 150 observations and 9 variables:

f_age: Age of the father (in years).
m_age: Age of the mother (in years).
weeks: Number of weeks of pregnancy.
premature: Indicates if the baby is premature (factor: yes/no).
visits: Number of prenatal visits.
gained: Weight gained by the mother during pregnancy (in pounds).
weight: Birth weight of the baby (in pounds).
sex_baby: Sex of the baby (factor: male/female).
smoke: Indicates if the mother smoked during pregnancy (factor: yes/no).

Source

National Vital Statistics Reports

US Births 2014

Description

The dataset name has been changed to 'births14_tbl_df' to avoid confusion with other packages in the R ecosystem from which datasets have been sourced. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble. The original content of the dataset has not been modified in any way.

Usage

data(births14_tbl_df)

Format

A tibble with 1,000 observations and 13 variables:

fage: Age of the father (in years).
mage: Age of the mother (in years).
mature: Indicates if the mother is mature (yes/no).
weeks: Number of weeks of pregnancy.
premie: Indicates if the baby is a premature birth (yes/no).
visits: Number of prenatal visits.
gained: Weight gained by the mother during pregnancy (in pounds).
weight: Birth weight of the baby (in pounds).
lowbirthweight: Indicates if the baby is of low birth weight (yes/no).
sex: Sex of the baby (male/female).
habit: Maternal smoking habits (yes/no).
marital: Marital status of the mother (married/single).
whitemom: Indicates if the mother is white (yes/no).

Source

National Vital Statistics Reports

Housing Values in Suburbs of Boston

Description

The dataset name has been changed to 'Boston_df' to avoid confusion with other packages in the R ecosystem from which datasets have been sourced. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix '_df' identifies the dataset as a data frame. The original content of the dataset has not been modified in any way.

Usage

data(Boston_df)

Format

A data frame with 506 observations and 14 variables:

crim: Per capita crime rate by town.
zn: Proportion of residential land zoned for lots over 25,000 sq. ft.
indus: Proportion of non-retail business acres per town.
chas: Charles River dummy variable (1 if tract bounds river; 0 otherwise).
nox: Nitric oxides concentration (parts per 10 million).
rm: Average number of rooms per dwelling.
age: Proportion of owner-occupied units built prior to 1940.
dis: Weighted distances to five Boston employment centers.
rad: Index of accessibility to radial highways.
tax: Full-value property tax rate per $10,000.
ptratio: Pupil-teacher ratio by town.
black: 1000(Bk - 0.63)^2 where Bk is the proportion of Black residents by town.
lstat: Percentage of lower status of the population.
medv: Median value of owner-occupied homes in $1000s.

Source

Boston Housing Data
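
The 'black' column in the Format list above is a transformed quantity rather than a raw proportion. As a small worked example of that formula, the sketch below defines a helper (the name 'black_index' is hypothetical and not part of the package) and evaluates it at a few proportions.

```r
# Hypothetical helper implementing the documented formula 1000 * (Bk - 0.63)^2,
# where bk is a proportion between 0 and 1.
black_index <- function(bk) {
  stopifnot(is.numeric(bk), all(bk >= 0 & bk <= 1, na.rm = TRUE))
  1000 * (bk - 0.63)^2
}

# The transform is zero at bk = 0.63 and grows on either side of it.
index_values <- black_index(c(0, 0.63, 1))
print(index_values)
```

Because the expression is squared, two different proportions on either side of 0.63 can map to the same value, so the original proportion cannot be recovered uniquely from 'black' alone.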

Data from 93 Cars on Sale in the USA in 1993

Description

The dataset name has been changed to 'Cars93_df' to avoid confusion with other packages in the R ecosystem from which datasets have been sourced. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix '_df' identifies the dataset as a data frame. The original content of the dataset has not been modified in any way.

Usage

data(Cars93_df)

Format

A data frame with 54 observations and 6 variables:

type: Type of the car (factor with 3 levels).
price: Price of the car (in US dollars).
mpg_city: Miles per gallon in the city.
drive_train: Drive train type (factor with 3 levels).
passengers: Number of passengers the car can accommodate.
weight: Weight of the car (in pounds).

Source

1993 Cars Data

Random sample of 2000 U.S. Census Data

Description

The dataset name has been changed to 'census_tbl_df' to avoid confusion with other packages in the R ecosystem from which datasets have been sourced. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble. The original content of the dataset has not been modified in any way.

Usage

data(census_tbl_df)

Format

A tibble with 500 observations and 8 variables:

census_year: Year of the census (integer).
state_fips_code: FIPS code for the state (factor with 47 levels).
total_family_income: Total family income (in US dollars).
age: Age of the individual (in years).
sex: Sex of the individual (factor: male/female).
race_general: General race category (factor with 8 levels).
marital_status: Marital status of the individual (factor with 6 levels).
total_personal_income: Total personal income (in US dollars).

Source

US Census Bureau

CIA Factbook Details on Countries

Description

The dataset name has been changed to 'cia_factbook_tbl_df' to avoid confusion with other packages in the R ecosystem from which datasets have been sourced. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble. The original content of the dataset has not been modified in any way.

Usage

data(cia_factbook_tbl_df)

Format

A tibble with 259 observations and 11 variables:

country: Name of the country (factor with 259 levels).
area: Total area of the country (in square kilometers).
birth_rate: Birth rate (number of live births per 1,000 people).
death_rate: Death rate (number of deaths per 1,000 people).
infant_mortality_rate: Infant mortality rate (number of deaths of infants under one year old per 1,000 live births).
internet_users: Number of internet users (in millions).
life_exp_at_birth: Life expectancy at birth (in years).
maternal_mortality_rate: Maternal mortality rate (number of maternal deaths per 100,000 live births).
net_migration_rate: Net migration rate (number of migrants per 1,000 people).
population: Total population of the country.
population_growth_rate: Population growth rate (percentage).

Source

CIA World Factbook

Cleveland and Sacramento Demographic and Income Data (2000)

Description

The dataset name has been changed to 'cle_sac_tbl_df' to avoid confusion with other packages in the R ecosystem from which datasets have been sourced. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble. The original content of the dataset has not been modified in any way.

Usage

data(cle_sac_tbl_df)

Format

A tibble with 500 observations and 8 variables:

year: Year of the observation (integer).
state: State of the observation (factor with 2 levels).
city: City of the observation (character).
age: Age of the individual (integer).
sex: Sex of the individual (factor with 2 levels).
race: Race of the individual (character).
marital_status: Marital status of the individual (character).
personal_income: Personal income of the individual (integer).

Source

Cleveland Study

United States Counties

Description

The dataset name has been changed to 'county_tbl_df' to avoid confusion with other packages in the R ecosystem from which datasets have been sourced. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble. The original content of the dataset has not been modified in any way.

Usage

data(county_tbl_df)

Format

A tibble with 3,142 observations and 15 variables:

name: Name of the county.
state: State in which the county is located (factor with 51 levels).
pop2000: Population of the county in the year 2000.
pop2010: Population of the county in the year 2010.
pop2017: Population of the county in the year 2017.
pop_change: Change in population over the years.
poverty: Poverty rate in the county.
homeownership: Rate of homeownership in the county.
multi_unit: Percentage of multi-unit housing.
unemployment_rate: Unemployment rate in the county.
metro: Indicates if the county is in a metropolitan area (factor with 2 levels).
median_edu: Median education level in the county (factor with 4 levels).
per_capita_income: Per capita income in the county.
median_hh_income: Median household income in the county.
smoking_ban: Indicates if there is a smoking ban in place (factor with 3 levels).

Source

United States Census Bureau

American Adults on Regulation and Renewable Energy

Description

The dataset name has been changed to 'env_regulation_tbl_df' to avoid confusion with other packages in the R ecosystem from which datasets have been sourced. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble. The original content of the dataset has not been modified in any way.

Usage

data(env_regulation_tbl_df)

Format

A tibble with 705 observations and 1 variable:

statement: Environmental regulation statement (character).

Source

Environmental Regulation Study

Summary of male heights from USDA Food Commodity Intake Database

Description

The dataset name has been changed to 'fcid_tbl_df' to avoid confusion with other packages in the R ecosystem from which datasets have been sourced. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble. The original content of the dataset has not been modified in any way.

Usage

data(fcid_tbl_df)

Format

A tibble with 100 observations and 2 variables:

height: Height of the individual (numeric).
num_of_adults: Number of adults recorded at this height (integer).

Source

Family Characteristics and Income Study

Google stock data

Description

The dataset name has been changed to 'goog_tbl_df' to avoid confusion with other packages in the R ecosystem from which datasets have been sourced. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble. The original content of the dataset has not been modified in any way.

Usage

data(goog_tbl_df)

Format

A tibble with 98 observations and 7 variables:

date: Date of the stock price observation (factor with 98 levels).
open: Opening price of the stock (numeric).
high: Highest price during the trading session (numeric).
low: Lowest price during the trading session (numeric).
close: Closing price of the stock (numeric).
volume: Number of shares traded (integer).
adj_close: Adjusted closing price of the stock (numeric).

Source

Google Stock Market Data
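
Because 'date' is stored as a factor, it usually needs converting before any time-based work. The sketch below uses a small made-up stand-in with the documented 'date' and 'close' columns; the prices are hypothetical and the "YYYY-MM-DD" date format is an assumption that should be checked against the real data.

```r
# Hypothetical stand-in; replace with the packaged data once the date format
# has been confirmed.
goog <- data.frame(
  date = factor(c("2017-01-03", "2017-01-04", "2017-01-05")),
  close = c(786.14, 786.90, 794.02)
)

# Convert via character: as.numeric() on a factor would return level codes,
# not the dates themselves.
goog$date <- as.Date(as.character(goog$date), format = "%Y-%m-%d")

# Simple day-over-day return on the closing price; the first day has none.
goog$daily_return <- c(NA, diff(goog$close) / head(goog$close, -1))
print(goog)
```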

Election results for 2010 Governor races in the U.S.

Description

The dataset name has been changed to 'govrace10_tbl_df' to avoid confusion with other packages in the R ecosystem from which datasets have been sourced. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble. The original content of the dataset has not been modified in any way.

Usage

data(govrace10_tbl_df)

Format

A tibble with 37 observations and 23 variables:

id: Identification number (numeric).
state: State name (character).
abbr: State abbreviation (character).
name1: Name of the first candidate (character).
perc1: Percentage of votes for the first candidate (numeric).
party1: Political party of the first candidate (character).
votes1: Number of votes for the first candidate (numeric).
name2: Name of the second candidate (character).
perc2: Percentage of votes for the second candidate (numeric).
party2: Political party of the second candidate (character).
votes2: Number of votes for the second candidate (numeric).
name3: Name of the third candidate (character).
perc3: Percentage of votes for the third candidate (numeric).
party3: Political party of the third candidate (character).
votes3: Number of votes for the third candidate (numeric).
name4: Name of the fourth candidate (character).
perc4: Percentage of votes for the fourth candidate (numeric).
party4: Political party of the fourth candidate (character).
votes4: Number of votes for the fourth candidate (numeric).
name5: Name of the fifth candidate (character).
perc5: Percentage of votes for the fifth candidate (numeric).
party5: Political party of the fifth candidate (character).
votes5: Number of votes for the fifth candidate (numeric).

Source

2010 Gubernatorial Races
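
The numbered columns (name1 to name5, perc1 to perc5, and so on) store one race per row. For many analyses a layout with one candidate per row is easier to work with. The sketch below shows one way to reshape such data with base R, using a made-up two-candidate stand-in; the states, names, parties and vote counts are hypothetical, and the column name 'slot' is an illustrative choice.

```r
# Hypothetical stand-in with two candidate slots instead of five.
races <- data.frame(
  id = c(1, 2),
  state = c("State A", "State B"),
  name1 = c("Candidate P", "Candidate R"),
  perc1 = c(55.0, 49.5),
  party1 = c("X", "Y"),
  votes1 = c(550, 495),
  name2 = c("Candidate Q", "Candidate S"),
  perc2 = c(45.0, 48.0),
  party2 = c("Y", "X"),
  votes2 = c(450, 480),
  stringsAsFactors = FALSE
)

slots <- 1:2  # would be 1:5 for the full set of documented columns
fields <- c("name", "perc", "party", "votes")

# One row per (race, candidate slot); each list element of 'varying' holds the
# wide columns that feed one long column.
long <- reshape(
  races,
  direction = "long",
  varying = lapply(fields, function(f) paste0(f, slots)),
  v.names = fields,
  timevar = "slot",
  times = slots,
  idvar = "id"
)
long <- long[order(long$id, long$slot), ]
print(long)
```

With the packaged tibble, converting it with 'as.data.frame()' before calling 'reshape()' is a cautious choice, since 'reshape()' was written for plain data frames.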

Homicides in nine cities in 2015

Description

The dataset name has been changed to 'homicides15_tbl_df' to avoid confusion with other packages in the R ecosystem from which datasets have been sourced. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble. The original content of the dataset has not been modified in any way.

Usage

data(homicides15_tbl_df)

Format

A tibble with 1,922 observations and 15 variables:

uid: Unique identifier (integer).
city_name: City name where the homicide occurred (character).
offense_code: Offense code (character).
offense_type: Type of offense (character).
date_single: Date of the homicide (POSIXct).
address: Location address of the homicide (character).
longitude: Longitude of the homicide location (numeric).
latitude: Latitude of the homicide location (numeric).
location_type: Type of location where the homicide occurred (character).
location_category: Category of the location (character).
fips_state: FIPS code of the state (integer).
fips_county: FIPS code of the county (character).
tract: Census tract where the homicide occurred (character).
block_group: Block group number (integer).
block: Block number (integer).

Source

2015 Homicides Data

United States House of Representatives historical make-up

Description

The dataset name has been changed to 'house_tbl_df' to avoid confusion with other packages in the R ecosystem from which datasets have been sourced. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and assists users in identifying its specific characteristics. The suffix 'tbl_df' identifies the dataset as a tibble. The original content of the dataset has not been modified in any way.

Usage

data(house_tbl_df)

Format

A tibble with 116 observations and 12 variables:

congress: Congress number (numeric).
year_start: Starting year of the congress (numeric).
year_end: Ending year of the congress (numeric).
seats: Total number of seats in the House of Representatives (numeric).
p1: Abbreviation of the first party (character).
np1: Number of seats for the first party (numeric).
p2: Abbreviation of the second party (character).
np2: Number of seats for the second party (numeric).
other: Number of seats for other parties (numeric).
vac: Number of vacant seats (numeric).
del: Number of delegate seats (numeric).
res: Number of resident commissioner seats (numeric).

Source

Historical House of Representatives Data

Election results for the 2010 U.S. House of Representatives races

Description

The dataset name has been changed to 'houserace10_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(houserace10_tbl_df)

Format

A tibble with 435 observations and 24 variables:

id: Unique race identifier (numeric).
state: Name of the state (character).
abbr: State abbreviation (character).
num: District number (numeric).
name1: Name of the first candidate (character).
perc1: Percentage of votes for the first candidate (numeric).
party1: Party affiliation of the first candidate (character).
votes1: Number of votes for the first candidate (numeric).
name2: Name of the second candidate (character).
perc2: Percentage of votes for the second candidate (numeric).
party2: Party affiliation of the second candidate (character).
votes2: Number of votes for the second candidate (numeric).
name3: Name of the third candidate (character).
perc3: Percentage of votes for the third candidate (numeric).
party3: Party affiliation of the third candidate (character).
votes3: Number of votes for the third candidate (numeric).
name4: Name of the fourth candidate (character).
perc4: Percentage of votes for the fourth candidate (numeric).
party4: Party affiliation of the fourth candidate (character).
votes4: Number of votes for the fourth candidate (numeric).
name5: Name of the fifth candidate (character).
perc5: Percentage of votes for the fifth candidate (numeric).
party5: Party affiliation of the fifth candidate (character).
votes5: Number of votes for the fifth candidate (numeric).

Source

2010 U.S. House of Representatives Election Data

Poll on illegal workers in the US

Description

The dataset name has been changed to 'immigration_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(immigration_tbl_df)

Format

A tibble with 910 observations and 2 variables:

response: Factor indicating the response to immigration-related questions, with 4 levels.
political: Factor indicating the political alignment associated with the responses, with 3 levels.

Source

Data from surveys on immigration attitudes

Legalization of Marijuana Support in 2010 California Survey

Description

The dataset name has been changed to 'leg_mari_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(leg_mari_tbl_df)

Format

A tibble with 119 observations and 1 variable:

response: Factor indicating responses related to legal marijuana, with 2 levels.

Source

Data from surveys on attitudes towards legal marijuana

New York City Marathon Times (outdated)

Description

The dataset name has been changed to 'marathon_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(marathon_tbl_df)

Format

A tibble with 59 observations and 3 variables:

year: Integer indicating the year of the marathon event.
gender: Factor indicating the gender of the participants, with 2 levels.
time: Numeric value representing the marathon completion time in hours.

Source

Data from marathon event results

US Military Demographics

Description

The dataset name has been changed to 'military_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(military_tbl_df)

Format

A tibble with an unspecified number of observations and 6 variables:

grade: Factor indicating the military grade, with 3 levels.
branch: Factor indicating the branch of the military, with 4 levels.
gender: Factor indicating the gender of the participants, with 2 levels.
race: Factor indicating the race of the participants, with 7 levels.
hisp: Logical indicating whether the participants identify as Hispanic.
rank: Integer representing the rank of the participants.

Source

Data from military personnel demographics

Minnesota High School Graduates of 1938

Description

The dataset name has been changed to 'minn38_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a data frame. The original content of the dataset has not been modified.

Usage

data(minn38_df)

Format

A data frame with 168 observations and 5 variables:

hs: Factor indicating the high school rank, with 3 levels.
phs: Factor indicating the post-high school status, with 4 levels.
fol: Factor indicating the father's occupational level, with 7 levels.
sex: Factor indicating the gender of the participants, with 2 levels.
f: Integer giving the frequency, i.e. the number of graduates in each combination of the factors above.

Source

Data from the Minnesota 1938 study

Batter Statistics for 2018 Major League Baseball (MLB) Season

Description

The dataset name has been changed to 'mlb_players_18_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(mlb_players_18_tbl_df)

Format

A tibble with 1270 observations and 19 variables:

name: Character string representing the name of the player.
team: Character string indicating the team the player belongs to.
position: Character string indicating the position played by the player.
games: Integer representing the number of games played.
AB: Integer indicating the number of at-bats.
R: Integer representing the number of runs scored.
H: Integer representing the number of hits.
doubles: Integer indicating the number of doubles hit.
triples: Integer indicating the number of triples hit.
HR: Integer representing the number of home runs hit.
RBI: Integer indicating the number of runs batted in.
walks: Integer indicating the number of walks received.
strike_outs: Integer indicating the number of strikeouts.
stolen_bases: Integer representing the number of stolen bases.
caught_stealing_base: Integer indicating the number of times caught stealing.
AVG: Numeric representing the batting average.
OBP: Numeric representing the on-base percentage.
SLG: Numeric representing the slugging percentage.
OPS: Numeric representing the on-base plus slugging percentage.

Source

Data from Major League Baseball (MLB) player statistics for the 2018 season

Minneapolis police use of force data.

Description

The dataset name has been changed to 'mn_police_use_of_force_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a data frame. The original content of the dataset has not been modified.

Usage

data(mn_police_use_of_force_df)

Format

A data frame with 12925 observations and 13 variables:

response_datetime: Character string representing the date and time of the response.
problem: Character string describing the nature of the problem.
is_911_call: Character string indicating whether the incident was initiated by a 911 call.
primary_offense: Character string indicating the primary offense involved in the incident.
subject_injury: Character string describing the injuries sustained by the subject, if any.
force_type: Character string describing the type of force used by the police.
force_type_action: Character string describing the specific actions related to the use of force.
race: Character string indicating the race of the subject involved in the incident.
sex: Character string indicating the sex of the subject.
age: Integer representing the age of the subject.
type_resistance: Character string describing the type of resistance offered by the subject.
precinct: Character string indicating the precinct in which the incident occurred.
neighborhood: Character string representing the neighborhood where the incident occurred.

Source

Data from police use-of-force reports in Minneapolis, Minnesota

NBA Players for the 2018-2019 season

Description

The dataset name has been changed to 'nba_players_19_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(nba_players_19_tbl_df)

Format

A tibble with 494 observations and 7 variables:

first_name: Character string representing the player's first name.
last_name: Character string representing the player's last name.
team: Character string indicating the name of the team.
team_abbr: Character string representing the team's abbreviation.
position: Character string indicating the player's position on the team.
number: Character string representing the player's jersey number.
height: Numeric value representing the player's height.

Source

Data from NBA players' statistics in 2019

North Carolina births, 1000 cases

Description

The dataset name has been changed to 'ncbirths_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(ncbirths_tbl_df)

Format

A tibble with 1000 observations and 13 variables:

fage: Integer representing the father's age.
mage: Integer representing the mother's age.
mature: Factor with 2 levels indicating if the mother is mature (>=35 years).
weeks: Integer representing the number of gestation weeks.
premie: Factor with 2 levels indicating if the baby was born prematurely.
visits: Integer representing the number of prenatal visits.
marital: Factor with 2 levels indicating the marital status of the mother.
gained: Integer representing the mother's weight gain during pregnancy (in pounds).
weight: Numeric value representing the baby's birth weight (in pounds).
lowbirthweight: Factor with 2 levels indicating if the baby was born with low birth weight.
gender: Factor with 2 levels indicating the baby's gender.
habit: Factor with 2 levels indicating if the mother has a smoking habit.
whitemom: Factor with 2 levels indicating if the mother is white.

Source

Data from birth records in North Carolina

New York City Marathon Times

Description

The dataset name has been changed to 'nyc_marathon_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(nyc_marathon_tbl_df)

Format

A tibble with 102 observations and 7 variables:

year: Numeric value representing the year the marathon took place.
name: Character value representing the name of the runner.
country: Character value indicating the country of origin of the runner.
time: Time variable in 'hms' format representing the finish time of the runner.
time_hrs: Numeric value representing the finish time of the runner in hours.
division: Character value indicating the division (category) the runner participated in.
note: Character value containing additional notes, if any, about the runner or the race.

Source

Data from the New York City Marathon records

Thefts of motor vehicles 2014 to 2017

Description

The dataset name has been changed to 'nycvehiclethefts_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(nycvehiclethefts_tbl_df)

Format

A tibble with 35,746 observations and 9 variables:

uid: Integer value representing a unique identifier for each vehicle theft incident.
date_single: Character value representing the single date of the theft incident.
date_start: Character value representing the start date of the theft incident.
date_end: Character value representing the end date of the theft incident.
longitude: Numeric value indicating the longitude where the incident occurred.
latitude: Numeric value indicating the latitude where the incident occurred.
location_type: Character value representing the type of location where the theft took place.
location_category: Character value indicating the category of the location.
census_block: Character value indicating the census block where the incident took place.

Source

Data from the New York City Vehicle Thefts records

California poll on drilling off the California coast

Description

The dataset name has been changed to 'offshore_drilling_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(offshore_drilling_tbl_df)

Format

A tibble with 828 observations and 2 variables:

v1: Factor with 4 levels, representing different responses or categories related to offshore drilling.
v2: Factor with 3 levels, representing secondary categories or classifications related to the responses in v1.

Source

Data related to offshore drilling opinions or classifications

1986 Challenger disaster and O-rings

Description

The dataset name has been changed to 'orings_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(orings_tbl_df)

Format

A tibble with 23 observations and 4 variables:

mission: Integer representing the mission number.
temperature: Integer representing the launch temperature in Fahrenheit.
damaged: Integer representing the number of damaged O-rings in the mission.
undamaged: Numeric representing the number of undamaged O-rings in the mission.

Source

Data from NASA missions related to O-ring performance.

Oscar winners, 1929 to 2018

Description

The dataset name has been changed to 'oscars_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(oscars_tbl_df)

Format

A tibble with 184 observations and 11 variables:

oscar_no: Numeric indicating the Oscar number.
oscar_yr: Numeric representing the year the Oscar was awarded.
award: Character string indicating the category of the award.
name: Character string with the name of the recipient.
movie: Character string indicating the movie for which the award was given.
age: Numeric indicating the age of the recipient at the time of the award.
birth_pl: Character string indicating the birthplace of the recipient.
birth_date: Date representing the birthdate of the recipient.
birth_mo: Numeric indicating the birth month.
birth_d: Numeric indicating the birth day.
birth_y: Numeric indicating the birth year.

Source

Data from historical Oscar award records.

Piracy and PIPA/SOPA

Description

The dataset name has been changed to 'piracy_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(piracy_tbl_df)

Format

A tibble with 534 observations and 8 variables:

name: Character string indicating the name of the politician.
party: Factor with 3 levels representing the politician's party affiliation.
state: Factor with 50 levels indicating the U.S. state the politician represents.
money_pro: Numeric representing campaign contributions received from groups generally in favor of the PIPA/SOPA legislation.
money_con: Numeric representing campaign contributions received from groups generally opposed to the PIPA/SOPA legislation.
years: Integer indicating the number of years in office.
stance: Factor with 5 levels indicating the politician's stance on the legislation.
chamber: Factor with 2 levels indicating the chamber of the U.S. Congress (House or Senate).

Source

Data on political stances and funding related to piracy.

Annual Precipitation in US Cities

Description

The dataset name has been changed to 'precip_numeric' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a numeric vector. The original content of the dataset has not been modified.

Usage

data(precip_numeric)

Format

A numeric vector with 70 observations representing average annual precipitation (in inches) for various cities in the United States.

Mobile: Numeric value representing the average annual precipitation in Mobile.
Juneau: Numeric value representing the average annual precipitation in Juneau.
Phoenix: Numeric value representing the average annual precipitation in Phoenix.
Los Angeles: Numeric value representing the average annual precipitation in Los Angeles.
...: Additional cities included in the dataset.
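
Because the vector is named by city, values can be looked up by name instead of by position. A minimal sketch, assuming the object mirrors base R's 'precip' (the fallback below is only used when 'usdatasets' is not installed):

```r
# Load the dataset; fall back to datasets::precip (assumed to be the same data)
# when 'usdatasets' is not installed.
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("precip_numeric", package = "usdatasets")
  rain <- precip_numeric
} else {
  rain <- datasets::precip
}

# Elements are addressed by city name.
print(rain[c("Mobile", "Phoenix")])

# The five cities with the highest average annual precipitation.
print(head(sort(rain, decreasing = TRUE), 5))

# Name of the city with the lowest average annual precipitation.
driest <- names(rain)[which.min(rain)]
print(driest)
```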

Source

Data on precipitation for various U.S. cities.

Quarterly Approval Ratings of US Presidents

Description

The dataset name has been changed to 'presidents_ts' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a time series object. The original content of the dataset has not been modified.

Usage

data(presidents_ts)

Format

A time series object with 120 observations, covering quarterly data from the first quarter of 1945 to the last quarter of 1974. Each observation is the president's approval rating for that quarter. The data is structured as follows:

Qtr1: Numeric values representing the approval ratings for the first quarter.
Qtr2: Numeric values representing the approval ratings for the second quarter.
Qtr3: Numeric values representing the approval ratings for the third quarter.
Qtr4: Numeric values representing the approval ratings for the fourth quarter.
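
The Qtr1 to Qtr4 columns are how R prints a quarterly 'ts' object; the underlying data are a single numeric series with a time index. A minimal sketch of working with that index, assuming the object mirrors base R's 'presidents' (the fallback below is only used when 'usdatasets' is not installed):

```r
# Load the dataset; fall back to datasets::presidents (assumed to be the same
# data) when 'usdatasets' is not installed.
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("presidents_ts", package = "usdatasets")
  pres <- presidents_ts
} else {
  pres <- datasets::presidents
}

# Inspect the time index: first period, last period, observations per year.
print(start(pres))
print(end(pres))
print(frequency(pres))

# Extract the four quarters of a single year.
pres_1960 <- window(pres, start = c(1960, 1), end = c(1960, 4))
print(pres_1960)

# Count missing ratings, then average by quarter while ignoring them.
n_missing <- sum(is.na(pres))
qtr_means <- tapply(as.numeric(pres), as.integer(cycle(pres)), mean, na.rm = TRUE)
print(qtr_means)
```

'cycle()' returns the quarter (1 to 4) of each observation, which is what groups the values into the Qtr1 to Qtr4 columns.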

Source

Data on presidential approval ratings from 1945 to 1974.

Election results for the 2008 U.S. Presidential race

Description

The dataset name has been changed to 'prrace08_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(prrace08_tbl_df)

Format

A tibble with 51 observations and 7 variables:

state: Factor giving the abbreviation of the U.S. state (including Washington D.C.) where the election took place.
state_full: Factor providing the full name of the U.S. state corresponding to the abbreviation.
n_obama: Integer representing the number of votes received by Barack Obama in the state.
p_obama: Numeric representing the percentage of total votes received by Barack Obama in the state.
n_mc_cain: Integer representing the number of votes received by John McCain in the state.
p_mc_cain: Numeric representing the percentage of total votes received by John McCain in the state.
el_votes: Integer indicating the number of electoral votes allocated to the state.

Source

Data on the 2008 U.S. presidential race results by state.

Road Accident Deaths in US States

Description

The dataset name has been changed to 'road_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a data frame. The original content of the dataset has not been modified.

Usage

data(road_df)

Format

A data frame with 26 observations and 6 variables:

deaths: Integer indicating the number of road deaths.
drivers: Integer representing the number of drivers (in units of 10,000).
popden: Numeric indicating the population density (people per square mile).
rural: Numeric indicating the length of rural roads (in thousands of miles).
temp: Integer representing the average daily maximum temperature in January (in degrees Fahrenheit).
fuel: Numeric indicating the fuel consumption (in units of 10,000,000 US gallons per year).

Source

Data on road safety statistics, including deaths, drivers, population density, and environmental factors.

Election results for the 2010 U.S. Senate races

Description

The dataset name has been changed to 'senaterace10_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(senaterace10_tbl_df)

Format

A tibble with 38 observations and 23 variables:

id: Numeric identifier for the election race.
state: Character string indicating the U.S. state where the election took place.
abbr: Character string representing the state abbreviation.
name1: Character string indicating the name of the first candidate.
perc1: Numeric indicating the percentage of votes received by the first candidate.
party1: Character string indicating the party affiliation of the first candidate.
votes1: Numeric indicating the total votes received by the first candidate.
name2: Character string indicating the name of the second candidate.
perc2: Numeric indicating the percentage of votes received by the second candidate.
party2: Character string indicating the party affiliation of the second candidate.
votes2: Numeric indicating the total votes received by the second candidate.
name3: Character string indicating the name of the third candidate.
perc3: Numeric indicating the percentage of votes received by the third candidate.
party3: Character string indicating the party affiliation of the third candidate.
votes3: Numeric indicating the total votes received by the third candidate.
name4: Character string indicating the name of the fourth candidate.
perc4: Numeric indicating the percentage of votes received by the fourth candidate.
party4: Character string indicating the party affiliation of the fourth candidate.
votes4: Numeric indicating the total votes received by the fourth candidate.
name5: Character string indicating the name of the fifth candidate.
perc5: Numeric indicating the percentage of votes received by the fifth candidate.
party5: Character string indicating the party affiliation of the fifth candidate.
votes5: Numeric indicating the total votes received by the fifth candidate.

Source

Data on U.S. Senate races held in 2010, including candidates' names, vote percentages, and party affiliations.

Daily observations for the S&P 500 - Historical Data (1950-2018)

Description

The dataset name has been changed to 'sp500_1950_2018_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(sp500_1950_2018_tbl_df)

Format

A tibble with 17346 observations and 7 variables:

Date: Factor indicating the date of the recorded stock prices.
Open: Numeric representing the opening price of the stock.
High: Numeric representing the highest price of the stock during the day.
Low: Numeric representing the lowest price of the stock during the day.
Close: Numeric representing the closing price of the stock.
Adj.Close: Numeric representing the adjusted closing price of the stock.
Volume: Numeric representing the trading volume of the stock.

Source

Historical data on S&P 500 stock prices from 1950 to 2018.

Financial information for 50 S&P 500 companies

Description

The dataset name has been changed to 'sp500_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(sp500_tbl_df)

Format

A tibble with 50 observations and 12 variables:

stock: Factor indicating the stock ticker symbol of the company.
market_cap: Numeric representing the market capitalization of the company.
ent_value: Numeric representing the enterprise value of the company.
trail_pe: Numeric representing the trailing price-to-earnings ratio.
forward_pe: Numeric representing the forward price-to-earnings ratio.
ev_over_rev: Numeric representing the enterprise value to revenue ratio.
profit_margin: Numeric representing the profit margin of the company.
revenue: Numeric representing the total revenue generated by the company.
growth: Numeric representing the growth rate of the company.
earn_before: Numeric representing the earnings before interest and taxes (EBIT).
cash: Numeric representing the cash holdings of the company.
debt: Numeric representing the total debt of the company.

Source

Data on S&P 500 companies, including financial metrics and ratios.

US State Facts and Figures - U.S. State Abbreviations

Description

The dataset name has been changed to 'state_abb_character' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a character vector. The original content of the dataset has not been modified.

Usage

data(state_abb_character)

Format

A character vector with 50 elements representing U.S. state abbreviations:

state_abb: Character vector of state abbreviations, e.g., "AL" for Alabama, "CA" for California.

Source

U.S. state abbreviations.

US State Facts and Figures - US State Areas

Description

The dataset name has been changed to 'state_area_numeric' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a numeric vector. The original content of the dataset has not been modified.

Usage

data(state_area_numeric)

Format

A numeric vector with 50 elements representing the area of U.S. states in square miles:

state_area: Numeric values indicating the area of each state, measured in square miles.

Source

U.S. state areas.

US State Facts and Figures - US State Centers

Description

The dataset name has been changed to 'state_center_list' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a list. The original content of the dataset has not been modified.

Usage

data(state_center_list)

Format

A list with 2 elements, each containing numeric values representing the geographical center coordinates of U.S. states:

x: Numeric vector of length 50 representing the x-coordinates (longitude) of the state centers.
y: Numeric vector of length 50 representing the y-coordinates (latitude) of the state centers.
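
The two coordinate vectors carry no state labels of their own. A minimal sketch of pairing them with the state names, assuming both objects are in the same (alphabetical) state order as base R's 'state.center' and 'state.name', which they are assumed to mirror (the fallback below is only used when 'usdatasets' is not installed):

```r
# Load the datasets; fall back to the base R objects (assumed to be the same
# data) when 'usdatasets' is not installed.
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("state_center_list", package = "usdatasets")
  data("state_name_character", package = "usdatasets")
  ctr <- state_center_list
  nm <- state_name_character
} else {
  ctr <- datasets::state.center
  nm <- datasets::state.name
}

# Combine names and coordinates into one data frame, one row per state.
centers <- data.frame(state = nm, longitude = ctr$x, latitude = ctr$y)
print(head(centers))

# Look up the center of a single state by name.
print(centers[centers$state == "Ohio", ])
```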

Source

Geographical data for U.S. state centers.

US State Facts and Figures - US State Divisions

Description

The dataset name has been changed to 'state_division_factor' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a factor. The original content of the dataset has not been modified.

Usage

data(state_division_factor)

Format

A factor with 50 observations representing the divisions of U.S. states. It contains 9 levels:

East South Central: Division including Alabama, Kentucky, Mississippi, and Tennessee.
Pacific: Division including Alaska, California, Hawaii, Oregon, and Washington.
Mountain: Division including Arizona, Colorado, Idaho, Montana, Nevada, New Mexico, Utah, and Wyoming.
West South Central: Division including Arkansas, Louisiana, Oklahoma, and Texas.
New England: Division including Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont.
South Atlantic: Division including Delaware, Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, and West Virginia (Washington, D.C. belongs to this Census division but is not among the 50 observations).
East North Central: Division including Illinois, Indiana, Michigan, Ohio, and Wisconsin.
West North Central: Division including Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, and South Dakota.
Middle Atlantic: Division including New Jersey, New York, and Pennsylvania.
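
The factor stores one division label per state, so the membership of each division can be recovered by pairing it with the state names. A minimal sketch, assuming both objects are in the same state order as base R's 'state.division' and 'state.name', which they are assumed to mirror (the fallback below is only used when 'usdatasets' is not installed):

```r
# Load the datasets; fall back to the base R objects (assumed to be the same
# data) when 'usdatasets' is not installed.
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("state_division_factor", package = "usdatasets")
  data("state_name_character", package = "usdatasets")
  div <- state_division_factor
  nm <- state_name_character
} else {
  div <- datasets::state.division
  nm <- datasets::state.name
}

# Number of states in each of the nine divisions.
print(table(div))

# List the member states of every division, then show one of them.
by_division <- split(nm, div)
print(by_division[["Pacific"]])
```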

Source

U.S. Census Bureau regional divisions.

US State Facts and Figures - US State Names

Description

The dataset name has been changed to 'state_name_character' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a character vector. The original content of the dataset has not been modified.

Usage

data(state_name_character)

Format

A character vector with 50 observations representing the names of U.S. states.

"Alabama": Name of the state.
"Alaska": Name of the state.
"Arizona": Name of the state.
"Arkansas": Name of the state.
"California": Name of the state.
"Colorado": Name of the state.
"Connecticut": Name of the state.
"Delaware": Name of the state.
"Florida": Name of the state.
"Georgia": Name of the state.
"Hawaii": Name of the state.
"Idaho": Name of the state.
"Illinois": Name of the state.
"Indiana": Name of the state.
"Iowa": Name of the state.
"Kansas": Name of the state.
"Kentucky": Name of the state.
"Louisiana": Name of the state.
"Maine": Name of the state.
"Maryland": Name of the state.
"Massachusetts": Name of the state.
"Michigan": Name of the state.
"Minnesota": Name of the state.
"Mississippi": Name of the state.
"Missouri": Name of the state.
"Montana": Name of the state.
"Nebraska": Name of the state.
"Nevada": Name of the state.
"New Hampshire": Name of the state.
"New Jersey": Name of the state.
"New Mexico": Name of the state.
"New York": Name of the state.
"North Carolina": Name of the state.
"North Dakota": Name of the state.
"Ohio": Name of the state.
"Oklahoma": Name of the state.
"Oregon": Name of the state.
"Pennsylvania": Name of the state.
"Rhode Island": Name of the state.
"South Carolina": Name of the state.
"South Dakota": Name of the state.
"Tennessee": Name of the state.
"Texas": Name of the state.
"Utah": Name of the state.
"Vermont": Name of the state.
"Virginia": Name of the state.
"Washington": Name of the state.
"West Virginia": Name of the state.
"Wisconsin": Name of the state.
"Wyoming": Name of the state.

Source

U.S. Census Bureau.

US State Facts and Figures - US State Regions

Description

The dataset name has been changed to 'state_region_factor' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a factor variable representing U.S. state regions.

Usage

data(state_region_factor)

Format

A factor variable with 50 observations, representing the region of each U.S. state. The regions are classified into four levels:

"Northeast": States located in the Northeast region.
"South": States located in the Southern region.
"North Central": States located in the North Central region.
"West": States located in the Western region.

Source

U.S. Census Bureau.

US State Facts and Figures - US State Demographics and Statistics (1977)

Description

The dataset name has been changed to 'state_x77_matrix' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a matrix variable representing various demographic and statistical attributes of U.S. states in 1977.

Usage

data(state_x77_matrix)

Format

A matrix with 50 rows and 8 columns representing various demographic and statistical characteristics of U.S. states. The columns include:

Population: Population estimate of the state (in thousands).
Income: Per capita income of the state's residents.
Illiteracy: Illiteracy rate (percentage).
Life Exp: Life expectancy (in years).
Murder: Murder rate (per 100,000 inhabitants).
HS Grad: High school graduation rate (percentage).
Frost: Number of days with frost.
Area: Total area of the state (in square miles).
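
Since the object is a matrix rather than a data frame, columns are selected with bracket indexing and the state names live in the row names. A minimal sketch, assuming the object mirrors base R's 'state.x77' (the fallback below is only used when 'usdatasets' is not installed):

```r
# Load the dataset; fall back to datasets::state.x77 (assumed to be the same
# data) when 'usdatasets' is not installed.
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("state_x77_matrix", package = "usdatasets")
  x77 <- state_x77_matrix
} else {
  x77 <- datasets::state.x77
}

# Select rows by state name and columns by column name.
print(x77[c("Texas", "Ohio"), c("Population", "Income")])

# Column names containing a space must be quoted; `$` does not work on a matrix.
life <- x77[, "Life Exp"]
print(head(sort(life, decreasing = TRUE), 5))

# Convert to a data frame with the state as an explicit column.
# data.frame() rewrites "Life Exp" and "HS Grad" as Life.Exp and HS.Grad.
x77_df <- data.frame(state = rownames(x77), x77)
print(names(x77_df))
```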

Source

U.S. Census Bureau (1977).

Student Admissions at UC Berkeley

Description

The dataset name has been changed to 'UCBAdmissions_table' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a table object. The original content of the dataset has not been modified.

Usage

data(UCBAdmissions_table)

Format

A 3-dimensional table object (2 x 2 x 6, 24 cells) representing the admissions data at U.C. Berkeley, cross-classified by:

Admit: A factor with levels "Admitted" and "Rejected".
Gender: A factor with levels "Male" and "Female".
Dept: A factor representing the department with levels "A", "B", "C", "D", "E", and "F".
values: Numeric counts of applicants for each combination of admission status, gender, and department.
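
The three factors are the dimensions of the table, so summaries are obtained by collapsing or conditioning on dimensions. A minimal sketch, assuming the object mirrors base R's 'UCBAdmissions' with dimensions ordered Admit, Gender, Dept (the fallback below is only used when 'usdatasets' is not installed):

```r
# Load the dataset; fall back to datasets::UCBAdmissions (assumed to be the
# same data) when 'usdatasets' is not installed.
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("UCBAdmissions_table", package = "usdatasets")
  ucb <- UCBAdmissions_table
} else {
  ucb <- datasets::UCBAdmissions
}

# Collapse over departments: applicant counts by admission status and gender.
by_gender <- margin.table(ucb, c(1, 2))
print(by_gender)

# Overall admission and rejection shares within each gender (columns sum to 1).
overall_rate <- prop.table(by_gender, margin = 2)
print(round(overall_rate, 3))

# Admission share within each gender and department.
dept_rate <- prop.table(ucb, margin = c(2, 3))["Admitted", , ]
print(round(dept_rate, 3))
```

Comparing 'overall_rate' with 'dept_rate' shows how the aggregated and per-department views of the same counts can differ.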

Source

U.C. Berkeley admissions data from 1973.

US Crime Rates

Description

The dataset 'us_crime_rates_spec_tbl_df' contains crime statistics for the United States, including various types of crimes and population data for each year. This dataset is structured as a tibble for ease of use within the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package.

Usage

data(us_crime_rates_spec_tbl_df)

Format

A tibble with 60 rows and 12 columns:

year: Numeric year of the recorded data, e.g., 2000, 2001.
population: Numeric population total for the respective year.
total: Numeric total number of crimes reported.
violent: Numeric total number of violent crimes.
property: Numeric total number of property crimes.
murder: Numeric total number of murders.
forcible_rape: Numeric total number of forcible rapes.
robbery: Numeric total number of robberies.
aggravated_assault: Numeric total number of aggravated assaults.
burglary: Numeric total number of burglaries.
larceny_theft: Numeric total number of larcenies.
vehicle_theft: Numeric total number of vehicle thefts.

Source

Federal Bureau of Investigation (FBI) Uniform Crime Reporting (UCR) Program.

US Temperature Data

Description

The dataset 'us_temp_tbl_df' contains temperature records from various weather stations across the United States, providing both maximum and minimum temperature readings. This dataset is structured as a tibble for ease of use within the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package.

Usage

data(us_temp_tbl_df)

Format

A tibble with 10,118 rows and 9 columns:

station: Character string representing the weather station identifier.
name: Character string for the name of the weather station.
latitude: Numeric value for the latitude of the weather station.
longitude: Numeric value for the longitude of the weather station.
elevation: Numeric value for the elevation of the weather station in meters.
date: Date of the recorded temperature data.
tmax: Numeric value for the maximum temperature recorded (in degrees Celsius).
tmin: Numeric value for the minimum temperature recorded (in degrees Celsius).
year: Factor representing the year of the recorded data.

Source

National Oceanic and Atmospheric Administration (NOAA).

American Time Survey 2009 - 2019

Description

The dataset name has been changed to 'us_time_survey_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a tibble. The original content of the dataset has not been modified.

Usage

data(us_time_survey_tbl_df)

Format

A tibble with 11 observations and 8 variables representing time use in various activities:

year: Numeric value representing the year of the survey.
household_activities: Numeric value representing time spent on household activities (in hours).
eating_and_drinking: Numeric value representing time spent on eating and drinking (in hours).
leisure_and_sports: Numeric value representing time spent on leisure and sports activities (in hours).
sleeping: Numeric value representing time spent sleeping (in hours).
caring_children: Numeric value representing time spent caring for children (in hours).
working_employed: Numeric value representing time spent working while employed (in hours).
working_employed_days_worked: Numeric value representing the number of days worked while employed.

Source

U.S. Bureau of Labor Statistics.

Accidental Deaths in the US 1973-1978

Description

The dataset name has been changed to 'USAccDeaths_ts' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a time series object. The original content of the dataset has not been modified.

Usage

data(USAccDeaths_ts)

Format

A time series object with 72 observations representing monthly accidental deaths in the U.S. from 1973 to 1978:

years: A numeric vector representing the years from 1973 to 1978.
months: A character vector representing the months from January to December.
deaths: Numeric values representing the number of accidental deaths for each month.

Source

U.S. accidental deaths data.
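
Example

The following is a minimal sketch of how a monthly time series object like this one can be inspected and summarised. It assumes that 'USAccDeaths_ts' is identical to the base R dataset 'USAccDeaths' (the description states the content is unmodified) and falls back to that copy when 'usdatasets' is not installed. The variable names 'deaths', 'deaths_1973', and 'yearly' are illustrative only.

```r
# Use the package copy if 'usdatasets' is installed; otherwise fall back to
# the base R dataset (assumed to hold the same values).
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("USAccDeaths_ts", package = "usdatasets")
  deaths <- USAccDeaths_ts
} else {
  deaths <- datasets::USAccDeaths
}

# A ts object keeps its time index as attributes rather than as columns.
start(deaths)      # first (year, month) pair
end(deaths)        # last (year, month) pair
frequency(deaths)  # observations per year

# Extract one calendar year and total it.
deaths_1973 <- window(deaths, start = c(1973, 1), end = c(1973, 12))
total_1973 <- sum(deaths_1973)

# Collapse the monthly series to yearly totals.
yearly <- aggregate(deaths, nfrequency = 1, FUN = sum)
```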

Violent Crime Rates by US State

Description

The dataset name has been changed to 'USArrests_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a data frame. The original content of the dataset has not been modified.

Usage

data(USArrests_df)

Format

A data frame with 50 observations and 4 variables representing the rates of arrests in the U.S.:

Murder: Numeric vector representing murder arrests per 100,000 residents.
Assault: Integer vector representing assault arrests per 100,000 residents.
UrbanPop: Integer vector representing the percentage of the population living in urban areas.
Rape: Numeric vector representing rape arrests per 100,000 residents.

Source

U.S. arrests data from 1973.
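
Example

A short sketch of typical first steps with this data frame: ranking states by one arrest rate and relating the rates to urbanisation. It assumes 'USArrests_df' matches the base R dataset 'USArrests', where state names are stored as row names, and falls back to that copy when 'usdatasets' is not installed. The names 'arrests', 'top_murder', and 'cor_urban' are illustrative.

```r
# Use the package copy if available; otherwise the base R dataset
# (assumed to be the same data).
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("USArrests_df", package = "usdatasets")
  arrests <- USArrests_df
} else {
  arrests <- datasets::USArrests
}

# Five rows with the highest murder arrest rate, in descending order.
top_murder <- head(arrests[order(-arrests$Murder), ], 5)

# Correlation of each arrest rate with the urban population percentage.
cor_urban <- sapply(arrests[c("Murder", "Assault", "Rape")],
                    cor, y = arrests$UrbanPop)
```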

Distances Between European Cities and Between US Cities

Description

The dataset name has been changed to 'UScitiesD_dist' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a distance object. The original content of the dataset has not been modified.

Usage

data(UScitiesD_dist)

Format

A distance object containing the distances (in miles) between selected U.S. cities:

Atlanta: Distance from Atlanta to other cities.
Chicago: Distance from Chicago to other cities.
Denver: Distance from Denver to other cities.
Houston: Distance from Houston to other cities.
LosAngeles: Distance from Los Angeles to other cities.
Miami: Distance from Miami to other cities.
NewYork: Distance from New York to other cities.
SanFrancisco: Distance from San Francisco to other cities.
Seattle: Distance from Seattle to other cities.
Washington.DC: Distance from Washington D.C. to other cities.

Source

U.S. cities distance data.
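
Example

A distance object stores only the lower triangle of the pairwise distances, so individual pairs are easiest to look up after converting it to a full matrix. The sketch below assumes 'UScitiesD_dist' matches the base R dataset 'UScitiesD' and falls back to that copy when 'usdatasets' is not installed. The names 'd', 'm', and 'nearest' are illustrative.

```r
# Use the package copy if available; otherwise the base R dataset
# (assumed to be the same data).
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("UScitiesD_dist", package = "usdatasets")
  d <- UScitiesD_dist
} else {
  d <- datasets::UScitiesD
}

# Expand to a symmetric matrix so a pair can be indexed by city label.
m <- as.matrix(d)
m["Atlanta", "Chicago"]

# Nearest other city for each city: mask the zero diagonal first.
m_masked <- m
diag(m_masked) <- Inf
nearest <- colnames(m_masked)[apply(m_masked, 1, which.min)]
names(nearest) <- rownames(m_masked)
```

Because the result is a standard dist object, it can also be passed directly to functions such as hclust() or cmdscale().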

usdatasets: A Comprehensive Collection of U.S. Datasets

Description

This package provides a wide variety of datasets related to crime, economy, society, politics, and sports within the United States for testing, learning, and research purposes.

Details

usdatasets: A Comprehensive Collection of U.S. Datasets

A Comprehensive Collection of U.S. Datasets.

Author(s)

Maintainer: Renzo Cáceres Rossi [email protected]

Lawyers' Ratings of State Judges in the US Superior Court

Description

The dataset name has been changed to 'USJudgeRatings_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a data frame. The original content of the dataset has not been modified.

Usage

data(USJudgeRatings_df)

Format

A data frame with 43 observations and 12 variables representing lawyers' ratings of U.S. state judges:

CONT: Numeric vector representing the number of contacts of the lawyer with the judge.
INTG: Numeric vector representing the judges' ratings on judicial integrity.
DMNR: Numeric vector representing the judges' ratings on demeanor.
DILG: Numeric vector representing the judges' ratings on diligence.
CFMG: Numeric vector representing the judges' ratings on case flow managing.
DECI: Numeric vector representing the judges' ratings on prompt decisions.
PREP: Numeric vector representing the judges' ratings on preparation for trial.
FAMI: Numeric vector representing the judges' ratings on familiarity with law.
ORAL: Numeric vector representing the judges' ratings on sound oral rulings.
WRIT: Numeric vector representing the judges' ratings on sound written rulings.
PHYS: Numeric vector representing the judges' ratings on physical ability.
RTEN: Numeric vector representing the judges' ratings on being worthy of retention.

Source

U.S. judge ratings data.
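
Example

Since all 12 variables are numeric, the data frame can be summarised column by column and passed directly to cor(). The sketch below assumes 'USJudgeRatings_df' matches the base R dataset 'USJudgeRatings' and falls back to that copy when 'usdatasets' is not installed. The names 'ratings', 'col_means', and 'cor_with_rten' are illustrative.

```r
# Use the package copy if available; otherwise the base R dataset
# (assumed to be the same data).
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("USJudgeRatings_df", package = "usdatasets")
  ratings <- USJudgeRatings_df
} else {
  ratings <- datasets::USJudgeRatings
}

# Mean of each variable across all judges.
col_means <- colMeans(ratings)

# Correlation of every variable with RTEN, strongest first.
# RTEN correlates perfectly with itself, so it heads the list.
cor_with_rten <- sort(cor(ratings)[, "RTEN"], decreasing = TRUE)
```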

Personal Expenditure Data

Description

The dataset name has been changed to 'USPersonalExpenditure_matrix' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a matrix. The original content of the dataset has not been modified.

Usage

data(USPersonalExpenditure_matrix)

Format

A matrix with 5 rows (expenditure categories) and 5 columns (years) representing U.S. personal expenditures, in billions of dollars, over selected years:

Food and Tobacco: Numeric values representing expenditures on food and tobacco for the years 1940, 1945, 1950, 1955, and 1960.
Household Operation: Numeric values representing expenditures on household operations for the years 1940, 1945, 1950, 1955, and 1960.
Medical and Health: Numeric values representing expenditures on medical and health services for the years 1940, 1945, 1950, 1955, and 1960.
Personal Care: Numeric values representing expenditures on personal care for the years 1940, 1945, 1950, 1955, and 1960.
Private Education: Numeric values representing expenditures on private education for the years 1940, 1945, 1950, 1955, and 1960.

Source

U.S. personal expenditure data.
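
Example

A matrix with named rows and columns can be indexed by those names. The sketch below assumes 'USPersonalExpenditure_matrix' matches the base R dataset 'USPersonalExpenditure', with categories as row names and years as column names, and falls back to that copy when 'usdatasets' is not installed. The names 'spend', 'shares', and 'growth' are illustrative.

```r
# Use the package copy if available; otherwise the base R dataset
# (assumed to be the same data).
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("USPersonalExpenditure_matrix", package = "usdatasets")
  spend <- USPersonalExpenditure_matrix
} else {
  spend <- datasets::USPersonalExpenditure
}

# Look up one cell by category (row) and year (column).
spend["Medical and Health", "1960"]

# Share of each category within each year (columns sum to 1).
shares <- prop.table(spend, margin = 2)

# Ratio of 1960 to 1940 spending for each category.
growth <- spend[, "1960"] / spend[, "1940"]
```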

Populations Recorded by the US Census

Description

The dataset name has been changed to 'uspop_ts' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a time series object. The original content of the dataset has not been modified.

Usage

data(uspop_ts)

Format

A time series object with 19 observations representing the U.S. population from 1790 to 1970:

values: Numeric vector containing the population values in millions.

Source

U.S. Census Bureau.
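
Example

Unlike a monthly series, this series has one observation per decade, so its frequency is below 1. The sketch below recovers the census years and computes decade-over-decade growth. It assumes 'uspop_ts' matches the base R dataset 'uspop' and falls back to that copy when 'usdatasets' is not installed. The names 'pop', 'census_years', and 'growth_pct' are illustrative.

```r
# Use the package copy if available; otherwise the base R dataset
# (assumed to be the same data).
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("uspop_ts", package = "usdatasets")
  pop <- uspop_ts
} else {
  pop <- datasets::uspop
}

# Census years are recovered from the time index of the series.
census_years <- as.numeric(time(pop))
frequency(pop)  # observations per year; one per decade

# Percentage growth from each census to the next.
values <- as.numeric(pop)
growth_pct <- 100 * (values[-1] / values[-length(values)] - 1)
names(growth_pct) <- census_years[-1]
```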

Death Rates in Virginia (1940)

Description

The dataset name has been changed to 'VADeaths_matrix' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a matrix. The original content of the dataset has not been modified.

Usage

data(VADeaths_matrix)

Format

A matrix with 5 rows (age groups) and 4 columns containing mortality rates (per 1,000 population) for different demographic groups in Virginia in 1940:

Rural Male: Mortality rates for rural males by age group.
Rural Female: Mortality rates for rural females by age group.
Urban Male: Mortality rates for urban males by age group.
Urban Female: Mortality rates for urban females by age group.

Source

Virginia mortality data.
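
Example

Many plotting and modelling functions expect one row per observation rather than a wide matrix. The sketch below reshapes the matrix into a long data frame. It assumes 'VADeaths_matrix' matches the base R dataset 'VADeaths', with age groups as row names, and falls back to that copy when 'usdatasets' is not installed. The names 'va', 'long', 'age_group', 'group', and 'rate' are illustrative.

```r
# Use the package copy if available; otherwise the base R dataset
# (assumed to be the same data).
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("VADeaths_matrix", package = "usdatasets")
  va <- VADeaths_matrix
} else {
  va <- datasets::VADeaths
}

# One row per (age group, demographic group) combination.
long <- as.data.frame(as.table(va), responseName = "rate")
names(long)[1:2] <- c("age_group", "group")

# Mean rate per demographic group across the age groups.
group_means <- colMeans(va)
```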

US Voter Turnout Data.

Description

The dataset name has been changed to 'voter_count_spec_tbl_df' to avoid confusion with other packages in the R ecosystem. This naming convention helps distinguish this dataset as part of the 'usdatasets' package and identifies it as a special tibble. The original content of the dataset has not been modified.

Usage

data(voter_count_spec_tbl_df)

Format

A special tibble (class 'spec_tbl_df') containing voting statistics across different years and regions:

year: Year of the election.
region: Region of the voters.
voting_eligible_population: Total population eligible to vote.
total_ballots_counted: Total number of ballots counted.
highest_office: Total votes for the highest office.
percent_total_ballots_counted: Percentage of total ballots counted.
percent_highest_office: Percentage of votes for the highest office.

Source

Election data from various sources.

Average Heights and Weights for American Women

Description

The dataset name has been kept as 'women_df' to maintain consistency with other datasets in the R ecosystem. This naming convention helps clearly identify this dataset within the context of its application. The original content of the dataset has not been modified.

Usage

data(women_df)

Format

A data frame with 2 variables containing measurements of women's height and weight:

height: Height of women in inches.
weight: Weight of women in pounds.

Source

Based on statistical data for women's height and weight.
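
Example

With only two numeric variables, this data frame is a natural fit for a simple linear regression. The sketch below assumes 'women_df' matches the base R dataset 'women' and falls back to that copy when 'usdatasets' is not installed. The names 'w', 'fit', and 'pred_65' are illustrative, and the height of 65 inches is an arbitrary input.

```r
# Use the package copy if available; otherwise the base R dataset
# (assumed to be the same data).
if (requireNamespace("usdatasets", quietly = TRUE)) {
  data("women_df", package = "usdatasets")
  w <- women_df
} else {
  w <- datasets::women
}

# Regress weight (pounds) on height (inches).
fit <- lm(weight ~ height, data = w)
coef(fit)

# Predicted weight for a hypothetical height of 65 inches.
pred_65 <- predict(fit, newdata = data.frame(height = 65))
```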

Package 'usdatasets'

Help Index

American Community Survey 2012

Description

Usage

Format

Source

Age at first marriage of 5,534 US women.

Description

Usage

Format

Source

Airline names - U.S. Airlines Carrier Codes and Names

Description

Usage

Format

Source

Airport metadata - U.S. Airports Information

Description

Usage

Format

Source

New York Air Quality Measurements

Description

Usage

Format

Source

Housing prices in Ames, Iowa

Description

Usage

Format

Source

North Carolina births, 100 cases

Description

Usage

Format

Source

US Births 2014

Description

Usage

Format

Source

Housing Values in Suburbs of Boston

Description

Usage

Format

Source

Data from 93 Cars on Sale in the USA in 1993

Description

Usage

Format

Source

Random sample of 2000 U.S. Census Data

Description

Usage

Format

Source

CIA Factbook Details on Countries

Description

Usage

Format

Source

Cleveland and Sacramento Demographic and Income Data (2000)

Description

Usage

Format

Source

United States Counties

Description

Usage

Format

Source

American Adults on Regulation and Renewable Energy

Description

Usage

Format

Source

Summary of male heights from USDA Food Commodity Intake Database

Description

Usage