Title: | A Method to Download Department of Education College Scorecard Data |
---|---|
Description: | A method to download Department of Education College Scorecard data using the public API <https://collegescorecard.ed.gov/data/data-documentation/>. It is based on the 'dplyr' model of piped commands to select and filter data in a single chained function call. An API key from the U.S. Department of Education is required. |
Authors: | Benjamin Skinner [aut, cre] |
Maintainer: | Benjamin Skinner <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.30.0 |
Built: | 2024-10-24 04:28:39 UTC |
Source: | CRAN |
This function is used to search the College Scorecard data dictionary.
sc_dict(
  search_string,
  search_col = c("all", "description", "varname", "dev_friendly_name",
                 "dev_category", "label", "source"),
  ignore_case = TRUE,
  limit = 10,
  confirm = FALSE,
  print_dev = FALSE,
  print_notes = FALSE,
  return_df = FALSE,
  print_off = FALSE,
  can_filter = FALSE,
  filter_vars = FALSE
)
search_string |
Character string for search. Can use regular expression for search. Must escape special characters, |
search_col |
Column to search. The default is to search all columns. Other options include: "description", "varname", "dev_friendly_name", "dev_category", "label", "source". |
ignore_case |
Search is case insensitive by default. Change to FALSE for a case-sensitive search. |
limit |
Only the first 10 dictionary items are returned by default. Increase to return more values. Set to Inf to return all matching items. |
confirm |
Use to confirm status of variable name in dictionary. Returns |
print_dev |
Set to |
print_notes |
Set to |
return_df |
Return a tibble of the subset data dictionary. |
print_off |
Do not print to console; useful if you only want to return a tibble of dictionary values. |
can_filter |
Use to confirm that a variable can be used as a filtering variable. Returns |
filter_vars |
Use to print variables that can be used to filter calls. Use with argument |
## simple search for 'state' in any part of the dictionary
sc_dict('state')

## variable names starting with 'st'
sc_dict('^st', search_col = 'varname')

## return full dictionary (only recommended if not printing and
## storing in object)
df <- sc_dict('.', limit = Inf, print_off = TRUE, return_df = TRUE)

## print list of variables that can be used to filter
df <- sc_dict('.', filter_vars = TRUE, return_df = TRUE)
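The note about escaping special characters can be illustrated with base R alone. The sketch below assumes that sc_dict() hands search_string to R's regular-expression engine in the usual way; the `labels` vector is made up for illustration and is not taken from the data dictionary.

```r
## Made-up labels, only to show how escaping changes a match
labels <- c("rate (pooled)", "rate pooled")

## Unescaped parentheses form a regex group, so this matches the word
## "pooled" whether or not it is wrapped in parentheses
unescaped <- grepl("(pooled)", labels, ignore.case = TRUE)

## Escaped parentheses are matched literally; in an R string each
## backslash has to be doubled
escaped <- grepl("\\(pooled\\)", labels, ignore.case = TRUE)

unescaped  # TRUE TRUE
escaped    # TRUE FALSE
```

Under the same assumption, a literal parenthesis in a dictionary search would be written as sc_dict('\\(pooled\\)').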
This function is used to filter the downloaded scorecard data. It converts idiomatic R into the format required by the API call.
sc_filter(sccall, ...)
sc_filter_(sccall, filter_string)
sccall |
Current list of parameters carried forward from prior functions in the chain (ignore) |
... |
Expressions to evaluate |
filter_string |
Filter as character string or vector of filters as character strings |
sc_filter_(): Standard evaluation version of sc_filter (filter_string must be a string or vector of strings when using this version)
## Not run: 
sc_filter(region == 1) # New England institutions
sc_filter(stabbr == c("TN","KY")) # institutions in Tennessee and Kentucky
sc_filter(control != 3) # exclude private, for-profit institutions
sc_filter(control == c(1,2)) # same as above
sc_filter(control == 1:2) # same as above
sc_filter(stabbr == "TN", control == 1, locale == 41:43) # TN rural publics
## End(Not run)

## Not run: 
sc_filter_("region == 1")
sc_filter_("control != 3")

## With internal strings, you must either use both double and single quotes
## or escape internal quotes
sc_filter_("stabbr == c('TN','KY')")
sc_filter_('stabbr == c(\'TN\',\'KY\')')

## stored in object
filters <- c("control == 1", "locale == 41:43")
sc_filter_(filters)
## End(Not run)
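Because sc_filter_() takes plain strings, filters can be assembled programmatically. The following is a minimal base R sketch of one way to build the quoted form shown above; the object names `states`, `quoted`, `state_filter`, and `filters` are illustrative only.

```r
## Values to filter on, e.g. read from a config file or function argument
states <- c("TN", "KY")

## Wrap each value in single quotes and join with commas: 'TN','KY'
quoted <- paste0("'", states, "'", collapse = ",")

## Build the filter string in the form sc_filter_() expects
state_filter <- sprintf("stabbr == c(%s)", quoted)
state_filter  # "stabbr == c('TN','KY')"

## Combine with other filters in a character vector
filters <- c(state_filter, "control == 1")

## Inside a chain this would then be passed as: sc_filter_(filters)
```

Using single quotes inside the double-quoted string avoids the need to escape internal quotes.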
This function gets the College Scorecard data by compiling and converting all the previous piped output into a single URL string that is used to get the data.
sc_get(
  sccall,
  api_key,
  debug = FALSE,
  print_key_debug = FALSE,
  return_json = FALSE
)
sccall |
Current list of parameters carried forward from prior functions in the chain (ignore) |
api_key |
Personal API key requested from https://api.data.gov/signup stored in a string. If you first set your key using sc_key, you may omit this argument. |
debug |
Set to true to print and return API call (URL string) rather than make actual request. Should only be used when debugging calls. |
print_key_debug |
Only used when |
return_json |
Return data in JSON format rather than as a tibble. |
To obtain an API key, visit https://api.data.gov/signup
## Not run: 
sc_get("<API KEY IN STRING>")

key <- "<API KEY IN STRING>"
sc_get(key)
## End(Not run)
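Since sc_get() compiles everything piped into it, the individual fragments are easiest to read as one chain. The sketch below strings together the functions documented here, using variable names that appear in the examples. It assumes R 4.1 or later for the native pipe and uses debug = TRUE so that the call is only built, not sent. The key is a placeholder, and the exact URL printed is not shown because it depends on the package version.

```r
library(rscorecard)

## Build (but do not send) a request for Tennessee public institutions.
## With debug = TRUE, sc_get() prints and returns the API call as a
## URL string instead of requesting data.
url <- sc_init() |>
  sc_filter(stabbr == "TN", control == 1) |>
  sc_select(unitid, instnm, stabbr) |>
  sc_year(2012) |>
  sc_get(api_key = "<API KEY IN STRING>", debug = TRUE)
```

Dropping debug = TRUE and supplying a real key should return the data as a tibble rather than the URL.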
This function initializes the data request. It should always be the first in the series of piped functions.
sc_init(dfvars = FALSE)
dfvars |
Set to |
## Not run: 
sc_init()
sc_init(dfvars = TRUE)
## End(Not run)
This function stores your data.gov API key in the system environment so that you only have to load it once at the start of the session. If you set your key using sc_key, then you may omit the api_key parameter in the sc_get function.
sc_key(api_key)
api_key |
Personal API key requested from https://api.data.gov/signup stored in a string. |
To obtain an API key, visit https://api.data.gov/signup.
## Not run: 
sc_key('<API KEY IN STRING>')
## End(Not run)
This function is used to select the variables returned in the final dataset.
sc_select(sccall, ...)
sc_select_(sccall, vars)
sccall |
Current list of parameters carried forward from prior functions in the chain (ignore) |
... |
Desired variable names separated by commas (not case sensitive) |
vars |
Character string of variable name or vector of character string variable names |
sc_select_(): Standard evaluation version of sc_select (vars must be string or vector of strings when using this version)
## Not run: 
sc_select(UNITID)
sc_select(UNITID, INSTNM)
sc_select(unitid, instnm)
## End(Not run)

## Not run: 
sc_select_("UNITID")
sc_select_(c("UNITID", "INSTNM"))
sc_select_(c("unitid", "instnm"))

## stored in object (use the standard evaluation version)
vars_to_pull <- c("unitid","instnm")
sc_select_(vars_to_pull)
## End(Not run)
This function is used to select the year of the data.
sc_year(sccall, year)
sccall |
Current list of parameters carried forward from prior functions in the chain (ignore) |
year |
Four-digit year or string |
Not all variables have a year option.
At this time, only one year at a time is allowed.
The year selected is not necessarily the year the data were produced. It may be the year the data were collected. For data collected over split years (fall to spring), it is likely the year represents the fall data (e.g., 2011 for 2011/2012 data).
Be sure to check with the College Scorecard data documentation report when choosing the year.
## Not run: 
sc_year() # latest
sc_year("latest")
sc_year(2012)
## End(Not run)
Subset results to those within a specified distance of a zip code.
sc_zip(sccall, zip, distance = 25, km = FALSE)
sccall |
Current list of parameters carried forward from prior functions in the chain (ignore) |
zip |
A 5-digit zipcode |
distance |
An integer distance in miles or kilometers |
km |
A boolean value set to TRUE if the distance is in kilometers; the default (FALSE) treats it as miles. |
Zip codes with leading zeros (Northeast) can be called either using a string ("02111") or as a numeric (02111). R will drop the leading zero from the second version, but sc_zip() will add it back before the call. The shortened version without the leading zero may also be used (2111 and "2111" both become "02111"), but is not recommended for clarity.
## Not run: 
sc_zip(37203)
sc_zip(37203, 50)
sc_zip(37203, 50, km = TRUE)
sc_zip("02111") # 1. Using string
sc_zip(02111) # 2. Dropped leading zero will be added
sc_zip(2111) # 3. Will become "02111" (not recommended)
## End(Not run)
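The leading-zero behavior described above can be reproduced in base R. The helper below, `pad_zip()`, is hypothetical and only sketches one way such padding could be done. It is not necessarily how sc_zip() implements it.

```r
## Hypothetical helper (not part of rscorecard): coerce a zip code given
## as a number or string to a zero-padded, 5-character string
pad_zip <- function(zip) {
  sprintf("%05d", as.integer(zip))
}

pad_zip("02111")  # "02111"
pad_zip(02111)    # R parses 02111 as 2111; padded back to "02111"
pad_zip(2111)     # "02111"
pad_zip(37203)    # "37203"
```

Passing the zip code as a string in the first place keeps the value in the script identical to the value sent in the call.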