| Title: | NHS Data Dictionary Toolset for NHS Lookups |
|---|---|
| Description: | Provides a common set of simplified web scraping tools for working with the NHS Data Dictionary <https://datadictionary.nhs.uk/data_elements_overview.html>. The intended usage is to access the data elements section of the NHS Data Dictionary to retrieve key lookups. The benefit of packaging these tools is that the lookups are the live lookups on the website and will not need to be maintained. This package was commissioned by the NHS-R community <https://nhsrcommunity.com/> to provide this consistency of lookups. The OpenSafely lookups have now been added <https://www.opencodelists.org/docs/>. |
| Authors: | Gary Hutson [aut, cre], Calum Polwart [aut], Tom Jemmett [aut] |
| Maintainer: | Gary Hutson <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.2.5 |
| Built: | 2024-12-02 06:49:23 UTC |
| Source: | CRAN |
left_xl function: this function replicates the LEFT function in Excel and is utilised for left trimming of character strings.
left_xl(text, num_char = 0)
text | The text you want to LEFT trim
num_char | The number of characters you want to trim by
Trims the text entered by the number of characters given in num_char and returns the trimmed string.
left_xl(text= "This is some example text", num_char = 4)
left_xl(text= "This is some example text", num_char = 4)
This function replicates the LEN function in Excel and is utilised for finding the length of character strings.
len_xl(text, ...)
text | The text you want to calculate the length of
... | Function forwarding to work with the base nchar method
An integer value giving the length of the text passed.
len_xl("Guess the length of me!")
len_xl("Guess the length of me!")
This is used to scrape all hyperlinks from a specific web page.
linkScrapeR(url, SSL_needed = FALSE)
url | The website URL to detect active anchor hyperlink tags and extract them into a tibble
SSL_needed | Default FALSE - Boolean indicating whether an SSL certificate is needed
Once the links have been scraped they will be output into a tibble for exploration.
This can be used on any website to pull back the hyperlink content of a web page.
A tibble (class data.frame) with all active hyperlinks on the website for the URL (uniform resource locator) passed to the function.
result - the extracted html table from url and xpath passed
link_name - the name of the link
url - the full url of the active href tag from HTML
linkScrapeR("https://www.datadictionary.nhs.uk/", FALSE)
linkScrapeR("https://www.datadictionary.nhs.uk/", FALSE)
This function replicates the MID function in Excel and is utilised for extracting a substring from the middle of character strings.
mid_xl(text, start_num = 1, num_char = 0)
text | The text you want to MID trim
start_num | The position at which to start the trim. This needs to be numeric.
num_char | The number of characters you want to trim by. This field needs to be numeric.
This has been included as a convenience function for working with text and string data.
The extracted text, starting at start_num and running for num_char characters, producing a substring result.
mid_xl(text= "This is some example text", start_num = 6, num_char = 10) mid_xl(text= "This is some example text", start_num = 6, num_char = 10)
mid_xl(text= "This is some example text", start_num = 6, num_char = 10) mid_xl(text= "This is some example text", start_num = 6, num_char = 10)
Searches all the data elements in the data element index of the NHS data dictionary and returns the links.
nhs_data_elements()
This function has no input parameters and returns the following: a tibble (class data.frame) with the results of scraping the NHS Data Dictionary website for the data element lookups; if nothing is returned this will produce an appropriate informational message.
link_name - the name of the scraped link. This relates to the actual name of the data element from the NHS Data Dictionary.
url - the url passed to the parameter
full_url - the full url of where the data element is on the NHS Data Dictionary website
xpath_nat_code - utilises the element in the website and appends the link_short to pull back only the national codes from the dictionary site. NOTE: not all of the returns will have national code tables.
xpath_default_codes - pulls back the data dictionary default codes - these can then be used with the national codes
xpath_also_known - pulls back the data dictionary element's alias table - this will be available for all data elements
nhs_data_lookup <- nhs_data_elements()
head(nhs_data_lookup, 10)
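The xpath columns returned by this lookup can be passed straight to the tableR function documented below. A minimal sketch, assuming the full_url, xpath_nat_code and link_name columns listed above and assuming that the first data element's page actually contains a national codes table:
# A sketch: feed the first row of the lookup into tableR to pull back that
# element's national codes table (assumes such a table exists for that element)
nhs_data_lookup <- nhs_data_elements()
national_codes <- tableR(
  url = nhs_data_lookup$full_url[1],         # the element's page on the site
  xpath = nhs_data_lookup$xpath_nat_code[1], # xpath to its national codes table
  title = nhs_data_lookup$link_name[1]       # tag the output with the element name
)
head(national_codes)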
This function uses the tableR parent function to return a table of elements, specifically from the NHS Data Dictionary.
nhs_table_findeR(data_element_name, ...)
nhs_table_findeR(data_element_name, ...)
data_element_name | The data element name from the NHS Data Dictionary, i.e. ACCOMMODATION STATUS CODE
... | Function forwarding to the parent function to pass additional arguments (e.g. title, add_zero_prefix)
A tibble (class data.frame) output from the results of the web scrape
result - the extracted national HTML code table from the element page of the NHS Data Dictionary
DictType - defaults to Not Specified if nothing passed, however allows for custom dictionary / data frame tags to be created
DttmExtracted - a date and time stamp
# Returns a tibble from the tableR parent function
nhs_table_findeR("ACCOMMODATION STATUS CODE", title = "ACCOM_STATUS")
nhs_table_findeR("accommodation status code") # Changes case to match
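Because the dots are forwarded to the tableR parent function, options such as add_zero_prefix can be passed straight through. A minimal sketch (the use of add_zero_prefix = TRUE here is illustrative only):
# A sketch: forward add_zero_prefix = TRUE through ... to the tableR parent
# function so codes that lose their leading zeros during parsing are restored
accom_codes <- nhs_table_findeR(
  "ACCOMMODATION STATUS CODE",
  title = "ACCOM_STATUS",
  add_zero_prefix = TRUE
)
head(accom_codes)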
This function uses the tableR parent function to return a table of elements, specifically from the OpenSafely code lists at https://www.opencodelists.org/.
openSafely_listR(list_name, version = "", ...)
list_name | The code list ID from https://www.opencodelists.org/ for which to return the national table of elements, for example "opensafely/ace-inhibitor-medications"
version | The version of the code list, if not the most recent
... | Function forwarding to the parent function to pass additional arguments (e.g. title, add_zero_prefix)
A tibble (class data.frame) output from the results of the web scrape
type - the OpenSafely type
id - the id for the OpenSafely element
bnf_code - British National Formulary - NICE guidelines code
nm - medicine type, dosage and manufacturer
Dict_type - title specified for dictionary
DttmExtracted - the date and time the code set was extracted
openSafely_listR("opensafely/ace-inhibitor-medications") #Pull back current list openSafely_listR("opensafely/ace-inhibitor-medications", "2020-05-19") #Pull back list with date
openSafely_listR("opensafely/ace-inhibitor-medications") #Pull back current list openSafely_listR("opensafely/ace-inhibitor-medications", "2020-05-19") #Pull back list with date
This function replicates the RIGHT function in Excel and is utilised for right trimming of character strings.
right_xl(text, num_char = 0)
text | The text you want to RIGHT trim
num_char | The number of characters you want to trim by. This field needs to be numeric.
This has been included as a convenience function for working with text and string data.
The trimmed string, taken from the text parameter and trimmed by the number of characters passed to num_char.
right_xl(text= "This is some example text", num_char = 10) right_xl(text= "This is some example text", num_char = 10)
right_xl(text= "This is some example text", num_char = 10) right_xl(text= "This is some example text", num_char = 10)
Takes the url and xpath and scrapes HTML table elements from a website.
scrapeR(url, xpath, ...)
url | Website address to connect to
xpath | Xpath obtained through inspecting the individual HTML elements
... | Function forwarding to pass additional options
This function is specifically designed to work with HTML tables and xpath links through to direct HTML elements. The function is versatile and can be used on any URL where an xpath can be obtained through the URL and HTML inspection process.
Returns the results of the scraping operation and the relevant fields from the html table - the xpath should make reference to an html table, otherwise an error is returned advising the user to check the xpath and url are correct.
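A minimal sketch of calling scrapeR directly; the xpath below is a hypothetical placeholder, and a real one would be obtained by inspecting the target HTML table in a web browser:
# A sketch only - the xpath is a hypothetical placeholder pointing at an HTML
# table; derive a real one with the browser's Inspect tool
scraped_table <- scrapeR(
  url = "https://www.datadictionary.nhs.uk/data_elements_overview.html",
  xpath = '//table[1]' # hypothetical xpath to the first HTML table on the page
)
head(scraped_table)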
This function uses the scrapeR parent function to return a table of elements.
tableR(url, xpath, title = "Not Specified", add_zero_prefix = FALSE, ...)
url | The URL of the website to scrape the table element from
xpath | The unique xpath of the HTML element to be scraped
title | Unique name for the relevant HTML table that has been scraped
add_zero_prefix | Adds zero prefixes to certain codes that get converted by native functions
... | Function forwarding to the parent function to pass additional arguments
A tibble (class data.frame) output from the results of the web scrape
result - the extracted html table from url and xpath passed
DictType - defaults to Not Specified if nothing passed, however allows for custom dictionary / data frame tags to be created
DttmExtracted - a date and time stamp
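A minimal sketch of calling tableR directly, with the DictType tag set via title; the xpath is again a hypothetical placeholder that would normally come from inspecting the page:
# A sketch only - the xpath is a hypothetical placeholder; obtain a real one by
# inspecting the HTML table you want in a web browser
lookup_tbl <- tableR(
  url = "https://www.datadictionary.nhs.uk/data_elements_overview.html",
  xpath = '//table[1]',          # hypothetical xpath to an HTML table
  title = "DD_ELEMENT_OVERVIEW", # stored in the DictType column of the output
  add_zero_prefix = FALSE        # leave codes as parsed
)
head(lookup_tbl)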
Returns xpath text from websites and can be used to access specific HTML nodes
xpathTextR(url, xpath, ssl_needed = FALSE)
url | The link for the website
xpath | The xpath string derived by using the Inspect functionality in a web browser
ssl_needed | Default FALSE - Boolean indicating whether an SSL certificate is needed
A list with the results of scraping the specific xpath element
result - the extracted text from the website element that has been scraped
website_passed - a copy of the input url for the website
html_node_result - returns the extracted html node result
datetime_access - returns a timestamp of when the results of the scraping operation have been completed
person_accessed - retrieves the system environment stored username and domain - these are concatenated together to form a mixed character string
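A minimal sketch of pulling the text of a single node; the xpath is a hypothetical placeholder for a page heading, and the fields accessed afterwards are those described above:
# A sketch only - the xpath is a hypothetical placeholder for the first
# level-one heading on the page; derive a real one with the browser's Inspect tool
heading_result <- xpathTextR(
  url = "https://www.datadictionary.nhs.uk/",
  xpath = '//h1[1]',
  ssl_needed = FALSE
)
heading_result$result          # the extracted text from the node
heading_result$datetime_access # when the scraping operation completed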