Title: | Text Conversion from Nexis Uni PDFs to R Data Frames |
---|---|
Description: | Transform 'newswire' and earnings call transcripts as PDF obtained from 'Nexis Uni' to R data frames. Various 'newswires' and 'FairDisclosure' earnings call formats are supported. Further, users can apply several pre-defined dictionaries on the data based on Graffin et al. (2016)<doi:10.5465/amj.2013.0288> and Gamache et al. (2015)<doi:10.5465/amj.2013.0377>. |
Authors: | Jonas Röttger [aut, cre] |
Maintainer: | Jonas Röttger <[email protected]> |
License: | GPL-3 |
Version: | 0.6.0 |
Built: | 2024-11-05 06:24:32 UTC |
Source: | CRAN |
Converts one earnings call transcript from 'FairDisclosure' obtained from 'NexisUni' to an R data frame.
conference_call_segmenter( file, sentiment = FALSE, emotion = FALSE, regulatory_focus = FALSE, laughter = FALSE, narcissism = FALSE )
file |
The name of the PDF file which the data are to be read from. If it does not contain an absolute path, the file name is relative to the current working directory, getwd(). |
sentiment |
Performs dictionary-based sentiment analysis. (default: FALSE) |
emotion |
Performs dictionary-based emotion analysis. (default: FALSE) |
regulatory_focus |
Calculates the number of words indicative of promotion and prevention focus based on the dictionary developed by Gamache et al., 2015. (default: FALSE) |
laughter |
Counts the number of times laughter was indicated in a quote. (default: FALSE) |
narcissism |
Counts pronoun usage and calculates the ratio of first-person singular to first-person plural pronouns. This measure is derived from Zhu & Chen (2015). (default: FALSE) |
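The pronoun ratio described for the narcissism argument can be illustrated with a minimal base R sketch. The pronoun lists and the helper name pronoun_ratio are assumptions made for illustration; the dictionary and implementation used inside the package may differ.

```r
# Hypothetical pronoun lists; the package's actual dictionary may differ.
first_singular <- c("i", "me", "my", "mine", "myself")
first_plural   <- c("we", "us", "our", "ours", "ourselves")

# Hypothetical helper, not part of disclosuR.
pronoun_ratio <- function(quote) {
  # Lower-case the text and split on anything that is not a letter or apostrophe.
  tokens <- strsplit(tolower(quote), "[^a-z']+")[[1]]
  n_singular <- sum(tokens %in% first_singular)
  n_plural   <- sum(tokens %in% first_plural)
  # Return NA when there are no plural pronouns, to avoid dividing by zero.
  ratio <- if (n_plural > 0) n_singular / n_plural else NA_real_
  c(singular = n_singular, plural = n_plural, ratio = ratio)
}

res <- pronoun_ratio("I think we delivered on our plan, and I am proud of my team.")
res
```

In this sentence there are three first-person singular pronouns (I, I, my) and two first-person plural pronouns (we, our), so the ratio is 1.5. Contractions such as "I'm" are not matched by this simple tokenizer.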
An R data frame with each row representing one quote. The columns indicate the quarter, year, section (presentation versus Q&A), the speaker's name, role, affiliation, and also three binary indicators on whether the speaker is the host company's (1) CEO, (2) CFO, and/or (3) Chairman.
earnings_calls_df <- conference_call_segmenter(file = system.file("inst", "examples", "earnings_calls", "earnings_example_01.pdf", package = "disclosuR"))
earnings_calls_df_sentiment <- conference_call_segmenter(file = system.file("inst", "examples", "earnings_calls", "earnings_example_01.pdf", package = "disclosuR"), sentiment = TRUE)
Converts all 'FairDisclosure' earnings call transcripts obtained from 'NexisUni' in a folder to an R data frame.
conference_call_segmenter_folder( folder_path, sentiment = FALSE, emotion = FALSE, regulatory_focus = FALSE, laughter = FALSE, narcissism = FALSE )
folder_path |
The name of the folder which the data are to be read from. If it does not contain an absolute path, the folder name is relative to the current working directory, getwd(). |
sentiment |
Performs dictionary-based sentiment analysis. (default: FALSE) |
emotion |
Performs dictionary-based emotion analysis. (default: FALSE) |
regulatory_focus |
Calculates the number of words indicative of promotion and prevention focus based on the dictionary developed by Gamache et al., 2015. (default: FALSE) |
laughter |
Counts the number of times laughter was indicated in a quote. (default: FALSE) |
narcissism |
Counts pronoun usage and calculates the ratio of first-person singular to first-person plural pronouns. This measure is derived from Zhu & Chen (2015). (default: FALSE) |
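As a rough illustration of what the laughter argument counts, the sketch below tallies laughter markers in a single quote. The marker format ("(Laughter)" or "[Laughter]") and the helper name count_laughter are assumptions; the transcripts and the package's own matching rule may use a different notation.

```r
# Hypothetical helper, not part of disclosuR.
count_laughter <- function(quote) {
  # Assumed marker format: "(Laughter)" or "[Laughter]", matched case-insensitively.
  hits <- gregexpr("\\(laughter\\)|\\[laughter\\]", quote, ignore.case = TRUE)[[1]]
  # gregexpr() returns -1 when there is no match.
  if (hits[1] == -1) 0L else length(hits)
}

n_marked   <- count_laughter("Thanks. (Laughter) Next question, please. [laughter]")
n_unmarked <- count_laughter("No jokes in this answer.")
```

Here n_marked is 2 and n_unmarked is 0.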
An R data frame with each row representing one quote. The columns indicate the quarter, year, section (presentation versus Q&A), the speaker's name, role, affiliation, and also three binary indicators on whether the speaker is the host company's (1) CEO, (2) CFO, and/or (3) Chairman.
earnings_calls_df <- conference_call_segmenter_folder(folder_path = system.file("inst", "examples", "earnings_calls", package = "disclosuR"))
earnings_calls_df_sentiment <- conference_call_segmenter_folder(folder_path = system.file("inst", "examples", "earnings_calls", package = "disclosuR"), sentiment = TRUE)
Takes an event data set consisting of dates and CUSIPs, which have to correspond to a press data frame compiled by the function newswire_segmenter_folder.
impression_offsetting(event_data, press_data_categorized)
event_data |
An R data frame that contains three columns, which have to be labeled "date_announced", "cusip", and "ID". The date_announced column contains the dates of the events for which impression offsetting is calculated. The cusip column contains the 8-digit CUSIP of the companies for which impression offsetting is calculated. The ID column should contain a unique ID that identifies the specific event. |
press_data_categorized |
An R data frame with each row representing one 'newswire' article. The columns indicate the title, text,
'newswire', date, and weekday. It should be the outcome of |
An R data frame which contains the columns of the event_data plus three columns for the baseline announcements (positive, neutral, and negative) and three columns for the impression offsetting announcements (positive, neutral, and negative).
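The required shape of event_data can be sketched as follows. The three column names come from the argument description; the example values are made up, and storing date_announced as class Date is an assumption, since the expected date type is not stated here.

```r
# Made-up events for illustration only.
event_data <- data.frame(
  date_announced = as.Date(c("2020-03-02", "2021-11-15")),  # assumed Date class
  cusip = c("12345678", "87654321"),                        # made-up 8-digit CUSIPs
  ID = c("event_001", "event_002"),
  stringsAsFactors = FALSE
)

# Basic checks against the documented requirements.
required_columns <- c("date_announced", "cusip", "ID")
stopifnot(
  all(required_columns %in% names(event_data)),
  all(nchar(event_data$cusip) == 8),
  !anyDuplicated(event_data$ID)
)
```

Checking the column names, the CUSIP length, and the uniqueness of ID before calling impression_offsetting may make input problems easier to spot.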
## Not run:
impression_offsetting(event_data, press_data)
## End(Not run)
Takes a PDF document containing a 'newswire' document obtained from 'NexisUni' and transforms it into an R data frame consisting of one row.
newswire_segmenter( file, sentiment = FALSE, emotion = FALSE, regulatory_focus = FALSE, laughter = FALSE, narcissism = FALSE, text_clustering = FALSE )
file |
The name of the PDF file which the data are to be read from. If it does not contain an absolute path, the file name is relative to the current working directory, getwd(). |
sentiment |
Performs dictionary-based sentiment analysis. (default: FALSE) |
emotion |
Performs dictionary-based emotion analysis. (default: FALSE) |
regulatory_focus |
Calculates the number of words indicative of promotion and prevention focus based on the dictionary developed by Gamache et al., 2015. (default: FALSE) |
laughter |
Counts the number of times laughter was indicated in a quote. (default: FALSE) |
narcissism |
Counts pronoun usage and calculates the ratio of first-person singular to first-person plural pronouns. This measure is derived from Zhu & Chen (2015). (default: FALSE) |
text_clustering |
Categorizes documents using a dictionary based on the framework of Graffin et al., 2016. (default: FALSE) |
An R data frame with each row representing one 'newswire' article. The columns indicate the title, text, 'newswire', date, and weekday.
newswire_df <- newswire_segmenter(file = system.file("inst", "examples", "newswire", "newswire_example_01.pdf", package = "disclosuR"))
newswire_df_sentiment <- newswire_segmenter(file = system.file("inst", "examples", "newswire", "newswire_example_01.pdf", package = "disclosuR"), sentiment = TRUE)
Takes all PDF documents in a folder containing 'newswire' documents obtained from 'NexisUni' and transforms them into an R data frame consisting of one row per document.
newswire_segmenter_folder( folder_path, sentiment = FALSE, emotion = FALSE, regulatory_focus = FALSE, laughter = FALSE, narcissism = FALSE, text_clustering = FALSE )
folder_path |
The path to the folder in which the 'newswire' PDFs reside. If it does not contain an absolute path, the folder name is relative to the current working directory, getwd(). |
sentiment |
Performs dictionary-based sentiment analysis. (default: FALSE) |
emotion |
Performs dictionary-based emotion analysis. (default: FALSE) |
regulatory_focus |
Calculates the number of words indicative of promotion and prevention focus based on the dictionary developed by Gamache et al., 2015. (default: FALSE) |
laughter |
Counts the number of times laughter was indicated in a quote. (default: FALSE) |
narcissism |
Counts pronoun usage and calculates the ratio of first-person singular to first-person plural pronouns. This measure is derived from Zhu & Chen (2015). (default: FALSE) |
text_clustering |
Categorizes documents using a dictionary based on the framework of Graffin et al., 2016. (default: FALSE) |
An R data frame with each row representing one 'newswire' article. The columns indicate the title, text, 'newswire', date, and weekday. Depending on the additional arguments, the output data can also contain sentiment, emotion, regulatory focus, laughter, narcissism and text cluster based on the Graffin et al. categories.
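The folder-level behaviour (one data frame built from every PDF in a folder) can be approximated with a short base R sketch. This is not the package's implementation: parse_one is a hypothetical stand-in for a per-file parser such as newswire_segmenter, and it only records the file name so that the sketch runs without the package or real PDFs.

```r
# Hypothetical stand-in for a per-file parser; it returns a one-row data frame.
parse_one <- function(path) {
  data.frame(file = basename(path), stringsAsFactors = FALSE)
}

# Hypothetical helper: apply a parser to every PDF in a folder and stack the rows.
segment_folder <- function(folder_path, parser = parse_one) {
  pdf_files <- list.files(folder_path, pattern = "\\.pdf$",
                          full.names = TRUE, ignore.case = TRUE)
  if (length(pdf_files) == 0) return(data.frame())
  do.call(rbind, lapply(pdf_files, parser))
}

# Demonstration with empty placeholder files in a temporary folder.
demo_dir <- file.path(tempdir(), "newswire_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.pdf", "b.pdf", "notes.txt"))))
result <- segment_folder(demo_dir)
```

In this demonstration, result has two rows (a.pdf and b.pdf); the non-PDF file is skipped by the file name pattern.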
newswire_df <- newswire_segmenter_folder(folder_path = system.file("inst", "examples", "newswire", package = "disclosuR"))
newswire_df_sentiment <- newswire_segmenter_folder(folder_path = system.file("inst", "examples", "newswire", package = "disclosuR"), sentiment = TRUE)