Package 'disclosuR' reference manual

Title:	Text Conversion from Nexis Uni PDFs to R Data Frames
Description:	Transform 'newswire' and earnings call transcripts as PDF obtained from 'Nexis Uni' to R data frames. Various 'newswires' and 'FairDisclosure' earnings call formats are supported. Further, users can apply several pre-defined dictionaries on the data based on Graffin et al. (2016)<doi:10.5465/amj.2013.0288> and Gamache et al. (2015)<doi:10.5465/amj.2013.0377>.
Authors:	Jonas Röttger [aut, cre]
Maintainer:	Jonas Röttger <[email protected]>
License:	GPL-3
Version:	0.6.0
Built:	2025-03-05 06:48:16 UTC
Source:	CRAN

Earnings call segmenter

Description

Converts one 'FairDisclosure' earnings call transcript obtained from 'Nexis Uni' to an R data frame.

Usage

conference_call_segmenter(
  file,
  sentiment = FALSE,
  emotion = FALSE,
  regulatory_focus = FALSE,
  laughter = FALSE,
  narcissism = FALSE
)

Arguments

`file`	The name of the PDF file which the data are to be read from. If it does not contain an absolute path, the file name is relative to the current working directory, getwd().
`sentiment`	Performs dictionary-based sentiment analysis based on the `analyzeSentiment` function (default: FALSE)
`emotion`	Performs dictionary-based emotion analysis based on the `get_nrc_sentiment` function (default: FALSE)
`regulatory_focus`	Counts the words indicative of promotion and prevention focus, based on the dictionary developed by Gamache et al. (2015). (default: FALSE)
`laughter`	Counts the number of times laughter was indicated in a quote. (default: FALSE)
`narcissism`	Counts pronoun usage and calculates the ratio of first-person singular to first-person plural pronouns. This measure is derived from Zhu & Chen (2015). (default: FALSE)
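
The pronoun ratio behind `narcissism` can be illustrated with a minimal base R sketch. The pronoun lists and the helper name `pronoun_ratio` are assumptions made for illustration only; they are not taken from the package source or from Zhu & Chen (2015), and the package's own counting rules may differ.

```r
# Hypothetical pronoun lists; the package's actual lists may differ.
first_singular <- c("i", "me", "my", "mine", "myself")
first_plural   <- c("we", "us", "our", "ours", "ourselves")

# Hypothetical helper: counts pronouns in one quote and returns the ratio
# of first-person singular to first-person plural pronouns.
pronoun_ratio <- function(text) {
  # Lower-case the text and split on anything that is not a letter or apostrophe.
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  n_singular <- sum(tokens %in% first_singular)
  n_plural   <- sum(tokens %in% first_plural)
  # Return NA when no plural pronoun occurs, to avoid dividing by zero.
  ratio <- if (n_plural > 0) n_singular / n_plural else NA_real_
  list(singular = n_singular, plural = n_plural, ratio = ratio)
}

result <- pronoun_ratio("I believe we delivered on our plan, and my team agrees.")
```

Returning `NA` for quotes without plural pronouns is one possible convention; how the package treats that case is not stated in this manual.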

Value

An R data frame with each row representing one quote. The columns indicate the quarter, year, section (presentation versus Q&A), the speaker's name, role, affiliation, and also three binary indicators on whether the speaker is the host company's (1) CEO, (2) CFO, and/or (3) Chairman.

Examples

# Convert the bundled example transcript
earnings_calls_df <- conference_call_segmenter(
  file = system.file("inst", "examples", "earnings_calls",
                     "earnings_example_01.pdf", package = "disclosuR")
)

# Same transcript, with dictionary-based sentiment analysis
earnings_calls_df_sentiment <- conference_call_segmenter(
  file = system.file("inst", "examples", "earnings_calls",
                     "earnings_example_01.pdf", package = "disclosuR"),
  sentiment = TRUE
)

Earnings call segmenter (multiple files)

Description

Converts all 'FairDisclosure' earnings call transcripts in a folder, as obtained from 'Nexis Uni', to an R data frame.

Usage

conference_call_segmenter_folder(
  folder_path,
  sentiment = FALSE,
  emotion = FALSE,
  regulatory_focus = FALSE,
  laughter = FALSE,
  narcissism = FALSE
)

Arguments

`folder_path`	The name of the folder which the data are to be read from. If it does not contain an absolute path, the folder name is relative to the current working directory, getwd().
`sentiment`	Performs dictionary-based sentiment analysis based on the `analyzeSentiment` function (default: FALSE)
`emotion`	Performs dictionary-based emotion analysis based on the `get_nrc_sentiment` function (default: FALSE)
`regulatory_focus`	Counts the words indicative of promotion and prevention focus, based on the dictionary developed by Gamache et al. (2015). (default: FALSE)
`laughter`	Counts the number of times laughter was indicated in a quote. (default: FALSE)
`narcissism`	Counts pronoun usage and calculates the ratio of first-person singular to first-person plural pronouns. This measure is derived from Zhu & Chen (2015). (default: FALSE)
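
Conceptually, converting a folder amounts to converting each PDF in it and stacking the results. The sketch below shows that pattern in base R. The names `convert_one` and `convert_folder` are hypothetical, and `convert_one` is a stand-in that only records the file name so that the sketch runs without real transcripts; whether `conference_call_segmenter_folder` is implemented this way is an assumption.

```r
# Hypothetical stand-in for a single-file converter; it only records the
# file name so that the sketch runs without any real PDF input.
convert_one <- function(path) {
  data.frame(file = basename(path), stringsAsFactors = FALSE)
}

# Hypothetical helper: apply the converter to every PDF in a folder and
# stack the per-file results into one data frame.
convert_folder <- function(folder_path, converter = convert_one) {
  pdfs <- list.files(folder_path, pattern = "\\.pdf$",
                     full.names = TRUE, ignore.case = TRUE)
  if (length(pdfs) == 0) {
    stop("No PDF files found in: ", folder_path)
  }
  do.call(rbind, lapply(pdfs, converter))
}

# Demonstration with two empty placeholder files in a temporary folder.
demo_dir <- tempfile("disclosuR_demo_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("call_01.pdf", "call_02.pdf"))))
combined <- convert_folder(demo_dir)
```

Passing `conference_call_segmenter` as the `converter` would apply the same loop to real transcripts, one file at a time.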

Value

Examples

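# Convert every transcript in the bundled example folder
earnings_calls_df <- conference_call_segmenter_folder(
  folder_path = system.file("inst", "examples", "earnings_calls",
                            package = "disclosuR")
)

# Same folder, with dictionary-based sentiment analysis;
# sentiment = TRUE belongs to the segmenter call, not to system.file()
earnings_calls_df_sentiment <- conference_call_segmenter_folder(
  folder_path = system.file("inst", "examples", "earnings_calls",
                            package = "disclosuR"),
  sentiment = TRUE
)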

Impression offsetting

Description

Takes an event data set consisting of dates and CUSIPs, which must correspond to a press data frame compiled by the function newswire_segmenter_folder.

Usage

impression_offsetting(event_data, press_data_categorized)

Arguments

`event_data`	An R data frame that contains three columns, which have to be labeled "date_announced", "cusip", and "ID". The date_announced column contains the dates of the events for which impression offsetting is calculated. The cusip column contains the 8-digit CUSIP of the companies for which impression offsetting is calculated. The ID column should contain a unique ID that identifies the specific event.
`press_data_categorized`	An R data frame with each row representing one 'newswire' article. The columns indicate the title, text, 'newswire', date, and weekday. It should be the outcome of `newswire_segmenter` in which both the `sentiment` and `text_clustering` arguments have been set to TRUE.
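
The required layout of `event_data` can be sketched as follows. All values are made up, the helper `check_event_data` is hypothetical and not part of the package, and storing `date_announced` as class `Date` is an assumption, since the manual does not state the expected type.

```r
# Illustrative values only; the dates, CUSIPs, and IDs are made up.
event_data <- data.frame(
  date_announced = as.Date(c("2020-01-15", "2020-03-02")),
  cusip          = c("12345678", "87654321"),
  ID             = c("event_001", "event_002"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: checks the column names, the CUSIP length,
# and the uniqueness of the IDs before the data are passed on.
check_event_data <- function(df) {
  required <- c("date_announced", "cusip", "ID")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (any(nchar(as.character(df$cusip)) != 8)) {
    stop("Every cusip must have exactly 8 characters.")
  }
  if (anyDuplicated(df$ID) > 0) {
    stop("Every ID must be unique.")
  }
  invisible(TRUE)
}

is_valid <- check_event_data(event_data)
```

Running such a check first makes a mislabeled column or a 9-character CUSIP fail early with a readable message.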

Value

An R data frame that contains the columns of event_data plus three columns for the baseline announcements (positive, neutral, and negative) and three columns for the impression offsetting announcements (positive, neutral, and negative).

Examples

## Not run: 
impression_offsetting(event_data, press_data)

## End(Not run)

Newswire segmenter

Description

Takes a PDF document containing a 'newswire' document obtained from 'Nexis Uni' and transforms it into an R data frame consisting of one row per document.

Usage

newswire_segmenter(
  file,
  sentiment = FALSE,
  emotion = FALSE,
  regulatory_focus = FALSE,
  laughter = FALSE,
  narcissism = FALSE,
  text_clustering = FALSE
)

Arguments

`file`	The name of the PDF file which the data are to be read from. If it does not contain an absolute path, the file name is relative to the current working directory, getwd().
`sentiment`	Performs dictionary-based sentiment analysis based on the `analyzeSentiment` function (default: FALSE)
`emotion`	Performs dictionary-based emotion analysis based on the `get_nrc_sentiment` function (default: FALSE)
`regulatory_focus`	Counts the words indicative of promotion and prevention focus, based on the dictionary developed by Gamache et al. (2015). (default: FALSE)
`laughter`	Counts the number of times laughter was indicated in a quote. (default: FALSE)
`narcissism`	Counts pronoun usage and calculates the ratio of first-person singular to first-person plural pronouns. This measure is derived from Zhu & Chen (2015). (default: FALSE)
`text_clustering`	Categorizes each document using a dictionary based on the framework of Graffin et al. (2016). (default: FALSE)
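
Dictionary-based counts such as `regulatory_focus` boil down to matching the tokens of a text against word lists. The sketch below illustrates the idea in base R. The word lists are placeholders and NOT the Gamache et al. (2015) dictionary, the helper name `count_focus_words` is hypothetical, and the package may tokenize or match differently, for example by using word stems.

```r
# Placeholder word lists for illustration; NOT the Gamache et al. (2015) dictionary.
promotion_words  <- c("achieve", "gain", "grow")
prevention_words <- c("avoid", "protect", "risk")

# Hypothetical helper: counts how many tokens of a text appear in each list.
count_focus_words <- function(text) {
  # Lower-case the text and split on any run of non-letter characters.
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  c(promotion  = sum(tokens %in% promotion_words),
    prevention = sum(tokens %in% prevention_words))
}

counts <- count_focus_words(
  "We aim to grow revenue and achieve scale while we avoid undue risk."
)
```

Exact token matching keeps the sketch short; inflected forms such as "growing" would not be counted here.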

Value

An R data frame with each row representing one 'newswire' article. The columns indicate the title, text, 'newswire', date, and weekday.

Examples

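# Convert the bundled example newswire document
newswire_df <- newswire_segmenter(
  file = system.file("inst", "examples", "newswire",
                     "newswire_example_01.pdf", package = "disclosuR")
)

# Same document, with dictionary-based sentiment analysis;
# sentiment = TRUE belongs to newswire_segmenter(), not to system.file()
newswire_df_sentiment <- newswire_segmenter(
  file = system.file("inst", "examples", "newswire",
                     "newswire_example_01.pdf", package = "disclosuR"),
  sentiment = TRUE
)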

Newswire segmenter (multiple files)

Description

Takes all PDF documents in a folder containing 'newswire' documents obtained from 'Nexis Uni' and transforms them into an R data frame consisting of one row per document.

Usage

newswire_segmenter_folder(
  folder_path,
  sentiment = FALSE,
  emotion = FALSE,
  regulatory_focus = FALSE,
  laughter = FALSE,
  narcissism = FALSE,
  text_clustering = FALSE
)

Arguments

`folder_path`	The path to the folder in which the 'newswire' PDFs reside. If it does not contain an absolute path, the folder name is relative to the current working directory, getwd().
`sentiment`	Performs dictionary-based sentiment analysis based on the `analyzeSentiment` function (default: FALSE)
`emotion`	Performs dictionary-based emotion analysis based on the `get_nrc_sentiment` function (default: FALSE)
`regulatory_focus`	Counts the words indicative of promotion and prevention focus, based on the dictionary developed by Gamache et al. (2015). (default: FALSE)
`laughter`	Counts the number of times laughter was indicated in a quote. (default: FALSE)
`narcissism`	Counts pronoun usage and calculates the ratio of first-person singular to first-person plural pronouns. This measure is derived from Zhu & Chen (2015). (default: FALSE)
`text_clustering`	Categorizes each document using a dictionary based on the framework of Graffin et al. (2016). (default: FALSE)

Value

An R data frame with each row representing one 'newswire' article. The columns indicate the title, text, 'newswire', date, and weekday. Depending on the additional arguments, the output can also contain sentiment, emotion, regulatory focus, laughter, narcissism, and text cluster columns, the last based on the Graffin et al. (2016) categories.

Examples

# Convert every newswire PDF in the bundled example folder
newswire_df <- newswire_segmenter_folder(
  folder_path = system.file("inst", "examples", "newswire",
                            package = "disclosuR")
)

# Same folder, with dictionary-based sentiment analysis
newswire_df_sentiment <- newswire_segmenter_folder(
  folder_path = system.file("inst", "examples", "newswire",
                            package = "disclosuR"),
  sentiment = TRUE
)
