Title: | Parsing, Anonymizing and Visualizing Exported 'WhatsApp' Chat Logs |
---|---|
Description: | Imports 'WhatsApp' chat logs and parses them into a usable dataframe object. The parser works on chats exported from Android or iOS phones and on Linux, macOS and Windows. The parser has multiple options for extracting smileys and emojis from the messages, extracting URLs and domains from the messages, extracting names and types of sent media files from the messages, extracting timestamps from messages, extracting and anonymizing author names from messages. Can be used to create anonymized versions of data. |
Authors: | Julian Kohne <[email protected]> |
Maintainer: | Julian Kohne <[email protected]> |
License: | GPL-3 |
Version: | 1.0.4 |
Built: | 2024-11-27 06:53:58 UTC |
Source: | CRAN |
Creates a .txt file in the working directory that has the same structure as chat logs exported from 'WhatsApp'. Messages have a timestamp, sender name and message body containing lorem ipsum, emoji, links, smilies, location, omitted media files, linebreaks, self-deleting photos, and 'WhatsApp' system messages. Timestamps are formatted according to specified phone operating system and time format settings. 'WhatsApp' system messages are formatted according to specified phone operating system and language.
create_chatlog( n_messages = 150, n_chatters = 2, n_emoji = 50, n_diff_emoji = 20, n_links = 20, n_locations = 5, n_smilies = 20, n_diff_smilies = 15, n_media = 10, media_excluded = TRUE, n_sdp = 3, n_deleted = 5, startdate = "01.01.2019", enddate = "31.12.2022", language = "german", time_format = "24h", os = "android", path = getwd(), chatname = "Simulated_WhatsR_chatlog" )
n_messages |
Number of messages that are contained in the created .txt file. |
n_chatters |
Number of different chatters present in the created .txt file. |
n_emoji |
Number of messages that contain emoji. Must be smaller than or equal to n_messages. |
n_diff_emoji |
Number of different emoji that are used in the simulated chat. |
n_links |
Number of messages that contain links. Must be smaller than or equal to n_messages. |
n_locations |
Number of messages that contain locations. Must be smaller than or equal to n_messages. |
n_smilies |
Number of messages that contain smilies. Must be smaller than or equal to n_messages. |
n_diff_smilies |
Number of different smilies that are used in the simulated chat. |
n_media |
Number of messages that contain media files. Must be smaller than or equal to n_messages. |
media_excluded |
Whether media files were excluded in simulated export or not. Default is TRUE. |
n_sdp |
Number of messages that contain self-deleting photos. Must be smaller than or equal to n_messages. |
n_deleted |
Number of messages that contain deleted messages. Must be smaller than or equal to n_messages. |
startdate |
Earliest possible date for messages. Format is 'dd.mm.yyyy'. Timestamps for messages are created automatically between startdate and enddate. Input is interpreted as UTC. |
enddate |
Latest possible date for messages. Format is 'dd.mm.yyyy'. Timestamps for messages are created automatically between startdate and enddate. Input is interpreted as UTC. |
language |
Parameter for the language setting of the exporting phone. Influences the structure of system messages. |
time_format |
Parameter for the time format setting of the exporting phone (am/pm vs. 24h). Influences the structure of timestamps. |
os |
Parameter for the operating system setting of the exporting phone. Influences the structure of timestamps and 'WhatsApp' system messages. |
path |
Character string indicating the file path where the file should be saved. Can be NA to not save a file. Default is getwd(). |
chatname |
Name for the created .txt file. |
A .txt file with a simulated 'WhatsApp' chat containing lorem ipsum but all structural properties of actual chats.
SimulatedChat <- create_chatlog(path = NA)
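The way timestamps can be drawn between startdate and enddate may be easier to see in a small base R sketch. This is not the package's implementation: the uniform sampling, the seed and the output format string are assumptions chosen for illustration only.

```r
# Illustrative sketch only; create_chatlog() may generate timestamps differently.
set.seed(1)

# Dates in 'dd.mm.yyyy' format, interpreted as UTC
start <- as.POSIXct("01.01.2019", format = "%d.%m.%Y", tz = "UTC")
end <- as.POSIXct("31.12.2022", format = "%d.%m.%Y", tz = "UTC")

n_messages <- 5
span_secs <- as.numeric(difftime(end, start, units = "secs"))

# Draw random offsets inside the window and sort them chronologically
stamps <- sort(start + runif(n_messages, min = 0, max = span_secs))

# One possible 24h rendering (assumed format string)
formatted <- format(stamps, "%d.%m.%y, %H:%M", tz = "UTC")
formatted
```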
Scrapes a dictionary of emoji from https://www.unicode.org/, assuming that the website is available and its structure does not change. Can be used to update the emoji dictionary contained in this package by replacing the file and recompiling the package. The dictionary is ordered according to the length of the emojis' byte representation (longer ones first) to prevent partial matching of shorter strings when iterating through the data frame.
download_emoji( unicode_page = "https://www.unicode.org/Public/emoji/15.1/emoji-test.txt", delete_header = 32, nlines = -1L )
unicode_page |
URL to the unicode page containing the emoji dictionary. |
delete_header |
Number of lines to delete from the top of the file. |
nlines |
Number of lines to read from the file. Passed to |
A data frame containing:
1) The native representation (glyphs) of all emoji in R
2) A textual description of what the emoji is displaying
3) The hexadecimal codepoints of the emoji
4) The status of the emoji (e.g. "fully-qualified" or "component")
5) Original order of the .txt file that the emoji were fetched from
Emoji_dictionary <- download_emoji(nlines = 50)
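Why the dictionary is ordered by byte length can be illustrated with a tiny hand-made dictionary. The data frame, its column names and the matching loop below are hypothetical and do not reproduce the package's internal matching code.

```r
# Hypothetical mini dictionary; not the data frame returned by download_emoji()
emoji <- data.frame(
  glyph = c("\U0001F44D", "\U0001F44D\U0001F3FD", "\U0001F600"),
  description = c("thumbs up", "thumbs up: medium skin tone", "grinning face"),
  stringsAsFactors = FALSE
)

# Longest byte representation first, so composite emoji are tried before their parts
emoji <- emoji[order(nchar(emoji$glyph, type = "bytes"), decreasing = TRUE), ]

msg <- "ok \U0001F44D\U0001F3FD"
found <- character(0)
for (i in seq_len(nrow(emoji))) {
  if (grepl(emoji$glyph[i], msg, fixed = TRUE)) {
    found <- c(found, emoji$description[i])
    # Remove the matched glyph so that shorter entries cannot match its remains
    msg <- gsub(emoji$glyph[i], " ", msg, fixed = TRUE)
  }
}
found
```

With the order reversed, the plain thumbs-up entry would match first and the skin tone modifier would be left over as a stray component.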
Creates a data frame from an exported 'WhatsApp' chat log containing one row per message and a column for DateTime when the message was sent, name of the sender and body of the message. Only works as an intermediary function called from within parse_chat.
parse_android( chatlog, newline_indicator = "\n", media_omitted = "<media omitted>", media_indicator = "(file attached)", sent_location = paste0("location: (?=https:\\/\\/maps\\.google\\.com\\/", "\\?q=\\d\\d.\\d{6}\\,\\d\\.\\d{6})"), live_location = "^live location shared$", datetime_indicator = paste("(?!^)(?=((\\d{2}\\.\\d{2}\\.\\d{2})|(\\d{1,2}", "\\/\\d{1,2}\\/\\d{2})),\\s\\d{2}\\:\\d{2}((\\s\\-)|(\\s(?i:(am|pm))\\s\\-)))", sep = ""), newline_replace = " start_newline ", media_replace = " media_omitted ", foursquare_loc = "^.*: https://foursquare.com/v/.*$" )
chatlog |
'WhatsApp' chat preprocessed by |
newline_indicator |
Character string defining character for newline indicators. Default is a Unicode newline. |
media_omitted |
Character string inserted by 'WhatsApp' instead of file names when not exporting media. |
media_indicator |
Character string for detecting media and file attachments. |
sent_location |
Regex for detecting auto generated messages for locations shared via chat. |
live_location |
Regex for detecting auto generated messages for live locations shared via chat. |
datetime_indicator |
Regex for detecting the DateTime indicator at the beginning of each message. |
newline_replace |
Replacement string for a newline character in parsed message. Default is " start_newline ". |
media_replace |
Replacement string for omitted media files. Default is " media_omitted ". |
foursquare_loc |
Regex for detecting sent Locations as FourSquare Links. |
A data frame containing the timestamp, name of the sender and message body
ParsedChat <- parse_android("29.01.18, 23:33 - Alice: Hi?\n 29.01.18, 23:45 - Bob: Hi\n")
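The general idea of cutting a raw log into messages at each timestamp can be sketched in base R. The pattern below is a deliberately simplified assumption (one date format, 24h time only) and is much narrower than the datetime_indicator default shown above; system messages without a "Sender: " part are not handled.

```r
# Hypothetical two-message Android export; the second message spans two lines
chat <- "29.01.18, 23:33 - Alice: Hi?\n29.01.18, 23:45 - Bob: Hi\nstill Bob\n"

# Simplified timestamp pattern (assumption, not the package default)
stamp <- "\\d{2}\\.\\d{2}\\.\\d{2}, \\d{2}:\\d{2} - "

# Every timestamp match marks the start of a new message
starts <- as.integer(gregexpr(stamp, chat, perl = TRUE)[[1]])
ends <- c(starts[-1] - 1L, nchar(chat))
raw <- trimws(substring(chat, starts, ends))

# Fixed-width timestamp (15 characters), then " - ", then "Sender: Message"
rest <- substring(raw, 19)
sep <- as.integer(regexpr(": ", rest, fixed = TRUE))

parsed <- data.frame(
  DateTime = substr(raw, 1, 15),
  Sender = substr(rest, 1, sep - 1L),
  # Line breaks inside a message are replaced, mirroring newline_replace
  Message = gsub("\n", " start_newline ", substring(rest, sep + 2L), fixed = TRUE),
  stringsAsFactors = FALSE
)
parsed
```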
Creates a data frame from an exported 'WhatsApp' chat log containing one row per message. Some columns are saved as lists using the I() function so that multiple elements can be stored per message while still maintaining the general structure of one row per message. These columns should be treated as lists or unlisted first.
parse_chat( path, os = "auto", language = "auto", anonymize = "add", consent = NA, emoji_dictionary = "internal", smilie_dictionary = "wikipedia", rpnl = " start_newline ", verbose = FALSE )
path |
Character string containing the file path to the exported 'WhatsApp' chat log as a .txt file. |
os |
Operating system of the phone the chat was exported from. Default "auto" tries to automatically detect the OS. Also supports "android" or "iOS". |
language |
Indicates the language setting of the phone with which the messages were exported. Default is "auto" trying to match either 'English' or 'German'. More languages might be supported in the future. |
anonymize |
TRUE results in the vector of sender names being anonymized and columns containing personally identifiable information being deleted or restricted; FALSE displays the actual names and all content; "add" adds anonymized columns to the full info columns. Do not blindly trust this and always double check. |
consent |
String containing a consent message. All messages from chatters who have not posted this *exact* message into the chat will be deleted. Default is NA, no deleting anything. |
emoji_dictionary |
Dictionary for emoji matching. Can use a version included in this package when set to "internal" or
an updated data frame created by |
smilie_dictionary |
Value "emoticons" uses |
rpnl |
Replace newline. A character string for replacing line breaks within messages for the parsed message for better readability. Default is " start_newline ". |
verbose |
Prints progress messages for parse_chat() to the console if TRUE, default is FALSE. |
A data frame containing one row per message and 11, 15, or 19 columns, depending on the setting of the anonymize parameter.
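How a list column created with I() behaves can be seen in a small stand-alone example. The data frame and the column name URL are made up for illustration and are not taken from real parse_chat() output.

```r
# Hypothetical data frame with one row per message
msgs <- data.frame(Sender = c("Alice", "Bob"), stringsAsFactors = FALSE)

# I() keeps the list intact, so one cell can hold several elements
msgs$URL <- I(list(c("https://a.example", "https://b.example"), NA))

# Number of elements stored per message
per_message <- lengths(msgs$URL)

# Flatten the column before treating it as a plain vector
all_urls <- unlist(msgs$URL)
all_urls <- all_urls[!is.na(all_urls)]
all_urls
```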
data <- parse_chat(system.file("englishandroid24h.txt", package = "WhatsR"))
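The effect of the consent parameter, as described above, can be approximated with plain subsetting. This is a sketch of the described behaviour on a made-up data frame, not the code used inside parse_chat(); the column names Sender and Message are assumptions.

```r
# Hypothetical parsed chat
chat <- data.frame(
  Sender = c("Alice", "Bob", "Alice", "Carol"),
  Message = c("I consent", "hi", "hello", "I consent."),
  stringsAsFactors = FALSE
)

consent <- "I consent"

# Only an *exact* match counts; Carol's trailing period disqualifies her message
consenting <- unique(chat$Sender[chat$Message == consent])

# Keep all messages from consenting chatters, drop everyone else
kept <- chat[chat$Sender %in% consenting, ]
kept
```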
Creates a data frame from an exported 'WhatsApp' chat log containing one row per message and a column for DateTime when the message was sent, name of the sender and body of the message. Only works as an intermediary function called from within parse_chat.
parse_ios( chatlog, newline_indicator = "\n", media_omitted = "<media omitted>", media_indicator = "^<attached:\\s(.)*?\\.(.)*?>$", sent_location = paste0("location: (?=https:\\/\\/maps\\.google\\.com\\/", "\\?q=\\d\\d.\\d{6}\\,\\d\\.\\d{6})"), live_location = "^live location shared$", datetime_indicator = paste("(?!^)(?=\\[((\\d{2}\\.\\d{2}\\.\\d{2})|", "(\\d{1,2}\\/\\d{1,2}\\/\\d{2})),\\s\\d{1,2}\\:\\d{2}((\\:\\d{2}\\", "s(?i:(pm|am)))|(\\s(?i:(pm|am)))|(\\:\\d{2}\\])|(\\:\\d{2})|(\\s))\\])", sep = ""), newline_replace = " start_newline ", media_replace = " media_omitted ", foursquare_loc = "^.*: https://foursquare.com/v/.*$" )
chatlog |
'WhatsApp' chat preprocessed by |
newline_indicator |
Character string defining character for newline indicators. Default is a Unicode newline. |
media_omitted |
Character string inserted by 'WhatsApp' instead of file names when not exporting media. |
media_indicator |
Character string for detecting media and file attachments. |
sent_location |
Regex for detecting auto generated messages for locations shared via chat. |
live_location |
Regex for detecting auto generated messages for live locations shared via chat. |
datetime_indicator |
Regex for detecting the DateTime indicator at the beginning of each message. |
newline_replace |
Replacement string for a newline character in parsed message. Default is " start_newline ". |
media_replace |
Replacement string for omitted media files. Default is " media_omitted ". |
foursquare_loc |
Regex for detecting sent Locations as FourSquare Links. |
A data frame containing the timestamp, name of the sender and message body
ParsedChat <- parse_ios("[29.01.18, 23:33:00] Alice: Hello?\\n [29.01.18, 23:45:01] Bob: Hello")
Plots four different types of graphs for the emoji contained in a parsed 'WhatsApp' chat log. Returns dataframe used for plotting if desired.
plot_emoji( data, names = "all", starttime = "1960-01-01 00:00", endtime = "2200-01-01 00:00", min_occur = 1, return_data = FALSE, emoji_vec = "all", plot = "bar", emoji_size = 10, font_family = "Noto Color Emoji", exclude_sm = FALSE )
data |
A 'WhatsApp' chat log that was parsed with |
names |
A vector of author names that the plots will be restricted to. |
starttime |
Datetime that is used as the minimum boundary for exclusion. Is parsed with |
endtime |
Datetime that is used as the maximum boundary for exclusion. Is parsed with |
min_occur |
Minimum number of occurrences for emoji to be included in the plots. Default is 1. |
return_data |
If TRUE, returns the subsetted data frame used for plotting. Default is FALSE. |
emoji_vec |
A vector of emoji that the visualizations and data will be restricted to. |
plot |
The type of plot that should be returned. Options are "heatmap", "cumsum", "bar" and "splitbar". |
emoji_size |
Determines the size of the emoji displayed on top of the bars for "bar" and "splitbar", default is 10. |
font_family |
Character string indicating the font family used to plot emoji. Fonts might need to be installed manually, see |
exclude_sm |
If TRUE, excludes the 'WhatsApp' system messages from the descriptive statistics. Default is FALSE. |
Plots and/or the subset data frame based on author names, datetime and emoji occurrence
# importing data
data <- readRDS(system.file("ParsedWhatsAppChat.rds", package = "WhatsR"))
# opening AGG graphics device from the ragg package (replace tempfile with filepath)
ragg::agg_png(tempfile(), width = 800, height = 600, res = 150)
# plotting emoji
plot_emoji(data, font_family = "Times", exclude_sm = TRUE) # font_family = "Noto Color Emoji" on Linux
# closing the AGG device
dev.off()
Visualizes the occurrence of specific keywords within the chat. Requires the raw message content to be contained in the preprocessed data
plot_lexical_dispersion( data, names = "all", starttime = "1960-01-01 00:00", endtime = "2200-01-01 00:00", keywords = c("hello", "world"), return_data = FALSE, exclude_sm = FALSE, ... )
data |
A 'WhatsApp' chatlog that was parsed with |
names |
A vector of author names that the plots will be restricted to. |
starttime |
Datetime that is used as the minimum boundary for exclusion. Is parsed with |
endtime |
Datetime that is used as the maximum boundary for exclusion. Is parsed with |
keywords |
A vector of keywords to be displayed, default is c("hello","world"). |
return_data |
Default is FALSE, returns data frame used for plotting when TRUE. |
exclude_sm |
If TRUE, excludes the 'WhatsApp' System Messages from the descriptive statistics. Default is FALSE. |
... |
Further arguments passed down to |
Lexical Dispersion plots for specified keywords
data <- readRDS(system.file("ParsedWhatsAppChat.rds", package = "WhatsR"))
plot_lexical_dispersion(data, keywords = c("auch"))
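The data behind a lexical dispersion plot is essentially a list of positions at which each keyword occurs. The following base R sketch shows one way to derive such positions from message text; the case-insensitive matching and the output layout are assumptions, not the package's documented behaviour.

```r
# Hypothetical message bodies in chronological order
messages <- c("hello there", "nothing", "Hello world", "world peace")
keywords <- c("hello", "world")

# For each keyword, record the index of every message that contains it
hits <- do.call(rbind, lapply(keywords, function(k) {
  idx <- grep(k, messages, ignore.case = TRUE)
  data.frame(
    keyword = rep(k, length(idx)),
    message_index = idx,
    stringsAsFactors = FALSE
  )
}))
hits
```

Plotting message_index on the x-axis with one row per keyword would give the typical dispersion layout.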
Visualizes the occurrence of links in a 'WhatsApp' chatlog
plot_links( data, names = "all", starttime = "1960-01-01 00:00", endtime = "2200-01-01 00:00", use_domains = TRUE, exclude_long = 50, min_occur = 1, return_data = FALSE, link_vec = "all", plot = "bar", exclude_sm = FALSE )
data |
A 'WhatsApp' chatlog that was parsed with |
names |
A vector of author names that the plots will be restricted to. |
starttime |
Datetime that is used as the minimum boundary for exclusion. Is parsed with |
endtime |
Datetime that is used as the maximum boundary for exclusion. Is parsed with |
use_domains |
If TRUE, links are shortened to domains. This includes the inputs in link_vec. Default is TRUE. |
exclude_long |
Either NA or a numeric value. If numeric value is provided, removes all links/domains longer than x characters. Default is 50. |
min_occur |
The minimum number of occurrences a link has to have to be included in the visualization. Default is 1. |
return_data |
If TRUE, returns the subset data frame. Default is FALSE. |
link_vec |
A vector of links that the visualizations will be restricted to. |
plot |
The type of plot that should be returned. Options are "heatmap", "cumsum", "bar" and "splitbar". |
exclude_sm |
If TRUE, excludes the 'WhatsApp' system messages from the descriptive statistics. Default is FALSE. |
Plots and/or the subset data frame based on author names, datetime and link occurrence
data <- readRDS(system.file("ParsedWhatsAppChat.rds", package = "WhatsR"))
plot_links(data)
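What use_domains, exclude_long and min_occur do conceptually can be sketched with regular expressions and table(). The URLs are invented and the regexes are a simplification; the package may extract domains differently.

```r
# Hypothetical links extracted from a chat
urls <- c(
  "https://www.example.com/path?x=1",
  "http://sub.example.org",
  "https://www.example.com/other"
)

# Shorten links to domains: drop the scheme, then everything after the host
domains <- sub("^[a-zA-Z]+://", "", urls)
domains <- sub("[/?#].*$", "", domains)

# Drop entries longer than exclude_long characters
exclude_long <- 50
domains <- domains[nchar(domains) <= exclude_long]

# Count occurrences and keep those reaching min_occur
min_occur <- 2
counts <- table(domains)
frequent <- counts[counts >= min_occur]
frequent
```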
Plots the location data that is sent in the 'WhatsApp' chatlog on an auto-scaled map. Requires unanonymized 'Location' column in data
plot_locations( data, names = "all", starttime = "1960-01-01 00:00", endtime = "2200-01-01 00:00", mapzoom = 5, return_data = FALSE, jitter_value = 0.01, jitter_seed = 123, map_leeway = 0.1, exclude_sm = FALSE, API_key = "fbb7105f-27c1-49a0-96f8-926dfddcae32", map_type = "alidade_smooth" )
data |
A 'WhatsApp' chatlog that was parsed with |
names |
A vector of author names that the plots will be restricted to. |
starttime |
Datetime that is used as the minimum boundary for exclusion. Is parsed with |
endtime |
Datetime that is used as the maximum boundary for exclusion. Is parsed with |
mapzoom |
Value for zoom into the map passed down to |
return_data |
If TRUE, returns a data frame of LatLon coordinates extracted from the chat for more elaborate plotting. Default is FALSE. |
jitter_value |
Amount of random jitter to add to the geolocations to hide exact locations. Default value is 0.01. Can be NA for exact locations. |
jitter_seed |
Seed for adding random jitter to coordinates. Passed to |
map_leeway |
Adds additional space to the map so that points do not sit exactly at the border of the plot. Default value is 0.1. |
exclude_sm |
If TRUE, excludes the 'WhatsApp' system messages from the descriptive statistics. Default is FALSE. |
API_key |
API key for |
map_type |
Type of map to be used. Passed down to |
Plots for geolocation and/or a data frame of latitude and longitude coordinates
data <- readRDS(system.file("ParsedWhatsAppChat.rds", package = "WhatsR"))
plot_locations(data, mapzoom = 10)
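Seeded jitter of the kind described by jitter_value and jitter_seed can be illustrated as follows. The coordinates are made up, and uniform noise drawn with runif() is an assumption; the package may add jitter in another way.

```r
# Hypothetical coordinates extracted from a chat
coords <- data.frame(Lat = c(52.52, 48.14), Lon = c(13.40, 11.58))

jitter_value <- 0.01
jitter_seed <- 123

# Fixing the seed makes the obscured locations reproducible
set.seed(jitter_seed)
jittered <- data.frame(
  Lat = coords$Lat + runif(nrow(coords), -jitter_value, jitter_value),
  Lon = coords$Lon + runif(nrow(coords), -jitter_value, jitter_value)
)
jittered
```

Running the block again with the same seed yields the same shifted points, while the exact original locations are not shown.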
Creates summary data frames or visualizations of sent media files or file types
plot_media( data, names = "all", starttime = "1960-01-01 00:00", endtime = "2200-01-01 00:00", use_filetype = TRUE, min_occur = 1, return_data = FALSE, media_vec = "all", plot = "bar", exclude_sm = FALSE )
data |
A 'WhatsApp' chatlog that was parsed with |
names |
A vector of author names that the plots will be restricted to. |
starttime |
Datetime that is used as the minimum boundary for exclusion. Is parsed with |
endtime |
Datetime that is used as the maximum boundary for exclusion. Is parsed with |
use_filetype |
If TRUE, shortens sent file attachments to file types. |
min_occur |
The minimum number of occurrences a media (type) has to have to be included in the visualization. Default is 1. |
return_data |
If TRUE, returns the subset data frame. Default is FALSE. |
media_vec |
A vector of media (types) that the visualizations will be restricted to. |
plot |
The type of plot that should be returned. Options include "heatmap", "cumsum", "bar" and "splitbar". |
exclude_sm |
If TRUE, excludes the 'WhatsApp' system messages from the descriptive statistics. Default is FALSE. |
Plots and/or the subset data frame based on author names, datetime and media (type) occurrence
data <- readRDS(system.file("ParsedWhatsAppChat.rds", package = "WhatsR"))
plot_media(data, plot = "heatmap")
Plots summarizing the number of messages per person
plot_messages( data, names = "all", starttime = "1960-01-01 00:00", endtime = "2200-01-01 00:00", plot = "bar", return_data = FALSE, exclude_sm = FALSE )
data |
A 'WhatsApp' chat log that was parsed with |
names |
A vector of author names that the plots will be restricted to. |
starttime |
Datetime that is used as the minimum boundary for exclusion. Is parsed with |
endtime |
Datetime that is used as the maximum boundary for exclusion. Is parsed with |
plot |
Type of plot to be returned, options are "bar", "cumsum", "heatmap" and "pie". Default is "bar". |
return_data |
If TRUE, returns the subset data frame. Default is FALSE. |
exclude_sm |
If TRUE, excludes the 'WhatsApp' system messages from the descriptive statistics. Default is FALSE. |
Plots summarizing the number of messages per person
data <- readRDS(system.file("ParsedWhatsAppChat.rds", package = "WhatsR"))
plot_messages(data)
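The starttime and endtime arguments shared by the plotting functions act as a window on the message timestamps. A base R approximation of that filtering step, followed by a per-sender count, might look like this; the data frame is invented and as.POSIXct() with UTC is an assumed stand-in for whatever parser the package uses.

```r
# Hypothetical parsed chat with a DateTime and a Sender column
chat <- data.frame(
  DateTime = as.POSIXct(
    c("2019-01-01 10:00", "2019-06-01 12:00", "2021-03-01 09:30"),
    tz = "UTC"
  ),
  Sender = c("Alice", "Bob", "Alice"),
  stringsAsFactors = FALSE
)

starttime <- as.POSIXct("2019-01-01 00:00", tz = "UTC")
endtime <- as.POSIXct("2020-01-01 00:00", tz = "UTC")

# Messages outside the window are excluded before counting
in_window <- chat$DateTime >= starttime & chat$DateTime <= endtime
counts <- table(chat$Sender[in_window])
counts
```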
Plots a network for replies between authors in chat logs. Each message is evaluated as a reply to the previous one.
plot_network( data, names = "all", starttime = "1960-01-01 00:00", endtime = "2200-01-01 00:00", return_data = FALSE, collapse_sessions = FALSE, edgetype = "n", exclude_sm = FALSE )
data |
A 'WhatsApp' chatlog that was parsed with |
names |
A vector of author names that the visualization will be restricted to. Non-listed authors will be removed. |
starttime |
Datetime that is used as the minimum boundary for exclusion. Is parsed with |
endtime |
Datetime that is used as the maximum boundary for exclusion. Is parsed with |
return_data |
If TRUE, returns a data frame of subsequent interactions with senders and recipients. Default is FALSE. |
collapse_sessions |
Whether multiple subsequent messages by the same sender should be collapsed into one row. Default is FALSE. |
edgetype |
What type of content is displayed as an edge. Must be one of "TokCount","EmojiCount","SmilieCount","LocationCount","URLCount","MediaCount" or "n". |
exclude_sm |
If TRUE, excludes the 'WhatsApp' system messages from the descriptive statistics. Default is FALSE. |
A network visualization of authors in 'WhatsApp' chat logs where each subsequent message is considered a reply to the previous one. Input will be ordered by TimeOrder column.
data <- readRDS(system.file("ParsedWhatsAppChat.rds", package = "WhatsR"))
plot_network(data)
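The rule that each message counts as a reply to the previous one translates into a simple edge list. The sketch below builds such a list with message counts as edge weights (comparable to edgetype = "n"); the sender sequence and the column names from, to and n are hypothetical.

```r
# Hypothetical sender sequence, already ordered by time
senders <- c("Alice", "Bob", "Alice", "Bob", "Carol")

# Each message is treated as a reply to the one directly before it
edges <- data.frame(
  from = senders[-1],
  to = senders[-length(senders)],
  n = 1L,
  stringsAsFactors = FALSE
)

# Sum up repeated sender/recipient pairs to obtain edge weights
edge_counts <- aggregate(n ~ from + to, data = edges, FUN = sum)
edge_counts
```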
Visualizes the reply times and reaction times to messages per author
plot_replytimes( data, names = "all", starttime = "1960-01-01 00:00", endtime = "2200-01-01 00:00", return_data = FALSE, aggregate_sessions = TRUE, plot = "box", type = "replytime", exclude_sm = FALSE )
data |
A 'WhatsApp' chat log that was parsed with |
names |
A vector of author names that the plots will be restricted to. |
starttime |
Datetime that is used as the minimum boundary for exclusion. Is parsed with |
endtime |
Datetime that is used as the maximum boundary for exclusion. Is parsed with |
return_data |
If TRUE, returns a data frame of response times extracted from the chat for more elaborate plotting. Default is FALSE. |
aggregate_sessions |
If TRUE, consecutive messages of the same author are aggregated into one session. Default is TRUE. |
plot |
Type of plot to be returned, options are "box" and "heatmap". |
type |
If "replytime", plots display how long authors take to reply to the previous message; if "reactiontime", plots display how long it takes for authors to receive a response. |
exclude_sm |
If TRUE, excludes the 'WhatsApp' system messages from the data. Default is FALSE. |
Plots for Replytimes or Reactiontimes of authors. Input will be ordered by TimeOrder column.
data <- readRDS(system.file("ParsedWhatsAppChat.rds", package = "WhatsR"))
plot_replytimes(data)
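Reply times with session aggregation can be approximated in a few lines of base R. In this sketch a session is a run of consecutive messages by one sender and the reply time is measured from the first message of the previous session; both choices are assumptions and may differ from the package's definition.

```r
# Hypothetical parsed chat, ordered by time
chat <- data.frame(
  DateTime = as.POSIXct(
    c("2019-01-01 10:00:00", "2019-01-01 10:05:00",
      "2019-01-01 10:06:00", "2019-01-01 10:16:00"),
    tz = "UTC"
  ),
  Sender = c("Alice", "Bob", "Bob", "Alice"),
  stringsAsFactors = FALSE
)

# Collapse runs of the same sender into sessions (keep the first message of each run)
runs <- rle(chat$Sender)
first_of_run <- cumsum(c(1, head(runs$lengths, -1)))
sessions <- chat[first_of_run, ]

# Time between the start of one session and the start of the next, in minutes
reply_min <- as.numeric(diff(sessions$DateTime), units = "mins")
replies <- data.frame(
  Sender = sessions$Sender[-1],
  ReplyMinutes = reply_min,
  stringsAsFactors = FALSE
)
replies
```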
Plots the smilies used in 'WhatsApp' chat logs by sender
plot_smilies( data, names = "all", starttime = "1960-01-01 00:00", endtime = "2200-01-01 00:00", min_occur = 1, return_data = FALSE, smilie_vec = "all", plot = "bar", exclude_sm = FALSE )
data |
A 'WhatsApp' chat log that was parsed with |
names |
A vector of author names that the plots will be restricted to. |
starttime |
Datetime that is used as the minimum boundary for exclusion. Is parsed with |
endtime |
Datetime that is used as the maximum boundary for exclusion. Is parsed with |
min_occur |
The minimum number of occurrences a smiley has to have to be included in the visualization. Default is 1. |
return_data |
If TRUE, returns a data frame of smilies extracted from the chat for more elaborate plotting. Default is FALSE. |
smilie_vec |
A vector of smilies that the visualizations will be restricted to. |
plot |
The type of plot that should be returned. Options are "heatmap", "cumsum", "bar" and "splitbar". |
exclude_sm |
If TRUE, excludes the 'WhatsApp' system messages from the data. Default is FALSE. |
Plots for distribution of smilies in 'WhatsApp' chats
data <- readRDS(system.file("ParsedWhatsAppChat.rds", package = "WhatsR"))
plot_smilies(data)
Visualizes the token distribution per person
plot_tokens( data, names = "all", starttime = "1960-01-01 00:00", endtime = "2200-01-01 00:00", plot = "bar", return_data = FALSE, exclude_sm = FALSE )
data |
A 'WhatsApp' chatlog that was parsed with |
names |
A vector of author names that the plots will be restricted to. |
starttime |
Datetime that is used as the minimum boundary for exclusion. Is parsed with |
endtime |
Datetime that is used as the maximum boundary for exclusion. Is parsed with |
plot |
The type of plot to be used. Options include "bar", "box", "violin" and "cumsum". Default is "bar". NA values will be removed before plotting. For "violin", senders with fewer than 2 messages are removed. |
return_data |
If TRUE, returns the subsetted data frame. Default is FALSE. |
exclude_sm |
If TRUE, excludes the 'WhatsApp' System Messages from the descriptive statistics. Default is FALSE. |
Plots showcasing the distribution of tokens per person
data <- readRDS(system.file("ParsedWhatsAppChat.rds", package = "WhatsR"))
plot_tokens(data)
Summarizes the distribution of user-generated tokens over time
plot_tokens_over_time( data, names = "all", names_col = "Sender", starttime = "1960-01-01 00:00", endtime = "2200-01-01 00:00", plot = "alltime", return_data = FALSE, exclude_sm = FALSE )
data |
A 'WhatsApp' chat log that was parsed with |
names |
A vector of author names that the plots will be restricted to. |
names_col |
A column indicated by a string that should be accessed to determine the names. Only needs to be changed when |
starttime |
Datetime that is used as the minimum boundary for exclusion. Is parsed with |
endtime |
Datetime that is used as the maximum boundary for exclusion. Is parsed with |
plot |
Type of plot to be returned. Options are "year", "day", "hour", "heatmap" and "alltime". Default is "alltime". |
return_data |
If TRUE, returns the subset data frame. Default is FALSE. |
exclude_sm |
If TRUE, excludes the 'WhatsApp' system messages from the descriptive statistics. Default is FALSE. |
A summary of tokens over time. Input will be ordered by TimeOrder column.
data <- readRDS(system.file("ParsedWhatsAppChat.rds", package = "WhatsR"))
plot_tokens_over_time(data)
Creates a wordcloud by author for 'WhatsApp' chat logs. Requires raw message text to be present in data.
plot_wordcloud( data, names = "all", starttime = "1960-01-01 00:00", endtime = "2200-01-01 00:00", remove_stops = TRUE, stop = "english", comparison = FALSE, return_data = FALSE, font_size = 10, min_occur = 5, exclude_sm = FALSE )
data |
A 'WhatsApp' chat log that was parsed with |
names |
A vector of author names that the plots will be restricted to. |
starttime |
Datetime that is used as the minimum boundary for exclusion. Is parsed with |
endtime |
Datetime that is used as the maximum boundary for exclusion. Is parsed with |
remove_stops |
Either TRUE or FALSE, default is TRUE. Configures whether stopwords from |
stop |
The language for stopword removal. Stopwords are taken from |
comparison |
Must be TRUE or FALSE. If TRUE, will split up wordcloud by sender. Default is FALSE. |
return_data |
Will return the data frame used to create the plot if TRUE. Default is FALSE. |
font_size |
Size of the words in the wordcloud, passed to |
min_occur |
Sets the minimum frequency a token must occur in the chat for it to be included in the plot. Default is 5. |
exclude_sm |
If TRUE, excludes the 'WhatsApp' system messages from word clouds. Default is FALSE. |
A wordcloud plot per author for 'WhatsApp' chat logs
data <- readRDS(system.file("ParsedWhatsAppChat.rds", package = "WhatsR"))
plot_wordcloud(data, comparison = TRUE, min_occur = 6)
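To make the effect of remove_stops and min_occur concrete, here is a minimal base R sketch of stopword removal followed by a frequency threshold. It is an illustration under assumptions, not the package's code: the whitespace tokenization and the two-word stopword list are hypothetical stand-ins.

```r
# Toy messages; the stopword list below is a hypothetical stand-in.
messages <- c("the cat sat", "the cat ran", "a dog ran", "the cat")
stopwords_assumed <- c("the", "a")

# Lowercase and split on whitespace (an assumed tokenization).
tokens <- unlist(strsplit(tolower(messages), "\\s+"))

# Drop stopwords, as remove_stops = TRUE would request.
tokens <- tokens[!tokens %in% stopwords_assumed]

# Count token frequencies and keep only tokens occurring at least min_occur times.
freq <- table(tokens)
min_occur <- 2
kept <- sort(freq[freq >= min_occur], decreasing = TRUE)

print(kept)
```

Raising min_occur shrinks the set of plotted words, which is why the example above uses min_occur = 6 to keep the comparison cloud readable.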
Creates a list of basic information about a single 'WhatsApp' chat log
summarize_chat(data, exclude_sm = FALSE)
data |
A 'WhatsApp' chat log that was parsed with |
exclude_sm |
If TRUE, excludes the 'WhatsApp' system messages from the descriptive statistics. Default is FALSE. |
A list containing:
1) The number of messages in the chat
2) The number of tokens in the chat
3) The number of participants in the chat
4) The date of the first message
5) The date of the last message
6) The total duration of the chat
7) The number of system messages in the chat
8) The number of emoji in the chat
9) The number of smilies in the chat
10) The number of links in the chat
11) The number of media in the chat
12) The number of locations in the chat
data <- readRDS(system.file("ParsedWhatsAppChat.rds", package = "WhatsR"))
summarize_chat(data)
Summarizing the distribution of tokens for sent messages
summarize_tokens_per_person( data, names = "all", starttime = "1960-01-01 00:00", endtime = "2200-01-01 00:00", exclude_sm = FALSE )
data |
A 'WhatsApp' chat log that was parsed with |
names |
A vector of author names that the summary will be restricted to. |
starttime |
Datetime that is used as the minimum boundary for exclusion. Is parsed with |
endtime |
Datetime that is used as the maximum boundary for exclusion. Is parsed with |
exclude_sm |
If TRUE, excludes the 'WhatsApp' system messages from the descriptive statistics. Default is FALSE. |
A summary of tokens per message distribution per author
data <- readRDS(system.file("ParsedWhatsAppChat.rds", package = "WhatsR"))
summarize_tokens_per_person(data)
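As a rough illustration of what a per-author summary of tokens per message involves, the base R sketch below counts tokens in each message and summarizes them by sender. The whitespace tokenization, the `Message` column name and the chosen statistics are assumptions made for the example; the package may compute its summary differently.

```r
# Toy chat data; the Message column name is an assumption.
chat <- data.frame(
  Sender = c("Alice", "Bob", "Alice", "Bob", "Alice"),
  Message = c("hi there", "hello", "how are you today", "fine thanks", "ok"),
  stringsAsFactors = FALSE
)

# Count whitespace-separated tokens per message (assumed tokenization).
tokens_per_message <- lengths(strsplit(trimws(chat$Message), "\\s+"))

# Summarize the token counts separately for each sender.
per_author <- sapply(
  split(tokens_per_message, chat$Sender),
  function(x) c(min = min(x), median = median(x), mean = mean(x), max = max(x))
)

print(per_author)
```

The result is a matrix with one column per sender and one row per statistic.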
Excluding parts of the chat by senders or timestamps
tailor_chat( data, names = "all", starttime = "1960-01-01 00:00", endtime = "2200-01-01 00:00", exclude_sm = FALSE )
data |
A 'WhatsApp' chat log that was parsed with |
names |
A vector of names that the output is restricted to. Messages from authors not contained in the vector are excluded. |
starttime |
Datetime that is used as the minimum boundary for exclusion. Is parsed with |
endtime |
Datetime that is used as the maximum boundary for exclusion. Is parsed with |
exclude_sm |
If TRUE, excludes the 'WhatsApp' system messages from the output. Default is FALSE. |
A dataframe that is restricted to the specified timeframe and authors
data <- readRDS(system.file("ParsedWhatsAppChat.rds", package = "WhatsR"))
tailor_chat(data, names = c("Mallory", "Alice"))
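The kind of restriction described above can be sketched in base R as a row filter on sender and timestamp. This is a minimal sketch rather than the package's implementation: `restrict_chat()` is a hypothetical helper, the `DateTime` and `Message` columns are assumed names, and treating both time bounds as inclusive is also an assumption.

```r
# Toy chat data; DateTime and Message are hypothetical column names.
chat <- data.frame(
  DateTime = as.POSIXct(c("2019-01-05 10:00", "2019-06-01 12:30",
                          "2020-02-10 08:15", "2021-03-03 21:45"), tz = "UTC"),
  Sender = c("Alice", "Bob", "Mallory", "Alice"),
  Message = c("hi", "hello there", "good morning", "see you"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows inside the time window (bounds assumed
# inclusive) and, unless names is "all", only rows from the listed senders.
restrict_chat <- function(df, names = "all",
                          starttime = "1960-01-01 00:00",
                          endtime = "2200-01-01 00:00") {
  start <- as.POSIXct(starttime, tz = "UTC")
  end <- as.POSIXct(endtime, tz = "UTC")
  keep <- df$DateTime >= start & df$DateTime <= end
  if (!identical(names, "all")) {
    keep <- keep & df$Sender %in% names
  }
  df[keep, , drop = FALSE]
}

subset_chat <- restrict_chat(chat, names = c("Mallory", "Alice"),
                             starttime = "2019-06-01 00:00")
print(subset_chat)
```

The wide default bounds (1960 to 2200) mean that, when no times are given, only the sender filter has any effect.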