Title: | Easily Handling Data from the ‘m-Path’ Platform |
---|---|
Description: | Provides tools for importing and cleaning Experience Sampling Method (ESM) data collected via the 'm-Path' platform (<https://m-path.io/landing/>). The goal is to provide a few utility functions for reading such data and performing some common operations on it. Functions include raw data handling, format standardization, basic data checks, and calculation of the response rate in data from ESM studies. |
Authors: | Merijn Mestdagh [aut, cre], Lara Navarrete [aut], Koen Niemeijer [aut], m-Path Software [cph] |
Maintainer: | Merijn Mestdagh <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.2 |
Built: | 2024-11-25 16:15:43 UTC |
Source: | CRAN |
Contains the preprocessed example data for an m-Path research study.
In the study, 20 participants were sent 11 beeps per day over the course of 10 days. The study consisted of:
An intake questionnaire that participants answered at the study's start.
A main questionnaire (10 times per day), where participants answered questions about their emotions and context at the time.
An evening questionnaire (once, at the end of the day), about their emotions and activities throughout the day.
Each row corresponds to one beep sent during the study.
example_data
A data frame with 1980 rows and 47 columns:
Participant identifier.
Code the participants used to sign up for the study.
The questionnaire that participants answered in that beep (it can be the main or the evening questionnaire).
Time stamp for when the notification was scheduled for, in unix time.
Time stamp for when the notification was sent, in unix time.
Time stamp for when the notification was answered, in unix time. If the notification was never answered, this value is NA.
Time stamp for when the notification was completed, in unix time. If the notification was never answered, this value is NA.
The difference between the phone time and the server time.
Observation number for each participant. Goes from 1 (first observation), to 110 (last observation of the study).
Day number of the study, for the participant. Goes from 1 to 10.
Observation number within the day (for each participant). Goes from 1 to 11.
Logical, whether the beep was answered or not.
Average heart rate per day. Note that unlike the rest of the variables, this corresponds to simulated data.
Participant's gender. 1 means 'Male', 2 means 'Female', 3 'Other'.
Participant's gender, as a string.
Participant's age in years.
Composite variable corresponding to participant's life satisfaction according to the Satisfaction With Life Scale (SWLS).
Composite variable corresponding to participant's neuroticism according to the Big Five Inventory (BFI).
Participants' self-reported happiness at the time of the beep. From 0 (not happy at all) to 100 (very happy).
Participants' self-reported sadness at the time of the beep. From 0 (not sad at all) to 100 (very sad).
Participants' self-reported anger at the time of the beep. From 0 (not angry at all) to 100 (very angry).
Participants' self-reported relaxation at the time of the beep. From 0 (not relaxed at all) to 100 (very relaxed).
Participants' self-reported anxiety at the time of the beep. From 0 (not anxious at all) to 100 (very anxious).
Participants' self-reported energy at the time of the beep. From 0 (not energetic at all) to 100 (very energetic).
Participants' self-reported tiredness at the time of the beep. From 0 (not tired at all) to 100 (very tired).
Index corresponding to the participant's answer to the question "Where are you now?", from a list of multiple options.
Text corresponding to the participant's selected location at the time of the beep.
Index corresponding to the participant's answer to the question "With whom are you right now?", from a list of multiple options.
Text corresponding to the participant's selected company at the time of the beep.
Index corresponding to the participant's answer to the question "What are you doing now?", from a list of multiple options.
Text corresponding to the participant's selected activity at the time of the beep.
Step count between the previous answered beep and the current beep.
Participants' happiness during the day, from 0 (not happy at all) to 100 (very happy).
Participants' sadness during the day, from 0 (not sad at all) to 100 (very sad).
Participants' anger during the day, from 0 (not angry at all) to 100 (very angry).
Participants' relaxation during the day, from 0 (not relaxed at all) to 100 (very relaxed).
Participants' anxiety during the day, from 0 (not anxious at all) to 100 (very anxious).
Participants' energy during the day, from 0 (not energetic at all) to 100 (very energetic).
Participants' tiredness during the day, from 0 (not tired at all) to 100 (very tired).
Participant's answer to whether something stressful had happened during the day. 1 means 'yes', 0 means 'no'.
Participant's answer to whether something positive had happened during the day. 1 means 'yes', 0 means 'no'.
Explanation of the positive event (if participants responded 'yes' to the previous question).
Explanation of the stressful event (if participants responded 'yes' to the previous question).
Index corresponding to the participant's answer(s) to the question "What activities did you do today?", from a list of multiple options.
Text corresponding to the participant's selected activities during the day.
Delay in minutes between the scheduled beep and the time the participants started the beep.
Time in minutes the participants took to fill in the beep (difference between the columns start and stop).
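The last two columns are simple differences between the timestamps described earlier. A minimal sketch of that arithmetic, using made-up Unix timestamps (the variable names below are hypothetical and are not column names confirmed by the package):

```r
# Hypothetical Unix timestamps (in seconds) for a single beep.
scheduled <- 1715760000  # when the notification was scheduled
started   <- 1715760300  # when the participant opened the beep
completed <- 1715760420  # when the participant finished the beep

# Delay between the scheduled time and the start of the beep, in minutes.
delay_minutes <- (started - scheduled) / 60

# Time taken to fill in the beep, in minutes.
duration_minutes <- (completed - started) / 60

delay_minutes     # 5
duration_minutes  # 2
```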
This function provides an easy way to access the m-Path example files.
mpath_example(file = NULL)
file |
the name of the file to be accessed. If NULL, the available example files are listed. |
a character string with the path to the m-Path example data
# Example 1: access 'example_basic.csv' data
mpath_example('example_basic.csv') # returns the full path to the file 'example_basic.csv'

# Example 2: list all the example files
mpath_example() # returns the example files as a vector
This function returns a ggplot object with the response rate per day (x axis) and participant (color). Note that instead of using calendar dates, the function returns a plot grouped by the day inside the study for the participant.
plot_response_rate(data, valid_col, participant_col, time_col)
data |
data frame with data |
valid_col |
name of the column that stores whether the beep was answered or not |
participant_col |
name of the column that stores the participant id (or equivalent) |
time_col |
name of the column that stores the time of the beep |
a ggplot object with the response rate per day (x axis) and participant (color)
# load data
data(example_data)

# make plot with plot_response_rate
plot_response_rate(data = example_data,
                   time_col = sent,
                   participant_col = participant,
                   valid_col = answered)

# The resulting ggplot object can be formatted using ggplot2 functions (see ggplot2
# documentation).
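To illustrate what "day inside the study" means as opposed to a calendar date, the following base R sketch derives a per-participant study day and a daily response rate from toy data. It is an assumption-laden illustration, not the package's implementation: the data frame `beeps` and the `study_day` column are invented for the example.

```r
# Toy data: two participants who started the study on different dates.
beeps <- data.frame(
  participant = c("p1", "p1", "p1", "p2", "p2", "p2"),
  sent = as.POSIXct(
    c("2024-05-01 09:00:00", "2024-05-01 18:00:00", "2024-05-02 09:00:00",
      "2024-05-10 09:00:00", "2024-05-11 09:00:00", "2024-05-11 18:00:00"),
    tz = "UTC"
  ),
  answered = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Calendar date of each beep, as a number of days since 1970-01-01.
beep_date <- as.numeric(as.Date(beeps$sent, tz = "UTC"))

# First beep date per participant, repeated for each of their rows.
first_date <- ave(beep_date, beeps$participant, FUN = min)

# Study day: 1 on the participant's first day, 2 on the next, and so on.
beeps$study_day <- beep_date - first_date + 1

# Proportion of answered beeps per participant and study day.
daily <- aggregate(answered ~ participant + study_day, data = beeps, FUN = mean)
names(daily)[names(daily) == "answered"] <- "response_rate"
daily
```

Both participants end up with study days 1 and 2 even though their calendar dates do not overlap, which is what makes their curves comparable on a shared x axis.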
This function reads an m-Path CSV file into a tibble, an extension of a data.frame.
read_mpath(file, meta_data, warn_changed_columns = TRUE)
file |
A string with the path to the m-Path file. |
meta_data |
A string with the path to the meta data file. |
warn_changed_columns |
Warn if the question text, type of question, or type of answer has changed during the study. Default is TRUE. |
Note that this function has been tested with the meta data version v.1.1, so it is advised to use that version of the meta data. In the m-Path dashboard, change the version in 'Export data' > "export version".
A tibble with the m-Path data.
write_mpath() for saving the data back to a CSV file.
# We can use the function mpath_example to get the path to the example data
basic_path <- mpath_example(file = "example_basic.csv")
meta_path <- mpath_example("example_meta.csv")

data <- read_mpath(file = basic_path, meta_data = meta_path)
Calculate response rate
response_rate( data, valid_col, participant_col, time_col = NULL, period_start = NULL, period_end = NULL )
data |
data frame with data |
valid_col |
name of the column that stores whether the beep was answered or not |
participant_col |
name of the column that stores the participant id (or equivalent) |
time_col |
optional: name of the column that stores the time of the beep, as a 'POSIXct' object. |
period_start |
string representing the starting date to
calculate response rates (optional). Accepts dates in the following
formats: |
period_end |
period end to calculate response rates (optional). |
a data frame with the response rate for each participant, and the number of beeps used to calculate the response rate
# Example 1: calculate response rates for the whole study
# Get example data
data(example_data)

# Calculate response rate for each participant
# We don't specify time_col, period_start or period_end.
# Response rates will be based on all the participant's data
response_rate <- response_rate(data = example_data,
                               valid_col = answered,
                               participant_col = participant)

# Example 2: calculate response rates for a specific time period
data(example_data)

# Calculate response rate for each participant between dates
response_rate <- response_rate(data = example_data,
                               valid_col = answered,
                               participant_col = participant,
                               time_col = sent,
                               period_start = '2024-05-15',
                               period_end = '2024-05-31')

# Get participants with a response rate below 0.5
response_rate[response_rate$response_rate < 0.5,]
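For readers who want to see the underlying calculation, here is a minimal base R sketch of a per-participant response rate on toy data. It only illustrates the idea (answered beeps divided by beeps sent); the data frame `beeps` and the output column `n_beeps` are hypothetical and need not match what response_rate() returns.

```r
# Toy data: one row per beep sent, with a logical "answered" flag.
beeps <- data.frame(
  participant = c("p1", "p1", "p1", "p1", "p2", "p2", "p2", "p2"),
  answered    = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

# The mean of a logical vector is the proportion of TRUE values.
rate <- tapply(beeps$answered, beeps$participant, mean)

# Number of beeps that each rate is based on.
n <- tapply(beeps$answered, beeps$participant, length)

rates <- data.frame(
  participant   = names(rate),
  response_rate = as.vector(rate),
  n_beeps       = as.vector(n)
)

# Participants with a response rate below 0.5.
low <- rates[rates$response_rate < 0.5, ]
low
```

Reporting the number of beeps next to the rate helps when interpreting it: a rate of 0.25 based on 4 beeps is much less informative than the same rate based on 110.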
m-Path timestamps are based on the participant's local time zone, and when converted to R datetime format, they may display as UTC. This function allows for the conversion of m-Path timestamps to datetime, and optionally allows for the specification of a UTC offset or a forced time zone.
timestamps_to_datetime(x, tz_offset = NULL, force_tz = NULL)
x |
A vector of timestamps to be transformed to datetime. |
tz_offset |
A numeric value to be added to the timestamps before transforming to datetime.
This is typically derived from the |
force_tz |
A string specifying the time zone to force the timestamps to. This is useful when
the data is to be compared to other data sources that are in a different time zone. Note that
this will not change the actual time of the timestamp, but only the time zone that is
displayed. The |
Timestamps in m-Path, like those in timeStampScheduled and timeStampStart, are a variation on UNIX timestamps, defined as the number of seconds since January 1, 1970, at 00:00:00. However, unlike standard UNIX timestamps (which use UTC), m-Path timestamps are based on the participant's local time zone. When converted to R datetime format, they may display as UTC, which could lead to confusion. This typically isn't an issue when analyzing ESM data within the participant's local context, but it can affect comparisons with other data sources. For accurate cross-referencing with other data, consider specifying the UTC offset to correctly adjust for the participant’s local time. Alternatively, you can force the timestamps to display in a specific time zone using the force_tz argument.
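The mismatch described above can be reproduced in base R without the package. The sketch below assumes a participant whose local clock is two hours ahead of UTC; the offset value and its sign convention (local time minus UTC) are assumptions made for this example and are not taken from the m-Path export format.

```r
# A hypothetical m-Path-style timestamp: seconds since 1970-01-01 00:00:00,
# counted on the participant's local clock (local time 14:30, assumed UTC+2).
ts <- as.numeric(as.POSIXct("2024-05-15 14:30:00", tz = "UTC"))

# Naive conversion: the clock time is the participant's local time,
# but R labels it as UTC.
naive <- as.POSIXct(ts, origin = "1970-01-01", tz = "UTC")
format(naive, "%Y-%m-%d %H:%M:%S %Z")     # "2024-05-15 14:30:00 UTC"

# Assumed offset of the local clock relative to UTC, in seconds.
offset_seconds <- 2 * 3600

# Removing the offset gives the moment the beep really happened in UTC.
true_utc <- as.POSIXct(ts - offset_seconds, origin = "1970-01-01", tz = "UTC")
format(true_utc, "%Y-%m-%d %H:%M:%S %Z")  # "2024-05-15 12:30:00 UTC"
```

The naive value is fine for within-person analyses such as time of day, while the corrected value is the one to use when joining with data recorded in true UTC.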
A vector of POSIXct objects representing the timestamps in the UTC time zone. The time zone may differ if force_tz is specified.
data <- read_mpath(
  mpath_example("example_basic.csv"),
  mpath_example("example_meta.csv")
)[1:10,]

# The most common use case for this function: Convert
# `timeStampStart` to datetime. Remember that these are in the
# local time zone, but R displays them as being in UTC.
timestamps_to_datetime(data$timeStampStart)

# Convert `timeStampStop` to datetime, but as being the correct
# value in UTC.
timestamps_to_datetime(
  x = data$timeStampStop,
  tz_offset = data$timeZoneOffset
)

# Let's convert `timeStampSent` to datetime, but this time we want to
# force the time zone to be in "America/New_York" as we know all
# participants were in this time zone and so we can link with other
# data that is also in New York's time zone.
timestamps_to_datetime(
  x = data$timeStampSent,
  force_tz = "America/New_York"
)
Save a data frame or tibble to a CSV file in the same format as the downloaded data from the m-Path website. This function is useful when you have made modifications to the original data and would like to save it in the same format. Note that reading back the data using read_mpath() may not always work, as the data may no longer be in line with the meta data of the original data file.
write_mpath(x, file, .progress = TRUE)
x |
A data frame or tibble to write to disk. |
file |
File or connection to write to. |
.progress |
Logical indicating whether to show a progress bar. Default is TRUE. |
Even though saving a data frame to a CSV file may seem trivial, there are several issues that need to be addressed when saving m-Path data. The main issue is that m-Path data contains list columns that need to be "collapsed" to a single string before they can be saved to a CSV file. This function collapses most list columns to a single string using paste() with commas as a delimiter of the values. However, for columns that contain strings, this is not possible as the strings themselves may contain commas as well. To address this, the function converts all character columns to JSON strings using jsonlite::toJSON() before saving them to disk.
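The difference between the two cases can be shown with a small base R sketch. It uses made-up list columns and is not the code of write_mpath(); it only demonstrates why comma-collapsing is safe for numbers but ambiguous for strings.

```r
# A numeric list column collapses and splits back without loss.
numeric_answers <- list(c(1, 3), 2)
collapsed_numeric <- vapply(numeric_answers, paste, character(1), collapse = ",")
restored_numeric <- lapply(strsplit(collapsed_numeric, ","), as.numeric)
identical(restored_numeric, numeric_answers)  # TRUE

# A character list column: the first answer itself contains a comma.
string_answers <- list(c("home, alone", "work"))
collapsed_string <- vapply(string_answers, paste, character(1), collapse = ",")
collapsed_string                              # "home, alone,work"

# Splitting on commas now yields three pieces instead of the original two.
n_pieces <- lengths(strsplit(collapsed_string, ","))
n_pieces                                      # 3
```

Encoding the character values as a JSON array instead keeps each answer inside its own quoted string, so the original boundaries remain recoverable.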
While write_mpath() aims to provide a similar CSV file as the m-Path dashboard, we cannot provide any guarantees that the data can be read back using read_mpath(), especially when the data has been modified. If you want to save the data to use it at a later point in R (even when transferring it to another computer), we recommend using saveRDS() or save() instead.
Note that the resulting data file may not be exactly equal to the original, even if it was not modified after reading it with read_mpath(). The main reason is that CSV files from the m-Path dashboard do not contain all necessary file delimiters corresponding to the number of rows in the data. The output of this function, however, does contain the correct number of file delimiters, which makes the files slightly bigger compared to the original file.
Returns x invisibly.
read_mpath() to read m-Path data into R.
data <- read_mpath(
  mpath_example("example_basic.csv"),
  mpath_example("example_meta.csv")
)

write_mpath(data, "data.csv")