| Title: | Tools to Read and Convert Wearables Data |
|---|---|
| Description: | Package to read Empatica E4 data, perform several transformations, perform signal processing and analyses, including batch analyses. |
| Authors: | Peter de Looff [aut, cre], Remko Duursma [aut], Saskia Koldijk [aut], Kees de Schepper [aut], Matthijs Noordzij [ctb], Natasha Jaques [ctb], Sara Taylor [ctb] |
| Maintainer: | Peter de Looff <[email protected]> |
| License: | GPL-2 |
| Version: | 0.8.1 |
| Built: | 2024-11-26 06:51:10 UTC |
| Source: | CRAN |
Partition data into chunks of a fixed number of rows in order to calculate aggregated features per chunk.
add_chunk_group(data, rows_per_chunk)
data | df to partition into chunks
rows_per_chunk | size of a chunk
Aggregate E4 data into 1-minute timesteps.
aggregate_e4_data(x)
x | An object read by read_e4()
Convert Unix time to POSIXct.
as_time(x, tz = "UTC")
x | a Unix timestamp, converted to POSIXct
tz | timezone, set to UTC by default
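A minimal sketch of the conversion; the Unix timestamp below is made up and corresponds to the kind of value stored in the first row of an E4 CSV file:

```r
library(wearables)

# Hypothetical Unix start time (seconds since 1970-01-01 UTC)
unix_start <- 1574843520

as_time(unix_start)                           # POSIXct in UTC
as_time(unix_start, tz = "Europe/Amsterdam")  # or any other timezone
```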
Create an xts object indexed by time.
as_timeseries(data, index = 2, name_col = "V1")
data | A dataframe, one of the list elements output by the read_e4 function
index | Which column (integer) to use as the data in the timeseries. Default: 2.
name_col | Column name to give to the timeseries data.
Read and process all ZIP files in a directory.
batch_analysis(path_in = NULL, path_out = ".")
path_in | input path
path_out | output path
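A minimal usage sketch; the folder names are hypothetical and assume a directory of Empatica E4 ZIP exports:

```r
library(wearables)

# Process every E4 ZIP export in "e4_exports/" and write the results
# to "e4_results/".
batch_analysis(path_in = "e4_exports", path_out = "e4_results")
```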
Configuration of the SVM algorithm for binary classification.
binary_classifier_config
An object of class list of length 4.
Sara Taylor [email protected]
https://eda-explorer.media.mit.edu/
Calculate RMSSD over 1-minute time periods for plotting.
calculate_RMSSD(IBIdata)
IBIdata | The IBI data frame as created by read_e4()
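A minimal sketch, assuming a (hypothetical) E4 export whose IBI component is passed on directly:

```r
library(wearables)

data <- read_e4("e4_exports/1575036870_A00204.zip")

# RMSSD per 1-minute window, e.g. for plotting heart rate variability
rmssd <- calculate_RMSSD(data$IBI)
```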
Force a character datetime variable ("yyyy-mm-dd hh:mm:ss") to the system timezone.
char_clock_systime(time)
time | Datetime variable ("yyyy-mm-dd hh:mm:ss")
Make a choice between two classes based on kernel values.
choose_between_classes(class_a, class_b, kernels)
class_a | Number by which class a is indicated
class_b | Number by which class b is indicated
kernels | Kernel values from SVM
Compute amplitude features.
compute_amplitude_features(data)
data | vector of amplitude values
Compute derivative features.
compute_derivative_features(derivative, feature_name)
derivative | vector of derivatives
feature_name | name of feature
Compute features for SVM.
compute_features2(data)
data | df with EDA, filtered EDA, and timestamp columns
Compute wavelet coefficients.
compute_wavelet_coefficients(data)
data | data with an EDA element
Compute wavelet decomposition.
compute_wavelet_decomposition(data)
data | vector of values
Create an output folder for E4 analysis results.
create_e4_output_folder(obj, out_path = ".")
obj | e4 analysis object
out_path | output folder
Determine how many intervals should be created, given the time at which the file cut should start, the period for which separate files are wanted, and the length of each interval.
e4_filecut_intervals(time_start, time_end, interval)
time_start | User input start time in the character format "yyyy-mm-dd hh:mm:ss", e.g. "2019-11-27 08:32:00". Where should the file cut start?
time_end | User input end time (same format as time_start)
interval | User input interval in minutes (e.g., 5): the duration of the intervals the period should be divided into. For example, de Looff et al. (2019) use 5-minute intervals over a 30-minute period preceding aggressive behavior. Five-minute intervals are chosen because some heart rate variability parameters require at least 5 minutes of data, but shorter intervals are possible as well; see Shaffer, F., & Ginsberg, J. P. (2017). An Overview of Heart Rate Variability Metrics and Norms. Frontiers in Public Health, 5. https://doi.org/10.3389/fpubh.2017.00258.
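A minimal sketch of these arguments in use, assuming a 30-minute period cut into 5-minute intervals; the datetimes are made up:

```r
library(wearables)

# 30-minute period preceding an event, divided into 5-minute intervals
intervals <- e4_filecut_intervals(
  time_start = "2019-11-27 08:00:00",
  time_end   = "2019-11-27 08:30:00",
  interval   = 5
)
```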
Filter the data object for the time period and intervals needed for the files to be cut. The function also creates Empatica E4 ZIP files, in the same format as the original, in the directory where the original ZIP file is located.
filter_createdir_zip(data, time_start, time_end, interval, out_path = NULL, fn_name = NULL)
data | Object read with read_e4()
time_start | User input start time in the character format "yyyy-mm-dd hh:mm:ss", e.g. "2019-11-27 08:32:00". Where should the file cut start?
time_end | User input end time (same format as time_start)
interval | User input interval in minutes (e.g., 5): the duration of the intervals the period should be divided into. For example, de Looff et al. (2019) use 5-minute intervals over a 30-minute period preceding aggressive behavior, as some heart rate variability parameters require at least 5 minutes of data.
out_path | The directory to write the cut files to; defaults to the input folder.
fn_name | The file name for the cut files, without the extension.
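A hedged sketch of cutting one recording into 5-minute ZIP files; the export path and times are hypothetical:

```r
library(wearables)

data <- read_e4("e4_exports/1575036870_A00204.zip")

# Cut the 30 minutes before 08:32 into 5-minute E4 ZIP files, written
# next to the original export by default.
filter_createdir_zip(
  data,
  time_start = "2019-11-27 08:02:00",
  time_end   = "2019-11-27 08:32:00",
  interval   = 5
)
```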
Filter all four datasets for a datetime start and end.
filter_e4data_datetime(data, start, end)
data | Object read with read_e4()
start | Start datetime (POSIXct)
end | End datetime (POSIXct)
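A minimal sketch; the export path and timestamps are made up:

```r
library(wearables)

data <- read_e4("e4_exports/1575036870_A00204.zip")

# Keep only the data recorded between two POSIXct timestamps
start <- as.POSIXct("2019-11-27 08:00:00", tz = Sys.timezone())
end   <- as.POSIXct("2019-11-27 08:30:00", tz = Sys.timezone())
data_subset <- filter_e4data_datetime(data, start, end)
```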
This function finds the peaks of an EDA signal and adds basic properties to the datafile.
find_peaks(data, offset = 1, start_WT = 4, end_WT = 4, thres = 0.005, sample_rate = getOption("SAMPLE_RATE", 8))
data | DataFrame with EDA as one of the columns, indexed by a datetime index
offset | the number of rising and falling seconds around a peak needed for it to count as a peak
start_WT | maximum number of seconds before the apex of a peak that can be the "start" of the peak
end_WT | maximum number of seconds after the apex of a peak that can be the "end" of the peak (50 percent of amplitude)
thres | the minimum microsiemens change required to register as a peak; defaults to 0.005
sample_rate | number of samples per second; default 8
Note that peak_end is assumed to be no later than the start of the next peak.
Returns a data frame with several columns:
peaks | 1 if apex
peak_start | 1 if start of peak
peak_end | 1 if end of peak
peak_start_times | if apex, the corresponding start timestamp
peak_end_times | if apex, the corresponding end timestamp
half_rise | if sharply decaying apex, time to the halfway point of the rise
amp | if apex, EDA value at the apex minus EDA value at the start
max_deriv | if apex, maximum derivative within 1 second of the apex
rise_time | if apex, time from start to apex
decay_time | if sharply decaying apex, time from apex to end
SCR_width | if sharply decaying apex, time from half rise to end
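A sketch of running peak detection; it assumes, per the descriptions above, that the EDA component of a read_e4() object can be prepared with process_eda() into a data frame with an EDA column at the default 8 Hz sample rate (the export path is hypothetical):

```r
library(wearables)

data <- read_e4("e4_exports/1575036870_A00204.zip")

# Assumption: process_eda() yields an EDA data frame suitable for
# peak detection at 8 Hz.
eda <- process_eda(data$EDA)
peaks <- find_peaks(eda)
```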
Get the amplitude of the peaks.
get_amp(data)
data | df with peak info
Find the apex of the electrodermal activity (EDA) signal within an optional time window.
get_apex(eda_deriv, offset = 1)
eda_deriv | the EDA derivative used to find the apex
offset | minimum number of downward measurements after the apex for it to be considered a peak (the default of 1 means no restrictions)
Get the time (in seconds) it takes to decay for each peak.
get_decay_time(data, i_apex_with_decay)
data | df with peak info
i_apex_with_decay | indexes of relevant peaks
Get the first derivative.
get_derivative(values)
values | vector of numbers
Find the first derivative of the EDA signal.
get_eda_deriv(eda)
eda | EDA vector
Get the amplitude value halfway between peak start and apex.
get_half_amp(data, i)
data | df with peak info
i | apex index
Get the time (in seconds) it takes to reach halfway up the rise of a peak.
get_half_rise(data, i_apex_with_decay)
data | df with peak info
i_apex_with_decay | relevant apices
Identify peaks with a decent decay (at least half the amplitude of the rise).
get_i_apex_with_decay(data)
data | df with peak info
Generate the kernel needed for SVM.
get_kernel(kernel_transformation, sigma, columns)
kernel_transformation | Data matrix used to transform EDA features into kernel values
sigma | The inverse kernel width used by the kernel
columns | Features computed from the EDA signal
Get the largest slope before the apex, interpolated to seconds.
get_max_deriv(data, eda_deriv, sample_rate)
data | df with info on the peaks
eda_deriv | derivative of the signal
sample_rate | sample rate of the signal
Find the end of the peaks, with some restrictions on the search.
get_peak_end(data, max_lookahead)
data | df with peak info
max_lookahead | max distance from apex to search for the end
Get the end timestamp of the peaks.
get_peak_end_times(data)
data | df with peak info
Indicate for each measurement whether it is the start of a peak (0 or 1).
get_peak_start(data, sample_rate)
data | df with peak info
sample_rate | sample rate of the signal
Get the start times of the peaks.
get_peak_start_times(data)
data | df with peak info
Calculate the rise time of all peaks.
get_rise_time(eda_deriv, apices, sample_rate, start_WT)
eda_deriv | first derivative of the signal
apices | apex status per measurement (0 or 1)
sample_rate | sample rate of the signal
start_WT | window within which to look for the rise time (in seconds)
Get the width of the peak (in seconds, from halfway up the rise until the end).
get_SCR_width(data, i_apex_with_decay)
data | df with peak info
i_apex_with_decay | relevant apices
Get the second derivative.
get_second_derivative(values)
values | vector of numbers
Analysis of interbeat intervals (IBI).
ibi_analysis(IBI)
IBI | IBI data (the number of seconds since the start of the recording), a component of the object read with read_e4()
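A minimal sketch, passing the IBI component of a (hypothetical) export to the analysis:

```r
library(wearables)

data <- read_e4("e4_exports/1575036870_A00204.zip")

# Heart rate variability analysis on the interbeat intervals
ibi_results <- ibi_analysis(data$IBI)
```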
Give the maximum value of a vector of values per segment of length n.
max_per_n(values, n, output_length)
values | array of numbers
n | length of each segment
output_length | argument to adjust for the final segment not being full
Configuration of the SVM algorithm for ternary classification.
multiclass_classifier_config
An object of class list of length 4.
Sara Taylor [email protected]
https://eda-explorer.media.mit.edu/
Combine several E4 files and set the length of the x-axis.
pad_e4(x)
x | index of dataframe
Plot artifacts after the EDA data has been classified.
plot_artifacts(labels, eda_data)
labels | labels with artifact classification
eda_data | data upon which the labels are plotted
Generate classifications (artifact, no artifact).
predict_binary_classifier(data)
data | features from the EDA signal
Generate classifications (artifact, unclear, no artifact).
predict_multiclass_classifier(data)
data | features from the EDA signal
Column-bind a time column to the dataframe.
prepend_time_column(data, timestart, hertz, tz = Sys.timezone())
data | dataframe
timestart | the start of the recording
hertz | frequency (Hz) at which the E4 data was recorded
tz | The timezone, defaults to the user timezone
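A small sketch with made-up values; it assumes timestart accepts the Unix start time read from the first row of the raw CSV file:

```r
library(wearables)

# Hypothetical raw EDA values recorded at 4 Hz
eda_raw <- data.frame(EDA = c(0.31, 0.32, 0.33, 0.35))

# Prepend a datetime column starting at the (made-up) Unix start time,
# in the user's timezone by default.
eda_with_time <- prepend_time_column(eda_raw, timestart = 1574843520, hertz = 4)
```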
Print method for 'e4data' objects; prints the object's class.
## S3 method for class 'e4data'
print(x, ...)
x | An e4 data list
... | Further arguments, currently ignored.
Process EDA data.
process_eda(eda_data)
eda_data | Data read with read_e4()
Row-bind E4 datasets.
rbind_e4(data)
data | An object read in by read_e4
Read the raw ZIP file using 'read_e4' and perform analyses with 'ibi_analysis' and 'eda_analysis'.
read_and_process_e4(zipfile, tz = Sys.timezone())
process_e4(data)
zipfile | ZIP file with E4 data to be read
tz | timezone where the data were recorded (default: system timezone)
data | object from the read_e4 function
Returns an object with processed data and analyses, of class 'e4_analysis'.
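A minimal sketch of both entry points; the export path is hypothetical:

```r
library(wearables)

# One-step pipeline: read a (hypothetical) E4 export and run the IBI
# and EDA analyses, returning an 'e4_analysis' object.
results <- read_and_process_e4("e4_exports/1575036870_A00204.zip")

# Equivalent two-step form
data <- read_e4("e4_exports/1575036870_A00204.zip")
results <- process_e4(data)
```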
Read in E4 data as a list (with EDA, HR, TEMP, ACC, BVP, and IBI as dataframes) and prepend time columns.
read_e4(zipfile = NULL, tz = Sys.timezone())
zipfile | A ZIP file as exported by the instrument
tz | The timezone used by the instrument (defaults to the user timezone).
This function reads in a ZIP file as exported by Empatica Connect and extracts the CSV files it contains into a temporary folder.
The EDA, HR, BVP, and TEMP CSV files share a similar structure: the starting time of the recording (in Unix time) is read from the first row of the file, the frequency of the measurements (in Hz) from the second row, and the raw data from row three onward.
The ACC CSV file contains the acceleration of the Empatica E4 on the three axes x, y, and z. The first row contains the starting time of the recording in Unix time, the second row the frequency of the measurements in Hz, and the raw x, y, and z data follow from row three onward.
The IBI file has a different structure: the starting time in Unix time is in the first row, first column. The first column contains the number of seconds elapsed since the start of the recording; each entry represents a heartbeat as derived from the algorithms of the photoplethysmography sensor. The second column contains the duration of the interval from one heartbeat to the next.
Sampling frequencies: ACC.csv = 32 Hz, BVP.csv = 64 Hz, EDA.csv = 4 Hz, HR.csv = 1 Hz, TEMP.csv = 4 Hz.
Please also see the info.txt file provided in the ZIP file for additional information.
The function returns an object of class "e4_data" with prepended datetime columns that default to the user timezone. The object is a list of dataframes with the physiological signals.
library(wearables)
#read_e4()
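A slightly fuller sketch of the commented-out example above; the export path is hypothetical:

```r
library(wearables)

# Read a (hypothetical) Empatica Connect export; tz defaults to the
# user's timezone.
data <- read_e4("e4_exports/1575036870_A00204.zip")

# List elements: EDA, HR, TEMP, ACC, BVP, IBI, each with a datetime column
names(data)
head(data$EDA)
```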
Remove peaks with a small rise from start to apex.
remove_small_peaks(data, thres = 0)
data | df with info on peaks
thres | amplitude-difference threshold below which peaks are removed (the default of 0 means no removals)
Upsample EDA data to 8 Hz.
upsample_data_to_8Hz(eda_data)
eda_data | Data read with read_e4()
Write a processed E4 analysis object to the output folder. Slow!
write_processed_e4(obj, out_path = ".")
obj | e4 analysis object
out_path | output folder