Working with Waveform Data using GCalcium

Calcium indicator methods such as GCaMP produce massive amounts of data; in some cases producing hundreds-of-thousands of data points for a single subject. Further, there is currently no ubiquitous way to organize or analyze this type of data. To even analyze the data, the researcher must:

  • Organize the data into a format that is easy to manipulate

  • Extract useful data points for visualization and analysis

  • Visualize the data

The GCalcium package gets researchers to the analysis phase more quickly by:

  • Providing simple, quick commands to format data for use both with, and without the GCalcium package

  • Extracting data points and waveform characteristics for summarizing and comparing activity within and between trials

  • Shortcuts for useful visualizations

This document explains how to use GCalcium’s functions to format, extract, and manipulate calcium indicator data.

Data: GCaMP

The data included with the GCalcium provides a sample of a time series-like dataset exported from Matlab using the TDTFilter command with a modified version of Dr. David Root’s Matlab script. This data was collected using GCaMP6.

This dataset consists of 11 rows and 814 columns. 10 trials from a pilot study were used, with calcium activity from 4 seconds before and after stimulus onset (0s).

Data formatting

In order to use the rest of the package, data must be in a data frame format, with the requirements being:

  1. The first column is a measure of time

  2. The following columns are recorded values from trials in ascending order that correspond to the times the values were recorded at, with 1 column per trial

Fortunately, the GCalcium package includes functions that quickly reformat the data for ease with both user manipulation and use of this package. All formatting commands output this type of data frame.

Format organized data with format_data

Currently, the only command for formatting data is format_data, which takes a matrix of a time measurement in the first column or row, and one trial per column or row in the subsequent columns/rows. A data frame with the first row “Time” and subsequent rows “Trial#” is outputted.

GCaMP.form <- format_data(GCaMP)

### Old table
GCaMP[1:5, 1:5]
#>         [,1]    [,2]    [,3]    [,4]    [,5]
#> [1,] -3.9902 -3.9803 -3.9705 -3.9607 -3.9508
#> [2,] 82.6890 82.6560 82.6500 82.6710 82.7120
#> [3,] 82.8580 82.9220 82.9840 83.0420 83.0850
#> [4,] 81.7090 81.7020 81.7130 81.7260 81.7320
#> [5,] 89.7470 89.6370 89.5450 89.4760 89.4280

### New table
GCaMP.form[1:5, 1:5]
#>      Time Trial1 Trial2 Trial3 Trial4
#> 1 -3.9902 82.689 82.858 81.709 89.747
#> 2 -3.9803 82.656 82.922 81.702 89.637
#> 3 -3.9705 82.650 82.984 81.713 89.545
#> 4 -3.9607 82.671 83.042 81.726 89.476
#> 5 -3.9508 82.712 83.085 81.732 89.428

Note: the data frame used with the GCalcium package does not have to be labeled the same as the format_data frame. This is simply for ease of calling each trial using outside functions.

Extracting useful information for analysis

To perform analyses or explore differences in activity waveforms, one must filter and summarize the data. Knowing what wave characteristics to compare can be confusing; as many scientists do not typically work with this type of data. The following commands extract and/or summarize numbers comparisons that have been used in past research. These functions are split into 2 types: vector inputs, and matrix (format_data style) inputs.

Vector

  • find_peaks looks for peaks based on inflection points and a simple filter

  • inflect_points finds the peaks and valleys of waves

find_peaks

find_peaks finds peaks or valleys in waveforms by using inflection points, with filter of ‘n’ increasing/decreasing points on both sides of each inflection point. A positive numerical input for ‘n.points’ returns the indices of peaks for the input vector, while a negative value returns indices of valleys.

Let’s say we wanted to find all peaks of trial 1 that have 10 consecutive decreasing points on the left and right of each peak, and use these indices to subset the data.

peak.indices <- find_peaks(GCaMP.form$Trial1, n.points = 10)

## Subset using indexing
peaks.df <- data.frame(
  times = GCaMP.form$Time[peak.indices],
  vals = GCaMP.form$Trial1[peak.indices]
)

head(peaks.df, 4)
#>     times   vals
#> 1 -3.9214 82.794
#> 2 -3.7346 82.752
#> 3 -3.5675 82.700
#> 4 -3.2332 82.896

inflect_points

inflect_points uses derivatives to find and label the inflection points (peaks and valleys) of a vector, along with the points between them.

inf.pts <- inflect_points(GCaMP.form$Trial1)

print(inf.pts[1:46])
#>  [1]  0  2  0  0  0  0 -2  0  0  0  0  0  0  2  0  0  0  0  0  0  0  0  0  0  0
#> [26] -2  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0 -2  0  0  0

The value -2 indicates a peak, 2 indicates a valley, and 0 indicates a point on the curve between -2 and 2, or vice versa.

Matrix or Data frame

  • averaged_trials Averages values across trials

  • avg_curve_slope gets the average slope of a curve (half wave)

  • between_trial_change finds the difference in mean activity between trials

  • centered_AUC finds the area under each curve

  • consecutive_trial_change finds the difference in mean activity between trial n and n+1

  • moving_window summarizes data within windows of time

  • inflect_points_df finds provides a summary of the data between each inflection point (half wave)

  • perc_baseline calculates the percent change from a user-specified baseline period

  • within_trial_change finds the difference in mean activity within a trial

averaged_trials

averaged_trials averages values over each time point, across the specified trials. This is especially useful when blocking groups of trials.

Let’s say we want to plot the averaged values of trials 1-5

df.1thru5 <- averaged_trials(GCaMP.form, 1:5)

plot(x = df.1thru5$Time, df.1thru5$Values, type = 'l',
     xlab = 'Time (s)', ylab = 'Values')

avg_curve_slope

avg_curve_slope takes advantage of the ‘lm’ function to get the average slope of a curve

Let’s say we wanted to find the average curve slopes of the waves in trial 2:

avg.slopes <- avg_curve_slope(Dataframe = GCaMP.form, Trial = 2)
print(avg.slopes[1:25])
#>  [1]   4.7952454  -6.5501308   6.0529910          NA  -0.8673469   2.6255412
#>  [7]  -2.2515783   1.5663381  -0.2961311   4.1880802  -2.1473810   4.2200754
#> [13]  -7.1355861   1.2795264  -2.5091774   4.7090888  -6.6075358   7.0368126
#> [19]  -2.4614273   6.9958416  -4.5331100   3.2473198  -4.5933131  18.3817832
#> [25] -13.3153301

between_trial_change

between_trial_change finds the difference in means during the same time range between sets of trials

For example: we want to see how neural activity during the trial changes after manipulating the experimental variable. The control trials are 1-5, and the experimental trials are 6-10.

between_trial_change(Dataframe = GCaMP.form, TrialRange1 = c(1, 5), TrialRange2 = c(6, 10), Time.period = c(0, 4))
#> [1] 42.14306

centered_AUC

centered_AUC centers input values using a user-specified function such as the mean, then uses trapezoidal integration from the ‘pracma’ package to find the area under each curve.

Let’s say we wanted a metric besides the mean to measure neural activity to compare trials 2 and 4.

### Trial 2
centered_AUC(Dataframe = GCaMP.form, Trial = 2, FUN = mean)[1:10,]
#>    Curve.Num          AUC
#> 1          1  0.006891704
#> 2          2  0.004587406
#> 3          3 -0.026744846
#> 4          4           NA
#> 5          5 -0.003146397
#> 6          6 -0.009559396
#> 7          7 -0.001872021
#> 8          8 -0.004173221
#> 9          9 -0.001198623
#> 10        10  0.002160707

### Trial 4
centered_AUC(Dataframe = GCaMP.form, Trial = 4, FUN = mean)[1:10,]
#>    Curve.Num          AUC
#> 1          1 -0.103412748
#> 2          2 -0.158943291
#> 3          3 -0.007522804
#> 4          4 -0.005996655
#> 5          5 -0.058516248
#> 6          6 -0.052588348
#> 7          7 -0.023744899
#> 8          8 -0.036970504
#> 9          9 -0.024854394
#> 10        10  0.009037108

consecutive_trial_change

consecutive_trial_change finds the difference in means between consecutive trials during the same time range.

For example: we want to know how much the change in activity is along trials 1-10.

consecutive_trial_change(Dataframe = GCaMP.form, Trials = c(1, 10), Time.period = c(0, 4)) 
#>   Trial Mean.Change
#> 1   1.5   0.4838684
#> 2   2.5  -0.6399090
#> 3   3.5   8.3537811
#> 4   4.5  -1.1332103
#> 5   5.5  45.0354846
#> 6   6.5 -12.9349766
#> 7   7.5  -2.7213407
#> 8   8.5  -0.8451046
#> 9   9.5  -3.5104551

inflect_points_df

inflect_points_df uses inflect_points to find the inflection points, then summarizes the data and returns a data frame with the following variables: Time, raw (input) values, inflection points, and the number of the respective curve.

inf.pts.df <- inflect_points_df(Dataframe = GCaMP.form, Trial = 1)
head(inf.pts.df, 6)
#>      Time Raw.Values Curve.Num Inf.Pts
#> 1 -3.9902     82.689         1       0
#> 2 -3.9803     82.656         2       2
#> 3 -3.9705     82.650         2       0
#> 4 -3.9607     82.671         2       0
#> 5 -3.9508     82.712         2       0
#> 6 -3.9410     82.757         2       0

In differentiating between inflect_points and inflect_points_df, notice that the purpose of this function fully corresponds to its name. The output and first input are both data frames.

moving_window

moving_window summarizes data within time windows of a specified length, across a single trial.

Let’s say we want to find how the average fluorescence changes within trial 5 in 1 second intervals

moving_window(Dataframe = GCaMP.form, Trial = 5, Window.length = 1, FUN = mean)
#>   Summary.vals Start.time Stop.time Window.num
#> 1     90.29526    -3.9902   -2.9902          1
#> 2     90.34658    -2.9902   -1.9902          2
#> 3     90.77381    -1.9902   -0.9902          3
#> 4     90.36636    -0.9902    0.0098          4
#> 5     87.85451     0.0098    1.0098          5
#> 6     88.40449     1.0098    2.0098          6
#> 7     88.26752     2.0098    3.0098          7
#> 8     89.60593     3.0098    4.0098          8

perc_baseline

perc_baseline calculates the percent change from a baseline period specified by the user for all trials in the input matrix or data frame. This outputs the same object, but with values transformed to percent change from baseline. This is a good way for standardizing data within trial periods; especially when the baseline period has low standard deviations that cause inflated values in transforming into z-scores.

perc_baseline(Dataframe = GCaMP.form, Baseline.times = c(-3, -1), Baseline.frame = FALSE)[1:3, 1:4]
#>      Time     Trial1      Trial2     Trial3
#> 1 -3.9902 0.09668138 -0.06632278 0.06807828
#> 2 -3.9803 0.05673422  0.01086658 0.05950546
#> 3 -3.9705 0.04947110  0.08564376 0.07297703

within_trial_change

within_trial_change finds the change in mean values between the beginning and end of the entered time ranges for a single trial.

For example: we want to know how the mean activity changes between the first two seconds before epoc (baseline) and during the trial.

within_trial_change(Dataframe = GCaMP.form, Trial = 1, Beg.period = c(-2, 0), End.period = c(0, 2))
#> [1] -0.231172

Transformations and filtering

z_score

z_score transforms input values into z-scores. This also allows for a user-specified mean and standard deviation to compare distributions.

Let’s say we wanted to see how the variability of baseline and trial compare by using a mean and standard deviation from a baseline period before epoc.

### Extract values
basevals <- GCaMP.form$Trial1[GCaMP.form$Time <= 0]
eventvals <- GCaMP.form$Trial1[GCaMP.form$Time > 0]

### Find baseline (pre-epoc) values
base.mu <- mean(basevals)
base.sigma <- sd(basevals)

### Compute centered z-scores
z.scores <- z_score(x = eventvals, mu = base.mu, sigma = base.sigma)

print(z.scores[1:30])
#>  [1]  0.704283272  0.492798494  0.237857939  0.000299695 -0.176420462
#>  [6] -0.292302532 -0.361831775 -0.422669862 -0.503787311 -0.645742847
#> [11] -0.860124677 -1.117962283 -1.381593993 -1.610461082 -1.746622515
#> [16] -1.781387136 -1.708960842 -1.532240685 -1.285991285 -1.033947782
#> [21] -0.828257108 -0.715272089 -0.723963244 -0.839845315 -1.033947782
#> [26] -1.265711923 -1.465608494 -1.581490565 -1.590181720 -1.506167219

Note that the return format is different from the base R ‘scale’ function, in that it does not create new attributes.

Plotting

plot_trials

plot_trials uses the base R graphics to create a quick plot of the trial waves.

For example: we want to visualize the first 2 and last 2 trials

my.trials <- c(1, 2, 9, 10)
plot_trials(Dataframe = GCaMP.form, Trials = my.trials)

Note: this function automatically adjusts the x- and y- axes to fit all values. It also creates a legend for the corresponding trials.