Calcium indicator methods such as GCaMP produce massive amounts of data; in some cases producing hundreds-of-thousands of data points for a single subject. Further, there is currently no ubiquitous way to organize or analyze this type of data. To even analyze the data, the researcher must:
Organize the data into a format that is easy to manipulate
Extract useful data points for visualization and analysis
Visualize the data
The GCalcium package gets researchers to the analysis phase more quickly by:
Providing simple, quick commands to format data for use both with, and without the GCalcium package
Extracting data points and waveform characteristics for summarizing and comparing activity within and between trials
Shortcuts for useful visualizations
This document explains how to use GCalcium’s functions to format, extract, and manipulate calcium indicator data.
The data included with the GCalcium provides a sample of a time series-like dataset exported from Matlab using the TDTFilter command with a modified version of Dr. David Root’s Matlab script. This data was collected using GCaMP6.
This dataset consists of 11 rows and 814 columns. 10 trials from a pilot study were used, with calcium activity from 4 seconds before and after stimulus onset (0s).
In order to use the rest of the package, data must be in a data frame format, with the requirements being:
The first column is a measure of time
The following columns are recorded values from trials in ascending order that correspond to the times the values were recorded at, with 1 column per trial
Fortunately, the GCalcium package includes functions that quickly reformat the data for ease with both user manipulation and use of this package. All formatting commands output this type of data frame.
Currently, the only command for formatting data is
format_data
, which takes a matrix of a time measurement in
the first column or row, and one trial per column or row in the
subsequent columns/rows. A data frame with the first row “Time” and
subsequent rows “Trial#” is outputted.
GCaMP.form <- format_data(GCaMP)
### Old table
GCaMP[1:5, 1:5]
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -3.9902 -3.9803 -3.9705 -3.9607 -3.9508
#> [2,] 82.6890 82.6560 82.6500 82.6710 82.7120
#> [3,] 82.8580 82.9220 82.9840 83.0420 83.0850
#> [4,] 81.7090 81.7020 81.7130 81.7260 81.7320
#> [5,] 89.7470 89.6370 89.5450 89.4760 89.4280
### New table
GCaMP.form[1:5, 1:5]
#> Time Trial1 Trial2 Trial3 Trial4
#> 1 -3.9902 82.689 82.858 81.709 89.747
#> 2 -3.9803 82.656 82.922 81.702 89.637
#> 3 -3.9705 82.650 82.984 81.713 89.545
#> 4 -3.9607 82.671 83.042 81.726 89.476
#> 5 -3.9508 82.712 83.085 81.732 89.428
Note: the data frame used with the GCalcium package does not have to be labeled the same as the format_data frame. This is simply for ease of calling each trial using outside functions.
To perform analyses or explore differences in activity waveforms, one
must filter and summarize the data. Knowing what wave characteristics to
compare can be confusing; as many scientists do not typically work with
this type of data. The following commands extract and/or summarize
numbers comparisons that have been used in past research. These
functions are split into 2 types: vector inputs, and matrix
(format_data
style) inputs.
find_peaks
looks for peaks based on inflection
points and a simple filter
inflect_points
finds the peaks and valleys of
waves
find_peaks
finds peaks or valleys in waveforms by using
inflection points, with filter of ‘n’ increasing/decreasing points on
both sides of each inflection point. A positive numerical input for
‘n.points’ returns the indices of peaks for the input vector, while a
negative value returns indices of valleys.
Let’s say we wanted to find all peaks of trial 1 that have 10 consecutive decreasing points on the left and right of each peak, and use these indices to subset the data.
peak.indices <- find_peaks(GCaMP.form$Trial1, n.points = 10)
## Subset using indexing
peaks.df <- data.frame(
times = GCaMP.form$Time[peak.indices],
vals = GCaMP.form$Trial1[peak.indices]
)
head(peaks.df, 4)
#> times vals
#> 1 -3.9214 82.794
#> 2 -3.7346 82.752
#> 3 -3.5675 82.700
#> 4 -3.2332 82.896
inflect_points
uses derivatives to find and label the
inflection points (peaks and valleys) of a vector, along with the points
between them.
inf.pts <- inflect_points(GCaMP.form$Trial1)
print(inf.pts[1:46])
#> [1] 0 2 0 0 0 0 -2 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0
#> [26] -2 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 -2 0 0 0
The value -2 indicates a peak, 2 indicates a valley, and 0 indicates a point on the curve between -2 and 2, or vice versa.
averaged_trials
Averages values across
trials
avg_curve_slope
gets the average slope of a curve
(half wave)
between_trial_change
finds the difference in mean
activity between trials
centered_AUC
finds the area under each
curve
consecutive_trial_change
finds the difference in
mean activity between trial n and n+1
moving_window
summarizes data within windows of
time
inflect_points_df
finds provides a summary of the
data between each inflection point (half wave)
perc_baseline
calculates the percent change from a
user-specified baseline period
within_trial_change
finds the difference in mean
activity within a trial
averaged_trials
averages values over each time point,
across the specified trials. This is especially useful when blocking
groups of trials.
Let’s say we want to plot the averaged values of trials 1-5
avg_curve_slope
takes advantage of the ‘lm’ function to
get the average slope of a curve
Let’s say we wanted to find the average curve slopes of the waves in trial 2:
avg.slopes <- avg_curve_slope(Dataframe = GCaMP.form, Trial = 2)
print(avg.slopes[1:25])
#> [1] 4.7952454 -6.5501308 6.0529910 NA -0.8673469 2.6255412
#> [7] -2.2515783 1.5663381 -0.2961311 4.1880802 -2.1473810 4.2200754
#> [13] -7.1355861 1.2795264 -2.5091774 4.7090888 -6.6075358 7.0368126
#> [19] -2.4614273 6.9958416 -4.5331100 3.2473198 -4.5933131 18.3817832
#> [25] -13.3153301
between_trial_change
finds the difference in means
during the same time range between sets of trials
For example: we want to see how neural activity during the trial changes after manipulating the experimental variable. The control trials are 1-5, and the experimental trials are 6-10.
centered_AUC
centers input values using a user-specified
function such as the mean, then uses trapezoidal integration from the
‘pracma’ package to find the area under each curve.
Let’s say we wanted a metric besides the mean to measure neural activity to compare trials 2 and 4.
### Trial 2
centered_AUC(Dataframe = GCaMP.form, Trial = 2, FUN = mean)[1:10,]
#> Curve.Num AUC
#> 1 1 0.006891704
#> 2 2 0.004587406
#> 3 3 -0.026744846
#> 4 4 NA
#> 5 5 -0.003146397
#> 6 6 -0.009559396
#> 7 7 -0.001872021
#> 8 8 -0.004173221
#> 9 9 -0.001198623
#> 10 10 0.002160707
### Trial 4
centered_AUC(Dataframe = GCaMP.form, Trial = 4, FUN = mean)[1:10,]
#> Curve.Num AUC
#> 1 1 -0.103412748
#> 2 2 -0.158943291
#> 3 3 -0.007522804
#> 4 4 -0.005996655
#> 5 5 -0.058516248
#> 6 6 -0.052588348
#> 7 7 -0.023744899
#> 8 8 -0.036970504
#> 9 9 -0.024854394
#> 10 10 0.009037108
consecutive_trial_change
finds the difference in means
between consecutive trials during the same time range.
For example: we want to know how much the change in activity is along trials 1-10.
inflect_points_df
uses inflect_points
to
find the inflection points, then summarizes the data and returns a data
frame with the following variables: Time, raw (input) values, inflection
points, and the number of the respective curve.
inf.pts.df <- inflect_points_df(Dataframe = GCaMP.form, Trial = 1)
head(inf.pts.df, 6)
#> Time Raw.Values Curve.Num Inf.Pts
#> 1 -3.9902 82.689 1 0
#> 2 -3.9803 82.656 2 2
#> 3 -3.9705 82.650 2 0
#> 4 -3.9607 82.671 2 0
#> 5 -3.9508 82.712 2 0
#> 6 -3.9410 82.757 2 0
In differentiating between inflect_points
and
inflect_points_df
, notice that the purpose of this function
fully corresponds to its name. The output and first input are both data
frames.
moving_window
summarizes data within time windows of a
specified length, across a single trial.
Let’s say we want to find how the average fluorescence changes within trial 5 in 1 second intervals
moving_window(Dataframe = GCaMP.form, Trial = 5, Window.length = 1, FUN = mean)
#> Summary.vals Start.time Stop.time Window.num
#> 1 90.29526 -3.9902 -2.9902 1
#> 2 90.34658 -2.9902 -1.9902 2
#> 3 90.77381 -1.9902 -0.9902 3
#> 4 90.36636 -0.9902 0.0098 4
#> 5 87.85451 0.0098 1.0098 5
#> 6 88.40449 1.0098 2.0098 6
#> 7 88.26752 2.0098 3.0098 7
#> 8 89.60593 3.0098 4.0098 8
perc_baseline
calculates the percent change from a
baseline period specified by the user for all trials in the input matrix
or data frame. This outputs the same object, but with values transformed
to percent change from baseline. This is a good way for standardizing
data within trial periods; especially when the baseline period has low
standard deviations that cause inflated values in transforming into
z-scores.
within_trial_change
finds the change in mean values
between the beginning and end of the entered time ranges for a single
trial.
For example: we want to know how the mean activity changes between the first two seconds before epoc (baseline) and during the trial.
z_score
transforms input values into z-scores. This also
allows for a user-specified mean and standard deviation to compare
distributions.
Let’s say we wanted to see how the variability of baseline and trial compare by using a mean and standard deviation from a baseline period before epoc.
### Extract values
basevals <- GCaMP.form$Trial1[GCaMP.form$Time <= 0]
eventvals <- GCaMP.form$Trial1[GCaMP.form$Time > 0]
### Find baseline (pre-epoc) values
base.mu <- mean(basevals)
base.sigma <- sd(basevals)
### Compute centered z-scores
z.scores <- z_score(x = eventvals, mu = base.mu, sigma = base.sigma)
print(z.scores[1:30])
#> [1] 0.704283272 0.492798494 0.237857939 0.000299695 -0.176420462
#> [6] -0.292302532 -0.361831775 -0.422669862 -0.503787311 -0.645742847
#> [11] -0.860124677 -1.117962283 -1.381593993 -1.610461082 -1.746622515
#> [16] -1.781387136 -1.708960842 -1.532240685 -1.285991285 -1.033947782
#> [21] -0.828257108 -0.715272089 -0.723963244 -0.839845315 -1.033947782
#> [26] -1.265711923 -1.465608494 -1.581490565 -1.590181720 -1.506167219
Note that the return format is different from the base R ‘scale’ function, in that it does not create new attributes.
plot_trials
uses the base R graphics to create a quick
plot of the trial waves.
For example: we want to visualize the first 2 and last 2 trials
Note: this function automatically adjusts the x- and y- axes to fit all values. It also creates a legend for the corresponding trials.