--- title: "Xcertainty Growth Curves" output: html_document: keep_md: yes description: > Here we provide an example of the Xcertainty growth_curve_sampler(). You'll learn how to use data containing individuals with replicate samples consisting of drone-based measurments and age information to fit a Von Bertalanffy-Putter growth curve in a Bayesian framework. vignette: > %\VignetteIndexEntry{Xcertainty Growth Curves} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} fontsize: 13pt author: "KC Bierlich & Josh Hewitt, CODEX" date: "2024-10-13" --- ## Introduction Before moving forward, be sure to first check out the [Xcertainty](Xcertainty.html) vignette on how to use the `independent_length_sampler()` for a proper introduction on how to use `Xcertainty`. This vignette focuses on the `growth_curve_sampler()`, which uses data containing individuals with replicate body length measurements and age information over time. This model fits a Von Bertalanffy-Putter growth curve to observations and incorporates measurement uncertainty associated with multiple drones following [Pirotta and Bierlich et al., 2024](https://doi.org/10.1111/gcb.17366). Von Bertalanffy-Putter growth curve equations involve three parameters: growth rate (*k*), asymptotic length (*A*), and the (theoretical) age when size is equal to 0 (*t0*). In this vignette, we'll use the `growth_curve_sampler()` to reproduce results derived from the data and methods described by [Pirotta and Bierlich et al., 2024](https://doi.org/10.1111/gcb.17366). We'll first load the Xcertainty package, as well as other packages we will use throughout this example. ``` ## ℹ Updating Xcertainty documentation ## ✖ Installed roxygen2 is older than the version used with this package ## ℹ You have "7.3.1" but you need "7.3.2" ## ℹ Loading Xcertainty ``` ```r library(Xcertainty) library(tidyverse) library(ggdist) library(coda) ``` ### Data We will use calibration and observation (whale) length and age data from [Pirotta and Bierlich et al., 2024](https://doi.org/10.1111/gcb.17366). This data includes imagery collected by five different UAS. Whale age's are estimated using photo-identification history and are labeled as either a 'known age' (if seen as a calf) or a 'minimum age' (based on the date of the first sighting).   #### Calibration data We'll use a calibration dataset consisting of measurements of a 1 m wooden board collected by five different UAS at various altitudes (13-62 m). ```r data('calibration') # sample size for each UAS table(calibration$uas) ``` ``` ## ## I2F I2O P3P P4P P4S ## 259 198 32 155 13 ``` ### parse_observations() We'll use `parse_observations()` to prepare the calibration and whale data. Measurements are often recorded in a wide-format dataframe, so parse_observations() converts to long-format data. ```r # parse calibration study calibration_data = parse_observations( x = calibration, subject_col = 'CO.ID', meas_col = 'Lpix', tlen_col = 'CO.L', image_col = 'image', barometer_col = 'Baro_Alt', laser_col = 'Laser_Alt', flen_col = 'Focal_Length', iwidth_col = 'Iw', swidth_col = 'Sw', uas_col = 'uas' ) ``` * This creates a list of three dataframes: + `calibration_data$pixel_counts`. + `calibration_data$training_objects`. + `calibration_data$image_info`. Next, we'll use `parse_observations()` to prepare the whale data. Note, that the `timepoint_col` is set to year since we are interested in the total body length of each individual summarized at the yearly scale. ```r data('whales') # parse field study whale_data = parse_observations( x = whales, subject_col = 'whale_ID', meas_col = 'TL.pix', image_col = 'Image', barometer_col = 'AltitudeBarometer', laser_col = 'AltitudeLaser', flen_col = 'FocalLength', iwidth_col = 'ImageWidth', swidth_col = 'SensorWidth', uas_col = 'UAS', timepoint_col = 'year' ) ```   Now the calibration and whale data are both ready. Time to set up the sampler.   ## The sampler Note that `combine_observations()` is used to combine the `parse_calibrations()` outputs, `calibration_data` and `whale_data`. We will use non-informative priors here. ```r sampler = growth_curve_sampler( data = combine_observations(calibration_data, whale_data), priors = list( image_altitude = c(min = 0.1, max = 130), altimeter_bias = rbind( data.frame(altimeter = 'Barometer', mean = 0, sd = 1e2), data.frame(altimeter = 'Laser', mean = 0, sd = 1e2) ), altimeter_variance = rbind( data.frame(altimeter = 'Barometer', shape = .01, rate = .01), data.frame(altimeter = 'Laser', shape = .01, rate = .01) ), altimeter_scaling = rbind( data.frame(altimeter = 'Barometer', mean = 0, sd = 1e1), data.frame(altimeter = 'Laser', mean = 0, sd = 1e1) ), pixel_variance = c(shape = .01, rate = .01), # priors from Agbayani et al. zero_length_age = c(mean = -5.09, sd = 0.4), growth_rate = c(mean = .18, sd = .01), # additional priors group_asymptotic_size = rbind( Female = c(mean = 12, sd = .5), Male = c(mean = 12, sd = .5) ), group_asymptotic_size_trend = rbind( Female = c(mean = 0, sd = 1), Male = c(mean = 0, sd = 1) ), subject_group_distribution = c(Female = .5, Male = .5), asymptotic_size_sd = c(min = 0, max = 10), min_calf_length = 3.5, # To model break points between 1990 and 2015 group_size_shift_start_year = c(min = 1990, max = 2015) ), subject_info = whale_info ) ``` ``` ## Joining with `by = join_by(altimeter)` ## Joining with `by = join_by(altimeter)` ## Joining with `by = join_by(altimeter)` ## Joining with `by = join_by(UAS, altimeter)` ## Defining model ## [Note] Detected use of non-constant indexes: subject_group[1], subject_group[2], subject_group[3], subject_group[4], subject_group[5], subject_group[6], ## subject_group[7], subject_group[8], subject_group[9], subject_group[10], subject_group[11], subject_group[12], subject_group[13], subject_group[14], ## subject_group[15], subject_group[16], subject_group[17], subject_group[18], subject_group[19], subject_group[20], subject_group[21], subject_group[22], ## subject_group[24], subject_group[25], subject_group[26], subject_group[27], subject_group[28], subject_group[29], subject_group[31], subject_group[32], ## subject_group[33], subject_group[34], subject_group[35], subject_group[36], subject_group[37], subject_group[38], subject_group[39], subject_group[40], ## subject_group[41], subject_group[42], subject_group[43], subject_group[44], subject_group[45], subject_group[46], subject_group[47], subject_group[48], ## subject_group[49], subject_group[50], ... For computational efficiency we recommend specifying these in 'constants'. ## Building model ## Setting data and initial values ## Running calculate on model [Note] Any error reports that follow may simply reflect missing values in model variables. ## Checking model sizes and dimensions ## Compiling [Note] This may take a minute. [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details. ``` ``` ## ===== Monitors ===== ## thin = 1: altimeter_bias, altimeter_scaling, altimeter_variance, asymptotic_size_sd, group_asymptotic_size, group_asymptotic_size_trend, group_size_shift_start_year, growth_rate, image_altitude, pixel_variance, subject_age_offset, subject_group, zero_length_age ## ===== Samplers ===== ## RW sampler (1452) ## - image_altitude[] (1302 elements) ## - zero_length_age ## - growth_rate ## - subject_age_offset[] (130 elements) ## - asymptotic_size_sd ## - group_size_shift_start_year ## - subject_asymptotic_size[] (8 elements) ## - object_length[] (8 elements) ## conjugate sampler (151) ## - altimeter_bias[] (8 elements) ## - altimeter_scaling[] (8 elements) ## - altimeter_variance[] (8 elements) ## - pixel_variance ## - group_asymptotic_size[] (2 elements) ## - group_asymptotic_size_trend[] (2 elements) ## - subject_asymptotic_size[] (122 elements) ## categorical sampler (32) ## - subject_group[] (32 elements) ## thin = 1: altimeter_bias, altimeter_scaling, altimeter_variance, asymptotic_size_sd, group_asymptotic_size, group_asymptotic_size_trend, group_size_shift_start_year, growth_rate, image_altitude, non_calf_length_age, object_length, pixel_variance, subject_age_offset, subject_asymptotic_size, subject_birth_year, subject_group, zero_length_age ``` ``` ## Compiling ## [Note] This may take a minute. ## [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details. ``` $nbsp; Now run the sampler! When exploring data outputs, 1e4 or 1e5 can be good place for exploration, as this won't take too much time to run. We recommend using 1e6 for the final analysis since 1e6 MCMC samples is often enough to get a reasonable posterior effective sample size. ```r output_growth = sampler(niter = 1e4) ``` ``` ## Sampling ``` ``` ## |-------------|-------------|-------------|-------------| ## |-------------------------------------------------------| ``` ``` ## Extracting altimeter output ``` ``` ## Extracting image output ``` ``` ## Extracting pixel error output ``` ``` ## Extracting object output ``` ``` ## Extracting growth curve model output ``` ``` ## Extracting summaries ``` ## View Sampler Outputs Our saved `output_growth` contains all the posterior samples and summaries from the sampler. Note, that there are many objects stored in `output_growth`, so it is best to view specific selections rather than viewing all of the objects stored in `output_growth` at once, as this can take a very long time to load and cause R to freeze.   You can view the posterior summaries (mean, sd, etc.) of total body length (TL) for each individual (Subject) at each Timepoint (year). Note that the `lower` and `upper` represent the 95% highest posterior density intervals (HPDI) of the posterior distribution for TL. ```r head(output_growth$summaries$objects) ``` ``` ## Subject Measurement Timepoint parameter mean sd lower upper ESS PSS ## 1 6 TL.pix 2020 length 13.46612 0.1229604 13.20953 13.70696 12.984632 5001 ## 2 6 TL.pix 2021 length 13.46671 0.1229490 13.20642 13.70382 12.981883 5001 ## 3 84 TL.pix 2017 length 12.29429 0.2611588 11.82116 12.76542 4.100977 5001 ## 4 84 TL.pix 2018 length 12.29633 0.2613655 11.82277 12.76716 4.091021 5001 ## 5 84 TL.pix 2020 length 12.29940 0.2616896 11.82643 12.77086 4.080244 5001 ## 6 87 TL.pix 2018 length 11.86318 0.2962057 11.24694 12.33210 5.230519 5001 ```   We can view the posterior summaries (mean, sd, and lower and upper 95% HPDI) of the posterior distribution. ```r output_growth$summaries$altimeters ``` ``` ## UAS altimeter parameter mean sd lower upper ESS PSS ## 1 I2F Barometer bias -0.7657916 0.42329017 -1.5548130 0.0366860 136.93684 5001 ## 2 I2F Barometer variance 3.5494599 0.28178490 3.0262148 4.1083104 423.01736 5001 ## 3 I2F Barometer scaling 0.9858973 0.01005701 0.9677114 1.0054144 116.34449 5001 ## 4 I2F Laser bias -0.2219479 0.87396877 -1.8762869 1.4532594 108.80657 5001 ## 5 I2F Laser variance 5.0199766 0.64363170 3.8207695 6.2813462 134.35557 5001 ## 6 I2F Laser scaling 0.9587486 0.02014123 0.9205360 0.9979327 101.68899 5001 ## 7 I2O Barometer bias 2.8099101 0.41705828 2.0660121 3.6801129 80.66925 5001 ## 8 I2O Barometer variance 2.8164330 0.23040459 2.3650334 3.2681158 244.04878 5001 ## 9 I2O Barometer scaling 0.9077929 0.01214954 0.8834502 0.9302366 57.57603 5001 ## 10 I2O Laser bias 1.8258200 0.52940856 0.7713228 2.8554556 115.96889 5001 ## 11 I2O Laser variance 2.4908160 0.29235744 1.9258622 3.0446303 173.42277 5001 ## 12 I2O Laser scaling 0.9205164 0.01578635 0.8916669 0.9546474 101.00099 5001 ## 13 P3P Barometer bias 9.1356589 1.37467876 6.5521453 11.8844912 71.67811 5001 ## 14 P3P Barometer variance 6.2525358 0.97228082 4.3888618 8.0744491 351.02109 5001 ## 15 P3P Barometer scaling 0.7033574 0.05354301 0.5923213 0.8001202 69.38209 5001 ## 16 P4P Barometer bias 11.1573089 0.73608444 9.8830030 12.7273361 48.20939 5001 ## 17 P4P Barometer variance 3.8350990 0.29518056 3.2626934 4.4080264 971.44248 5001 ## 18 P4P Barometer scaling 0.6317132 0.02969604 0.5682184 0.6842163 48.34513 5001 ## 19 P4P Laser bias 9.6090170 5.42354767 -0.4678810 20.1408459 57.40420 5001 ## 20 P4P Laser variance 37.0115577 7.22949843 23.9370206 51.4092210 3286.13379 5001 ## 21 P4P Laser scaling 0.7689800 0.21753218 0.3545531 1.1832203 56.06666 5001 ## 22 P4S Barometer bias 19.4632203 1.95577601 15.8607367 23.3741740 99.07682 5001 ## 23 P4S Barometer variance 9.8557192 1.76102283 6.8484457 13.4674244 3800.05856 5001 ## 24 P4S Barometer scaling 0.3129661 0.07861159 0.1599713 0.4616343 93.74616 5001 ``` We can view the posterior summaries of the pixel variance ```r output_growth$pixel_error$summary ``` ``` ## error parameter mean sd lower upper ESS PSS ## 1 pixel variance 12.29105 3.202259 6.652635 18.61714 23.98477 5001 ```   Let's view a preview of the posterior summaries of the growth curve parameters, including k, t0, A for males and females, and individual's birth year. ```r output_growth$summaries$growth_curve[1:10,] ``` ``` ## parameter mean sd lower upper ESS PSS ## 1 zero_length_age -6.49103598 0.170597788 -6.85508692 -6.22350029 1.649540 5001 ## 2 growth_rate 0.19337211 0.005476867 0.18487255 0.20150978 1.254382 5001 ## 3 Female group_asymptotic_size 12.89885200 0.206754090 12.51025734 13.27186801 8.267542 5001 ## 4 Male group_asymptotic_size 12.02597451 0.119321843 11.80133864 12.26157746 151.320452 5001 ## 5 Female group_asymptotic_size_trend -0.16827129 0.075617616 -0.31904100 -0.05961210 29.847742 5001 ## 6 Male group_asymptotic_size_trend -0.02400296 0.027155490 -0.06928842 0.03783454 85.784208 5001 ## 7 6 birth_year 1982.90534735 2.826534267 1976.76509911 1984.99880549 134.649951 5001 ## 8 84 birth_year 1986.28453959 4.911944566 1977.30208672 1988.99989043 36.250824 5001 ## 9 87 birth_year 1989.11808410 5.462169860 1977.22068263 1991.99798049 36.169864 5001 ## 10 89 birth_year 1990.00437717 2.887303540 1984.42877355 1991.98857888 128.013575 5001 ```   Let's now create a data frame with the posterior summaries for each individual's length. We will first save the total length output summaries, then save the birth year outputs, then combine both of these dataframes by `Subject`, and then join with `whale_info` to sync sex and AgeType Finally, we'll calculate the estimated age from the sample year from the estimated birth year. ```r # total length summary outputs for each subject sums_L <- output_growth$summaries$objects # birth year summary outputs for each subject birth_year <- output_growth$growth_curve$birth_year$summary %>% rename_with(~str_c("birth_year_", .), everything()) %>% separate(birth_year_parameter, c("Subject", "Parameter"), sep = " ") %>% dplyr::select(!"Parameter") # combine total length and birth year summary outputs. Then join with whale info to get sex, AgeType. Finally, calculated new estimated age. sum <- sums_L %>% left_join(birth_year, by = "Subject") %>% mutate(Year = as.integer(Timepoint)) %>% left_join(whale_info %>% mutate(Subject = as.factor(Subject)) %>% rename(sex = Group), by = c("Subject", "Year")) %>% relocate(Year, .before = Timepoint) %>% mutate(Age_est_mean = Year - birth_year_mean, Age_est_lower = Year - birth_year_upper, Age_est_upper = Year - birth_year_lower) ```   ## Plot Growth Curves We can view these results and plot the uncertainty associated with total body length (solid vertical line) and estimated age (dashed horizontal line). ```r ggplot() + theme_bw() + geom_pointrange(data = sum, aes(x = Age_est_mean, y = mean, ymin = lower, ymax = upper)) + geom_errorbarh(data = sum, aes(xmin = Age_est_lower, xmax = Age_est_upper, y = mean), lty = 2) + xlab("estimated age") + ylab("total body length (m)") ``` ![plot of chunk length&age](img/length&age-1.png)   Now we'll calculate the expected growth for male and females separately between ages 1-40. We'll first save the posterior samplers of the growth parameters and sex-based asymptotic lengths in a dataframe. We'll then calculate the expected length for male and females at each age using the Von Bertalanffy-Putter growth equation for each MCMC iteration. This will generate a distribution for the expected length at each age for male and females separately. We can then summarize these distributions to get the mean expected length and uncertainty as the 95% HPDI. ```r # create a dataframe of the output growth parameters pred_growth <- rbind(data.frame(t0 = output_growth$growth_curve$zero_length_age$samples, k = output_growth$growth_curve$growth_rate$samples, sex = "Female", A = output_growth$growth_curve$group_asymptotic_size$samples[,1]), data.frame( t0 = output_growth$growth_curve$zero_length_age$samples, k = output_growth$growth_curve$growth_rate$samples, sex = "Male", A = output_growth$growth_curve$group_asymptotic_size$samples[,2])) # write a loop to calculate the expected length for male and females between ages 1-40 for each MCMC iteration to create a distribution. Then calculate the mean and HPDIs from the distribution for each expected length. age_list <- seq(from = 1, to = 40, by = 1) sex_list <- c("Male", "Female") age_df <- data.frame() full_df <- data.frame() for (s in sex_list){ s_x = s for (i in age_list){ yr0 = i growth_filt = pred_growth %>% filter(sex == s_x) Exp.L = growth_filt$A * (1-exp(-growth_filt$k * (yr0 - growth_filt$t0))) Exp.L_mean = mean(Exp.L) Exp.L_lower = HPDinterval(mcmc(Exp.L))[1] Exp.L_upper = HPDinterval(mcmc(Exp.L))[2] temp_df <- data.frame(age = yr0, sex = s_x, Exp.L_mean, Exp.L_lower = Exp.L_lower, Exp.L_upper = Exp.L_upper) age_df <- rbind(age_df, temp_df) } full_df <- rbind(full_df, age_df) } ``` $nbsp; Now we can plot the results ```r ggplot() + theme_bw() + xlab("estimated age") + ylab("total body length (m)") + geom_ribbon(data = full_df, aes(x = age, ymin = Exp.L_lower, ymax = Exp.L_upper, fill = sex), alpha = 0.4) + geom_line(data = full_df, aes(x = age, y = Exp.L_mean, color = sex)) + scale_color_manual(values = c(Female = "lightblue3", Male = "darkorange")) + scale_fill_manual(values = c(Female = "lightblue3", Male = "darkorange")) + geom_pointrange(data = sum %>% filter(!is.na(sex)), aes(x = Age_est_mean, y = mean, ymin = lower, ymax = upper, color = sex)) + geom_errorbarh(data = sum, aes(xmin = Age_est_lower, xmax = Age_est_upper, y = mean, color = sex), lty = 3) ``` ![plot of chunk growth_curves](img/growth_curves-1.png)   Note, you can create custom samplers to use different growth equations (i.e., Gompertz, etc.). See [Xcertainty](https://github.com/MMI-CODEX/Xcertainty/tree/main) to learn more.