Title: | Methods to Create Hierarchical Age Length Keys for Age Assignment |
---|---|
Description: | Provides methods for implementing hierarchical age length keys to estimate fish ages from lengths using data borrowing. Users can create hierarchical age length keys and use them to assign ages given length. |
Authors: | Paul Frater [aut, cre] |
Maintainer: | Paul Frater <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.5 |
Built: | 2024-10-29 06:36:53 UTC |
Source: | CRAN |
These functions perform two tasks: they lump all ages greater than the plus group into that age, and they filter the data to only those ages greater than or equal to the minimum age. adjust_plus_min_ages works on a vector, whereas adjust_plus_min_ages_df works on a data.frame.
adjust_plus_min_ages_df(data, minage = NULL, pls_grp = NULL)
adjust_plus_min_ages(age_vec, minage = NULL, pls_grp = NULL)
data |
Data with age as a column, or a numeric vector of ages |
minage |
Numeric. The minimum age; all younger ages are excluded |
pls_grp |
Numeric. The plus group; all ages older will be lumped into this group |
age_vec |
A vector of ages |
A data.frame similar to data, but with ages less than minage excluded and ages greater than or equal to the plus group (pls_grp) aggregated into that age
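The two steps described above can be sketched in base R. This is only an illustration of the idea, not the package's source; the function name lump_and_filter_ages is hypothetical, and dropping NA ages during filtering is an assumption of the sketch.

```r
# Hypothetical stand-in for the vector version; not the package's code.
lump_and_filter_ages <- function(age_vec, minage = NULL, pls_grp = NULL) {
  if (!is.null(pls_grp)) {
    # Lump every age older than the plus group into the plus group
    age_vec[!is.na(age_vec) & age_vec > pls_grp] <- pls_grp
  }
  if (!is.null(minage)) {
    # Keep only ages at or above the minimum age (NA ages are dropped here,
    # which is an assumption of this sketch)
    age_vec <- age_vec[!is.na(age_vec) & age_vec >= minage]
  }
  age_vec
}

ages <- c(0, 1, 2, 3, 5, 8, 12)
adjusted <- lump_and_filter_ages(ages, minage = 1, pls_grp = 5)
print(adjusted)
```

In this sketch, ages 8 and 12 become 5 and the age-0 record is removed.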
In order for the machine learning models to properly predict ages, the known ages should be converted to an ordered factor during model fitting. This will ensure that the predict.* functions return age values that actually make sense.
ages_as_ordered_factor(data, age_col = "age")
ages_as_integer(data, age_col = "est.age")
data |
A data.frame with a column corresponding to |
age_col |
Character. The name of the column that contains ages |
A data.frame with the values in age_col converted to an ordered factor
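A minimal base R sketch of the round trip, assuming a toy data.frame with an age column (the data and the age_int column name are made up for illustration). It also shows why converting back to integers should go through the factor labels rather than the internal level codes.

```r
# Toy length-at-age records (hypothetical values)
known <- data.frame(
  length = c(10, 14, 19, 25, 31),
  age = c(1, 2, 3, 5, 10)
)

# Convert ages to an ordered factor so models treat them as ordinal classes
known$age <- factor(known$age, levels = sort(unique(known$age)), ordered = TRUE)
print(is.ordered(known$age))

# Converting back: as.integer() on a factor returns level codes (1, 2, 3, ...),
# so go through as.character() to recover the original age values
codes <- as.integer(known$age)
known$age_int <- as.integer(as.character(known$age))
print(known)
```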
Assign ages to non-aged data based on a fitted age model
assign_ages(newdata, object, ...)
newdata |
A vector or data.frame with size/length measurements |
object |
An object of class "alk" or "halk_fit" as produced by |
... |
Additional parameters to pass to the S3 object methods |
A data.frame the same as newdata, but with ages assigned based on the model provided in object
spp_alk <- make_halk(spp_data, levels = "spp")
spp_est_ages <- assign_ages(spp_data, spp_alk)
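To make the idea of assigning ages from a key concrete, here is a base R sketch that does not use the package. The key values, the function name assign_age_sketch, and the rule of drawing one random age per fish weighted by its length bin's proportions are all assumptions for illustration; the package's actual allocation rule may differ.

```r
# Hypothetical age-length key: rows are 1-unit length bins, columns are ages,
# and each row holds the proportion of aged fish in that bin at each age.
alk <- matrix(
  c(1.0, 0.0, 0.0,
    0.3, 0.7, 0.0,
    0.0, 0.5, 0.5),
  nrow = 3, byrow = TRUE,
  dimnames = list(length = c("10", "11", "12"), age = c("1", "2", "3"))
)

# Assumed rule: draw one age per fish, weighted by the row for its length bin
assign_age_sketch <- function(lengths, alk) {
  ages <- as.integer(colnames(alk))
  bins <- as.character(floor(lengths))
  vapply(bins, function(b) {
    if (!b %in% rownames(alk)) return(NA_integer_)  # no key for this bin
    ages[sample.int(length(ages), size = 1, prob = alk[b, ])]
  }, integer(1), USE.NAMES = FALSE)
}

set.seed(42)
new_lengths <- c(10.4, 10.9, 11.2, 12.7, 20.1)
est_age <- assign_age_sketch(new_lengths, alk)
print(data.frame(length = new_lengths, est.age = est_age))
```

Lengths that fall outside the key (20.1 here) receive NA in this sketch.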
This is just a helper function to assign the needed attributes and classes to a data.frame that is produced by either make_alk or make_halk.
assign_alk_attributes( data, size_col = "length", age_col = "age", autobin = TRUE, size_bin = 1, min_age = NULL, plus_group = NULL, alk_n = NULL, classes = "alk", dnorm_params = NULL, levels = NULL )
data |
A data.frame |
size_col |
Character. Name of the column representing sizes |
age_col |
Character. Name of the column representing ages |
autobin |
Logical to set the attribute of autobin |
size_bin |
Numeric. The width of the size bins |
min_age |
Numeric. The minimum age that was included in the alk |
plus_group |
Numeric. The age that represents the plus group |
alk_n |
Numeric. The number of samples that went into creating the alk |
classes |
Character. The class that should get prepended to the data.frame class(es) |
dnorm_params |
The value of parameters that went into creating the normal distributions on the age groups |
levels |
Character vector of the levels used. This creates the "levels" attribute if present |
A data.frame with associated attributes assigned
A vector of NA will be returned that is the length of x
assign_na_age(x)
x |
Any vector of any length |
A vector the same length as x containing only NA values
This will take a vector of numeric values and bin them according to the value specified in binwidth
bin_lengths(x, binwidth, include_upper = FALSE, ...)
x |
Numeric vector of values |
binwidth |
Numeric vector specifying how wide the length bins should be |
include_upper |
Logical. Append the upper value of the bin and return the length range as a character string (TRUE), or return the lower value as numeric (FALSE, default) |
... |
Additional arguments passed onto |
A vector of values the same length as x, but binned to the values according to binwidth
bin_lengths(length_data$length, binwidth = 2)
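The binning step can be approximated in base R as below. This is a sketch under assumptions: bin_lower is a hypothetical helper, bins are taken to start at multiples of binwidth, and the "lower-upper" label format is illustrative rather than the package's exact output.

```r
# Hypothetical helper: lower edge of the bin that each value falls in,
# assuming bins start at multiples of binwidth
bin_lower <- function(x, binwidth) {
  floor(x / binwidth) * binwidth
}

lens <- c(10.2, 11.9, 12.0, 13.7, 15.1)

# Numeric lower bounds (comparable to include_upper = FALSE)
lower <- bin_lower(lens, binwidth = 2)
print(lower)

# Character ranges (comparable in spirit to include_upper = TRUE);
# the label format is an assumption
ranges <- paste0(lower, "-", lower + 2)
print(ranges)
```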
These functions will calculate MSE and RMSE for estimated ages produced by assign_ages. Output is specific to each level used by the age-length key to assign ages.
calc_mse(data, age_col = "age")
calc_rmse(data, age_col = "age")
data |
A data.frame as created by |
age_col |
Character. Name of the age column in |
Numeric value for estimated ages with no levels or a data.frame with a MSE or RMSE value for each level used to fit ages
wae_data <- spp_data[spp_data$spp == "walleye", ]
alk <- make_alk(wae_data)
wae_est_age <- assign_ages(wae_data, alk)
calc_mse(wae_est_age)
calc_rmse(wae_est_age)
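For reference, the quantities being reported can be computed by hand in base R. The sketch below assumes the estimated ages sit in a column named est.age (the default used by ages_as_integer) next to the known age column; the toy values and the per-species grouping are made up for illustration.

```r
# Toy data: known ages, estimated ages, and a grouping level (hypothetical)
aged <- data.frame(
  spp = c("a", "a", "b", "b"),
  age = c(2, 3, 4, 5),
  est.age = c(2, 4, 4, 3)
)

# Squared error between estimated and known age for each fish
sq_err <- (aged$est.age - aged$age)^2

# Overall MSE and RMSE
mse <- mean(sq_err)
rmse <- sqrt(mse)
print(c(mse = mse, rmse = rmse))

# MSE for each level, mirroring the idea of level-specific output
mse_by_spp <- tapply(sq_err, aged$spp, mean)
print(mse_by_spp)
```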
This function is the engine for calc_mse and calc_rmse. It exists only so that the root argument is hidden from users of the main calc_mse function.
calc_mse_(data, age_col = "age", root = FALSE)
data |
A data.frame as created by |
age_col |
Character. Name of the age column in |
root |
Logical. Compute MSE (FALSE, default) or RMSE (TRUE) |
Using these functions you can compute either a Kolmogorov-Smirnov (KS) statistic or a Chi-squared test statistic to compare estimated ages to actual ages. See details for how each test works and what is reported.
calc_ks_score(
  data,
  summary_fun = mean,
  age_col = "age",
  suppress_warnings = TRUE,
  return_val = "statistic",
  ...
)
calc_chi_score(
  data,
  age_col = "age",
  suppress_warnings = TRUE,
  return_val = "statistic",
  ...
)
data |
A data.frame containing estimated ages as returned by |
summary_fun |
Function used to compute summary statistics for |
age_col |
Character string specifying the name of the age column |
suppress_warnings |
Logical. Should any warnings from the function call to |
return_val |
Character. The name of the object to return from the given test |
... |
Additional arguments to pass to |
The KS test compares length distributions for each age class from known ages against those of estimated ages computed by the assign_ages function. The output is a summary value of the test statistics as specified by summary_fun.
The calc_chi_score function performs a Chi-square test (using the chisq.test function) on the number of estimated and actual ages for each age group.
A numeric value for each level that was used in the model to assign ages
halk <- make_halk(spp_data, levels = c("spp"))
newdat <- laa_data
newdat$spp <- "bluegill"
pred_ages <- assign_ages(newdat, halk)
calc_ks_score(pred_ages)
calc_chi_score(pred_ages)
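The two comparisons described in Details can be sketched with base R's ks.test and chisq.test. Everything here is illustrative: the data are simulated, the estimated ages come from a crude length-based rule rather than an age-length key, and arranging the counts as a 2 x K table for chisq.test is an assumption about how the test might be set up, not a statement of the package's internals.

```r
set.seed(1)
n <- 200
age <- sample(1:4, n, replace = TRUE)      # known ages
len <- rnorm(n, mean = 10 * age, sd = 3)   # simulated lengths

# Crude stand-in for estimated ages (assumption; not an age-length key)
est_age <- pmin(pmax(round(len / 10), 1), 4)

# KS statistic per age class: lengths of fish known to be age a versus
# lengths of fish estimated to be age a
ages <- sort(unique(age))
ks_by_age <- sapply(ages, function(a) {
  unname(suppressWarnings(ks.test(len[age == a], len[est_age == a])$statistic))
})
ks_score <- mean(ks_by_age)                # summary_fun = mean
print(ks_score)

# Chi-square statistic on counts of actual versus estimated ages
counts <- rbind(
  actual = tabulate(age, nbins = 4),
  estimated = tabulate(est_age, nbins = 4)
)
chi_score <- unname(suppressWarnings(chisq.test(counts)$statistic))
print(chi_score)
```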
These are simple helper functions used within other functions. They check that ages and lengths are present in the data and stop the function call if they are missing.
check_age_data(data, age_col)
check_length_data(data, size_col)
data |
A data.frame |
age_col |
Character. The column name for the age column in |
size_col |
Character. The column name for the size column in |
NULL. An error is raised if age or length data are missing
This is a non-exported function that checks whether the specified model type is available and returns a standardized version of the model name. This standardized version then feeds into an S3 method for the given model.
check_model_type(model)
model |
A character string naming the model |
A standardized version of the model name, or an error if model doesn't exist yet
This is a method for comparing how "close" or "accurate" one curve is to another (reference) curve. The method works by dividing the area between the curves by the area under the reference curve. See Details for more information
integral_quotient(
  ref_curve_params,
  comp_curve_params,
  min_x,
  max_x,
  curve_fun = function(x, linf, k, t0) {
    out <- linf * (1 - exp(-k * (x - t0)))
    return(out)
  }
)
ref_curve_params |
A list of named parameters for the reference curve (i.e. the standard that is being compared to) |
comp_curve_params |
A list of named parameters for the curve that is being compared |
min_x |
The minimum value across which to integrate |
max_x |
The maximum value across which to integrate |
curve_fun |
The function that is being compared. Defaults to an anonymous function that is the von Bertalanffy growth function. |
The integral quotient method provides a basis for comparison between two curves by dividing the area between the curves by the area under the reference curve (i.e. the quotient of integrals)
A value of the area between curves divided by the area under the reference curve
ref_curve_params <- list(linf = 60, k = 0.25, t0 = -0.5)
comp_curve_params <- list(linf = 62, k = 0.25, t0 = -0.4)
comp_curve2_params <- list(linf = 65, k = 0.25, t0 = -1)
comp_curve_iq <- integral_quotient(ref_curve_params, comp_curve_params, 0, 10)
comp_curve2_iq <- integral_quotient(ref_curve_params, comp_curve2_params, 0, 10)
vbgf <- function(x, linf, k, t0) {
  linf * (1 - exp(-k * (x - t0)))
}
curve(
  vbgf(x, ref_curve_params$linf, ref_curve_params$k, ref_curve_params$t0),
  from = 0, to = 10, ylim = c(0, 60), xlab = "Age", ylab = "Length"
)
curve(
  vbgf(x, comp_curve_params$linf, comp_curve_params$k, comp_curve_params$t0),
  add = TRUE, col = "blue"
)
curve(
  vbgf(x, comp_curve2_params$linf, comp_curve2_params$k, comp_curve2_params$t0),
  add = TRUE, col = "red"
)
text(9, 40, labels = paste0(comp_curve_iq, "%"), col = "blue")
text(9, 43, labels = paste0(comp_curve2_iq, "%"), col = "red")
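The quotient of integrals can be written as

IQ = ∫ |f_comp(x) − f_ref(x)| dx / ∫ f_ref(x) dx, with both integrals taken from min_x to max_x.

A base R sketch of that calculation using stats::integrate follows. The helper name iq_sketch is hypothetical, and both the use of the absolute difference and returning a plain ratio (rather than a percentage) are assumptions; the package function may scale or sign the result differently.

```r
# von Bertalanffy growth function, as in the default curve_fun
vbgf <- function(x, linf, k, t0) {
  linf * (1 - exp(-k * (x - t0)))
}

# Hypothetical sketch: area between curves divided by area under the reference
iq_sketch <- function(ref_params, comp_params, min_x, max_x, curve_fun = vbgf) {
  ref_fun <- function(x) do.call(curve_fun, c(list(x = x), ref_params))
  comp_fun <- function(x) do.call(curve_fun, c(list(x = x), comp_params))
  area_between <- integrate(function(x) abs(comp_fun(x) - ref_fun(x)),
                            lower = min_x, upper = max_x)$value
  area_ref <- integrate(ref_fun, lower = min_x, upper = max_x)$value
  area_between / area_ref
}

ref <- list(linf = 60, k = 0.25, t0 = -0.5)
comp <- list(linf = 62, k = 0.25, t0 = -0.4)

iq <- iq_sketch(ref, comp, 0, 10)
iq_same <- iq_sketch(ref, ref, 0, 10)  # identical curves give zero
print(c(iq = iq, iq_same = iq_same))
```

A smaller value indicates that the compared curve lies closer to the reference over the chosen range.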
Simple age-structured population data with age and length records for each individual. laa_data represents a well-sampled age-length dataset, whereas laa_data_low_n is one with few total samples, laa_data_low_age_n is one with few samples in some ages, and laa_data_few_ages is a dataset with few age groups sampled. Species-specific datasets are similar, but with the prefix laa_ replaced by spp_. These datasets contain species-specific length-at-age data.
laa_data
laa_data_low_n
laa_data_low_age_n
laa_data_few_ages
spp_data
spp_data_low_n
spp_data_low_age_n
spp_data_few_ages
## 'laa_data' A data.frame with 244 rows and 2 columns:
Species, only applicable for spp_data_* data.frames
Age of individual
Length of individual (arbitrary units)
## 'laa_data_low_n' A data.frame with 27 rows and 2 columns:
## 'laa_data_low_age_n' A data.frame with 74 rows and 2 columns:
## 'laa_data_few_ages' A data.frame with 49 rows and 2 columns:
## 'spp_data' A data.frame with 1022 rows and 3 columns:
## 'spp_data_low_n' A data.frame with 87 rows and 3 columns:
## 'spp_data_low_age_n' A data.frame with 160 rows and 3 columns:
## 'spp_data_few_ages' A data.frame with 261 rows and 3 columns:
Simple vector and data.frame containing length measurements. These are used in examples for functions that assign ages.
length_data
spp_length_data
## 'length_data' A data.frame with 244 rows and 1 column:
Species, only in spp_length_data
Length of individual (arbitrary units)
## 'spp_length_data' A data.frame with 1022 rows and 2 columns:
Make an age-length key out of length-at-age data
make_alk( laa_data, sizecol = "length", autobin = TRUE, binwidth = 1, agecol = "age", min_age = NULL, plus_group = NULL, numcol = NULL, min_age_sample_size = 5, min_total_sample_size = min_age_sample_size * min_age_groups, min_age_groups = 5, warnings = TRUE )
laa_data |
A data.frame with length-at-age data |
sizecol |
Character string naming the column that holds size data |
autobin |
Logical. Should the function automatically assign length bins (default is TRUE) |
binwidth |
Numeric. If |
agecol |
Character string naming the column that holds age data |
min_age |
Numeric. All ages less than this value will not be used in ALK |
plus_group |
Numeric value of the oldest age to include in the ALK. All older individuals will be included in this plus group |
numcol |
Character string naming the column that holds numbers data |
min_age_sample_size |
Only applicable to alk models. The minimum number of samples that must be in each age group in order to create an alk |
min_total_sample_size |
Only applicable to alk models. The minimum number of samples that must be in data in order to create an alk |
min_age_groups |
Only applicable to alk models. The minimum number of age groups that must be in data in order to create an alk |
warnings |
Logical. Display warnings (TRUE, default) |
A data.frame containing the proportions of records for each size that are at each age.
make_alk(laa_data)
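The returned proportions can be understood through a small base R calculation. This sketch uses made-up records and 1-unit length bins and builds the key as row proportions of a length-by-age count table; it illustrates the concept only and omits the sample-size checks and attributes that make_alk handles.

```r
# Toy length-at-age records (hypothetical values)
laa <- data.frame(
  length = c(10.2, 10.8, 11.1, 11.5, 11.9, 12.3, 12.4),
  age = c(1, 1, 1, 2, 2, 2, 3)
)

# Bin lengths to 1-unit bins (comparable to binwidth = 1)
laa$bin <- floor(laa$length)

# Count fish in each length bin by age, then convert each row to proportions
counts <- table(length = laa$bin, age = laa$age)
alk <- prop.table(counts, margin = 1)
print(round(alk, 2))
```

Each row sums to 1, so a row can be read as the age composition of fish in that length bin.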
This function creates a hierarchically nested age-length key that can be used to estimate age of an organism based on proportion of sampled organisms in each age group.
make_halk(data, levels = NULL, age_col = "age", size_col = "length", ...)
data |
A data.frame with age and size samples |
levels |
Character vector specifying the levels for HALK creation |
age_col |
Optional. String of the column name in |
size_col |
Optional. String of the column name in |
... |
Additional arguments passed to |
A tibble with columns for each level and a column called alk that houses the age-length key for that particular level
make_halk(spp_data, levels = "spp")
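The data-borrowing idea behind a hierarchical key can be illustrated with a small base R sketch: use the most specific level that has enough aged samples, and otherwise fall back to a broader level. The function pick_level_data, the waterbody column, and the min_n threshold of 5 are hypothetical and chosen for illustration; they are not the package's implementation.

```r
# Toy data: one species sampled well in lake_a and poorly in lake_b
laa <- data.frame(
  spp = "walleye",
  waterbody = c(rep("lake_a", 6), rep("lake_b", 2)),
  length = c(20, 25, 31, 36, 40, 44, 22, 38),
  age = c(1, 2, 3, 4, 5, 6, 1, 4)
)

# Hypothetical rule: use waterbody-level data when it has at least min_n
# records, otherwise borrow from all records of the same species
pick_level_data <- function(data, spp, waterbody, min_n = 5) {
  at_wb <- data[data$spp == spp & data$waterbody == waterbody, ]
  if (nrow(at_wb) >= min_n) {
    return(list(level = "waterbody", data = at_wb))
  }
  list(level = "spp", data = data[data$spp == spp, ])
}

a <- pick_level_data(laa, "walleye", "lake_a")
b <- pick_level_data(laa, "walleye", "lake_b")
print(c(lake_a = a$level, lake_b = b$level))
```

In this sketch lake_a keeps its own records, while lake_b borrows from the species-wide data.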
These are helper shortcut functions to determine if data meet the minimum desired number of age groups and/or sample sizes.
min_count_laa_data(
  data,
  sub_levels = NULL,
  min_age_sample_size = NULL,
  min_total_sample_size = NULL,
  min_age_groups = NULL
)
min_age_groups(data, sub_levels = NULL, min_age_grps)
data |
Data.frame with length-at-age data |
sub_levels |
The levels at which to check |
min_age_sample_size |
Only applicable to alk models. The minimum number of samples that must be in each age group in order to create an alk |
min_total_sample_size |
Only applicable to alk models. The minimum number of samples that must be in data in order to create an alk |
min_age_groups |
Only applicable to alk models. The minimum number of age groups that must be in data in order to create an alk |
min_age_grps |
The minimum number of age groups that must be present in data to create an ALK |
A data.frame just like data, but with samples excluded that don't meet the required number of samples in min_sample_size
Simple helper function to rename the size and age columns to length and age
rename_laa_cols( data, size_col = "length", age_col = "age", num_col = NULL, goback = FALSE )
data |
Any data.frame with some columns representing age and size |
size_col |
Character. The name of the column containing sizes |
age_col |
Character. The name of the column containing ages |
num_col |
Character. The name of the column containing number of individuals |
goback |
Logical. Reverse names once they've already been renamed |
A data.frame the same as data, but with names changed
These helper functions just check to see if a species column exists in the data (designated as 'spp' or 'species'). If one of those columns exists, but the column name is not in the levels argument it will get added to levels.
is_spp_in_levels(levels)
is_spp_in_data(data)
spp_level(levels)
rm_spp_level(levels)
add_spp_level(data, levels)
levels |
The levels argument passed from |
data |
A data.frame with length-at-age data |
A character vector of levels possibly with 'spp' or 'species' added
Simple age-structured population with age and/or length records, but expanded across multiple counties and waterbodies for tests and examples in make_halk used with levels.
wb_spp_laa_data
wb_spp_length_data
## 'wb_spp_laa_data' A data.frame with 36,849 records and 5 columns
Species
Arbitrary example county name
Arbitrary example waterbody name nested within county
Age of individual, only in wb_spp_laa_data
Length of individual (arbitrary units)
An object of class tbl_df (inherits from tbl, data.frame) with 9182 rows and 4 columns.