Contributing to unmarked: guide to adding a new model to unmarked

Follow the steps in this guide to add a new model to the unmarked package. Note that the order can be adjusted based on your preferences. For instance, you can start with the likelihood function, as it forms the core of adding a model to unmarked, and then build the rest of the code around it. In this document, the steps are ordered as they would occur in an unmarked analysis workflow.

This guide uses the recently developed gdistremoval function for examples, mainly because most of the relevant code is in a single file instead of spread around. It also uses occu functions to show simpler examples that may be easier to understand.

Prerequisites and advice

  • Before you start coding, you should use git to version your code:

    • Fork the unmarked repository on GitHub
    • Make a new branch with your new function as the name
    • Add the new code
  • unmarked uses S4 for objects and methods - if you aren’t familiar with S4 you may want to consult a book or tutorial such as this one.

  • If you are unfamiliar with building a package in R, here are two tutorials that may help you: Karl Broman’s guide to building packages and the official R-project guide. If you are using RStudio, their documentation on writing packages could also be useful, especially to understand how to use the Build pane.

  • To avoid complex debugging at the end, I suggest that you regularly install and load the package as you add new code. You can easily do so in RStudio in the Build pane, by clicking on “Install > Clean and install”. This will also allow you to test your functions cleanly.

  • Write tests and documentation as you add new functions, classes, and methods. This eases the task, avoiding the need to write everything at the end.

Organise the input data: design the unmarkedFrame object

Most model types in unmarked have their own unmarkedFrame, a specialized kind of data frame. This is an S4 object which contains, at a minimum, the response (y). It may also include site covariates, observation covariates, primary period covariates, and other info related to study design (such as distance breaks).

In some cases you may be able to use an existing unmarkedFrame subclass. You can list all the existing unmarkedFrame subclasses by running the following code:

showClass("unmarkedFrame")
## Class "unmarkedFrame" [package "unmarked"]
## 
## Slots:
##                                                                               
## Name:                  y           obsCovs          siteCovs           mapInfo
## Class:            matrix optionalDataFrame optionalDataFrame   optionalMapInfo
##                         
## Name:             obsToY
## Class:    optionalMatrix
## 
## Extends: "unmarkedFrameOrNULL"
## 
## Known Subclasses: 
## Class "unmarkedMultFrame", directly
## Class "unmarkedFrameDS", directly
## Class "unmarkedFrameOccu", directly
## Class "unmarkedFrameOccuFP", directly
## Class "unmarkedFrameOccuMulti", directly
## Class "unmarkedFramePCount", directly
## Class "unmarkedFrameMPois", directly
## Class "unmarkedFrameOccuCOP", directly
## Class "unmarkedFrameOccuMS", by class "unmarkedMultFrame", distance 2
## Class "unmarkedFrameOccuTTD", by class "unmarkedMultFrame", distance 2
## Class "unmarkedFrameG3", by class "unmarkedMultFrame", distance 2
## Class "unmarkedFramePCO", by class "unmarkedMultFrame", distance 2
## Class "unmarkedFrameGDR", by class "unmarkedMultFrame", distance 2
## Class "unmarkedFrameGMM", by class "unmarkedMultFrame", distance 3
## Class "unmarkedFrameGDS", by class "unmarkedMultFrame", distance 3
## Class "unmarkedFrameGPC", by class "unmarkedMultFrame", distance 3
## Class "unmarkedFrameGOccu", by class "unmarkedMultFrame", distance 3
## Class "unmarkedFrameMMO", by class "unmarkedMultFrame", distance 4
## Class "unmarkedFrameDSO", by class "unmarkedMultFrame", distance 4

You can find more information about each unmarkedFrame subclass in the documentation of the function that creates objects of that subclass, for example with ?unmarkedFrameGDR, or on the package’s website.

Define the unmarkedFrame subclass for this model

Write the function that creates the unmarkedFrame object
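
The two steps above (defining the subclass, then writing its constructor) can be sketched with base R's methods package alone. This is only an illustration under stated assumptions: toyFrame is a reduced stand-in for the real unmarkedFrame parent class, and toyFrameMyModel and its L slot are hypothetical names. In the package itself you would inherit from unmarkedFrame (or one of its subclasses) instead.

```r
library(methods)

# Stand-in for unmarked's parent class, reduced to three slots so that the
# sketch runs without the package (assumption: the real class has more slots).
setClass("toyFrame",
         slots = c(y = "matrix", siteCovs = "data.frame", obsCovs = "data.frame"))

# Hypothetical subclass adding one study-design slot, L, with a validity check.
setClass("toyFrameMyModel",
         contains = "toyFrame",
         slots = c(L = "matrix"),
         validity = function(object) {
           if (!identical(dim(object@L), dim(object@y)))
             return("L must have the same dimensions as y")
           TRUE
         })

# Hypothetical constructor: fills in defaults, checks the inputs, builds the object.
toyFrameMyModel <- function(y, L = NULL, siteCovs = NULL, obsCovs = NULL) {
  y <- as.matrix(y)
  if (is.null(L)) L <- matrix(1, nrow(y), ncol(y))
  if (is.null(siteCovs)) siteCovs <- data.frame(row.names = seq_len(nrow(y)))
  if (is.null(obsCovs)) obsCovs <- data.frame(row.names = seq_len(length(y)))
  stopifnot(nrow(siteCovs) == nrow(y), nrow(obsCovs) == length(y))
  new("toyFrameMyModel", y = y, L = L, siteCovs = siteCovs, obsCovs = obsCovs)
}

# Three sites, two occasions
umf <- toyFrameMyModel(y = matrix(c(0, 1, 1, 0, 0, 0), nrow = 3))
is(umf, "toyFrame")
```

Putting the checks in the validity function rather than only in the constructor means they also run when an object is built directly with new().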

Write the S4 methods associated with the unmarkedFrame object

Note that you may not have to write all of the S4 methods below. Most of them will work without being rewritten, but you should test them to verify this. All the methods associated with unmarkedFrame objects are listed in the unmarkedFrame class documentation accessible with help("unmarkedFrame-class").

Specific methods

Here are methods you probably will have to rewrite.

Generic methods

Here are methods that you should test but probably will not have to rewrite. They are defined in the unmarkedFrame.R file, for the unmarkedFrame parent class.

  • coordinates
  • getY
  • numSites
  • numY
  • obsCovs
  • obsCovs<-
  • obsNum
  • obsToY
  • obsToY<-
  • plot
  • projection
  • show
  • siteCovs
  • siteCovs<-
  • summary

Methods to access new attributes

You may also need to add specific methods to allow users to access an attribute you added to your unmarkedFrame subclass.

  • For example, getL for unmarkedFrameOccuCOP
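
An accessor of this kind is usually a generic plus one method for the new class. The sketch below uses only the methods package. The class toyFrameEffort, its effort slot, and the generic getEffort are hypothetical names chosen for illustration, not part of unmarked.

```r
library(methods)

# Hypothetical class with one extra slot, 'effort'
setClass("toyFrameEffort", slots = c(y = "matrix", effort = "matrix"))

# Declare a generic, then a method for the new class
setGeneric("getEffort", function(object) standardGeneric("getEffort"))
setMethod("getEffort", "toyFrameEffort", function(object) object@effort)

umf <- new("toyFrameEffort", y = matrix(0, 2, 3), effort = matrix(1.5, 2, 3))
getEffort(umf)
```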

Fitting the model

The fitting function can be broken down into three main steps: reading the unmarkedFrame object, maximising the likelihood, and formatting the outputs.

Inputs of the fitting function

  • R formulas for each submodel (e.g. state, detection). We have found over time it is better to have separate arguments per formula (e.g. the way gdistremoval does it) instead of a combined formula (e.g. the way occu does it).
  • data for the unmarkedFrame
  • Parameters for optim: optimisation algorithm (method), initial parameters, and other parameters (...)
  • engine parameter to call one of the implemented likelihood functions
  • Other model-specific settings, such as key functions or parameterizations to use

Read the unmarkedFrame object: write the getDesign method

Most models have their own getDesign function, an S4 method. The purpose of this method is to convert the information in the unmarkedFrame into a format usable by the likelihood function.

  • It generates design matrices from formulas and components of the unmarkedFrame.
  • It often also has code to handle missing values, such as by dropping sites that don’t have measurements, or giving the user warnings if covariates are missing, etc.

Writing the getDesign method is frequently the most tedious and difficult part of adding a new model.
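
To give an idea of what such a method does, here is a minimal base-R sketch. It is not unmarked's getDesign. The function get_design_sketch, its arguments, and its return format are assumptions made for illustration, and it assumes that the rows of obsCovs are ordered site by site.

```r
# Hypothetical helper: build design matrices and drop sites with missing data.
# y: M x J response matrix; siteCovs: M rows; obsCovs: M*J rows ordered
# site by site (all occasions of site 1, then site 2, ...).
get_design_sketch <- function(stateformula, detformula, y, siteCovs, obsCovs) {
  M <- nrow(y); J <- ncol(y)
  # na.action = na.pass keeps rows containing NA, so rows stay aligned with y
  X <- model.matrix(stateformula,
                    model.frame(stateformula, siteCovs, na.action = na.pass))
  V <- model.matrix(detformula,
                    model.frame(detformula, obsCovs, na.action = na.pass))
  # An observation is unusable if its response or a detection covariate is NA
  obs_na <- is.na(as.vector(t(y))) | rowSums(is.na(V)) > 0
  y[matrix(obs_na, M, J, byrow = TRUE)] <- NA
  # Drop sites with a missing site covariate or without any usable observation
  drop_site <- rowSums(is.na(X)) > 0 | rowSums(!is.na(y)) == 0
  if (any(drop_site))
    warning(sum(drop_site), " site(s) dropped because of missing data")
  keep_obs <- rep(!drop_site, each = J)
  list(y = y[!drop_site, , drop = FALSE],
       X = X[!drop_site, , drop = FALSE],
       V = V[keep_obs, , drop = FALSE],
       removed.sites = unname(which(drop_site)))
}

# Four sites, three occasions: site 2 lacks a site covariate, site 3 has no data
y <- matrix(c(0, 1, NA,  0, 0, 0,  NA, NA, NA,  1, 1, 0), nrow = 4, byrow = TRUE)
siteCovs <- data.frame(elev = c(0.2, NA, 1.1, -0.5))
obsCovs <- data.frame(wind = c(1, 2, 3,  1, 2, 3,  1, 2, 3,  1, NA, 3))
dm <- get_design_sketch(~ elev, ~ wind, y, siteCovs, obsCovs)
dm$removed.sites
```

The missing wind value at site 4 only turns the matching response into NA. The site itself is kept because it still has usable observations.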

The likelihood function

  • Inputs: a vector of parameter values, the response variable, design matrices, and other settings/required data
  • Outputs: a numeric, the negative log-likelihood
  • Should be written so it can be used with the optim() function
  • Models can have three likelihood functions: coded in R, in C++, and with TMB (which is technically in C++ too). Users can specify which likelihood function to use with the engine argument of the fitting function.

The R likelihood function: easily understandable

If you are mainly used to coding in R, you should probably start here. If users want to dig deeper into the likelihood of a model, it may be useful for them to be able to read the R code that calculates the likelihood, as they may not be familiar with other languages. This likelihood function can be used only for fixed-effects models.

  • Example for occu
  • gdistremoval doesn’t have an R version of the likelihood function
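
As an illustration of the expected shape of such a function, here is a sketch of the negative log-likelihood of a single-season occupancy model in plain R. It is written for this guide and is not the code used by occu. The name nll_occu_sketch, the argument layout, and the site-by-site row order of V are assumptions.

```r
# Negative log-likelihood of a single-season occupancy model (sketch).
# params: state coefficients followed by detection coefficients
# y: M x J matrix of 0/1 detections (NA allowed)
# X: M x nP state design matrix
# V: (M*J) x nDP detection design matrix, rows ordered site by site
nll_occu_sketch <- function(params, y, X, V) {
  M <- nrow(y); J <- ncol(y)
  nP <- ncol(X)
  psi <- plogis(drop(X %*% params[seq_len(nP)]))
  p <- matrix(plogis(drop(V %*% params[-seq_len(nP)])), M, J, byrow = TRUE)
  # Probability of each observation given occupancy; missing visits contribute 1
  cp <- ifelse(is.na(y), 1, p^y * (1 - p)^(1 - y))
  detected <- rowSums(y, na.rm = TRUE) > 0
  # Occupied with this history, or unoccupied (only possible without detections)
  site_lik <- psi * apply(cp, 1, prod) + (1 - psi) * (!detected)
  -sum(log(site_lik))
}

# Two sites, two occasions, intercept-only model with psi = p = 0.5
y <- matrix(c(0, 0,  1, 0), nrow = 2, byrow = TRUE)
X <- matrix(1, nrow = 2, ncol = 1)
V <- matrix(1, nrow = 4, ncol = 1)
nll_occu_sketch(c(0, 0), y, X, V)  # equals -log(0.625 * 0.125)
```

Because the parameter vector is the first argument and the return value is a single number, the function can be passed directly to optim(), with y, X and V supplied as additional arguments.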

The C++ likelihood function: faster

The C++ likelihood function is essentially a C++ version of the R likelihood function, also designed exclusively for fixed-effects models. This function uses the RcppArmadillo R package, presented here. In the C++ code, you can use functions of the Armadillo C++ library, documented here.

Your C++ function should be in a .cpp file in the ./src/ folder of the package. You do not need to write a header file (.hpp), nor do you need to compile the code yourself, as this is all handled by the RcppArmadillo package. To test whether your C++ function runs and gives the expected result, you can compile and load it with Rcpp::sourceCpp("./src/nll_yourmodel.cpp"), and then use it as you would an R function: nll_yourmodel(params=params, arg1=arg1).

The TMB likelihood function: for random effects

#TODO

Organise the output data

unmarkedEstimate objects per submodel

Outputs from optim should be organized into unmarkedEstimate (S4) objects, with one unmarkedEstimate per submodel (e.g. state, detection). These objects include the parameter estimates and other information, such as the link functions.
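
The splitting step can be sketched as follows. Plain lists stand in for unmarkedEstimate objects here, and the opt object is a hand-made stand-in for what optim(..., hessian = TRUE) returns. The element names and the parameter positions in idx are assumptions for illustration.

```r
# Stand-in for the result of optim(..., hessian = TRUE):
# two state parameters followed by one detection parameter.
opt <- list(par = c(0.4, -1.2, 0.8),
            hessian = diag(c(25, 16, 100)))

# Inverting the Hessian of the negative log-likelihood gives the covariance matrix
covMat <- solve(opt$hessian)

# Positions of each submodel in the parameter vector
idx <- list(state = 1:2, det = 3)

# One element per submodel, holding its estimates, covariance block and SEs
estimates <- lapply(idx, function(i) {
  list(estimates = opt$par[i],
       covMat = covMat[i, i, drop = FALSE],
       SE = sqrt(diag(covMat)[i]))
})
estimates$state$SE
```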

The unmarkedEstimate class is defined here in the unmarkedEstimate.R file. The unmarkedEstimate function, defined here, is used to create new unmarkedEstimate objects. You will not normally need to create an unmarkedEstimate subclass.

Design the unmarkedFit object

You’ll need to create a new unmarkedFit subclass for your model. The main component of unmarkedFit objects is a list of the unmarkedEstimates described above.

After you have defined your unmarkedFit subclass, you can create the object in your fitting function.

The fitting function returns this unmarkedFit object.

Test the complete fitting function process

  • Simulate some data using your model
  • Construct the unmarkedFrame
  • Provide formulas, unmarkedFrame, other options to your draft fitting function
  • Process them with getDesign
  • Pass results from getDesign as inputs to your likelihood function
  • Optimize the likelihood function
  • Check the resulting parameter estimates for accuracy
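
A stripped-down version of this check can be run in base R before anything is wired into the package. The sketch below assumes an intercept-only occupancy model, so the unmarkedFrame and getDesign steps are skipped. The true values, sample sizes, and function names are arbitrary choices for illustration.

```r
set.seed(123)
M <- 500; J <- 5                 # sites and occasions
psi_true <- 0.6; p_true <- 0.4   # true occupancy and detection probabilities

# Simulate the latent occupancy state, then detections conditional on it
z <- rbinom(M, 1, psi_true)
y <- matrix(rbinom(M * J, 1, rep(z, J) * p_true), M, J)

# Intercept-only negative log-likelihood, parameters on the logit scale
nll <- function(params) {
  psi <- plogis(params[1]); p <- plogis(params[2])
  n_det <- rowSums(y)
  lik <- psi * p^n_det * (1 - p)^(J - n_det) + (1 - psi) * (n_det == 0)
  -sum(log(lik))
}

# Optimize, then back-transform and compare with the truth
fit <- optim(c(0, 0), nll, method = "BFGS", hessian = TRUE)
est <- plogis(fit$par)
se_logit <- sqrt(diag(solve(fit$hessian)))
round(rbind(truth = c(psi_true, p_true), estimate = est), 2)
```

If the estimates are far from the simulated values with this much data, the likelihood or the simulation code most likely contains an error.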

Write the methods associated with the unmarkedFit object

Develop methods specific to your unmarkedFit type for operating on the output of your model. As with the methods associated with an unmarkedFrame object above, you probably will not have to rewrite all of them, but you should test them to see if they work. All the methods associated with unmarkedFit objects are listed in the unmarkedFit class documentation accessible with help("unmarkedFit-class").

Specific methods

These are methods you will want to rewrite, adjusting them for your model.

getP

The getP method (defined here) “back-transforms” the detection parameter (p the detection probability or λ the detection rate, depending on the model). It returns a matrix of the estimated detection parameters. It is called by several other methods that are useful to extract information from the unmarkedFit object.
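
For a detection probability with a logit link, the back-transformation amounts to the following sketch. The function get_p_sketch and its arguments are hypothetical, and the detection design matrix V is assumed to have its rows ordered site by site. The real method works on the unmarkedFit object instead.

```r
# Hypothetical back-transformation of detection coefficients.
# V: (M*J) x nDP detection design matrix, rows ordered site by site
get_p_sketch <- function(det_coefs, V, M, J) {
  # Linear predictor -> probability scale -> M x J matrix (sites in rows)
  matrix(plogis(drop(V %*% det_coefs)), nrow = M, ncol = J, byrow = TRUE)
}

# Two sites, three occasions, intercept plus one observation covariate
V <- cbind(1, wind = c(0, 1, 2,  0, 1, 2))
p <- get_p_sketch(c(0, -1), V, M = 2, J = 3)
p
```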

simulate

The generic simulate method (defined here) calls the simulate_fit method that depends on the class of the unmarkedFit object, which depends on the model.

The simulate method can be used in two ways:

You should test both ways with your model.

plot

This method plots the results of your model. The generic plot method for unmarkedFit (defined here) plots the residuals of the model.

Generic methods

Here are methods that you should test but probably will not have to rewrite. They are defined in the unmarkedFit.R file, for the unmarkedFit parent class.

  • [
  • backTransform
  • coef
  • confint
  • fitted
  • getData
  • hessian
  • linearComb
  • mle
  • names
  • nllFun
  • parboot
  • nonparboot
  • predict
  • profile
  • residuals
  • sampleSize
  • SE
  • show
  • summary
  • update
  • vcov
  • logLik
  • LRT

Methods to access new attributes

You may also need to add specific methods to allow users to access an attribute you added to your unmarkedFit subclass.

For example, some methods are relevant only for some types of models:

  • getFP for occupancy models that account for false positives
  • getB for occupancy models that account for false positives
  • smoothed for colonization-extinction models
  • projected for colonization-extinction models

Update the NAMESPACE file

  • Add your fitting function to the functions export here
  • Add the new subclasses (unmarkedFrame, unmarkedFit) to the classes export here
  • Add the function you wrote to create your unmarkedFrame object to the functions export here
  • If you wrote new methods, for example to access new attributes for objects of a subclass, add them to the methods export here
  • If required, export other functions you created that may be called by users of the package
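
The additions listed above could look like the following fragment. All names are hypothetical (myModel, unmarkedFrameMyModel, unmarkedFitMyModel, getL as a new accessor). In practice you would add each name to the corresponding existing directive in the NAMESPACE file rather than create new ones.

```r
# Fitting function and unmarkedFrame constructor (hypothetical names)
export(myModel, unmarkedFrameMyModel)

# New S4 subclasses
exportClasses(unmarkedFrameMyModel, unmarkedFitMyModel)

# New S4 generics/methods, e.g. an accessor for a new slot
exportMethods(getL)
```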

Write tests

Using the testthat package, write tests for your unmarkedFrame function, your fitting function, and the methods described above. The tests should be fast but cover all the key configurations.

Write your tests in the ./tests/testthat/ folder, creating an R file for your model. If you are using RStudio, you can run the tests of your file easily by clicking on the “Run tests” button. You can run all the tests by clicking on the “Test” button in the Build pane.

Write documentation

You need to write documentation files for the new classes and functions you added. Documentation .Rd files are stored in the man folder. Here is a guide on how to format your documentation.

Depending on how much you had to add, you may also need to update existing files:

  • If you added specific methods for your new unmarkedFrame class: add them to unmarkedFrame-class.Rd
  • If you added specific methods for your new unmarkedFit class: add them to unmarkedFit-class.Rd. The same goes for your new unmarkedFitList class in unmarkedFitList-class.Rd.
  • Add any specific function, method or class you created. For example, specific distance-sampling functions are documented in detFuns.Rd.

Add to unmarked

  • Send a pull request on GitHub
  • Probably fix a few things
  • Merged and done!