bifurcatingr

Introduction

The bifurcating autiegressive (BAR) model is a useful tool for studying binary tree-structured data, often seen in cell-lineage studies. The “bifurcatingr” R package makes it easier to work with these models. It can estimate different orders of BAR models and improve accuracy through various correction methods. These methods include bootstrap-based corrections and linear bias functions. The package also helps generate and visualize data from BAR models and offers eight ways to estimate confidence intervals for the model’s coefficients, with or without bias correction.

Installation

To install the library from CRAN, use:

install.packages("bifurcatingr")

Usage

To use bifurcatingr in your R code, you must first load the library:

library(bifurcatingr)
#> Loading required package: fMultivar

Generating binary tree data

To generate binary tree data based on a the bifurcating autoregressive model (BAR) model, you can use the “bfa.tree.gen” function. Generating a binary tree with 127 observations based on a first-order BAR model with errors that follow a contaminated bivariate normal distribution with 20% contamination. A zero mean vector is used and the standard deviations are 1 and 2, respectively. The intercept is set to 10 and the autoregressive coefficient is 0.7.

dat<-bfa_tree_gen(n=127,p=1,s1=1,s2=2,r1=0.5,r2=0.5,g=0.2,intercept=10,ar_coef=c(0.7))
dat <- round(dat,1)
dat
#>   [1] 35.1 35.0 33.5 33.3 33.0 33.6 34.4 36.4 34.4 32.9 31.9 34.9 32.4 35.3 36.0
#>  [16] 34.9 35.8 34.6 34.2 34.0 34.0 32.5 31.4 35.5 34.0 32.1 31.4 34.5 35.9 35.1
#>  [31] 35.5 34.8 34.4 33.9 34.2 32.4 32.5 33.3 32.9 34.7 35.3 34.1 34.8 33.8 33.4
#>  [46] 34.0 32.7 36.4 36.1 33.5 33.5 31.9 33.0 32.2 31.8 35.2 34.3 33.7 34.5 34.5
#>  [61] 32.7 38.5 36.6 35.1 36.4 33.3 32.7 33.2 33.1 35.3 35.0 32.3 32.9 32.2 32.6
#>  [76] 34.5 34.5 32.4 32.2 31.6 33.2 34.6 35.4 34.4 33.8 33.8 34.2 35.2 35.6 35.1
#>  [91] 34.7 32.0 33.8 34.1 34.6 35.2 36.3 32.3 33.8 34.0 33.1 33.8 34.4 33.6 31.5
#> [106] 34.0 34.1 32.9 32.8 31.3 31.2 34.3 35.0 35.5 36.1 34.1 33.5 34.2 33.8 33.7
#> [121] 32.9 30.5 31.5 34.6 35.1 34.6 34.4

Graphing binary tree data

The function “bfa.tree.plot” is designed to represent the binary tree data usong igraph library. By making use of the cell lineage dataset found in Cowan and Staudte (1986) and stored in the bifurcatingr package as “ecoli”, executing the following command will generate the tree.

bfa_tree_plot(ecoli$lifetime, vertex.size = 10, shape = "circle", text.size = 0.3, vertex.color = "white",vertex.asp = 0.5, arrow.size = 0.2, arrow.width = 0.3, plot.margin = 0)

Graphing scatterplots

The BAR model can be visualized using a scatterplot depicting the relationship between observations at time “t” and their lagged observations. This is done through the “bfa.scatterplot” function. This function requires two inputs: the binary tree data as a numeric vector “z” and the lag order “p”. The following command generates a scatterplot for the genrated binary tree above based on the BAR(1) model:

bfa_scatterplot(dat,1)

The following command generat scatterplot based on the BAR(2) model for the same data above.

bfa_scatterplot(dat,2)

Least squares estimation for BAR models

The “bfa.ls” function is used for obtaining least squares estimates for the parameters of a BAR model of any order. This function yields the LS estimates for the BAR model coefficients along with their corresponding p-values for testing if the coefficient is different from zero, the correlation among the errors, the variance-covariance matrix of the estimates, and both Wald-type asymptotic and standard normal or percentile) bootstrap confidence intervals. The following command illustrates this function on the simulated data above “dat” when fitted by a BAR(1) model.

bfa_ls(dat,p=1,conf=TRUE,cov_matrix=TRUE,conf_level=0.9,p_value=TRUE)
#> $coef
#>      Intercept   X_[t/2]
#> [1,]  15.91095 0.5284153
#> 
#> $p_value
#>         Intercept      X_[t/2]
#> [1,] 6.163463e-06 3.036677e-07
#> 
#> $error_cor
#> [1] 0.6457192
#> 
#> $cov_matrix
#>            Intercept   X_[t/2]
#> Intercept 1573.19935 -46.08187
#> X_[t/2]    -46.08187   1.35213
#> 
#> $asymptotic_ci
#>                   5%        95%
#> Intercept 10.1217625 21.7001284
#> X_[t/2]    0.3586947  0.6981359
#> 
#> $bootstarp_ci
#>                  5%        95%
#> Intercept 9.1637014 22.6581894
#> X_[t/2]   0.3293414  0.7274892
#> 
#> $percentile_ci
#>                  5%        95%
#> Intercept 9.8836061 24.1909492
#> X_[t/2]   0.2866289  0.7062173

Least squares bias correction for the coefficients of BAR models

The “bfa.ls.bc” function provides bias-corrected LS estimates of the BAR model coefficients. Currently, four distinct bias correction techniques are available: single, double, and fast-double bootstrapping, along with the linear-bias-function approach. The “bfa.ls.bc” function to obtain the bias-corrected estimates of the AR coefficient in BAR(1) model based on the single bootstrap bias correction method for the simulated data above “dat” is as follows

bfa_ls_bc(dat,p = 1, method = "boot1",B1 = 999, B2 = 499)
#> [1] 0.5674185

The “bfa.ls.bc” function to obtain the bias-corrected estimates of the AR coefficient in BAR(1) model based on the fast double bootstrap bias correction method for the simulated data above “dat” is as follows

bfa_ls_bc(dat,p = 1, method = "boot2fast",B1 = 999, B2 = 499)
#> [1] 0.5654562

The “bfa.ls.bc” function to obtain the bias-corrected estimates of the AR coefficient in BAR(1) model based on the asymptotic linear bias-correction method for the simulated data above “dat” is as follows

bfa_ls_bc(dat, p=1, method="LBC")
#> [1] 0.563271

Confidence Intervals

The bifurcatingr package allows for constructing various types of confidence intervals based on the LS estimation of the BAR model coefficients. Both corrected and uncorrected bias confidence intervals are available through the “bfa.ls.bc.ci” function. For instance, the following command can be used to construct a 95% BCa confidence interval for the autoregressive coefficient of the BAR(1) model fitted to the simulated data abouve “dat”.

bfa_ls_bc_ci(dat, p=1, method="BCa", B=4)
#> $Bias_corrected_coef
#>         Estimates
#> X_[t/2]  0.536114
#> 
#> $BCa_ci
#>              2.5%     97.5%
#> X_[t/2] 0.4324038 0.6032837

To construct a 95% single bootstrap bias-corrected standard normal bootstrap confidence intervals for the autoregressive coefficients of the BAR(1) model for the simulated data above “dat” as follows.

bfa_ls_bc_ci(dat, p=1, method="boot1", B=4)
#> $Bias_corrected_coef
#>           [,1]
#> [1,] 0.6040107
#> 
#> $standard_normal_ci
#>              2.5%     97.5%
#> X_[t/2] 0.4319816 0.7760399

Where to go next

This tutorial is a brief introduction to bifurcatingr in R. We sincerely hope you enjoyed reading it and that it will be useful for your binary tree analysis based on the autoregressive models. For a detailed description of specific functions, see https://cran.r-project.org/package=bifurcatingr. For questions on how to use bifurcatingr, or to report a bug, please email Dr. Tamer Elbayoumi at email: .