In this example we demonstrate how to set correlation structures for latent variables and random effects in gllvm. The species abundances are often correlated in time or spatially in community ecology. In gllvm, such correlation structures can now be set either for latent variables or comunity level random row effects.
Denote the abundance of the jth species (j ∈ {1, …, p}) at the ith site (i ∈ {1, …, n}) as yij. A set of k environmental variables, or experimental treatments, may also be recorded at each site and stored in the vector xi = (xi1, ..., xik). A common form for the GLLVM is then defined for the mean abundance μij:
g(μij) = ηij = rsa(i) + β0j + xi′βj + usu(i)′θj,
Now we can specify a correlation structure for latent variables usu(i) = (u1, su(i), …, ud, su(i))′
(d= number of latent
variables, defined by num.lv
in gllvm)
and/or community level row effects rsr(i).
The subscripts su(i)
defines the structure of the latent variables. For instance, assume that
we have a hierarchical sampling design with ns < n
sites and sites are sampled nt times in
consecutive years. Thus we could want to include a site level latent
variables such that su(i) = k
if sampling unit i is from
site k (k ∈ {1, …, ns}).
Similarly, we can set a structure for the row effects defining subscript
sr(i),
eg. for time sr(i) = t
if sampling unit i is from
sampling year t (t ∈ {1, …, nt}).
Then we can define the correlation structures for latent variables
and/or row effects: denote uq. = (uq1, ..., uqns)
uq. ∼ N(0, Σq), q = 1, …, d.
The covariance matrix Σq defines the
correlation structure and has unit variances on diagonal. The options
are identity (independent), AR(1) correlation, exponentially decaying
correlation structure and Matern correlation. Currently we can only use
the same structure for all latent variables q = 1, …, d, but the
parameters defining the strength of the correlation is estimated
uniquely for each latent variable.
Similarly, we can define the correlation structure for random row effect, denote r = (r1, ..., rnt), and set r ∼ N(0, Σr), where the covariance matrix Σr is a nt × nt matrix and defines the correlation structure for random row effect. For row effects we have the same options as we have for latent variables. The structure for the row effects can be different from the structure of the latent variables and this is illustrated in the case study below.
As an example, we use time series of sessile algal and invertebrate species from permanent transects (Arkema et al. 2009, Reed and Miller (2023)). Data consists of percent cover of 132 species from 11 sites which each have 2-8 transects, yearly recorded from 2000 to 2021. It also includes measurements of the environment; the number of stripes of giant kelp and percent rock. In total, data has 836 sampling units.
Species percent cover matrix is in Yabund
, for the
analyses we consider only sessile invertebrates and include species
observed at least 10 times. The rarest species are modeled as a sum.
## AB AL AMZO ANSP AR ARUD AS ATM AU BA BAEL BCAL BF BLD BN
## 1 0 0 0 0.0125 0 0 0.1125 0 0.0125 0.0000 0 0 0.0000 0 0.0125
## 2 0 0 0 0.0000 0 0 0.0000 0 0.0000 0.0000 0 0 0.0125 0 0.0000
## 3 0 0 0 0.0000 0 0 0.0750 0 0.0000 0.0000 0 0 0.0000 0 0.0000
## 4 0 0 0 0.0000 0 0 0.0125 0 0.0000 0.0000 0 0 0.0125 0 0.0000
## 5 0 0 0 0.0000 0 0 0.0000 0 0.0000 0.0000 0 0 0.0000 0 0.0000
## 6 0 0 0 0.0000 0 0 0.0000 0 0.0000 0.0000 0 0 0.0000 0 0.0000
## 7 0 0 0 0.0000 0 0 0.0000 0 0.0000 0.0000 0 0 0.0000 0 0.0000
## 8 0 0 0 0.0000 0 0 0.0000 0 0.0000 0.0000 0 0 0.0000 0 0.0000
## 9 0 0 0 0.0000 0 0 0.0500 0 0.0000 0.0125 0 0 0.0000 0 0.0000
## 10 0 0 0 0.0000 0 0 0.0250 0 0.0000 0.0000 0 0 0.0125 0 0.0000
## BO BOW BPSE BR BRA
## 1 0 0 0 0.0250 0.0375
## 2 0 0 0 0.0250 0.0125
## 3 0 0 0 0.0375 0.0000
## 4 0 0 0 0.0000 0.0000
## 5 0 0 0 0.0125 0.0000
## 6 0 0 0 0.0000 0.0125
## 7 0 0 0 0.0000 0.0000
## 8 0 0 0 0.0000 0.0000
## 9 0 0 0 0.1000 0.0000
## 10 0 0 0 0.1000 0.0875
Yinvert <- Yabund[SPinfo$GROUP == "INVERT"]
Yinvert10 = Yinvert[,colSums(Yinvert>0, na.rm = TRUE)>9]
# Sum species that were observed less than 9 times,
Yinvert10$Sum_inv = rowSums(Yinvert[,(colSums(Yinvert>0, na.rm = TRUE)<=9)], na.rm = TRUE)
There is information about the species:
## SP_CODE GROUP COMMON_NAME SCIENTIFIC_NAME TAXON_KINGDOM
## 1 AB INVERT Coarse Sea Fir Hydroid Abietinaria spp. Animalia
## 2 AL INVERT Aggregating cup Coral Astrangia haimei Animalia
## 3 AMZO ALGAE <NA> Amphiroa beauvoisii Plantae
## 4 ANSP INVERT Aggregating anemone Anthopleura spp. Animalia
## 5 AR INVERT Sand Tunicate Eudistoma psammion Animalia
## 6 ARUD INVERT Octocoral Discophyton rudyi Animalia
## TAXON_PHYLUM TAXON_CLASS TAXON_ORDER TAXON_FAMILY TAXON_GENUS
## 1 Cnidaria Hydrozoa Leptothecata Sertulariidae Abietinaria
## 2 Cnidaria Anthozoa Scleractinia Rhizangiidae Astrangia
## 3 Rhodophyta Florideophyceae Corallinales Lithophyllaceae Amphiroa
## 4 Cnidaria Anthozoa Actiniaria Actiniidae Anthopleura
## 5 Chordata Ascidiacea Aplousobranchia Polycitoridae Eudistoma
## 6 Cnidaria Anthozoa Alcyonacea Alcyoniidae Discophyton
SPinfo10 = SPinfo[SPinfo$SP_CODE %in% colnames(Yinvert10),]
SPinfo10 = rbind(SPinfo10,c("Sum_inv","INVERT", rep(NA, ncol(SPinfo10)-2)))
And finally, information about the study design (site name, year, transect index) and environmental coefficients (average number of kelp fronds, percent rock). The distribution of kelp fronds is very skewed to the right so a log of the number of kelp fronds is used as a covariate instead. In addition, covariates are scaled.
## SITE YEAR TRANSECT KELP_FRONDS PERCENT_ROCKY
## 1 ABUR 2001 1 13.8750 35.0
## 2 ABUR 2001 2 0.0000 37.5
## 3 ABUR 2002 1 8.5125 10.0
## 4 ABUR 2002 2 0.3875 2.5
## 5 ABUR 2003 1 0.6000 0.0
## 6 ABUR 2003 2 0.0000 0.0
par(mfrow=c(1,2))
hist(Xenv$KELP_FRONDS, main = "KELP FRONDS")
Xenv$logKELP_FRONDS = log(Xenv$KELP_FRONDS+1)
Xenv$logKELP_FRONDSsc = scale(Xenv$logKELP_FRONDS)
Xenv$PERCENT_ROCKYsc = scale(Xenv$PERCENT_ROCKY)
hist(Xenv$logKELP_FRONDSsc, main = "log of KELP FRONDS")
For modeling percent cover data, has three options, beta for values (0,1), ordered beta for [0,1] and beta hurdle model for [0,1). Here we use beta-hurdle response model.