Title: | Generalized Joint Attribute Modeling |
---|---|
Description: | Analyzes joint attribute data (e.g., species abundance) that are combinations of continuous and discrete data with Gibbs sampling. Full model and computation details are described in Clark et al. (2018) <doi:10.1002/ecm.1241>. |
Authors: | James S. Clark, Daniel Taylor-Rodriquez |
Maintainer: | James S. Clark <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.6.2 |
Built: | 2024-12-16 07:00:26 UTC |
Source: | CRAN |
Inference and prediction for jointly distributed responses that
are combinations of continous and discrete data. Functions begin with 'gjam'
to avoid conflicts with other packages.
Package: | gjam |
Type: | Package |
Version: | 2.6.2 |
Date: | 2022-5-23 |
License: | GPL (>= 2) |
URL: | http://sites.nicholas.duke.edu/clarklab/code/ |
The generalized joint attribute model (gjam) analyzes multivariate data that are combinations of presence-absence, ordinal, continuous, discrete, composition, zero-inflated, and censored. It does so as a joint distribution over response variables. gjam provides inference on sensitivity to input variables, correlations between responses on the data scale, model selection, and prediction.
Importantly, analysis is done on the observation scale. That is, coefficients and covariances are interpreted on the same scale as the data. Contrast this approach with standard Generalized Linear Models, where coefficients and covariances are difficult to interpret and cannot be compared across responses that are modeled on different scales.
gjam was motivated by species distribution and abundance data in ecology, but can provide an attractive alternative to traditional methods wherever observations are multivariate and combine multiple scales and mixtures of continuous and discrete data.
gjam can be used to model ecological trait data, where species traits are translated to locations as community-weighted means and modes.
Posterior simulation is done by Gibbs sampling. Analysis is done by these functions, roughly in order of how frequently they might be used:
gjam
fits model with Gibbs sampling.
gjamSimData
simulates data for analysis by gjam
.
gjamPriorTemplate
sets up prior distribution for coefficients.
gjamSensitivity
evaluates sensitivity to predictors from gjam
.
gjamCensorY
defines censored values and intervals.
gjamTrimY
trims the response matrix and aggregates rare types.
gjamPlot
plots output from gjam
.
gjamSpec2Trait
constructs plot by trait matrix.
gjamPredict
does conditional prediction.
gjamConditionalParameters
obtains the conditional coefficient matrices.
gjamOrdination
ordinates the response matrix.
gjamDeZero
de-zeros response matrix for storage.
gjamReZero
recovers response matrix from de-zeroed format.
gjamIIE
evaluates indirect effects and interactions.
gjamIIEplot
plots indirect effects and interactions.
gjamSpec2Trait
generates trait values.
gjamPoints2Grid
aggregates incidence data to counts on a lattice.
Author: James S Clark, [email protected], Daniel Taylor-Rodriquez
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2017. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs 87, 34-56.
Clark, J.S. 2016. Why species tell more about traits than traits tell us about species: Predictive models. Ecology 97, 1979-1993.
Taylor-Rodriguez, D., K. Kaufeld, E. M. Schliep, J. S. Clark, and A. E. Gelfand. 2016. Joint species distribution modeling: dimension eduction using Dirichlet processes. Bayesian Analysis, 12, 939-967. doi: 10.1214/16-BA1031.
gjam
,
gjamSimData
,
gjamPriorTemplate
,
gjamSensitivity
,
gjamCensorY
,
gjamTrimY
,
gjamPredict
,
gjamSpec2Trait
,
gjamPlot
,
gjamPredict
,
gjamConditionalParameters
,
gjamDeZero
,
gjamReZero
A more detailed vignette is can be obtained with:
browseVignettes('gjam')
Analyzes joint attribute data (e.g., species abundance) with Gibbs sampling.
Input can be output from gjamSimData
. Returns a list of objects from Gibbs sampling that can be plotted by gjamPlot
.
gjam(formula, xdata, ydata, modelList) ## S3 method for class 'gjam' print(x, ...) ## S3 method for class 'gjam' summary(object, ...)
gjam(formula, xdata, ydata, modelList) ## S3 method for class 'gjam' print(x, ...) ## S3 method for class 'gjam' summary(object, ...)
formula |
R formula for model, e.g., |
xdata |
|
ydata |
|
modelList |
|
x |
object of |
object |
currently, also an object of |
... |
further arguments not used here. |
Note that formula
begins with ~
, not y ~
. The response matrix
is passed in the form of a n
by S
matrix or data.frame
ydata
.
Both predictors in xdata
and responses in ydata
can include missing values as NA
. Factors in xdata
should be declared using factor
. For computational stability variables that are not factors are standardized by mean and variance, then transformed back to original scales in output
. To retain a variable in its original scale during computation include it in the character string notStandard
as part of the list modelList
. (example shown in the vignette
on traits).
modelList
has these defaults and provides these options:
ng = 2000
, number of Gibbs steps.
burnin = 500
, no. initial steps, must be less than ng
.
typeNames
can be 'PA'
(presenceAbsence), 'CON'
(continuous on (-Inf, Inf)
), 'CA'
(continuous abundance, zero censoring), 'DA'
(discrete abundance), 'FC'
(fractional composition),
'CC'
(count composition), 'OC'
(ordinal counts), 'CAT'
(categorical classes). typeNames
can be a single value that applies to all columns in ydata
, or there can be one value for each column.
holdoutN = 0
, number of observations to hold out for out-of-sample
prediction.
holdoutIndex = numeric(0)
, numeric vector
of observations (row numbers) to holdout for out-of-sample prediction.
censor = NULL
, list
specifying columns, values, and intervals for
censoring, see gjamCensorY
.
effort = NULL
, list
containing 'columns'
, a vector of length <= S
giving the names of columns in in y
, and 'values'
, a length-n
vector of effort or a n
by S
matrix (see Examples). effort
can be plot area, search time, etc. for discrete count data 'DA'
.
FULL = F
in modelList
will save full prediction chains in $chains$ygibbs
.
notStandard = NULL
, character vector
of column names in xdata
that should not be standardized.
reductList = list(N = 20, r = 3)
, list
of dimension reduction parameters, invoked when reductList
is included in modelList
or automatically when ydata
has too many columns. See vignette
on Dimension Reduction.
random
, character
string giving the name of a column in xdata
that will be used to specify random effects. The random group column should be declared as a factor
. There should be replication, i.e., each group level occurs multiple times.
REDUCT = F
in modelList
overrides automatic dimension reduction.
FCgroups, CCgroups
, are length-S vectors
assigned to columns in ydata
indicating composition 'FC'
or 'CC'
group membership. For example, if there are two 'CA' columns in ydata
followed by two groups of fractional composition data, each having three columns, then typeNames = c('CA','CA','FC','FC','FC','FC','FC','FC')
and FCgroups = c(0,0,1,1,1,2,2,2)
. note: gjamSimData
is not currently set up to simulate multiple composition groups, but gjam
will model it.
PREDICTX = T
executes inverse prediction of x
. Speed-up by setting PREDICTX = F
.
ematAlpha = .5
is the probability assigned for conditional and marginal independence in the ematrix
.
traitList = list(plotByTrait, traitTypes, specByTrait)
, list of trait objects. See vignette on Trait analysis.
More detailed vignettes can be obtained with:
browseVignettes('gjam')
Returns an object of class "gjam"
, which is a list containing the following components:
call |
function call |
chains |
|
fit |
|
inputs |
|
missing |
|
modelList |
|
parameters |
|
prediction |
|
If traits are modeled, then parameters
will additionally include betaTraitMu
, betaTraitSe
(coefficients), sigmaTraitMu
, sigmaTraitSe
(covariance). prediction
will additionally include tMuOrd
(ordinal trait means), tMu
, tSe
(trait predictions).
James S Clark, [email protected]
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2017. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs, 87, 34-56.
gjamSimData
simulates data
A more detailed vignette is can be obtained with:
browseVignettes('gjam')
website 'http://sites.nicholas.duke.edu/clarklab/code/'.
## Not run: ## combinations of scales types <- c('DA','DA','OC','OC','OC','OC','CON','CON','CON','CON','CON','CA','CA','PA','PA') f <- gjamSimData(S = length(types), typeNames = types) ml <- list(ng = 500, burnin = 50, typeNames = f$typeNames) out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) summary(out) # repeat with ng = 5000, burnin = 500, then plot data: pl <- list(trueValues = f$trueValues) gjamPlot(out, plotPars = pl) ## discrete abundance with heterogeneous effort S <- 5 n <- 1000 eff <- list( columns = 1:S, values = round(runif(n,.5,5),1) ) f <- gjamSimData(n, S, typeNames='DA', effort=eff) ml <- list(ng = 2000, burnin = 500, typeNames = f$typeNames, effort = eff) out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) summary(out) # repeat with ng = 2000, burnin = 500, then plot data: pl <- list(trueValues = f$trueValues) gjamPlot(out, plotPars = pl) ## End(Not run)
## Not run: ## combinations of scales types <- c('DA','DA','OC','OC','OC','OC','CON','CON','CON','CON','CON','CA','CA','PA','PA') f <- gjamSimData(S = length(types), typeNames = types) ml <- list(ng = 500, burnin = 50, typeNames = f$typeNames) out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) summary(out) # repeat with ng = 5000, burnin = 500, then plot data: pl <- list(trueValues = f$trueValues) gjamPlot(out, plotPars = pl) ## discrete abundance with heterogeneous effort S <- 5 n <- 1000 eff <- list( columns = 1:S, values = round(runif(n,.5,5),1) ) f <- gjamSimData(n, S, typeNames='DA', effort=eff) ml <- list(ng = 2000, burnin = 500, typeNames = f$typeNames, effort = eff) out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) summary(out) # repeat with ng = 2000, burnin = 500, then plot data: pl <- list(trueValues = f$trueValues) gjamPlot(out, plotPars = pl) ## End(Not run)
Returns a list
with censored values, intervals, and censored response matrix y
.
gjamCensorY(values, intervals, y, type='CA', whichcol = c(1:ncol(y)))
gjamCensorY(values, intervals, y, type='CA', whichcol = c(1:ncol(y)))
values |
Values in |
intervals |
|
y |
Response |
type |
Response type, see |
whichcol |
Columns in |
Any values in y
that fall within censored intervals
are replaced with censored values
. The example below simulates data collected on an 'octave scale': 0, 1, 2, 4, 8, ..., an approach to accelerate data collection with approximate bins.
Returns a list containing two elements.
y |
n by S matrix updated with censored values substituted for
those falling within |
censor |
|
James S Clark, [email protected]
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2017. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs 87, 34-56.
gjamSimData
simulates data
gjam
analyzes data
A more detailed vignette is can be obtained with:
browseVignettes('gjam')
website 'http://sites.nicholas.duke.edu/clarklab/code/'.
## Not run: # data in octaves v <- up <- c(0, 2^c(0:4), Inf) dn <- c(-Inf, v[-length(v)]) i <- rbind( dn, up ) # intervals f <- gjamSimData(n = 2000, S = 15, Q = 3, typeNames='CA') y <- f$y cc <- c(3:6) # censored columns g <- gjamCensorY(values = v, intervals = i, y = y, whichcol = cc) y[,cc] <- g$y # replace columns ml <- list(ng = 500, burnin = 100, censor = g$censor, typeNames = f$typeNames) output <- gjam(f$formula, xdata = f$xdata, ydata = y, modelList = ml) #repeat with ng = 2000, burnin = 500, then: pl <- list(trueValues = f$trueValues, width = 3, height = 3) gjamPlot(output, pl) # upper detection limit up <- 5 v <- up i <- matrix(c(up,Inf),2) rownames(i) <- c('down','up') f <- gjamSimData(typeNames='CA') g <- gjamCensorY(values = v, intervals = i, y = f$y) ml <- list(ng = 500, burnin = 100, censor = g$censor, typeNames = f$typeNames) out <- gjam(f$formula, xdata = f$xdata, ydata = g$y, modelList = ml) #repeat with ng = 2000, burnin = 500, then: pl <- list(trueValues = f$trueValues, width = 3, height = 3) gjamPlot(out, pl) # lower detection limit lo <- .001 values <- upper <- lo intervals <- matrix(c(-Inf,lo),2) rownames(intervals) <- c('lower','upper') ## End(Not run)
## Not run: # data in octaves v <- up <- c(0, 2^c(0:4), Inf) dn <- c(-Inf, v[-length(v)]) i <- rbind( dn, up ) # intervals f <- gjamSimData(n = 2000, S = 15, Q = 3, typeNames='CA') y <- f$y cc <- c(3:6) # censored columns g <- gjamCensorY(values = v, intervals = i, y = y, whichcol = cc) y[,cc] <- g$y # replace columns ml <- list(ng = 500, burnin = 100, censor = g$censor, typeNames = f$typeNames) output <- gjam(f$formula, xdata = f$xdata, ydata = y, modelList = ml) #repeat with ng = 2000, burnin = 500, then: pl <- list(trueValues = f$trueValues, width = 3, height = 3) gjamPlot(output, pl) # upper detection limit up <- 5 v <- up i <- matrix(c(up,Inf),2) rownames(i) <- c('down','up') f <- gjamSimData(typeNames='CA') g <- gjamCensorY(values = v, intervals = i, y = f$y) ml <- list(ng = 500, burnin = 100, censor = g$censor, typeNames = f$typeNames) out <- gjam(f$formula, xdata = f$xdata, ydata = g$y, modelList = ml) #repeat with ng = 2000, burnin = 500, then: pl <- list(trueValues = f$trueValues, width = 3, height = 3) gjamPlot(out, pl) # lower detection limit lo <- .001 values <- upper <- lo intervals <- matrix(c(-Inf,lo),2) rownames(intervals) <- c('lower','upper') ## End(Not run)
Conditional parameters quantify the direct effects of predictors including those that come through other species.
gjamConditionalParameters(output, conditionOn, nsim = 2000)
gjamConditionalParameters(output, conditionOn, nsim = 2000)
output |
object of |
conditionOn |
a |
nsim |
number of draws from the posterior distribution. |
Responses in ydata
are random with a joint distribution that comes through the residual covariance having mean matrix parameters$sigMu
and standard error matrix parameters$sigSe
. Still, it can be desirable to use some responses, along with covariates, as predictors of others. The responses (columns) in ydata
are partitioned into two groups, a group to condition on (the names included in character vector conditionOn
) and the remaining columns. conditionOn
gives the names of response variables (colnames
for ydata
). The conditional distribution is parameterized as the sum of effects that come directly from predictors in xdata
, in a matrix C
, and from the other responses, i.e., those in conditionOn
, a matrix A
. A third matrix P
holds the conditional covariance. If dimension reduction is used in model fitting, then there will some redundancy in conditional coefficients.
See examples below.
Amu |
posterior mean for matrix |
Ase |
standard error for matrix |
Atab |
parameter summary for matrix |
Cmu |
posterior mean for matrix |
Cse |
standard error for matrix |
Ctab |
parameter summary for matrix |
Pmu |
posterior mean for matrix |
Pse |
standard error for matrix |
Ptab |
parameter summary for matrix |
James S Clark, [email protected]
Qiu, T., S. Shubhi, C. W. Woodall, and J.S. Clark. 2021. Niche shifts from trees to fecundity to recruitment that determine species response to climate change. Frontiers in Ecology and Evolution 9, 863. 'https://www.frontiersin.org/article/10.3389/fevo.2021.719141'.
gjamSimData
simulates data
gjam
fits the model
A more detailed vignette is can be obtained with:
browseVignettes('gjam')
web site 'http://sites.nicholas.duke.edu/clarklab/code/'.
## Not run: f <- gjamSimData(n = 200, S = 10, Q = 3, typeNames = 'CA') ml <- list(ng = 2000, burnin = 50, typeNames = f$typeNames, holdoutN = 10) output <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) # condition on three species gjamConditionalParameters( output, conditionOn = c('S1','S2','S3') ) ## End(Not run)
## Not run: f <- gjamSimData(n = 200, S = 10, Q = 3, typeNames = 'CA') ml <- list(ng = 2000, burnin = 50, typeNames = f$typeNames, holdoutN = 10) output <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) # condition on three species gjamConditionalParameters( output, conditionOn = c('S1','S2','S3') ) ## End(Not run)
Returns a de-zeroed (sparse matrix) version of matrix ymat
with objects needed to re-zero it.
gjamDeZero(ymat)
gjamDeZero(ymat)
ymat |
|
Many abundance data sets are mostly zeros. gjamDeZero
extacts non-zero elements for storage.
Returns a list containing the de-zeroed ymat
as a vector yvec
.
yvec |
non-zero elements of |
n |
no. rows of |
S |
no. cols of |
index |
index for non-zeros |
ynames |
column names of |
James S Clark, [email protected]
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2016. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs 87, 34-56.
gjamReZero
to recover ymat
browseVignettes('gjam')
website 'http://sites.nicholas.duke.edu/clarklab/code/'.
## Not run: library(repmis) source_data("https://github.com/jimclarkatduke/gjam/blob/master/fungEnd.RData?raw=True") ymat <- gjamReZero(fungEnd$yDeZero) # OTUs stored without zeros length(fungEnd$yDeZero$yvec) # size of stored version length(ymat) # full size yDeZero <- gjamDeZero(ymat) length(yDeZero$yvec) # recover de-zeroed vector ## End(Not run)
## Not run: library(repmis) source_data("https://github.com/jimclarkatduke/gjam/blob/master/fungEnd.RData?raw=True") ymat <- gjamReZero(fungEnd$yDeZero) # OTUs stored without zeros length(fungEnd$yDeZero$yvec) # size of stored version length(ymat) # full size yDeZero <- gjamDeZero(ymat) length(yDeZero$yvec) # recover de-zeroed vector ## End(Not run)
Fills in predictor, response, and effort matrics for time series data where there are multiple multivariate time series. Time series gjam is here https://htmlpreview.github.io/?https://github.com/jimclarkatduke/gjam/blob/master/gjamTimeMsVignette.html
gjamFillMissingTimes(xdata, ydata, edata, groupCol, timeCol, groupVars = groupCol, FILLMEANS = FALSE, typeNames = NULL, missingEffort = .1)
gjamFillMissingTimes(xdata, ydata, edata, groupCol, timeCol, groupVars = groupCol, FILLMEANS = FALSE, typeNames = NULL, missingEffort = .1)
xdata |
|
ydata |
|
edata |
|
groupCol |
column name in |
timeCol |
column name in |
groupVars |
|
FILLMEANS |
fill new rows in |
typeNames |
typenames current limited to |
missingEffort |
effort assigned to missing values of |
Missing times in the data occur where there are gaps in timeCol
column of xdata
and the initial time 0
for each sequence. New versions of the data have NA
(xdata
) or prior values with appropriate weight (ydata
). Missing times are filled in xdata, ydata, edata
, including a time 0
which serves as a prior mean for ydata
for time code1. The group and time indices in columns groupCol
and timeCol
of xdata
reference the time for a given time series. Missing values in the columns groupVars
of xdata
are filled automatically filled in. This assumes that values for these variables are fixed for the group. If FILLMEANS
, the missing values in ydata
are filled with means for the group and given a low weight specified in missingEffort
.
A list containing the following:
xdata |
filled version of |
ydata |
filled version of |
edata |
filled version of |
timeList |
time indices used for computation, including, |
James S Clark, [email protected]
Clark, J. S., C. L. Scher, and M. Swift. 2020. The emergent interactions that govern biodiversity change. Proceedings of the National Academy of Sciences, 117, 17074-17083.
gjam
for more on xdata
, ydata
, and effort.
A more detailed vignette is can be obtained with:
browseVignettes('gjam')
web site 'http://sites.nicholas.duke.edu/clarklab/code/'.
Evaluates direct, indirect, and interactions from a gjam
object. Returns a list
of objects that can be plotted by gjamIIEplot
.
gjamIIE(output, xvector, MEAN = T, keepNames = NULL, omitY = NULL, sdScaleX = T, sdScaleY = F)
gjamIIE(output, xvector, MEAN = T, keepNames = NULL, omitY = NULL, sdScaleX = T, sdScaleY = F)
output |
object of |
xvector |
vector of predictor values, with names, corresponding to columns in |
MEAN |
|
omitY |
|
keepNames |
|
sdScaleX |
standardize coefficients to X scale. |
sdScaleY |
standardize coefficients to correlation scale. |
For plotting or recovering effects. The list fit$IIE
has matrices for main effects (mainEffect
), interactions (intEffect
), direct effects (dirEffect
), indirect effects (indEffectTo
), and standard deviations for each. The direct effects are the sum of main effects and interactions. The indirect effects include main effects and interactions that come through other species, determined by covariance matrix sigma
.
If sdScaleX = T
effects are standandardized from the Y/X to Y scale. This is the typical standardization for predictor variables. If sdScaleY = T
effects are given on the correlation scale. If both are true effects are dimensionless. See the gjam vignette on dimension reduction.
A list of objects for plotting by gjamIIEplot
.
James S Clark, [email protected]
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2016. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs 87, 34-56.
gjamIIEplot
plots output from gjamIIE
A more detailed vignette is can be obtained with:
browseVignettes('gjam')
web site 'http://sites.nicholas.duke.edu/clarklab/code/'.
## Not run: sim <- gjamSimData(S = 12, Q = 5, typeNames = 'CA') ml <- list(ng = 50, burnin = 5, typeNames = sim$typeNames) out <- gjam(sim$formula, sim$xdata, sim$ydata, modelList = ml) xvector <- colMeans(out$inputs$xStand) #predict at mean values for data xvector[1] <- 1 fit <- gjamIIE(output = out, xvector) gjamIIEplot(fit, response = 'S1', effectMu = c('main','ind'), effectSd = c('main','ind'), legLoc = 'topleft') ## End(Not run)
## Not run: sim <- gjamSimData(S = 12, Q = 5, typeNames = 'CA') ml <- list(ng = 50, burnin = 5, typeNames = sim$typeNames) out <- gjam(sim$formula, sim$xdata, sim$ydata, modelList = ml) xvector <- colMeans(out$inputs$xStand) #predict at mean values for data xvector[1] <- 1 fit <- gjamIIE(output = out, xvector) gjamIIEplot(fit, response = 'S1', effectMu = c('main','ind'), effectSd = c('main','ind'), legLoc = 'topleft') ## End(Not run)
Using the object returned by gjamIIEplot
generates a plot for a response variable.
gjamIIEplot(fit, response, effectMu, effectSd = NULL, ylim = NULL, col='black', legLoc = 'topleft', cex = 1)
gjamIIEplot(fit, response, effectMu, effectSd = NULL, ylim = NULL, col='black', legLoc = 'topleft', cex = 1)
fit |
object from |
response |
name of a column in fit$y to plot. |
effectMu |
character vector of mean effects to plot, can include |
effectSd |
character vector can include all or some of |
ylim |
vector of two values defines vertical axis range. |
col |
vector of colors for barplot. |
legLoc |
character for legend location. |
cex |
font size. |
For plotting direct effects, interactions, and indirect effects from an object fit
generated by gjamIIE
. The character vector supplied as effectMu
can include main effects ('main'
), interactions ('int'
), main effects plus interactions ('direct'
), and/or indirect effects ('ind'
). The list effectSd
draws 0.95 predictive intervals for all or some of the effects listed in effectM
u. Bars are contributions of each effect to the response
.
For factors, effects are plotted relative to the mean over all factor levels.
James S Clark, [email protected]
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2017. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs 87, 34-56.
gjamIIE
generates output for gjamIIEplot
A more detailed vignette is can be obtained with:
browseVignettes('gjam')
web site 'http://sites.nicholas.duke.edu/clarklab/code/'.
## Not run: f <- gjamSimData(S = 10, Q = 6, typeNames = 'OC') ml <- list(ng = 50, burnin = 5, typeNames = f$typeNames) out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) xvector <- colMeans(out$inputs$xStand) #predict at mean values for data, standardized x xvector[1] <- 1 fit <- gjamIIE(out, xvector) gjamIIEplot(fit, response = 'S1', effectMu = c('main','ind'), effectSd = c('main','ind'), legLoc = 'topleft') ## End(Not run)
## Not run: f <- gjamSimData(S = 10, Q = 6, typeNames = 'OC') ml <- list(ng = 50, burnin = 5, typeNames = f$typeNames) out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) xvector <- colMeans(out$inputs$xStand) #predict at mean values for data, standardized x xvector[1] <- 1 fit <- gjamIIE(out, xvector) gjamIIEplot(fit, response = 'S1', effectMu = c('main','ind'), effectSd = c('main','ind'), legLoc = 'topleft') ## End(Not run)
Ordinate data from a gjam object using correlation corresponding to reponse matrix E
.
gjamOrdination(output, specLabs = NULL, col = NULL, cex = 1, PLOT=T, method = 'PCA')
gjamOrdination(output, specLabs = NULL, col = NULL, cex = 1, PLOT=T, method = 'PCA')
output |
object of |
specLabs |
|
col |
|
cex |
text size in plot. |
PLOT |
|
method |
|
Ordinates the response correlation ematrix
contained in output$parameterTables
. If method = 'PCA'
returns eigenvalues and eigenvectors. If method = 'PCA'
returns three NMDS dimensions. If PLOT
, then plots will be generated. Uses principle components analysis or non-metric multidimensional scale (NMDS).
eVecs |
|
eValues |
If |
James S Clark, [email protected]
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2017. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs 87, 34-56.
gjam
fits the data
A more detailed vignette is can be obtained with:
browseVignettes('gjam')
website 'http://sites.nicholas.duke.edu/clarklab/code/'.
## Not run: f <- gjamSimData(S = 30, typeNames = 'CA') ml <- list(ng = 1000, burnin = 200, typeNames = f$typeNames, holdoutN = 10) output <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) ePCA <- gjamOrdination(output, PLOT=TRUE) eNMDS <- gjamOrdination(output, PLOT=TRUE, method='NMDS') ## End(Not run)
## Not run: f <- gjamSimData(S = 30, typeNames = 'CA') ml <- list(ng = 1000, burnin = 200, typeNames = f$typeNames, holdoutN = 10) output <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) ePCA <- gjamOrdination(output, PLOT=TRUE) eNMDS <- gjamOrdination(output, PLOT=TRUE, method='NMDS') ## End(Not run)
Constructs plots of posterior distributions, predictive distributions,
and additional analysis from output of gjam
.
gjamPlot(output, plotPars)
gjamPlot(output, plotPars)
output |
object of |
plotPars |
|
plotPars
a list
that can contain the following, listed with default values:
PLOTY = T |
plot predicted y . |
PLOTX = T |
plot inverse predicted x . |
PREDICTX = T |
inverse prediction of x ; does not work if PREDICTX = F in link{gjam} . |
ncluster |
number of clusters to highlight in
cluster diagrams, default based on S . |
CORLINES = T |
draw grid lines on grid plots of R and E. |
cex = 1 |
text size for grid plots, see par . |
BETAGRID = T |
draw grid of beta coefficients. |
PLOTALLY = F |
an individual plot for each column in y . |
SMALLPLOTS = T |
avoids plot margin error on some devices, better appearance if FALSE . |
GRIDPLOTS = F |
cluster and grid plots derived from parameters; matrices R and E are discussed in Clark et al. (2016). |
SAVEPLOTS = F |
plots saved in pdf format. |
outfolder = 'gjamOutput' |
folder for plot files if SAVEPLOTS = T . |
width, height = 4 |
can be small values, in inches, to avoid plot margin error on some devices. |
specColor = 'black' |
color for posterior box-and-whisker plots. |
ematAlpha = .95 |
prob threshold used to infer that a covariance value in Emat is not zero. |
ncluster = 4 |
number of clusters to identify in ematrix . |
The 'plot margin' errors mentioned above are device-dependent. They can be avoided by specifying small width, height
(in inches) and by omitting the grid plots (GRIDPLOTS = F
). If plotting does not produce a 'plot margin error', better appearance is obtained with SMALLPLOTS = F
.
Names will not be legible for large numbers of species. Specify specLabs = F
and use a character vector for specColor
to identify species groups (see the gjam vignette on dimension reduction).
Box and whisker plots bound 0.68 and 0.95 credible and predictive intervals.
Summary tables of parameter estimates are:
betaEstimates |
Posterior summary of beta coefficients. |
clusterIndex |
cluster index for responses in grid/cluster plots. |
clusterOrder |
order for responses in grid/cluster plots. |
eComs |
groups based on clustering |
ematrix |
|
eValues |
eigenvalues of |
eVecs |
eigenvectors of |
fit |
|
James S Clark, [email protected]
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2017. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs 87, 34-56.
gjam
A more detailed vignette is can be obtained with:
browseVignettes('gjam')
website 'http://sites.nicholas.duke.edu/clarklab/code/'.
## Not run: ## ordinal data f <- gjamSimData(S = 15, Q = 3, typeNames = 'OC') ml <- list(ng = 1500, burnin = 500, typeNames = f$typeNames, holdoutN = 10) out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) # repeat with ng = 2000, burnin = 500, then plot data here: pl <- list(trueValues = f$trueValues, width=3, height=2) fit <- gjamPlot(output = out, plotPars = pl) ## End(Not run)
## Not run: ## ordinal data f <- gjamSimData(S = 15, Q = 3, typeNames = 'OC') ml <- list(ng = 1500, burnin = 500, typeNames = f$typeNames, holdoutN = 10) out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) # repeat with ng = 2000, burnin = 500, then plot data here: pl <- list(trueValues = f$trueValues, width=3, height=2) fit <- gjamPlot(output = out, plotPars = pl) ## End(Not run)
From point pattern data in (x, y) generates counts on a lattice supplied by the user or specified by lattice size or density. For analysis in gjam as counts (known effort) or count composition (unknown effort) data
.
gjamPoints2Grid(specs, xy, nxy = NULL, dxy = NULL, predGrid = NULL, effortOnly = TRUE)
gjamPoints2Grid(specs, xy, nxy = NULL, dxy = NULL, predGrid = NULL, effortOnly = TRUE)
specs |
|
xy |
|
nxy |
length-2 |
dxy |
length-2 |
predGrid |
|
effortOnly |
|
For incidence data with species names specs
and locations (x, y)
constructs a lattice based a prediction grid predGrid
, at a density of (dx, dy)
, or with numbers of lattice points (nx, ny)
. If effortOnly = T
, returns only points with non-zero values.
A prediction grid predGrid
would be passed when counts by locations of known effort are required or where multiple groups should be assign to the same lattice points.
The returned gridBySpec
can be analyzed in gjam
with known effort as count data "DA"
or with unknown effort as count composition data "CC"
.
gridBySpec |
|
predGrid |
|
James S Clark, [email protected]
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2016. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs 87, 34-56.
gjam
A more detailed vignette is can be obtained with:
browseVignettes('gjam')
## Not run: ## random data n <- 100 s <- sample( letters[1:3], n, replace = TRUE) xy <- cbind( rnorm(n,0,.2), rnorm(n,10,2) ) nx <- ny <- 5 # uniform 5 X 5 lattice f <- gjamPoints2Grid(s, xy, nxy = c(nx, ny)) plot(f$predGrid[,1], f$predGrid[,2], cex=.1, xlim=c(-1,1), ylim=c(0,20), xlab = 'x', ylab = 'y') text(f$predGrid[,1], f$predGrid[,2], rowSums(f$gridBySpec)) dx <- .2 # uniform density dy <- 1.5 g <- gjamPoints2Grid(s, xy, dxy = c(dx, dy)) text(g$predGrid[,1], g$predGrid[,2], rowSums(g$gridBySpec), col='brown') p <- cbind( runif(30, -1, 1), runif(30, 0, 20) ) # irregular lattice h <- gjamPoints2Grid(s, xy, predGrid = p) text(h$predGrid[,1], h$predGrid[,2], rowSums(h$gridBySpec), col='blue') ## End(Not run)
## Not run: ## random data n <- 100 s <- sample( letters[1:3], n, replace = TRUE) xy <- cbind( rnorm(n,0,.2), rnorm(n,10,2) ) nx <- ny <- 5 # uniform 5 X 5 lattice f <- gjamPoints2Grid(s, xy, nxy = c(nx, ny)) plot(f$predGrid[,1], f$predGrid[,2], cex=.1, xlim=c(-1,1), ylim=c(0,20), xlab = 'x', ylab = 'y') text(f$predGrid[,1], f$predGrid[,2], rowSums(f$gridBySpec)) dx <- .2 # uniform density dy <- 1.5 g <- gjamPoints2Grid(s, xy, dxy = c(dx, dy)) text(g$predGrid[,1], g$predGrid[,2], rowSums(g$gridBySpec), col='brown') p <- cbind( runif(30, -1, 1), runif(30, 0, 20) ) # irregular lattice h <- gjamPoints2Grid(s, xy, predGrid = p) text(h$predGrid[,1], h$predGrid[,2], rowSums(h$gridBySpec), col='blue') ## End(Not run)
Predicts data from a gjam object, including conditional and out-of-sample prediction.
gjamPredict(output, newdata = NULL, y2plot = NULL, ylim = NULL, FULL = FALSE)
gjamPredict(output, newdata = NULL, y2plot = NULL, ylim = NULL, FULL = FALSE)
output |
object of |
newdata |
a |
y2plot |
|
ylim |
vector of lower and upper bounds for prediction plot |
FULL |
will return full chains for predictions as |
If newdata
is not specified, the response is predicted from xdata
as an in-sample prediction. If newdata
is specified, prediction is either conditional or out-of-sample.
Conditional prediction on a new set of y
values is done if newdata
includes the matrix ycondData
, which holds columns to condition on. ycondData
must be a matrix
and have column names matching those in y
that it will replace. ycondData
must have at least one column, but fewer than ncol(y)
columns. Columns not included in ycondData
will be predicted conditionally. Note that conditional prediction can be erratic if the observations on which the prediction is conditioned are unlikely given the model.
Alternatively, the list newdata
can include a new version of xdata
for out-of-sample prediction. The version of xdata
passed in newdata
has the columns with the same names and variable types as xdata
passed to gjam
. Note that factor levels must also match those included when fitting the model. All columns in y
will be predicted out-of-sample.
For count composition data the effort (total count) is 1000.
Because there is no out-of-sample effort for 'CC'
data, values are predicted on the [0, 1] scale.
See examples below.
x |
design matrix. |
sdList |
|
piList |
predictive intervals, only generated if |
prPresent |
|
ematrix |
effort |
ychains |
full prediction chains if |
James S Clark, [email protected]
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2016. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs, 87, 34-56.
gjamSimData
simulates data
A more detailed vignette is can be obtained with:
browseVignettes('gjam')
web site 'http://sites.nicholas.duke.edu/clarklab/code/'.
## Not run: S <- 10 f <- gjamSimData(n = 200, S = S, Q = 3, typeNames = 'CC') ml <- list(ng = 1500, burnin = 50, typeNames = f$typeNames, holdoutN = 10) out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) # predict data cols <- c('#018571', '#a6611a') par(mfrow=c(1,3),bty='n') gjamPredict(out, y2plot = colnames(f$ydata)) # predict the data in-sample title('full sample') # out-of-sample prediction xdata <- f$xdata[1:20,] xdata[,3] <- mean(f$xdata[,3]) # mean for x[,3] xdata[,2] <- seq(-2,2,length=20) # gradient x[,2] newdata <- list(xdata = xdata, nsim = 50 ) p1 <- gjamPredict( output = out, newdata = newdata) # plus/minus 1 prediction SE, default effort = 1000 x2 <- p1$x[,2] ylim <- c(0, max(p1$sdList$yMu[,1] + p1$sdList$yPe[,1])) plot(x2, p1$sdList$yMu[,1],type='l',lwd=2, ylim=ylim, xlab='x2', ylab = 'Predicted', col = cols[1]) lines(x2, p1$sdList$yMu[,1] + p1$sdList$yPe[,1], lty=2, col = cols[1]) lines(x2, p1$sdList$yMu[,1] - p1$sdList$yPe[,1], lty=2, col = cols[1]) # upper 0.95 prediction error lines(x2, p1$piList$yLo[,1], lty=3, col = cols[1]) lines(x2, p1$piList$yHi[,1], lty=3, col = cols[1]) title('SE and prediction, Sp 1') # conditional prediction, assume first species is absent ydataCond <- out$inputs$y[,1,drop=FALSE]*0 # set first column to zero newdata <- list(ydataCond = ydataCond, nsim=50) p0 <- gjamPredict(output = out, newdata = newdata) ydataCond <- ydataCond + 10 # first column is 10 newdata <- list(ydataCond = ydataCond, nsim=50) p1 <- gjamPredict(output = out, newdata = newdata) s <- 4 # species chosen at random to compare ylim <- range( p0$sdList$yMu[,s], p1$sdList$yMu[,s] ) plot(out$inputs$y[,s],p0$sdList$yMu[,s], cex=.2, col=cols[1], xlab = 'Observed', ylab = 'Predicted', ylim = ylim) abline(0,1,lty=2) points(out$inputs$y[,s],p1$sdList$yMu[,s], cex=.2, col=cols[2]) title('Cond. on 1st Sp') legend( 'topleft', c('first species absent', 'first species = 10'), text.col = cols, bty = 'n') # conditional, out-of-sample n <- 1000 S <- 10 f <- gjamSimData(n = n, S = S, Q = 3, typeNames = 'CA') holdOuts <- sort( sample(n, 50) ) xdata <- f$xdata[-holdOuts,] # fitted data ydata <- f$ydata[-holdOuts,] xx <- f$xdata[holdOuts,] # use for prediction yy <- f$ydata[holdOuts,] ml <- list(ng = 2000, burnin = 500, typeNames = f$typeNames) # fit the non-holdouts out <- gjam(f$formula, xdata, ydata, modelList = ml) cdex <- sample(S, 4) # condition on 4 species ndex <- c(1:S)[-cdex] # conditionally predict others newdata <- list(xdata = xx, ydataCond = yy[,cdex], nsim = 200) # conditionally predict out-of-sample p2 <- gjamPredict(output = out, newdata = newdata) par(bty='n', mfrow=c(1,1)) plot( as.matrix(yy[,ndex]), p2$sdList$yMu[,ndex], xlab = 'Observed', ylab = 'Predicted', cex=.3, col = cols[1]) abline(0,1,lty=2) title('RMSPE') mspeC <- sqrt( mean( (as.matrix(yy[,ndex]) - p2$sdList$yMu[,ndex])^2 ) ) #predict unconditionally, out-of-sample newdata <- list(xdata = xx, nsim = 200 ) p1 <- gjamPredict(out, newdata = newdata) points( as.matrix(yy[,ndex]), p1$sdList$yMu[,ndex], col=cols[2], cex = .3) mspeU <- sqrt( mean( (as.matrix(yy[,ndex]) - p1$sdList$yMu[,ndex])^2 ) ) e1 <- paste( 'cond, out-of-sample =', round(mspeC, 2) ) e2 <- paste( 'uncond, out-of-sample =', round(mspeU, 2) ) legend('topleft', c(e1, e2), text.col = cols, bty = 'n') ## End(Not run)
## Not run: S <- 10 f <- gjamSimData(n = 200, S = S, Q = 3, typeNames = 'CC') ml <- list(ng = 1500, burnin = 50, typeNames = f$typeNames, holdoutN = 10) out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) # predict data cols <- c('#018571', '#a6611a') par(mfrow=c(1,3),bty='n') gjamPredict(out, y2plot = colnames(f$ydata)) # predict the data in-sample title('full sample') # out-of-sample prediction xdata <- f$xdata[1:20,] xdata[,3] <- mean(f$xdata[,3]) # mean for x[,3] xdata[,2] <- seq(-2,2,length=20) # gradient x[,2] newdata <- list(xdata = xdata, nsim = 50 ) p1 <- gjamPredict( output = out, newdata = newdata) # plus/minus 1 prediction SE, default effort = 1000 x2 <- p1$x[,2] ylim <- c(0, max(p1$sdList$yMu[,1] + p1$sdList$yPe[,1])) plot(x2, p1$sdList$yMu[,1],type='l',lwd=2, ylim=ylim, xlab='x2', ylab = 'Predicted', col = cols[1]) lines(x2, p1$sdList$yMu[,1] + p1$sdList$yPe[,1], lty=2, col = cols[1]) lines(x2, p1$sdList$yMu[,1] - p1$sdList$yPe[,1], lty=2, col = cols[1]) # upper 0.95 prediction error lines(x2, p1$piList$yLo[,1], lty=3, col = cols[1]) lines(x2, p1$piList$yHi[,1], lty=3, col = cols[1]) title('SE and prediction, Sp 1') # conditional prediction, assume first species is absent ydataCond <- out$inputs$y[,1,drop=FALSE]*0 # set first column to zero newdata <- list(ydataCond = ydataCond, nsim=50) p0 <- gjamPredict(output = out, newdata = newdata) ydataCond <- ydataCond + 10 # first column is 10 newdata <- list(ydataCond = ydataCond, nsim=50) p1 <- gjamPredict(output = out, newdata = newdata) s <- 4 # species chosen at random to compare ylim <- range( p0$sdList$yMu[,s], p1$sdList$yMu[,s] ) plot(out$inputs$y[,s],p0$sdList$yMu[,s], cex=.2, col=cols[1], xlab = 'Observed', ylab = 'Predicted', ylim = ylim) abline(0,1,lty=2) points(out$inputs$y[,s],p1$sdList$yMu[,s], cex=.2, col=cols[2]) title('Cond. on 1st Sp') legend( 'topleft', c('first species absent', 'first species = 10'), text.col = cols, bty = 'n') # conditional, out-of-sample n <- 1000 S <- 10 f <- gjamSimData(n = n, S = S, Q = 3, typeNames = 'CA') holdOuts <- sort( sample(n, 50) ) xdata <- f$xdata[-holdOuts,] # fitted data ydata <- f$ydata[-holdOuts,] xx <- f$xdata[holdOuts,] # use for prediction yy <- f$ydata[holdOuts,] ml <- list(ng = 2000, burnin = 500, typeNames = f$typeNames) # fit the non-holdouts out <- gjam(f$formula, xdata, ydata, modelList = ml) cdex <- sample(S, 4) # condition on 4 species ndex <- c(1:S)[-cdex] # conditionally predict others newdata <- list(xdata = xx, ydataCond = yy[,cdex], nsim = 200) # conditionally predict out-of-sample p2 <- gjamPredict(output = out, newdata = newdata) par(bty='n', mfrow=c(1,1)) plot( as.matrix(yy[,ndex]), p2$sdList$yMu[,ndex], xlab = 'Observed', ylab = 'Predicted', cex=.3, col = cols[1]) abline(0,1,lty=2) title('RMSPE') mspeC <- sqrt( mean( (as.matrix(yy[,ndex]) - p2$sdList$yMu[,ndex])^2 ) ) #predict unconditionally, out-of-sample newdata <- list(xdata = xx, nsim = 200 ) p1 <- gjamPredict(out, newdata = newdata) points( as.matrix(yy[,ndex]), p1$sdList$yMu[,ndex], col=cols[2], cex = .3) mspeU <- sqrt( mean( (as.matrix(yy[,ndex]) - p1$sdList$yMu[,ndex])^2 ) ) e1 <- paste( 'cond, out-of-sample =', round(mspeC, 2) ) e2 <- paste( 'uncond, out-of-sample =', round(mspeU, 2) ) legend('topleft', c(e1, e2), text.col = cols, bty = 'n') ## End(Not run)
Constructs coefficient matrices for low and high limits on the uniform prior distribution for beta
.
gjamPriorTemplate(formula, xdata, ydata, lo = NULL, hi = NULL)
gjamPriorTemplate(formula, xdata, ydata, lo = NULL, hi = NULL)
formula |
object of class |
xdata |
|
ydata |
|
lo |
|
hi |
|
The prior distribution for a coefficient beta[q,s]
for predictor q
and response s
, is dunif(lo[q,s], hi[q,s])
. gjamPriorTemplate
generates these matrices. The default values are (-Inf, Inf
), i.e., all values in lo
equal to -Inf
and hi
equal to Inf
. These templates can be modified by changing specific values in lo
and/or hi
.
Alternatively, desired lower limits can be passed as the list lo
, assigned to names in xdata
(same limit for all species in ydata
), in ydata
(same limit for all predictors in xdata
), or both, separating names in xdata
and ydata
by "_"
. The same convention is used for upper limits in hi
.
These matrices are supplied in as list betaPrior
, which is included in modelList
passed to gjam
. See examples and browseVignettes('gjam')
.
Note that the informative prior slows computation.
A list
containing two matrices. lo
is a Q x S matrix
of lower coefficient limits. hi
is a Q x S matrix
of upper coefficient limits. Unless specied in lo
, all values in lo = -Inf
. Likewise, unless specied in hi
, all values in hiBeta = -Inf
.
James S Clark, [email protected]
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2017. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs 87, 34-56.
## Not run: library(repmis) source_data("https://github.com/jimclarkatduke/gjam/blob/master/forestTraits.RData?raw=True") xdata <- forestTraits$xdata plotByTree <- gjamReZero(forestTraits$treesDeZero) # re-zero traitTypes <- forestTraits$traitTypes specByTrait <- forestTraits$specByTrait tmp <- gjamSpec2Trait(pbys = plotByTree, sbyt = specByTrait, tTypes = traitTypes) tTypes <- tmp$traitTypes traity <- tmp$plotByCWM censor <- tmp$censor formula <- as.formula(~ temp + deficit) lo <- list(temp_gmPerSeed = 0, temp_dioecious = 0 ) # positive effect on seed size, dioecy b <- gjamPriorTemplate(formula, xdata, ydata = traity, lo = lo) ml <- list(ng=3000, burnin=1000, typeNames = tTypes, censor = censor, betaPrior = b) out <- gjam(formula, xdata, ydata = traity, modelList = ml) S <- ncol(traity) sc <- rep('black',S) sc[colnames(traity) pl <- list(specColor = sc) gjamPlot(output = out, plotPars = pl) ## End(Not run)
## Not run: library(repmis) source_data("https://github.com/jimclarkatduke/gjam/blob/master/forestTraits.RData?raw=True") xdata <- forestTraits$xdata plotByTree <- gjamReZero(forestTraits$treesDeZero) # re-zero traitTypes <- forestTraits$traitTypes specByTrait <- forestTraits$specByTrait tmp <- gjamSpec2Trait(pbys = plotByTree, sbyt = specByTrait, tTypes = traitTypes) tTypes <- tmp$traitTypes traity <- tmp$plotByCWM censor <- tmp$censor formula <- as.formula(~ temp + deficit) lo <- list(temp_gmPerSeed = 0, temp_dioecious = 0 ) # positive effect on seed size, dioecy b <- gjamPriorTemplate(formula, xdata, ydata = traity, lo = lo) ml <- list(ng=3000, burnin=1000, typeNames = tTypes, censor = censor, betaPrior = b) out <- gjam(formula, xdata, ydata = traity, modelList = ml) S <- ncol(traity) sc <- rep('black',S) sc[colnames(traity) pl <- list(specColor = sc) gjamPlot(output = out, plotPars = pl) ## End(Not run)
Returns a re-zeroed matrix y
from the de-zeroed vector, a sparse matrix.
gjamReZero( yDeZero )
gjamReZero( yDeZero )
yDeZero |
|
Many abundance data sets are mostly zeros. gjamReZero
recovers the full matrix from de-zeroed list yDeZero
written by gjamDeZero
ymat |
re-zeroed |
James S Clark, [email protected]
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2016. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs 87, 34-56.
gjamDeZero
to de-zero ymat
browseVignettes('gjam')
website: 'http://sites.nicholas.duke.edu/clarklab/code/'.
## Not run: library(repmis) source_data("https://github.com/jimclarkatduke/gjam/blob/master/fungEnd.RData?raw=True") ymat <- gjamReZero(fungEnd$yDeZero) # OTUs stored without zeros length(fungEnd$yDeZero$yvec) # size of stored version length(ymat) # full size ## End(Not run)
## Not run: library(repmis) source_data("https://github.com/jimclarkatduke/gjam/blob/master/fungEnd.RData?raw=True") ymat <- gjamReZero(fungEnd$yDeZero) # OTUs stored without zeros length(fungEnd$yDeZero$yvec) # size of stored version length(ymat) # full size ## End(Not run)
Evaluates sensitivity coefficients for full response matrix or subsets of it.
Uses output from gjam
. Returns a matrix
of samples by predictors.
gjamSensitivity(output, group = NULL, nsim = 100, PERSPECIES = TRUE)
gjamSensitivity(output, group = NULL, nsim = 100, PERSPECIES = TRUE)
output |
object fitted with |
group |
|
nsim |
number of samples from posterior distribution. |
PERSPECIES |
divide variance by number of species in the group |
Sensitivity to predictors of entire reponse matrix or a subset of it, identified by the character string group
. The equations for sensitivity are given here:
browseVignettes('gjam')
Returns a nsim
by predictor matrix of sensitivities to predictor variables, evaluated by draws from the posterior. Because sensitivities may be compared across groups represented by different numbers of species, PERSPECIES = TRUE
returns sensitivity on a per-species basis.
James S Clark, [email protected]
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2017. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs, 87, 34-56.
gjamSimData
simulates data
A more detailed vignette is can be obtained with:
browseVignettes('gjam')
website 'http://sites.nicholas.duke.edu/clarklab/code/'.
## Not run: ## combinations of scales types <- c('DA','DA','OC','OC','OC','OC','CC','CC','CC','CC','CC','CA','CA','PA','PA') f <- gjamSimData(S = length(types), typeNames = types) ml <- list(ng = 50, burnin = 5, typeNames = f$typeNames) out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) ynames <- colnames(f$y) group <- ynames[types == 'OC'] full <- gjamSensitivity(out) cc <- gjamSensitivity(out, group) nt <- ncol(full) ylim <- range(rbind(full, cc)) boxplot( full, boxwex = 0.25, at = 1:nt - .21, col='blue', log='y', ylim = ylim, xaxt = 'n', xlab = 'Predictors', ylab='Sensitivity') boxplot( cc, boxwex = 0.25, at = 1:nt + .2, col='forestgreen', add=T, xaxt = 'n') axis(1,at=1:nt,labels=colnames(full)) legend('bottomleft',c('full response','CC data'), text.col=c('blue','forestgreen')) ## End(Not run)
## Not run: ## combinations of scales types <- c('DA','DA','OC','OC','OC','OC','CC','CC','CC','CC','CC','CA','CA','PA','PA') f <- gjamSimData(S = length(types), typeNames = types) ml <- list(ng = 50, burnin = 5, typeNames = f$typeNames) out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml) ynames <- colnames(f$y) group <- ynames[types == 'OC'] full <- gjamSensitivity(out) cc <- gjamSensitivity(out, group) nt <- ncol(full) ylim <- range(rbind(full, cc)) boxplot( full, boxwex = 0.25, at = 1:nt - .21, col='blue', log='y', ylim = ylim, xaxt = 'n', xlab = 'Predictors', ylab='Sensitivity') boxplot( cc, boxwex = 0.25, at = 1:nt + .2, col='forestgreen', add=T, xaxt = 'n') axis(1,at=1:nt,labels=colnames(full)) legend('bottomleft',c('full response','CC data'), text.col=c('blue','forestgreen')) ## End(Not run)
Simulates data for analysis by gjam
.
gjamSimData(n = 1000, S = 10, Q = 5, x = NULL, nmiss = 0, typeNames, effort = NULL)
gjamSimData(n = 1000, S = 10, Q = 5, x = NULL, nmiss = 0, typeNames, effort = NULL)
n |
Sample size |
S |
Number of response variables (columns) in |
Q |
Number of predictors (columns) in design matrix |
x |
design |
nmiss |
Number of missing values to in |
typeNames |
Character vector of data types, see Details |
effort |
List containing ' |
Generates simulated data and parameters for analysis by gjam
. Because both parameters and data are stochastic, not all simulations will give good results.
typeNames
can be 'PA
' (presenceAbsence), 'CA
'
(continuous), 'DA
' (discrete), 'FC
' (fractional composition),
'CC
' (count composition), 'OC
' (ordinal counts), and 'CAT
' (categorical levels). If more than one 'CAT'
is included, each defines a multilevel categorical reponse.
One additional type, 'CON
' (continuous), is not censored at zero by default.
If defined as a single character
value typeNames
applies to all columns in y
. If not, typeNames
is length-S
character vector
, identifying each response by column in y
. If a column 'CAT'
is included, a random number of levels will be generated, a, b, c, ...
.
A more detailed vignette is can be obtained with:
browseVignettes('gjam')
website 'http://sites.nicholas.duke.edu/clarklab/code/'.
formula |
R formula for model, e.g., |
xdata |
|
ydata |
|
y |
response as a |
w |
|
typeY |
vector of data types corresponding to columns in |
typeNames |
vector of data types corresponding to columns in |
trueValues |
list containing true parameter values |
effort |
see Arguments. |
James S Clark, [email protected]
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2016. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs 87, 34-56.
## Not run: ## ordinal data, show true parameter values sim <- gjamSimData(S = 5, typeNames = 'OC') sim$ydata[1:5,] # example data sim$trueValues$cuts # simulated partition sim$trueValues$beta # coefficient matrix ## continuous data censored at zero, note latent w for obs y = 0 sim <- gjamSimData(n = 5, S = 5, typeNames = 'CA') sim$w sim$y ## continuous and discrete data types <- c(rep('DA',5), rep('CA',4)) sim <- gjamSimData(n = 10, S = length(types), Q = 4, typeNames = types) sim$typeNames sim$ydata ## composition count data sim <- gjamSimData(n = 10, S = 8, typeNames = 'CC') totalCount <- rowSums(sim$ydata) cbind(sim$ydata, totalCount) # data with sample effort ## multiple categorical responses - compare matrix y and data.frqme ydata types <- rep('CAT',2) sim <- gjamSimData(S = length(types), typeNames = types) head(sim$ydata) head(sim$y) ## discrete abundance, heterogeneous effort S <- 5 n <- 1000 ef <- list( columns = 1:S, values = round(runif(n,.5,5),1) ) sim <- gjamSimData(n, S, typeNames = 'DA', effort = ef) sim$effort$values[1:20] ## combinations of scales, partition only for 'OC' columns types <- c('OC','OC','OC','CC','CC','CC','CC','CC','CA','CA','PA','PA') sim <- gjamSimData(S = length(types), typeNames = types) sim$typeNames head(sim$ydata) sim$trueValues$cuts ## End(Not run)
## Not run: ## ordinal data, show true parameter values sim <- gjamSimData(S = 5, typeNames = 'OC') sim$ydata[1:5,] # example data sim$trueValues$cuts # simulated partition sim$trueValues$beta # coefficient matrix ## continuous data censored at zero, note latent w for obs y = 0 sim <- gjamSimData(n = 5, S = 5, typeNames = 'CA') sim$w sim$y ## continuous and discrete data types <- c(rep('DA',5), rep('CA',4)) sim <- gjamSimData(n = 10, S = length(types), Q = 4, typeNames = types) sim$typeNames sim$ydata ## composition count data sim <- gjamSimData(n = 10, S = 8, typeNames = 'CC') totalCount <- rowSums(sim$ydata) cbind(sim$ydata, totalCount) # data with sample effort ## multiple categorical responses - compare matrix y and data.frqme ydata types <- rep('CAT',2) sim <- gjamSimData(S = length(types), typeNames = types) head(sim$ydata) head(sim$y) ## discrete abundance, heterogeneous effort S <- 5 n <- 1000 ef <- list( columns = 1:S, values = round(runif(n,.5,5),1) ) sim <- gjamSimData(n, S, typeNames = 'DA', effort = ef) sim$effort$values[1:20] ## combinations of scales, partition only for 'OC' columns types <- c('OC','OC','OC','CC','CC','CC','CC','CC','CA','CA','PA','PA') sim <- gjamSimData(S = length(types), typeNames = types) sim$typeNames head(sim$ydata) sim$trueValues$cuts ## End(Not run)
Constructs community-weighted mean-mode (CWMM) trait matrix for analysis with gjam
for n
observations, S
species, P
traits, and M
total trait levels.
gjamSpec2Trait(pbys, sbyt, tTypes)
gjamSpec2Trait(pbys, sbyt, tTypes)
pbys |
|
sbyt |
|
tTypes |
|
Generates the objects needed for a trait response model (TRM). As inputs the sbyt
data.frame
has P
columns containing numeric values, ordinal scores, and categorical variables, identified by data type in tTypes
. Additional trait columns can appear in the n x M
output matrix plotByCWMM
, because each level of a category becomes a new 'FC'
column as a CWMM. Thus, M
can exceed P
, depending on the number of factors in sbyt
. The exception is for categorical traits with only two levels, which can be treated as (0, 1) censored 'CA'
data.
As output, the CWMM data types are given in traitTypes
.
The list censor = NULL
unless some data types are censored. In the example below there are two censored columns.
A detailed vignette on trait analysis is obtained with:
browseVignettes('gjam')
plotByCWM |
|
traitTypes |
|
specByTrait |
|
censor |
|
James S Clark, [email protected]
Clark, J.S. 2016. Why species tell us more about traits than traits tell us about species: Predictive models. Ecology 97, 1979-1993.
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2017. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs 87, 34-56.
## Not run: library(repmis) source_data("https://github.com/jimclarkatduke/gjam/blob/master/forestTraits.RData?raw=True") xdata <- forestTraits$xdata plotByTree <- gjamReZero(forestTraits$treesDeZero) # re-zero traitTypes <- forestTraits$traitTypes specByTrait <- forestTraits$specByTrait tmp <- gjamSpec2Trait(pbys = plotByTree, sbyt = specByTrait, tTypes = traitTypes) tTypes <- tmp$traitTypes traity <- tmp$plotByCWM censor <- tmp$censor ml <- list(ng=2000, burnin=500, typeNames = tTypes, censor = censor) out <- gjam(~ temp + stdage + deficit, xdata, ydata = traity, modelList = ml) gjamPlot( output = out ) ## End(Not run)
## Not run: library(repmis) source_data("https://github.com/jimclarkatduke/gjam/blob/master/forestTraits.RData?raw=True") xdata <- forestTraits$xdata plotByTree <- gjamReZero(forestTraits$treesDeZero) # re-zero traitTypes <- forestTraits$traitTypes specByTrait <- forestTraits$specByTrait tmp <- gjamSpec2Trait(pbys = plotByTree, sbyt = specByTrait, tTypes = traitTypes) tTypes <- tmp$traitTypes traity <- tmp$plotByCWM censor <- tmp$censor ml <- list(ng=2000, burnin=500, typeNames = tTypes, censor = censor) out <- gjam(~ temp + stdage + deficit, xdata, ydata = traity, modelList = ml) gjamPlot( output = out ) ## End(Not run)
Returns a list
that includes a subset of columns in y
. Rare species can be aggregated into a single class.
gjamTrimY(y, minObs = 2, maxCols = NULL, OTHER = TRUE)
gjamTrimY(y, minObs = 2, maxCols = NULL, OTHER = TRUE)
y |
|
minObs |
minimum number of non-zero observations |
maxCols |
maximum number of response variables |
OTHER |
|
Data sets commonly have many responses that are mostly zeros, large numbers of rare species, even singletons. Response matrix y
can be trimmed to include only taxa having > minObs
non-zero observations or to <= maxCol
total columns. The option OTHER
is recommended for composition data ('CC', 'FC'), where the 'other'
column is taken as the reference class. If there are unidentified species they might be included in this class. [See gjamSimData
for typeName
codes].
Returns a list
containing three elements.
y |
trimmed version of |
colIndex |
length- |
nobs |
number of non-zero observations by column in |
James S Clark, [email protected]
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2017. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs 87, 34-56.
gjamSimData
simulates data
gjam
analyzes data
A more detailed vignette is can be obtained with:
browseVignettes('gjam')
web site 'http://sites.nicholas.duke.edu/clarklab/code/'.
## Not run: library(repmis) source_data("https://github.com/jimclarkatduke/gjam/blob/master/forestTraits.RData?raw=True") y <- gjamReZero(fungEnd$yDeZero) # re-zero data dim(y) y <- gjamTrimY(y, minObs = 200)$y # species in >= 200 observations dim(y) tail(colnames(y)) # last column is 'other' ## End(Not run)
## Not run: library(repmis) source_data("https://github.com/jimclarkatduke/gjam/blob/master/forestTraits.RData?raw=True") y <- gjamReZero(fungEnd$yDeZero) # re-zero data dim(y) y <- gjamTrimY(y, minObs = 200)$y # species in >= 200 observations dim(y) tail(colnames(y)) # last column is 'other' ## End(Not run)