“gmGeostats” is a package for multivariate geostatistics, focusing in the usage of data from multivariate restricted sampling spaces. Such data include positive data, compositional data, distributional data and the like. Most of the times, the geostatistical analysis of such data includes three steps:
The package is loaded, as usual with
library(gmGeostats)
#> Welcome to 'gmGeostats', a package for multivariate geostatistical analysis.
#> Note: use 'fit_lmc' instead of fit.lmc
and it fundamentally depends on packages “compositions”, “gstat” and
“sp”. Other dependencies are more instrumental and less fundamental.
NOTE: if you separately need “compositions” or “gstat”, load these
packages first, then load “gmGeostats”. This will ensure that the
overloaded functions work properly for all three packages.
Alternatively, use fully qualified names
(e.g. pkg::foo()
).
This vignette very briefly presents the steps to follow in several analysis and modelling routes, illustrated with the case of compositional data. No explanations, theory or discussion is included.
The data can be visually inspected with scatterplots, in raw
representation (plot()
, pairs()
), in ternary
diagrams (compositions::plot.acomp()
), and in scatterplots
of logratio transformed data (use the transformations
pwlr()
, alr()
, clr()
or
ilr()
from package “compositions”, then
pairs()
). Function pairs()
can be given panel
functions such as e.g. vp.lrdensityplot()
,
vp.kde2dplot()
or vp.lrboxplot()
resp. for
creating histograms of pairwise logratios, 2D kernel density maps on the
scatterplots or boxplots of the pairwise logratios. Package
“compositions” provides the class “acomp” to directly deal with the
right representation in the several methods.
Principal component analysis is also a strong help. For this, you
need an isometry (not just an isomorphism). Transformation
clr()
is the best for this, and is actually automatically
used when you do princomp(acomp(YOURDATA))
. “gmGeostats”
provides generalised diagonalisation methods to account for the spatial
dependence, see ?genDiag
for details.
Create your spatial objects by connecting the spatial coordinates to
the multivariate observations via the functions
sp:SpatialPointsDataFrame()
or better the “gmGeostats”
functions make.gmMultivariateGaussianSpatialModel()
for
multivariate data and
make.gmCompositionalGaussianSpatialModel()
for
compositional data. The functions
make.gm******SpatialModel()
produces objects of spatial
data container class “gmSpatialModel”, that are necessary for the rest
of the analysis and modelling.
Swath plots are available with command swath()
. If you
give it an “acomp” object you will obtain a matrix of logratio swath
plots. Otherwise you will get an set of swath plots, one for each
variable. Function pairsmap()
works similarly, but produces
bubble maps (you can control size and color of the symbols).
Empirical variograms can be obtained with function
variogram()
out of the spatial data container. You can also
use the function logratioVariogram()
for compositional
data. Both accept anisotropy. Their output can be plotted with
plot()
, which has specific methods for compositional and
non-compositional data. In the case of anisotropic data, you can also
use a method of image()
to visualise the variogram maps,
see ?image.logratioVariogramAnisotropy
for details.
Finally you can also check for the strength of the spatial dependence
with the test noSpatCorr.test()
. This is a permutations
test, which null hypothesis is that the data do not exhibit spatial
autocorrelation.
Modelling the empirical variograms obtained in the last step can be
done with the function fit_lmc()
. This requires specifying
a variogram model, which parameters will be fitted by that function.
Variogram specifications are available with any of the following
functions: gstat::vgm()
and
gmGeostats::gmCgram()
for multivariate data,
compositions::CompLinModCoReg()
and
gmGeostats::LMCAnisCompo()
for compositional data.
CompLinModCoReg()
is the only one not accepting anisotropy.
You can mix and merge empirical and theoretical models from different
packages, as fit_lmc()
will take care to convert between
them for appropriate consistency.
Plotting of LMCs against their empirical variograms can be done with
function variogramModelPlot()
.
Neighbourhood descriptions are created with function
KrigingNeighbourhood()
. Kriging neighbourhoods and LMC
variogram models and can be attached to the “gmSpatialModel” objects at
the moment of creation via make.gm*()
functions, using
arguments ng
and model
(this last one strictly
requiring you to also specify the formula
argument).
Validation of the model or of the neighbourhood can then be obtained
with the function validate()
. This requires an
object
(the complete “gmSpatialModel”) and a
strategy
. Validation strategies are small S3-objects
describing what will exactly be done in the validation. They can be
quickly defined by means of configuration functions as
LeaveOneOut()
or NfoldCrossValidation()
. The
call to validate()
will provide some output that can be
evaluated with functions such as xvErrorMeasures()
,
accuracy()
or prediction()
.
This way of working is common to the package. You always build a
model (with a make.*()
function), define a method parameter
object (created with a specific, verbose, helper function), and feed
both to a common umbrella function describing what do you want to do:
validate()
or predict()
, the second one also
requires an argument newdata
as is standard in R. The
output can then be postprocessed by specific functions.
The method parameter for cokriging is actually the neighbourhood. So,
you can give predict()
an object resulting from
KrigingNeighbourhood()
, otherwise it will take the standard
one stored in the “gmSpatialModel”, or produce a global neighbourhood
cokriging if no neighbourhood description is found.
Output of predict
for “gstat” objects (the current
default) can be re-formed to compositional shape by means of the
function gsi.gstatCokriging2compo()
; there is an
gsi.gstatCokriging2rmult()
as well for multivariate data.
Maps can be obtained with function image_cokriged()
, that
produces a choropleth map with legend, and returns the color scale
(i.e. some breaks and a palette of colors) to be used, e.g. for plotting
the initial data on top of the maps using the same color scale. NOTE:
this funtion does NOT freeze the plot! Most probably you will need to
call par(mfrow=c(1,1))
to create a clean slate device for
the next plot.
Gaussian cosimulation requires joint multivariate normality. The
package provides the flow anamorphosis algorithm for this goal. This is
obtained in two steps. First, you create the transformation by calling
function ana()
with your data and storing it. Then, you
apply the stored transformation to the data, and obtain the normalised
scores. These scores can then be treated with all methods and techniques
of the preceding sections “Exploratory analysis” and “Linear model of
coregionalisation (LMC)”, in particular a call to
make.gmMultivariateGaussianSpatialModel()
will create the
passing “gmSpatialModel”.
Cosimulation method parameters are created by calls to one of the
functions SequentialSimulation()
,
TurningBands()
or CholeskyDecomposition()
. All
these functions include an argument nsim
controlling the
number of realisations desired. These are then obtained by a call to
predict()
, giving it the “gmSpatialModel”, the
newdata
and the method parameters, in this order.
Multivariate cosimulation output can be seen analogue to a
3-dimensional array, with one dimension running along the simulated
locations, one dimension along the realisations and one dimension along
the variables. This structure is captured in “gmGeostats” with an object
of class “DataFrameStack” (extending “data.frame” and mimicking arrays).
Point-wise, simulation-wise and variable-wise transformations on this
array can be computed with function gmApply()
, a wrapper on
base::apply()
allowing for an easier management of the
dimensions of the simulation stack. Maps can also be produced by
function image_cokriged()
.
Multipoint cosimulation is available with the same strategy than
Gaussian based simulation. One needs first to define a “gmSpatialModel”
containing the conditioning data (the original data) and the stochastic
model (the training image). Second, a simulation grid must be created,
and provided to predict()
as the newdata
argument. And third, we must provide some method parameters defining the
simulation algorithm to use: currently, only direct sampling is
available, and its parameters can be constructed calling
DSpars()
.
Training images are currently objects of class “SpatialPixelsDataFrame” or “SpatialGridDataFrame”, from package “sp”. The conditioning data will be migrated to the simulation grid internally; the grid topologies for simulation and training image will be checked for consistency.